## Profitable App profiles for the App Store and Google Play Markets

Under this project, mobile apps that are profitable in App Store and Google Play store are collected. I have taken the role of data analyst to help the developers understand what kinds of apps are likely to attract more users so they can make data-driven decision with respect the the application they develop. The objective of this project is to gather 

At our firm, the apps we build are free to download and install. The main revenue is from in-app ads meaning revenue for any given app is influenced by the number of users who use the app.

**Collect and Analyze data**
 - ten thousand Android apps from Google Play ‚Äî the data was collected in August 2018
 - seven thousand iOS apps from the App Store ‚Äî the data was collected in July 2017



In [1]:
# Open the datset
from csv import reader

# Apple Store
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

# Google Play
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]


The function below will help when explor the data. This will make the output more readable.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # add a new empty line after each row
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
print(ios_header)
print('\n')
explore_data(ios,0,4, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


Number of rows: 7197
Number of columns: 17


The Apple store dataset has 7197 apps and has 17 variables. The column that could be useful for the analysis are `track_name`, `currency`, `price`, `rating_count_tot`, `rating_count_ver` and `prime_genre`. The columns are not very descriptive by themself. 

Check [AppleStore Column descriptions](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)

In [4]:
print(android_header)
print('\n')
explore_data(android,0,4, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


For Android, there are 10841 apps and 13 columns. The columns that could be useful for analysis are `App`,`Category`,`Rating`,`Reviews`,`Type`, `Price`, and`Genres`

### Data Cleaning ###
Noted that our company only builds free app for English speaking users, thus we will need to delete the non-free apps and the apps that are not in english.

#### Deleting Wrong Data ####
The Google play dataset has a discussion board on Kaggle.com [Discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion). There is some error on the row 10472.

In [5]:
print(android[10472]) # the row with problem
print('\n')
print(android_header)
print(android[0])     # the correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


The row 10472 is for Life Made Wi-fi..., its rating is 19 which is wrong since the maximum rating an app can reveice is 5.

In [6]:
print(len(android))
del android[10472]
print(len(android))  # To check if the row is deleted.

10841
10840


#### Dealing with duplicates ####
First, finding the duplicates from one of the apps. We can see that Instagram has more than 1 entry.

In [7]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Finding all the apps that has duplicates

In [8]:
duplicate_apps = []
unique_apps = []

for app in android:
    name=app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print(len(duplicate_apps))
print(duplicate_apps[:15])

1181
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


There are 1181 duplicates entries. We want to keep just 1 entry for each app. One way to do this is to remove all the duplicate, however there are so many more better way to deal with the duplicates.

From the instagram data, the difference between each line is the number of reviews. It is assumably that the highest review is the newest data collected. Therefore, instead of remove the duplicates, the row with the highest number of reviews will be kept and the other entries will be removed.

To filter out the recent reviews:
 * Create a dictionary where each key is a unique app name, the value is the highest number of reviews of that app
 * Create a new data set using the dictionary. A dataset with only one entry per app.
 


In [9]:
ios_duplicate_apps = []
ios_unique_apps = []

for app in ios:
    name=app[2]
    if name in ios_unique_apps:
        ios_duplicate_apps.append(name)
        ios_unique_apps.append(name)
    
print(len(ios_duplicate_apps))
print(ios_duplicate_apps[:15])

0
[]


**Building Dictionary**

In [10]:
review_max={}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in review_max and review_max[name] < n_reviews:
        review_max[name] = n_reviews
    
    elif name not in review_max:
        review_max[name] = n_reviews


Before, it is found that there are 1181 duplicates, so the length of the review max will be the difference between the length of the dataset - 1181

In [11]:
print('Expexcted:',len(android)-1181)
print('Actual:', len(review_max))

Expexcted: 9659
Actual: 9659


The duplicates will be removed 

if the number of reviews of the current app is in the number of reviews of that apps described in the review_max above.

If the name of the app is not already in the already_added list, it will be added. This is for the case when there are multiple entries with the same highest number of reviews

In [12]:
android_clean = []
already_added = []
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (review_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)


Explore after removing the duplicates. There should be 9,659 rows left

In [13]:
explore_data(android_clean, 0 ,4 , True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9659
Number of columns: 13


### Removing Non-English Apps ###

In [14]:
# Examples of some of the apps with non English 
print(android_clean[4412][0])
print(android_clean[7940][0])

‰∏≠ÂõΩË™û AQ„É™„Çπ„Éã„É≥„Ç∞
ŸÑÿπÿ®ÿ© ÿ™ŸÇÿØÿ± ÿ™ÿ±ÿ®ÿ≠ DZ


One way to deal with this is to remove any apps that contains a symbol that is not commonly used in English text.

English text usually includes letters from English alphabet, numbers, punctuation marks (ie. . , ! ? ;) and other symbols (ie. + * /). These characters that are encoded using ASCII standard had a corresponding number between 0 to 127 which mean those out of the range of 0-127 are non-ASCII

In [15]:
def is_English(string):
    
    for character in string:
        if ord(character) > 127: #use ord() to find the corresponding number
            return False
    return True

# Cheack if the function work
print(is_English('Instagram'))
print(is_English('‰∏≠ÂõΩË™û AQ„É™„Çπ„Éã„É≥„Ç∞'))


True
False


The function works fine but there are some English app names that use emojis and other symbols like (‚Ñ¢, ‚Äî (em dash), ‚Äì (en dash), etc.), these fall outside the ASCII range. Using the function will remove these apps as well.

This is the way to find what is the number corresponding to a character.

In [16]:
print(ord('‚Ñ¢'))
print(ord('üòú'))

8482
128540


To minimize the impact of using is_English function created above, the app will be removed only when the name contains more than 3 non-ASCII characters.

In [17]:
def is_English(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
        
    if non_ascii > 2:
        return False
    else:
        return True
    
        
print(is_English('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_English('Instachat üòú'))
print(is_English('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))
print(is_English('Instagram'))
print(is_English('‰∏≠ÂõΩË™û AQ„É™„Çπ„Éã„É≥„Ç∞'))

True
True
False
True
False


In [18]:
print(ord('Ê¨¢'))

27426


With this function, a few of the non-English app will still able to pass the filter, but it is good enough at this point.

**Filtering**

In [19]:
android_eng = []
ios_eng = []

for app in android_clean:
    name = app[0]
    if is_English(name):
        android_eng.append(app)
        
for app in ios:
    name = app[2]
    if is_English(name):
        ios_eng.append(app)
        
explore_data(android_eng,0,3,True)
print('\n')
explore_data(ios_eng,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9597
Number of columns: 13


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '1

### Isolating the Free Apps###
Free and non-free apps are filtered using price condition == '0'. 

In [20]:
android_final = [] 
ios_final = [] 

for apps in android_eng:
    price = apps[7]
    if price == '0':
        android_final.append(apps)
    
for apps in ios_eng:
    price = apps[5]
    if price == '0':
        ios_final.append(apps)


print('Andriod:',len(android_final))
print('iOS:',len(ios_final))

print(android_header)
print(ios_header)

Andriod: 8848
iOS: 3203
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


After isolation, there are 8848 apps left for android and 3203 for ios

### Analysis Part ###
Recall that the objective of this project is to determine the kinds of apps that attrct users more since that is what influences the revenue.

To minimize the risk and over head, the company validation strategy for an app idea is:
1. Build minimal android version of the app, and add it to Google Play.
2. If the app has a good response from users then develop further
3. If the app is profitable after 6 months, then build an iOS version and add it to the App Store.
    
Since the company would like to launch the app in both markets, it is better to find app profiles that are successful in both stores.

#### Most common genres ####
For Android the genre is 9.
For iOS the genre is 12.

In [21]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentage = {}
    for key in table:
        percentage = (table[key]/total)*100
        table_percentage[key] = percentage
        
    return table_percentage

def display_table(dataset, index):
    table=freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key],key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':' , entry[0])

In [22]:
print('iOS store')
display_table(ios_final,12)

iOS store
Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809


Based on the data, more than 58% of the free apps are games followed by Entertainment and Photo&Video. There is only 3.56% for education and 3.3% for social networking.

It is observable that in App store apps with practical purpose are rare. However, high numbers of fun apps doesn't mean there are high demand

In [23]:
print('Google Play:','Category')
display_table(android_final,1)
print('\n')
print('Google Play:', 'Genres')
display_table(android_final, 9)


Google Play: Category
FAMILY : 18.942133815551536
GAME : 9.697106690777577
TOOLS : 8.453887884267631
BUSINESS : 4.599909584086799
PRODUCTIVITY : 3.899186256781193
LIFESTYLE : 3.887884267631103
FINANCE : 3.7070524412296564
MEDICAL : 3.5375226039783
SPORTS : 3.390596745027125
PERSONALIZATION : 3.322784810126582
COMMUNICATION : 3.2323688969258586
HEALTH_AND_FITNESS : 3.0854430379746836
PHOTOGRAPHY : 2.949819168173599
NEWS_AND_MAGAZINES : 2.802893309222423
SOCIAL : 2.667269439421338
TRAVEL_AND_LOCAL : 2.3395117540687163
SHOPPING : 2.2490958408679926
BOOKS_AND_REFERENCE : 2.1360759493670884
DATING : 1.8648282097649187
VIDEO_PLAYERS : 1.7970162748643763
MAPS_AND_NAVIGATION : 1.3901446654611211
FOOD_AND_DRINK : 1.2432188065099457
EDUCATION : 1.164104882459313
ENTERTAINMENT : 0.9606690777576853
LIBRARIES_AND_DEMO : 0.9380650994575045
AUTO_AND_VEHICLES : 0.9267631103074141
HOUSE_AND_HOME : 0.8024412296564195
WEATHER : 0.7911392405063291
EVENTS : 0.7120253164556962
PARENTING : 0.6555153707052441

It is different for Google Store. There are not as many as fun apps under Google play store. A good number of apps in the store are designed for practical uses.
Noted that if one further explore on the family category which accounts for 18% (highest), there are mostly games for kids.

It is not clear what is the difference betweehn genre and categories. It only shows that the genres has higher granularity.

#### Most popular Apps by Genre on the App Store ####
To find the popularity, we can look at the average of installs for each app genre. For Google Play it is under installs column 5. However, the install for iOS is not provided directly but we can get it from rating_count_tot colum 7. This is the total number of rating.

To find the average number of user ratings per app genre on the App Store
 - Isolate apps for each genres in iOS
 - Sum up the user ratings for the apps of that genre
 - Divide the sum by the number of apps belonging to that genre(not by the total number of the apps)

In [24]:
print(ios_header[-5])
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0 # store the sum of number of user ratings
    len_genre = 0 # store the number of apps for each genre
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[6])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total/len_genre
    print(genre, ':' , avg_n_ratings)


prime_genre
Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 27230.734939759037
Reference : 79350.4705882353
Finance : 32367.02857142857
Music : 57326.530303030304
Utilities : 19156.493670886077
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22886.36709539121
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 46384.916666666664
Photo & Video : 28441.54375
Entertainment : 14195.358565737051
Business : 7491.117647058823
Lifestyle : 16815.48
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0


From the result, iOS, Reference, Social Networking, and Navigation have the most average number of user ratings.

Digging deeper into the 'Navigation category', most of the ratings are from Waze and Google maps

In [25]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[2],':', app[6]) # print name and number of rating

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching¬Æ : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS ‚Äì Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


Let's look at Social Networking. The same pattern also shown. The high number of average is influenced by only some apps like Facebook, Skype, Tumblr, and etc.

For Social network, Navigation and Music apps, the average numbers do not seem to be useful as the average is influenced by only some big players. In other words, they seem to be more popular than they actually are. To get a clearer picture we can remove these extremely popular apps for each genre and recalculate the average.

In [26]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[2],':',app[6])


Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match‚Ñ¢ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ‚Ñ¢ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger ‚Äì Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
Lobi : 36
WeChat : 34584
ooVoo ‚Äì Free Video Call, Text and Voice : 177501
Pinterest : 1061624
Áü•‰πé : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
QQ HD : 5058
Messenger : 351466
eHarmony‚Ñ¢ Dating App - Meet Singles : 11124
YouNow: Live Stream Video Chat :

The average of 79,350 for references genres is also influenced by Bible and Dictionary apps.

In [27]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[2],':',app[6])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ‚Ñ¢ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pok√©mon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


**Google Play most popular Apps by Genre**

In [28]:
display_table(android_final, 5)

1,000,000+ : 15.75497287522604
100,000+ : 11.539330922242314
10,000,000+ : 10.567359855334539
10,000+ : 10.194394213381555
1,000+ : 8.39737793851718
100+ : 6.928119349005425
5,000,000+ : 6.826401446654612
500,000+ : 5.560578661844485
50,000+ : 4.769439421338156
5,000+ : 4.486889692585895
10+ : 3.5375226039783
500+ : 3.2436708860759493
50,000,000+ : 2.2830018083182644
100,000,000+ : 2.1360759493670884
50+ : 1.9213381555153706
5+ : 0.7911392405063291
1+ : 0.5085895117540687
500,000,000+ : 0.27124773960216997
1,000,000,000+ : 0.22603978300180833
0+ : 0.045207956600361664
0 : 0.011301989150090416


The install numbers recorded in Google Play data do not precise enough. It is not clear wheather the apps with 1,000+ intalls have 1000 installs, 5000 installs or 100k installs. But, for the purpose of this project, we only need to find out which apps attract the most users so it is not neccessary to get the exact number.

To do computation on the number of installs, we will have to remove the characters that prevent the conversion from string to float. Those characters are , , + , etc. *str.replace(old,new)* will be used.

In [29]:
print(android_header[1])
# Category is at index =1
category_android = freq_table(android_final, 1)

for category in category_android:
    total = 0           # store the sum of installs for each genre
    len_category = 0    # store the number of apps specific to each genre
    for apps in android_final:
        category_app = apps[1]
        if category_app == category:
            n_install = apps[5]
            n_install = n_install.replace('+',"")
            n_install = n_install.replace(',',"")
            n_install = float(n_install)
            total += n_install
            len_category += 1
    avg_install_android = total / len_category
    print(category, ':', avg_install_android)

Category
ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8814199.78835979
BUSINESS : 1712290.1474201474
COMICS : 832613.8888888889
COMMUNICATION : 38590581.08741259
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1360598.042253521
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1446158.2238372094
GAME : 15544014.51048951
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3650602.276666667
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10830251.970588235
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5145550.285714285
VIDEO_PLAYERS : 24727872.452830188
NEWS_

On average, communication apps have the most install (38,590,581), which presumably is skewed up by a few apps like WhatsApps, Facebook, Skype, and Gmail. Let's check if this is true.

In [30]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger ‚Äì Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Me

If these over 100+ were to be removed, the average would drastically fall

In [31]:
under_100_m = []

for app in android:
    n_installs = app[5]
    n_installs = n_installs.replace(',',"")
    n_installs = n_installs.replace('+',"")
    if float(n_installs) < 100000000:
        under_100_m.append(float(n_installs))
        
sum(under_100_m)/ len(under_100_m)

3177694.7371129016

From 38m+ installs, removing the 100+ installs app, the average is down to 3m installs. The same pattern as in iOS market is seen in Google Play Market as well. Though the Games genre is pretty popular, it is seemed to be saturated. So, it would be better to come up with a different reccomendation.

Let's explore BOOKS_AND_REFERENCE genres where it gains some popularity in iOS as well.


In [32]:
for app in android_final:
    if app[1] =='BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0],':',app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad üìñ Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


For Book and Reference, there are only a few apps that inflated the average install. There's some potential in this market. It should be helpful to get some ideas what are the apps in the genre. Let's explore those with popularity of 1,000,000 and 100,000,000 instals

In [33]:
for app in android_final:
    if app[1] =='BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                      or app[5] == '5,000,000+'
                                      or app[5] == '10,000,000+'
                                      or app[5] == '50,000,000+'):
        print(app[0],':',app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra ‚Äì free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+

The genre seems to be dominated by spftware and processing and reading ebooks as well as libraries and dictionaries therefore, it would be a good idea not build something similar to these apps as they are more competitive.

It is also seeable that there are a number of apps building around Quran, this suggests that building an app around a popular book can be profitable. However, there is alreayd so many libraries in the market, therefore adding some special feature to the books would be a nice idea to differentiate the app.