# Profitable Apps That Attract More Users

We only build apps that are free to download and install, and our main source of revenue is in-app ads. Revenue is therefore mostly influenced by the number of users who use the app.

The goal for this project is to analyze data to help developers understand what kinds of apps are likely to attract more users to increase our revenue.

In [1]:
from csv import reader

# Open the App Store data and separate the header from the data rows
# (assumes the file is UTF-8 encoded; the with block closes the file)
with open('AppleStore.csv', encoding='utf8') as opened_file:
    apple_list = list(reader(opened_file))
apple_header = apple_list[0]
apple_data = apple_list[1:]

# Open the Google Play data and separate the header from the data rows
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android_list = list(reader(opened_file))
android_header = android_list[0]
android_data = android_list[1:]

In [2]:
# Print the rows in dataset[start:end] and, if requested, the total number of rows
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        #print('Number of columns:', len(dataset[0]))

print(apple_header)
print('\n')
explore_data(apple_data, 0, 3, True)
print('\n')

print(android_header)
print('\n')
explore_data(android_data, 0, 3, True)
print('\n')

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.

In [3]:
print(android_header)
print('\n')
# The data with incorrect information. It is missing data and has columns shifted.
print(android_data[10472])
print('\n')
print('Number of Android Data Before Deletion of Invalid Row', len(android_data))
del android_data[10472]
print('Number of Android Data After Deletion of Invalid Row', len(android_data))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Number of Android Data Before Deletion of Invalid Row 10841
Number of Android Data After Deletion of Invalid Row 10840


Row 10472 had invalid data. It was missing the Category value, which shifted all the following columns one position to the left. As a result, the Rating column held 19, which is not possible since the highest rating in the store is 5.
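One way to catch rows like this without inspecting them by hand is to compare each row's length with the header's length. The sketch below assumes a hypothetical helper, `find_malformed_rows`, and runs on a small made-up sample rather than the real data set.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose column count differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Made-up sample: the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '19'],
    ['App C', 'GAME', '3.9'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)
```

The shifted row printed above has 12 values against 13 header columns, so a length check of this kind would be expected to flag it.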

In [4]:
# Finding Duplicate Data In Google Play Store
unique_list = []
duplicate_list = []

for row in android_data:
    name = row[0]
    if name in unique_list:
        duplicate_list.append(name)
    else:
        unique_list.append(name)

print('Number of duplicate apps:', len(duplicate_list))

Number of duplicate apps: 1181


After finding the duplicate apps, we will try to keep the entry with the most recent information. We can base this on the number of reviews: the higher the number of reviews, the more recent the data should be, so for each app we keep the entry with the most reviews.

This can be done by creating a dictionary:
* Create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app
* Use the dictionary to create a new data set, which will have only one entry per app (and we only select the entry with the highest number of reviews)

Android data should only include unique apps. To get the expected number of unique apps, we need to subtract the number of duplicate entries from the length of the android data.
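The duplicate count can also be obtained with a set, which avoids the repeated list scans that `name in unique_list` performs. This is a minimal sketch on made-up app names; `count_duplicates` is a hypothetical helper, not part of the notebook.

```python
def count_duplicates(names):
    """Count entries whose name has already been seen earlier in the list."""
    seen = set()
    duplicates = 0
    for name in names:
        if name in seen:
            duplicates += 1
        else:
            seen.add(name)
    return duplicates


# Made-up names: 'Box' appears three times and 'Slack' twice.
sample_names = ['Box', 'Slack', 'Box', 'Zoom', 'Slack', 'Box']
n_duplicates = count_duplicates(sample_names)
print('Duplicates:', n_duplicates)
print('Unique:', len(sample_names) - n_duplicates)
```

Subtracting the duplicate count from the total gives the number of unique names, which is the same arithmetic used for the expected number of unique apps.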

In [5]:
print('Expected number of unique apps:', len(android_data) - len(duplicate_list))

Expected number of unique apps: 9659


Let's start by building the dictionary that records the highest number of reviews for each app. The loop goes through every entry and overwrites the value stored for an app name whenever it finds an entry with more reviews than the stored value.

In [6]:
reviews_max = {}

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Actual Number of unique apps:', len(reviews_max))        

Actual Number of unique apps: 9659


Now, let's use the reviews_max dictionary to remove the duplicates. For the duplicate cases, we'll only keep the entries with the highest number of reviews. In the code cell below:

* We start by initializing two empty lists, android_clean and already_added.
* We loop through the android data set, and for every iteration:
    * We isolate the name of the app and the number of reviews.
    * We add the current row (app) to the android_clean list, and the app name (name) to the already_added list if:
        * The number of reviews of the current app matches the number of reviews of that app as described in the reviews_max dictionary; and
        * The name of the app is not already in the already_added list. **We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.**
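To see why the second condition matters, consider a made-up data set in which one app has two entries with the same, highest review count. The sketch below mirrors the two steps described above on that toy input; the rows and the `deduplicate` helper are illustrative assumptions.

```python
def deduplicate(rows, check_already_added=True):
    """Keep one row per app name: the first row with the highest review count."""
    # Step 1: highest review count per app name.
    reviews_max = {}
    for name, n_reviews in rows:
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Step 2: keep rows that match the maximum, optionally skipping repeats.
    clean = []
    already_added = set()
    for name, n_reviews in rows:
        if n_reviews != reviews_max[name]:
            continue
        if check_already_added and name in already_added:
            continue
        clean.append((name, n_reviews))
        already_added.add(name)
    return clean


# Made-up rows of (name, number of reviews); 'Box' ties with itself at 159.
toy_rows = [('Box', 159), ('Box', 159), ('Box', 120), ('Notes', 42)]

print(len(deduplicate(toy_rows)))                             # 2
print(len(deduplicate(toy_rows, check_already_added=False)))  # 3
```

Without the name check, both tied 'Box' rows survive, so the cleaned list still contains a duplicate.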

In [7]:
android_clean = []
already_added = []
for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

In [8]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659


We have 9659 apps as expected.

Next, we remove non-English apps using the [ASCII](https://en.wikipedia.org/wiki/ASCII) standard. The characters commonly used in English text all have code points between 0 and 127, so we can use 127 as a threshold: a name containing characters above it may not be English.
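Python's built-in `ord()` returns the code point of a single character, which makes the 127 threshold easy to check. A short illustration, where `is_ascii` is a hypothetical helper name:

```python
def is_ascii(character):
    """Return True if the character's code point is in the ASCII range (0-127)."""
    return ord(character) <= 127


# Print each character, its code point, and whether it is within ASCII.
for character in ['a', 'Z', '~', '™', '안']:
    print(repr(character), ord(character), is_ascii(character))
```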

In [9]:
def check_language(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(check_language('Instagram'))
print(check_language('안녕'))
print(check_language('Instachat 😜'))
print(check_language('Docs To Go™ Free Office Suite'))

True
False
False
False


The function works up to a point, but because it treats some symbols and emojis as non-English, it returns False even for apps with English names. To avoid losing English apps whose names include such symbols, we'll only remove an app if its name has more than three characters with code points falling outside the ASCII range.

In [10]:
def check_language(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii +=1
    if non_ascii > 3:
        return False
    else:
        return True

print(check_language('Instagram'))
print(check_language('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_language('Instachat 😜'))
print(check_language('Docs To Go™ Free Office Suite'))

True
False
True
True


In [11]:
android_english_app = []
android_nonenglish_app = []
for app in android_clean:
    name = app[0]
    if check_language(name):
        android_english_app.append(app)
    else:
        android_nonenglish_app.append(app)

ios_english_app = []
ios_nonenglish_app = []
for app in apple_data:
    name = app[1]
    if check_language(name):
        ios_english_app.append(app)
    else:
        ios_nonenglish_app.append(app)
        
explore_data(android_english_app, 0, 3, True)
print('\n')
explore_data(android_nonenglish_app, 0, 3, True)
print('\n')
explore_data(ios_english_app, 0, 3, True)   
print('\n')
explore_data(ios_nonenglish_app, 0, 1, True)  
print('\n')

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614


['Flame - درب عقلك يوميا', 'EDUCATION', '4.6', '56065', '37M', '1,000,000+', 'Free', '0', 'Everyone', 'Education', 'July 26, 2018', '3.3', '4.1 and up']


['သိင်္ Astrology - Min Thein Kha BayDin', 'LIFESTYLE', '4.7', '2225', '15M', '100,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'July 26, 2018', '4.2.1', '4.0.3 and up']


['РИА Новости', 'NEWS_AND_MAGAZINES', '4.5', '44274', '8.0M', '1,000,000+', 'Free', '0', 'E

In [12]:
android_free_app = []
ios_free_app = []
for app in android_english_app:
    price = app[7]
    if price == '0':
        android_free_app.append(app)
for app in ios_english_app:   
    price = app[4]
    if price == '0.0':
        ios_free_app.append(app)

print('Number of Free Android Apps:', len(android_free_app))        
print('Number of Free iOS Apps:', len(ios_free_app))

Number of Free Android Apps: 8864
Number of Free iOS Apps: 3222
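
The comparisons above rely on the exact strings `'0'` and `'0.0'`, so they would miss a free app whose price is written differently. A more tolerant check, sketched below, converts the price to a number first. The `is_free` helper and the handling of a leading `$` are assumptions for illustration, not something the notebook does.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' equals zero."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than raising.
        return False


for price in ['0', '0.0', '$0.00', '$4.99', 'Everyone']:
    print(repr(price), is_free(price))
```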


With the data cleaned, we need to find the most common genres in each app store. This tells us which genres are the most popular and possibly the most profitable.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

In [16]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

def freq_table(dataset, index): 
    frequency_table = {}
    total = 0
    for app in dataset:
        total += 1
        value = app[index]
        if value in frequency_table:
            frequency_table[value] +=1
        else:
            frequency_table[value] = 1

    freq_table_percentages = {}
    for key in frequency_table:
        percentage = (frequency_table[key] / total) * 100
        freq_table_percentages[key] = percentage 
    
    return freq_table_percentages

We will now analyze the prime_genre column of the iOS App Store.

In [24]:
print('iOS Prime Genres')
print('---------------------------')
display_table(ios_free_app, -5)

iOS Prime Genres
---------------------------
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that among the free English apps, more than half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not be the same as the offer.

We will now analyze the Category and Genres columns of the Google Play data set.

In [26]:
print('Android Genres')
print('---------------------------')
display_table(android_free_app, -4)
print('\n')
print('Android Category')
print('---------------------------')
display_table(android_free_app, 1)

Android Genres
---------------------------
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718

The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.
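The granularity difference can be quantified by counting the distinct values in each column. The sketch below does this on a few made-up rows; `count_distinct` is a hypothetical helper.

```python
def count_distinct(rows, index):
    """Count how many different values appear in the column at `index`."""
    return len({row[index] for row in rows})


# Made-up rows of (app, category, genres), not taken from the data set.
toy_rows = [
    ('App A', 'GAME', 'Action'),
    ('App B', 'GAME', 'Puzzle'),
    ('App C', 'GAME', 'Racing'),
    ('App D', 'TOOLS', 'Tools'),
]

print('Distinct categories:', count_distinct(toy_rows, 1))
print('Distinct genres:', count_distinct(toy_rows, 2))
```

On the real data, the equivalent calls would be `count_distinct(android_free_app, 1)` for Category and `count_distinct(android_free_app, -4)` for Genres.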

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have the most users.

To figure out which genre is the most popular, we can calculate the average number of installs or the average number of user ratings per app in each genre. To calculate the average number of user ratings for each genre, we'll use a for loop nested inside another for loop. The outer loop iterates over each genre in the frequency-table dictionary. The inner loop iterates over the apps in the data set and checks whether each app's genre matches the current genre. For every match, it adds the app's number of ratings to a running total and increments the count of apps in that genre; dividing the total by the count gives the average number of ratings.
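The nested loops described above rescan the whole data set once per genre. An alternative that reads the data only once is to accumulate a running total and a count per genre in dictionaries. This is a sketch on made-up rows, and `average_by_key` is a hypothetical helper rather than the code the notebook uses.

```python
from collections import defaultdict


def average_by_key(rows, key_index, value_index):
    """Average the numeric column `value_index`, grouped by column `key_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[key_index]
        totals[key] += float(row[value_index])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up rows of (genre, number of ratings as text).
toy_rows = [
    ('Games', '100'),
    ('Games', '300'),
    ('Weather', '50'),
]

averages = average_by_key(toy_rows, 0, 1)
print(averages)
```

Both approaches should give the same averages; the single pass mainly matters for larger data sets.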

In [28]:
ios_genres = freq_table(ios_free_app, -5)

for genre in ios_genres:
    total = 0
    len_genre = 0
    for app in ios_free_app:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

News : 21248.023255813954
Lifestyle : 16485.764705882353
Navigation : 86090.33333333333
Music : 57326.530303030304
Utilities : 18684.456790123455
Catalogs : 4004.0
Business : 7491.117647058823
Productivity : 21028.410714285714
Shopping : 26919.690476190477
Finance : 31467.944444444445
Photo & Video : 28441.54375
Health & Fitness : 23298.015384615384
Food & Drink : 33333.92307692308
Book : 39758.5
Travel : 28243.8
Entertainment : 14029.830708661417
Games : 22788.6696905016
Social Networking : 71548.34905660378
Medical : 612.0
Sports : 23008.898550724636
Education : 7003.983050847458
Reference : 74942.11111111111
Weather : 52279.892857142855


We can see that the Navigation genre has the highest average number of ratings, so let's look at which apps it includes. We will also look at the Entertainment genre because it is a very popular category on Google Play. Its low average number of ratings on the App Store seems uncharacteristic, so we'll check whether the genre has potential.

In [44]:
for app in ios_free_app:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings
print('\n')                
for app in ios_free_app:
    if app[-5] == 'Entertainment':
        print(app[1], ':', app[5])        

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Netflix : 308844
Fandango Movies - Times + Tickets : 291787
Colorfy: Coloring Book for Adults : 247809
IMDb Movies & TV - Trailers and Showtimes : 183425
TRUTH or DARE!!! - FREE : 171055
Mad Libs : 117889
Twitch : 109549
Action Movie FX : 101222
Voice Changer Plus : 98777
iFunny :) : 98344
The CW : 97368
The Moron Test : 88613
DIRECTV : 81006
ABC – Watch Live TV & Stream Full Episodes : 78890
Xbox : 72187
Redbox : 60236
Talking Tom Cat 2 for iPad : 56399
Hulu: Watch TV Shows & Stream the Latest Movies : 56170
NBC – Watch Now and Stream Full TV Episodes : 55950
Emoji> : 55338
DIRECTV App for iPad : 47506
Amazon Prime Video : 43667
CBS Full Episodes and Live TV : 39436
FOX NOW - Watch Full Episodes and Stream Live TV : 39391
Talking Angel

The Navigation genre in the App Store is heavily skewed because it is dominated by two apps (Waze and Google Maps). The Entertainment genre seems to have more potential: its ratings are spread across many more apps, and its average number of ratings on the App Store is small relative to its popularity on Google Play.
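One way to make the skew concrete is to compare the mean with the median, which is much less sensitive to a few very large values. The sketch uses the six Navigation rating counts printed above and Python's standard `statistics` module.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

print('Mean:', round(navigation_mean, 2))
print('Median:', navigation_median)
```

The mean (about 86,090) is more than ten times the median (8,196.5), which is consistent with two apps carrying most of the ratings.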

In [32]:
android_category = freq_table(android_free_app, 1)

for category in android_category:
    total = 0
    len_category = 0
    for app in android_free_app:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

DATING : 854028.8303030303
BEAUTY : 513151.88679245283
HOUSE_AND_HOME : 1331540.5616438356
MEDICAL : 120550.61980830671
ENTERTAINMENT : 11640705.88235294
PERSONALIZATION : 5201482.6122448975
TOOLS : 10801391.298666667
SHOPPING : 7036877.311557789
TRAVEL_AND_LOCAL : 13984077.710144928
EDUCATION : 1833495.145631068
EVENTS : 253542.22222222222
GAME : 15588015.603248259
PRODUCTIVITY : 16787331.344927534
BOOKS_AND_REFERENCE : 8767811.894736841
HEALTH_AND_FITNESS : 4188821.9853479853
FAMILY : 3695641.8198090694
BUSINESS : 1712290.1474201474
COMMUNICATION : 38456119.167247385
AUTO_AND_VEHICLES : 647317.8170731707
PARENTING : 542603.6206896552
FINANCE : 1387692.475609756
SOCIAL : 23253652.127118643
WEATHER : 5074486.197183099
FOOD_AND_DRINK : 1924897.7363636363
ART_AND_DESIGN : 1986335.0877192982
NEWS_AND_MAGAZINES : 9549178.467741935
PHOTOGRAPHY : 17840110.40229885
MAPS_AND_NAVIGATION : 4056941.7741935486
SPORTS : 3638640.1428571427
LIBRARIES_AND_DEMO : 638503.734939759
COMICS : 817657.272727

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 million and 500 million installs. Let's look at the Communication category on Google Play.
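Install counts in this data set are stored as text such as `'1,000,000,000+'`, so they must be cleaned before any arithmetic. A small helper keeps that cleaning in one place; `parse_installs` is a hypothetical name.

```python
def parse_installs(installs_text):
    """Convert an install bracket such as '1,000,000+' to the integer 1000000."""
    return int(installs_text.replace(',', '').replace('+', ''))


for text in ['1,000,000,000+', '500,000,000+', '100,000+']:
    print(text, '->', parse_installs(text))
```

Each value is the lower bound of a bracket rather than an exact count, so averages built from these numbers are approximate.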

In [35]:
for app in android_free_app:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

In [37]:
for app in android_free_app:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [39]:
under_100_m = []

for app in android_free_app:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386

If we remove the communication apps with 100 million or more installs, the average number of installs drops more than tenfold, from about 38.5 million to about 3.6 million. This confirms that the category average is heavily skewed by a few apps.
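The same comparison can be repeated for other categories or thresholds by wrapping it in a function. A sketch, assuming rows shaped as (category, installs text) and a hypothetical helper `average_installs_below`:

```python
def average_installs_below(rows, category, cap):
    """Average installs for `category`, ignoring apps at or above `cap` installs."""
    kept = []
    for row_category, installs_text in rows:
        n_installs = float(installs_text.replace(',', '').replace('+', ''))
        if row_category == category and n_installs < cap:
            kept.append(n_installs)
    # Return None when no app qualifies, to avoid dividing by zero.
    return sum(kept) / len(kept) if kept else None


# Made-up rows, not taken from the data set.
toy_rows = [
    ('COMMUNICATION', '1,000,000,000+'),
    ('COMMUNICATION', '10,000,000+'),
    ('COMMUNICATION', '5,000,000+'),
    ('TOOLS', '1,000+'),
]

print(average_installs_below(toy_rows, 'COMMUNICATION', 100_000_000))
```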

In [40]:
for app in android_free_app:
    if app[1] == 'ENTERTAINMENT':
        print(app[0], ':', app[5])

Complete Spanish Movies : 1,000,000+
Pluto TV - It’s Free TV : 1,000,000+
Mobile TV : 10,000,000+
TV+ : 5,000,000+
Digital TV : 5,000,000+
Motorola Spotlight Player™ : 10,000,000+
Vigo Lite : 5,000,000+
Hotstar : 100,000,000+
Peers.TV: broadcast TV channels First, Match TV, TNT ... : 5,000,000+
The green alien dance : 1,000,000+
Spectrum TV : 5,000,000+
H TV : 5,000,000+
StarTimes - Live International Champions Cup : 1,000,000+
Cinematic Cinematic : 1,000,000+
MEGOGO - Cinema and TV : 10,000,000+
Talking Angela : 100,000,000+
DStv Now : 5,000,000+
ivi - movies and TV shows in HD : 10,000,000+
Radio Javan : 1,000,000+
Talking Ginger 2 : 50,000,000+
Girly Lock Screen Wallpaper with Quotes : 5,000,000+
🔥 Football Wallpapers 4K | Full HD Backgrounds 😍 : 1,000,000+
Movies by Flixster, with Rotten Tomatoes : 10,000,000+
Low Poly – Puzzle art game : 1,000,000+
BBC Media Player : 10,000,000+
Amazon Prime Video : 50,000,000+
Adult Glitter Color by Number Book - Sandbox Pages : 1,000,000+
IMDb M

The Entertainment category seems to be dominated by streaming services. A few niche apps, such as drawing apps, have over 10 million installs, but not many. Many of the apps are targeted at kids.

In [41]:
for app in android_free_app:
    if app[1] == 'ENTERTAINMENT' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Hotstar : 100,000,000+
Talking Angela : 100,000,000+
IMDb Movies & TV : 100,000,000+
Talking Ben the Dog : 100,000,000+
Netflix : 100,000,000+


The extremely popular Entertainment apps, those in the 100 million installs bracket, are fairly balanced. There are a couple of streaming apps along with an app that provides reviews and information about movies and TV shows. Interestingly, there are also apps that don't provide streaming services: Talking Angela and Talking Ben the Dog are interactive apps for kids. These top apps suggest that the ENTERTAINMENT category is used by both adults and kids.

In [42]:
for app in android_free_app:
    if app[1] == 'ENTERTAINMENT' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Complete Spanish Movies : 1,000,000+
Pluto TV - It’s Free TV : 1,000,000+
Mobile TV : 10,000,000+
TV+ : 5,000,000+
Digital TV : 5,000,000+
Motorola Spotlight Player™ : 10,000,000+
Vigo Lite : 5,000,000+
Peers.TV: broadcast TV channels First, Match TV, TNT ... : 5,000,000+
The green alien dance : 1,000,000+
Spectrum TV : 5,000,000+
H TV : 5,000,000+
StarTimes - Live International Champions Cup : 1,000,000+
Cinematic Cinematic : 1,000,000+
MEGOGO - Cinema and TV : 10,000,000+
DStv Now : 5,000,000+
ivi - movies and TV shows in HD : 10,000,000+
Radio Javan : 1,000,000+
Talking Ginger 2 : 50,000,000+
Girly Lock Screen Wallpaper with Quotes : 5,000,000+
🔥 Football Wallpapers 4K | Full HD Backgrounds 😍 : 1,000,000+
Movies by Flixster, with Rotten Tomatoes : 10,000,000+
Low Poly – Puzzle art game : 1,000,000+
BBC Media Player : 10,000,000+
Amazon Prime Video : 50,000,000+
Adult Glitter Color by Number Book - Sandbox Pages : 1,000,000+
Twitch: Livestream Multiplayer Games & Esports : 50,000,000

Most apps in the 1,000,000+ to 50,000,000+ brackets are streaming apps. There is some variation, such as picture and image-manipulation tools, but the group is heavily dominated by streaming services.
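
Rather than reading long printed lists, the spread across install brackets can be summarized with a count per bracket. The sketch below uses `collections.Counter` on made-up rows; `installs_distribution` is a hypothetical helper.

```python
from collections import Counter


def installs_distribution(rows, category):
    """Count how many apps of `category` fall in each install bracket."""
    return Counter(installs for row_category, installs in rows
                   if row_category == category)


# Made-up rows of (category, install bracket).
toy_rows = [
    ('ENTERTAINMENT', '1,000,000+'),
    ('ENTERTAINMENT', '1,000,000+'),
    ('ENTERTAINMENT', '100,000,000+'),
    ('TOOLS', '1,000,000+'),
]

distribution = installs_distribution(toy_rows, 'ENTERTAINMENT')
for bracket, count in distribution.most_common():
    print(bracket, ':', count)
```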

# Conclusion
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

Although the Communication category has the highest average number of installs on Google Play and the Navigation genre has the highest average number of ratings on the App Store, the Entertainment category seems to have more potential for success. It can target both adults and kids and is less dominated by a couple of apps. Users appear to download multiple Entertainment apps for their needs, whereas in Communication and Navigation they mostly use only the top apps in the store.