# Profitable Apps on the App Store and Google Play

The purpose of this analysis is to find profitable apps on the App Store and Google Play.

We look only at free apps aimed at an English-speaking audience.

In [1]:
from csv import reader

# Import App Store data; 'with' closes the file once it has been read
with open('AppleStore.csv', encoding='utf8') as data_file:
    app_store_all = list(reader(data_file))
app_store_header = app_store_all[0]
app_store = app_store_all[1:]

# Import Google Play data
with open('googleplaystore.csv', encoding='utf8') as data_file:
    gplay_data = list(reader(data_file))
gplay_header = gplay_data[0]
gplay = gplay_data[1:]

Below is the helper function `explore_data`, which prints a slice of rows in a more readable way.

It can also print the number of rows and columns of the data set.

Parameters:
- `dataset` - the data set as a list of lists read from the csv file
- `start` - int, index of the first row to print
- `end` - int, index one past the last row to print
- `rows_and_columns` - bool, set to True to also print the number of rows and columns. Default **False**


In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
print(app_store_header)
print('\n')
explore_data(app_store, 0, 3, 1)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [4]:
print(gplay_header)
print('\n')
explore_data(gplay, 0, 3, 1)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [5]:
print(gplay_header)
print('\n')
#explore_data(gplay, 10472, 10474)

print(list(zip(gplay_header, gplay[0])))
print('\n')
print(list(zip(gplay_header, gplay[10472])))


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


[('App', 'Photo Editor & Candy Camera & Grid & ScrapBook'), ('Category', 'ART_AND_DESIGN'), ('Rating', '4.1'), ('Reviews', '159'), ('Size', '19M'), ('Installs', '10,000+'), ('Type', 'Free'), ('Price', '0'), ('Content Rating', 'Everyone'), ('Genres', 'Art & Design'), ('Last Updated', 'January 7, 2018'), ('Current Ver', '1.0.0'), ('Android Ver', '4.0.3 and up')]


[('App', 'Life Made WI-Fi Touchscreen Photo Frame'), ('Category', '1.9'), ('Rating', '19'), ('Reviews', '3.0M'), ('Size', '1,000+'), ('Installs', 'Free'), ('Type', '0'), ('Price', 'Everyone'), ('Content Rating', ''), ('Genres', 'February 11, 2018'), ('Last Updated', '1.0.19'), ('Current Ver', '4.0 and up')]


Above we can see that the row at index 10472, 'Life Made WI-Fi Touchscreen Photo Frame', is missing its `Category` value, so every later value is shifted one column to the left. The row is unusable and needs to be removed.

In [6]:
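# Remove the malformed row only if it is still there, so re-running the cell is safe
if len(gplay) > 10472 and len(gplay[10472]) != len(gplay_header):
    gplay.pop(10472)
else:
    print("Row 10472 is missing or already removed")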
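The shifted row above was spotted by eye, but the same symptom (a row with fewer values than the header) can be searched for across the whole data set. The following is a minimal sketch on toy data; `find_malformed_rows` and the toy rows are hypothetical names and values, not part of the original notebook.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Toy data standing in for the real CSV rows (hypothetical values).
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, so the values shift left
    ['App C', 'GAME', '3.9'],
]

for index, row in find_malformed_rows(toy_rows, toy_header):
    print(index, row)
```

On the real data this would be called as `find_malformed_rows(gplay, gplay_header)`. A length check only catches rows with missing or extra values; it would not catch a row whose values are wrong but complete.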

In [7]:
app_names = []
for app in app_store:
    name = app[1]
    app_names.append(name)

if len(app_names) == len(set(app_names)):
    print('No duplicates')
else:
    print('We have duplicates')

We have duplicates


It seems that some apps are duplicated.

We will build a function that checks for duplicated apps based on the app name (index 1 for the App Store, index 0 for Google Play).

In [8]:
def has_name_duplicates(data_set, name_index):
    app_names = []
    
    for app in data_set:
        name = app[name_index]
        app_names.append(name)
        
    # A set drops repeated names, so a length mismatch means duplicates exist
    return len(app_names) != len(set(app_names))

In [9]:
#has_name_duplicates(gplay, 0)
print(has_name_duplicates(app_store, 1))
print(has_name_duplicates(gplay, 0))


True
True


In [10]:
def get_duplicates(data_set, name_index):
    duplicated_apps = []
    unique_apps = set()  # set membership tests are fast even for thousands of names

    for app in data_set:
        name = app[name_index]
        if name in unique_apps:
            duplicated_apps.append(name)
        else:
            unique_apps.add(name)

    return duplicated_apps

In [11]:
print(get_duplicates(app_store,1))

['Mannequin Challenge', 'VR Roller Coaster']


In [12]:
for app in app_store:
    name = app[1]
    if name == 'Mannequin Challenge' or name == 'VR Roller Coaster' :
        print(list(zip(app_store_header,app)))
        print(app_store.index(app))

[('id', '1173990889'), ('track_name', 'Mannequin Challenge'), ('size_bytes', '109705216'), ('currency', 'USD'), ('price', '0.0'), ('rating_count_tot', '668'), ('rating_count_ver', '87'), ('user_rating', '3.0'), ('user_rating_ver', '3.0'), ('ver', '1.4'), ('cont_rating', '9+'), ('prime_genre', 'Games'), ('sup_devices.num', '37'), ('ipadSc_urls.num', '4'), ('lang.num', '1'), ('vpp_lic', '1')]
2948
[('id', '952877179'), ('track_name', 'VR Roller Coaster'), ('size_bytes', '169523200'), ('currency', 'USD'), ('price', '0.0'), ('rating_count_tot', '107'), ('rating_count_ver', '102'), ('user_rating', '3.5'), ('user_rating_ver', '3.5'), ('ver', '2.0.0'), ('cont_rating', '4+'), ('prime_genre', 'Games'), ('sup_devices.num', '37'), ('ipadSc_urls.num', '5'), ('lang.num', '1'), ('vpp_lic', '1')]
4442
[('id', '1178454060'), ('track_name', 'Mannequin Challenge'), ('size_bytes', '59572224'), ('currency', 'USD'), ('price', '0.0'), ('rating_count_tot', '105'), ('rating_count_ver', '58'), ('user_rating', 

Since there are only two duplicated names in the App Store data, they are relatively easy to compare. The rows have different ids and sizes, so they seem to be different apps that share a name, possibly built for different devices.

In [13]:
print(len(get_duplicates(gplay,0)))

1181


We have a lot of duplicated apps in Google Play, 1181 to be precise. We will print a slice and check the differences for one of them.
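The count of duplicated rows can be cross-checked with `collections.Counter` from the standard library. This is a sketch on toy names, not the notebook's own method; `count_duplicate_rows` is a hypothetical helper.

```python
from collections import Counter


def count_duplicate_rows(names):
    """Number of rows beyond the first occurrence of each name."""
    counts = Counter(names)
    return sum(count - 1 for count in counts.values())


toy_names = ['Box', 'Slack', 'Box', 'Zoom', 'Box', 'Slack']
print(count_duplicate_rows(toy_names))      # 3 extra rows: two 'Box', one 'Slack'
print(Counter(toy_names).most_common(2))    # the names with the most copies
```

`most_common` also shows which names are repeated most often, which helps when choosing an app to inspect.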

In [14]:
duplicated_apps = get_duplicates(gplay,0)
print(duplicated_apps[:5])

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


In [15]:
for app in gplay:
    name = app[0]
    if name == 'Google My Business':
        print(list(zip(gplay_header,app)))
        print(gplay.index(app))

[('App', 'Google My Business'), ('Category', 'BUSINESS'), ('Rating', '4.4'), ('Reviews', '70991'), ('Size', 'Varies with device'), ('Installs', '5,000,000+'), ('Type', 'Free'), ('Price', '0'), ('Content Rating', 'Everyone'), ('Genres', 'Business'), ('Last Updated', 'July 24, 2018'), ('Current Ver', '2.19.0.204537701'), ('Android Ver', '4.4 and up')]
193
[('App', 'Google My Business'), ('Category', 'BUSINESS'), ('Rating', '4.4'), ('Reviews', '70991'), ('Size', 'Varies with device'), ('Installs', '5,000,000+'), ('Type', 'Free'), ('Price', '0'), ('Content Rating', 'Everyone'), ('Genres', 'Business'), ('Last Updated', 'July 24, 2018'), ('Current Ver', '2.19.0.204537701'), ('Android Ver', '4.4 and up')]
193
[('App', 'Google My Business'), ('Category', 'BUSINESS'), ('Rating', '4.4'), ('Reviews', '70991'), ('Size', 'Varies with device'), ('Installs', '5,000,000+'), ('Type', 'Free'), ('Price', '0'), ('Content Rating', 'Everyone'), ('Genres', 'Business'), ('Last Updated', 'July 24, 2018'), ('Cu

Based on the data, the only difference between the duplicated rows is the number of reviews.

We need a clean version of the Google Play app list. It makes sense to keep, for each app, the duplicated row with the highest number of reviews, since that row is presumably the most recent one.

In [16]:
def clean_duplicates(data_set, name_index, reviews_index):
    review_max = {}
    for i in range(len(data_set)):
        name = data_set[i][name_index]
        review = float(data_set[i][reviews_index])

        if name in review_max:
            if review_max[name][0] < review:
                review_max[name][0] = review
                review_max[name][1] = i
        else:
            review_max[name] = [review, i]

    clean_apps = []
    for name in review_max:
        clean_apps.append(data_set[review_max[name][1]])

    return clean_apps

In [17]:
gplay_clean = clean_duplicates(gplay, 0, 3)
print(len(gplay_clean))
print(len(gplay) - len(gplay_clean))

9659
1181


Above we counted 1181 duplicated rows. The difference between the original data set and the cleaned one gives the same number, so the cleanup removed exactly the duplicates.

We now have a clean version of the Google Play apps, `gplay_clean`, without duplicates.
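The keep-the-most-reviewed rule can be illustrated on a few toy rows. This sketch stores the winning row directly instead of its index; `keep_most_reviewed` and the toy values are hypothetical, and it assumes the review counts are plain integer strings.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = int(row[reviews_index])
        # Strict '>' keeps the first row when review counts tie.
        if name not in best or reviews > int(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


toy_rows = [
    ['Box', '100'],
    ['Slack', '50'],
    ['Box', '250'],
    ['Box', '250'],
]
cleaned = keep_most_reviewed(toy_rows, 0, 1)
print(cleaned)                        # [['Box', '250'], ['Slack', '50']]
print(len(toy_rows) - len(cleaned))   # 2 rows dropped
```

The number of dropped rows equals the number of duplicated rows, which is the same consistency check used on the real data.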

-------------------------------
-------------------------------

# Get English Apps only
We want to filter out non-English applications. We should catch most of them by checking whether the characters in the app name fall in the ASCII range 0-127.

Some English names contain characters outside that range (emoji or symbols such as ™), so to compensate we allow up to 3 such characters per name.

In [18]:
def is_english(name):
    non_english = 0
    for ch in name:
        if ord(ch) > 127:
            if non_english < 3:
                non_english += 1
            else:
                return False
        
    return True

In [19]:
test_name = app_store[813][1]
print(test_name, is_english(test_name))
test_name = app_store[2948][1]
print(test_name, is_english(test_name))

test_name = gplay_clean[0][0]
print(test_name, is_english(test_name))
test_name = gplay_clean[1][0]
print(test_name, is_english(test_name))


爱奇艺PPS -《欢乐颂2》电视剧热播 False
Mannequin Challenge True
Job Korea - Career Jobs True
Bubble Shooter DX AdFree True
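To make the threshold easier to see, here is a compact variant that counts the non-ASCII characters first and then compares the count with the limit. It is a sketch, not a replacement for `is_english`; `count_non_ascii`, `is_english_compact`, and the sample names other than the first Chinese title are hypothetical.

```python
def count_non_ascii(name):
    """Count the characters whose code point falls outside ASCII (0-127)."""
    return sum(1 for ch in name if ord(ch) > 127)


def is_english_compact(name, max_non_ascii=3):
    """Treat a name as English if it has at most max_non_ascii such characters."""
    return count_non_ascii(name) <= max_non_ascii


samples = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for sample in samples:
    print(sample, count_non_ascii(sample), is_english_compact(sample))
```

Printing the count next to the verdict shows how close each name is to the limit. The heuristic is deliberately rough: a name in another language written with plain Latin letters still passes.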


In [20]:
def remove_non_english(data_set, name_index):
    english_app = []
    for app in data_set:
        name = app[name_index]
        if is_english(name):
            english_app.append(app)
    
    return english_app

In [21]:
app_store_english = remove_non_english(app_store, 1)
print(len(app_store), len(app_store_english))

7197 6183


In [22]:
gplay_english = remove_non_english(gplay_clean, 0)
print(len(gplay), len(gplay_clean), len(gplay_english))

10840 9659 9614


# Get the Free English Apps
The next step is to isolate the free applications. We go over the data set and keep only the apps with a price of 0.

The App Store uses `0.0` to mark a free app, while Google Play uses `0`.

In [23]:
def get_free_app(data_set, price_index, price_string):
    free_apps = []
    for app in data_set:
        price = app[price_index]
        if price == price_string:
            free_apps.append(app)
    return free_apps
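Because the two stores write a zero price differently, an alternative is to parse the price as a number instead of passing in the exact string. The sketch below assumes that paid prices may carry a leading `$` (as in `'$4.99'`); that format and the helper name `is_free` are assumptions, not something shown in the outputs above.

```python
def is_free(price_text):
    """True when the price string parses to zero, e.g. '0', '0.0' or '$0.00'."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices (e.g. a shifted column) are treated as not free.
        return False


for text in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(text), is_free(text))
```

With such a helper the same filter works for both stores without a `price_string` argument.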

In [24]:
app_store_free = get_free_app(app_store_english, 4, '0.0')
print(len(app_store_english), len(app_store_free))
print("\n")

6183 3222




A bit over half of the English apps in the App Store are free (3222 of 6183).

In [25]:
gplay_free = get_free_app(gplay_english, 7, '0')
print(len(gplay_english), len(gplay_free))

9614 8864


Almost all the English apps in Google Play are free (8864 of 9614, about 92%).

# Filter by Genre
Next we can look at the app genres and see whether one does well in both stores.
We will create a function that returns the frequency table for any given column index.

In [26]:
def get_freq_table(data_set, index):
    freq_table = {}
    for app in data_set:
        val = app[index]
        if val in freq_table:
            freq_table[val] += 1
        else:
            freq_table[val] = 1
    
    return freq_table

In [27]:
def print_freq_table(data_set, index, pc = 1):
    freq_table = get_freq_table(data_set, index)
    sorted_table = sorted(freq_table.items(), key=lambda kv: kv[1], reverse=True)
    total_apps = len(data_set)
    for pair in sorted_table:
        if pc:
            print(pair[0] + ' : ' +  str((pair[1] / total_apps) * 100))
        else:
            print(pair[0] + ' : ' + str(pair[1]))
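The same percentage table can be built with `collections.Counter`, whose `most_common()` method already returns the values sorted by frequency. This is a sketch on toy rows; `percentage_table` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter


def percentage_table(rows, index):
    """Return (value, percentage) pairs sorted from most to least frequent."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return [(value, count / total * 100) for value, count in counts.most_common()]


toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, share in percentage_table(toy_rows, 1):
    print(f'{value} : {share:.2f}')
```

Rounding to two decimals in the f-string keeps the printed table easier to scan than the raw floats.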

In [28]:
print_freq_table(app_store_free, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The most common genre in the App Store is Games, and overall the entertainment-oriented genres dominate (Games, Entertainment, Photo & Video).

Let's check Google Play. Here there are two candidate columns: `Category` (index 1) and `Genres` (index -4). We will check each to see the difference.

In [29]:
print_freq_table(gplay_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Using `Category` as the key, we see a GAME category here too, but with a much lower frequency than in the App Store.

By this indicator FAMILY is the largest category, but it is unclear what kind of apps it covers: they could be games meant to be played together, or educational apps.

For Google Play we can also look at the frequency of the other key, `Genres`:

In [30]:
print_freq_table(gplay_free, -4) 

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Libraries & Demo : 0.9363718411552346
Role Playing : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

By this indicator there are many more groups, with a more even spread. `Genres` looks like a finer subdivision of the categories we checked before.

It makes sense to base the observations on `Category`.

In conclusion:
- in the App Store, entertainment apps dominate by far
- in Google Play, the split between entertainment apps and practical apps is more even

-------------------------------

To see which genres are most popular, we could check the install numbers. The Google Play data set has a column for this, **`Installs`**. The App Store data set has no similar column, but we can use the total number of ratings, **`rating_count_tot`**, as a rough proxy.

The following function returns the per-genre average of any numeric column.

In [31]:
def get_averages(data_set, genre_index, for_index):
    genres_freq = get_freq_table(data_set, genre_index)
    genres_avg = []
    for genre in genres_freq:
        genre_total = 0
        sum_val = 0
        for app in data_set:
            name = app[genre_index]
            val = float(app[for_index])
            if name == genre:
                genre_total += 1
                sum_val += val
        avg_val = sum_val / genre_total
        genres_avg.append((genre, avg_val))
    
    sorted_genres_avg = sorted(genres_avg, key = lambda p: p[1], reverse = True)
    return sorted_genres_avg
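`get_averages` walks the whole data set once per genre. The same averages can be collected in a single pass by keeping a running sum and count for each genre. The sketch below shows the idea on toy rows; `average_by_group` is a hypothetical name.

```python
def average_by_group(rows, group_index, value_index):
    """Average a numeric column per group in one pass over the rows."""
    totals = {}   # group -> [running sum, row count]
    for row in rows:
        entry = totals.setdefault(row[group_index], [0.0, 0])
        entry[0] += float(row[value_index])
        entry[1] += 1
    averages = [(group, total / count) for group, (total, count) in totals.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


toy_rows = [['Games', '10'], ['Music', '40'], ['Games', '30']]
print(average_by_group(toy_rows, 0, 1))   # [('Music', 40.0), ('Games', 20.0)]
```

For data sets of this size the difference in speed is small, so the choice is mostly about readability.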

In [32]:
genre_rating_app_store = get_averages(app_store_free, -5, 5)

In [33]:
for genre_avg in genre_rating_app_store:
    print(genre_avg[0], ":", genre_avg[1])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


`Navigation` apps have the highest average rating count. We should look at how the ratings are distributed across the individual navigation apps.

In [34]:
for app in app_store_free:
    genre = app[-5]
    if genre == 'Navigation':
        name = app[1]
        rating  = float(app[5])
        print(name, ':', rating)

Waze - GPS Navigation, Maps & Real-time Traffic : 345046.0
Google Maps - Navigation & Transit : 154911.0
Geocaching® : 12811.0
CoPilot GPS – Car Navigation & Offline Maps : 3582.0
ImmobilienScout24: Real Estate Search in Germany : 187.0
Railway Route Search : 5.0


A few major companies dominate by far, so it would be hard to compete in this genre.

We can check the next two genres by average, Reference and Social Networking.

In [35]:
for app in app_store_free:
    genre = app[-5]
    if genre == 'Reference':
        name = app[1]
        rating  = float(app[5])
        print(name, ':', rating)


There might be a niche here. One idea is to take a popular book and turn it into an interactive app, potentially with AR and/or gamification.
Depending on the book, copyright costs might be involved.

Now let's check the Social Networking apps.

In [36]:
for app in app_store_free:
    genre = app[-5]
    if genre == 'Social Networking':
        name = app[1]
        rating  = float(app[5])
        print(name, ':', rating)

Facebook : 2974676.0
Pinterest : 1061624.0
Skype for iPhone : 373519.0
Messenger : 351466.0
Tumblr : 334293.0
WhatsApp Messenger : 287589.0
Kik : 260965.0
ooVoo – Free Video Call, Text and Voice : 177501.0
TextNow - Unlimited Text + Calls : 164963.0
Viber Messenger – Text & Call : 164249.0
Followers - Social Analytics For Instagram : 112778.0
MeetMe - Chat and Meet New People : 97072.0
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414.0
InsTrack for Instagram - Analytics Plus More : 85535.0
Tango - Free Video Call, Voice and Chat : 75412.0
LinkedIn : 71856.0
Match™ - #1 Dating App. : 60659.0
Skype for iPad : 60163.0
POF - Best Dating App for Conversations : 52642.0
Timehop : 49510.0
Find My Family, Friends & iPhone - Life360 Locator : 43877.0
Whisper - Share, Express, Meet : 39819.0
Hangouts : 36404.0
LINE PLAY - Your Avatar World : 34677.0
WeChat : 34584.0
Badoo - Meet New People, Chat, Socialize. : 34428.0
Followers + for Instagram - Follower Analytics : 28633.0
GroupMe : 28

Major, well-established apps dominate here, so launching a successful new one would be difficult.

We will list the remaining genres too. The rows already appear to be sorted by rating count within each genre, so we can simply print the first 10 apps of each.

In [37]:
for genre_avg in genre_rating_app_store[3:]:
    print('------------', genre_avg[0], '------------' )
    stop = 0
    for app in app_store_free:
        genre = app[-5]
        if genre == genre_avg[0]:
            name = app[1]
            rating  = float(app[5])
            print(name, ':', rating)
            if stop == 9:
                break
            stop += 1

------------ Music ------------
Pandora - Music & Radio : 1126879.0
Spotify Music : 878563.0
Shazam - Discover music, artists, videos & lyrics : 402925.0
iHeartRadio – Free Music & Radio Stations : 293228.0
SoundCloud - Music & Audio : 135744.0
Magic Piano by Smule : 131695.0
Smule Sing! : 119316.0
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420.0
Amazon Music : 106235.0
SoundHound Song Search & Music Player : 82602.0
------------ Weather ------------
The Weather Channel: Forecast, Radar & Alerts : 495626.0
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648.0
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583.0
MyRadar NOAA Weather Radar Forecast : 150158.0
AccuWeather - Weather for Life : 144214.0
Yahoo Weather : 112603.0
Weather Underground: Custom Forecast & Local Radar : 49192.0
NOAA Weather Radar - Weather Forecast & HD Radar : 45696.0
Weather Live Free - Weather Forecast & Alerts : 35702.0
Storm Radar : 22792.0
---------

We can see the same trend: outside Games, big-name companies lead with their apps.

It might be possible to compete in some genres, but costs will be involved (for a weather app, for example, access to the data might not be free).

Now let's move on to Google Play. In this case we actually have a column with the install numbers.

In [38]:
print_freq_table(gplay_free, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


We will assume that the value in the `Installs` column is the actual number of installs, so that 50,000,000+ means 50,000,000 installs, and calculate the averages on that basis.

Unfortunately we can't fully reuse the `get_averages` function above, because the install values contain commas and a plus sign that must be stripped first. We need a specific function for the Google Play data set.
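The cleaning step can be isolated in a small helper. This is a minimal sketch; `parse_installs` is a hypothetical name, and the result is only a lower bound because each label describes a bucket ("at least this many installs").

```python
def parse_installs(text):
    """Convert a bucket label such as '50,000,000+' to the number 50000000."""
    return int(text.replace(',', '').replace('+', ''))


for label in ['0', '0+', '1,000+', '50,000,000+']:
    print(label, '->', parse_installs(label))
```

Since every app in a bucket is counted at the bucket's lower edge, averages computed this way understate the real install numbers.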

In [39]:
def get_averages_installs_gplay():
    categories_freq = get_freq_table(gplay_free, 1)
    installs_avg = []
    for category in categories_freq:
        category_total = 0
        sum_val = 0
        for app in gplay_free:
            name = app[1]
            if name == category:
                category_total += 1
                installs = app[5].replace(',','').replace('+','')
                sum_val += float(installs)
        avg_val = sum_val / category_total
        installs_avg.append((category, avg_val))
    
    sorted_installs_avg = sorted(installs_avg, key = lambda p: p[1], reverse = True)
    return sorted_installs_avg

In [40]:
gplay_installs_avg = get_averages_installs_gplay()

In [41]:
for p in gplay_installs_avg:
    print(p[0], ':', p[1])

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

Communication apps have on average the most installs. We can have a look at those apps.

In [42]:
def get_gplay_installs_per_category(category):
    comm_apps_installs = []
    float_str_pairs = {}
    for app in gplay_free:
        if app[1] == category:
            installs = float(app[5].replace(',','').replace('+',''))
            float_str_pairs[installs] = app[5]
            comm_apps_installs.append((app[0], installs))

    sorted_one = sorted(comm_apps_installs, key=lambda p: p[1], reverse=True)
    
    return sorted_one, float_str_pairs
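An alternative to keeping a separate number-to-label dictionary is to sort the `(name, label)` pairs directly, using the parsed install count only as the sort key. The sketch below uses toy pairs; `top_apps` and the app names are hypothetical.

```python
def top_apps(pairs, n=3):
    """pairs: (name, installs_label) tuples; return the n most installed, labels intact."""
    def installs(pair):
        return int(pair[1].replace(',', '').replace('+', ''))
    return sorted(pairs, key=installs, reverse=True)[:n]


toy_pairs = [
    ('App A', '1,000+'),
    ('App B', '500,000,000+'),
    ('App C', '10+'),
    ('App D', '1,000,000+'),
]
print(top_apps(toy_pairs, n=2))   # [('App B', '500,000,000+'), ('App D', '1,000,000+')]
```

The slice at the end also limits the output to the top few apps, which is usually enough to judge how crowded a category is.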

In [43]:
comm_app_installs, float_str_pairs = get_gplay_installs_per_category("COMMUNICATION")
for p in comm_app_installs:
    print(p[0], ":", float_str_pairs[p[1]])

Gmail : 1,000,000,000+
WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
Hangouts : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Viber Messenger : 500,000,000+
LINE: Free Calls & Messages : 500,000,000+
imo free video calls and chat : 500,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
imo beta free calls and text : 100,000,000+
Telegram : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Android Messages : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
Firefox Browser fast & private : 100,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Kik : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+

Again we see well-established applications; it would be very hard to grab market share here.

Let's have a look at some more.

In [44]:
comm_app_installs, float_str_pairs = get_gplay_installs_per_category("VIDEO_PLAYERS")
for p in comm_app_installs:
    print(p[0], ":", float_str_pairs[p[1]])

Google Play Movies & TV : 1,000,000,000+
YouTube : 1,000,000,000+
MX Player : 500,000,000+
Motorola FM Radio : 100,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
VLC for Android : 100,000,000+
Motorola Gallery : 100,000,000+
VMate : 50,000,000+
HD Video Downloader : 2018 Best video mate : 50,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
KineMaster – Pro Video Editor : 50,000,000+
DU Recorder – Screen Recorder, Video Editor, Live : 50,000,000+
LIKE – Magic Video Maker & Community : 50,000,000+
Vote for : 50,000,000+
Ringdroid : 50,000,000+
Samsung Video Library : 50,000,000+
Vigo Video : 50,000,000+
Inst Download - Video & Photo : 10,000,000+
HTC Gallery : 10,000,000+
video player for android : 10,000,000+
Mobizen Screen Recorder for SAMSUNG : 10,000,000+
BSPlayer FREE : 10,000,000+
Video Downloader : 10,000,000+
YouTube Studio : 10,000,000+
BitTorrent®- Torrent

VIDEO_PLAYERS is the same, dominated by a few well-known apps.

Let's check BOOKS_AND_REFERENCE to see how it fares compared to the App Store equivalent.

In [45]:
comm_app_installs, float_str_pairs = get_gplay_installs_per_category("BOOKS_AND_REFERENCE")
for p in comm_app_installs:
    print(p[0], ":", float_str_pairs[p[1]])

Google Play Books : 1,000,000,000+
Wattpad 📖 Free Books : 100,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Audiobooks from Audible : 100,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Aldiko Book Reader : 10,000,000+
JW Library : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
Quran for Android : 10,000,000+
Dictionary : 10,000,000+
Oxford Dictionary of English : Free : 10,000,000+
Wikipedia : 10,000,000+
Moon+ Reader : 10,000,000+
Al-Quran (Free) : 10,000,000+
HTC Help : 10,000,000+
Cool Reader : 10,000,000+
NOOK: Read eBooks & Magazines : 10,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Hindi Dictionary : 10,000,000+
Dictionary.com: Find Definitions for English Words : 10,000,000+
Al Quran Indonesia : 10,000,000+
English Dictionary - Offline : 10,000,000+
Spanish English Translator : 10,000,000+
Read books online : 5,000,000+
AlReader -any text book reader : 5,000,000+
Dictionary - WordWeb : 5,000,000+
Bible KJV : 5,000,000+
Ebook Reader : 5,000

There are a lot of ebook readers and ebook sellers (Google Play Books, Amazon Kindle, etc.) and lots of dictionaries.

But as with the App Store case, it might be possible to build an app around a new popular book, preferably from a self-published or new author, again with the option of adding gamification and/or AR to make it more immers