# Profitable App Profiles for the App Store and Google Play Markets

One way mobile apps turn a profit is to be free to download and install, with revenue coming from in-app ads. For such apps, more users generally means more exposure to ads and therefore more revenue. The purpose of this project is to analyze app-store data to help developers understand what kinds of apps are likely to attract more users.

## Exploration of data sets

This analysis uses two data sets, one from the Apple App Store and the other from the Google Play store. Here I open both data sets in Python, convert them to lists of rows, check that they have been loaded, and show the header and first row of each:

In [1]:
from csv import reader

opened_file_apple = open('appleStore.csv')
reader_file_apple = reader(opened_file_apple)
apple_list = list(reader_file_apple)
ios_header = apple_list[0]
ios = apple_list[1:]

print(opened_file_apple)
print(apple_list[0:2])

<_io.TextIOWrapper name='appleStore.csv' mode='r' encoding='UTF-8'>
[['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']]


In [2]:
opened_file_android = open('googleplaystore.csv')
read_file_android = reader(opened_file_android)
android_list = list(read_file_android)
android_header = android_list[0]
android = android_list[1:]

print(opened_file_android)
print(android_list[0:2])

<_io.TextIOWrapper name='googleplaystore.csv' mode='r' encoding='UTF-8'>
[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'], ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']]
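
The two cells above leave both files open after reading. A minimal sketch of the same loading step with a context manager and an explicit encoding follows. The helper name `load_csv` and the temporary sample file are illustrative assumptions, not part of the original notebook:

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file and close the file afterwards."""
    with open(path, encoding=encoding, newline="") as f:
        all_rows = list(csv.reader(f))
    return all_rows[0], all_rows[1:]


# Tiny stand-in file so the sketch runs without the real data sets.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows([["App", "Price"], ["Café app", "0"]])
    header, rows = load_csv(sample_path)

print(header)
print(rows)
```

With the real files, the call would presumably look like `ios_header, ios = load_csv('appleStore.csv')`.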


### To make exploration of the data easier, I create a function that prints a slice of rows (given a starting and an ending row) and, optionally, the number of rows and columns.

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

### Below we explore both datasets

In [4]:
explore_data(apple_list, 0, 3, True)
print('\n')
explore_data(android_list, 0, 3, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 7198
Number of columns: 17


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free',

### After exploring both data sets I then look to identify columns that can aid in our analysis

In [5]:
print(apple_list[0])
print('\n')
print(android_list[0])

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


### See here for the documentation for each column
- [Google Play Store](https://www.kaggle.com/lava18/google-play-store-apps)
- [Apple iOS Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

In the Google Play store, the following columns may prove useful:
- App
- Category
- Rating
- Reviews
- Installs
- Type
- Size
- Price
- Content Rating
- Genres
- Android Ver

In the Apple iOS store, the following columns may prove useful:
- track_name
- size_bytes
- price
- rating_count_tot
- user_rating
- cont_rating
- prime_genre


## Data cleaning

#### iOS Dataset

The iOS dataset, with respect to data entry, is clear of superficial errors.

#### Google Play Dataset

Sifting through the data reveals a column shift at index 10473 of `android_list`: the row is missing its 'Category' value, so every later value sits one column to the left. See below:

In [6]:
explore_data(android_list, 0, 2) #show header and 1 row of correct values

print(android_list[10473]) # bad row that is missing 'category' column

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


I remove the entire row to cleanse the data of that error:

In [7]:
del android_list[10473]
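
The shifted row above was located by inspection. A length check against the header is one way to find such rows programmatically. This is a minimal sketch; `find_malformed_rows` and the sample rows are hypothetical and only illustrate the idea:

```python
def find_malformed_rows(rows_with_header):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(rows_with_header[0])
    return [
        (i, row)
        for i, row in enumerate(rows_with_header)
        if len(row) != expected
    ]


sample = [
    ["App", "Category", "Rating"],
    ["Good app", "TOOLS", "4.1"],
    ["Shifted app", "1.9"],  # category value missing, so the row is short
]
bad = find_malformed_rows(sample)
print(bad)
```

A check like this only catches rows with a missing or extra value; a row with the right number of columns but values in the wrong places would still need manual review.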

Further, the Google Play data set appears to have duplicates. See below for examples:

In [8]:
duplicate_apps = []
unique_apps = []

for app in android_list:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    unique_apps.append(name)
    
print(len(duplicate_apps))
print(duplicate_apps[:15])

1181
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
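
The loop above tests membership in a list, which rescans the list for every app. As an alternative sketch (not the notebook's method), `collections.Counter` gives the same duplicate count in a single pass. The function name `count_duplicates` and the sample rows are assumptions made for illustration:

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (number of duplicate rows, names that occur more than once)."""
    counts = Counter(row[name_index] for row in rows)
    # A name seen k times contributes k - 1 duplicate rows.
    n_duplicates = sum(c - 1 for c in counts.values())
    repeated = [name for name, c in counts.items() if c > 1]
    return n_duplicates, repeated


sample = [["Box"], ["Slack"], ["Box"], ["Box"], ["Zoom"], ["Slack"]]
n_duplicates, repeated = count_duplicates(sample)
print(n_duplicates)
print(repeated)
```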


#### Above we see that there were 1,181 duplicates. Looking closer at one of these duplicates, Instagram:

In [9]:
for app in android_list:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


#### To remove these duplicates I will loop through the data and build a dictionary in which each key is a unique app name and each value is the highest number of reviews recorded for that app. Using the dictionary I will then create a new data set with only one entry per app (the row with the highest number of reviews).

In [10]:
reviews_max = {}

for app in android_list[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print('Expected length: ', len(android_list[1:])-1181) #number we should have after duplicates are removed
print('Real/dictionary length: ', len(reviews_max))
#reviews_max

Expected length:  9659
Real/dictionary length:  9659


In [11]:
android_clean = []
already_added = []

for app in android_list[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
explore_data(android_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


#### The new list android_clean has 9,659 rows - as expected
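
The two-step approach above (first a dictionary of maximum review counts, then a second pass with an `already_added` list) could also be collapsed into one pass that stores the best row per name directly. This is only a sketch under that assumption; `keep_most_reviewed` and the shortened sample rows are hypothetical:

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Box", "BUSINESS", "4.2", "159872"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Instagram", "SOCIAL", "4.5", "66509917"],
]
deduped = keep_most_reviewed(sample)
for row in deduped:
    print(row)
```

One difference from the cells above: the result is ordered by the first appearance of each name, not by the position of the row that was kept.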

### Remove non-English apps

#### The apps that we are looking to make are in English, so all non-English apps should be removed from the data set. First I create a function that returns False if a name contains more than three characters outside the ASCII range (code points above 127) and True otherwise. The allowance of three characters accounts for emojis, dashes, etc.

In [12]:
def isEnglish (string):
    count = 0
    for character in string:
        if ord(character) > 127:
            count += 1
    if count > 3:
        return False
    return True

print(isEnglish('Instagram'))
print(isEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instachat 😜'))

True
False
True
True
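
The threshold of three is a judgment call. A variant that exposes it as a parameter makes it easy to try other values; the sketch below assumes the same ASCII-based heuristic, and the name `is_probably_english` and the parameter `max_non_ascii` are illustrative:

```python
def is_probably_english(name, max_non_ascii=3):
    """Heuristic: allow at most `max_non_ascii` characters outside ASCII."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


for name in ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播',
             'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, ':', is_probably_english(name))
```

The heuristic looks only at characters, so a non-English name written entirely in ASCII still passes, and an English name with four or more emojis is rejected.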


#### Using the function created above (isEnglish), I loop through the datasets and create new lists for apps that are most likely to be English based, then compare the length of the lists to the prior lists (before non-English apps removed):

In [13]:
android_foreignapps = []
android_englishapps = []
ios_foreignapps = []
ios_englishapps = []

for app in android_clean:
    name = app[0]
    if isEnglish(name):
        android_englishapps.append(app)
    else:
        android_foreignapps.append(app)

for app in ios:
    name = app[2]
    if isEnglish(name):
        ios_englishapps.append(app)
    else:
        ios_foreignapps.append(app)
        
#explore_data(android_foreignapps, 0, 5, True)
# explore_data prints its output and returns None, so it is not wrapped in print()
explore_data(android_englishapps, 0, 3, True)
print('Android English & non-English row count: ', len(android_clean))

print('\n')

explore_data(ios_englishapps, 0, 3, True)
print('iOS original row count: ', len(ios))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
Android English & non-English row count:  9659


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather,

## Isolate free apps
#### The company only builds apps that are free to download and install, with revenue generated through in-app advertising. The Android and Apple datasets contain both free and paid apps, so I will keep only the free apps for this analysis.

In [14]:
free_android_apps = []

for app in android_englishapps:
    price = app[7]
    if price == '0':
        free_android_apps.append(app)

print('We are left with: ', len(free_android_apps), ' Android apps')
    

We are left with:  8864  Android apps


In [15]:
free_ios_apps = []

for app in ios_englishapps:
    price = app[5]
    #print(price)
    if price == '0':
        free_ios_apps.append(app)

print('We are left with: ', len(free_ios_apps), ' iOS apps')

We are left with:  3222  iOS apps
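
The comparison `price == '0'` depends on the exact text stored in the price column. If a copy of the data stored free apps as `'0.0'` or `'$0'` instead (an assumption, not something observed in the rows shown here), the string test would miss them. A numeric comparison is one way to guard against that; `is_free` is a hypothetical helper:

```python
def is_free(price_text):
    """Return True if the price text parses to zero, e.g. '0', '0.0' or '$0'."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable text (for example a shifted column) is treated as not free.
        return False


for price in ['0', '0.0', '$0', '3.99', '$4.99', 'Everyone']:
    print(repr(price), ':', is_free(price))
```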


## Data Strategy
#### The company's approach to developing apps has three steps: first, build a minimal Android version of the app and add it to the Google Play store; second, if the app gets a good response from users, develop it further; and finally, if the app is profitable after six months, build an iOS version and add it to the Apple App Store.

#### Thus, my strategy here will be to find app profiles that are successful in both markets.

The first step is to get a sense of the most common genres for each market. I will start by building frequency tables for a few columns in the datasets:

In [16]:
def freq_table (dataset, index):
    genre_freqs = {}
    total = 0
    for app in dataset:
        total += 1
        genre = app[index]
        if genre in genre_freqs:
            genre_freqs[genre] += 1
        else:
            genre_freqs[genre] = 1
            
    table_percentages = {}
    for key in genre_freqs:
        percentage = (genre_freqs[key] / total) * 100
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
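
For comparison, the same percentage table can be sketched with `collections.Counter`, whose `most_common()` method already returns entries sorted by count. This is an alternative to the two functions above, not a replacement the notebook uses; `percentage_table` and the toy rows are illustrative:

```python
from collections import Counter


def percentage_table(rows, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


sample = [["a", "Games"], ["b", "Games"], ["c", "Tools"], ["d", "Games"]]
for genre, pct in percentage_table(sample, 1):
    print(f"{genre} : {pct:.1f}")
```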

#### iOS app genres from 'prime_genre' column

In [17]:
display_table(free_ios_apps, 12) # display_table prints the table itself and returns None

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In the Apple App Store, Games account for 58.2% of the free English apps; this is followed by Entertainment apps at 7.9%. More generally, it appears that apps whose purpose is fun, through interaction or otherwise, are the most common type available. In comparison, social networking apps (3.3%) and productivity/tool related apps are present in smaller numbers. Though "fun" apps are the most numerous, this does not mean that they have the largest or most sustained user base.

#### Android 'Genres' and 'Category'

The Android dataset contains two potentially relevant columns for determining the category of an app: Genres and Category. The difference between the two is unclear, and reviewing the frequency tables below may shed some light:

In [18]:
print(display_table(free_android_apps, -4)) #Genres

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The Google Play store shows a different pattern: Tools holds the largest share of free English apps (8.4%), followed by Entertainment at 6.1% and Education at 5.3%. Here, games are broken down into more specific genres (Action, Arcade, Casual, Puzzle and so on), which splits their apparent market share across several rows.

In [19]:
print(display_table(free_android_apps, 1)) #Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The Category column tells a similar story. Family (mostly games for kids) is the most common app category at 18.9% of the free English apps, followed by Game at 9.7% and Tools at 8.5%.

The exact difference between the Genres and Category columns is not clear. The Genres column appears to be more granular, and the Category column more general. Thus I will use the Category column moving forward.

#### Reviewing the frequency tables we get the idea that the App Store is largely made up of entertainment-oriented apps, chiefly games, followed to a lesser extent by practical applications. The Google Play store, while following a similar trend, appears to be more evenly balanced between entertainment and practical apps.

## Analyzing install/review frequency

### Apple App Store
The Apple App Store dataset does not contain download counts. The total rating count column (`rating_count_tot`) seems like a good proxy for popularity. Below I compute the average number of ratings per app for each genre:

In [20]:
prime_genre_freqs = freq_table(free_ios_apps, -5)
dic_freqs = {}

for genre in prime_genre_freqs:
    total = 0
    len_genre = 0
    for app in free_ios_apps:
        genre_app = app[-5]
        if genre_app == genre:
            user_ratings = float(app[6]) #'rating_count_tot'
            total += user_ratings
            len_genre += 1
    average = total/len_genre
    dic_freqs.update({genre : average})

sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
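
The cell above rescans every app once per genre. The same averages can be accumulated in a single pass with two dictionaries, one for running totals and one for counts. The following is a sketch of that idea on toy rows; `average_by_group` is a hypothetical helper, not part of the notebook:

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


sample = [
    ["Waze", "Navigation", "300"],
    ["Maps", "Navigation", "100"],
    ["Bible", "Reference", "900"],
]
averages = average_by_group(sample, 1, 2)
for genre in sorted(averages, key=averages.get, reverse=True):
    print(genre, ":", averages[genre])
```

On the real data the call would presumably be `average_by_group(free_ios_apps, 12, 6)`, with 12 and 6 being the `prime_genre` and `rating_count_tot` positions shown in the header.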


#### Navigation, Reference, and Social Networking have the highest average number of ratings per app on the App Store. Below I will investigate which apps are the most popular within each of these genres:

In [21]:
#Navigation genre
dic_freqs = {}

for app in free_ios_apps:
    if app[-5] == 'Navigation':
        user_ratings = float(app[6])
        name = app[2]
        #print(name, ':', user_ratings)
        dic_freqs.update({name : user_ratings})
        
sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

Waze - GPS Navigation, Maps & Real-time Traffic : 345046.0
Google Maps - Navigation & Transit : 154911.0
Geocaching® : 12811.0
CoPilot GPS – Car Navigation & Offline Maps : 3582.0
ImmobilienScout24: Real Estate Search in Germany : 187.0
Railway Route Search : 5.0


In [22]:
#Reference genre
dic_freqs = {}

for app in free_ios_apps:
    if app[-5] == 'Reference':
        user_ratings = float(app[6])
        name = app[2]
        #print(name, ':', user_ratings)
        dic_freqs.update({name : user_ratings})
        
sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

Bible : 985920.0
Dictionary.com Dictionary & Thesaurus : 200047.0
Dictionary.com Dictionary & Thesaurus for iPad : 54175.0
Google Translate : 26786.0
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418.0
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588.0
Merriam-Webster Dictionary : 16849.0
Night Sky : 12122.0
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535.0
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693.0
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497.0
Guides for Pokémon GO - Pokemon GO News and Cheats : 826.0
WWDC : 762.0
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718.0
VPN Express : 14.0
Real Bike Traffic Rider Virtual Reality Glasses : 8.0
Jishokun-Japanese English Dictionary & Translator : 0.0
教えて!goo : 0.0


In [23]:
#Social Networking Genre
dic_freqs = {}

for app in free_ios_apps:
    if app[-5] == 'Social Networking':
        user_ratings = float(app[6])
        name = app[2]
        #print(name, ':', user_ratings)
        dic_freqs.update({name : user_ratings})
        
sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

Facebook : 2974676.0
Pinterest : 1061624.0
Skype for iPhone : 373519.0
Messenger : 351466.0
Tumblr : 334293.0
WhatsApp Messenger : 287589.0
Kik : 260965.0
ooVoo – Free Video Call, Text and Voice : 177501.0
TextNow - Unlimited Text + Calls : 164963.0
Viber Messenger – Text & Call : 164249.0
Followers - Social Analytics For Instagram : 112778.0
MeetMe - Chat and Meet New People : 97072.0
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414.0
InsTrack for Instagram - Analytics Plus More : 85535.0
Tango - Free Video Call, Voice and Chat : 75412.0
LinkedIn : 71856.0
Match™ - #1 Dating App. : 60659.0
Skype for iPad : 60163.0
POF - Best Dating App for Conversations : 52642.0
Timehop : 49510.0
Find My Family, Friends & iPhone - Life360 Locator : 43877.0
Whisper - Share, Express, Meet : 39819.0
Hangouts : 36404.0
LINE PLAY - Your Avatar World : 34677.0
WeChat : 34584.0
Badoo - Meet New People, Chat, Socialize. : 34428.0
Followers + for Instagram - Follower Analytics : 28633.0
GroupMe : 28
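
The three cells above differ only in the genre name, so they could be folded into one helper. The sketch below is one possible shape for it; `top_apps`, its parameters and the sample rows are assumptions for illustration. It collects (value, name) pairs in a list, so two apps that share a name are both kept, whereas a dictionary keyed by name keeps only the last one:

```python
def top_apps(rows, genre_index, genre, name_index, value_index, n=5):
    """Return the n apps of a genre with the highest numeric value."""
    matches = [
        (float(row[value_index]), row[name_index])
        for row in rows
        if row[genre_index] == genre
    ]
    matches.sort(reverse=True)  # highest value first
    return [(name, value) for value, name in matches[:n]]


sample = [
    ["Waze", "Navigation", "345046"],
    ["Google Maps", "Navigation", "154911"],
    ["Bible", "Reference", "985920"],
    ["Geocaching", "Navigation", "12811"],
]
for name, ratings in top_apps(sample, 1, "Navigation", 0, 2, n=2):
    print(name, ":", ratings)
```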

#### In the App Store, Navigation appears to be dominated by big players such as Google Maps and Waze. Apps in Reference or Social Networking seem easier to develop and therefore potentially more lucrative.

### Google Play Store
The Google Play Store dataset contains an Installs column, but as we see below it is not precise. The column reports open-ended buckets (1,000,000+, 5,000,000+ and so on), so one cannot see whether an app in the 1,000,000+ bucket has, for example, 1,000,000 installs or 2,000,000 installs. See below:

In [24]:
display_table(free_android_apps, 5) #installs column

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


We do not need exact numbers to get an idea of the landscape. For the purposes of this analysis I will strip the commas and plus signs and treat each bucket as its lower bound, so 1,000+ simply becomes 1000. The numbers below show the average number of "installs" per app category in the Google Play Store:

In [25]:
prime_genre_freqs = freq_table(free_android_apps, 1)
dic_freqs = {}

for genre in prime_genre_freqs:
    total = 0
    len_genre = 0
    for app in free_android_apps:
        genre_app = app[1]
        if genre_app == genre:
            user_ratings = app[5] #installs column
            user_ratings = user_ratings.replace('+','')
            user_ratings = user_ratings.replace(',','')
            user_ratings = float(user_ratings)
            total += user_ratings
            len_genre += 1
    average = total/len_genre
    dic_freqs.update({genre : average})

sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315
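
The install strings are cleaned the same way in several cells. A small helper would keep that conversion in one place; the name `parse_installs` is hypothetical, and the result is the lower bound of the bucket, not the true install count:

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to its lower bound."""
    return int(text.replace(',', '').replace('+', ''))


for bucket in ['1,000,000+', '500+', '0+', '0']:
    print(bucket, '->', parse_installs(bucket))
```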

In [39]:
prime_genre_freqs = freq_table(free_android_apps, 1)
dic_freqs = {}

for genre in prime_genre_freqs:
    total = 0
    len_genre = 0
    for app in free_android_apps:
        user_ratings = app[5] #installs column
        user_ratings = user_ratings.replace('+','')
        user_ratings = user_ratings.replace(',','')
        genre_app = app[1]
        if (genre_app == genre) and  (float(user_ratings) < 100000000):
            user_ratings = float(user_ratings)
            total += user_ratings
            len_genre += 1
    average = total/len_genre
    dic_freqs.update({genre : average})

sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

PHOTOGRAPHY : 7670532.29338843
GAME : 6272564.694894147
ENTERTAINMENT : 6118250.0
VIDEO_PLAYERS : 5544878.133333334
WEATHER : 5074486.197183099
SHOPPING : 4640920.541237113
COMMUNICATION : 3603485.3884615386
PRODUCTIVITY : 3379657.318885449
TOOLS : 3191461.128987517
SOCIAL : 3084582.5201793723
SPORTS : 2994082.551839465
TRAVEL_AND_LOCAL : 2944079.6336633665
PERSONALIZATION : 2549775.832167832
MAPS_AND_NAVIGATION : 2484104.7540983604
FAMILY : 2342897.527075812
HEALTH_AND_FITNESS : 2005713.6605166052
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
NEWS_AND_MAGAZINES : 1502841.8775510204
BOOKS_AND_REFERENCE : 1437212.2162162163
HOUSE_AND_HOME : 1331540.5616438356
BUSINESS : 1226918.7407407407
LIFESTYLE : 1152128.779710145
FINANCE : 1086125.7859327218
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245283
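
Excluding apps at or above 100 million installs is one way to limit the pull of a few giants on the average. A different technique, which the notebook does not use, is to report the median next to the mean, since the median is not affected by how large the largest values are. A toy illustration with made-up install counts:

```python
from statistics import mean, median

# Made-up install counts: four small apps and one giant.
installs = [1_000, 5_000, 10_000, 50_000, 1_000_000_000]

print('mean  :', mean(installs))
print('median:', median(installs))
```

Here the single giant lifts the mean into the hundreds of millions while the median stays at the middle app's value, which is the pattern the two tables above suggest for categories such as Communication.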


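Across all free English apps, the Communication category has the highest average number of installs, followed by Video Players and Social. When apps with 100 million or more installs are excluded (the second table), Photography, Game and Entertainment lead instead, which suggests that a few very large apps drive the averages in the first table. Let's take a deeper dive and see which apps are the most popular in the three categories that lead the first table: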

In [26]:
#Communication Genre
dic_freqs = {}

for app in free_android_apps:
    if app[1] == 'COMMUNICATION':
        user_ratings = app[5] #installs column
        user_ratings = user_ratings.replace('+','')
        user_ratings = user_ratings.replace(',','')
        user_ratings = float(user_ratings)
        name = app[0]
        #print(name, ':', user_ratings)
        dic_freqs.update({name : user_ratings})
        
sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

WhatsApp Messenger : 1000000000.0
Messenger – Text and Video Chat for Free : 1000000000.0
Skype - free IM & video calls : 1000000000.0
Google Chrome: Fast & Secure : 1000000000.0
Gmail : 1000000000.0
Hangouts : 1000000000.0
Google Duo - High Quality Video Calls : 500000000.0
imo free video calls and chat : 500000000.0
LINE: Free Calls & Messages : 500000000.0
UC Browser - Fast Download Private & Secure : 500000000.0
Viber Messenger : 500000000.0
imo beta free calls and text : 100000000.0
Android Messages : 100000000.0
Who : 100000000.0
GO SMS Pro - Messenger, Free Themes, Emoji : 100000000.0
Firefox Browser fast & private : 100000000.0
Messenger Lite: Free Calls & Messages : 100000000.0
Kik : 100000000.0
KakaoTalk: Free Calls & Text : 100000000.0
Opera Mini - fast web browser : 100000000.0
Opera Browser: Fast and Secure : 100000000.0
Telegram : 100000000.0
Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000.0
UC Browser Mini -Tiny Fast Private & Secure : 100000000.0
WeChat : 

In [37]:
# Average installs across all COMMUNICATION apps
total = []

for app in free_android_apps:
    user_ratings = app[5] #installs column
    user_ratings = user_ratings.replace('+','')
    user_ratings = user_ratings.replace(',','')
    if app[1] == 'COMMUNICATION':
        total.append(float(user_ratings))
print(sum(total) / len(total))

# Same average, excluding apps with 100 million installs or more
under_100_million = []

for app in free_android_apps:
    user_ratings = app[5] #installs column
    user_ratings = user_ratings.replace('+','')
    user_ratings = user_ratings.replace(',','')
    if (app[1] == 'COMMUNICATION') and (float(user_ratings) < 100000000):
        under_100_million.append(float(user_ratings))
print(sum(under_100_million) / len(under_100_million))

38456119.167247385
3603485.3884615386
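
The two loops above differ only in the upper limit on installs, so an optional cap parameter could cover both cases. This is a sketch under that assumption; `average_installs`, its parameters and the two-column sample rows are illustrative (the real rows would use `category_index=1` and `installs_index=5`):

```python
def average_installs(rows, category, max_installs=None,
                     category_index=1, installs_index=5):
    """Mean installs for one category, optionally only apps below a cap."""
    values = []
    for row in rows:
        if row[category_index] != category:
            continue
        n = float(row[installs_index].replace(',', '').replace('+', ''))
        if max_installs is None or n < max_installs:
            values.append(n)
    # None signals that no app matched, instead of dividing by zero.
    return sum(values) / len(values) if values else None


sample = [
    ["COMMUNICATION", "1,000,000,000+"],
    ["COMMUNICATION", "1,000+"],
    ["COMMUNICATION", "3,000+"],
    ["SOCIAL", "500+"],
]
all_apps = average_installs(sample, "COMMUNICATION",
                            category_index=0, installs_index=1)
capped = average_installs(sample, "COMMUNICATION", max_installs=100_000_000,
                          category_index=0, installs_index=1)
print(all_apps)
print(capped)
```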


In [28]:
#Video Players Genre
dic_freqs = {}

for app in free_android_apps:
    if app[1] == 'VIDEO_PLAYERS':
        user_ratings = app[5] #installs column
        user_ratings = user_ratings.replace('+','')
        user_ratings = user_ratings.replace(',','')
        user_ratings = float(user_ratings)
        name = app[0]
        #print(name, ':', user_ratings)
        dic_freqs.update({name : user_ratings})
        
sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

YouTube : 1000000000.0
Google Play Movies & TV : 1000000000.0
MX Player : 500000000.0
Motorola Gallery : 100000000.0
VLC for Android : 100000000.0
Dubsmash : 100000000.0
VivaVideo - Video Editor & Photo Movie : 100000000.0
VideoShow-Video Editor, Video Maker, Beauty Camera : 100000000.0
Motorola FM Radio : 100000000.0
Vote for : 50000000.0
Vigo Video : 50000000.0
MiniMovie - Free Video and Slideshow Editor : 50000000.0
Samsung Video Library : 50000000.0
LIKE – Magic Video Maker & Community : 50000000.0
DU Recorder – Screen Recorder, Video Editor, Live : 50000000.0
KineMaster – Pro Video Editor : 50000000.0
VMate : 50000000.0
HD Video Downloader : 2018 Best video mate : 50000000.0
Ringdroid : 50000000.0
Video Downloader : 10000000.0
Video Player All Format : 10000000.0
Code : 10000000.0
Music - Mp3 Player : 10000000.0
YouTube Studio : 10000000.0
video player for android : 10000000.0
HTC Service － DLNA : 10000000.0
HTC Gallery : 10000000.0
PowerDirector Video Editor App: 4K, Slow Mo & Mo

In [27]:
#Social Genre
dic_freqs = {}

for app in free_android_apps:
    if app[1] == 'SOCIAL':
        user_ratings = app[5] #installs column
        user_ratings = user_ratings.replace('+','')
        user_ratings = user_ratings.replace(',','')
        user_ratings = float(user_ratings)
        name = app[0]
        #print(name, ':', user_ratings)
        dic_freqs.update({name : user_ratings})
        
sorted_freqs = sorted(dic_freqs, key=lambda x: dic_freqs[x], reverse = True)
for value in sorted_freqs:
    print("{} : {}".format(value, dic_freqs[value]))

Facebook : 1000000000.0
Google+ : 1000000000.0
Instagram : 1000000000.0
Facebook Lite : 500000000.0
Snapchat : 500000000.0
Tumblr : 100000000.0
Pinterest : 100000000.0
Badoo - Free Chat & Dating App : 100000000.0
Tango - Live Video Broadcast : 100000000.0
LinkedIn : 100000000.0
Tik Tok - including musical.ly : 100000000.0
BIGO LIVE - Live Stream : 100000000.0
VK : 100000000.0
ooVoo Video Calls, Messaging & Stories : 50000000.0
MeetMe: Chat & Meet New People : 50000000.0
Zello PTT Walkie Talkie : 50000000.0
POF Free Dating App : 50000000.0
SKOUT - Meet, Chat, Go Live : 50000000.0
TextNow - free text + calls : 10000000.0
LiveMe - Video chat, new friends, and make money : 10000000.0
HTC Social Plugin - Facebook : 10000000.0
Quora : 10000000.0
Kate Mobile for VK : 10000000.0
Text Me: Text Free, Call Free, Second Phone Number : 10000000.0
Text free - Free Text + Call : 10000000.0
YouNow: Live Stream Video Chat : 10000000.0
We Heart It : 10000000.0
Path : 10000000.0
SayHi Chat, Meet New Peop

#### As we recall, the apps in Reference or Social Networking appeared to be the best candidates in the Apple App Store. Reviewing the apps in the Google Play store, we see a similar result (albeit with slightly different category names and types).