# Mobile App Analytics

The goal of this project is to give developers insight into the apps available on Google Play and the App Store, so they can understand what types of apps are likely to attract more users.

Since those apps are free to download, revenue from in-app ads is essential to make the business sustainable. Maximizing users' engagement with ads is therefore a constant goal.

In [None]:
from csv import reader

# call Google Play dataset 
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

# call App Store dataset 
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
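The loading cell above leaves both files open and relies on the platform's default text encoding. A minimal sketch of a reusable loader follows, assuming the CSV files are UTF-8 encoded (an assumption, not something the notebook states). `load_csv` is a hypothetical helper name.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file; the file is closed on exit."""
    # encoding='utf8' is an assumption about how the files were saved
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]

# Usage (same file names as in the cell above):
# android_header, android = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```

The `with` block closes each file as soon as its rows have been read, so no file handle is left open for the rest of the session.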

## Data exploration

Before we start, let's get acquainted with the datasets.

In [3]:
# create a function to explore data set
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
# explore Google Play dataset
print(android_header)
print('\n')
explore_data(android,1,5,rows_and_columns=True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


In [5]:
# explore App Store dataset
print(ios_header)
print('\n')
explore_data(ios,1,5,rows_and_columns=True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


## Data pre-processing

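Data cleaning is a necessary step for almost every dataset. In our case, row 10472 of the Android dataset has a problem with its rating value: instead of a decimal number between 1 and 5, the 'Rating' column holds 19, because the 'Category' value is missing and the remaining values are shifted one column to the left. Let's check that.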

In [6]:
# check row 10472
print(android[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
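The row printed above has 12 values, while the header has 13. A shifted row like this can be found without knowing its index in advance by comparing each row's length with the header's. The sketch below uses toy data, and `find_malformed_rows` is a hypothetical helper name.

```python
def find_malformed_rows(header, rows):
    """Return indexes of rows whose column count differs from the header."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]

# Toy data (not taken from the datasets): the second row lacks one value
toy_header = ['App', 'Category', 'Rating']
toy_rows = [['A', 'TOOLS', '4.1'], ['B', '1.9'], ['C', 'GAME', '3.8']]
print(find_malformed_rows(toy_header, toy_rows))  # prints [1]
```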


Here we simply drop it.

In [7]:
# delete row 10472 (run this cell only once, otherwise another row is removed)
del android[10472]

In [8]:
# check that the row was removed (the next row now sits at index 10472)
print(android[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [9]:
# check number of duplicates in android dataset
duplicate_apps = [] # create a list to store duplicates
unique_apps = []     # create a list to store uniques

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
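The loop above tests membership in a list, which becomes slow as the list grows. As a hedged alternative, the same count can be obtained with `collections.Counter`. `count_duplicates` is a hypothetical helper name and the rows below are toy data.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Return (number of duplicate rows, names that occur more than once)."""
    counts = Counter(row[name_index] for row in rows)
    # every occurrence after the first one counts as a duplicate
    n_duplicates = sum(count - 1 for count in counts.values())
    repeated = [name for name, count in counts.items() if count > 1]
    return n_duplicates, repeated

toy_rows = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zoom']]
print(count_duplicates(toy_rows))  # prints (2, ['Box'])
```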


There are 1181 duplicate entries in the Android dataset. Let's check an example:

In [10]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In this example there are 4 entries for the app 'Instagram', and the only difference between them is the value of the fourth column ('Reviews'). We want to keep the latest record, which we take to be the one with the largest number of reviews. For this purpose we will use a dictionary.

In [11]:
reviews_max = {} # create an empty dictionary

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [12]:
android_clean = [] # create an empty list

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews):
        android_clean.append(app)

In [13]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10054
Number of columns: 13


There is obviously a problem: we expected 9659 entries (10840 rows minus 1181 duplicates), but we got 10054. Let's investigate further.

In [14]:
# check number of duplicates in android_clean dataset
duplicate_apps = [] # create a list to store duplicates
unique_apps = []     # create a list to store uniques

for app in android_clean:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 395


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google My Business', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling']


We still have duplicates in our list. Let's check why using the app 'Quick PDF Scanner + OCR FREE' as an example.

In [15]:
for app in android_clean:
    name = app[0]
    if name == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


As we can see, besides duplicates with the same name and a different number of reviews, there are fully identical duplicates. Because they share the maximum review count, every copy passes the filter above. We need to get rid of them.

In [16]:
android_clean = [] # create an empty list
already_added = [] # create an empty list

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) 

In [17]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
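The two-step approach above (a dictionary of maximum review counts, then a filtered copy) can also be sketched as a single pass that stores the whole row per app name. This is an alternative sketch rather than the notebook's method; `keep_most_reviewed` is a hypothetical name, and the result is ordered by the first appearance of each name, which may differ from the order produced above.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the first row when several share the maximum
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows shaped like the Android data (only the first four columns)
toy = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159'],
]
for row in keep_most_reviewed(toy):
    print(row)
```

Storing the row itself handles identical duplicates and differing review counts in the same loop, so no separate `already_added` list is needed.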


Things are now as expected. Next, let's check for duplicates in the iOS dataset.

In [18]:
# check number of duplicates in ios dataset (index 0 is the unique 'id' column, not the app name)
duplicate_apps = [] # create a list to store duplicates
unique_apps = []     # create a list to store uniques

for app in ios:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 0


Examples of duplicate apps: []


Things seem to be fine in the iOS dataset (no duplicate ids).

We are only interested in apps for English speakers. Each character in a string has a corresponding number (its code point, which `ord()` returns), and the characters commonly used in English text fall in the ASCII range, 0 to 127.

For this purpose, a _check_ function can be handy.

In [19]:
# create a function that takes in a string and returns False if more than 
# three of its characters fall outside the ASCII range (0 to 127); up to three
# are tolerated so that names with an emoji or a symbol such as ™ still pass
def is_english(string): 
    count = 0
    for character in string:
        if ord(character) > 127:
            count += 1
    if count <= 3:
        return True
    else:
        return False
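To make the threshold explicit, the check above can be written more compactly with the tolerance exposed as a parameter. The `max_non_ascii` parameter is a hypothetical addition, and the app names below are illustrative strings chosen to exercise both outcomes.

```python
def is_english(string, max_non_ascii=3):
    """True if the string has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Instagram'))                       # True
print(is_english('Docs To Go™ Free Office Suite'))   # True (one non-ASCII character)
print(is_english('Instachat 😜'))                    # True (one non-ASCII character)
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
```

This is a heuristic: it accepts any name written mostly in ASCII characters, whether or not the app is actually in English.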

In [20]:
# check number of English apps in android_clean dataset
android_Eng_apps =  []        # create a list to store English apps
android_nonEng_apps = []     # create a list to store non-English apps

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_Eng_apps.append(app)
    else:
        android_nonEng_apps.append(app)

print('Number of English apps:', len(android_Eng_apps))

Number of English apps: 9614


From the results, 9614 apps are for English speakers. Let's do the same with the iOS dataset.

In [21]:
# check number of English apps in ios dataset
ios_Eng_apps =  []        # create a list to store English apps
ios_nonEng_apps = []     # create a list to store non-English apps

for app in ios:
    name = app[1]
    if is_english(name):
        ios_Eng_apps.append(app)
    else:
        ios_nonEng_apps.append(app)

print('Number of English apps:', len(ios_Eng_apps))

Number of English apps: 6183


So far so good. Now we need to filter only the free apps.

In [22]:
# keep only the free apps among the English apps (both platforms)
android_Eng_freeApps =  [] # create a list to store free apps (Android)
ios_Eng_freeApps  = []     # create a list to store free apps (iOS)

for app in android_Eng_apps:
    price = app[7]
    if price == '0':
        android_Eng_freeApps.append(app)

for app in ios_Eng_apps:
    price = app[4]
    if price == '0.0':
        ios_Eng_freeApps.append(app)
        
explore_data(android_Eng_freeApps, 0, 3, True)
print('\n')
explore_data(ios_Eng_freeApps, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

After cleaning the datasets we ended up with 8864 apps for Android and 3222 for iOS.
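The filter above compares the price strings with `'0'` and `'0.0'` exactly, one literal per platform. A numeric comparison is less sensitive to formatting. In this sketch, `is_free` is a hypothetical helper, and the `'$'` prefix is an assumption about how paid prices might be written, so it should be checked against the actual data.

```python
def is_free(price):
    """True if a price string such as '0', '0.0' or '$0.00' represents zero."""
    # stripping '$' is an assumption about the price format of paid apps
    return float(price.replace('$', '')) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```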

## Data analysis

Since our strategy is to build an app for both platforms, we need to find out which genres are most frequent in both lists. To achieve that, we will use frequency tables.

In [23]:
# create a frequency table function
def freq_table(dataset, index):
    frequency_table = {}
    total = 0
    for row in dataset:
        total += 1
        data_point = row[index]
        if data_point in frequency_table:
            frequency_table[data_point] += 1
        else:
            frequency_table[data_point] = 1
            
    table_percentages = {}
    for key in frequency_table:
        percentage = (frequency_table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

In [24]:
# create a function to display frequency in a descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
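The two functions above can be condensed with `collections.Counter`, whose `most_common()` method already returns entries in descending order of count. This is a sketch on toy data, and `freq_percentages` is a hypothetical name.

```python
from collections import Counter

def freq_percentages(dataset, index):
    """Return (value, percentage) pairs sorted by descending percentage."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]

toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, pct in freq_percentages(toy, 1):
    print(genre, ':', pct)
```

On the toy rows this prints `Games : 75.0` followed by `Music : 25.0`.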

In [25]:
display_table(ios_Eng_freeApps, -5) # prime_genre feature

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


As we can see from the results, more than half (58.16%) of the free English iOS apps are games. The second and third most frequent genres (Entertainment, Photo & Video) are also aimed at amusement, so just three genres account for about 71% of the available apps.
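The 71% figure is the sum of the three largest shares in the table above. A short check, with the percentages copied from that output:

```python
# Percentages copied from the iOS frequency table above
ios_top3 = {
    'Games': 58.16263190564867,
    'Entertainment': 7.883302296710118,
    'Photo & Video': 4.9658597144630665,
}
top3_share = sum(ios_top3.values())
print(round(top3_share, 2))  # 71.01
```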

From this perspective, if we want to maximize the number of users of a new free iOS app for English speakers, it would be a good idea to develop an app for amusement.

In [26]:
display_table(android_Eng_freeApps, -4) # Genres feature

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In the case of free Android apps for English speakers, things are quite different. Apps are spread more evenly across genres, and practical-purpose genres such as Tools, Education, and Business are more frequent. No single genre exceeds 9% of the apps, and the ten most frequent genres together account for only about 46%.

In [27]:
display_table(android_Eng_freeApps, 1) # Category feature

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

From the frequency table above, we can see that gaming is still a relevant genre in the Android realm (GAME is about 9.7% of apps, second only to FAMILY), which suggests that developing a free amusement app for English speakers is a reasonable decision.

The number of installations of an app is also a relevant metric to support our decision, so let's dig into it. The Android dataset provides this feature ('Installs'), but the iOS dataset does not. However, the iOS dataset includes the total number of ratings each app received ('rating_count_tot'), which can be used as a proxy for the information we are looking for.

Let's start with the iOS dataset:

In [28]:
from collections import OrderedDict

# create a frequency table for genres
ios_genre = freq_table(ios_Eng_freeApps, -5) 

# loop through the cleaned dataset to average the number of user ratings per genre
ratings = {} 
for genre in ios_genre: 
    total = 0
    len_genre = 0
    for app in ios_Eng_freeApps:
        genre_app = app[-5]
        if genre_app == genre:
            n_rating = float(app[5])
            total += n_rating
            len_genre += 1
            
    avg_n_rating = total/len_genre
    ratings[genre] = avg_n_rating 


OrderedDict(sorted(ratings.items(), key=lambda t: t[1],reverse=True))

OrderedDict([('Navigation', 86090.33333333333),
             ('Reference', 74942.11111111111),
             ('Social Networking', 71548.34905660378),
             ('Music', 57326.530303030304),
             ('Weather', 52279.892857142855),
             ('Book', 39758.5),
             ('Food & Drink', 33333.92307692308),
             ('Finance', 31467.944444444445),
             ('Photo & Video', 28441.54375),
             ('Travel', 28243.8),
             ('Shopping', 26919.690476190477),
             ('Health & Fitness', 23298.015384615384),
             ('Sports', 23008.898550724636),
             ('Games', 22788.6696905016),
             ('News', 21248.023255813954),
             ('Productivity', 21028.410714285714),
             ('Utilities', 18684.456790123455),
             ('Lifestyle', 16485.764705882353),
             ('Entertainment', 14029.830708661417),
             ('Business', 7491.117647058823),
             ('Education', 7003.983050847458),
             ('Catalogs', 400
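The nested loops above scan the whole dataset once per genre. The same averages can be sketched in a single pass with two running tallies. `average_by_group` is a hypothetical helper name, and the rows below are toy data, not values from the datasets.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} computed in a single pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

toy = [['x', 'Games', '10'], ['y', 'Games', '30'], ['z', 'Music', '5']]
averages = average_by_group(toy, 1, 2)
print(sorted(averages.items(), key=lambda item: item[1], reverse=True))
# prints [('Games', 20.0), ('Music', 5.0)]
```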

The results above show that Navigation, Reference, and Social Networking are the three genres with the highest average number of ratings per app, but let's analyze this further.

In [29]:
# checking on Navigation apps
for app in ios_Eng_freeApps:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


On average, navigation apps have the highest number of user ratings, but this figure is heavily influenced by Waze and Google Maps.
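One way to quantify this influence is to compare the mean with the median, which is far less sensitive to a few very large values. The numbers below are the six rating counts printed above.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5
```

The mean is roughly ten times the median, which shows how much the two largest apps pull the average upward.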

In [30]:
# checking on Reference apps
for app in ios_Eng_freeApps:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [31]:
# checking on Social Networking apps
for app in ios_Eng_freeApps:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The same pattern applies to Reference apps, where the average is heavily influenced by Bible, Dictionary.com Dictionary & Thesaurus, Google Translate, etc. It also applies to Social Networking apps, where a few big players like Facebook, Pinterest, and Skype dominate the average.

Now let's check the same for Android:

In [32]:
# checking on Installs feature of Android dataset

from collections import OrderedDict

# create a frequency table for categories
android_cat = freq_table(android_Eng_freeApps, 1) 

# loop through the cleaned dataset to average the number of installs per category
installs = {} 
for cat in android_cat: 
    total = 0
    len_cat = 0
    for app in android_Eng_freeApps:
        cat_app = app[1]
        if cat_app == cat:
            n_installs = app[5]
            # remove characters from string for float transform
            n_installs = n_installs.replace(',', '') 
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_cat += 1
    avg_n_installs = total/len_cat
    installs[cat] = avg_n_installs 


OrderedDict(sorted(installs.items(), key=lambda t: t[1],reverse=True))

OrderedDict([('COMMUNICATION', 38456119.167247385),
             ('VIDEO_PLAYERS', 24727872.452830188),
             ('SOCIAL', 23253652.127118643),
             ('PHOTOGRAPHY', 17840110.40229885),
             ('PRODUCTIVITY', 16787331.344927534),
             ('GAME', 15588015.603248259),
             ('TRAVEL_AND_LOCAL', 13984077.710144928),
             ('ENTERTAINMENT', 11640705.88235294),
             ('TOOLS', 10801391.298666667),
             ('NEWS_AND_MAGAZINES', 9549178.467741935),
             ('BOOKS_AND_REFERENCE', 8767811.894736841),
             ('SHOPPING', 7036877.311557789),
             ('PERSONALIZATION', 5201482.6122448975),
             ('WEATHER', 5074486.197183099),
             ('HEALTH_AND_FITNESS', 4188821.9853479853),
             ('MAPS_AND_NAVIGATION', 4056941.7741935486),
             ('FAMILY', 3695641.8198090694),
             ('SPORTS', 3638640.1428571427),
             ('ART_AND_DESIGN', 1986335.0877192982),
             ('FOOD_AND_DRINK', 1924897.73
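The string cleaning inside the loop above can be isolated in a small helper, which makes the conversion easy to check on its own. `parse_installs` is a hypothetical name.

```python
def parse_installs(installs):
    """Convert an 'Installs' string such as '1,000,000+' to a float."""
    return float(installs.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('500,000+'))    # 500000.0
```

Because of the trailing `+`, each value is presumably a lower bound on the true number of installs, so the averages above are best read as rough estimates.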

The Top 3 categories are Communication, Video Players, and Social. Let's investigate further.

In [33]:
# checking on Communication apps
for app in android_Eng_freeApps:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5]) # print name and number of installs

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

In [39]:
# checking on Video Players apps
for app in android_Eng_freeApps:
    if app[1] == 'VIDEO_PLAYERS':
        print(app[0], ':', app[5]) # print name and number of installs

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

In [38]:
# checking on Social apps
for app in android_Eng_freeApps:
    if app[1] == 'SOCIAL':
        print(app[0], ':', app[5]) # print name and number of installs

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, sti

Again, big players skew the distribution in these three examples.

## Conclusion

In this task we wanted to make a data-driven decision about what kind of app to develop for English speakers. Since it is a free app, all revenue would come from embedded ads.

To support the decision, we gathered information from a sample of both major marketplaces for this kind of product: the App Store and Google Play.

First, we cleaned both datasets to accommodate our needs. Then we analyzed which kinds of apps are available there and which are the most installed.

Preliminary analysis suggests that an app for amusement, such as gaming or sports, is recommended for development, since this category accounts for a significant share of the available apps and of installations on both platforms (iOS and Android).

This analysis is not exhaustive, since big players like Facebook and Google have heavily installed apps that bias those numbers.