# Profitable apps on the App Store & Google Play

This report is for a company that builds free Android and iOS mobile apps for an English-speaking audience. The company's main source of revenue is in-app ads, so the more users who see and engage with the ads, the better.

The goal for this project is to analyze data to help the company's developers understand what type of apps are likely to attract more users.

## Opening and exploring the data
Define a function that prints a given section of a dataset.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print(f"Number of rows: {len(dataset)}")
        print(f"Number of columns: {len(dataset[0])}")

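Print three sample rows from the *AppleStore.csv* and *googleplaystore.csv* datasets (excluding the header row). Note that the slice `1:4` is applied after the header has already been dropped, so the output shows the second through fourth apps of each dataset.  
Also print the total number of rows and columns for each dataset.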

In [2]:
from csv import reader

opened_file_apple = open("AppleStore.csv", encoding="utf-8")
opened_file_google = open("googleplaystore.csv", encoding="utf-8")

read_file_apple = reader(opened_file_apple)
read_file_google = reader(opened_file_google)

apps_data_apple = list(read_file_apple)
apps_data_google = list(read_file_google)

explore_data(apps_data_apple[1:], 1, 4, rows_and_columns=True)
print('\n')
explore_data(apps_data_google[1:], 1, 4, rows_and_columns=True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 
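The loading cell above leaves both files open after reading them. As a minimal sketch, the same step could be wrapped in a helper that closes the file automatically. The name `load_dataset` is hypothetical and not part of the original notebook; it assumes the first row of the file is the header.

```python
from csv import reader


def load_dataset(path, encoding="utf-8"):
    """Return (header, rows) for a CSV file; the file is closed on exit."""
    # newline="" is the setting the csv module recommends for file objects.
    with open(path, encoding=encoding, newline="") as csv_file:
        rows = list(reader(csv_file))
    return rows[0], rows[1:]
```

With such a helper, `header, apps = load_dataset("AppleStore.csv")` would keep the header separate from the data, so later slices would not need the `[1:]` offset.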

Print the header rows for both datasets.  
Links to the dataset documentation are provided below in case any of the column names are unclear.

In [3]:
print(apps_data_apple[0])
print('\n')
print(apps_data_google[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


- [Apple App Store dataset documentation](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)
- [Google Play Store dataset documentation](https://www.kaggle.com/datasets/lava18/google-play-store-apps)

## Data Cleaning - removing wrong data
Check for missing data in the Google dataset, and determine the index number for the rows that are missing data.

In [4]:
for row in apps_data_google[1:]:
    if len(row) != len(apps_data_google[0]):
        print(row)
        print(f"Index: {apps_data_google.index(row)}")
        print('\n')

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Index: 10473




Delete the row(s) that are missing data.

In [5]:
del apps_data_google[10473]
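Two details of the approach above are worth noting: `list.index(row)` returns the position of the first row that compares equal, which may not be the row being inspected when a dataset contains identical rows, and the index to delete is typed in by hand. The sketch below shows one possible alternative based on `enumerate`. The helper name `find_malformed_rows` and the sample data are made up for illustration.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(dataset[0])
    return [
        (index, row)
        for index, row in enumerate(dataset)
        if len(row) != expected
    ]


# Tiny made-up dataset: the second data row is missing one value.
sample = [
    ["App", "Category", "Rating"],
    ["App A", "TOOLS", "4.1"],
    ["App B", "4.5"],
]
malformed = find_malformed_rows(sample)
print(malformed)  # [(2, ['App B', '4.5'])]

# Delete from the highest index down so earlier deletions do not shift later ones.
for index, _ in sorted(malformed, reverse=True):
    del sample[index]
```

Because the indices come from the scan itself, re-running the cell does not delete a valid row by accident.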

Check for missing data in the Apple dataset, and determine the index number for the rows that are missing data.

In [6]:
for row in apps_data_apple[1:]:
    if len(row) != len(apps_data_apple[0]):
        print(row)
        print(f"Index: {apps_data_apple.index(row)}")
        print('\n')

## Data Cleaning - removing duplicate data
Check for duplicate data in the Google dataset.

In [7]:
unique_apps = []
duplicate_apps = []

for app in apps_data_google[1:]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(f"Duplicate Google apps: {len(duplicate_apps)}")

Duplicate Google apps: 1181


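Check for duplicate data in the Apple dataset. Here the comparison uses the `id` column (index 0), not the app name.

In [8]:
unique_apps = []
duplicate_apps = []

for app in apps_data_apple[1:]:
    app_id = app[0]  # 'id' column; the app name is in column 1 ('track_name')
    if app_id in unique_apps:
        duplicate_apps.append(app_id)
    else:
        unique_apps.append(app_id)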

print(f"Duplicate Apple apps: {len(duplicate_apps)}")

Duplicate Apple apps: 0
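Checking membership in a list gets slower as the list grows. As a hedged alternative, the same count can be obtained with `collections.Counter`, which tallies every key in one pass. The helper name `count_duplicates` and the sample rows are illustrative only.

```python
from collections import Counter


def count_duplicates(rows, key_index):
    """Count the surplus rows that share a key with an earlier row."""
    counts = Counter(row[key_index] for row in rows)
    return sum(n - 1 for n in counts.values())


sample_rows = [["Instagram"], ["Instagram"], ["Slack"], ["Instagram"]]
print(count_duplicates(sample_rows, 0))  # 2
```

This matches the logic of the loops above: every occurrence after the first counts as a duplicate.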


Only the Google dataset contains duplicate entries. For example, the **Instagram** app appears four times in the dataset.

In [9]:
for app in apps_data_google[1:]:
    name = app[0]
    if name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


From the duplicate entries, we only want to keep the row with the most recent data. We assume that is the row with the highest number of reviews (the `Reviews` column, index 3), since review counts tend to grow over time. The other duplicates can be removed.

Let's store the highest review count for each app name in a new **reviews_max** dictionary.

In [10]:
reviews_max = {}

for app in apps_data_google[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    # Keep the largest review count seen so far for each app name.
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
print(f"Amount of most recent apps in Google dataset: {len(reviews_max)}")

Amount of most recent apps in Google dataset: 9659


But in order to completely remove any duplicate data, we also need to account for cases where the highest number of reviews is the same for more than one entry.

First we create two new datasets: **android_clean** and **already_added**.
- **android_clean** will be a cleaned up version of **apps_data_google** with only the most recent entries and without any duplicate data.
- **already_added** helps us to keep track of apps that we already added.

We only add an app from **apps_data_google** to **android_clean** if its number of reviews in **apps_data_google** matches the value stored in **reviews_max**, AND the app name is not yet stored in **already_added**.

We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry. If we just check for `n_reviews == reviews_max[name]`, we'll still end up with duplicate entries for some apps.

In [11]:
android_clean = []
already_added = []

for app in apps_data_google[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)  
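To see why the extra `already_added` check matters, here is a small self-contained sketch on made-up rows in which one app appears twice with the same maximum review count. It uses a `set` for the bookkeeping instead of a list, which is a design choice of this sketch rather than of the original cell.

```python
# Made-up rows: "Chat App" appears twice with the same (maximum) review count.
rows = [
    ["Chat App", "SOCIAL", "4.0", "120"],
    ["Chat App", "SOCIAL", "4.0", "120"],
    ["Notes", "TOOLS", "4.2", "15"],
]

reviews_max = {}
for app in rows:
    name, n_reviews = app[0], float(app[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

# Without the extra check, both tied rows pass the filter.
naive = [app for app in rows if float(app[3]) == reviews_max[app[0]]]

# With the extra check, only the first tied row is kept.
clean, already_added = [], set()
for app in rows:
    name = app[0]
    if float(app[3]) == reviews_max[name] and name not in already_added:
        clean.append(app)
        already_added.add(name)

print(len(naive), len(clean))  # 3 2
```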

Explore the first three rows of the cleaned dataset and check the length of the cleaned Google dataset.   
Also check if Instagram only occurs once with the most reviews (66577446).

In [12]:
explore_data(android_clean, 0, 3, rows_and_columns=True)
print('\n')

for app in android_clean:
    name = app[0]
    if name == "Instagram":
        print(app)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


## Data Cleaning - removing non-English apps
Since the company builds apps only for an English speaking audience, we also need to remove any non-English apps.

Define a function that returns `True` when an input string is in English, and `False` when it's not.  
Each character in a string has a corresponding number (its code point). In ASCII, the characters commonly used in English text correspond to the numbers 0 to 127. The `ord()` function returns the corresponding number for a given character.

The function also needs to correctly identify certain special English characters like `™` and emojis, which fall outside the ASCII range and have corresponding numbers over 127. So to minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range.
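To make these numbers concrete, the short sketch below prints the code points of a few characters that occur in the app names tested in this section. The variable names are illustrative.

```python
# Characters taken from the app names tested in this section.
samples = ["a", "™", "😜", "爱"]
code_points = {character: ord(character) for character in samples}

for character, number in code_points.items():
    print(repr(character), number, "outside ASCII" if number > 127 else "ASCII")
# 'a' 97 ASCII
# '™' 8482 outside ASCII
# '😜' 128540 outside ASCII
# '爱' 29233 outside ASCII
```

A name such as "Instachat 😜" contains only one character above 127, so it stays under the threshold of three, while a name written mostly in Chinese characters exceeds it.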

In [14]:
def is_english(string):
    non_english_characters = 0
    
    for character in string:
        if ord(character) > 127:
            non_english_characters += 1
        
    # Tolerate up to three non-ASCII characters (for example ™ or an emoji).
    return non_english_characters <= 3

Test the function.

In [15]:
is_english("Instagram")

True

In [16]:
is_english("爱奇艺PPS -《欢乐颂2》电视剧热播")

False

In [17]:
is_english("Docs To Go™ Free Office Suite")

True

In [18]:
is_english("Instachat 😜")

True

Create a filtered dataset **google_apps_english_only** that contains only English apps from the Google Play Store.

In [19]:
google_apps_english_only = []

for app in android_clean:
    name = app[0]
    
    if is_english(name):
        google_apps_english_only.append(app)

Create a filtered dataset **apple_apps_english_only** that contains only English apps from the Apple App Store.

In [20]:
apple_apps_english_only = []

for app in apps_data_apple[1:]:
    name = app[1]
    
    if is_english(name):
        apple_apps_english_only.append(app)

Explore both datasets and determine how many English-only apps there are in each market.

In [21]:
explore_data(google_apps_english_only, 0, 3, rows_and_columns=True)
print('\n')
explore_data(apple_apps_english_only, 0, 3, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

## Data Cleaning - isolating free apps
Since the company only makes free apps, we want to isolate the free apps for our data analysis.

In [22]:
google_apps_free = []
apple_apps_free = []

for app in google_apps_english_only:
    price = app[7]
    
    if price == "0":
        google_apps_free.append(app)
        

for app in apple_apps_english_only:
    price = app[4]
    
    if price == "0.0":
        apple_apps_free.append(app)

In [23]:
print(f"Number of free Google apps: {len(google_apps_free)}")
print(f"Number of free Apple apps: {len(apple_apps_free)}")

Number of free Google apps: 8864
Number of free Apple apps: 3222
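The filter above relies on exact string matches (`"0"` for Google Play and `"0.0"` for the App Store). As a hedged alternative, a small helper could parse the price first so that both spellings are treated alike. The name `is_free` is hypothetical, and the handling of a leading `$` is an assumption about how paid prices might be written, not something shown in the output above.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' is zero.

    Raises ValueError if the string is not numeric after removing a leading '$'.
    """
    return float(price.strip().lstrip("$")) == 0.0


for raw in ["0", "0.0", "$0.00", "$4.99"]:
    print(raw, is_free(raw))
# 0 True
# 0.0 True
# $0.00 True
# $4.99 False
```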


## Data Analysis - most common apps by genre
Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.

Let's first find out which genres are the most common on both markets, by creating a frequency table for each.
- For Google Play the relevant columns are: `Category` and `Genres`.
- For the App Store the relevant column is: `prime_genre`

First we create a function that generates a frequency table, based on a given dataset and the index of the desired column.

In [24]:
def freq_table(dataset, index):
    total = 0
    frequency_table = {}
    
    for row in dataset:
        total += 1
        value = row[index]
        
        if value not in frequency_table:
            frequency_table[value] = 1
        else:
            frequency_table[value] += 1
        
    percentage_table = {}
    
    for key in frequency_table:
        percentage = (frequency_table[key]/total) * 100
        percentage_table[key] = percentage
        
    return percentage_table 

Then we create a function that takes in a dataset and the index of the desired column, and prints the frequency table in descending order.

In [25]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(f"{entry[1]}: {entry[0]}")
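For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is a sketch of an alternative, not the notebook's method; `freq_table_counter` and the sample rows are made-up names and data.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, like freq_table above."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


sample = [["a", "Games"], ["b", "Games"], ["c", "Music"], ["d", "Games"]]
table = freq_table_counter(sample, 1)
for genre, share in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(f"{genre}: {share:.1f}%")
# Games: 75.0%
# Music: 25.0%
```

Sorting the dictionary items by value avoids the intermediate list of `(value, key)` tuples, and the `:.1f` format keeps the printed percentages short.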

Let's examine the frequency table (in percentages) for the `prime_genre` in the App Store.

In [26]:
display_table(apple_apps_free, 11)

Games: 58.16263190564867
Entertainment: 7.883302296710118
Photo & Video: 4.9658597144630665
Education: 3.662321539416512
Social Networking: 3.2898820608317814
Shopping: 2.60707635009311
Utilities: 2.5139664804469275
Sports: 2.1415270018621975
Music: 2.0484171322160147
Health & Fitness: 2.0173805090006205
Productivity: 1.7380509000620732
Lifestyle: 1.5828677839851024
News: 1.3345747982619491
Travel: 1.2414649286157666
Finance: 1.1173184357541899
Weather: 0.8690254500310366
Food & Drink: 0.8069522036002483
Reference: 0.5586592178770949
Business: 0.5276225946617008
Book: 0.4345127250155183
Navigation: 0.186219739292365
Medical: 0.186219739292365
Catalogs: 0.12414649286157665


- The most common genre is `Games` with a share of 58%
- Second is `Entertainment` with 7.9%
- The other genres range between 0%-5%

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer.  
However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users. Supply might not match demand.

Let's now examine the frequency table (in percentages) for the `Category` in the Google Play store.

In [27]:
display_table(google_apps_free, 1)

FAMILY: 18.907942238267147
GAME: 9.724729241877256
TOOLS: 8.461191335740072
BUSINESS: 4.591606498194946
LIFESTYLE: 3.9034296028880866
PRODUCTIVITY: 3.892148014440433
FINANCE: 3.7003610108303246
MEDICAL: 3.531137184115524
SPORTS: 3.395758122743682
PERSONALIZATION: 3.3167870036101084
COMMUNICATION: 3.2378158844765346
HEALTH_AND_FITNESS: 3.0798736462093865
PHOTOGRAPHY: 2.944494584837545
NEWS_AND_MAGAZINES: 2.7978339350180503
SOCIAL: 2.6624548736462095
TRAVEL_AND_LOCAL: 2.33528880866426
SHOPPING: 2.2450361010830324
BOOKS_AND_REFERENCE: 2.1435018050541514
DATING: 1.861462093862816
VIDEO_PLAYERS: 1.7937725631768955
MAPS_AND_NAVIGATION: 1.3989169675090252
FOOD_AND_DRINK: 1.2409747292418771
EDUCATION: 1.1620036101083033
ENTERTAINMENT: 0.9589350180505415
LIBRARIES_AND_DEMO: 0.9363718411552346
AUTO_AND_VEHICLES: 0.9250902527075812
HOUSE_AND_HOME: 0.8235559566787004
WEATHER: 0.8009927797833934
EVENTS: 0.7107400722021661
PARENTING: 0.6543321299638989
ART_AND_DESIGN: 0.6430505415162455
COMICS: 0.62

- `FAMILY` is the most common category with 18.9%
- After that, the `GAME` (9.7%) and `TOOLS` (8.5%) categories are also quite common
- The other categories range between 0%-5%

The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.).  
However, if we investigate this further, we can see that the `FAMILY` category (which accounts for almost 19% of the apps) consists mostly of games for kids.

![Google Play Family](googleplay_family.png)

And let's examine the frequency table (in percentages) for the `Genres` in the Google Play store.

In [28]:
display_table(google_apps_free, 9)

Tools: 8.449909747292418
Entertainment: 6.069494584837545
Education: 5.347472924187725
Business: 4.591606498194946
Productivity: 3.892148014440433
Lifestyle: 3.892148014440433
Finance: 3.7003610108303246
Medical: 3.531137184115524
Sports: 3.463447653429603
Personalization: 3.3167870036101084
Communication: 3.2378158844765346
Action: 3.1024368231046933
Health & Fitness: 3.0798736462093865
Photography: 2.944494584837545
News & Magazines: 2.7978339350180503
Social: 2.6624548736462095
Travel & Local: 2.3240072202166067
Shopping: 2.2450361010830324
Books & Reference: 2.1435018050541514
Simulation: 2.0419675090252705
Dating: 1.861462093862816
Arcade: 1.8501805054151623
Video Players & Editors: 1.7712093862815883
Casual: 1.7599277978339352
Maps & Navigation: 1.3989169675090252
Food & Drink: 1.2409747292418771
Puzzle: 1.128158844765343
Racing: 0.9927797833935018
Role Playing: 0.9363718411552346
Libraries & Demo: 0.9363718411552346
Auto & Vehicles: 0.9250902527075812
Strategy: 0.913808664259927

The top 4 `Genres` are:
1. `Tools` (8.4%)
2. `Entertainment` (6.1%)
3. `Education` (5.3%)
4. `Business` (4.6%)

As the `Category` frequency table already showed, practical apps seem to be better represented on Google Play than on the App Store. The frequency table for the `Genres` column confirms this picture.

One thing we can notice on Google Play is that the `Genres` column is much more granular (it has more categories) than the `Category` column. We're looking for the bigger picture at the moment, so we'll only work with the `Category` column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps.   
Now we'd like to get an idea of which kinds of apps have the most users.

## Data Analysis - most popular apps by genre on the App Store
To find out what genres are the most popular (have the most users), we have to calculate the average number of installs for each app genre. 

For the Google Play dataset, we can find this information in the `Installs` column, but this information is missing for the App Store dataset. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

First we get all the unique app genres in the App Store by generating a frequency table.

In [29]:
freq_table_genres_apple = freq_table(apple_apps_free, 11)
print(freq_table_genres_apple)

{'Social Networking': 3.2898820608317814, 'Photo & Video': 4.9658597144630665, 'Games': 58.16263190564867, 'Music': 2.0484171322160147, 'Reference': 0.5586592178770949, 'Health & Fitness': 2.0173805090006205, 'Weather': 0.8690254500310366, 'Utilities': 2.5139664804469275, 'Travel': 1.2414649286157666, 'Shopping': 2.60707635009311, 'News': 1.3345747982619491, 'Navigation': 0.186219739292365, 'Lifestyle': 1.5828677839851024, 'Entertainment': 7.883302296710118, 'Food & Drink': 0.8069522036002483, 'Sports': 2.1415270018621975, 'Book': 0.4345127250155183, 'Finance': 1.1173184357541899, 'Education': 3.662321539416512, 'Productivity': 1.7380509000620732, 'Business': 0.5276225946617008, 'Catalogs': 0.12414649286157665, 'Medical': 0.186219739292365}


Now we add up the user ratings for the apps of each unique genre and divide the sum by the number of apps belonging to that genre.

Note: when looping over a dictionary, you loop over its keys, not its values.

In [30]:
for genre in freq_table_genres_apple:
    total = 0 # The sum of user ratings specific to a genre
    len_genre = 0 # The number of apps specific to a genre

    for app in apple_apps_free:
        genre_app = app[11]
        if genre_app == genre:
            n_user_ratings = float(app[5])
            total += n_user_ratings
            len_genre += 1
            
    avg_n_user_ratings = total/len_genre
    print(f"{genre}: {avg_n_user_ratings}")

Social Networking: 71548.34905660378
Photo & Video: 28441.54375
Games: 22788.6696905016
Music: 57326.530303030304
Reference: 74942.11111111111
Health & Fitness: 23298.015384615384
Weather: 52279.892857142855
Utilities: 18684.456790123455
Travel: 28243.8
Shopping: 26919.690476190477
News: 21248.023255813954
Navigation: 86090.33333333333
Lifestyle: 16485.764705882353
Entertainment: 14029.830708661417
Food & Drink: 33333.92307692308
Sports: 23008.898550724636
Book: 39758.5
Finance: 31467.944444444445
Education: 7003.983050847458
Productivity: 21028.410714285714
Business: 7491.117647058823
Catalogs: 4004.0
Medical: 612.0
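The nested loop above scans the whole dataset once per genre. As a sketch of an alternative, the sums and counts for all groups can be accumulated in a single pass with `collections.defaultdict`. The helper name `average_by_group` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows in the shape (name, genre, rating count).
sample = [
    ["App A", "Weather", "100"],
    ["App B", "Weather", "300"],
    ["App C", "Music", "50"],
]
print(average_by_group(sample, 1, 2))  # {'Weather': 200.0, 'Music': 50.0}
```

Under these assumptions, `average_by_group(apple_apps_free, 11, 5)` would compute the same averages as the loop above.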


The App Store genres with the highest average number of user ratings are:
1. `Navigation` (86090)
2. `Reference` (74942)
3. `Social Networking` (71548)
4. `Music` (57327)
5. `Weather` (52280)

However for some genres this data is heavily skewed by a few apps having most of the ratings, while the rest of the apps in that genre aren't actually popular.

We can see that's the case for `Navigation` apps, where Waze and Google Maps together account for about 97% of the genre's ratings (499,957 of 516,542).

In [31]:
for app in apple_apps_free:
    genre = app[11]
    if genre == "Navigation":
        print(f"{app[1]}: {app[5]}")

Waze - GPS Navigation, Maps & Real-time Traffic: 345046
Google Maps - Navigation & Transit: 154911
Geocaching®: 12811
CoPilot GPS – Car Navigation & Offline Maps: 3582
ImmobilienScout24: Real Estate Search in Germany: 187
Railway Route Search: 5
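One quick way to quantify this skew is to compare the mean with the median. The sketch below uses the six rating counts from the output above; the choice of the median as a robustness check is an addition of this sketch, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts copied from the Navigation output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33
print(median(navigation_ratings))  # 8196.5
```

The mean is roughly ten times the median, which suggests that the genre average is driven by its two largest apps.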


The same goes for `Reference` apps, where the Bible and Dictionary.com skew the average number of ratings upward.

In [32]:
for app in apple_apps_free:
    genre = app[11]
    if genre == "Reference":
        print(f"{app[1]}: {app[5]}")

Bible: 985920
Dictionary.com Dictionary & Thesaurus: 200047
Dictionary.com Dictionary & Thesaurus for iPad: 54175
Google Translate: 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran: 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition: 17588
Merriam-Webster Dictionary: 16849
Night Sky: 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE): 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools: 4693
GUNS MODS for Minecraft PC Edition - Mods Tools: 1497
Guides for Pokémon GO - Pokemon GO News and Cheats: 826
WWDC: 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free: 718
VPN Express: 14
Real Bike Traffic Rider Virtual Reality Glasses: 8
教えて!goo: 0
Jishokun-Japanese English Dictionary & Translator: 0


`Social Networking` apps also have a few heavy hitters like Facebook, Pinterest and WhatsApp. But there also seems to be room for a lot of smaller apps that still get a decent number of ratings.

**This is the genre in the App Store we would recommend developing apps for.**

In [33]:
for app in apple_apps_free:
    genre = app[11]
    if genre == "Social Networking":
        print(f"{app[1]}: {app[5]}")

Facebook: 2974676
Pinterest: 1061624
Skype for iPhone: 373519
Messenger: 351466
Tumblr: 334293
WhatsApp Messenger: 287589
Kik: 260965
ooVoo – Free Video Call, Text and Voice: 177501
TextNow - Unlimited Text + Calls: 164963
Viber Messenger – Text & Call: 164249
Followers - Social Analytics For Instagram: 112778
MeetMe - Chat and Meet New People: 97072
We Heart It - Fashion, wallpapers, quotes, tattoos: 90414
InsTrack for Instagram - Analytics Plus More: 85535
Tango - Free Video Call, Voice and Chat: 75412
LinkedIn: 71856
Match™ - #1 Dating App.: 60659
Skype for iPad: 60163
POF - Best Dating App for Conversations: 52642
Timehop: 49510
Find My Family, Friends & iPhone - Life360 Locator: 43877
Whisper - Share, Express, Meet: 39819
Hangouts: 36404
LINE PLAY - Your Avatar World: 34677
WeChat: 34584
Badoo - Meet New People, Chat, Socialize.: 34428
Followers + for Instagram - Follower Analytics: 28633
GroupMe: 28260
Marco Polo Video Walkie Talkie: 27662
Miitomo: 23965
SimSimi: 23530
Grindr - G

## Data Analysis - most popular apps by genre on Google Play
As mentioned before, to find out what genres are the most popular (have the most users), we have to calculate the average number of installs for each app category. For the Google Play dataset, we can find this information in the `Installs` column.  

However, the install numbers aren't precise: most values are open-ended (100+, 1,000+, 5,000+, etc.).  
For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000.

In [34]:
display_table(google_apps_free, 5)

1,000,000+: 15.726534296028879
100,000+: 11.552346570397113
10,000,000+: 10.548285198555957
10,000+: 10.198555956678701
1,000+: 8.393501805054152
100+: 6.915613718411552
5,000,000+: 6.825361010830325
500,000+: 5.561823104693141
50,000+: 4.7721119133574
5,000+: 4.512635379061372
10+: 3.5424187725631766
500+: 3.2490974729241873
50,000,000+: 2.3014440433213
100,000,000+: 2.1322202166064983
50+: 1.917870036101083
5+: 0.78971119133574
1+: 0.5076714801444043
500,000,000+: 0.2707581227436823
1,000,000,000+: 0.22563176895306858
0+: 0.04512635379061372
0: 0.01128158844765343


But we don't need very precise data for our purpose: we only want to find out which app categories attract the most users.  
We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.
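The conversion described above can be sketched as a small helper that strips the separators and the trailing plus sign. The name `parse_installs` is hypothetical, and returning an `int` rather than a `float` is a choice made for this sketch.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(installs.replace(",", "").replace("+", ""))


for raw in ["100,000+", "1,000,000+", "0+", "0"]:
    print(raw, "->", parse_installs(raw))
# 100,000+ -> 100000
# 1,000,000+ -> 1000000
# 0+ -> 0
# 0 -> 0
```

Each value is therefore treated as a lower bound, so the resulting averages are best read as a relative ranking rather than as exact install counts.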

First we get all the unique app categories in the Google Play store by generating a frequency table for the `Category` column.

In [35]:
freq_table_categories_google = freq_table(google_apps_free, 1)
print(freq_table_categories_google) 

{'ART_AND_DESIGN': 0.6430505415162455, 'AUTO_AND_VEHICLES': 0.9250902527075812, 'BEAUTY': 0.5979241877256317, 'BOOKS_AND_REFERENCE': 2.1435018050541514, 'BUSINESS': 4.591606498194946, 'COMICS': 0.6204873646209386, 'COMMUNICATION': 3.2378158844765346, 'DATING': 1.861462093862816, 'EDUCATION': 1.1620036101083033, 'ENTERTAINMENT': 0.9589350180505415, 'EVENTS': 0.7107400722021661, 'FINANCE': 3.7003610108303246, 'FOOD_AND_DRINK': 1.2409747292418771, 'HEALTH_AND_FITNESS': 3.0798736462093865, 'HOUSE_AND_HOME': 0.8235559566787004, 'LIBRARIES_AND_DEMO': 0.9363718411552346, 'LIFESTYLE': 3.9034296028880866, 'GAME': 9.724729241877256, 'FAMILY': 18.907942238267147, 'MEDICAL': 3.531137184115524, 'SOCIAL': 2.6624548736462095, 'SHOPPING': 2.2450361010830324, 'PHOTOGRAPHY': 2.944494584837545, 'SPORTS': 3.395758122743682, 'TRAVEL_AND_LOCAL': 2.33528880866426, 'TOOLS': 8.461191335740072, 'PERSONALIZATION': 3.3167870036101084, 'PRODUCTIVITY': 3.892148014440433, 'PARENTING': 0.6543321299638989, 'WEATHER': 

Now we add up the installs for the apps of each unique category and divide the sum by the number of apps belonging to that category.

In [36]:
for category in freq_table_categories_google:
    total = 0 # The sum of installs specific to a category
    len_category = 0 # The number of apps specific to a category

    for app in google_apps_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace("+", "")
            n_installs = n_installs.replace(",", "")
            n_installs = float(n_installs)
            total += n_installs
            len_category += 1
            
    avg_n_installs = total/len_category
    print(f"{category}: {avg_n_installs}")

ART_AND_DESIGN: 1986335.0877192982
AUTO_AND_VEHICLES: 647317.8170731707
BEAUTY: 513151.88679245283
BOOKS_AND_REFERENCE: 8767811.894736841
BUSINESS: 1712290.1474201474
COMICS: 817657.2727272727
COMMUNICATION: 38456119.167247385
DATING: 854028.8303030303
EDUCATION: 1833495.145631068
ENTERTAINMENT: 11640705.88235294
EVENTS: 253542.22222222222
FINANCE: 1387692.475609756
FOOD_AND_DRINK: 1924897.7363636363
HEALTH_AND_FITNESS: 4188821.9853479853
HOUSE_AND_HOME: 1331540.5616438356
LIBRARIES_AND_DEMO: 638503.734939759
LIFESTYLE: 1437816.2687861272
GAME: 15588015.603248259
FAMILY: 3695641.8198090694
MEDICAL: 120550.61980830671
SOCIAL: 23253652.127118643
SHOPPING: 7036877.311557789
PHOTOGRAPHY: 17840110.40229885
SPORTS: 3638640.1428571427
TRAVEL_AND_LOCAL: 13984077.710144928
TOOLS: 10801391.298666667
PERSONALIZATION: 5201482.6122448975
PRODUCTIVITY: 16787331.344927534
PARENTING: 542603.6206896552
WEATHER: 5074486.197183099
VIDEO_PLAYERS: 24727872.452830188
NEWS_AND_MAGAZINES: 9549178.467741935
MA

The `Social Networking` genre that we recommended for the App Store seems to correspond to a combination of the `COMMUNICATION`, `SOCIAL` and `DATING` categories on Google Play.

On average, `COMMUNICATION` apps have the most installs: 38,456,119.  
But this number is heavily skewed by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 million or 500 million installs.

In [37]:
for app in google_apps_free:
    if app[1] == "COMMUNICATION" and app[5] in ("1,000,000,000+",
                                                "500,000,000+",
                                                "100,000,000+"):
        print(f"{app[0]}: {app[5]}")

WhatsApp Messenger: 1,000,000,000+
imo beta free calls and text: 100,000,000+
Android Messages: 100,000,000+
Google Duo - High Quality Video Calls: 500,000,000+
Messenger – Text and Video Chat for Free: 1,000,000,000+
imo free video calls and chat: 500,000,000+
Skype - free IM & video calls: 1,000,000,000+
Who: 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji: 100,000,000+
LINE: Free Calls & Messages: 500,000,000+
Google Chrome: Fast & Secure: 1,000,000,000+
Firefox Browser fast & private: 100,000,000+
UC Browser - Fast Download Private & Secure: 500,000,000+
Gmail: 1,000,000,000+
Hangouts: 1,000,000,000+
Messenger Lite: Free Calls & Messages: 100,000,000+
Kik: 100,000,000+
KakaoTalk: Free Calls & Text: 100,000,000+
Opera Mini - fast web browser: 100,000,000+
Opera Browser: Fast and Secure: 100,000,000+
Telegram: 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer: 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure: 100,000,000+
Viber Messenger: 500,000,000+
WeC

We see the same pattern for `SOCIAL` apps (23,253,652 installs on average). The category is dominated by a few giants.

In [38]:
for app in google_apps_free:
    if app[1] == "SOCIAL" and app[5] in ("1,000,000,000+",
                                         "500,000,000+",
                                         "100,000,000+"):
        print(f"{app[0]}: {app[5]}")

Facebook: 1,000,000,000+
Facebook Lite: 500,000,000+
Tumblr: 100,000,000+
Pinterest: 100,000,000+
Google+: 1,000,000,000+
Badoo - Free Chat & Dating App: 100,000,000+
Tango - Live Video Broadcast: 100,000,000+
Instagram: 1,000,000,000+
Snapchat: 500,000,000+
LinkedIn: 100,000,000+
Tik Tok - including musical.ly: 100,000,000+
BIGO LIVE - Live Stream: 100,000,000+
VK: 100,000,000+


`DATING` apps have a lower average (854,028 installs), but that is because the category lacks a few giant apps that inflate the number. The installs seem to be more evenly spread, with many diverse offerings catering to different niches. This makes it an interesting category that is welcoming to new apps.  
**Therefore we recommend the `DATING` app genre on Google Play.**

In [39]:
for app in google_apps_free:
    if app[1] == "DATING":
        print(f"{app[0]}: {app[5]}")

Match™ Dating - Meet Singles: 10,000,000+
Hinge: Dating & Relationships: 500,000+
Casual Dating & Adult Singles - Joyride: 5,000,000+
CMB Free Dating App: 1,000,000+
eharmony - Online Dating App: 5,000,000+
Free Dating App & Flirt Chat - Match with Singles: 5,000,000+
Chispa, the Dating App for Latino, Latina Singles: 100,000+
Clover Dating App: 500,000+
Black People Meet Singles Date: 1,000,000+
Mingle2 - Free Online Dating & Singles Chat Rooms: 1,000,000+
Free Dating App & Flirt Chat - Cheers: 10,000+
Blendr - Chat, Flirt & Meet: 1,000,000+
Free Dating Hook Up Messenger: 100,000+
Find Real Love — YouLove Premium Dating: 10,000,000+
Once - Quality Matches Every day: 1,000,000+
BLK - Swipe. Match. Chat.: 100,000+
Howlr: 5,000+
Stranger Chat & Date: 500,000+
RandoChat - Chat roulette: 1,000,000+
BeWild Free Dating & Chat App: 100,000+
OurTime Dating for Singles 50+: 100,000+
FarmersOnly Dating: 100,000+
Sudy – Meet Elite & Rich Single: 500,000+
Single Parent Meet #1 Dating: 500,000+
Eli

## Conclusion
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that a **dating app** fits the requirements. Dating apps (which fall within the `Social Networking` genre on the App Store) are popular on both the App Store and Google Play, but the segment is not dominated by a few giants. It seems to be a welcoming environment for new apps, since there are many diverse audiences to cater to.