# Profitability of Mobile Apps from Google Play and the App Store

The mobile app industry has grown rapidly over the last decade or so, with over 4 million apps available for download between the App Store (iOS) and Google Play (Android). This project focuses on finding the types of apps that generate the most revenue in these stores. The analysis will help developers make educated, data-driven decisions when creating profitable apps.

Our company only makes apps that are free to download. Because of this, our only source of revenue is in-app ads, so we are mainly concerned with increasing the number of people who download and use our apps. The goal of this project is to uncover what types of apps attract the most users.

## Exploring the Data
As of 2018, there are roughly 2.1 million Android apps in the Google Play store and 2 million iOS apps in the App Store. Collecting data on 4 million apps requires a lot of time and money. To avoid this, we will be looking at subsets of two datasets that will satisfy our needs. Below are the two datasets we will be looking at:

* [Dataset 1](https://www.kaggle.com/lava18/google-play-store-apps/home#googleplaystore.csv): contains data about approximately 10,000 Android apps from Google Play
* [Dataset 2](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home): contains data about approximately 7,000 iOS apps from the App Store

The following code is used to load in the two datasets.

In [1]:
#Dataset 1 - Android App Data
from csv import reader
# An explicit encoding avoids platform-dependent decoding errors
open_file = open("googleplaystore.csv", encoding="utf8")
read_file = reader(open_file)
android = list(read_file)
android_header = android[0]
android_data = android[1:]

#Dataset 2 - iOS App Data
open_file = open("AppleStore.csv", encoding="utf8")
read_file = reader(open_file)
ios = list(read_file)
ios_header = ios[0]
ios_data = ios[1:]
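
The loading steps above can also be wrapped in a small helper. This is only a sketch: `load_csv` is a hypothetical name that is not part of the original notebook, and it assumes the files are UTF-8 encoded. The `with` statement closes the file once reading is done.

```python
from csv import reader


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for the CSV file at `path`.

    Hypothetical helper; assumes the first row of the file is the header.
    """
    # The with-statement closes the file even if reading fails part-way.
    with open(path, encoding=encoding, newline="") as f:
        rows = list(reader(f))
    return rows[0], rows[1:]
```

With this helper, the Android data could be loaded as `android_header, android_data = load_csv("googleplaystore.csv")`.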

Now that we have loaded in the data, we're going to start exploring the data. In order to do so, we use a function called ```explore_data()``` that will allow us to easily print rows in a readable format.

In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new empty line after each row
        
    if rows_and_columns:
        print("Number of rows: ", len(dataset))
        print("Number of columns: ", len(dataset[0]))

In [3]:
print(android_header)
print('\n')
explore_data(android_data, 0, 3, True)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  10841
Number of columns:  13


In our first dataset (Android apps), we see that it has 10,841 rows and 13 columns. The columns of importance for our analysis are ```App, Category, Rating, Reviews, Size, Installs, Type, and Genres```. For more information about the data columns, please refer to the [documentation](https://www.kaggle.com/lava18/google-play-store-apps/home#googleplaystore.csv).

Now let's take a look at Dataset 2 (iOS apps).

In [4]:
print(ios_header)
print('\n')
explore_data(ios_data, 0, 2, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows:  7197
Number of columns:  16


This dataset has 7,197 rows and 16 columns. The important columns for our analysis seem to be ```track_name, size_bytes, price, rating_count_tot, rating_count_ver, and prime_genre```. For more information about the data columns, please refer to the [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

## Removing Incomplete Observations

According to the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) of the Google Play data set, [one discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error in one row: a value is missing, so the subsequent columns are shifted over. Below, we print that row to see if the error exists.

In [5]:
print(android_header)
print('\n')
explore_data(android_data, 10472, 10473)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




Here we see that the ```Rating``` column has a value of "19", which does not make sense because ratings range from 0.0 to 5.0. The row has only 12 values instead of 13: the ```Category``` value is missing, so every later value is shifted one column to the left and the number of reviews ends up under ```Rating```. This means the error mentioned in the discussion is still present. The following code removes that entry from our data set.


In [6]:
print(len(android_data))
del android_data[10472]
print(len(android_data))

10841
10840
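
Instead of relying on a known index, rows with a missing value can be found by comparing each row's length with the header's. The sketch below uses toy data and a hypothetical helper named `find_malformed_rows`. It only catches rows with the wrong number of values, not values that are wrong in other ways.

```python
def find_malformed_rows(header, data):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(data) if len(row) != expected]


# Toy data standing in for the real data set.
toy_header = ['App', 'Category', 'Rating']
toy_data = [
    ['Instagram', 'SOCIAL', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],  # 'Category' missing
]

for index, row in find_malformed_rows(toy_header, toy_data):
    print(index, ':', row)
```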


## Dealing with Duplicates

### Part One
When dealing with large data sets, it is common for duplicates to exist. This data set is no exception. There are several entries for the Instagram app, for example. 

In [7]:
for row in android_data:
    name = row[0]
    if name == "Instagram":
        print(row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


As you can see, there are 4 observations for Instagram. Below is some code that will count the number of duplicates in the data set.

In [8]:
duplicates = []
unique = []
for row in android_data:
    name = row[0]
    if name in unique:
        duplicates.append(name)
    else:
        unique.append(name)

print("Number of duplicate apps: ", len(duplicates))
print('\n')
print("Examples of duplicate apps: ", duplicates[:15])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


As we can see, there are 1,181 duplicate apps. We want to keep only unique observations for each app and delete duplicates. Rather than randomly deleting the duplicates, we want to select the observation with the most reviews and delete the other duplicates. 

Take the Instagram example printed above. The fourth column of each row corresponds to the number of reviews of the app. The second instance of Instagram has 66,577,446 reviews, which is the highest of the four, so we keep that observation.
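
As an aside, the duplicate count can also be computed with `collections.Counter` from the standard library, which avoids repeatedly searching a list. This is a sketch on toy data; `count_duplicates` is a hypothetical helper, and it counts every occurrence after the first as a duplicate, the same way the loop above does.

```python
from collections import Counter


def count_duplicates(data, name_index=0):
    """Count rows whose app name already appeared earlier in the data."""
    counts = Counter(row[name_index] for row in data)
    # Every occurrence beyond the first one is a duplicate.
    return sum(n - 1 for n in counts.values())


# Toy rows: only the app name matters here.
toy_data = [['Instagram'], ['Box'], ['Instagram'], ['Slack'], ['Box'], ['Instagram']]
print("Number of duplicate apps: ", count_duplicates(toy_data))
```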

### Part Two

To do this, we first create a dictionary.

In [9]:
reviews_max = {}
for row in android_data:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
print(len(reviews_max))

9659


* The dictionary we create is called "reviews_max". We start it off as an empty dictionary.
* Then, for each row of the Google Play data set, we:
    * Store the name of the app in the "name" variable and its number of reviews in the "n_reviews" variable.
    * If this is not our first time seeing an app with this name (a duplicate) and the duplicate has a higher number of reviews than the one currently in the dictionary:
        * then we replace the dictionary's number of reviews with the duplicate's.
    * If this is our first time seeing an app with this name:
        * we create a new entry in our dictionary with this app's name and number of reviews.

We print the length of the dictionary, which has 9,659 entries as expected. (10,840 observations after removing the faulty row - 1,181 duplicates = 9,659 unique apps)

In [10]:
android_clean = []
already_added = []
for row in android_data:
    name = row[0]
    n_reviews = float(row[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
        
print(len(android_clean))

9659


Now, we use the reviews_max dictionary to get rid of the duplicates. For the duplicate cases, we'll only keep the entries with the highest number of reviews. In the code cell above:

* We start by creating two empty lists, android_clean and already_added.
* We loop through the Google Play data set, and for every iteration:
    * We isolate the name of the app and the number of reviews.
    * We add the current app row to the android_clean list, and the app name to the already_added list if:
        * The number of reviews of the current app equals the number of reviews of that app in the reviews_max dictionary and
        * The name of the app is not already in the already_added list. We include this condition to account for cases where more than one entry of a duplicate app shares the highest number of reviews, for example, if the Instagram app had 4 entries all with the same number of reviews. If we only checked ```reviews_max[name] == n_reviews```, there would still be duplicate entries in our data.
        
Now, to ensure everything went as expected, we want to explore our newly cleaned data set.

In [11]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:  9659
Number of columns:  13


We have 9,659 rows as expected.
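
For comparison, the two passes above can be collapsed into a single pass that stores the best row per app directly. This is a sketch rather than a drop-in replacement: `keep_most_reviewed` is a hypothetical helper, the toy rows have only four columns, and the result is ordered by each app's first appearance, which may differ from the order produced by the two-pass version.

```python
def keep_most_reviewed(data, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the most reviews."""
    best = {}
    for row in data:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows: [App, Category, Rating, Reviews]
toy_data = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]

for row in keep_most_reviewed(toy_data):
    print(row)
```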

## Removing Non-English Apps

While exploring the app names, we find that some apps are geared towards a non-English audience. Here are some examples: 

In [12]:
print(ios_data[813][1])
print(ios_data[6731][1])
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


We are not interested in these apps, so we want to remove them. In order to do so, we create a function called ```is_english``` that returns ```True``` if the app name appears to be in English. English text includes the letters of the alphabet (a-z), numbers (0-9), punctuation marks (., !, ?, etc.) and other symbols (+, -, _, etc.). 

The way we can determine whether or not a character is commonly used in English text is through the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.

We use the built-in ```ord``` function to get a character's code point, which matches its ASCII number for characters in the 0-127 range. 

Since some symbols outside of the 0-127 range (such as emoji or "™") are often used in the names of English apps, we only remove an app if its name has more than 3 characters outside that range.

In [13]:
def is_english(a_string):
    count = 0
    for char in a_string:
        if ord(char) > 127:
            count += 1
        if count > 3:
            return False
    return True

print(is_english("Instagram"))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(ord("™"))
print(ord("😜"))

            

True
False
True
True
8482
128540


As you can see, the function labels both 'Docs To Go™ Free Office Suite' and 'Instachat 😜' as being in English even though they contain "™" (8482) and "😜" (128,540), which are outside the range of typical English text (0-127).

Below, we use the ```is_english``` function to filter out non-English apps from our two data sets.
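
To see why a threshold of 3 separates these examples, it can help to print how many non-ASCII characters each name contains. The helper below, `count_non_ascii`, is a hypothetical addition used only for illustration. Keep in mind that this heuristic also accepts names in other languages written mostly with unaccented Latin letters.

```python
def count_non_ascii(a_string):
    """Count the characters whose code point falls outside 0-127."""
    return sum(1 for char in a_string if ord(char) > 127)


names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for name in names:
    print(name, ':', count_non_ascii(name))
```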

In [14]:
android_clean_eng = []
for row in android_clean:
    name = row[0]
    if is_english(name):
        android_clean_eng.append(row)
        
ios_eng = []
for row in ios_data:
    name = row[1]
    if is_english(name):
        ios_eng.append(row)

In [15]:
explore_data(android_clean_eng, 0, 3, True)
print('\n')
explore_data(ios_eng, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:  9614
Number of columns:  13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+'

There are 9,614 Android and 6,183 iOS English apps.

## Isolating Free Apps

As we mentioned in the introduction, our company only makes apps that are free to download and install, and our main source of revenue is in-app ads. Our data sets contain both free and non-free apps, so we'll need to isolate the free apps for our analysis. Below is the code that does this:

In [16]:
android_free = []
for row in android_clean_eng:
    price = row[7]
    if price == '0':
        android_free.append(row)

ios_free = []
for row in ios_eng:
    price = row[4]
    if price == '0.0':
        ios_free.append(row)

print(len(android_free))
print(len(ios_free))

8864
3222


We are left with 8,864 Android apps and 3,222 iOS apps.
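
The filter above compares price strings exactly ('0' for Google Play, '0.0' for the App Store), which works for these two files but depends on their formatting. A more tolerant check could parse the price as a number first. This is a sketch under assumptions: `is_free` is a hypothetical helper, and it assumes that paid prices may carry a leading "$" sign.

```python
def is_free(price):
    """Return True when a price string such as '0' or '0.0' equals zero."""
    cleaned = price.replace('$', '').strip()
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Assumption of this sketch: unparseable prices count as not free.
        return False


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), ':', is_free(price))
```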

## Most Common Apps by Genre

### Part One
As previously mentioned in our introduction, we want to determine which types of apps tend to attract the most users, since our revenue is heavily influenced by the number of people who use our apps.

To reduce risks and overhead costs, our validation strategy for creating a profitable app is as follows:

1. Build a minimal Android version of the app, and add it to Google Play.
    * This is because Google Play has a larger market for apps than the App Store
2. If the app has a good response from users, we continue development on it.
3. If the app is profitable after 6 months, we build an iOS version of the app, and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For example, a profile that could work well for both markets could be a productivity app that makes use of gamification (making menial tasks into a game).

For our analysis, we first want to find out which genres of apps are the most common on each store. We can do this by generating frequency tables of the ```Genres``` and ```prime_genre``` columns from the Android and iOS data, respectively.

### Part Two

We will make two functions we can use to analyze the frequency tables:
* One function to generate frequency tables that show percentages
* One function to display the percentages in descending order


In [17]:
def freq_table(dataset, index):
    table = {}
    for row in dataset:
        val = row[index]
        if val in table:
            table[val] += 1
        else:
            table[val] = 1
   
    table_percentages = {}
    for key in table:
        percentage = (table[key] / len(dataset) * 100)
        table_percentages[key] = percentage
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
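
The same percentages can be produced with `collections.Counter`. The version below is a sketch on toy data, not a replacement for the functions above; `freq_table_counter` is a hypothetical name. It sorts by percentage only, so ties are not ordered by key the way the tuple sort in `display_table` orders them.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows holding that value in column `index`}."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}


# Toy rows: [name, genre]
toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Education'], ['d', 'Games']]

table = freq_table_counter(toy, 1)
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```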
         

### Part Three

Let's start by analyzing the ```prime_genre``` column of the App Store data set.

In [18]:
display_table(ios_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Of the free apps in the App Store, "Games" is by far the most popular genre, making up 58.16% of the market. The second most popular genre is "Entertainment" with only 7.88%, followed by "Photo & Video" with 4.97%. Education and Social Networking only make up 3.66% and 3.29% respectively.

From this, the App Store seems to be dominated by apps for entertainment purposes (games, photo and video, social networking, sports, music) rather than for practical purposes. It is hard to recommend a profile based solely on the frequency tables. While games are the most common type of app, this does not mean that games necessarily have the most users. There could be an oversaturation of games in the market.

In [19]:
# Category
display_table(android_free, 1)


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The breakdown of categories in the Google Play store looks drastically different from the App Store. Family is the most common category, making up 18.91% of the market. Rounding out the top 5 are Games (9.72%), Tools (8.46%), Business (4.59%) and Lifestyle (3.90%). Comparatively, the Google Play store is not dominated by entertainment categories. A good number of apps serve practical purposes.

Looking at the Genre frequency table for Google Play:

In [20]:
# Genre
display_table(android_free, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

It is difficult to compare the ```Genres``` and ```Category``` columns because there are many more genres than categories. We will use the ```Category``` column for our analysis.

In summary, the App Store has more entertainment-based apps, while Google Play has a more balanced range of apps. 

## Most Popular Apps by Genre for the App Store

As mentioned before, we want to find the genre of apps that has the most users. One way to accomplish this is to calculate the average number of installs per app genre. For the Google Play data set, we can use the ```Installs``` column. There is no such column in the App Store data set, so as a proxy we will use the ```rating_count_tot``` column, which is the total number of user ratings.

In [21]:
genre_fq = freq_table(ios_free, 11)
for genre in genre_fq:
    total = 0 
    len_genre = 0
    for row in ios_free:
        genre_app = row[11]
        if genre_app == genre:
            user_ratings = float(row[5])
            total += user_ratings
            len_genre += 1
    avg_rating = total / len_genre
    print(genre, ":", avg_rating)

Medical : 612.0
Photo & Video : 28441.54375
Social Networking : 71548.34905660378
Reference : 74942.11111111111
Business : 7491.117647058823
Catalogs : 4004.0
Entertainment : 14029.830708661417
Health & Fitness : 23298.015384615384
Music : 57326.530303030304
Games : 22788.6696905016
Productivity : 21028.410714285714
Education : 7003.983050847458
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Sports : 23008.898550724636
Lifestyle : 16485.764705882353
Finance : 31467.944444444445
Shopping : 26919.690476190477
Food & Drink : 33333.92307692308
Book : 39758.5
Navigation : 86090.33333333333
News : 21248.023255813954


From this analysis, it seems that "Navigation" is the app genre with the most user ratings because it has the highest average number of ratings. But let's take a closer look: 

In [22]:
for row in ios_free:
    if row[11] == "Navigation":
        print(row[1], ":", row[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Interesting. We see that two apps (Waze and Google Maps) make up the bulk of the results. "Navigation" is too influenced by those two apps. Similar things can be seen in "Social Networking" where a few apps dominate the genre (Facebook, Instagram, Twitter, etc.). 

These few apps make their genres seem more popular than they actually are. We want to find genres that are popular collectively.
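
One way to reduce the influence of a few giant apps is to look at the median instead of the mean. This is a different statistic from the one used in this analysis, shown only as a sketch; `median_by_group` is a hypothetical helper, and the toy rows reuse the six "Navigation" rating counts printed above.

```python
from statistics import median


def median_by_group(data, group_index, value_index):
    """Return {group: median of the numeric values in that group}."""
    groups = {}
    for row in data:
        groups.setdefault(row[group_index], []).append(float(row[value_index]))
    return {group: median(values) for group, values in groups.items()}


# Toy rows: [name, genre, rating_count_tot]
nav_rows = [
    ['Waze', 'Navigation', '345046'],
    ['Google Maps', 'Navigation', '154911'],
    ['Geocaching', 'Navigation', '12811'],
    ['CoPilot GPS', 'Navigation', '3582'],
    ['ImmobilienScout24', 'Navigation', '187'],
    ['Railway Route Search', 'Navigation', '5'],
]

print(median_by_group(nav_rows, 1, 2))
```

For these six rows the median is 8,196.5 ratings, far below the mean of roughly 86,090, which shows how strongly the two largest apps pull the average up.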

From exploring the data, we found that the App Store is saturated with for-fun apps. Our best plan to develop an app that will attract the most users is:
1. Choose a genre that would be considered "practical" (not for-fun).
2. Choose a genre that is not dominated by a couple of apps within the genre.

With these criteria, "Reference" seems like our best bet. Let's get a closer look at the "Reference" genre.

In [23]:
for row in ios_free:
    if row[11] == "Reference":
        print(row[1], ":", row[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


This seems a lot more balanced than "Navigation". One app recommendation could be an app called *"GameGuider"* that compiles extensive guides of popular games for different consoles (Nintendo Switch, PS4, Xbox One, PC, etc.).

This app is both practical and for-fun because it pairs video games with informative guides on how to progress through them. The gaming industry is growing by the day, so why not make an app that can capitalize on that? The main draw of *"GameGuider"* is that it will have all the guides in one app with filters, so finding what you need is seamless.

Now that we have found the most popular apps by genre for the App Store, let's do the same for Google Play.


## Most Popular Apps by Genre for Google Play

For the Google Play data, we have data about the number of installs for the market, so we should be able to get a clearer idea about genre popularity. However, the install numbers don't seem precise enough since we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):


In [24]:
display_table(android_free, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


There are a couple of issues with this data. For one, the number of installs is an open-ended range rather than a single numerical value. For example, we don't know whether an app listed with 10,000+ installs has 10,000, 20,000, or 40,000 installs. For our analysis, we do not require precise numbers, so we will treat each value as its lower bound (10,000+ becomes 10,000). 

To perform computations on the ```Installs``` column, we need to convert the values from strings to floats. This means we must remove the "+" and the commas from each value.
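
The conversion described above can be isolated in a small function. This is a sketch; `parse_installs` is a hypothetical helper name, and it assumes every value consists only of digits, commas, and an optional trailing "+".

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to a float (its lower bound)."""
    return float(installs.replace(',', '').replace('+', ''))


for value in ['1,000,000+', '10+', '0+', '0']:
    print(value, '->', parse_installs(value))
```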


In [25]:
category_fq = freq_table(android_free, 1)
for category in category_fq:
    total = 0
    len_category = 0
    for row in android_free:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            installs = installs.replace(",", "")
            installs = installs.replace("+", "")
            installs = float(installs)
            total += installs
            len_category += 1
    avg_installs = total / len_category
    print(category, ":", avg_installs)
            

ART_AND_DESIGN : 1986335.0877192982
SHOPPING : 7036877.311557789
PARENTING : 542603.6206896552
SPORTS : 3638640.1428571427
MEDICAL : 120550.61980830671
GAME : 15588015.603248259
EVENTS : 253542.22222222222
COMICS : 817657.2727272727
PERSONALIZATION : 5201482.6122448975
NEWS_AND_MAGAZINES : 9549178.467741935
WEATHER : 5074486.197183099
LIBRARIES_AND_DEMO : 638503.734939759
HEALTH_AND_FITNESS : 4188821.9853479853
FOOD_AND_DRINK : 1924897.7363636363
LIFESTYLE : 1437816.2687861272
HOUSE_AND_HOME : 1331540.5616438356
BEAUTY : 513151.88679245283
MAPS_AND_NAVIGATION : 4056941.7741935486
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PRODUCTIVITY : 16787331.344927534
TOOLS : 10801391.298666667
FINANCE : 1387692.475609756
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
BOOKS_AND_REFERENCE : 8767811.894736841
TRAVEL_AND_LOCAL : 13984077.710144928
AUTO_AND_VEHICLES : 647317.8170731707
ENTERTAINMENT : 11640705.88235294
FAMILY : 3695641.8198090694
PHOTOGRAPHY : 1784011

The category with the largest average number of installs is "COMMUNICATION", with about 38.5 million (38,456,119). Let's see if there are a few very popular apps that are skewing the results. 

In [26]:
for row in android_free:
    category = row[1]
    if category == "COMMUNICATION":
        name = row[0]  # avoid shadowing the built-in id()
        installs = row[5]
        print(name, ":", installs)

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

This average is heavily skewed by apps with over 1 billion installs (such as Skype, Gmail, Google Hangouts, and Messenger).

To get a better picture of the distribution, we exclude apps with 100 million or more installs. Doing so brings the average number of installs for the category down to roughly a tenth of what it was:

In [27]:
apps_under_100m = []
for row in android_free:
    installs = row[5]
    installs = installs.replace(",", "")
    installs = installs.replace("+", "")
    installs = float(installs)
    category = row[1]
    if (category == "COMMUNICATION") and (installs < 100000000):
        apps_under_100m.append(installs)
print(sum(apps_under_100m) / len(apps_under_100m))

3603485.3884615386
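
The same check can be repeated for any category by turning the cell above into a function. This is a sketch on toy data: `average_installs` is a hypothetical helper, its default column indexes assume the Google Play layout used in this notebook, and the toy rows use a shorter layout, so the indexes are passed explicitly.

```python
def average_installs(data, category, cap=None, category_index=1, installs_index=5):
    """Average installs for one category, optionally ignoring apps at or above `cap`."""
    values = []
    for row in data:
        if row[category_index] != category:
            continue
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        if cap is None or installs < cap:
            values.append(installs)
    # None signals that no app matched, which avoids dividing by zero.
    return sum(values) / len(values) if values else None


# Toy rows: [App, Category, Installs]
toy = [
    ['WhatsApp Messenger', 'COMMUNICATION', '1,000,000,000+'],
    ['My Tele2', 'COMMUNICATION', '5,000,000+'],
    ['TracFone My Account', 'COMMUNICATION', '1,000,000+'],
]

print(average_installs(toy, 'COMMUNICATION', category_index=1, installs_index=2))
print(average_installs(toy, 'COMMUNICATION', cap=100000000, category_index=1, installs_index=2))
```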


Many other categories, such as "PHOTOGRAPHY" and "VIDEO_PLAYERS", suffer from the same issue as "COMMUNICATION": their markets are dominated by a few very large apps.

It is not a good decision to design an app for a market where a few apps hold a near monopoly.
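
One rough way to check how dominated a category is would be to measure what fraction of its total installs belongs to the largest apps. The sketch below assumes the same row layout as the rest of the notebook; the function name `install_share_of_large_apps`, the 100 million threshold, and the sample rows are illustrative choices, not part of the original analysis.

```python
def install_share_of_large_apps(rows, category, threshold=100_000_000):
    """Fraction of a category's total installs held by apps at or above threshold."""
    total = 0.0
    large = 0.0
    for row in rows:
        if row[1] != category:
            continue
        installs = float(row[5].replace(",", "").replace("+", ""))
        total += installs
        if installs >= threshold:
            large += installs
    return large / total if total else 0.0


# Illustrative rows only: name at index 0, category at index 1, installs at index 5.
sample_rows = [
    ["WhatsApp Messenger", "COMMUNICATION", "", "", "", "1,000,000,000+"],
    ["Android Messages", "COMMUNICATION", "", "", "", "100,000,000+"],
    ["Messenger for SMS", "COMMUNICATION", "", "", "", "10,000,000+"],
    ["My Tele2", "COMMUNICATION", "", "", "", "5,000,000+"],
]

share = install_share_of_large_apps(sample_rows, "COMMUNICATION")
print(f"{share:.1%} of installs belong to apps with 100M+ installs")
```

A share close to 1 suggests that a few giants hold most of the audience, while a lower share suggests more room for newcomers.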

Let's look at the "BOOKS_AND_REFERENCE" category since it was the market we chose for the App Store. 

In [28]:
for row in android_free:
    category = row[1]
    if category == 'BOOKS_AND_REFERENCE':
        print(row[0], ":", row[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

These apps are more varied: there are dictionaries, how-to guides, Bibles, and more. However, five apps could be skewing the average with their very large install counts:

In [31]:
for row in android_free:
    installs = row[5]
    category = row[1]
    large_tiers = ("1,000,000,000+", "500,000,000+", "100,000,000+")
    if category == "BOOKS_AND_REFERENCE" and installs in large_tiers:
        print(row[0], ":", row[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


There are only a few apps with that many installs, so there is still potential in this market. To get an idea of what a successful app looks like, let's look at the apps in this category that have between ```1,000,000``` and ```100,000,000``` installs:

In [34]:
for row in android_free:
    installs = row[5]
    category = row[1]
    mid_tiers = ("50,000,000+", "10,000,000+", "5,000,000+", "1,000,000+")
    if category == "BOOKS_AND_REFERENCE" and installs in mid_tiers:
        print(row[0], ":", row[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
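
The cell above selects the range by matching install labels as strings. An alternative, sketched here under the assumption that every install value in the cleaned data follows the `1,000,000+` pattern, is to convert the label to a number and compare it against numeric bounds. The helper name `apps_in_range` and the sample rows are hypothetical.

```python
def apps_in_range(rows, category, low, high):
    """Return (name, installs) pairs for a category where low <= installs < high."""
    matches = []
    for row in rows:
        if row[1] != category:
            continue
        installs = int(row[5].replace(",", "").replace("+", ""))
        if low <= installs < high:
            matches.append((row[0], installs))
    return matches


# Illustrative rows only: name at index 0, category at index 1, installs at index 5.
sample_rows = [
    ["Google Play Books", "BOOKS_AND_REFERENCE", "", "", "", "1,000,000,000+"],
    ["Wikipedia", "BOOKS_AND_REFERENCE", "", "", "", "10,000,000+"],
    ["Book store", "BOOKS_AND_REFERENCE", "", "", "", "1,000,000+"],
    ["Offline English Dictionary", "BOOKS_AND_REFERENCE", "", "", "", "100,000+"],
    ["WhatsApp Messenger", "COMMUNICATION", "", "", "", "1,000,000,000+"],
]

for name, installs in apps_in_range(
    sample_rows, "BOOKS_AND_REFERENCE", 1_000_000, 100_000_000
):
    print(name, ":", installs)
```

Numeric bounds make it easy to change the range without listing every install tier by hand.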

This market seems to be dominated by e-book and dictionary apps. To lessen our competition, we will avoid creating similar apps. There are also several apps built around the Quran, which suggests that taking something popular and converting it into an app format is effective.
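
The impression that a few app types dominate can be checked with a quick keyword tally over the app names. This is a minimal sketch: the `keyword_counts` helper, the keyword list, and the short list of names (copied from the output above) are illustrative assumptions rather than part of the original analysis.

```python
from collections import Counter


def keyword_counts(names, keywords):
    """Count how many app names contain each keyword (case-insensitive)."""
    counts = Counter()
    for name in names:
        lowered = name.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


# A handful of names taken from the output above, for illustration only.
names = [
    "Wikipedia",
    "Cool Reader",
    "FBReader: Favorite Book Reader",
    "Ebook Reader",
    "Dictionary - WordWeb",
    "English-Myanmar Dictionary",
    "Al-Quran (Free)",
    "Al Quran Indonesia",
]

counts = keyword_counts(names, ["reader", "dictionary", "quran"])
for keyword, count in counts.most_common():
    print(keyword, ":", count)
```

Substring matching is crude (it cannot tell an e-book reader from any other kind of "reader"), so the tally is best treated as a first pass before reading through the names manually.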

Because packaging popular content as an app appears to work, the "*GameGuider*" idea that we presented in the analysis of the App Store genres could also have potential on Google Play.

## Conclusion

In this project, we analyzed data about the App Store and Google Play mobile app markets in order to recommend an app profile that could be profitable in both markets.

We concluded that creating an app called *"GameGuider"* that compiles extensive guides of popular games for different consoles (Nintendo Switch, PS4, Xbox One, PC, etc.) could be effective.

This app is both practical and fun: it pairs video games with informative guides on how to progress through them. It could have potential in both app markets.