# Profitable App Profiles for the App Store and Google Play Markets
Our company builds Android and iOS mobile apps that are free to download and install. Our revenue comes mainly from in-app ads, so it depends largely on the number of people who use our apps.

The **goal** of this project is **to analyse free apps targeting English-speaking users and give our developers insights into the types of apps that are likely to attract more users**.

## Opening and Exploring the Data
As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Rather than spend significant time and money collecting new data ourselves, we first looked for relevant existing data that is available for free. We found two datasets that seem suitable for our purpose:

- [Google Play Store dataset](https://www.kaggle.com/lava18/google-play-store-apps): Approximately ten thousand **Android** apps. [Download](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv) this dataset.
- [App Store dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps): Approximately seven thousand **iOS** apps. [Download](https://dq-content.s3.amazonaws.com/350/AppleStore.csv) this dataset.

We will start by opening these two datasets:

In [1]:
from csv import reader

### The Google Play dataset ###
# Explicit UTF-8 encoding, because many app names contain non-ASCII characters
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### The App Store dataset ###
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]
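Reading the files with `csv.reader` instead of splitting each line on commas matters for this data: values such as `10,000+` contain commas and are quoted in the CSV files. The sketch below uses a made-up one-row sample in the shape of `googleplaystore.csv` (reduced to three columns) to show the difference.

```python
import csv
import io

# Made-up sample in the shape of googleplaystore.csv (three columns only).
# The Installs value contains a comma, so it is quoted in the file.
sample = (
    'App,Category,Installs\n'
    '"Coloring book moana",ART_AND_DESIGN,"500,000+"\n'
)

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]
print(header)   # ['App', 'Category', 'Installs']
print(data[0])  # ['Coloring book moana', 'ART_AND_DESIGN', '500,000+']

# Splitting on commas by hand breaks the quoted value into two fields.
naive = sample.splitlines()[1].split(',')
print(len(naive))  # 4
```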

To explore the datasets more easily, we write an `explore_data()` function that prints rows in a readable way and can be reused on any dataset. Optionally, it also shows the number of rows and columns.

The `explore_data()` function:

- Accepts four parameters:
    - `dataset`: It is expected to be a _list of lists_.
    - `start` and `end`: They are expected to be _integers_. They represent the starting and the ending indices of a slice from the dataset.
    - `rows_and_columns`: It is expected to be a _Boolean_ and has `False` as a default argument.
- Slices the dataset using `dataset[start:end]`.
- Prints the number of rows and columns if `rows_and_columns` is `True`.
    - `dataset` should not include a header row; otherwise, the reported number of rows will be one more than the actual number of apps.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:      ', len(dataset))
        print('Number of columns:   ', len(dataset[0]))

In [3]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:       10841
Number of columns:    13


The Google Play dataset has 10,841 Android apps and 13 columns. The columns that appear useful for our analysis are:
- `App`
- `Category`
- `Reviews`
- `Installs`
- `Type`
- `Price`
- `Genres`

Now, we will explore the App Store dataset.

In [4]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows:       7197
Number of columns:    16


The App Store dataset contains 7,197 iOS apps and 16 columns. Not all column names are self-explanatory; for a description of every column, see the App Store dataset [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

The columns that seem interesting are:
- `track_name`: App name
- `currency`: Currency type
- `price`: Price amount
- `rating_count_tot`: User rating count (all versions)
- `rating_count_ver`: User rating count (current version)
- `prime_genre`: Primary genre

## Data Cleaning
To ensure that our data is **accurate** before performing data analysis, we have to:
- Identify inaccurate data and subsequently correct or delete it.
- Identify duplicate data and delete the duplicates.

In alignment with our business goal, we will focus on the apps that are **free** to download and target **English-speaking** users. Hence, we also need to:
- Delete paid apps.
- Delete non-English apps.

### Deleting Inaccurate Data
The Google Play dataset has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion). [One of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) reports an error in _row 10472_.

We first verify that the row is really incorrect by printing it and comparing it against the header and a row that is correct.

In [5]:
print(android[10472])    # Incorrect row
print('\n')
print(android_header)    # Header
print('\n')
print(android[0])    # Correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


The _row 10472_ corresponds to the app _Life Made WI-Fi Touchscreen Photo Frame_. Its `Rating` is 19, which is outside the valid range for Google Play apps, i.e. 1 to 5 (as also mentioned in the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015)). The row is also missing its `Category` value, so every following value has shifted one column to the left: the row has 12 values instead of 13, and the value under `Rating` is actually the number of reviews. We therefore delete this row so that it does not distort our analysis:

In [6]:
print('Number of Android apps:')
print('    - Before deleting row 10472:  ', len(android))
del android[10472]
print('    - After deleting row 10472:   ', len(android))

Number of Android apps:
    - Before deleting row 10472:   10841
    - After deleting row 10472:    10840
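The discussion thread pointed us to this row, but rows with a missing value can also be found systematically by comparing each row's length with the header's. A minimal sketch, using hypothetical sample rows rather than the real dataset (the helper name `find_malformed_rows` is ours); applied to `android` before the deletion, it would flag the shifted row shown above, which has 12 values instead of 13.

```python
def find_malformed_rows(dataset, header):
    """Return (index, length) for every row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]

# Hypothetical sample: the second row is missing its Category value.
header = ['App', 'Category', 'Rating', 'Reviews']
rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1', '159'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
]
print(find_malformed_rows(rows, header))  # [(1, 3)]
```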


### Deleting Duplicate Entries
#### Part One: Identifying Duplicate Entries
While exploring the Google Play dataset and reading its [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), we noticed that some apps have duplicate entries. For example, *Instagram* has four entries:

In [7]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
        print('\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




In total, there are 1,181 cases where an app has more than one entry:

In [8]:
duplicate_apps = []
unique_apps = set()    # a set makes the membership test fast

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)

print('Number of duplicate apps:   ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:   ', duplicate_apps[:20])

Number of duplicate apps:    1181


Examples of duplicate apps:    ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']
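`collections.Counter` from the standard library gives the same count and also shows how many entries each duplicated app has. A sketch on a hypothetical list of names; in the notebook, the input would be `[app[0] for app in android]`.

```python
from collections import Counter

# Hypothetical app names; the real input would be [app[0] for app in android].
names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram', 'Box', 'Instagram']

counts = Counter(names)
# Extra entries beyond the first one per app (the same quantity as len(duplicate_apps)).
n_duplicates = sum(c - 1 for c in counts.values())
repeated = {name: c for name, c in counts.items() if c > 1}

print(n_duplicates)  # 5
print(repeated)      # {'Instagram': 4, 'Box': 3}
```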


To perform proper data analysis, we need to remove the duplicate entries for these apps carefully.

Looking at the _Instagram_ rows printed above, the only value that differs between them is the number of `Reviews` (the fourth value of each row, *index: 3*), which suggests that the data was collected at different times. We decided to **keep the row with the highest number of reviews** for each app, because a higher number of reviews suggests more recent data and more reliable ratings.

If the `Last Updated` values differed between entries, they would also be a good criterion for choosing which row to keep: the more recent the date, the more up to date the data should be.
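If `Last Updated` were used as the criterion, the date strings (e.g. `'January 7, 2018'`) would need to be parsed before comparing them, since comparing them as plain strings orders them alphabetically by month name. A minimal sketch with `datetime.strptime`, assuming the format seen in the rows above and the default (English) locale for month names; the helper name `parse_last_updated` is ours.

```python
from datetime import datetime

def parse_last_updated(value):
    # Format as it appears in the Google Play rows above, e.g. 'January 7, 2018'.
    return datetime.strptime(value, '%B %d, %Y')

dates = ['July 31, 2018', 'August 1, 2018', 'January 7, 2018']
print(max(dates))                          # July 31, 2018 (alphabetical order, wrong)
print(max(dates, key=parse_last_updated))  # August 1, 2018
```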

#### Part Two: Deleting the Duplicate Entries
To remove the duplicates, we will:
- Create a dictionary with unique apps:
    - key: `unique app name`
    - value: `highest number of reviews of that app`
- Use the information stored in the dictionary and create a new dataset, which will have only one entry per app with the highest number of reviews.

In [9]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews

Earlier, we showed that the Google Play dataset contains 10,840 rows and 1,181 duplicates. We therefore expect the `reviews_max` dictionary to have 10,840 - 1,181 = 9,659 entries.

In [10]:
print('Expected length:   ', len(android) - 1181)
print('Actual length:     ', len(reviews_max))

Expected length:    9659
Actual length:      9659


Now, we use the `reviews_max` dictionary to remove the duplicate rows, keeping only the entry with the highest number of reviews for each app. The code below works as follows:

- We initialise two empty lists: `android_clean` and `already_added`.
- We loop through the Android dataset, and for every iteration:
    - We extract the `name of the app` (*index: 0*) and the `number of reviews` (*index: 3*).
    - We add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list if:
        - The number of reviews of the current app matches the number of reviews of that app as described in the `reviews_max` dictionary; and
        - The name of the app is not already in the `already_added` list. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the `Box` app has three entries, and the number of reviews is the same). If we just check for `reviews_max[name] == n_reviews`, we'll still end up with duplicate entries for some apps.

In [11]:
android_clean = []    # store new cleaned dataset
already_added = []    # store app names

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

We then check the number of rows of the clean Android dataset (`android_clean`). Its length matches that of `reviews_max` (9,659 rows), which confirms that it contains only unique apps.

In [12]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:       9659
Number of columns:    13


As a further check, we look at *Instagram*, the example we used earlier. It previously had four entries; `android_clean` now contains only one, the entry with the highest number of reviews (66,577,446). This indicates that the duplicates have been removed.

In [13]:
for app in android_clean:
    name = app[0]
    if name == 'Instagram':
        print(app)
        print('\n')

['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
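The two passes above (building `reviews_max`, then filtering with `already_added`) can also be collapsed into one pass that stores the best row seen so far for each app. This is an alternative sketch, not the method used in this notebook: it applies the same rule (keep the highest number of reviews; the first row wins on ties), but the rows may come out in a different order than in `android_clean`. The helper name is ours, and the sample rows are illustrative apart from the *Instagram* review counts shown above.

```python
def dedupe_keep_most_reviews(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count.

    When several rows share the highest count, the first one seen is kept.
    """
    best = {}  # app name -> best row so far
    for row in dataset:
        name = row[name_index]
        if name not in best or float(row[reviews_index]) > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Example App', 'TOOLS', '4.0', '120'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Example App', 'TOOLS', '4.0', '120'],
]
for row in dedupe_keep_most_reviews(rows):
    print(row)
# ['Instagram', 'SOCIAL', '4.5', '66577446']
# ['Example App', 'TOOLS', '4.0', '120']
```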




Next, we check the App Store dataset for duplicate entries using the `id` column (*index: 0*). We found no duplicates, so no further action is needed.

In [14]:
ios_duplicate_apps = []
ios_unique_apps = []

for app in ios:
    id_ = app[0]
    if id_ in ios_unique_apps:
        ios_duplicate_apps.append(id_)
    else:
        ios_unique_apps.append(id_)

print('Number of duplicate iOS apps:   ', len(ios_duplicate_apps))

Number of duplicate iOS apps:    0


### Deleting Non-English Apps
#### Part One: Identifying Non-English Apps
While exploring the datasets, we found some apps with non-English names. Since these apps are outside the scope of our analysis, we need to remove them. Here are a few examples from our datasets:

In [15]:
print('App Store dataset:')
print('    ', ios[813][1])
print('    ', ios[6731][1])
print('\n')
print('Google Play dataset:')
print('    ', android_clean[4412][0])
print('    ', android_clean[7940][0])

App Store dataset:
     爱奇艺PPS -《欢乐颂2》电视剧热播
     【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


Google Play dataset:
     中国語 AQリスニング
     لعبة تقدر تربح DZ


The characters commonly used in English text are covered by [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange), whose codes range from **0 to 127**. We can use this range to write a function that distinguishes common English characters from others: if an app name contains a character whose code is greater than 127, the app probably has a non-English name.

To tell English and non-English characters apart, we write an `english` function that uses the [`ord()` built-in function](https://docs.python.org/3/library/functions.html#ord) to get the Unicode code point of each character (for ASCII characters, the code point is the same as the ASCII code):

In [16]:
def english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

In [17]:
print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False
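On Python 3.7 and later, this first, strict version of the check can also be expressed with the built-in `str.isascii()` method, which returns `True` when every character is in the ASCII range (and also for an empty string). The name `english_strict` is ours, for illustration.

```python
def english_strict(string):
    # True only if every character has a code point of 127 or less,
    # which is the same test as the first english() function above.
    return string.isascii()

print(english_strict('Instagram'))                         # True
print(english_strict('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
```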


The function seems to work, but some English app names contain emojis or other symbols (`™`, `—` (em dash), `–` (en dash), etc.) that fall outside the ASCII range. Such apps would be wrongly removed, which is not ideal, as we do not want to lose any English apps from our analysis.

In [18]:
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


#### Part Two: Optimizing the Filter
To minimize data loss, we modify the `english` function so that an app name can contain up to three emojis or other special characters and still be considered English. In other words, the function only flags a name as non-English if it contains more than three characters outside the ASCII range (0 to 127).

In [19]:
def english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

In [20]:
print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))

True
False
True
True


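The function is not perfect: a non-English name with three or fewer non-ASCII characters will still pass the filter, and an English name with more than three special characters will be removed. We expect such cases to be rare, so the `english` function is good enough for the purpose of our analysis.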
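The limits of this filter can be seen with a couple of made-up names: a partly non-English name with three or fewer non-ASCII characters still passes, while an English name with more than three symbols is rejected. The sketch below restates the same rule with the threshold as a parameter; `english_with_threshold` and `max_non_ascii` are our own names, and the default of 3 matches the function above.

```python
def english_with_threshold(string, max_non_ascii=3):
    # Count characters outside the ASCII range (code point above 127).
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii

# Made-up names illustrating the two failure modes.
print(english_with_threshold('Learn 日本語'))     # True: only 3 non-ASCII characters, so it passes
print(english_with_threshold('Top Apps ★★★★'))  # False: 4 stars, so this English name is rejected
```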

Now, we use the `english` function to filter both the Android and iOS datasets.

In [21]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if english(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if english(name):
        ios_english.append(app)

print('Google Play Store with English apps only:')
print('\n')
explore_data(android_english, 0, 3, True)
print('_________________________________________')
print('\n')
print('App Store with English apps only:')
print('\n')
explore_data(ios_english, 0, 3, True)

Google Play Store with English apps only:


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:       9614
Number of columns:    13
_________________________________________


App Store with English apps only:


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '

After filtering out the non-English apps, we have 9,614 apps in the Android dataset and 6,183 apps in the iOS dataset.

### Isolating the Free Apps
To isolate the free apps, we keep only the rows whose price is `'0'` in the Google Play dataset (`Price`, *index: 7*) or `'0.0'` in the App Store dataset (`price`, *index: 4*):

In [22]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)

for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)

print('Google Play Store — Final:')
print('\n')
explore_data(android_final, 0, 3, True)
print('_________________________________________')
print('\n')
print('App Store — Final:')
print('\n')
explore_data(ios_final, 0, 3, True)

Google Play Store — Final:


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:       8864
Number of columns:    13
_________________________________________


App Store — Final:


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['52947

After data cleaning, we have **8,864 free Android apps** and **3,222 free iOS apps** for our analysis.
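The filter above compares price strings exactly, which works because the two files store a zero price as `'0'` and `'0.0'`. A slightly less brittle variant converts the price to a number first, so that a value such as `'0.00'` would also count as free. The sketch below assumes Google Play prices carry a leading `$` (as in `'$4.99'`) and would raise a `ValueError` on any other non-numeric value; the helper name `is_free` is ours.

```python
def is_free(price):
    # Strip a leading '$' (assumed Google Play style, e.g. '$4.99');
    # App Store prices are plain numbers such as '0.0' or '4.99'.
    return float(price.lstrip('$')) == 0.0

prices = ['0', '$4.99', '0.0', '0.99', '0.00']
print([is_free(p) for p in prices])  # [True, False, True, False, True]
```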

## Data Analysis

### Most Common Apps by Genre
#### Part One: Our Validation Strategy
As mentioned earlier, our goal is to find the types of free English apps that attract many users, because the number of users drives our ad revenue.

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we will develop it further.
3. If the app is profitable after six months, we will also build an iOS version of the app and add it to the App Store.

We start by identifying the most common genres in each market, building frequency tables for the `prime_genre` column (*index: 11*) of the App Store dataset and for the `Genres` (*index: 9*) and `Category` (*index: 1*) columns of the Google Play dataset.

#### Part Two: Functions for the Frequency Tables
We write two functions to build and display the frequency tables:

- `freq_table()`: to **generate frequency tables** that show **percentages**
    - Takes in two parameters: 
        - `dataset` is expected to be a *list of lists*
        - `index` is expected to be an *integer*
- `display_table()`: to display the percentages in a **descending order**
    - Transforms the frequency table into a list of tuples, then sorts the list in descending order.
    - Prints the entries of the frequency table in descending order.

In [23]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for app in dataset:
        value = app[index]
        total += 1
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = round(percentage, 2)
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_table = (table[key], key)    # Transform the dictionary into a list of tuples for sorting later
        table_display.append(key_val_as_table)
    
    table_sorted = sorted(table_display, reverse=True)
    for item in table_sorted:
        print(item[1].title(), ':', item[0])
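For comparison, `collections.Counter` can build the same percentage table more compactly, and its `most_common()` method returns the entries already sorted, which covers the sorting step of `display_table()`. One small difference: ties are ordered by first appearance rather than in reverse alphabetical order. A sketch (the name `freq_table_counter` is ours):

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """(value, percentage) pairs for column `index`, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, round(count / total * 100, 2)) for value, count in counts.most_common()]

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
print(freq_table_counter(sample, 1))  # [('Games', 75.0), ('Music', 25.0)]
```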

#### Part Three: Analysing the Frequency Tables
First, we look at the frequency table for the `prime_genre` column (*index: 11*) of the App Store dataset.

In [24]:
display_table(ios_final, 11)    # Prime_Genre

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


Among the free English apps in the App Store dataset, **more than half (58.16%) are *Games***. *Entertainment* (7.88%) is the second most common genre, followed by *Photo & Video* (4.97%), *Education* (3.66%) and *Social Networking* (3.29%).

This suggests that apps designed for **fun** (e.g. *Games*, *Entertainment*, *Photo & Video*, *Social Networking*, *Sports*, *Music*, etc.) outnumber apps built for **practical purposes** (e.g. *Education*, *Shopping*, *Utilities*, *Productivity*, *Lifestyle*, etc.).


Nevertheless, a large supply of apps for fun on the App Store does not necessarily mean that these apps have many users: demand might not match supply.

Next, we analyse the `Category` (*index: 1*) and `Genres` (*index: 9*) columns of the Google Play dataset.

In [25]:
display_table(android_final, 1)    # Category

Family : 18.91
Game : 9.72
Tools : 8.46
Business : 4.59
Lifestyle : 3.9
Productivity : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.4
Personalization : 3.32
Communication : 3.24
Health_And_Fitness : 3.08
Photography : 2.94
News_And_Magazines : 2.8
Social : 2.66
Travel_And_Local : 2.34
Shopping : 2.25
Books_And_Reference : 2.14
Dating : 1.86
Video_Players : 1.79
Maps_And_Navigation : 1.4
Food_And_Drink : 1.24
Education : 1.16
Entertainment : 0.96
Libraries_And_Demo : 0.94
Auto_And_Vehicles : 0.93
House_And_Home : 0.82
Weather : 0.8
Events : 0.71
Parenting : 0.65
Art_And_Design : 0.64
Comics : 0.62
Beauty : 0.6


Based on the `Category` column, **the app profile of Google Play seems to be very different from that of the App Store**. Unlike the App Store, **a considerable share of Google Play apps are built for practical purposes**, e.g. the *Tools* (8.46%), *Business* (4.59%), *Lifestyle* (3.90%) and *Productivity* (3.89%) categories. The largest category is *Family* (18.91%), and the *Game* category (9.72%) is much smaller than the *Games* genre of the App Store (58.16%).

A closer look at the *Family* category shows that it mainly contains games for kids (see the list below). Even so, if we count the whole *Family* category as games, *Family* and *Game* together (18.91% + 9.72% = 28.63%) are still far below the 58.16% of *Games* on the App Store, so **the offer of practical apps is still higher on Google Play than on the App Store**.

In [26]:
def android_category_app_name(category_input):
    # Print the name of every app in the given Google Play category
    for app in android_final:
        category = app[1]
        name = app[0]
        if category == category_input:
            print(name)

In [27]:
android_category_app_name('FAMILY')

Jewels Crush- Match 3 Puzzle
Coloring & Learn
Mahjong
Super ABC! Learning games for kids! Preschool apps
Toy Pop Cubes
Educational Games 4 Kids
Candy Pop Story
Princess Coloring Book
Hello Kitty Nail Salon
Candy Smash
Happy Fruits Bomb - Cube Blast
Princess Adventures Puzzles
Kids Educational Game 3 Free
Puzzle Kids - Animals Shapes and Jigsaw Puzzles
Coloring book moana
Baby Panda Care
Kids Educational :All in One
Number Counting games for toddler preschool kids
Learn To Draw Glow Flower
No. Color - Color by Number, Number Coloring
Draw.ly - Color by Number Pixel Art Coloring
Baby puzzles
Garden Fruit Legend
Barbie™ Fashion Closet
Candy Day
Learn To Draw Glow Princess
ABC Kids - Tracing & Phonics
Barbie Magical Fashion
Minion Rush: Despicable Me Official Game
Piano Kids - Music & Songs
Educational Games for Kids
No.Draw - Colors by Number 2018
Fruit Boom
Baby Tiger Care - My Cute Virtual Pet Friend
Rhythm Patrol
Kiddopia - Preschool Learning Games
Papumba Academy - Fun Learning For Ki

Freeform – Stream Full Episodes, Movies, & Live TV
FXNOW: Movies, Shows & Live TV
DC All Access
Hallmark Channel Everywhere
Morse Player Free
Hulu: Stream TV, Movies & more
SYFY
AMC
Cartoon Wars 3
ABC – Live TV & Full Episodes
CX-10WiFi
CX-OF
cx-32wifi
CX-17WIFI
cx-33wifi
CX-WiFi720P
CX-60
CX-10DS
CX-37
CX watcher
CX-40
CX-42
CXmodel-ufo
WiFi FPV
Cy-Reader
Igitabo cy'Indirimbo
Mediabox CY
Cymath - Math Problem Solver
Army of Heroes
Math Solver
Death Dragon Knights RPG
Skylink Live TV CZ
PLAYzone.cz
Ghost Detector
Thuglife Video Maker
MadLipz
Escaping the Prison
Glam Doll Salon - Chic Fashion
COOKING MAMA Let's Cook!
Movie DB
DB Train Simulator
DB Tools
DB for Fallout Shelter
DB for Hustle Castle
DC Legends: Battle for Justice
FANDOM for: DC
DC Super Hero Girls™
LEGO® DC Mighty Micros
Batman: Gotham’s Most Wanted!
Choose SuperHero Quize Marvel | DC
MARVEL Strike Force
Villains vs Superheroes
MARVEL Future Fight
DC N COMPANY ENTERTAINMENT RADIO!
Results for DC Lottery
Driver Permit Test 

TRANSFORMERS: Earth Wars
No Pimple - Fun games
OMG Gross Zit - Date Nightmare
Garden Fever - Free!
Subway Simulator 3D
XCOM: TBG
Aurum Blade EX
My Ex Girlfriend Comes Back
Gods Wars Ex : Vampire
S.O.L : Stone of Life EX
EY eMentor
Pink Guy - Ey B0ss
Adhenarcos - Coupe Adhemar EY 2018
I am Dentist - Save my Teeth
EZ PZ RPG
EZ-Builder Mobile
Il Coccodrillo Come Fa
Dio Fa Cose
F.A Sumon songs
Farm Heroes Saga
All AJK Board Matric Fa Fsc Results
Fallout Shelter
Block Puzzle - Wood Legend
Fart sound pranks
Who Viewed My FB Profile
Who viewed my fb profile pro★★
Messenger Kids – Safer Messaging and Video Chat
Classic FC 64 IN 1
Cambridge English FC
FD VR Video Player - (Stored)
FD Mobile
FD VR - Virtual 3D Web Browser
FD VR Player - for 360 Youtube
FD VR Store -VR Games and Apps
FD VR - Virtual Reality Camera
FD VR Theater - for Youtube VR
Story Time FD
FD VR Cardboard Featured 360 Videos
FD VR Player - for Youtube 3D
FD VR Music Videos - MTV Pop and Rap in 360
FD VR - Virtual Photo Gallery


The difference between the `Category` and `Genres` columns is not entirely clear, but the `Genres` column is much more granular; some of its values even combine several genres, such as `Casual;Pretend Play`. Since we only need the bigger picture, we will work only with the `Category` column of the Google Play dataset from now on.

In [28]:
display_table(android_final, 9)    # Genres

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;
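Some `Genres` values combine several genres separated by `;` (e.g. `Casual;Pretend Play`), and a few even repeat the same genre (`Education;Education`). We stay with `Category`, but if `Genres` were used, one option would be to count each genre separately. A sketch under that assumption (the helper name `split_genre_counts` is ours); note that the resulting counts can add up to more than the number of apps, so percentages computed from them would no longer sum to 100.

```python
from collections import Counter

def split_genre_counts(dataset, index, sep=';'):
    """Count each genre once per app, splitting combined values like 'Casual;Pretend Play'."""
    counts = Counter()
    for row in dataset:
        # dict.fromkeys drops repeats such as 'Education;Education' while keeping order.
        for genre in dict.fromkeys(row[index].split(sep)):
            counts[genre] += 1
    return counts

sample = [['x', 'Casual;Pretend Play'], ['y', 'Casual'], ['z', 'Education;Education']]
print(split_genre_counts(sample, 1))
# Counter({'Casual': 2, 'Pretend Play': 1, 'Education': 1})
```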

So far, we have found that **the App Store is dominated by apps designed for fun**, whereas **Google Play has a more balanced mix of practical and fun apps**. Next, we investigate which types of apps have the most users.

### Most Popular Apps by Genre on the App Store
To find out which genres have the most users (i.e. are the most popular), we could calculate the average number of installs for each genre. This information is available in the `Installs` column (*index: 5*) of the Google Play dataset, but it is missing from the App Store dataset. As a proxy, we use the total number of user ratings in the `rating_count_tot` column (*index: 5*) to estimate the most popular App Store genres.

We will start by calculating the average number of user ratings per app genre on the App Store following these steps:
- Isolate the apps of each genre.
- Sum up the user ratings for the apps of that genre.
- Divide the sum by the number of apps belonging to that genre.

To make the results easier to read, we sort the average numbers of user ratings in descending order.

In [29]:
genres_ios = freq_table(ios_final, 11)    # Frequency table for prime_genre(index: 11)

list_average_n_ratings = []


# To calculate the average number of user ratings

for genre in genres_ios:    # Loop over the unique genres of the App Store dataset
    total = 0    # The sum of the number of user ratings (not the actual ratings)
    len_genre = 0    # The number of apps specific to each genre
    
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    
    average_n_ratings = round(total/len_genre)
    # print(genre, ':', average_n_ratings)    # (Sanity check) Before sorting
    value_key_ios = (average_n_ratings, genre)    # (average, genre) tuples sort by the average first
    list_average_n_ratings.append(value_key_ios)


# To sort the average number of user ratings

n_ratings_sorted = sorted(list_average_n_ratings, reverse=True)

for item in n_ratings_sorted:
    template = '{genre} : {avg_n_ratings:,}'
    output = template.format(genre=item[1], avg_n_ratings=item[0])
    print(output)    # After sorting

Navigation : 86,090
Reference : 74,942
Social Networking : 71,548
Music : 57,327
Weather : 52,280
Book : 39,758
Food & Drink : 33,334
Finance : 31,468
Photo & Video : 28,442
Travel : 28,244
Shopping : 26,920
Health & Fitness : 23,298
Sports : 23,009
Games : 22,789
News : 21,248
Productivity : 21,028
Utilities : 18,684
Lifestyle : 16,486
Entertainment : 14,030
Business : 7,491
Education : 7,004
Catalogs : 4,004
Medical : 612
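The loop above rescans `ios_final` once per genre. As a side note, the same three steps (isolate, sum, divide) can be done in a single pass with dictionaries. The sketch below is a minimal illustration, not the notebook's own code: `average_by_group` is a hypothetical helper, and the toy rows use a simplified layout (name, rating count, genre) instead of the full App Store columns. On the real data, the call would presumably be `average_by_group(ios_final, 11, 5)`.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index, parse=float):
    """Return (group, rounded average) pairs sorted from highest to lowest average.

    Hypothetical helper: `parse` turns the raw string value into a number.
    """
    totals = defaultdict(float)  # group -> running sum of parsed values
    counts = defaultdict(int)    # group -> number of rows in that group
    for row in rows:
        group = row[group_index]
        totals[group] += parse(row[value_index])
        counts[group] += 1
    averages = {group: round(totals[group] / counts[group]) for group in totals}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)


# Toy rows: [name, rating count, genre]; values copied from the genre outputs further down
toy_rows = [
    ['Waze', '345046', 'Navigation'],
    ['Google Maps', '154911', 'Navigation'],
    ['Geocaching', '12811', 'Navigation'],
    ['Bible', '985920', 'Reference'],
    ['WWDC', '762', 'Reference'],
]

result = average_by_group(toy_rows, group_index=2, value_index=1)
for genre, avg in result:
    print(f'{genre} : {avg:,}')
```

Sorting with `key=` on the average plays the same role as the (value, key) tuples used in the notebook.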


The top five App Store genres with the highest average number of user ratings are *Navigation* (86,090), *Reference* (74,942), *Social Networking* (71,548), *Music* (57,327) and *Weather* (52,280).

However, the average for the *Navigation* genre is heavily skewed by _Waze_ (345,046) and _Google Maps_ (154,911), which together account for nearly half a million ratings (499,957).

In [30]:
def ios_genre_details(genre_name):
    for app in ios_final:
        genre_app = app[11]
        name = app[1]
        n_ratings = app[5]
        if genre_app == genre_name:
            template = '{name_} : {ratings_:,}'
            output = template.format(name_=name, ratings_=int(n_ratings))
            print(output)

In [31]:
ios_genre_details('Navigation')

Waze - GPS Navigation, Maps & Real-time Traffic : 345,046
Google Maps - Navigation & Transit : 154,911
Geocaching® : 12,811
CoPilot GPS – Car Navigation & Offline Maps : 3,582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
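A quick way to see this skew is to compare the mean with the median. The sketch below uses Python's standard `statistics` module on the six *Navigation* rating counts printed above; it is a worked check, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed above
navigation = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = round(mean(navigation))                   # the genre average from the table
nav_median = median(navigation)                      # middle of the sorted counts
nav_mean_without_top2 = round(mean(navigation[2:]))  # drop Waze and Google Maps

print(nav_mean)               # 86090
print(nav_median)             # 8196.5
print(nav_mean_without_top2)  # 4146
```

The median (about 8,200) is roughly a tenth of the mean, and removing the top two apps cuts the mean from 86,090 to 4,146.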


Likewise, the average for the *Reference* genre is heavily skewed by _Bible_ (985,920) and _Dictionary.com_ (254,222 for the non-iPad and iPad versions combined), which together account for more than a million ratings (1,240,142).

In [32]:
ios_genre_details('Reference')

Bible : 985,920
Dictionary.com Dictionary & Thesaurus : 200,047
Dictionary.com Dictionary & Thesaurus for iPad : 54,175
Google Translate : 26,786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18,418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17,588
Merriam-Webster Dictionary : 16,849
Night Sky : 12,122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8,535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4,693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1,497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


The same holds for the _Social Networking_ and _Music_ genres, whose averages are heavily influenced by social media giants such as *Facebook* (2,974,676), *Pinterest* (1,061,624) and *Skype* (373,519), and by music giants such as *Pandora* (1,126,879), *Spotify Music* (878,563) and *Shazam* (402,925).

In [33]:
ios_genre_details('Social Networking')

Facebook : 2,974,676
Pinterest : 1,061,624
Skype for iPhone : 373,519
Messenger : 351,466
Tumblr : 334,293
WhatsApp Messenger : 287,589
Kik : 260,965
ooVoo – Free Video Call, Text and Voice : 177,501
TextNow - Unlimited Text + Calls : 164,963
Viber Messenger – Text & Call : 164,249
Followers - Social Analytics For Instagram : 112,778
MeetMe - Chat and Meet New People : 97,072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90,414
InsTrack for Instagram - Analytics Plus More : 85,535
Tango - Free Video Call, Voice and Chat : 75,412
LinkedIn : 71,856
Match™ - #1 Dating App. : 60,659
Skype for iPad : 60,163
POF - Best Dating App for Conversations : 52,642
Timehop : 49,510
Find My Family, Friends & iPhone - Life360 Locator : 43,877
Whisper - Share, Express, Meet : 39,819
Hangouts : 36,404
LINE PLAY - Your Avatar World : 34,677
WeChat : 34,584
Badoo - Meet New People, Chat, Socialize. : 34,428
Followers + for Instagram - Follower Analytics : 28,633
GroupMe : 28,260
Marco Polo Video Wal

In [34]:
ios_genre_details('Music')

Pandora - Music & Radio : 1,126,879
Spotify Music : 878,563
Shazam - Discover music, artists, videos & lyrics : 402,925
iHeartRadio – Free Music & Radio Stations : 293,228
SoundCloud - Music & Audio : 135,744
Magic Piano by Smule : 131,695
Smule Sing! : 119,316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110,420
Amazon Music : 106,235
SoundHound Song Search & Music Player : 82,602
Sonos Controller : 48,905
Bandsintown Concerts : 30,845
Karaoke - Sing Karaoke, Unlimited Songs! : 28,606
My Mixtapez Music : 26,286
Sing Karaoke Songs Unlimited with StarMaker : 26,227
Ringtones for iPhone & Ringtone Maker : 25,403
Musi - Unlimited Music For YouTube : 25,193
AutoRap by Smule : 18,202
Spinrilla - Mixtapes For Free : 15,053
Napster - Top Music & Radio : 14,268
edjing Mix:DJ turntable to remix and scratch music : 13,580
Free Music - MP3 Streamer & Playlist Manager Pro : 13,443
Free Piano app by Yokee : 13,016
Google Play Music : 10,118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9,9

Based on our analysis, the average numbers of reviews for *Navigation*, *Reference*, *Social Networking* and *Music* apps are skewed by a few apps from giant companies. Except for *Reference* apps, these apps demand considerable time, effort and money to build. They also have special requirements, such as community-building expertise or paid access to GPS APIs. On top of that, competing with big players such as *Google Maps* and *Facebook* is very difficult. Therefore, it is too risky for us to build *Navigation*, *Social Networking* or *Music* apps.

We also notice that the *Reference* and *Book* genres overlap with each other. Both have **high user engagement, and they are relatively easier, faster and cheaper to build**. Therefore, ***Reference* and *Book* apps show some potential in the app market**. To attract and engage users, we need to **offer unique and practical features** in the apps we build.

Other genres with a high average number of reviews include *Weather*, *Finance* and *Food & Drink*:
- *Weather*: these apps usually engage users only briefly. Thus, weather apps are not very attractive to us from either a user-engagement or an in-app-ads perspective.
- *Finance*: building finance apps requires financial, cybersecurity and legal expertise. Moreover, the finance app market is dominated by big fintech companies and banks. Therefore, these apps demand too much money and expertise for us to build.
- *Food & Drink*: these apps are mainly owned by big food and drink chains and food delivery companies, which is outside our business area (app development). Furthermore, users usually open these apps only to place and track orders, which means low user engagement from our perspective. Thus, this genre is less interesting to us.

Although *Game* apps dominate the App Store, they have a lower average number of ratings (22,789) than several of the genres mentioned above. This suggests that the *Game* market is saturated relative to user demand. Therefore, *Game* apps are less interesting to us.

In [35]:
ios_genre_details('Book')

Kindle – Read eBooks, Magazines & Textbooks : 252,076
Audible – audio books, original series & podcasts : 105,274
Color Therapy Adult Coloring Book for Adults : 84,062
OverDrive – Library eBooks and Audiobooks : 65,450
HOOKED - Chat Stories : 47,829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


In [36]:
ios_genre_details('Weather')

The Weather Channel: Forecast, Radar & Alerts : 495,626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208,648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188,583
MyRadar NOAA Weather Radar Forecast : 150,158
AccuWeather - Weather for Life : 144,214
Yahoo Weather : 112,603
Weather Underground: Custom Forecast & Local Radar : 49,192
NOAA Weather Radar - Weather Forecast & HD Radar : 45,696
Weather Live Free - Weather Forecast & Alerts : 35,702
Storm Radar : 22,792
QuakeFeed Earthquake Map, Alerts, and News : 6,081
Moji Weather - Free Weather Forecast : 2,333
Hurricane by American Red Cross : 1,158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast 

In [37]:
ios_genre_details('Finance')

Chase Mobile℠ : 233,270
Mint: Personal Finance, Budget, Bills & Money : 232,940
Bank of America - Mobile Banking : 119,773
PayPal - Send and request money safely : 119,487
Credit Karma: Free Credit Scores, Reports & Alerts : 101,679
Capital One Mobile : 56,110
Citi Mobile® : 48,822
Wells Fargo Mobile : 43,064
Chase Mobile : 34,322
Square Cash - Send Money for Free : 23,775
Capital One for iPad : 21,858
Venmo : 21,090
USAA Mobile : 19,946
TaxCaster – Free tax refund calculator : 17,516
Amex Mobile : 11,421
TurboTax Tax Return App - File 2016 income taxes : 9,635
Bank of America - Mobile Banking for iPad : 7,569
Wells Fargo for iPad : 2,207
Stash Invest: Investing & Financial Education : 1,655
Digit: Save Money Without Thinking About It : 1,506
IRS2Go : 1,329
Capital One CreditWise - Credit score and report : 1,019
U by BB&T : 790
Paribus - Rebates When Prices Drop : 768
KeyBank Mobile : 623
VyStar Mobile Banking for iPhone : 434
Sparkasse - Your mobile branch : 77
VyStar Mobile Banking 

In [38]:
ios_genre_details('Food & Drink')

Starbucks : 303,856
Domino's Pizza USA : 258,624
OpenTable - Restaurant Reservations : 113,936
Allrecipes Dinner Spinner : 109,349
DoorDash - Food Delivery : 25,947
UberEATS: Uber for Food Delivery : 17,865
Postmates - Food Delivery, Faster : 9,519
Dunkin' Donuts - Get Offers, Coupons & Rewards : 9,068
Chick-fil-A : 5,665
McDonald's : 4,050
Deliveroo: Restaurant Delivery - Order Food Nearby : 1,702
SONIC Drive-In : 1,645
Nowait Guest : 1,625
7-Eleven, Inc. : 1,356
Outback : 805
Bon Appetit : 750
Starbucks Keyboard : 457
Whataburger : 197
Delish Eatmoji Keyboard : 154
Lieferheld - Delicious food delivery service : 29
Lieferando.de : 29
McDo France : 22
Chefkoch - Rezepte, Kochen, Backen & Kochbuch : 20
Youmiam : 9
Marmiton Twist : 2
Open Food Facts : 1


Let's now dive into the app market on Google Play!

### Most Popular Apps by Genre on Google Play
We will use the number of installs (`Installs` column, *index: 5*) to find out the most popular categories on Google Play. Note that install counts are given as open-ended values, such as 100+, 1,000+, 5,000+, etc.:

In [39]:
display_table(android_final, 5)    # The Installs column (index: 5)

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
0 : 0.01


Since we are interested in the most popular categories rather than the precise number of installs, we will simply assume that an app with 100+ installs has 100 installs, an app with 1,000+ installs has 1,000 installs, and so on. To calculate the averages, we also strip `+` and `,` from each install count and convert it from a string to a float. As in the previous analysis, we sort the average numbers of installs in descending order.
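The conversion rule described above can be isolated in a small function, which makes it easy to check against the edge cases in the install table (`0`, `0+` and the billion-install bucket). `installs_to_int` is a hypothetical name; this sketch returns an `int` instead of the `float` used in the notebook, which makes no difference for these whole-number labels.

```python
def installs_to_int(label):
    """Treat an open-ended install label such as '1,000+' as its lower bound."""
    return int(label.replace('+', '').replace(',', ''))


for label in ['0', '0+', '100+', '1,000,000+', '1,000,000,000+']:
    print(f'{label} -> {installs_to_int(label):,}')
```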

In [40]:
categories_android = freq_table(android_final, 1)    # Frequency table for category (index: 1)

list_val_key_n_installs = []


# To calculate the average number of installs

for category in categories_android:    # Loop over the unique categories of the Google Play dataset
    total = 0    # Sum of installs specific to each category
    len_category = 0    # The number of apps specific to each category
    
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = float(n_installs.replace('+', '').replace(',', ''))
            total += n_installs
            len_category += 1
    
    average_n_installs = round(total / len_category)
    # print(category, ':', average_n_installs)    # (Sanity check) Before sorting
    value_key_android = (average_n_installs, category)    # (average, category) tuples sort by the average first
    list_val_key_n_installs.append(value_key_android)


# To sort the average number of installs

n_installs_sorted = sorted(list_val_key_n_installs, reverse=True)

for item in n_installs_sorted:
    template = '{category} : {avg_n_installs:,}'
    output = template.format(category=item[1].title(), avg_n_installs=item[0])
    print(output)    # After sorting

Communication : 38,456,119
Video_Players : 24,727,872
Social : 23,253,652
Photography : 17,840,110
Productivity : 16,787,331
Game : 15,588,016
Travel_And_Local : 13,984,078
Entertainment : 11,640,706
Tools : 10,801,391
News_And_Magazines : 9,549,178
Books_And_Reference : 8,767,812
Shopping : 7,036,877
Personalization : 5,201,483
Weather : 5,074,486
Health_And_Fitness : 4,188,822
Maps_And_Navigation : 4,056,942
Family : 3,695,642
Sports : 3,638,640
Art_And_Design : 1,986,335
Food_And_Drink : 1,924,898
Education : 1,833,495
Business : 1,712,290
Lifestyle : 1,437,816
Finance : 1,387,692
House_And_Home : 1,331,541
Dating : 854,029
Comics : 817,657
Auto_And_Vehicles : 647,318
Libraries_And_Demo : 638,504
Parenting : 542,604
Beauty : 513,152
Events : 253,542
Medical : 120,551


*Communication* apps have the highest average number of installs (38,456,119). Among them, six apps have more than one billion installs each (*WhatsApp*, *Facebook Messenger*, *Skype*, *Google Chrome*, *Gmail* and *Hangouts*), while five have more than 500 million and 16 have more than 100 million installs each.

In [41]:
def android_top_installs(dataset, category_name):    # category_name must be upper-case, e.g. 'COMMUNICATION'
    one_billion = []
    five_hundred_million = []
    one_hundred_million = []

    for app in dataset:
        category_app = app[1]
        name = app[0]
        n_installs = app[5]
        if (category_app == category_name):
            if n_installs == '1,000,000,000+':
                one_billion.append(name)
            elif n_installs == '500,000,000+':
                five_hundred_million.append(name)
            elif n_installs == '100,000,000+':
                one_hundred_million.append(name)

    print('More than 1 billion installs:')
    for item in one_billion:
        print('    ', item)

    print('\n')
    print('More than 500 million installs:')
    for item in five_hundred_million:
        print('    ', item)

    print('\n')
    print('More than 100 million installs:')
    for item in one_hundred_million:
        print('    ', item)

In [42]:
android_top_installs(android_final, 'COMMUNICATION')

More than 1 billion installs:
     WhatsApp Messenger
     Messenger – Text and Video Chat for Free
     Skype - free IM & video calls
     Google Chrome: Fast & Secure
     Gmail
     Hangouts


More than 500 million installs:
     Google Duo - High Quality Video Calls
     imo free video calls and chat
     LINE: Free Calls & Messages
     UC Browser - Fast Download Private & Secure
     Viber Messenger


More than 100 million installs:
     imo beta free calls and text
     Android Messages
     Who
     GO SMS Pro - Messenger, Free Themes, Emoji
     Firefox Browser fast & private
     Messenger Lite: Free Calls & Messages
     Kik
     KakaoTalk: Free Calls & Text
     Opera Mini - fast web browser
     Opera Browser: Fast and Secure
     Telegram
     Truecaller: Caller ID, SMS spam blocking & Dialer
     UC Browser Mini -Tiny Fast Private & Secure
     WeChat
     Yahoo Mail – Stay Organized
     BBM - Free Calls & Messages


If we look only at the *Communication* apps with fewer than 100 million installs, the average drops roughly tenfold. In other words, the top 27 *Communication* apps heavily skew the average number of installs: excluding them lowers the average by 90.63% (34,852,634 out of 38,456,119). This would inevitably bias our analysis unless we interpret the averages carefully.

In [43]:
below_one_hundred_million = []

for app in android_final:
    category_app = app[1]
    n_installs = app[5]
    n_installs = float(n_installs.replace('+', '').replace(',', ''))
    if (category_app == 'COMMUNICATION') and (n_installs < 100000000):
        below_one_hundred_million.append(n_installs)

average_n_installs = round(sum(below_one_hundred_million) / len(below_one_hundred_million))
template = 'Average number below one hundred million installs: {:,}'
output = template.format(average_n_installs)
print(output)

Average number below one hundred million installs: 3,603,485
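The percentages quoted for *Communication* follow directly from the two averages. The arithmetic is shown below. Note that it measures how much the average drops when the top 27 apps are excluded, which is related to, but not the same as, their share of total installs (computing that share would also require the number of apps in the category).

```python
avg_all = 38_456_119         # average installs, all Communication apps
avg_below_100m = 3_603_485   # average installs, apps below 100 million

drop = avg_all - avg_below_100m
print(f'{drop:,}')                          # 34,852,634
print(f'{drop / avg_all:.2%}')              # 90.63%
print(round(avg_all / avg_below_100m, 1))   # 10.7
```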


In addition to the *Communication* category, we observe the same trend in other categories with a high average number of installs, such as:
- *Video Players* apps are dominated by *YouTube*, *Google Play Movies & TV*, *MX Player*, etc.
- *Social* apps are dominated by *Facebook*, *Google+*, *Instagram*, *Facebook Lite*, *Snapchat*, etc.
- *Photography* apps are dominated by *Google Photos* and various photo editor apps.
- *Productivity* apps are dominated by *Google Drive*, *Microsoft Word*, *Dropbox*, *Google Calendar*, *Cloud Print*, etc.

These categories are mainly dominated by big players, and their apps are time- and cost-demanding to develop. Therefore, developing such apps does not serve our business goal well.

The *Game* category on Google Play seems to be very popular (15,588,016 installs on average). However, given that our earlier analysis suggested the *Game* market is rather saturated, we will focus on other app categories instead.

In [44]:
android_top_installs(android_final, 'VIDEO_PLAYERS')

More than 1 billion installs:
     YouTube
     Google Play Movies & TV


More than 500 million installs:
     MX Player


More than 100 million installs:
     Motorola Gallery
     VLC for Android
     Dubsmash
     VivaVideo - Video Editor & Photo Movie
     VideoShow-Video Editor, Video Maker, Beauty Camera
     Motorola FM Radio


In [45]:
android_top_installs(android_final, 'SOCIAL')

More than 1 billion installs:
     Facebook
     Google+
     Instagram


More than 500 million installs:
     Facebook Lite
     Snapchat


More than 100 million installs:
     Tumblr
     Pinterest
     Badoo - Free Chat & Dating App
     Tango - Live Video Broadcast
     LinkedIn
     Tik Tok - including musical.ly
     BIGO LIVE - Live Stream
     VK


In [46]:
android_top_installs(android_final, 'PHOTOGRAPHY')

More than 1 billion installs:
     Google Photos


More than 500 million installs:


More than 100 million installs:
     B612 - Beauty & Filter Camera
     YouCam Makeup - Magic Selfie Makeovers
     Sweet Selfie - selfie camera, beauty cam, photo edit
     Retrica
     Photo Editor Pro
     BeautyPlus - Easy Photo Editor & Selfie Camera
     PicsArt Photo Studio: Collage Maker & Pic Editor
     Photo Collage Editor
     Z Camera - Photo Editor, Beauty Selfie, Collage
     PhotoGrid: Video & Pic Collage Maker, Photo Editor
     Candy Camera - selfie, beauty camera, photo editor
     YouCam Perfect - Selfie Photo Editor
     Camera360: Selfie Photo Editor with Funny Sticker
     S Photo Editor - Collage Maker , Photo Collage
     AR effect
     Cymera Camera- Photo Editor, Filter,Collage,Layout
     LINE Camera - Photo editor
     Photo Editor Collage Maker Pro


In [47]:
android_top_installs(android_final, 'PRODUCTIVITY')

More than 1 billion installs:
     Google Drive


More than 500 million installs:
     Microsoft Word
     Dropbox
     Google Calendar
     Cloud Print


More than 100 million installs:
     Microsoft Outlook
     Microsoft OneDrive
     Microsoft OneNote
     Google Keep
     ES File Explorer File Manager
     Google Docs
     Microsoft PowerPoint
     Samsung Notes
     SwiftKey Keyboard
     Adobe Acrobat Reader
     Google Sheets
     Microsoft Excel
     WPS Office - Word, Docs, PDF, Note, Slide & Sheet
     Google Slides
     ColorNote Notepad Notes
     Evernote – Organizer, Planner for Notes & Memos
     CamScanner - Phone PDF Creator


Consistent with our earlier analysis of the App Store dataset, the *Books_And_Reference* category on Google Play is also quite popular, with an average of 8,767,812 installs. Since we plan to launch our apps on both stores at different times, we investigate the *Books_And_Reference* apps in more detail:

In [48]:
android_top_installs(android_final, 'BOOKS_AND_REFERENCE')

More than 1 billion installs:
     Google Play Books


More than 500 million installs:


More than 100 million installs:
     Bible
     Amazon Kindle
     Wattpad 📖 Free Books
     Audiobooks from Audible


The top five *Books_And_Reference* apps are dominated by a few giants, such as Google and Amazon. To find the sweet spot in this category, we examine the moderately popular apps, i.e. those with between 1 million and 100 million installs:

In [49]:
def android_1mil_100mil_installs(dataset, category_name):    # category_name must be upper-case, e.g. 'BOOKS_AND_REFERENCE'
    fifty_million = []
    ten_million = []
    five_million = []
    one_million = []

    for app in dataset:
        category_app = app[1]
        name = app[0]
        n_installs = app[5]
        if (category_app == category_name):
            if n_installs == '50,000,000+':
                fifty_million.append(name)
            elif n_installs == '10,000,000+':
                ten_million.append(name)
            elif n_installs == '5,000,000+':
                five_million.append(name)
            elif n_installs == '1,000,000+':
                one_million.append(name)

    print('More than 50 million installs:')
    for item in fifty_million:
        print('    ', item)

    print('\n')
    print('More than 10 million installs:')
    for item in ten_million:
        print('    ', item)

    print('\n')
    print('More than 5 million installs:')
    for item in five_million:
        print('    ', item)
    
    print('\n')
    print('More than 1 million installs:')
    for item in one_million:
        print('    ', item)
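`android_top_installs` and `android_1mil_100mil_installs` differ only in which install labels they look for. As a design alternative, a single helper could take the labels as a parameter. The sketch below is hypothetical (`apps_by_install_tier` and `print_tiers` are not part of the original notebook) and assumes the same row layout as `android_final`: name at index 0, category at index 1, installs at index 5.

```python
def apps_by_install_tier(dataset, category_name, tiers):
    """Map each install label in `tiers` to the names of matching apps.

    category_name must be upper-case, e.g. 'BOOKS_AND_REFERENCE'.
    """
    groups = {tier: [] for tier in tiers}   # dicts keep the order of `tiers`
    for app in dataset:
        name, category_app, n_installs = app[0], app[1], app[5]
        if category_app == category_name and n_installs in groups:
            groups[n_installs].append(name)
    return groups


def print_tiers(groups):
    """Print each install label followed by its indented app names."""
    for tier, names in groups.items():
        print(f'{tier} installs:')
        for name in names:
            print('    ', name)
        print()


# Toy rows with the android_final layout (unused columns left empty)
toy_android = [
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Bible KJV', 'BOOKS_AND_REFERENCE', '', '', '', '5,000,000+'],
    ['Gmail', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
]
tiers = ['50,000,000+', '10,000,000+', '5,000,000+', '1,000,000+']
groups = apps_by_install_tier(toy_android, 'BOOKS_AND_REFERENCE', tiers)
print_tiers(groups)
```

With the real data, `print_tiers(apps_by_install_tier(android_final, 'BOOKS_AND_REFERENCE', tiers))` should list the same apps as `android_1mil_100mil_installs`, and passing `['1,000,000,000+', '500,000,000+', '100,000,000+']` would cover `android_top_installs`.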

In [50]:
android_1mil_100mil_installs(android_final, 'BOOKS_AND_REFERENCE')

More than 50 million installs:


More than 10 million installs:
     Wikipedia
     Cool Reader
     FBReader: Favorite Book Reader
     HTC Help
     Moon+ Reader
     Aldiko Book Reader
     Al-Quran (Free)
     Al Quran Indonesia
     Al'Quran Bahasa Indonesia
     Quran for Android
     Dictionary.com: Find Definitions for English Words
     English Dictionary - Offline
     NOOK: Read eBooks & Magazines
     Dictionary
     Spanish English Translator
     Dictionary - Merriam-Webster
     JW Library
     Oxford Dictionary of English : Free
     English Hindi Dictionary


More than 5 million installs:
     AlReader -any text book reader
     Ebook Reader
     Read books online
     Ancestry
     Dictionary - WordWeb
     50000 Free eBooks & Free AudioBooks
     Al Quran : EAlim - Translations & MP3 Offline
     Bible KJV
     English to Hindi Dictionary


More than 1 million installs:
     Book store
     Free Books - Spirit Fanfiction and Stories
     FamilySearch Tree
     Cloud 

As on the App Store, **the moderately popular *Books_And_Reference* apps are mainly dictionaries, e-book readers, Al Quran/Bible apps and various types of guides**. This suggests that book and reference apps have potential on both Google Play and the App Store, and that we need to make our apps unique to attract users.

## Conclusion
Altogether, we conclude that ***Book* and *Reference* genres show potential in both the App Store and Google Play datasets**, as they **attract a significant number of users and have a considerably high engagement rate** due to the nature of the apps. Furthermore, they are **relatively easier, faster and cheaper to build**. All these factors are crucial for increasing our company's profitability through in-app ads in the free apps we build.

We need to distinguish our apps from others by **focusing on niches with high market value and providing unique features that serve users' goals**. For example, the popularity of *Dictionary* and *Translation* apps on both stores suggests that this area has a high market value. We could, for instance, build a *Picture Book* app with interactive games for learning foreign-language vocabulary.

Although pregnancy ([Ref: Pregnancy Product Market](https://illadelink.com/global-pregnancy-product-market-value-projected-to-surge-remarkably-at-double-digit-cagr-during-2020-2026-zion-market-research/)) and baby care ([Ref: Baby Personal Care Market Size](https://www.grandviewresearch.com/industry-analysis/baby-personal-care-market)) have huge market values, there are not many guides for them on the app stores yet. This could be another interesting niche for us, since guide-type apps are popular. For instance, we could build a free *Guide for Pregnancy or Baby Care* with interactive features, a supportive online community and a good user experience to engage our users and, in turn, attract more advertisers.

In summary, our data analysis provides us with an area of focus for conducting proper and extensive market research and user research before developing apps. Subsequently, it helps us to make informed business decisions in building the right apps, which will eventually assist us in achieving our business goal.