![](ACoolLogo.png)

# Profitable App Profiles from Apple App Store and Google Play Markets

## Introduction

In this project, I take on the role of an analyst to understand what makes a **free** app profitable. The end goal is to identify a category/use case in which an app can generate profit purely through in-app ads. Since ad revenue depends on how many people use an app, the analysis aims to help developers understand which types of apps are likely to attract more users.

*Note: This is a guided project from Dataquest's 'Data Science with Python - Fundamentals' course.*

## Exploring The Dataset

For the purpose of this exercise, I have collected two datasets that are freely available on Kaggle.

1. Android apps: approximately 10k apps across categories ([link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)), stored in `googleplaystore.csv`.
2. iPhone apps: approximately 7k apps across categories ([link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)), stored in `AppleStore.csv`.

The first step is to open the two datasets and store each one as a list of lists, then separate the header row from the data so that the analysis stays straightforward.

In [1]:
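#For Google Play Dataset

from csv import reader

opened_file_playstore = open('googleplaystore.csv', encoding='utf8')
read_file_playstore = reader(opened_file_playstore)
playstore_full = list(read_file_playstore)
opened_file_playstore.close()  # all rows are in memory now, so the file can be closed
playstore_header = playstore_full[0]  # first row holds the column names
playstore_data = playstore_full[1:]   # remaining rows are the apps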

In [2]:
#Testing the new lists:

print(playstore_header)
print('\n')
print(playstore_data[:1])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']]


In [3]:
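#Replicating the same steps for the AppStore Dataset

from csv import reader

opened_file_appstore = open('AppleStore.csv', encoding='utf8')
read_file_appstore = reader(opened_file_appstore)
appstore_full = list(read_file_appstore)
opened_file_appstore.close()  # all rows are in memory now, so the file can be closed
appstore_header = appstore_full[0]  # first row holds the column names
appstore_data = appstore_full[1:]   # remaining rows are the apps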

In [4]:
#Testing the new lists:

print(appstore_header)
print('\n')
print(appstore_data[:1])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']]
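
Since the same loading steps are repeated for both files, they could be wrapped in a small helper. The sketch below is one possible way to do it; `load_dataset` is a hypothetical name that is not part of the course material, and the `utf8` encoding is an assumption about how the CSV files are stored.

```python
from csv import reader

def load_dataset(path, encoding="utf8"):
    """Return (header, rows) for a CSV file.

    Hypothetical helper, not part of the original notebook.
    """
    # The with-block closes the file even if reading fails part-way.
    with open(path, encoding=encoding, newline="") as f:
        rows = list(reader(f))
    # First row is the header, everything after it is data.
    return rows[0], rows[1:]
```

With this in place, each dataset could be loaded in one line, for example `playstore_header, playstore_data = load_dataset('googleplaystore.csv')`.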


As can be noted above, the two datasets are structured differently: their columns differ in number, name and position, so the same column indices can't be used for both.

To make the traversal of data fairly simple, we make use of the function defined within the course, `explore_data()`.

In [5]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [6]:
#Printing the datasets by making use of the explore_data().

print('Play Store')
print('\n')
explore_data(playstore_data, 0,2, True)
print('\n')
print('App Store')
print('\n')
explore_data(appstore_data,0,2,True)

Play Store


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


App Store


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16


## Relevant Columns

For our particular use case, we first need to find out which columns contain the information most suited to our needs. These can be identified by going through the documentation for both datasets. The relevant columns are:

**PLAY STORE**
* `App` (application name)
* `Reviews` (number of user reviews)
* `Category` (category the app belongs to)
* `Installs` (number of user downloads/installs)
* `Type` (paid or free)
* `Price` (price of the app)
* `Genres` (genres the app belongs to; an app can belong to several at once)

**APP STORE**
* `track_name` (application name)
* `price` (price of the app)
* `rating_count_tot` (User rating counts for all versions)
* `user_rating` (ratings for the app)
* `prime_genre` (genre app belongs to)

## Data Cleaning

For this particular exercise, we need to ensure that the dataset is fit for analysis. First, inaccurate and duplicate data is removed. Then we make sure that the dataset includes only English apps (since our pretend app development company only builds apps in English).

### Inaccurate Data

From a [discussion](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion) on the Apple dataset it seems that there is no wrong datapoint in it.

However, the same is not true for the Play Store dataset. [Source](https://www.kaggle.com/lava18/google-play-store-apps/discussion)

This can be checked by printing the row in question (index 10472 once the header row is removed).

In [7]:
#Printing row 10472
print(playstore_data[10472])
print('\n')
print('Columns:', len(playstore_data[10472]))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Columns: 12


As can be seen above, this row has only 12 columns instead of the expected 13.

In [8]:
print(playstore_data[10471])
print('\n')
print('Columns:', len(playstore_data[10471]))

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


Columns: 13


Comparing the two rows, it is now clear that the `Category` value is missing from row 10472, so the remaining values have shifted one position to the left.
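
The length comparison used above can be generalized to scan a whole dataset for rows whose column count does not match the header. The following is a minimal sketch on toy data; `find_malformed_rows` is a hypothetical helper and the toy rows are illustrative, not taken from the dataset.

```python
def find_malformed_rows(header, rows):
    """Return (index, row_length) for every row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]

# Toy data standing in for the real dataset (values are illustrative only).
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, so the row is one field short
    ['App C', 'GAME', '3.9'],
]

print(find_malformed_rows(toy_header, toy_rows))  # [(1, 2)]
```

Called as `find_malformed_rows(playstore_header, playstore_data)`, a check like this would list any other rows with the same problem.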

For this particular exercise, we simply delete this row using the `del` statement.

In [9]:
#Deleting row 10472
#Note: run this cell only once; re-running it would delete the row that has moved into index 10472.

del playstore_data[10472]
print(playstore_data[10472])
print('Columns:', len(playstore_data[10472]))

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
Columns: 13


Now that the inaccurate row is gone, we can move on to the next part of data cleaning: removing duplicates.

### Duplicate Data

Going through the discussion on Play Store dataset, we find that there are also duplicate entries. We can loop through the dataset to confirm this.

In [10]:
#Create two empty lists
duplicate_apps = []
unique_apps = []

#Loop through the datasets and add into the lists.
for app in playstore_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print("Number of unique apps in Play Store Dataset: " + str(len(unique_apps)))  
print("Number of duplicate apps in Play Store Dataset: " + str(len(duplicate_apps)))  
print('\n')
print('First 10 duplicate apps: ' + str(duplicate_apps[:10]))

Number of unique apps in Play Store Dataset: 9659
Number of duplicate apps in Play Store Dataset: 1181


First 10 duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
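
The check `name in unique_apps` scans a list on every iteration, which is fine at this size but grows slow on larger datasets. As an alternative sketch, `collections.Counter` from the standard library can produce the same two numbers with hashed lookups; `count_duplicates` is a hypothetical helper and the toy rows are illustrative.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Return (number of unique names, number of surplus duplicate rows)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # Each name contributes (occurrences - 1) surplus rows.
    n_extra = sum(c - 1 for c in counts.values())
    return n_unique, n_extra

# Toy rows holding only a name column.
toy_rows = [['Slack'], ['Box'], ['Slack'], ['Slack'], ['Box'], ['Zoom']]
print(count_duplicates(toy_rows))  # (3, 3)
```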


In [11]:
#Looking through the dataset for duplicate apps

for app in playstore_data:
    name = app[0]
    if name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


Looking at the output above, the duplicate entries differ only in the number of reviews, which suggests the data was collected at different points in time. So, instead of removing duplicates at random, we can keep the entry with the highest number of reviews, since a higher review count implies a more recent entry.

To do this, we first create a dictionary that records which entry to keep for each app (key = app name, value = highest number of reviews).

In [14]:
#Creating a dictionary of duplicate apps and corresponding entry with highest review count.

reviews_max = {}
for app in playstore_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(list(reviews_max.items())[:3])
print('\n')
print('Length of dictionary is: ', len(reviews_max))
print('Expected length is: ', len(playstore_data)-1181)

[('Photo Editor & Candy Camera & Grid & ScrapBook', 159.0), ('Coloring book moana', 974.0), ('U Launcher Lite – FREE Live Cool Themes, Hide Apps', 87510.0)]


Length of dictionary is:  9659
Expected length is:  9659


Now that the dictionary maps each app name to its highest review count, it can be used to build a new list without duplicates. The `already_added` list guards against the case where several duplicate rows share the same highest review count.

In [15]:
#Create two empty lists
playstore_clean = []
already_added = []

#Loop through the original dataset and add to the clean list.

for app in playstore_data:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and name not in already_added:
        playstore_clean.append(app)
        already_added.append(name)

In [16]:
#Check using explore_data()

explore_data(playstore_clean, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13
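
The two-step approach above (first the dictionary of maximum review counts, then the filtered list) could also be done in a single pass by storing the best row itself for each name. This is a sketch under the assumption that the name sits at index 0 and the review count at index 3, as in the Play Store data; `keep_most_reviewed` is a hypothetical helper.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row seen when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows (illustrative values only).
toy_rows = [
    ['App A', 'BUSINESS', '4.4', '100'],
    ['App B', 'BUSINESS', '4.2', '500'],
    ['App A', 'BUSINESS', '4.4', '130'],
]
for row in keep_most_reviewed(toy_rows):
    print(row)
```

One difference to be aware of: the result is ordered by the first appearance of each app name, so the row order may not match `playstore_clean` exactly even though the same rows are kept.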


The discussion on the App Store dataset doesn't report any duplicate entries, so we will assume there are none. To keep the naming consistent, we will refer to this dataset as `appstore_clean`.

In [18]:
appstore_clean = appstore_data

explore_data(appstore_clean, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


## Data Preparation

With the data cleaned, the next step is to ensure that it is:

* In English
* Limited to free apps

This follows from the nature of the exercise, where we assume that our developers only build free apps in English.

### Removing Non-English Apps

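To do this, we use the built-in `ord()` function, which returns the Unicode code point of a character. Characters commonly used in English text fall in the ASCII range (0 to 127), so we traverse each app name and count the characters outside that range. To avoid discarding English apps whose names contain an emoji or a symbol such as ™, a name is treated as non-English only if it contains more than three such characters.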

In [19]:
def english_check(some_string):
    number = 0
    for char in some_string:
        if ord(char) > 127:
            number += 1
    if number > 3:
        return False
    else:
        return True
    
print ("Instagram: " + str(english_check("Instagram")))
print ("爱奇艺PPS -《欢乐颂2》电视剧热播: " + str(english_check("爱奇艺PPS -《欢乐颂2》电视剧热播")))
print ("Docs To Go™ Free Office Suite: " + str(english_check("Docs To Go™ Free Office Suite")))
print ("Instachat 😜: " + str(english_check("Instachat 😜")))    

Instagram: True
爱奇艺PPS -《欢乐颂2》电视剧热播: False
Docs To Go™ Free Office Suite: True
Instachat 😜: True
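
To see how the threshold affects the result, the same check can be written with the limit as a parameter. This is only a compact sketch of the technique above; `is_probably_english` and `max_non_ascii` are hypothetical names.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if the name has at most `max_non_ascii` characters outside ASCII."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english("Instachat 😜"))                   # True
print(is_probably_english("Instachat 😜", max_non_ascii=0))  # False
print(is_probably_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))   # False
```

A threshold of 0 would reject every name containing an emoji or a trademark sign, which is why a small tolerance is used.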


With this settled, we'll use `english_check()` to filter out the apps whose names are not in English.

In [22]:
appstore_clean_english = []

for app in appstore_clean:
    name = app[1]
    if english_check(name):
        appstore_clean_english.append(app)

playstore_clean_english = []

for app in playstore_clean:
    name = app[0]
    if english_check(name):
        playstore_clean_english.append(app)
        
print('Android')
print('\n')
explore_data(playstore_clean_english, 0, 3, True)
print('\n')
print('iPhone')
print('\n')
explore_data(appstore_clean_english, 0, 3, True)

Android


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


iPhone


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5'

### Removing Non-Free Apps

From the exploration of the data set above we know that `Price` is stored in the following columns in our data sets:

* Play Store 8th column (index 7)
* App Store 5th column (index 4)

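We will now isolate the free apps by looping over both datasets and keeping the rows whose price is zero. Since the values are stored as strings, the comparison is against `'0'` for the Play Store and `'0.0'` for the App Store.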

In [24]:
playstore_final = []

for app in playstore_clean_english:
    price = app[7]
    if price == '0':
        playstore_final.append(app)
        
appstore_final = []

for app in appstore_clean_english:
    price = app[4]
    if price == '0.0':
        appstore_final.append(app)
        
print ("Android")
print('\n')
explore_data(playstore_final, 0, 3, True)
print('\n')
print ("iPhone")
print('\n')
explore_data (appstore_final, 0, 3, True)        

Android


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


iPhone


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5'
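
Comparing against the exact strings `'0'` and `'0.0'` works for these two files, but it depends on how each store formats its prices. A more tolerant check could parse the value as a number first. The sketch below assumes that paid prices may carry a leading `$` sign (an assumption, since no paid rows are shown above); `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True if the price string represents zero."""
    # Assumption: a currency symbol, if present, is a leading '$'.
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number is treated as not free.
        return False

print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # True True False
```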

## Data Analysis

To provide information for my development team, the end goal is to identify app profiles that are successful in both markets.

One way to find this out is to look at the genre each app belongs to and see which genres are the most common. The earlier data exploration revealed that the following columns store this information:

* Play Store: `Category` and `Genres`
* App Store: `prime_genre`

### Most Common Genres

To detect the most common genres, we will create a frequency table for each column and display the counts in descending order.

In [25]:
#Creating the frequency table
def freq_table(dataset, index):
    freq_dict = {}
    for app in dataset:
        element = app[index]
        if element in freq_dict:
            freq_dict[element] += 1
        else:
            freq_dict[element] = 1
    return freq_dict

#Then we will display the table in the preferred format

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
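
The tables produced by `display_table()` show raw counts. If percentages are easier to compare across the two stores, the counts can be divided by the number of rows. The following is a minimal sketch on toy data; `freq_table_percent` is a hypothetical variant, not part of the notebook.

```python
def freq_table_percent(dataset, index):
    """Return {value: share of rows in percent} for the column at `index`."""
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}

# Toy rows (illustrative values only).
toy_rows = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
shares = freq_table_percent(toy_rows, 1)
for genre, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{genre}: {pct:.1f}%')
```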

Now, we will use these functions.

We will display the frequency tables for the `Category` and `Genres` columns of the Play Store dataset, followed by the frequency table for the `prime_genre` column of the App Store dataset.

**Play Store/Category**

In [26]:
display_table(playstore_final, 1)

FAMILY : 1676
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


**Play Store/Genre**

In [27]:
display_table(playstore_final, 9)

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 81
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
Arcade;Action & Advent

**App Store/Prime Genre**

In [28]:
display_table(appstore_final, 11)

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


Now we can observe that the most common genres are:

Play Store/Category:
1. Family: 1676
2. Game: 862
3. Tools: 750

Play Store/Genre:
1. Tools: 749
2. Entertainment: 538
3. Education: 474

App Store/Prime Genre:
1. Games: 1874
2. Entertainment: 254
3. Photo & Video: 160

It seems the most common genres for free apps in English on Google Play are of a more practical nature, while the most common genre for these types of apps on the App store is Games.

While this is good to know, the most frequent genre does not necessarily mean these apps and genres also have the most users.

Now, we will find out which genres are the most popular. For the Play Store, it's fairly straightforward thanks to the `Installs` column. The App Store dataset has no install counts, so we will make do with the total number of user ratings in the `rating_count_tot` column as a proxy.

In [31]:
freq_prime_genre = freq_table(appstore_final, 11)

for genre in freq_prime_genre:
    total = 0
    len_genre = 0
    for app in appstore_final:
        genre_app = app[11]
        if genre_app == genre:
            user_ratings = float(app[5])
            total += user_ratings
            len_genre += 1
            
    average = total/len_genre
    print(genre + ': '+str(average))

Social Networking: 71548.34905660378
Photo & Video: 28441.54375
Games: 22788.6696905016
Music: 57326.530303030304
Reference: 74942.11111111111
Health & Fitness: 23298.015384615384
Weather: 52279.892857142855
Utilities: 18684.456790123455
Travel: 28243.8
Shopping: 26919.690476190477
News: 21248.023255813954
Navigation: 86090.33333333333
Lifestyle: 16485.764705882353
Entertainment: 14029.830708661417
Food & Drink: 33333.92307692308
Sports: 23008.898550724636
Book: 39758.5
Finance: 31467.944444444445
Education: 7003.983050847458
Productivity: 21028.410714285714
Business: 7491.117647058823
Catalogs: 4004.0
Medical: 612.0
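
The nested loop above walks the whole dataset once per genre. The same averages can be computed in a single pass by accumulating a running total and a count per genre. This is a sketch on toy data; `average_by_group` is a hypothetical helper.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows: name, genre, rating count (illustrative values only).
toy_rows = [
    ['App A', 'Social Networking', '300'],
    ['App B', 'Photo & Video', '200'],
    ['App C', 'Social Networking', '100'],
]
print(average_by_group(toy_rows, 1, 2))
# {'Social Networking': 200.0, 'Photo & Video': 200.0}
```

For the App Store data, the equivalent call would be `average_by_group(appstore_final, 11, 5)`.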


We see that the genres with the highest average number of user ratings include Navigation, Social Networking and Music. However, these averages are skewed by a few very large apps such as Facebook, Google Maps and Spotify.
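
One way to see how much a single giant app distorts an average is to compare the mean with the median, which is far less sensitive to outliers. This step is not part of the original analysis; the sketch below uses the standard `statistics` module and made-up numbers purely for illustration.

```python
from statistics import mean, median

def mean_and_median(values):
    """Return (mean, median) so the two can be compared side by side."""
    return mean(values), median(values)

# Made-up rating counts: one very large app and four small ones.
toy_counts = [1_000_000, 1_200, 900, 800, 100]
avg, mid = mean_and_median(toy_counts)
print('mean:', avg)
print('median:', mid)
```

Here the mean is pulled up to roughly 200,000 by the single large app, while the median stays at 900, which is closer to what a typical app in the group looks like.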

Another popular genre is Reference. Let's deep dive for a better understanding.

In [33]:
for app in appstore_final:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Many of the most-rated apps in this genre seem to take a popular book or reference work, such as the Bible or a dictionary, and turn it into an app, adding some extra features along the way.

Now, we can analyse the Google Play dataset in a similar manner.

While the data in the `Installs` column is not very precise (100,000+, 1,000,000+, etc.), it serves our purpose. We will have to remove the commas and plus characters and convert the values from strings to floats so that we can perform calculations on them.
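
The cleaning step described above can be isolated in a small function, which makes it easy to try on a few values before running it on the full dataset. `parse_installs` is a hypothetical helper name.

```python
def parse_installs(installs_text):
    """Convert an install bucket such as '1,000,000+' into a float."""
    cleaned = installs_text.replace('+', '').replace(',', '')
    return float(cleaned)

print(parse_installs('10,000+'))     # 10000.0
print(parse_installs('1,000,000+'))  # 1000000.0
```

Note that the result is the lower bound of the bucket: an app listed as `10,000+` is counted as having exactly 10,000 installs.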

In [34]:
freq_category = freq_table(playstore_final, 1)

for category in freq_category:
    total = 0
    len_category = 0
    for app in playstore_final:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            clean_installs = installs.replace('+','')
            clean_installs = clean_installs.replace(',','')
            clean_installs = float(clean_installs)
            total += clean_installs
            len_category += 1
    average = total/len_category
    print(category + ': '+ str(average))

ART_AND_DESIGN: 1986335.0877192982
AUTO_AND_VEHICLES: 647317.8170731707
BEAUTY: 513151.88679245283
BOOKS_AND_REFERENCE: 8767811.894736841
BUSINESS: 1712290.1474201474
COMICS: 817657.2727272727
COMMUNICATION: 38456119.167247385
DATING: 854028.8303030303
EDUCATION: 1833495.145631068
ENTERTAINMENT: 11640705.88235294
EVENTS: 253542.22222222222
FINANCE: 1387692.475609756
FOOD_AND_DRINK: 1924897.7363636363
HEALTH_AND_FITNESS: 4188821.9853479853
HOUSE_AND_HOME: 1331540.5616438356
LIBRARIES_AND_DEMO: 638503.734939759
LIFESTYLE: 1437816.2687861272
GAME: 15588015.603248259
FAMILY: 3695641.8198090694
MEDICAL: 120550.61980830671
SOCIAL: 23253652.127118643
SHOPPING: 7036877.311557789
PHOTOGRAPHY: 17840110.40229885
SPORTS: 3638640.1428571427
TRAVEL_AND_LOCAL: 13984077.710144928
TOOLS: 10801391.298666667
PERSONALIZATION: 5201482.6122448975
PRODUCTIVITY: 16787331.344927534
PARENTING: 542603.6206896552
WEATHER: 5074486.197183099
VIDEO_PLAYERS: 24727872.452830188
NEWS_AND_MAGAZINES: 9549178.467741935
MA

Popular categories with a high average number of installs seem to be Communication and Video Players, but again, these numbers are skewed by very popular apps like WhatsApp and YouTube.

Another popular category that jumps out is Books and Reference, which coincides with our finding from the App Store dataset. Let's see which apps this category contains to get a better understanding.

In [36]:
for app in playstore_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

Once again, we observe that many of the apps in this category seem to have taken a popular book and made it available on a mobile device, possibly with some extra features.

That said, the average for this category is also skewed by a few very popular apps like Google Play Books, Bible and Amazon Kindle.

## Recommendation

The analysis suggests that, within the segment of free English apps, the books and reference profile has the most potential to be developed and released on both markets. That is, take a popular book and turn it into an app with additional functionality such as interaction, quick referencing and annotations, among a host of other features.