# Profitable App Profiles for the App Store and Google Play Market

This project analyzes data from the iOS App Store and Google Play to help our team of developers make data-driven decisions about the apps they build.

Our company builds apps that are free to download and install, and our main source of revenue is in-app ads. This means that our revenue for any given app is mostly influenced by the number of people who use it.

Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## Opening and Exploring Data
Collecting data for over 4 million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. Luckily, there are two data sets that seem suitable for our goals:
<ul>
<li>A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.</li>
<li>A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.</li>
</ul>

Let's start by opening the two data sets:


In [1]:
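from csv import reader

# App Store data set; the encoding is set explicitly because
# some app names contain non-ASCII characters
open1 = open('AppleStore.csv', encoding='utf8')
read1 = reader(open1)
apps_data = list(read1)
open1.close()
apple_header = apps_data[0]
apple = apps_data[1:]

# Google Play data set
open2 = open('googleplaystore.csv', encoding='utf8')
read2 = reader(open2)
goog_data = list(read2)
open2.close()
google_header = goog_data[0]
google = goog_data[1:]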

To make it easier to explore the data, we define the function `explore_data`, which can be used repeatedly to explore any data set. The function also has an option to display the number of rows and columns of the data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns: 
        print('Number of rows:', len(dataset))
        print('Number of columns:',len(dataset[0]))

We can use the function `explore_data` to display the first 4 rows of the Apple Store data: 

In [3]:
print(apple_header)
print('\n')
explore_data(apple,0,4,rows_and_columns = True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


At first glance, we can see that the App Store data has 7,197 apps and 16 columns. The columns that might be useful for our analysis are: `track_name`, `rating_count_tot`, `rating_count_ver`, `user_rating`, `user_rating_ver`, `prime_genre`.

Now let's use the function `explore_data` on the Google Play data set:

In [4]:
print(google_header)
print('\n')
explore_data(google,0,4,rows_and_columns = True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


We can see that the Google Play data has 10,841 apps and 13 columns. At a quick glance, the columns that might be useful for our analysis are: `App`, `Rating`, `Reviews`, `Installs`, `Genres`.

More details about the data sets can be found at these links: [AppleStore](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) and [Google Play](https://www.kaggle.com/lava18/google-play-store-apps/home).

## Cleaning Data
Before beginning our analysis, we need to make sure the data we analyze is accurate; otherwise, the results of our analysis will be wrong. This means that we need to:
<ul>
<li>Detect inaccurate data, and correct or remove it.</li>
<li>Detect duplicate data, and remove the duplicates.</li>
</ul>

### Deleting Inaccurate Data
One discussion of the Google Play data set reports an error in row 10472. To check whether the row is indeed incorrect, we print it together with the header.

In [5]:
print(google_header)
print(google[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


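Row 10472 has only 12 values instead of 13: the `Category` value is missing, so every later value is shifted one column to the left and the `Rating` column holds `19`, which exceeds the maximum rating of 5. We delete this row with the `del` statement.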

In [6]:
del google[10472]  # run this cell only once; running it again would delete a valid row

To check whether the App Store data set has the same kind of error, we loop over its rows and print any row whose length differs from the length of the header.

In [7]:
for row in apple:
    if len(row) != len(apple_header):
        print(row)

After running the code, no row was printed, which means the App Store data set has no rows of this kind.
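
The same length check can be wrapped in a small helper that also reports the index of each suspicious row, so that such rows can be located without relying on a forum discussion. This is only a minimal sketch: the name `find_malformed_rows` and the sample rows are hypothetical and not part of the original analysis.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Hypothetical sample data: the second row is missing one value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.8'],
]

for index, row in find_malformed_rows(sample_rows, sample_header):
    print(index, row)
```

Applied to the Google Play data before the deletion in cell 6, a call such as `find_malformed_rows(google, google_header)` should include index 10472 in its result.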

### Deleting Duplicate Data
In another discussion about the Google Play data set, we can also see that the set contains duplicate entries. For example, `Google Ads` has three entries.

In [8]:
for app in google: 
    name = app[0]
    if name == 'Google Ads': 
        print(app)

['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


To count the number of duplicates, we:
<ul>
<li>Create two lists: `duplicate_apps` for storing the names of duplicate apps, and `unique_apps` for storing the names of unique apps.</li>
<li>Loop through the `google` data set (the Google Play data set), and for each iteration:
<ul>
<li>Save the app name to a variable named `name`.</li>
<li>If `name` is already in the `unique_apps` list, append `name` to the `duplicate_apps` list.</li>
<li>Otherwise, append `name` to the `unique_apps` list.</li>
</ul>
</li>
</ul>
In total, there are 1,181 cases where an app occurs more than once:

In [9]:
duplicate_apps = []
unique_apps = []
for app in google: 
    name= app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print(len(duplicate_apps))

1181
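
Testing `name in unique_apps` on a list scans the list on every iteration, which gets slow as the data grows. As an alternative sketch (not the method used in this notebook), the same count can be obtained with `collections.Counter`; the helper name `count_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter


def count_duplicates(dataset, name_index=0):
    """Count the extra entries of apps that appear more than once."""
    counts = Counter(row[name_index] for row in dataset)
    # An app seen k times contributes k - 1 duplicate entries.
    return sum(k - 1 for k in counts.values())


# Hypothetical sample rows: only the app name matters here.
sample = [['Google Ads'], ['Google Ads'], ['Google Ads'],
          ['Instagram'], ['Slack'], ['Slack']]
print(count_duplicates(sample))  # prints 3
```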


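To remove the duplicates, we keep only the entry with the highest number of reviews for each app, since it is likely the most recent one. We will:
<ul>
<li>Create a dictionary, where each key is a unique app name and the corresponding value is the highest number of reviews of that app.</li>
</ul>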

In [10]:
reviews_max = {}
for app in google:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max: 
        reviews_max[name] = n_reviews

<ul>
<li>Use the information stored in the dictionary to create a new data set, `google_clean`, which contains only one entry per app. The expected length of this list is `len(google) - len(duplicate_apps)` = 10,840 - 1,181 = 9,659.</li>
</ul>

In [11]:
google_clean = []
already_added = []
for app in google:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        google_clean.append(app)
        already_added.append(name)
print(len(google_clean))
google = google_clean

9659
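
The two steps can also be combined into a single pass that stores the best row per app name directly. The sketch below is an assumption-laden alternative, not the notebook's method: `remove_duplicates` is a hypothetical helper, the sample rows are shortened to four columns, and `Other App` is made up. Note that the result is ordered by the first appearance of each name, which may differ from the order of `google_clean`.

```python
def remove_duplicates(dataset, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # The strict comparison keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Shortened sample rows: (name, category, rating, reviews).
sample = [
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29331'],
    ['Other App', 'TOOLS', '4.0', '10'],
]
clean = remove_duplicates(sample)
print(len(clean))  # prints 2
```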


### Removing Non-English Apps
Some apps have names containing characters that don't belong to the set of common English characters.

We will write a function `check_string` to check whether a string contains these types of characters.

If the input string has a character that falls outside the ASCII range (0 - 127), the function returns `False` (it identifies the string as non-English); otherwise, it returns `True`.

In [12]:
def check_string(string):
    check = True
    for letter in string:
        if ord(letter) > 127 :
            check = False
    return check       

We will use this function to check whether these app names are detected as English or non-English:

`'Instagram'`
`'爱奇艺PPS -《欢乐颂2》电视剧热播'`
`'Docs To Go™ Free Office Suite'`
`'Instachat 😜'`

In [13]:
print(check_string('Instagram'))
print(check_string( '爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_string('Docs To Go™ Free Office Suite'))
print(check_string( 'Instachat 😜'))

True
False
False
False


If we use the function as it is, we'll lose useful data, since many English apps will be incorrectly labeled as non-English. To minimize the impact of data loss, we'll only remove an app if its name has more than three characters falling outside the ASCII range.

We will change the function so that it counts these characters.

In [14]:
def check_string2(string):
    check = True
    nonEng_number = 0
    for letter in string:
        if ord(letter) > 127 :
            nonEng_number += 1 
    if nonEng_number > 3: 
        check = False
    return check       

Now we use the new function to check these app names:
`'爱奇艺PPS -《欢乐颂2》电视剧热播'`
`'Docs To Go™ Free Office Suite'`
`'Instachat 😜'`

In [15]:
print(check_string2( '爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_string2('Docs To Go™ Free Office Suite'))
print(check_string2( 'Instachat 😜'))

False
True
True
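
Both checks differ only in how many non-ASCII characters they tolerate, so the limit can be made a parameter. The following is a minimal sketch under that assumption; the name `is_english` and the parameter `max_non_ascii` are hypothetical and do not appear elsewhere in this notebook.

```python
def is_english(name, max_non_ascii=3):
    """Return True if at most `max_non_ascii` characters fall outside ASCII."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii


print(is_english('Instagram'))                      # True
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
print(is_english('Docs To Go™ Free Office Suite'))  # True
print(is_english('Instachat 😜'))                   # True
```

With `max_non_ascii=0` the helper behaves like `check_string`, and with the default of 3 it behaves like `check_string2`.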


We will use the function `check_string2` to filter out non-English apps from the data sets `apple` and `google`, and assign the remaining data to the new lists `apple_new` and `google_new`.

After removing the non-English entries, each data set is shorter.

In [16]:
apple_new = []
for app in apple: 
    string = app[1]
    if check_string2(string): 
        apple_new.append(app)
google_new = []
for app in google: 
    string = app[0]
    if check_string2(string): 
        google_new.append(app)  
print('Apple apps: ',len(apple_new))
print('Google apps: ',len(google_new))
apple = apple_new 
google = google_new

Apple apps:  6183
Google apps:  9614


### Isolating the Free Apps
Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

We will store the free apps from the App Store and Google Play in the corresponding lists `apple_free` and `google_free`.

In [17]:
apple_free = []
for app in apple:
    price = float(app[4])
    if price == 0: 
        apple_free.append(app)
google_free = []
for app in google:
    price = app[7]
    if price == '0': 
        google_free.append(app)
print('Apple apps: ',len(apple_free))
print('Google apps: ',len(google_free))
apple = apple_free 
google = google_free

Apple apps:  3222
Google apps:  8864
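
In the cell above, the App Store price is converted to a number while the Google Play price is compared with the string `'0'`. A single helper could treat both columns the same way. This sketch assumes that paid Google Play prices are formatted like `'$4.99'`; the names `parse_price` and `is_free` are hypothetical.

```python
def parse_price(price):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    # Assumption: the only non-numeric character is a leading dollar sign.
    return float(price.replace('$', '').strip())


def is_free(price):
    """Return True if the price string represents a price of zero."""
    return parse_price(price) == 0.0


for raw in ['0', '0.0', '$4.99']:
    print(raw, '->', is_free(raw))
```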


## Most Common Apps by Genre
As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea comprises three steps:
<ol>
<li>Build a minimal Android version of the app, and add it to Google Play.</li>
<li>If the app has a good response from users, we develop it further.</li>
<li>If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.</li>
</ol>
Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets.

Let's begin the analysis by getting a sense of what are the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our data sets.

To generate the tables, we use the columns that contain the genre of the apps. In the App Store data set we use the column at index 11, and in the Google Play data set we use the columns at indexes 1 and 9.

In [18]:
print(apple_header[11])

prime_genre


In [19]:
print(google_header[9])

Genres


In [20]:
print(google_header[1])

Category


Now we create a function named `freq_table()` that returns a frequency table (as a dictionary) for a given column of a data set. The frequencies are expressed as percentages of the total number of rows.

In [21]:
def freq_table(dataset, index):
    n = len(dataset)
    table = {}
    list_genre = []
    for app in dataset: 
        if app[index] not in list_genre:
            list_genre.append(app[index])
    for genre in list_genre:
        freq = 0
        for app in dataset:
            if app[index] == genre:
                freq += 1
        table[genre] = round(freq/n*100,3)
    return table

We will then use the function `display_table()` below to display the frequency table generated by `freq_table()` in descending order.

In [22]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0], '%')
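
`freq_table()` scans the whole data set once for every distinct value. As a sketch of an alternative (not the implementation used in this notebook), the counts can be collected in one pass and sorted with a `key` function instead of building tuples. The names `freq_table_one_pass` and `sorted_percentages` and the sample rows are hypothetical.

```python
def freq_table_one_pass(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = len(dataset)
    return {value: round(count / total * 100, 3)
            for value, count in counts.items()}


def sorted_percentages(table):
    """Return (value, percentage) pairs, largest percentage first."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample: four rows with the genre in column 0.
sample = [['Education'], ['Games'], ['Games'], ['Games']]
for genre, pct in sorted_percentages(freq_table_one_pass(sample, 0)):
    print(genre, ':', pct, '%')
```

Because `sorted` is stable, values with equal percentages keep their first-seen order here, whereas `display_table()` orders such ties by name.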

Now we try it with the **App Store** data set.

In [23]:
display_table(apple,11)

Games : 58.163 %
Entertainment : 7.883 %
Photo & Video : 4.966 %
Education : 3.662 %
Social Networking : 3.29 %
Shopping : 2.607 %
Utilities : 2.514 %
Sports : 2.142 %
Music : 2.048 %
Health & Fitness : 2.017 %
Productivity : 1.738 %
Lifestyle : 1.583 %
News : 1.335 %
Travel : 1.241 %
Finance : 1.117 %
Weather : 0.869 %
Food & Drink : 0.807 %
Reference : 0.559 %
Business : 0.528 %
Book : 0.435 %
Navigation : 0.186 %
Medical : 0.186 %
Catalogs : 0.124 %


At first glance, we can see that the most common genre is `Games`, which accounts for about 58% of the apps. The second most common is `Entertainment`, with about 8%.

The least common genre is `Catalogs`, with only 0.12%; the genres `Navigation` and `Medical` also have low percentages, about 0.19% each.

Looking at the bigger picture, most of the apps are designed for entertainment (games, photo and video, social networking, ...), and only a small percentage are designed for practical purposes (education, shopping, utilities, medical, navigation, ...).

However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not be the same as the offer.

We also use this function to examine the frequencies of the `Category` and `Genres` columns of the **Google Play** apps.

**Category**

In [24]:
display_table(google,1)

FAMILY : 18.908 %
GAME : 9.725 %
TOOLS : 8.461 %
BUSINESS : 4.592 %
LIFESTYLE : 3.903 %
PRODUCTIVITY : 3.892 %
FINANCE : 3.7 %
MEDICAL : 3.531 %
SPORTS : 3.396 %
PERSONALIZATION : 3.317 %
COMMUNICATION : 3.238 %
HEALTH_AND_FITNESS : 3.08 %
PHOTOGRAPHY : 2.944 %
NEWS_AND_MAGAZINES : 2.798 %
SOCIAL : 2.662 %
TRAVEL_AND_LOCAL : 2.335 %
SHOPPING : 2.245 %
BOOKS_AND_REFERENCE : 2.144 %
DATING : 1.861 %
VIDEO_PLAYERS : 1.794 %
MAPS_AND_NAVIGATION : 1.399 %
FOOD_AND_DRINK : 1.241 %
EDUCATION : 1.162 %
ENTERTAINMENT : 0.959 %
LIBRARIES_AND_DEMO : 0.936 %
AUTO_AND_VEHICLES : 0.925 %
HOUSE_AND_HOME : 0.824 %
WEATHER : 0.801 %
EVENTS : 0.711 %
PARENTING : 0.654 %
ART_AND_DESIGN : 0.643 %
COMICS : 0.62 %
BEAUTY : 0.598 %


The landscape changes significantly when we look at the frequency table of categories on Google Play: there are not so many apps designed for fun, and the share of apps designed for practical purposes is larger than on the App Store.

**Genres**

In [25]:
display_table(google,9)

Tools : 8.45 %
Entertainment : 6.069 %
Education : 5.347 %
Business : 4.592 %
Productivity : 3.892 %
Lifestyle : 3.892 %
Finance : 3.7 %
Medical : 3.531 %
Sports : 3.463 %
Personalization : 3.317 %
Communication : 3.238 %
Action : 3.102 %
Health & Fitness : 3.08 %
Photography : 2.944 %
News & Magazines : 2.798 %
Social : 2.662 %
Travel & Local : 2.324 %
Shopping : 2.245 %
Books & Reference : 2.144 %
Simulation : 2.042 %
Dating : 1.861 %
Arcade : 1.85 %
Video Players & Editors : 1.771 %
Casual : 1.76 %
Maps & Navigation : 1.399 %
Food & Drink : 1.241 %
Puzzle : 1.128 %
Racing : 0.993 %
Role Playing : 0.936 %
Libraries & Demo : 0.936 %
Auto & Vehicles : 0.925 %
Strategy : 0.914 %
House & Home : 0.824 %
Weather : 0.801 %
Events : 0.711 %
Adventure : 0.677 %
Comics : 0.609 %
Beauty : 0.598 %
Art & Design : 0.598 %
Parenting : 0.496 %
Card : 0.451 %
Casino : 0.429 %
Trivia : 0.417 %
Educational;Education : 0.395 %
Board : 0.384 %
Educational : 0.372 %
Education;Education : 0.338 %
Word : 0.

The difference between `Category` and `Genres` is not obvious, but at first glance we can see that the `Genres` table is much more granular (it has more rows) than the `Category` table.

In conclusion, we found that the App Store is dominated by `Games` and other apps designed for entertainment, while Google Play has a more balanced landscape of both practical and for-fun apps.

## Most Popular Apps by Genre on the App Store 
One way to find out what genres are the most popular is to calculate the average number of downloads for each app genre. Unfortunately, this information is missing from the App Store data set. As a workaround, we will use the total number of user ratings as a proxy, which we can find in the column `rating_count_tot`.

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:
<ul>
<li> Isolate the apps of each genre. </li>
<li> Sum up the user ratings for the apps of that genre. </li>
<li> Divide the sum by the number of apps belonging to that genre (not by the total number of apps). </li>
</ul> 
We will store the average number of ratings of each genre in the dictionary `rating_table`.

In [26]:
genre_table = freq_table(apple,11)
rating_table = {}
for genre in genre_table: 
    total = 0
    len_genre = 0
    for app in apple:
        genre_app = app[11]
        if genre_app == genre : 
            total += float(app[5])
            len_genre += 1
    aver_rating = total/len_genre
    rating_table[genre] = aver_rating 
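
The nested loops above read the whole data set once per genre. A single pass that accumulates a running total and a count per genre gives the same averages. This is a sketch under stated assumptions: `average_by_group` is a hypothetical helper and the sample rows are made up.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} in a single pass."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    # Every group in `totals` has a count of at least 1, so no division by zero.
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical sample rows: (genre, total rating count).
sample = [['Games', '100'], ['Games', '300'], ['Book', '50']]
print(average_by_group(sample, 0, 1))  # {'Games': 200.0, 'Book': 50.0}
```

For the App Store data, a call such as `average_by_group(apple, 11, 5)` should produce the same values as `rating_table`.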

Now, to make the average numbers of ratings easier to examine, we will write a function that sorts the table in descending order and prints it.

The sorted table is printed below.

In [27]:
def sort_table(table):
    table_display=[]
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  
sort_table(rating_table)

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


As we see in the table above, the genre with the highest average number of ratings is `Navigation`, but this figure is heavily influenced by Waze and Google Maps, which have a huge number of users. This dominance may be one of the reasons why the number of apps in this genre is very low.

In [28]:
for app in apple:
    if app[11] == 'Navigation':
        print(app[1],' : ', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic  :  345046
Google Maps - Navigation & Transit  :  154911
Geocaching®  :  12811
CoPilot GPS – Car Navigation & Offline Maps  :  3582
ImmobilienScout24: Real Estate Search in Germany  :  187
Railway Route Search  :  5
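
The effect of the two largest apps can be made concrete by comparing the mean with the median of the six rating counts listed above. The median is not part of the original analysis; it is used here only as a rough check of how much the average depends on a few very large apps.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean  :', mean(navigation_ratings))
print('median:', median(navigation_ratings))  # 8196.5
```

The mean matches the `Navigation` entry in the sorted table (about 86,090), while the median is only 8,196.5, roughly ten times smaller.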


Moving on to other genres with a large number of ratings, such as `Social Networking` and `Music`, we can see that they are all influenced by a few immensely well-known apps like Facebook, Pinterest, and Pandora. So they are not ideal genres to develop a new app for.

In [29]:
for app in apple:
    if app[11] == 'Social Networking':
        print(app[1],' : ', app[5])

Facebook  :  2974676
Pinterest  :  1061624
Skype for iPhone  :  373519
Messenger  :  351466
Tumblr  :  334293
WhatsApp Messenger  :  287589
Kik  :  260965
ooVoo – Free Video Call, Text and Voice  :  177501
TextNow - Unlimited Text + Calls  :  164963
Viber Messenger – Text & Call  :  164249
Followers - Social Analytics For Instagram  :  112778
MeetMe - Chat and Meet New People  :  97072
We Heart It - Fashion, wallpapers, quotes, tattoos  :  90414
InsTrack for Instagram - Analytics Plus More  :  85535
Tango - Free Video Call, Voice and Chat  :  75412
LinkedIn  :  71856
Match™ - #1 Dating App.  :  60659
Skype for iPad  :  60163
POF - Best Dating App for Conversations  :  52642
Timehop  :  49510
Find My Family, Friends & iPhone - Life360 Locator  :  43877
Whisper - Share, Express, Meet  :  39819
Hangouts  :  36404
LINE PLAY - Your Avatar World  :  34677
WeChat  :  34584
Badoo - Meet New People, Chat, Socialize.  :  34428
Followers + for Instagram - Follower Analytics  :  28633
GroupMe  :  

In [30]:
for app in apple:
    if app[11] == 'Music':
        print(app[1],' : ', app[5])

Pandora - Music & Radio  :  1126879
Spotify Music  :  878563
Shazam - Discover music, artists, videos & lyrics  :  402925
iHeartRadio – Free Music & Radio Stations  :  293228
SoundCloud - Music & Audio  :  135744
Magic Piano by Smule  :  131695
Smule Sing!  :  119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music  :  110420
Amazon Music  :  106235
SoundHound Song Search & Music Player  :  82602
Sonos Controller  :  48905
Bandsintown Concerts  :  30845
Karaoke - Sing Karaoke, Unlimited Songs!  :  28606
My Mixtapez Music  :  26286
Sing Karaoke Songs Unlimited with StarMaker  :  26227
Ringtones for iPhone & Ringtone Maker  :  25403
Musi - Unlimited Music For YouTube  :  25193
AutoRap by Smule  :  18202
Spinrilla - Mixtapes For Free  :  15053
Napster - Top Music & Radio  :  14268
edjing Mix:DJ turntable to remix and scratch music  :  13580
Free Music - MP3 Streamer & Playlist Manager Pro  :  13443
Free Piano app by Yokee  :  13016
Google Play Music  :  10118
Certified Mixtapes - Hip Hop 

Let's look at another genre with a high average number of ratings, `Book`.

In [31]:
for app in apple:
    if app[11] == 'Book':
        print(app[1],' : ', app[5])

Kindle – Read eBooks, Magazines & Textbooks  :  252076
Audible – audio books, original series & podcasts  :  105274
Color Therapy Adult Coloring Book for Adults  :  84062
OverDrive – Library eBooks and Audiobooks  :  65450
HOOKED - Chat Stories  :  47829
BookShout: Read eBooks & Track Your Reading Goals  :  879
Dr. Seuss Treasury — 50 best kids books  :  451
Green Riding Hood  :  392
Weirdwood Manor  :  197
MangaZERO - comic reader  :  9
ikouhoushi  :  0
MangaTiara - love comic reader  :  0
謎解き  :  0
謎解き2016  :  0


As we can see in the list above, the most popular apps in this genre include Kindle and Audible, and their influence is not too heavy. On the other hand, Kindle is also available as a dedicated device for reading books, which is sometimes inconvenient because users need to carry both the Kindle device and an iPhone whenever they go out and want to read e-books.

At this point, we could develop a new app for reading e-books on the iPhone with a variety of features, such as writing a review or reading others' reviews, taking notes on the book, or uploading your own books.
We could also add a dictionary to the app so that users can read books in different languages.

The `Book` genre is highly recommended for our developers. It also fits the trend of the App Store, which focuses on for-fun apps.

In contrast, genres that require deep domain knowledge and special facilities, such as `Weather`, `Food & Drink`, and `Finance`, are not recommended.

## Most Popular Apps by Genre on Google Play 
We will use the number of installs to examine the popularity of each category of apps on Google Play.

In [32]:
category_table = freq_table(google,1)
install_table = {}
for category in category_table:
    total = 0 
    len_category = 0
    for app in google:
        category_app = app[1]
        if category_app == category: 
            string = app[5].replace('+','')
            string = string.replace(',','')
            total += float(string)
            len_category += 1
    install_table[category] = total/len_category
sort_table(install_table)

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

As we can see, the category with the highest average number of installs is `COMMUNICATION`. We will display the most popular apps in this category, that is, those with 100 million or more installs.

In [33]:
for app in google:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ' : ', app[5])

WhatsApp Messenger  :  1,000,000,000+
imo beta free calls and text  :  100,000,000+
Android Messages  :  100,000,000+
Google Duo - High Quality Video Calls  :  500,000,000+
Messenger – Text and Video Chat for Free  :  1,000,000,000+
imo free video calls and chat  :  500,000,000+
Skype - free IM & video calls  :  1,000,000,000+
Who  :  100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji  :  100,000,000+
LINE: Free Calls & Messages  :  500,000,000+
Google Chrome: Fast & Secure  :  1,000,000,000+
Firefox Browser fast & private  :  100,000,000+
UC Browser - Fast Download Private & Secure  :  500,000,000+
Gmail  :  1,000,000,000+
Hangouts  :  1,000,000,000+
Messenger Lite: Free Calls & Messages  :  100,000,000+
Kik  :  100,000,000+
KakaoTalk: Free Calls & Text  :  100,000,000+
Opera Mini - fast web browser  :  100,000,000+
Opera Browser: Fast and Secure  :  100,000,000+
Telegram  :  100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer  :  100,000,000+
UC Browser Mini -Tiny Fas

This category is dominated by apps like WhatsApp, Skype, and LINE, and the number of apps with over 100 million installs is very large.
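
Instead of listing each install bucket as a string, the install count can be converted to a number and compared with a threshold. The sketch below assumes the `Installs` values look like `'1,000,000+'`, as in the output above; `popular_apps` is a hypothetical helper, and the sample rows are shortened with empty strings for unused columns (`Small Chat App` is made up).

```python
def popular_apps(dataset, category, min_installs, name_index=0,
                 category_index=1, installs_index=5):
    """Return (name, installs) for apps in `category` with at least `min_installs`."""
    result = []
    for row in dataset:
        if row[category_index] != category:
            continue
        # '1,000,000+' -> 1000000
        installs = int(row[installs_index].replace(',', '').replace('+', ''))
        if installs >= min_installs:
            result.append((row[name_index], installs))
    return result


sample = [
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small Chat App', 'COMMUNICATION', '', '', '', '10,000+'],
    ['Amazon Kindle', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
]
print(popular_apps(sample, 'COMMUNICATION', 100_000_000))
```

A numeric threshold also makes it easy to try other cut-offs, for example 10 million installs, without editing the condition.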

Looking at another category, `BOOKS_AND_REFERENCE`, the number of very popular apps is small: only 5 apps have 100 million or more installs.

In [34]:
for app in google:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ' : ', app[5])

Google Play Books  :  1,000,000,000+
Bible  :  100,000,000+
Amazon Kindle  :  100,000,000+
Wattpad 📖 Free Books  :  100,000,000+
Audiobooks from Audible  :  100,000,000+


This category may be ideal for our developers to build something new.

However, it looks like the market is already full of libraries, so we need to add some special features beyond the raw version of the book, as mentioned above for the App Store data set.

## Conclusion 
In this project, we examined two data sets to recommend an app profile that will be profitable for our company.

To sum up, we recommend the Books genre/category, with some added features that make the app more convenient for users and different from the many similar apps that already exist.