# What Kind of Mobile Apps Attract Users

This project analyzes data to help developers understand what kinds of apps are likely to attract more users on **Google Play** and the **App Store**. The goal is to recommend an app profile that can be profitable in both markets.


## Open and Explore Data Sets

For this project I use two data sets:

* Mobile App Statistics (Apple iOS app store) - [link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)
* Web scraped data of 10k Play Store apps for analysing the Android market - [link](https://www.kaggle.com/lava18/google-play-store-apps/home)

Let's begin by opening and exploring the data sets.

In [4]:
from csv import reader

# read AppleStore data; the with-block closes the file afterwards,
# and newline='' is what the csv module documentation recommends
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    a_data = list(reader(opened_file))
a_data_header = a_data[0]
a_data = a_data[1:]

# read googleplaystore data
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    g_data = list(reader(opened_file))
g_data_header = g_data[0]
g_data = g_data[1:]

To explore the data we use a helper function, `explore_data()`:

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Print the header of the `AppleStore` data and explore the first three rows:

In [5]:
print(a_data_header)
print('\n')
explore_data(a_data, 0, 3, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17


---
There are 7197 applications in the App Store data set, each described by 17 fields. The most valuable for the analysis are:
- 'track_name': App Name,
- 'currency': Currency Type,
- 'price': Price amount,
- 'rating_count_tot': User Rating counts (for all versions),
- 'user_rating': Average User Rating value (for all versions),
- 'cont_rating': Content Rating,
- 'prime_genre': Primary Genre,
- 'sup_devices.num': Number of supported devices,
- 'ipadSc_urls.num': Number of screenshots shown for display,
- 'lang.num': Number of supported languages.

Print the header of the `googleplaystore` data and explore the first three rows:

In [7]:
print(g_data_header)
print('\n')
explore_data(g_data, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


---

There are 10841 applications in the Google Play data set, each described by 13 fields. The most valuable for the analysis are:

- 'App': Application name,
- 'Category': Category the app belongs to,
- 'Rating': Overall user rating of the app (at the time of scraping),
- 'Reviews': Number of user reviews for the app (at the time of scraping),
- 'Installs': Number of user downloads/installs for the app (at the time of scraping),
- 'Type': Paid or Free,
- 'Price': Price of the app (at the time of scraping),
- 'Content Rating': Age group the app is targeted at: Children / Mature 21+ / Adult,
- 'Genres': An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to the Music, Game, and Family genres.

## Data Cleaning

The discussion page of the `googleplaystore` data set reports an error in the `'Rating'` column of one row:

In [16]:
print(g_data_header)
print('\n')
print(g_data[10472])  # the row with the reported error
print('\n')
print(g_data[0])      # a correct row, for comparison

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


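This row has a `'Rating'` of 19, although the maximum rating is 5. The row has only 12 fields instead of 13: its `'Category'` value is missing, so every following value is shifted one column to the left. We therefore remove the row.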

In [17]:
print(len(g_data))
del g_data[10472]  # run only once: re-running would delete a different, valid row
print(len(g_data))

10841
10840
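Hard-coding index 10472 only works for this particular file. A more general check, sketched below under the assumption that every valid row has exactly as many fields as the header, flags any row with a missing or extra value. The function name `find_malformed_rows` and the miniature data set are hypothetical, made up for illustration.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose number of fields differs from the header."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical miniature data set with the first four googleplaystore columns
header = ['App', 'Category', 'Rating', 'Reviews']
rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1', '159'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19'],  # 'Category' missing
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
]

for index, row in find_malformed_rows(header, rows):
    print(index, row)
```

On the real data, `find_malformed_rows(g_data_header, g_data)` would be run before the deletion; if several rows are flagged, deleting them from the highest index down keeps the remaining indices valid.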


## Remove Duplicated Rows

For some applications the data set has several rows that differ, if at all, only in the `'Reviews'` value. This may be caused by information about the same application being saved into the data set at different times. Here is an example for the `'Instagram'` application:

In [32]:
print(g_data_header)
for row in g_data:
    app_name = row[0]
    if app_name == 'Instagram':
        print(row)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


The number of duplicated rows can be counted:

In [31]:
duplicates = []
uniques = []

for row in g_data:
    app_name = row[0]
    if app_name in uniques:
        duplicates.append(app_name)
    else:
        uniques.append(app_name)
        
print('Number of unique rows: ',len(uniques))
print('Number of duplicated rows: ',len(duplicates))
print("\nSome duplicated applications appeared in data set:")
print(duplicates[:5])

Number of unique rows:  9659
Number of duplicated rows:  1181

Some duplicated applications appeared in data set:
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
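The loop above checks `app_name in uniques` against a growing list, which gets slower as the data grows. A minimal alternative sketch uses `collections.Counter`, which counts each name in one pass and can also report the most repeated names. The function name `duplicate_summary` and the toy rows are hypothetical.

```python
from collections import Counter

def duplicate_summary(rows, name_index=0):
    """Return (number of unique names, number of duplicate rows, most repeated names)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # every occurrence after the first one counts as a duplicate row
    n_duplicates = sum(count - 1 for count in counts.values())
    return n_unique, n_duplicates, counts.most_common(3)

# Hypothetical toy rows: only the app name column is needed here
toy_rows = [['Instagram'], ['Box'], ['Instagram'], ['Zoom'], ['Box'], ['Instagram']]
print(duplicate_summary(toy_rows))
```

Because it counts the same thing as the loop, `duplicate_summary(g_data)` should report the same 9659 unique names and 1181 duplicate rows.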


The easiest approach would be to remove duplicate rows at random, but that is not a good basis for the analysis that follows. The only difference between the duplicates is the `'Reviews'` value, and a higher value most likely means the data was recorded later. A better approach is therefore to keep the row with the highest `'Reviews'` value and remove all other duplicates.

* **First step:** create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app

In [39]:
highest_review = {}

for row in g_data:
    app_name = row[0]
    review = float(row[3])
    if app_name not in highest_review:
        highest_review[app_name] = review
    else:
        if highest_review[app_name] < review:
            highest_review[app_name] = review

print(len(highest_review))

9659


* **Second step:** use the information stored in the dictionary to create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews)

In [49]:
g_data_clean = [] # contain cleaned data set, without duplicates 
added_apps = [] # to keep track of apps that already added

for row in g_data:
    app_name = row[0]
    review = float(row[3])
    if (highest_review[app_name] == review) and (app_name not in added_apps):
        g_data_clean.append(row)
        added_apps.append(app_name)
        
explore_data(g_data_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
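The two steps above can also be combined into a single pass, sketched here with a dictionary that stores the best row seen so far for each app. It keeps the first row with the highest review count, so the set of rows kept matches the two-step approach, although their order can differ (apps come out in the order their names first appear, relying on dicts preserving insertion order in Python 3.7+). The function name and the `'Box'` review count in the toy data are hypothetical.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # replace only on a strictly higher count, so ties keep the earlier row
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
for row in keep_most_reviewed(toy_rows):
    print(row)
```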


## Remove Non-English Applications

For the purposes of this analysis we don't need non-English applications. We go through each application name and keep the names that consist of English characters only. Characters used in English text belong to the ASCII (American Standard Code for Information Interchange) range, so their code points, the integers returned by `ord()`, are at most 127.
We use a helper function:

In [50]:
def detect_eng(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(detect_eng('Instagram'))
print(detect_eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(detect_eng('Docs To Go™ Free Office Suite'))
print(detect_eng('Instachat 😜'))

True
False
False
False


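The function works, but it is too strict: names such as `'Docs To Go™ Free Office Suite'` and `'Instachat 😜'` are clearly English, yet they are rejected because they contain a single non-ASCII character (™ or an emoji). We update the function so that a name is treated as non-English only if it contains more than three non-ASCII characters.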

In [55]:
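def detect_eng(string):
    non_ascii = 0  # characters outside ASCII, e.g. emoji or ™
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    # allow up to three non-ASCII characters before treating the name as non-English
    if non_ascii > 3:
        return False
    else:
        return True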

print(detect_eng('Instagram'))
print(detect_eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(detect_eng('Docs To Go™ Free Office Suite'))
print(detect_eng('Instachat 😜'))

True
False
True
True
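The same heuristic can be written more compactly with `str.isascii()` (Python 3.7+), with the threshold as a parameter so it is easy to experiment with. This sketch behaves like the updated `detect_eng()` at the default threshold of 3; the name `is_probably_english` is hypothetical. The heuristic has a known blind spot: a short non-English name with few non-ASCII characters still passes. For example, `'QQ音乐HD'` (two non-ASCII characters) survives the filter and appears in the App Store music list later in this analysis.

```python
def is_probably_english(name, max_non_ascii=3):
    """Heuristic: treat a name as English unless it has more than
    max_non_ascii characters outside the ASCII range (code points > 127)."""
    non_ascii = sum(1 for ch in name if not ch.isascii())
    return non_ascii <= max_non_ascii

for name in ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播',
             'Docs To Go™ Free Office Suite', 'Instachat 😜', 'QQ音乐HD']:
    print(name, ':', is_probably_english(name))
```

Lowering `max_non_ascii` removes more non-English names but also more English names that merely contain symbols or emoji.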


Let's use the new function to filter out non-English apps from both data sets.

In [57]:
a_data_eng = [] # only English applications in data set
g_data_eng = [] # only English applications in data set

for row in a_data:
    app_name = row[2]
    if detect_eng(app_name):
        a_data_eng.append(row)
        
for row in g_data_clean:
    app_name = row[0]
    if detect_eng(app_name):
        g_data_eng.append(row)
        
explore_data(a_data_eng, 0, 3, True)
print('\n')
explore_data(g_data_eng, 0, 3, True)

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 6183
Number of columns: 17


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']

## Collect Free Applications in Data Set

The data sets contain both free and non-free apps, and we'll need to isolate only the free apps for our analysis.


In [62]:
a_data_new = [] # only free applications in data set
g_data_new = [] # only free applications in data set

for row in a_data_eng:
    price = float(row[5])
    if price == 0:
        a_data_new.append(row)

for row in g_data_eng:
    price = row[7]
    if price == '0':
        g_data_new.append(row)
        
print('AppStore applications: ', len(a_data_new))
print('googleplaystore applications: ', len(g_data_new))

AppStore applications:  3222
googleplaystore applications:  8864
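The Google Play loop compares the raw string with `'0'`, which only works if every free app is stored exactly as `'0'`. A slightly more robust sketch parses the price into a number for both stores. It assumes that paid Google Play prices carry a leading dollar sign (for example `'$4.99'`) while App Store prices are plain numbers; the helper names `parse_price` and `is_free` are hypothetical.

```python
def parse_price(price):
    """Convert a price string to a float, dropping a leading dollar sign if present."""
    return float(price.strip().lstrip('$'))

def is_free(price):
    """True when the parsed price is zero."""
    return parse_price(price) == 0.0

print(is_free('0'), is_free('0.0'), is_free('3.99'), parse_price('$4.99'))
```

With these helpers both filters read the same way, e.g. `is_free(row[5])` for the App Store and `is_free(row[7])` for Google Play.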


## Look Closer at Genres

Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps. We need to find app profiles that are successful on both markets: **App Store** and **Google Play**.

Let's begin the analysis by getting a sense of the most common genres in each market. For this, we need to build frequency tables for a few columns in our data sets.

We use the `'prime_genre'` column of the **App Store** data set and the `'Genres'` and `'Category'` columns of the **Google Play** data set, and create two helper functions, `freq_table()` and `display_table()`:

In [77]:
def freq_table(dataset, index):
    f_table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in f_table:
            f_table[value] += 1
        else:
            f_table[value] = 1
    
    for element in f_table:
        f_table[element] = f_table[element]/total * 100
    
    return f_table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
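The two helpers can be condensed with `collections.Counter`, whose `most_common()` method already returns values sorted by count, so the tuple-swapping step in `display_table()` is not needed. This is a sketch under a hypothetical name, `freq_table_sorted`; one small difference is that `most_common()` breaks ties by first appearance, while `display_table()` breaks them by reverse alphabetical order of the key.

```python
from collections import Counter

def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Hypothetical toy rows: name in column 0, genre in column 1
toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, percentage in freq_table_sorted(toy, 1):
    print(value, ':', percentage)
```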

Now we can display the frequency tables of the columns `'prime_genre'`, `'Genres'`, and `'Category'`.

In [78]:
# frequency table for AppStore applications on 'prime_genre' column
display_table(a_data_new, 12) 

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The most common genre is `'Games'` (about 58%), followed by `'Entertainment'`, `'Photo & Video'`, `'Education'`, and `'Social Networking'`. Most of the apps appear to be designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while fewer serve practical purposes (education, shopping, utilities, productivity, lifestyle, etc.).

However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users: the demand might not match the supply.

In [66]:
# frequency table for Google Play applications on 'Category' column
display_table(g_data_new, 1) 

FAMILY : 0.0018907942238267149
GAME : 0.0009724729241877256
TOOLS : 0.0008461191335740072
BUSINESS : 0.00045916064981949456
LIFESTYLE : 0.00039034296028880866
PRODUCTIVITY : 0.0003892148014440433
FINANCE : 0.0003700361010830325
MEDICAL : 0.00035311371841155237
SPORTS : 0.0003395758122743682
PERSONALIZATION : 0.00033167870036101085
COMMUNICATION : 0.00032378158844765343
HEALTH_AND_FITNESS : 0.0003079873646209386
PHOTOGRAPHY : 0.0002944494584837545
NEWS_AND_MAGAZINES : 0.00027978339350180506
SOCIAL : 0.00026624548736462096
TRAVEL_AND_LOCAL : 0.000233528880866426
SHOPPING : 0.00022450361010830324
BOOKS_AND_REFERENCE : 0.00021435018050541517
DATING : 0.00018614620938628158
VIDEO_PLAYERS : 0.00017937725631768953
MAPS_AND_NAVIGATION : 0.00013989169675090253
FOOD_AND_DRINK : 0.00012409747292418773
EDUCATION : 0.00011620036101083033
ENTERTAINMENT : 9.589350180505415e-05
LIBRARIES_AND_DEMO : 9.363718411552347e-05
AUTO_AND_VEHICLES : 9.250902527075812e-05
HOUSE_AND_HOME : 8.235559566787003e-05
W

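On **Google Play** the picture is different: there are more apps designed for practical purposes (family, tools, business, lifestyle, productivity, etc.) and fewer designed for fun. Note that the Google Play frequency values in this section are printed on a different scale from the App Store table: multiplying a value by 10,000 gives its percentage (for example, FAMILY's 0.00189 is about 18.9% of the apps). If we investigate further, however, we can see that the family category (almost 19% of the apps) consists mostly of games for kids.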

Practical apps seem to have a better representation on **Google Play** compared to **App Store**. This picture is also confirmed by the frequency table we see for the `'Genres'` column:

In [67]:
# frequency table for Google Play applications on 'Genres' column
display_table(g_data_new, 9)

Tools : 0.0008449909747292419
Entertainment : 0.0006069494584837545
Education : 0.0005347472924187726
Business : 0.00045916064981949456
Productivity : 0.0003892148014440433
Lifestyle : 0.0003892148014440433
Finance : 0.0003700361010830325
Medical : 0.00035311371841155237
Sports : 0.0003463447653429603
Personalization : 0.00033167870036101085
Communication : 0.00032378158844765343
Action : 0.00031024368231046933
Health & Fitness : 0.0003079873646209386
Photography : 0.0002944494584837545
News & Magazines : 0.00027978339350180506
Social : 0.00026624548736462096
Travel & Local : 0.00023240072202166065
Shopping : 0.00022450361010830324
Books & Reference : 0.00021435018050541517
Simulation : 0.00020419675090252708
Dating : 0.00018614620938628158
Arcade : 0.00018501805054151624
Video Players & Editors : 0.00017712093862815885
Casual : 0.00017599277978339351
Maps & Navigation : 0.00013989169675090253
Food & Drink : 0.00012409747292418773
Puzzle : 0.0001128158844765343
Racing : 9.9277978339350

The difference between the `'Genres'` and the `'Category'` columns is not clear, but one thing we can notice is that the `'Genres'` column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the `'Category'` column moving forward.

## Most Popular Apps by Genre on the App Store

The frequency tables we analyzed in the previous section showed us that the **App Store** is dominated by apps designed for fun, while **Google Play** shows a more balanced landscape of both practical and for-fun apps. Now, we'd like to get an idea about the kind of apps with the most users.

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the **Google Play** data set, we can find this information in the `'Installs'` column, but this information is missing for the **App Store** data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `'rating_count_tot'` column.

In [83]:
unique_genres = freq_table(a_data_new, 12)

for genre in unique_genres:
    total = 0
    len_genre = 0
    
    for row in a_data_new:
        genre_app = row[12]
        if genre_app == genre:
            rating_count = float(row[6])
            total += rating_count
            len_genre += 1
    
    avg_rating_count = total/len_genre
    print(genre,': ',avg_rating_count)

Productivity :  21028.410714285714
Weather :  52279.892857142855
Shopping :  26919.690476190477
Reference :  74942.11111111111
Finance :  31467.944444444445
Music :  57326.530303030304
Utilities :  18684.456790123455
Travel :  28243.8
Social Networking :  71548.34905660378
Sports :  23008.898550724636
Health & Fitness :  23298.015384615384
Games :  22788.6696905016
Food & Drink :  33333.92307692308
News :  21248.023255813954
Book :  39758.5
Photo & Video :  28441.54375
Entertainment :  14029.830708661417
Business :  7491.117647058823
Lifestyle :  16485.764705882353
Education :  7003.983050847458
Navigation :  86090.33333333333
Medical :  612.0
Catalogs :  4004.0
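The cell above scans the whole data set once per genre and prints the averages unsorted, which makes the ranking hard to read. A single-pass sketch with `collections.defaultdict` that also sorts the result is shown below; the function name `average_by_group` and the toy rows are hypothetical.

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Average a numeric column for each group in one pass, highest average first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

toy = [['app1', 'Navigation', '100'], ['app2', 'Navigation', '300'], ['app3', 'Games', '50']]
for genre, avg in average_by_group(toy, 1, 2):
    print(genre, ':', avg)
```

On the App Store data the equivalent call would be `average_by_group(a_data_new, 12, 6)` (genre in column 12, rating count in column 6).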


The genres with the highest average number of ratings include `'Navigation'`, `'Reference'`, `'Social Networking'`, `'Music'`, `'Weather'`, `'Food & Drink'`, and `'Finance'`. These averages are largely driven by a few popular apps: Navigation mostly by Waze and Google Maps, Social Networking by Facebook, LinkedIn, Skype, etc., and Finance by apps supporting online banking.

Let's take a closer look at some genres: `'Reference'`, `'Music'`, and `'Food & Drink'`.

In [92]:
for row in a_data_new:
    genre = row[12]
    if genre == 'Reference':
        print(row[2], ':', row[6]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


In [93]:
for row in a_data_new:
    genre = row[12]
    if genre == 'Music':
        print(row[2], ':', row[6]) # print name and number of ratings

Pandora - Music & Radio : 1126879
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
Deezer - Listen to your Favorite Music & Playlists : 4677
Sonos Controller : 48905
NRJ Radio : 38
radio.de - Der Radioplayer : 64
Spotify Music : 878563
SoundCloud - Music & Audio : 135744
Sing Karaoke Songs Unlimited with StarMaker : 26227
SoundHound Song Search & Music Player : 82602
Ringtones for iPhone & Ringtone Maker : 25403
Coach Guitar - Lessons & Easy Tabs For Beginners : 2416
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Magic Piano by Smule : 131695
QQ音乐HD : 224
The Singing Machine Mobile Karaoke App : 130
Bandsintown Concerts : 30845
PetitLyrics : 0
edjing Mix:DJ turntable to remix and scratch music : 13580
Smule Sing! : 119316
Amazon Music : 106235
AutoRap by Smule : 18202
My Mixtapez Music : 26286
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
Napster - Top Music 

In [94]:
for row in a_data_new:
    genre = row[12]
    if genre == 'Food & Drink':
        print(row[2], ':', row[6]) # print name and number of ratings

OpenTable - Restaurant Reservations : 113936
Allrecipes Dinner Spinner : 109349
McDo France : 22
Starbucks : 303856
Lieferando.de : 29
Domino's Pizza USA : 258624
Lieferheld - Delicious food delivery service : 29
Bon Appetit : 750
Chefkoch - Rezepte, Kochen, Backen & Kochbuch : 20
Chick-fil-A : 5665
Postmates - Food Delivery, Faster : 9519
Open Food Facts : 1
7-Eleven, Inc. : 1356
Nowait Guest : 1625
DoorDash - Food Delivery : 25947
SONIC Drive-In : 1645
Youmiam : 9
McDonald's : 4050
Deliveroo: Restaurant Delivery - Order Food Nearby : 1702
Outback : 805
Dunkin' Donuts - Get Offers, Coupons & Rewards : 9068
UberEATS: Uber for Food Delivery : 17865
Delish Eatmoji Keyboard : 154
Marmiton Twist : 2
Starbucks Keyboard : 457
Whataburger : 197
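These listings are in file order, so the most-rated apps are scattered through them. Sorting by rating count shows at a glance how much a genre depends on a few apps (for example, the Bible in `'Reference'`). The sketch below is hypothetical (`top_apps` is not part of the original analysis); its default indices match the App Store columns used above (name in column 2, `'rating_count_tot'` in column 6, `'prime_genre'` in column 12), and it assumes the rating counts are whole-number strings, as in the output above.

```python
def top_apps(dataset, genre, genre_index=12, name_index=2, count_index=6, n=5):
    """Return the n apps of a genre with the highest count, as (name, count) pairs."""
    in_genre = [row for row in dataset if row[genre_index] == genre]
    in_genre.sort(key=lambda row: int(row[count_index]), reverse=True)
    return [(row[name_index], int(row[count_index])) for row in in_genre[:n]]

# Toy rows with a 3-column layout: name, genre, rating count
toy = [
    ['Bible', 'Reference', '985920'],
    ['WWDC', 'Reference', '762'],
    ['Dictionary.com', 'Reference', '200047'],
    ['Pandora', 'Music', '1126879'],
]
print(top_apps(toy, 'Reference', genre_index=1, name_index=0, count_index=2, n=2))
```

On the real data, `top_apps(a_data_new, 'Reference')` would list the five most-rated Reference apps.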


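The `'Reference'` genre is dominated by the Bible (985,920 ratings) and Dictionary.com, and `'Music'` by apps such as Pandora, Spotify, and Shazam. One option is therefore an app built around a popular book, with additional features such as audio. Finance, weather, or music apps are also worth considering.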

## Most Popular Apps by Genre on Google Play

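We have data about the number of installs for the **Google Play** market, so we should be able to get a clearer picture of genre popularity. The install numbers are open-ended ranges such as `'100,000+'`, so after removing the `+` and `,` characters each value is only a lower bound, and the averages are approximate.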


In [99]:
unique_categories = freq_table(g_data_new, 1)

for category in unique_categories:
    total = 0
    len_category = 0
    
    for row in g_data_new:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            installs = installs.replace('+','')
            installs = installs.replace(',','')
            installs = float(installs)
            total += installs
            len_category += 1
        
    avg_installs = total / len_category
    print(category,': ',avg_installs)

ART_AND_DESIGN :  1986335.0877192982
AUTO_AND_VEHICLES :  647317.8170731707
BEAUTY :  513151.88679245283
BOOKS_AND_REFERENCE :  8767811.894736841
BUSINESS :  1712290.1474201474
COMICS :  817657.2727272727
COMMUNICATION :  38456119.167247385
DATING :  854028.8303030303
EDUCATION :  1833495.145631068
ENTERTAINMENT :  11640705.88235294
EVENTS :  253542.22222222222
FINANCE :  1387692.475609756
FOOD_AND_DRINK :  1924897.7363636363
HEALTH_AND_FITNESS :  4188821.9853479853
HOUSE_AND_HOME :  1331540.5616438356
LIBRARIES_AND_DEMO :  638503.734939759
LIFESTYLE :  1437816.2687861272
GAME :  15588015.603248259
FAMILY :  3695641.8198090694
MEDICAL :  120550.61980830671
SOCIAL :  23253652.127118643
SHOPPING :  7036877.311557789
PHOTOGRAPHY :  17840110.40229885
SPORTS :  3638640.1428571427
TRAVEL_AND_LOCAL :  13984077.710144928
TOOLS :  10801391.298666667
PERSONALIZATION :  5201482.6122448975
PRODUCTIVITY :  16787331.344927534
PARENTING :  542603.6206896552
WEATHER :  5074486.197183099
VIDEO_PLAYERS 
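The string cleanup inside the loop above can be pulled out into a small helper, which makes it reusable and testable. This is a sketch with a hypothetical name, `parse_installs`; it returns an `int`, since the buckets are whole numbers, and like the original code it treats `'1,000,000+'` as its lower bound.

```python
def parse_installs(installs):
    """Turn an install bucket such as '1,000,000+' into its lower bound as an int."""
    return int(installs.replace(',', '').replace('+', ''))

print(parse_installs('10,000+'))
print(parse_installs('1,000,000,000+'))
```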

The categories with the most installs on average are `BOOKS_AND_REFERENCE`, `COMMUNICATION`, `ENTERTAINMENT`, `GAME`, `SOCIAL`, `PHOTOGRAPHY`, `TRAVEL_AND_LOCAL`, `TOOLS`, `PRODUCTIVITY`, and `VIDEO_PLAYERS`. However, the communication category is dominated by WhatsApp, Facebook Messenger, Skype, etc., social by Facebook, Instagram, and Google+, photography by Google Photos and other popular photo editors, and productivity by Microsoft Word, Dropbox, Google Calendar, Evernote, etc.
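To check how much a category's average is driven by a few very popular apps, one can recompute the average while ignoring apps at or above some install threshold. The sketch below is hypothetical: the function name, the 100,000,000 threshold, and the toy rows and their install numbers are illustrative assumptions, not values from this analysis.

```python
def average_installs(rows, installs_index=5, exclude_at_least=None):
    """Average installs (lower bounds), optionally ignoring apps at or above a threshold."""
    values = [int(row[installs_index].replace(',', '').replace('+', '')) for row in rows]
    if exclude_at_least is not None:
        values = [v for v in values if v < exclude_at_least]
    return sum(values) / len(values) if values else 0.0

# Hypothetical toy rows with a 2-column layout: name, installs
toy = [['Giant chat app', '1,000,000,000+'], ['Small chat app', '100,000+'], ['Other chat app', '500,000+']]
print(average_installs(toy, installs_index=1))
print(average_installs(toy, installs_index=1, exclude_at_least=100_000_000))
```

A large gap between the two averages, as in the toy example, suggests the category's popularity rests on a handful of giants rather than on many mid-sized apps.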

Let's take a closer look at `BOOKS_AND_REFERENCE`, the category that corresponds to the `'Reference'` genre we examined on the **App Store**.

In [102]:
for row in g_data_new:
    genre = row[1]
    if genre == 'BOOKS_AND_REFERENCE':
        print(row[0], ':', row[5]) # print name and number of installs

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

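Given its high average number of installs (about 8.8 million) and the many mid-sized apps in the list above, the best choice seems to be an application in the `BOOKS_AND_REFERENCE` category.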

## Conclusions

The analysis suggests building an app in the book category. It could be built around a popular book, such as a religious one, or around a new one. Since both markets (**Google Play** and the **App Store**) already have plenty of library and reader apps, the app should offer some special features. These might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.