# App Profile Recommendations for the App Store and Google Play Markets


In this project, we analyze Android and iOS mobile apps on the App Store and Google Play markets for a company that builds such apps. Our goal is to help the company's developers understand what kinds of apps are likely to attract more users.

# Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for that many apps is complicated, so we analyze a smaller sample instead. The aim is to find relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our purpose. One contains data about approximately ten thousand Android apps from Google Play. The other contains data about approximately seven thousand iOS apps from the App Store.

First, we open the two files that describe the apps on the App Store and Google Play. We then read them into lists and explore them by printing the first few rows and checking the number of rows and columns in each file.

In [1]:
from csv import reader

open_AppleStore=open('AppleStore.csv', encoding='utf8')
open_googleplaystore=open('googleplaystore.csv', encoding='utf8')

read_AppleStore=reader(open_AppleStore)
read_googleplaystore=reader(open_googleplaystore)

Apple_data=list(read_AppleStore)
google_data=list(read_googleplaystore)

Apple_data_header=Apple_data[0]
google_data_header=google_data[0]

Apple_data=Apple_data[1:] # This is the actual data set we analyze, which does not contain the header
google_data=google_data[1:]

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
explore_data(Apple_data, 0, 3, True)  
print('\n')
explore_data(google_data, 0, 3, True)        


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1,
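
The loading step can also be written so that the file is closed automatically. The following is a minimal sketch, not the notebook's own code: `load_csv` is a hypothetical helper name, and the in-memory sample stands in for the real CSV files so that the example runs on its own.

```python
import csv
import io

def load_csv(file_obj):
    """Return (header, rows) from an open CSV file object."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]

# Hypothetical two-row sample standing in for AppleStore.csv.
sample = io.StringIO("id,track_name,price\n1,Facebook,0.0\n2,Instagram,0.0\n")
header, rows = load_csv(sample)
print(header)     # ['id', 'track_name', 'price']
print(len(rows))  # 2

# With the real file (path assumed to match the notebook), the with-block
# closes the file as soon as the rows have been read:
# with open('AppleStore.csv', encoding='utf8', newline='') as f:
#     Apple_data_header, Apple_data = load_csv(f)
```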

We now print the headers of these two files. Since our goal is to understand which apps that are free to download and install are used the most, the following columns might be useful for our analysis:

App Store data set: 'track_name', 'price', 'rating_count_tot', 'prime_genre'.
Google Play data set: 'App', 'Category', 'Reviews', 'Installs', 'Price', 'Genres'.

In [2]:
print(Apple_data_header)
print('\n')
print(google_data_header)


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


# Data Cleaning: Detecting and Deleting Wrong Data

Before analyzing our data sets, we need to clean the data to remove inaccurate information. The first step in the cleaning process is to detect wrong data and then correct or delete it.

The Google Play data set has a dedicated discussion section that reports an error in row 10472. We check this by printing every row whose length differs from the length of the header, together with its index, and we run the same check on the App Store data.

In [3]:
for index, row in enumerate(google_data):
    if len(row) != len(google_data_header):
        print(row)
        print(index)

for index, row in enumerate(Apple_data):
    if len(row) != len(Apple_data_header):
        print(row)
        print(index)
    

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


Indeed, row 10472 of google_data is malformed: it has 12 values instead of 13. The 'Category' value is missing, so every later value is shifted one column to the left. We delete that row. The check prints nothing for Apple_data, so that data set does not have this problem.

In [4]:
del google_data[10472]
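
Deleting by a fixed index removes a different, valid row if the cell is run a second time. A minimal sketch of an alternative that is safe to re-run is to filter by row length instead. The helper name `drop_malformed_rows` and the miniature data set are hypothetical.

```python
def drop_malformed_rows(rows, header):
    """Keep only the rows that have one value per header column."""
    return [row for row in rows if len(row) == len(header)]

# Hypothetical miniature data set: the second row is missing one value.
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]
clean_rows = drop_malformed_rows(rows, header)
print(len(clean_rows))  # 2
```

Applying the function to an already clean list returns the same rows, so running it twice does no harm.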

# Data Cleaning: Duplicate entries

The second step in the cleaning process is to check whether there are any duplicate entries and, if so, to remove them. In the next cell, we count the duplicate apps in each data set and print the first few examples.

In [5]:
duplicate_google_apps=[]
unique_google_apps=[]

for app in google_data:
    name=app[0]
    if name in unique_google_apps:
        duplicate_google_apps.append(name)
    else:
        unique_google_apps.append(name)
print('Number of duplicated google apps:', len(duplicate_google_apps))
print('\n')
print('Examples of duplicated google apps:', duplicate_google_apps[:10])

print('\n')
      
duplicate_Apple_apps=[]
unique_Apple_apps=[]

for app in Apple_data:
    name=app[1]
    if name in unique_Apple_apps:
        duplicate_Apple_apps.append(name)
    else:
        unique_Apple_apps.append(name)
print('Number of duplicated Apple apps:', len(duplicate_Apple_apps))
print('\n')
print('Examples of duplicated Apple apps:', duplicate_Apple_apps[:10])      

Number of duplicated google apps: 1181


Examples of duplicated google apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


Number of duplicated Apple apps: 2


Examples of duplicated Apple apps: ['Mannequin Challenge', 'VR Roller Coaster']
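
Checking membership in a list scans the whole list for every app, which becomes slow as the data grows. As a sketch under that observation, the same count can be obtained with `collections.Counter`. The helper name `count_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter

def count_duplicates(rows, name_index):
    """Return (number of extra rows, names that occur more than once)."""
    counts = Counter(row[name_index] for row in rows)
    repeated = [name for name, n in counts.items() if n > 1]
    extra_rows = sum(n - 1 for n in counts.values())
    return extra_rows, repeated

# Hypothetical sample: 'Box' appears three times, 'Slack' twice.
sample = [['Box'], ['Slack'], ['Box'], ['Zoom'], ['Box'], ['Slack']]
extra_rows, repeated = count_duplicates(sample, 0)
print(extra_rows)  # 3
print(repeated)    # ['Box', 'Slack']
```

Here `extra_rows` plays the same role as the length of the duplicate lists above: it counts every row beyond the first occurrence of each name.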


We need to remove the duplicate apps, but one app may have several duplicate entries. We should not remove duplicates at random; we need a sound criterion. Let us first look at the duplicate entries for the app 'Twitter' in google_data.

In [6]:
for app in google_data:
    name =app[0]
    if name == 'Twitter':
        print(app)
        print('\n')
       

['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11667403', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'August 6, 2018', 'Varies with device', 'Varies with device']


['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11667403', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'August 6, 2018', 'Varies with device', 'Varies with device']


['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11657972', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'July 30, 2018', 'Varies with device', 'Varies with device']




These three rows for the app 'Twitter' were collected at different times. The row with the highest number of reviews should be the most recent one, so we will keep that row and remove the others. The same idea applies to every other duplicated app in both data sets; for the App Store we use the total number of ratings instead of the number of reviews.

In [7]:
reviews_google_max={}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_google_max and reviews_google_max[name]<n_reviews:
        reviews_google_max[name]=n_reviews
    elif name not in reviews_google_max:
        reviews_google_max[name]=n_reviews
print(len(reviews_google_max))
print('Expected google length:', len(google_data) - 1181)

print('\n')
        
ratings_Apple_max={}
for app in Apple_data:
    name = app[1]
    n_ratings = float(app[5])
    if name in ratings_Apple_max and ratings_Apple_max[name]<n_ratings:
        ratings_Apple_max[name]=n_ratings
    elif name not in ratings_Apple_max:
        ratings_Apple_max[name]=n_ratings
print(len(ratings_Apple_max))
print('Expected Apple length:', len(Apple_data) - 2)    

9659
Expected google length: 9659


7195
Expected Apple length: 7195


The two dictionaries contain 9659 and 7195 entries, which matches the expected lengths. We can now use them to create the clean data sets, keeping for each app only the row with the highest count.

In [8]:
google_data_clean=[]
google_data_added=[]
for app in google_data:
    name = app[0]
    n_reviews=float(app[3])
    if n_reviews==reviews_google_max[name] and name not in google_data_added:
        google_data_clean.append(app)
        google_data_added.append(name)
print(len(google_data_clean))        
   
print('\n')    

Apple_data_clean=[]
Apple_data_added=[]
for app in Apple_data:
    name = app[1]
    n_ratings=float(app[5])
    if n_ratings==ratings_Apple_max[name] and name not in Apple_data_added:
        Apple_data_clean.append(app)
        Apple_data_added.append(name)
print(len(Apple_data_clean))        

9659


7195
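
The two-step procedure above (find the maximum per name, then collect the matching rows) can also be done in a single pass. This is a minimal sketch, not a replacement for the cells above: `keep_most_reviewed` is a hypothetical helper, and the sample uses shortened rows with only a name, a count, and a date.

```python
def keep_most_reviewed(rows, name_index, count_index):
    """Keep one row per app name: the first row with the highest count."""
    best = {}
    for row in rows:
        name = row[name_index]
        count = float(row[count_index])
        # Strict '>' keeps the earlier row when two rows tie.
        if name not in best or count > float(best[name][count_index]):
            best[name] = row
    return list(best.values())

# Review counts taken from the three 'Twitter' rows shown earlier;
# the last row is a hypothetical extra app.
sample = [
    ['Twitter', '11667403', 'August 6, 2018'],
    ['Twitter', '11667403', 'August 6, 2018'],
    ['Twitter', '11657972', 'July 30, 2018'],
    ['Other app', '10', 'hypothetical row'],
]
unique = keep_most_reviewed(sample, 0, 1)
print(len(unique))   # 2
print(unique[0][1])  # 11667403
```

One difference in design: the result is ordered by the first appearance of each name, which may differ from the row order produced by the cells above.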


# Data Cleaning: Removing Non-English Apps

We now have data sets without duplicates. If we explore them further, we notice that some app names are not written in English. We do not want to analyze those apps, so the next step in the cleaning process is to remove all apps that do not have English names.

To do this, we remove each app whose name contains a symbol that is not commonly used in English text. English text usually consists of letters from the English alphabet, the digits 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).

These characters are all encoded in the ASCII standard, where each character has a corresponding number between 0 and 127. We can use this to build a function, shown below, that checks an app name and tells us whether it contains non-ASCII characters.


In [9]:
def English_test(string):
    for character in string:
        if ord(character)>127:
            return False
    return True
print(English_test('Instagram'))
print(English_test('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(English_test('Docs To Go™ Free Office Suite'))
print(English_test('Instachat 😜'))

True
False
False
False


We have thus created a function English_test that checks whether an app name contains only ASCII characters. It classifies the first two examples correctly. However, the last two apps are actually English apps, even though their names contain non-ASCII characters: a trademark symbol and an emoji.

To solve this problem, we only remove an app if its name contains more than 3 non-ASCII characters. The modified function is not perfectly accurate: a non-English name with three or fewer non-ASCII characters still passes. Such names should be rare, so the filter seems good enough for our analysis.

In [10]:
def English_test_modified(string):
    count=0
    for character in string:
        if ord(character)>127:
            count+=1
    if count>3:    
        return False
    return True
print(English_test_modified('Instagram'))
print(English_test_modified('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(English_test_modified('Docs To Go™ Free Office Suite'))
print(English_test_modified('Instachat 😜'))

True
False
True
True
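
The threshold of 3 is a judgment call, so it can help to make it a parameter. A minimal sketch of that variant follows; the name `is_probably_english` and the parameter `max_non_ascii` are hypothetical and do not appear elsewhere in this notebook.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Docs To Go™ Free Office Suite'))    # True
print(is_probably_english('Instachat 😜', max_non_ascii=0))     # False
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
```

With `max_non_ascii=0` the function behaves like the strict first version, and with the default of 3 it behaves like the modified one.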


The function has been modified. We now use this function for our two data sets.

In [11]:
google_clean=[]
for app in google_data_clean:
    name = app[0]
    if English_test_modified(name):
        google_clean.append(app)
print(len(google_clean))

print('\n')

Apple_clean=[]
for app in Apple_data_clean:
    name = app[1]
    if English_test_modified(name):
        Apple_clean.append(app)
print(len(Apple_clean))

9614


6181


# Data Cleaning: Isolating Free Apps

As already discussed, the company only builds apps which are free to download and install. The final step in the cleaning process is to isolate free apps only.

In [12]:
google_final=[]
for app in google_clean:
    price = app[7]
    if price == '0':
        google_final.append(app)
print(len(google_final))     

print('\n')

Apple_final=[]
for app in Apple_clean:
    price = app[4]
    if price == '0.0':
        Apple_final.append(app)
print(len(Apple_final))     

8864


3220
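
The two comparisons above depend on the exact text used for a zero price ('0' in one file, '0.0' in the other). A sketch of a check that does not depend on the text format is to compare the numeric value. The helper `is_free` is hypothetical, and it assumes that paid prices are written like '$4.99' or '4.99'.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    # Assumption: paid prices look like '$4.99' or '4.99'.
    return float(price_text.replace('$', '')) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```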


So our final Google and Apple data sets have 8864 and 3220 rows respectively. We can start the analysis process.

# Analysis: Most Common Apps by Genre

We now have the final clean data sets for analysis. Our goal is to determine which kinds of apps attract the most users. We start by finding the most common genres in each market, Google Play and the App Store, by building frequency tables for some columns of our final data sets.

To be precise, we analyze the columns 'Category' and 'Genres' in google_final and 'prime_genre' in Apple_final. We write one function that builds a frequency table of percentages for a column, and a second function that displays those percentages in descending order.

In [13]:
def freq_table(dataset,index):
    column_counting={}
    total_number=0
    for app in dataset:
        total_number+=1
        column = app[index]
        if column in column_counting:
            column_counting[column]+=1
        else:
            column_counting[column]=1
    
    for app in column_counting:
        column_counting[app]=100*column_counting[app]/total_number

    return column_counting
    
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  
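
For comparison, the same table can be sketched with `collections.Counter`, whose `most_common()` method already returns entries sorted by count. The function name `freq_table_sorted` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_sorted(rows, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, 100 * n / total) for value, n in counts.most_common()]

# Hypothetical sample with a single 'genre' column.
sample = [['Games'], ['Games'], ['Games'], ['Education']]
for genre, percentage in freq_table_sorted(sample, 0):
    print(genre, ':', percentage)
# Games : 75.0
# Education : 25.0
```

Values with equal counts may appear in a different order than in display_table, which breaks ties by sorting on the key.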

        
        

In [14]:
display_table(google_final,1) #Category column in Google Play


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.700361010830325
MEDICAL : 3.5311371841155235
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.237815884476534
HEALTH_AND_FITNESS : 3.079873646209386
PHOTOGRAPHY : 2.9444945848375452
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768953
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418774
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075813
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0

In [15]:
display_table(google_final,9) # Genres column


Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.700361010830325
Medical : 3.5311371841155235
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.237815884476534
Action : 3.1024368231046933
Health & Fitness : 3.079873646209386
Photography : 2.9444945848375452
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.041967509025271
Dating : 1.861462093862816
Arcade : 1.8501805054151625
Video Players & Editors : 1.7712093862815885
Casual : 1.759927797833935
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418774
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075813

In [16]:
display_table(Apple_final,11) # prime_genre column in Apple Store


Games : 58.13664596273292
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.2919254658385095
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602483
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801242
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


# Analyzing Frequency Tables: App Store

In the App Store data set, the most common genre is 'Games', with about 58.14% of all apps. The runner-up is 'Entertainment', with about 7.89%.

The general impression is that most of the apps are designed for entertainment (games, entertainment, photo & video, social networking, sports, music), while apps with a practical purpose (education, shopping, utilities, productivity, lifestyle, business, books) are rarer.

Note that a large number of apps in a particular genre does not imply that apps of that genre have a large number of users.



# Analyzing Frequency Tables: Google Play

For Google Play apps, we look at two columns, 'Category' and 'Genres'.

At first glance, the 'Category' column suggests that most apps on Google Play are designed for practical purposes (family, tools, business, productivity) and that not so many are designed for entertainment. However, it turns out that most of the apps in the 'FAMILY' category are games for kids.

Even so, compared with the App Store, Google Play contains more practical apps. The 'Genres' column confirms this.

For now, we can conclude that the App Store is dominated by apps designed for entertainment, while Google Play shows a more balanced mix of practical and entertainment apps.

The next step is to check which kinds of apps have the most users.

        

# Analysis: What Type of Apps Have the Most Users?

To find the most popular genres on Google Play, that is, the genres with the most users, we can compute the average number of installs per genre. We therefore explore the 'Installs' column of the data set google_final.

The App Store data set has no information about the number of installs, so as a proxy we use the column 'rating_count_tot', the total number of user ratings.

    

# App Store: Most Popular Apps by Genre

In [17]:
Apple_genres=freq_table(Apple_final, 11)

for genre in Apple_genres:
    total=0
    len_genre=0
    for app in Apple_final:
        genre_app=app[11]
        if genre_app==genre:
            ratings=float(app[5])
            total+=ratings
            len_genre+=1 
    ave_ratings=total/len_genre
    print(genre,':' ,ave_ratings)
    

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22812.92467948718
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
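
The nested loop above scans the whole data set once per genre. As a sketch, the same averages can be accumulated in a single pass with two dictionaries. The helper name `average_by_group` and the two-column sample are hypothetical.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical two-column sample: [genre, rating count].
sample = [['Games', '100'], ['Games', '300'], ['Medical', '612']]
print(average_by_group(sample, 0, 1))
# {'Games': 200.0, 'Medical': 612.0}
```

On the real data, the call would presumably be `average_by_group(Apple_final, 11, 5)`, using the same column indexes as the cell above.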


We conclude that the genre 'Navigation' has the highest average number of user ratings per app, approximately 86090. It is followed by 'Reference', 'Social Networking', 'Music', 'Weather', 'Book', 'Food & Drink', and 'Finance'.

If we look closer at the 'Navigation' genre, as below, we see that most of the ratings (345046 + 154911 = 499957 out of 516542, about 97%) belong to Waze and Google Maps. A few extremely popular apps make navigation apps seem more popular than they really are.



In [18]:
for app in Apple_final:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5]) # name and number of ratings columns

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


A similar situation applies to the genres 'Social Networking' and 'Music', as shown below:

In [19]:
for app in Apple_final:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [20]:
for app in Apple_final:
    if app[11] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

This phenomenon makes it difficult to find the most popular genres. One natural idea is to remove the extremely popular apps from each genre and then recalculate the average number of user ratings. In this case, however, we can take another approach and look at the genre 'Reference':

In [21]:
for app in Apple_final:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
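
One way to see how skewed this genre is, offered here only as a sketch, is to compare the mean with the median, which is much less sensitive to a few very large values. The rating counts are copied from the output above.

```python
from statistics import mean, median

# Rating counts copied from the 'Reference' output above.
reference_ratings = [
    985920, 200047, 54175, 26786, 18418, 17588, 16849, 12122, 8535,
    4693, 1497, 826, 762, 718, 14, 8, 0, 0,
]

print(round(mean(reference_ratings), 2))  # 74942.11
print(median(reference_ratings))          # 6614.0
```

The median is more than ten times smaller than the mean, which suggests that the typical Reference app has far fewer ratings than the average implies.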


Reference apps have 74942 ratings on average, and again the genre is dominated by giants: Bible and Dictionary.com. However, this genre seems to be a promising direction for the company.

One idea is to take another popular book and turn it into an app with additional features, for instance daily quotes from the book, an audio version of the book, and quizzes about the book. We could also embed a dictionary within the app, so that users do not need to leave our app to look up words in an external one.

Since the App Store is dominated by entertainment apps, this idea seems interesting. A practical app might have a better chance of standing out among so many apps on the App Store.

We can apply the same argument to the genre 'Book'. The other popular genres, such as 'Weather', 'Food & Drink', and 'Finance', seem less interesting, for the following reasons:

Weather apps: people do not spend much time in these apps, so it is difficult to make a profit. In addition, to get reliable weather data, we might need to connect our apps to non-free APIs.

Food & Drink: this genre includes Starbucks, McDonald's, and similar apps. Making a popular app in this genre requires actual cooking and a delivery service, which is outside the scope of the company.

Finance apps: these apps involve banking, paying bills, money transfers, and so on. Building a finance app requires deep knowledge of finance.

# Google Play: Most Popular Apps by Genres

Here we can use the number of installs to find the necessary information. One difficulty arises: the install numbers are not precise, as the following table shows:

In [22]:
display_table(google_final, 5) #Installs column

1,000,000+ : 15.72653429602888
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.1985559566787
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.772111913357401
5,000+ : 4.512635379061372
10+ : 3.542418772563177
500+ : 3.2490974729241877
50,000,000+ : 2.3014440433212995
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.7897111913357401
1+ : 0.5076714801444043
500,000,000+ : 0.27075812274368233
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


Fortunately, we do not need exact values here. We treat each bucket as its lower bound, so that '100,000+' counts as 100,000 installs. To compute with these values, we convert them to float, which requires removing the symbols '+' and ','.
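
The conversion can be isolated in a small helper so that it is easy to check on its own. This is a minimal sketch; the name `parse_installs` is hypothetical.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to a float."""
    return float(text.replace('+', '').replace(',', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```

The second call covers the one bucket in the table above that has no '+' sign.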

In [23]:
google_categories=freq_table(google_final, 1)

for category in google_categories:
    total=0
    len_category=0
    for app in google_final:
        category_app=app[1]
        if category_app==category:
            installs=app[5]
            installs=(installs.replace('+','')).replace(',','')
            installs=float(installs)
            total+=installs
            len_category+=1
    ave_installs=total/len_category
    print(category, ':', ave_installs)
            
    
    

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

The category 'COMMUNICATION' has the largest average number of installs, 38456119. Again, this number is heavily influenced by giants such as WhatsApp, Messenger, Skype, Google Chrome, Gmail, and Hangouts. We can double-check this as follows:


In [24]:
for app in google_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+


The same phenomenon applies to 'VIDEO_PLAYERS', the category with the second-highest average number of installs:

In [25]:
for app in google_final:
    if app[1] == 'VIDEO_PLAYERS' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+'):
        print(app[0], ':', app[5])

YouTube : 1,000,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+


So we again meet a situation similar to that of the App Store, and it is hard to come up with a good idea for now.

Note that the category 'GAME' is quite popular but saturated, so we need to look for another one.

Remember that our goal is to find app profiles that work well on both the App Store and Google Play, so one might consider the category 'BOOKS_AND_REFERENCE'. This category seems fairly popular, with 8767812 installs on average. Let us explore it more carefully.

In [26]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

Again, we can check whether this category is really popular or whether it is dominated by a few extremely popular apps:

In [27]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' 
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


This is quite interesting, since there are not so many very popular apps. It seems that taking a popular book and turning it into an app could be profitable on Google Play as well as on the App Store.

We would need to add some special features, such as daily quotes from the book, an audio version, quizzes on the book, or a discussion forum. A library feature is probably best left out, since the App Store and Google Play are already full of library apps.

# Conclusions

We analyzed two data sets of mobile apps from the App Store and Google Play for a company that builds such apps. The goal was to recommend to the company's developers which kind of app they should build so that it can be profitable on both the App Store and Google Play.

We concluded that a promising direction is to take a popular book and turn it into an app. This approach could be profitable on both markets. Since the markets are already full of libraries, the app needs some special features, such as daily quotes, an audio version, quizzes on the book, or a discussion forum.

