# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. Our client develops free apps and would like to know what app types would be more profitable to build. The company's main source of revenue consists of in-app ads. Thus, our target is to identify app profiles that have a great user reach.

## Opening and Exploring Data

Collecting data for over four million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. We are going to use the following data sets, each of which is a sample of the apps available on the App Store and Google Play.

* Google Play Store - https://www.kaggle.com/lava18/google-play-store-apps
* App Store - https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

The Google Play data set contains data about approximately ten thousand Android apps on Google Play, and the App Store data set contains data about around seven thousand iOS apps on the App Store.

In [90]:
open_file = open(r'C:\Users\Lara\Documents\JAVACRISP\Dataset\AppleStore.csv', encoding='utf8')
from csv import reader
read_file = reader(open_file)
apple_store_data = list(read_file)

In [91]:
open_file = open(r'C:\Users\Lara\Documents\JAVACRISP\Dataset\googleplaystore.csv', encoding='utf8')
from csv import reader
read_file = reader(open_file)
google_store_data = list(read_file)
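
The two cells above leave their files open and repeat the import. As a minimal sketch of an alternative, the hypothetical helper `load_csv` below (not part of the original notebook) wraps the same steps in a `with` block so that the file is closed automatically. The demonstration writes a made-up two-line file to a temporary location purely for illustration.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Read a CSV file into a list of rows; the file is closed on exit."""
    # newline="" is what the csv module documentation recommends for open()
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Tiny demonstration with a throwaway file (the contents are made up).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf8", newline="") as tmp:
    tmp.write("App,Category\nExample App,TOOLS\n")
    demo_path = tmp.name

demo_rows = load_csv(demo_path)
os.remove(demo_path)  # clean up the temporary file
print(demo_rows)
```

With the real files, the same helper could be called once per data set, passing the paths used in the cells above.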

We define a function, *explore_data*, to make it easier to explore rows in a readable way. It lets us view the header, the row contents, and the number of rows and columns for each data set.

In [92]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 
    if rows_and_columns==True:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [93]:
explore_data(google_store_data,0,1,True)
print('\n')
explore_data(apple_store_data,0,1,True)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Number of rows: 10842
Number of columns: 13


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Number of rows: 7198
Number of columns: 16


The Google Play data set has 10,842 rows, which includes the headers. This indicates that the Google Play data set contains data for 10,841 apps. 

After quickly scanning through the columns, I make a note of the following column headers that could be useful for the purpose of our analysis: 'App', 'Category', 'Reviews', 'Installs', 'Price' and 'Genres'.

The App Store data set has 7,198 rows, including the headers, so this data set contains data on 7,197 iOS apps. I noted down the following column headers that could be useful: 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.

Many of the headings are self-explanatory but for better understanding, I refer to the documentation mentioned at the start of both data set sources.

## Data Cleaning

### 1. Deleting missing data

The Google Play data set has a dedicated discussion section where it has been highlighted that row 10472 has missing or erroneous data. 

I printed row 10472 and compared this against the header and another row. Upon inspection, I couldn't find any missing or incorrect data. 

Taking into account that the user reporting the error might have removed the header row from the data set, I checked row 10473. We saw previously that the Google Play data set has 13 columns, whereas this row has only 12 entries (the *'Category'* value is missing, so the remaining values are shifted one column to the left).

I therefore deleted row 10473, being careful not to run this code more than once, as a second run would delete the row that has moved into position 10473.

In [94]:
print(google_store_data[10472])  # incorrect row reported
print('\n')
print(google_store_data[0])  # header
print('\n')
print(google_store_data[1])      # correct row
print('\n')
print(google_store_data[10473]) #actual incorrect row

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [95]:
del (google_store_data[10473]) #don't run this more than once
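
Instead of relying on a hard-coded index, rows with the wrong number of columns could be located by comparing each row's length with the header's. The sketch below is one way to do that; `find_malformed_rows` and the sample rows are hypothetical and only illustrate the idea. Building a new filtered list, rather than using `del`, also makes the cell safe to run more than once.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(dataset[0])  # the header defines the expected column count
    return [(i, row) for i, row in enumerate(dataset[1:], start=1)
            if len(row) != expected]


# Made-up sample: the last row is missing its 'Category' value
sample = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.1'],
    ['Shifted App', '1.9'],
]

bad_rows = find_malformed_rows(sample)
print(bad_rows)

# Keep only well-formed rows; rerunning this line gives the same result
cleaned = [row for row in sample if len(row) == len(sample[0])]
print(len(cleaned))
```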

### 2. Removing Duplicate Entries

#### Part 1 - Identifying Duplicates

We have been told that some apps have more than one entry in the Google Play data set. Below, we identify which apps have duplicate entries and separate them from the unique apps for both the Google Play and App Store data sets. We do this by creating a list for duplicate apps and another list for unique apps in both data sets. 

In [96]:
duplicate_google_apps = []
google_apps = []

for app in google_store_data:
    name = app[0]
    if name in google_apps:
        duplicate_google_apps.append(name)
    else:
        google_apps.append(name)

print ('Number of duplicate Google apps: ', len(duplicate_google_apps))
print ('Examples of duplicate Google apps: ', duplicate_google_apps[:5])

Number of duplicate Google apps:  1181
Examples of duplicate Google apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


In [97]:
duplicate_apple_apps = []
apple_apps = []

for app in apple_store_data:
    app_id = app[0]  # column 0 is 'id'; the app name is in column 1 ('track_name')
    if app_id in apple_apps:
        duplicate_apple_apps.append(app_id)
    else:
        apple_apps.append(app_id)

print ('Number of duplicate Apple apps: ', len(duplicate_apple_apps))
print ('Examples of duplicate Apple apps: ', duplicate_apple_apps[:5])

Number of duplicate Apple apps:  0
Examples of duplicate Apple apps:  []


In total, there are 1,181 cases in the Google Play data set where an app occurs more than once.

We have confirmed that there are no duplicate entries in the App Store data set. Note that this check compares the values in column 0, which holds each app's *'id'*.

Looking closer at the duplicates, some have different numbers of reviews, which suggests that the data was collected at different times. We will keep the row with the highest number of reviews for each app, since more reviews should mean more reliable ratings.

#### Part 2 - Deleting Duplicates

We then removed the duplicates from the Google Play data set. To do this, we created a dictionary, *google_reviews_max*, whose keys are the app names and whose values are the highest number of reviews recorded for each app.

In [98]:
google_reviews_max = {}

for app in google_store_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in google_reviews_max and google_reviews_max[name] < n_reviews:
        google_reviews_max[name] = n_reviews
    elif name not in google_reviews_max:
        google_reviews_max[name] = n_reviews

print('Initial number of rows:', (len(google_store_data)-1))     
print ('Number of rows after removing duplicates:', len(google_reviews_max))

Initial number of rows: 10840
Number of rows after removing duplicates: 9659


In a previous code cell, we found 1,181 duplicate entries, so the length of our unique apps dictionary should equal the difference between the initial number of rows in our data set and the number of duplicate entries:

$10{,}840 - 1{,}181 = 9{,}659$

I then created a new dataset without duplicates named *'google_apps_clean'*.

In [99]:
google_apps_clean = [] #new cleaned dataset
already_added = [] #to store app names

for app in google_store_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if google_reviews_max[name]==n_reviews and name not in already_added:
        google_apps_clean.append(app)
        already_added.append(name)

print (len(google_apps_clean))

9659
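
The two-step approach above (first find the maximum, then filter) can also be written as a single pass that stores the best row seen so far for each app name. This is only a sketch: `keep_most_reviewed` is a hypothetical helper, the sample rows are made up, and the kept rows come out in order of each name's first appearance, which may differ from the order produced by the cells above.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # On ties the first row seen is kept, as in the two-step version
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Made-up rows in the Google Play column order: App, Category, Rating, Reviews
sample_rows = [
    ['App A', 'BUSINESS', '4.2', '100'],
    ['App B', 'TOOLS', '4.0', '50'],
    ['App A', 'BUSINESS', '4.2', '120'],
]

deduplicated = keep_most_reviewed(sample_rows)
print(deduplicated)
```

Because dictionary lookups do not slow down as the data grows, this version avoids the repeated `name not in already_added` scans over a list.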


### 3. Removing Non-English Apps

We have been advised that some of the apps are not directed toward an English-speaking audience. Since our company is targeting an English-speaking audience, we want to remove apps with any non-English characters in our data set. 

Characters that are specific to English texts are encoded using the ASCII standard and each ASCII character has a corresponding number between 0 and 127. 

We use the built-in *ord( )* function to get the Unicode code point of each character in the app name.

Some English app names use special characters, such as emojis or other symbols, which fall outside the ASCII range. To avoid filtering out these apps, we only remove an app from our data set if its name has more than three non-ASCII characters.

We use the *ord( )* function within our own function *is_english( )* to determine whether an app name contains more than three non-ASCII characters and, if so, leave the app out of our data set.
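
To make the rule concrete, the short sketch below counts the characters whose code point is above 127 in a few illustrative names and applies the threshold of three. The helper `count_non_ascii` and the example names are assumptions chosen for illustration and are not taken from the data sets.

```python
def count_non_ascii(text):
    """Count the characters whose Unicode code point lies outside 0-127."""
    return sum(1 for ch in text if ord(ch) > 127)


# Illustrative names only: plain ASCII, one symbol, one emoji, mostly non-ASCII
example_names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for name in example_names:
    n = count_non_ascii(name)
    decision = 'kept' if n <= 3 else 'removed'
    print(name, '|', n, 'non-ASCII |', decision)
```

Only the last name exceeds the threshold, so names with a single trademark symbol or emoji would still be treated as English.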


In [100]:
def is_english(string):
    non_english = 0
    for i in string:
        if ord(i)>127:
            non_english += 1
    if non_english > 3:
        return False
    else:
        return True

google_apps_eng = []
apple_apps_eng = []

for app in google_apps_clean:
    name = app[0]
    if is_english(name) == True:
        google_apps_eng.append(app)

for app in apple_store_data:
    name = app[1]
    if is_english(name) == True:
        apple_apps_eng.append(app)
    
explore_data(google_apps_eng, 0,1, True)
print('\n')
explore_data(apple_apps_eng,0,1,True)
     

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Number of rows: 6184
Number of columns: 16


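We are left with 9,614 Android apps and 6,183 iOS apps. The App Store row count of 6,184 printed above includes the header row, because the loop over *apple_store_data* does not skip it.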

### 4. Isolating Free Apps

We are interested in apps that are free to download and install. However, our data sets contain both free and non-free apps, so we will only keep the free apps for our analysis.

In [101]:
free_google_apps_eng = []
free_apple_apps_eng = []

for app in google_apps_eng:
    price = app[7]
    if price == '0':
        free_google_apps_eng.append(app)

for app in apple_apps_eng:
    price = app[4]
    if price == '0.0':
        free_apple_apps_eng.append(app)

explore_data(free_google_apps_eng, 0,1, True)
print('\n')
explore_data(free_apple_apps_eng,0,1,True)
        
        

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows: 3222
Number of columns: 16


I use the *explore_data* function above to sense-check our cleaned data sets of free apps aimed at an English-speaking audience on both the App Store and Google Play.

We now have 8,864 Android apps and 3,222 iOS apps in our data sets.
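
The string comparisons used above (`'0'` for Google Play and `'0.0'` for the App Store) depend on how each store formats its prices. As a hedged sketch, a hypothetical helper such as `is_free` below could treat both formats the same way by converting the text to a number first. It assumes that paid prices may carry a leading dollar sign and that anything non-numeric should count as not free.

```python
def is_free(price_text):
    """Return True if the price text represents zero, False otherwise."""
    cleaned = price_text.strip().lstrip('$')  # assumption: paid prices may look like '$4.99'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Non-numeric text (for example a header cell) is treated as not free
        return False


for text in ['0', '0.0', '$4.99', 'price']:
    print(repr(text), is_free(text))
```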

## Data Analysis

### 1. Most Common Apps by Genre

Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps. 

Because our end goal is to add the app on both the App Store and Google Play, we need to find the app profiles that are successful on both markets. 

We consider the most common genres in each market. This data is given under the *'Genres'* and *'Category'* columns of the Google Play data set and the *'prime_genre'* column of the App Store data set. 

First, we build frequency tables for the *'Genres'* and *'Category'* columns of the Google Play data set. We define two functions to analyse the data:

* `freq_table` to generate frequency tables in percentage terms
* `display_table` to display the percentages in descending order, showing the most common apps at the top 
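
As a small illustration of what such a percentage table contains, the sketch below builds one with `collections.Counter` from the standard library. This is an alternative formulation rather than the notebook's own implementation; `percentage_table` and the toy rows are hypothetical.

```python
from collections import Counter


def percentage_table(dataset, index):
    """Return (value, percentage) pairs, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100)
            for value, count in counts.most_common()]


# Toy rows: app name in column 0, category in column 1
toy_rows = [
    ['a', 'GAME'],
    ['b', 'GAME'],
    ['c', 'TOOLS'],
    ['d', 'FAMILY'],
]

toy_table = percentage_table(toy_rows, 1)
for value, percentage in toy_table:
    print(value, ':', percentage)
```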

In [102]:
def freq_table(dataset, index):
    frequency_table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
            
    freq_table_percentages = {}
    for key in frequency_table:
        proportion = frequency_table[key] / total
        percentage = proportion * 100
        freq_table_percentages[key] = percentage
        
    return freq_table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])   

print('Google Play Categories:')
display_table(free_google_apps_eng, 1) #Google store, category
print ('\n')
print('Google Play Genres:')
display_table(free_google_apps_eng, 9) #Goolge store, genre
print ('\n')

Google Play Categories:
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299

Among the free English apps on Google Play, the most common *genres* are *Tools, Entertainment, Education, Business & Productivity*.
The top 5 most common apps are related to the following *categories*: *Family, Game, Tools, Business & Lifestyle*.

Looking at the two frequency tables we have produced, the *'Genres'* column has a wider range of classifications than the *'Category'* column. We're only looking to get the bigger picture at the moment, so we will only work with the *'Category'* column moving forward.

In [103]:
display_table(free_apple_apps_eng, 11) #Apple store, prime_genre

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Unlike the Android app market (on Google Play), where there is more of a balance between for-fun apps and those designed for practical purposes, one genre clearly dominates the iOS app market (on the App Store).

More than half (approx. 58%) of the free English apps in our iOS app data set are *Games*. *Entertainment* apps are the second most common, making up about 8% of these apps, followed by *Photo & Video, Education and Social Networking* apps.

### 2. Most Popular Apps by Genre 

The frequency tables we analysed showed us that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced representation of both practical and for-fun apps. Now, we'd like to get an idea about the kind of apps with the most users.

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the *'Installs'* column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the *'rating_count_tot'* column. For each genre, we:

1. Isolate the apps of that genre
2. Sum up the user ratings for the apps of that genre
3. Divide the sum by the number of apps in that genre
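
The three steps above can also be sketched as a single pass over the data that accumulates a running total and a count per genre. The helper `average_by_group` and the toy rows below are hypothetical and are shown only to illustrate the calculation.

```python
from collections import defaultdict


def average_by_group(dataset, group_idx, value_idx):
    """Average the numeric column `value_idx` for each value of `group_idx`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]                # step 1: identify the genre
        totals[group] += float(row[value_idx])  # step 2: sum the ratings
        counts[group] += 1
    # step 3: divide each sum by the number of apps in that genre
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows: genre in column 0, number of ratings in column 1
toy_ratings = [
    ['Games', '100'],
    ['Games', '300'],
    ['Music', '50'],
]

toy_averages = average_by_group(toy_ratings, 0, 1)
print(toy_averages)
```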

#### Part 1 - Most Popular Apps by Genre on the App Store
Below, we calculate the average number of user ratings per app genre on the App Store:

In [104]:
prime_genre_table = freq_table(free_apple_apps_eng, 11)
for genre in prime_genre_table:
    total = 0
    len_genre = 0 
    for app in free_apple_apps_eng:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre +=1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

print ('\n')

for app in free_apple_apps_eng:
    if app[11] in ['Navigation','Social Networking','Music']:
        print(app[1], ':', app[5]) # print name and number of ratings

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Facebook : 2974676
Pandora - Music & Radio : 1126879
Pinterest : 1061624
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
Skype for iPhone : 373519
Messenger : 351466
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Tumblr : 334293
iHeartRadio – Free Music & Radio Stations : 293228
WhatsApp Messenger : 28758

At first glance, navigation apps seem to be the most popular genre in the App Store data set, with the highest average number of user ratings.

We take a closer look into the top 5 genres in the App Store to sense-check our findings.

In [112]:
for app in free_apple_apps_eng:
    if app[-5] == 'Navigation':
        print(app[1],':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Waze and Google Maps are outliers that dominate the average number of ratings for this genre, with approx. 500,000 user ratings between them. We then look at the top apps in the *'Social Networking'* genre.

In [122]:
for app in free_apple_apps_eng:
    if app[-5] == 'Social Networking':
        print(app[1],':',app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The same pattern applies to the *'Social Networking'* genre, where the average number of user ratings is heavily influenced by outliers like Facebook, Pinterest, Skype, Messenger and Tumblr.

In [124]:
for app in free_apple_apps_eng:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [125]:
for app in free_apple_apps_eng:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [127]:
for app in free_apple_apps_eng:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

We see the same pattern in the *'Music'* genre, where the data is skewed by a few outliers such as Pandora and Spotify, making the genre seem more popular than it really is.

The *'Reference'* and *'Weather'* genres also seem to have outliers, but there does not appear to be a huge number of apps offering this kind of service. We will keep these two in mind and see whether the Google Play data set shows a similar pattern.
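
To see how strongly a few large apps can pull up a mean, the Navigation rating counts printed earlier in this section can be summarised with both the mean and the median. The sketch below uses the standard library's `statistics` module; comparing against the median is an addition made here for illustration, not a step from the original analysis.

```python
from statistics import mean, median

# Rating counts for the Navigation genre, copied from the output printed earlier
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

print('Mean:', navigation_mean)
print('Median:', navigation_median)
```

The mean reproduces the Navigation average of about 86,090 reported above, while the median is 8,196.5, roughly a tenth of that value, which reflects how much Waze and Google Maps dominate the genre.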

#### Part 2 - Most Popular Apps by Genre on Google Play

We have been provided with data about the number of installs per app in the Android app market, so we should be able to get a clearer picture of genre popularity. Some of the install numbers are open-ended, e.g. 1,000+, 100,000+ or 1,000,000+. However, we don't need to be precise, so we leave the numbers as they are and simply strip the commas and plus signs so that they can be converted to floats.
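
As a minimal sketch of that conversion, the hypothetical helper `parse_installs` below turns an open-ended install string into a float. It treats a value such as 100,000+ as exactly 100,000, which understates the true figure but is sufficient for comparing categories.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' into a float."""
    return float(text.replace(',', '').replace('+', ''))


for raw in ['1,000+', '100,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```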

In [105]:
google_categories = freq_table(free_google_apps_eng, 1)

for category in google_categories:
    total = 0     # sum of installs in this category
    len_category = 0    # number of apps in this category
    for app in free_google_apps_eng:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',','')
            n_installs = n_installs.replace('+','')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print (category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Communication apps seem to have the highest number of installs on average. We take a closer look at apps in this genre to see if there are any outliers.

In [106]:
for app in free_google_apps_eng:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Communication apps have the highest average number of installs, but this figure is heavily skewed by outliers such as WhatsApp, Skype and Messenger, amongst other apps.

We remove the outliers, namely communication apps with 100 million installs or more, and check the average number of installs again.

In [107]:
under_100_m = []

for app in free_google_apps_eng:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386

We can see that the average number of installs for Communication apps reduced from 38,456,119 to 3,603,485. 
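
The same cap could be applied to any category so that categories are compared on equal terms. The sketch below generalises the cell above into a hypothetical helper, `average_installs`; the toy rows are made up and only mimic the column positions of the Google Play data set (category in column 1, installs in column 5).

```python
def average_installs(apps, category, cap=None):
    """Average installs for one category, optionally ignoring apps at or above `cap`."""
    values = []
    for app in apps:
        if app[1] != category:  # column 1 is 'Category'
            continue
        # column 5 is 'Installs', e.g. '1,000,000+'
        n_installs = float(app[5].replace(',', '').replace('+', ''))
        if cap is None or n_installs < cap:
            values.append(n_installs)
    # Return None when no app matches, to avoid dividing by zero
    return sum(values) / len(values) if values else None


# Made-up rows; unused columns are left empty
toy_apps = [
    ['Chat A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Chat B', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['Chat C', 'COMMUNICATION', '', '', '', '500,000+'],
    ['Tool A', 'TOOLS', '', '', '', '10,000+'],
]

uncapped = average_installs(toy_apps, 'COMMUNICATION')
capped = average_installs(toy_apps, 'COMMUNICATION', cap=100_000_000)
print(uncapped, capped)
```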

We notice that the 'Books and Reference' genre is also popular on Google Play (as well as the App Store). Thus, we take a closer look at the apps under the 'Books and Reference' category and check for any outliers.

In [108]:
for app in free_google_apps_eng:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

This genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, language learning apps, etc. However, there are still a number of extremely popular apps that skew the average number of installs.


In [109]:
for app in free_google_apps_eng:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])


Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


There are fewer outliers in this genre compared to other genres, so we will explore this market further.

We then look at apps in the middle in terms of popularity (between 1,000,000 and 100,000,000 installs). 

In [110]:
for app in free_google_apps_eng:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

It looks like this genre is dominated by software for reading ebooks. Libraries and dictionaries are also very popular, so we would avoid building similar apps, since there will be significant competition in this space. There are also a few Al-Quran apps and a few other niches, but there is only one app for favourite books.

## Conclusion

We analysed data about apps available on the App Store and Google Play, with the aim of finding an app profile that could be profitable on both the Android and iOS app markets.

We concluded that an app centred around a popular book could be profitable in both markets. 