# Profitable app profiles (AppStore and GooglePlay)
   
   ### About
In this project, we analyze the AppStore and GooglePlay datasets to understand which app profiles could be profitable in each marketplace.

   ### Goal:
Through this project, we aim to understand the free apps\* listed in the Apple AppStore and Google Play markets and, based on actual usage statistics, determine which app profile is best suited for us to develop as a free app that maximises in-app ad revenue.

\* We only consider apps with English names.

In [1]:
from csv import reader

### The Google Play data set ###
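# 'with' closes the files; newline='' is what the csv docs recommend for csv.reader,
# and the explicit encoding matters because many app names contain non-ASCII characters
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### The App Store data set ###
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]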
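Both files are parsed with `csv.reader` rather than by splitting each line on commas. The sketch below shows why, assuming (as standard CSV does) that fields containing commas are wrapped in quotes; the two-line sample is made up for illustration and only reuses an app name from the data:

```python
import csv
import io

# Made-up two-line sample in the shape of googleplaystore.csv (three columns only);
# the quoted app name contains a comma
sample = (
    'App,Category,Rating\n'
    '"U Launcher Lite – FREE Live Cool Themes, Hide Apps",ART_AND_DESIGN,4.7\n'
)

naive_rows = [line.split(',') for line in sample.splitlines()]
csv_rows = list(csv.reader(io.StringIO(sample)))

print(len(naive_rows[1]))  # 4: the comma inside the name splits it in two
print(len(csv_rows[1]))    # 3: csv.reader respects the quotes
```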

To make it easier to explore the two data sets, we'll first write a function named explore_data() that we can use repeatedly to explore rows in a more readable way.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [4]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


## About the dataset

* Apple AppStore dataset:  
    There are 7197 apps, with information about each app spread across 16 columns.  
    The following columns might be of interest in our analysis:
        - prime_genre - Genre of the app
        - rating_count_tot - Total number of users who have rated the app, across all versions
        - price - To determine whether it's a free or a paid app

* Google PlayStore dataset:  
    There are 10841 apps, with information about each app spread across 13 columns.  
    The following columns might be of interest in our analysis:
        - Category - Higher-level category of the app
        - Genres - Genre of the app
        - Installs - Number of users who have installed the app, recorded as open-ended ranges such as 10,000+
        - Type - To determine whether it's a free or a paid app
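The cells below refer to these columns by hard-coded positions (for example `-5` for `prime_genre` and `1` for `Category`). A minimal sketch of looking the positions up by name instead, using the header rows printed above; `column_indices` is a hypothetical helper, not part of the original notebook:

```python
# Header rows as printed earlier in this notebook
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price',
                  'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot',
              'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
              'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

def column_indices(header, names):
    """Map each requested column name to its position in the header row."""
    return {name: header.index(name) for name in names}

android_cols = column_indices(android_header, ['Category', 'Genres', 'Installs', 'Type', 'Price'])
ios_cols = column_indices(ios_header, ['prime_genre', 'rating_count_tot', 'price'])
print(android_cols)  # Genres is at 9, i.e. -4 in a 13-column row
print(ios_cols)      # prime_genre is at 11, i.e. -5 in a 16-column row
```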


## Deleting Wrong Data

There is a problem with row 10472: its 'Category' value is missing, so every following value is shifted one column to the left and the row has only 12 fields instead of 13. As a consequence, we'll delete this row.

In [5]:
print(android[10472])  # incorrect row
print('\n')
print(android_header)  # header
print('\n')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


In [6]:
print(len(android))
del android[10472]  # don't run this more than once
print(len(android))

10841
10840
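Deleting by a hard-coded index is fragile: re-running the cell removes a different, valid row. A defensive alternative, sketched here with a hypothetical helper `find_malformed_rows`, is to flag every row whose length differs from the header. Run on `android` before the deletion, it should report row 10472, whose printed version has 12 fields instead of 13; the toy rows below only illustrate the idea:

```python
def find_malformed_rows(header, rows):
    """Return (index, length) for every row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]

# Toy example: the second row is missing one field, like row 10472
header = ['App', 'Category', 'Rating']
rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
print(find_malformed_rows(header, rows))  # [(1, 2)]
```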


## Check duplicate data

Exploring the Google Play data, we find that some apps have more than one entry; the application Instagram, for example, has four entries. Let's first count how many duplicate entries there are:

In [7]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In [8]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
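The duplicate count can also be obtained with `collections.Counter`. In this sketch (the helper name and the toy list of names are illustrative), the sum of `n - 1` over the duplicated names counts every repeat after the first occurrence, which is the same quantity as `len(duplicate_apps)` above:

```python
from collections import Counter

def duplicate_counts(names):
    """Return {name: number_of_rows} for the names that occur more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}

names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram', 'Instagram']
dupes = duplicate_counts(names)
print(dupes)                                # {'Instagram': 4, 'Box': 2}
print(sum(n - 1 for n in dupes.values()))   # 4 extra rows to drop
```

On the real data, `duplicate_counts(app[0] for app in android)` would be the corresponding call.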


To remove the duplicates, we'll keep only one row for each duplicate app and delete the rest.

Looking at the Reviews column of the Instagram rows, we can see that the number of reviews varies between the duplicate rows. We'll keep the row with the highest number of reviews, because the more reviews an app has, the more reliable its rating.


### Deleting duplicate apps with fewer reviews

To do that, we will:

1. Create a dictionary where each key is a unique app name and the value is the highest number of reviews of that app.  
2. Use the dictionary to create a new data set with only one entry per app (for each app, we only select the row with the highest number of reviews).

In [10]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Expected length:', len(android) - len(duplicate_apps))  # avoid the hard-coded 1181
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


In [12]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) 

explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
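The two passes (building `reviews_max`, then filtering with `already_added`) can be folded into a single pass that keeps the best row seen so far for each name. A sketch with a hypothetical `dedupe_by_max` helper: it yields one row per app with the maximum review count, but the rows come out in order of each app's first appearance, so the order may differ from `android_clean`:

```python
def dedupe_by_max(rows, key_index, value_index):
    """Keep one row per key: the earliest row holding that key's largest numeric value."""
    best = {}
    for row in rows:
        key = row[key_index]
        value = float(row[value_index])
        # Strict '>' keeps the earliest row on ties, like the already_added check above
        if key not in best or value > float(best[key][value_index]):
            best[key] = row
    return list(best.values())

# Toy rows (name and Reviews only): the four Instagram counts shown above plus a hypothetical app
rows = [
    ['Instagram', '66577313'],
    ['Instagram', '66577446'],
    ['Example App', '120'],
    ['Instagram', '66577313'],
    ['Instagram', '66509917'],
]
clean = dedupe_by_max(rows, key_index=0, value_index=1)
print(clean)  # [['Instagram', '66577446'], ['Example App', '120']]
```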


## Removing Non-English Apps

As we are only interested in English apps, we'll remove each app whose name contains symbols that are not commonly used in English text.

In [13]:
print(ios[813][1])
print(ios[6731][1])

print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


In [14]:

def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False



The function seems to work fine, but some English app names use emojis or other symbols that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form. To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:

In [17]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


In [16]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True
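The same rule can be written more compactly with `str.isascii()` (available since Python 3.7), with the threshold exposed as a parameter. The sketch also shows the trade-off of the threshold: a short non-English name with three or fewer non-ASCII characters still passes. The last example name is hypothetical:

```python
def is_english(string, max_non_ascii=3):
    """Treat a name as English unless it has more than max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in string if not character.isascii())
    return non_ascii <= max_non_ascii

print(is_english('Docs To Go™ Free Office Suite'))  # True: 1 non-ASCII character
print(is_english('中国語 AQリスニング'))               # False: 8 non-ASCII characters
print(is_english('爱奇艺 TV'))                         # True: a short Chinese name slips through
```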


In [18]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16

We can see that we're left with 9614 Android apps and 6183 iOS apps.

## Isolating the free apps

Recall that our goal is to develop a free app that maximises ad revenue, so our profiling only needs to consider free apps.  
  
In this final part of the data cleaning exercise, we'll separate the free apps from the paid apps in both the AppStore and PlayStore datasets and keep only the free ones.

In [19]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8864
3222
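Comparing prices as strings relies on each store writing zero in exactly one way ('0' on Google Play, '0.0' on the App Store). A more robust sketch parses the price as a number. It assumes that non-zero Google Play prices carry a leading '$' sign (e.g. '$4.99'); `is_free` is a hypothetical helper:

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' is numerically zero."""
    # Assumption: Google Play writes paid prices with a leading '$', e.g. '$4.99'
    return float(price.strip().lstrip('$')) == 0.0

print([is_free(p) for p in ['0', '0.0', '$0.00', '$4.99', '2.99']])
# [True, True, True, False, False]
```

With it, the filter could read `android_final = [app for app in android_english if is_free(app[7])]`, and likewise with index 4 for iOS.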


# Data Analysis

## Most Common Apps by Genre

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea comprises three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, develop it further.
3. If the app is profitable after six months, build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets.  
  
Let's begin the analysis by getting a sense of the most common genres for each market. 

In [20]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

In [21]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
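`freq_table()` and `display_table()` can also be expressed with `collections.Counter`, whose `most_common()` method returns values sorted by count. A sketch on toy rows (`freq_table_counter` is a hypothetical name); note that ties are ordered by first appearance here, whereas `display_table()` breaks ties by sorting the names in reverse:

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """(value, percentage) pairs for column `index`, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Toy rows: only column 1 (the category) is used
rows = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'FAMILY']]
for value, percentage in freq_table_counter(rows, 1):
    print(value, ':', percentage)
```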

In [22]:
display_table(android_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [23]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [24]:
display_table(android_final, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

* AppStore: Looking at the data displayed above, we can see the following patterns:

The AppStore is dominated by the Games genre, which makes up over half of the free English apps (58.16%).  
If we group the genres into Fun and Practical, we see that the AppStore is dominated by apps designed for fun rather than for practical purposes:  
Fun (Games, Entertainment, Photo & Video, Social Networking, Sports, Music) accounts for ~78%,  
and Practical accounts for the remaining ~22%.
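The ~78% figure is simply the sum of the shares of the six genres grouped as Fun, copied from the App Store frequency table above; the remaining ~22% covers every other genre:

```python
# Shares (%) of the "Fun" genres, copied from the App Store frequency table above
fun_genres_pct = {
    'Games': 58.16263190564867,
    'Entertainment': 7.883302296710118,
    'Photo & Video': 4.9658597144630665,
    'Social Networking': 3.2898820608317814,
    'Sports': 2.1415270018621975,
    'Music': 2.0484171322160147,
}
fun_pct = sum(fun_genres_pct.values())
print(round(fun_pct, 2))        # 78.49
print(round(100 - fun_pct, 2))  # 21.51
```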


* PlayStore: Here we have two columns (Genres and Category) that can be used for categorisation. The Genres column is too granular to compare with the high-level genres of the AppStore (for example, it splits games into Action, Arcade, Casual, Puzzle, Racing, etc.).

So, to keep our subsequent analysis at the same level as the AppStore, we'll use the Category column and ignore the Genres column in the PlayStore.  
The PlayStore has fewer apps designed for fun than the AppStore, and a good number of apps seem to be designed for practical purposes (family, tools, business, lifestyle, productivity, etc.).

## Most Popular Apps by Genre on the App Store

One way to find out what genres are the most popular is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing.   

As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

In [25]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Reference : 74942.11111111111
Productivity : 21028.410714285714
Games : 22788.6696905016
Catalogs : 4004.0
Medical : 612.0
Photo & Video : 28441.54375
Travel : 28243.8
Utilities : 18684.456790123455
Book : 39758.5
Social Networking : 71548.34905660378
Weather : 52279.892857142855
Sports : 23008.898550724636
Business : 7491.117647058823
Finance : 31467.944444444445
Music : 57326.530303030304
Navigation : 86090.33333333333
Entertainment : 14029.830708661417
Health & Fitness : 23298.015384615384
Education : 7003.983050847458
News : 21248.023255813954
Shopping : 26919.690476190477
Food & Drink : 33333.92307692308
Lifestyle : 16485.764705882353
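The cell above scans the whole data set once per genre. A single-pass sketch using `collections.defaultdict` gives the same averages; the helper name is hypothetical, and stripping ',' and '+' lets the same function handle the Google Play Installs column (e.g. `average_by_group(android_final, 1, 5)`) as well as `rating_count_tot` (`average_by_group(ios_final, -5, 5)`):

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Average of a numeric column per group, computed in one pass over the data."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        # Strip thousands separators and '+' so Installs labels such as '1,000+' parse too
        value = float(row[value_index].replace(',', '').replace('+', ''))
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows: [name, genre, count]
toy_rows = [['A', 'Games', '10'], ['B', 'Games', '30'], ['C', 'Music', '1,000+']]
averages = average_by_group(toy_rows, group_index=1, value_index=2)
print(averages)  # {'Games': 20.0, 'Music': 1000.0}
```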


In [26]:
def top_apps_by_genre(dataset, genre, genre_index, appname_index, users_index, top_n=5, pct=False):
    """Print the top_n apps of a genre, ranked by user count
    (number of ratings for the App Store, number of installs for Google Play)."""
    genre_apps = []
    total_genre_users = 0
    for app in dataset:
        if app[genre_index] != genre:
            continue
        # Strip thousands separators and the trailing '+' (e.g. '1,000,000+')
        app_users = int(app[users_index].replace(',', '').replace('+', ''))
        total_genre_users += app_users
        genre_apps.append((app_users, app[appname_index]))

    print('*' * 5, 'Top', top_n, 'apps for', genre, '*' * 5)
    # Sorting (users, name) tuples in reverse puts the most-used apps first
    for app_users, app_name in sorted(genre_apps, reverse=True)[:top_n]:
        if pct:
            if total_genre_users != 0:
                app_user_pct = round((app_users / total_genre_users) * 100, 2)
            else:
                app_user_pct = 0
            print(app_name, ':', app_users, '(' + str(app_user_pct) + '%)')
        else:
            print(app_name, ':', app_users)
    # Always end with a separator, even when a genre has fewer than top_n apps
    print('\n')
        
top_apps_by_genre(dataset = ios_final, genre="Navigation", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = ios_final, genre="Reference", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = ios_final, genre="Social Networking", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = ios_final, genre="Music", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = ios_final, genre="Weather", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = ios_final, genre="Book", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = ios_final, genre="Finance", genre_index=-5, appname_index=1, users_index=5, pct=True)

***** Top 5 apps for Navigation *****
Waze - GPS Navigation, Maps & Real-time Traffic : 345046 (66.8%)
Google Maps - Navigation & Transit : 154911 (29.99%)
Geocaching® : 12811 (2.48%)
CoPilot GPS – Car Navigation & Offline Maps : 3582 (0.69%)
ImmobilienScout24: Real Estate Search in Germany : 187 (0.04%)


***** Top 5 apps for Reference *****
Bible : 985920 (73.09%)
Dictionary.com Dictionary & Thesaurus : 200047 (14.83%)
Dictionary.com Dictionary & Thesaurus for iPad : 54175 (4.02%)
Google Translate : 26786 (1.99%)
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418 (1.37%)


***** Top 5 apps for Social Networking *****
Facebook : 2974676 (39.22%)
Pinterest : 1061624 (14.0%)
Skype for iPhone : 373519 (4.93%)
Messenger : 351466 (4.63%)
Tumblr : 334293 (4.41%)


***** Top 5 apps for Music *****
Pandora - Music & Radio : 1126879 (29.78%)
Spotify Music : 878563 (23.22%)
Shazam - Discover music, artists, videos & lyrics : 402925 (10.65%)
iHeartRadio – Free Music & Radio Stations : 29

On average, navigation apps have the highest number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings.  
The same pattern applies to social networking apps, where the average is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. The same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average.
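Because a couple of giants can dominate a genre's mean, the median is a useful complementary measure. As an illustration, using only the five Navigation rating counts listed above (not the full genre), the mean and the median differ by almost an order of magnitude:

```python
from statistics import mean, median

# Rating counts of the five Navigation apps listed above (not the full genre)
navigation_top5 = [345046, 154911, 12811, 3582, 187]
print(mean(navigation_top5))    # 103307.4
print(median(navigation_top5))  # 12811
```

Reporting the median per genre next to the average could help separate genres that are broadly popular from genres propped up by one or two apps.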

Other genres that seem popular include weather, book, food and drink, and finance. The book genre seems promising, and we'll return to it later, but the other genres don't seem too interesting to us:

- Weather apps: people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. Making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.

- Finance apps: these apps involve banking, paying bills, money transfers, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

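## Most Popular Apps by Genre on Google Play

For Google Play, the Installs column gives us install counts directly, but only as open-ended ranges (e.g. 10,000+ or 1,000,000,000+). After removing the commas and the '+' sign, we are effectively averaging the lower bounds of these ranges, so the averages below are approximate, though still useful for comparing categories.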

In [27]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ENTERTAINMENT : 11640705.88235294
COMICS : 817657.2727272727
EDUCATION : 1833495.145631068
WEATHER : 5074486.197183099
PARENTING : 542603.6206896552
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
SPORTS : 3638640.1428571427
COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
TOOLS : 10801391.298666667
SOCIAL : 23253652.127118643
BEAUTY : 513151.88679245283
MAPS_AND_NAVIGATION : 4056941.7741935486
MEDICAL : 120550.61980830671
FOOD_AND_DRINK : 1924897.7363636363
LIFESTYLE : 1437816.2687861272
PHOTOGRAPHY : 17840110.40229885
HOUSE_AND_HOME : 1331540.5616438356
FINANCE : 1387692.475609756
EVENTS : 253542.22222222222
PERSONALIZATION : 5201482.6122448975
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
DATING : 854028.8303030303
HEALTH_AND_FITNESS : 4188821.9853479853
TRAVEL_AND_LOCAL : 13984077.710144928
LIBRARIES_AND_DEMO : 638503.734939759
ART_AND_DESIGN : 1986335.0877192982
SHOPPING : 7036877.311557789
NEWS_AND_MAGAZINES : 9549178.467741935
A
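The inline parsing in the loop above can be factored into a small helper (`parse_installs` is a hypothetical name). Keep in mind that the result is the lower bound of an open-ended range, not an exact install count:

```python
def parse_installs(installs):
    """Convert an Installs label like '1,000,000+' to its integer lower bound."""
    return int(installs.replace(',', '').replace('+', ''))

labels = ['10,000+', '500,000+', '5,000,000+', '1,000,000,000+']
print([parse_installs(label) for label in labels])
# [10000, 500000, 5000000, 1000000000]
```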

In [28]:
top_apps_by_genre(dataset = android_final, genre="COMMUNICATION", genre_index=1, appname_index=0, users_index=5, pct=True)
top_apps_by_genre(dataset = android_final, genre="PRODUCTIVITY", genre_index=1, appname_index=0, users_index=5, pct=True)
top_apps_by_genre(dataset = android_final, genre="FINANCE", genre_index=1, appname_index=0, users_index=5, pct=True)
top_apps_by_genre(dataset = android_final, genre="BOOKS_AND_REFERENCE", genre_index=1, appname_index=0, users_index=5, pct=True)

***** Top 5 apps for COMMUNICATION *****
WhatsApp Messenger : 1000000000 (9.06%)
Skype - free IM & video calls : 1000000000 (9.06%)
Messenger – Text and Video Chat for Free : 1000000000 (9.06%)
Hangouts : 1000000000 (9.06%)
Google Chrome: Fast & Secure : 1000000000 (9.06%)


***** Top 5 apps for PRODUCTIVITY *****
Google Drive : 1000000000 (17.27%)
Microsoft Word : 500000000 (8.63%)
Google Calendar : 500000000 (8.63%)
Dropbox : 500000000 (8.63%)
Cloud Print : 500000000 (8.63%)


***** Top 5 apps for FINANCE *****
Google Pay : 100000000 (21.97%)
PayPal : 50000000 (10.99%)
İşCep : 10000000 (2.2%)
Wells Fargo Mobile : 10000000 (2.2%)
Mobile Bancomer : 10000000 (2.2%)


***** Top 5 apps for BOOKS_AND_REFERENCE *****
Google Play Books : 1000000000 (60.03%)
Wattpad 📖 Free Books : 100000000 (6.0%)
Bible : 100000000 (6.0%)
Audiobooks from Audible : 100000000 (6.0%)
Amazon Kindle : 100000000 (6.0%)





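Communication apps have the highest average number of installs (38,456,119). However, as the output above shows, this average is heavily influenced by a handful of apps with at least one billion installs each (WhatsApp, Skype, Messenger, Hangouts, Google Chrome, etc.). We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs: the market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player.  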
The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

We also notice that quite a few apps are built around a single book, such as the Bible (popular in both stores) and the Quran, which suggests that building an app around a popular book can be profitable.

## Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.  
  
We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.