# Profitable App Profiles for the App Store and Google Play Markets #

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of people who use it. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## Open and explore the data ##

In [59]:
from csv import reader

# Apple dataset: read every row, then split off the header row.
# 'with' closes the file; utf8 handles non-ASCII app names on any platform.
with open('AppleStore.csv', encoding='utf8') as f:
    dataset_apple = list(reader(f))
dataset_apple_header = dataset_apple[0]
apple = dataset_apple[1:]

# Google dataset
with open('googleplaystore.csv', encoding='utf8') as f:
    dataset_google = list(reader(f))
dataset_google_header = dataset_google[0]
google = dataset_google[1:]

explore_data prints the rows of a dataset from index start up to, but not including, index end, separating the rows with blank lines. Setting rows_and_columns to True also prints the number of rows and columns in the dataset.

In [60]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Explore the Apple iOS dataset

In [61]:
explore_data(apple, 0, 5, rows_and_columns=True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


We can see that the iOS dataset contains data on 7197 apps (the header row is not counted) and has 16 columns. Potentially useful columns for our purpose of determining profitable app profiles are track_name, price, rating_count_tot, user_rating, cont_rating, and prime_genre.

Explore the Google dataset

In [62]:
explore_data(google, 0, 5, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


We can see that the Google dataset contains data on 10841 apps and has 13 columns. Potentially useful columns for our purpose of determining profitable app profiles are App, Category, Rating, Reviews, Installs, and Price.

## Delete incorrect data ##

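A user in the Kaggle discussion section for the Android dataset reported an error in row 10472. Printing that row shows that its Category value is missing, so every value after the app name is shifted one column to the left and the row has only 12 columns instead of 13. We delete that row.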

In [63]:
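explore_data(google, 10472, 10473)  # the faulty row

# Run this cell only once: every run deletes whichever row is at index 10472
del google[10472]

explore_data(google, 10472, 10473)  # the row that now occupies index 10472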



['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
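
Rather than relying only on the Kaggle report, we can scan for rows whose length differs from the header's, since a missing column leaves a row one element short. The sketch below uses a hypothetical helper, find_malformed_rows, on a small two-row sample copied from the output above. On the real data, calling find_malformed_rows(google, dataset_google_header) before the deletion should include index 10472.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs for rows whose length differs from the header's.

    find_malformed_rows is a hypothetical helper, not part of the original notebook.
    """
    expected_length = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected_length]


# Sample copied from the output above: the Google Play header and two rows
header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
          'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
          'Android Ver']
sample = [
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+',
     'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+',
     'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'],
]

for index, row in find_malformed_rows(sample, header):
    print(index, len(row), row[0])  # prints: 1 12 Life Made WI-Fi Touchscreen Photo Frame
```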




## Delete duplicate data ##

Some apps appear more than once in the Android dataset. We can find them by keeping a list of the app names we have already seen and recording any name that shows up again:

In [64]:
duplicate_apps = []
unique_apps = []

for app in google:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


We will delete the duplicate rows, keeping only one row per app: the one with the highest value in the Reviews column. Because an app's review count only grows over time, the row with the most reviews is most likely the most recent data sample for that app.

To do that in code, we build a dictionary where each key is an app name and its value is the highest number of reviews for that app. We then use this dictionary to build a new dataset that has only one entry per unique app.

In [65]:
# Map each app name to the highest review count seen for it (one pass)
unique_apps_dict = {}

for row in google:
    name = row[0]
    n_reviews = float(row[3])
    if name not in unique_apps_dict or n_reviews > unique_apps_dict[name]:
        unique_apps_dict[name] = n_reviews

# Convert to a list of lists (cleaned dataset): keep the first row whose
# review count equals that app's maximum, and skip the rest
android_clean = []
already_added = set()

for app in google:
    name = app[0]
    n_reviews = float(app[3])

    if (unique_apps_dict[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.add(name)  # must stay inside the if block

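Do the same for the Apple dataset. Here the key is the id column (index 0), so only rows that share the same app ID count as duplicates, and the review count comes from rating_count_tot (index 5).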

In [66]:
# Map each app id (index 0) to the highest rating_count_tot (index 5) seen for it
unique_apps_dict_apple = {}

for row in apple:
    app_id = row[0]
    n_reviews = float(row[5])
    if app_id not in unique_apps_dict_apple or n_reviews > unique_apps_dict_apple[app_id]:
        unique_apps_dict_apple[app_id] = n_reviews

# Convert to a list of lists (cleaned dataset)
apple_clean = []
already_added = set()

for app in apple:
    app_id = app[0]
    n_reviews = float(app[5])

    if (unique_apps_dict_apple[app_id] == n_reviews) and (app_id not in already_added):
        apple_clean.append(app)
        already_added.add(app_id)
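
The Android and Apple deduplication cells differ only in which columns hold the key and the review count, so the logic can be factored into one function. keep_most_reviewed below is a hypothetical helper; the app names in the sample come from the duplicate list printed earlier, but the review counts are made up for illustration.

```python
def keep_most_reviewed(dataset, name_index, reviews_index):
    """Keep one row per key: the first row with the highest review count.

    keep_most_reviewed is a hypothetical helper, not part of the original notebook.
    """
    # First pass: highest review count seen for each key
    max_reviews = {}
    for row in dataset:
        key = row[name_index]
        n_reviews = float(row[reviews_index])
        if key not in max_reviews or n_reviews > max_reviews[key]:
            max_reviews[key] = n_reviews

    # Second pass: keep the first row that reaches that maximum
    cleaned = []
    added = set()
    for row in dataset:
        key = row[name_index]
        if float(row[reviews_index]) == max_reviews[key] and key not in added:
            cleaned.append(row)
            added.add(key)
    return cleaned


# Sample rows with made-up review counts: 'Box' appears twice
rows = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159914'],
]
print(keep_most_reviewed(rows, 0, 3))
# [['Slack', 'BUSINESS', '4.4', '51507'], ['Box', 'BUSINESS', '4.2', '159914']]
```

With this helper, keep_most_reviewed(google, 0, 3) and keep_most_reviewed(apple, 0, 5) should produce the same lists as android_clean and apple_clean.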

Look at a few rows of each cleaned dataset and confirm how many rows remain.

In [67]:
explore_data(android_clean, 0, 3, True)
explore_data(apple_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

## Removing Non-English Apps ##

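Next, we remove apps whose names are not in English. The characters used in English text fall in the ASCII range, with code points from 0 to 127, so ord() tells us whether a character lies outside it. Some English app names contain emoji or symbols such as ™ that are also outside that range, so to avoid discarding them we only treat a name as non-English if it has more than three such characters.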

In [68]:
def app_name_in_english(app):
    
    non_english_letters = 0
    
    for letter in app:
        if ord(letter) > 127:
            non_english_letters += 1
        
    return non_english_letters <= 3

test_strings = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播', 
               'Docs To Go™ Free Office Suite', 'Instachat 😜']

for string in test_strings:
    if app_name_in_english(string):
        print(string, 'is an English app')
    else:
        print(string, 'is not an English app')

Instagram is an English app
爱奇艺PPS -《欢乐颂2》电视剧热播 is not an English app
Docs To Go™ Free Office Suite is an English app
Instachat 😜 is an English app
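
The threshold of three is a trade-off. ™ has code point 8482 and 😜 has code point 128540, so each counts as one non-ASCII character, and an English name with four emoji would still be discarded. The sketch below reproduces the rule of app_name_in_english with the threshold as a parameter; count_non_ascii and is_probably_english are hypothetical names.

```python
def count_non_ascii(name):
    """Count characters outside the ASCII range (code point above 127)."""
    return sum(1 for character in name if ord(character) > 127)


def is_probably_english(name, max_non_ascii=3):
    """Same rule as app_name_in_english, with the threshold as a parameter."""
    return count_non_ascii(name) <= max_non_ascii


for name in ['Docs To Go™ Free Office Suite', 'Instachat 😜', 'Instachat 😜😜😜😜']:
    print(name, count_non_ascii(name), is_probably_english(name))
# Docs To Go™ Free Office Suite 1 True
# Instachat 😜 1 True
# Instachat 😜😜😜😜 4 False
```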


Go through each dataset, check each app's name, and copy the rows with English names into new lists. The name is in column 0 of the Google dataset and in column 1 (track_name) of the Apple dataset.

In [69]:
android_clean_english = []
apple_clean_english = []

for app in android_clean:
    name = app[0]
    if app_name_in_english(name):
        android_clean_english.append(app)
        
for app in apple_clean:
    name = app[1]
    if app_name_in_english(name):
        apple_clean_english.append(app)
        
explore_data(android_clean_english, 0, 3, True)
explore_data(apple_clean_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

## Remove non-free apps ##

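Our company only builds free apps, so we keep only free apps for the analysis. The price is stored as a string: it is at index 4 in the Apple dataset, where free apps have '0.0', and at index 7 in the Google dataset, where free apps have '0'. Since this is our final data-cleaning step, we name the resulting lists android_final and ios_final.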
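
The cell that follows compares prices as exact strings, which relies on each dataset writing free prices in one consistent format. A slightly more tolerant check converts the string to a number first. The sketch below uses a hypothetical is_free helper and assumes, as in the Google Play data, that paid prices may carry a leading dollar sign such as '$4.99'.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$4.99' equals zero.

    is_free is a hypothetical helper; it strips a leading '$' before converting.
    """
    return float(price.replace('$', '')) == 0


print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # True True False
```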

In [70]:
android_final = []
ios_final = []

for app in android_clean_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in apple_clean_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
explore_data(android_final, 0, 3, True)
explore_data(ios_final, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

## Most Common Apps by Genre ##

We begin the analysis by finding the most common genres on the Google Play Store and the App Store. Since our free apps earn revenue from ad impressions, more users means more revenue, so we want an app profile that can succeed on both stores.

Our strategy for validating a new app idea has three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

First, let's look at the dataset headers and find out which columns can help us determine popular genres.

In [71]:
print(dataset_apple_header)
print('\n')
print(dataset_google_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


For the apple dataset, column at index 11, 'prime_genre', is useful.
For the google dataset, column at index 1, 'Category', and column at index 9, 'Genres', are useful.

Let's create a function to build frequency tables, and one to display them in order of decreasing category/genre frequency. 

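The display_data function does the same job as display_table, but it takes a dictionary that has already been computed instead of a dataset (a list of lists). Its optional top_x argument limits the output to the top_x largest entries.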

In [72]:
def freq_table(dataset, index):
    total_apps = len(dataset)
    table = {}
    
    for row in dataset:
        if row[index] in table:
            table[row[index]] += 1
        else:
            table.update({row[index] : 1})
            
    for chunk, frequency in table.items():
        table[chunk] = frequency/total_apps*100
            
    return table
            
    
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
    print('\n')
    
def display_data(data, top_x=None):
    # Sort (value, key) pairs so the largest values come first
    table_sorted = sorted(((data[key], key) for key in data), reverse=True)

    # Keep only the first top_x entries when top_x is given
    if top_x:
        table_sorted = table_sorted[:top_x]

    for entry in table_sorted:
        print(entry[1], ':', entry[0])

# Frequency table for the Apple dataset's 'prime_genre' column
# (display_table prints the table and returns None, so nothing is assigned)
display_table(ios_final, 11)
# Frequency table for the Google dataset's 'Genres' column
display_table(android_final, 9)
# Frequency table for the Google dataset's 'Category' column
display_table(android_final, 1)



Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.31678700

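On the App Store, Games is by far the most common genre, making up about 58% of the free English apps, with Entertainment a distant second at about 7.9%. On Google Play, the genres are much more evenly distributed: the most common one, Tools, accounts for only about 8.4% of apps.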
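
The standard library's collections.Counter can build the same kind of frequency table more compactly. In the sketch below, sorted_freq_table is a hypothetical name and the four-row sample is made up. One difference from display_table: Counter.most_common() orders values with equal counts by first appearance, whereas display_table breaks ties by sorting the names in reverse.

```python
from collections import Counter


def sorted_freq_table(dataset, index):
    """Return (value, percentage) pairs for column `index`, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() with no argument returns every value, highest count first
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Made-up sample: column 1 holds the genre
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, share in sorted_freq_table(sample, 1):
    print(value, ':', share)
# Games : 75.0
# Music : 25.0
```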

## Find the most popular apps ##

To measure popularity, we use the number of installs for the Google dataset. The Apple dataset has no install counts, so we use the total number of user ratings (rating_count_tot) as a proxy.

### Find the most popular Apple apps ###


In [73]:
apple_unique_genres = freq_table(ios_final, 11)

print('-- Average number of ratings per genre on App Store --')
genre_ratings = {}
for genre in apple_unique_genres:
    apps_in_genre = 0
    total = 0
    for app in ios_final:
        if app[11] == genre:
            apps_in_genre += 1
            total += float(app[5])
    avg_n_ratings = total/apps_in_genre  # average number of ratings, not star rating
    genre_ratings.update({genre : avg_n_ratings})

display_data(genre_ratings)

-- Average number of ratings per genre on App Store --
Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


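The three genres with the highest average number of ratings are Navigation, Reference, and Social Networking, followed by Music. These averages need some care: Navigation makes up only about 0.19% of the free English apps and Reference about 0.56%, so each average rests on a handful of apps and can be pulled up by one or two very popular ones. Social Networking and Music are dominated by major apps like Facebook/Instagram/Twitter and Spotify/Pandora/Soundcloud, so it is probably best for our company to avoid those genres.

The Reference genre shows potential because it does not appear to be as dominated by a few top apps as the other genres. We should focus on that genre for the App Store.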
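
Because a mean can be pulled up by a few very popular apps, it helps to report the number of apps and the median alongside it. The sketch below uses a hypothetical helper, ratings_summary, built on the standard statistics module; the sample rows and rating counts are made up. On the real data it would be called as ratings_summary(ios_final, 11, 5).

```python
from statistics import mean, median


def ratings_summary(dataset, genre_index, ratings_index):
    """Return {genre: (n_apps, mean, median)} of the rating counts per genre.

    ratings_summary is a hypothetical helper, not part of the original notebook.
    """
    by_genre = {}
    for row in dataset:
        by_genre.setdefault(row[genre_index], []).append(float(row[ratings_index]))
    return {genre: (len(values), mean(values), median(values))
            for genre, values in by_genre.items()}


# Made-up sample: one very popular Navigation app skews that genre's mean
sample = [
    ['App A', 'Navigation', '300000'],
    ['App B', 'Navigation', '1000'],
    ['App C', 'Navigation', '500'],
    ['App D', 'Reference', '3000'],
    ['App E', 'Reference', '2000'],
]
for genre, (n_apps, avg, mid) in ratings_summary(sample, 1, 2).items():
    print(genre, n_apps, avg, mid)
# Navigation 3 100500.0 1000.0
# Reference 2 2500.0 2500.0
```

A large gap between the mean and the median, as for Navigation in the sample, signals that a few apps dominate the genre.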

### Find the most popular Google apps ###


In [74]:
android_unique_categories = freq_table(android_final, 1)
# print(android_unique_categories)

category_installs = {}
for category in android_unique_categories:
    apps_in_category = 0
    total_installs = 0
    for app in android_final:
        if app[1] == category:
            apps_in_category += 1
            installs = app[5]
            # Remove plus sign and commas
            installs = installs.replace('+', '')
            installs = float(installs.replace(',', ''))
            total_installs += installs
    
    avg_installs = total_installs/apps_in_category
    category_installs.update({category : avg_installs})
    
display_data(category_installs)

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

The categories with the highest average number of installs on Google Play are COMMUNICATION (about 38.5 million), VIDEO_PLAYERS (about 24.7 million), and SOCIAL (about 23.3 million). The Installs column only records open-ended ranges such as '10,000+', and the code treats each one as its lower bound, so these averages are approximate.
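
The same cleaning of the Installs strings is repeated in several cells. As a small hypothetical helper, parse_installs, it drops the plus sign and the thousands separators and returns the lower bound of the range as a float:

```python
def parse_installs(installs):
    """Convert an Installs string such as '1,000,000+' to its lower bound as a float.

    parse_installs is a hypothetical helper, not part of the original notebook.
    """
    return float(installs.replace('+', '').replace(',', ''))


print(parse_installs('10,000+'))         # 10000.0
print(parse_installs('1,000,000,000+'))  # 1000000000.0
```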

Let's take a look at the apps in the COMMUNICATION category.

In [75]:
print('-- Number of installs for Google Play Store apps in category COMMUNICATION --')
app_installs = {}
for app in android_final:
    category = app[1]
    app_name = app[0]
    installs = float(app[5].replace('+', '').replace(',', ''))
    
    if category == 'COMMUNICATION':
        if app_name not in app_installs:
            app_installs.update({app_name : installs})
        elif installs > app_installs[app_name]:
            app_installs[app_name] = installs
            

display_data(app_installs, top_x=10)

-- Number of installs for Google Play Store apps in category COMMUNICATION --
WhatsApp Messenger : 1000000000.0
Skype - free IM & video calls : 1000000000.0
Messenger – Text and Video Chat for Free : 1000000000.0
Hangouts : 1000000000.0
Google Chrome: Fast & Secure : 1000000000.0
Gmail : 1000000000.0
imo free video calls and chat : 500000000.0
Viber Messenger : 500000000.0
UC Browser - Fast Download Private & Secure : 500000000.0
LINE: Free Calls & Messages : 500000000.0


Six of the top 10 apps have at least 1 billion installs. They are run by companies such as Google, Facebook, and Microsoft, which we probably can't compete with, so this category is likely not the best choice for developing an app prototype.

Now let's look at the second and third most popular categories, VIDEO_PLAYERS and SOCIAL:

In [76]:
print('-- Number of installs for Google Play Store apps in category VIDEO_PLAYERS --')
app_installs = {}
for app in android_final:
    category = app[1]
    app_name = app[0]
    installs = float(app[5].replace('+', '').replace(',', ''))
    
    if category == 'VIDEO_PLAYERS':
        if app_name not in app_installs:
            app_installs.update({app_name : installs})
        elif installs > app_installs[app_name]:
            app_installs[app_name] = installs
            

display_data(app_installs, top_x=10)


print('\n')
print('-- Number of installs for Google Play Store apps in category SOCIAL --')
app_installs_SOCIAL = {}
for app in android_final:
    category = app[1]
    app_name = app[0]
    installs = float(app[5].replace('+', '').replace(',', ''))
    
    if category == 'SOCIAL':
        if app_name not in app_installs_SOCIAL:
            app_installs_SOCIAL.update({app_name : installs})
        elif installs > app_installs_SOCIAL[app_name]:
            app_installs_SOCIAL[app_name] = installs
            

display_data(app_installs_SOCIAL, top_x=10)

-- Number of installs for Google Play Store apps in category VIDEO_PLAYERS --
YouTube : 1000000000.0
Google Play Movies & TV : 1000000000.0
MX Player : 500000000.0
VivaVideo - Video Editor & Photo Movie : 100000000.0
VideoShow-Video Editor, Video Maker, Beauty Camera : 100000000.0
VLC for Android : 100000000.0
Motorola Gallery : 100000000.0
Motorola FM Radio : 100000000.0
Dubsmash : 100000000.0
Vote for : 50000000.0


-- Number of installs for Google Play Store apps in category SOCIAL --
Instagram : 1000000000.0
Google+ : 1000000000.0
Facebook : 1000000000.0
Snapchat : 500000000.0
Facebook Lite : 500000000.0
VK : 100000000.0
Tumblr : 100000000.0
Tik Tok - including musical.ly : 100000000.0
Tango - Live Video Broadcast : 100000000.0
Pinterest : 100000000.0


As with COMMUNICATION, the VIDEO_PLAYERS and SOCIAL categories are dominated by a few apps from large companies that would be difficult to compete with.
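
The COMMUNICATION, VIDEO_PLAYERS, and SOCIAL loops differ only in the category name. A hypothetical helper, top_apps_in_category, captures that loop once. It assumes the Google Play column layout (name at index 0, category at index 1, installs at index 5) and sorts like display_data, by installs and then by name, both descending. The sample rows fill in only the columns the helper reads.

```python
def top_apps_in_category(dataset, category, top_x=10):
    """Return the top_x (name, installs) pairs in `category`, most installs first.

    top_apps_in_category is a hypothetical helper, not part of the original notebook.
    """
    app_installs = {}
    for app in dataset:
        if app[1] != category:
            continue
        installs = float(app[5].replace('+', '').replace(',', ''))
        # Keep the highest install count if a name appears more than once
        if installs > app_installs.get(app[0], -1):
            app_installs[app[0]] = installs
    ranked = sorted(app_installs.items(), key=lambda item: (item[1], item[0]),
                    reverse=True)
    return ranked[:top_x]


sample = [
    ['YouTube', 'VIDEO_PLAYERS', '', '', '', '1,000,000,000+'],
    ['Dubsmash', 'VIDEO_PLAYERS', '', '', '', '100,000,000+'],
    ['MX Player', 'VIDEO_PLAYERS', '', '', '', '500,000,000+'],
    ['Instagram', 'SOCIAL', '', '', '', '1,000,000,000+'],
]
for name, installs in top_apps_in_category(sample, 'VIDEO_PLAYERS', top_x=2):
    print(name, ':', installs)
# YouTube : 1000000000.0
# MX Player : 500000000.0
```

On the real data, top_apps_in_category(android_final, 'SOCIAL') should list the same apps as the SOCIAL loop.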

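Since Reference looked promising on the App Store, let's look at the corresponding Google Play category, BOOKS_AND_REFERENCE, which averages about 8.8 million installs: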

In [77]:
print('-- Number of installs for Google Play Store apps in category BOOKS_AND_REFERENCE --')
app_installs = {}
for app in android_final:
    category = app[1]
    app_name = app[0]
    installs = float(app[5].replace('+', '').replace(',', ''))
    
    if category == 'BOOKS_AND_REFERENCE':
        if app_name not in app_installs:
            app_installs.update({app_name : installs})
        elif installs > app_installs[app_name]:
            app_installs[app_name] = installs
            

display_data(app_installs, top_x=40)

-- Number of installs for Google Play Store apps in category BOOKS_AND_REFERENCE --
Google Play Books : 1000000000.0
Wattpad 📖 Free Books : 100000000.0
Bible : 100000000.0
Audiobooks from Audible : 100000000.0
Amazon Kindle : 100000000.0
Wikipedia : 10000000.0
Spanish English Translator : 10000000.0
Quran for Android : 10000000.0
Oxford Dictionary of English : Free : 10000000.0
NOOK: Read eBooks & Magazines : 10000000.0
Moon+ Reader : 10000000.0
JW Library : 10000000.0
HTC Help : 10000000.0
FBReader: Favorite Book Reader : 10000000.0
English Hindi Dictionary : 10000000.0
English Dictionary - Offline : 10000000.0
Dictionary.com: Find Definitions for English Words : 10000000.0
Dictionary - Merriam-Webster : 10000000.0
Dictionary : 10000000.0
Cool Reader : 10000000.0
Aldiko Book Reader : 10000000.0
Al-Quran (Free) : 10000000.0
Al'Quran Bahasa Indonesia : 10000000.0
Al Quran Indonesia : 10000000.0
Read books online : 5000000.0
English to Hindi Dictionary : 5000000.0
Ebook Reader : 5000000.

The apps in this category are fairly popular, but the category appears less dominated by a few giant apps than COMMUNICATION, SOCIAL, and VIDEO_PLAYERS: only Google Play Books has 1 billion installs, and the other top apps have between 5 and 100 million. In other words, there seems to be room for a new app here.

One potential avenue would be to select an already-popular book, like the Quran, and then add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.


## Conclusions ##

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent one) and turning it into an app could be profitable on both Google Play and the App Store. Both markets already have plenty of library-style apps, so we would need to add special features on top of the raw text of the book. These might include daily quotes from the book, an audio version, quizzes, or a forum where people can discuss the book.