## Profitable App Profiles for the App Store and Google Play Markets
Purpose:  
The aim of this project is to find mobile app profiles that are profitable for the App Store and Google Play markets, so that the developers can make data-driven decisions about the kinds of apps they build.

Goal:  
The company only builds apps that are free to download and install, and the main source of revenue is in-app ads. This means that our revenue for any given app is mostly determined by the number of users of that app. The goal of this project is to analyze the data to help the developers understand what kinds of apps are likely to attract more users.

Data sets:
1. A data set containing data about ten thousand Android apps from Google Play (collected in August 2018)  
https://www.kaggle.com/lava18/google-play-store-apps/home  

2. A data set containing data about seven thousand iOS apps from the App Store (collected in July 2017)  
https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home  

### Opening and exploring the two data sets
- Preparation
    - Write an `explore_data()` function to make exploring the data easier.
        - Parameters
            - `dataset`: expected to be a list of lists
            - `start`, `end`: integers giving the start and end indices of the slice to print
            - `show_rows_and_columns`: a Boolean (default `False`); if `True`, also print the number of rows and columns

In [130]:
def explore_data(dataset, start, end, show_rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if show_rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns: ', len(dataset[0]))

- Opening the data sets
    - Open the two data sets and save each as a list of lists, keeping the header row separately

In [131]:
from csv import reader

# AppleStore dataset: read into a list of lists, then split off the header row
with open('AppleStore.csv', encoding='utf8') as opened_file:
    apple = list(reader(opened_file))
apple_header = apple[0]
apple = apple[1:]

# Google Play Store dataset: same procedure
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    google = list(reader(opened_file))
google_header = google[0]
google = google[1:]

print('apple header\n')
print(apple_header)
print('\n')
explore_data(apple, 0, 1, True)
print('google header\n')
print(google_header)
print('\n')
explore_data(google, 0, 1, True)


apple header

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


Number of rows: 7197
Number of columns:  17
google header

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns:  13


|index|column(AppleStore)|type|description|
|---|---|---|---|
|0 |""|integer|Row index (unnamed column)|
|1 |"id"|key|App ID|
|2 |"track_name"|string|App name|
|3 |"size_bytes"|integer|Size (in bytes)|
|4 |"currency"|string|Currency type|
|5 |"price"|float|Price amount|
|6 |"rating_count_tot"|integer|User rating count (for all versions)|
|7 |"rating_count_ver"|integer|User rating count (for current version)|
|8 |"user_rating"|float|Average user rating (for all versions)|
|9 |"user_rating_ver"|float|Average user rating (for current version)|
|10 |"ver"|string|Latest version code|
|11 |"cont_rating"|string|Content rating|
|12 |"prime_genre"|string|Primary genre|
|13 |"sup_devices.num"|integer|Number of supported devices|
|14 |"ipadSc_urls.num"|integer|Number of screenshots shown for display|
|15 |"lang.num"|integer|Number of supported languages|
|16 |"vpp_lic"|integer|VPP device-based licensing enabled|
 
|index|column(GooglePlayMarkets)|type|description|
|---|---|---|---|
|0 |App|string|Application name|
|1 |Category|string|Category the app belongs to|
|2 |Rating|float|Overall user rating of the app (at the time of scraping)|
|3 |Reviews|integer|Number of user reviews for the app (at the time of scraping)|
|4 |Size|string|Size of the app (at the time of scraping)|
|5 |Installs|string|Number of user downloads/installs for the app, e.g. "10,000+" (at the time of scraping)|
|6 |Type|string|Paid or Free|
|7 |Price|string|Price of the app (at the time of scraping)|
|8 |Content Rating|string|Age group the app is targeted at: Children / Mature 21+ / Adult|
|9 |Genres|string|An app can belong to multiple genres (apart from its main category). For example, a musical family game belongs to the Music, Game and Family genres.|
|10 |Last Updated|date|Date when the app was last updated on the Play Store (at the time of scraping)|
|11 |Current Ver|string|Current version of the app available on the Play Store (at the time of scraping)|
|12 |Android Ver|string|Minimum required Android version (at the time of scraping)|

### Data Cleaning  
- Detect inaccurate/duplicate data and correct or remove it  
    - Policy: the following rows are targeted for cleaning in this analysis
        - rows with missing data
        - duplicate rows (apps)
        - rows of non-free apps
        - rows of non-English apps
    - Procedure:  
        1. detect rows with missing data
        2. delete rows with missing data
        3. detect duplicate rows (apps)
            - create lists of unique apps and duplicate apps
        4. remove duplicate apps (not *randomly*)
            - criterion: keep only the row with the highest number of reviews (it is likely the most recent snapshot of the app)
            - create a dictionary whose keys are the unique app names and whose values are the highest number of reviews for that app
        5. detect rows of non-English apps
            - use `ord()` to count characters outside the ASCII range (0-127); treat a name as English if it contains at most three such characters
        6. delete rows of non-English apps
        7. detect rows of non-free apps
        8. delete rows of non-free apps
                

In [132]:
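# 1. detect and confirm the row with missing data (index 10472)
print(google[10472])
# 2. delete the incomplete row
# its Category field is missing, so every later column is shifted left by one
# (e.g. column 3, Reviews, holds '3.0M' instead of an integer);
# the length check keeps this cell safe to re-run
if len(google[10472]) != len(google_header):
    del google[10472]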

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
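The cell above hard-codes index 10472. As a sketch (not the method used in this notebook), a row with a missing field can also be located by comparing each row's length with the header's length. The helper name `find_malformed_rows` and the toy rows are hypothetical; on the real data, `find_malformed_rows(google, google_header)` should flag the row printed above, which has 12 fields against the 13 in the header.

```python
def find_malformed_rows(dataset, header):
    """Return the indices of rows whose length differs from the header's length."""
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# hypothetical toy data set with a three-column header
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'GAME', '4.1'],
    ['App B', '1.9'],          # Category is missing, so the row is one field short
    ['App C', 'TOOLS', '4.5'],
]
print(find_malformed_rows(rows, header))
```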


In [133]:
# 3. detect duplicate rows(apps) for google #
duplicate_apps = []
unique_apps = set()  # a set gives fast membership tests
for app in google:
    name = app[0]  # App name
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)
print('Number of duplicate apps(google): ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps(google): ', duplicate_apps[:2])
print('\n')
print('Expected length after remove duplicate apps(google): ', len(google) - len(duplicate_apps))

# apple: column 0 is only a row counter, so check the App ID (column 1) instead
duplicate_apps_apple = []
unique_apps_apple = set()
for app in apple:
    app_id = app[1]
    if app_id in unique_apps_apple:
        duplicate_apps_apple.append(app_id)
    else:
        unique_apps_apple.add(app_id)
print('Number of duplicate apps(apple): ', len(duplicate_apps_apple))
print('\n')
print('Examples of duplicate apps(apple): ', duplicate_apps_apple[:2])
print('\n')
print('Expected length after remove duplicate apps(apple): ', len(apple) - len(duplicate_apps_apple))

Number of duplicate apps(google):  1181


Examples of duplicate apps(google):  ['Quick PDF Scanner + OCR FREE', 'Box']


Expected length after remove duplicate apps(google):  9659
Number of duplicate apps(apple):  0


Examples of duplicate apps(apple):  []


Expected length after remove duplicate apps(apple):  7197


In [134]:
# 4. remove duplicate rows(apps) for google #
# build a dictionary: app name -> highest number of reviews seen for that app
reviews_max = {}
for row in google:
    name = row[0]
    n_reviews = float(row[3])
    if (name not in reviews_max) or (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews  # first occurrence or a new maximum
# the length of the dictionary is the number of unique apps
print('duplicate-removed in a dictionary: ', len(reviews_max))

# keep only the row with the highest number of reviews for each app
google_clean = []  # new cleaned data set
already_added = set()  # names of apps already added (guards against ties)
for row in google:
    name = row[0]
    n_reviews = float(row[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        google_clean.append(row)
        already_added.add(name)
# confirm that the removal above has been done
explore_data(google_clean, 0, 3, True)

duplicate-removed in a dictionary:  9659
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns:  13
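Steps 3 and 4 above make two passes over the data: one to find each app's maximum review count and one to keep the matching row. A single-pass alternative, sketched here under the same rule (keep the first row with the highest review count per app name), stores the best row seen so far in a dictionary. The function name and the toy rows are hypothetical.

```python
def dedupe_keep_max(dataset, name_index, count_index):
    """Keep one row per name: the first row with the highest count in count_index."""
    best = {}  # name -> row with the highest count seen so far
    for row in dataset:
        name = row[name_index]
        if name not in best or float(row[count_index]) > float(best[name][count_index]):
            best[name] = row  # first occurrence, or a strictly higher count
    return list(best.values())


# hypothetical toy rows: [app name, number of reviews]
rows = [
    ['App A', '100'],
    ['App B', '80'],
    ['App A', '150'],  # a later snapshot of App A with more reviews
]
print(dedupe_keep_max(rows, 0, 1))
```

One design difference: the result lists apps in the order in which they first appear, whereas `google_clean` places each app where its highest-review row occurs, so the row order can differ even though the same rows are kept.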


In [135]:
# 5. detect non-English apps rows

def det_eng(string):
    '''
    Return True if the string looks English, False otherwise.
    A string counts as non-English when it contains more than three
    characters outside the ASCII range (code points above 127).
    '''
    char_count = 0
    for char in string:
        if ord(char) > 127:
            char_count += 1
    return char_count <= 3

# function test
# print(det_eng('Instagram'))                        # True
# print(det_eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
# print(det_eng('Docs To Go™ Free Office Suite'))    # True (one non-ASCII character)
# print(det_eng('Instachat 😜'))                     # True (one non-ASCII character)

# 6. delete non-English apps rows (Google: from the de-duplicated rows; Apple: from all rows)
google_eng = []
for row in google_clean:
    name = row[0]
    if det_eng(name):
        google_eng.append(row)# English
apple_eng = []
for row in apple:
    name = row[2]
    if det_eng(name):
        apple_eng.append(row)# English
print('English apps')
print('Google')
explore_data(google_eng, 0, 1, True)
print('\n')
print('apple')
explore_data(apple_eng, 0, 1, True)

# 7.-8. detect and delete non-free apps rows (keep only apps whose price is '0')
google_final = []
for row in google_eng:
    price = row[7]  # Price
    if price == '0':
        google_final.append(row)
apple_final = []
for row in apple_eng:
    price = row[5]  # price
    if price == '0':
        apple_final.append(row)
print('*****************')
print('free apps')
print('Google')
explore_data(google_final, 0, 1, True)
print('\n')
print('apple')
explore_data(apple_final, 0, 1, True)


English apps
Google
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 9614
Number of columns:  13


apple
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


Number of rows: 6183
Number of columns:  17
*****************
free apps
Google
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 8864
Number of columns:  13


apple
['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 3222
Number of columns:  17


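The `det_eng()` function above tolerates up to three non-ASCII characters (such as emoji or symbols like ™) per app name, so that English apps whose names merely contain a few special characters are not discarded.  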
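To see what the three-character tolerance buys, the sketch below compares a strict rule (no non-ASCII characters at all) with the tolerant rule used by `det_eng()`, on the example names from the function-test comments. The helper names are hypothetical; `is_english(name)` with the default threshold mirrors `det_eng(name)`.

```python
def non_ascii_count(text):
    """Number of characters whose code point is above 127 (outside ASCII)."""
    return sum(1 for char in text if ord(char) > 127)


def is_english(text, max_non_ascii=3):
    """True if `text` has at most `max_non_ascii` non-ASCII characters."""
    return non_ascii_count(text) <= max_non_ascii


names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in names:
    strict = is_english(name, max_non_ascii=0)
    tolerant = is_english(name)
    print(f'{name} | strict: {strict} | tolerant: {tolerant}')
```

Under the strict rule, 'Docs To Go™ Free Office Suite' and 'Instachat 😜' would be dropped even though they are English; the tolerant rule keeps them while still rejecting the Chinese title. The trade-off is that a non-English name containing only a few non-ASCII characters can slip through.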

### *Output so far*  
|Items |Google|Apple|
|---|---|---|
|rows after removing incomplete data|10840|7197|
|unique apps|9659|7197|
|English apps|9614|6183|
|free English apps (final)|8864|3222|

### Analysis
- Aim:
    - to determine the kinds of **apps that are likely to attract more users**
        - our revenue depends heavily on the number of users
- Validation strategy:
    1. build a minimal Android app and publish it on Google Play
    2. if the app gets a good response on Google Play, develop it further
    3. if the app is profitable after **6 months**, also build an iOS version and publish it on the App Store
- **Goal of this analysis**
    - to find an app profile that can succeed on both Google Play and the App Store
- To do:
    - find app profiles that are successful on both markets
        - a profile that works well on both markets is the most promising candidate
- Analytical points of view (which columns are useful for this goal?)
    - most common genres
        - columns of interest
            - Google: `Genres`, `Category`
            - Apple: `prime_genre`
                - ---> build a frequency table of genre shares for each market
    - most popular genres
        - columns of interest
            - Google: `Installs`
            - Apple: `rating_count_tot` (a proxy for installs, because the Apple data set has no installs column)
                - ---> calculate the average number of user ratings per app genre


In [136]:
'''
    function for generating column frequency table that show percentages in the markets
'''
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    table_per = {}
    for key in table:
        per = (table[key] / total) * 100
        table_per[key] = per
    return table_per
'''
    function for having decscending order on the information
        dictionary returns keys only
        in order to making descending order, just assign values first, keys second
        
'''
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for ind in table_sorted:
        print(ind[1], ':', ind[0])

display_table(google_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 
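For comparison, the same percentage table can be built more compactly with `collections.Counter` from the standard library. This is an alternative sketch, not the notebook's own helper; the function names and toy rows are hypothetical, and on the real data the call would be `display_table_counter(google_final, 1)`.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage share of each value in column `index` (same values as freq_table)."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def display_table_counter(dataset, index):
    """Print each value and its share, largest share first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    for value, count in counts.most_common():
        print(value, ':', count / total * 100)


# hypothetical toy rows: column 0 holds a category
rows = [['GAME'], ['TOOLS'], ['GAME'], ['FAMILY']]
display_table_counter(rows, 0)
```

One difference: `most_common()` keeps tied values in the order they were first seen, while `display_table()` breaks ties by sorting the keys in reverse alphabetical order.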

In [137]:
display_table(google_final, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [138]:
display_table(apple_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


- Points on the genre analysis
    - Google
        - no single genre dominates the market
            - the landscape is more balanced between apps for fun and practical apps
        - **FAMILY, GAME and TOOLS** have the largest shares
        - there are more practical apps than in the Apple market
    - Apple
        - more than half (about 58%) of the free English apps are **games**
        - **Games, Entertainment, and Photo & Video** together make up about 71% of the apps
        - the Apple market is dominated by **apps for fun**
        - practical apps are rarer than apps for fun
        - a large supply of a given kind of app does not necessarily mean large user demand; this analysis cannot yet tell whether supply matches demand

In [139]:
'''
    calculate the average number of user ratings per app genre in the Apple data set
        isolate the apps of each genre
        sum up the number of user ratings (rating_count_tot) for the apps of that genre
        divide the sum by the number of apps in that genre (not the total number of apps)
'''
avg_display_apple = []
genres_apple = freq_table(apple_final, -5)  # only the keys (genres) are used
for genre in genres_apple:
    total = 0  # sum of user rating counts for this genre
    num_genre = 0  # number of apps in this genre
    for row in apple_final:
        genre_app = row[-5]  # prime_genre
        if genre_app == genre:
            total += float(row[6])  # rating_count_tot
            num_genre += 1
    avg_ratings_apple = total / num_genre
    key_val = (avg_ratings_apple, genre)
    avg_display_apple.append(key_val)
avg_display_apple_sorted = sorted(avg_display_apple, reverse=True)
for row in avg_display_apple_sorted:
    print(row[1], ':', row[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


- Impression of the averages on the App Store
    - Navigation apps have the highest average number of user ratings
    - this may be driven by a few very large apps such as Google Maps
    - **below, check which apps make up the genres with the highest averages**

In [140]:
sort_list = []
for row in apple_final:
    if row[-5] == 'Navigation':
        key_value = (int(row[6]), row[2])
        sort_list.append(key_value)
sorted_sort = sorted(sort_list, reverse = True)
for row in sorted_sort:
    print(row[0], ':', row[1])

345046 : Waze - GPS Navigation, Maps & Real-time Traffic
154911 : Google Maps - Navigation & Transit
12811 : Geocaching®
3582 : CoPilot GPS – Car Navigation & Offline Maps
187 : ImmobilienScout24: Real Estate Search in Germany
5 : Railway Route Search


In [141]:
sort_list = []
for row in apple_final:
    if row[-5] == 'Reference':
        key_value = (int(row[6]), row[2])
        sort_list.append(key_value)
sorted_sort = sorted(sort_list, reverse = True)
for row in sorted_sort:
    print(row[0], ':', row[1])

985920 : Bible
200047 : Dictionary.com Dictionary & Thesaurus
54175 : Dictionary.com Dictionary & Thesaurus for iPad
26786 : Google Translate
18418 : Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran
17588 : New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition
16849 : Merriam-Webster Dictionary
12122 : Night Sky
8535 : City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)
4693 : LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools
1497 : GUNS MODS for Minecraft PC Edition - Mods Tools
826 : Guides for Pokémon GO - Pokemon GO News and Cheats
762 : WWDC
718 : Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free
14 : VPN Express
8 : Real Bike Traffic Rider Virtual Reality Glasses
0 : 教えて!goo
0 : Jishokun-Japanese English Dictionary & Translator


In [142]:
sort_list = []
for row in apple_final:
    if row[-5] == 'Social Networking':
        key_value = (int(row[6]), row[2])
        sort_list.append(key_value)
sorted_sort = sorted(sort_list, reverse = True)
for row in sorted_sort[:40]:  # print the top 40 apps only
    print(row[0], ':', row[1])

2974676 : Facebook
1061624 : Pinterest
373519 : Skype for iPhone
351466 : Messenger
334293 : Tumblr
287589 : WhatsApp Messenger
260965 : Kik
177501 : ooVoo – Free Video Call, Text and Voice
164963 : TextNow - Unlimited Text + Calls
164249 : Viber Messenger – Text & Call
112778 : Followers - Social Analytics For Instagram
97072 : MeetMe - Chat and Meet New People
90414 : We Heart It - Fashion, wallpapers, quotes, tattoos
85535 : InsTrack for Instagram - Analytics Plus More
75412 : Tango - Free Video Call, Voice and Chat
71856 : LinkedIn
60659 : Match™ - #1 Dating App.
60163 : Skype for iPad
52642 : POF - Best Dating App for Conversations
49510 : Timehop
43877 : Find My Family, Friends & iPhone - Life360 Locator
39819 : Whisper - Share, Express, Meet
36404 : Hangouts
34677 : LINE PLAY - Your Avatar World
34584 : WeChat
34428 : Badoo - Meet New People, Chat, Socialize.
28633 : Followers + for Instagram - Follower Analytics
28260 : GroupMe
27662 : Marco Polo Video Walkie Talkie
23965 : Mii
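The three cells above repeat the same filter-sort-print pattern for different genres. A hypothetical helper such as `ranked_apps` could replace them; on the real data, `ranked_apps(apple_final, 'Social Networking', -5, 2, 6, limit=40)` would mirror the last cell (column -5 is `prime_genre`, 2 is `track_name`, 6 is `rating_count_tot`). The toy rows below are made up for illustration.

```python
def ranked_apps(dataset, genre, genre_index, name_index, count_index, limit=None):
    """Return (count, name) pairs for the apps of one genre, largest count first.

    limit=None returns every app; an integer keeps only the top `limit`.
    """
    pairs = [
        (int(row[count_index]), row[name_index])
        for row in dataset
        if row[genre_index] == genre
    ]
    return sorted(pairs, reverse=True)[:limit]


# hypothetical toy rows: [genre, app name, rating count]
rows = [
    ['Navigation', 'App A', '5'],
    ['Navigation', 'App B', '12811'],
    ['Reference', 'App C', '100'],
]
for count, name in ranked_apps(rows, 'Navigation', 0, 1, 2):
    print(count, ':', name)
```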

- Impression of the popular genres
    - Navigation and Social Networking look more popular than they really are
        - their average number of ratings is heavily influenced by a few giant apps
        - the other, smaller apps compete for a small share of the market
    - Reference is also less popular than its average suggests
        - but it looks more attractive than Navigation and Social Networking
            - fewer giant apps dominate it than the genres above
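The observation that a few giant apps inflate the genre averages can be checked by comparing the mean with the median, which is much less sensitive to outliers. This sketch adds the median as an extra check (it is not part of the original analysis), uses Python's `statistics` module, and takes the six Navigation rating counts listed above as a worked example. The helper names are hypothetical; on the real data the call would be `mean_and_median(apple_final, -5, 6)`.

```python
from statistics import mean, median


def ratings_by_genre(dataset, genre_index, count_index):
    """Group the rating counts of the apps by genre: {genre: [counts, ...]}."""
    groups = {}
    for row in dataset:
        groups.setdefault(row[genre_index], []).append(float(row[count_index]))
    return groups


def mean_and_median(dataset, genre_index, count_index):
    """Return {genre: (mean, median)} of the rating counts in each genre."""
    groups = ratings_by_genre(dataset, genre_index, count_index)
    return {genre: (mean(counts), median(counts)) for genre, counts in groups.items()}


# the six Navigation apps listed above, as [genre, rating_count_tot] rows
rows = [['Navigation', n] for n in ['345046', '154911', '12811', '3582', '187', '5']]
print(mean_and_median(rows, 0, 1))
```

The mean is about 86,090 (the value in the genre table above), while the median is only (3582 + 12811) / 2 = 8,196.5, which confirms that Waze and Google Maps carry most of the genre's average.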

In [143]:
'''
    frequency table of the Installs column in the Google data set
'''
display_table(google_final, 5) # the Installs column

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


- Impression of Google Play installs
    - Which categories attract the most users?
    - Treat each install figure as its lower bound (e.g. `100,000+` as 100,000) and compute the average number of installs per category
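The conversion used in the next cell (strip the commas and the trailing `+`, then convert to an integer) can be sketched as a small function. The name `parse_installs` is hypothetical. Because every `Installs` value is an open-ended bin, the parsed number is only a lower bound, so the per-category averages understate the true install counts.

```python
def parse_installs(value):
    """Convert an Installs string such as '1,000,000+' to its lower bound as an int."""
    return int(value.replace(',', '').replace('+', ''))


for raw in ['1,000,000+', '500+', '0']:
    print(raw, '->', parse_installs(raw))
```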

In [146]:
categories_google = freq_table(google_final, 1) # category
sort_list = []
for category in categories_google:
    total = 0
    num_category = 0
    for row in google_final:
        category_of_app = row[1]
        if category_of_app == category:
            number = row[5]
            number = number.replace(',', '')
            number = number.replace('+', '')
            total += int(number)
            num_category += 1
    avg_number = total / num_category
    key_val = (avg_number, category)
    sort_list.append(key_val)
sorted_sort = sorted(sort_list, reverse = True)
for row in sorted_sort:
    print(row[0], ':', row[1])

38456119.167247385 : COMMUNICATION
24727872.452830188 : VIDEO_PLAYERS
23253652.127118643 : SOCIAL
17840110.40229885 : PHOTOGRAPHY
16787331.344927534 : PRODUCTIVITY
15588015.603248259 : GAME
13984077.710144928 : TRAVEL_AND_LOCAL
11640705.88235294 : ENTERTAINMENT
10801391.298666667 : TOOLS
9549178.467741935 : NEWS_AND_MAGAZINES
8767811.894736841 : BOOKS_AND_REFERENCE
7036877.311557789 : SHOPPING
5201482.6122448975 : PERSONALIZATION
5074486.197183099 : WEATHER
4188821.9853479853 : HEALTH_AND_FITNESS
4056941.7741935486 : MAPS_AND_NAVIGATION
3695641.8198090694 : FAMILY
3638640.1428571427 : SPORTS
1986335.0877192982 : ART_AND_DESIGN
1924897.7363636363 : FOOD_AND_DRINK
1833495.145631068 : EDUCATION
1712290.1474201474 : BUSINESS
1437816.2687861272 : LIFESTYLE
1387692.475609756 : FINANCE
1331540.5616438356 : HOUSE_AND_HOME
854028.8303030303 : DATING
817657.2727272727 : COMICS
647317.8170731707 : AUTO_AND_VEHICLES
638503.734939759 : LIBRARIES_AND_DEMO
542603.6206896552 : PARENTING
513151.8867924

- Impression of the number of installs on Google Play
    - on average, COMMUNICATION apps have the most installs
        - but this average is skewed: a few giant apps have a huge number of installs
        - the remaining apps have far fewer installs than the giant apps

In [152]:
for row in google_final:
    if row[1] == 'COMMUNICATION' and (row[5] == '1,000,000,000+'
                                or row[5] == '500,000,000+'
                                or row[5] == '100,000,000+'):
        print(row[0], ':', row[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [154]:
# average installs of COMMUNICATION apps with fewer than 100 million installs
under_100m = []
for row in google_final:
    number = row[5]
    number = number.replace(',', '')
    number = number.replace('+', '')
    if (row[1] == 'COMMUNICATION') and (float(number) < 100000000):
        under_100m.append(float(number))
sum(under_100m) / len(under_100m)

3603485.3884615386
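The cell above hard-codes the COMMUNICATION category and the 100 million cap. A hypothetical helper that takes both as parameters makes it easy to repeat the check for other categories (for example BOOKS_AND_REFERENCE). Like the cell above, it treats each `Installs` value as its lower bound, so `average_installs_below(google_final, 'COMMUNICATION', 100_000_000)` should reproduce the figure above. The toy rows in the example are made up.

```python
def average_installs_below(dataset, category, cap, category_index=1, installs_index=5):
    """Average installs (lower bounds) of apps in `category` with fewer than `cap` installs."""
    values = []
    for row in dataset:
        installs = int(row[installs_index].replace(',', '').replace('+', ''))
        if row[category_index] == category and installs < cap:
            values.append(installs)
    return sum(values) / len(values) if values else 0.0


# hypothetical toy rows laid out like google_final (Category at 1, Installs at 5)
rows = [
    ['App A', 'COMMUNICATION', '4.0', '10', '1M', '1,000,000+'],
    ['App B', 'COMMUNICATION', '4.0', '10', '1M', '1,000,000,000+'],
    ['App C', 'COMMUNICATION', '4.0', '10', '1M', '5,000,000+'],
    ['App D', 'TOOLS', '4.0', '10', '1M', '10,000+'],
]
print(average_installs_below(rows, 'COMMUNICATION', 100_000_000))
```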

In [155]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

- Impression of the BOOKS_AND_REFERENCE category
    - it contains a wide variety of apps
    - how are installs distributed within the category?

In [156]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


In [157]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.  
  
  
We also notice quite a few apps built around the Quran, which suggests that an app built around a popular book can attract many users. Taking a popular book (perhaps a more recent one) and turning it into an app could therefore be profitable on both Google Play and the App Store.
  
  
However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

**This suggests that the category still has room for a new app.**

- Impression of the number of installs for Google Play apps
    - the market is dominated by a few giant apps  
    
### Conclusions
In this project, I analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

I concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.