## Profitable App Profiles for the App Store and Google Play Markets

Our goal is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to help our team of developers make data-driven decisions about the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.



In [18]:
from csv import reader

# AppleStore.csv
opened_AppleFile = open('AppleStore.csv', encoding='utf8')
read_AppleFile = reader(opened_AppleFile)
large_AppleFile = list(read_AppleFile)
opened_AppleFile.close()
Aheader = large_AppleFile[0]    # Aheader - AppleStore.csv header row
Adata_set = large_AppleFile[1:] # Adata_set - AppleStore.csv data rows without the header

# googleplaystore.csv
opened_PlayFile = open('googleplaystore.csv', encoding='utf8')
read_PlayFile = reader(opened_PlayFile)
large_PlayFile = list(read_PlayFile)
opened_PlayFile.close()
Gheader = large_PlayFile[0]     # Gheader - googleplaystore.csv header row
Gdata_set = large_PlayFile[1:]  # Gdata_set - googleplaystore.csv data rows without the header

In [19]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [20]:
explore_data(Adata_set, 0, 3) ## - Exploring first 3 rows of AppleStore dataset using our newly made function

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']




In [21]:
explore_data(Gdata_set, 0,3) ## - Exploring first 3 rows of googleplaystore dataset using our newly made function

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




In [22]:
print(Gheader)
print(' ')
print(Aheader)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
 
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


***The columns most relevant to our analysis are 'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Price', 'Genres', and 'Type' in googleplaystore.csv, and 'track_name', 'price', 'rating_count_tot', 'prime_genre', 'user_rating_ver', and 'user_rating' in AppleStore.csv.***
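Since the rest of the notebook refers to these columns by position, one option is to look the positions up from the header rows instead of counting by hand. The sketch below copies the two headers from the printed output above; `column_indexes`, `google_cols`, and `apple_cols` are hypothetical names introduced only for this illustration.

```python
# Header rows copied from the printed output above.
google_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
                 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
                 'Android Ver']
apple_header = ['', 'id', 'track_name', 'size_bytes', 'currency', 'price',
                'rating_count_tot', 'rating_count_ver', 'user_rating',
                'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
                'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


def column_indexes(header, names):
    """Map each requested column name to its position in the header row."""
    return {name: header.index(name) for name in names}


google_cols = column_indexes(
    google_header, ['App', 'Category', 'Reviews', 'Installs', 'Price', 'Genres'])
apple_cols = column_indexes(
    apple_header, ['track_name', 'price', 'rating_count_tot', 'prime_genre'])

print(google_cols)
print(apple_cols)
```

Reading `row[apple_cols['track_name']]` instead of a bare number makes it harder to pick a neighbouring column by mistake.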

In [23]:
print(Gdata_set[0],'\n')

print(len(Gheader))

for row in Gdata_set[1:]:
    if len(row) != len(Gheader):
        print(row)
        print("\n")
        print("Index position is:", Gdata_set.index(row))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

13
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Index position is: 10472
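The length check above can also be sketched with `enumerate`, which reports each row's own position. `list.index(row)` returns the position of the first row that compares equal, so it can point at the wrong entry when the data contains exact duplicates. The function name and the toy rows below are hypothetical.

```python
def find_ragged_rows(header, rows):
    """Return (index, row) pairs for rows whose length differs from the header."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data for illustration only; the second row is missing its category.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['A', 'TOOLS', '4.1'],
    ['B', '1.9'],
    ['C', 'GAME', '3.5'],
]

for index, row in find_ragged_rows(sample_header, sample_rows):
    print(index, row)
```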


In [24]:
del Gdata_set[10472] # Deleting the row that is missing a category 

## We are starting to clean the Data Set now

In [25]:
duplicate_apps = []
unique_apps = []

for apps in Gdata_set:
    name = apps[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name) 
    
print('Number of Unique Apps', len(unique_apps))
print(' ')
print('Some Duplicate Apps are', duplicate_apps[:3]) 

Number of Unique Apps 9659
 
Some Duplicate Apps are ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business']


In [26]:
for apps in Gdata_set:
    name = apps[0]
    if name == 'Box': 
        print(apps) 

['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


***We will not delete duplicates at random. The most important part of the data is the ratings, so for each app we keep only the row with the highest number of reviews and delete the rest. When several rows tie on review count, we keep one of them arbitrarily.***
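As a rough sketch of this rule, the dictionary below maps each app name to the best row seen so far, so the data is scanned only once. The helper name and the toy rows are hypothetical, and ties are resolved by keeping the first row encountered, which is one arbitrary choice among several.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row seen when review counts tie.
        if name not in best or reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Toy rows for illustration: name, category, rating, reviews.
sample = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Chat', 'SOCIAL', '4.0', '10'],
    ['Chat', 'SOCIAL', '4.1', '25'],
]

cleaned = keep_most_reviewed(sample)
for row in cleaned:
    print(row)
```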

In [29]:
reviews_max = {}

for app in Gdata_set:
    name = app[0]
    n_reviews = float(app[3]) 
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max: 
        reviews_max[name] = n_reviews
        
print(len(reviews_max)) # checking to make sure we successfully have the same amount of Unique Apps as our previous list 
        
    

9659


In [32]:
android_clean = [] # - will store our newly cleaned data set
already_added = [] # - will only store in app names 

for app in Gdata_set:
    name = app[0] 
    n_reviews = float(app[3]) 
    if n_reviews == reviews_max[name] and name not in already_added: 
        android_clean.append(app)
        already_added.append(name)
        
explore_data(android_clean, 0, 3) # - MAKING SURE WE ADDED VALUES CORRECTLY


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']




***Now that the duplicates are removed, we will keep only apps with English names, since our developers will only be building English-language apps.***

In [43]:
def english(doge):
    string = doge
    for character in string:
        value = ord(character)
        if value > 127:  # any non-ASCII character marks the name as non-English
            return False
    return True  # reached only when every character is ASCII

english('Instagram')

True

In [44]:
english('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [49]:
english('Instachat 😜')

False

In [46]:
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Instagram')) 

False
True


***Many English app names contain a symbol or an emoji, so using this filter alone would throw out a lot of useful data. We will refine the function so that a name is rejected only when it has more than three non-ASCII characters.***

In [47]:
def english(doge):
    string = doge
    no_ASCII = 0 
    for character in string:
        if ord(character) > 127:
            no_ASCII += 1
            
    if no_ASCII > 3:
        return False
    
    return True

english('Instagram')

True

In [48]:
print(english('Instachat 😜'))

True


In [52]:
english('Docs To Go™™™™™™™™™™™ Free Office Suite')

False

In [53]:
english('Docs To Go™ Free Office Suite')

True

In [59]:
english_android_apps = []
english_apple_apps = [] 

for app in android_clean:
    name = app[0]
    if_english = english(name)
    if if_english == True:
        english_android_apps.append(app)
        
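for apps in Adata_set:
    # NOTE: index 3 is size_bytes (digits only), so this check passes every row;
    # track_name is at index 2. The counts printed below were produced with index 3.
    name = apps[3]
    if_english = english(name)
    if if_english == True:
        english_apple_apps.append(apps) 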
        
explore_data(english_android_apps, 0, 3) 
explore_data(english_apple_apps, 0, 3) 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'W

In [58]:
print(len(english_android_apps))
print(len(english_apple_apps))

9614
7197


## How have we cleaned the data set?
* We removed inaccurate data (the row with a missing category)
* We removed duplicate entries
* We filtered for apps with English names

Next, we isolate the free apps.
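A minimal sketch of a price check that compares numbers rather than strings. The rows printed so far only show prices such as `'0'` and `'3.99'`; support for a leading `$` and for `'0.0'` is an assumption about formats that may appear elsewhere in the files, and `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    cleaned = price_text.strip().lstrip('$')  # assumed: some prices carry a '$' prefix
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number is treated as "not free".
        return False


for example in ['0', '0.0', '3.99', '$4.99', 'Everyone']:
    print(repr(example), is_free(example))
```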

In [61]:
print(Gheader)
print(' ')
print(Aheader)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
 
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [81]:
free_play_apps = []
free_ios_apps = []

for rows in english_android_apps:
    price = rows[7]
    if price == '0':
        free_play_apps.append(rows)
        
for inputs in english_apple_apps:
    price = inputs[5]
    if price == '0':
        free_ios_apps.append(inputs) 
        
explore_data(free_play_apps, 0, 5) 
print(' ')
explore_data(free_ios_apps, 0, 5) 
    

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


 
['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', 

## Isolating our final usable data
We have 8864 free apps from Google Play and 4056 free apps from the App Store. Next, we use frequency tables to find the most common app types in each store, so that we can look for an app profile that works on both platforms.

In [83]:
print(len(free_play_apps))
print(len(free_ios_apps))

8864
4056


## Most Common Apps by Genre

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.



In [85]:
unique_genres = [] 
all_entry = []

for app in free_ios_apps:
    prime_genre = app[-5] 
    if prime_genre not in unique_genres:
        unique_genres.append(prime_genre)
    else:
        all_entry.append(prime_genre)
        
print(unique_genres) 

['Productivity', 'Weather', 'Shopping', 'Reference', 'Finance', 'Music', 'Utilities', 'Travel', 'Social Networking', 'Sports', 'Health & Fitness', 'Games', 'Food & Drink', 'News', 'Book', 'Photo & Video', 'Entertainment', 'Business', 'Lifestyle', 'Education', 'Navigation', 'Medical', 'Catalogs']


In [87]:
unique_pgenres = [] 
all_pentry = []

for app in free_play_apps:
    Genres = app[-4] 
    if Genres not in unique_pgenres:
        unique_pgenres.append(Genres)
    else:
        all_pentry.append(Genres)
        
print(unique_pgenres) 

['Art & Design', 'Art & Design;Creativity', 'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business', 'Comics', 'Comics;Creativity', 'Communication', 'Dating', 'Education', 'Education;Creativity', 'Education;Education', 'Education;Pretend Play', 'Education;Brain Games', 'Entertainment', 'Entertainment;Brain Games', 'Entertainment;Creativity', 'Entertainment;Music & Video', 'Events', 'Finance', 'Food & Drink', 'Health & Fitness', 'House & Home', 'Libraries & Demo', 'Lifestyle', 'Lifestyle;Pretend Play', 'Card', 'Arcade', 'Puzzle', 'Racing', 'Sports', 'Casual', 'Simulation', 'Adventure', 'Trivia', 'Action', 'Word', 'Role Playing', 'Strategy', 'Board', 'Music', 'Action;Action & Adventure', 'Casual;Brain Games', 'Educational;Creativity', 'Puzzle;Brain Games', 'Educational;Education', 'Casual;Pretend Play', 'Educational;Brain Games', 'Art & Design;Pretend Play', 'Educational;Pretend Play', 'Entertainment;Education', 'Casual;Education', 'Casual;Creativity', 'Casual;Action & Adventure', '

We now have the individual genres. My first idea was to count each one with a long `elif` chain, but that would be inefficient both to write and to run, so instead we will write a function that takes one column of the data set and builds a frequency table from it.
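For comparison, the same idea can be sketched with `collections.Counter` from the standard library, which does the counting and also returns entries ordered from most to least common. `freq_table_pct` and the toy rows are hypothetical and are not used elsewhere in this notebook.

```python
from collections import Counter


def freq_table_pct(rows, index):
    """Return {value: percentage of rows}, most common value first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.most_common()}


# Toy rows for illustration: name, genre.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_pct(sample, 1)

for genre, pct in table.items():
    print(genre, ':', pct)
```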

In [107]:
def freq_table(dataset, index): 
    list_index = {}
    total = 0
    for row in dataset: 
        total += 1
        category = row[index] 
        if category in list_index:
            list_index[category] += 1
        else:
            list_index[category] = 1
            
    table_percentages = {}  # created once, after the counting loop
    for key in list_index:
        percentage = (list_index[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages
       
print(freq_table(free_ios_apps, -4)) # - Testing the new function, freq_table to make sure that it does accurately create the table. 

{'37': 48.717948717948715, '38': 29.610453648915186, '43': 1.9723865877712032, '12': 0.02465483234714004, '47': 0.04930966469428008, '24': 2.366863905325444, '40': 15.631163708086785, '26': 0.17258382642998027, '39': 0.22189349112426035, '25': 0.616370808678501, '23': 0.02465483234714004, '36': 0.04930966469428008, '11': 0.04930966469428008, '35': 0.34516765285996054, '15': 0.04930966469428008, '16': 0.04930966469428008, '9': 0.02465483234714004, '13': 0.02465483234714004}


## Creating Frequency Tables

In [108]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
display_table(free_ios_apps, -5) # - prime_genres on Apple App Store

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


For `prime_genre`, Games is the overwhelming favorite at about 56%. Entertainment and Photo & Video follow with about 8% and 4% respectively, and Social Networking and Education together account for roughly another 7%.

Roughly 70% of the free apps in our App Store data are built mainly for fun and entertainment (Games, Entertainment, Photo & Video, Social Networking), while more practical and functional apps are rare. The most common genres are not necessarily the most *used*, so later we will look into that metric as well.

In [109]:
display_table(free_play_apps, -4) # - Genres on Google Play Store

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [111]:
display_table(free_play_apps, 1) # - Category on Google Play Store

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

For now, we will move forward using only the Google Play Store's `Category` column. The `Genres` column is too granular for our purpose, and it does not lead to any conclusion that is meaningful for our problem.

`GAME` is the second most common category on the Play Store, and `FAMILY` is the first. For an app profile, we could merge the two into a family-friendly game that several people can play together. It is still too early to decide, though, because we haven't aggregated installation data yet and so don't know which categories are most popular with users.

In [133]:
genres = freq_table(free_ios_apps, -5) # prime_genre

for genre in genres:
    total = 0
    len_genre = 0 
    
    for app in free_ios_apps:
        apps = app[-5]
        if apps == genre:
            ratings_tot = float(app[6])
            total += ratings_tot
            len_genre += 1 
            
            
    average = (total / len_genre)
    print(genre, ':', average)

Productivity : 19053.887096774193
Weather : 47220.93548387097
Shopping : 18746.677685950413
Reference : 67447.9
Finance : 13522.261904761905
Music : 56482.02985074627
Utilities : 14010.100917431193
Travel : 20216.01785714286
Social Networking : 53078.195804195806
Sports : 20128.974683544304
Health & Fitness : 19952.315789473683
Games : 18924.68896765618
Food & Drink : 20179.093023255813
News : 15892.724137931034
Book : 8498.333333333334
Photo & Video : 27249.892215568863
Entertainment : 10822.961077844311
Business : 6367.8
Lifestyle : 8978.308510638299
Education : 6266.333333333333
Navigation : 25972.05
Medical : 459.75
Catalogs : 1779.5555555555557
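
The per-genre averages above rescan the whole data set once for every genre. A single-pass version is sketched below, accumulating a running total and a count per group; `average_by_group` and the toy rows are hypothetical names and data used only for illustration.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the numeric column} using one pass over rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows for illustration: genre, rating count.
sample = [['Games', '100'], ['Games', '300'], ['Music', '50']]
averages = average_by_group(sample, 0, 1)

# Print from highest to lowest average.
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```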


## Apple's Best App Profile: Music / Social Networking
The App Store data has no install counts, so we use the total number of user ratings (`rating_count_tot`) as a proxy for downloads. Music apps average about 56k ratings each, one of the highest averages in the table above (only Reference is higher), which makes this profile a must-create. From what we know about the space, the two largest players are Spotify and Apple Music, and one of them ships as a default on many phones. Social Networking is our close second, since its downloads are spread more evenly across apps.

In [180]:
music_apps = {}
for row in free_ios_apps:
    ratings_count = float(row[6])
    name = row[2]
    if row[-5] == 'Music' and ratings_count > 500000:
        print(row[2], " : ", row[6])
    

Pandora - Music & Radio  :  1126879
Spotify Music  :  878563


The two free Music apps with more than 500,000 ratings in the App Store data are Pandora and Spotify. Let's run the same analysis on the popular social networks.

In [181]:
social_apps = {}
for row in free_ios_apps:
    ratings_count = float(row[6])
    name = row[2]
    if row[-5] == 'Social Networking' and ratings_count > 500000:
        print(row[2], " : ", row[6])

Facebook  :  2974676
Pinterest  :  1061624


Based on the data we have gathered for app profiles, Facebook is very large and takes a social, feed-based approach, but Pinterest is the more interesting model. One idea is a music app similar to SoundCloud, where smaller artists form a social network by posting short clips: a chorus, a catchy pop lyric, or a powerful bridge, paired with a looping snippet of a music video or a graphic. Listeners would press and hold to hear the snippet, and tap if they want to hear the whole song.

## Analyzing Google Play's Data

In [182]:
print(Gheader)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [183]:
display_table(free_play_apps, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


The `Installs` values are open-ended buckets such as `1,000,000+` rather than exact counts. Let's turn them into numbers so that we can compute an average per category.
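A minimal sketch of the conversion, assuming every value looks like the buckets in the table above (digits, optional commas, optional trailing `+`). `parse_installs` is a hypothetical helper, and the result is only a lower bound on the true install count.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to its lower bound as an int."""
    return int(text.replace(',', '').replace('+', ''))


for bucket in ['1,000,000+', '500+', '0+', '0']:
    print(bucket, '->', parse_installs(bucket))
```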

In [197]:
categories = freq_table(free_play_apps, 1) 

for category in categories:
    totals = 0 
    len_category = 0 
    for app in free_play_apps:
        category_app = app[1]
        if category_app == category:
            no_installs = app[5]
            no_installs = no_installs.replace('+', '')
            no_installs = no_installs.replace(',', '')
            totals += float(no_installs)
            len_category += 1
    average = totals / len_category
    if average > 20000000:
        print(category, ":", average)
    

COMMUNICATION : 38456119.167247385
SOCIAL : 23253652.127118643
VIDEO_PLAYERS : 24727872.452830188


Three categories average more than 20 million installs per app: communication, video players, and social. Let's take this a step further and look at the most-installed apps in each of them.

In [209]:
communication_apps = {}
for row in free_play_apps:
    installs = row[5]
    name = row[0]
    if row[1] == 'COMMUNICATION':
        installs = installs.replace('+','')
        installs = installs.replace(',','')
        installs = float(installs)
        if installs > 500000000:
            print(row[0], " : ", row[5])

WhatsApp Messenger  :  1,000,000,000+
Messenger – Text and Video Chat for Free  :  1,000,000,000+
Skype - free IM & video calls  :  1,000,000,000+
Google Chrome: Fast & Secure  :  1,000,000,000+
Gmail  :  1,000,000,000+
Hangouts  :  1,000,000,000+


In [210]:
social_apps = {}
for row in free_play_apps:
    installs = row[5]
    name = row[0]
    if row[1] == 'SOCIAL':
        installs = installs.replace('+','')
        installs = installs.replace(',','')
        installs = float(installs)
        if installs > 500000000:
            print(row[0], " : ", row[5])

Facebook  :  1,000,000,000+
Google+  :  1,000,000,000+
Instagram  :  1,000,000,000+


In [211]:
video_apps = {}
for row in free_play_apps:
    installs = row[5]
    name = row[0]
    if row[1] == 'VIDEO_PLAYERS':
        installs = installs.replace('+','')
        installs = installs.replace(',','')
        installs = float(installs)
        if installs > 500000000:
            print(row[0], " : ", row[5])

YouTube  :  1,000,000,000+
Google Play Movies & TV  :  1,000,000,000+
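
The three cells above repeat the same filter with a different category each time. One way to factor that out is sketched below; `apps_above` and the toy rows are hypothetical, and the default column positions mirror the Google Play layout (`App` at 0, `Category` at 1, `Installs` at 5).

```python
def apps_above(rows, category, min_installs,
               name_idx=0, category_idx=1, installs_idx=5):
    """Return (name, installs_text) for apps in a category above an install threshold."""
    matches = []
    for row in rows:
        if row[category_idx] != category:
            continue
        installs = int(row[installs_idx].replace(',', '').replace('+', ''))
        if installs > min_installs:
            matches.append((row[name_idx], row[installs_idx]))
    return matches


# Toy rows for illustration; unused columns are left empty.
sample_rows = [
    ['App A', 'VIDEO_PLAYERS', '', '', '', '1,000,000,000+'],
    ['App B', 'VIDEO_PLAYERS', '', '', '', '10,000+'],
    ['App C', 'SOCIAL', '', '', '', '1,000,000,000+'],
]

for category in ['VIDEO_PLAYERS', 'SOCIAL']:
    print(category, apps_above(sample_rows, category, 500_000_000))
```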


## Google's Best App Profile: Social

Of the top three categories, Social has the lowest average number of installs. At the same time, the Video Players and Communication averages are skewed by the Google apps that ship as defaults and dominate those lists.
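
One way to gauge how much a few giant apps skew a category is to recompute the average with the largest apps left out. The sketch below does this on toy data; `average_installs`, the 100 million cutoff, and the sample rows are all assumptions made for illustration, not results from the data set.

```python
def average_installs(rows, category, max_installs=None,
                     category_idx=1, installs_idx=5):
    """Average installs for a category, optionally ignoring apps at or above a cutoff."""
    values = []
    for row in rows:
        if row[category_idx] != category:
            continue
        installs = int(row[installs_idx].replace(',', '').replace('+', ''))
        if max_installs is None or installs < max_installs:
            values.append(installs)
    # None signals that no app matched, which avoids dividing by zero.
    return sum(values) / len(values) if values else None


# Toy rows for illustration; unused columns are left empty.
sample_rows = [
    ['Giant', 'SOCIAL', '', '', '', '1,000,000,000+'],
    ['Mid', 'SOCIAL', '', '', '', '1,000,000+'],
    ['Small', 'SOCIAL', '', '', '', '1,000+'],
]

print('All apps      :', average_installs(sample_rows, 'SOCIAL'))
print('Without giants:', average_installs(sample_rows, 'SOCIAL', max_installs=100_000_000))
```

If the two numbers differ by orders of magnitude, as they do for this toy data, the category average says more about its few dominant apps than about a typical app.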