# Project: Profitable App Profiles for the App Store and Google Play Markets

This project analyzes mobile app data from the App Store and Google Play to see which kinds of apps attract the most users.

The goal is to help developers understand what types of apps are likely to attract more users.

The work consists mainly of data exploration, cleaning, and an analysis of the app genres in both markets (Android and Apple).

In [1]:
# load packages
from csv import reader

In [2]:
# Open Apple app dataset
opened_apple = open('AppleStore.csv', encoding='utf8')
read_apple = reader(opened_apple)
data_apple = list(read_apple)

In [3]:
# Open Google app dataset
opened_gpl = open('googleplaystore.csv', encoding='utf8')
read_gpl = reader(opened_gpl)
data_gpl = list(read_gpl)
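
The two loading cells above follow the same pattern, so they could be folded into one helper. The following is a minimal sketch, assuming a hypothetical function name `load_csv` that is not part of the original notebook. It closes the file when done and returns the header separately from the data rows.

```python
import csv

def load_csv(path, encoding='utf8'):
    """Hypothetical helper: read a CSV file and return (header, rows)."""
    # newline='' is the mode the csv module documentation recommends for file objects
    with open(path, encoding=encoding, newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]

# Possible usage with the files named in this notebook:
# apple_header, apple_rows = load_csv('AppleStore.csv')
# gpl_header, gpl_rows = load_csv('googleplaystore.csv')
```

Keeping the header apart from the rows avoids having to remember the `[1:]` slice in every later loop.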

In [4]:
# Function for dataset exploration
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

## Data exploring

In [5]:
# Explore apple dataset
# check number of rows and cols
explore_data(data_apple, 1, 4, rows_and_columns = True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7198
Number of columns: 16


In [6]:
# Explore google dataset
# check number of rows and cols
explore_data(data_gpl, 1, 4, rows_and_columns = True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


In [7]:
# Get variable names - apple
explore_data(data_apple, 0, 1)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']




In [8]:
# Get variable names - google
explore_data(data_gpl, 0, 1)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']




## Incorrect observations 

In [9]:
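# Check the incorrect observation in the Google dataset:
# the row at index 10473 has no Category value, so its remaining
# values are shifted one column to the left (12 values instead of 13)
explore_data(data_gpl, 10471, 10475)
# (index 10473 counts the header row as index 0)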

['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']




In [10]:
data_gpl[10473] # this is the line
# delete the line
del data_gpl[10473]

In [11]:
# check
data_gpl[10472:10474]

[['Xposed Wi-Fi-Pwd',
  'PERSONALIZATION',
  '3.5',
  '1042',
  '404k',
  '100,000+',
  'Free',
  '0',
  'Everyone',
  'Personalization',
  'August 5, 2014',
  '3.0.0',
  '4.0.3 and up'],
 ['osmino Wi-Fi: free WiFi',
  'TOOLS',
  '4.2',
  '134203',
  '4.1M',
  '10,000,000+',
  'Free',
  '0',
  'Everyone',
  'Tools',
  'August 7, 2018',
  '6.06.14',
  '4.4 and up']]
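
The faulty row above has one value fewer than the header, which suggests a way to locate such rows without knowing the index in advance. This is a minimal sketch, assuming a hypothetical helper `find_malformed_rows` and a small made-up sample in place of the real dataset.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's length."""
    expected = len(dataset[0])  # the first row is assumed to be the header
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Tiny illustrative sample, not the real data
sample = [
    ['App', 'Category', 'Rating'],
    ['Jazz Wi-Fi', 'COMMUNICATION', '3.4'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],  # Category is missing
]

for index, row in find_malformed_rows(sample):
    print(index, row)
```

A length check only catches rows with missing or extra values. A row with the right number of columns but wrong content would still need a separate check.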

## Duplicate entries

In [12]:
# Explore: google ds has duplicate entries?
duplicated = []
not_duplicated = []

for entry in data_gpl:
    name = entry[0] # name is the first element
    if name in not_duplicated:
        duplicated.append(name)
    else:
        not_duplicated.append(name)
    
print('Duplicated:', len(duplicated))
print('Not duplicated:', len(not_duplicated))

# it has 1000+ duplicates
# (the loop also visits the header row, so it is counted among the unique names)
# one of the variables counts the number of reviews
# the greater number indicates the most recent data entry
# this will be used as the criterion for selecting an entry

Duplicated: 1181
Not duplicated: 9660
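
The membership test `name in not_duplicated` scans a list on every iteration, which gets slow as the list grows. As a hedged alternative, the same two counts can be obtained with `collections.Counter`. The function name `count_duplicates` and the sample rows are illustrative assumptions.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Return (number of repeated entries, number of unique names)."""
    counts = Counter(row[name_index] for row in rows)
    unique = len(counts)
    # every occurrence beyond the first one of a name is a duplicate entry
    repeated = sum(counts.values()) - unique
    return repeated, unique

sample = [['Instagram'], ['Slack'], ['Instagram'], ['Instagram']]
print(count_duplicates(sample))  # → (2, 2)
```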


In [13]:
# 1. Create dictionary
# key: unique app; value: highest num of reviews
# this way we can get rid of duplicates
reviews_max = {}

for i in data_gpl[1:]:
    name = i[0]
    n_reviews = float(i[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [14]:
reviews_max

{'Photo Editor & Candy Camera & Grid & ScrapBook': 159.0,
 'Coloring book moana': 974.0,
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps': 87510.0,
 'Sketch - Draw & Paint': 215644.0,
 'Pixel Draw - Number Art Coloring Book': 967.0,
 'Paper flowers instructions': 167.0,
 'Smoke Effect Photo Maker - Smoke Editor': 178.0,
 'Infinite Painter': 36815.0,
 'Garden Coloring Book': 13791.0,
 'Kids Paint Free - Drawing Fun': 121.0,
 'Text on Photo - Fonteee': 13880.0,
 'Name Art Photo Editor - Focus n Filters': 8788.0,
 'Tattoo Name On My Photo Editor': 44829.0,
 'Mandala Coloring Book': 4326.0,
 '3D Color Pixel by Number - Sandbox Art Coloring': 1518.0,
 'Learn To Draw Kawaii Characters': 55.0,
 'Photo Designer - Write your name with shapes': 3632.0,
 '350 Diy Room Decor Ideas': 27.0,
 'FlipaClip - Cartoon animation': 194216.0,
 'ibis Paint X': 224399.0,
 'Logo Maker - Small Business': 450.0,
 "Boys Photo Editor - Six Pack & Men's Suit": 654.0,
 'Superheroes Wallpapers | 4K Backgrounds': 

In [15]:
# check dictionary length
len(reviews_max)

9659

In [16]:
# 2. Use the dictionary to remove duplicate rows

android_clean = []
already_added = []

for i in data_gpl[1:]:
    name = i[0]
    n_reviews = float(i[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(i)
        already_added.append(name)
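
The two steps above (build `reviews_max`, then filter the rows) can also be done in a single pass by storing the best row per app name. This is a sketch under assumptions: `dedupe_by_max_reviews` is a hypothetical name and the sample rows are made up. The result is ordered by the first appearance of each name, which can differ from the order produced by the two-step version.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row that reaches the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the earlier row when two rows tie on review count
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [
    ['A', 'ART', '4.1', '10'],
    ['B', 'ART', '4.0', '5'],
    ['A', 'ART', '4.1', '12'],
]
print(dedupe_by_max_reviews(sample))
```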

In [17]:
# check the list without duplicates
# each app now appears once, with its full row of observations
android_clean

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up'],
 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps',
  'ART_AND_DESIGN',
  '4.7',
  '87510',
  '8.7M',
  '5,000,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'August 1, 2018',
  '1.2.4',
  '4.0.3 and up'],
 ['Sketch - Draw & Paint',
  'ART_AND_DESIGN',
  '4.5',
  '215644',
  '25M',
  '50,000,000+',
  'Free',
  '0',
  'Teen',
  'Art & Design',
  'June 8, 2018',
  'Varies with device',
  '4.2 and up'],
 ['Pixel Draw - Number Art Coloring Book',
  'ART_AND_DESIGN',
  '4.3',
  '967',
  '2.8M',
  '100,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design;Creativity',
  'June 20, 2018',
  '1.1',
  '4.4 and up'],
 ['Paper flowers instructions',
  'ART_AND_DESIGN',
  '4.4',
  '167',
  '5.6M',
  '50,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'March 26, 2017

In [18]:
# same length as the dictionary w/ reviews without duplicates
len(android_clean)

9659

In [19]:
already_added

['Photo Editor & Candy Camera & Grid & ScrapBook',
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps',
 'Sketch - Draw & Paint',
 'Pixel Draw - Number Art Coloring Book',
 'Paper flowers instructions',
 'Smoke Effect Photo Maker - Smoke Editor',
 'Infinite Painter',
 'Garden Coloring Book',
 'Kids Paint Free - Drawing Fun',
 'Text on Photo - Fonteee',
 'Name Art Photo Editor - Focus n Filters',
 'Tattoo Name On My Photo Editor',
 'Mandala Coloring Book',
 '3D Color Pixel by Number - Sandbox Art Coloring',
 'Learn To Draw Kawaii Characters',
 'Photo Designer - Write your name with shapes',
 '350 Diy Room Decor Ideas',
 'FlipaClip - Cartoon animation',
 'ibis Paint X',
 'Logo Maker - Small Business',
 "Boys Photo Editor - Six Pack & Men's Suit",
 'Superheroes Wallpapers | 4K Backgrounds',
 'HD Mickey Minnie Wallpapers',
 'Harley Quinn wallpapers HD',
 'Colorfit - Drawing & Coloring',
 'Animated Photo Editor',
 'Pencil Sketch Drawing',
 'Easy Realistic Drawing Tutorial',
 'Pink Silver 

## Separating non-English apps

Some apps have names that are not in English.
These entries will be removed.

In [20]:
# check some of those entries

print(data_apple[814][1])
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
中国語 AQリスニング
لعبة تقدر تربح DZ


In [21]:
print(ord('ت'))

1578


In [22]:
# Function that takes a string and returns a boolean:
# whether or not the string consists of common English characters
# (code points 0 - 127 according to ASCII)
# False = apps with non-standard English characters
# True = apps with standard English characters

# update: if the string has > 3 non-ASCII characters, return False

def check_string(string):
    non_asc = 0 
    
    for character in string:
        if ord(character) > 127:
            non_asc += 1
            
    if non_asc > 3:
        return False # non-eng
    else:
        return True

In [23]:
obj1 = 'Instagram'
obj2 = '爱奇艺PPS -《欢乐颂2》电视剧热播'
obj3 = 'Docs To Go™ Free Office Suite'
obj4 = 'Instachat 😜'

check_string(string = obj1)

True

In [24]:
# test after update
s1 = 'Docs To Go™ Free Office Suite'
s2 = 'Instachat 😜'
s3 = '爱奇艺PPS -《欢乐颂2》电视剧热播'

In [25]:
check_string(s1)

True

In [26]:
check_string(s2)

True

In [27]:
check_string(s3)

False
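
The threshold of three non-ASCII characters is hard-coded in `check_string`. A compact variant that exposes it as a parameter could look like the sketch below. The name `is_probably_english` and the parameter `max_non_ascii` are assumptions made for illustration.

```python
def is_probably_english(text, max_non_ascii=3):
    """Return True when the text has at most `max_non_ascii` characters outside ASCII."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Instagram'))                      # → True
print(is_probably_english('Docs To Go™ Free Office Suite'))  # → True
print(is_probably_english('Instachat 😜'))                    # → True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # → False
```

This is still a heuristic: an English name with four emoji would be rejected, and a short non-English name with three or fewer non-ASCII characters would pass.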

In [28]:
# Now use the new function to filter non-English apps
# from both datasets 
# OBS: the function only works on strings

In [29]:
apple_english = []

for row in data_apple:
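    # NOTE: row[0] is the numeric app id; the app name (track_name) is row[1].
    # Because ids are plain ASCII, this check keeps every row, header included.
    name = row[0]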
    if check_string(name):
        apple_english.append(row)
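
According to the header printed earlier, the Apple dataset stores `id` at index 0 and `track_name` at index 1, so a name-based filter would read index 1. The sketch below shows that variant on a made-up sample. The helper `is_ascii_name` and the placeholder id in the second row are assumptions, and the counts reported later in this notebook were not produced with it.

```python
def is_ascii_name(name, max_non_ascii=3):
    """Same heuristic as check_string: allow at most three non-ASCII characters."""
    return sum(1 for ch in name if ord(ch) > 127) <= max_non_ascii

# Illustrative sample: header row followed by two apps
sample_apple = [
    ['id', 'track_name'],
    ['284882215', 'Facebook'],
    ['000000001', '爱奇艺PPS -《欢乐颂2》电视剧热播'],  # the id here is a placeholder
]

header, rows = sample_apple[0], sample_apple[1:]          # skip the header row
english_rows = [row for row in rows if is_ascii_name(row[1])]  # index 1 = track_name
print(english_rows)
```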

In [30]:
apple_english

[['id',
  'track_name',
  'size_bytes',
  'currency',
  'price',
  'rating_count_tot',
  'rating_count_ver',
  'user_rating',
  'user_rating_ver',
  'ver',
  'cont_rating',
  'prime_genre',
  'sup_devices.num',
  'ipadSc_urls.num',
  'lang.num',
  'vpp_lic'],
 ['284882215',
  'Facebook',
  '389879808',
  'USD',
  '0.0',
  '2974676',
  '212',
  '3.5',
  '3.5',
  '95.0',
  '4+',
  'Social Networking',
  '37',
  '1',
  '29',
  '1'],
 ['389801252',
  'Instagram',
  '113954816',
  'USD',
  '0.0',
  '2161558',
  '1289',
  '4.5',
  '4.0',
  '10.23',
  '12+',
  'Photo & Video',
  '37',
  '0',
  '29',
  '1'],
 ['529479190',
  'Clash of Clans',
  '116476928',
  'USD',
  '0.0',
  '2130805',
  '579',
  '4.5',
  '4.5',
  '9.24.12',
  '9+',
  'Games',
  '38',
  '5',
  '18',
  '1'],
 ['420009108',
  'Temple Run',
  '65921024',
  'USD',
  '0.0',
  '1724546',
  '3842',
  '4.5',
  '4.0',
  '1.6.2',
  '9+',
  'Games',
  '40',
  '5',
  '1',
  '1'],
 ['284035177',
  'Pandora - Music & Radio',
  '130242560'

In [31]:
gpl_english = []

for row in android_clean: # using the gpl cleaned ds
    name = row[0]
    if check_string(name):
        gpl_english.append(row)

In [32]:
gpl_english

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up'],
 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps',
  'ART_AND_DESIGN',
  '4.7',
  '87510',
  '8.7M',
  '5,000,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'August 1, 2018',
  '1.2.4',
  '4.0.3 and up'],
 ['Sketch - Draw & Paint',
  'ART_AND_DESIGN',
  '4.5',
  '215644',
  '25M',
  '50,000,000+',
  'Free',
  '0',
  'Teen',
  'Art & Design',
  'June 8, 2018',
  'Varies with device',
  '4.2 and up'],
 ['Pixel Draw - Number Art Coloring Book',
  'ART_AND_DESIGN',
  '4.3',
  '967',
  '2.8M',
  '100,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design;Creativity',
  'June 20, 2018',
  '1.1',
  '4.4 and up'],
 ['Paper flowers instructions',
  'ART_AND_DESIGN',
  '4.4',
  '167',
  '5.6M',
  '50,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'March 26, 2017

In [33]:
print(len(apple_english))
print(len(gpl_english))

7198
9614


In [34]:
# check header
apple_english[0]

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [35]:
data_gpl[0]

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

## Isolating the free apps

In [36]:
# Now, let's separate the free apps for further analysis
# (we'll also work considering the free app market)

# free apple apps
free_apple = []

for row in apple_english:
    price = row[4]
    if price == '0.0':
        free_apple.append(row)
        

In [37]:
# Count the free Apple apps
free_apple
len(free_apple)

4056

In [38]:
# Count of google play free apps
free_gpl = []

for row in gpl_english:
    price = row[7]
    if price == '0':
        free_gpl.append(row)

In [39]:
free_gpl
len(free_gpl)

8864
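
Both filters compare the price as an exact string (`'0.0'` for Apple, `'0'` for Google Play), which breaks if a free price is ever written differently. A numeric comparison is a little more forgiving. The sketch below assumes a hypothetical helper `is_free` and assumes that paid prices may carry a leading `$`, which the rows shown in this notebook do not confirm.

```python
def is_free(price_text):
    """Return True when the price string represents zero."""
    # strip surrounding spaces and an optional leading currency symbol (assumption)
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0

print(is_free('0'))      # → True
print(is_free('0.0'))    # → True
print(is_free('$4.99'))  # → False
```

With it, the two loops could share one condition, for example `if is_free(row[4])` for the Apple rows and `if is_free(row[7])` for the Google Play rows.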

One of the main objectives is to find out which types of apps are likely to attract the most users.

Ultimately, a successful app will be released on both Google Play and the App Store, so it has to be successful in both markets.

This follows from the planned order of actions:

+ release an MVP on the Google Play store;

+ if successful, develop it further;

+ if successful, release it on the App Store as well.

If we find the app profiles that are successful in both markets, we'll have a good starting point for further analysis and development.

## Most common apps by genre

In [40]:
# Investigate the datasets to find the genres/types of apps
# that are most common

In [41]:
# check the header of each dataset (the free-app lists do not include it)
# obs: do not use these two lists for further analysis
print(apple_english[0:2])
print('\n')
print(data_gpl[0:2])

[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']]


[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'], ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']]


In [42]:
# cols that could be used for more detailed analysis:
# (of which genres/types of apps are most common on each market)

# apple: prime_genre (col 11); name (col 1)
# google: genres (col 9); name (col 0)

In [43]:
# Function to generate frequency tables

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        gen = row[index]
        
        if gen in table:
            table[gen] += 1
            
        else:
            table[gen] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
            
    return table_percentages
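
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is a sketch with an assumed function name, `freq_table_pct`, and a made-up sample. It is meant to return the same kind of dictionary as `freq_table`.

```python
from collections import Counter

def freq_table_pct(dataset, index):
    """Map each distinct value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
print(freq_table_pct(sample, 1))  # → {'Games': 75.0, 'Music': 25.0}
```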



In [44]:
# Check prime_genre and Genres cols from both datasets

print(freq_table(free_apple, 11))
print('\n')
print(freq_table(free_gpl, 9))

{'Social Networking': 3.5256410256410255, 'Photo & Video': 4.117357001972387, 'Games': 55.64595660749507, 'Music': 1.6518737672583828, 'Reference': 0.4930966469428008, 'Health & Fitness': 1.8737672583826428, 'Weather': 0.7642998027613412, 'Utilities': 2.687376725838264, 'Travel': 1.3806706114398422, 'Shopping': 2.983234714003945, 'News': 1.4299802761341223, 'Navigation': 0.4930966469428008, 'Lifestyle': 2.3175542406311638, 'Entertainment': 8.234714003944774, 'Food & Drink': 1.0601577909270217, 'Sports': 1.947731755424063, 'Book': 1.6272189349112427, 'Finance': 2.0710059171597637, 'Education': 3.2544378698224854, 'Productivity': 1.5285996055226825, 'Business': 0.4930966469428008, 'Catalogs': 0.22189349112426035, 'Medical': 0.19723865877712032}


{'Art & Design': 0.5979241877256317, 'Art & Design;Creativity': 0.06768953068592057, 'Auto & Vehicles': 0.9250902527075812, 'Beauty': 0.5979241877256317, 'Books & Reference': 2.1435018050541514, 'Business': 4.591606498194946, 'Comics': 0.6092057

In [45]:
# Function that builds the freq_table and orders the genres by
# their percentage share, from most to least common
# (it transforms the dictionary into a list of tuples and sorts it)

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
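
`display_table` prints its result and returns nothing, so its output cannot be reused in later cells. A variant that returns the sorted pairs might look like the sketch below. The name `sorted_table` is an assumption, and it takes an already computed percentage dictionary rather than a dataset and an index. Ties keep their original dictionary order here, whereas the tuple sort above orders ties by genre name.

```python
def sorted_table(percentages):
    """Return (genre, percentage) pairs ordered from most to least common."""
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)

table = {'Music': 25.0, 'Games': 75.0}
for genre, pct in sorted_table(table):
    print(genre, ':', pct)
```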

In [46]:
# Use the display_table to check the dict (now list of tuples)
# in an ordered way

# Apple apps
display_table(free_apple, 11) # prime_genre col

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


According to the table above, 'Games' is by far the most common genre among free Apple apps (about 56%), while 'Medical' is the least common (about 0.2%).

Top 5 genres: Games, Entertainment, Photo & Video, Social Networking and Education.

Most apps are designed for entertainment: Games is far ahead of the second place, Entertainment, which is itself a leisure genre.

Still, it is not possible to settle on an app profile based on this table alone. We'll have to check the Google Play table, keeping in mind that Games and Entertainment are the 'winners' among Apple apps.

In [47]:
# Google apps
display_table(free_gpl, 9) # genre col

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

There are too many categories in this Genres variable.

The most common are Tools, followed by Entertainment, Education, Business and Productivity (tied with Lifestyle).

'Tools' is not as clear a category as games or education, so it is difficult to conclude anything yet.

Entertainment continues to be a strong genre, followed by Education. Both also rank in the top five among Apple apps.

In [48]:
# Google apps
display_table(free_gpl, 1) # category col

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The Category variable has fewer categories than the previous one, which makes it easier to draw conclusions from.

According to this variable, 'FAMILY' is the most common category, followed by GAME, TOOLS, BUSINESS and LIFESTYLE.

Apart from 'tools', we see that entertainment and practical purposes are always at the top in both datasets. However, on Google Play the 'family' category stands out.

Since this category does not exist among Apple apps, a plausible reading is that such apps (for example, apps aimed at children) fall under Entertainment and Games there.

If I had to pick one genre, it would be one related to entertainment.

## Most popular apps by genre on App Store

In [49]:
# Generate a frequency table for the prime_genre column
# of the Apple dataset to get the unique genres, then compute
# the average number of user ratings per app genre on the App Store
# (total ratings in a genre divided by the number of apps in that genre)
# note: index -5 is the same column as index 11 (prime_genre) in a 16-column row

genres_ios = freq_table(free_apple, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in free_apple:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)



Social Networking : 53078.195804195806
Photo & Video : 27249.892215568863
Games : 18924.68896765618
Music : 56482.02985074627
Reference : 67447.9
Health & Fitness : 19952.315789473683
Weather : 47220.93548387097
Utilities : 14010.100917431193
Travel : 20216.01785714286
Shopping : 18746.677685950413
News : 15892.724137931034
Navigation : 25972.05
Lifestyle : 8978.308510638299
Entertainment : 10822.961077844311
Food & Drink : 20179.093023255813
Sports : 20128.974683544304
Book : 8498.333333333334
Finance : 13522.261904761905
Education : 6266.333333333333
Productivity : 19053.887096774193
Business : 6367.8
Catalogs : 1779.5555555555557
Medical : 459.75
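
The nested loop above rescans the whole app list once per genre. The same averages can be accumulated in one pass, as in the sketch below. The function name `average_by_group` and the sample rows are illustrative assumptions.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` for each value of column `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

sample = [['Games', '100'], ['Games', '300'], ['Music', '50']]
print(average_by_group(sample, 0, 1))  # → {'Games': 200.0, 'Music': 50.0}
```

For the Apple data in this notebook the call would presumably be `average_by_group(free_apple, 11, 5)`, since `prime_genre` is column 11 and `rating_count_tot` is column 5.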


In [50]:
# Analyze the results to suggest at least one
# app profile recommendation for the App Store.

# Reference has the highest average number of user ratings

In [51]:
# Let's take a closer look into the Reference genre
for app in free_apple:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

# dictionaries play an important role here, as does the Bible

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
彩库宝典-【官方版】 : 0
Jishokun-Japanese English Dictionary & Translator : 0
無料で音楽や写真・カメラの裏技アプリ for iPhone7 : 0


In [52]:
# do the same for music 
for app in free_apple:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])
        
# In terms of music, major apps like Pandora and Spotify are of great relevance

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [53]:
# do the same for weather 
for app in free_apple:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5])
        
# weather and forecast apps are also very popular
# however, it is critical to have a good/reliable scientific
# source of information

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
실시간 날씨 :

In [54]:
# So, considering the previous 3 genres, the 'easiest' niche to start
# the experiment could be the reference niche.

# we could pick an area of major interest that is already
# popular by itself, like cooking,
# and make a reference guide for people who are learning how to cook
# or who already have some experience...

## Most popular apps by genre on Google Play

In [55]:
# check the Installs column of the Android dataset
# which is not very precise, but gives a rough estimate

display_table(free_gpl, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [56]:
# Calculating the average number of installs per app category for the
# Google Play store

# the install counts shown above are strings such as '1,000,000+',
# so we have to clean them before converting them to numbers

In [63]:
# generate a frequency table for the Category column
# (to get the unique app genres)
freq_cat = freq_table(free_gpl, 1)
freq_cat

{'ART_AND_DESIGN': 0.6430505415162455,
 'AUTO_AND_VEHICLES': 0.9250902527075812,
 'BEAUTY': 0.5979241877256317,
 'BOOKS_AND_REFERENCE': 2.1435018050541514,
 'BUSINESS': 4.591606498194946,
 'COMICS': 0.6204873646209386,
 'COMMUNICATION': 3.2378158844765346,
 'DATING': 1.861462093862816,
 'EDUCATION': 1.1620036101083033,
 'ENTERTAINMENT': 0.9589350180505415,
 'EVENTS': 0.7107400722021661,
 'FINANCE': 3.7003610108303246,
 'FOOD_AND_DRINK': 1.2409747292418771,
 'HEALTH_AND_FITNESS': 3.0798736462093865,
 'HOUSE_AND_HOME': 0.8235559566787004,
 'LIBRARIES_AND_DEMO': 0.9363718411552346,
 'LIFESTYLE': 3.9034296028880866,
 'GAME': 9.724729241877256,
 'FAMILY': 18.907942238267147,
 'MEDICAL': 3.531137184115524,
 'SOCIAL': 2.6624548736462095,
 'SHOPPING': 2.2450361010830324,
 'PHOTOGRAPHY': 2.944494584837545,
 'SPORTS': 3.395758122743682,
 'TRAVEL_AND_LOCAL': 2.33528880866426,
 'TOOLS': 8.461191335740072,
 'PERSONALIZATION': 3.3167870036101084,
 'PRODUCTIVITY': 3.892148014440433,
 'PARENTING': 0.6

In [66]:
for category in freq_cat:
    total = 0
    len_category = 0
    
    for row in free_gpl:
        category_app = row[1]
        if category_app == category:
            num = row[5]
            num = num.replace('+', '')
            num = num.replace(',', '')
            num = float(num)
            total += num
            len_category += 1
    
    avg_installs = total / len_category
    print(category , ':' , avg_installs)
            
    

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
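
The string cleanup inside the loop above can be isolated in a small function, which makes it easier to test on its own. This is a sketch with an assumed name, `parse_installs`. The result is only a lower bound, because a value such as `'10,000+'` means ten thousand installs or more.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to an integer lower bound."""
    return int(text.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # → 1000000
print(parse_installs('0+'))          # → 0
print(parse_installs('0'))           # → 0
```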

## Conclusion

In [None]:
# Now, comparing with the iOS dataset, we can see that
# the 'reference' category is also relevant here, with an average of
# more than 8 million installs (BOOKS_AND_REFERENCE).

# Of course there are categories with more installs, such as social and 
# entertainment.

# Communication is probably the largest (about 38 million on average),
# which may also be an indication of market saturation.

# Given the possibility of combining more than one niche inside
# a reference app, such as references for diverse subjects of
# interest, this could be an option.