# App Stores Dataset Analysis

This project is about analyzing data to help a company that builds Android and iOS mobile apps. This company works in building free apps in which there are in-app ads. Their revenue comes from the engagement between the users and those in-app ads. Therefore, the more users that see and engage wih the ads, the better.

Our goal in this project is to identify which type of apps are likely to attract more users on Google Play and App Store so the developers can understand the best apps to build.

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, these are two data sets that seem suitable for our goals:

* A [data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.
* A [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.

## Exploring the Data

First, we will define a function that allows us to explore the dataset we chose to analyze. The following function shows us, for instance, some of the data so we can have an idea on how it is organized inside the dataset as well as the number of rows and columns and the name of the columns.

In [1]:
def explore_data (dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


With the function defined, we will open the datasets files that are located in a different directory from the project.

In [2]:
apple_opened_file = open('C:\\Users\Mariana\Desktop\PROFISSIONAL\Cursos Online\Python_DataQuest\Project 01 - App Stores Dataset Analysis\Project 01_Datasets\AppleStore.csv', encoding = 'utf8')
android_opened_file = open ('C:\\Users\Mariana\Desktop\PROFISSIONAL\Cursos Online\Python_DataQuest\Project 01 - App Stores Dataset Analysis\Project 01_Datasets\googleplaystore.csv',  encoding = 'utf8')
from csv import reader
apple_file = reader(apple_opened_file)
android_file = reader(android_opened_file)
apple_apps = list(apple_file)
android_apps = list (android_file)

We will then explore the data with the function `explore_data`

In [3]:
explore_data (apple_apps, 0, 5, True)
print ('\n')
explore_data (android_apps, 0, 5, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


Number of rows: 7198
Number of columns: 17


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 

## Data Cleaning Part 01 - Removing Innacurate Data

Moving forward, we will perform a simple data cleaning process so we can isolate the data we are actually interested in: **Free apps that are developed towards an English-Speaking Audience.**

From this [topic](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) in the discussion session from where the data was imported, it was possible to identify an error within the dataset. As we don't know if the user considered or not the header row, we will check if the index provided (10472) is indeed correct.


In [4]:
print(android_apps[0])
print ('\n')
print (android_apps[10472:10474])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


[['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up'], ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']]


As seen above, the actual row from the dataset that is missing information (in this case, the "category" info) is the row indexed to the 10473. Therefore we will remove this row.

In [5]:
print( 'Before length: ', len(android_apps))
del android_apps[10473]
print( 'After length: ', len(android_apps))

Before length:  10842
After length:  10841


## Data Cleaning Part 02 - Removing Duplicates 

Furthermore, we will check if there are any duplicates in each of the datasets. For this, we will create a function called `if_duplicates`

In [6]:
def if_duplicates (dataset_name, row_name):
    duplicate_apps = []
    unique_apps = []

    for app in dataset_name[1:]:
        name = app[row_name]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
        
    return [duplicate_apps,unique_apps]
  

With this function, we will loop through both datasets to identify if there is any duplicates and if so, how many.

In [7]:
duplicates_android = if_duplicates(android_apps,0)[0]
unique_android = if_duplicates(android_apps,0)[1]
print ("number of duplicate android apps:", " ", len(duplicates_android))
print ('\n')
print ("Some of the apps are:", " ", duplicates_android[:15])
print ('\n')
print ("number of android unique apps:", " ", len(unique_android))
print ('\n')
duplicates_apple = if_duplicates(apple_apps,2)[0]
unique_apple = if_duplicates(apple_apps,2)[1]
print ("number of apple duplicate apps:", " ", len(duplicates_apple))
print ('\n')
print ("Some of the apps are:", " ", duplicates_apple[:15])
print ('\n')
print ("number of apple unique apps:", " ", len(unique_apple))

number of duplicate android apps:   1181


Some of the apps are:   ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


number of android unique apps:   9659


number of apple duplicate apps:   2


Some of the apps are:   ['VR Roller Coaster', 'Mannequin Challenge']


number of apple unique apps:   7195


In [8]:
print(apple_apps[0])
print ('\n')
for app in apple_apps [1:]:
    name = app[2]
    if name == 'VR Roller Coaster':
        print (app)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['4000', '952877179', 'VR Roller Coaster', '169523200', 'USD', '0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['7579', '1089824278', 'VR Roller Coaster', '240964608', 'USD', '0', '67', '44', '3.5', '4', '0.81', '4+', 'Games', '38', '0', '1', '1']


Therefore, we have 1181 duplicate android entries and only 2 duplicate apple entries.We will not delete the duplicates randomly. It is better to stay with the most recent entries so we get the most recent data from each app. However, the same app can be present in 2+ different categories (which is relevant for this analysis). Thus, as a criteria, we will maintain the entry with the most reviews.

In [9]:
reviews_max_google = {}

for app in android_apps[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max_google           and \
    reviews_max_google[name] < n_reviews:
        reviews_max_google[name] = n_reviews
        
    elif name not in reviews_max_google:
        reviews_max_google[name]= n_reviews

reviews_max_apple = {}

for app in apple_apps[1:]:
    name = app[2]
    n_reviews = float(app[6])
    
    if name in reviews_max_apple           and \
    reviews_max_apple[name] < n_reviews:
        reviews_max_apple[name] = n_reviews
        
    elif name not in reviews_max_apple:
        reviews_max_apple[name]= n_reviews
        


print ('number of unique apps for android:',len(reviews_max_google))  
print('\n')
print ('number of unique apps for apple:',len(reviews_max_apple)) 


       
        

number of unique apps for android: 9659


number of unique apps for apple: 7195


As the length of those dictionaries match the expected, we can state that they are reliable. Then, we have dictionaries with only one entry for each app. 

Now, we are going to make a list with no duplicate data by performing a data clean process and deleting duplicates from the Android and Apple Apps datasets. We will create, then, two lists that will be useful for us: the `android_clean_dataset` and the `apple_clean_dataset`.

We go through the datasets and compare the app name of each entry to `already_added` list so we can keep track of what we have already added and avoid duplicates as some entries may have the same number of reviews.

In [10]:
android_clean_dataset = []
already_added = []

for app in android_apps[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max_google and n_reviews == reviews_max_google[name] and name not in already_added:
        android_clean_dataset.append(app)
        already_added.append(name)
                             
                            
print( 'number of apps in the android clean dataset is:', len(android_clean_dataset))
print ('\n')
apple_clean_dataset = []
already_added = []

for app in apple_apps[1:]:
    name = app[2]
    n_reviews = float(app[6])
    
    if name in reviews_max_apple and n_reviews == reviews_max_apple[name] and name not in already_added:
        apple_clean_dataset.append(app)
        already_added.append(name)
print( 'number of apps in the apple clean dataset is:', len(apple_clean_dataset))

number of apps in the android clean dataset is: 9659


number of apps in the apple clean dataset is: 7195


## Data Cleaning Part 03 - Removing Non-English Apps

As we want free apps that are developed towards an **English-Speaking Audience**, we will delete the ones that are made in a different language. For this, we will first build the function `if_english` and then test it:

In [11]:
def if_english (appname):
    for character in appname:
        order = ord(character)
        if order > 127:
             return False
            
    return True

In [12]:
print (if_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print (if_english('Instagram'))
print (if_english('Docs To Go™ Free Office Suit'))
print (if_english('Instachat 😜'))

False
True
False
False


As seen above, the function did not succeed in identifying certain app names. This is because emojis and characters like ™ fall outside the ASCII range and have corresponding numbers over 127.

In [13]:
print( ord('😜'))
print( ord('™'))

128540
8482


Therefore, to minimize data loss, we will only delete apps that have more than three characters that fall outside de ASCII range, modifying our `if_english` function and testing it again:

In [14]:
def if_english(appname):
    non_ascii = 0
    is_english = True
    for character in appname:
        order = ord(character)
        if order > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        is_english = False
    
    return is_english

In [15]:
print (if_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print (if_english('Instagram'))
print (if_english('Docs To Go™ Free Office Suit'))
print (if_english('Instachat 😜'))

False
True
True
True


With this function, we will loop through both datasets to delete all the potential non-english apps and adding the cleaner dataset to the lists `android_english_apps` and `apple_english_apps`.

In [16]:
android_english_apps =[]
apple_english_apps = []

for app in android_clean_dataset:
    if if_english(app[0]):
        android_english_apps.append(app)

for app in apple_apps:
    if if_english(app[2]):
        apple_english_apps.append(app)  

print ('Rows in Google Dataset: ', len(android_english_apps))
print ('Rows in Apple Dataset: ', len(apple_english_apps))
print ('\n')
print (android_english_apps[:2])
print (apple_english_apps[:2])

Rows in Google Dataset:  9614
Rows in Apple Dataset:  6184


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']]
[['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']]


## Data Cleaning Part 04 - Removing Non-free Apps

The next step will be to remove any non-free apps as the free apps are the type that is important to this analysis. We will then create a function for this purpose called `free_apps`

In [17]:
def free_apps(appdata,price_row):
    freeapps_data = []
    for app in appdata:
        price = app[price_row]
        if price == '0':
            freeapps_data.append(app)
    return freeapps_data


In [18]:
freeapps_android = free_apps (android_english_apps,7)
freeapps_apple = free_apps (apple_english_apps, 5)
print ('android dataset length:',len(freeapps_android))
print ('apple dataset length:',len(freeapps_apple))

android dataset length: 8864
apple dataset length: 3222


To confirm the right execution of this function, we will loop through android apps and make another list of free apps but now with the criteria that the sixth row is the string `"free"`

In [19]:
freeandroid = []

for app in android_english_apps:
    if app[6] == 'Free':
        freeandroid.append(app)

print ("android dataset length:", len(freeandroid))

android dataset length: 8863


As we can see, there is a similar result for both criterias so the function defined is a reliable one. There must be a wrong entry but it is not relevant for the final result as there is just one.

Moreover, we will insert the name os each column for each dataset.

In [20]:
row_name_column_android = android_apps [0]
row_name_column_apple = apple_apps [0]

freeapps_apple.insert(0, row_name_column_apple)
freeapps_android.insert(0, row_name_column_android)

print(freeapps_android[0])
print("\n")
print(freeapps_apple[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


So far we have:
1. Removed inaccurate data
2. Removed duplicate app entries
3. Removed non-English apps
4. Isolated the free apps


## Most Common Apps - Identifying the Important Columns


As stated in the beggining, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps. 

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. Thus we will inspect both data sets and identify the columns we could use to generate frequency tables to find out what are the most common genres in each market.

In [21]:
explore_data (freeapps_apple, 0, 4, True)
print ('\n')
explore_data (freeapps_android, 0, 4, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


Number of rows: 3223
Number of columns: 17


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_

For the apple dataset, we could use the following columns:
1. rating_count_tot  
2. user_rating

As for the android dataset, we could analyze the following columns:
1. Rating
2. Reviews
3. Installs

## Most Common Apps - Creating Frequency Tables

To move forward with this analysis, we will first build a frequency table for the `prime_genre` column of the App Store data set, and for the `Genres` and `Category` columns of the Google Play data set.

We will build two functions we can use to analyze the frequency tables:
1. One function to generate frequency tables that show percentages called `freq_table`
2. Another function we can use to display the percentages in a descending order called `display_table`

Both are defined below:

In [22]:
def freq_table(dataset, index):
    freq = {}
    for row in dataset[1:]:
        relevant_info = row[index]
        if relevant_info in freq:
            freq[relevant_info] += 1
        else:
            freq[relevant_info] = 1
    
    return freq

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        freq_percentage = round((entry[0]/(len(dataset)))*100, 2)
        print(entry[1], ':', freq_percentage, '%')

Then we will show the frequency tables for the `prime_genre` column of the App Store data set, and for the `Genres` and `Category` columns of the Google Play data set.

In [23]:
print ('APPLE - Prime_Genre')
display_table (freeapps_apple,12)
print ('\n')
print ('ANDROID - Category')
display_table (freeapps_android,1)
print ('\n')
print ('ANDROID - Genre')
display_table (freeapps_android,9)

APPLE - Prime_Genre
Games : 58.14 %
Entertainment : 7.88 %
Photo & Video : 4.96 %
Education : 3.66 %
Social Networking : 3.29 %
Shopping : 2.61 %
Utilities : 2.51 %
Sports : 2.14 %
Music : 2.05 %
Health & Fitness : 2.02 %
Productivity : 1.74 %
Lifestyle : 1.58 %
News : 1.33 %
Travel : 1.24 %
Finance : 1.12 %
Weather : 0.87 %
Food & Drink : 0.81 %
Reference : 0.56 %
Business : 0.53 %
Book : 0.43 %
Navigation : 0.19 %
Medical : 0.19 %
Catalogs : 0.12 %


ANDROID - Category
FAMILY : 18.91 %
GAME : 9.72 %
TOOLS : 8.46 %
BUSINESS : 4.59 %
LIFESTYLE : 3.9 %
PRODUCTIVITY : 3.89 %
FINANCE : 3.7 %
MEDICAL : 3.53 %
SPORTS : 3.4 %
PERSONALIZATION : 3.32 %
COMMUNICATION : 3.24 %
HEALTH_AND_FITNESS : 3.08 %
PHOTOGRAPHY : 2.94 %
NEWS_AND_MAGAZINES : 2.8 %
SOCIAL : 2.66 %
TRAVEL_AND_LOCAL : 2.34 %
SHOPPING : 2.24 %
BOOKS_AND_REFERENCE : 2.14 %
DATING : 1.86 %
VIDEO_PLAYERS : 1.79 %
MAPS_AND_NAVIGATION : 1.4 %
FOOD_AND_DRINK : 1.24 %
EDUCATION : 1.16 %
ENTERTAINMENT : 0.96 %
LIBRARIES_AND_DEMO : 0.94 

### 1. Analysing Frequency Tables

With the frequency tables originated, we will analyse what they tell us.

#### 1.1 Apple: `prime_genre`

The most common genre is `Games` representing almost 60% of our dataset, followed by `Enternainment` and `Photo & Video` (both representing around 13% combined).


#### 1.2 Android: `genres` and `category`

On the other hand, for the android dataset, we observe another kind of pattern where there is a more equaly distribution through different genres. In general, there are more apps designed for **practical purposes** rather than entertainment, as we can observe as the following categories take the top places (what is also true for the genre frequency):
1. FAMILY : 18.91 %
2. GAME : 9.72 %
3. TOOLS : 8.46 %
4. BUSINESS : 4.59 %
5. LIFESTYLE : 3.9 %
6. PRODUCTIVITY : 3.89 %
7. FINANCE : 3.7 %

---
Hence, the most important apps of our interest is **entertainment-related for *Apple/iOS***  and **productivity-related for *Android*** if we only analyse through this perspective. However, despite showing the most frequent designed apps, those frequency tables do not reveal the apps that have the most users. Therefore, we can not rely only in these analysis to recommend an app profile that would fit our purpose. 

Now, we'd like to get an idea about the kind of apps with the most users. One way to find out which genres are the most popular (have the most users) is to calculate the average number of **installs** for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

#### 1.3 Apple: average number of user ratings per genre


In [24]:
apple_freqtable_genre = freq_table(freeapps_apple, 12)
genre_user_number_ratings_avg = {}
for genre in apple_freqtable_genre:
    total = 0
    for row in freeapps_apple:
        genre_app = row[12]
        if genre_app == genre:
            total += float(row[6])
    genre_user_number_ratings_avg[genre] = round(total/apple_freqtable_genre[genre],2)   

        
table_display = []
for genre in genre_user_number_ratings_avg:
    key_val_as_tuple = (genre_user_number_ratings_avg[genre], genre)
    table_display.append(key_val_as_tuple)

table_display_sorted = sorted(table_display, reverse = True)
print("APPLE - Average number of user ratings:")
for entry in table_display_sorted:
    print(entry[1], ' - ', entry[0])


APPLE - Average number of user ratings:
Navigation  -  86090.33
Reference  -  74942.11
Social Networking  -  71548.35
Music  -  57326.53
Weather  -  52279.89
Book  -  39758.5
Food & Drink  -  33333.92
Finance  -  31467.94
Photo & Video  -  28441.54
Travel  -  28243.8
Shopping  -  26919.69
Health & Fitness  -  23298.02
Sports  -  23008.9
Games  -  22788.67
News  -  21248.02
Productivity  -  21028.41
Utilities  -  18684.46
Lifestyle  -  16485.76
Entertainment  -  14029.83
Business  -  7491.12
Education  -  7003.98
Catalogs  -  4004.0
Medical  -  612.0


In [25]:
apple_freqtable_genre = freq_table(freeapps_apple, 12)
genre_user_ratings_avg = {}
for genre in apple_freqtable_genre:
    total = 0
    for row in freeapps_apple:
        genre_app = row[12]
        if genre_app == genre:
            total += float(row[8])
    genre_user_ratings_avg[genre] = round(total/apple_freqtable_genre[genre],2)   

        
table_display = []
for genre in genre_user_ratings_avg:
    key_val_as_tuple = (genre_user_ratings_avg[genre], genre)
    table_display.append(key_val_as_tuple)

table_display_sorted = sorted(table_display, reverse = True)
print("APPLE - Average user rating:")
for entry in table_display_sorted:
    print(entry[1], ' - ', entry[0])

APPLE - Average user rating:
Catalogs  -  4.12
Games  -  4.04
Productivity  -  4.0
Shopping  -  3.97
Business  -  3.97
Music  -  3.95
Photo & Video  -  3.9
Navigation  -  3.83
Health & Fitness  -  3.77
Reference  -  3.67
Education  -  3.64
Food & Drink  -  3.63
Social Networking  -  3.59
Entertainment  -  3.54
Utilities  -  3.53
Travel  -  3.49
Weather  -  3.48
Lifestyle  -  3.41
Finance  -  3.38
News  -  3.24
Sports  -  3.07
Book  -  3.07
Medical  -  3.0


Thus, with all those three analysis from the Apple Store, we can suggest that a good app profile would be one made towards entertainment purposes as genres as `Games`,`Music`and `Photo & Video` appear with good results.

#### 1.4 Android: average number of installs

In [26]:
android_freqtable_genre = freq_table(freeapps_android, 1)
genre_number_installs_avg = {}
for genre in android_freqtable_genre:
    total = 0
    for row in freeapps_android:
        genre_app = row[1]
        installs = row[5].replace ("+","")
        installs = installs.replace(",","")
        if genre_app == genre:
            total += float(installs)
    genre_number_installs_avg[genre] = round(total/android_freqtable_genre[genre],2)   
    
table_display = []
for genre in genre_number_installs_avg:
    key_val_as_tuple = (genre_number_installs_avg[genre], genre)
    table_display.append(key_val_as_tuple)

table_display_sorted = sorted(table_display, reverse = True)
print("ANDROID - Average number of installs per category:") 
for entry in table_display_sorted:
    print(entry[1], ' - ', entry[0])

ANDROID - Average number of installs per category:
COMMUNICATION  -  38456119.17
VIDEO_PLAYERS  -  24727872.45
SOCIAL  -  23253652.13
PHOTOGRAPHY  -  17840110.4
PRODUCTIVITY  -  16787331.34
GAME  -  15588015.6
TRAVEL_AND_LOCAL  -  13984077.71
ENTERTAINMENT  -  11640705.88
TOOLS  -  10801391.3
NEWS_AND_MAGAZINES  -  9549178.47
BOOKS_AND_REFERENCE  -  8767811.89
SHOPPING  -  7036877.31
PERSONALIZATION  -  5201482.61
WEATHER  -  5074486.2
HEALTH_AND_FITNESS  -  4188821.99
MAPS_AND_NAVIGATION  -  4056941.77
FAMILY  -  3695641.82
SPORTS  -  3638640.14
ART_AND_DESIGN  -  1986335.09
FOOD_AND_DRINK  -  1924897.74
EDUCATION  -  1833495.15
BUSINESS  -  1712290.15
LIFESTYLE  -  1437816.27
FINANCE  -  1387692.48
HOUSE_AND_HOME  -  1331540.56
DATING  -  854028.83
COMICS  -  817657.27
AUTO_AND_VEHICLES  -  647317.82
LIBRARIES_AND_DEMO  -  638503.73
PARENTING  -  542603.62
BEAUTY  -  513151.89
EVENTS  -  253542.22
MEDICAL  -  120550.62


## App Genre Suggestion

### 1. Summary of Results

#### 1.1 Common Apps - Android

|Common Apps|%|Average number of installs|
|-----------|-|--------------------------|
|Family|18.91|3,695,641.82|
|Game|9.72|15,588,015.6|
|Tools|8.46|10,801,391.3|
|Business|4.59|1,712,290.15|
|Lifestyle|3.9|1,437,816.27|
|Productivity|3.89|16,787,331.34|
|Finance|3.7|1,387,692.48|
|Communication|3.24|38,456,119.17|
|Health and Fitness|3.08|4,188,821.99|
|Photography|2.94|17,840,110.4|
|News and Magazines|2.8|9,549,178.47|
|Social|2.66|23,253,652.13|
|Travel and Local|2.34|13,984,077.71|
|Shopping|2.24|7,036,877.31|

#### 1.2 Common Apps - Apple

|Common Apps|%|Average User Rating|
|-----------|-|-------------------|
|Games|58.14|4.04|
|Entertainment|7.88|3.54|
|Photo & Video|4.96|3.9|
|Education|3.66|3.64|
|Social Networking|3.29|3.59|
|Shopping|2.61|3.97|
|Utilities|2.51|3.53|
|Sports|2.14|3.07|
|Music|2.05|3.95|
|Health & Fitness|2.02|3.77|
|Productivity|1.74|4.0|



### 2. Suggestion

It is then suggested that we focus on developing, at first, an app that is focused on productivity, communication and practical purposes as we will launch the MVP (*minimal viable product*) to the Android platform to test it.

Another fact to consider is that there are a lot more users on Android platform than iOS. Therefore, it would make sense focusing on its demand rather than primarily the iOS demand (focused on entertainment).