## Profitable App Profiles for the Apple App Store and Google Play Store

- This project analyzes apps from both the Google Play Store and the Apple App Store to find out which kinds of apps are worth investing in.
- My goal in this project is to apply everything I have learned so far to this analysis.

## Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.
Collecting data for over 4 million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first see if we can find any relevant existing data at no cost. Luckily, these two data sets seem suitable for our goals:

- [A data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data of about 10,000 Android apps from Google Play; the data set can be downloaded directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)
- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data of about 7,000 iOS apps from the App Store: the data set can be downloaded directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)

In [1]:
from csv import reader

# Opening AppleStore Apps
open_file = open('AppleStore.csv', encoding='utf-8')
read_file = reader(open_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

# Opening Google PlayStore Apps
open_file = open('googleplaystore.csv',encoding='utf-8')
read_file = reader(open_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
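
The loading steps above leave both files open. As a minimal sketch (the helper name `load_csv` is an assumption, not part of the original notebook), the same logic can be wrapped in a function that closes the file automatically:

```python
from csv import reader

def load_csv(path):
    """Return (header, rows) for the CSV file at `path`."""
    # The `with` block closes the file as soon as the rows are read.
    # newline='' is the setting the csv module documentation recommends.
    with open(path, encoding='utf-8', newline='') as f:
        rows = list(reader(f))
    return rows[0], rows[1:]
```

With this helper, `ios_header, ios = load_csv('AppleStore.csv')` would produce the same two variables as the cell above.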

Next we create a function, `explore_data()`, to make the data sets easier to explore. It prints a slice of rows and, optionally, the number of rows and columns in the data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns = True):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row,'\n') #prints row and adds a linespace after each row
        
    if rows_and_columns:
        print('Number of rows:',len(dataset))
        print('Number of columns:',len(dataset[0]))

print(ios_header,'\n')
explore_data(ios,0,5)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1'] 

['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'] 

['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1'] 

['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1'] 

['5', '282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37'

We can see that there are 7,197 apps in this data set. The columns most important to us in this project are the 'track_name', 'size_bytes', 'price', 'rating_count_tot', 'user_rating', and 'prime_genre' columns.

More information on the ios apps columns can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

In [3]:
print(android_header,'\n')
explore_data(android,0,5)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 

['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

We can see that there are 10,841 apps in this data set. The columns most important to us in this project are the 'App', 'Category', 'Rating', 'Size', 'Price', 'Content Rating' and 'Genres' columns.

More information on the android apps columns can be found [here](https://www.kaggle.com/lava18/google-play-store-apps)


## Data Cleaning

This is one of the most important parts of this project and it involves removing or correcting wrong data, removing duplicate data, and modifying the data for the sole purpose of our analysis.

In [4]:
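# Delete the faulty Android row. Run this cell only ONCE, otherwise a valid row is deleted as well

print(android_header,'\n')
print(android[10472],'\n')   # the faulty row: its category is missing
del android[10472]
print(android[10472],'\n')   # the next row has moved into index 10472

print(len(android)) # the new length has been reduced by 1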

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] 

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up'] 

10840


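Above, we can see that row 10472 in the Android data set has no category. Every later value is shifted one column to the left (the `Category` position holds the rating `'1.9'`), and the row is one column shorter than the header.

Now that the faulty row has been deleted, we proceed with the rest of the data cleaning process.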

In [5]:
# To check for similar errors, compare the length of each row with the length of its header

for app in android:
    if len(app) != len(android_header):
        print(app)
        
for app in ios:
    if len(app) != len(ios_header):
        print(app)
        
# No output means every row has the same number of columns as its header
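
The check above prints the faulty rows but not where they are. A small sketch on toy data (the function name `find_bad_rows` and the sample rows are illustrative assumptions) that also reports each row's index, which is what a later `del` needs:

```python
def find_bad_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    return [(i, row) for i, row in enumerate(dataset) if len(row) != len(header)]

# Toy data, not taken from the real files
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Foo', 'TOOLS', '4.1'],
    ['Bar', '1.9'],            # category missing, so the row is one column short
    ['Baz', 'GAME', '4.5'],
]

bad = find_bad_rows(sample_rows, sample_header)
print(bad)
```

An empty list would mean that every row matches the header length.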

## Removing Duplicate Apps
In the [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section of the Google Play data set, users point out that some apps appear more than once. These duplicates have to be removed.

In [6]:
# Printing every entry of one app (Instagram) to confirm that duplicates exist

for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Now let us find the number of duplicate apps in the Android data set. There is no such error in the iOS data set, which can be checked by running the code above on the iOS data.

In [7]:
# Counting the duplicate apps in the Android data set

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate_apps:', len(duplicate_apps),'\n')
print('Number of unique_apps:', len(unique_apps),'\n')
print('Examples of duplicate apps:',duplicate_apps[:20])

Number of duplicate_apps: 1181 

Number of unique_apps: 9659 

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']


In [8]:
# Showing you that some apps indeed appear more than once

app_freq = {}
for app in android:
    name = app[0]
    if name in app_freq:
        app_freq[name] += 1
    else:
        app_freq[name] = 1
        
# print(app_freq)
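
The same count can be sketched with `collections.Counter` from the standard library. The rows below are toy data (only the app names echo the duplicate examples printed earlier), and the variable names are illustrative:

```python
from collections import Counter

# Toy rows: only the app name (index 0) matters here
sample = [['Instagram'], ['Box'], ['Instagram'], ['Slack'], ['Box'], ['Instagram']]

name_counts = Counter(row[0] for row in sample)

# Names that occur more than once, with their number of occurrences
repeated = {name: n for name, n in name_counts.items() if n > 1}

# Every occurrence after the first counts as one duplicate entry
n_duplicates = sum(n - 1 for n in repeated.values())

print(repeated)
print(n_duplicates)
```

Counting every occurrence after the first matches how `duplicate_apps` is filled in the cell above.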

### Removing Duplicates
### Part One
Since many apps are duplicated, we need to keep one entry per app and remove the others. Rather than removing rows at random, we use the number of reviews: the entry with the highest review count is most likely the most recent one, so that is the entry we keep.

We found 1,181 duplicate entries, which leaves 9,659 unique apps.

In [9]:
print('Expected length:', len(android)-1181)     #To confirm the observation

Expected length: 9659


To remove the duplicates:

- We create a dictionary where each key is a unique app name and the value is the highest number of reviews of that app
- The information in the new dictionary is used to build a new data set that has only one entry per app (the entry with the highest number of reviews)

In [10]:
# Creating the dictionary

reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
    
# print(reviews_max)
print(len(reviews_max))

9659


### Part two
Now we use the `reviews_max` dictionary to remove the duplicates. For each app we keep only the entry with the highest number of reviews. This is how the code below works:
* We start by initializing two empty lists, `android_clean` and `already_added`
* We loop through the Android data set, and for every iteration:
    * We isolate the name and number of reviews of the app
    * We add the app to the `android_clean` list and the app name to `already_added` if:
        * The number of reviews is the same as the maximum number of reviews of the app in the `reviews_max` dictionary
        * The app name is not already in the `already_added` list. This second check is needed because some duplicate entries share the same maximum number of reviews. If we only checked `n_reviews == reviews_max[name]`, each of those entries would be added and the app would still be duplicated.

In [11]:
# Creating the clean data

android_clean = []
already_added = []
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
        
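explore_data(android_clean,1,5)
# already_added holds plain strings, so its 'Number of columns' is only the length of the first app name
explore_data(already_added,1,5)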

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 

['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'] 

['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up'] 

Number of rows: 9659
Number of columns: 13
U Launcher Lite – FREE Live Cool Themes, Hide Apps 

Sketch - Draw & Paint 

Pixel Draw - Number Art Coloring Book 

Paper flowers instructions 

Number of rows: 9659
Number of columns: 46
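
The two passes described above can be sketched as a single function on toy data. The function name `dedupe_by_max_reviews` and the `Box` review count are illustrative assumptions; the Instagram review counts are the ones printed earlier. The sketch tracks added names in a `set` instead of a list, which is a swapped-in choice that makes each membership check faster:

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    # Pass 1: highest review count seen for each app name
    reviews_max = {}
    for row in rows:
        name, n_reviews = row[name_idx], float(row[reviews_idx])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row that reaches that maximum
    clean, already_added = [], set()
    for row in rows:
        name, n_reviews = row[name_idx], float(row[reviews_idx])
        if n_reviews == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean

toy = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '100'],   # made-up review count
    ['Box', 'BUSINESS', '4.2', '100'],   # same maximum twice: only one may be kept
]

result = dedupe_by_max_reviews(toy)
for row in result:
    print(row)
```

The two `Box` rows show why the name check matters: both reach the maximum, but only the first one is kept.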


### Removing Non-English Apps
### Part One
If we explore both data sets, we will notice some apps with non-English names, and our company focuses only on English apps.

In [12]:
print(ios[813][1])       # note: in this file index 1 is the `id` column; `track_name` is at index 2
print(ios[6731][1],'\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

436672029
1144164707 

中国語 AQリスニング
لعبة تقدر تربح DZ


We are not interested in keeping apps with non-English names, so we will remove them. English text is usually made up of the letters of the English alphabet, the digits 0 to 9, punctuation marks (., !, ?, ;) and other symbols (+, `*`, /).

Behind the scenes, each character in a string has a corresponding number (its Unicode code point). For instance, the number for the character `'a'` is 97, for `'A'` it is 65, and for `'爱'` it is 29,233. The characters commonly used in English text all belong to the ASCII standard, whose characters have numbers in the range 0 to 127.

We can take advantage of this by using the built-in function `ord()` to detect characters that are outside the ASCII range.

If an app name contains a character whose number is greater than 127, then it is probably not an English name.
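
As a quick illustration of the numbers mentioned above, this sketch prints the code point that `ord()` returns for a few characters and whether it falls inside the ASCII range (the variable name `code_points` is only for illustration):

```python
# Characters taken from the examples in this section
characters = ['a', 'A', '爱', '™', '😜']

# Map each character to the number returned by ord()
code_points = {ch: ord(ch) for ch in characters}

for ch, number in code_points.items():
    print(repr(ch), number, 'ASCII' if number <= 127 else 'outside ASCII')
```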

In [13]:
# Writing a function that takes in a string and returns True or False

def english_check(name):
    for character in name:
        if ord(character) > 127:
            return False
        
    return True
            
print(english_check('Instagram'))
print(english_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_check('Docs To Go™ Free Office Suite'))
print(english_check('Instachat 😜'))

True
False
False
False


The function works, but it also rejects names that contain emojis or symbols such as ™, because these fall outside the ASCII range of 0 to 127. If we used it as is, we would lose a large number of apps that are in fact English.

### Part Two

To solve this problem, we adjust the function so that it accepts names with at most three characters outside the ASCII range.

In [14]:
# Updated function for english app

def english_name(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
        
        if non_ascii > 3:
            return False
    return True

print(english_name('Docs To Go™ Free Office Suite'))
print(english_name('Instachat 😜'))

True
True


The function is not perfect, but it should filter out most of the non-English apps on both platforms.
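
The limit of three is a judgment call. A minimal sketch of a variant that exposes it as a parameter (the names `is_mostly_ascii` and `max_non_ascii` are assumptions, not part of the notebook) makes it easy to see how the result changes with the threshold:

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters outside ASCII."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for name in names:
    # Compare the default limit of 3 with a strict limit of 0
    print(name, is_mostly_ascii(name), is_mostly_ascii(name, max_non_ascii=0))
```

With `max_non_ascii=0` the function behaves like the stricter `english_check()` above.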

In [15]:
# Filtering out non_english apps

ios_english = []
android_english = []

for app in ios:
    name = app[2]
    if english_name(name):
        ios_english.append(app)
        
for app in android_clean:
    name = app[0]
    if english_name(name):
        android_english.append(app)
        
# print(len(ios_english))
# print(len(android_english))
explore_data(ios_english,0,3)
explore_data(android_english,0,3)

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1'] 

['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'] 

['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1'] 

Number of rows: 6183
Number of columns: 17
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Te

After filtering, **6,183** English **iOS** apps and **9,614** English **Android** apps remain, down from 7,197 and 9,659 respectively.

## Isolating the Free Apps
We only build free apps, so free apps are the ones we need for this analysis. Isolating them is the last step of our data cleaning process.

In [16]:
# Creating a new data set consisting of free apps only

ios_free = []
android_free = []

for app in ios_english:
    price = app[5]
    if price == '0':
        ios_free.append(app)

for app in android_english:
    app_type = app[6]   # index 6 is the 'Type' column; 'Price' is at index 7
    if app_type == 'Free':
        android_free.append(app)
        
explore_data(ios_free,0,3)
explore_data(android_free,0,3)

['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'] 

['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1'] 

['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1'] 

Number of rows: 3222
Number of columns: 17
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', 

Now we have **3,222** free iOS apps and **8,863** free Android apps. This should be enough for our analysis.

## Most Common Apps by Genre
### Part One
Our aim is to determine the kinds of apps that are likely to attract the most users, because our revenue depends on in-app ads: the more users we have, the better.

To reduce the risks ahead, we validate an app idea in three steps:
1. Build a simple or minimal version of the app for Android and add it to the Google Play store.
2. If the app gets a good response from users, we improve on it.
3. If the app becomes profitable after six months, we build the iOS version of the app and add it to the App Store.

Because the end goal is to have the app on both stores, we need to find the kinds of apps that are successful on both platforms.

To get a sense of what is common on both platforms, we will build frequency tables of the app genres. For iOS we will use the `prime_genre` column, and for Android we will use the `Category` and `Genres` columns.

In [18]:
# To see the columns we need
print(ios_header,'\n')
print(android_header)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


### Part Two
Since we have decided which columns to use, we will now build two functions to analyze the frequency tables:
* One function to generate frequency tables that show percentages
* A second function to display the percentages in descending order

In [42]:
def freq_table(dataset, index):
    frequency = {}
    total = 0
    for app in dataset:
        total += 1
        value = app[index]
        if value in frequency:
            frequency[value] += 1
        else:
            frequency[value] = 1
    
    freq_percentage = {}
    for genre in frequency:
        percentage = (frequency[genre] * 100) / total
        freq_percentage[genre] = percentage
    
    return freq_percentage
        

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
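
To see what the two functions produce, here is a small self-contained sketch on toy rows. The helper `freq_percentages` is a compact stand-in for `freq_table()` written for this example, and sorting with a `key` is a swapped-in alternative to building `(value, key)` tuples:

```python
def freq_percentages(rows, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = {}
    for row in rows:
        counts[row[index]] = counts.get(row[index], 0) + 1
    total = len(rows)
    return {value: count * 100 / total for value, count in counts.items()}

# Toy rows: [app name, genre]
toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Games'], ['d', 'Weather']]

table = freq_percentages(toy, 1)

# Sort the (genre, percentage) pairs by percentage, largest first
ranked = sorted(table.items(), key=lambda item: item[1], reverse=True)
for genre, percentage in ranked:
    print(genre, ':', percentage)
```

One difference from `display_table()`: when two genres have the same percentage, the tuple-based sort orders them by name in reverse, while the `key`-based sort keeps them in the order they were first seen.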

### Part Three
Now we generate the frequency tables for columns `prime_genre`, `Genres`, and `Category`

In [49]:
display_table(ios_free,12) #Frequency table for Prime Genre

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.6623215394165114
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.017380509000621
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


As seen from the frequency table for iOS:
- The most common genre is Games, with more than 58% of the free English apps. Entertainment comes second with almost 8%.
- After these two come Photo & Video, Education and Social Networking, each with between 3% and 5% of the apps.
- Most free apps are therefore designed for fun (games, entertainment, photo and video, social networking, sports, music), while apps with a practical purpose are less common.

It is difficult to recommend an app profile for the App Store based on this frequency table alone, because a large share of apps in a genre does not imply a large number of users.

In [53]:
display_table(android_free,1) #Frequency table for Category

FAMILY : 18.8987927338373
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.700778517432021
MEDICAL : 3.5315355974275078
SPORTS : 3.3961412614238973
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376733
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.2452894053932075
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496447
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916394
AUTO_AND_VEHICLES : 0.9251946293580052
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189553
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0.

The most common apps here are for family, gaming, tools, business, lifestyle and productivity. This is very different from the iOS apps.

Compared with the App Store, a larger share of the free Android apps has a practical purpose. Note that this table describes what is on offer, not what users actually install or use.

In [52]:
display_table(android_free,9) #Frequency table for Genre 

Tools : 8.450863138892023
Entertainment : 6.070179397495205
Education : 5.348076272142615
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.700778517432021
Medical : 3.5315355974275078
Sports : 3.4638384294257025
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376733
Travel & Local : 2.324269434728647
Shopping : 2.2452894053932075
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496447
Arcade : 1.8503892587160105
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.1282861333634209
Racing : 0.9928917973598105
Role Playing : 0.9364774906916394
Libraries & Demo : 0.9364774906916394
Auto & Vehicles : 0.9251946293580052

Based on these data, I still cannot recommend an app profile, because the tables do not show how many users the apps in each category have. I do have an expectation, but we will have to generate another table that reflects the number of users.

## Most Popular Apps by Genre on the App Store
The frequency tables we analyzed above showed us that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to get an idea about the kind of apps with the most users.

One way to find out which genres are the most popular is to calculate the average number of installs for each app genre. For the Google Play data set, we can use the `Installs` column. The App Store data set does not have such a column, but we can use the `rating_count_tot` column as an approximation.
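
The `Installs` values are strings such as `'10,000+'`, so they need to be converted before they can be averaged. A minimal sketch on toy rows (the helper name `parse_installs` and the rows are assumptions) treats each value as its lower bound:

```python
def parse_installs(text):
    """Turn an `Installs` string such as '10,000+' into a float (its lower bound)."""
    return float(text.replace(',', '').replace('+', ''))

# Toy rows: [app name, category, installs]
toy = [
    ['App A', 'TOOLS', '10,000+'],
    ['App B', 'TOOLS', '500,000+'],
    ['App C', 'GAME', '1,000,000,000+'],
]

totals, counts = {}, {}
for name, category, installs in toy:
    totals[category] = totals.get(category, 0) + parse_installs(installs)
    counts[category] = counts.get(category, 0) + 1

# Average installs per category
avg_installs = {category: totals[category] / counts[category] for category in totals}
print(avg_installs)
```

Because the values are open-ended buckets, the averages are rough: an app listed as `'10,000+'` may have anywhere from 10,000 installs up to the next bucket.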

In [75]:
genre_freq = freq_table(ios_free,12)

for genre in genre_freq:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[12]   # prime_genre column (same index as in freq_table above)
        if genre_app == genre:
            rating = float(app[6])
            total += rating
            len_genre += 1
    
    ratings_avg = total / len_genre
    print(genre, ':', ratings_avg)


Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0
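
The averages are printed in no particular order, which makes them hard to compare. As a sketch, the dictionary below copies a few of the values printed above (it is a subset, not the full table) and ranks them:

```python
# A few averages copied from the output above
ratings_avg = {
    'Games': 22788.6696905016,
    'Navigation': 86090.33333333333,
    'Music': 57326.530303030304,
    'Reference': 74942.11111111111,
    'Social Networking': 71548.34905660378,
    'Weather': 52279.892857142855,
    'Medical': 612.0,
}

# Rank the genres by average number of ratings, largest first
ranked = sorted(ratings_avg.items(), key=lambda item: item[1], reverse=True)
top_three = [genre for genre, average in ranked[:3]]

for genre, average in ranked:
    print(genre, ':', round(average))
```

In this subset, Navigation, Reference and Social Networking come out on top, while Games, the most common genre, is far lower.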
