# Analyzing App Data
***
In this project we import app data from Google Play and the App Store to better understand what users are looking for in an app. 

The goal of this project is to advise developers on how they can improve their apps by providing insights into what users seem to be looking for and how users actually interact with apps.

## Section 1: Introducing The Data Set
***
1. A data set containing data about approximately ten thousand Android apps from Google Play — the data was collected in August 2018. [dataset](https://www.kaggle.com/lava18/google-play-store-apps/home)
2. A data set containing data about approximately seven thousand iOS apps from the App Store — the data was collected in July 2017. [dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)

In [38]:
from csv import reader

### The Google Play data set ###
# utf8 handles non-ASCII app names; newline='' is what the csv module recommends
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### The App Store data set ###
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

`explore_data(dataset, start, end, rows_and_columns=False)` is a useful function for exploring a data set that has been read from a CSV file. It slices the data as you wish, prints each row of the slice, and optionally prints the number of rows and columns.

It takes four parameters:

- `dataset`, which is expected to be a list of lists
- `start` and `end`, which are both expected to be integers and represent the starting and ending indices of a slice from the data set
- `rows_and_columns`, which is expected to be a Boolean and has `False` as its default argument

In [39]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

### Exploring Android Data

In [40]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


### Exploring iOS Data

In [41]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


|App|Category|Rating|Reviews|Size|Installs|Type|Price|Content Rating|Genres|Last Updated|Current Ver|Android Ver|
|---|---|---|---|---|---|---|---|---|---|---|---|---|

All 13 columns recorded in the Google Play data set could be useful for analysis of some sort. However, we will likely skip over 'Last Updated' and 'Current Ver'. 

Take a look at the [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) for more information on the raw App Store data set. 

# Section 2: Cleaning the Data
***

## Deleting Duplicates, Inaccuracies, and Irrelevant Data

We will need to:

1. Detect inaccurate data and correct (or remove) it
2. Detect duplicate data and remove the duplicates
3. Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播
4. Remove non-free apps
***

### Missing Information

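Below is an example of dirty data. The row has only 12 values while the header has 13: the 'Category' value is missing, so every later value is shifted one column to the left. In cases like this we will delete the entire row. 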

In [42]:
print(android_header)
print('\n')
print(android[10472]) # incorrect row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [43]:
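print(len(android))
# Delete the malformed row only if it is still there, so re-running this cell is safe
if len(android[10472]) != len(android_header):
    del android[10472]
print(android[10472])
print(len(android))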

10841
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
10840
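
The malformed row above was located by a known index. As a rough sketch of a more general check, rows can be flagged whenever their length differs from the header's. The function name `find_malformed_rows` and the toy rows are illustrative assumptions, not part of the original notebook.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Toy data standing in for the real CSV rows (illustrative only)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Good App', 'TOOLS', '4.1'],
    ['Shifted App', '1.9'],          # 'Category' missing, so the row is short
    ['Another App', 'SOCIAL', '4.5'],
]

for index, row in find_malformed_rows(sample_rows, sample_header):
    print(index, row)
```

On the real data the call would presumably be `find_malformed_rows(android, android_header)`. A length check only catches rows with missing or extra values, not rows with wrong values in the right number of columns.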


### Duplicates
We know from visually scanning some of the data and from the data set's comments online that there are quite a few repeated entries. 

Instagram, shown below, is one example. In total there are more than 1,000 repeated rows. 

### Removing Android Duplicates

In [44]:
for app in android:
    name=app[0]
    if name == "Instagram":
        print(app)
        print("\n")
    

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




In [45]:
duplicate_apps=[]
unique_apps=[]
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print("Number of duplicate apps:", len(duplicate_apps))
print("\n")
print("Number of unique apps:", len(unique_apps))
print("\n")
print("Example of duplicate apps:", duplicate_apps[:20])


Number of duplicate apps: 1181


Number of unique apps: 9659


Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']


So we need to delete the duplicates. However, if the copies are not identical, say they were collected at different times, then the information they contain will vary. For example, a newer entry may have more reviews or a more recent version. We will be careful to keep the most recent entry and delete the older duplicates. We will use the number of reviews to decide which entry is the most recent because, as you can see from the data below, the version can be undefined ("Varies with device").

Below you can see the duplicated "Instagram" entries and their varying review counts.

In [46]:
for app in android:
    name = app[0]
    if name =="Instagram":
        print(app)
        print("\n")

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




In [47]:
for app in android:
    if app[3]=='3.0M':
        print(app)

In [48]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and n_reviews > reviews_max[name]:
        reviews_max[name]=n_reviews
        
    elif name not in reviews_max:
        reviews_max[name]=n_reviews
        
print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


We have isolated the highest review count for each app, and we have verified that the expected length and our new length match. Indeed they do!

Next we must actually produce a new clean set of data. 

In [49]:
android_clean=[] # stores our new clean data set
already_added=[] # stores only app names

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
print("There are", len(android_clean), "items in this clean set.")

def check_duplicate(data_set, app_name):
    counter=0
    for app in data_set:
        name=app[0]
        if name == app_name:
            counter+=1
            ratings=app[3]
    
    if counter==1:
        print("There was", counter, "version of", app_name)
        print("and it has", ratings ,"ratings, as expected!")
    elif counter>1:
        print("There were", counter, "versions of", app_name)
    else: 
        print("Doesn't seem like", app_name, "exists!")


check_duplicate(android_clean, "Instagram")
        

There are 9659 items in this clean set.
There was 1 version of Instagram
and it has 66577446 ratings, as expected!
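
The two-step approach above (first `reviews_max`, then `android_clean`) checks names against a list, which gets slow as the list grows. A possible single-pass alternative, sketched here under the assumption that keeping the row with the most reviews is all that is needed, stores the best row per name in a dictionary. `dedupe_by_max_reviews` and the sample rows are hypothetical and only mimic the shape of the real data.

```python
def dedupe_by_max_reviews(dataset, name_index, reviews_index):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows: [name, reviews]; the real rows have 13 columns
sample = [
    ['Instagram', '66577313'],
    ['Box', '159'],
    ['Instagram', '66577446'],
    ['Instagram', '66577313'],
]
deduped = dedupe_by_max_reviews(sample, name_index=0, reviews_index=1)
print(deduped)
```

The result lists apps in order of first appearance, so the row order can differ from `android_clean` even though the same rows are kept.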


### Cleaning iOS Duplicates

In [50]:
duplicate_apps=[]
unique_apps=[]
for app in ios:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print("Number of duplicate apps:", len(duplicate_apps))
print("\n")

reviews_max = {}
for app in ios:
    name = app[0]
    n_reviews = float(app[5])
    
    if name in reviews_max and n_reviews > reviews_max[name]:
        reviews_max[name]=n_reviews
        
    elif name not in reviews_max:
        reviews_max[name]=n_reviews
        
print('Expected length:', len(ios))
print('Actual length:', len(reviews_max))

ios_clean=[] # stores our new clean data set
already_added=[] # stores only app names

for app in ios:
    name = app[0]
    n_reviews = float(app[5])
    if n_reviews == reviews_max[name] and name not in already_added:
        ios_clean.append(app)
        already_added.append(name)
         
print("There are", len(ios_clean), "items in this clean set.")



Number of duplicate apps: 0


Expected length: 7197
Actual length: 7197
There are 7197 items in this clean set.


Above we produced a new set of clean data called `android_clean`, and we ensured it was indeed clean in two ways. First, we compared the size of the data set with the expected size we worked out earlier. 
Second, we used a function called `check_duplicate` that makes it quick to check how many entries a specific app has. 
We ran this for "Instagram", as we did earlier, and there was indeed 1 entry, and it is the one with the largest number of reviews. The same steps, applied to the `id` column of the App Store data, found no duplicates, so `ios_clean` keeps all 7197 rows. 

## Cleaning Non-English Apps

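We only produce English apps, so we must remove non-English apps from the set. 

We can use ASCII codes and string iteration to go through our app names and clean up further: characters used in English text have code points in the ASCII range (0 to 127), so a name with several characters above 127 is probably not English. 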

In [51]:
def is_english(app):
    name=str(app)
    counter=0
    # count the characters outside the ASCII range (code point above 127)
    for character in name:
        if ord(character)>127:
            counter+=1

    print(counter)
                
    if counter<4:
        print(app, "is likely an English app")
    else: 
        print(app, "is likely a foreign app")

is_english('Instachat 😜') 
is_english('爱奇艺PPS -《欢乐颂2》电视剧热播') 

1
Instachat 😜 is likely an English app
13
爱奇艺PPS -《欢乐颂2》电视剧热播 is likely a foreign app


In [73]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('Instachat 😜😜😜😜'))
print(ord('™'))
print(ord('😜'))

1
Docs To Go™ Free Office Suite is likely an English app
None
1
Instachat 😜 is likely an English app
None
4
Instachat 😜😜😜😜 is likely a foreign app
None
8482
128540


This method is, of course, far from foolproof. As we can see above, an English name with four emoji ('Instachat 😜😜😜😜') is flagged as foreign.
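
The function above prints its verdict and returns `None`, which is why `None` shows up in the output. A variant that returns a Boolean is easier to reuse in a filter. The sketch below is one way to write it; the name `is_probably_english` and the `max_non_ascii` parameter are assumptions made for this illustration, with the default chosen to match the `counter<4` rule above.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii characters above code point 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


for candidate in ['Docs To Go™ Free Office Suite',
                  'Instachat 😜',
                  'Instachat 😜😜😜😜',
                  '爱奇艺PPS -《欢乐颂2》电视剧热播']:
    print(candidate, '->', is_probably_english(candidate))
```

Raising `max_non_ascii` lets more emoji-heavy English names through, at the cost of also admitting short non-English names.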

### It's time to remove the non-English apps

We will use an adapted version of the `is_english` function to clean our data set further.

In [52]:
def clean_english(data_set, threshold):
    
    clean_english_set=[]
    names=[]
    
    for app in data_set:
        name=str(app[0])
        counter=0
        for character in name:
            if ord(character)>127:
                counter+=1
        if counter < threshold:
            clean_english_set.append(app)
            names.append(name)
    
    return clean_english_set, names

### Cleaning Android English

In [53]:
clean_english_android , names = clean_english(android_clean, 4)

print("There are", len(clean_english_android), "rows in this cleaned set.")
print(names[:30])



There are 9614 rows in this cleaned set.
['Photo Editor & Candy Camera & Grid & ScrapBook', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint', 'Pixel Draw - Number Art Coloring Book', 'Paper flowers instructions', 'Smoke Effect Photo Maker - Smoke Editor', 'Infinite Painter', 'Garden Coloring Book', 'Kids Paint Free - Drawing Fun', 'Text on Photo - Fonteee', 'Name Art Photo Editor - Focus n Filters', 'Tattoo Name On My Photo Editor', 'Mandala Coloring Book', '3D Color Pixel by Number - Sandbox Art Coloring', 'Learn To Draw Kawaii Characters', 'Photo Designer - Write your name with shapes', '350 Diy Room Decor Ideas', 'FlipaClip - Cartoon animation', 'ibis Paint X', 'Logo Maker - Small Business', "Boys Photo Editor - Six Pack & Men's Suit", 'Superheroes Wallpapers | 4K Backgrounds', 'HD Mickey Minnie Wallpapers', 'Harley Quinn wallpapers HD', 'Colorfit - Drawing & Coloring', 'Animated Photo Editor', 'Pencil Sketch Drawing', 'Easy Realistic Drawing Tutorial', 

### Cleaning iOS English

In [54]:
clean_english_ios , names = clean_english(ios_clean, 4)

print("There are", len(clean_english_ios), "rows in this cleaned set.")
print(names[:30])



There are 7197 rows in this cleaned set.
['284882215', '389801252', '529479190', '420009108', '284035177', '429047995', '282935706', '553834731', '324684580', '343200656', '512939461', '362949845', '359917414', '469369175', '924373886', '575658129', '506627515', '500116670', '479516143', '293778748', '341232718', '440045374', '295646461', '487119327', '284815942', '596402997', '466965151', '293622097', '350642635', '582654048']
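
One thing to note about the output above: the values printed as names are App Store ids, because column 0 of the iOS data is `id` and the app name lives in `track_name` (column 1, per `ios_header`). Ids are plain digits, so the filter keeps every row. A possible adjustment, sketched below under the assumption that filtering on `track_name` is what is intended, is to pass the name column explicitly. `clean_english_by_column` and the sample rows are hypothetical, and the second id is made up.

```python
def clean_english_by_column(data_set, name_index, threshold):
    """Keep rows whose value at name_index has fewer than `threshold` non-ASCII characters."""
    kept = []
    for app in data_set:
        name = str(app[name_index])
        non_ascii = sum(1 for character in name if ord(character) > 127)
        if non_ascii < threshold:
            kept.append(app)
    return kept


# Toy rows shaped like the first two iOS columns: [id, track_name]
sample_ios = [
    ['284882215', 'Facebook'],
    ['000000000', '爱奇艺PPS -《欢乐颂2》电视剧热播'],  # made-up id
]

print(len(clean_english_by_column(sample_ios, 0, 4)))  # checks ids
print(len(clean_english_by_column(sample_ios, 1, 4)))  # checks names
```

Applying this to `ios_clean` with `name_index=1` would change the row counts reported in the rest of the notebook, so the later cells would need to be re-run.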


### It's time to remove non-free apps

We are only looking for free apps because we only develop free apps. We make revenue from in-app advertising. 

In [55]:
def clean_free(data_set, index, free):
    clean_free_set=[]
    for app in data_set:
        if app[index]==free:
            clean_free_set.append(app)
    return clean_free_set
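
`clean_free` compares strings exactly, so the caller has to know how each store spells "free" (`'Free'` in the Android `Type` column, `'0.0'` in the iOS `price` column). A small sketch of a numeric alternative follows. It assumes paid prices may carry a leading `$`, which is not shown in the rows printed in this notebook, and `is_free` is a hypothetical helper.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' parses to zero."""
    cleaned = price.strip().replace('$', '')  # assumption: '$' is the only currency symbol
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Not a number at all (for example, a value shifted in from another column)
        return False


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), '->', is_free(price))
```

With a helper like this, the same test could be applied to the Android `Price` column (index 7) and the iOS `price` column (index 4).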

### Clean Free Android

In [56]:
clean_free_android=clean_free(clean_english_android,6,"Free")
android_final=clean_free_android
print("There are", len(clean_free_android), "free, english apps in the Google play store.")
print("\n")
print(clean_free_android[:3])

There are 8863 free, english apps in the Google play store.


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]


### Clean Free iOS

In [57]:
clean_free_ios=clean_free(clean_english_ios,index=4,free='0.0') # index for price is 4
ios_final=clean_free_ios
print("There are", len(clean_free_ios), "free, english apps in the Apple store.")
print("\n")
print(clean_free_ios[:3])

There are 4056 free, english apps in the Apple store.


[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]


# Section 3: Analysis

So far we have:
- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps
- Isolated the free apps

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of what are the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our data sets.

## We need to perform preliminary analysis to see which columns are ideal for frequency tables.

We will do this by looping through a column and counting how often each distinct value occurs, expressed as a percentage of all apps.



In [58]:
def freq_table(dataset, index):
    freq={}
    length=len(dataset)
    for app in dataset:
        name=app[index]
        if name in freq:
            freq[name]+=1
        else:
            freq[name]=1  
            
    for item in freq:
        freq[item]= round(float(freq[item]/length*100),2)
    
    return freq

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  
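
The frequency table above can also be built with the standard library's `collections.Counter`, which handles the counting and the sorting. This is only a sketch of an equivalent approach. `freq_table_counter` is a hypothetical name and the rows are toy data.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Map each distinct value in a column to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() yields values from most to least frequent
    return {value: round(count / total * 100, 2)
            for value, count in counts.most_common()}


toy_rows = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
for value, percentage in freq_table_counter(toy_rows, 1).items():
    print(value, ':', percentage)
```

Because dictionaries keep insertion order, the result is already sorted by frequency and needs no separate display step.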
             

## Most Common Google Play Categories
***
There are 33 categories in the Google Play data. Family is by far the most common, followed by Game and Tools. After Tools, every category is below 5%. 

In [59]:
display_table(clean_free_android,1) # Category column

FAMILY : 18.9
GAME : 9.73
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


## Most Common Google Play Genres
***
The distribution of genres is quite even as well. Only Tools, Entertainment, and Education account for more than 5% each. 

There is a clear desire for apps to be immediately useful, as in they are often needed to function as a tool. Not only in the Tool category but Educatio

In [60]:
display_table(clean_free_android,9) # Genres column

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.9
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;B

## Most Common Google Play Content Ratings
***
In this case there is a clear sense that most free English apps are designed for all age ranges. Indeed, 81.44% of the apps we analysed are rated for everyone, and only a tiny fraction (0.03%) are rated for adults only (18+).

In [61]:
display_table(clean_free_android,8) # Content Rating column

Everyone : 81.44
Teen : 11.06
Mature 17+ : 4.19
Everyone 10+ : 3.26
Adults only 18+ : 0.03
Unrated : 0.02


### Most Popular Apple Store Genre

In [62]:
display_table(clean_free_ios,11) #prime_genre

Games : 55.65
Entertainment : 8.23
Photo & Video : 4.12
Social Networking : 3.53
Education : 3.25
Shopping : 2.98
Utilities : 2.69
Lifestyle : 2.32
Finance : 2.07
Sports : 1.95
Health & Fitness : 1.87
Music : 1.65
Book : 1.63
Productivity : 1.53
News : 1.43
Travel : 1.38
Food & Drink : 1.06
Weather : 0.76
Reference : 0.49
Navigation : 0.49
Business : 0.49
Catalogs : 0.22
Medical : 0.2


Interestingly, the majority of free English apps on the App Store are games. The second-largest group, Entertainment, is only about a seventh the size of Games. 

### Analysis Comparison Regarding Genre:
Games dominate the App Store, while Google Play is led by Family, Game, and Tools. The clearest difference is that Games holds an outright majority (55.65%) on the App Store, in stark contrast to the more evenly distributed categories of the Google Play store, where the largest category accounts for 18.9%. 

## Further Genre Analysis
One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing from the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

In [63]:
def convert_str_to_float(string):
    string=string.replace('+','')
    string=string.replace(',','')
    number=float(string)
    return number

number=convert_str_to_float('500,000,000+')
print(number)
print(type(number))

500000000.0
<class 'float'>


In [64]:
def avg_install_genre(freq_table, dataset, genre_index, install_index):
    # For each genre, print the average of the numeric column at install_index
    for genre in freq_table:
        total=0
        len_genre=0
        for app in dataset:
            if app[genre_index]==genre:
                # convert only matching rows; strips '+' and ',' before converting
                total+=convert_str_to_float(str(app[install_index]))
                len_genre+=1
        avg_rating=round(total/len_genre)
        print(genre)
        print(avg_rating)
        print('\n')
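
The function above rescans the whole data set once per genre. As a sketch of an alternative, the averages can be accumulated in a single pass with two dictionaries and then sorted for display. `average_by_genre` is a hypothetical name, and the toy rows only mimic the shape `[name, genre, installs]`.

```python
from collections import defaultdict


def average_by_genre(dataset, genre_index, value_index):
    """Return {genre: average numeric value}, computed in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for app in dataset:
        genre = app[genre_index]
        # Strip the '+' and ',' used in strings such as '1,000+' before converting
        value = float(str(app[value_index]).replace('+', '').replace(',', ''))
        totals[genre] += value
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


toy_apps = [
    ['A', 'GAME', '1,000+'],
    ['B', 'GAME', '3,000+'],
    ['C', 'TOOLS', '500+'],
]
averages = average_by_genre(toy_apps, genre_index=1, value_index=2)
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', round(average))
```

Returning a dictionary instead of printing makes it straightforward to sort the genres by average, which the printed output of `avg_install_genre` does not do.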

## Rating By Genre Analysis for iOS

Reference, Music, and Social Networking top this analysis with over 50,000 ratings per app on average. There are big players in each of these genres that have massive advantages and attract most of the ratings. 

Our recommendation would be to pick a genre that is not currently booming yet is not heavily sidelined either. It may have potential to grow, and the competition in the space is less aggressive and more susceptible to new entrants. 

Perhaps Travel is ripe for new entrants. With only 1.38% of apps and yet a healthy average number of ratings (20,216), it is a vibrant area. The dominant player is Airbnb, which is losing some of its shine as it grows. This makes it an exciting area to get into, although there are significant obstacles: it is a huge market with lots of competition. Compared with the highly popular genres mentioned above, however, it is a better bet with a large upside and strong momentum. Furthermore, a travel app can easily be designed to tap into the "Family" category that we have found to be so common on Google Play. 

In [65]:
genre_freq_ios=freq_table(clean_free_ios, 11)

avg_install_genre(genre_freq_ios, clean_free_ios,11, 5)

Education
6266


Reference
67448


Lifestyle
8978


Navigation
25972


Games
18925


Finance
13522


Productivity
19054


Sports
20129


Health & Fitness
19952


Entertainment
10823


Shopping
18747


Photo & Video
27250


Utilities
14010


Book
8498


Food & Drink
20179


Weather
47221


News
15893


Medical
460


Business
6368


Music
56482


Social Networking
53078


Travel
20216


Catalogs
1780




## Rating By Genre Analysis for Android

Here we will use installs rather than ratings to gauge popularity. However, we need to clean the `Installs` column first. As you can see below, the values are stored as strings and they are not precise (for example, `100,000+`). 

We want to convert these to floats so we can order and manipulate them.

In [66]:
display_table(android_final,5) #Installs

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
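
To illustrate why the conversion matters, the sketch below sorts a few of the install labels above both as strings and as numbers. `installs_to_float` is a hypothetical stand-in that mirrors what `convert_str_to_float` does. Note that each label is a lower bound (`100+` means at least 100 installs), so the converted value understates the true count.

```python
def installs_to_float(label):
    """Convert a label such as '1,000,000+' to the float 1000000.0."""
    return float(label.replace('+', '').replace(',', ''))


labels = ['1,000,000+', '100+', '50,000+', '0+']

# Sorting the raw strings compares character by character, not by magnitude
print(sorted(labels))

# Sorting with the numeric key gives the order we actually want
print(sorted(labels, key=installs_to_float))
```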


### Android Popularity by Category

In [67]:
genre_freq_android=freq_table(android_final, 1) #

avg_install_genre(genre_freq_android, android_final,1, 5) #

EVENTS
253542


TOOLS
10801391


PARENTING
542604


BOOKS_AND_REFERENCE
8767812


COMMUNICATION
38456119


SOCIAL
23253652


HEALTH_AND_FITNESS
4188822


MEDICAL
120551


AUTO_AND_VEHICLES
647318


FOOD_AND_DRINK
1924898


SPORTS
3638640


DATING
854029


ENTERTAINMENT
11640706


VIDEO_PLAYERS
24727872


WEATHER
5074486


SHOPPING
7036877


BUSINESS
1712290


EDUCATION
1833495


BEAUTY
513152


LIFESTYLE
1437816


GAME
15588016


COMICS
817657


FAMILY
3697848


TRAVEL_AND_LOCAL
13984078


HOUSE_AND_HOME
1331541


ART_AND_DESIGN
1986335


FINANCE
1387692


PHOTOGRAPHY
17840110


PRODUCTIVITY
16787331


LIBRARIES_AND_DEMO
638504


NEWS_AND_MAGAZINES
9549178


MAPS_AND_NAVIGATION
4056942


PERSONALIZATION
5201483




### Android Popularity by Genre

In [68]:
genre_freq_android=freq_table(android_final, 9) #

avg_install_genre(genre_freq_android, android_final,9, 5) #

News & Magazines
9549178


Card
3815462


Beauty
513152


Educational;Creativity
2333333


Strategy;Action & Adventure
1000000


Strategy;Creativity
1000000


Adventure;Action & Adventure
35333333


Dating
854029


Education;Education
4759517


Business
1712290


Casual;Music & Video
10000000


Puzzle;Creativity
750000


Puzzle;Education
100000


Strategy;Education
500000


Maps & Navigation
4056942


Lifestyle;Education
100000


Educational;Pretend Play
9375000


Books & Reference;Education
1000


Racing;Pretend Play
1000000


Board;Action & Adventure
3000000


Simulation;Action & Adventure
4857143


Adventure
4922785


Strategy
11339901


Arcade;Pretend Play
1000000


Music
9445583


Simulation;Education
500


Role Playing
3965645


Entertainment;Pretend Play
3000000


Board;Brain Games
407143


Education;Creativity
2875000


Casual
19569222


Action;Action & Adventure
5888889


Productivity
16787331


Educational;Action & Adventure
17016667


Travel & Local
14051476


Puzzle;Brain G

# Section 4: Conclusion

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and by a few others with over 100 million or 500 million installs:

In [69]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess
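
One way to see how much a handful of giant apps skews an average is to compare the mean with the median, or to recompute the mean after leaving out apps above some cap. The sketch below does both on made-up install counts. `summarize_installs`, the `cap` parameter, and the numbers are assumptions for illustration, not values from the data set.

```python
from statistics import mean, median


def summarize_installs(install_counts, cap=None):
    """Return (mean, median) of the counts, optionally ignoring counts >= cap."""
    values = [count for count in install_counts if cap is None or count < cap]
    return mean(values), median(values)


# Made-up install counts: two giants and three small apps
toy_installs = [1_000_000_000, 500_000_000, 100_000, 50_000, 10_000]

print(summarize_installs(toy_installs))
print(summarize_installs(toy_installs, cap=100_000_000))
```

On the real data, the counts for one category would first be converted with `convert_str_to_float` before being passed in.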

In [70]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

We have gone through the Google Play Store and the Apple App Store. We have found that family apps are in a semi-popular genre with some successful competitors but no behemoths.

So we recommend an app that is perhaps in lifestyle, with a focus on millennials and their impact on the environment. This app could track users' consumption, rate their carbon footprint, and recommend ways to reduce their foo