# Analyze Apps in App Store and Google Play

Our company wants to launch a new app that earns revenue from in-app ads. The more people use the app, the more ad revenue it generates, so the app should be free to download. It will target an English-speaking audience. In this project, we analyze apps in the App Store and Google Play to determine which type of free app is best suited to in-app ads.

This project uses two datasets: [App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) and [Google Play](https://www.kaggle.com/lava18/google-play-store-apps). The App Store dataset can be downloaded directly from [here](https://dq-content.s3.amazonaws.com/350/AppleStore.csv), and the Google Play dataset from [here](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

## Opening dataset

In [36]:
from csv import reader

# Read App Store Dataset
appstore_opened = open('AppleStore.csv', encoding='utf-8')  # file name as in the download link
appstore_read = reader(appstore_opened)
appstore = list(appstore_read)
appstore_header = appstore[0]
appstore = appstore[1:]

# Read Google Play Dataset
googleplay_opened = open('googleplaystore.csv', encoding='utf-8')
googleplay_read = reader(googleplay_opened)
googleplay = list(googleplay_read)
googleplay_header = googleplay[0]
googleplay = googleplay[1:]
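
The two read blocks above follow the same steps, so they could be wrapped in one small helper. The following is only a minimal sketch and not part of the original notebook: `load_csv` is a hypothetical name, and the `with` block is used so that the file is closed once it has been read.

```python
from csv import reader

def load_csv(path, encoding='utf-8'):
    """Return (header, rows) for a CSV file. Hypothetical helper."""
    # newline='' is the setting the csv module documentation recommends
    with open(path, encoding=encoding, newline='') as f:
        rows = list(reader(f))
    # The first row is the header, the rest is the data
    return rows[0], rows[1:]
```

Assuming the files sit next to the notebook, it would be called as `appstore_header, appstore = load_csv('AppleStore.csv')`.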

In [37]:
# Create a function to explore dataset easily
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset_slice))
        print('Number of columns:', len(dataset_slice[0]))

In [38]:
print('Number of rows in App Store dataset:', len(appstore))
print('Number of columns in App Store dataset:', len(appstore[0]))
print('\n')
print('Number of rows in Google Play dataset:', len(googleplay))
print('Number of columns in Google Play dataset:', len(googleplay[0]))

Number of rows in App Store dataset: 7197
Number of columns in App Store dataset: 16


Number of rows in Google Play dataset: 10841
Number of columns in Google Play dataset: 13


In [39]:
print('Columns\' names in App Store dataset\n')
appstore_header

Columns' names in App Store dataset



['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

The columns in the App Store dataset that would be useful for this analysis are price, rating_count_tot and prime_genre.

In [40]:
print('Columns\' names in Google Play dataset\n')
googleplay_header

Columns' names in Google Play dataset



['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

The columns in the Google Play dataset that would be useful for this analysis are Category, Reviews, Installs, Type, Price and Genres.

## Cleaning Data

### Deleting Wrong Data

According to this [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), there is an error in row 10,472 of the Google Play dataset.

In [41]:
print(googleplay_header)
print('\n')
print(googleplay[10472]) # incorrect row
print('\n')
print(googleplay[0]) # correct row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Row 10472 is missing its Category value, so every later value is shifted one column to the left and the Rating column reads 19, even though the maximum rating is 5. We'll delete this row.
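
A row with a missing value has fewer entries than the header, so rows like this one can also be found without relying on the discussion thread. The following is a minimal sketch on made-up data; `find_malformed_rows` is a hypothetical helper, not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose length differs from the header's."""
    return [i for i, row in enumerate(rows) if len(row) != len(header)]

# Tiny made-up example: the second row is missing one value
header = ['App', 'Category', 'Rating']
rows = [['A', 'TOOLS', '4.1'], ['B', '1.9'], ['C', 'GAME', '3.8']]
print(find_malformed_rows(header, rows))  # [1]
```

On the real data this would be called as `find_malformed_rows(googleplay_header, googleplay)`.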

In [42]:
print(len(googleplay))
del googleplay[10472]
print(len(googleplay))

10841
10840


### Removing duplicate Apps

In [43]:
# Find duplicate app
appstore_duplicate = []
appstore_unique = []

for row in appstore:
    app = row[0]
    if app in appstore_unique:
        appstore_duplicate.append(app)
    else:
        appstore_unique.append(app)

print('Number of duplicate apps:', len(appstore_duplicate))

Number of duplicate apps: 0


In [44]:
# Find duplicate app
googleplay_duplicate = []
googleplay_unique = []

for row in googleplay:
    app = row[0]
    if app in googleplay_unique:
        googleplay_duplicate.append(app)
    else:
        googleplay_unique.append(app)

print('Number of duplicate apps:', len(googleplay_duplicate)) 
print('Example of duplicate apps: ', googleplay_duplicate[:50])

Number of duplicate apps: 1181
Example of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express', 'Accounting App - Zoho Books', 'Invoice & Time Tracking - Zoho', 'join.me - Simple Meetings', 'Invoice 2go — Professional Invoices and Estimates', 'SignEasy | Sign and Fill PDF and other Documents', 'Quick PDF Scanner + OCR FREE', 'Genius Scan - PDF Scanner', 'Tiny Scanner - PDF Scanner App', 'Fast Scanner : Free PDF Scan', 'Mobile Doc Scanner (MDScan) Lite', 'TurboScan: scan documents and receipts in PDF', 'Tiny Scanner Pro: PDF Doc Scan', 'Docs To Go™ Fr

### Number of duplicate apps
- There are no duplicate apps in the App Store dataset.
- There are 1,181 duplicate entries in the Google Play dataset.

We need a criterion for deciding which duplicate entries to delete. Let's look at Telegram as an example.

In [45]:
for app in googleplay:
    if app[0] == 'Telegram':
        print(app)

['Telegram', 'COMMUNICATION', '4.4', '3128250', 'Varies with device', '100,000,000+', 'Free', '0', 'Mature 17+', 'Communication', 'July 27, 2018', 'Varies with device', 'Varies with device']
['Telegram', 'COMMUNICATION', '4.4', '3128509', 'Varies with device', '100,000,000+', 'Free', '0', 'Mature 17+', 'Communication', 'July 27, 2018', 'Varies with device', 'Varies with device']
['Telegram', 'COMMUNICATION', '4.4', '3128611', 'Varies with device', '100,000,000+', 'Free', '0', 'Mature 17+', 'Communication', 'July 27, 2018', 'Varies with device', 'Varies with device']


The fourth column holds the number of reviews for each entry. The three Telegram rows differ only in this value, so we can use it to tell which row is the most recent: the higher the review count, the more recently the row was retrieved. For each app, we'll keep the row with the highest number of reviews and remove the others.

In [46]:
# Expected number of apps once duplicates are removed
print('Number of expected apps in App Store without duplication: ', len(appstore)-len(appstore_duplicate))
print('Number of expected apps in Google Play without duplication: ', len(googleplay)-len(googleplay_duplicate))

Number of expected apps in App Store without duplication:  7197
Number of expected apps in Google Play without duplication:  9659


In [47]:
# creating a dictionary to store apps' names as keys and number of the highest reviews as values
googleplay_review_max = {}

for app in googleplay:
    name = app[0]
    n_reviews = float(app[3])
    if (name in googleplay_review_max) and googleplay_review_max[name] < n_reviews:
        googleplay_review_max[name] = n_reviews
    elif name not in googleplay_review_max:
        googleplay_review_max[name] = n_reviews
        
print('Number of apps in Google Play Dataset without duplication:', len(googleplay_review_max))

Number of apps in Google Play Dataset without duplication: 9659


In [48]:
# Loop through the Google Play dataset and add to googleplay_clean the rows whose review count matches the maximum stored in googleplay_review_max
# googleplay_added stores the names of the apps that have already been added to googleplay_clean
googleplay_clean = []
googleplay_added = []

for row in googleplay:
    name = row[0]
    n_reviews = float(row[3])
    if name not in googleplay_added and n_reviews == googleplay_review_max[name]:
        googleplay_clean.append(row)
        googleplay_added.append(name)
print('Number of apps in Google Play Dataset after removing duplications:', len(googleplay_clean))

Number of apps in Google Play Dataset after removing duplications: 9659
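
The two passes above (first find the maximum review count, then collect the matching rows) could also be done in a single pass by storing the best row per app name. This is a minimal sketch under that assumption; `keep_most_reviewed` is a hypothetical name, and the rows come back in order of first appearance of each name, which may differ from the order of `googleplay_clean`.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row seen when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Made-up rows shaped like the Telegram example above
sample = [
    ['Telegram', 'COMMUNICATION', '4.4', '3128250'],
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Telegram', 'COMMUNICATION', '4.4', '3128611'],
    ['Telegram', 'COMMUNICATION', '4.4', '3128509'],
]
deduplicated = keep_most_reviewed(sample)
```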


### Removing non-English Apps
The app targets English-speaking users, so we need to remove apps whose names contain non-English characters. As a first attempt, we treat any app name containing a character with a code point above 127 (outside the ASCII range) as non-English.

In [49]:
# a function to determine whether names given as parameters do not contain non-English characters
# True if the names given do not contain non-English characters, False otherwise
def english_app(texts):
    for text in texts:
        if ord(text) > 127:
            return False
    return True

In [50]:
# Test english_app function
print(english_app('Instagram')) # True because it does not contain non-English characters
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播')) # False because it contains Chinese characters
print(english_app('Docs To Go™ Free Office Suite')) # False because it contains a special character (™)
print(english_app('Instachat 😜')) # False because it contains an emoji

True
False
False
False


The english_app function returns False for any name containing a character outside the ASCII range, including English app names that use special characters such as ™ or 😜. Using it as is would discard relevant data. A better approach is a function that returns False only when the name contains three or more characters outside the ASCII range (0-127).

In [51]:
def english_app_3(texts):
    number = 0
    for text in texts:
        if ord(text) > 127:
            number = number + 1
    if number >= 3:
        return False
    else:
        return True

In [52]:
# Test english_app_3 function
print(english_app_3('Instagram')) # True because it does not contain non-English characters
print(english_app_3('爱奇艺PPS -《欢乐颂2》电视剧热播')) # False because it contains more than 3 Chinese characters
print(english_app_3('Docs To Go™ Free Office Suite')) # True because it contains only one special character
print(english_app_3('Instachat 😜')) # True because it contains only one special character

True
False
True
True
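
The cutoff of three characters is a judgment call, so it can help to make it a parameter. The following is a minimal sketch, not part of the original notebook; `is_probably_english` and `max_non_ascii` are hypothetical names. With the default of 2 it behaves like english_app_3, and with 0 it behaves like english_app.

```python
def is_probably_english(name, max_non_ascii=2):
    """True if name has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Docs To Go™ Free Office Suite'))    # True
print(is_probably_english('Instachat 😜', max_non_ascii=0))     # False
```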


In [53]:
# Loop through appstore and use english_app_3 to drop apps whose names have three or more characters outside the ASCII range (0-127)
appstore_english = []

for app in appstore:
    name = app[1]
    if english_app_3(name):
        appstore_english.append(app)

print('Number of removed non-English apps in App Store: ', len(appstore) - len(appstore_english))
print('Number of English apps in App Store: ', len(appstore_english))

Number of removed non-English apps in App Store:  1042
Number of English apps in App Store:  6155


In [54]:
# Loop through googleplay_clean and use english_app_3 to drop apps whose names have three or more characters outside the ASCII range (0-127)
googleplay_english = []

for app in googleplay_clean:
    name = app[0]
    if english_app_3(name):
        googleplay_english.append(app)

print('Number of removed non-English apps in Google Play: ', len(googleplay_clean) - len(googleplay_english))
print('Number of English apps in Google Play: ', len(googleplay_english))

Number of removed non-English apps in Google Play:  62
Number of English apps in Google Play:  9597


### Remove Paid Apps

In [55]:
appstore_free = []
appstore_paid = []

for app in appstore_english:
    price = float(app[4])
    if price == 0:
        appstore_free.append(app)
    else:
        appstore_paid.append(app)
        
print('Number of free apps in App Store:', len(appstore_free))
print('Number of paid apps in App Store:', len(appstore_paid))

Number of free apps in App Store: 3203
Number of paid apps in App Store: 2952


In [56]:
googleplay_free = []
googleplay_paid = []

for app in googleplay_english:
    price = app[7].replace('$', '')
    price = float(price)
    if price == 0:
        googleplay_free.append(app)
    else:
        googleplay_paid.append(app)
        
print('Number of free apps in Google Play:', len(googleplay_free))
print('Number of paid apps in Google Play:', len(googleplay_paid))

Number of free apps in Google Play: 8848
Number of paid apps in Google Play: 749


## Data Analysis

Our company wants to launch an app on both Google Play and the App Store. The app will target English-speaking users and will be free, earning revenue from in-app ads. We need to analyze both datasets to determine what kinds of apps are popular among Android and iOS users.

### Common Genres in the Market

We're going to analyze which genres are the most common in the App Store and Google Play markets.

In [57]:
# freq_table function which returns the frequency table as percentages
def freq_table(dataset, index):
    table = {}
    for row in dataset:
        name = row[index]
        if name in table:
            table[name] += 1
        else:
            table[name] = 1
    for key, value in table.items():
        table[key] = round(float(value)/len(dataset)*100)
    return table

# display_table function which orders frequencies descendingly and prints them 
def display_table(dataset, index):
    table = freq_table(dataset, index)
    freq_order = []
    for key in table:
        freq_order.append((table[key], key))
    sorted_table = sorted(freq_order, reverse=True)
    for value in sorted_table:
        print(str(value[0])+'%', ':', value[1])

In [58]:
# App Store Common Genre Frequency Table
display_table(appstore_free, 11)

58% : Games
8% : Entertainment
5% : Photo & Video
4% : Education
3% : Social Networking
3% : Shopping
2% : Utilities
2% : Sports
2% : Productivity
2% : Music
2% : Lifestyle
2% : Health & Fitness
1% : Weather
1% : Travel
1% : Reference
1% : News
1% : Food & Drink
1% : Finance
1% : Business
0% : Navigation
0% : Medical
0% : Catalogs
0% : Book


In [59]:
# Google Play Common Category Frequency Table
display_table(googleplay_free, 1)
print('\n')
# Google Play Common Genre Frequency Table
display_table(googleplay_free, 9)

19% : FAMILY
10% : GAME
8% : TOOLS
5% : BUSINESS
4% : PRODUCTIVITY
4% : MEDICAL
4% : LIFESTYLE
4% : FINANCE
3% : SPORTS
3% : SOCIAL
3% : PHOTOGRAPHY
3% : PERSONALIZATION
3% : NEWS_AND_MAGAZINES
3% : HEALTH_AND_FITNESS
3% : COMMUNICATION
2% : VIDEO_PLAYERS
2% : TRAVEL_AND_LOCAL
2% : SHOPPING
2% : DATING
2% : BOOKS_AND_REFERENCE
1% : WEATHER
1% : PARENTING
1% : MAPS_AND_NAVIGATION
1% : LIBRARIES_AND_DEMO
1% : HOUSE_AND_HOME
1% : FOOD_AND_DRINK
1% : EVENTS
1% : ENTERTAINMENT
1% : EDUCATION
1% : COMICS
1% : BEAUTY
1% : AUTO_AND_VEHICLES
1% : ART_AND_DESIGN


8% : Tools
6% : Entertainment
5% : Education
5% : Business
4% : Productivity
4% : Medical
4% : Lifestyle
4% : Finance
3% : Sports
3% : Social
3% : Photography
3% : Personalization
3% : News & Magazines
3% : Health & Fitness
3% : Communication
3% : Action
2% : Video Players & Editors
2% : Travel & Local
2% : Simulation
2% : Shopping
2% : Dating
2% : Casual
2% : Books & Reference
2% : Arcade
1% : Weather
1% : Strategy
1% : Role Playing
1

The first step in our data analysis is to look at the most common genres in the App Store and Google Play. The App Store dataset has only one variable describing the type of app, but the Google Play dataset has two: category and genre. The genre variable is more granular than the category variable, and we don't need that level of detail, so we use the category variable for Google Play.

### App Store
Games is the most common genre among free English apps in the App Store, accounting for 58%, followed by entertainment (8%) and photo & video (5%). If we split the genres into two groups, fun and practical, 71% of apps fall into the fun group (58% games, 8% entertainment, 3% social networking and 2% music) and 29% into the practical group (5% photo & video, 4% education, 3% shopping, 2% utilities, 2% sports, 2% productivity, 2% lifestyle, 2% health & fitness, 1% weather, 1% travel, 1% reference, 1% news, 1% food & drink, 1% finance, 1% business, 0% navigation, 0% medical, 0% catalogs, 0% book). Fun apps dominate the App Store among free English apps.

### Google Play
Family is the most common category in Google Play, accounting for 19%, followed by game (10%) and tools (8%). The family category consists mostly of games for kids, so family and game can be added together, giving 29%. If we split the categories into fun and practical groups, 29% of apps fall into the fun group (19% family and 10% game) and 71% into the practical group (8% tools, 5% business, 4% productivity, 4% medical, 4% lifestyle, 4% finance, 3% sports, 3% social, 3% photography, 3% personalization, 3% news and magazines, 3% health and fitness, 3% communication, 2% video players, 2% travel and local, 2% shopping, 2% dating, 2% books and reference, 1% weather, 1% parenting, 1% maps and navigation, 1% libraries and demo, 1% house and home, 1% food and drink, 1% events, 1% entertainment, 1% education, 1% comics, 1% beauty, 1% auto and vehicles and 1% art and design). Practical apps dominate Google Play among free English apps.

### Most Popular Apps

We're going to find out which kinds of apps are the most popular by calculating the average number of installs for each genre. For Google Play, we can use the Installs column. The App Store dataset has no information about installs, so we use the rating_count_tot column, the total number of user ratings, as a proxy for the number of installs.

In [60]:
appstore_unique_genre = freq_table(appstore_free, 11)
average_rating = {}

# Calculate the average of user ratings for each genre
for genre in appstore_unique_genre:
    total = 0
    len_genre = 0
    for app in appstore_free:
        genre_app = app[11]
        if genre_app == genre:
            n_rating = float(app[5])
            total += n_rating
            len_genre += 1
    average = round(total / len_genre)
    average_rating[genre] = average

# Sort the average of user ratings in descending order
average_list = []
for average in average_rating:
    average_list.append((average_rating[average], average))
average_list = sorted(average_list, reverse=True)

for average in average_list:
    print(average[0], ':', average[1])

86090 : Navigation
79350 : Reference
71548 : Social Networking
57327 : Music
52280 : Weather
46385 : Book
33334 : Food & Drink
32367 : Finance
28442 : Photo & Video
28244 : Travel
27231 : Shopping
23298 : Health & Fitness
23009 : Sports
22886 : Games
21248 : News
21028 : Productivity
19156 : Utilities
16815 : Lifestyle
14195 : Entertainment
7491 : Business
7004 : Education
4004 : Catalogs
612 : Medical
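
The cell above scans the whole dataset once per genre. The same averages could be computed in a single pass by accumulating a total and a count per genre. This is a minimal sketch on made-up rows, not part of the original notebook; `average_by_group` is a hypothetical name.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Return {group: rounded mean of the value column} in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: round(totals[group] / counts[group]) for group in totals}

# Made-up rows: [name, genre, number of ratings]
sample = [['x', 'Games', '10'], ['y', 'Games', '30'], ['z', 'Music', '5']]
print(average_by_group(sample, 1, 2))  # {'Games': 20, 'Music': 5}
```

For the App Store data the call would be `average_by_group(appstore_free, 11, 5)`.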


The Navigation genre has the highest average number of user ratings, but that average is heavily influenced by Waze and Google Maps, and a few very large apps skew other genres in the same way. We need to remove these outliers, which make some genres look more popular than they are. To do that, we compute the interquartile range (IQR) of the rating counts and exclude apps whose number of user ratings is at or above it. This cutoff is stricter than the usual 1.5 × IQR rule for outliers, so it removes more apps than that rule would.

In [61]:
for app in appstore_free:
    if app[11] == 'Navigation':
        print(app[1], app[5])

Waze - GPS Navigation, Maps & Real-time Traffic 345046
Google Maps - Navigation & Transit 154911
Geocaching® 12811
CoPilot GPS – Car Navigation & Offline Maps 3582
ImmobilienScout24: Real Estate Search in Germany 187
Railway Route Search 5


In [88]:
import numpy as np

n_ratings = [int(app[5]) for app in appstore_free]

appstore_q75, appstore_q25 = np.percentile(n_ratings, [75, 25])
appstore_iqr = appstore_q75 - appstore_q25
appstore_iqr

9805.5

The IQR of the number of user ratings for free English apps in the App Store is 9805.5.
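
The next cell uses the IQR value itself as the cutoff. For comparison, the more conventional rule (Tukey's fences) treats a value as an outlier when it lies above Q3 + 1.5 × IQR. The following is a minimal sketch of that rule using the standard library, applied only to the six Navigation apps printed above for illustration; `upper_fence` is a hypothetical name, and this is not the method used in the rest of this notebook.

```python
from statistics import quantiles

def upper_fence(values, k=1.5):
    """Return Q3 + k * IQR, the upper Tukey fence."""
    # method='inclusive' interpolates linearly between the data points
    q1, _, q3 = quantiles(values, n=4, method='inclusive')
    return q3 + k * (q3 - q1)

# Rating counts of the six free Navigation apps listed above
navigation = [345046, 154911, 12811, 3582, 187, 5]
fence = upper_fence(navigation)
outliers = [n for n in navigation if n > fence]
print(fence, outliers)
```

With this rule and this small sample, only the largest rating count lies above the fence.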

In [63]:
average_rating_remove_outliers = {}

# Calculate the average of user ratings for each genre
for genre in appstore_unique_genre:
    total = 0
    len_genre = 0
    for app in appstore_free:
        genre_app = app[11]
        if genre_app == genre:
            n_rating = float(app[5])
            if n_rating >= appstore_iqr:  # IQR computed above (9805.5)
                continue
            total += n_rating
            len_genre += 1
    average = round(total / len_genre)
    average_rating_remove_outliers[genre] = average

# Sort the average of user ratings in descending order
average_list_remove_outliers = []
for average in average_rating_remove_outliers:
    average_list_remove_outliers.append((average_rating_remove_outliers[average], average))
average_list_remove_outliers = sorted(average_list_remove_outliers, reverse=True)

for average in average_list_remove_outliers:
    print(average[0], ':', average[1])

2474 : Business
2458 : Shopping
2321 : Health & Fitness
1917 : Photo & Video
1895 : Reference
1874 : Music
1855 : Food & Drink
1769 : Entertainment
1709 : Productivity
1654 : Social Networking
1595 : Lifestyle
1507 : Utilities
1443 : Education
1390 : Games
1389 : Finance
1258 : Navigation
1201 : Travel
1183 : Sports
1000 : News
890 : Catalogs
612 : Medical
590 : Weather
275 : Book


After removing outliers, the business genre has the highest average number of user ratings, followed by shopping, health & fitness, photo & video, reference, etc.

### The Most Popular Apps by Genre on Google Play

The Google Play dataset has a column containing the number of installs, but the values are not precise. For example, 100,000+ installs might mean 100,000, 200,000 or almost 500,000. We're going to take each value at face value (100,000+ becomes 100,000) because we don't need that precision in our analysis.
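
Taking the values at face value means stripping the plus sign and the thousands separators before converting to a number. This is a minimal sketch of that conversion; `parse_installs` is a hypothetical helper name.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(text.replace('+', '').replace(',', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```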

In [64]:
display_table(googleplay_free, 5)

16% : 1,000,000+
12% : 100,000+
11% : 10,000,000+
10% : 10,000+
8% : 1,000+
7% : 5,000,000+
7% : 100+
6% : 500,000+
5% : 50,000+
4% : 5,000+
4% : 10+
3% : 500+
2% : 50,000,000+
2% : 50+
2% : 100,000,000+
1% : 5+
1% : 1+
0% : 500,000,000+
0% : 1,000,000,000+
0% : 0+
0% : 0


In [101]:
googleplay_category = freq_table(googleplay_free, 1)
googleplay_install = {}

# Calculate the average of installs for each category
for category in googleplay_category:
    total = 0
    len_category = 0
    for app in googleplay_free:
        category_app = app[1]
        if category_app == category:
            num_install = app[5]
            num_install = float(num_install.replace('+', '').replace(',', ''))
            total = total + num_install
            len_category = len_category + 1
    googleplay_install[category] = round(total / len_category)

# Sort the average of installs in descending order
googleplay_list = []
for category in googleplay_install:
    googleplay_list.append((googleplay_install[category], category))
googleplay_list = sorted(googleplay_list, reverse=True)

for app in googleplay_list:
    print(app[0], ':', app[1])

38590581 : COMMUNICATION
24727872 : VIDEO_PLAYERS
23253652 : SOCIAL
17840110 : PHOTOGRAPHY
16787331 : PRODUCTIVITY
15544015 : GAME
13984078 : TRAVEL_AND_LOCAL
11640706 : ENTERTAINMENT
10830252 : TOOLS
9549178 : NEWS_AND_MAGAZINES
8814200 : BOOKS_AND_REFERENCE
7036877 : SHOPPING
5201483 : PERSONALIZATION
5145550 : WEATHER
4188822 : HEALTH_AND_FITNESS
4049275 : MAPS_AND_NAVIGATION
3695642 : FAMILY
3650602 : SPORTS
1986335 : ART_AND_DESIGN
1924898 : FOOD_AND_DRINK
1833495 : EDUCATION
1712290 : BUSINESS
1446158 : LIFESTYLE
1387692 : FINANCE
1360598 : HOUSE_AND_HOME
854029 : DATING
832614 : COMICS
647318 : AUTO_AND_VEHICLES
638504 : LIBRARIES_AND_DEMO
542604 : PARENTING
513152 : BEAUTY
253542 : EVENTS
120551 : MEDICAL


Communication is the most popular category among free Google Play apps that target English speakers, followed by video players, social, photography, productivity, etc. We need to remove the outliers that skew this result. As with the App Store data, we exclude apps whose number of installs is at or above the IQR.

In [93]:
googleplay_installs = [float(app[5].replace('+', '').replace(',', '')) for app in googleplay_free]
googleplay_r25, googleplay_r75 = np.percentile(googleplay_installs, [25, 75])
googleplay_iqr = googleplay_r75 -  googleplay_r25
googleplay_iqr

999000.0

The IQR of the number of installs in Google Play is 999,000. We're going to exclude apps with 999,000 or more installs.

In [103]:
googleplay_install_remove_outliers = {}

# Calculate the average of installs for each category
for category in googleplay_category:
    total = 0
    len_category = 0
    for app in googleplay_free:
        category_app = app[1]
        if category_app == category:
            num_install = app[5]
            num_install = float(num_install.replace('+', '').replace(',', ''))
            if num_install >= googleplay_iqr:  # IQR computed above (999,000)
                continue
            total = total + num_install
            len_category = len_category + 1
    googleplay_install_remove_outliers[category] = round(total / len_category)

# Sort the average of installs in descending order
googleplay_list_remove_outliers = []
for category in googleplay_install_remove_outliers:
    googleplay_list_remove_outliers.append((googleplay_install_remove_outliers[category], category))
googleplay_list_remove_outliers = sorted(googleplay_list_remove_outliers, reverse=True)

for app in googleplay_list_remove_outliers:
    print(app[0], ':', app[1])

186905 : EDUCATION
149590 : WEATHER
136645 : HOUSE_AND_HOME
126944 : GAME
121920 : SHOPPING
121667 : ENTERTAINMENT
108996 : HEALTH_AND_FITNESS
104241 : COMICS
97606 : BEAUTY
94078 : FOOD_AND_DRINK
93802 : ART_AND_DESIGN
92908 : PHOTOGRAPHY
90263 : DATING
80721 : PARENTING
80028 : BOOKS_AND_REFERENCE
79376 : AUTO_AND_VEHICLES
72514 : SOCIAL
72293 : FAMILY
71369 : LIBRARIES_AND_DEMO
66561 : SPORTS
65784 : PERSONALIZATION
58808 : TRAVEL_AND_LOCAL
56013 : FINANCE
55261 : LIFESTYLE
54912 : TOOLS
53683 : PRODUCTIVITY
50429 : VIDEO_PLAYERS
46981 : COMMUNICATION
43725 : MAPS_AND_NAVIGATION
43614 : NEWS_AND_MAGAZINES
39771 : MEDICAL
34617 : EVENTS
23379 : BUSINESS


After removing outliers, the ranking by average number of installs in Google Play changes. Education is number one, followed by weather, house and home, game, shopping, etc.

## Conclusion

| Order of Popular Genre | App Store         | Google Play        |
| ---------------------- | ----------------- | ------------------ |
| 1                      | Business          | Education          |
| 2                      | Shopping          | Weather            |
| 3                      | Health & Fitness  | House and Home     |
| 4                      | Photo & Video     | Game               |
| 5                      | Reference         | Shopping           |
| 6                      | Music             | Entertainment      |
| 7                      | Food & Drink      | Health and Fitness |
| 8                      | Entertainment     | Comics             |
| 9                      | Productivity      | Beauty             |
| 10                     | Social Networking | Food and Drink     |

The table above shows the top 10 genres for free English apps in the App Store and Google Play. In August 2020, Android had a larger share of mobile users (74.25%) than iOS (25.15%) ([statcounter](https://statcounter.com/os-market-share/mobile/)). The Google Play results therefore carry more weight in our decision. The number of users matters because our company earns revenue from in-app ads: the more people use the app, the higher the revenue.

- Education is a good option for the company.
- Weather looks promising, but people tend to open weather apps for only a few seconds, which leaves little opportunity to show in-app ads.
- House and home and game are also good options.
- Shopping is in the top 10 for both the App Store and Google Play, but it would require building an entire shopping system, not just the app.
- Entertainment is also in the top 10 for both the App Store and Google Play, which makes it a strong genre to build an app in.