# Profitable App Profiles for Google Play and App Store Markets

In this project we will analyze which types of apps are most profitable in the Google Play and App Store marketplaces. We'll act as Data Analysts for a company that specializes in creating apps for these markets. Our job is to help this company's developers make data-driven decisions on which types of app they should be creating. 

The company creates free apps and generates revenue from ads that show within the app. More users within the app means more revenue. The goal in this project is to help app developers understand what types of apps are likely to attract more users on Google Play and the App Store.



## Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on Apple's App Store and 2.1 million Android apps on Google Play. Rather than working through roughly 4 million apps, we will analyze a sample of this data.

Instead of spending the time and energy to create these samples ourselves, there are two data sets already available which suit our needs:

- [This data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) with a sample of around 7,000 iOS apps from the App Store that was collected in July of 2017. 

- [This data set](https://www.kaggle.com/lava18/google-play-store-apps) with a sample of around 10,000 Android apps from the Google Play store that was collected in August of 2018.

We will start by opening both data sets:

In [4]:
from csv import reader

#import Apple Store data
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
appleApps = list(read_file)
appleheader = appleApps[0]
appleApps = appleApps[1:]

#import Google Play data
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
googleApps = list(read_file)
googleheader = googleApps[0]
googleApps = googleApps[1:]
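
The cells above leave both file handles open and rely on the platform's default text encoding. A minimal sketch of a reusable loader follows, assuming the CSV files are UTF-8 encoded (the encoding is an assumption, not something stated by the data sources). `load_dataset` is a hypothetical helper name, and the demo writes a tiny temporary file rather than reading the real data:

```python
import csv
import os
import tempfile


def load_dataset(path, encoding="utf8"):
    """Return (header, rows) for a CSV file, closing the file afterwards."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, encoding=encoding, newline="") as opened_file:
        rows = list(csv.reader(opened_file))
    return rows[0], rows[1:]


# Demo on a small temporary file (illustrative content, not the real data sets).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf8", newline="") as tmp:
    tmp.write("App,Category\nDemo App,TOOLS\n")
    demo_path = tmp.name

demo_header, demo_rows = load_dataset(demo_path)
os.remove(demo_path)

print(demo_header)
print(demo_rows)
```

With the real files, the same helper could be called as `load_dataset('AppleStore.csv')` and `load_dataset('googleplaystore.csv')`.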


To make looking through our data easier, we will add an explore data function that will allow us to look through each row of data.

In [5]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(appleheader)
print('\n')
explore_data(appleApps, 0, 3, True)


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


We can quickly see that this dataset has 7197 rows and 16 columns. By looking at the header, we can see that some useful data points would be `track_name`, `price`, `rating_count_tot`, `user_rating`, and `prime_genre`. Not all the headings are intuitive, but explanations for each heading can be found in the [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

Next, we'll go through the Google Play data:

In [6]:
print('\n')

print(googleheader)
print('\n')
explore_data(googleApps, 0, 3, True)



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


This dataset has 10,841 rows and 13 columns. Useful columns here might include `App`, `Category`, `Rating`, `Reviews`, and `Genres`. 

## Deleting Wrong Data

The Google Play data set has a [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and [one discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error for a certain row. We will print the incorrect row and compare it against one that is correct:

In [7]:
print(googleheader)
print('\n')
print(googleApps[10472])
print('\n')
print(googleApps[10470])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']


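We can see that the row at index 10472, for the app `Life Made WI-Fi Touchscreen Photo Frame`, is missing its `Category` value, so every later value is shifted one column to the left and the `Rating` column reads `19`. Since ratings can only go up to 5, this is an incorrect entry. To clean this up we will delete that row from the data set: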

In [8]:
print(len(googleApps))
del googleApps[10472] 
print(len(googleApps))

10841
10840
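
The faulty row printed earlier has 12 values against 13 headers, so rows like it can also be found without relying on the discussion forum. A small sketch, using a hypothetical helper `find_malformed_rows` and toy data for illustration:

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose column count differs from the header's."""
    return [i for i, row in enumerate(rows) if len(row) != len(header)]


# Toy data for illustration only; the second row is missing its 'Category' value.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['Jazz Wi-Fi', 'COMMUNICATION', '3.4'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]

malformed = find_malformed_rows(toy_header, toy_rows)
print(malformed)  # [1]
```

This check only catches rows with a missing or extra value. It would not flag a row that has the right number of columns but implausible contents.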


## Removing Duplicate Data Entries

If we look through our data set, or through the discussion section, we'll find that some apps have duplicate entries. For example, in the Google Play data set, the Facebook app has two entries:

In [9]:
for app in googleApps:
    name = app[0]
    if name == 'Facebook':
        print(app)
        

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


There are 1181 total cases in this dataset where an app appears more than once:

In [10]:
duplicate_apps = []
unique_apps = []

for app in googleApps:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples: ', duplicate_apps[:5])

Duplicate apps:  1181


Examples:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


In the above code we went through the apps and created two lists: one where the names of each app are unique, and one where they are repeated. The length of the second list is 1181.
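
Checking `name in unique_apps` scans a list on every iteration, which gets slow as the list grows. As an alternative sketch (not the approach used in this notebook), `collections.Counter` from the standard library can produce the same duplicate count in one pass. `count_duplicates` is a hypothetical helper and the rows are toy data:

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (number of extra entries, names that appear more than once)."""
    counts = Counter(row[name_index] for row in rows)
    repeated = [name for name, n in counts.items() if n > 1]
    # Every occurrence after the first counts as one duplicate entry.
    extra_entries = sum(n - 1 for n in counts.values())
    return extra_entries, repeated


toy_rows = [['Facebook'], ['Box'], ['Facebook'], ['Box'], ['Box'], ['Zoom']]
extra, repeated = count_duplicates(toy_rows)
print(extra)     # 3
print(repeated)  # ['Facebook', 'Box']
```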

With this information we need to delete the duplicates, but we shouldn't do it randomly. Instead, we will check each group of duplicates and keep the entry with the most recent data. That way, we aren't using older snapshots of the data.

For example, if we look back at our two Facebook entries, we can see that the only difference between them is the number of reviews. The first row has more, and since review counts grow over time, a higher count suggests a more recent entry.

In [11]:
for app in googleApps:
    name = app[0]
    if name == 'Facebook':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


Next we will need to:
- Find the highest number of reviews for each app name. We can store this in a dictionary.
- Create a separate list that keeps only the row with that review count for each app, eliminating duplicates.

First we will build the dictionary:

In [12]:
reviews_max = {}

for app in googleApps:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews


Earlier we saw that there were 1,181 duplicate rows. With this dictionary, we would expect the length to be the total number of rows minus that number. We can confirm this with the code below.


In [13]:
print('Expected length: ', len(googleApps) - len(duplicate_apps))        
print('Actual length: ', len(reviews_max))

Expected length:  9659
Actual length:  9659


Next, we will use our `reviews_max` dictionary to delete the duplicates. We will use rows with the highest number of reviews for our cleaned dataset. 


In [14]:
android_clean = []
already_added = []

for app in googleApps:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
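
The two steps (building the dictionary, then filtering) can also be merged into a single pass that stores the best row per app directly. This is a sketch of an alternative, not the method used above. `keep_most_reviewed` is a hypothetical helper, and the non-Facebook row is made up for illustration:

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earlier row when two entries tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


toy_rows = [
    ['Facebook', 'SOCIAL', '4.1', '78128208'],
    ['Example App', 'TOOLS', '4.0', '100'],   # illustrative row
    ['Facebook', 'SOCIAL', '4.1', '78158306'],
]
deduplicated = keep_most_reviewed(toy_rows)
for row in deduplicated:
    print(row)
```

One design difference to be aware of: the result is ordered by each app's first appearance, whereas the loop above orders apps by the position of their most-reviewed row. The set of rows kept should be the same.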

Now let's explore the newly cleaned data set and confirm that the number of rows is 9,659:

In [15]:
explore_data(android_clean, 0, 4, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9659
Number of columns: 13


## Removing Non-English Apps

If we look through the data enough, we see that some apps are not geared towards an English-speaking audience. Our company is only interested in English apps, so we will remove the others from our analysis. First we will create a function that checks whether the characters in an app's name are ones commonly used in English text.

The characters commonly used in English text (letters, digits, and basic punctuation) are each assigned a number from 0 to 127 under the ASCII standard. The built-in `ord()` function returns the number (Unicode code point) assigned to a character, and for ASCII characters this number falls in the 0 to 127 range. In the function below, we use `ord()` to check whether the name of an app is in English.

In [16]:
def englishCheck(a_string):
    
    for character in a_string:
 
        if ord(character) > 127:
            return False 
        
    return True
        
print(englishCheck('Instagram'))
print(englishCheck('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(englishCheck('Docs To Go™ Free Office Suite'))
print(englishCheck('Instachat 😜'))

True
False
False
False
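
To see why the last two names fail, it can help to print what `ord()` returns for a few individual characters. A short illustrative sketch (the character selection is ours):

```python
sample_characters = ['a', 'Z', '™', '😜', '爱']

# Map each character to its code point as returned by ord().
code_points = {character: ord(character) for character in sample_characters}

for character, number in code_points.items():
    print(repr(character), number, 'ASCII' if number <= 127 else 'outside ASCII')
```

The letters land inside the 0 to 127 range (`'a'` is 97), while `™` (8482), the emoji, and the Chinese character are all far above 127.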


In the tests above we can see that certain characters like ™ and emojis fall outside the 0-127 range, even though the rest of the name is in English. To minimize data loss, we will treat a name as non-English only if it contains more than 3 such characters; 3 or fewer is fine. This rule won't catch everything, but it will work well enough for our purposes.

In [20]:
def englishCheck(a_string):
    stringCheck = 0
    
    for character in a_string:
        if ord(character) > 127:
            stringCheck += 1
            
        if stringCheck > 3:
            return False
        
    return True
        
print(englishCheck('Instagram'))
print(englishCheck('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(englishCheck('Docs To Go™ Free Office Suite'))
print(englishCheck('Instachat 😜'))

True
False
True
True


Below we will use our new `englishCheck()` function to create a new list for all the Google Play apps and all of the App Store apps:

In [21]:
english_android = []

for app in android_clean:
    name = app[0]
    if englishCheck(name):   
        english_android.append(app)
    
        
        
english_iOS = []

for app in appleApps:
    name = app[1]
    if englishCheck(name):   
        english_iOS.append(app)


In [133]:
explore_data(english_android, 0, 3, True)
explore_data(english_iOS, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

Exploring the data above, we can see that both lists have shrunk. We now have 9614 apps in our Google Play data set and 6183 in our App Store data set.

## Isolating the Free Apps

As mentioned in the introduction, our company creates free apps. We'll want to remove all apps that have a price that isn't 0 and put the free apps into new lists:

In [134]:
free_android = []
free_iOS = []

for app in english_android:
    price = app[7]
    
    if price == '0':
        free_android.append(app)


for app in english_iOS:
    price = app[4]
    
    if price == '0.0':
        free_iOS.append(app)
        
print('New length of Android apps: ', len(free_android))
print('New length of iOS apps: ',len(free_iOS))


New length of Android apps:  8864
New length of iOS apps:  3222
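
The string comparisons above work because free apps are recorded as exactly `'0'` and `'0.0'` in the rows we have seen. A slightly more tolerant sketch converts the price to a number first. It assumes that prices might carry a leading `$` (an assumption, since only the free-app values appear above), and `is_free` is a hypothetical helper:

```python
def is_free(price_text):
    """Return True when a price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    # Assumption: a price may be prefixed with '$'; strip it before converting.
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0


for price in ['0', '0.0', '$0.00', '$4.99', '1.99']:
    print(price, '->', is_free(price))
```

A value that is not numeric after stripping (for example an empty string) would raise a `ValueError`, which may be preferable to silently treating it as paid.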


## Most Common Apps by Genre

Because the aim of the app company is to find the app profiles that are most successful, we need to find a profile that fits both the Google Play market and the App Store.

To begin the analysis, we'll look at the most common genres in each market.

For our Google Play data set, we'll take a look at the `Category` and `Genres` columns. For the App Store data, we'll look through the `prime_genre` column.

In [70]:
def freq_table(dataset, index):
    dictionary = {}
    total = 0
    
    for row in dataset:
        total += 1
        point = row[index]
        
        if point in dictionary:
            dictionary[point] += 1
        else:
            dictionary[point] = 1
            
    table_percentages = {}
        
    for key in dictionary:
        percentage = round((dictionary[key] / total) * 100, 2)
        table_percentages[key] = percentage
        
    return table_percentages
            
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
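
For comparison, here is a compact sketch of the same idea built on `collections.Counter`, which handles both the counting and the sorting. `freq_table_counter` is a hypothetical name and the rows are toy data:

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage frequency of each value in column `index`, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, round(n / total * 100, 2)) for value, n in counts.most_common()]


toy_apps = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
toy_table = freq_table_counter(toy_apps, 1)
for genre, percentage in toy_table:
    print(genre, ':', percentage)
```

For the toy rows this prints `Games : 75.0` followed by `Music : 25.0`. Ties may be ordered differently than in `display_table`, which breaks ties by sorting on the genre name.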

In [77]:

freq_table(free_iOS, 11)

{'Social Networking': 3.29,
 'Photo & Video': 4.97,
 'Games': 58.16,
 'Music': 2.05,
 'Reference': 0.56,
 'Health & Fitness': 2.02,
 'Weather': 0.87,
 'Utilities': 2.51,
 'Travel': 1.24,
 'Shopping': 2.61,
 'News': 1.33,
 'Navigation': 0.19,
 'Lifestyle': 1.58,
 'Entertainment': 7.88,
 'Food & Drink': 0.81,
 'Sports': 2.14,
 'Book': 0.43,
 'Finance': 1.12,
 'Education': 3.66,
 'Productivity': 1.74,
 'Business': 0.53,
 'Catalogs': 0.12,
 'Medical': 0.19}

First, we will examine the most common genres for iOS:

In [89]:
display_table(free_iOS, 11)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


Here, we can see that the most common genre in the App Store is Games, which accounts for more than half of the free English apps in this data set at 58%. The next closest is Entertainment at 7.9%. All other genres fall below 5%. Most of the apps are built for entertainment purposes, while just a small percentage serve productivity and other practical needs like education or utilities.

While we know that these entertainment apps make up a large share of the apps on the market, we do not yet know whether they also have the most downloads or the highest ratings.

Next, we'll analyze the app genres from Google Play.

In [87]:
display_table(free_android, -4)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

In [88]:
display_table(free_android, 1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


Above, we can see that a different category, Family, holds the top place in the Google Play store, with Tools also near the top. Still, gaming and entertainment apps rank high in the frequency of total apps.

Overall, the Google Play store seems to have a smaller share of gaming apps, with app frequency balanced over a wide range of categories.

So far we can see a pattern in which entertainment and game apps are the most common genres, but we have not yet seen which apps are most popular with users.

## Most Popular Apps by Genre on the App Store

We'll start by finding out which genres are most popular with users on the App Store. This data set has no column for the total number of installations, so we will use `rating_count_tot`, which counts the total number of user ratings, as a proxy. We will need to:

- Isolate the apps of each genre
- Sum the number of user ratings for the apps of that genre
- Divide the sum by the number of apps of that genre


In [104]:
user_genre_iOS = freq_table(free_iOS, 11)

for genre in user_genre_iOS:
    total = 0
    len_genre = 0
    
    for app in free_iOS:
        genre_app = app[11]
        
        if genre_app == genre:
            ratings = float(app[5])
            total += ratings
            len_genre += 1
            
    average_iOS = total / len_genre
    print(genre, ' : ', average_iOS )
    

    


Social Networking  :  71548.34905660378
Photo & Video  :  28441.54375
Games  :  22788.6696905016
Music  :  57326.530303030304
Reference  :  74942.11111111111
Health & Fitness  :  23298.015384615384
Weather  :  52279.892857142855
Utilities  :  18684.456790123455
Travel  :  28243.8
Shopping  :  26919.690476190477
News  :  21248.023255813954
Navigation  :  86090.33333333333
Lifestyle  :  16485.764705882353
Entertainment  :  14029.830708661417
Food & Drink  :  33333.92307692308
Sports  :  23008.898550724636
Book  :  39758.5
Finance  :  31467.944444444445
Education  :  7003.983050847458
Productivity  :  21028.410714285714
Business  :  7491.117647058823
Catalogs  :  4004.0
Medical  :  612.0
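
The nested loop above rescans the whole list once per genre. A single-pass sketch that accumulates totals and counts per genre, then sorts the averages, is shown below. `average_by_group` is a hypothetical helper, and the rows are a small subset of the names and rating counts that appear in this notebook, reduced to three columns for illustration:

```python
def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` per value of `group_index`."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    averages = {group: totals[group] / counts[group] for group in totals}
    # Highest average first, which makes the ranking easier to read.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


toy_rows = [
    ['Waze', 'Navigation', '345046'],
    ['WWDC', 'Reference', '762'],
    ['Google Maps', 'Navigation', '154911'],
    ['Night Sky', 'Reference', '12122'],
]
toy_averages = average_by_group(toy_rows, 1, 2)
for genre, average in toy_averages:
    print(genre, ':', average)
```

On the real data this would be called as `average_by_group(free_iOS, 11, 5)`.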


We can see here that the most popular genres seem to be Navigation, Reference, and Social Networking. However, closer inspection shows that a few select apps account for the majority of ratings in those categories. For example, if we examine the Reference category, we find that the Bible app alone accounts for nearly a million ratings:

In [106]:
for app in free_iOS:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


However, we shouldn't discount this category entirely, since many of the other apps have a significant number of ratings as well. In a similar vein, though the Navigation category seems large, we can inspect it more closely to see if anything in particular is driving that dominance:

In [107]:
for app in free_iOS:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In this case, Waze and Google Maps account for most of those ratings. 
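
To put a number on that skew, we can compare the mean with the median and compute the share of ratings held by the top two apps. The sketch below uses the six Navigation rating counts printed above; the choice of the median and of a "top two" cutoff is ours, for illustration:

```python
from statistics import mean, median

# Rating counts for the six free Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

# Fraction of all ratings that belongs to the two most-rated apps.
top_two = sorted(navigation_ratings, reverse=True)[:2]
top_two_share = sum(top_two) / sum(navigation_ratings)

print('Mean:', round(navigation_mean, 2))
print('Median:', navigation_median)  # 8196.5
print('Share of top two apps (%):', round(top_two_share * 100, 1))
```

The mean matches the Navigation average reported earlier (about 86,090), but the median is only 8,196.5, and the top two apps hold roughly 97% of the ratings. The genre average therefore says little about a typical Navigation app.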

Other popular genres include Weather, Food & Drink, Finance, Book, Travel, and Photo & Video. We should probably exclude some of these categories from our final app profile:

- Weather: there isn't much potential for making a profit from in-app ads in an app like this, since people don't spend much time in weather apps in the first place.

- Food & Drink: many of these apps are already branded around a specific company, like the Starbucks, McDonald's, or Subway apps, or they deliver food. Both are outside the scope of our company.

- Finance: these apps involve domain knowledge and would require us to hire one or more finance experts to build the app.

However, the Book, Health & Fitness, and Photo & Video categories seem like possible genres to target. Seeing how well both the Book and Travel categories do, we could create an app that matches books to whatever place you are currently in or travelling to, recommending books or articles by local authors or about subjects surrounding that place.

Next, we will examine Google Play:

## Most Popular Apps by Genre on Google Play



In [132]:
genre_android = freq_table(free_android, 1)

for category in genre_android:
    total = 0
    len_category = 0
    
    for app in free_android:
        category_app = app[1]
        
        if category_app == category:
            installs = app[5]
            installs = installs.replace('+','').replace(',','')
            installs = float(installs)
            total += installs
            len_category += 1
                    
    average_installs = total/len_category
    print(category, ':', average_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Here we can see that Communication, Entertainment, Social, Game, Photography, Family, Travel and Local, Tools, Productivity, and Video Players fill some of the top spots in the Google Play store. As with the App Store, some of the categories are skewed by specific apps that have well over a billion downloads.

However Travel, Shopping, and Books and Reference seem to do well in this market as well. An app that combines all of these could be a profitable one for our company.
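
One way to gauge how much the giant apps skew a category is to recompute its average while leaving out apps above some install threshold. The sketch below is an illustration only: `parse_installs` and `average_installs` are hypothetical helpers, the 100,000,000 cap is an arbitrary assumption, and the rows are toy data with the installs in the second column:

```python
def parse_installs(installs_text):
    """Convert a string such as '1,000,000+' to a float."""
    return float(installs_text.replace('+', '').replace(',', ''))


def average_installs(rows, installs_index=5, cap=None):
    """Average installs; rows at or above `cap` are skipped when a cap is given."""
    values = [parse_installs(row[installs_index]) for row in rows]
    if cap is not None:
        values = [value for value in values if value < cap]
    # Return 0.0 rather than dividing by zero when nothing is left.
    return sum(values) / len(values) if values else 0.0


toy_rows = [
    ['Giant App', '1,000,000,000+'],
    ['Small App', '10,000+'],
    ['Medium App', '500,000+'],
]
average_all = average_installs(toy_rows, installs_index=1)
average_capped = average_installs(toy_rows, installs_index=1, cap=100_000_000)
print('All apps:', average_all)
print('Below cap:', average_capped)  # 255000.0
```

In the toy data a single giant app lifts the average from 255,000 to over 333 million, which is the same kind of distortion described above.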

## Conclusions

In this project we analyzed data from the App Store and Google Play markets and recommended an app profile for our company that was informed by the data we found.

We concluded that an app that recommends books based on travel and location would be a good fit. Because book apps already exist, it would also need to have another draw, such as a social element or a focus on local authors.