# Profitable App Profiles for the App Store and Google Play Markets by Pavel Gladkevich

This project was completed as part of the Data Analyst series of [Dataquest](https://www.dataquest.io/directory/) on 04/26/19
<br/><br/>**Goal:** Our aim in this project is to enable a team of developers to make decisions based on the [Apple iOS app store data](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) and the [Google Play Store data](https://www.kaggle.com/lava18/google-play-store-apps/home). We will analyze these datasets with the goal of suggesting possible app ideas, specifically targeting free applications that generate revenue from in-app ads. Our suggestions will be based on which applications are likely to attract more users and hence potentially more revenue.

# Exploring Data

In [488]:
#load the downloaded csv file of Apple Apps Data from file path
open_file_1 = open('/Users/pgladkevich/Desktop/coding/projects/datasets/app_store_apple_dataset_10k_apps/AppleStore.csv')


In [489]:
#load the downloaded csv file of Google Apps Data from file path
open_file_2 = open('/Users/pgladkevich/Desktop/coding/projects/datasets/google-play-store-apps/googleplaystore.csv')

In [490]:
# csv.reader returns an iterator over the lines of the given CSV file; each row is parsed into a list of strings
from csv import reader

read_file_1 = reader(open_file_1)
read_file_2 = reader(open_file_2)

#The Apple App Store data set
ios_apps_data = list(read_file_1)

#The Google Play Store data set
google_apps_data = list(read_file_2)

ios_header = ios_apps_data[0]
google_header = google_apps_data[0]

#Get rid of the headers
ios_apps_data = ios_apps_data[1:]
google_apps_data = google_apps_data[1:]
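
The loading steps above leave both file handles open for the rest of the session. A minimal sketch of the same load wrapped in a helper, assuming the files are UTF-8 encoded (the helper name `load_csv` is hypothetical and not part of the original notebook):

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Read a CSV file and return (header, rows) as lists of strings."""
    # The with-block closes the file as soon as all rows have been read.
    with open(path, encoding=encoding, newline='') as csv_file:
        rows = list(reader(csv_file))
    # The first row is the header; everything after it is data.
    return rows[0], rows[1:]
```

It could be called as `ios_header, ios_apps_data = load_csv(path_to_apple_csv)`, where `path_to_apple_csv` stands for wherever the file was downloaded.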

In [491]:
# Prints the rows dataset[start:end] and can optionally print the total number of rows and columns
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [492]:
print(ios_header)
print('\n')
explore_data(ios_apps_data, 0, 2, rows_and_columns=True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 7197
Number of columns: 17


The output above shows the header and the first two rows of the Apple App Store dataset. The columns 'track_name', 'currency', 'price', 'rating_count_tot', 'user_rating', and 'prime_genre' might be of interest to us. Additionally, our function has calculated the number of rows as 7197, so this is the total number of apps. The number of columns is 17, but there are only 16 different types of data, so we will delete the first column, which is an unnamed row index. As previously mentioned, the App Store data can be accessed from the Kaggle dataset homepage [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).



In [493]:
del(ios_header[0])
for row in ios_apps_data:
    del(row[0])
print(ios_header)
print('\n')
explore_data(ios_apps_data, 0, 2, rows_and_columns=True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 7197
Number of columns: 16


In [494]:
print(google_header)
print('\n')
explore_data(google_apps_data, 0, 2, rows_and_columns=True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The output above shows the header and the first two rows of the Google Play Store dataset. The columns 'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', 'Content Rating', and 'Genres' are of potential interest. Additionally, our function has calculated that there are 10841 apps in the dataset (the header was already removed before counting), and that the data describing them are organized into 13 columns. Further information is available here: [Google Play Store data](https://www.kaggle.com/lava18/google-play-store-apps/home)

# Cleaning Data 
The Google dataset has a [discussion board](https://www.kaggle.com/lava18/google-play-store-apps/discussion) where one [entry](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) reports an error in row number 10472 of the data.

In [495]:
print(google_header)
print('\n')
print(google_apps_data[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


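As we can tell from the output above, this row has only 12 values instead of 13: the 'Category' value is missing, so every later value is shifted one column to the left. As a result 'Rating' (the overall user rating of the app when scraped), which is the third column (index=2), reads 19. Ratings only go from 0 to 5, so 19 is not a possible value, and we delete the row.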
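
The row printed above has 12 values against a 13-column header, which suggests a general check: compare each row's length with the header's. A minimal sketch, assuming the list-of-lists layout used in this notebook (the helper name `find_malformed_rows` and the demo rows are made up for illustration):

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Tiny illustration with made-up rows (not taken from the dataset).
demo_header = ['App', 'Category', 'Rating']
demo_rows = [['A', 'TOOLS', '4.1'], ['B', '1.9'], ['C', 'GAME', '3.9']]
malformed = find_malformed_rows(demo_header, demo_rows)
print(malformed)  # [(1, ['B', '1.9'])]
```

Running `find_malformed_rows(google_header, google_apps_data)` before the deletion would be one way to confirm which rows are affected, rather than relying on the discussion board alone.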

In [496]:
del(google_apps_data[10472])

Now we have to delete any duplicate apps that show up in either of the two data sets. Below is an example of a duplication of the app Slack.

In [497]:
print(google_header)
print('\n')
for app in google_apps_data:
    name = app[0]
    if name == 'Slack':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


The three instances of the Slack app differ only in the fourth column, 'Reviews'. The number of reviews in the first two instances is lower than in the last. We should delete the copies with the lower review counts, as they represent outdated data. Before we do that we should count the duplicates. The expected number of deletions is then the number of duplicates, and the number of rows after the deletions should equal the number of unique apps.

In [498]:
'''
Takes in a dataset and the index of the 'name' column. Iterates through the rows of the table and counts instances 
of duplicate entries. Prints the number of non-unique entries followed by the number of unique apps.
'''

def count_duplicates(dataset, name_index):
    unique_apps = []
    duplicate_apps = []
    
    #count the number of duplicates
    for app in dataset:
        name = app[name_index]
        if name not in unique_apps:
            unique_apps.append(name)
        else:
            duplicate_apps.append(name)
    print('Number of duplicate apps in given data set is: ', len(duplicate_apps))
    print('Number of unique apps in given data set is: ', len(unique_apps))
    
#call for google_apps_data
count_duplicates(google_apps_data,0)

Number of duplicate apps in given data set is:  1181
Number of unique apps in given data set is:  9659
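
Membership tests on a list get slower as the list grows, so counting duplicates this way scales quadratically with the number of rows. A minimal sketch of the same count using a `set`, which has constant-time lookups on average (the name `count_duplicates_fast` and the 'Zoom' demo row are hypothetical; the Slack review counts are taken from the printout above):

```python
def count_duplicates_fast(dataset, name_index):
    """Return (number_of_duplicate_rows, number_of_unique_names)."""
    seen = set()
    duplicates = 0
    for app in dataset:
        name = app[name_index]
        if name in seen:
            # Every repeat of an already-seen name counts as one duplicate row.
            duplicates += 1
        else:
            seen.add(name)
    return duplicates, len(seen)

demo = [['Slack', '51507'], ['Slack', '51507'], ['Slack', '51510'], ['Zoom', '10']]
result = count_duplicates_fast(demo, 0)
print(result)  # (2, 2)
```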


Below is a function that removes duplicates from any dataset in list-of-lists format, keeping for each app the entry with the highest number of reviews.

In [499]:
'''
A function that takes a dataset with the header removed in a list of lists format as an input, and the index 
value of the column labeled 'Reviews'. It iterates through adding the duplicate apps to a dictionary with the 
name of the app as a key and the corresponding review count as a value, then it deletes duplicate entries 
that have a lower amount of reviews
'''
def delete_duplicates(dataset, name_index=0, reviews_index=3):
    dataset_clean = []
    already_added = []
    reviews_max = {}
    
    #Create a dictionary of the apps as a key paired with the maximum number of reviews as a value
    for app in dataset:
        name = app[name_index]
        number_reviews = float(app[reviews_index])
        if name not in reviews_max:
            reviews_max[name] = number_reviews
        elif name in reviews_max and reviews_max[name] < number_reviews:
            reviews_max[name] = number_reviews
            
    #Now loop through and select the apps that need to be deleted       
    for app in dataset:
        name = app[name_index]
        number_reviews = float(app[reviews_index])
        
        #You need the and condition to prevent ties of number of reviews from causing duplicates to remain
        if (reviews_max[name] == number_reviews) and (name not in already_added):
            dataset_clean.append(app)
            already_added.append(name)
    return dataset_clean

#set the google_apps_data to the cleaned data
google_apps_data_clean = delete_duplicates(google_apps_data)
len(google_apps_data_clean)

9659
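
The same result can be reached in a single pass by storing the best row seen so far for each app name. This is only a sketch under the same assumptions as `delete_duplicates` (name at index 0, reviews at index 3); the helper name `keep_most_reviewed` is hypothetical, and the output order follows each name's first appearance, which may differ from the order produced above.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for app in dataset:
        name = app[name_index]
        reviews = float(app[reviews_index])
        # Strict '>' means ties keep the row that was encountered first.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = app
    return list(best.values())

# Made-up rows in the Google layout, shortened to four columns.
demo = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Zoom', 'BUSINESS', '4.0', '10'],
]
deduped = keep_most_reviewed(demo)
print(deduped)
```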

### Removing non-English apps
Since we want to look only at apps directed toward an English-speaking audience, we should remove any apps whose names rely on characters outside the [ASCII](https://en.wikipedia.org/wiki/ASCII) range of 0 to 127. This range covers the numeric codes of all characters in the common English character set. We will perform this task with a function and apply it to our datasets.
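
To make the 0 to 127 range concrete, the built-in `ord` returns the numeric code point of a single character. The short sketch below labels a few sample characters (chosen here for illustration) as ASCII or non-ASCII:

```python
samples = ['a', 'Z', '™', '😜', '爱']

# Map each character to True if its code point falls in the ASCII range.
labels = {}
for char in samples:
    code_point = ord(char)
    labels[char] = code_point <= 127
    print(repr(char), code_point, 'ASCII' if labels[char] else 'non-ASCII')
```

Symbols such as ™ and emoji fall outside the range even though they appear in English app names, which is why a strict "all characters must be ASCII" rule would discard too many apps.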

In [500]:
print(ios_apps_data[813][1])
print(ios_apps_data[6731][1])

print(google_apps_data_clean[4412][0])
print(google_apps_data_clean[7940][0])

AliExpress Shopping App
Idle Armies
中国語 AQリスニング
لعبة تقدر تربح DZ


In [501]:
#Will return true if input string is made up of characters only from the English language and false otherwise
def is_english(name):
    non_ASCII_count = 0
    
    #iterate through the characters of the string
    for char in name:
        if ord(char) > 127:
            non_ASCII_count += 1
            
    # Allow up to 3 non-ASCII characters: ord('😜') is > 127 and we want to minimize the number of English apps that get deleted
    if non_ASCII_count > 3:
        return False
    else:
        return True

test_list = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']
english_test_list = list(map(is_english, test_list))
english_test_list

[True, False, True, True]

In [502]:
#Now let's loop through both of our datasets with the new function and keep only the English apps
google_english = []
ios_english = []

for row in google_apps_data_clean:    
    name = row[0]
    
    if is_english(name):
        google_english.append(row)   

for row in ios_apps_data:
    #different index corresponds to name for ios
    name = row[1]
    
    if is_english(name):
        ios_english.append(row) 

print(len(google_english))
print(len(ios_english))


9614
6183


Our business model in this project is built around the ad-revenue from free apps. Hence, now we want to remove the non-free apps to isolate the free apps for our analysis.

In [503]:
free_google_apps = []
free_ios_apps = []

for app in google_english:
    if app[7] == '0':
        free_google_apps.append(app)
for app in ios_english:
    if app[4] == '0':
        free_ios_apps.append(app)

print(len(free_google_apps))
print(len(free_ios_apps))

8864
3222


### Identifying Genres
Our validation strategy for an application will be to construct a minimal version of it and add it to the Google Play store; then, if it is profitable after six months, we will construct an iOS version and add it to the App Store. For this purpose we will first construct frequency tables to identify common genres. The index of 'prime_genre' for iOS apps is 11, and for Google apps the column is called 'Genres' with index 9, but there is also a 'Category' column with similar data that has an index of 1.

In [504]:
#Takes a dataset as a list of lists without a header and an index, and then returns a frequency table
def freq_table(dataset, index):
    genre_counts = {}
    count = 0
    
    #iterate through and create a dictionary of counts of genres
    for row in dataset:
        genre = row[index]
        count += 1
        if genre in genre_counts:
            genre_counts[genre] += 1 
        else:
            genre_counts[genre] = 1
    
    freq = {}
    #create a frequency table with percentages for each genre
    for genre in genre_counts:
        freq[genre] = 100*(genre_counts[genre])/(count)
    return freq

#Function for displaying a dataset as a sorted key-value tuple list
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
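
The counting step in `freq_table` can also be expressed with `collections.Counter` from the standard library. A minimal sketch, assuming the same list-of-lists input (the name `freq_table_counter` and the demo rows are made up for illustration):

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, like freq_table."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: 100 * count / total for value, count in counts.items()}

demo = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
demo_freq = freq_table_counter(demo, 1)
print(demo_freq)  # {'Games': 75.0, 'Music': 25.0}
```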

In [505]:
print('free iOS app Genre Frequencies')
display_table(free_ios_apps, 11)
print('\n')
print('free Google app Category Frequencies')
display_table(free_google_apps, 1)
print('\n')
print('free Google app Genre Frequencies')
display_table(free_google_apps, 9)

free iOS app Genre Frequencies
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.6623215394165114
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.017380509000621
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


free Google app Category Frequencies
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.700361010830325
MEDICAL : 3.53113718411552

For the iOS App Store the most common free app genre is Games, with a commanding $58.16\%$. This is a clear majority; the runner-up, Entertainment, sits at $7.88\%$. In general it appears that more apps are created for the purpose of entertainment than for education or productivity. This doesn't necessarily mean that these apps are used more frequently, just that more apps are created that try to be 'fun'. It also makes sense that there would be numerous free game apps, given the very successful mobile [freemium](https://themanifest.com/app-development/app-monetization-using-freemium-business-model) model that works well for games. Additionally, users are more likely to get bored playing the same game and want to switch.

For the Google free apps sorted by category, the spread is much more even, with Family at $18.91\%$, Game at $9.72\%$ and Tools at $8.46\%$; then there is a drop-off, with the next categories each sitting at roughly $2\%$ to $5\%$. In general it appears that there are fewer 'fun' apps and that the free apps are more evenly distributed. However, if we examine the Family category we will notice that it contains many games designed for kids, such as [Roblox](https://www.roblox.com/), so the actual share of free game apps is higher than the $9.72\%$ shown and could approach $30\%$.

For the Google free apps sorted by genre, the formerly dominant Games genre has disappeared; however, if we take a closer look we'll notice that this is because it has been broken up into individual kinds of game, such as 'Action' at $3.11\%$ or 'Arcade' at $1.91\%$. That is the major difference, with the rest of the table having percentages similar to the frequency table sorted by category.

# Most Popular Genres of Free Apps
To figure out which genres of free apps are the most popular we can look at the number of installs, but for the iOS dataset this information is missing, so we will use a workaround and take the number of user ratings as a proxy. The index of the total rating count ('rating_count_tot') is 5.

In [539]:
#iterate over the prime_genre column and create a frequency table
ios_freq = freq_table(free_ios_apps, 11)

for genre in ios_freq:
    #number of user ratings
    total = 0
    #number of apps specific to a genre
    len_genre = 0
    for row in free_ios_apps:
        genre_app = row[11]
        if genre_app == genre:
            ratings = float(row[5])
            total += ratings
            len_genre += 1
    average_num_users = total/len_genre
    print('The ' + str(genre) +' genre has roughly: ' + 
          str(int(average_num_users)) + ' average users per app.')


The Productivity genre has roughly: 21028 average users per app.
The Weather genre has roughly: 52279 average users per app.
The Shopping genre has roughly: 26919 average users per app.
The Reference genre has roughly: 74942 average users per app.
The Finance genre has roughly: 31467 average users per app.
The Music genre has roughly: 57326 average users per app.
The Utilities genre has roughly: 18684 average users per app.
The Travel genre has roughly: 28243 average users per app.
The Social Networking genre has roughly: 71548 average users per app.
The Sports genre has roughly: 23008 average users per app.
The Health & Fitness genre has roughly: 23298 average users per app.
The Games genre has roughly: 22788 average users per app.
The Food & Drink genre has roughly: 33333 average users per app.
The News genre has roughly: 21248 average users per app.
The Book genre has roughly: 39758 average users per app.
The Photo & Video genre has roughly: 28441 average users per app.
The Entertai
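
The nested loop above rescans the whole dataset once per genre. A single-pass alternative keeps a running total and count for each genre; this is a sketch only, and the helper name `average_by_group` and the demo rows are made up for illustration:

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

demo = [['Games', '10'], ['Games', '30'], ['Music', '5']]
demo_avg = average_by_group(demo, 0, 1)
print(demo_avg)  # {'Games': 20.0, 'Music': 5.0}
```

With the notebook's data, `average_by_group(free_ios_apps, 11, 5)` would group by 'prime_genre' and average 'rating_count_tot'.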

It looks like the Navigation genre has the most active users, but perhaps a better target for an advertising campaign would be another heavily used genre, Social Networking. Users are actively engaged with their screens while using this type of application and are more likely to view ads rather than simply ignore them. Upon closer inspection, however, this is not as good an idea as it may seem at first.

In [507]:
for app in free_ios_apps:
    genre = app[11]
    name = app[1]
    users = app[5]
    
    if genre == 'Social Networking':
        print(name, ':', users)

Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match™ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ™ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger – Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
Lobi : 36
WeChat : 34584
ooVoo – Free Video Call, Text and Voice : 177501
Pinterest : 1061624
知乎 : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
QQ HD : 5058
Messenger : 351466
eHarmony™ Dating App - Meet Singles : 11124
YouNow: Live Stream Video Chat : 12079
Cougar 

It appears that the average for the Social Networking genre is somewhat deceiving. It is pulled up by a handful of giants such as Facebook and Pinterest: there are a substantial number of apps in the marketplace, yet many of them do not have over 10,000 user ratings (which we are using as a proxy for total users). The Weather genre is unlikely to keep users engaged for an extended period of time, and users who are reading a book are unlikely to want a product that contains advertisements. A possible option is the Travel genre.
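
One way to see how a few very large apps distort a genre average is to compare the mean with the median. The sketch below uses five rating counts copied from the Social Networking printout above; it is a hand-picked subset for illustration, not the full genre.

```python
from statistics import mean, median

# Five rating counts copied from the Social Networking printout;
# a hand-picked subset, not the whole genre.
ratings = {
    'Facebook': 2974676,
    'Pinterest': 1061624,
    'LinkedIn': 71856,
    'Weibo': 7265,
    'Lobi': 36,
}
values = list(ratings.values())
subset_mean = mean(values)
subset_median = median(values)
print('mean:', subset_mean)
print('median:', subset_median)
```

In this subset the mean is more than ten times the median, which is the kind of gap that makes a per-genre average a shaky proxy for the typical app.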

In [508]:
for app in free_ios_apps:
    name = app[1]
    genre = app[11]
    users = app[5]
    if genre == 'Travel':
        print(name, ':', users)

TripAdvisor Hotels Flights Restaurants : 56194
Yelp - Nearby Restaurants, Shopping & Services : 223885
Google Earth : 446185
Trainline UK: Live Train Times, Tickets & Planner : 248
BlaBlaCar - Trusted Carpooling : 397
DB Navigator : 512
Voyages-sncf.com : book train and bus tickets : 268
Southwest Airlines : 30552
Hotels & Vacation Rentals by Booking.com : 31261
Uber : 49466
Fly Delta : 8094
Airbnb : 22302
iExit Interstate Exit Guide : 1798
GasBuddy : 145549
HotelTonight - Great Deals on Last Minute Hotels : 32341
Expedia Hotels, Flights & Vacation Package Deals : 10278
Viator Tours & Activities : 1839
United Airlines : 5748
飞猪 : 154
HISTORY Here : 685
Ryanair - Cheapest Fares : 175
Lyft : 46922
铁路12306 : 177
Urlaubspiraten : 188
VoiceTra(Voice Translator) : 0
Ab in den Urlaub – Pauschalreisen günstig buchen : 22
Fluege.de - Finde den billigsten Flug : 0
MiFlight™ – Airport security line wait times at checkpoints for domestic and international travelers : 493
FlixBus - bus travel in Eu

Perhaps an app that could be successful is a historical guide that focuses on museums, national parks, and places of cultural importance. It could bring in ad revenue from restaurants and the tourism sector while providing users with historical summaries of a locale. Currently the only comparable app in the list is HISTORY Here, which likely has some summaries but no interaction with other organizations. Now we will look at the most popular apps by category on Google Play.

In [509]:
display_table(free_google_apps, 5)

1,000,000+ : 15.72653429602888
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.1985559566787
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.772111913357401
5,000+ : 4.512635379061372
10+ : 3.542418772563177
500+ : 3.2490974729241877
50,000,000+ : 2.3014440433212995
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.7897111913357401
1+ : 0.5076714801444043
500,000,000+ : 0.27075812274368233
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


From this printout we can tell that the install counts in the Google Play data are categorical (open-ended buckets such as 100,000+), so we do not have exact user counts. We will approximate them by taking the lower bound of each bucket, e.g. 100,000+ becomes 100,000 installs.
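
The lower-bound conversion can be sketched as a small helper that strips the thousands separators and the trailing plus sign before converting to an integer (the name `parse_installs` is hypothetical; the labels come from the table above):

```python
def parse_installs(bucket):
    """Convert a bucket label such as '100,000+' to its lower bound."""
    return int(bucket.replace(',', '').replace('+', ''))

for label in ['0', '10+', '100,000+', '1,000,000,000+']:
    print(label, '->', parse_installs(label))
```

Because every app is rounded down to the bottom of its bucket, averages computed from these values understate the true install counts.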

In [516]:
#Create a google frequency table
google_freq = freq_table(free_google_apps, 1)
google_freq

{'ART_AND_DESIGN': 0.6430505415162455,
 'AUTO_AND_VEHICLES': 0.9250902527075813,
 'BEAUTY': 0.5979241877256317,
 'BOOKS_AND_REFERENCE': 2.1435018050541514,
 'BUSINESS': 4.591606498194946,
 'COMICS': 0.6204873646209387,
 'COMMUNICATION': 3.237815884476534,
 'DATING': 1.861462093862816,
 'EDUCATION': 1.1620036101083033,
 'ENTERTAINMENT': 0.9589350180505415,
 'EVENTS': 0.7107400722021661,
 'FINANCE': 3.700361010830325,
 'FOOD_AND_DRINK': 1.2409747292418774,
 'HEALTH_AND_FITNESS': 3.079873646209386,
 'HOUSE_AND_HOME': 0.8235559566787004,
 'LIBRARIES_AND_DEMO': 0.9363718411552346,
 'LIFESTYLE': 3.9034296028880866,
 'GAME': 9.724729241877256,
 'FAMILY': 18.907942238267147,
 'MEDICAL': 3.5311371841155235,
 'SOCIAL': 2.6624548736462095,
 'SHOPPING': 2.2450361010830324,
 'PHOTOGRAPHY': 2.9444945848375452,
 'SPORTS': 3.395758122743682,
 'TRAVEL_AND_LOCAL': 2.33528880866426,
 'TOOLS': 8.461191335740072,
 'PERSONALIZATION': 3.3167870036101084,
 'PRODUCTIVITY': 3.892148014440433,
 'PARENTING': 0.65

In [538]:
#loop over the categories found in free_google_apps
for category in google_freq:
    #summed (lower-bound) number of installs in the category
    total = 0
    #number of apps specific to a category
    len_category = 0
    
    for app in free_google_apps:
        category_app = app[1]
        if category_app == category:
            #strip the ',' and '+' characters so the install count can be converted to a number
            installs_str = app[5].replace(',', '')
            total += float(installs_str.replace('+',''))
            #increment the count of the number of apps in the category
            len_category += 1
    #summed installs in category/number of apps in category
    average_num_users = total/len_category
    print('The ' + str(category) +' category has roughly: ' + 
          str(int(round(average_num_users,0))) + ' users on average per app.')
    
    

The ART_AND_DESIGN category has roughly: 1986335 users on average per app.
The AUTO_AND_VEHICLES category has roughly: 647318 users on average per app.
The BEAUTY category has roughly: 513152 users on average per app.
The BOOKS_AND_REFERENCE category has roughly: 8767812 users on average per app.
The BUSINESS category has roughly: 1712290 users on average per app.
The COMICS category has roughly: 817657 users on average per app.
The COMMUNICATION category has roughly: 38456119 users on average per app.
The DATING category has roughly: 854029 users on average per app.
The EDUCATION category has roughly: 1833495 users on average per app.
The ENTERTAINMENT category has roughly: 11640706 users on average per app.
The EVENTS category has roughly: 253542 users on average per app.
The FINANCE category has roughly: 1387692 users on average per app.
The FOOD_AND_DRINK category has roughly: 1924898 users on average per app.
The HEALTH_AND_FITNESS category has roughly: 4188822 users on average pe

To build on our app idea from before, let's explore the TRAVEL_AND_LOCAL category and look only at apps that have a large number of installs.

In [544]:
large_n_installs = ['50,000,000+', '10,000,000+', '5,000,000+', '1,000,000+']

for app in free_google_apps:
    cat = app[1]
    name = app[0]
    installs = app[5]
    if cat == 'TRAVEL_AND_LOCAL' and (installs in large_n_installs):
        print(name, ':', installs)


trivago: Hotels & Travel : 50,000,000+
Hopper - Watch & Book Flights : 5,000,000+
TripIt: Travel Organizer : 1,000,000+
CityMaps2Go Plan Trips Travel Guide Offline Maps : 1,000,000+
KAYAK Flights, Hotels & Cars : 10,000,000+
Hostelworld: Hostels & Cheap Hotels Travel App : 1,000,000+
Google Trips - Travel Planner : 5,000,000+
GPS Map Free : 5,000,000+
GasBuddy: Find Cheap Gas : 10,000,000+
Southwest Airlines : 5,000,000+
AT&T Navigator: Maps, Traffic : 10,000,000+
VZ Navigator : 50,000,000+
KakaoMap - Map / Navigation : 10,000,000+
AirAsia : 10,000,000+
Expedia Hotels, Flights & Car Rental Travel Deals : 10,000,000+
Goibibo - Flight Hotel Bus Car IRCTC Booking App : 10,000,000+
Allegiant : 1,000,000+
Amtrak : 1,000,000+
JAL (Domestic and international flights) : 1,000,000+
Flight & Hotel Booking App - ixigo : 5,000,000+
Wisepilot for XPERIA™ : 5,000,000+
VZ Navigator for Galaxy S4 : 5,000,000+
MAIN : 1,000,000+
Yoriza Pension - travel, lodging, pension, camping, caravan, pool villas ac

There appear to be significantly more apps in the Travel section of the Google Play Store. On a closer look, however, we notice that a lot of them are related to GPS, navigation, and hotels. None of the listed apps appears to compete directly with our historical-guide idea, so the idea still looks viable.

# Conclusion

We have now analyzed the two publicly available datasets of the Apple and Google app stores. Based on our analysis, our recommendation for an app idea that could be profitable in both markets focuses on the Travel sector. Specifically, a potential application is one that focuses on museums, national parks, and places of cultural importance. It would feature historical blurbs about the surroundings and would be a modernized version of a travel guide.