# Profitable App Profiles for the App Store and Google Play Markets

Our goal with this project is to analyse the most profitable market for launching a new app. We have data about existing apps from both Google Play Store and Apple Store. We will analyse these apps and their data to come up with viable market for creatibe a new app. 

We are looking to build free app and our main source of revenue consists of in-app ads. This means that our revenue for app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

# Opening and Exploring the Data

As of 2023, Google Play Store has 3.553 million mobile apps accessible for Android users, while 1.642 million available on the Apple App Store. Analysing all these apps will require lot of resources. So instead we will be analysing sample datasets of less than 10K apps for both platforms. 

There are 2 datasets with relevant data for our project. 

1) Google Play Store dataset is dowloaded from here: https://www.kaggle.com/datasets/lava18/google-play-store-apps
2) Apple App Store dataset is downloaded from here https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps

We will use functions from csv module to import data from csv files. 

In [1]:
from csv import reader

#importing apple play store data

opened_file = open('C:/Users/Linus/Documents/Sheets/apps_data/AppleStore.csv', encoding='utf-8')
read_file = reader(opened_file)
apple = list(read_file)
apple_header = apple[0]     #header
apple = apple[1:]           #removing header for analysis

#importing android/Google play store data

opened_file = open('C:/Users/Linus/Documents/Sheets/apps_data/googleplaystore.csv', encoding='utf-8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]   #header
android = android[1:]           #removing header for analysis

To better understand the data and make it easier to explore datasets, we will define `explore_data()` function.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Let's take a look at Apple Store data:

In [3]:
print(apple_header)
print('\n')
explore_data(apple, 0, 5, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['5', '282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37'

We can see that there  are 7197 apps in the dataset and each of them have interesting datapoints including `price`, `user_rating`, `prime_genre` which could be useful for our analysis.

Now analyse Google play store apps data:

In [4]:
print(android_header)
print('\n')
explore_data(android, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

There are 10841 Google play store apps. `Category`, `rating`, `Installs` and `Reviews` columns will be great data for our analysis. 

# Data Cleaning

In the [community discussion](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) for the dataset, 1 user has discovered the error in row number 10472 of Google Play store apps. Let's check it.

In [5]:
print(android_header) #header
print('\n')
print(android[0]) #correct row
print('\n')
print(android[10472]) #incorrect row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Indeed, the row has errors. `Category` for row 10472 has rating instead of category and the following column values have shifted to left column. 

This will cause problems in our analysis. So we will delete the row. 

<div class="alert alert-warning">
    Warning: Run the below cell only once.
</div>


In [6]:
print(len(android)) # rows before deletion
del android[10472]  # delete the row
print(len(android)) # rows after deletion

10841
10840


# Removing the duplicate entries

Duplicate values can lead to bad data. So we will check if any app is entered twice and remove them.

In [7]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Example of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


We don't want to count same apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.

We won't remove rows randomly, rather we'll keep the rows that have the highest number of reviews because the higher the number of reviews, the more reliable the ratings.

To do that, we will create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app.

Then use the dictionary to create a new data set, which will have only one entry per app (and we only select the apps with the highest number of reviews)

In [8]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In the previous code cell, we sae that there are 1181 cases where app appears more than once, so the lenght of our unique apps dictionary should be equal to difference between the lenght of our data and 1181.  

In [9]:
print('Expected length:', len(android)-1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


Now we have to remove duplicate entries. We can use `reviews_max` dictionary for achieving it. 

We can use for loop to iterate over every app in `android` list and add rows that have maximum reviews for particular app to list `android_clean`. We will also `alerady_added`to avoid it being iterated again. 


In [10]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

We can explore the new dataset now to confirm that there are only 9659 rows now.

In [11]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


# Removing Non-English Apps

If you explore the data sets enough, you'll notice names of come apps are not in English. Below we can see couple of examples:

In [12]:
print(apple[814][2])
print(apple[6734][2])

print(android_clean[4412][0])
print(android_clean[7940][0])

搜狐新闻—新闻热点资讯掌上阅读软件
エレメンタル ファンタジー - 高精細３ＤアクションＲＰＧ
中国語 AQリスニング
لعبة تقدر تربح DZ


Since our app is intended for English audience, we are not interested in these apps. So we will remove them.

The characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.

We will build the `is_english()` funtion that will use in-built `ord()` function to find our corresponding ASCII number of each character.

In [13]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(is_english('Data Scientist'))
print(is_english('डेटा वैज्ञानिक'))

True
False


The function seems to work fine, but some English app names use emojis or other symbols (™, —, –, etc.) that fall outside of the ASCII range. Because of this, we might remove useful apps if we use the function in its current form.

In [14]:
print(is_english('FIFA World Cup™'))
print(is_english('Funny Friends 😜'))

False
False


To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:

In [15]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('FIFA World Cup™'))
print(is_english('Funny Friends 😜😜😜'))

True
True


The function is still not perfect, and some non-English apps might get past our filter, but this seems good enough at this point in our analysis — we shouldn't spend too much time on optimization at this point.

Below, we use the is_english() function to filter out the non-English apps for both data sets. 

We will initiate 2 empty lists `android_english` and `apple_english`. Then iterate over earlier datasets and only add apps for which `is_english()` returns True. 

In [16]:
android_english = []
apple_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

for app in apple:
    name = app[2]
    if is_english(name):
        apple_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n\n')
explore_data(apple_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13



['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '18

We are left with 9614 Android & 6183 Apple apps. 

# Isloating the Free Apps

As we have mentioned in introduction, we want to build a free app. So our analysis must be confined to free apps only. But both datasets contain both free and non-free apps, so we need to isolate the free apps. 

Below we will create empty lists and append the apps if their price is `0`.

In [17]:
android_final = []
apple_final = []

for app in android_english:
    price  = app[7]
    if price == '0':
        android_final.append(app)
        
for app in apple_english:
    price  = app[5]
    if price == '0':
        apple_final.append(app)
        
print(len(android_final))
print(len(apple_final))

8864
3222


We are now left with 8864 Android & 3222 Apple apps which should be enough sample size for our analysis.

# Most Common Apps by Genre

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the `prime_genre` column of the App Store data set, and the `Genres` and `Category` columns of the Google Play data set.


We'll build two functions we can use to analyze the frequency tables:

One function to generate frequency tables that show percentages. Another function that we can use to display the percentages in a descending order.

In [18]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages  = {}
    for key in table:
        percentage = (table[key] / total) * 10
        table_percentages[key] = percentage
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

# Most Common Apps by Genre: Analysis


We start by examining the frequency table for `prime_genre` column of Apple store data.

In [19]:
display_table(apple_final, -5)

Games : 5.816263190564866
Entertainment : 0.7883302296710117
Photo & Video : 0.4965859714463067
Education : 0.3662321539416512
Social Networking : 0.32898820608317814
Shopping : 0.260707635009311
Utilities : 0.25139664804469275
Sports : 0.21415270018621976
Music : 0.2048417132216015
Health & Fitness : 0.20173805090006208
Productivity : 0.1738050900062073
Lifestyle : 0.15828677839851024
News : 0.1334574798261949
Travel : 0.12414649286157667
Finance : 0.111731843575419
Weather : 0.08690254500310365
Food & Drink : 0.08069522036002483
Reference : 0.0558659217877095
Business : 0.05276225946617008
Book : 0.04345127250155183
Navigation : 0.0186219739292365
Medical : 0.0186219739292365
Catalogs : 0.012414649286157667


We can see that among the free English apps, more than a half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps which amount for 3.29% of the apps in our data set.

The general impression is that App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are more rare. However, the fact that fun apps are the most numerous doesn't also imply that they also have the greatest number of users — the demand might not be the same as the offer.

Let's continue by examining the `Genres` and `Category` columns of the Google Play data set (two columns which seem to be related).

In [20]:
display_table(android_final, 1) #category

FAMILY : 1.8907942238267148
GAME : 0.9724729241877256
TOOLS : 0.8461191335740071
BUSINESS : 0.45916064981949456
LIFESTYLE : 0.39034296028880866
PRODUCTIVITY : 0.3892148014440433
FINANCE : 0.3700361010830325
MEDICAL : 0.35311371841155237
SPORTS : 0.33957581227436817
PERSONALIZATION : 0.3316787003610108
COMMUNICATION : 0.3237815884476535
HEALTH_AND_FITNESS : 0.3079873646209386
PHOTOGRAPHY : 0.2944494584837545
NEWS_AND_MAGAZINES : 0.27978339350180503
SOCIAL : 0.26624548736462095
TRAVEL_AND_LOCAL : 0.23352888086642598
SHOPPING : 0.22450361010830325
BOOKS_AND_REFERENCE : 0.21435018050541516
DATING : 0.1861462093862816
VIDEO_PLAYERS : 0.17937725631768955
MAPS_AND_NAVIGATION : 0.13989169675090252
FOOD_AND_DRINK : 0.12409747292418771
EDUCATION : 0.11620036101083032
ENTERTAINMENT : 0.09589350180505414
LIBRARIES_AND_DEMO : 0.09363718411552346
AUTO_AND_VEHICLES : 0.09250902527075812
HOUSE_AND_HOME : 0.08235559566787004
WEATHER : 0.08009927797833935
EVENTS : 0.07107400722021662
PARENTING : 0.06543

The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.

Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the `Genres` column:

In [21]:
display_table(android_final, -4)

Tools : 0.8449909747292418
Entertainment : 0.6069494584837545
Education : 0.5347472924187725
Business : 0.45916064981949456
Productivity : 0.3892148014440433
Lifestyle : 0.3892148014440433
Finance : 0.3700361010830325
Medical : 0.35311371841155237
Sports : 0.34634476534296027
Personalization : 0.3316787003610108
Communication : 0.3237815884476535
Action : 0.31024368231046934
Health & Fitness : 0.3079873646209386
Photography : 0.2944494584837545
News & Magazines : 0.27978339350180503
Social : 0.26624548736462095
Travel & Local : 0.23240072202166065
Shopping : 0.22450361010830325
Books & Reference : 0.21435018050541516
Simulation : 0.20419675090252706
Dating : 0.1861462093862816
Arcade : 0.18501805054151624
Video Players & Editors : 0.17712093862815884
Casual : 0.1759927797833935
Maps & Navigation : 0.13989169675090252
Food & Drink : 0.12409747292418771
Puzzle : 0.1128158844765343
Racing : 0.09927797833935019
Role Playing : 0.09363718411552346
Libraries & Demo : 0.09363718411552346
Auto 

The difference between the `Genres` and the `Category` columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

**Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.**

# Most Popular Apps by Genre on the App Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` app.

Below, we calculate the average number of user ratings per app genre on the App Store:

In [22]:
genres_apple = freq_table(apple_final, -5)

for genre in genres_apple:
    total = 0
    len_genre = 0
    for app in apple_final:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[6])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0


# Most Popular Apps by Genre on Google Play

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):



In [23]:
display_table(android_final, 5) # the Installs columns


1,000,000+ : 1.572653429602888
100,000+ : 1.1552346570397112
10,000,000+ : 1.0548285198555956
10,000+ : 1.01985559566787
1,000+ : 0.8393501805054151
100+ : 0.6915613718411552
5,000,000+ : 0.6825361010830324
500,000+ : 0.5561823104693141
50,000+ : 0.4772111913357401
5,000+ : 0.4512635379061372
10+ : 0.35424187725631767
500+ : 0.3249097472924187
50,000,000+ : 0.23014440433212996
100,000,000+ : 0.21322202166064982
50+ : 0.19178700361010828
5+ : 0.078971119133574
1+ : 0.05076714801444043
500,000,000+ : 0.02707581227436823
1,000,000,000+ : 0.02256317689530686
0+ : 0.004512635379061372
0 : 0.001128158844765343


One problem with this data is that is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to get an idea which app genres attract the most users, and we don't need perfect precision with respect to the number of users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number to float — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).

In [24]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

# Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.