# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and we design them for an English-speaking audience. Our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

## Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. To avoid spending resources on collecting new data ourselves, we should first try to see whether we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our purpose:

- [A dataset](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

- [A dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

Let's start by opening the two data sets and then continue with exploring the data.

In [1]:
from csv import reader

# App Store data set
with open('AppleStore.csv', encoding='utf-8') as opened_file:
    ios_apps = list(reader(opened_file))
ios_header = ios_apps[0]
ios_apps = ios_apps[1:]

# Google Play data set
with open('googleplaystore.csv', encoding='utf-8') as opened_file:
    android_apps = list(reader(opened_file))
android_header = android_apps[0]
android_apps = android_apps[1:]

To make it easier to explore the two data sets, we'll first write a function named `explore_data()` that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('\n')
explore_data(android_apps, 10472, 10473, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Number of rows: 10841
Number of columns: 13


Now let's take a look at the App Store data set.

In [3]:
print(ios_header)
print('\n')
explore_data(ios_apps, 0, 2, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16


## Deleting Wrong Data

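The Google Play data set has a dedicated discussion section, and one of the discussions outlines an error for row 10472. We printed this row above together with the header: it has only 12 values instead of 13 because the `Category` value is missing, which shifts every later column. Let's delete this row. The cell below should be run only once, since each run deletes whatever row currently sits at that index.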

In [4]:
print(len(android_apps))
del android_apps[10472]
print(len(android_apps))

10841
10840
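
Rows like this one can also be found programmatically rather than through the discussion section. The following is a minimal sketch, assuming that a malformed row is one whose number of columns differs from the header's. The helper name `find_malformed_rows` and the sample rows are hypothetical and only illustrate the idea.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Tiny illustrative sample (not the real data set)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],           # the 'Category' value is missing
    ['App C', 'GAME', '4.5'],
]

for index, row in find_malformed_rows(sample_rows, sample_header):
    print(index, row)
```

Calling `find_malformed_rows(android_apps, android_header)` before the deletion should flag row 10472, because that row has 12 values while the header has 13.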


## Removing Duplicate Entries

### Part One

If we explore the Google Play data set long enough, we'll find that some apps have more than one entry. For instance, the application Instagram has four entries:

In [5]:
for app in android_apps:
    name = app[0]
    
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Now we are going to count the number of duplicate entries in the Google Play data set:

In [6]:
duplicate_apps = []
unique_apps = []

for app in android_apps:
    name = app[0]
    
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
            
print(f'Number of duplicate apps: {len(duplicate_apps)}')

Number of duplicate apps: 1181
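
Checking `name in unique_apps` against a list gets slower as the list grows. As an alternative sketch, `collections.Counter` can count every name in one pass. The function name `count_duplicates` and the sample rows are hypothetical. The definition of a duplicate is the same as above: every occurrence of a name after the first one.

```python
from collections import Counter


def count_duplicates(dataset, name_index=0):
    """Count rows whose name has already appeared earlier in the data set."""
    counts = Counter(row[name_index] for row in dataset)
    # An app listed n times contributes n - 1 duplicate rows.
    return sum(n - 1 for n in counts.values() if n > 1)


# Illustrative sample: 'Instagram' appears three times, so two rows are duplicates
sample = [['Instagram'], ['Instagram'], ['Box'], ['Instagram'], ['Slack']]
print(count_duplicates(sample))
```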


We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. 

If you examine the rows we printed for the Instagram app, the main difference is in the fourth position of each row, which corresponds to the number of reviews. The different numbers suggest that the data was collected at different times: the higher the number of reviews, the more recent the data should be. So rather than removing duplicates randomly, we'll keep only the row with the highest number of reviews and remove the other entries for any given app.

To do that, we will:

- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

### Part Two

Let's start by building the dictionary.

In [7]:
reviews_max = {}

for app in android_apps:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

After we remove the duplicates, we should be left with 9,659 rows:

In [8]:
print(f'Length: {len(android_apps) - 1181}')
print(f'Expected length: {len(reviews_max)}')

Length: 9659
Expected length: 9659


Now, let's use the reviews_max dictionary to remove the duplicates. For the duplicate cases, we'll only keep the entries with the highest number of reviews. In the code cell below:

* We start by initializing two empty lists, android_clean and already_added.
* We loop through the android data set, and for every iteration:
    * We isolate the name of the app and the number of reviews.
    * We add the current row (app) to the android_clean list, and the app name (name) to the already_added list if:
        * The number of reviews of the current app matches the number of reviews of that app as described in the reviews_max dictionary; and
        * The name of the app is not already in the already_added list. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.

In [9]:
android_clean = []
already_added = []

for app in android_apps:
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

Now let's quickly explore the new data set, and confirm that the number of rows is 9,659.

In [10]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


We have 9659 rows, just as expected.
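
The two-step approach above can also be condensed into a single pass. This is a sketch under the same assumptions (the name is in column 0 and the number of reviews in column 3). The function name `keep_most_reviewed` and the 'Example App' rows are hypothetical. The strict `>` comparison keeps the first row when several entries tie for the highest number of reviews.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest number of reviews."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means the first row wins when review counts are tied.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Illustrative sample; the 'Example App' rows are made up
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Example App', 'TOOLS', '4.0', '100'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Example App', 'TOOLS', '4.0', '100'],
]

for row in keep_most_reviewed(sample):
    print(row)
```

One difference from the list-based version: the rows come out ordered by the first appearance of each app name, not by the position of the row that was kept.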

## Removing Non-English Apps

### Part One

If we explore the data long enough, we'll find that both datasets have apps with names that suggest they are not designed for an English-speaking audience.

In [11]:
print(ios_apps[813][1])
print(ios_apps[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping these apps, so we'll remove them. One way to do this is to remove each app with a name containing a symbol that isn't commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

Each character we use in a string has a corresponding number associated with it. The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. 

In [12]:
def detect_english_name(name):
    
    for char in name:
        if ord(char) > 127:
            return False
        
    return True

Now we are going to test whether the function classifies the following app names as English or non-English:

In [13]:
print(detect_english_name('instagram'))
print(detect_english_name('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(detect_english_name('Docs To Go™ Free Office Suite'))
print(detect_english_name('Instachat 😜'))

True
False
False
False


We wrote a function that detects non-English app names, but we saw that the function couldn't correctly identify certain English app names like `'Docs To Go™ Free Office Suite'` and `'Instachat 😜'`. This is because emojis and characters like `™` fall outside the ASCII range and have corresponding numbers over 127.

In [14]:
print(ord('™'))
print(ord('😜'))

8482
128540
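
As a cross-check, Python 3.7 and later provide the built-in `str.isascii()` method, which performs the same test as the loop over `ord()` values. The helper name `is_ascii_only` below is hypothetical and only mirrors the first version of `detect_english_name` for comparison.

```python
def is_ascii_only(name):
    """Return True if every character's code point is in the ASCII range 0-127."""
    return all(ord(char) <= 127 for char in name)


names = ['instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']

for name in names:
    # Both columns should agree for every name (str.isascii needs Python 3.7+).
    print(name, is_ascii_only(name), name.isascii())
```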


### Part Two

If we're going to use the function we've created, we'll lose useful data since many English apps will be incorrectly labeled as non-English. To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range.

In [15]:
def detect_english_name(name):
    outside_ascii = 0
    
    for char in name:
        if ord(char) > 127:
            outside_ascii += 1
        
    if outside_ascii > 3:
        return False
    
    return True

Now we are going to test if the function detects the following words as English or non-English:

In [16]:
print(detect_english_name('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(detect_english_name('Docs To Go™ Free Office Suite'))
print(detect_english_name('Instachat 😜'))

False
True
True
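
The threshold is a heuristic, and it helps to see where it falls short. The sketch below separates the counting step into a hypothetical helper, `count_non_ascii`, and shows that a short non-English name with three or fewer non-ASCII characters still passes the filter. The parameter name `max_non_ascii` is an assumption introduced for illustration.

```python
def count_non_ascii(name):
    """Count the characters whose code point falls outside the ASCII range."""
    return sum(1 for char in name if ord(char) > 127)


def detect_english_name(name, max_non_ascii=3):
    """Treat a name as English if it has at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(name) <= max_non_ascii


examples = ['Docs To Go™ Free Office Suite', 'Instachat 😜', '中国語', '中国語 AQリスニング']

for name in examples:
    print(name, count_non_ascii(name), detect_english_name(name))
```

The name `'中国語'` has exactly three non-ASCII characters, so it is kept even though it is not English. The filter is therefore imperfect, but it removes most non-English names while keeping English names that contain an emoji or a trademark symbol.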


We will use the new function to filter the non-English applications from both data sets.

In [17]:
android_english_apps = []
ios_english_apps = []

for app in android_clean:
    name = app[0]
    
    if detect_english_name(name):
        android_english_apps.append(app)
        
for app in ios_apps:
    name = app[1]
    
    if detect_english_name(name):
        ios_english_apps.append(app)

Now let's quickly explore the new data sets and see how many rows are left in each one.

In [18]:
explore_data(android_english_apps, 0, 2, True)
print('\n')
explore_data(ios_english_apps, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6183
Number of columns: 16


## Isolating the Free Apps

Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [19]:
android_free_apps = []
ios_free_apps = []

for app in android_english_apps:
    price = app[7]
    
    if price == '0':
        android_free_apps.append(app)

for app in ios_english_apps:
    price = app[4]
    
    if price == '0.0':
        ios_free_apps.append(app)

Let's see how many apps remain in each data set.

In [20]:
print(f'Length of Google Play Store dataset before: {len(android_english_apps)}')
print(f'Length of Google Play Store dataset now: {len(android_free_apps)}')

Length of Google Play Store dataset before: 9614
Length of Google Play Store dataset now: 8864


In [21]:
print(f'Length of App Store dataset before: {len(ios_english_apps)}')
print(f'Length of App Store dataset now: {len(ios_free_apps)}')

Length of App Store dataset before: 6183
Length of App Store dataset now: 3222
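
The comparisons above depend on the exact strings `'0'` and `'0.0'`. A slightly more defensive sketch converts the price to a number first. It assumes that a non-free price may carry a leading dollar sign (for example `'$4.99'`); that format is an assumption here, and the helper name `is_free` is hypothetical.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price.strip().lstrip('$')  # assumed format: optional leading '$'
    return float(cleaned) == 0.0


for price in ['0', '0.0', '$4.99', '4.99']:
    print(price, is_free(price))
```

With this helper, the same function could be used for both data sets, for example `is_free(app[7])` for Google Play and `is_free(app[4])` for the App Store.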


## Most Common Apps by Genre

### Part One

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the `prime_genre` column of the App Store data set, and the `Genres` and `Category` columns of the Google Play data set.

### Part Two

We'll build two functions we can use to analyze the frequency tables:

* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in a descending order

In [22]:
def freq_table(dataset, index):
    frequency_table = {}
    
    for row in dataset:
        value = row[index]
        frequency_table[value] = frequency_table.get(value, 0) +1
        
    for key in frequency_table:
        frequency_table[key] /= len(dataset)
        frequency_table[key] *= 100
    
    return frequency_table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

We'll use the `display_table` function to display the frequency table of the columns `prime_genre`, `Genres`, and `Category`.
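
The same percentage table can also be built with `collections.Counter` from the standard library. This is an alternative sketch rather than a replacement for `freq_table`; the function name `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return a dict mapping each value in a column to its share in percent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


# Illustrative sample: three of the four rows are 'Games'
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Education'], ['d', 'Games']]
table = freq_table_counter(sample, 1)

# Sort by percentage, highest first, as display_table does
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```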

### Part Three

We already generated frequency tables for the columns `prime_genre`, `Genres`, and `Category`. We'll now focus on analyzing these frequency tables.

In [23]:
display_table(ios_free_apps, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


As we can see, the most common genre among free English apps on the App Store is Games, with about 58%. The next most common genres are Entertainment, with nearly 8%, and Photo & Video, with about 5%. Keep in mind that these figures cover only free English apps, so the same conclusion may not hold for the App Store as a whole.

From this we can infer that most free English apps on the App Store are designed for entertainment (games, photo and video, social networking, etc.) rather than for practical purposes (education, shopping, utilities, etc.). However, a larger number of apps designed for fun does not necessarily mean that they have a larger audience.
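
To put a number on this, the shares of a few entertainment-oriented genres can be added up. The sketch below copies four percentages from the table above. Which genres count as "fun" is a judgment call made here for illustration, not something defined by the data set.

```python
# Percentages copied from the prime_genre frequency table above
fun_genre_shares = {
    'Games': 58.16263190564867,
    'Entertainment': 7.883302296710118,
    'Photo & Video': 4.9658597144630665,
    'Social Networking': 3.2898820608317814,
}

fun_share = sum(fun_genre_shares.values())
print(round(fun_share, 1))
```

These four genres alone account for roughly three quarters of the free English apps on the App Store.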

In [24]:
display_table(android_free_apps, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [25]:
display_table(android_free_apps, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Among the free English apps on Google Play, most are designed for practical purposes.

On the App Store, apps designed for fun far outnumbered apps designed for practical purposes. On Google Play, practical apps are more common, but the gap between the two groups is much smaller.

## Most Popular Apps by Genre on the App Store

Now, we'd like to determine the kind of apps with the most users.

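One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. The App Store data set has no install counts, so as a proxy we'll use the total number of user ratings, which is stored in the `rating_count_tot` column.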

In [26]:
prime_genre_table = freq_table(ios_free_apps, -5)
genres = {}

for genre in prime_genre_table:
    total = 0
    len_genre = 0
    
    for app in ios_free_apps:
        genre_app = app[-5]
        
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
        
    avg_n_ratings = total / len_genre  # average number of ratings, not average rating
    genres[genre] = avg_n_ratings

table_display = []

for key in genres:
    key_val_as_tuple = (genres[key], key)
    table_display.append(key_val_as_tuple)
        
table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


The genre with the highest average number of user ratings is Navigation. Let's look at the apps it contains.

In [27]:
for app in ios_free_apps:
    if app[-5] == 'Navigation':
        print(f'{app[1]} : {app[5]}')

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


From what we can see, the **Navigation** genre is in first place mainly thanks to _Waze_ and _Google Maps_, which together account for roughly 97% of the genre's ratings; _Geocaching_ comes a distant third. The high average is driven by a few applications, so it does not mean that **Navigation** is the most popular genre.
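
The skew becomes concrete when the mean is compared with the median of the six Navigation rating counts printed above. This is a small sketch using the standard `statistics` module; the variable names are chosen for illustration.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

avg = mean(navigation_ratings)
mid = median(navigation_ratings)
# Share of all Navigation ratings that belong to Waze and Google Maps
top_two_share = (345046 + 154911) / sum(navigation_ratings)

print('mean:', avg)
print('median:', mid)
print('top two share:', round(top_two_share, 3))
```

The mean is about ten times larger than the median, which is what we would expect when one or two apps dominate a small genre.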

Let's try with **Social Networking**:

In [28]:
for app in ios_free_apps:
    if app[-5] == 'Social Networking':
        print(f'{app[1]} : {app[5]}')

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In this case there are more applications, but the average is still driven by a handful of very popular ones such as _Facebook_, _Pinterest_, and _WhatsApp_. So the average alone doesn't tell us much about which genre is the most popular.

Finally, let's try with **Games**:

In [29]:
for app in ios_free_apps:
    if app[-5] == 'Games' and int(app[5]) > 60000:
        print(f'{app[1]} : {app[5]}')

Clash of Clans : 2130805
Temple Run : 1724546
Candy Crush Saga : 961794
Angry Birds : 824451
Subway Surfers : 706110
Solitaire : 679055
CSR Racing : 677247
Crossy Road - Endless Arcade Hopper : 669079
Injustice: Gods Among Us : 612532
Hay Day : 567344
PAC-MAN : 508808
DragonVale : 503230
Head Soccer : 481564
Despicable Me: Minion Rush : 464312
The Sims™ FreePlay : 446880
Sonic Dash : 418033
8 Ball Pool™ : 416736
Tiny Tower - Free City Building : 414803
Jetpack Joyride : 405647
Bike Race - Top Motorcycle Racing Games : 405007
Kim Kardashian: Hollywood : 397730
Trivia Crack : 393469
WordBrain : 391401
Sniper 3D Assassin: Shoot to Kill Gun Game : 386521
Flow Free : 373857
Geometry Dash Lite : 370370
▻Sudoku : 359832
Fruit Ninja® : 327025
Pixel Gun 3D : 301182
Temple Run 2 : 295211
My Horse : 293857
Word Cookies! : 287095
Dragon City Mobile : 277268
The Simpsons™: Tapped Out : 274501
Plants vs. Zombies™ 2 : 267394
Clash Royale : 266921
Pokémon GO : 257627
CSR Racing 2 : 257100
Star Wars™: 

In this case, **Games** does seem to be a popular genre: many different games each have hundreds of thousands of ratings, so creating a game could be worthwhile.

## Most Popular Apps by Genre on Google Play

We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture of genre popularity. However, the install numbers don't seem precise enough: most values are open-ended (100+, 1,000+, 5,000+, etc.). We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, we first strip the commas and plus signs and convert each value to a float.
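
The conversion described above can be sketched as a small helper. The name `parse_installs` is hypothetical; it removes only the two characters that appear in the install strings shown in this data set (`,` and `+`).

```python
def parse_installs(installs):
    """Convert an install string such as '1,000,000+' to the integer 1000000."""
    return int(installs.replace(',', '').replace('+', ''))


for raw in ['100+', '1,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```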

In [30]:
import string

genres_table = freq_table(android_free_apps, 1)
genres = {}

for category in genres_table:
    total = 0
    len_category = 0
    
    for app in android_free_apps:
        category_app = app[1]
        
        if category_app == category:
            n_installs = app[5]
            # remove ',' and '+' so that e.g. '1,000+' becomes '1000'
            n_installs = n_installs.translate(str.maketrans('', '', string.punctuation))
            total += float(n_installs)
            len_category += 1
            
    avg_install = total / len_category 
    genres[category] = avg_install

table_display = []

for key in genres:
    key_val_as_tuple = (genres[key], key)
    table_display.append(key_val_as_tuple)
        
table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

As in the _App Store_, the top categories (**COMMUNICATION**, **VIDEO_PLAYERS**, and **SOCIAL**) owe their positions to a few very popular applications such as _WhatsApp_, _Facebook_, and _YouTube_.

In [31]:
for app in android_free_apps:
    if app[1] == "COMMUNICATION" and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(f'{app[0]} : {app[5]}')

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [32]:
for app in android_free_apps:
    if app[1] == "VIDEO_PLAYERS" and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(f'{app[0]} : {app[5]}')

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+


In [33]:
for app in android_free_apps:
    if app[1] == "SOCIAL" and (app[5] == '1,000,000,000+'
                               or app[5] == '500,000,000+'
                               or app[5] == '100,000,000+'):
        print(f'{app[0]} : {app[5]}')

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+
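
One way to see how much these giants inflate a category's average is to recompute it without them. The sketch below uses a tiny made-up sample and hypothetical helper names (`to_int`, `average_installs`). The cutoff of 100,000,000 installs is an assumption that matches the threshold used in the cells above.

```python
def to_int(installs):
    """Convert an install string such as '1,000,000+' to an integer."""
    return int(installs.replace(',', '').replace('+', ''))


def average_installs(rows, below=None):
    """Average installs over (name, installs) pairs, optionally keeping
    only the apps with fewer than `below` installs."""
    values = [to_int(installs) for _, installs in rows]
    if below is not None:
        values = [v for v in values if v < below]
    return sum(values) / len(values) if values else 0.0


# Illustrative sample; the two 'Small App' rows are made up
sample = [
    ('WhatsApp Messenger', '1,000,000,000+'),
    ('Kik', '100,000,000+'),
    ('Small App A', '1,000,000+'),
    ('Small App B', '500,000+'),
]

print(average_installs(sample))
print(average_installs(sample, below=100_000_000))
```

In this sample the average drops by several orders of magnitude once the two largest apps are excluded, which shows how a few apps can dominate a category's mean.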


The **GAME** category also does well here, with many different apps reaching a very high number of installs:

In [34]:
for app in android_free_apps:
    if app[1] == "GAME" and (app[5] == '1,000,000,000+'
                             or app[5] == '500,000,000+'
                             or app[5] == '100,000,000+'):
        print(f'{app[0]} : {app[5]}')

Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Extreme Car Driving Simulator : 100,000,000+
Trivia Crack : 100,000,000+
Angry Birds 2 : 100,000,000+
Candy Crush Saga : 500,000,000+
8 Ball Pool : 100,000,000+
Subway Surfers : 1,000,000,000+
Candy Crush Soda Saga : 100,000,000+
Clash Royale : 100,000,000+
Clash of Clans : 100,000,000+
Plants vs. Zombies FREE : 100,000,000+
Pou : 500,000,000+
Flow Free : 100,000,000+
My Talking Angela : 100,000,000+
slither.io : 100,000,000+
Cooking Fever : 100,000,000+
Yes day : 100,000,000+
Score! Hero : 100,000,000+
Dream League Soccer 2018 : 100,000,000+
My Talking Tom : 500,000,000+
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 100,000,000+
Zombie Tsunami : 100,000,000+
Helix Jump : 100,000,000+
Crossy Road : 100,000,000+
Temple Run 2 : 500,000,000+
Talking Tom Gold Run : 100,000,000+
Agar.io : 100,000,000+
Bus Rush: Subway Edition : 100,000,00

So we can say that creating a game could be profitable on both platforms: besides being a popular genre, games tend to keep users in the app for a long time, which should increase our ad revenue.

## Conclusion

In this project, we analyzed data about App Store and Google Play apps to find an app profile that is profitable on both platforms.

We concluded that creating a game would be a profitable option on both platforms: it is a popular genre, and games keep users in the app for a long time, which is beneficial given that our revenue comes from in-app ads.