# Analyzing Mobile App Data

## About this Project

Our aim in this project is to find a profitable app profile for both the **App Store** and **Google Play** markets. We work as data analysts for a company that builds **Android** and **iOS** apps, and our goal is to give the developer team ideas for profitable apps that fit the kind of apps they build.

Our company only builds apps that are free to download and install, so our only source of revenue is in-app ads. This means that the revenue for any app depends mostly on the number of its users. Our main goal in this project is to help the developer team understand what kinds of apps are likely to attract more users.

## Opening and Exploring the Data

First things first, let's open the datasets and look through them.
For this project we use two free datasets, one for the [App Store](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps) and one for [Google Play](https://www.kaggle.com/datasets/lava18/google-play-store-apps).
* The [App Store](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps) dataset contains data about approximately seven thousand iOS apps. You can download this dataset directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).
* The [Google Play](https://www.kaggle.com/datasets/lava18/google-play-store-apps) dataset contains data about approximately ten thousand Android apps from Google Play. You can download this dataset directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

In [1]:
from csv import reader

In [2]:
def open_dataset(dataset_name, header=False):
    # Read the whole CSV file into a list of rows; the `with` block closes the file.
    with open(dataset_name, encoding='utf8') as opened_file:
        apps_data = list(reader(opened_file))
    # The first row is always the header; return it only when requested.
    if header:
        return apps_data[0], apps_data[1:]
    return apps_data[1:]

In [3]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
apple_dataset = open_dataset('C:/Users/feelingcxld Study/my_datasets/AppleStore.csv')
google_dataset = open_dataset('C:/Users/feelingcxld Study/my_datasets/googleplaystore.csv')

Here are the first few rows of each dataset.

In [5]:
print('App Store dataset:')
explore_data(apple_dataset, 0, 5, True)
print('\n')
print('Google Play dataset:')
explore_data(google_dataset, 0, 5, True)

App Store dataset:
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['5', '282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


Number of rows: 7197
Number of columns: 17


Google Play dataset:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Ar

As we can see, the **App Store** dataset has 7197 rows (one per app) and the **Google Play** dataset has 10841.

## Deleting Wrong Data

In the [Discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section for the **Google Play** dataset, one thread describes an error in a certain row. You can find it by following [this link](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015). We should remove incorrect data from our datasets to avoid misleading or incorrect results.

To remove the wrong entry, we first print it to make sure it is the one described in the discussion thread: its category is missing, so the remaining values are shifted one column to the left.
Then we delete the entry from our list.

In [6]:
print(google_dataset[10472])
del google_dataset[10472]
print(google_dataset[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
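
A row like this can also be located without relying on the discussion thread: the faulty row printed above has 12 values, while the row after it has 13. The sketch below compares each row's length with the header's length. It uses a small made-up dataset, and `find_malformed_rows` is a hypothetical helper that is not part of this notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data (made up for illustration): the second row is missing its category.
header = ['App', 'Category', 'Rating']
rows = [
    ['Some App', 'TOOLS', '4.2'],
    ['Broken App', '1.9'],
    ['Other App', 'SOCIAL', '4.5'],
]

bad = find_malformed_rows(header, rows)
print(bad)
```

With the real data, this would require loading the header as well, for example with `open_dataset(..., header=True)`.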


## Removing Duplicate Entries

### Part one: Counting Duplicates

If we explore the **Google Play** dataset, we find many duplicates. For instance, **Instagram** has four entries.

In [7]:
for app in google_dataset:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Using a simple loop, we can determine how many duplicates the **Google Play** dataset has.

In [8]:
duplicate_apps = list()
unique_apps = set()  # a set keeps the membership test fast

for app in google_dataset:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
# Derive the expected length from the count instead of hard-coding 1181.
print('Expected length:', len(google_dataset) - len(duplicate_apps))

Number of duplicate apps: 1181


Expected length: 9659


There are 1181 duplicates in our dataset, so after cleaning we should be left with 9659 apps. Here are some examples of apps that have duplicates.

In [9]:
print('Examples of duplicate apps:', duplicate_apps[:10])

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
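
The counting step can also be written as a small reusable function. This is only a sketch on made-up names; `count_duplicates` is a hypothetical helper, and it uses a set for the names already seen so that each membership test stays fast even on long lists.

```python
def count_duplicates(names):
    """Return (number of unique names, list of repeated occurrences)."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen:
            # Every occurrence after the first one counts as a duplicate.
            duplicates.append(name)
        else:
            seen.add(name)
    return len(seen), duplicates


# Made-up list of app names for illustration.
names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram']
n_unique, dups = count_duplicates(names)
print(n_unique, dups)
```

Note that an app with four entries contributes three duplicates, which is why the list of duplicates can contain the same name more than once.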


### Part Two: Removing Duplicates

We need to remove the duplicate entries and keep only one entry per app. We could remove duplicates randomly, but there is a better way.

As you can see in our **Instagram** example, the only difference between the entries is the number of reviews (the fourth value in each row). The different numbers suggest that the data was collected at different times. We can use this to keep the most recent data: the higher the number of reviews, the more recent the entry should be.

First, to clear the duplicates we need to create a dictionary that maps each app name to its highest number of reviews.

In [11]:
reviews_max = dict()

for app in google_dataset:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Number of entries in dictionary:', len(reviews_max))

Number of entries in dictionary: 9659


In the code section above, we loop through the **Google Play** dataset, take the name and number of reviews of each app, and store them in the `reviews_max` dictionary. If a duplicate has more reviews than the entry we already have, we replace the old number with the new one and continue the loop. At the end, we print the number of unique apps in the dataset.

In the next code section we do the following:
* We create two lists, `android_clean` and `already_added`.
* We loop through each app in our dataset.
* For each iteration:
    * We store the app name and number of reviews in the `name` and `n_reviews` variables.
    * We append the full row `app` to `android_clean` and the app name to `already_added`, but only if `n_reviews` equals the maximum stored in the `reviews_max` dictionary and `name` is not already in `already_added`. The second condition matters when several duplicates share the same highest number of reviews.

In [12]:
android_clean = list()
already_added = list()

for app in google_dataset:
    name = app[0]
    n_reviews = float(app[3])
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print('Number of rows left after cleaning:', len(android_clean))

Number of rows left after cleaning: 9659


After removing the duplicates we are left with 9659 apps, as expected.
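
The two passes described in this part can be sketched as one function. This is an illustration under stated assumptions rather than the notebook's code: `keep_most_reviewed` is a hypothetical name, the rows are shortened to four columns, the **Instagram** values come from the output shown earlier, and the **Box** rows are made up.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    # Pass 1: highest number of reviews seen for each name.
    reviews_max = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row that matches the maximum for its name.
    clean = []
    already_added = set()
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        if reviews_max[name] == n_reviews and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean


rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],  # made-up values
    ['Box', 'BUSINESS', '4.2', '159872'],  # exact duplicate: same maximum twice
]

clean = keep_most_reviewed(rows)
for row in clean:
    print(row)
```

The two identical **Box** rows show why the `already_added` check is needed: without it, both rows would match the maximum and both would be kept.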

## Removing Non-English Apps

If we explore our datasets enough, we notice that some app names are not directed toward English speakers. Here are some examples of these apps.

In [13]:
print(apple_dataset[814][2])
print(apple_dataset[6734][2])

print(android_clean[4412][0])
print(android_clean[7940][0])

搜狐新闻—新闻热点资讯掌上阅读软件
エレメンタル ファンタジー - 高精細３ＤアクションＲＰＧ
中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping non-English apps, so we'll remove them. In programming, each character in a string has a corresponding number associated with it. For example, the corresponding number for `'a'` is 97, for `'A'` it is 65, and for `'爱'` it is 29,233. According to [ASCII](https://en.wikipedia.org/wiki/ASCII), the characters commonly used in English text fall in the range 0 to 127. Using this information, we can remove non-English apps from our datasets. Because some English app names contain a few characters outside this range (for example emoji or '™'), we only discard a name if it has more than three such characters.
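
The numbers mentioned above can be checked with Python's built-in `ord` function. The short sketch below is only an illustration; `code_points` and `is_ascii_char` are names introduced here, not part of the notebook.

```python
# Map each example character to its number (Unicode code point).
code_points = {char: ord(char) for char in ['a', 'A', '爱']}
for char, number in code_points.items():
    print(repr(char), number)


def is_ascii_char(char):
    """True if the character falls in the ASCII range 0-127."""
    return ord(char) <= 127


print(is_ascii_char('a'), is_ascii_char('爱'))
```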

In [14]:
def eng_char_check(string):
    # Count the characters outside the ASCII range (0-127).
    non_eng_chars = 0
    for char in string:
        if ord(char) > 127:
            non_eng_chars += 1
    # Tolerate up to three such characters (e.g. emoji or trademark signs).
    return non_eng_chars <= 3

In [15]:
android_eng = list()
apple_eng = list()

for app in android_clean:
    if eng_char_check(app[0]):
        android_eng.append(app)

for app in apple_dataset:
    if eng_char_check(app[2]):
        apple_eng.append(app)

print('Number of android apps:', len(android_eng))
print('Number of apple apps:', len(apple_eng))

Number of android apps: 9614
Number of apple apps: 6183


In the code section above, we use a function that checks whether an app name contains at most three characters outside the ASCII range; if so, we add the app to the corresponding list. The filter is not perfect, but it should keep mostly apps aimed at English speakers.
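
To see how the three-character tolerance behaves, the sketch below applies an equivalent check to a few names. `looks_english`, its `max_non_ascii` parameter, and the sample names are illustrative assumptions, not part of the notebook.

```python
def looks_english(name, max_non_ascii=3):
    """True if the name has at most `max_non_ascii` characters above code 127."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii


# Sample names chosen for illustration only.
samples = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
results = {name: looks_english(name) for name in samples}
for name, keep in results.items():
    print(keep, name)
```

With a tolerance of zero, the second and third names would be dropped even though they are English names with a single symbol or emoji.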

## Isolating Free Apps

As we mentioned earlier, we only build apps that are free to download and install. Because of this, we need to isolate the free apps from the paid ones. To do so, we can use the **'Type'** column for Google Play apps and the **'price'** column for App Store apps.

In [16]:
android_free = list()
apple_free = list()

for app in android_eng:
    if app[6] == 'Free':
        android_free.append(app)

for app in apple_eng:
    price = float(app[5])
    if price == 0:
        apple_free.append(app)

In [17]:
print('Number of free android apps:', len(android_free))
print('Number of free apple apps:', len(apple_free))

Number of free android apps: 8863
Number of free apple apps: 3222


After cleaning, there are 8863 apps left from the **Google Play** dataset and 3222 apps from the **App Store** dataset. From this point we can start the analysis.

## Most Common Apps by Genre

### Part One: Finding App Frequency by Genre

Because the number of people using our apps affects our revenue, our goal is to determine the kinds of apps that are likely to attract more users.

To minimize risks and overhead, our validation strategy for an app idea has three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app to both **Google Play** and the **App Store**, we need to find app profiles that are successful in both markets. For this reason we will analyze our datasets by app genre.

In [18]:
def freq_table(dataset, index):
    # Count how many times each value appears in the given column.
    counts = dict()
    total = 0
    for app in dataset:
        val = app[index]
        total += 1
        if val in counts:
            counts[val] += 1
        else:
            counts[val] = 1
    # Convert the counts to percentages of the total number of rows.
    percentages = dict()
    for key in counts:
        percentage = (counts[key] / total) * 100
        percentages[key] = round(percentage, 1)
    return percentages

In [19]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', str(entry[0]) + '%')

To find out which genres are popular on **Google Play** and the **App Store**, we build two functions. The first one creates a frequency table (in percentages) for a given column, and the second one displays it sorted in descending order.
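
For comparison, the same kind of percentage table can be built with `collections.Counter` from the standard library. This is an alternative sketch on made-up rows, not the notebook's implementation; `freq_table_percent` is a hypothetical name.

```python
from collections import Counter


def freq_table_percent(rows, index):
    """Return {value: percentage} for one column, most frequent value first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by descending count.
    return {
        value: round(count / total * 100, 1)
        for value, count in counts.most_common()
    }


# Made-up rows: (name, genre).
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Weather'], ['d', 'Games']]
table = freq_table_percent(rows, 1)
for genre, percentage in table.items():
    print(genre, ':', str(percentage) + '%')
```

Because dictionaries keep insertion order, the result is already sorted and needs no separate display step.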

### Part Two: Examining the Data

Let's start by examining the **App Store** frequency table.

In [20]:
display_table(apple_free, 12) # prime genre section

Games : 58.2%
Entertainment : 7.9%
Photo & Video : 5.0%
Education : 3.7%
Social Networking : 3.3%
Shopping : 2.6%
Utilities : 2.5%
Sports : 2.1%
Music : 2.0%
Health & Fitness : 2.0%
Productivity : 1.7%
Lifestyle : 1.6%
News : 1.3%
Travel : 1.2%
Finance : 1.1%
Weather : 0.9%
Food & Drink : 0.8%
Reference : 0.6%
Business : 0.5%
Book : 0.4%
Navigation : 0.2%
Medical : 0.2%
Catalogs : 0.1%


We can see that 58.2% of all free English apps are games. Entertainment apps make up 7.9%, followed by photo & video apps with 5.0%. Education apps account for 3.7%, followed by social networking apps with 3.3% of the apps in our dataset.

As we can see, the **App Store** is dominated by apps developed for entertainment, while apps with practical purposes are rarer. However, the fact that entertainment apps are the most numerous doesn't mean that they also have the greatest number of users.

Let's continue our analysis with the **Category** and **Genres** columns of the **Google Play** dataset.

In [21]:
display_table(android_free, 1) # Category section

FAMILY : 18.9%
GAME : 9.7%
TOOLS : 8.5%
BUSINESS : 4.6%
PRODUCTIVITY : 3.9%
LIFESTYLE : 3.9%
FINANCE : 3.7%
MEDICAL : 3.5%
SPORTS : 3.4%
PERSONALIZATION : 3.3%
COMMUNICATION : 3.2%
HEALTH_AND_FITNESS : 3.1%
PHOTOGRAPHY : 2.9%
NEWS_AND_MAGAZINES : 2.8%
SOCIAL : 2.7%
TRAVEL_AND_LOCAL : 2.3%
SHOPPING : 2.2%
BOOKS_AND_REFERENCE : 2.1%
DATING : 1.9%
VIDEO_PLAYERS : 1.8%
MAPS_AND_NAVIGATION : 1.4%
FOOD_AND_DRINK : 1.2%
EDUCATION : 1.2%
ENTERTAINMENT : 1.0%
LIBRARIES_AND_DEMO : 0.9%
AUTO_AND_VEHICLES : 0.9%
WEATHER : 0.8%
HOUSE_AND_HOME : 0.8%
PARENTING : 0.7%
EVENTS : 0.7%
COMICS : 0.6%
BEAUTY : 0.6%
ART_AND_DESIGN : 0.6%


The picture looks significantly different on **Google Play**. There are not as many apps designed for fun, and there is a good number of apps designed for practical purposes. The frequency table for the **Genres** column points the same way.

In [22]:
display_table(android_free, 9) # Genres section

Tools : 8.5%
Entertainment : 6.1%
Education : 5.3%
Business : 4.6%
Productivity : 3.9%
Lifestyle : 3.9%
Finance : 3.7%
Sports : 3.5%
Medical : 3.5%
Personalization : 3.3%
Communication : 3.2%
Health & Fitness : 3.1%
Action : 3.1%
Photography : 2.9%
News & Magazines : 2.8%
Social : 2.7%
Travel & Local : 2.3%
Shopping : 2.2%
Books & Reference : 2.1%
Simulation : 2.0%
Dating : 1.9%
Arcade : 1.9%
Video Players & Editors : 1.8%
Casual : 1.8%
Maps & Navigation : 1.4%
Food & Drink : 1.2%
Puzzle : 1.1%
Racing : 1.0%
Strategy : 0.9%
Role Playing : 0.9%
Libraries & Demo : 0.9%
Auto & Vehicles : 0.9%
Weather : 0.8%
House & Home : 0.8%
Events : 0.7%
Adventure : 0.7%
Comics : 0.6%
Beauty : 0.6%
Art & Design : 0.6%
Parenting : 0.5%
Card : 0.5%
Trivia : 0.4%
Educational;Education : 0.4%
Educational : 0.4%
Casino : 0.4%
Board : 0.4%
Word : 0.3%
Education;Education : 0.3%
Racing;Action & Adventure : 0.2%
Puzzle;Brain Games : 0.2%
Music : 0.2%
Entertainment;Music & Video : 0.2%
Casual;Pretend Play : 0.2

The difference between the **Genres** and **Category** columns in the **Google Play** dataset is not entirely clear, but one thing we can notice is that the **Genres** column has many more categories. At the moment we are looking for the big picture, so we will work with the **Category** column from this point on.

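### Part Three: Most Popular App Store Apps by Genre

To get a better idea of genre popularity, we can compare the average number of user ratings per app in each genre. The App Store dataset has no install counts, so we use the total number of ratings as a proxy for the number of users. To do this, we sum the number of user ratings of all apps in a genre and divide this sum by the number of apps in that genre.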

In [23]:
apple_genre_freq_table = freq_table(apple_free, 12)
apple_table_sorted = list()

for genre in apple_genre_freq_table:
    total = 0
    len_genre = 0
    for app in apple_free:
        genre_app = app[12]
        if genre_app == genre:
            # Column 6 holds the total number of user ratings, not the rating itself.
            n_ratings = float(app[6])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    key_val_as_tuple = (round(avg_n_ratings, 1), genre)
    apple_table_sorted.append(key_val_as_tuple)

apple_table_sorted = sorted(apple_table_sorted, reverse = True)
for entry in apple_table_sorted:
    print(entry[1], ':', entry[0])

Navigation : 86090.3
Reference : 74942.1
Social Networking : 71548.3
Music : 57326.5
Weather : 52279.9
Book : 39758.5
Food & Drink : 33333.9
Finance : 31467.9
Photo & Video : 28441.5
Travel : 28243.8
Shopping : 26919.7
Health & Fitness : 23298.0
Sports : 23008.9
Games : 22788.7
News : 21248.0
Productivity : 21028.4
Utilities : 18684.5
Lifestyle : 16485.8
Entertainment : 14029.8
Business : 7491.1
Education : 7004.0
Catalogs : 4004.0
Medical : 612.0
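
The cell above scans the whole dataset once per genre. The same averages can be computed in a single pass by accumulating a total and a count for each genre. The sketch below shows the idea on four shortened rows (name, genre, number of ratings) taken from the outputs in this notebook; `average_by_group` is a hypothetical helper.

```python
def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [
    ['Waze', 'Navigation', '345046'],
    ['Google Maps', 'Navigation', '154911'],
    ['Fitbit', 'Health & Fitness', '90496'],
    ['Headspace', 'Health & Fitness', '12819'],
]

averages = average_by_group(rows, group_idx=1, value_idx=2)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', round(avg, 1))
```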


**Navigation** apps have the highest average number of user ratings, but this figure is heavily influenced by **Waze** and **Google Maps**, which together have almost half a million user ratings.

In [24]:
for app in apple_free:
    if app[12] == 'Navigation':
        print(app[2], app[6])

Waze - GPS Navigation, Maps & Real-time Traffic 345046
Geocaching® 12811
ImmobilienScout24: Real Estate Search in Germany 187
Railway Route Search 5
CoPilot GPS – Car Navigation & Offline Maps 3582
Google Maps - Navigation & Transit 154911


The same goes for **Social Networking** apps, where the average is influenced by giants like **Facebook**, **Instagram**, etc. **Music** apps are in the same boat: **Spotify**, **Shazam** and a few other apps heavily influence the average.

Our aim is to find popular genres, but these genres may seem more popular than they really are. The average number of ratings seems to be skewed by a very few apps that have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. We could get a better picture by removing the extremely popular apps from each genre and recalculating the averages, but we'll leave this level of detail for later.
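
A quick way to see the skew, without removing any apps, is to compare the mean with the median. The median is a different statistic from the one used in this notebook and is shown here only as an illustration. The numbers below are the six **Navigation** rating counts printed above.

```python
from statistics import mean, median

# Number of user ratings of the six free Navigation apps listed above.
navigation_ratings = [345046, 12811, 187, 5, 3582, 154911]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

print('Mean:  ', round(nav_mean, 1))   # matches the 86090.3 in the genre table
print('Median:', nav_median)           # 8196.5, about ten times smaller
```

The mean is pulled up by the two largest apps, while the median reflects a more typical app in the genre.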

**Reference** is another genre with a high average number of user ratings, but again the average is skewed, this time by **Bible** and **Dictionary.com**.

In [25]:
for app in apple_free:
    if app[12] == 'Reference':
        print(app[2], app[6])

Bible 985920
Dictionary.com Dictionary & Thesaurus 200047
Dictionary.com Dictionary & Thesaurus for iPad 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran 18418
Merriam-Webster Dictionary 16849
Google Translate 26786
Night Sky 12122
WWDC 762
Jishokun-Japanese English Dictionary & Translator 0
教えて!goo 0
VPN Express 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools 4693
Guides for Pokémon GO - Pokemon GO News and Cheats 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) 8535
GUNS MODS for Minecraft PC Edition - Mods Tools 1497
Real Bike Traffic Rider Virtual Reality Glasses 8


On the other hand, **Health & Fitness** seems to have a more balanced distribution of user ratings.

In [26]:
for app in apple_free:
    if app[12] == 'Health & Fitness':
        print(app[2], app[6])

Lifesum – Inspiring healthy lifestyle app 5795
Lose It! – Weight Loss Program and Calorie Counter 373835
Nike+ Training Club - Workouts & Fitness Plans 33969
Sleep Cycle alarm clock 104539
Period Tracker Lite 53620
Weight Watchers 136833
My Cycles Period and Ovulation Tracker 7469
Runtastic Running, Jogging and Walking Tracker 10298
Calorie Counter & Diet Tracker by MyFitnessPal 507706
Waterlogged - Daily Hydration Tracker 5000
WebMD for iPad 9142
Fooducate - Lose Weight, Eat Healthy,Get Motivated 11875
My Score Plus Weight Loss, Food & Exercise Tracker 467
VIBO RealMassager 6
Fitbit 90496
Headspace 12819
Charity Miles: Walking & Running Distance Tracker 3115
Sworkit - Custom Workouts for Exercise & Fitness 16819
Fitstar Personal Trainer 7496
Garmin Connect™ Mobile 8341
Smart Alarm Clock : sleep cycle & snoring recorder 3779
Plant Nanny - Water Reminder with Cute Plants 27421
Sleep Meister - Sleep Cycle Alarm Lite 445
ameli, l'Assurance Maladie 8
RUNNING for weight loss: workout & meal

This niche looks like a good spot. We could make a fitness app that offers more than just exercises: a gamified version of a typical fitness app, with character development, achievements, an online community, etc. Users would improve their in-game character by doing various exercises in real life, and through the online community they could share their progress with other people who use the app.

This idea seems to fit well with the fact that the **App Store** is dominated by apps for entertainment.

Some other popular genres, such as **Weather** or **Food & Drink**, don't seem interesting to us:
* People usually don't spend much time in **Weather** apps, so the chances of making a profit from in-app ads are low for this category.
* In **Food & Drink**, most of the apps belong to restaurants or delivery services like **McDonald's** or **Starbucks**, so this niche is not for us.

### Part Four: Most Popular Google Play Apps by Genre

For **Google Play** we have data about the number of installs, so we should be able to get a clearer picture of genre popularity. However, the install numbers are not exact: most values are open-ended, for example '100+', '1,000+', '5,000+', etc.

In [27]:
display_table(android_free, 5)

1,000,000+ : 15.7%
100,000+ : 11.6%
10,000,000+ : 10.5%
10,000+ : 10.2%
1,000+ : 8.4%
100+ : 6.9%
5,000,000+ : 6.8%
500,000+ : 5.6%
50,000+ : 4.8%
5,000+ : 4.5%
10+ : 3.5%
500+ : 3.2%
50,000,000+ : 2.3%
100,000,000+ : 2.1%
50+ : 1.9%
5+ : 0.8%
1+ : 0.5%
500,000,000+ : 0.3%
1,000,000,000+ : 0.2%
0+ : 0.0%


This data is not very precise. For example, for an app with 100,000+ installs we don't know whether it has 100,000 or 300,000. But we don't need perfectly accurate data for our purposes; we only need an idea of which genres attract more users. We will leave the numbers as they are, so an app with 100,000+ installs counts as having 100,000 installs.

Now we need to perform the computations. To do so, we need to convert each install number to a `float`, which means removing the commas and the plus characters. We'll do this directly in the loop below, where we do our computations.
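
The conversion itself can be isolated in a small helper. This is a sketch; `parse_installs` is a hypothetical name, and the result should be read as a lower bound, since the '+' means "at least".

```python
def parse_installs(text):
    """Convert an install string such as '1,000+' to a float (lower bound)."""
    return float(text.replace(',', '').replace('+', ''))


examples = ['0+', '1,000+', '1,000,000,000+']
parsed = [parse_installs(text) for text in examples]
print(parsed)
```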

In [28]:
android_category_freq_table = freq_table(android_free, 1)
android_table_sorted = list()

for category in android_category_freq_table:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            installs = float(installs)
            total += installs
            len_category += 1
    avg_installs = total / len_category
    key_val_as_tuple = (round(avg_installs, 1), category)
    android_table_sorted.append(key_val_as_tuple)

android_table_sorted = sorted(android_table_sorted, reverse = True)
for entry in android_table_sorted:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38456119.2
VIDEO_PLAYERS : 24727872.5
SOCIAL : 23253652.1
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.3
GAME : 15588015.6
TRAVEL_AND_LOCAL : 13984077.7
ENTERTAINMENT : 11640705.9
TOOLS : 10801391.3
NEWS_AND_MAGAZINES : 9549178.5
BOOKS_AND_REFERENCE : 8767811.9
SHOPPING : 7036877.3
PERSONALIZATION : 5201482.6
WEATHER : 5074486.2
HEALTH_AND_FITNESS : 4188822.0
MAPS_AND_NAVIGATION : 4056941.8
FAMILY : 3697848.2
SPORTS : 3638640.1
ART_AND_DESIGN : 1986335.1
FOOD_AND_DRINK : 1924897.7
EDUCATION : 1833495.1
BUSINESS : 1712290.1
LIFESTYLE : 1437816.3
FINANCE : 1387692.5
HOUSE_AND_HOME : 1331540.6
DATING : 854028.8
COMICS : 817657.3
AUTO_AND_VEHICLES : 647317.8
LIBRARIES_AND_DEMO : 638503.7
PARENTING : 542603.6
BEAUTY : 513151.9
EVENTS : 253542.2
MEDICAL : 120550.6


The **Communication** category has the most installs on average, but this is skewed by a few apps such as **WhatsApp**, **Skype**, **Hangouts**, etc.

In [29]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] =='100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we remove all the apps that have 100,000,000 or more installs, the average is reduced roughly ten times.

In [30]:
under_100_m = list()

for app in android_free:
    installs = app[5]
    installs = installs.replace(',', '')
    installs = installs.replace('+', '')
    installs = float(installs)
    if app[1] == 'COMMUNICATION' and installs < 100000000:
        under_100_m.append(installs)

print(round(sum(under_100_m) / len(under_100_m), 2))

3603485.39
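
The same check can be repeated for other categories if the filtering is wrapped in a function. The sketch below is an illustration on made-up rows (name, category, installs). `average_installs` and its `below` parameter are hypothetical names, and only the **Samsung Health** install figure comes from this notebook.

```python
def average_installs(rows, category, below=None):
    """Mean installs for a category, optionally keeping only apps under `below`."""
    values = []
    for row in rows:
        installs = float(row[2].replace(',', '').replace('+', ''))
        if row[1] == category and (below is None or installs < below):
            values.append(installs)
    # Return None instead of dividing by zero when nothing matches.
    return sum(values) / len(values) if values else None


rows = [
    ['Giant Messenger', 'COMMUNICATION', '1,000,000,000+'],  # made up
    ['Small Chat', 'COMMUNICATION', '1,000,000+'],           # made up
    ['Tiny SMS', 'COMMUNICATION', '500,000+'],               # made up
    ['Samsung Health', 'HEALTH_AND_FITNESS', '500,000,000+'],
]

all_apps = average_installs(rows, 'COMMUNICATION')
without_giants = average_installs(rows, 'COMMUNICATION', below=100_000_000)
print(round(all_apps, 1), without_giants)
```

A single app with a billion installs is enough to lift the category average by several orders of magnitude in this toy example.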


We see the same pattern for the **Video Players** category, with 24,727,872 installs on average. The market is dominated by apps like **YouTube**, **Google Play Movies & TV** and **MX Player**. The same pattern holds for **Social**, **Photography** and **Productivity** apps.

Again, these genres might seem more popular than they really are. These niches seem to be dominated by giants that are hard to compete against.

The **Health & Fitness** genre looks fairly popular as well, with an average of 4,188,822 installs. We should explore this genre in more depth, since we found that it could work well on the **App Store**, and our goal is to recommend a profile that shows potential for being profitable on both the **App Store** and **Google Play**.

Let's take a look at some of the apps from this genre and their number of installs.

In [31]:
for app in android_free:
    if app[1] == 'HEALTH_AND_FITNESS':
        print(app[0], ':', app[5])

Step Counter - Calorie Counter : 500,000+
Lose Belly Fat in 30 Days - Flat Stomach : 5,000,000+
Pedometer - Step Counter Free & Calorie Burner : 1,000,000+
Six Pack in 30 Days - Abs Workout : 10,000,000+
Lose Weight in 30 Days : 10,000,000+
Pedometer : 10,000,000+
LG Health : 10,000,000+
Step Counter - Pedometer Free & Calorie Counter : 10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App : 10,000,000+
Sportractive GPS Running Cycling Distance Tracker : 1,000,000+
30 Day Fitness Challenge - Workout at Home : 10,000,000+
Home Workout for Men - Bodybuilding : 1,000,000+
Fat Burning Workout - Home Weight lose : 100,000+
Buttocks and Abdomen : 500,000+
Walking for Weight Loss - Walk Tracker : 100,000+
Running & Jogging : 500,000+
Sleep Sounds : 1,000,000+
Fitbit : 10,000,000+
Lose Belly Fat-Home Abs Fitness Workout : 50,000+
Cycling - Bike Tracker : 500,000+
Abs Training-Burn belly fat : 100,000+
Calorie Counter - EasyFit free : 1,000,000+
Aunjai i lert u : 500,000+
Garmin Connect

The **Health & Fitness** genre includes a variety of apps from calorie counters to meditation apps. 

In [32]:
for app in android_free:
    if app[1] == 'HEALTH_AND_FITNESS' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] =='100,000,000+'):
        print(app[0], ':', app[5])

Period Tracker - Period Calendar Ovulation Tracker : 100,000,000+
Samsung Health : 500,000,000+


There are only two extremely popular apps, so this market seems to leave real room for us. Let's look for app ideas based on the apps that are somewhere in the middle in terms of popularity.

In [33]:
for app in android_free:
    if app[1] == 'HEALTH_AND_FITNESS' and (app[5] == '1,000,000+'
                                      or app[5] == '5,000,000+'
                                      or app[5] =='10,000,000+'
                                      or app[5] =='50,000,000+'):
        print(app[0], ':', app[5])

Lose Belly Fat in 30 Days - Flat Stomach : 5,000,000+
Pedometer - Step Counter Free & Calorie Burner : 1,000,000+
Six Pack in 30 Days - Abs Workout : 10,000,000+
Lose Weight in 30 Days : 10,000,000+
Pedometer : 10,000,000+
LG Health : 10,000,000+
Step Counter - Pedometer Free & Calorie Counter : 10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App : 10,000,000+
Sportractive GPS Running Cycling Distance Tracker : 1,000,000+
30 Day Fitness Challenge - Workout at Home : 10,000,000+
Home Workout for Men - Bodybuilding : 1,000,000+
Sleep Sounds : 1,000,000+
Fitbit : 10,000,000+
Calorie Counter - EasyFit free : 1,000,000+
Garmin Connect™ : 10,000,000+
BetterMe: Weight Loss Workouts : 5,000,000+
Bike Computer - GPS Cycling Tracker : 1,000,000+
Running Distance Tracker + : 1,000,000+
Runkeeper - GPS Track Run Walk : 10,000,000+
Walking: Pedometer diet : 1,000,000+
8fit Workouts & Meal Planner : 10,000,000+
Keep Trainer - Workout Trainer & Fitness Coach : 1,000,000+
PumpUp — Fitness Co

It looks like this niche is dominated by **Calorie Counters**, **Home Workout** apps and apps for **Weight Loss**, so it's probably not a good idea to build a similar app, since there would be strong competition.

We notice there are a few apps built around **Abs Workout**, which suggests that building an app around one popular kind of exercise can be profitable. It seems that an app built around one theme, in our case **Abs training**, with new and interesting features could be profitable in both the **App Store** and **Google Play** markets.

## Conclusion

In this project we analyzed data about **App Store** and **Google Play** mobile apps with the goal of recommending an app profile that can be profitable in both markets.

We came to the conclusion that a fitness app built around the idea of gamification can be profitable on both the **App Store** and **Google Play**. The market is already full of ordinary fitness apps, so we think that an **Abs Workout** app with added features like character development, achievements, an online community, etc. could be popular and profitable.