# Profitable App Profiles for the App Store and Google Play Markets

Our company only builds apps that are free to download and install, directed toward an English-speaking audience, making profit through ads. We are going to analyze what types of apps are likely to attract more users on Google Play and the App Store.

We will do so by collecting and analyzing data about mobile apps available on Google Play and the App Store (a data set containing data about approximately 10,000 Android apps from Google Play, collected in August 2018, and a data set containing data about approximately 7,000 iOS apps from the App Store, collected in July 2017).

In [1]:
#Open datasets
from csv import reader

#'with' closes each file once it has been read; the utf8 encoding is stated explicitly
with open("AppleStore.csv", encoding="utf8") as open_apple:
    apple = list(reader(open_apple))

with open("googleplaystore.csv", encoding="utf8") as open_google:
    android = list(reader(open_google))

In [2]:
#Create function to explore datasets 
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Looking at the Google Play dataset:

In [3]:
#Explore datasets
explore_data(android,0,2,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


This dataset has 10,842 rows (including the header row) and 13 columns. The columns 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres' are the most relevant for this analysis.

Looking at the Apple dataset:

In [4]:
explore_data(apple,0,2,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows: 7198
Number of columns: 16


This dataset has 7,198 rows (including the header row, so 7,197 apps) and 16 columns. The columns 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre' are the most relevant for our analysis.

## Deleting wrong data

In the discussion section for the Google Play dataset, one user reports an error in the row for the app 'Life Made WI-Fi Touchscreen Photo Frame' at index 10473. Let's check whether there is indeed an error; if so, we will delete this row.

We will print the row with the error, the header and a regular row to determine if an error exists.

In [5]:
print(android[10473])
print('')
print(android[0])
print('')
print(android[1])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


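The row for 'Life Made WI-Fi Touchscreen Photo Frame' has only 12 values instead of 13: the 'Category' value is missing, so the remaining values are shifted one column to the left. As a result, the third column ('Rating') shows a rating of 19. The maximum possible rating is 5, so this row is incorrect and will be deleted: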

In [6]:
del android[10473]
#run this cell only once: re-running it would delete a different row each time
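
Rather than relying on an index reported by another user, a structural check can flag rows like this automatically. The sketch below is only an illustration and not part of the original analysis: `find_malformed_rows` is a hypothetical helper, and the small `sample` list stands in for the real dataset. It reports every row whose number of values differs from the header's.

```python
def find_malformed_rows(dataset):
    """Return (index, row_length) for rows whose length differs from the header's."""
    expected = len(dataset[0])
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]


# Tiny stand-in for the real data: the last row is missing its 'Category' value.
sample = [
    ['App', 'Category', 'Rating'],
    ['Some App', 'TOOLS', '4.1'],
    ['Broken App', '1.9'],
]
print(find_malformed_rows(sample))  # [(2, 2)]
```

Run on `android` before the deletion, a check of this kind should flag the 12-value row printed above.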

## Identifying duplicates

In [7]:
duplicate_apps = []
unique_apps = []

for app in android:
    app_name = app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print(len(duplicate_apps),' duplicate apps')

1181  duplicate apps


In [8]:
#examine some duplicate apps
print(duplicate_apps[:20])

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']


In [9]:
#examine duplicate app in data
for app in android:
    if app[0] == 'Slack':
        print (app)
print('')      
print(android[0])

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


We can see differences in the 'Reviews' column. We will keep the row with the highest number of reviews and remove the rest to get rid of duplicates.

## Removing duplicates

In [10]:
print('Expected length after removing duplicates',len(android)-len(duplicate_apps)-1) #-1 to remove header

Expected length after removing duplicates 9659


#### Creating a dictionary that maps each unique app to its highest number of reviews

In [11]:
#dictionary mapping each unique app name to its highest number of reviews
android_dict = {}
for row in android[1:]:
    n_reviews = float(row[3]) #'Reviews' column
    name = row[0]
    if name in android_dict:
        if n_reviews > android_dict[name]:
            android_dict[name] = n_reviews
    else:
        android_dict[name] = n_reviews
        

apple_dict = {}
for row in apple[1:]:
    n_ratings = float(row[5]) #'rating_count_tot' column
    name = row[1]
    if name in apple_dict:
        if n_ratings > apple_dict[name]:
            apple_dict[name] = n_ratings
    else:
        apple_dict[name] = n_ratings
        
print('Length android:',len(android_dict))
print('Length apple:',len(apple_dict))

Length android: 9659
Length apple: 7195


#### We create two lists: android_clean will store data without the duplicates. We add items whose number of reviews match with those in the dictionary and whose name is not in the list of already_added items

In [12]:
android_clean = []
already_added = []
for row in android[1:]:
    name = row[0]
    n_reviews = float(row[3]) #'Reviews' column
    if (n_reviews == android_dict[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)


apple_clean = []
already_added_ap = []
for row in apple[1:]:
    name = row[1]
    n_ratings = float(row[5]) #'rating_count_tot' column
    if (n_ratings == apple_dict[name]) and (name not in already_added_ap):
        apple_clean.append(row)
        already_added_ap.append(name)
        
print('Android:',len(android_clean))
print('Apple:',len(apple_clean))

Android: 9659
Apple: 7195
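
The two-step approach above (build a dictionary of maxima, then filter) can also be written as a single pass. The following is a minimal sketch under stated assumptions, not the notebook's method: `dedupe_by_max` is a hypothetical helper, the `sample` rows are shortened, and the review count for 'Box' is invented for illustration. It relies on dictionaries preserving insertion order (Python 3.7+).

```python
def dedupe_by_max(rows, name_idx, count_idx):
    """Keep one row per name: the first row that has the highest count."""
    best = {}  # name -> row with the highest count seen so far
    for row in rows:
        name = row[name_idx]
        if name not in best or float(row[count_idx]) > float(best[name][count_idx]):
            best[name] = row
    return list(best.values())


# Shortened illustrative rows; the 'Box' review count is made up.
sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]
for row in dedupe_by_max(sample, name_idx=0, count_idx=3):
    print(row)
```

One difference to keep in mind: this version orders the result by each name's first appearance, whereas the filtering loop above keeps each app at the position of its highest-count row.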


#### Defining a function to identify app names that have more than three characters outside the ASCII range, which we treat as non-English

In [13]:
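def en_string(s):
    non_ascii = 0 #count of characters outside the ASCII range (0-127)
    for character in s:
        if ord(character) > 127:
            non_ascii += 1
    #up to three non-ASCII characters (emoji, symbols such as ™) are tolerated
    return non_ascii <= 3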
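
To see how the threshold behaves, here is a small self-contained sketch. `is_english` is a hypothetical, more compact variant of the function above with the threshold exposed as a parameter, and the app names are examples chosen for illustration.

```python
def is_english(name, max_non_ascii=3):
    """True if `name` has at most `max_non_ascii` characters outside the ASCII range."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_english('Instagram'))                      # True: no non-ASCII characters
print(is_english('Docs To Go™ Free Office Suite'))  # True: one symbol is tolerated
print(is_english('Instachat 😜'))                    # True: one emoji is tolerated
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))    # False: 13 non-ASCII characters
```

The check is a heuristic: an English name with many emoji would be rejected, and a non-English name written in ASCII letters would be accepted.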

#### Removing non-English apps

In [14]:
#creating a list of English apps. If function above returns True, we add the row
android_en = []
for row in android_clean:
    name = row[0]
    if en_string(name):
        android_en.append(row)
        
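apple_en = []
#note: this loop reads apple[1:] rather than apple_clean, so the duplicate Apple rows
#found earlier are still included; switching to apple_clean would change the counts below
for row in apple[1:]:
    name = row[1]
    if en_string(name):
        apple_en.append(row)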

print('Android rows:',len(android_en))
print('Apple rows:',len(apple_en))

Android rows: 9614
Apple rows: 6183


### Isolating the free apps

In [15]:
print('Android:',android[0])
print('Apple:',apple[0])

Android: ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
Apple: ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [16]:
print('Index for price in Android:',android[0].index('Price'))
print('Index for price in Apple:',apple[0].index('price'))
    

Index for price in Android: 7
Index for price in Apple: 4


In [17]:
android_en_free = []
for row in android_en:
    if row[7] == '0':
        android_en_free.append(row)
        
apple_en_free = []
for row in apple_en:
    if row[4] == '0.0':
        apple_en_free.append(row)
        
print('Length of Android:',len(android_en_free))
print('Length of Apple:',len(apple_en_free))

Length of Android: 8864
Length of Apple: 3222
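
Comparing against the exact strings `'0'` and `'0.0'` works for the rows shown here, but it depends on how each file happens to format a zero price. A more tolerant check parses the number first. This is a sketch under assumptions: `is_free` is a hypothetical helper, and the `$` prefix on paid prices is assumed rather than confirmed by the output above.

```python
def is_free(price):
    """True if a price string such as '0', '0.0' or '$0.00' represents zero."""
    cleaned = price.strip().lstrip('$')  # assumed: paid prices may carry a '$' prefix
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # unparseable values are treated as not free


for value in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(value), is_free(value))
```

With such a helper the same condition could be used for both datasets, for example `is_free(row[7])` for Android and `is_free(row[4])` for Apple.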


# Apps likely to attract more users

#### As our revenue depends on the number of people using our apps, we need to develop apps that are likely to attract more users. Our strategy is: 1) build a minimal version of the app and add it to Google Play; 2) if the app gets a good response from users, develop it further; 3) if the app is profitable after 6 months, add it to the App Store.

We need to find app profiles that are successful in both markets. We are going to analyze the most common genres for each market by building frequency tables.

In [18]:
print(android[0:2])
print('')
print(apple[0:2])

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'], ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']]

[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']]


#### The columns to look at are 'Category' and 'Genres' for the android set, and prime_genre for the apple set.

### We write a function to create a frequency table that takes as input a dataset and an index (corresponding to the column on which the table is generated)

In [19]:
def freq_table(dataset, column):
    freq_dataset = {}
    length = len(dataset)
    for row in dataset:
        element = row[column]
        if row[column] in freq_dataset:
            freq_dataset[element] += 1
        else:
            freq_dataset[element] = 1

    for key in freq_dataset:
        freq_dataset[key] = round((freq_dataset[key]/length)*100,2)
    return freq_dataset
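
The same table can be built with `collections.Counter` from the standard library, which handles the counting step. The sketch below is an alternative formulation, not the function used in the rest of this notebook; `freq_table_counter` and the `sample` rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, column):
    """Percentage of rows per distinct value in `column`, rounded to 2 decimals."""
    counts = Counter(row[column] for row in dataset)
    total = len(dataset)
    return {key: round(count / total * 100, 2) for key, count in counts.items()}


sample = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
print(freq_table_counter(sample, 1))  # {'GAME': 50.0, 'TOOLS': 25.0, 'FAMILY': 25.0}
```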

In [20]:
freq_table(android_en_free,1)

{'ART_AND_DESIGN': 0.64,
 'AUTO_AND_VEHICLES': 0.93,
 'BEAUTY': 0.6,
 'BOOKS_AND_REFERENCE': 2.14,
 'BUSINESS': 4.59,
 'COMICS': 0.62,
 'COMMUNICATION': 3.24,
 'DATING': 1.86,
 'EDUCATION': 1.16,
 'ENTERTAINMENT': 0.96,
 'EVENTS': 0.71,
 'FAMILY': 18.91,
 'FINANCE': 3.7,
 'FOOD_AND_DRINK': 1.24,
 'GAME': 9.72,
 'HEALTH_AND_FITNESS': 3.08,
 'HOUSE_AND_HOME': 0.82,
 'LIBRARIES_AND_DEMO': 0.94,
 'LIFESTYLE': 3.9,
 'MAPS_AND_NAVIGATION': 1.4,
 'MEDICAL': 3.53,
 'NEWS_AND_MAGAZINES': 2.8,
 'PARENTING': 0.65,
 'PERSONALIZATION': 3.32,
 'PHOTOGRAPHY': 2.94,
 'PRODUCTIVITY': 3.89,
 'SHOPPING': 2.25,
 'SOCIAL': 2.66,
 'SPORTS': 3.4,
 'TOOLS': 8.46,
 'TRAVEL_AND_LOCAL': 2.34,
 'VIDEO_PLAYERS': 1.79,
 'WEATHER': 0.8}

#### With the next function we will be able to display the frequency table in descending order

In [21]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [22]:
display_table(android_en_free,1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


The most popular category in Google Play for free apps in English is the family category. The content of this category is mainly games for kids.

A good number of apps are designed for practical purposes (tools, business, lifestyle, etc.).

The general impression is that on Google Play there is a good representation of practical apps.

In [23]:
display_table(apple_en_free,11)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


More than half (58.16%) of the free English apps on the App Store are games. Entertainment apps (7.88%), photo and video apps (4.97%) and education apps (3.66%) come next, followed by social networking apps, which account for 3.29% of the apps in our data set.

The general impression is that the free English segment of the App Store is dominated by apps designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer.

## Most Popular Apps by Genre on the App Store

To get an idea of which kinds of apps have the most users, we will calculate the average number of user ratings per app genre on the App Store, using the total number of user ratings (column rating_count_tot).

In [24]:
apple_genres = freq_table(apple_en_free,11)

for genre in apple_genres:
    total = 0 #sum of user ratings in the genre
    len_genre = 0 #number of apps in the genre
    for row in apple_en_free:
        genre_app = row[11]
        if genre_app == genre:
            n_ratings = float(row[5])
            total += n_ratings
            len_genre += 1
    avg_user_ratings = total/len_genre
    print(genre, ': ',round(avg_user_ratings,2))

Medical :  612.0
Photo & Video :  28441.54
Music :  57326.53
Entertainment :  14029.83
Business :  7491.12
Sports :  23008.9
Education :  7003.98
Utilities :  18684.46
Productivity :  21028.41
Health & Fitness :  23298.02
Social Networking :  71548.35
Navigation :  86090.33
Food & Drink :  33333.92
Book :  39758.5
Lifestyle :  16485.76
Finance :  31467.94
Reference :  74942.11
Catalogs :  4004.0
Travel :  28243.8
News :  21248.02
Games :  22788.67
Shopping :  26919.69
Weather :  52279.89


On average, navigation apps have the highest number of user ratings.

In [25]:
for row in apple_en_free:
    if row[11] == 'Navigation':
        print(row[1], ':', row[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The numbers seem to be heavily influenced by the app 'Waze - GPS Navigation, Maps & Real-time Traffic'. For a more detailed analysis, we could later remove such very popular apps from the list to get a more representative picture. Let's examine other popular categories:

In [26]:
print('Apps in the Reference category')
for row in apple_en_free:
    if row[11] == 'Reference':
        print(row[1], ':', row[5]) # print name and number of ratings
print('')      
print('Apps in the Social Networking category')
for row in apple_en_free:
    if row[11] == 'Social Networking':
        print(row[1], ':', row[5]) # print name and number of ratings

Apps in the Reference category
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0

Apps in the Social Networking category
Facebook : 2974676
Pinterest : 1061624
Skype for iP

Again, we find that each genre is heavily influenced by a few very popular apps. These apps might make a genre look more popular than it really is, because we are ranking genres by their number of users.
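
One way to quantify this skew without removing any apps is to compare the mean with the median, which is far less sensitive to a few extreme values. The sketch below uses the six Navigation rating counts printed earlier; using the median is a suggestion, not a step of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps printed earlier.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean:  ', round(mean(navigation_ratings), 2))  # 86090.33
print('median:', median(navigation_ratings))          # 8196.5
```

The mean matches the genre average reported above, while the median is roughly ten times smaller, which illustrates how strongly the two largest apps pull the average up.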

Before we proceed with removing these popular apps, let's have a look at the Google Play Store dataset:

## Most Popular Apps by Genre on the Google Play Store

In [27]:
android_genres = freq_table(android_en_free,1)

for genre in android_genres:
    total = 0 #sum of installs in the category
    len_genre = 0 #number of apps in the category
    for row in android_en_free:
        genre_app = row[1]
        if genre_app == genre:
            n_installs = row[5] #e.g. '1,000,000+'
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_genre += 1
    avg_installs = total/len_genre
    print(genre, ': ',round(avg_installs,2))

DATING :  854028.83
MEDICAL :  120550.62
FOOD_AND_DRINK :  1924897.74
GAME :  15588015.6
ENTERTAINMENT :  11640705.88
FAMILY :  3695641.82
FINANCE :  1387692.48
TOOLS :  10801391.3
SHOPPING :  7036877.31
MAPS_AND_NAVIGATION :  4056941.77
PHOTOGRAPHY :  17840110.4
VIDEO_PLAYERS :  24727872.45
SPORTS :  3638640.14
WEATHER :  5074486.2
TRAVEL_AND_LOCAL :  13984077.71
BUSINESS :  1712290.15
COMMUNICATION :  38456119.17
PRODUCTIVITY :  16787331.34
SOCIAL :  23253652.13
EVENTS :  253542.22
BOOKS_AND_REFERENCE :  8767811.89
ART_AND_DESIGN :  1986335.09
BEAUTY :  513151.89
LIFESTYLE :  1437816.27
EDUCATION :  1833495.15
PARENTING :  542603.62
PERSONALIZATION :  5201482.61
AUTO_AND_VEHICLES :  647317.82
COMICS :  817657.27
NEWS_AND_MAGAZINES :  9549178.47
HOUSE_AND_HOME :  1331540.56
HEALTH_AND_FITNESS :  4188821.99
LIBRARIES_AND_DEMO :  638503.73


On average, 'COMMUNICATION' apps have the highest number of installations. This would suggest that communication is the most popular category; however, the figure seems to be skewed by a few very popular apps:

In [28]:
for row in android_en_free:
    name = row[0]
    genre = row[1]
    if genre == 'COMMUNICATION' and (row[5] == '1,000,000,000+'
                                      or row[5] == '500,000,000+'
                                      or row[5] == '100,000,000+'):
        print (name,': ',row[5])

WhatsApp Messenger :  1,000,000,000+
imo beta free calls and text :  100,000,000+
Android Messages :  100,000,000+
Google Duo - High Quality Video Calls :  500,000,000+
Messenger – Text and Video Chat for Free :  1,000,000,000+
imo free video calls and chat :  500,000,000+
Skype - free IM & video calls :  1,000,000,000+
Who :  100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji :  100,000,000+
LINE: Free Calls & Messages :  500,000,000+
Google Chrome: Fast & Secure :  1,000,000,000+
Firefox Browser fast & private :  100,000,000+
UC Browser - Fast Download Private & Secure :  500,000,000+
Gmail :  1,000,000,000+
Hangouts :  1,000,000,000+
Messenger Lite: Free Calls & Messages :  100,000,000+
Kik :  100,000,000+
KakaoTalk: Free Calls & Text :  100,000,000+
Opera Mini - fast web browser :  100,000,000+
Opera Browser: Fast and Secure :  100,000,000+
Telegram :  100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer :  100,000,000+
UC Browser Mini -Tiny Fast Private & Secure :  

'Health and Fitness' is another category that seems to be popular, and its lower average number of installations may indicate that it contains fewer extremely popular apps:

In [29]:
for row in android_en_free:
    name = row[0]
    genre = row[1]
    if genre == 'HEALTH_AND_FITNESS':
        print (name,': ',row[5])

Step Counter - Calorie Counter :  500,000+
Lose Belly Fat in 30 Days - Flat Stomach :  5,000,000+
Pedometer - Step Counter Free & Calorie Burner :  1,000,000+
Six Pack in 30 Days - Abs Workout :  10,000,000+
Lose Weight in 30 Days :  10,000,000+
Pedometer :  10,000,000+
LG Health :  10,000,000+
Step Counter - Pedometer Free & Calorie Counter :  10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App :  10,000,000+
Sportractive GPS Running Cycling Distance Tracker :  1,000,000+
30 Day Fitness Challenge - Workout at Home :  10,000,000+
Home Workout for Men - Bodybuilding :  1,000,000+
Fat Burning Workout - Home Weight lose :  100,000+
Buttocks and Abdomen :  500,000+
Walking for Weight Loss - Walk Tracker :  100,000+
Running & Jogging :  500,000+
Sleep Sounds :  1,000,000+
Fitbit :  10,000,000+
Lose Belly Fat-Home Abs Fitness Workout :  50,000+
Cycling - Bike Tracker :  500,000+
Abs Training-Burn belly fat :  100,000+
Calorie Counter - EasyFit free :  1,000,000+
Aunjai i lert u :  

In [31]:
for row in android_en_free:
    name = row[0]
    genre = row[1]
    if genre == 'HEALTH_AND_FITNESS' and (row[5] == '1,000,000,000+'
                                      or row[5] == '500,000,000+'
                                      or row[5] == '100,000,000+'):
        print (name,': ',row[5])

Period Tracker - Period Calendar Ovulation Tracker :  100,000,000+
Samsung Health :  500,000,000+


The health and fitness genre includes a variety of apps. The number of extremely popular apps that skew the average is much lower in comparison with other genres, so this market shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [32]:
for row in android_en_free:
    name = row[0]
    genre = row[1]
    if genre == 'HEALTH_AND_FITNESS' and (row[5] == '1,000,000+'
                                      or row[5] == '5,000,000+'
                                      or row[5] == '10,000,000+'
                                      or row[5] == '50,000,000+'):
        print (name,': ',row[5])

Lose Belly Fat in 30 Days - Flat Stomach :  5,000,000+
Pedometer - Step Counter Free & Calorie Burner :  1,000,000+
Six Pack in 30 Days - Abs Workout :  10,000,000+
Lose Weight in 30 Days :  10,000,000+
Pedometer :  10,000,000+
LG Health :  10,000,000+
Step Counter - Pedometer Free & Calorie Counter :  10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App :  10,000,000+
Sportractive GPS Running Cycling Distance Tracker :  1,000,000+
30 Day Fitness Challenge - Workout at Home :  10,000,000+
Home Workout for Men - Bodybuilding :  1,000,000+
Sleep Sounds :  1,000,000+
Fitbit :  10,000,000+
Calorie Counter - EasyFit free :  1,000,000+
Garmin Connect™ :  10,000,000+
BetterMe: Weight Loss Workouts :  5,000,000+
Bike Computer - GPS Cycling Tracker :  1,000,000+
Running Distance Tracker + :  1,000,000+
Runkeeper - GPS Track Run Walk :  10,000,000+
Walking: Pedometer diet :  1,000,000+
8fit Workouts & Meal Planner :  10,000,000+
Keep Trainer - Workout Trainer & Fitness Coach :  1,000,00
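
The filters above compare the 'Installs' column against a fixed list of strings. An alternative is to parse the value into a number and test a range, so that no bucket can be forgotten. This is a minimal sketch: `parse_installs` and `in_install_range` are hypothetical helpers, and the `sample` rows are shortened (only the name, category and installs values are filled in, taken from the output above).

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(installs.replace('+', '').replace(',', ''))


def in_install_range(row, low, high, installs_idx=5):
    """True if the row's install count satisfies low <= installs < high."""
    return low <= parse_installs(row[installs_idx]) < high


# Shortened rows: only name, category and installs are filled in.
sample = [
    ['Fitbit', 'HEALTH_AND_FITNESS', '', '', '', '10,000,000+'],
    ['Samsung Health', 'HEALTH_AND_FITNESS', '', '', '', '500,000,000+'],
    ['Running & Jogging', 'HEALTH_AND_FITNESS', '', '', '', '500,000+'],
]
mid_range = [row[0] for row in sample if in_install_range(row, 1_000_000, 100_000_000)]
print(mid_range)  # ['Fitbit']
```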

This segment seems to be dominated by apps for counting calories burned, counting steps or tracking exercise, so simply building a similar app is probably not a good idea, since competition will be significant. Because the market is crowded, we would need to offer special features beyond those already available. These might include a combination of different features integrated into one app, daily goal notifications, the amount of exercise left to reach the goal, reminders, a forum where people can motivate each other, etc.

# Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that a fitness app that integrates the features seen above into one app and adds other desirable features could be profitable on both Google Play and the App Store.