# Determine a profitable app profile for the App Store and Google Play # 

In this project, we aim to build an app profile based on the most profitable apps in the Google Play and Apple Store datasets. We will use this profile to build an app that can run on both the iOS and Android platforms.

Our precondition for the app profile is that the app must be free, so the main source of revenue will be in-app ads. This means that revenue for any given app is driven mostly by the number of people who use it. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

# Exploring the data collected from Kaggle #

Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. Kaggle has public datasets available for Google Play and Apple Store data.

The Google Play dataset contains approximately 10,000 Android apps, collected in August 2018.

The Apple Store dataset contains approximately 7,000 iOS apps, collected in July 2017.

In [61]:
## Import modules ##
from csv import reader

## Apple Store dataset ##
with open('AppleStore.csv') as f:
    apple_dataset = list(reader(f))
    apple_header  = apple_dataset.pop(0) # store header

## Google Play dataset ##
with open('googleplaystore.csv', 'r') as f:
    android_dataset = list(reader(f))
    android_header  = android_dataset.pop(0) # store header

In [62]:
## Function for data exploration ##
def explore_data(dataset, start, end, rows_and_columns=False, header=None):
    dataset_slice = dataset[start:end]
    
    if header is not None:
        print('Column Names: ', header, '\n')
    
    for row in dataset_slice:
        print(row, '\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns: ', len(dataset[0]))

In [63]:
## Display number of rows and columns plus a few rows of the Apple Store dataset ##
explore_data(apple_dataset, 0, 3, True, apple_header)

Column Names:  ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] 

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] 

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'] 

Number of rows: 7197
Number of columns:  16


The columns that can be useful for our analysis are 'track_name', 'currency', 'price', 'rating_count_tot', 'user_rating', 'prime_genre', and 'sup_devices.num'. For more information about these columns, see the dataset documentation at the link below.

Link: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home

In [64]:
## Display number of rows and columns plus a few rows of the Google Play dataset ##
explore_data(android_dataset, 0, 3, True, android_header)

Column Names:  ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

Number of rows: 10841
Number of columns:  13


The columns that can be useful for our analysis are 'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', 'Content Rating', and 'Genres'. For more information about these columns, see the dataset documentation at the link below.

Link: https://www.kaggle.com/lava18/google-play-store-apps/home

# Data cleaning #

After reviewing the Kaggle discussion forums for the Google Play and Apple Store datasets, we found a discussion about an error in the Google Play row at index 10472. The row is missing an element, so we replace it with a corrected row in which the missing value is an empty string.

In [65]:
## Populate missing element in Android dataset index 10472 ##
android_dataset[10472] = ['Life Made WI-Fi Touchscreen Photo Frame','','1.9','19','3.0M','1,000+','Free','0','Everyone','','February 11, 2018','1.0.19','4.0 and up']
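
A row like this can also be located programmatically rather than through the forum. The sketch below runs on made-up rows, not on the notebook's data, and `find_malformed_rows` is a hypothetical helper introduced only for illustration. It reports every row whose length differs from the header's.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row_length) pairs for rows whose length differs from the header."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]

# Toy data (made up for illustration): the second row is missing its category.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.8'],
]

malformed = find_malformed_rows(toy_rows, toy_header)
print(malformed)
```

Running a check of this kind before any cleaning step makes it easier to spot shifted columns early.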

# Removing Duplicate Entries #
After researching the data on Kaggle, we found that some apps have duplicate entries. In the next cells, we count the duplicates in each dataset and display a few of the affected apps.

In [66]:
## Identify Apple Store apps that contain duplicates and total duplicates ##
apple_dict = dict()

for row in apple_dataset:
    app = row[1]
    
    if app in apple_dict:
        apple_dict[app] += 1
    else:
        apple_dict[app] = 0

apple_duplicates = [key for key, value in apple_dict.items() if value >= 1]
total_apple_duplicates = sum(value for value in apple_dict.values() if value >= 1)
print('Examples of duplicate apps:', apple_duplicates, '\n')
print('Number of duplicate apps:', total_apple_duplicates)

Examples of duplicate apps: ['VR Roller Coaster', 'Mannequin Challenge'] 

Number of duplicate apps: 2


In [67]:
## Identify Google Play apps that contain duplicates and total duplicates ##
android_dict = dict()

for row in android_dataset:
    app = row[0]
    
    if app in android_dict:
        android_dict[app] += 1
    else:
        android_dict[app] = 0

android_duplicates = [key for key, value in android_dict.items() if value >= 1]
total_android_duplicates = sum(value for value in android_dict.values() if value >= 1)
print('Examples of duplicate apps:', android_duplicates[:5], '\n')
print('Number of duplicate apps:', total_android_duplicates)

Examples of duplicate apps: ['Google Earth', 'Calls & Text by Mo+', 'Motorola FM Radio', 'Expedia Hotels, Flights & Car Rental Travel Deals', 'Amazon for Tablets'] 

Number of duplicate apps: 1181


We will decide which duplicate to keep based on the number of reviews in index 3 (column name: Reviews). The reasoning is that the higher the number of reviews, the more recent the data should be.

Now, let's create a dictionary that contains the maximum review count for each app. For apps with duplicate entries, we'll keep only the entry whose review count matches that maximum.

In [68]:
## Create dictionary containing all Google Play apps and their highest number of reviews ##
reviews_max = dict()

for row in android_dataset:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Unique app count: ', len(reviews_max))

Unique app count:  9660


In [69]:
## Create list of Google Play apps with duplicates removed ##
android_clean = list()
android_already_added = list()

for row in android_dataset:
    name = row[0]
    n_reviews = float(row[3])
    
    if (reviews_max[name] == n_reviews) and (name not in android_already_added):
        android_clean.append(row)
        android_already_added.append(name)

In [70]:
## Explore new android clean list ##
explore_data(android_clean, 0, 1, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

Number of rows: 9660
Number of columns:  13


Here are the steps we took to remove duplicates from the Google Play dataset:
    - Create two lists: android_clean (the new dataset with duplicates removed) and android_already_added (app names only).
    - Loop through the Google Play dataset. Append a row to android_clean, and its app name to android_already_added, only if its review count matches the entry in the reviews_max dictionary and the app name is not yet in android_already_added. The second condition handles duplicates that share the same maximum review count.
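
The same two-pass idea can be sketched on a toy dataset. This is an illustrative variant rather than the code used above: `remove_duplicates` is a hypothetical helper, the rows are invented, and the names already added are stored in a set, which makes the membership check faster than with a list.

```python
def remove_duplicates(rows, name_index, reviews_index):
    """Keep one row per app name: the first row that carries the highest review count."""
    # Pass 1: highest review count seen for each name.
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row that matches that maximum.
    clean = []
    already_added = set()
    for row in rows:
        name = row[name_index]
        if float(row[reviews_index]) == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean

# Toy rows (made up): 'App A' appears three times, twice with the same maximum.
toy_rows = [
    ['App A', '100'],
    ['App A', '250'],
    ['App B', '40'],
    ['App A', '250'],
]
toy_clean = remove_duplicates(toy_rows, 0, 1)
print(toy_clean)
```

The last toy row shows why the second condition matters: without it, both rows with 250 reviews would be kept.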

In [71]:
## Unique list of Apple apps ##
apple_clean = list()
apple_already_added = list()

for row in apple_dataset:
    name = row[1]
    
    if name not in apple_already_added:
        apple_clean.append(row)
        apple_already_added.append(name)

Since only two duplicates exist in the Apple Store dataset, we simply keep the first occurrence of each app name and drop the rest.

# Removing Non-English Apps #
While exploring the datasets, we find that some apps seem to be aimed at a non-English-speaking audience. Here are a few examples from both datasets:

### Part One ###

In [72]:
## Few examples of non-english apps ##
print(apple_clean[7194][1])
print(android_clean[4412][0])
print(android_clean[7940][0])

みんなのお弁当 by クックパッド ~お弁当をレシピ付きで記録・共有~
中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping these kinds of apps, so we'll remove them. One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).

All these characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.
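
As a quick illustration of that numbering, the sketch below prints the code point of a few sample characters using the built-in `ord()`. The sample characters are arbitrary choices, not values taken from the datasets.

```python
# Code points at or below 127 are ASCII; anything above is not.
sample_characters = ['a', 'Z', '7', '+', '™']
code_points = {character: ord(character) for character in sample_characters}

for character, code_point in code_points.items():
    label = 'ASCII' if code_point <= 127 else 'non-ASCII'
    print(repr(character), code_point, label)
```

Only the trademark symbol falls outside the 0 to 127 range here.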

We build this function below, using the built-in ord() function to find the number that corresponds to each character.

In [73]:
## Function to test whether app title has characters that fall outside of the ASCII range 0-127 ##
def english_test(app):
    
    for letter in app:
        if ord(letter) > 127:
            return False
    return True

## Test function ##
print(english_test('Instachat 😜'))
print(english_test('Instagram'))
print(english_test('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_test('Docs To Go™ Free Office Suite'))

False
True
False
False


The function will need to be adjusted since the current version is removing English apps that contain emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range.

### Part Two ###
To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:

In [74]:
## Adjust function to minimize the impact of data loss ##
def english_test(app):
    counter = 0
    
    for letter in app:
        if ord(letter) > 127:
            counter += 1
            
    if counter > 3: # app needs to have more than three non-ASCII characters in its title
        return False
    else:
        return True
    
print(english_test('Instachat 😜'))
print(english_test('Docs To Go™ Free Office Suite'))
print(english_test('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False


In [75]:
## Utilize english test function to filter for English apps ##
android_english = list()
apple_english = list()

for app in android_clean: 
    if english_test(app[0]):
        android_english.append(app)
        
for app in apple_clean:
    if english_test(app[1]):
        apple_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(apple_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 

Number of rows: 9615
Number of columns:  13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] 

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] 

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+',

# Isolating the Free Apps #
As stated in our introduction, our precondition is that the app should be free. Our data sets contain both free and non-free apps, and we'll need to isolate only the free apps for our analysis. Below, we isolate the free apps for both our data sets.

In [76]:
## Create final datasets by filtering for free apps ##
apple_final = list()
android_final = list()

for app in android_english:
    if app[7] == '0':
        android_final.append(app)
        
for app in apple_english:
    if app[4] == '0.0':
        apple_final.append(app)

## Display number rows per dataset after isolating free apps ##
print("Google Play total: ", len(android_final))
print("Apple Store total: ", len(apple_final))

Google Play total:  8865
Apple Store total:  3220
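
The filter above compares price strings exactly ('0' and '0.0'). A slightly more tolerant check could parse the price as a number instead. The sketch below is only an illustration: `is_free` is a hypothetical helper, and it assumes that some prices may carry a leading dollar sign.

```python
def is_free(price_text):
    """Return True when a price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free in this sketch.
        return False

# Sample strings chosen for illustration.
results = {price: is_free(price) for price in ['0', '0.0', '$0.00', '$4.99', 'Everyone']}
print(results)
```

Treating unparseable values as not free is a design choice: it keeps malformed rows out of the final datasets instead of raising an error.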


# Most Common Apps by Genre #
### Part One ###
Our aim is to determine the type of apps that attract the most users.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

### Part Two ###
We'll build two functions to analyze the data with frequency tables:

1. One function to generate frequency tables that show percentages
2. Another function to display the percentages in descending order

In [77]:
## Function to create frequency table ##
def freq_table(dataset, index):
    freq_dict = dict()
    total     = len(dataset)
    
    for row in dataset:
        column = row[index]
        if column in freq_dict:
            freq_dict[column] += 1
        else:
            freq_dict[column] = 1
    
    for key, value in freq_dict.items():
        freq_dict[key] = round((value / total) * 100, 2)
    
    return freq_dict

## Function to display frequency table ##
def display_table(dataset, index):
    table         = freq_table(dataset, index)
    table_display = []

    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
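
For comparison, a frequency table of this kind can also be sketched with `collections.Counter` from the standard library. This is an alternative illustration on made-up rows, not a replacement for the functions above, and `freq_table_counter` is a hypothetical name.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, rounded to two decimals."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: round(count / total * 100, 2) for key, count in counts.items()}

# Toy rows (made up): three Games apps and one Music app.
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
toy_table = freq_table_counter(toy_rows, 1)

# Print the percentages from highest to lowest.
for genre, pct in sorted(toy_table.items(), key=lambda kv: kv[1], reverse=True):
    print(genre, ':', pct)
```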

### Part Three ###
We start by exploring the prime_genre column of the Apple Store dataset.

In [78]:
display_table(apple_final, 11)

Games : 58.14
Entertainment : 7.89
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.52
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.34
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


Among the free English apps in the Apple Store, Games is the most common genre and Entertainment is the runner-up. Games alone make up more than half of these apps (58.14%). Most apps are designed for entertainment (Games, Entertainment, Music) rather than for practical purposes.

Based on this analysis alone, we cannot recommend a type of app to build. A genre's share of the apps on the market isn't an indicator of its number of users. Social networking apps have millions to billions of users, yet they make up only about 3.3% of the apps in this dataset. It would be better to base the decision on the number of users per genre.

Next, we explore the Category and Genres columns of the Google Play dataset.

In [79]:
display_table(android_final, 1) # Category

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.24
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.92
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6
 : 0.01


In [80]:
display_table(android_final, 9) # Genres

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.24
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.92
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

In the Genres column, the most common genre is Tools and the runner-up is Entertainment. The Genres column looks like a more granular version of the Category column, so we will use the Category column for the rest of this analysis since it is more compact. The most common category is Family, and the runner-up is Game.

Up to this point, we have found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea of the kinds of apps that have the most users.

# Most popular apps by genre in Apple Store (average number of users) #
One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the Apple Store dataset this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Below, we calculate the average number of user ratings per app genre on the Apple Store:

In [81]:
apple_genre = freq_table(apple_final, 11)
apple_users_sorted = dict()

for key in apple_genre.keys():
    total = 0
    len_genre = 0
    
    for row in apple_final:
        if row[11] == key:
            total += float(row[5])
            len_genre += 1
    apple_users_sorted[key] = round(total/len_genre) # Calculation for average number of users
    
print(sorted(apple_users_sorted.items(), key = lambda kv: kv[1], reverse=True))

[('Navigation', 86090), ('Reference', 74942), ('Social Networking', 71548), ('Music', 57327), ('Weather', 52280), ('Book', 39758), ('Food & Drink', 33334), ('Finance', 31468), ('Photo & Video', 28442), ('Travel', 28244), ('Shopping', 26920), ('Health & Fitness', 23298), ('Sports', 23009), ('Games', 22813), ('News', 21248), ('Productivity', 21028), ('Utilities', 18684), ('Lifestyle', 16486), ('Entertainment', 14030), ('Business', 7491), ('Education', 7004), ('Catalogs', 4004), ('Medical', 612)]
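
The cell above scans the whole dataset once per genre. As a sketch of an alternative, the averages can be accumulated in a single pass. `average_by_group` is a hypothetical helper and the rows are made up for illustration.

```python
def average_by_group(rows, group_index, value_index):
    """Return {group: rounded mean of the value column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: round(totals[group] / counts[group]) for group in totals}

# Toy rows (made up): name, genre, rating count.
toy_rows = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Weather', '50'],
]
toy_averages = average_by_group(toy_rows, 1, 2)
print(toy_averages)
```

On datasets of this size the difference in speed is small, so the choice is mostly about readability.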


In [82]:
## Show the first five apps in a few selected genres ##
genres_to_inspect = ['Navigation', 'Reference', 'Social Networking', 'Health & Fitness']

for genre in genres_to_inspect:
    counter = 0
    for app in apple_final:
        if app[11] == genre and counter < 5: # index 11 is prime_genre
            print(app[1], ':', app[5]) # total number of user ratings per app
            counter += 1
    print('\n')

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187


Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418


Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293


Calorie Counter & Diet Tracker by MyFitnessPal : 507706
Lose It! – Weight Loss Program and Calorie Counter : 373835
Weight Watchers : 136833
Sleep Cycle alarm clock : 104539
Fitbit : 90496


On average, navigation apps have the highest number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings.

The same pattern applies to social networking apps, where the average is heavily influenced by a few giants such as Facebook, Pinterest, and Skype.

Our aim is to find popular genres, but navigation, social networking, or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by a very small number of apps that have hundreds of thousands of user ratings.

Reference apps have 74,942 user ratings on average, but it's actually the Bible and Dictionary.com that skew the average upward.
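
One way to see this kind of skew is to compare the mean with the median. The sketch below uses only the rating counts of the five Navigation apps printed earlier, so the figures are illustrative and are not the statistics of the full genre.

```python
from statistics import mean, median

# Rating counts of the five Navigation apps shown in the output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)
print('mean:', nav_mean)
print('median:', nav_median)
```

A mean far above the median suggests that a couple of very large apps dominate the average.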

Recommended app profile: health & fitness

Why? Because it seems like an undervalued genre. Apps in this genre average about 23,000 user ratings, and the top two apps each have more than 300,000. That compares well with apps in other genres that are not associated with well-known entities such as Facebook, Google Maps, the Bible, or Starbucks. Our specific aim would be to build an app for runners and fitness enthusiasts that shows available trails around the world. The fitness market appears to be growing, as suggested by the growth of gyms.

# Most popular apps by category in Google Play (average number of users) #
The Google Play dataset contains the number of installs, so we should be able to get a clearer picture of genre popularity. However, most of the values are open-ended (100+, 1,000+, 5,000+, etc.). The Installs column is not precise, but for this analysis we only want an idea of which app genres attract the most users.

For this analysis, we'll need to remove the plus and comma symbols from the Installs column before converting the values to float.
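
As a small worked example of that conversion, the sketch below applies the same two replacements to a few sample strings. `parse_installs` is a hypothetical helper name used only here.

```python
def parse_installs(installs_text):
    """Convert an open-ended install count such as '1,000+' to a float."""
    return float(installs_text.replace('+', '').replace(',', ''))

parsed = [parse_installs(text) for text in ['100+', '1,000+', '5,000,000+']]
print(parsed)
```

Each result is a lower bound: an app listed as '1,000+' is counted as having exactly 1,000 installs.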

In [83]:
android_genre = freq_table(android_final, 1)
android_users_sorted = dict()

for key in android_genre.keys():
    total = 0
    len_category = 0
    
    for row in android_final:
        if row[1] == key:
            total += float(row[5].replace('+', '').replace(',', '')) # remove non-numeric characters
            len_category += 1
    android_users_sorted[key] = round(total/len_category) # Calculation for average number of users
    
print(sorted(android_users_sorted.items(), key = lambda kv: kv[1], reverse=True))

[('COMMUNICATION', 38456119), ('VIDEO_PLAYERS', 24727872), ('SOCIAL', 23253652), ('PHOTOGRAPHY', 17840110), ('PRODUCTIVITY', 16787331), ('GAME', 15588016), ('TRAVEL_AND_LOCAL', 13984078), ('ENTERTAINMENT', 11640706), ('TOOLS', 10801391), ('NEWS_AND_MAGAZINES', 9549178), ('BOOKS_AND_REFERENCE', 8767812), ('SHOPPING', 7036877), ('PERSONALIZATION', 5201483), ('WEATHER', 5074486), ('HEALTH_AND_FITNESS', 4188822), ('MAPS_AND_NAVIGATION', 4056942), ('FAMILY', 3695642), ('SPORTS', 3638640), ('ART_AND_DESIGN', 1986335), ('FOOD_AND_DRINK', 1924898), ('EDUCATION', 1833495), ('BUSINESS', 1712290), ('LIFESTYLE', 1437816), ('FINANCE', 1387692), ('HOUSE_AND_HOME', 1331541), ('DATING', 854029), ('COMICS', 817657), ('AUTO_AND_VEHICLES', 647318), ('LIBRARIES_AND_DEMO', 638504), ('PARENTING', 542604), ('BEAUTY', 513152), ('EVENTS', 253542), ('MEDICAL', 120551), ('', 1000)]


In [84]:
## Investigate the most-installed apps in selected categories ##
install_list = ['1,000,000,000+', '500,000,000+', '100,000,000+']
fitness_install_list = ['50,000,000+', '10,000,000+', '5,000,000+']

for category, installs in [('COMMUNICATION', install_list),
                           ('SOCIAL', install_list),
                           ('HEALTH_AND_FITNESS', fitness_install_list)]:
    for app in android_final:
        if app[1] == category and app[5] in installs:
            print(app[0], ':', app[5]) # number of installs per app
    print('\n')

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

The averages for the communication, video players, social, photography, and productivity categories are heavily skewed, because the apps with more than 500,000,000 installs belong to well-known entities such as WhatsApp, Facebook, Skype, Google, and Gmail, along with other social media messengers.
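
To get a feel for how much a few giants move an average, one option is to recompute it with the largest apps left out. The sketch below is an assumption-laden illustration: the install counts are made up, `average_installs` is a hypothetical helper, and the 100,000,000 cutoff is an arbitrary choice.

```python
def average_installs(rows, max_installs=None):
    """Average installs over (name, installs) rows; optionally skip apps above max_installs."""
    kept = [installs for _, installs in rows
            if max_installs is None or installs <= max_installs]
    return sum(kept) / len(kept)

# Made-up install counts, for illustration only.
toy_rows = [
    ('Giant messenger', 1_000_000_000),
    ('Small app A', 1_000_000),
    ('Small app B', 500_000),
]

avg_all = average_installs(toy_rows)
avg_without_giants = average_installs(toy_rows, max_installs=100_000_000)
print(round(avg_all), round(avg_without_giants))
```

Note that the helper would fail with a division by zero if the cutoff excluded every app, so a real analysis would need to guard against that case.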

Genres with potential include health and fitness, entertainment, and personalization.

Recommended app profile: health & fitness

Why? As in the Apple Store, this category seems to be undervalued on Google Play. Its apps average over 4,000,000 installs.

# Average rating by genre in Apple Store #
Our next analysis looks at user ratings, which give an idea of user satisfaction with the apps in each genre.

In [85]:
apple_genre = freq_table(apple_final, 11)
apple_users_sorted = dict()

for key in apple_genre.keys():
    total = 0
    len_genre = 0
    
    for row in apple_final:
        if row[11] == key:
            total += float(row[7])
            len_genre += 1
    apple_users_sorted[key] = round(total/len_genre, 2) # Calculation for average rating per app
    
print(sorted(apple_users_sorted.items(), key = lambda kv: kv[1], reverse=True))

[('Catalogs', 4.12), ('Games', 4.04), ('Productivity', 4.0), ('Business', 3.97), ('Shopping', 3.97), ('Music', 3.95), ('Photo & Video', 3.9), ('Navigation', 3.83), ('Health & Fitness', 3.77), ('Reference', 3.67), ('Education', 3.64), ('Food & Drink', 3.63), ('Social Networking', 3.59), ('Entertainment', 3.54), ('Utilities', 3.53), ('Travel', 3.49), ('Weather', 3.48), ('Lifestyle', 3.41), ('Finance', 3.38), ('News', 3.24), ('Sports', 3.07), ('Book', 3.07), ('Medical', 3.0)]


# Average rating by category in Google Play #
For this analysis, we need to skip ratings that contain non-numeric characters; otherwise, we can't calculate the average rating.

In [86]:
android_genre = freq_table(android_final, 1)
android_users_sorted = dict()

for key in android_genre.keys():
    total = 0
    len_category = 0
    
    for row in android_final:
        if row[1] == key and row[2].replace('.', '', 1).isdigit(): # Keep only ratings made of digits and at most one decimal point
            total += float(row[2])
            len_category += 1
    android_users_sorted[key] = round(total/len_category, 2) # Calculation for average rating per app
    
print(sorted(android_users_sorted.items(), key = lambda kv: kv[1], reverse=True))

[('EVENTS', 4.44), ('BOOKS_AND_REFERENCE', 4.35), ('ART_AND_DESIGN', 4.34), ('PARENTING', 4.34), ('EDUCATION', 4.34), ('PERSONALIZATION', 4.3), ('BEAUTY', 4.28), ('SOCIAL', 4.25), ('HEALTH_AND_FITNESS', 4.24), ('WEATHER', 4.23), ('GAME', 4.23), ('SHOPPING', 4.23), ('SPORTS', 4.21), ('COMICS', 4.18), ('LIBRARIES_AND_DEMO', 4.18), ('AUTO_AND_VEHICLES', 4.18), ('PRODUCTIVITY', 4.18), ('FAMILY', 4.17), ('FOOD_AND_DRINK', 4.17), ('PHOTOGRAPHY', 4.16), ('MEDICAL', 4.15), ('HOUSE_AND_HOME', 4.14), ('COMMUNICATION', 4.13), ('FINANCE', 4.13), ('ENTERTAINMENT', 4.12), ('BUSINESS', 4.1), ('NEWS_AND_MAGAZINES', 4.1), ('LIFESTYLE', 4.08), ('TRAVEL_AND_LOCAL', 4.07), ('VIDEO_PLAYERS', 4.04), ('MAPS_AND_NAVIGATION', 4.04), ('TOOLS', 4.03), ('DATING', 3.98), ('', 1.9)]
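
Another way to skip unusable ratings is to attempt the conversion and handle the failure. The sketch below is an illustration under an assumption: it supposes that missing ratings may appear as the text 'NaN', which `float()` accepts, so that value is rejected explicitly. `parse_rating` is a hypothetical helper.

```python
import math

def parse_rating(rating_text):
    """Return the rating as a float, or None when the text is not a usable number."""
    try:
        rating = float(rating_text)
    except ValueError:
        return None
    # float('NaN') parses successfully, so it has to be rejected explicitly.
    return None if math.isnan(rating) else rating

# Sample strings chosen for illustration.
parsed_ratings = [parse_rating(text) for text in ['4.1', '5', 'NaN', '', '1.2.3']]
print(parsed_ratings)
```

Rows for which the helper returns None would simply be left out of the average.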


# Conclusion #
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

Our conclusion is to build a fitness app. On both platforms, this market looks underserved: health and fitness apps sit in the middle of the pack for both average number of users and average rating. One idea is an app that displays available trails for runners and bikers, with the possibility of integrating with existing services such as Google Maps and Fitbit. We would start small by covering large cities, then expand to other locations.