# Profitable App Profiles for the App Store and Google Play Markets

We are a game development group only building apps that are free to 
download and install, and our main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better.

Therefore the goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users. To do this, we will need to collect and analyze about mobile apps available on Google Play and the App Store.

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play. Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, there are  two datasets suitable for our goals.

   - [A dataset](https://www.kaggle.com/lava18/google-play-store-apps?select=googleplaystore.csv) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

   - [A dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

Let's start by opening the two data sets and then continue with exploring the data.
    

In [1]:
from csv import reader

opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android_apps_data = android[1:]
print(android_header)
print('\n')
print(android_apps_data[0])
print('\n')
print(android_apps_data[1])
print('\n')

opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
apple = list(read_file)
apple_header = apple[0]
apple_apps_data = apple[1:]
print(apple_header)
print('\n')
print(apple_apps_data[0])
print('\n')
print(apple_apps_data[1])


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Phot

## Data cleaning

Before beginning our analysis, we need to clean our data properly otherwise the results of our analysis will be inaccurate. This means that we need to do the following:
   - Detect inaccurate data, and correct or remove it.
   - Detect duplicate data, and remove the duplicates.

The Google Play dataset has a dedicated discussion section, and we can see that [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error for a certain row.

There is one row missing a field. So now we need to determine the index of this row.

In [2]:
index_incomplete_entry = 0 
for app in android_apps_data:
    if len(app) != len(android_header):
        index_incomplete_entry = android_apps_data.index(app)
        break
print(index_incomplete_entry)
print(android_apps_data[index_incomplete_entry])

# Remove the incomplete row
del android_apps_data[index_incomplete_entry]

10472
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [3]:
# Now to see if this row is removed
index_incomplete_entry = None 
for app in android_apps_data:
    if len(app) != len(android_header):
        index_incomplete_entry = android_apps_data.index(app)
        break
print(index_incomplete_entry)

None


In [4]:
# Removing the duplicated entries, entries with the same app name
# ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', '
# Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 
# 'Android Ver']
# It can be seen that only colume 'App' should be the Primary key to uniquely
# identify apps

# You can find more information from this discussion link
# https://www.kaggle.com/lava18/google-play-store-apps/discussion

# This code is to find out the entries with dupliated name
print(android_header)
unique_apps = []
duplicate_apps = []
print(len(android_apps_data))
for app in android_apps_data:
    if app[0] in unique_apps:
        duplicate_apps.append(app[0])
    else:
        unique_apps.append((app[0]))
print('Number of unique apps: ', len(unique_apps))
print('The first 10 unique apps: ', (unique_apps[0:10]))
print('Number of duplicated apps: ', len(duplicate_apps))
print('The first 10 duplicated apps: ', duplicate_apps[0:10])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
10840
Number of unique apps:  9659
The first 10 unique apps:  ['Photo Editor & Candy Camera & Grid & ScrapBook', 'Coloring book moana', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint', 'Pixel Draw - Number Art Coloring Book', 'Paper flowers instructions', 'Smoke Effect Photo Maker - Smoke Editor', 'Infinite Painter', 'Garden Coloring Book', 'Kids Paint Free - Drawing Fun']
Number of duplicated apps:  1181
The first 10 duplicated apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [5]:
# In this section, we need to create a list called android_clean
# containing unique apps with the highest number of reviews of each app
# index of app name is 0, index of review number is 3

android_clean = []
# android_dict store the app entry with the highest review
# {'Garden Coloring Book': (8, 1000)}
# 8 is the index of this app entry in android_apps_data
# 1000 is the highest review number of this app
# This step is to create a dictionary to contain all apps with the hightest review number
android_dict = {}
for app in android_apps_data:
    name = app[0]
    if name in android_dict:
        if android_dict[name][1] < app[3]:
            del android_dict[name]
            android_dict[name] = (android_apps_data.index(app), app[3])
    else:
        android_dict[name] = (android_apps_data.index(app), app[3])

# This step is just to 
for key in android_dict:
    index = android_dict[key][0]
    android_clean.append(android_apps_data[index])
    
# inspect the length of android_clean, it should be 9659
print(len(android_clean))


9659


**Filtering out those English-speaking apps**

Remember we use English for the apps we develop at our company, and we'd like to analyze only the apps that are designed for an English-speaking audience. However, if we explore the data long enough, we'll find that both datasets have apps with names that suggest they are not designed for an English-speaking audience. We're not interested in keeping these apps, so we'll remove them. 

One way to do this is to remove each app with a name containing a symbol that isn't commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

We can get the corresponding number of each character using the ord() built-in function.

First step is to write a function that takes in a string and returns False if there's any character in the string that doesn't belong to the set of common English characters; otherwise, the function returns True.

In [6]:
# The numbers corresponding to the characters we commonly use in an English 
# text are all in the range 0 to 127, according to the ASCII 
# (American Standard Code for Information Interchange) system. Based on this
# number range, we can build a function that detects whether a character 
# belongs to the set of common English characters or not. 
# If the number is equal to or less than 127, then the character belongs to 
# the set of common English characters.
# If number of charactor falling outside the english text ACSII range is more than 3,
# Then this text is considered non-english text
def is_english_title(str):
    count = 0
    for c in str:
        if ord(c) > 127:
            count += 1;
            if count >= 3:
                return False
    return True
print(is_english_title('Instachat 😜😜😜'))

False


Use the new function to filter out non-English apps from both datasets. Loop through each dataset. If an app name is identified as English, append the whole row to a separate list.Explore the datasets and see how many rows you have remaining for each dataset.


In [7]:
# Android app dataset
android_clean_english_apps = []
for app in android_clean:
    if is_english_title(app[0]):
        android_clean_english_apps.append(app)
print(len(android_clean_english_apps))

# Apple apps
apple_apps_data_english = []
for app in apple_apps_data:
    if is_english_title(app[1]):
        apple_apps_data_english.append(app)
print(len(apple_apps_data_english))


9597
6155


So far in the data cleaning process, we've done the following:

   * Removed inaccurate data 
   * Removed duplicate app entries
   * Removed non-English apps
   * Isolating the free apps (will be done in the next section)
   
As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

Isolating the free apps will be our last step in the data cleaning process. On the next screen, we're going to start analyzing the data.

In [8]:
# Filtering out the free apps
# Price index in android is 7, Price index in apple is 4

def extract_price(str):
    if str[0] == '$':
        str = str[1:]
    return float(str)

android_free_apps = []
for app in android_clean_english_apps:
    float_price = extract_price(app[7])
    if float_price == 0.0:
        android_free_apps.append(app)
apple_free_apps = []
for app in apple_apps_data_english:
    float_price = extract_price(app[4])
    if float_price == 0.0:
        apple_free_apps.append(app)

print(len(android_free_apps))
print(len(apple_free_apps))

8846
3203


## App Validation 

As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users because the number of people using our apps affect our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

   1. Build a minimal Android version of the app, and add it to Google Play.
   2. If the app has a good response from users, we develop it further.
   3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.
   
Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our datasets.

Let's first inspect both data sets and identify the columns we could use to generate frequency tables to determine the most common genres in each market.

In [9]:
print(list(zip(android_header, android_apps_data[2])))
print('\n')
print(list(zip(apple_header, apple_apps_data[0])))

[('App', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps'), ('Category', 'ART_AND_DESIGN'), ('Rating', '4.7'), ('Reviews', '87510'), ('Size', '8.7M'), ('Installs', '5,000,000+'), ('Type', 'Free'), ('Price', '0'), ('Content Rating', 'Everyone'), ('Genres', 'Art & Design'), ('Last Updated', 'August 1, 2018'), ('Current Ver', '1.2.4'), ('Android Ver', '4.0.3 and up')]


[('id', '284882215'), ('track_name', 'Facebook'), ('size_bytes', '389879808'), ('currency', 'USD'), ('price', '0.0'), ('rating_count_tot', '2974676'), ('rating_count_ver', '212'), ('user_rating', '3.5'), ('user_rating_ver', '3.5'), ('ver', '95.0'), ('cont_rating', '4+'), ('prime_genre', 'Social Networking'), ('sup_devices.num', '37'), ('ipadSc_urls.num', '1'), ('lang.num', '29'), ('vpp_lic', '1')]


For Android apps, we will use 'Category' and 'Genres' to create frequency table， indexes are 1 and 9. For Apple apps, we will use 'prime_genre' to create frequency table, index is 11.

In [10]:
def freq_table(apps_data, index):
    freq_table = {}
    for app in apps_data:
        if app[index] in freq_table:
            freq_table[app[index]] += 1
        else:
            freq_table[app[index]] = 1
    for key in freq_table:
        freq = round(freq_table[key] / len(apps_data) * 100, 2)
        freq_table[key] = (freq, freq_table[key]) 
    return freq_table

In [11]:
# display the frequency table in a descending order
def display_table(freq_table):
    table_display = []
    for key in freq_table:
        value_key_tuple = (freq_table[key], key)
        table_display.append(value_key_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

## Frequency table analysis

We'll now focus on analyzing these frequency tables. 
   
Analyze the frequency table you generated for the prime_genre column of the App Store dataset.

   * What is the most common genre? What is the next most common?
   * What other patterns do you see?
   * What is the general impression — are most of the apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) or more for entertainment (games, photo and video, social networking, sports, music)?
   * Can you recommend an app profile for the App Store market based on this frequency table alone? If there's a large number of apps for a particular genre, does that also imply that apps of that genre generally have a large number of users?

Analyze the frequency table you generated for the Category and Genres column of 
the Google Play dataset.
   * What are the most common genres?
   * What other patterns do you see?
   * Compare the patterns you see for the Google Play market with those you saw for the App Store market.
   * Can you recommend an app profile based on what you found so far? Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?

In [12]:
print('Apple apps by prime_genres')
display_table(freq_table(apple_free_apps,11))

Apple apps by prime_genres
Games : (58.26, 1866)
Entertainment : (7.84, 251)
Photo & Video : (5.0, 160)
Education : (3.68, 118)
Social Networking : (3.31, 106)
Shopping : (2.59, 83)
Utilities : (2.47, 79)
Sports : (2.15, 69)
Music : (2.06, 66)
Health & Fitness : (2.03, 65)
Productivity : (1.75, 56)
Lifestyle : (1.56, 50)
News : (1.34, 43)
Travel : (1.25, 40)
Finance : (1.09, 35)
Weather : (0.87, 28)
Food & Drink : (0.81, 26)
Reference : (0.53, 17)
Business : (0.53, 17)
Book : (0.37, 12)
Navigation : (0.19, 6)
Medical : (0.19, 6)
Catalogs : (0.12, 4)


We can see that among the free English apps, more than a half (55.65%) are games. Entertainment apps are sightly more than 8%, followed by photo and video apps, which are slightly more than 4%. Only 3.53% of the apps are designed for education, followed by social networking apps which amount for 3.29% of the apps in our data set.

The general impression is that App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are more rare. However, the fact that fun apps are the most numerous doesn't also imply that they also have the greatest number of users — the demand might not be the same as the offer.

Let's continue by examining the Genres and Category columns of the Google Play data set (two columns which seem to be related).

In [13]:
print('Google apps by Category')
display_table(freq_table(android_free_apps,1))

Google apps by Category
FAMILY : (18.97, 1678)
GAME : (9.67, 855)
TOOLS : (8.44, 747)
BUSINESS : (4.6, 407)
PRODUCTIVITY : (3.9, 345)
LIFESTYLE : (3.89, 344)
FINANCE : (3.71, 328)
MEDICAL : (3.53, 312)
SPORTS : (3.39, 300)
PERSONALIZATION : (3.32, 294)
COMMUNICATION : (3.23, 286)
HEALTH_AND_FITNESS : (3.09, 273)
PHOTOGRAPHY : (2.95, 261)
NEWS_AND_MAGAZINES : (2.8, 248)
SOCIAL : (2.67, 236)
TRAVEL_AND_LOCAL : (2.34, 207)
SHOPPING : (2.25, 199)
BOOKS_AND_REFERENCE : (2.14, 189)
DATING : (1.87, 165)
VIDEO_PLAYERS : (1.8, 159)
MAPS_AND_NAVIGATION : (1.39, 123)
FOOD_AND_DRINK : (1.24, 110)
EDUCATION : (1.18, 104)
ENTERTAINMENT : (0.96, 85)
LIBRARIES_AND_DEMO : (0.94, 83)
AUTO_AND_VEHICLES : (0.93, 82)
HOUSE_AND_HOME : (0.8, 71)
WEATHER : (0.79, 70)
EVENTS : (0.71, 63)
PARENTING : (0.66, 58)
ART_AND_DESIGN : (0.64, 57)
COMICS : (0.61, 54)
BEAUTY : (0.6, 53)


The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.).However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.

In [14]:
print('Google apps by Genres')
display_table(freq_table(android_free_apps,9))

Google apps by Genres
Tools : (8.43, 746)
Entertainment : (6.08, 538)
Education : (5.36, 474)
Business : (4.6, 407)
Productivity : (3.9, 345)
Lifestyle : (3.88, 343)
Finance : (3.71, 328)
Medical : (3.53, 312)
Sports : (3.46, 306)
Personalization : (3.32, 294)
Communication : (3.23, 286)
Action : (3.1, 274)
Health & Fitness : (3.09, 273)
Photography : (2.95, 261)
News & Magazines : (2.8, 248)
Social : (2.67, 236)
Travel & Local : (2.33, 206)
Shopping : (2.25, 199)
Books & Reference : (2.14, 189)
Simulation : (2.05, 181)
Dating : (1.87, 165)
Arcade : (1.84, 163)
Video Players & Editors : (1.77, 157)
Casual : (1.75, 155)
Maps & Navigation : (1.39, 123)
Food & Drink : (1.24, 110)
Puzzle : (1.13, 100)
Racing : (0.99, 88)
Role Playing : (0.94, 83)
Libraries & Demo : (0.94, 83)
Auto & Vehicles : (0.93, 82)
Strategy : (0.92, 81)
House & Home : (0.8, 71)
Weather : (0.79, 70)
Events : (0.71, 63)
Adventure : (0.67, 59)
Comics : (0.6, 53)
Beauty : (0.6, 53)
Art & Design : (0.6, 53)
Parenting : (0


The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

## Most Popular Apps by Genre on the App Store 

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot app.

Let's start with calculating the average number of user ratings per app genre on the App Store.

In [21]:
apple_total_number_install = {}
for app in apple_free_apps:
    if app[11] in apple_total_number_install:
        apple_total_number_install[app[11]] += float(app[5])
    else:
        apple_total_number_install[app[11]] = float(app[5])
apple_avg_number_install = {}
apple_freq_table_by_catogory = freq_table(apple_free_apps, 11)
for category in apple_freq_table_by_catogory:
    apple_avg_number_install[category] = \
        int(apple_total_number_install[category] / \
            (apple_freq_table_by_catogory[category][1]))

# print the apple_freq_table_by_catogory in a table format
display_table(apple_avg_number_install)
# print(apple_freq_table_by_catogory)


Navigation : 86090
Reference : 79350
Social Networking : 71548
Music : 57326
Weather : 52279
Book : 46384
Food & Drink : 33333
Finance : 32367
Photo & Video : 28441
Travel : 28243
Shopping : 27230
Health & Fitness : 23298
Sports : 23008
Games : 22886
News : 21248
Productivity : 21028
Utilities : 19156
Lifestyle : 16815
Entertainment : 14195
Business : 7491
Education : 7003
Catalogs : 4004
Medical : 612


Navigation ranked the first on this average total ranking. This result is out of expectation. Let's pull out the first 10 app of each with the highest rating_count_tot.

In [30]:
apple_navigation_apps = []
for app in apple_free_apps:
    if app[11] == 'Navigation':
        apple_navigation_apps.append((app[1],float(app[5])))
apple_navigation_apps.sort(key=lambda x:x[1], reverse=True)
for app in apple_navigation_apps:
    print(app[0], ': ', app[1])
print('\n')

Waze - GPS Navigation, Maps & Real-time Traffic :  345046.0
Google Maps - Navigation & Transit :  154911.0
Geocaching® :  12811.0
CoPilot GPS – Car Navigation & Offline Maps :  3582.0
ImmobilienScout24: Real Estate Search in Germany :  187.0
Railway Route Search :  5.0




Waze is a free navigation app for Android and iPhone. It offers community-based traffic (i.e., traffic details from other Waze users), aiming to help you avoid traffic and always be able to take the best route to your destination. At this point, we might be able infer that some functionality based app can build up significant user base. Now let's take a look at Reference.

In [34]:
apple_reference_apps = []
for app in apple_free_apps:
    if app[11] == 'Reference':
        apple_reference_apps.append((app[1],float(app[5])))
apple_reference_apps.sort(key=lambda x:x[1], reverse=True)
for app in apple_reference_apps[:10]:
    print(app[0], ': ', app[1])

Bible :  985920.0
Dictionary.com Dictionary & Thesaurus :  200047.0
Dictionary.com Dictionary & Thesaurus for iPad :  54175.0
Google Translate :  26786.0
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran :  18418.0
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition :  17588.0
Merriam-Webster Dictionary :  16849.0
Night Sky :  12122.0
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) :  8535.0
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools :  4693.0


At this point, it appears that good functional apps (Bible, Wave) can outrun the big tech brand. Now let's take a look at the Social Networking, Music, Weather.

In [35]:
# Social Networking
apple_sn_apps = []
for app in apple_free_apps:
    if app[11] == 'Social Networking':
        apple_sn_apps.append((app[1],float(app[5])))
apple_sn_apps.sort(key=lambda x:x[1], reverse=True)
for app in apple_sn_apps[:10]:
    print(app[0], ': ', app[1])
print('\n')

Facebook :  2974676.0
Pinterest :  1061624.0
Skype for iPhone :  373519.0
Messenger :  351466.0
Tumblr :  334293.0
WhatsApp Messenger :  287589.0
Kik :  260965.0
ooVoo – Free Video Call, Text and Voice :  177501.0
TextNow - Unlimited Text + Calls :  164963.0
Viber Messenger – Text & Call :  164249.0




In [36]:
# Music
apple_music_apps = []
for app in apple_free_apps:
    if app[11] == 'Music':
        apple_music_apps.append((app[1],float(app[5])))
apple_music_apps.sort(key=lambda x:x[1], reverse=True)
for app in apple_music_apps[:10]:
    print(app[0], ': ', app[1])
print('\n')

Pandora - Music & Radio :  1126879.0
Spotify Music :  878563.0
Shazam - Discover music, artists, videos & lyrics :  402925.0
iHeartRadio – Free Music & Radio Stations :  293228.0
SoundCloud - Music & Audio :  135744.0
Magic Piano by Smule :  131695.0
Smule Sing! :  119316.0
TuneIn Radio - MLB NBA Audiobooks Podcasts Music :  110420.0
Amazon Music :  106235.0
SoundHound Song Search & Music Player :  82602.0




In [None]:
# Music
apple_music_apps = []
for app in apple_free_apps:
    if app[11] == 'Music':
        apple_music_apps.append((app[1],float(app[5])))
apple_music_apps.sort(key=lambda x:x[1], reverse=True)
for app in apple_music_apps:
    print(app[0], ': ', app[1])
print('\n')

In [37]:
# Weather
apple_weather_apps = []
for app in apple_free_apps:
    if app[11] == 'Weather':
        apple_weather_apps.append((app[1],float(app[5])))
apple_weather_apps.sort(key=lambda x:x[1], reverse=True)
for app in apple_weather_apps[:10]:
    print(app[0], ': ', app[1])
print('\n')

The Weather Channel: Forecast, Radar & Alerts :  495626.0
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking :  208648.0
WeatherBug - Local Weather, Radar, Maps, Alerts :  188583.0
MyRadar NOAA Weather Radar Forecast :  150158.0
AccuWeather - Weather for Life :  144214.0
Yahoo Weather :  112603.0
Weather Underground: Custom Forecast & Local Radar :  49192.0
NOAA Weather Radar - Weather Forecast & HD Radar :  45696.0
Weather Live Free - Weather Forecast & Alerts :  35702.0
Storm Radar :  22792.0




## Most Popular Apps by Genre on Google Play

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [38]:
for app in android_free_apps[:10]:
    print(app[0],': ', app[5])

Photo Editor & Candy Camera & Grid & ScrapBook :  10,000+
U Launcher Lite – FREE Live Cool Themes, Hide Apps :  5,000,000+
Sketch - Draw & Paint :  50,000,000+
Pixel Draw - Number Art Coloring Book :  100,000+
Paper flowers instructions :  50,000+
Smoke Effect Photo Maker - Smoke Editor :  50,000+
Infinite Painter :  1,000,000+
Garden Coloring Book :  1,000,000+
Kids Paint Free - Drawing Fun :  10,000+
Text on Photo - Fonteee :  1,000,000+


For Android app, it is not clear that exactly how many installs each app has. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to find out which app genres attract the most users.We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. 

In [17]:
def trim(str):
    str = str.replace('+','')
    str = str.replace(',', '')
    return str

In [39]:
android_total_number_install = {}
for app in android_free_apps:
    if app[1] in android_total_number_install:
        android_total_number_install[app[1]] += float(trim(app[5]))
    else:
        android_total_number_install[app[1]] = float(trim(app[5]))
android_avg_number_install = {}
android_freq_table_by_catogory = freq_table(android_free_apps, 1)
for category in android_freq_table_by_catogory:
    android_avg_number_install[category] = \
        int(android_total_number_install[category] / \
            (android_freq_table_by_catogory[category][1]))

display_table(android_avg_number_install)
print(android_header)

COMMUNICATION : 38590581
VIDEO_PLAYERS : 24727872
SOCIAL : 23253652
PHOTOGRAPHY : 17805627
PRODUCTIVITY : 16787331
GAME : 15516683
TRAVEL_AND_LOCAL : 13984077
ENTERTAINMENT : 11640705
TOOLS : 10710881
NEWS_AND_MAGAZINES : 9549178
BOOKS_AND_REFERENCE : 8814199
SHOPPING : 7036877
PERSONALIZATION : 5201482
WEATHER : 5145550
HEALTH_AND_FITNESS : 4188821
MAPS_AND_NAVIGATION : 4049274
FAMILY : 3694276
SPORTS : 3650602
ART_AND_DESIGN : 1986335
FOOD_AND_DRINK : 1924897
EDUCATION : 1820673
BUSINESS : 1712290
LIFESTYLE : 1446158
FINANCE : 1387692
HOUSE_AND_HOME : 1360598
DATING : 854028
COMICS : 832613
AUTO_AND_VEHICLES : 647317
LIBRARIES_AND_DEMO : 638503
PARENTING : 542603
BEAUTY : 513151
EVENTS : 253542
MEDICAL : 120616
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Recall our ultimate goal of this project is that we need to recommend app catogory which can gain popularity on both Apple and Android platform. For the first 10 ranking catagories in Google, the markets are all occupied by those Gients tech companies, just as on Apple. However, "Reference" app still sheds some light here.

In [40]:
for app in android_free_apps:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E