# Analyzing the Characteristics of Popular Apps

This project is designed to help a team of app developers identify common traits amongst the most popular apps on both Android and iOS platforms. Since our developers only create apps that are free to download, with revenue coming from in-app ads, we want to focus on apps with the highest level of engagement.

### Exploring the Data

We will be analyzing two data sets for this project. The first is from August 2018 and contains data from approximately 10,000 Android apps on the Google Play Store. The second is from July 2017 and contains data from approximately 7,000 iOS apps from the App Store.

To begin, we will create a function that allows us to explore these two data sets in more detail. This function will take a data set, print the rows we'd like to see, and (optionally) print the number of rows and columns in the data set we passed in.

In [1]:
from csv import reader

# Slices rows based on start and end arguments, and prints each row in the slice

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 

# If rows_and_columns argument is True, prints number of rows and columns

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


Next we'll import the two sets we'd like to explore using the "explore_data" function we just created. 

In [2]:
# Opening and reading App Store data, then converting to list of lists
# (the encoding is set explicitly so non-ASCII app names are read the same way on any platform)

with open('AppleStore.csv', encoding='utf8') as opened_ios:
    list_ios = list(reader(opened_ios))

# Opening and reading Google Play data, then converting to list of lists

with open('googleplaystore.csv', encoding='utf8') as opened_android:
    list_android = list(reader(opened_android))

# Passing the first three rows of each imported data set into the exploratory function

explore_data(list_ios[1:], 0, 3, True)
print('\n')
explore_data(list_android[1:], 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1,

It appears that our "explore_data" function works as intended. Let's also print the header row of each set to see which columns may be useful in the next stages of our analysis.

In [3]:
print(list_android[0])
print('\n')
print(list_ios[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


### Identifying relevant fields

It appears that the following fields are relevant to the goals of our analysis: Category, Reviews, Installs, Type, Price, price, rating_count_tot, cont_rating, prime_genre, sup_devices.num, ipadSc_urls.num and vpp_lic.
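The analysis that follows refers to these fields by position (for example, `row[11]` for `prime_genre`). As a minimal sketch, the header rows printed above can be turned into a name-to-index lookup so that positions don't have to be counted by hand. The helper name `column_index` is hypothetical and not part of the original notebook.

```python
# Header rows copied from the output printed earlier in this notebook
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
                  'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
                  'Android Ver']
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot',
              'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
              'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


def column_index(header):
    # Hypothetical helper: maps each column name to its position in a row
    return {name: position for position, name in enumerate(header)}


android_cols = column_index(android_header)
ios_cols = column_index(ios_header)

print(android_cols['Installs'])
print(ios_cols['prime_genre'])
```

With a lookup like this, `row[ios_cols['prime_genre']]` could stand in for `row[11]`, at the cost of one extra dictionary per data set.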

Here is the supporting documentation for the two data sets, which provides context for each of the fields:

[AppleStore.csv](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) 

[googleplaystore.csv](https://www.kaggle.com/lava18/google-play-store-apps/home)

### Cleaning the Data

Before we jump into analyzing our sets, we should ensure that there are no data integrity issues that could skew our results.

When we read through the documentation of the Google Play set, we can see that other users have flagged an issue with row 10473. It appears that the "Category" column is missing, which is causing subsequent columns to erroneously shift. Let's delete this row.

In [4]:
# Printing and deleting problematic row

print(list_android[10473])
del list_android[10473]

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
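The row printed above has 12 values while the header has 13, which suggests a more general check. The sketch below flags any row whose length differs from the header's. The function name `find_malformed_rows` is an assumption for illustration, and the sample is a small list built from rows shown in this notebook rather than the full data set.

```python
def find_malformed_rows(dataset):
    # Assumes dataset[0] is the header row; returns (index, row) pairs
    # for every row whose length differs from the header's
    expected_length = len(dataset[0])
    malformed = []
    for index, row in enumerate(dataset):
        if len(row) != expected_length:
            malformed.append((index, row))
    return malformed


# Small sample built from rows printed in this notebook
sample = [
    ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price',
     'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'],
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free',
     '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0',
     '4.0.3 and up'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free',
     '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'],
]

for index, row in find_malformed_rows(sample):
    print(index, len(row), row[0])
```

A length check only catches rows with missing or extra values; a row with the right number of columns but wrong contents would still need a separate check.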


### Removing Duplicate Entries

A cursory look at the Google Play set reveals another issue: some apps have more than one entry. We'll need to identify and remove these duplicate entries, which we can accomplish using the following steps:

1. Create two new empty lists - one to store unique values (app names we're encountering for the first time) and another to store duplicate values (app names that already exist in our list of unique values).

2. Loop through each row of the data set and check to see if the app name in that row has already been added to our list of unique values. If it hasn't, it means it's the first time we're encountering that app, so we need to add it.

3. If the app name does exist in our list of unique values, it means the row is a duplicate entry, so we need to add the app name to our list of duplicate values.

In [5]:
# Looping through Play Store dataset, identifying if a row is unique or a duplicate, and storing in the applicable list

unique_list = []
duplicate_list = []

for row in list_android:
    app_name = row[0]
    if app_name in unique_list:
        duplicate_list.append(app_name)
    else:
        unique_list.append(app_name)

# Testing out the code by printing the first three entries of our duplicate list and its length

print(duplicate_list[:3])  
print(len(duplicate_list))

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business']
1181


We can see that there are 1,181 duplicate entries in our dataset. We need to establish some logic for determining which of the duplicate entries should be kept. A sensible approach would be to identify and retain the most recent entry. Looking at the column fields, we can see a field called "Reviews," which provides the total number of user reviews submitted. Since an app's review count tends to grow over time, we can treat the entry with the highest number of reviews as the most recent one.

Let's create a dictionary to store app names (our keys) and the highest number of reviews associated with that app (our values). To find the highest number of reviews, we'll loop through each row in our set. For each loop, if we detect a review count that's higher than the current value in our dictionary, we'll assign the new value.

In [6]:
# Looping through Play Store dataset to find max number of reviews for each app, then adding to dictionary for reference

reviews_max = {}
for row in list_android[1:]: # Slicing to exclude header row
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
# Printing length of newly created dictionary to confirm the code above worked

print(len(reviews_max))


9659


We can use our newly created dictionary as a tool in creating a new, duplicate-free list. To achieve that we will loop through our data set and compare the number of reviews in each row to our "reviews_max" dictionary. When we find a row with a review count that matches the value in our dictionary, we'll append that row to a new "clean" list. We also track the names we've already added, so that two entries with the same maximum review count don't both end up in the list.

In [7]:
# Creating a new list with no duplicates

android_clean = []
already_added = []

for row in list_android[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)

# Checking the length and first rows of our new cleaned list

print(android_clean[:10])
print(len(android_clean))
    
    

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up'], ['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+', '
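For comparison, here is a minimal sketch of the same idea done in a single pass, keeping one dictionary entry per app name. The function name `dedupe_by_max_reviews` and the sample rows are hypothetical; only the name and review-count positions mirror the Google Play data.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    # Keeps, for each app name, the first row that has the highest review count
    best_rows = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best_rows or n_reviews > float(best_rows[name][reviews_index]):
            best_rows[name] = row
    return list(best_rows.values())


# Hypothetical rows (only the name and review count matter here)
sample_rows = [
    ['App A', 'TOOLS', '4.0', '100'],
    ['App B', 'GAME', '4.5', '50'],
    ['App A', 'TOOLS', '4.1', '250'],
    ['App B', 'GAME', '4.5', '50'],
]

for row in dedupe_by_max_reviews(sample_rows):
    print(row)
```

Dictionary lookups avoid the repeated list scans of `name not in already_added`. One difference to keep in mind: the rows come back in the order each app name was first seen, which may differ from the order produced by the two-step approach.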

### Removing Non-English Characters

We've successfully removed duplicate entries, but we need to further cleanse our data by finding and removing apps that are not developed for an English audience (as they aren't relevant to the market our developers are targeting). To accomplish this task, we can leverage character code numbers. Every character in a string has a corresponding number (its Unicode code point, which Python's `ord()` function returns), and the characters commonly used in English text fall in the ASCII range, between 0 and 127.

To find non-English apps, we can loop through the characters of each app name and detect anything out of range. However, since many commonly used characters like emojis also fall out of range, we need to establish a reasonable threshold so we're not needlessly discarding useful data. Thus, we will build a function that removes an app only if its name has more than three characters outside that range.

In [8]:
# This function takes in a string and returns False if more than three characters in the string fall outside the ASCII range (0 to 127). Otherwise it returns True.

def ascii_check(name):
    out_of_bounds_total = 0
    for char in name:
        if ord(char) > 127:
            out_of_bounds_total += 1
        if out_of_bounds_total > 3:
            return False
    return True

# Checking the results to make sure our function works

print(ascii_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(ascii_check('Instagram'))
print(ascii_check('Docs To Go™ Free Office Suite'))
print(ascii_check('Instachat 😜'))

False
True
True
True
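To see why each name passes or fails, it can help to look at the count itself. The sketch below applies the same rule as `ascii_check` but exposes the number of out-of-range characters; the names `count_non_ascii` and `is_probably_english` are hypothetical helpers, not part of the original notebook.

```python
def count_non_ascii(name):
    # Counts characters whose code point falls outside the ASCII range (0 to 127)
    return sum(1 for char in name if ord(char) > 127)


def is_probably_english(name, max_non_ascii=3):
    # Same rule as ascii_check: allow up to max_non_ascii characters outside ASCII
    return count_non_ascii(name) <= max_non_ascii


for name in ['爱奇艺PPS -《欢乐颂2》电视剧热播', 'Instagram',
             'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, count_non_ascii(name), is_probably_english(name))
```

Making the threshold a parameter keeps the cutoff easy to adjust. This remains a heuristic: a non-English name written entirely in ASCII characters would still pass.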


Next we'll loop through both data sets and use our newly created "ascii_check" function to filter out any non-English apps. The remaining apps will be added to a new list. Finally, we'll print the lengths of our new lists to see how many apps are left.

In [9]:
# Using the new function to filter out non-English apps, then checking the length of the newly created lists

eng_list_ios = []
eng_android_clean = []

for row in list_ios[1:]:
    app_name = row[1]
    if ascii_check(app_name):
        eng_list_ios.append(row)

for row in android_clean:
    app_name = row[0]
    if ascii_check(app_name):
        eng_android_clean.append(row)

print(len(eng_list_ios))
print(len(eng_android_clean))


6183
9614


### Removing Paid Apps

Since our developers only build free-to-use apps, we should also remove any paid apps from our sets. Luckily our Android data includes a column, "Type", that provides this information. There is no equivalent in the iOS data, but it does contain a column with pricing information, which does the job (a price of 0.0 means the app is free). We'll loop through each data set, identify which apps are free, and add those apps to a new list.

In [10]:
free_list_ios = []
free_list_android = []

for row in eng_android_clean:
    app_type = row[6]  # Named app_type to avoid shadowing Python's built-in type()
    if app_type == 'Free':
        free_list_android.append(row)
        
for row in eng_list_ios:
    price = float(row[4])
    if price == 0.0:
        free_list_ios.append(row)
        
# Checking lengths of new lists

print(len(free_list_android))
print(len(free_list_ios))

8863
3222


### Validation Strategy

Our data is now in sufficient shape to move forward with our analysis. We mentioned at the outset that our goal is to identify the kinds of apps that are likely to attract the most users. For any new app idea, our developers have a validation strategy that consists of three steps:

1. Build a minimal Android version of the app and add it to the Google Play store

2. If the app has a good response from users, develop it further

3. If the app is profitable after six months, build an iOS version of the app and add it to the App Store

Therefore it's imperative that we identify what makes a successful app on both the Google Play and App Stores.

To start, we need to build a frequency table for the most common genres for each market. We'll use the "Genres" and "Category" columns from the Google Play data, and the "prime_genre" column from the App Store data.

To do so, we'll create a function that generates frequency tables. It will accept a data set and column index as its two inputs, and return a frequency table in dictionary format, with frequencies represented as percentages.

In [11]:
# Function takes in data set (list of lists) and index number, and returns dictionary representing frequency of items found in index column as percentage (our frequency table)

def freq_table(dataset, index):
    freq_table_dict = {}
    for row in dataset:
        index_value = row[index]
        if index_value not in freq_table_dict:
            freq_table_dict[index_value] = 1
        else:
            freq_table_dict[index_value] += 1
    for element in freq_table_dict:
        freq_table_dict[element] = freq_table_dict[element] / len(dataset) * 100 
    return freq_table_dict

One issue with our frequency table is that it is a dictionary, and sorting a dictionary with sorted() orders its keys, not its values. That means if we tried to sort our frequency table, Python would sort it based on the names of the genres, not their frequencies. We can get around that problem by converting our dictionary into a list of tuples with the key-value order reversed, since tuples are compared by their first element first.

In [12]:
# Function transforms dictionary (frequency table) into list of reverse-order tuples, sorts it in descending order, and prints each entry

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
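As an alternative to reversing the tuples, `sorted()` accepts a `key` function, so the key-value pairs can be ordered by value directly. The sketch below uses a small hypothetical frequency table in place of real `freq_table` output.

```python
# Hypothetical frequency table (percentages), standing in for freq_table() output
sample_table = {'Games': 50.0, 'Education': 12.5, 'Music': 37.5}

# sorted() can order the (key, value) pairs by value directly
sorted_pairs = sorted(sample_table.items(), key=lambda pair: pair[1], reverse=True)

for genre, percentage in sorted_pairs:
    print(genre, ':', percentage)
```

The two approaches differ only in how ties are handled: reversed tuples fall back to comparing the names, while the `key` version keeps tied entries in their original order.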

Now let's run our data sets through our revised frequency table generator, selecting the columns we identified earlier as relevant to our analysis. We'll then print the resulting tables and analyze the results.

In [13]:
# Printing frequency tables for relevant columns to analyze data

# display_table prints its own output and returns None, so it is called directly

display_table(free_list_android, 1)
print('\n')
display_table(free_list_android, 9)
print('\n')
display_table(free_list_ios, 11)


FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

### Frequency Table Analysis

Here are some observations about the frequency tables we just printed:

#### App Store (iOS)

The most common genre, by a significant margin, is "Games", representing close to 60% of apps in our data set. The runner-up is "Entertainment," representing ~8% share of our set. In fact, most of the apps in the top 10 are related to some form of entertainment, including "Photo & Video" (~5%), "Social Networking" (~3.3%), "Shopping" (~2.6%), "Sports" (~2.1%) and "Music" (~2%). 

On the opposite end of the spectrum, apps with the lowest frequency are more utilitarian and practical in nature, e.g. "Business" (~0.5%), "Navigation" (~0.2%), "Medical" (~0.2%) and "Catalogs" (~0.1%).

Based on the above, it would seem that the ideal profile is a game-based app, or more generally, an app that delivers some form of entertainment.

#### Google Play Store (Android)

We see a very different landscape on the Android store. Our top five genres are an equal mix of fun and utilitarian: "Tools" (~8.5%), "Entertainment" (~6.1%), "Education" (~5.3%), "Business" (~4.6%) and "Productivity" (~3.9%). We see a similar pattern in the "Category" column, with "Game" and "Tools" being the second and third most popular, respectively. This mixed pattern appears to continue throughout our frequency tables, suggesting that there is no dominant app genre on the Google Play store.

#### Putting the Pieces Together

It's clear that entertainment-based apps dominate the iOS store, with the overwhelming majority (close to 60%) being games. 

On the Android store, there is no dominant profile; apps for entertainment and utilitarian purposes are equally represented throughout our frequency tables.

With no consistent pattern across both iOS and Android, it's difficult to recommend a specific app profile at this point. Furthermore, our frequency tables have an important limitation: they don't tell us how many users these apps are attracting. For example, just because most apps on iOS are game-based doesn't necessarily mean those apps are drawing in the most users. Perhaps they're so ubiquitous simply because they're easier to develop. So which apps are attracting the most users? We'll need to expand our investigation to answer that question.

### Finding Apps with High Usage

To understand what kinds of apps attract the most users, we can look at average installs. The Google Play data set makes this easy as it contains a column with information on installs. However, this information is missing in the App Store data set. The best substitute is to use the "rating_count_tot" column, which indicates the number of user ratings.

Let's tackle the more challenging App Store data first. We'll calculate the average number of user ratings per app genre.

In [14]:
# Generating frequency table for unique app genres

prime_genre_freq = freq_table(free_list_ios, 11)

# Calculating average number of user ratings for each app genre with nested loop

avg_usr_rating_dict = {}
for genre in prime_genre_freq:
    total = 0
    len_genre = 0
    for app in free_list_ios:
        genre_app = app[11]
        if genre_app == genre:
            total += float(app[5])
            len_genre += 1
    avg_usr_rating = total / len_genre
    avg_usr_rating_dict[genre] = avg_usr_rating
    
table_display = []
for key in avg_usr_rating_dict:
    key_val_as_tuple = (avg_usr_rating_dict[key], key)
    table_display.append(key_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])


Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
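The nested loop above scans the whole data set once per genre. A minimal sketch of a single-pass alternative keeps a running total and a count per group. The function name `average_by_group` and the sample rows are hypothetical and only illustrate the technique.

```python
def average_by_group(rows, group_index, value_index):
    # Accumulates a running total and a row count per group in a single pass
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        totals[group] = totals.get(group, 0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical (name, genre, rating count) rows, for illustration only
sample_rows = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Weather', '50'],
]

print(average_by_group(sample_rows, 1, 2))
```

Applied to this notebook's data, the equivalent call would presumably be `average_by_group(free_list_ios, 11, 5)`, with the same column positions used in the cell above.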


Earlier in our analysis we saw that game-based apps outnumbered all other app genres by a significant margin on the iOS store. However, the average numbers of user ratings we printed above tell a very different story. "Games" is on the lower end of the spectrum, with roughly a quarter of the average rating count of the leading genre, "Navigation". Since we're targeting apps with high user engagement and usage, we should focus our attention on the "Navigation" category. Also worth considering are "Reference" and "Social Networking", which are on the higher end of the scale as well. It's not entirely clear what kinds of apps are categorized as "Reference," so let's print a list with just those apps to get a better sense.

In [15]:
reference_list = []

for row in free_list_ios:
    if row[11] == 'Reference':
        reference_list.append(row[1])
print(reference_list)

['Bible', 'Dictionary.com Dictionary & Thesaurus', 'Dictionary.com Dictionary & Thesaurus for iPad', 'Google Translate', 'Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran', 'New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition', 'Merriam-Webster Dictionary', 'Night Sky', 'City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)', 'LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools', 'GUNS MODS for Minecraft PC Edition - Mods Tools', 'Guides for Pokémon GO - Pokemon GO News and Cheats', 'WWDC', 'Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free', 'VPN Express', 'Real Bike Traffic Rider Virtual Reality Glasses', '教えて!goo', 'Jishokun-Japanese English Dictionary & Translator']


Next, we'll identify Android apps with the highest levels of usage through the "Installs" column. Note that this column provides install ranges as opposed to exact numbers (e.g. 100+, 1,000+, 5,000+). This should be sufficient for us as we simply want to understand which apps attract the most users. When computing our averages, we'll ignore the "+" and use the number displayed.

In [16]:
# Generating frequency table for unique app genres using "Category" column

category_freq = freq_table(free_list_android, 1)

# Calculating average number of installs for each app genre with nested loop

avg_installs_dict = {}
for category in category_freq:
    total = 0
    len_category = 0
    for row in free_list_android:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            installs = installs.replace('+', '')
            installs = float(installs.replace(',', ''))
            total += installs
            len_category += 1
    avg_installs = total / len_category
    avg_installs_dict[category] = avg_installs
    
table_display = []
for key in avg_installs_dict:
    key_val_as_tuple = (avg_installs_dict[key], key)
    table_display.append(key_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])
   

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3697848.1731343283
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315
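The string clean-up used inside the loop above can be isolated into a small helper, which makes it easy to check on its own. This is a sketch only, and the name `parse_installs` is hypothetical.

```python
def parse_installs(installs):
    # Turns strings such as '1,000+' into a number by dropping '+' and ','
    return float(installs.replace('+', '').replace(',', ''))


# Install strings taken from the rows printed earlier in this notebook
for raw in ['10,000+', '500,000+', '5,000,000+']:
    print(raw, '->', parse_installs(raw))
```

Because each value is the lower bound of a range, averages computed this way are best read as a floor on the true install counts.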

Earlier in our analysis, we saw that there was a fairly even distribution of entertainment-based and utility-based apps on the Google Play store. At first glance, it appears that pattern applies to our list of most-installed app categories. At the top are "Communication", "Video Players", "Social", "Photography" and "Productivity." Most of these are self-explanatory, but it's unclear what sorts of apps fall under our top category, "Communication." Let's print the names of the apps in this category to see if we can learn more.

In [17]:
communication_list = []

for row in free_list_android:
    if row[1] == 'COMMUNICATION':
        communication_list.append(row[0])
print(communication_list)

['WhatsApp Messenger', 'Messenger for SMS', 'My Tele2', 'imo beta free calls and text', 'Contacts', 'Call Free – Free Call', 'Web Browser & Explorer', 'Browser 4G', 'MegaFon Dashboard', 'ZenUI Dialer & Contacts', 'Cricket Visual Voicemail', 'TracFone My Account', 'Xperia Link™', 'TouchPal Keyboard - Fun Emoji & Android Keyboard', 'Skype Lite - Free Video Call & Chat', 'My magenta', 'Android Messages', 'Google Duo - High Quality Video Calls', 'Seznam.cz', 'Antillean Gold Telegram (original version)', 'AT&T Visual Voicemail', 'GMX Mail', 'Omlet Chat', 'My Vodacom SA', 'Microsoft Edge', 'Messenger – Text and Video Chat for Free', 'imo free video calls and chat', 'Calls & Text by Mo+', 'free video calls and chat', 'Skype - free IM & video calls', 'Who', 'GO SMS Pro - Messenger, Free Themes, Emoji', 'Messaging+ SMS, MMS Free', 'chomp SMS', 'Glide - Video Chat Messenger', 'Text SMS', 'Talkray - Free Calls & Texts', 'LINE: Free Calls & Messages', 'GroupMe', 'mysms SMS Text Messaging Sync', '2

It appears that "Communication" is a very broad category, with everything from web browsers to streaming radio - a mix of entertainment and utility.

We could try suggesting an app profile for the Google Play store, but remember that we want a profile that would work for both Android and iOS. Now that we have a list of the most downloaded categories for Google Play, let's compare it to the list of most-rated genres on iOS to find any points of intersection.

#### App Profile Recommendation

There are a surprising number of differences between the leading app genres in each list. Let's start with the top two on iOS, "Navigation" and "Reference"; we can find their equivalents on Android in "Maps and Navigation" and "Books and Reference", respectively. However, the Android equivalents are ranked much lower: "Maps and Navigation" is 16th, and "Books and Reference" is 11th. These may not be the best profiles for success on both platforms.

Next let's look at the 4th and 5th most popular genres on iOS, "Music" and "Weather". For "Music", the closest category on Android is "Entertainment", sitting in 8th place. The problem, though, is that "Entertainment" is too broad a category. We can flag this for further investigation, but at this point there's not enough evidence that this would succeed on both platforms. The "Weather" genre is also not a great candidate as its Android equivalent is only at 14th place.

The most promising genre, based on a high degree of popularity on both stores, appears to be Social Networking. This category ranks an impressive 3rd on both platforms. One question that comes to mind, though, is how many social apps are generating this kind of popularity? We saw in the frequency tables we generated earlier that social apps were not that common. Is this a case of a handful of massively popular apps driving our metrics up? Let's print a list of the social apps on Android to see.

In [19]:
social_list = []

for row in free_list_android:
    if row[1] == 'SOCIAL':
        social_list.append(row[0])
print(social_list)

['Facebook', 'Facebook Lite', 'Tumblr', 'Social network all in one 2018', 'Pinterest', 'TextNow - free text + calls', 'Google+', 'The Messenger App', 'Messenger Pro', 'Free Messages, Video, Chat,Text for Messenger Plus', 'Telegram X', 'The Video Messenger App', 'Jodel - The Hyperlocal App', 'Hide Something - Photo, Video', 'Love Sticker', 'Web Browser & Fast Explorer', 'LiveMe - Video chat, new friends, and make money', 'VidStatus app - Status Videos & Status Downloader', 'Love Images', 'Web Browser ( Fast & Secure Web Explorer)', 'SPARK - Live random video chat & meet new people', 'Golden telegram', 'Facebook Local', 'Meet – Talk to Strangers Using Random Video Chat', 'MobilePatrol Public Safety App', '💘 WhatsLov: Smileys of love, stickers and GIF', 'HTC Social Plugin - Facebook', 'Quora', 'Kate Mobile for VK', 'Family GPS tracker KidControl + GPS by SMS Locator', 'Moment', 'Text Me: Text Free, Call Free, Second Phone Number', 'Text Free: WiFi Calling App', 'Badoo - Free Chat & Dating

We can see that there are many apps in the social category, at least on Android, so that alleviates one concern. We also want to ensure that the majority of the installs aren't coming from a few major apps.

#### Next Steps

Our project is almost complete, but we need to investigate the social category further by looking at install and rating information in more detail.
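One possible way to approach that check is to measure what share of a category's installs comes from its few largest apps. The sketch below is an assumption about how this could be done, not a result: the function name `top_n_share` is hypothetical and the install counts are made-up illustration values, not figures from the data set.

```python
def top_n_share(install_counts, n):
    # Fraction of all installs that comes from the n most-installed apps
    ordered = sorted(install_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:n]) / total


# Hypothetical install counts for one category, for illustration only
sample_installs = [1000000000.0, 500000000.0, 1000000.0, 100000.0, 1000.0]

print(top_n_share(sample_installs, 2))
```

A share close to 1 for a small `n` would indicate that a handful of apps drive the category's average, in which case the average alone would overstate how well a typical app in that category performs.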