# Analyzing Mobile App Data

In this analysis, we will look into a dataset of mobile apps for iOS and Android. We will focus on apps that are free to download and install on Google Play and the App Store. We will identify which genres and categories are the most popular, and our ultimate goal is to gather insights on the specific genre or category of app that will attract the most users.

I'll be approaching this research as if I am an analyst at a company, and I am investigating on their behalf to determine what kind of app the company should create.

### Opening and Exploring the Data ###

The data to be analyzed consists of two samples: about 10,000 Android apps from Google Play and about 7,000 iOS apps from the App Store. The datasets can be accessed through the following links:

- <a href="https://www.kaggle.com/datasets/lava18/google-play-store-apps">Google Play</a>
- <a href="https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps">App Store</a>

In this analysis, we will focus only on apps that are **free** and use **English** as their main language.

In [2]:
## import the 'reader' function from the 'csv' module
from csv import reader


## ios
opened_file = open('AppleStore.csv', encoding='utf8')  ## app names contain non-ASCII text
read_file = reader(opened_file)     ## wrap the open file in a csv reader
ios = list(read_file)               ## read every row into a list of lists
ios_header = ios[0]                 ## extract the header row
ios = ios[1:]                       ## slice off the header row
opened_file.close()

## android
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
opened_file.close()



We begin by opening the dataset files and converting each one to a list of lists. We then separate the header row from the data rows so that later loops iterate only over app records. We can now start exploring our datasets, beginning with the Google Play data.
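The loading steps described above can also be wrapped in a small helper so that both files are read the same way and each file is closed once it has been read. This is only a sketch: `load_csv` is a hypothetical name, and the `utf8` default is an assumption about how the two CSV files are encoded.

```python
from csv import reader


def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file.

    load_csv is a hypothetical helper, not part of the original notebook.
    The utf8 default is an assumption about the files' encoding.
    """
    # The csv documentation recommends newline='' for files passed to reader;
    # the with-block closes the file even if reading fails.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]


# Intended usage with the notebook's files (not executed here):
# ios_header, ios = load_csv('AppleStore.csv')
# android_header, android = load_csv('googleplaystore.csv')
```

Returning the header and the data rows as a pair keeps the two from being mixed up later in the analysis.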

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The `explore_data` function prints a slice of a dataset with an empty line between rows, which makes the rows easier to read, and can optionally report the number of rows and columns. The header row, printed separately just before the call, gives the column labels. Among the columns, `App`, `Category`, `Rating`, `Reviews`, `Installs`, `Type`, `Price`, `Content Rating`, and `Genres` are the ones to note, as they may help us understand and gauge the popularity of certain apps. We can also see that the Google Play dataset has 10841 rows and 13 columns.

Now, let's explore the App Store data in the same way. 

In [4]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


The App Store dataset has 7197 rows and 16 columns. Among the columns, the `price`, `rating_count_tot`, `rating_count_ver`, `user_rating`, and `prime_genre` are likely columns that are relevant in analyzing the popularity of apps. We will keep these columns in mind as we proceed with our analysis. 

### Deleting Wrong Data ###

This next part of the analysis deals with data cleaning. Datasets often contain inaccurate and duplicate data. In order to specifically target apps that are free and use English, we must remove non-English apps and apps that aren't free. In addition, if there is duplicate data, we must detect and remove it.

In the Google Play dataset's discussion section, a user noted one example of inaccurate data. The app in row 10472 of the dataset has a rating of 19, when the maximum possible rating for an app on Google Play is 5. We will simply delete this row and move on.

In [5]:
print(android[10472])
print('\n')
print(android_header)  # header

print(len(android))
del android[10472]  
print(len(android))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
10841
10840


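The app is *Life Made WI-Fi Touchscreen Photo Frame*, and we can confirm that the value in its `Rating` position is 19. The printed row has only 12 values against the 13 column labels in the header: the `Category` value appears to be missing, so every later value has shifted one column to the left. The row has been deleted, and the dataset now has 10840 rows instead of 10841.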
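Rather than relying on a forum report, rows like this one can be found by comparing each row's length with the header's. The sketch below uses a hypothetical helper, `find_malformed_rows`, and made-up toy rows; it is not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's.

    Hypothetical helper for illustration only.
    """
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data (made up): the second row is missing its category value,
# so its remaining values sit one column too far to the left.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['Good App', 'TOOLS', '4.2'],
    ['Shifted App', '1.9'],
]

malformed = find_malformed_rows(toy_header, toy_rows)
print(malformed)
```

Calling `find_malformed_rows(android_header, android)` before the deletion would be one way to confirm which rows need attention. When deleting several rows by index, it is safer to delete from the highest index down so that earlier deletions do not shift the later ones.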

### Removing Duplicate Entries: Part One ###

The Google Play dataset has many duplicate entries. Rather than removing duplicates at random, we will keep the most recent entry for each app. The number of reviews serves as a proxy for recency: the more reviews an entry has, the more recently it was likely collected. For example, out of the four duplicate entries for Instagram, the one with the most reviews should be the most recent. We will keep that one and remove the other three.

Instagram is one example of an app with duplicate entries in Google Play. It occurs 4 times:

In [6]:
for app in android: 
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Let's check how many duplicate entries we have in total in Google Play:

In [7]:
duplicate_apps = []
unique_apps = []

for app in android: 
    name = app[0]
    if name in unique_apps: 
        duplicate_apps.append(name)
    else: 
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In total, there are 1181 duplicate entries in the Google Play data, that is, rows whose app name has already appeared earlier in the dataset. We must remove these duplicates so that there is only one entry per app. For each app, we will keep the entry that has the highest number of reviews and delete the other entries.

To do this, we will build a dictionary called `reviews_max`. We will loop through the Android dataset and add each app name to `reviews_max` together with its number of reviews. If the name is already in `reviews_max`, the code checks whether the current entry has a higher number of reviews than the value stored in the dictionary, and keeps the higher of the two. In the end, the dictionary maps each app name to the maximum number of reviews found among its entries.

### Removing Duplicate Entries: Part Two ###

In [8]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews


print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


The next cell loops through the dataset again and uses the `reviews_max` dictionary as a reference to decide whether the entry being examined has the highest number of reviews among its duplicates. If the entry's number of reviews matches the value stored in `reviews_max`, it has the highest number of reviews.


In [9]:
android_clean = []
already_added = []

for app in android: 
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)


To remove the duplicate entries, we create two empty lists: `android_clean` and `already_added`. We then loop through the dataset and read each entry's app name and number of reviews. If the number of reviews matches the value in the `reviews_max` dictionary, the entry is appended to `android_clean` and its name to `already_added`. The second part of the conditional handles the case where more than one duplicate entry shares the highest number of reviews: only the first such entry is kept.
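The two passes described above can also be folded into a single pass that stores the best row for each name directly. This is an alternative sketch, not the notebook's method: `dedupe_by_max_reviews` is a hypothetical name, and the second app in the toy data is made up. One difference to be aware of is ordering: the result follows the first appearance of each name, which may differ from the order in `android_clean`.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the most reviews.

    Hypothetical helper for illustration only.
    """
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # The strict '>' keeps the first row seen when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # Dictionaries preserve insertion order (Python 3.7+).
    return list(best.values())


# Toy data: Instagram review counts as printed earlier, plus a made-up app.
toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Hypothetical App', 'TOOLS', '4.0', '120'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

toy_clean = dedupe_by_max_reviews(toy_rows)
for row in toy_clean:
    print(row)
```

Because membership tests on a dictionary do not scan every stored name the way `name not in already_added` does on a list, this version should also scale better to larger datasets.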

In [10]:
explore_data(android_clean, 0, 4, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9659
Number of columns: 13


Upon using the `explore_data` function, it is confirmed that the `android_clean` list contains 9659 rows. Therefore, we can safely assume that duplicate entries have been removed and this list contains only unique data entries. 

### Removing Non-English Apps ###

The datasets also contain some non-English apps that need to be removed. Here are some examples:

In [11]:
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


To remove such non-English apps, we first need a function that can flag titles containing non-English characters. The function `is_it_english` below counts the characters in a string that fall outside the ASCII (American Standard Code for Information Interchange) range, that is, characters whose code point is greater than 127. It returns False if there are more than three such characters and True otherwise.

In [12]:
def is_it_english(string):
    non_ASCII = 0 
    
    for character in string:
        if ord(character) > 127:
            non_ASCII += 1
        
    if non_ASCII > 3:
        return False
    else: 
        return True 
    
print(is_it_english('Instagram'))
print(is_it_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_it_english('Docs To Go™ Free Office Suite'))
print(is_it_english('Instachat 😜'))

True
False
True
True


We can see that the function works as intended on these examples. A title with more than three non-ASCII characters returns False, while emojis and trademark symbols are tolerated and return True as long as the rest of the title is ASCII. The filter is a heuristic, so a few short non-English titles can still pass. Next, we will apply this function as a filter as we gather all the English-language apps into new lists.
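To see what the three-character threshold does in practice, it helps to print the non-ASCII count for a few titles. The sketch below uses a hypothetical helper, `count_non_ascii`, which mirrors the counting loop inside `is_it_english`; the titles are taken from this notebook's own output.

```python
def count_non_ascii(string):
    """Count characters whose code point lies outside the ASCII range (0-127).

    Hypothetical helper for illustration only.
    """
    return sum(1 for character in string if ord(character) > 127)


titles = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    '教えて!goo',  # Japanese title that appears in the free iOS listings
]

for title in titles:
    n = count_non_ascii(title)
    # Same rule as is_it_english: more than three non-ASCII characters fails.
    print(title, ':', n, '->', n <= 3)
```

The last title shows the limitation of the heuristic: it contains only three non-ASCII characters, so it passes the filter even though it is not an English app.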

In [13]:
android_english = []
ios_english = []

for app in android_clean: 
    name = app[0]
    if is_it_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_it_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

After filtering out the non-English apps, we are left with 9614 Android apps and 6183 iOS apps.

### Isolating the Free Apps ###

Now that we have made sure we are only working with English-language apps, the next step is to isolate the free apps. The dataset currently consists of both free and paid apps. The cell below runs a loop to detect apps that have a price of 0 and appends those apps into the `android_free` and `ios_free` lists. 

In [14]:
android_free = []
ios_free = []

for app in android_english: 
    price = app[7]
    if price == '0':
        android_free.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_free.append(app)

print(len(android_free))
print(len(ios_free))

8864
3222


There are 8864 free apps on Google Play and 3222 free apps on the App Store. These are the final lists of apps that will be analyzed in this investigation. 
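The comparison above relies on the exact strings `'0'` and `'0.0'`. A slightly more defensive sketch parses the price as a number first, so that both spellings are handled by one function. `is_free` is a hypothetical helper, and the handling of a leading `$` is an assumption about how paid prices might be written; it is not something shown in the output above.

```python
def is_free(price_text):
    """Return True when a price string represents zero.

    Hypothetical helper; stripping a leading '$' is an assumption about
    how paid prices may be formatted.
    """
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0


for sample in ['0', '0.0', '$4.99', '2.99']:
    print(repr(sample), '->', is_free(sample))
```

An empty or non-numeric price string would raise a `ValueError` here, which is arguably preferable to silently treating a malformed row as paid.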

### Analyzing the Most Common Apps by Genre ###

The data cleaning process is complete, and we can finally start analyzing the data that we have organized and filtered. We will first analyze how common each app genre is by looking at its share of the apps on each platform. Then, we will gauge popularity by looking at how many users the apps in each genre have.

In order to visualize the data that we have, a frequency table must be constructed. The code below constructs a frequency table that displays the percentages of each genre of apps in each mobile platform and displays the percentages in a descending order. 


In [15]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

            
    
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
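
The same percentage table can be built more compactly with `collections.Counter` from the standard library. This is an alternative sketch rather than the notebook's implementation; `freq_table_counter` is a hypothetical name and the toy rows are made up.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage}, ordered from most to least common.

    Hypothetical alternative to freq_table, for illustration only.
    """
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by descending count.
    return {value: count / total * 100 for value, count in counts.most_common()}


# Toy data (made up): three game rows and one music row.
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
toy_table = freq_table_counter(toy_rows, 1)
print(toy_table)
```

Because the returned dictionary is already in descending order, a display function could print it directly without building and sorting a list of tuples.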

Now, we will gauge how common each app genre is by looking at its share of the free English apps on each platform.

**App Store**

In [16]:
display_table(ios_free, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


**App Store Analysis**

Based on this data, the most common app genre in the App Store is games. Entertainment apps come in second place, but the competition is not close: games are by far the most dominant genre.

On a broader scale, apps in the App Store that are designed for entertainment purposes (such as games, social media, photography, etc.) tend to be more common than apps designed for practical purposes (such as education, shopping, utilities, productivity, etc.). However, it is important to note that the broad availability of apps of a certain genre does not automatically translate to popularity. It is entirely possible that the App Store is flooded with game apps that nobody wants to play. The availability of these apps may not match the demand for them, and we will keep this in mind as we proceed with our analysis.

**Google Play**

In [17]:
display_table(android_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

**Google Play Analysis** 

In the Google Play Store, the most common app category is "Family," at roughly 19% of the free English apps. That is almost double the share of the next most common category, "Game" (about 10%), with "Tools" a close third (about 8%).

Compared to the App Store data, games are still among the most common apps in Google Play. In fact, the Family category is composed mostly of games meant for kids. However, games are nowhere near as dominant here as they are in the App Store. It looks like apps geared towards families tend to be more common.

The Google Play data is clearly less skewed towards a single category than the App Store data is towards games. Practical apps are much better represented in the Google Play Store.

### Most Popular Apps by Genre (App Store) ###

So far, we have analyzed the app genres by how common and how much representation they have in their respective platforms. Now, we will determine the most popular genre of apps by looking at the number of users. The App Store data does not provide us with the number of users, so we are going to use the total number of user ratings instead. 

The code below builds on what we did when analyzing the most common apps. The frequency table supplies the list of genres, and for each genre we compute the average number of user ratings per app.

In [18]:
genres_ios = freq_table(ios_free, -5)

# Create a list to store tuples of (average_n_ratings, genre)
avg_ratings_by_genre = []

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    if len_genre > 0:
        average_n_ratings = total / len_genre
        avg_ratings_by_genre.append((average_n_ratings, genre))

# Sort the list of tuples in descending order based on average ratings
avg_ratings_by_genre.sort(reverse=True)

# Print the sorted results
for avg_rating, genre in avg_ratings_by_genre:
    print(genre, ":", avg_rating)

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


Based on the average number of ratings alone, navigation apps have the most users. However, we must be cautious before concluding that navigation is the most popular genre in the App Store. If we look at the data closely, it is clear that almost all of these ratings belong to just 2 apps: Waze and Google Maps.

In [19]:
for app in ios_free:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Waze and Google Maps alone account for almost 500,000 of the ratings for navigation apps. It's important to keep in mind that for genres with a large number of ratings, the averages are often skewed by a few mainstream apps that happen to dominate the scene. Here are some examples:



In [20]:
for app in ios_free:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [21]:
for app in ios_free:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

We find a similar pattern in the `Reference` and `Social Networking` genres. The Bible app, the Dictionary.com apps, and Google Translate could by themselves make up one of the most popular genres. The same goes for social networking: without big apps like Facebook, Pinterest, and Skype, `Social Networking` would not be near the top of the most popular genres.

In contrast, the `Finance` genre seems to be more evenly distributed. If a company were to set about creating a free app for the App Store, a finance app seems promising. There is demand for finance apps, and the genre is not yet dominated by one or two apps in the way other genres are.

I would also highlight the `Music` genre in our analysis. This genre follows the pattern we observed earlier, where it is dominated by a select few apps. Yet there is still a strong market outside of the mainstream apps due to the sheer popularity of music apps. Smaller music apps such as ringtone makers and karaoke apps see a decent amount of success; they are not nearly as successful as Pandora or Spotify, but may still attract enough users to be profitable.

The cells below show the distribution of ratings within these two genres:

In [35]:
for app in ios_free:
    if app[-5] == 'Finance':
        print(app[1], ':', app[5]) # print name and number of ratings

Chase Mobile℠ : 233270
Mint: Personal Finance, Budget, Bills & Money : 232940
Bank of America - Mobile Banking : 119773
PayPal - Send and request money safely : 119487
Credit Karma: Free Credit Scores, Reports & Alerts : 101679
Capital One Mobile : 56110
Citi Mobile® : 48822
Wells Fargo Mobile : 43064
Chase Mobile : 34322
Square Cash - Send Money for Free : 23775
Capital One for iPad : 21858
Venmo : 21090
USAA Mobile : 19946
TaxCaster – Free tax refund calculator : 17516
Amex Mobile : 11421
TurboTax Tax Return App - File 2016 income taxes : 9635
Bank of America - Mobile Banking for iPad : 7569
Wells Fargo for iPad : 2207
Stash Invest: Investing & Financial Education : 1655
Digit: Save Money Without Thinking About It : 1506
IRS2Go : 1329
Capital One CreditWise - Credit score and report : 1019
U by BB&T : 790
Paribus - Rebates When Prices Drop : 768
KeyBank Mobile : 623
VyStar Mobile Banking for iPhone : 434
Sparkasse - Your mobile branch : 77
VyStar Mobile Banking for iPad : 57
Zaim : 4

In [36]:
for app in ios_free:
    if app[-5] == 'Music':
        print(app[1], ':', app[5]) # print name and number of ratings

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

We have analyzed the popularity of apps in the App Store. Now let's analyze the Google Play market.

### Most Popular Apps by Genre (Google Play) ###

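We begin by generating a frequency table for the Google Play app categories. The table supplies the list of categories, and for each category we compute the average number of installs per app. The install counts are stored as strings such as `'10,000+'`, so the plus signs and commas are removed before converting them to numbers.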

In [23]:
categories_android = freq_table(android_free, 1)

avg_installs_by_category = []

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    if len_category > 0:
        avg_n_installs = total / len_category
        avg_installs_by_category.append((avg_n_installs, category))
        
avg_installs_by_category.sort(reverse=True)

for avg_installs, category in avg_installs_by_category:
    print(category, ':', avg_installs)


COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

**Google Play Analysis**: 

Based on this data, there is a clear top 3 in the Google Play store: `COMMUNICATION`, `VIDEO_PLAYERS`, and `SOCIAL`. However, once again these are categories whose averages are skewed by a few apps that dominate the market. For example, the communication category has many apps with 100 million, 500 million, or even over 1 billion installs:

In [43]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we removed the top communication apps (WhatsApp, Messenger, Skype, etc.), the category's average number of installs would drop significantly. If we were to create an app in the communication, video players, or social categories, it would be nearly impossible to compete against the apps that have already established an oligopoly.
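
The effect of the largest apps on a category average can be checked directly. The following is a minimal sketch, not the notebook's original code: the helper names `parse_installs` and `average_installs`, the `cap` parameter, and the sample rows are assumptions made for illustration. It assumes the same column layout as `android_free`, with the category at index 1 and installs at index 5 stored as strings such as `'1,000,000+'`.

```python
def parse_installs(text):
    """Convert an install bracket such as '1,000,000+' to a float."""
    return float(text.replace(',', '').replace('+', ''))


def average_installs(dataset, category, cap=None,
                     category_idx=1, installs_idx=5):
    """Mean installs for one category.

    If `cap` is given, apps with installs at or above `cap` are skipped.
    Returns 0.0 when no app matches.
    """
    total = 0.0
    count = 0
    for app in dataset:
        if app[category_idx] != category:
            continue
        installs = parse_installs(app[installs_idx])
        if cap is not None and installs >= cap:
            continue
        total += installs
        count += 1
    return total / count if count else 0.0


# Hypothetical rows (not taken from the dataset) in the assumed layout:
# name at index 0, category at index 1, installs at index 5.
sample = [
    ['Giant Chat', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small Chat', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['Tiny Chat', 'COMMUNICATION', '', '', '', '500,000+'],
    ['Some Game', 'GAME', '', '', '', '10,000,000+'],
]

# Average with every app, then with the 100M+ apps left out.
print(average_installs(sample, 'COMMUNICATION'))
print(average_installs(sample, 'COMMUNICATION', cap=100_000_000))
```

On the real data, a call such as `average_installs(android_free, 'COMMUNICATION', cap=100_000_000)` would give the category average without the apps listed above.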

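A game app seems to be a safe recommendation. On both platforms, games consistently rank near the top of the market; on Google Play, the `GAME` category averages about 15.6 million installs per app, the sixth-highest figure overall.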

In [42]:
for app in android_free:
    if app[1] == 'GAME':
        print(app[0], ':', app[5])

Solitaire : 10,000,000+
Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Bubble Witch 3 Saga : 50,000,000+
Race the Traffic Moto : 10,000,000+
Marble - Temple Quest : 10,000,000+
Shooting King : 10,000,000+
Geometry Dash World : 10,000,000+
Jungle Marble Blast : 5,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Block Craft 3D: Building Simulator Games For Free : 50,000,000+
Farm Fruit Pop: Party Time : 1,000,000+
Love Balls : 50,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Paint Hit : 10,000,000+
Snake VS Block : 50,000,000+
Rolly Vortex : 10,000,000+
Woody Puzzle : 1,000,000+
Stack Jump : 10,000,000+
The Cube : 5,000,000+
Extreme Car Driving Simulator : 100,000,000+
Bricks n Balls : 1,000,000+
The Fish Master! : 1,000,000+
Color Road : 10,000,000+
Draw In : 10,000,000+
PLANK! : 500,000+
Looper! : 1,000,000+
Trivia Crack : 100,000,000+
Will it Crush? : 5,000,000+
Tomb of the Mask : 5,000,000+
Baseball Boy! : 10,000,000+
Hello Stars : 10,000,000+
Tank Stars : 1

Another recommendation would be a health and fitness app. Fitness is something that many people seem to be interested in, and the category offers plenty of variety, with many smaller apps seeing considerable success.

In [44]:
for app in android_free:
    if app[1] == 'HEALTH_AND_FITNESS':
        print(app[0], ':', app[5])

Step Counter - Calorie Counter : 500,000+
Lose Belly Fat in 30 Days - Flat Stomach : 5,000,000+
Pedometer - Step Counter Free & Calorie Burner : 1,000,000+
Six Pack in 30 Days - Abs Workout : 10,000,000+
Lose Weight in 30 Days : 10,000,000+
Pedometer : 10,000,000+
LG Health : 10,000,000+
Step Counter - Pedometer Free & Calorie Counter : 10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App : 10,000,000+
Sportractive GPS Running Cycling Distance Tracker : 1,000,000+
30 Day Fitness Challenge - Workout at Home : 10,000,000+
Home Workout for Men - Bodybuilding : 1,000,000+
Fat Burning Workout - Home Weight lose : 100,000+
Buttocks and Abdomen : 500,000+
Walking for Weight Loss - Walk Tracker : 100,000+
Running & Jogging : 500,000+
Sleep Sounds : 1,000,000+
Fitbit : 10,000,000+
Lose Belly Fat-Home Abs Fitness Workout : 50,000+
Cycling - Bike Tracker : 500,000+
Abs Training-Burn belly fat : 100,000+
Calorie Counter - EasyFit free : 1,000,000+
Aunjai i lert u : 500,000+
Garmin Connect
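
A long list like the one above is hard to read at a glance, so it can help to count how many apps fall into each install bracket. The following is a minimal sketch under stated assumptions: the function name `install_brackets` and the sample rows are hypothetical, and the column layout (category at index 1, installs at index 5) is assumed to match `android_free`.

```python
from collections import Counter


def install_brackets(dataset, category, category_idx=1, installs_idx=5):
    """Count apps per install bracket for one category.

    Returns a list of (bracket, count) pairs, largest bracket first.
    """
    counts = Counter(
        app[installs_idx] for app in dataset if app[category_idx] == category
    )

    def as_number(bracket):
        # '1,000,000+' -> 1000000, used only for sorting the brackets
        return int(bracket.replace(',', '').replace('+', ''))

    return sorted(counts.items(),
                  key=lambda item: as_number(item[0]),
                  reverse=True)


# Hypothetical rows (not taken from the dataset) in the assumed layout.
fitness_sample = [
    ['Step App', 'HEALTH_AND_FITNESS', '', '', '', '1,000,000+'],
    ['Abs App', 'HEALTH_AND_FITNESS', '', '', '', '10,000,000+'],
    ['Run App', 'HEALTH_AND_FITNESS', '', '', '', '1,000,000+'],
    ['Bike App', 'HEALTH_AND_FITNESS', '', '', '', '500,000+'],
    ['Some Game', 'GAME', '', '', '', '100,000,000+'],
]

for bracket, count in install_brackets(fitness_sample, 'HEALTH_AND_FITNESS'):
    print(bracket, ':', count)
```

Run against `android_free`, the same call would show whether a category is made up mostly of mid-sized apps or dominated by a few very large ones.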

### Final Conclusions ###

In this project, we analyzed data about the most popular apps on the App Store and Google Play. Our goal was to find a genre of apps that we can recommend to a company wanting to create a brand new app. 

First, I would advise a company to look into the music genre. Even though the popularity of music apps, measured by installs and reviews, is inflated by mainstream music apps, the sheer popularity of the music industry leaves room for smaller music apps to succeed. People seem to love listening to music, so perhaps we can create an app that helps them do so. Possible ideas include a song recommender, a tool that keeps music videos playing in the background, or an app dedicated to the music of a popular singer or band.

Second, I would recommend a health and fitness app. Fitness apps are well represented in both the App Store and Google Play. People are clearly interested in taking care of their health and fitness, particularly when it comes to exercise and calorie tracking.

Lastly, a game app would be another recommendation. We can't ignore the popularity of games. As we said earlier, they are always at the top of the market regardless of the platform. However, because there are already so many mobile games out there, we would have to make sure that our game is unique and enjoyable. Perhaps we can get creative and make a musical game, or even a game about health and fitness. The possibilities are endless, but data analysis lets us narrow them down and make an informed decision.