# Profitable App Profiles for the App Store and Google Play Markets

This project analyzes a sample of data about mobile apps available on Google Play and the App Store. The data sets used are as follows:
- Approximately 10 thousand Android apps from August 2018 (https://www.kaggle.com/lava18/google-play-store-apps)
- Approximately 7 thousand iOS apps from July 2017 (https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

The goal of this project is to help identify and understand what kinds of free apps are likely to attract more users on Google Play and the App Store.

We begin by defining a function that helps to explore the data sets.

In [6]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Now, we will open both data sets and use our function to display a small portion of the information.

In [7]:
from csv import reader

# Open each file with an explicit encoding and close it once it has been read
with open('AppleStore.csv', encoding='utf8') as opened:
    ios = list(reader(opened))
ios_header = ios[0]
ios_apps = ios[1:]

with open('googleplaystore.csv', encoding='utf8') as opened:
    android = list(reader(opened))
android_header = android[0]
android_apps = android[1:]

In [8]:
print(ios_header)
print('\n')
explore_data(ios_apps, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


The iOS data has 7,197 apps and 16 columns. More information about the column names can be found at https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home

In [9]:
print(android_header)
print('\n')
explore_data(android_apps, 0 , 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

The Android app data has 10,841 apps and 13 columns.

***

## Data Cleaning

We will first perform some data cleaning in preparation for data analysis. First, we will detect and delete incorrect data.

### Finding and Removing Invalid Data

In the Discussions tab of the Google Play data, there is a discussion describing an error in row 10472. We will further investigate the error by displaying the row below.

In [10]:
print(android_header)
print(android_apps[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


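As we can see, the Category value is missing from this row, so every later value is shifted one column to the left. As a result, the Rating column holds 19, although the highest possible rating on Google Play is 5. For this reason, we will delete this row.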
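
A row like this can also be caught programmatically. The row printed above has 12 values while the header has 13, so the sketch below flags any row whose length differs from the header's. The helper name `find_malformed_rows` and the shortened sample data are illustrative assumptions, not part of the original analysis.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Shortened, illustrative sample: the second row is missing its Category value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
for index, row in malformed:
    print(index, row)
```

On the full data set, the equivalent call would be `find_malformed_rows(android_header, android_apps)`.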

In [11]:
print(android_apps[10472])
del android_apps[10472] #DO NOT RUN MORE THAN ONCE
print(android_apps[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


Printing the row at the same index before and after the deletion confirms that the intended row was removed.

### Finding and Removing Duplicate Data

In the Discussions tab of the Google Play data, there are reports of duplicate entries in the data set. For example, the app Slack has multiple instances:

In [12]:
for app in android_apps:
    if app[0] == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


To find and build a collection of all the duplicate apps, we will use a for loop to sweep through the entire data set and populate lists depending on whether the app is found to be unique or a duplicate.

In [13]:
duplicate = []
unique = []

for app in android_apps:
    name = app[0]
    if name in unique:
        duplicate.append(name)
    else:
        unique.append(name)  # first time we see this name

print(len(duplicate))

1181


Altogether, we found 1,181 instances where an app occurs more than once.
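
Because `name in unique` scans a list, the loop above slows down as the data set grows. As a hedged alternative, the same kind of count can be obtained with `collections.Counter` from the standard library. The helper `count_duplicates` and the sample names are made up for illustration.

```python
from collections import Counter


def count_duplicates(names):
    """Count extra occurrences: every appearance of a name after its first."""
    counts = Counter(names)
    return sum(n - 1 for n in counts.values())


# Illustrative sample: two extra 'Slack' rows and one extra 'Gmail' row.
sample_names = ['Slack', 'Slack', 'Slack', 'Instagram', 'Gmail', 'Gmail']
n_duplicates = count_duplicates(sample_names)
print(n_duplicates)
```

On the real data, the names would come from `[app[0] for app in android_apps]`.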

We don't want to have multiple instances of the same app when we analyze data, so we must find an ideal way to determine which of the multiple app entries to keep. 

As the Slack output above shows, the entries differ only in the fourth field (index 3), which holds the number of reviews. Rather than removing duplicates at random, we will keep the entry with the most reviews, since a higher review count suggests that the entry was collected most recently.

To remove the duplicates, we will:
- Create a dictionary where each key is a unique app name and the corresponding value is the highest number of reviews for that particular app
- Use this dictionary to create a new data set with only one entry per app

In [14]:
reviews_max = {}

for app in android_apps:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

Previously, we found that there were 1,181 instances of duplicate apps. Thus, the length of our reviews_max dictionary should equal the initial number of apps (10,841) minus the row we deleted (1) minus the number of duplicate entries (1,181): 10,841 - 1 - 1,181 = 9,659.

In [15]:
print(len(reviews_max))

9659


Now, we can use this dictionary to remove the duplicate rows by:
- Creating a new list to store the cleaned data set (android_clean) and an additional list to keep track of the apps already added to it (already_added)
- Looping through all the apps and, for each one:
    - Isolating the app name and number of reviews
    - Adding the app to the cleaned data set only if its number of reviews equals the value stored in reviews_max and its name is not yet in already_added (the second check handles duplicate entries that share the same highest review count)

In [16]:
android_clean = []
already_added = []

for app in android_apps:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

Now, we can explore the clean data set and confirm it has the right amount of data.

In [17]:
explore_data(android_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


There are 9,659 rows, just as expected.
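
The two-step procedure above can also be condensed into a single pass. The following is a minimal sketch, assuming the app name is at index 0 and the review count at index 3, as in the Google Play data. `dedupe_by_max_reviews` is a hypothetical helper, and the sample rows are shortened for illustration.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample_rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Gmail', 'COMMUNICATION', '4.3', '100'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]
deduped = dedupe_by_max_reviews(sample_rows)
for row in deduped:
    print(row)
```

Note that the result is ordered by each app's first appearance, which may differ from the row order in android_clean.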

### Finding and Removing non-English Apps

If we explore both data sets enough, we find that some apps are in languages other than English. As our audience is English-speaking, we would like to find and remove any apps whose names suggest they are not directed toward an English-speaking audience. Some examples are shown below:

In [18]:
print(ios_apps[813])
print(ios_apps[6731])
print(android_clean[4412])
print(android_clean[7940])

['445375097', '爱奇艺PPS -《欢乐颂2》电视剧热播', '224617472', 'USD', '0.0', '14844', '0', '4.0', '0.0', '6.3.3', '17+', 'Entertainment', '38', '5', '3', '1']
['1120021683', '【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜', '77551616', 'USD', '0.0', '0', '0', '0.0', '0.0', '1.3', '12+', 'Games', '38', '0', '1', '1']
['中国語 AQリスニング', 'FAMILY', 'NaN', '21', '17M', '5,000+', 'Free', '0', 'Everyone', 'Education', 'June 22, 2016', '2.4.0', '4.0 and up']
['لعبة تقدر تربح DZ', 'FAMILY', '4.2', '238', '6.8M', '10,000+', 'Free', '0', 'Everyone', 'Education', 'November 18, 2016', '6.0.0.0', '4.1 and up']


One way to remove these non-English apps is to remove apps whose names contain symbols not commonly used in English text. Each character in a string has a corresponding number associated with it. This number can be found using the built-in ord() function.

According to the ASCII system, the numbers corresponding to the characters we commonly use in English text are in the range from 0 to 127. Based on this, we can formulate a function that detects whether a character belongs to this common set of English characters. If an app name contains a character with a corresponding number greater than 127, it probably means the app has a non-English name.
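
To make this concrete, the short sketch below looks up the number that `ord()` returns for a few characters and checks whether it falls in the 0 to 127 range. The characters are arbitrary examples.

```python
# ord() returns the Unicode code point of a single character.
# Code points 0 to 127 coincide with ASCII.
characters = ['a', 'Z', '7', '+', '™', '爱']
code_points = {character: ord(character) for character in characters}

for character, code_point in code_points.items():
    in_ascii_range = code_point <= 127
    print(repr(character), code_point, in_ascii_range)
```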

In [19]:
def is_English(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(is_English('Instagram'))
print(is_English('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


The function seems to work, but since some English app names contain symbols not commonly used in English text (e.g. ™, 😜), the current function identifies these as not English.

In [20]:
print(is_English('Docs To Go™ Free Office Suite'))
print(is_English('Instachat 😜'))

False
False


To minimize data loss, we must determine a better way to identify non-English apps. Below we will modify the is_English function to return False if the app name contains more than three characters that fall outside of the ASCII range.

In [21]:
def is_English(string):
    non_ASCII = 0
    for character in string:
        if ord(character) > 127:
            non_ASCII += 1
    if non_ASCII > 3:
        return False
    else:
        return True

In [22]:
print(is_English('Docs To Go™ Free Office Suite'))
print(is_English('Instachat 😜'))

True
True


In [23]:
print(is_English('Instagram'))
print(is_English('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


The function is not perfect, but we think this is good enough to filter out the non-English apps. Below, we will use our function on both the iOS and Android data sets. 
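
To illustrate where this heuristic falls short, the sketch below restates the same rule under a different name, `is_probably_english`, with an adjustable threshold. Both the name and the `max_non_ascii` parameter are assumptions made for this example, and the app names are made up.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


examples = [
    # English name with one symbol: kept
    'Docs To Go™ Free Office Suite',
    # Two non-ASCII characters only: kept, although not English
    '微信',
    # German written in pure ASCII: kept, although not English
    'Haus und Garten',
    # English name with four emoji: dropped, although English
    'Emoji Party 😜😜😜😜',
]
results = {name: is_probably_english(name) for name in examples}
for name, verdict in results.items():
    print(name, '->', verdict)
```

Short non-English names and non-English names written in ASCII slip through, while English names with many symbols are dropped. For this analysis, those cases are assumed to be rare enough to ignore.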

In [24]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_English(name):
        android_english.append(app)

for app in ios_apps:
    name = app[1]
    if is_English(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 5, True)
print('\n')
explore_data(ios_english, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', 

After completing this step, we are left with 9,614 Android apps and 6,183 iOS apps.

### Isolating Free Apps

As we mentioned previously, we are only interested in apps that are free to download, so the next step is to isolate only the free apps for this data analysis.

In [25]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
    
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
explore_data(android_final, 0, 5, True)
print('\n')
explore_data(ios_final, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', 

After data cleaning is completed, we have 8,864 Android apps and 3,222 iOS apps.
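
Comparing price strings exactly works here because free apps are recorded as '0' in the Google Play data and '0.0' in the App Store data. A slightly more defensive variant, sketched below, converts the price to a number first. It assumes paid prices may carry a leading '$' (as in '$4.99'). `is_free` is a hypothetical helper, not part of the original analysis.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$0.00' equals zero."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices (e.g. an empty string) are treated as not free.
        return False


for price in ['0', '0.0', '$0.00', '$4.99', '']:
    print(repr(price), is_free(price))
```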

*** 

## Data Analysis

As mentioned previously, our aim is to determine the kinds of apps that are likely to attract more users because revenue is highly influenced by the number of people using apps. 

Our validation strategy for an app idea is as follows:
1. Build a minimal Android app and add it to Google Play.
2. If the app has a good response from users, we'll develop it further.
3. If the app is profitable after six months, we'll build an iOS version and add it to the App Store.

As our end goal is to add an app to both the App Store and Google Play, we'll need to find app profiles that are successful on both markets. 

We'll start by finding the most common genres for each market.

### Most Common App Genres

We'll use frequency tables to find the most common app genres.

Inspecting the data sets, we can find that we'll need to build a frequency table for the prime_genre column of the App Store data set and for the Genres and Category columns of the Google Play data set.

We'll build two functions to assist in this:
- One to generate frequency tables that show percentages
- Another to display percentages in descending order
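
For comparison, a percentage table can also be built with `collections.Counter` from the standard library. This is only a sketch of an alternative, not the implementation used in the cells that follow. `percent_table` and the sample rows are invented for the example.

```python
from collections import Counter


def percent_table(rows, index):
    """Map each distinct value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


sample_rows = [['A', 'Games'], ['B', 'Games'], ['C', 'Music'], ['D', 'Games']]
table = percent_table(sample_rows, 1)

# Sort by percentage, largest first, before printing.
for genre, percent in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', round(percent, 1))
```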

In [26]:
def freq_table(dataset, index):
    table = {}  # renamed so it does not shadow the function name
    total = 0
    for item in dataset:
        total += 1
        value = item[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_percent = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percent[key] = percentage

    return table_percent

In [27]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

First, we can use our functions to evaluate the prime_genre column of the App Store data set.

In [28]:
display_table(ios_final, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that of the apps in our iOS data set, more than half (58.2%) fall into the Games genre. This is followed by Entertainment apps at 7.9%, Photo & Video apps at 5.0% and Education apps at 3.7%.

This gives the general impression that a majority of apps are designed for fun rather than practical uses. However, a higher volume of apps in the Games category does not necessarily mean these apps bring in the greatest number of users. 

Next, we can use the functions to evaluate the Genres and Category columns of the Google Play data set.

In [29]:
display_table(android_final, 1) #Category column

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Here, the highest percentage of apps falls into the Family category (18.9%), followed by Games at 9.7%, Tools at 8.5% and Business at 4.6%.

This spread differs somewhat from that of the iOS apps. It appears that more apps are designed for functionality (Tools, Business) rather than just fun (Games). However, apps in the Family category could serve practical purposes, fun, or both.

In [30]:
display_table(android_final, 9) # Genre column

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Within the Genres column, the highest percentage is allocated to Tools (8.4%), followed by Entertainment (6.1%), Education (5.3%), and Business (4.6%).

The difference between the Category and Genres columns of the Google Play data set is not entirely clear, but it seems that the Genres column is more detailed and precise, while the Category column is more general.

At this point, the frequency tables we analyzed showed us that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps.

### Most Popular App Genres

One way to determine the most popular app genres is by calculating the average number of installs for each app genre. For the Google Play data set, there is already an Installs column, quantifying the number of installs per app. However, this information is not provided for the App Store data set. As a result, we will use the total number of user ratings instead.

Let's start with the App Store:

#### Most Popular App Genres on the App Store

To calculate the average number of user ratings per app genre on the App Store, we will:
- Isolate the apps of each genre
- Sum up the user ratings for the apps of that genre
- Divide the sum by the number of apps belonging to that genre

In [32]:
ios_unique_genres = freq_table(ios_final, 11)

In [40]:
for genre in ios_unique_genres:
    total = 0 # sum of user ratings for each genre
    len_genre = 0 # count of apps in genre
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg = total/len_genre
    print(genre + ' : ' + str(avg))

Education : 7003.983050847458
Music : 57326.530303030304
Entertainment : 14029.830708661417
Book : 39758.5
Shopping : 26919.690476190477
Productivity : 21028.410714285714
Sports : 23008.898550724636
Games : 22788.6696905016
Lifestyle : 16485.764705882353
Travel : 28243.8
Weather : 52279.892857142855
Utilities : 18684.456790123455
Social Networking : 71548.34905660378
Food & Drink : 33333.92307692308
Navigation : 86090.33333333333
Medical : 612.0
Business : 7491.117647058823
Finance : 31467.944444444445
Photo & Video : 28441.54375
Health & Fitness : 23298.015384615384
News : 21248.023255813954
Reference : 74942.11111111111
Catalogs : 4004.0


From this analysis, we can see that Navigation apps come out on top with the highest average number of user ratings at 86,090. This is followed by Reference (74,942), Social Networking (71,548), and Music (57,327).

However, this average may be heavily skewed by a few extremely popular apps. Below, we'll investigate how evenly the number of ratings is distributed among Navigation apps.

In [39]:
for app in ios_final:
    if app[11] == 'Navigation':
        print(app[1] + ' : ' + app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Here, we can see that Waze and Google Maps have vastly larger numbers of user ratings than the other Navigation apps. Let's look into a few other genres to see if there are similar patterns.

In [44]:
for app in ios_final:
    if app[11] == 'Reference':
        print(app[1] + ' : ' + app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [42]:
for app in ios_final:
    if app[11] == 'Social Networking':
        print(app[1] + ' : ' + app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [43]:
for app in ios_final:
    if app[11] == 'Music':
        print(app[1] + ' : ' + app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

These all show similar patterns, where some genres are heavily skewed by some giants:
- Reference : Bible, Dictionary.com
- Social Networking : Facebook, Pinterest
- Music : Pandora, Spotify

This indicates that these genres may appear to be more popular than they actually are, because a few very large apps pull the average number of user ratings up. However, among the genres inspected, the skew in the number of ratings appears smallest in the Reference genre, which suggests it might be a more evenly spread, and potentially profitable, genre for our new app.
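
One hedged way to quantify this kind of skew is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below uses the six Navigation rating counts printed earlier and the `statistics` module from the standard library. It is an illustration rather than part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed earlier.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

print('mean  :', round(nav_mean, 2))
print('median:', nav_median)
# A mean far above the median suggests a few apps dominate the genre.
print('mean / median:', round(nav_mean / nav_median, 1))
```

Repeating the comparison for each genre would give a rough, comparable measure of how strongly a few large apps dominate it.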

#### Most Popular App Genres on Google Play

For the Google Play data set, the install counts are not exact. Instead, they are grouped into open-ended ranges, as shown below:

In [35]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


To use this data computationally, we'll first need to convert these strings to integers. We can use the replace() method to strip the commas and plus signs before converting.
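
As a small sketch of that conversion, the hypothetical helper `parse_installs` below strips the separators and converts the remainder to an integer. Since labels such as '1,000,000+' are open-ended, the result should be read as a lower bound rather than an exact count.

```python
def parse_installs(installs_text):
    """Convert a label such as '1,000,000+' to the integer 1000000."""
    cleaned = installs_text.replace(',', '').replace('+', '')
    return int(cleaned)


for label in ['1,000,000+', '500+', '0+', '0']:
    print(label, '->', parse_installs(label))
```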

In [36]:
android_unique_category = freq_table(android_final, 1)

In [41]:
for category in android_unique_category:
    total = 0 # sum of installs per category
    len_category = 0 # count of apps per category
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += int(n_installs)
            len_category += 1
    avg = total / len_category
    print(category + ' : ' + str(avg))

PARENTING : 542603.6206896552
LIBRARIES_AND_DEMO : 638503.734939759
SPORTS : 3638640.1428571427
PRODUCTIVITY : 16787331.344927534
COMICS : 817657.2727272727
DATING : 854028.8303030303
HOUSE_AND_HOME : 1331540.5616438356
BOOKS_AND_REFERENCE : 8767811.894736841
BEAUTY : 513151.88679245283
EDUCATION : 1833495.145631068
TOOLS : 10801391.298666667
VIDEO_PLAYERS : 24727872.452830188
COMMUNICATION : 38456119.167247385
ENTERTAINMENT : 11640705.88235294
WEATHER : 5074486.197183099
AUTO_AND_VEHICLES : 647317.8170731707
LIFESTYLE : 1437816.2687861272
FOOD_AND_DRINK : 1924897.7363636363
ART_AND_DESIGN : 1986335.0877192982
GAME : 15588015.603248259
EVENTS : 253542.22222222222
PERSONALIZATION : 5201482.6122448975
BUSINESS : 1712290.1474201474
FINANCE : 1387692.475609756
TRAVEL_AND_LOCAL : 13984077.710144928
MAPS_AND_NAVIGATION : 4056941.7741935486
PHOTOGRAPHY : 17840110.40229885
MEDICAL : 120550.61980830671
FAMILY : 3695641.8198090694
NEWS_AND_MAGAZINES : 9549178.467741935
SOCIAL : 23253652.12711864

The above analysis shows that the category with the highest average number of installs on Google Play is Communication with 38,456,119. This is followed by Video Players (24,727,872), Social (23,253,652), and Photography (17,840,110).
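
The nested loops above rescan the whole data set once per category. A single-pass alternative can be sketched as follows, assuming the category is at index 1 and the installs label at index 5, as in the Google Play data. `average_installs_by_category` is a hypothetical helper, and the sample rows are placeholders.

```python
def average_installs_by_category(rows, category_index=1, installs_index=5):
    """Return {category: average installs}, reading the rows only once."""
    totals = {}  # category -> sum of installs
    counts = {}  # category -> number of apps
    for row in rows:
        category = row[category_index]
        installs = int(row[installs_index].replace(',', '').replace('+', ''))
        totals[category] = totals.get(category, 0) + installs
        counts[category] = counts.get(category, 0) + 1
    return {category: totals[category] / counts[category] for category in totals}


# Placeholder rows: only the category and installs columns are filled in.
sample_rows = [
    ['App A', 'TOOLS', '', '', '', '1,000+'],
    ['App B', 'TOOLS', '', '', '', '3,000+'],
    ['App C', 'GAME', '', '', '', '500+'],
]
averages = average_installs_by_category(sample_rows)
for category, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(category, ':', avg)
```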

Again, let's see if similar patterns occur here as they did above.

In [48]:
for app in android_final:
    if app[1] == 'COMMUNICATION':
        print(app[0] + ' : ' + app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

A few giants exist here (e.g. WhatsApp, Messenger, Skype) with 1,000,000,000+ installs, in addition to a few smaller giants with 500,000,000+ installs. Let's list them below:

In [61]:
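for app in android_final:
    if app[1] == 'COMMUNICATION':
        # Strip the thousands separators and the trailing '+' so the
        # install count can be converted to an integer
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '')
        # Keep only the apps with at least 500 million installs
        if int(n_installs) >= 500000000:
            print(app[0] + ' : ' + app[5])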

WhatsApp Messenger : 1,000,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+


If we recalculate the average number of installs per app in the Communication category after excluding the apps with 500,000,000 or more installs, we find a much lower average of 9,191,689.

In [56]:
total = 0   # sum of installs for the apps that are kept
length = 0  # number of apps that are kept
for app in android_final:
    if app[1] == 'COMMUNICATION':
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '')
        # Skip the giants (500 million installs or more)
        if int(n_installs) < 500000000:
            total += int(n_installs)
            length += 1
print(total / length)

9191689.13405797
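
Since the same filter is applied to several categories, the calculation could be wrapped in a small function. This is only a sketch under assumptions: `average_installs`, its `cap` parameter, and the rows in `sample_apps` are hypothetical and not part of the original notebook.

```python
def average_installs(apps, category, cap=None):
    """Average installs for one category.

    Apps with `cap` or more installs are left out when `cap` is given.
    Returns None if no app in the category passes the filter.
    """
    total = 0
    count = 0
    for app in apps:
        if app[1] != category:
            continue
        n_installs = int(app[5].replace(',', '').replace('+', ''))
        if cap is not None and n_installs >= cap:
            continue
        total += n_installs
        count += 1
    return total / count if count else None


# Hypothetical rows, not taken from the real data set
sample_apps = [
    ['Giant chat', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Mid chat', 'COMMUNICATION', '', '', '', '10,000,000+'],
    ['Small chat', 'COMMUNICATION', '', '', '', '5,000,000+'],
]

print(average_installs(sample_apps, 'COMMUNICATION'))
print(average_installs(sample_apps, 'COMMUNICATION', cap=500000000))  # 7500000.0
```

On the real data, `average_installs(android_final, 'COMMUNICATION', cap=500000000)` would be expected to reproduce the figure printed above.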


Let's see if there are similar patterns in other categories with a high average number of installs.

In [58]:
for app in android_final:
    if app[1] == 'VIDEO_PLAYERS':
        print(app[0] + ' : ' + app[5])

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

In [62]:
for app in android_final:
    if app[1] == 'VIDEO_PLAYERS':
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '') 
        if int(n_installs) >= 500000000:
            print(app[0] + ' : ' + app[5])

YouTube : 1,000,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+


In [59]:
for app in android_final:
    if app[1] == 'SOCIAL':
        print(app[0] + ' : ' + app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, sti

In [63]:
for app in android_final:
    if app[1] == 'SOCIAL':
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '') 
        if int(n_installs) >= 500000000:
            print(app[0] + ' : ' + app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Google+ : 1,000,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+


Similarly, the Video Players category is dominated by YouTube, Google Play Movies & TV, and MX Player. The Social category has giants including Facebook, Google+, and Instagram. 

Again, these few giants skew the averages, which may make the categories seem much more popular than they actually are.
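
One way to see this skew without picking a cutoff is to compare the mean with the median, which is far less sensitive to a handful of very large values. The sketch below uses Python's standard `statistics` module. The helper `installs_in_category` and the rows in `sample_apps` are hypothetical and only mimic the layout of `android_final`.

```python
from statistics import mean, median


def installs_in_category(apps, category):
    """Return the install counts of one category as integers."""
    return [
        int(app[5].replace(',', '').replace('+', ''))
        for app in apps
        if app[1] == category
    ]


# Hypothetical rows, not taken from the real data set
sample_apps = [
    ['Giant network', 'SOCIAL', '', '', '', '1,000,000,000+'],
    ['Network B', 'SOCIAL', '', '', '', '10,000,000+'],
    ['Network C', 'SOCIAL', '', '', '', '5,000,000+'],
    ['Network D', 'SOCIAL', '', '', '', '1,000,000+'],
    ['Network E', 'SOCIAL', '', '', '', '100,000+'],
]

installs = installs_in_category(sample_apps, 'SOCIAL')
print('mean  :', mean(installs))    # pulled up by the single giant
print('median:', median(installs))  # the middle app in the category
```

A mean far above the median, as in this toy sample, is the signature of a category whose average is driven by a few apps.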

Let's look at the Books and Reference category of Google Play apps to see whether installs are spread more evenly there, as they were in the iOS data set.

In [64]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0] + ' : ' + app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [65]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '') 
        if int(n_installs) >= 500000000:
            print(app[0] + ' : ' + app[5])

Google Play Books : 1,000,000,000+


Here, only one app has 500,000,000 or more installs. This suggests that the Books and Reference category is less dominated by giants and might be a good place to develop a new app.
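
To judge how evenly installs are spread, it may also help to count how many apps fall into each install bucket rather than scanning the full printout. The following is a rough sketch using `collections.Counter`. The function `install_distribution` and the rows in `sample_apps` are hypothetical.

```python
from collections import Counter


def install_distribution(apps, category):
    """Count apps per install bucket, largest bucket value first."""
    buckets = Counter(app[5] for app in apps if app[1] == category)
    return sorted(
        buckets.items(),
        key=lambda item: int(item[0].replace(',', '').replace('+', '')),
        reverse=True,
    )


# Hypothetical rows, not taken from the real data set
sample_apps = [
    ['Big reader', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Reader B', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Dictionary C', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Handbook D', 'BOOKS_AND_REFERENCE', '', '', '', '500,000+'],
]

for bucket, n_apps in install_distribution(sample_apps, 'BOOKS_AND_REFERENCE'):
    print(bucket, ':', n_apps)
```

A category with many apps in the middle buckets and few at the top would support the reading that it is not dominated by a handful of giants.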

***

## Conclusion

In this project, we analyzed iOS and Android apps to determine what type of free app could become the most profitable in both markets. 

We concluded that a book or reference app may be a profitable investment in both the Google Play and App Store markets. Further analysis should be done to look into the details of how to differentiate the new app from existing ones.