# Profitable App Profiles for the App Store and Google Play Markets

The mobile app industry is ever-changing. The goal of this project is to understand what types of free apps are likely to attract more English-speaking users on Google Play and the App Store. According to a study published on Statista (https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009/), as of December 2019 the Google Android and Apple iOS operating systems jointly held almost 99% of the global market share.

## 1. Understanding the client

The client's revenue is highly influenced by the number of people using its apps. To minimize risks and overhead, its validation strategy for an app idea comprises three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, develop it further.
3. If the app is profitable after six months, build an iOS version of the app and add it to the App Store.

The company's end goal is to add the app to both Google Play and the App Store, so there is a need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

## 2. Methodology

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

The two data sets used for this study are:

1. Data set of 10,000 Google Play Store Apps (googleplaystore.csv) from https://www.kaggle.com/lava18/google-play-store-apps
2. Data set of 7,200 Apple iOS App Store apps (AppleStore.csv) from https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

## 3. Exploring the Data

In [16]:
from csv import reader

#Google Play data set
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

#App Store data set
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
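The loading step can also be written with a context manager, so that each file is closed as soon as it has been read. The sketch below is one possible variant rather than the notebook's own code; `load_csv` is a hypothetical helper name, and the file names are the ones used above.

```python
from csv import reader

def load_csv(path):
    """Read a CSV file and return (header, rows). Hypothetical helper."""
    # newline='' is the setting the csv module documentation recommends for reader()
    with open(path, encoding='utf8', newline='') as f:
        rows = list(reader(f))
    return rows[0], rows[1:]

# Usage, assuming the two Kaggle files are in the working directory:
# android_header, android = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```

Returning the header and the data rows together avoids repeating the slicing step for each data set.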

The function explore_data() is created to print rows in a readable way and can be reused throughout the analysis:

In [291]:
#rows_and_columns is expected to be a Boolean and has False as a default argument
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

#### 3.1 Google Play Store CSV Data Set

In [292]:
print('The headers in the Google Play Store CSV file:')
print('\n')
print(android_header)
print('\n')

print('---------------------------------------------------------------------------')
print('\n')

print('The first three rows of the Google Play Store CSV file:')
print('\n')
explore_data(android, 0, 3, True)

The headers in the Google Play Store CSV file:


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


---------------------------------------------------------------------------


The first three rows of the Google Play Store CSV file:


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10840
Number of columns: 13


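The Google Play data set originally contains 10841 apps. The count of 10840 printed above appears to come from a later re-run of the cell, after the faulty row described in Section 4.1 had been removed. Among the 13 columns in this data set, the columns that are helpful for this analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.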

#### 3.2 Apple App Store CSV Data Set

In [293]:
print('The headers in the Apple Store CSV file:')
print('\n')
print(ios_header)
print('\n')

print('---------------------------------------------------------------------------')
print('\n')

print('The first three rows of the Apple Store CSV file:')
print('\n')
explore_data(ios, 0, 3, True)

The headers in the Apple Store CSV file:


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


---------------------------------------------------------------------------


The first three rows of the Apple Store CSV file:


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


The data provider has defined the column names as follows:

| Column | Name                    | Description                                     |
| -------| ------------------------| ------------------------------------------------|
|    1   | "id"                    | App ID                                          |
|    2   | "track_name"            | App Name                                        |
|    3   | "size_bytes"            | Size (in Bytes)                                 |
|    4   | "currency"              | Currency Type                                   |
|    5   | "price"                 | Price amount                                    |
|    6   | "rating_count_tot"      | User Rating counts (for all versions)           |
|    7   | "rating_count_ver"      | User Rating counts (for current version)        |
|    8   | "user_rating"           | Average User Rating value (for all versions)    |
|    9   | "user_rating_ver"       | Average User Rating value (for current version) |
|    10  | "ver"                   | Latest version code                             |
|    11  | "cont_rating"           | Content Rating                                  |
|    12  | "prime_genre"           | Primary Genre                                   |
|    13  | "sup_devices.num"       | Number of supporting devices                    |
|    14  | "ipadSc_urls.num"       | Number of screenshots shown for display         |
|    15  | "lang.num"              | Number of supported languages                   |
|    16  | "vpp_lic"               | VPP Device Based Licensing Enabled              |


According to the Meraki documentation (https://documentation.meraki.com/SM/Apps_and_Software/Apple_Volume_Purchase_Program_(VPP)), VPP stands for Volume Purchase Program, an Apple portal through which businesses and schools purchase and license apps and books in volume. If the license is enabled, the column shows 1.

There are 7197 apps in the App Store data set. Among the 16 columns in this data set, the columns that are helpful for this analysis are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.

## 4. Data Cleaning

#### 4.1 Removing Wrong Data

As described in the discussion section of the data set source (https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), row 10472 of the Google Play data set contains incorrect data:

In [36]:
print('This is the header row:')
print(android_header)  # header
print('\n')
print('Row 10472:')
print(android[10472])  # incorrect row
print('\n')
print('The first 3 rows of the data set is printed for comparison:')
print(android[0])
print(android[1])
print(android[2])

This is the header row:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Row 10472:
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


The first 3 rows of the data set is printed for comparison:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


The 'Category' value is missing on row 10472, so every later value is shifted one column to the left. As a result, the 'Rating' column holds 19, which is out of range: the maximum rating for a Google Play app is 5. Hence this row will be deleted:

In [37]:
print('The original size of data set:', len(android))

The original size of data set: 10841


#This command must not run more than once
#This command is changed to a markdown after execution to prevent re-run

del android[10472]

In [39]:
print('The size of data set after deletion:', len(android))

The size of data set after deletion: 10840
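Because `del android[10472]` removes a different row each time it runs, a re-runnable alternative is to filter on row length: the faulty row printed above has 12 values, one fewer than the 13-column header. This is a minimal sketch, assuming a missing column is the only kind of malformed row; `drop_malformed_rows` is a hypothetical helper, not part of the original notebook, and the sample rows are made up.

```python
def drop_malformed_rows(dataset, header):
    """Keep only the rows that have as many values as the header has columns."""
    expected = len(header)
    return [row for row in dataset if len(row) == expected]

# Small made-up data set for illustration (not taken from the Kaggle files)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, so the row is too short
    ['App C', 'GAME', '4.5'],
]

cleaned = drop_malformed_rows(sample_rows, sample_header)
print(len(sample_rows), '->', len(cleaned))
```

Running the filter a second time on its own output changes nothing, so the cell does not need to be disabled after execution.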


#### 4.2 Removing Duplicated Entries

Duplicate entries were found in the data set and have to be removed. There are some apps that have more than one entry. For example, Instagram has four entries in the Google Play Data Set:

In [58]:
print(android_header)
print('--------------------------------------------------------------------------------------')
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
--------------------------------------------------------------------------------------
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Looking at the data above, the Instagram entries differ only in the 'Reviews' column, which holds the number of reviews. A likely explanation is that the data was collected at different times. To keep the most recent data, only the row with the highest number of reviews will be kept for each app.

#### 4.2.1 Part 1: Google Play Store Data Set

In the Google Play Store Data Set, there are 1181 duplicated apps.

In [48]:
android_duplicate_apps = []
android_unique_apps = []

for app in android:
    name = app[0]
    if name in android_unique_apps:
        android_duplicate_apps.append(name)
    else:
        android_unique_apps.append(name)

print('Number of duplicate apps:', len(android_duplicate_apps))
print('\n')
print('Examples of duplicate apps:', android_duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Duplicated data negatively impacts data accuracy, so duplicated data should be removed.

To delete duplicated data effectively, we will create a dictionary with the application name as the key and the highest number of reviews for that application as the value. The dictionary with unique application names will be used to create a new data set.

In [108]:
#Building the Google Play Store Dictionary by creating a function

def detect_duplicates(app_profile):
    
    max_reviews = {}

    for app in app_profile:
        name = app[0] # app name lies in the first column in the Google Play Store data set
        n_reviews = float(app[3])
        
        if name in max_reviews and max_reviews[name] < n_reviews:
            max_reviews[name] = n_reviews
        elif name not in max_reviews: max_reviews[name] = n_reviews
        
    return max_reviews

android_max_reviews = detect_duplicates(android)

print('Length before cleaning:', len(android))
print('Expected length:', len(android_unique_apps))
print('Actual length:', len(android_max_reviews))

Length before cleaning: 10840
Expected length: 9659
Actual length: 9659


The following steps are taken to remove duplicates:

1. Create two lists: android_clean and android_added_names
2. Loop through the android data set. In each iteration, add the current row (app) to the android_clean list and the app name (name) to the android_added_names list when both conditions hold:
    - The app name is not already in the android_added_names list
    - The number of reviews of the row matches the number of reviews stored for that app in the android_max_reviews dictionary

The first condition is needed because several rows of the same app can share the highest number of reviews; without it, all of them would be added.

In [305]:
android_clean = list()
android_added_names = list()

for app in android:
    name = app[0] # app name lies in the first column in the Google Play Store data set
    n_reviews = float(app[3])
    
    if (name not in android_added_names) and (android_max_reviews[name] == n_reviews):
        android_clean.append(app)
        android_added_names.append(name) # copy the name to android_added_names list for cross checking

In [307]:
print(android_header)

print('\n')

print('The first three rows of the cleaned Google Play Store file:')
print('\n')
explore_data(android_clean, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


The first three rows of the cleaned Google Play Store file:


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


The number of rows in the cleaned Google Play Store data set is 9659, as expected.
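The two passes above can also be folded into one by storing the whole row, rather than only the review count, under each app name. This is a sketch of an alternative, not the method used in this notebook; `dedupe_by_max_reviews` and its parameters are hypothetical names. The Instagram review counts are the ones printed earlier, and 'Example App' is made up.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the first row when review counts are tied
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Shortened rows for illustration: name, category, rating, reviews
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Example App', 'BUSINESS', '4.0', '120'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
unique_rows = dedupe_by_max_reviews(sample)
print(unique_rows)
```

The rows come back in the order in which each name first appears, which can differ from the row order produced by the two-step approach above.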

#### 4.2.2 Part 2: Apple App Store Data Set

As with the Google Play data set, the App Store data set is checked for duplicates:

In [129]:
ios_duplicate_apps = list()
ios_unique_apps = list()

for app in ios:
    name = app[1] #app name lies in the second column of the Apple Store data set
    if name in ios_unique_apps:
        ios_duplicate_apps.append(name)
    else:
        ios_unique_apps.append(name)

print('Number of duplicate apps:', len(ios_duplicate_apps))
print('Examples of duplicate apps:', ios_duplicate_apps[:15])

Number of duplicate apps: 2
Examples of duplicate apps: ['Mannequin Challenge', 'VR Roller Coaster']


In [299]:
print('Details of the duplicated entries:')
print('\n')
print(ios_header)
print('------------------------------------------------------------------')
for row in ios:
    name = row[1]
    if name == 'Mannequin Challenge': print(row)
    elif name == 'VR Roller Coaster': print(row)

Details of the duplicated entries:


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
------------------------------------------------------------------
['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


As described in the discussion section of the data set source (https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion/90409), the four apps above are not duplicates. Therefore, these four entries will not be deleted and the length of this data set remains unchanged.

In [87]:
print('Length of this data set:', len(ios))

Length of this data set: 7197


The number of rows in the Apple Store data set remains unchanged. There are 7197 rows in this data set.

#### 4.3 Removing non-English Apps

Since the goal of this project is to understand what types of apps are likely to attract more English-speaking users on Google Play and the App Store, the non-English apps in the data sets have to be removed. Below are examples of non-English apps in both data sets:

In [149]:
print('Examples from Google Play Data Set:')
print('-', android_clean[4412][0])
print('-', android_clean[7940][0])
print('\n')
print('Examples from Apple Store Data Set:')
print('-', ios[813][1])
print('- 20 Minutes.fr - l\'actualité en continu')

Examples from Google Play Data Set:
- 中国語 AQリスニング
- لعبة تقدر تربح DZ


Examples from Apple Store Data Set:
- 爱奇艺PPS -《欢乐颂2》电视剧热播
- 20 Minutes.fr - l'actualité en continu


Each character in a string has a corresponding number (its Unicode code point), which the built-in ord() function returns:

In [150]:
print('The corresponding number for character \'ス\' is', ord('ス'))
print('The corresponding number for character \'ح\' is', ord('ح'))
print('The corresponding number for character \'爱\' is', ord('爱'))
print('The corresponding number for character \'é\' is', ord('é'))
print('The corresponding number for character \'a\' is', ord('a'))
print('The corresponding number for character \'A\' is', ord('A'))

The corresponding number for character 'ス' is 12473
The corresponding number for character 'ح' is 1581
The corresponding number for character '爱' is 29233
The corresponding number for character 'é' is 233
The corresponding number for character 'a' is 97
The corresponding number for character 'A' is 65


In ASCII (American Standard Code for Information Interchange), the numbers corresponding to the characters commonly used in English text all fall in the range 0 to 127. However, some English apps have names with emoji or symbols:

In [181]:
print('Examples from Google Play Data Set:')
print('- Instachat 😜')
print('- Kernel Manager for Franco Kernel ✨')
print('- Docs To Go™ Free Office Suite')

print('\n')

print('Examples from Apple Store Data Set:')
print('- Room Escape Game - Santa\'s Room')
print('- SuperCam_Pro')

Examples from Google Play Data Set:
- Instachat 😜
- Kernel Manager for Franco Kernel ✨
- Docs To Go™ Free Office Suite


Examples from Apple Store Data Set:
- Room Escape Game - Santa's Room
- SuperCam_Pro


Some of the numbers corresponding to these symbols fall outside the 0 to 127 range:

In [266]:
print('The corresponding number for character \'😜\' is', ord('😜'))
print('The corresponding number for character \'😍\' is', ord('😍'))
print('The corresponding number for character \'🔥\' is', ord('🔥'))
print('The corresponding number for character \'✨\' is', ord('✨'))
print('The corresponding number for character \'【\' is', ord('【'))
print('The corresponding number for character \'】\' is', ord('】'))
print('The corresponding number for character \'♪\' is', ord('♪'))
print('The corresponding number for character \'♥\' is', ord('♥'))
print('The corresponding number for character \'–\' is', ord('–'))
print('The corresponding number for character \'—\' is', ord('—'))
print('The corresponding number for character \'™\' is', ord('™'))
print('The corresponding number for character \'★\' is', ord('★'))
print('The corresponding number for character \'’\' is', ord('’'))
print('The corresponding number for character \'•\' is', ord('•'))
print('The corresponding number for character \'®\' is', ord('®'))

The corresponding number for character '😜' is 128540
The corresponding number for character '😍' is 128525
The corresponding number for character '🔥' is 128293
The corresponding number for character '✨' is 10024
The corresponding number for character '【' is 12304
The corresponding number for character '】' is 12305
The corresponding number for character '♪' is 9834
The corresponding number for character '♥' is 9829
The corresponding number for character '–' is 8211
The corresponding number for character '—' is 8212
The corresponding number for character '™' is 8482
The corresponding number for character '★' is 9733
The corresponding number for character '’' is 8217
The corresponding number for character '•' is 8226
The corresponding number for character '®' is 174


To minimize the impact of data loss, we will only remove an app if its name has more than three non-ASCII characters:

In [286]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print('The function is_english will return as \'False\' if name has more than three non-ASCII characters:')
print('Example 1: Docs To Go™ Free Office Suite:', is_english('Docs To Go™ Free Office Suite'))
print('Example 2: Instachat 😜:', is_english('Instachat 😜'))
print('Example 3: 爱奇艺PPS -《欢乐颂2》电视剧热播:', is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

The function is_english will return as 'False' if name has more than three non-ASCII characters:
Example 1: Docs To Go™ Free Office Suite: True
Example 2: Instachat 😜: True
Example 3: 爱奇艺PPS -《欢乐颂2》电视剧热播: False
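The threshold is a heuristic, and it cuts both ways. The sketch below restates the count as a separate helper to show the boundary: a name with exactly three non-ASCII characters is kept, while an English name decorated with four emoji would be dropped. `count_non_ascii`, `is_english_name` and the `max_non_ascii` parameter are hypothetical names used for illustration, and the 'Party Time' names are made up.

```python
def count_non_ascii(text):
    """Count the characters whose code point falls outside the ASCII range 0-127."""
    return sum(1 for character in text if ord(character) > 127)

def is_english_name(text, max_non_ascii=3):
    """Treat a name as English if it has at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(text) <= max_non_ascii

print(count_non_ascii('Docs To Go™ Free Office Suite'))
print(is_english_name('Party Time 😜😜😜'))    # three emoji: kept
print(is_english_name('Party Time 😜😜😜😜'))  # four emoji: dropped, although the name is English
```

Exposing the limit as a parameter makes it easy to check how sensitive the row counts are to the chosen threshold.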


In [301]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

print('Google Play Store Apps:')
explore_data(android_english, 0, 3, True)
print('------------------------------------------------------------------')
print('Apple Store Apps:')
explore_data(ios_english, 0, 3, True)

Google Play Store Apps:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
------------------------------------------------------------------
Apple Store Apps:
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


[

#### 4.4 Isolating Free Apps

The purpose of this study is to understand what types of free apps are likely to attract more English-speaking users on Google Play and the App Store. Therefore, the free apps are isolated and the paid apps are removed.

In [303]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7] # the price column 
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4] # the price column 
    if price == '0.0':
        ios_final.append(app)
        
print('Number of Android apps:', len(android_final))
print('Number of iOS apps:', len(ios_final))

Number of Android apps: 8864
Number of iOS apps: 3222


After cleaning, the data sets contain 8864 Android apps and 3222 iOS apps for this analysis.
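Comparing price strings works here because free apps are stored as exactly '0' and '0.0', but it would miss variants such as '0.00'. A more tolerant check converts the text to a number first. This is a sketch under the assumption that paid Google Play prices carry a leading dollar sign (for example '$4.99'); `is_free` is a hypothetical helper, not part of the original notebook.

```python
def is_free(price_text):
    """Return True when a price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    # assumption: the only non-numeric decoration is an optional leading '$'
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0

for price in ['0', '0.0', '$0.00', '$4.99', '2.99']:
    print(price, '->', is_free(price))
```

The same function can then be applied to the price column of both data sets, which removes the need for two different string literals.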

## 5. Analysis

### 5.1 Most Popular Apps by Genre

#### 5.1.1 Breakdown of Google Play Store Data Set

The Genres (index -4) and Category (index 1) columns of the Google Play data set are similar in nature, so a frequency table is generated for each column to choose the better fit. The display_table() function used here is defined in Section 5.1.2.

In [325]:
print('Frequency Table generated by Genre column:')
display_table(android_final, -4)

Frequency Table generated by Genre column:
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718

In [326]:
print('Frequency Table generated by Category column:')
display_table(android_final, 1)

Frequency Table generated by Category column:
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PA

The Genres frequency table displays more granular information. For example, Parenting apps are divided into 4 sub-genres:
- Parenting: 0.4963898916967509
- Parenting;Music & Video: 0.06768953068592057
- Parenting;Education: 0.078971119133574
- Parenting;Brain Games: 0.01128158844765343

The Category frequency table, on the other hand, provides a more general view of popularity by app type. Its categories are also closer to those of the App Store data set. Common types include:
- Health & Fitness
- Food & Drink
- Finance
- Education
- Productivity

Therefore, the Category frequency table will be used for representing the Android data set.
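The sub-genre effect noted above comes from Genres values that join several labels with a semicolon. As an illustration only, the sketch below counts apps by the label before the first semicolon, which collapses the Parenting variants into one group. `primary_genre_counts` is a hypothetical helper and the rows are made up.

```python
def primary_genre_counts(rows, genre_index):
    """Count rows by the first label of a semicolon-separated genre value."""
    counts = {}
    for row in rows:
        primary = row[genre_index].split(';')[0]
        counts[primary] = counts.get(primary, 0) + 1
    return counts

# Made-up rows: app name, genre
sample = [
    ['App A', 'Parenting'],
    ['App B', 'Parenting;Education'],
    ['App C', 'Parenting;Music & Video'],
    ['App D', 'Tools'],
]
print(primary_genre_counts(sample, 1))
```

This would be one way to make the Genres column comparable to Category, although the analysis here simply uses Category directly.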

#### 5.1.2 Breakdown of Apple App Store Data Set

The frequency tables in this analysis are generated from the prime_genre column of the App Store data set and from the Genres and Category columns of the Google Play data set. The two functions below build and display them:

In [443]:
# function to generate frequency tables that show percentages
def frequency_table(dataset, index):
    table = {}
    count = 0
    
    for row in dataset:
        genre = row[index] # column that corresponds to genre
        count += 1
        if genre in table:
            table[genre] += 1
        else:
            table[genre] = 1
    
    table_percentages = {}
    
    for key in table: # display data by percentage
        percentage = (table[key] / count) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

# function to display the percentages in a descending order
def display_table(dataset, index):
    table = frequency_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

Below is the frequency table of App Store Data Set:

In [324]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


'Games' ranks first among the free English apps offered on the App Store, accounting for approximately 58% of them. 'Entertainment', with a share close to 8%, is in second place, followed by 'Photo & Video', which accounts for almost 5%.

Most free English apps on the App Store are designed for fun: 'Games', 'Entertainment' and 'Photo & Video' jointly make up about 71% of them, while productivity and practical apps, such as 'Education', 'Shopping' and 'News', are less common.

In this data set, Google Play offers a more diverse and balanced choice of applications than the App Store. 'Games', 'Entertainment', 'Photography' and 'Video Players' jointly make up only about 15% of free English apps.
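The combined shares quoted above follow directly from the two frequency tables. A quick check, with the percentages copied from the outputs in Sections 5.1.1 and 5.1.2:

```python
# Percentages copied from the App Store prime_genre table
ios_shares = {
    'Games': 58.16263190564867,
    'Entertainment': 7.883302296710118,
    'Photo & Video': 4.9658597144630665,
}

# Percentages copied from the Google Play Category table
android_shares = {
    'GAME': 9.724729241877256,
    'ENTERTAINMENT': 0.9589350180505415,
    'PHOTOGRAPHY': 2.944494584837545,
    'VIDEO_PLAYERS': 1.7937725631768955,
}

ios_total = sum(ios_shares.values())
android_total = sum(android_shares.values())
print('App Store:', round(ios_total, 2))        # about 71
print('Google Play:', round(android_total, 2))  # about 15
```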

### 5.2 Most Popular Type of Free Apps

One way to find out which genres are the most popular, i.e. have the most users, is to calculate the average number of installs for each app genre. For the Google Play data set, this information is in the 'Installs' column. However, it is not available in the App Store data set. As a workaround, the total number of user ratings, found in the 'rating_count_tot' column, will be used as a proxy.

#### 5.2.1 Number of Installs on Google Play Store

The number of installs for an application is a good indicator of an app's popularity. However, the install numbers are not precise as the values are open-ended:

In [531]:
install_range = list()

for row in android_final:
    install = row[5]
    # numbers with commas and plus signs are converted to int
    install = install.replace(',', '')
    install = install.replace('+', '')
    install = int(install)
    if install not in install_range: install_range.append(install)

install_range.sort(reverse=True)

In [532]:
for install in install_range:
    print(install, '+')

1000000000 +
500000000 +
100000000 +
50000000 +
10000000 +
5000000 +
1000000 +
500000 +
100000 +
50000 +
10000 +
5000 +
1000 +
500 +
100 +
50 +
10 +
5 +
1 +
0 +


Since there is no precise data, the values will be rounded down. For example, apps with 100,000+ installs will be counted as having 100,000 installs, and apps with 1,000,000+ installs as having 1,000,000 installs.
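
The rounding-down rule amounts to taking the lower bound of each install bracket. A minimal sketch, assuming the bracket strings look like the ones printed above; the helper name `parse_installs` is hypothetical and does not appear in the original notebook:

```python
def parse_installs(raw):
    """Convert an install bracket such as '1,000,000+' to its lower bound as an int."""
    return int(raw.replace(',', '').replace('+', ''))

# Sample strings in the same format as the 'Installs' column
for raw in ['100,000+', '1,000,000+', '0+']:
    print(raw, '->', parse_installs(raw))
```

Keeping the conversion in one function avoids repeating the two `replace` calls in every cell that reads the 'Installs' column.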

The average number of installs for each category is as follows:

In [561]:
total_installs = {} # total number of installs per category
category_count = {} # number of apps per category

# a single pass over the data set is enough; no outer loop over categories is needed
for app in android_final:
    category_app = app[1]
    installs = app[5]
    # strip commas and plus signs, then convert to float
    installs = installs.replace(',', '')
    installs = installs.replace('+', '')
    installs = float(installs)

    if category_app not in total_installs:
        total_installs[category_app] = installs
        category_count[category_app] = 1
    else:
        total_installs[category_app] += installs
        category_count[category_app] += 1

d1 = total_installs
d2 = category_count
d3 = {x:float(d1[x])/d2[x] for x in d2} # divide total installs of each category by its number of apps to get average installs


# sort categories in descending order by average installs
android_avg_installs = sorted(d3.items(), key=lambda x: x[1], reverse = True)
android_avg_installs

[('COMMUNICATION', 38456119.167247385),
 ('VIDEO_PLAYERS', 24727872.452830188),
 ('SOCIAL', 23253652.127118643),
 ('PHOTOGRAPHY', 17840110.40229885),
 ('PRODUCTIVITY', 16787331.344927534),
 ('GAME', 15588015.603248259),
 ('TRAVEL_AND_LOCAL', 13984077.710144928),
 ('ENTERTAINMENT', 11640705.88235294),
 ('TOOLS', 10801391.298666667),
 ('NEWS_AND_MAGAZINES', 9549178.467741935),
 ('BOOKS_AND_REFERENCE', 8767811.894736841),
 ('SHOPPING', 7036877.311557789),
 ('PERSONALIZATION', 5201482.6122448975),
 ('WEATHER', 5074486.197183099),
 ('HEALTH_AND_FITNESS', 4188821.9853479853),
 ('MAPS_AND_NAVIGATION', 4056941.7741935486),
 ('FAMILY', 3695641.8198090694),
 ('SPORTS', 3638640.1428571427),
 ('ART_AND_DESIGN', 1986335.0877192982),
 ('FOOD_AND_DRINK', 1924897.7363636363),
 ('EDUCATION', 1833495.145631068),
 ('BUSINESS', 1712290.1474201474),
 ('LIFESTYLE', 1437816.2687861272),
 ('FINANCE', 1387692.475609756),
 ('HOUSE_AND_HOME', 1331540.5616438356),
 ('DATING', 854028.8303030303),
 ('COMICS', 81765

The top three categories by average installs on Google Play are:
- Communication
- Video players
- Social

In [565]:
print('Breakdown of Communication Apps:')
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print('-', app[0], ':', app[5])

Breakdown of Communication Apps:
- WhatsApp Messenger : 1,000,000,000+
- imo beta free calls and text : 100,000,000+
- Android Messages : 100,000,000+
- Google Duo - High Quality Video Calls : 500,000,000+
- Messenger – Text and Video Chat for Free : 1,000,000,000+
- imo free video calls and chat : 500,000,000+
- Skype - free IM & video calls : 1,000,000,000+
- Who : 100,000,000+
- GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
- LINE: Free Calls & Messages : 500,000,000+
- Google Chrome: Fast & Secure : 1,000,000,000+
- Firefox Browser fast & private : 100,000,000+
- UC Browser - Fast Download Private & Secure : 500,000,000+
- Gmail : 1,000,000,000+
- Hangouts : 1,000,000,000+
- Messenger Lite: Free Calls & Messages : 100,000,000+
- Kik : 100,000,000+
- KakaoTalk: Free Calls & Text : 100,000,000+
- Opera Mini - fast web browser : 100,000,000+
- Opera Browser: Fast and Secure : 100,000,000+
- Telegram : 100,000,000+
- Truecaller: Caller ID, SMS spam blocking & Dialer : 100,0

The breakdown above shows that the category is dominated by communication giants like WhatsApp Messenger, Gmail and LINE. If we remove all the communication apps with 100 million or more installs, the average drops by a factor of around ten:

In [577]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
avg_under_100_m = sum(under_100_m) / len(under_100_m)

print('Average installations for all Communication Apps: 38456119.167247385')
print('Average installations for Communication Apps having less than 100 million installs:', avg_under_100_m)

Average installations for all Communication Apps: 38456119.167247385
Average installations for Communication Apps having less than 100 million installs: 3603485.3884615386


The same applies to Video Players: the average drops by a factor of approximately 4.5 when market leaders like YouTube, Google Play Movies & TV and Dubsmash are removed:

In [578]:
print('Breakdown of Video Players:')
for app in android_final:
    if app[1] == 'VIDEO_PLAYERS' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print('-', app[0], ':', app[5])

Breakdown of Video Players:
- YouTube : 1,000,000,000+
- Motorola Gallery : 100,000,000+
- VLC for Android : 100,000,000+
- Google Play Movies & TV : 1,000,000,000+
- MX Player : 500,000,000+
- Dubsmash : 100,000,000+
- VivaVideo - Video Editor & Photo Movie : 100,000,000+
- VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
- Motorola FM Radio : 100,000,000+


In [579]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'VIDEO_PLAYERS') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
avg_under_100_m = sum(under_100_m) / len(under_100_m)

print('Average installations for all Video Players: 24727872.452830188')
print('Average installations for Video Players having less than 100 million installs:', avg_under_100_m)

Average installations for all Video Players: 24727872.452830188
Average installations for Video Players having less than 100 million installs: 5544878.133333334
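
The two filters above repeat the same steps for different categories. A minimal sketch of a reusable helper, assuming the column layout used in the cells above (category at index 1, installs at index 5); the name `average_installs`, its `cap` parameter and the toy rows are hypothetical and not part of the original notebook:

```python
def average_installs(dataset, category, cap=None, category_idx=1, installs_idx=5):
    """Average installs of one category, optionally keeping only apps below `cap`."""
    values = []
    for app in dataset:
        if app[category_idx] != category:
            continue
        n_installs = int(app[installs_idx].replace(',', '').replace('+', ''))
        if cap is None or n_installs < cap:
            values.append(n_installs)
    # Return 0.0 for an empty selection instead of dividing by zero
    return sum(values) / len(values) if values else 0.0

# Made-up rows for illustration; only indexes 1 (category) and 5 (installs) matter
toy = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '500,000+'],
    ['App D', 'VIDEO_PLAYERS', '', '', '', '10,000+'],
]

print(average_installs(toy, 'COMMUNICATION'))                   # all apps
print(average_installs(toy, 'COMMUNICATION', cap=100_000_000))  # without the giants
```

Calling the helper with `android_final` and the real category names would replace both hand-written cells and avoid hard-coding the overall averages in the print statements.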


#### 5.2.2 Number of Ratings on Apple App Store

The average number of user ratings per app has been computed for each genre and sorted:

In [439]:
def total_ios_rating(dataset): # accumulated number of ratings for each genre
    table = {}
    for row in dataset:
        genre = row[-5]
        rating = float(row[5]) # number of user ratings for this app
        if genre in table:
            table[genre] += rating
        else:
            table[genre] = rating
    return table

def ios_rating_count(dataset): # number of apps in each genre
    count = {}
    for row in dataset:
        genre = row[-5]
        if genre in count:
            count[genre] += 1
        else:
            count[genre] = 1
    return count

In [440]:
d1 = total_ios_rating(ios_final)
d2 = ios_rating_count(ios_final)
d3 = {x:float(d1[x])/d2[x] for x in d2} # divide accumulated ratings of each genre by its number of apps to get the average number of ratings per app

In [441]:
# sort genres in descending order by average number of ratings
sorted_ios_avg_ratings = sorted(d3.items(), key=lambda x: x[1], reverse = True)
sorted_ios_avg_ratings

[('Navigation', 86090.33333333333),
 ('Reference', 74942.11111111111),
 ('Social Networking', 71548.34905660378),
 ('Music', 57326.530303030304),
 ('Weather', 52279.892857142855),
 ('Book', 39758.5),
 ('Food & Drink', 33333.92307692308),
 ('Finance', 31467.944444444445),
 ('Photo & Video', 28441.54375),
 ('Travel', 28243.8),
 ('Shopping', 26919.690476190477),
 ('Health & Fitness', 23298.015384615384),
 ('Sports', 23008.898550724636),
 ('Games', 22788.6696905016),
 ('News', 21248.023255813954),
 ('Productivity', 21028.410714285714),
 ('Utilities', 18684.456790123455),
 ('Lifestyle', 16485.764705882353),
 ('Entertainment', 14029.830708661417),
 ('Business', 7491.117647058823),
 ('Education', 7003.983050847458),
 ('Catalogs', 4004.0),
 ('Medical', 612.0)]

The top three genres by average number of ratings on the App Store are:
- Navigation
- Reference
- Social Networking
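
The per-genre averages above need a running total and a running count for each genre, which can be gathered in a single pass. A minimal sketch using `collections.defaultdict`, assuming the column layout used in the cells above (rating count at index 5, genre at index -5); the names `average_by_key` and `toy_row` and the sample values are hypothetical:

```python
from collections import defaultdict

def average_by_key(dataset, key_idx, value_idx):
    """Average of the value column for each distinct entry of the key column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        totals[row[key_idx]] += float(row[value_idx])
        counts[row[key_idx]] += 1
    return {key: totals[key] / counts[key] for key in totals}

def toy_row(name, ratings, genre, n_columns=16):
    """Build a made-up row with the name at index 1, ratings at 5 and genre at -5."""
    row = [''] * n_columns
    row[1], row[5], row[-5] = name, str(ratings), genre
    return row

toy = [
    toy_row('Nav A', 300, 'Navigation'),
    toy_row('Nav B', 100, 'Navigation'),
    toy_row('Book A', 50, 'Book'),
]

averages = average_by_key(toy, key_idx=-5, value_idx=5)
for genre, avg in sorted(averages.items(), key=lambda x: x[1], reverse=True):
    print(genre, ':', avg)
```

The same helper would work for the Google Play data by passing the category and installs indexes, provided the install strings are converted to numbers first.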

In [567]:
print('Breakdown of Navigation Apps:')
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Breakdown of Navigation Apps:
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [568]:
print('Breakdown of Reference Apps:')
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings

Breakdown of Reference Apps:
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [570]:
print('Breakdown of Social Networking Apps:')
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Breakdown of Social Networking Apps:
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Vi

## 6. Recommendations

App Store genre ranking by average number of ratings:

In [580]:
sorted_ios_avg_ratings

[('Navigation', 86090.33333333333),
 ('Reference', 74942.11111111111),
 ('Social Networking', 71548.34905660378),
 ('Music', 57326.530303030304),
 ('Weather', 52279.892857142855),
 ('Book', 39758.5),
 ('Food & Drink', 33333.92307692308),
 ('Finance', 31467.944444444445),
 ('Photo & Video', 28441.54375),
 ('Travel', 28243.8),
 ('Shopping', 26919.690476190477),
 ('Health & Fitness', 23298.015384615384),
 ('Sports', 23008.898550724636),
 ('Games', 22788.6696905016),
 ('News', 21248.023255813954),
 ('Productivity', 21028.410714285714),
 ('Utilities', 18684.456790123455),
 ('Lifestyle', 16485.764705882353),
 ('Entertainment', 14029.830708661417),
 ('Business', 7491.117647058823),
 ('Education', 7003.983050847458),
 ('Catalogs', 4004.0),
 ('Medical', 612.0)]

Looking at the App Store list, the first four genres are already dominated by a handful of established apps. Weather information is easily accessible via the Internet these days, so the possibility of generating revenue via in-app purchases for Weather apps is low. The next most popular genre in line is 'Book'.

In [586]:
print('Breakdown of Books Genre:')
for app in ios_final:
    if app[-5] == 'Book':
        print(app[1], ':', app[5])

Breakdown of Books Genre:
Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


The major players in the Book genre are Kindle and Audible. There is relatively little competition, so the genre seems to have potential for development. The selling point of these competitors is the wide selection of ebooks and audiobooks available. We could therefore offer something different to appeal to the audience, for example speedreading. Nowadays many people are so busy that they may not have time to read a 300-page best seller. Similar to book reviews, we could condense the knowledge from a book into a 10 to 15-minute summary for the target audience to digest.

In [581]:
android_avg_installs

[('COMMUNICATION', 38456119.167247385),
 ('VIDEO_PLAYERS', 24727872.452830188),
 ('SOCIAL', 23253652.127118643),
 ('PHOTOGRAPHY', 17840110.40229885),
 ('PRODUCTIVITY', 16787331.344927534),
 ('GAME', 15588015.603248259),
 ('TRAVEL_AND_LOCAL', 13984077.710144928),
 ('ENTERTAINMENT', 11640705.88235294),
 ('TOOLS', 10801391.298666667),
 ('NEWS_AND_MAGAZINES', 9549178.467741935),
 ('BOOKS_AND_REFERENCE', 8767811.894736841),
 ('SHOPPING', 7036877.311557789),
 ('PERSONALIZATION', 5201482.6122448975),
 ('WEATHER', 5074486.197183099),
 ('HEALTH_AND_FITNESS', 4188821.9853479853),
 ('MAPS_AND_NAVIGATION', 4056941.7741935486),
 ('FAMILY', 3695641.8198090694),
 ('SPORTS', 3638640.1428571427),
 ('ART_AND_DESIGN', 1986335.0877192982),
 ('FOOD_AND_DRINK', 1924897.7363636363),
 ('EDUCATION', 1833495.145631068),
 ('BUSINESS', 1712290.1474201474),
 ('LIFESTYLE', 1437816.2687861272),
 ('FINANCE', 1387692.475609756),
 ('HOUSE_AND_HOME', 1331540.5616438356),
 ('DATING', 854028.8303030303),
 ('COMICS', 81765

In the 'BOOKS_AND_REFERENCE' category on Google Play, only five apps have 100 million or more installs.

In [594]:
print('Popular Books and Reference:')
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print('-', app[0], ':', app[5])

Popular Books and Reference:
- Google Play Books : 1,000,000,000+
- Bible : 100,000,000+
- Amazon Kindle : 100,000,000+
- Wattpad 📖 Free Books : 100,000,000+
- Audiobooks from Audible : 100,000,000+


Although there are many providers of books and reference apps, most of them do not have a vast number of downloads.

In [591]:
print('Breakdown of Books and Reference:')
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print('-', app[0], ':', app[5])

Breakdown of Books and Reference:
- E-Book Read - Read Book for free : 50,000+
- Download free book with green book : 100,000+
- Wikipedia : 10,000,000+
- Cool Reader : 10,000,000+
- Free Panda Radio Music : 100,000+
- Book store : 1,000,000+
- FBReader: Favorite Book Reader : 10,000,000+
- English Grammar Complete Handbook : 500,000+
- Free Books - Spirit Fanfiction and Stories : 1,000,000+
- Google Play Books : 1,000,000,000+
- AlReader -any text book reader : 5,000,000+
- Offline English Dictionary : 100,000+
- Offline: English to Tagalog Dictionary : 500,000+
- FamilySearch Tree : 1,000,000+
- Cloud of Books : 1,000,000+
- Recipes of Prophetic Medicine for free : 500,000+
- ReadEra – free ebook reader : 1,000,000+
- Anonymous caller detection : 10,000+
- Ebook Reader : 5,000,000+
- Litnet - E-books : 100,000+
- Read books online : 5,000,000+
- English to Urdu Dictionary : 500,000+
- eBoox: book reader fb2 epub zip : 1,000,000+
- English Persian Dictionary : 500,000+
- Flybook : 500
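
A long listing like the one above is easier to judge when the apps are grouped by install bracket. A minimal sketch using `collections.Counter`, assuming the same column layout (category at index 1, installs at index 5); the helper names and the toy rows are hypothetical and not part of the original notebook:

```python
from collections import Counter

def install_bracket_counts(dataset, category, category_idx=1, installs_idx=5):
    """Count how many apps of one category fall into each install bracket."""
    return Counter(app[installs_idx] for app in dataset
                   if app[category_idx] == category)

def lower_bound(bracket):
    """Numeric lower bound of a bracket such as '1,000,000+'."""
    return int(bracket.replace(',', '').replace('+', ''))

# Made-up rows for illustration; only indexes 1 (category) and 5 (installs) matter
toy = [
    ['Reader A', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Reader B', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Dictionary C', 'BOOKS_AND_REFERENCE', '', '', '', '100,000+'],
    ['Chat D', 'COMMUNICATION', '', '', '', '100,000,000+'],
]

counts = install_bracket_counts(toy, 'BOOKS_AND_REFERENCE')
for bracket in sorted(counts, key=lower_bound, reverse=True):
    print(bracket, ':', counts[bracket])
```

Run against `android_final`, this would show how many Books and Reference apps sit in each bracket without printing every title.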

## 7. Conclusion

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.


Based on the findings, the recommendation is to develop a speedreading application in the 'Book' genre. This feature is relatively new and could avoid direct competition with the market leaders.