# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are more profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our developers to make data-driven decisions about the kind of apps they build.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## Exploring the data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play. Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead, using these 2 data sets:

- A [dataset](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. The data can be downloaded [here](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- A [dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. The data can be downloaded [here](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

We'll start by opening the two datasets and exploring them.

In [1]:
from csv import reader

In [2]:
# Google Play data
# An explicit encoding avoids platform-dependent decoding errors,
# and the with-block closes the file once it has been read.
with open('googleplaystore.csv', encoding='utf8') as google_play:
    android = list(reader(google_play))
android_header = android[0]
android = android[1:]

# iOS App Store data
with open('AppleStore.csv', encoding='utf8') as ios_data:
    ios = list(reader(ios_data))
ios_header = ios[0]
ios = ios[1:]

In [3]:
# Define a function for exploring the data
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
# Printing the first few rows of the android data set
print(android_header)
explore_data(android,0,5,rows_and_columns=True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

The Google Play data set has 10841 rows (excluding the header), one per app entry, and 13 columns. Columns that are likely to be useful for our analysis are: 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', 'Genres'.

In [5]:
# Printing the first few rows of the iOS data set
print(ios_header)
explore_data(ios,0,5,rows_and_columns=True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


The App Store data set has 7197 rows (excluding the header), one per app, and 16 columns. Columns that are likely to be useful for our analysis are: 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'prime_genre'. The column names are a bit tricky to decipher, so more details can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).
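
Because the column names are terse, it can help to address fields by name rather than by position. The sketch below uses `csv.DictReader` from the standard library on a small in-memory sample; the sample rows and the three-column layout are illustrative assumptions, not the full file.

```python
import csv
import io

# Hypothetical extract in the style of AppleStore.csv (only three columns kept).
sample_csv = io.StringIO(
    'track_name,price,prime_genre\n'
    'Facebook,0.0,Social Networking\n'
    'Temple Run,0.0,Games\n'
)

# DictReader uses the first line as field names and yields one dict per row.
rows = list(csv.DictReader(sample_csv))

for row in rows:
    print(row['track_name'], '|', row['prime_genre'])
```

The rest of this notebook keeps the list-of-lists representation, so column indexes such as `app[11]` are used instead.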

## Cleaning the data

Our company only builds app that are free to download and install, and we design them for an English-speaking audience. This means that we'll need to do the following:

Remove non-English apps like Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠.
Remove apps that aren't free
Remove any duplicate apps

The Google Play data set has a dedicated discussion section, and we can see that one of the discussions outlines an error for row 10472. We will check the row and delete the row with an error.

In [6]:
print(android[10472])  # incorrect row
print('\n')
print(android_header)  # header
print('\n')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Row 10472 corresponds to the app Life Made WI-Fi Touchscreen Photo Frame, and we can see that its rating is 19. This is clearly off because the maximum rating for a Google Play app is 5. The row has only 12 values instead of 13: the 'Category' value is missing (as confirmed in the discussion section), so every later value is shifted one column to the left. As a consequence, we'll delete this row.
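
A shifted row like this one can also be found without relying on the discussion section, by comparing each row's length with the header's. The following is a minimal sketch; `find_malformed_rows` and the sample data are hypothetical names introduced for illustration.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Made-up sample: the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Photo Frame', '1.9'],
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
]

print(find_malformed_rows(sample_header, sample_rows))
```

On the real data the call would be `find_malformed_rows(android_header, android)`.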

In [7]:
print(len(android))
del android[10472]  # don't run this more than once
print(len(android))

10841
10840


After checking the Google Play data set, we'll find that some apps have more than one entry. For instance, the application Instagram has four entries:

In [8]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [9]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


The loop above shows that there are 1181 duplicate entries in the Google Play data set.

Based on the duplicates for Instagram, we can see that the number of reviews (4th column) varies between entries. As entries with more reviews are likely to be more recent, we will keep those rather than removing duplicates at random.

To remove the duplicates, we will do the following:

a) Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.

b) Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

## Removing duplicates

In [10]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))
print('Expected length:', len(android) - 1181)

9659
Expected length: 9659


We now have a dictionary with unique keys for every app and the max number of reviews for each app.

Now, let's use the reviews_max dictionary to remove the duplicates. For the duplicate cases, we'll only keep the entries with the highest number of reviews. In the code cell below:

- We start by initializing two empty lists, android_clean and already_added.
- We loop through the android data set, and for every iteration:
  - We isolate the name of the app and the number of reviews.
  - We add the current row (app) to the android_clean list, and the app name (name) to the already_added list if:
    - The number of reviews of the current app matches the number of reviews of that app as described in the reviews_max dictionary; and
    - The name of the app is not already in the already_added list. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.

In [11]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
            
print(len(android_clean))
    

9659


The number of rows matches the expected length, so we'll use the explore_data function defined earlier to inspect the cleaned data set.

In [12]:
explore_data(android_clean, 0, 3, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
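
The two-step approach above (a dictionary of maximum review counts, then a filtering loop) can also be condensed into a single pass. This is a sketch of an alternative, not the method used in this notebook: `keep_most_reviewed` is a hypothetical helper, and the Box review count in the sample is made up.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only on a strictly higher count, so ties
        # keep the entry that was seen first.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],   # made-up review count
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]

deduplicated = keep_most_reviewed(sample)
print(deduplicated)
```

One difference worth noting: the result is ordered by the first appearance of each app name, whereas the loop above keeps the position of the retained row, so the two lists can be ordered differently even though they hold the same apps.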


### Confirming that iOS data set has no duplicates

In [13]:
duplicate_ios = []
unique_ios = []

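for app in ios:
    app_id = app[0]
    # Compare each id against the ids already seen in this data set
    # (unique_ios), not against the leftover Android variables.
    if app_id in unique_ios:
        duplicate_ios.append(app_id)
    else:
        unique_ios.append(app_id)

print('Number of duplicate apps:', len(duplicate_ios))
print('\n')
print('Examples of duplicate apps:', duplicate_ios[:15])

Number of duplicate apps: 7197


Examples of duplicate apps: ['284882215', '389801252', '529479190', '420009108', '284035177', '429047995', '282935706', '553834731', '324684580', '343200656', '512939461', '362949845', '359917414', '469369175', '924373886']

The count of 7197 shown above equals the total number of rows and does not indicate real duplicates. It was produced when the condition tested name in unique_apps, two variables left over from the Android cells: name still held the last Android app name, which is always in unique_apps, so every iOS row was flagged. The cell needs to be rerun with the condition above to get the actual count.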


We use English for the apps we develop at our company, and we'd like to analyze only the apps that are designed for an English-speaking audience. However, if we explore the data long enough, we'll find that both datasets have apps with names that suggest they are not designed for an English-speaking audience.

We will now define a function to check whether any of the characters in an app's name are not commonly used in the English language. This is based on the understanding that the numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system.

In [14]:
def check_english(app_name):
    for letter in app_name:
        if ord(letter) > 127:
            return False
    
    return True

In [15]:
# Testing the function
print(check_english('Instagram'))
print(check_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))
print(check_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(check_english('Instachat üòú'))

print(ord('‚Ñ¢'))
print(ord('üòú'))

True
False
False
False
8482
128540


The function seems to work fine, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form.

To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters.

In [16]:
def check_english(app_name):
    non_ascii = 0
    
    for character in app_name:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True
    
print(check_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(check_english('Instachat üòú'))
print(check_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

True
True
False


The revised function seems to work. It may not be 100% accurate, but it should be sufficient for this analysis. Now we'll use it to filter out non-English apps in both data sets.

In [17]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    
    if check_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    
    if check_english(name):
        ios_english.append(app)


explore_data(android_english,0,3,rows_and_columns=True)
explore_data(ios_english,0,3,rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

We are left with 9614 Android and 6183 iOS apps.

## Isolating free apps

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [18]:
android_free = []
ios_free = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_free.append(app)
    
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_free.append(app)
    
explore_data(android_free,0,3,rows_and_columns=True)
print('\n')
explore_data(ios_free,0,3,rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+'

After isolating the free apps, we have 8864 Android apps and 3222 iOS apps, which should be sufficient for our analysis.
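
The comparisons above rely on the exact strings '0' and '0.0', which differ between the two data sets. A slightly more tolerant check converts the price to a number first. This is a sketch under the assumption that paid Google Play prices carry a leading dollar sign; `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Assumption: a price that cannot be parsed is treated as not free.
        return False


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(price, '->', is_free(price))
```

The same function could then be applied to column 7 of the Android rows and column 4 of the iOS rows.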

## Finding the most popular genres

Our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets.

We'll begin the analysis by finding the most common genres for each market, and we will use the following columns in each data set.

Android data set: 'Category','Genres'

Apple data set: 'prime_genre'

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages.
- Another function we can use to display the percentages in descending order.

In [22]:
def freq_table(dataset, index):
    frequency_table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    
    table_percentages = {}
    for key in frequency_table:
        percentage = (frequency_table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
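
The counting step in freq_table can also be written with `collections.Counter` from the standard library. The sketch below is an alternative under the same inputs (a list of rows and a column index); `freq_table_counter` is a hypothetical name and the sample rows are made up.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by descending count.
    return {value: count / total * 100 for value, count in counts.most_common()}


sample = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

shares = freq_table_counter(sample, 1)
for genre, percentage in shares.items():
    print(genre, ':', percentage)
```

Because the dictionary is built in descending order of count, no separate sorting step is needed for display.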
    

### Analysing the genres in the iOS App Store

In [27]:
display_table(ios_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Games are by far the most common genre among free English apps in the App Store, making up more than half of them (58.2%). Entertainment is a distant second at 7.9%.

Most of the apps are designed for fun: games, entertainment, photo and video, social networking, sports and music apps together make up about 78% of free English apps.

However, it's difficult to recommend an app profile based on this frequency table alone as we cannot judge the popularity of the apps.
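
To put a number on how much of the market is taken up by apps designed for fun, the relevant shares from the table above can be added up. The values below are copied from that table and rounded to two decimals; which genres count as "for fun" is an assumption of this sketch.

```python
# Shares (in %) of free English iOS apps, rounded from the frequency table.
ios_genre_share = {
    'Games': 58.16,
    'Entertainment': 7.88,
    'Photo & Video': 4.97,
    'Social Networking': 3.29,
    'Sports': 2.14,
    'Music': 2.05,
}

fun_share = sum(ios_genre_share.values())
print(round(fun_share, 2))  # 78.49
```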

### Analysing categories and genres in the Google Play store

In [28]:
display_table(android_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [29]:
display_table(android_free, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The frequency tables for categories and genres show that apps in the Google Play store are more evenly distributed than in the App Store. Family apps are the most common category (18.9%), followed by games (9.7%) and tools (8.5%). Google Play is not as dominated by apps designed for fun: practical apps (business, productivity, finance, education, etc.) make up a larger proportion of the free English apps than in the App Store. This is evident in both the category and the genre data.

However, again this does not tell us how popular the different kinds of apps are.

## Finding out the popularity of the genres

In order to determine how popular the genres are, we will need to look at the number of installations for each app genre. For the Google Play data, this is found in the 'Installs' column. However, the iOS App Store data set does not have this data, so we will use the number of user ratings as a proxy, which is in the "rating_count_tot" column.

We'll start by doing this for the iOS App Store data set.

In [34]:
ios_genres = freq_table(ios_free, 11)

for genre in ios_genres:
    total = 0
    len_genre = 0
    
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            ratings = float(app[5])
            total += ratings
            len_genre += 1
            
    avg_user_ratings = total / len_genre
    
    print(genre)
    print(avg_user_ratings)

Social Networking
71548.34905660378
Photo & Video
28441.54375
Games
22788.6696905016
Music
57326.530303030304
Reference
74942.11111111111
Health & Fitness
23298.015384615384
Weather
52279.892857142855
Utilities
18684.456790123455
Travel
28243.8
Shopping
26919.690476190477
News
21248.023255813954
Navigation
86090.33333333333
Lifestyle
16485.764705882353
Entertainment
14029.830708661417
Food & Drink
33333.92307692308
Sports
23008.898550724636
Book
39758.5
Finance
31467.944444444445
Education
7003.983050847458
Productivity
21028.410714285714
Business
7491.117647058823
Catalogs
4004.0
Medical
612.0


The two genres with the highest average number of ratings are Navigation and Reference, and both make up a small percentage of free English apps. This suggests that either genre could be a priority for the company. Let's look at the apps in these genres in the App Store to see whether one genre is more suitable than the other.

In [36]:
for app in ios_free:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])

print('\n')

for app in ios_free:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 7

While Navigation apps have the highest average number of ratings, the figure is driven almost entirely by two apps, Waze and Google Maps, so the average is skewed. It may be difficult to compete with such established apps. The Bible app has by far the most ratings among Reference apps, but the remaining ratings are spread across a wider variety of apps. This suggests that there is more potential within this genre, for example reference apps for popular games or activities, which would also complement the large number of games on the App Store.
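
One way to see how strongly a couple of apps pull the average up is to compare the mean with the median. The sketch below uses the six Navigation rating counts printed above and the standard-library `statistics` module.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

print(round(navigation_mean, 2))  # 86090.33
print(navigation_median)          # 8196.5
```

The mean is roughly ten times the median, which is what we would expect when two very large values dominate a small group.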

Other genres that have high numbers of ratings are social networking apps, music, weather, books and finance.

In [43]:
genres = ['Social Networking', 'Music','Weather','Book','Finance']

for g in genres:
    for app in ios_free:
        if app[11] == g:
            print(app[1], ':', app[5])
    print('\n')


Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miito

Looking at the apps within these genres, social networking, weather and music tend to be dominated by a few highly popular apps, and the genres appear saturated with a large number and variety of apps. Finance and weather apps also tend to depend on technical services, information and functions that our company may not be able to offer for free.

Book apps tend to be dominated by e-readers, but there is also potential to gain users if the app offers a niche type of books, functionality or platform for content sharing.

Next, we'll do a similar analysis for the Google Play data set. For this data set, we'll focus on categories, as that column is less granular than genres. The number of installations is provided in bands (e.g. 100,000+, 1,000,000+) rather than as exact counts, so we will need to strip the commas and plus signs before converting the values to numbers. This level of detail is sufficient for our analysis, as we are just trying to find the most popular categories.

In [40]:
android_cat = freq_table(android_free, 1)

for category in android_cat:
    total = 0
    len_category = 0
    
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace(',','')
            installs = installs.replace('+','')
            installs = float(installs)
            total += installs
            len_category += 1
            
    avg_installs = total / len_category
    
    print(category)
    print(avg_installs)

ART_AND_DESIGN
1986335.0877192982
AUTO_AND_VEHICLES
647317.8170731707
BEAUTY
513151.88679245283
BOOKS_AND_REFERENCE
8767811.894736841
BUSINESS
1712290.1474201474
COMICS
817657.2727272727
COMMUNICATION
38456119.167247385
DATING
854028.8303030303
EDUCATION
1833495.145631068
ENTERTAINMENT
11640705.88235294
EVENTS
253542.22222222222
FINANCE
1387692.475609756
FOOD_AND_DRINK
1924897.7363636363
HEALTH_AND_FITNESS
4188821.9853479853
HOUSE_AND_HOME
1331540.5616438356
LIBRARIES_AND_DEMO
638503.734939759
LIFESTYLE
1437816.2687861272
GAME
15588015.603248259
FAMILY
3695641.8198090694
MEDICAL
120550.61980830671
SOCIAL
23253652.127118643
SHOPPING
7036877.311557789
PHOTOGRAPHY
17840110.40229885
SPORTS
3638640.1428571427
TRAVEL_AND_LOCAL
13984077.710144928
TOOLS
10801391.298666667
PERSONALIZATION
5201482.6122448975
PRODUCTIVITY
16787331.344927534
PARENTING
542603.6206896552
WEATHER
5074486.197183099
VIDEO_PLAYERS
24727872.452830188
NEWS_AND_MAGAZINES
9549178.467741935
MAPS_AND_NAVIGATION
4056941.774193
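
The cleaning of the 'Installs' values in the loop above can be isolated in a small helper, which makes it easier to test on its own. This is a sketch; `parse_installs` is a hypothetical name, and the returned number is only the lower bound of the band.

```python
def parse_installs(installs_text):
    """Convert a band such as '1,000,000+' to its lower bound as an integer."""
    cleaned = installs_text.replace(',', '').replace('+', '')
    return int(cleaned)


for band in ['10,000+', '5,000,000+', '1,000,000,000+']:
    print(band, '->', parse_installs(band))
```

Since every app in a band is counted at the band's lower bound, the resulting averages understate the true install counts, which is acceptable for comparing categories with each other.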

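The categories with the most installs per app are communication (about 38.5 million), video players (24.7 million), social (23.3 million) and photography (17.8 million). Productivity (16.8 million) and games (15.6 million) follow, ahead of travel and local (14.0 million). The breakdown below covers communication, travel and local, social, photography and video players.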

In [47]:
categories = ['COMMUNICATION','TRAVEL_AND_LOCAL','SOCIAL','PHOTOGRAPHY','VIDEO_PLAYERS']

for c in categories:
    for app in android_free:
        if app[1] == c:
            print(app[0], ':', app[5])
    print('\n')

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+

Of these five categories, four (communication, travel and local, social, and video players) tend to be dominated by a few key services with very large install counts. These giants pull the average up, so the categories may appear more popular than they actually are, and it would be difficult to compete against such established apps.
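
One way to check how much a few giants inflate a category average is to recompute it after dropping apps above an install threshold. The sketch below is illustrative only: `sample_apps` holds a handful of rows copied from the Communication output above, and the helper names and the 100 million cap are assumptions, not part of the original analysis.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(text.replace(',', '').replace('+', ''))

# A few (name, installs) pairs copied from the Communication output above.
sample_apps = [
    ('WhatsApp Messenger', '1,000,000,000+'),
    ('Google Duo - High Quality Video Calls', '500,000,000+'),
    ('Android Messages', '100,000,000+'),
    ('Contacts', '50,000,000+'),
    ('GMX Mail', '10,000,000+'),
    ('My Tele2', '5,000,000+'),
    ('Seznam.cz', '1,000,000+'),
]

def average_installs(apps, cap=None):
    """Average installs, optionally ignoring apps at or above `cap`."""
    counts = [parse_installs(installs) for _, installs in apps]
    if cap is not None:
        counts = [n for n in counts if n < cap]
    return sum(counts) / len(counts) if counts else 0.0

# Average over every sample app, then over apps below the assumed cap.
print(average_installs(sample_apps))
print(average_installs(sample_apps, cap=100_000_000))
```

For this small sample the average falls from 238 million to 16.5 million once the three apps with at least 100 million installs are excluded, which shows how strongly a few apps can skew the figure.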

Installations for photography seem to be more spread out among different apps, and the apps offer a variety of services. This could be one option for the company to pursue. However, it differs from what is popular on the iOS App Store. A photography app may also require visual editing capabilities that the company might not have.

The Books and Reference category is also relatively popular, with roughly 8.77 million installations per app on average. As this is similar to the genres that are popular in the iOS App Store, it is worth exploring further.

In [48]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+

There is a wide range of book and reference apps, from e-readers to dictionaries and many resource books. It appears that a small number of apps still skew the average, so let's check if this is really the case.

In [54]:
# Install buckets that mark the most popular apps
popular = ['1,000,000,000+', '500,000,000+', '100,000,000+']
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in popular:
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


It appears that there are only a few very popular apps, with installations relatively spread out among the remaining apps. Let's take a look at apps with fewer downloads to see if this is the case. 

In [56]:
# Install buckets for apps with a middling number of installs
avg_popular = ['5,000,000+', '1,000,000+', '500,000+', '100,000+']
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in avg_popular:
        print(app[0], ':', app[5])

Download free book with green book : 100,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
English translation from Bengali : 100,000+
Pdf Book Download - Read Pdf Book : 100,000+
Free Book Reader : 100,000+
Only 30 days in English, the guideline is guaranteed : 500,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dict

There are indeed many apps with more middling numbers of installations. Many of these are reading platforms, dictionaries, or apps related to popular texts or games. While it would be difficult to compete with the established platforms and dictionaries, it may be possible to produce apps built around popular books, games, or activities. This is similar to what we found for the iOS App Store. To make such an app stand out, we would also need to consider additional features that would make it more attractive to users.
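
To see the spread at a glance rather than reading the list line by line, the apps in each install bucket could be counted. The following is a minimal sketch: `sample_books` contains only a few rows copied from the Books and Reference output above, and the names `sample_books`, `bucket_counts` and `bucket_value` are hypothetical.

```python
from collections import Counter

# (name, installs) pairs copied from the Books and Reference output above.
sample_books = [
    ('Google Play Books', '1,000,000,000+'),
    ('Wikipedia', '10,000,000+'),
    ('Cool Reader', '10,000,000+'),
    ('Ebook Reader', '5,000,000+'),
    ('Ancestry', '5,000,000+'),
    ('Book store', '1,000,000+'),
    ('Cloud of Books', '1,000,000+'),
    ('Flybook', '500,000+'),
    ('Litnet - E-books', '100,000+'),
]

# Count how many apps fall into each install bucket.
bucket_counts = Counter(installs for _, installs in sample_books)

def bucket_value(bucket):
    """Numeric lower bound of a bucket label such as '5,000,000+'."""
    return int(bucket.replace(',', '').replace('+', ''))

# Print buckets from the largest to the smallest install count.
for bucket in sorted(bucket_counts, key=bucket_value, reverse=True):
    print(bucket, ':', bucket_counts[bucket])
```

Run against the full `android_free` list instead of the sample, the same counting would show how many Books and Reference apps sit in each bucket.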

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that making apps about popular books or games could be profitable for both the Google Play and the App Store markets. To make the apps stand out from existing ones, we would also need to consider adding some special features besides the raw content about the book or game. These could include quizzes, daily highlights, or platforms for discussing the book or game with other users.

Possible additional analyses to revisit:

Analyze the frequency table for the Genre column of the Google Play dataset, and see if you can find useful patterns.

Assume we could also make revenue via in-app purchases and subscriptions, and try to determine which genres seem to be liked the most by users, for example by examining app ratings.
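
As a starting point for the ratings idea, average ratings could be grouped by genre. The sketch below runs on made-up rows: the genre names, the rating values, and the `'NaN'` marker for a missing rating are all assumptions for illustration, not figures from the dataset.

```python
import math
from collections import defaultdict

# Hypothetical (genre, rating) rows; these values are invented for illustration.
illustrative_rows = [
    ('Books & Reference', '4.5'),
    ('Books & Reference', '4.1'),
    ('Photography', '4.3'),
    ('Photography', 'NaN'),  # assumed marker for a missing rating
    ('Tools', '3.9'),
]

# Collect the usable ratings for each genre.
ratings_by_genre = defaultdict(list)
for genre, rating in illustrative_rows:
    try:
        value = float(rating)
    except ValueError:
        continue  # skip entries that are not numbers at all
    if math.isnan(value):
        continue  # skip missing ratings
    ratings_by_genre[genre].append(value)

# Average rating per genre, computed only from the usable ratings.
avg_rating_by_genre = {
    genre: sum(values) / len(values)
    for genre, values in ratings_by_genre.items()
}

for genre, avg in sorted(avg_rating_by_genre.items(),
                         key=lambda item: item[1], reverse=True):
    print(genre, ':', round(avg, 2))
```

Skipping missing ratings keeps them from distorting the averages. With real data, it would also be worth checking how many apps each genre average is based on.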
