# Profitable apps profiles
This document explains the analysis process performed to determine the characteristics of a profitable app published in Google Play and iOS Store. The app main revenue comes from in-app ads.

### Define an explore data function

In [1]:
def explore_data(dataset, start=1, end=4, rows_and_columns=False):
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        print('\n')
        
    dataset_slice = dataset[start:end]
    
    for row in dataset_slice:
        print(row)


### Read both iOS and Android data

In [2]:
android_csv_data = open('/home/li0t/Documents/workspace/data-sets/googleplaystore.csv')
ios_csv_data = open('/home/li0t/Documents/workspace/data-sets/AppleStore.csv')

from csv import reader

# Android Google Play data
read_android_data = reader(android_csv_data)
android_data_list = list(read_android_data)
android_headers = android_data_list[0]
android_data_set = android_data_list[1:]

# iOS App Store data
read_ios_data = reader(ios_csv_data)
ios_data_list = list(read_ios_data)
ios_headers = ios_data_list[0]
ios_data_set = ios_data_list[1:]


### Display data headers and rows and headers count

In [3]:
print('Google Play')
print('\n')
print(android_headers)
print('\n')
explore_data(android_data_set, rows_and_columns=True)
print('\n')

print('========================================================================================================\n')

print('iOS App Store')
print('\n')
print(ios_headers)
print('\n')
explore_data(ios_data_set, rows_and_columns=True)

Google Play


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Number of rows: 10841
Number of columns: 13


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']



iOS App Store


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'

In [4]:
# Checking for reported corrupted row
# print(android_data_set[10472])
print('10471 : ' + str(len(android_data_set[10471])))
print('10472 : ' + str(len(android_data_set[10472])))
print('10473 : ' + str(len(android_data_set[10473])))
print('10474 : ' + str(len(android_data_set[10474])))
print('10475 : ' + str(len(android_data_set[10475])))

10471 : 13
10472 : 12
10473 : 13
10474 : 13
10475 : 13


In [5]:
corrupted_row_index = 10473
corrupted_row = android_data_set[corrupted_row_index]

cols_length = len(android_data_set[0])
if len(corrupted_row) != cols_length:
    del android_data_set[corrupted_row_index]

## Duplicated data
As seen in the following example there are duplicated app names in the android data set:

In [6]:
# Deleting duplicates
android_unique_names = []
android_duplicated_names = []

for app in android_data_set:
    name = app[0]
    if name in android_unique_names:
        android_duplicated_names.append(name)
    else:
        android_unique_names.append(name)

print('Found ' + str(len(android_duplicated_names)) + ' duplicated apps names')
print('Unique apps are: ' + str(len(android_data_set)  - len(android_duplicated_names)))


Found 1181 duplicated apps names
Unique apps are: 9660


# Cleaning data
We'll use the app max reviews as the unique value.

In [7]:
from datetime import datetime

android_duplicated_apps = {}

# Find latest version for each duplicated app
for dup_app_name in android_duplicated_names:
    
    for app in android_data_set:
        app_name = app[0]
    
        if app_name == dup_app_name:

            if app_name not in android_duplicated_apps:
                android_duplicated_apps[app_name] = app
            else: 
                app_reviews = float(app[3])
                dup_app_reviews = float(android_duplicated_apps[app_name][3])        
                
                if app_reviews > dup_app_reviews:
                    android_duplicated_apps[app_name] = app


## Recreating duplicated android app data
An array with unique android apps will be created

In [8]:
android_unique_data_set = []
added_duplicates = []

for app in android_data_set:
    app_name = app[0]
    
    if app_name in android_duplicated_names:
        if app_name not in added_duplicates:
            duplicated_app = android_duplicated_apps[app_name]
            android_unique_data_set.append(duplicated_app)
            added_duplicates.append(app_name)
    else:
        android_unique_data_set.append(app)

print('Unique android names: ' + str(len(android_unique_names)))
print('Unique android apps: ' + str(len(android_unique_data_set)))

Unique android names: 9660
Unique android apps: 9660


## Deleting non-english apps
We'll define a function that checks whether an app name has non-enlgish characters in it, and the use it to clean the apps. 

In [9]:
def isEnglishApp(name):
    nonEnglishChars = 0
    
    for char in name:
        charN = ord(char)
       
        if (charN > 127):
            nonEnglishChars += 1
    
    return nonEnglishChars <= 3

# iOS Apps
# iOS name index: 1
ios_apps = []
for app in ios_data_set:
    app_name = app[1]
    
    if isEnglishApp(app_name):
        ios_apps.append(app)
        
# Android Apps     
# Android name index: 0
android_apps = []        
for app in android_unique_data_set:
    app_name = app[0]
    
    if isEnglishApp(app_name):
        android_apps.append(app)


print(len(ios_apps))
print(len(android_apps))

6183
9615


# Deleting non-free apps
As the analysis will be performed in order to find an optimal profile for an app that has in-app ads revenue, only the free apps will be considered.

In [15]:
# Android Apps   
# Android price index: 7
android_free_apps = []

for app in android_apps:
    price = app[7]

    if price == '0':
        android_free_apps.append(app)

print('Android free apps: ' + str(len(android_free_apps)))

# iOS Apps
# iOS price index: 4
ios_free_apps = []

for app in ios_apps:
    price = float(app[4])

    if price == 0.0 or 0:
        ios_free_apps.append(app)
        
print('iOS free apps: ' + str(len(ios_free_apps)))



Android free apps: 8864
iOS free apps: 3222


# Building profiles
As the app that we are building will have it's main revenue source from advertising, the app must a have as much users as it can. For that we'll count the number of users associated with the different genres to find which genres are more popular.

## Developement strategy
As resources are limited, the first version of the app will be a minimal Android app published in Google Play, if that app proves to be catchy we'll add more features. If after six months after it's release the app becomes profitable, the iOS version will be devolped and published in the App Store.

In [72]:
# Build frecuency tables for app genres
def freq_table(dataset, index, print_freqs=False):
    freqs = {}
    
    for row in dataset:
        value = row[index]
        if value in freqs:
            freqs[value] += 1
        else: 
            freqs[value] = 1
    
    if print_freqs:
        print(freqs)
        
    return freqs

def sort_table(table, index):
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    
    return table_sorted

def display_table(table, index, display_count=10):
    table_sorted = sort_table(table, index)
    for entry in table_sorted[:display_count]:
        print(entry[1], ':', entry[0])
    

        
# Android Apps
# Android "Category" is a single word
print('Android "Category"\n')
android_category_index = 1
android_categories_table = freq_table(android_free_apps, android_category_index)
display_table(android_categories_table, android_category_index)
print('\n')

# Android "Genres" column separates genres with a semicolon ";"
print('Android "Genres"\n')
android_genres_index = 9
android_genres_table = freq_table(android_free_apps, android_genres_index)
display_table(android_genres_table, android_genres_index)
print('\n')

# iOS Apps
# iOS prime_genre index: 11
# iOS prime_genre is a single word
print('iOS "prime_genres"\n')
ios_prime_genre_index = 11
ios_prime_genre_table = freq_table(ios_free_apps, ios_prime_genre_index)
display_table(ios_prime_genre_table, ios_prime_genre_index)
print('\n')



Android "Category"

FAMILY : 1676
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294


Android "Genres"

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294


iOS "prime_genres"

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65




# Users downloads
To correctly profile our app we need not just to find which genres have more reviews, but alse determine which apps are downloaded the most. 

To do that we'll find what's the average download rate for each app genre

In [120]:

def find_genres_avg(dataset, genre_index, downloads_index, parse_downloads=False):
    app_freq_table = freq_table(dataset, genre_index)
    downloads_avg = {}
        
    for app_genre in app_freq_table:
        genre_total = 0
        downloads_total = 0
        
        for row in dataset:
            row_genre = row[genre_index]    
            row_downloads = 0
            
            if parse_downloads:
                row_downloads = float(row[downloads_index].replace('+','').replace(',', ''))
            else:
                row_downloads = float(row[downloads_index])
            
            if row_genre == app_genre:
                genre_total += 1
                downloads_total += row_downloads
        
        downloads_avg[app_genre] =  round(downloads_total / genre_total) 
        
    return downloads_avg
                
                
print('iOS Apps average download per prime_genre\n')
ios_rating_count_tot = 5
ios_genres_avg = find_genres_avg(ios_free_apps, ios_prime_genre_index, ios_rating_count_tot, True)
display_table(ios_genres_avg, ios_prime_genre_index)
print('===============================\n')

print('Android Apps average download per Category\n')
android_installs_index = 5
android_categories_avg = find_genres_avg(android_free_apps, android_category_index, android_installs_index, True)
display_table(android_categories_avg, android_category_index)
print('===============================\n')

print('Android Apps average download per Genres\n')
android_installs_index = 5
android_genres_avg = find_genres_avg(android_free_apps, android_genres_index, android_installs_index, True)
display_table(android_genres_avg, android_genres_index)

display_table(android_genres_avg, android_genres_index)
print('===============================\n')

L = [0, 1, 2, 3, 4]

def find_most_popular():
    most_popular_genre = None
    
    for i, n in enumerate(L):
        ios_most_popular_n = sort_table(ios_genres_avg, ios_prime_genre_index)[n:n+1]
        android_c_most_popular_n = sort_table(android_categories_avg, android_category_index)[n:n+1]
        android_g_most_popular_n = sort_table(android_genres_avg, android_genres_index)[n:n+1]
        
        most_popular_rate_n = round((ios_most_popular_n[0][0]  + android_c_most_popular_n[0][0] + android_g_most_popular_n[0][0]) / 3)
        print('#' +str(n+1) + ' ' + 'average downloads: ' + str(most_popular_rate_n) + ' = (' + ios_most_popular_n[0][1] + ' + ' + android_c_most_popular_n[0][1] + ' + ' + android_g_most_popular_n[0][1] + ')') 
           
 
    
find_most_popular()

iOS Apps average download per prime_genre

Navigation : 86090
Reference : 74942
Social Networking : 71548
Music : 57327
Weather : 52280
Book : 39758
Food & Drink : 33334
Finance : 31468
Photo & Video : 28442
Travel : 28244

Android Apps average download per Category

COMMUNICATION : 38456119
VIDEO_PLAYERS : 24727872
SOCIAL : 23253652
PHOTOGRAPHY : 17840110
PRODUCTIVITY : 16787331
GAME : 15588016
TRAVEL_AND_LOCAL : 13984078
ENTERTAINMENT : 11640706
TOOLS : 10801391
NEWS_AND_MAGAZINES : 9549178

Android Apps average download per Genres

Communication : 38456119
Adventure;Action & Adventure : 35333333
Video Players & Editors : 24947336
Social : 23253652
Arcade : 22888365
Casual : 19569222
Puzzle;Action & Adventure : 18366667
Photography : 17840110
Educational;Action & Adventure : 17016667
Productivity : 16787331
Communication : 38456119
Adventure;Action & Adventure : 35333333
Video Players & Editors : 24947336
Social : 23253652
Arcade : 22888365
Casual : 19569222
Puzzle;Action & Adventure

# Conclusions
As seen above there are similarities in the profiles of popular apps for both Google Play and iOS App Store. Specifically the entertainment, social, and tools apps are the most popular genres with the greatest number of average downloads.

## iOS differences
It's easy to see that the priorities for iOS users diverge from that of the Android users, being more popular the tools over entertainment although the social apps have practically the same protagonisms in both platforms.

## Development strategy
As stated in the beginning of this analysis the first step of the development will be a minimal android app that, only if it becomes profitable will be ported to iOS. 

So let's consider the following facts:
1. The early users of the app will be Android users.
2. The number of downloads per genre is much greater in Android apps.

Given that the most reasonable conclusion is that to develop an app that is most popular on Android primarily, but as the market will expand into iOS eventually we'll also need to consider that genre that shares the most similar download rate in both markets. That said, the greater user base of Android makes comunnication apps an irresistible option. 

**Thus the optimal app profile which it's main revenue comes from advertising is a free English-spoken communication app.**



