# Profitable App Profiles

The objective of this project is to analyze data from the App Store and Google Play markets to find out which type of app is most profitable.

We focus on free apps, which earn their revenue from ads. For these apps, the kind of app that attracts more users and more clicks per ad is the more profitable one.

## Exploring Data

To start the analysis, we can use free data sets available on websites like [kaggle](https://www.kaggle.com/).

We'll use two data sets: one containing information about apps from [Google Play](https://www.kaggle.com/lava18/google-play-store-apps) and another from the [App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

Let's start by extracting and exploring the data.

In [1]:
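def extract(archive, header=True):
    """Read a CSV file and return its rows (and the header row, if present)."""
    from csv import reader
    # The context manager closes the file once the rows have been read.
    with open(archive, encoding='utf8') as opened_file:
        data = list(reader(opened_file))
    if header:
        return data[1:], data[0]  # (rows, header)
    return data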

In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    for row in dataset[start:end]:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns', len(dataset[0]))
    print()

Below is information about some of the apps available on the App Store and Google Play.

In [3]:
apple_apps, apple_header = extract('AppleStore.csv')
google_apps, google_header = extract('googleplaystore.csv')

explore_data(apple_apps, 0, 4, True)
explore_data(google_apps, 0, 4, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns 16

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live 

As you can see above, each data set has several columns, and each column describes one piece of information about an app. The headers show what each column means.

Here are the headers of both lists, showing what kind of information each column holds.

In [4]:
print(apple_header)
print() # create a line without text to make the headers easier to read
print(google_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
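
The cells in this notebook pick columns by position (for example `row[3]` for reviews or `row[7]` for price). As a minimal sketch, a lookup built from the headers printed above makes those positions explicit. `column_index` is a hypothetical helper for illustration and is not part of the original notebook.

```python
# Headers copied from the output above.
apple_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
                'rating_count_tot', 'rating_count_ver', 'user_rating',
                'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
                'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
google_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                 'Current Ver', 'Android Ver']


def column_index(header):
    """Map each column name to its position in a row (hypothetical helper)."""
    return {name: position for position, name in enumerate(header)}


google_col = column_index(google_header)
apple_col = column_index(apple_header)

print(google_col['Reviews'], google_col['Installs'], google_col['Price'])  # 3 5 7
print(apple_col['track_name'], apple_col['rating_count_tot'], apple_col['prime_genre'])  # 1 5 11
```

With such a lookup, `row[google_col['Price']]` says the same thing as `row[7]` but is easier to check against the header.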


## Data Cleaning

Here we'll remove the data that could cause problems in our analysis.
For example, duplicated apps, non-English apps, or apps that aren't free would distort the results and keep us from reaching our goal.

### Info Missing

Let's start by checking if we have any information missing.

In [5]:
def info_missing(archive):
    expected = len(archive[0])  # the first row is taken as the reference length
    for index, row in enumerate(archive):
        if len(row) != expected:
            print(index)  # index of the row with the wrong length
            print(len(row))  # how many values the row has
            print(expected - len(row))  # how many values are missing

In [6]:
info_missing(google_apps)

10472
12
1


In [7]:
info_missing(apple_apps)

Now that we have found one faulty row, let's inspect and fix it.

In [8]:
print(google_apps[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
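
Pairing that row with the Google Play header shows what went wrong: the `Category` value is absent, so every later value sits one column to the left. A small sketch, using only the header and the row printed above:

```python
google_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                 'Current Ver', 'Android Ver']
broken_row = ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M',
              '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018',
              '1.0.19', '4.0 and up']

# zip stops at the shorter sequence, so the last header gets no value.
for column, value in zip(google_header, broken_row):
    print(f'{column}: {value!r}')

# Header names that have no matching value in the row.
unmatched = google_header[len(broken_row):]
print('Columns without a value:', unmatched)
```

The rating `'1.9'` lands under `Category`, which confirms that the row is shifted rather than simply truncated at the end.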


In [9]:
del(google_apps[10472]) # removes the faulty row from the list (run this cell only once)

Now that the missing-information problem is fixed, we'll look for duplicated apps.

### Duplicated Apps

Let's start with the Google Play apps.

In [10]:
unique_names = []
duplicated_names = []

for row in google_apps:
    names = row[0]
    if names in unique_names:
        duplicated_names.append(names)
    else:
        unique_names.append(names)
        
print('How many duplicated names we have:', len(duplicated_names))
print('Some examples:', (duplicated_names[:10]))

How many duplicated names we have: 1181
Some examples: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


To remove the duplicates, we'll keep only the most recent entry for each app and drop the older ones. We assume that the entry with the highest number of reviews is the most recent, so we need to find that entry for each app.

In [11]:
reviews_max = {}

for row in google_apps:
    name = row[0]
    n_reviews = int(row[3])
    if name not in reviews_max:
        reviews_max[name] = n_reviews
    else:
        if n_reviews > reviews_max[name]:
            reviews_max[name] = n_reviews
            
print('Number of unique apps:',len(reviews_max))

Number of unique apps: 9659


Now that we have a dictionary with every unique app and its maximum number of reviews, it's time to separate those entries from the duplicates. To do this, we'll look for the row of each app that has the highest number of reviews and put it in a list.

In [12]:
google_added = [] # here we'll add the name of the unique ones
google_clean = [] # here we'll add the row of the unique ones

for row in google_apps:
    name = row[0]
    review = int(row[3])
    if review == reviews_max[name] and name not in google_added:
        google_added.append(name)
        google_clean.append(row)
        
print('Number of unique apps:',len(google_added))

Number of unique apps: 9659
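
The two loops above can also be folded into a single pass that stores the whole row instead of only the review count. This is a sketch on made-up rows (the review counts are illustrative, not taken from the data set), and `keep_most_reviewed` is a hypothetical name.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        reviews = int(row[reviews_index])
        # Strict '>' keeps the earlier row when two duplicates tie,
        # like the 'name not in google_added' check in the cell above.
        if name not in best or reviews > int(best[name][reviews_index]):
            best[name] = row
    # Rows come back in order of each name's first appearance.
    return list(best.values())


toy_rows = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '120'],
]
unique_rows = keep_most_reviewed(toy_rows, 0, 3)
print(unique_rows)
```

A dictionary lookup also avoids the repeated `name not in google_added` scan over a list, which gets slower as the list grows.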


Now let's do the same for the App Store.

In [13]:
unique_names2 = []
duplicated_names2 = []

for row in apple_apps:
    names = row[1]
    if names in unique_names2:
        duplicated_names2.append(names)
    else:
        unique_names2.append(names)
        
print('How many duplicated names we have:',len(duplicated_names2))
print('And they are:', duplicated_names2)

How many duplicated names we have: 2
And they are: ['Mannequin Challenge', 'VR Roller Coaster']


In [14]:
reviews_max2 = {}

for row in apple_apps:
    names = row[1]
    n_reviews = int(row[5])
    if names not in reviews_max2:
        reviews_max2[names] = n_reviews
    else:
        if reviews_max2[names] < n_reviews:
            reviews_max2[names] = n_reviews
            
print('Number of unique apps:',len(reviews_max2))

Number of unique apps: 7195


In [15]:
apple_added = []
apple_clean = []

for row in apple_apps:
    name = row[1]
    review = int(row[5])
    if reviews_max2[name] == review and name not in apple_added:
        apple_added.append(name)
        apple_clean.append(row)
        
print('Number of unique apps:',len(apple_added))

Number of unique apps: 7195


Now that we have a list of unique apps for each store, let's create a function to remove the non-English apps from them.

### Non-English Apps

To do this, we'll count the non-ASCII characters in each app name and remove the apps that have more than three of them, because they are probably non-English and could distort the analysis.

In [16]:
def non_english(name):
    # Despite its name, this returns True when the app name looks English.
    non_ascii = 0
    for character in name:
        if ord(character) > 127:  # code points above 127 are outside ASCII
            non_ascii += 1
        if non_ascii > 3:  # more than 3 non-ASCII characters
            return False  # probably non-English, so the app will be removed
    return True

# examples
print(non_english('Instagram'))
print(non_english('电视剧热播'))
print(non_english('Docs To Go™ Free Office Suite'))
print(non_english('Instachat 😜'))

True
False
True
True
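
The same rule can be written more compactly with `str.isascii`, which is available from Python 3.7. This is an equivalent sketch rather than the notebook's own function; `looks_english` and `max_non_ascii` are names chosen for illustration.

```python
def looks_english(name, max_non_ascii=3):
    """Return True when the name has at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for character in name if not character.isascii())
    return non_ascii <= max_non_ascii


examples = ['Instagram', '电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']
for name in examples:
    print(name, looks_english(name))
```

Making the threshold a parameter allows trying stricter or looser limits without editing the function body.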


In [17]:
english_google = []
english_apple = []

for row in google_clean:
    name = row[0]
    if non_english(name):
        english_google.append(row)
        
for row in apple_clean:
    name = row[1]
    if non_english(name):
        english_apple.append(row)
        
print(len(english_google))
print(len(english_apple))

9614
6181


### Paid Apps

Now that we have removed some problems from our lists, let's check whether they contain non-free apps. We'll create a new list with just the free apps.

In [18]:
free_google = []
free_apple = []

for row in english_google:
    price = row[7]
    if price == '0':
        free_google.append(row)

for row in english_apple:
    price = row[4]
    if price == '0.0':
        free_apple.append(row)
        
print(len(free_google))
print(len(free_apple))

8864
3220
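
Comparing against the exact strings `'0'` and `'0.0'` works for these two files, but it depends on how each store writes its prices. A slightly more tolerant sketch converts the text to a number first. It assumes that paid prices may carry a leading `$` (an assumption about the Google Play file, not shown in the output above), and `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    # Assumption: a price may be written with a leading '$', e.g. '$4.99'.
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Text that is not a number (e.g. a value from a shifted row)
        # is treated as not free.
        return False


print(is_free('0'), is_free('0.0'), is_free('$4.99'), is_free('Everyone'))
# True True False False
```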


Now the data is clean, so let's start looking for information about profitable apps.

## Data Research

To create a profitable app, we first need to know which genres users tend to like. So, we'll start by looking for the most common genres in each market.

In [34]:
def freq_table(dataset, index):
    table = {}
    for c in dataset:
        value = c[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    percentage = {}
    for key in table:
        percentage[key] = ((table[key]/len(dataset)) * 100)
        
    return percentage

def display_table(dataset, index):
    table = freq_table(dataset, index)
    ordenate(table)

def ordenate(info):
    info_display = []
    for key in info:
        value = (info[key], key)
        info_display.append(value)
    info_sorted = sorted(info_display, reverse=True)
    for show in info_sorted:
        print(f'{show[1]}: {show[0]}')
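
For comparison, the standard library's `collections.Counter` builds the same kind of table in fewer lines. This is a sketch on made-up rows; `freq_table_counter` is a hypothetical name and is not used elsewhere in the notebook.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs sorted from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]


toy_rows = [['a', 'GAME'], ['b', 'FAMILY'], ['c', 'GAME'], ['d', 'TOOLS']]
table = freq_table_counter(toy_rows, 1)
for value, percentage in table:
    print(f'{value}: {percentage}')
```

One difference to keep in mind: `most_common` leaves tied values in the order they were first seen, while `ordenate` breaks ties by sorting the keys in reverse.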

The functions for building and displaying frequency tables are ready. Let's show the results.

### Frequency of Category at Google Play

In [35]:
display_table(free_google, 1)

FAMILY: 18.907942238267147
GAME: 9.724729241877256
TOOLS: 8.461191335740072
BUSINESS: 4.591606498194946
LIFESTYLE: 3.9034296028880866
PRODUCTIVITY: 3.892148014440433
FINANCE: 3.7003610108303246
MEDICAL: 3.531137184115524
SPORTS: 3.395758122743682
PERSONALIZATION: 3.3167870036101084
COMMUNICATION: 3.2378158844765346
HEALTH_AND_FITNESS: 3.0798736462093865
PHOTOGRAPHY: 2.944494584837545
NEWS_AND_MAGAZINES: 2.7978339350180503
SOCIAL: 2.6624548736462095
TRAVEL_AND_LOCAL: 2.33528880866426
SHOPPING: 2.2450361010830324
BOOKS_AND_REFERENCE: 2.1435018050541514
DATING: 1.861462093862816
VIDEO_PLAYERS: 1.7937725631768955
MAPS_AND_NAVIGATION: 1.3989169675090252
FOOD_AND_DRINK: 1.2409747292418771
EDUCATION: 1.1620036101083033
ENTERTAINMENT: 0.9589350180505415
LIBRARIES_AND_DEMO: 0.9363718411552346
AUTO_AND_VEHICLES: 0.9250902527075812
HOUSE_AND_HOME: 0.8235559566787004
WEATHER: 0.8009927797833934
EVENTS: 0.7107400722021661
PARENTING: 0.6543321299638989
ART_AND_DESIGN: 0.6430505415162455
COMICS: 0.62

The most common category on Google Play is "Family", with about 19% of the free apps, followed by "Game" with about 10%. The rest of the market is spread over practical categories such as Tools, Business and Lifestyle, so apps focused purely on fun do not dominate here. A free app for this platform would therefore probably need to offer some other kind of utility. Since Family is the largest category and much of its content is made for kids, an app for kids looks like the easiest route to a profitable app.

### Most Downloaded Google Play Apps by Category

In [37]:
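# Note: this cell averages over the full google_apps list, not the cleaned free_google list.
categories = freq_table(google_apps, 1)
cat = {}  # category -> average number of installs
for category in categories:
    tot = 0  # sum of installs in this category
    len_category = 0  # number of apps in this category
    for row in google_apps:
        category_app = row[1]
        installs = row[5]
        if category_app == category:
            n = installs.replace('+', '').replace(',', '')  # '10,000+' -> '10000'
            tot += float(n)
            len_category += 1
    cat[category] = tot/len_category
    
ordenate(cat)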

COMMUNICATION: 84359886.95348836
SOCIAL: 47694467.46440678
VIDEO_PLAYERS: 35554301.25714286
PRODUCTIVITY: 33434177.75707547
GAME: 30669601.761363637
PHOTOGRAPHY: 30114172.10447761
TRAVEL_AND_LOCAL: 26623593.58914729
NEWS_AND_MAGAZINES: 26488755.335689045
ENTERTAINMENT: 19256107.382550336
TOOLS: 13585731.809015421
SHOPPING: 12491726.096153846
BOOKS_AND_REFERENCE: 8318050.112554112
PERSONALIZATION: 5932384.647959184
EDUCATION: 5586230.769230769
MAPS_AND_NAVIGATION: 5286729.124087592
FAMILY: 5201959.181034483
WEATHER: 5196347.804878049
HEALTH_AND_FITNESS: 4642441.3841642225
SPORTS: 4560350.255208333
FINANCE: 2395215.120218579
BUSINESS: 2178075.7934782607
FOOD_AND_DRINK: 2156683.0787401577
HOUSE_AND_HOME: 1917187.0568181819
ART_AND_DESIGN: 1912893.8461538462
LIFESTYLE: 1407443.8193717278
DATING: 1129533.3632478632
COMICS: 934769.1666666666
LIBRARIES_AND_DEMO: 741128.3529411765
AUTO_AND_VEHICLES: 625061.305882353
PARENTING: 525351.8333333334
BEAUTY: 513151.88679245283
EVENTS: 249580.640625


The most downloaded categories on Google Play are Communication, Social and Video Players, but these averages are probably inflated by a few very large apps such as WhatsApp, YouTube and Instagram. Competing with those corporations would be hard, so the popular categories where it should be easier to profit are Productivity and Games. According to this list, an app that helps children be more productive, or a game for children, could be a good option to profit on Google Play.
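
The install counts are stored as text such as `'10,000+'`, so the cell above strips the `+` and the commas before averaging. Below is a one-pass sketch of the same calculation on made-up rows that hold only a category and an install count; `parse_installs` and `average_installs` are hypothetical names.

```python
from collections import defaultdict


def parse_installs(text):
    """Turn an install label such as '10,000+' into the integer 10000."""
    return int(text.replace('+', '').replace(',', ''))


def average_installs(rows, category_index, installs_index):
    """Return a dict mapping each category to its average install count."""
    totals = defaultdict(int)  # category -> sum of installs
    counts = defaultdict(int)  # category -> number of apps
    for row in rows:
        category = row[category_index]
        totals[category] += parse_installs(row[installs_index])
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


toy_rows = [['GAME', '10,000+'], ['GAME', '500,000+'], ['TOOLS', '1,000+']]
averages = average_installs(toy_rows, 0, 1)
print(averages)  # {'GAME': 255000.0, 'TOOLS': 1000.0}
```

Each row is visited once here, instead of once per category as in the nested loops.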

### Frequency of Genres at Google Play

In [21]:
display_table(google_apps, -4)  # index -4 is the 'Genres' column; this uses the full google_apps list

Tools: 7.767527675276753
Entertainment: 5.747232472324723
Education: 5.064575645756458
Medical: 4.271217712177122
Business: 4.243542435424354
Productivity: 3.911439114391144
Sports: 3.671586715867159
Personalization: 3.616236162361624
Communication: 3.5701107011070112
Lifestyle: 3.5147601476014763
Finance: 3.3763837638376386
Action: 3.367158671586716
Health & Fitness: 3.1457564575645756
Photography: 3.0904059040590406
Social: 2.7214022140221403
News & Magazines: 2.61070110701107
Shopping: 2.3985239852398523
Travel & Local: 2.370848708487085
Dating: 2.158671586715867
Books & Reference: 2.1309963099630997
Arcade: 2.029520295202952
Simulation: 1.8450184501845017
Casual: 1.7804428044280445
Video Players & Editors: 1.595940959409594
Puzzle: 1.2915129151291513
Maps & Navigation: 1.2638376383763839
Food & Drink: 1.1715867158671587
Role Playing: 1.0055350553505535
Strategy: 0.9870848708487084
Racing: 0.904059040590406
House & Home: 0.8118081180811807
Libraries & Demo: 0.7841328413284132
Auto &

As said before, the easiest way to profit on Google Play seems to be developing apps for kids. Among the genres, however, the most common one is Tools. Combining both findings, the safest way to profit on Google Play is to create an app that kids or their parents can use as a tool.

### Frequency of Genres at App Store

In [22]:
display_table(free_apple, -5)

Games: 58.13664596273293
Entertainment: 7.888198757763975
Photo & Video: 4.968944099378882
Education: 3.6645962732919255
Social Networking: 3.291925465838509
Shopping: 2.608695652173913
Utilities: 2.515527950310559
Sports: 2.142857142857143
Music: 2.049689440993789
Health & Fitness: 2.018633540372671
Productivity: 1.7391304347826086
Lifestyle: 1.5838509316770186
News: 1.3354037267080745
Travel: 1.2422360248447204
Finance: 1.1180124223602486
Weather: 0.8695652173913043
Food & Drink: 0.8074534161490683
Reference: 0.5590062111801243
Business: 0.5279503105590062
Book: 0.43478260869565216
Navigation: 0.18633540372670807
Medical: 0.18633540372670807
Catalogs: 0.12422360248447205


On the App Store, the most common genre is by far 'Games', so the focus on this platform is clearly fun. A free game that shows the player an ad every time they die could be a good example of how to profit in this genre.

### Most Downloaded Apps by Genre at App Store

In [36]:
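# The Apple data set has no install counts, so the total number of ratings (column 5) is used as a proxy.
# Note: the averages are computed over english_apple, which still includes paid apps.
genre = freq_table(free_apple, -5)
for key in genre:
    total = 0  # sum of rating counts for this genre
    len_genre = 0  # number of apps in this genre
    for row in english_apple:
        genre_app = row[-5]
        tot_rating = float(row[5])
        if key == genre_app:
            len_genre +=1
            total += tot_rating
    genre[key] = total/len_genre  # overwrite the percentage with the average

ordenate(genre)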

Social Networking: 60253.84920634921
Music: 29047.109489051094
Reference: 27037.188679245282
Shopping: 26635.011764705883
Finance: 23353.530612244896
Weather: 23145.246376811596
Food & Drink: 19934.386363636364
Navigation: 19370.821428571428
Travel: 19030.183333333334
News: 16980.315789473683
Games: 15595.90442477876
Sports: 15350.913461538461
Photo & Video: 14688.715542521993
Health & Fitness: 10802.157575757576
Book: 10359.2
Lifestyle: 8930.373737373737
Entertainment: 8862.409799554565
Productivity: 8508.089285714286
Utilities: 7927.525821596244
Business: 5149.320754716981
Catalogs: 3465.0
Education: 2472.278048780488
Medical: 648.952380952381


The most downloaded apps on the App Store are the Social Networking ones, but this average is probably inflated by giant apps like Instagram, Facebook and WhatsApp, so it probably isn't safe to try to profit against such big companies. The same happens in the Music genre, because of apps like Spotify, Deezer and Apple Music. So the best options are games (as said before) or, according to this list, Reference, Shopping or Finance apps.
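
One way to see how much a few giant apps can pull an average up is to compare the mean with the median. The numbers below are invented purely for illustration and do not come from either data set.

```python
from statistics import mean, median

# Hypothetical rating counts for one genre: one giant app and four small ones.
rating_counts = [2_000_000, 1_500, 1_200, 900, 400]

genre_mean = mean(rating_counts)
genre_median = median(rating_counts)

print('mean:', genre_mean)
print('median:', genre_median)
```

In this toy genre the mean is dominated by the single large app, while the median stays close to what a typical app receives. Running the same comparison on the real genres would be a way to check the suspicion above.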

## Conclusion

Overall, it looks like the easiest way to make money on both platforms at the same time is to develop a game for kids.

-> The most frequent genre on the App Store is Games.

-> Games is among the top 5 most downloaded categories on Google Play.

-> The most frequent category on Google Play is Family, which includes apps for children.