# Creating a Popular and Profitable App

In this project, we will determine what kind of app is likely to attract the most users. The app will be free and will earn revenue from ads, so the more people who download it and keep using it, the better!

To find the best app to make, we could sort through the apps in both Google Play and the iOS App Store. However, these stores have well over 4 million apps combined, which would take a significant amount of time to analyze! Instead, we will use two existing data sets:

* A data set with around 10,000 Android apps, found [here](https://www.kaggle.com/lava18/google-play-store-apps).
* A data set with around 7,000 iOS apps, found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

```python
from csv import reader

# First we open the iOS App Store data set and read it into a list of lists.
# encoding="utf8" is assumed here, since both files contain non-ASCII app names.
with open("AppleStore.csv", encoding="utf8") as opened_ios:
    ios_apps = list(reader(opened_ios))
ios_header = ios_apps[0]

# And do the same for the Google Play data set.
with open("googleplaystore.csv", encoding="utf8") as opened_android:
    android_apps = list(reader(opened_android))
android_header = android_apps[0]
```

```python
def explore_data(dataset, start, end, rows_and_columns=False):  # Provided function (not written by me)
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
```

Using the provided `explore_data()` function, let's make sure everything loaded correctly. This function prints a slice of a list of lists and can optionally print the total number of rows and columns.

```python
print("First few rows of App Store data set:", "\n")
explore_data(ios_apps[1:], 0, 5, True)
print("\n", "First few rows of Google Play data set:", "\n")
explore_data(android_apps[1:], 0, 5, True)
```

```
First few rows of App Store data set:

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16

 First few rows of Google Play data set:

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018',
```

Everything looks good! We have 7197 apps listed in our App Store data set, and 10841 apps listed in our Google Play data set. 

## Analysis Decision

Now we need to decide which information in these data sets will be the most helpful for our analysis. The header of each data set tells us what each column contains. Let's view the headers and make a decision.

```python
print("App Store Header:")
print(ios_apps[:1])
print("\n", "Google Play Header:")
print(android_apps[:1])
```

```
App Store Header:
[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']]

 Google Play Header:
[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']]
```

The header for the App Store may be a little confusing. You can find more detailed documentation on each column [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

I expect the most useful columns for our analysis to be "Genres"/"Category" ("prime_genre"), "Rating" ("user_rating"), and "Reviews" ("rating_count_tot"). I chose these columns for a few reasons:
* The higher the number of ratings, the more people have used the app.
* A high rating backed by a large number of ratings suggests an app that is both widely used and well liked.
* The genre tells us which genre markets are the most popular.

## Data Cleaning: Duplicates

Now that both data sets are loaded, we have our goal, and we know how to approach it, we need to clean the data by removing incorrect entries.

Looking through the dedicated [discussion page](https://www.kaggle.com/lava18/google-play-store-apps/discussion) for the Google Play data set, I noticed a [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describing an error in row 10472. Let's check this ourselves and decide whether it's really an error.

```python
# We add one to the index to account for the header row, which the discussion author did not count.
print(android_apps[10472 + 1])
print("\n", "The number of columns in the row in question:", "\n", len(android_apps[10472 + 1]))
print("\n", "The number of columns in our header:", "\n", len(android_apps[0]))
```

```
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

 The number of columns in the row in question:
 12

 The number of columns in our header:
 13
```


Notice that in the row in question, the second column contains a number rather than the category name we expected. The Category value is missing entirely for this entry, so every later value is shifted one column to the left. Printing the lengths confirms this: the row has 12 columns, while the header has 13.
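Since this error came to light through a forum post, it may be worth checking whether other rows have the same problem. The sketch below is a minimal, hypothetical helper (`find_malformed_rows` is not part of the original project); it assumes the first row of the list is the header, as in `android_apps`, and returns the indices of rows whose length differs from it. Run on `android_apps` before the deletion below, the result should include index 10473, the row examined above.

```python
def find_malformed_rows(dataset):
    """Return the indices of rows whose length differs from the header's.

    Assumes dataset[0] is the header row. Indices refer to the full list
    (header included), so they can be used directly with `del`.
    """
    header_length = len(dataset[0])
    malformed = []
    for index, row in enumerate(dataset[1:], start=1):
        if len(row) != header_length:
            malformed.append(index)
    return malformed


# Tiny hypothetical data set: the third row is missing one value.
sample = [
    ["App", "Category", "Rating"],
    ["App A", "TOOLS", "4.1"],
    ["App B", "4.3"],
]
print(find_malformed_rows(sample))  # [2]
```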

Due to this mistake, let's remove this row.

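```python
# Delete the malformed row. Run this cell only once:
# running it again would delete the next (valid) row.
del android_apps[10472 + 1]

print(android_apps[10472 + 1])  # the row that now occupies this index
```

```
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
```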


Continuing to look through the Google Play [discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion), I also noticed several discussions about duplicate entries for some apps, such as Instagram. Let's check this for ourselves too.

```python
for rows in android_apps:  # Find all instances of Instagram in our data set
    if rows[0] == "Instagram":
        print(rows)
        print("\n")
```

```
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
```
Instagram appears four times in `android_apps`.

Instagram may not be the only duplicate, so let's search for more with a loop that iterates through every entry in our Google Play data set and counts the duplicate entries.

The loop takes the first column of each entry (the app name) and adds it to the `unique_apps` list if it isn't there yet. If the name is already in `unique_apps`, the loop adds it to `duplicate_apps` instead.

```python
unique_apps = []
duplicate_apps = []

# Add each app name to unique_apps the first time we see it;
# every later occurrence of the same name goes into duplicate_apps.
for rows in android_apps[1:]:
    name = rows[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print("The number of duplicate apps:", len(duplicate_apps))
print("\n")
print("Duplicate apps:")
print(duplicate_apps[:20])
```

```
The number of duplicate apps: 1181


Duplicate apps:
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']
```


Wow! There are 1181 duplicate entries in our Google Play data set. We are going to want to remove all of these duplicates.
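A side note on speed: `name in unique_apps` scans the whole list on every check, which gets slow as the list grows toward 10,000 names. A set answers the same membership question in roughly constant time. The helper below (`find_duplicates`, a hypothetical name not used elsewhere in this project) is a minimal sketch of the same counting logic with a set; called as `find_duplicates(row[0] for row in android_apps[1:])` at the same point in the notebook, it should report the same number of duplicates as the loop above.

```python
def find_duplicates(names):
    """Return every repeated occurrence of a name, in the order seen.

    Same logic as the unique_apps/duplicate_apps loop, but the "seen"
    names live in a set, so each membership check is roughly constant time.
    """
    seen = set()
    duplicates = []
    for name in names:
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates


print(find_duplicates(["Instagram", "Box", "Instagram", "Slack", "Instagram"]))
# ['Instagram', 'Instagram']
```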

To make our analysis as accurate as possible, we'll keep only the most up-to-date entry for each app. Since an app's review count only grows over time, the entry with the highest number of reviews is most likely the most recent one, so that is the entry we'll keep.

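```python
max_reviews = {}  # app name -> highest review count seen so far

for row in android_apps[1:]:
    name = row[0]
    n_reviews = float(row[3])
    # Store the count the first time we see an app,
    # or replace it when this entry has more reviews.
    if name not in max_reviews or max_reviews[name] < n_reviews:
        max_reviews[name] = n_reviews

# Expected size: all entries minus the duplicates counted above (10840 - 1181 = 9659).
expected_length = len(android_apps[1:]) - len(duplicate_apps)
if len(max_reviews) == expected_length:
    print("The length is correct!")
else:
    print("The length is incorrect!")
```

```
The length is correct!
```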


Above, we created a dictionary named `max_reviews`, where each key is an app name and each value is that app's number of reviews. As the loop goes through the entries, it adds a key-value pair for every app it hasn't seen yet. If the app is already in the dictionary but the current entry has more reviews than the stored value, the loop replaces the stored value with the larger one.

`max_reviews` now maps each app name to its highest review count. We can use this dictionary to build a new list that keeps only the entries whose review count matches that maximum.

```python
android_clean = []
already_added = []

for rows in android_apps[1:]:
    name = rows[0]
    n_reviews = float(rows[3])
    # Keep the entry with the most reviews, but only once per app name.
    if (max_reviews[name] == n_reviews) and (name not in already_added):
        android_clean.append(rows)
        already_added.append(name)

explore_data(android_clean, 0, 2, True)
```

```
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13
```


It appears everything went smoothly. 

We created two new lists. The first, `android_clean`, contains every entry from `android_apps` whose review count matches the maximum stored in `max_reviews`. The second, `already_added`, acts as a guard in our `if` statement. Without it, an app whose duplicate entries share the same (maximum) number of reviews would have been added to `android_clean` more than once.
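The two passes (building `max_reviews`, then filtering) can also be folded into one pass that stores the best row per name directly. The sketch below is hypothetical (`dedupe_keep_most_reviews` is not part of the original project); it assumes the Google Play layout, with the name in column 0 and the review count in column 3. Because it only replaces a stored row when a later row has strictly more reviews, ties keep the first row seen, just like the `already_added` guard. Applied to `android_apps[1:]`, it should keep the same rows as `android_clean`, though possibly in a different order, since rows stay at the position where each name first appeared.

```python
def dedupe_keep_most_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count.

    Hypothetical helper. On ties, the first row with the highest count is
    kept. Output order follows each name's first appearance.
    """
    best = {}  # app name -> row with the most reviews so far
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        if name not in best or float(best[name][reviews_index]) < reviews:
            best[name] = row
    return list(best.values())


# Hypothetical rows: the Instagram review counts come from the output above,
# "App B" is invented for illustration.
sample = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["App B", "TOOLS", "4.0", "100"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Instagram", "SOCIAL", "4.5", "66509917"],
]
for row in dedupe_keep_most_reviews(sample):
    print(row[0], row[3])
# Instagram 66577446
# App B 100
```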

Let's check our App Store data set for duplicates.

```python
unique_ios = []
duplicate_ios = []

for rows in ios_apps[1:]:
    app_id = rows[0]  # the first App Store column is the app's id
    if app_id in unique_ios:
        duplicate_ios.append(app_id)
    else:
        unique_ios.append(app_id)

print("Number of duplicates:", len(duplicate_ios))
```

```
Number of duplicates: 0
```


Excellent. There are no duplicate app IDs in our App Store data set.

## Data Cleaning: Non-English

We want to create an app aimed at an English-speaking audience, but our data sets also contain apps aimed at speakers of other languages. We want to remove these apps.

Every character in a string corresponds to a number (its Unicode code point), which Python's built-in `ord()` function returns. The characters commonly used in English text (letters, digits, punctuation, etc.) fall in the range 0 to 127, according to [ASCII](https://en.wikipedia.org/wiki/ASCII). If a character is outside this range, it is unlikely to be an English character.
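To make the ASCII idea concrete, here are the code points of a few characters. The ASCII ones stay at or below 127, while accented letters, symbols like ™, and emoji land far above it. Python 3.7+ also offers `str.isascii()`, which checks a whole string at once; it is shown here only for comparison and is not used in the rest of the project.

```python
# Code points returned by the built-in ord() for a few characters.
for character in ["A", "z", "~", "é", "™", "😜"]:
    print(character, ord(character))
# A 65
# z 122
# ~ 126
# é 233
# ™ 8482
# 😜 128540

# str.isascii() (Python 3.7+) is True only if every character is <= 127.
print("Instagram".isascii())    # True
print("Docs To Go™".isascii())  # False
```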

Let's create a function that will look at a string and tell us if the characters are or are not English.

```python
def app_english(string):
    for letter in string:
        if ord(letter) > 127:
            return False
    return True

print(app_english("df爱奇艺PPS -《欢乐颂2》电视剧热播"))
print(app_english("Instagram"))
print(app_english("Docs To Go™ Free Office Suite"))
print(app_english("Instachat 😜"))
```

```
False
True
False
False
```


The function returned `False` for some English app names. This is because some English names contain characters above 127; for example, ™ corresponds to 8482, and emoji have even larger code points. Let's revise the function so it only returns `False` if a name contains more than three characters above 127.
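The "more than three" threshold is a heuristic, and it helps to see where it can misfire. The sketch below uses hypothetical helpers (`count_non_ascii` and `looks_english` are not part of the original project) that apply the same rule. A name with four emoji would be rejected even though it is English, and a short non-English name with three or fewer non-ASCII characters would slip through, which is why a few non-English apps may remain after filtering.

```python
def count_non_ascii(text):
    """Count characters whose code point is above 127 (outside ASCII)."""
    return sum(1 for character in text if ord(character) > 127)


def looks_english(text, max_non_ascii=3):
    """Same rule as the revised app_english(): allow up to max_non_ascii
    non-ASCII characters before treating a name as non-English."""
    return count_non_ascii(text) <= max_non_ascii


print(count_non_ascii("Docs To Go™ Free Office Suite"))  # 1
print(looks_english("Instachat 😜"))                     # True
print(looks_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))  # False
# Hypothetical edge cases the threshold gets wrong:
print(looks_english("Party Time 🎉🎉🎉🎉"))               # False (English, 4 emoji)
print(looks_english("爱奇艺"))                            # True (3 non-ASCII characters)
```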

```python
# Redefining app_english replaces the earlier version.
def app_english(string):
    # Count the characters outside the ASCII range (code points above 127).
    non_ascii = 0
    for letter in string:
        if ord(letter) > 127:
            non_ascii += 1

    # Tolerate up to three such characters (e.g. ™ or an emoji).
    return non_ascii <= 3

print(app_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))
print(app_english("Instagram"))
print(app_english("Docs To Go™ Free Office Suite"))
print(app_english("Instachat 😜"))
```

```
False
True
True
True
```


Now that we have a working function to check for English apps, we can use it to put our English apps into a new list.

```python
english_android = []
english_ios = []

for rows in android_clean:
    name = rows[0]  # Google Play: app name is the first column
    if app_english(name):
        english_android.append(rows)

for rows in ios_apps[1:]:
    name = rows[1]  # App Store: app name is the second column ("track_name")
    if app_english(name):
        english_ios.append(rows)

print("English iOS Apps:", len(english_ios))
print("English Android Apps:", len(english_android))
```

```
English iOS Apps: 6183
English Android Apps: 9614
```


Now we have two new lists containing mostly English apps. A few non-English apps may have slipped through, but our analysis will be much more accurate.

## Data Cleaning: Paid Apps

Since we are going to build an app that's free and earns revenue through ads, we'll want to filter out all of the paid apps to further increase our analysis' accuracy.
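The two data sets store prices differently: free Google Play apps have the string `"0"`, while free App Store apps have `"0.0"`, so comparing strings requires a different literal for each. One alternative is to parse the price as a number first. The helper below is a hypothetical sketch (`is_free` is not used in the original project); it assumes prices look like `"0"`, `"0.0"`, `"0.99"`, or `"$4.99"`, stripping a leading dollar sign before converting.

```python
def is_free(price):
    """Return True if a price string represents zero cost.

    Assumes formats like "0", "0.0", "0.99" or "$4.99"; a leading
    dollar sign is stripped before parsing.
    """
    return float(price.lstrip("$")) == 0.0


print(is_free("0"))      # True  (Google Play style)
print(is_free("0.0"))    # True  (App Store style)
print(is_free("$4.99"))  # False
```

With this helper, the same check works for both stores, e.g. `is_free(rows[7])` for Google Play rows and `is_free(rows[4])` for App Store rows.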

```python
free_android = []
free_ios = []

for rows in english_android:
    price = rows[7]  # Google Play "Price" column
    if price == "0":
        free_android.append(rows)

for rows in english_ios:
    price = rows[4]  # App Store "price" column
    if price == "0.0":
        free_ios.append(rows)

print("Free Android Apps:", len(free_android))
print("Free iOS Apps:", len(free_ios))
```

```
Free Android Apps: 8864
Free iOS Apps: 3222
```


## Analysis: Most Common Genres

Now that we have finished cleaning our data, we can begin our analysis. Our validation strategy consists of three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

With these steps in mind, we need to find an app profile that works in both Google Play and the App Store. We can start by finding the most common genres in both markets. The Google Play data set has two columns that seem to describe a genre ("Category" and "Genres"), so we'll look at both.

To do this, we'll write a `freq_table()` function that returns the percentage of apps in each genre of a given data set. The provided `display_table()` function uses it to print the genres from most common to least common.
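For comparison, the standard library's `collections.Counter` can do the counting and sorting in a few lines. The sketch below (`freq_table_percent` is a hypothetical name, not part of the original project) returns `(value, percentage)` pairs ordered from most to least common. One small difference from `display_table()`: `Counter.most_common()` keeps values with equal counts in the order they were first encountered, whereas `display_table()` breaks ties by sorting the names in reverse.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Percentage of rows for each value in column `index`, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Hypothetical rows with the genre in column 1.
sample = [["a", "GAME"], ["b", "GAME"], ["c", "TOOLS"], ["d", "FAMILY"]]
print(freq_table_percent(sample, 1))
# [('GAME', 50.0), ('TOOLS', 25.0), ('FAMILY', 25.0)]
```

On the real data, `freq_table_percent(free_android, 1)` would give the Category percentages as a sorted list.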

```python
def freq_table(dataset, index):
    # Count how many rows have each value in column `index`.
    table = {}
    total = 0

    for rows in dataset:
        freq = rows[index]
        if freq in table:
            table[freq] += 1
        else:
            table[freq] = 1
        total += 1

    # Convert the counts into percentages of all rows.
    for value in table:
        table[value] /= total
        table[value] *= 100

    return table

def display_table(dataset, index):  # Provided function (not written by me)
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print("-------------------------------------", "\n" "Google Play Category Frequency Table:", "\n", "-------------------------------------", "\n")
display_table(free_android, 1)
print("\n", "-------------------------------------", "\n", "Google Play Genres Frequency Table:", "\n", "-------------------------------------", "\n")
display_table(free_android, 9)
print("\n", "-------------------------------------", "\n", "App Store Genres Frequency Table:", "\n", "-------------------------------------", "\n")
display_table(free_ios, 11)
```

```
-------------------------------------
Google Play Category Frequency Table:
 -------------------------------------

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.823
```

Looking through our tables, a few things stand out:
* The App Store has a very large share of games.
* The Google Play store has fewer games and more apps in practical genres.
* The Google Play store has many family-oriented apps.

Let's first analyze the App Store:
Games and Entertainment are by far the most common genres in this store. This means these markets are heavily saturated, and an app would need to be high quality to stand out and attract users. Practical genres have the fewest apps. With fewer apps to compete with, it may be easier to attract users.

Now let's analyze the Google Play store's tables:
The Google Play store has many more apps in practical genres. Tools and Education seem to be much more common in this market. Entertainment is still a runner-up, but it is not nearly as common as in the App Store.

Comparing the two, we can conclude that the Google Play market leans toward practical and educational apps, while the App Store is more entertainment-focused.

This doesn't give us all of the information we need, but it's a good start. We now know what markets are the most saturated and what markets are lacking.

## Analysis: Most Popular Genres 
### App Store

Now that we know what genres are the most common, let's see which of these genres are the most used. Knowing this information will tell us if a genre is common due to the large number of users, or due to heavy saturation.

Let's start with the App Store data set. We can do this by calculating the average number of user ratings (`rating_count_tot`) for each genre. We'd prefer the number of downloads, but that information is not available in the App Store data set.
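The average for each genre is simply the total number of ratings in that genre divided by the number of apps in it. Here is a minimal sketch of that computation in a single pass over the data, keeping running totals and counts in two dictionaries instead of looping over the whole data set once per genre. `average_by_group` is a hypothetical helper, not part of the original project.

```python
def average_by_group(dataset, group_index, value_index):
    """Average of the numeric column `value_index` for each value of `group_index`.

    Builds running totals and counts in one pass over the data.
    """
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical rows: (name, genre, rating count).
sample = [["a", "Games", "100"], ["b", "Games", "300"], ["c", "Weather", "50"]]
print(average_by_group(sample, 1, 2))  # {'Games': 200.0, 'Weather': 50.0}
```

For the App Store data, `average_by_group(free_ios, 11, 5)` groups by `prime_genre` and averages `rating_count_tot`. The Google Play install counts would need to be converted to numbers first, since they are stored as strings like `"1,000,000+"`.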

```python
ios_genres = freq_table(free_ios, 11)  # we only need the genre names (the keys)

for genre in ios_genres:
    total = 0       # sum of rating counts in this genre
    len_genre = 0   # number of apps in this genre
    for rows in free_ios:
        genre_app = rows[11]
        if genre_app == genre:
            total += float(rows[5])  # rating_count_tot
            len_genre += 1
    average = total / len_genre
    print(genre, average)
```

```
Games 22788.6696905016
Food & Drink 33333.92307692308
Utilities 18684.456790123455
Business 7491.117647058823
Weather 52279.892857142855
Photo & Video 28441.54375
Sports 23008.898550724636
Navigation 86090.33333333333
News 21248.023255813954
Productivity 21028.410714285714
Medical 612.0
Entertainment 14029.830708661417
Finance 31467.944444444445
Education 7003.983050847458
Catalogs 4004.0
Reference 74942.11111111111
Social Networking 71548.34905660378
Travel 28243.8
Music 57326.530303030304
Shopping 26919.690476190477
Lifestyle 16485.764705882353
Book 39758.5
Health & Fitness 23298.015384615384
```


It seems that although the App Store has a very large number of free games, the genres with the highest average number of ratings are Navigation, Reference, Social Networking, Music, and Weather.

With this in mind, a genre with a large audience and fewer apps to compete with could help our app get noticed. However, some genres may have a high average only because of a few very large outliers, meaning the genre's popularity comes from those few apps rather than from the genre as a whole.
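A quick way to see how a single giant distorts an average is to compare the mean with the median, which only looks at the middle value. The numbers below are hypothetical rating counts for a small genre, chosen only to illustrate the effect: one app with 950,000 ratings pulls the mean up to about 190,000, while the median stays at 450. Comparing the two per genre (for example with `statistics.median`) could flag genres whose averages are driven by outliers.

```python
from statistics import mean, median

# Hypothetical rating counts: four modest apps and one giant.
rating_counts = [120, 300, 450, 800, 950000]

print(round(mean(rating_counts)))  # 190334  (951670 / 5)
print(median(rating_counts))       # 450
```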

For example, let's take a look at the Reference genre.

```python
for row in free_ios:
    name = row[1]
    genre = row[11]
    total_rating = row[5]  # rating_count_tot
    if genre == "Reference":
        print(name, total_rating)
```

```
Bible 985920
Dictionary.com Dictionary & Thesaurus 200047
Dictionary.com Dictionary & Thesaurus for iPad 54175
Google Translate 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition 17588
Merriam-Webster Dictionary 16849
Night Sky 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools 4693
GUNS MODS for Minecraft PC Edition - Mods Tools 1497
Guides for Pokémon GO - Pokemon GO News and Cheats 826
WWDC 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free 718
VPN Express 14
Real Bike Traffic Rider Virtual Reality Glasses 8
教えて!goo 0
Jishokun-Japanese English Dictionary & Translator 0
```


It appears that this genre's average was heavily skewed by a few giants, such as the Bible app, with close to a million user ratings.

An ideal genre would be one that isn't saturated with apps and where ratings are spread fairly evenly across most of its apps rather than concentrated in a few giants. For example, let's look at the Entertainment genre.

```python
for row in free_ios:
    name = row[1]
    genre = row[11]
    total_rating = row[5]  # rating_count_tot
    if genre == "Entertainment":
        print(name, total_rating)
```

```
Netflix 308844
Fandango Movies - Times + Tickets 291787
Colorfy: Coloring Book for Adults 247809
IMDb Movies & TV - Trailers and Showtimes 183425
TRUTH or DARE!!! - FREE 171055
Mad Libs 117889
Twitch 109549
Action Movie FX 101222
Voice Changer Plus 98777
iFunny :) 98344
The CW 97368
The Moron Test 88613
DIRECTV 81006
ABC – Watch Live TV & Stream Full Episodes 78890
Xbox 72187
Redbox 60236
Talking Tom Cat 2 for iPad 56399
Hulu: Watch TV Shows & Stream the Latest Movies 56170
NBC – Watch Now and Stream Full TV Episodes 55950
Emoji> 55338
DIRECTV App for iPad 47506
Amazon Prime Video 43667
CBS Full Episodes and Live TV 39436
FOX NOW - Watch Full Episodes and Stream Live TV 39391
Talking Angela for iPad 32763
Recolor - Coloring Book 31180
Talking Ben the Dog for iPad 31116
Talking Tom Cat for iPad 29492
YouTube Kids 28560
Tom's Love Letters 27711
HBO GO 26278
NFL Sunday Ticket 24258
Pigment - Coloring Book for Adults 23967
Disney Channel – Watch Full Episodes, Movies & TV 21082
BuzzTube -
```

This genre only has a few large apps that focus on video streaming and comedy. The middle ground seems to have a large variety, and has a few family oriented apps. This genre seems to be a good market to consider.

### Google Play
Let's check the most popular genres in the Google Play store. Since this data set has a column for the number of installs, we'll use that column. To keep things simple, we'll assume that apps with 100,000+ installs have 100,000 installs, apps with 1,000,000+ installs have 1,000,000 installs, and so on.
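The conversion from an install bucket to a number can be wrapped in a small helper. The sketch below is hypothetical (`parse_installs` is not part of the original project) and applies the same simplification as the text: each bucket is treated as its lower bound, so any average built from it understates the true install counts.

```python
def parse_installs(installs):
    """Convert an install bucket like "1,000,000+" to the integer 1000000.

    Each bucket is treated as its lower bound, matching the assumption
    in the text.
    """
    return int(installs.replace(",", "").replace("+", ""))


print(parse_installs("10,000+"))         # 10000
print(parse_installs("1,000,000,000+"))  # 1000000000
```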

```python
android_genres = freq_table(free_android, 1)  # we only need the category names (the keys)

for genre in android_genres:
    total = 0       # sum of installs in this category
    len_genre = 0   # number of apps in this category
    for rows in free_android:
        genre_app = rows[1]
        if genre_app == genre:
            # Turn a bucket like "1,000,000+" into the number 1000000.
            installs = rows[5]
            installs = installs.replace("+", "")
            installs = installs.replace(",", "")
            total += float(installs)
            len_genre += 1
    average = total / len_genre
    print(genre, average)
```

```
LIFESTYLE 1437816.2687861272
HOUSE_AND_HOME 1331540.5616438356
EVENTS 253542.22222222222
TOOLS 10801391.298666667
PHOTOGRAPHY 17840110.40229885
FAMILY 3695641.8198090694
SHOPPING 7036877.311557789
LIBRARIES_AND_DEMO 638503.734939759
SPORTS 3638640.1428571427
PARENTING 542603.6206896552
WEATHER 5074486.197183099
TRAVEL_AND_LOCAL 13984077.710144928
FOOD_AND_DRINK 1924897.7363636363
HEALTH_AND_FITNESS 4188821.9853479853
FINANCE 1387692.475609756
PRODUCTIVITY 16787331.344927534
BUSINESS 1712290.1474201474
EDUCATION 1833495.145631068
COMMUNICATION 38456119.167247385
NEWS_AND_MAGAZINES 9549178.467741935
AUTO_AND_VEHICLES 647317.8170731707
BEAUTY 513151.88679245283
COMICS 817657.2727272727
PERSONALIZATION 5201482.6122448975
SOCIAL 23253652.127118643
VIDEO_PLAYERS 24727872.452830188
ART_AND_DESIGN 1986335.0877192982
DATING 854028.8303030303
MEDICAL 120550.61980830671
MAPS_AND_NAVIGATION 4056941.7741935486
ENTERTAINMENT 11640705.88235294
GAME 15588015.603248259
BOOKS_AND_REFERENCE 8767811.89473
```

Productivity and Tools both have high average install counts. Let's view some of the apps in these categories.

```python
for row in free_android:
    name = row[0]
    genre = row[1]
    installs = row[5]
    if genre == "PRODUCTIVITY":
        print(name, installs)

print("\n", "\n")

for row in free_android:
    name = row[0]
    genre = row[1]
    installs = row[5]
    if genre == "TOOLS":
        print(name, installs)
```

```
Microsoft Word 500,000,000+
All-In-One Toolbox: Cleaner, Booster, App Manager 10,000,000+
AVG Cleaner – Speed, Battery & Memory Booster 10,000,000+
QR Scanner & Barcode Scanner 2018 10,000,000+
Chrome Beta 10,000,000+
Microsoft Outlook 100,000,000+
Google PDF Viewer 10,000,000+
My Claro Peru 5,000,000+
Power Booster - Junk Cleaner & CPU Cooler & Boost 1,000,000+
Google Assistant 10,000,000+
Microsoft OneDrive 100,000,000+
Calculator - unit converter 50,000,000+
Microsoft OneNote 100,000,000+
Metro name iD 10,000,000+
Google Keep 100,000,000+
Archos File Manager 5,000,000+
ES File Explorer File Manager 100,000,000+
ASUS SuperNote 10,000,000+
HTC File Manager 10,000,000+
MyMTN 1,000,000+
Dropbox 500,000,000+
ASUS Quick Memo 10,000,000+
HTC Calendar 10,000,000+
Google Docs 100,000,000+
ASUS Calling Screen 10,000,000+
lifebox 5,000,000+
Yandex.Disk 5,000,000+
Content Transfer 5,000,000+
HTC Mail 10,000,000+
Advanced Task Killer 50,000,000+
MyVodafone (India) - Online Recharge & Pay Bills 1
```

These categories seem to have lots of giants. Competing with these larger apps may prove difficult, so we may want to avoid these categories. To check, let's list only the apps in these two categories with 100,000,000 or more installs.

```python
# Install buckets of 100,000,000 and above.
huge_installs = ("100,000,000+", "500,000,000+", "1,000,000,000+")

for rows in free_android:
    name = rows[0]
    genre = rows[1]
    installs = rows[5]
    if genre == "TOOLS" and installs in huge_installs:
        print(name, installs)

print("\n", "\n")

for rows in free_android:
    name = rows[0]
    genre = rows[1]
    installs = rows[5]
    if genre == "PRODUCTIVITY" and installs in huge_installs:
        print(name, installs)
```

```
Google 1,000,000,000+
Google Translate 500,000,000+
Calculator 100,000,000+
Device Help 100,000,000+
Account Manager 100,000,000+
SHAREit - Transfer & Share 500,000,000+
Samsung Calculator 100,000,000+
Gboard - the Google Keyboard 500,000,000+
Google Korean Input 100,000,000+
Share Music & Transfer Files - Xender 100,000,000+
Tiny Flashlight + LED 100,000,000+
GO Keyboard - Cute Emojis, Themes and GIFs 100,000,000+
Speedtest by Ookla 100,000,000+
CM Locker - Security Lockscreen 100,000,000+
Applock 100,000,000+
Clean Master- Space Cleaner & Antivirus 500,000,000+
Lookout Security & Antivirus 100,000,000+
Google Now Launcher 100,000,000+
360 Security - Free Antivirus, Booster, Cleaner 100,000,000+
Samsung Smart Switch Mobile 100,000,000+
Avast Mobile Security 2018 - Antivirus & App Lock 100,000,000+
AppLock 100,000,000+
AVG AntiVirus 2018 for Android Security 100,000,000+
Security Master - Antivirus, VPN, AppLock, Booster 500,000,000+
Battery Doctor-Battery Life Saver & Battery Cooler 1
```
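Listing the giants is a good visual check. One rough way to put a number on how "top-heavy" a category is would be to compute the share of its total installs that comes from its few largest apps. The sketch below is a hypothetical measure (`top_share` is not part of the original analysis); a value close to 1.0 means a handful of apps account for almost all installs.

```python
def top_share(values, top_n=3):
    """Fraction of the total contributed by the `top_n` largest values.

    Returns 0.0 for an empty list or a zero total.
    """
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Hypothetical install counts: two large apps and two small ones.
print(top_share([600, 300, 50, 50], top_n=2))  # 0.9
```

On the real data, the install values for one category could be collected with something like `[float(row[5].replace(",", "").replace("+", "")) for row in free_android if row[1] == "TOOLS"]` and passed to `top_share()`.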

The Art and Design category seems reasonably popular, with an average of roughly 2 million installs. Let's take a closer look.

```python
for row in free_android:
    name = row[0]
    genre = row[1]
    installs = row[5]
    if genre == "ART_AND_DESIGN":
        print(name, installs)
```

```
Photo Editor & Candy Camera & Grid & ScrapBook 10,000+
U Launcher Lite – FREE Live Cool Themes, Hide Apps 5,000,000+
Sketch - Draw & Paint 50,000,000+
Pixel Draw - Number Art Coloring Book 100,000+
Paper flowers instructions 50,000+
Smoke Effect Photo Maker - Smoke Editor 50,000+
Infinite Painter 1,000,000+
Garden Coloring Book 1,000,000+
Kids Paint Free - Drawing Fun 10,000+
Text on Photo - Fonteee 1,000,000+
Name Art Photo Editor - Focus n Filters 1,000,000+
Tattoo Name On My Photo Editor 10,000,000+
Mandala Coloring Book 100,000+
3D Color Pixel by Number - Sandbox Art Coloring 100,000+
Learn To Draw Kawaii Characters 5,000+
Photo Designer - Write your name with shapes 500,000+
350 Diy Room Decor Ideas 10,000+
FlipaClip - Cartoon animation 5,000,000+
ibis Paint X 10,000,000+
Logo Maker - Small Business 100,000+
Boys Photo Editor - Six Pack & Men's Suit 100,000+
Superheroes Wallpapers | 4K Backgrounds 500,000+
HD Mickey Minnie Wallpapers 50,000+
Harley Quinn wallpapers HD 10,000+
Colo
```

This category seems to have far fewer giants, so it may be the market we want to create an app for. The largest apps seem to focus on more advanced design, while the mid-sized apps focus on drawing or coloring.

We may want to create an app that combines features of these mid-sized apps for a family-oriented audience. We would want our app to stand out from the rest and attract a large audience.

We could do this by catering to both adults and children. The app could be based around providing a drawing and coloring platform, while also providing great family oriented craft ideas. The app could also allow the users to publish their own projects or drawings inside the app, allowing a connection between families.

## Conclusion

In conclusion, we analyzed app data from both the App Store and Google Play with a goal of recommending a profitable app profile for both markets.

An art and design app that is family-oriented may draw in a large audience. This app could cater to both adults and children, allowing for a larger user base. Features that would prove to be fun and useful for families could be a coloring and drawing studio, a DIY project notebook, and a community sharing network that allows users to show their projects and ideas.