# Profitable App Profiles for the App Store and Google Play Markets
## The art of app profiling in an imaginary setting...
### ...but with a concrete result.

---

Once upon a time, on a cold Monday morning, I had a call with an imaginary boss. He told me that he wanted to build an app.

Imaginary Boss: "Hello? Imaginary Subordinate? Build me an app."

Me: "Hello, Imaginary Boss. Ok, but could you be more specific, please?"

Imaginary Boss: "I want it to be **a mobile app for Android and iOS**. Y'know, standard stuff."

Me: "You want it to be free or just be greedy?"

Imaginary Boss: "Nah. I'm feeling generous. Just make it **free to download and see how in-app ads help us generate revenue**. Get some historical data about these mobile apps and think about the factors that could influence the revenue."

Me: "Sure. I can look at the user numbers, especially those who see and engage with the ads. Giving them an incentive for viewing ads might increase the revenue, too. I'll just take a look at what type of apps people like the most."

Imaginary Boss: "Ok, looks like you already know what to do. Give me a report by today."

Me: "You're crazy."

Imaginary Boss: "I'm imaginary, I can be as crazy as I want."

Me: "Well, fine if you see it that way. Since I'm imaginary too, I can be crazier than you and have it sent by noon."

Imaginary Boss: "Ooh. Nice. I'll treat you to some imaginary coffee after this."

Me: "Sure."

## Opening and exploring the (not imaginary) data

Being as lazy as I can be (i.e. to reduce the potential time and money cost of data collection), I looked for existing data that I could use for this task. Luckily, I found some:
- Around ten thousand Android apps from Google Play. ([data set](https://www.kaggle.com/lava18/google-play-store-apps))
- Around seven thousand iOS apps from the App Store. ([data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps))

Let's check it out.

In [2]:
from csv import reader

# Google Play dataset (explicit encoding so app names with non-ASCII characters load on any OS)
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

# App Store dataset
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]
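
Since both files are loaded the same way, the loading step could be wrapped in a small helper. This is only a sketch: `load_csv` is a name I made up, and it assumes the first row of the file is the header.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file whose first row is the header."""
    # newline='' is what the csv module docs recommend when opening files
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]

# Hypothetical usage with the files from this notebook:
# android_header, android = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```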

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


Ooh, I love these iOS apps. Shoot, I don't understand what all of the column names mean. Some are self-explanatory, but others aren't. Based on the [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home), I think `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'` are useful.

Let's check the Android apps.

In [4]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


Oh, I haven't used any of these apps before, but they seem interesting! Anyway, I think `'App'`, `'Category'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genres'` are useful.

### Wrong data detected
So, I strolled around the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) to look for potential problems with these data sets. And yes, in the Google Play data set, [someone pointed out](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) that there's an error in row 10472 (the category is missing, so the rest of the data shifted to the left). Let's see.

In [5]:
print(android_header)
print('\n')
print(android[10472]) # incorrect row
print('\n')
print(android[14]) # example of a correct row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['3D Color Pixel by Number - Sandbox Art Coloring', 'ART_AND_DESIGN', '4.4', '1518', '37M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 3, 2018', '1.2.3', '2.3 and up']


Well, it looks like row 10472 has a `'Rating'` of 19, which clearly doesn't make any sense since the maximum rating for a Google Play app is 5. This happened because the row is missing its `'Category'` value, so it has only 12 fields instead of 13. I have two options here: correcting it (someone stated in the discussion that it belongs to the 'Lifestyle' section), or removing it. Let's just delete it.

In [6]:
print(len(android))
del android[10472]
print(len(android))

10841
10840
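
I only found that row because someone reported it, but shifted rows like this can also be caught by comparing each row's length with the header's. Here's a minimal sketch of that check; `find_malformed_rows` is a made-up helper and the sample rows are invented, not taken from the data set.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Tiny invented sample to show the idea
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Good App', 'TOOLS', '4.2'],
    ['Shifted App', '1.9'],  # category missing, so the row is one field short
]

print(find_malformed_rows(sample_header, sample_rows))
# [(1, ['Shifted App', '1.9'])]
```

Running something like `find_malformed_rows(android_header, android)` before deleting anything would confirm whether the reported row is the only short one.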


### Finding and removing the duplicates

Large data sets should raise the suspicion of data duplication. Let's look up Facebook, the first app in the iOS output above, in the Google Play data.

In [7]:
for app in android:
    name = app[0]
    if name == 'Facebook':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


Yup. Suspicion confirmed. Let's go through them all.

In [8]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Example of duplicate apps:', duplicate_apps[:5])

Number of duplicate apps: 1181


Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
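
The loop above checks `name in unique_apps` against a list, which gets slow as the list grows. As an alternative sketch, `collections.Counter` gives the same count in one pass. The `count_duplicates` helper and the sample names are mine, just for illustration.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Count every occurrence of an app name beyond its first one."""
    counts = Counter(row[name_index] for row in rows)
    return sum(n - 1 for n in counts.values() if n > 1)

# Invented sample: 'Box' appears three times, 'Zoom' twice
sample = [['Box'], ['Box'], ['Box'], ['Zoom'], ['Maps'], ['Zoom']]
print(count_duplicates(sample))  # 3
```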


I need to remove the duplicates for the analysis, but first I need to decide which entry to keep. Based on the two Facebook rows, the entries differ in the `'Reviews'` column (number of reviews). A higher review count suggests a more recent snapshot, so I'll keep the row with the highest number of reviews and remove the others.

To do this, I'll create a new data set from a dictionary where each key is a unique app name with the value based on the highest number of reviews of that app.

In [9]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
# len(duplicate_apps) is the 1181 duplicate entries found above
print('Expected length:', len(android) - len(duplicate_apps))

print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


Now the `reviews_max` dictionary is ready to be used as a pointer to remove the duplicates.

I will add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list if:
- The number of reviews in the current `app` matches the one in the `reviews_max` dictionary.
- The `name` is not already in the `already_added` list. This accounts for cases where the highest number of reviews is shared by more than one entry of the same app (e.g. an app listed several times with an identical review count). If I only checked `reviews_max[name] == n_reviews`, all of those tied rows would still be included.

In [10]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

It's time to test the new data set! If it shows 9659 rows, then I have successfully removed the correct duplicates.

In [11]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
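
For comparison, the two steps (building `reviews_max`, then filtering) can be folded into a single pass that stores the best row per app directly. This is a sketch, not the method used above: `dedupe_by_max_reviews` is a made-up name, ties keep the first row seen, and the result is ordered by each app's first appearance, which may differ from the order of `android_clean`.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Invented sample rows shaped like the Google Play data (reviews at index 3)
sample = [
    ['A', 'TOOLS', '4.0', '10'],
    ['B', 'TOOLS', '4.1', '5'],
    ['A', 'TOOLS', '4.0', '12'],
    ['B', 'TOOLS', '4.1', '5'],
]
print(dedupe_by_max_reviews(sample))
# [['A', 'TOOLS', '4.0', '12'], ['B', 'TOOLS', '4.1', '5']]
```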


### My imaginary boss is monolingual, so let's get rid of the non-English apps

As a monolingual person, my imaginary boss obviously targets English-speaking users (though I've tried to teach him ありがとう and 谢谢 a few times already, but yeah).

So, let's get rid of the non-English apps by using the ASCII character range (0 to 127) and the built-in `ord()` function. Symbols are a bit tricky, because quite a few English apps use symbols such as ™ in their names, so we can't be too strict here.

...and, oh. My imaginary boss is allergic to emojis. He won't, ever, like, EVER, put too many emojis in his app titles. So it's better to leave out apps with way too many emojis in their names.

In [12]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(is_english('Facebook'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

True
False
False
False
8482
128540


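That first version is too strict: it rejects English app names that contain a single ™ or emoji. Let's relax the rule and only remove apps whose names have more than three non-ASCII characters, to minimize losses.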

In [51]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True
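
The cutoff of three is a judgment call, so here's a sketch of the same check with the threshold as a parameter, which makes it easy to try other values. `is_mostly_ascii` and `max_non_ascii` are names I made up for this illustration.

```python
def is_mostly_ascii(text, max_non_ascii=3):
    """Return True if text has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for character in text if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_mostly_ascii('Docs To Go™ Free Office Suite'))    # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))      # False
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))     # False
```

It's still a rough filter: a non-English name written entirely in ASCII characters would pass.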


In [24]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

### Focus on the free apps

In [25]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8864
3222
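
The filter above compares strings, so it depends on free apps being written exactly as `'0'` on Google Play and `'0.0'` on the App Store. A sketch of a numeric check that works for both follows. `is_free` is a hypothetical helper, and the `'$4.99'` format for paid Google Play apps is my assumption about how prices are written there.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$4.99' equals zero."""
    cleaned = price_text.strip().lstrip('$')
    # float() raises ValueError for non-numeric text, which is useful here:
    # it flags unexpected values instead of silently dropping the row
    return float(cleaned) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```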


## Data analysis
### Most common apps by genre

I want to see what kinds of apps are more likely to attract users, because the revenue will be heavily influenced by the number of users.

The validation strategy for an app idea is as follows:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

I suspect gamification apps will have many users. They're useful at any age, fun for kids, and better to have on multiple operating systems in case users have several devices with different OSes and would like to sync their progress.

I'll create a frequency table to get a sense of the most common apps by using the `prime_genre` column in the App Store data set and the `Genres` and `Category` columns in the Google Play data set.

In [26]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
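
As a side note, the same percentage table can be built with `collections.Counter` from the standard library. This is just an alternative sketch; `freq_table_pct` and `top_entries` are made-up names and the sample rows are invented.

```python
from collections import Counter

def freq_table_pct(rows, index):
    """Map each distinct value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}

def top_entries(rows, index, n=5):
    """Return the n most common (value, percentage) pairs, largest first."""
    table = freq_table_pct(rows, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)[:n]

# Invented sample with a single column
sample = [['Games'], ['Games'], ['Games'], ['Music']]
for genre, pct in top_entries(sample, 0):
    print(f'{genre}: {pct:.1f}%')
# Games: 75.0%
# Music: 25.0%
```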

In [28]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Games alone take up more than half of the free apps (58.2%), followed by the Entertainment genre (7.9%). I wonder if developers assume that users mostly download apps for fun, won't pay much for entertainment, and won't be too bothered by in-app ads because the app is free anyway, which lets the developers generate revenue from the ads. But a large number of fun apps doesn't imply that they also have the greatest number of users.

It might be different for productivity or educational apps: users might expect a more professional, polished, ad-free paid app so that no ads distract them, and hope that the developers of paid apps will focus on the best user experience to help them work or study.

In [27]:
display_table(android_final, 1) #Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The most common category is Family (18.9%), which I suspect is full of kids' games. But overall, Google Play offers a more balanced mix of apps than the App Store, which is dominated by games.

In [29]:
display_table(android_final, -4) # Genres

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The `'Genres'` column is more granular than `'Category'`. Let's just stick with `'Category'` as it gives a more general picture.

## Most popular apps by genre on the App Store

To find out which genres are the most popular (have the most users), I'll calculate the average number of users per app in each genre.

For Google Play I can use the `Installs` column, but the App Store data has no counterpart for it, so as a proxy I'm going to use `rating_count_tot` (the total number of user ratings).

To get the average for each genre, I have to isolate the apps of that genre, sum up their rating counts, and divide that sum by the number of apps belonging to the genre.

In [30]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0 #to store the sum of user ratings (number, not actual)
    len_genre = 0 #to store the genre-specific apps number
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Games : 22788.6696905016
Music : 57326.530303030304
Health & Fitness : 23298.015384615384
Social Networking : 71548.34905660378
News : 21248.023255813954
Business : 7491.117647058823
Navigation : 86090.33333333333
Travel : 28243.8
Medical : 612.0
Entertainment : 14029.830708661417
Photo & Video : 28441.54375
Utilities : 18684.456790123455
Reference : 74942.11111111111
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Shopping : 26919.690476190477
Weather : 52279.892857142855
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Book : 39758.5
Sports : 23008.898550724636
Catalogs : 4004.0
Productivity : 21028.410714285714
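
The nested loop above scans the whole data set once per genre. A single-pass version that keeps running totals per genre gives the same averages. This is only a sketch; `average_by_group` is a made-up helper and the sample rows are invented.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` for each value of `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Invented sample: (name, genre, rating count)
sample = [['a', 'Games', '100'], ['b', 'Games', '300'], ['c', 'Music', '50']]
print(average_by_group(sample, 1, 2))
# {'Games': 200.0, 'Music': 50.0}
```

With the real data this would be called as `average_by_group(ios_final, -5, 5)`.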


In [41]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Navigation, Reference, Social Networking, Music, and Weather apps have the highest average number of ratings. Note that the Reference average is pulled up by a few giants: in the list above, the Bible and Dictionary.com apps account for most of the ratings.

I don't think we can get much benefit if we create an app in the Music or Social Networking genres, since there are many big players there. The Book genre is also dominated by Kindle.

Weather won't do much either, since people don't spend much time in their weather app. The Finance genre requires specific domain knowledge, which is impractical for now. Food & Drink and Shopping apps are popular because they have real stores behind them, and we don't, so these aren't an option. Sports and Travel seem nice too, but could be too niche.

We can go with Reference, Photo & Video, Productivity, and Health & Fitness. I think all of these have potential. We can turn popular books into apps under the Reference genre. Users always love new filters for their photos, so we can create an app for the Photo & Video genre. A gamified pomodoro technique app for Productivity might also work, because people will keep that app open for quite some time. Health & Fitness gamification will also increase user engagement, and we'd be promoting good health.

But let's see whether any of these promising genres overlap with the categories on Google Play. Hopefully so.

## Most popular apps by genre on Google Play

I can use the `Installs` column for the Google Play data, so let's check it out.

In [38]:
display_table(android_final, 5) # Installs

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


Ok, apparently the data is open-ended (100+, 1,000+, etc.). To keep things simple, I'll consider the number for what it is (500,000+ means 500,000 installs, 100+ installs means 100 installs, etc.). To do this, I have to remove the comma and plus symbols to avoid errors.
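
As a small standalone sketch of that conversion (`parse_installs` is a name I made up), the cleanup is just two `replace` calls followed by a numeric conversion:

```python
def parse_installs(text):
    """Turn an install bucket such as '1,000,000+' into the integer 1000000."""
    return int(text.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000
print(parse_installs('100+'))        # 100
print(parse_installs('0'))           # 0
```

Keep in mind that each result is a lower bound: an app in the 100,000+ bucket may have anywhere from 100,000 up to just under the next bucket.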

In [40]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

SPORTS : 3638640.1428571427
BOOKS_AND_REFERENCE : 8767811.894736841
HOUSE_AND_HOME : 1331540.5616438356
PHOTOGRAPHY : 17840110.40229885
FAMILY : 3695641.8198090694
HEALTH_AND_FITNESS : 4188821.9853479853
NEWS_AND_MAGAZINES : 9549178.467741935
LIBRARIES_AND_DEMO : 638503.734939759
PRODUCTIVITY : 16787331.344927534
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
WEATHER : 5074486.197183099
SOCIAL : 23253652.127118643
EVENTS : 253542.22222222222
LIFESTYLE : 1437816.2687861272
DATING : 854028.8303030303
PARENTING : 542603.6206896552
SHOPPING : 7036877.311557789
COMMUNICATION : 38456119.167247385
EDUCATION : 1833495.145631068
FOOD_AND_DRINK : 1924897.7363636363
AUTO_AND_VEHICLES : 647317.8170731707
GAME : 15588015.603248259
BUSINESS : 1712290.1474201474
BEAUTY : 513151.88679245283
COMICS : 817657.2727272727
MEDICAL : 120550.61980830671
ART_AND_DESIGN : 1986335.0877192982
VIDEO_PLAYERS : 24727872.452830188
FINANCE : 1387692.475609756
PERSONALIZATION : 5201482.6122448975
MAPS_AND

Since my imaginary boss would like to have the app on both operating systems, I'll focus on the genres that I've already shortlisted from the iOS data earlier.

I'll suggest that we build an app for the Photography, Health and Fitness, Productivity and Books and Reference categories. 

Wait, what's that Tools category? Seems interesting.

In [44]:
for app in android_final:
    if app[1] == 'TOOLS':
        print(app[0], ':', app[5])

Google : 1,000,000,000+
Google Translate : 500,000,000+
Moto Display : 10,000,000+
Motorola Alert : 50,000,000+
Motorola Assist : 50,000,000+
Moto Suggestions ™ : 1,000,000+
Moto Voice : 10,000,000+
Calculator : 100,000,000+
Device Help : 100,000,000+
Account Manager : 100,000,000+
myMetro : 10,000,000+
File Manager : 50,000,000+
My Telcel : 50,000,000+
Calculator - free calculator, multi calculator app : 10,000,000+
ASUS Sound Recorder : 10,000,000+
iWnn IME for Nexus : 5,000,000+
Samsung Max - Data Savings & Privacy Protection : 10,000,000+
Android TV Remote Service : 1,000,000+
ZenUI Help : 10,000,000+
Calculator - free calculator ,multi calculator app : 100,000+
SHAREit - Transfer & Share : 500,000,000+
ZenUI Keyboard – Emoji, Theme : 10,000,000+
Files Go by Google: Free up space on your phone : 10,000,000+
SD card backup : 1,000,000+
Nokia mobile support : 5,000,000+
File Manager -- Take Command of Your Files Easily : 10,000,000+
Samsung Calculator : 100,000,000+
Clear : 10,000,00

In [45]:
for app in android_final:
    if app[1] == 'TOOLS' and (app[5] == '1,000,000,000+'
                             or app[5] == '500,000,000+'
                             or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google : 1,000,000,000+
Google Translate : 500,000,000+
Calculator : 100,000,000+
Device Help : 100,000,000+
Account Manager : 100,000,000+
SHAREit - Transfer & Share : 500,000,000+
Samsung Calculator : 100,000,000+
Gboard - the Google Keyboard : 500,000,000+
Google Korean Input : 100,000,000+
Share Music & Transfer Files - Xender : 100,000,000+
Tiny Flashlight + LED : 100,000,000+
GO Keyboard - Cute Emojis, Themes and GIFs : 100,000,000+
Speedtest by Ookla : 100,000,000+
CM Locker - Security Lockscreen : 100,000,000+
Applock : 100,000,000+
Clean Master- Space Cleaner & Antivirus : 500,000,000+
Lookout Security & Antivirus : 100,000,000+
Google Now Launcher : 100,000,000+
360 Security - Free Antivirus, Booster, Cleaner : 100,000,000+
Samsung Smart Switch Mobile : 100,000,000+
Avast Mobile Security 2018 - Antivirus & App Lock : 100,000,000+
AppLock : 100,000,000+
AVG AntiVirus 2018 for Android Security : 100,000,000+
Security Master - Antivirus, VPN, AppLock, Booster : 500,000,000+
Batt

Ooookay, there are so many apps that we can build under this category: alarms, calculators, flashlights, voice recorders, keyboards, etc. Cool.

Now, let's check the Productivity category.

In [46]:
for app in android_final:
    if app[1] == 'PRODUCTIVITY' and (app[5] == '1,000,000,000+'
                             or app[5] == '500,000,000+'
                             or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+


I can see giant players here like Microsoft and Google in the top ranks. Let's see the middle ranks, maybe we can fit in there.

In [48]:
for app in android_final:
    if app[1] == 'PRODUCTIVITY' and (app[5] == '10,000,000+'
                             or app[5] == '5,000,000+'
                             or app[5] == '1,000,000+'):
        print(app[0], ':', app[5])

All-In-One Toolbox: Cleaner, Booster, App Manager : 10,000,000+
AVG Cleaner – Speed, Battery & Memory Booster : 10,000,000+
QR Scanner & Barcode Scanner 2018 : 10,000,000+
Chrome Beta : 10,000,000+
Google PDF Viewer : 10,000,000+
My Claro Peru : 5,000,000+
Power Booster - Junk Cleaner & CPU Cooler & Boost : 1,000,000+
Google Assistant : 10,000,000+
Metro name iD : 10,000,000+
Archos File Manager : 5,000,000+
ASUS SuperNote : 10,000,000+
HTC File Manager : 10,000,000+
MyMTN : 1,000,000+
ASUS Quick Memo : 10,000,000+
HTC Calendar : 10,000,000+
ASUS Calling Screen : 10,000,000+
lifebox : 5,000,000+
Yandex.Disk : 5,000,000+
Content Transfer : 5,000,000+
HTC Mail : 10,000,000+
MyVodafone (India) - Online Recharge & Pay Bills : 10,000,000+
Microsoft Translator : 5,000,000+
Hacker's Keyboard : 1,000,000+
Security & Privacy : 1,000,000+
Loop - Habit Tracker : 1,000,000+
TickTick: To Do List with Reminder, Day Planner : 1,000,000+
Keeper: Free Password Manager & Secure Vault : 10,000,000+
Pushb

I think the Productivity category also has potential. We can create these apps: to-do list, reminder, calendar, notepad, habit tracker, and that pomodoro technique app that I mentioned earlier.
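
The bucket filters above were written out by hand for each range. They could be wrapped in a helper that filters on numeric bounds instead. This is a sketch under my own assumptions: `apps_in_install_range` is a made-up name, the default indexes follow the Google Play column order, and the sample apps are invented.

```python
def apps_in_install_range(rows, category, low, high,
                          name_index=0, category_index=1, installs_index=5):
    """Return (name, installs) for apps in `category` with low <= installs <= high."""
    selected = []
    for row in rows:
        if row[category_index] != category:
            continue
        # '5,000,000+' -> 5000000 (treated as a lower bound on real installs)
        installs = int(row[installs_index].replace(',', '').replace('+', ''))
        if low <= installs <= high:
            selected.append((row[name_index], installs))
    return selected

# Invented sample rows shaped like the Google Play data (installs at index 5)
sample = [
    ['Big Notes', 'PRODUCTIVITY', '', '', '', '500,000,000+'],
    ['Mid Planner', 'PRODUCTIVITY', '', '', '', '5,000,000+'],
    ['Tiny Timer', 'PRODUCTIVITY', '', '', '', '1,000+'],
    ['Mid Torch', 'TOOLS', '', '', '', '10,000,000+'],
]
print(apps_in_install_range(sample, 'PRODUCTIVITY', 1_000_000, 10_000_000))
# [('Mid Planner', 5000000)]
```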

## Conclusion
Me: "Hello? Imaginary Boss? Can you hear me?"

Imaginary Boss: "Yup. So, what did you get?"

Me: "We can build an app that falls under the Photography and Health & Fitness categories, which are already popular but might be a good idea to get into as long as we can add special features. For example, for the Photography app, we can add monthly filters or celebrity-inspired filters, or even user-generated filters and make a contest out of it. Or for the Productivity category, we can create a pomodoro app with gamification, or any other productivity app with gamification, since it's always nice to have some fun during work and study. And for the Tools category, the options are pretty much limitless: we can create alarms, calculators, flashlights, or maybe some kind of survival-related app."

Imaginary Boss: "Well, sounds like we can have a good brainstorming session on this. Thanks, and enjoy your imaginary coffee!"

Me: "Thanks!"