# I Have an Idea for an App. Should I Build It?
A few years ago I had an idea for an app. But I knew nothing about coding, or app building. I still don't know about app building, but I think I've learned a thing or two about coding. 

In this notebook I will analyze data to determine what kinds of free apps are likely to attract more users and be more popular on both the Apple App Store and the Google Play Store. A popular app has a better chance of generating ad revenue, which would be necessary since a paid version of my idea already exists on the Apple App Store. After determining the types of apps that are popular, I'll decide if my idea has a chance.

This notebook is based on a guided project from Dataquest.io, an online platform for learning data analysis and data science. The learning goal of the project was to review common data types, lists, for loops, conditional statements, dictionaries, and functions. I tweaked the project to apply to my own experiences described above.


## Open and Explore the Data
As of 2019 Q4 there are more than 2.5 million apps on the Google Play Store and more than 1.8 million apps on the Apple App Store. [Source](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)

Instead of collecting and analyzing this huge amount of data, I will use a publicly available subset of data from each store, available on Kaggle.
- The Apple App Store dataset contains approximately seven thousand iOS apps and is available [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).
- The Google Play Store dataset contains data on approximately ten thousand Android apps and is available [here](https://www.kaggle.com/lava18/google-play-store-apps).

In [30]:
from csv import reader

def load_dataset(path):
    """Read a CSV file and return (header, rows) as lists."""
    # The with block closes the file once it has been read
    with open(path, encoding='utf8') as open_file:
        rows = list(reader(open_file))
    return rows[0], rows[1:]

# Open Apple Store Dataset
apple_header, apple = load_dataset('AppleStore.csv')

# Open Google Play Dataset
google_header, google = load_dataset('googleplaystore.csv')


In [31]:
def explore_data(list_of_lists, start, end, 
                 rows_and_columns = False):
    dataset_slice = list_of_lists[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(list_of_lists))
        print('Number of columns:', len(list_of_lists[0]))
        


In [32]:
print(google_header)
print('\n')
explore_data(google, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [33]:
print(apple_header)
print('\n')
explore_data(apple, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


## Data Cleaning
 
### Delete Inaccurate Data
Kaggle datasets have a dedicated discussion section. The Google data set has a known error at index 10472 that is mentioned on the [discussion page](https://www.kaggle.com/lava18/google-play-store-apps/discussion?sortBy=top&group=all&page=1&pageSize=20&category=all).
 
Below I inspect the list and compare it to the header. 

In [34]:
print(google_header)
print('\n')
print(google[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Index 10472 corresponds to the app Life Made WI-Fi Touchscreen Photo Frame. Compared to the header row, the Category value is missing, which shifts every later value one column to the left, so the Rating column reads 19. The maximum rating for a Google Play app is 5. I'll just delete this row.

In [35]:
print(len(google))
# Run this cell only once: re-running it would delete whatever row now sits at index 10472
del google[10472]
print(len(google))

10841
10840
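
The row deleted above has 12 values against a 13-column header, so a simple length check is one way to find rows like it without relying on the discussion page. The sketch below is a minimal illustration on toy data; the function name `find_malformed_rows` and the sample rows are hypothetical and not part of the original project.

```python
def find_malformed_rows(rows, header):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Toy data standing in for the real data set (hypothetical values)
header = ['App', 'Category', 'Rating']
rows = [
    ['Good App', 'TOOLS', '4.1'],
    ['Broken App', '1.9'],           # Category is missing, so the row is short
    ['Another App', 'GAME', '3.9'],
]

malformed = find_malformed_rows(rows, header)
print(malformed)
```

Run against the freshly loaded Google data (before any deletion), `find_malformed_rows(google, google_header)` would be expected to include 10472. A length check only catches rows with missing or extra columns; it would not flag a row whose values are wrong but complete.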


The App Store discussion page does not mention any errors or missing values. Fingers crossed I don't find any.

### Identify Duplicate Entries

Now that I've checked for missing data, I'll check for duplicate entries.

In [36]:
duplicate_apps = []
unique_apps = []

for app in google:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of unique apps:', len(unique_apps) )       
print('Number of duplicate apps:', len(duplicate_apps))
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of unique apps: 9659
Number of duplicate apps: 1181
Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


The same check didn't identify any duplicates in the App Store data.
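
To run the duplicate check on either data set, the loop can be wrapped in a function. The sketch below is one possible version, not the project's own code: the name `count_duplicates` is hypothetical, and it tracks seen names in a set so that each membership test does not have to scan a growing list. For the Google data the name is in column 0; for the App Store data it would be column 1 (`track_name`).

```python
def count_duplicates(rows, name_index):
    """Return (number of unique names, list of repeated names) for one column."""
    seen = set()
    duplicates = []
    for row in rows:
        name = row[name_index]
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return len(seen), duplicates


# Toy rows (app name, number of reviews) used only for illustration
sample = [['Slack', '51507'], ['Box', '159'], ['Slack', '51510'], ['Slack', '51507']]
n_unique, dupes = count_duplicates(sample, 0)
print('Number of unique apps:', n_unique)
print('Duplicate entries:', dupes)
```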

Now I'll inspect the Slack duplicates to see which entry I should keep.

In [37]:
print(google_header)
print('\n')
for app in google:
    name = app[0]
    if name == 'Slack':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


The Slack duplicates have different values at index 3, which corresponds to the number of reviews. This suggests the data was collected at different times. Because the number of reviews generally increases the longer an app is available, the entry with the most reviews is likely the most recent. I'll keep the entry with the most reviews and delete the other duplicates.

### Remove Duplicate Entries
To remove the duplicates I will: 
- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, I will only select the entry with the highest number of reviews).

In [38]:
reviews_max = {}
for app in google:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Expected number of unique apps:', len(unique_apps) )
print ('Actual number of unique apps:', len(reviews_max))

Expected number of unique apps: 9659
Actual number of unique apps: 9659


In [39]:
google_clean = []
already_added = []

for app in google:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        google_clean.append(app)
        already_added.append(name)
        
explore_data(google_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


### Remove Non-English Apps

The app I had in mind is geared toward an English-speaking audience. Both data sets contain apps from across the world, including apps directed at non-English speakers, as shown below.

In [40]:
print(apple[6731][1])

print(google_clean[4412][0])
print(google_clean[7940][0])

【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


Below I'll build a function incorporating the built-in [ord()](https://docs.python.org/3.4/library/functions.html?highlight=ord#ord) function to filter out non-English apps. Given a string representing one Unicode character, the ord() function returns an integer representing the Unicode code point of that character. The characters commonly used in English text (the ASCII range) have code points 0 through 127. If an app name contains more than three characters with code points greater than 127, the app probably has a non-English name.

In [41]:
def is_string_english(string):
    not_english = 0
    
    for char in string:
        if ord(char) > 127:
            not_english += 1
            
    if not_english > 3:
        return False
    else:
        return True

#Test above function
print(is_string_english('Instagram'))
print(is_string_english('Instachat 😜'))
print(is_string_english(google_clean[4412][0]))


True
True
False


The is_string_english() function allows up to three characters outside of the English character range because apps may incorporate emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the range.
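
To make the threshold more concrete, the short sketch below prints the code points of a few characters and counts how many characters in a name fall outside the 0 to 127 range. The helper `count_non_ascii` is a hypothetical name introduced only for this illustration.

```python
def count_non_ascii(text):
    """Count the characters whose code point is above 127."""
    return sum(1 for char in text if ord(char) > 127)


# Letters, digits, and common punctuation sit at or below 127;
# the trademark sign, en dash, and emoji are all well above it
for char in ['a', '5', '+', '™', '–', '😜']:
    print(repr(char), ord(char))

print(count_non_ascii('Instachat 😜'))
print(count_non_ascii('U Launcher Lite – FREE Live Cool Themes, Hide Apps'))
```

Both sample names contain a single character above 127, so they stay under the limit of three and are kept as English.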

Below I use the function to filter out the non-English apps, and use the previously created explore_data() function to inspect the results.

In [42]:
apple_english = []
google_english = []

for app in apple:
    if is_string_english(app[1]):
        apple_english.append(app)
    
for app in google_clean:
    if is_string_english(app[0]):
        google_english.append(app)
        
#View results
print('Google')
explore_data(google_english, 0, 2, True)
print('\n')
print('Apple')
explore_data(apple_english, 0, 2, True)

Google
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


Apple
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6183
Number of columns: 16


### Identify Free Apps

As mentioned in the introduction, a paid version of my idea already exists on the App Store. I would need to make the app free to compete with the existing app. The data sets contain both free and paid apps. I'll create a subset containing only free apps.

In [43]:
google_final = []
apple_final = []

for app in google_english:
    price = app[7]
    if price == '0':
        google_final.append(app)
        
for app in apple_english:
    price = app[4]
    if price == '0.0':
        apple_final.append(app)
        
print(len(google_final))
print(len(apple_final))

8864
3222
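
The string comparisons above work because the free apps shown earlier store their price as exactly '0' (Google) or '0.0' (Apple). A numeric comparison is a slightly more defensive option, sketched below. The helper name `is_free` is hypothetical, and the handling of a leading '$' is an assumption about how paid prices might be formatted, not something shown in the output above.

```python
def is_free(price):
    """Return True when a price string such as '0' or '0.0' is numerically zero."""
    # Assumption: a paid price may carry a '$' prefix, e.g. '$4.99'
    return float(price.replace('$', '').strip()) == 0.0


for price in ['0', '0.0', '$4.99', '2.99']:
    print(price, is_free(price))
```

With this helper, the same test could be applied to both stores, for example `is_free(app[7])` for Google rows and `is_free(app[4])` for Apple rows.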


## Data Analysis

### Most Common Genres

I have an idea of which genre my app would be. I need to know if that genre is popular enough to attract many users. 

I'll begin by getting a sense of which are the most common genres for each store. I'll review the header information to get an idea of which indexes contain genre information. 

In [44]:
print(google_header)
print('\n')
print (apple_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


The Google data has a 'Category' [1] and 'Genres' [-4] index that may be of interest. The Apple data has a 'prime_genre' [-5] index. 

To get a better idea of the genre types I'll build two functions.

- One function to generate a frequency table that stores percentages in a dictionary
- Another function to display the percentages in descending order, using a sorted list

In [45]:
def freq_table(list_of_lists, index):
    table = {}
    total = 0
    
    for row in list_of_lists:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(list_of_lists, index):
    table = freq_table(list_of_lists, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
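
For comparison, the same frequency table can be sketched with `collections.Counter` from the standard library. This is an alternative take rather than the project's approach, and the names `freq_table_counter` and `sorted_percentages` are hypothetical.

```python
from collections import Counter


def freq_table_counter(rows, index):
    """Return {value: percentage of rows} for the column at index."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}


def sorted_percentages(rows, index):
    """Return (value, percentage) pairs, largest percentage first."""
    table = freq_table_counter(rows, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Toy rows (app name, genre) for illustration only
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Education'], ['d', 'Games']]
for genre, percentage in sorted_percentages(sample, 1):
    print(genre, ':', percentage)
```

Sorting with a `key` function avoids having to swap each pair into `(percentage, key)` order before sorting.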

### Most Common App Store Genres

Now I'll use the functions above to analyze the filtered and cleaned App Store and Google Play store data.

In [46]:
display_table(apple_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Among free English apps, more than half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in the data set.

In summary, the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not match the supply.

Now I'll look at the Google Play Store data and decide which index is more helpful: 'Category' [1] or 'Genres' [-4].

In [47]:
print("Google Play Store 'Category' Data")
print('\n')
display_table(google_final, 1)

Google Play Store 'Category' Data


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 

In [48]:
print("Google Play Store 'Genres' Data")
print('\n')
display_table(google_final, -4)

Google Play Store 'Genres' Data


Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346

The 'Genres' data appears to be more granular than the 'Category' data: its output is longer, so it splits the apps into more divisions. For that reason, it may be best to use the 'Category' data to compare to the Apple App Store.

Despite the word "Play" in its name, there do not seem to be as many apps for fun on the Google Play Store (again, only considering free English apps). Family, Games, Tools, and Business are the top four categories. Tools, Entertainment, Education, and Business are the top four genres.

Taking another look at the Play Store category data and comparing it to the App Store data, we see that the Family category is unique to the Google Play Store and is the most common type of app. A quick visit to the Play Store makes it apparent that the Family category is mostly games for children.

My initial analysis suggests that, among the free English apps on the App Store and Play Store, fun apps (those categorized as Family, Games, or Entertainment, depending on the store) are the most common.

Again, I still can't say much about which apps have the most users and would therefore be the most profitable under an ad-supported business model. A large supply of fun apps does not necessarily mean there is a huge demand for them. Next I'll determine which genres have the most users.

### Most Popular Apps by Genre on the App Store
 
One way to find out what types of apps are the most popular is to determine which genres have the most users. A straightforward measure is the average number of installs for each app genre. The Google Play data set has the number of installs in the Installs column, but the App Store data set is missing this information, so I'll use the total number of user ratings (rating_count_tot) as a proxy.

Below I calculate the average number of user ratings per app genre on the App Store:

In [49]:
apple_genres = freq_table(apple_final,-5)


for genre in apple_genres:
    total = 0
    len_genre = 0
    for app in apple_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre +=1
    n_avg_ratings = total/len_genre
    print(genre, ':', n_avg_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
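
The nested loop above rescans every app once per genre. As a sketch of an alternative (not the project's code), the averages can be computed in a single pass by keeping a running total and count per genre, and the result can be sorted so the largest averages are easy to spot. The function name `average_by_key` is hypothetical; the sample rows reuse rating counts printed elsewhere in this notebook.

```python
def average_by_key(rows, key_index, value_index):
    """Return {key: average of the numeric column} using one pass over rows."""
    totals = {}
    counts = {}
    for row in rows:
        key = row[key_index]
        totals[key] = totals.get(key, 0.0) + float(row[value_index])
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy rows: app name, rating count, genre
sample = [
    ['Facebook', '2974676', 'Social Networking'],
    ['Pinterest', '1061624', 'Social Networking'],
    ['Bible', '985920', 'Reference'],
]
averages = average_by_key(sample, 2, 1)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```

On the real data the equivalent call would be `average_by_key(apple_final, -5, 5)`.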


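Navigation (86K), Reference (75K), Social Networking (72K), Music (57K), and Weather (52K) have the highest average number of ratings per app, and therefore likely have the most installs. Navigation makes up only about 0.19% of the free English apps in the frequency table above, so its average rests on just a handful of apps.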

Weather and Music will likely have costs and fees associated with obtaining weather data or music licensing. So while these genres are popular I wouldn't build this type of app. 

Below I'll print the Social Networking genre apps and the number of reviews.

In [50]:
for app in apple_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Facebook is the most reviewed app, with almost 3 million reviews. The number three app is Skype, with fewer than 400K reviews. Facebook is clearly an outlier and likely skews the average for the whole genre, making the genre seem more popular than it really is.
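
One way to see how much a single outlier pulls the average is to compare the mean with the median, which is far less sensitive to extreme values. The sketch below uses the rating counts of the five most-rated Social Networking apps printed above; using the median here is my own illustration, not part of the guided project.

```python
from statistics import mean, median

# Rating counts for Facebook, Pinterest, Skype for iPhone, Messenger, Tumblr
top_five = [2974676, 1061624, 373519, 351466, 334293]

print('Mean:  ', mean(top_five))
print('Median:', median(top_five))
```

The mean comes out at more than twice the median, which is the kind of gap that suggests a few very large apps are driving the genre's average.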

Let's look at the Reference genre. Below I'll print the Reference genre apps and the number of reviews.

In [51]:
for app in apple_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


While the Bible and Dictionary.com skew the average higher in this genre, this niche seems to show some potential. A reference app is likely simpler than a social networking app, and therefore likely cheaper to build.

Furthermore, a practical app might have more of a chance to stand out among the huge number of apps on the App Store, which I've shown is largely dominated by for-fun apps.

Let's take a look at the Google Play Store.

### Most Popular Apps by Genre on the Google Play Store
For the Google Play market, the data set has data on the number of installs, so I should be able to get a clearer picture about genre popularity. However, the install numbers are not precise because they are open-ended (100+, 1,000+, 5,000+, etc.):

In [52]:
display_table(google_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


However, I only want to get an idea of which app genres attract the most users, and I don't need perfect precision with respect to the number of users.

Therefore, I'll leave the numbers as they are, which means I'll treat an app with 100,000+ installs as having 100,000 installs, an app with 1,000,000+ installs as having 1,000,000 installs, and so on.

In the loop below, I'll remove the commas and plus signs, convert each install number to a float, and compute the average number of installs for each genre (category).

In [53]:
google_categories = freq_table(google_final, 1)

for category in google_categories:
    total = 0
    len_category = 0
    for app in google_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Communication apps have, on average, the most installs: around 38 million. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts).

In [54]:
for app in google_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

This skew pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.
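
To get a rough sense of how much the giants inflate a category's average, the average can be recomputed with the largest apps left out. The sketch below is a minimal illustration under assumptions: the helper names `parse_installs` and `average_installs` are hypothetical, the 100 million cutoff is an arbitrary choice, and two of the sample rows are made-up apps.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to a float."""
    return float(installs.replace(',', '').replace('+', ''))


def average_installs(rows, installs_index, cap=None):
    """Average installs, optionally ignoring apps at or above the cap."""
    values = [parse_installs(row[installs_index]) for row in rows]
    if cap is not None:
        values = [value for value in values if value < cap]
    return sum(values) / len(values) if values else 0.0


sample = [
    ['WhatsApp Messenger', '1,000,000,000+'],
    ['Kik', '100,000,000+'],
    ['Small Chat App', '10,000+'],   # hypothetical app
    ['Tiny Chat App', '1,000+'],     # hypothetical app
]
print(average_installs(sample, 1))
print(average_installs(sample, 1, cap=100_000_000))
```

On the real data, something like `average_installs([app for app in google_final if app[1] == 'COMMUNICATION'], 5, cap=100_000_000)` would give the Communication average without the apps listed above.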

The Game and Family categories seem pretty popular, but I found earlier that the Family category is mostly games for young children, and the for-fun app market seems a bit saturated, so I'd steer clear of these categories as well.

The Books and Reference category is fairly popular, with an average of around 8.8 million installs, and Reference was also very popular on the App Store. I'll take a look at some of the apps from this genre and their number of installs:

In [55]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [56]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. The genre is also very popular on the App Store. I would just steer clear of overdone apps such as ebooks, libraries, and tutorials. There are quite a few apps built around the Quran, which suggests that building an app around a popular book can work. Taking a popular book (perhaps a more recent one) and turning it into an app could be profitable in both the Google Play and the App Store markets.

## Should I Build My App?
My analysis above suggests that Reference is likely the best genre to get into. The paid version of the app I considered building, however, is categorized as Travel.

Travel is mildly successful in both stores. Travel apps average about 28K user ratings on the App Store. Google Earth, Yelp, and GasBuddy are the most-rated travel apps, followed by TripAdvisor, rideshare apps, and large booking agencies. It could be difficult to compete with such well-funded, well-advertised apps. The paid version of my idea isn't in this data set. It has 775 reviews, which would place it in the lower middle of the Travel apps shown below.

In [57]:
for app in apple_final:
    if app[-5] == 'Travel':
        print(app[1], ':', app[5])

Google Earth : 446185
Yelp - Nearby Restaurants, Shopping & Services : 223885
GasBuddy : 145549
TripAdvisor Hotels Flights Restaurants : 56194
Uber : 49466
Lyft : 46922
HotelTonight - Great Deals on Last Minute Hotels : 32341
Hotels & Vacation Rentals by Booking.com : 31261
Southwest Airlines : 30552
Airbnb : 22302
Expedia Hotels, Flights & Vacation Package Deals : 10278
Fly Delta : 8094
Hopper - Predict, Watch & Book Flights : 6944
United Airlines : 5748
Skiplagged — Actually Cheap Flights & Hotels : 1851
Viator Tours & Activities : 1839
iExit Interstate Exit Guide : 1798
Gogo Entertainment : 1482
Google Street View : 1450
Webcams – EarthCam : 912
HISTORY Here : 685
DB Navigator : 512
Mobike - Dockless Bike Share : 494
MiFlight™ – Airport security line wait times at checkpoints for domestic and international travelers : 493
BlaBlaCar - Trusted Carpooling : 397
Six Flags : 353
Google Trips – Travel planner : 329
Voyages-sncf.com : book train and bus tickets : 268
Trainline UK: Live Tra

The Google Play Store data set shows an average of about 14 million installs for Travel apps. Travel apps on the Play Store follow a similar makeup to the App Store: the category is dominated by well-funded, large corporate apps, as shown below.

In [58]:
for app in google_final:
    if app[1] == 'TRAVEL_AND_LOCAL':
        print(app[0], ':', app[5])

trivago: Hotels & Travel : 50,000,000+
Hopper - Watch & Book Flights : 5,000,000+
TripIt: Travel Organizer : 1,000,000+
Trip by Skyscanner - City & Travel Guide : 500,000+
CityMaps2Go Plan Trips Travel Guide Offline Maps : 1,000,000+
KAYAK Flights, Hotels & Cars : 10,000,000+
World Travel Guide by Triposo : 500,000+
Booking.com Travel Deals : 100,000,000+
Hostelworld: Hostels & Cheap Hotels Travel App : 1,000,000+
Google Trips - Travel Planner : 5,000,000+
GPS Map Free : 5,000,000+
GasBuddy: Find Cheap Gas : 10,000,000+
Southwest Airlines : 5,000,000+
AT&T Navigator: Maps, Traffic : 10,000,000+
VZ Navigator : 50,000,000+
KakaoMap - Map / Navigation : 10,000,000+
AirAsia : 10,000,000+
Expedia Hotels, Flights & Car Rental Travel Deals : 10,000,000+
Goibibo - Flight Hotel Bus Car IRCTC Booking App : 10,000,000+
Allegiant : 1,000,000+
Amtrak : 1,000,000+
JAL (Domestic and international flights) : 1,000,000+
Flight & Hotel Booking App - ixigo : 5,000,000+
VZ Navigator for Tablets : 500,000+

## Conclusion
Should I build my app? From my quick analysis, the answer is no, not if I intend it to be profitable. The Travel app market is dominated by large travel company apps, so it could be hard to get noticed in this genre. Furthermore, it is not the most popular app genre, a reasonably priced paid version already exists on the App Store, and on further inspection, three free versions exist on the Play Store.

From my analysis, a reference app would be the best to create. Specifically, it seems that taking a popular book and turning it into an app could be profitable for both the Google Play and the App Store markets.