# Title: A brief analysis of free apps
## Synopsis: 

    This is a guided project from DataQuest.io. The goal of the project is to analyze two app store datasets to help developers understand which types of apps are likely to attract more users.
    The developers build free apps, and the apps are monetized through in-app advertisements. A secondary goal is to produce a portfolio-worthy project.
    

In [60]:
## A function for exploring datasets
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds new empty line after each row
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
## Four parameters: dataset (expected to be a list of lists); start and end (integers
## marking the starting and ending indices of a slice of the dataset);
## and rows_and_columns, a boolean that defaults to False.
## The dataset should not include a header row; otherwise len() overcounts the rows by one.

In [61]:
from csv import reader

def open_dataset(file_name, has_head=True):
    # Open with an explicit encoding; the with block closes the file automatically
    with open(file_name, encoding='utf8', newline='') as opened_file:
        data = list(reader(opened_file))
    if has_head:
        return data[1:], data[0]
    return data
## a function to open datasets
## returns the data set and header row or just returns the data set as a list of lists
## recommended manner to call function:
## variable_data, header_data = open_dataset('filename')
    

In [62]:
## open datasets

app_store_data, app_store_header = open_dataset(file_name='AppleStore.csv')
##Above: Apple App Store
play_store_data, play_store_header = open_dataset(file_name='googleplaystore.csv')
##Above: Google Play Store

## explore data

print(app_store_header)
print('\n')
explore_data(app_store_data, 0, 3, True)
print('\n')
print(play_store_header)
print('\n')
explore_data(play_store_data, 0, 3, True)
print('\n')
##prints a blank line
explore_data(app_store_data, 5, 9, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 

## Analysis notes
Which columns can assist in this analysis?

* Android (Google Play Store)
  * Number of rows: 10841
  * Number of columns: 13
  * Category, 1; Rating, 2; Reviews, 3; Installs, 5; Type, 6; Price, 7; Content Rating, 8; Genres, 9

* iOS (Apple App Store)
  * Number of rows: 7197
  * Number of columns: 16
  * price, 4; rating_count_tot, 5; rating_count_ver, 6; user_rating, 7; cont_rating, 10; prime_genre, 11
  * [Data source](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

* Common columns (Play Store index, App Store index)
  * Price: 7, 4
  * Rating: 2, 7 or 8
  * Reviews: 3, 5 or 6
  * Content rating: 8, 10
  * Type or genre: 9, 11

Open questions for iOS: What is the difference between the various rating_count columns, and what are some of the odd columns at the end? Is there an 'Installs' equivalent for iOS?
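The column positions noted above can also be looked up by name rather than counted by hand. The following is a minimal sketch, assuming the two header rows shown in the earlier output; `column_indices` is a hypothetical helper, not part of the guided project.

```python
# Header rows copied from the output above.
play_store_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                     'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                     'Current Ver', 'Android Ver']
app_store_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
                    'rating_count_tot', 'rating_count_ver', 'user_rating',
                    'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
                    'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

def column_indices(header, names):
    """Hypothetical helper: map each requested column name to its position."""
    return {name: header.index(name) for name in names}

play_cols = column_indices(play_store_header,
                           ['Category', 'Rating', 'Reviews', 'Installs', 'Price', 'Genres'])
ios_cols = column_indices(app_store_header,
                          ['price', 'rating_count_tot', 'user_rating', 'prime_genre'])

print(play_cols)
print(ios_cols)
```

Looking indices up this way avoids off-by-one mistakes when the same analysis is applied to both datasets.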


In [63]:
## Deleting wrong data
## Guided data cleaning
## Intended audience: English speakers
## Developers produced free apps
## Revenue derived from advertising
## Will need to remove
#### non free apps
#### non english apps
######## Why though?

## Solution notebook : https://github.com/dataquestio/solutions/blob/master/Mission350Solutions.ipynb

## Google Play Store data set
## Discussion: A row, allegedly at 10472, is column shifted and missing some data
## Discussion link: https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015

print('This is the length of the Play Store dataset\'s header:', len(play_store_header))
print(play_store_header)
print('\n')
header_len = len(play_store_header)
for row in play_store_data:
    if len(row) != header_len:
        print(row)
        print('\n')
        print(play_store_data.index(row))
print('\n')
## See above: stores the length of the dataset's header in a variable,
## then loops through the data looking for rows whose length differs from it,
## and prints each such row along with its index.
## Here that happens once because only one row is problematic. Ostensibly.

## DataQuest guide is directing me to delete that entry, though. Why?
## The row found by the loop has 12 fields instead of 13: the Category value is missing,
## so every later column is shifted one place to the left.

print('This is the Play Store dataset\'s length before deletion:', len(play_store_data))
print('\n')
del play_store_data[10472] #only run once.
print('This is the Play Store dataset\'s length after deletion:', len(play_store_data))
print('\n')

## I mean, like, okay--but why? Why, DQ?
## Let's have a look at the discussion on the App Store data
## Discussion link: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion

## There are 7197 rows in the App Store data set. And that is without the header.
## Indexing starts at 0 (zero), so the last index is 7196,
## but len() returns the number of rows, which is 7197.
## The row count alone therefore says nothing about duplicate entries.

## Duplicates are reported in this discussion: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion/106176
## And the next lesson of DataQuest is removing duplicate entries


This is the length of the Play Store dataset's header: 13
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


10472


This is the Play Store dataset's length before deletion: 10841


This is the Play Store dataset's length after deletion: 10840




In [64]:
## Removing duplicate entries
## red blue green
## apparently the Play Store dataset has many many dupes
print(play_store_header)
print('\n')
for app in play_store_data:
    name = app[0]
    if name == 'Instagram':
        print(app)
print('\n')        
## building lists of duplicate and unique app names
duplicate_apps = []
unique_apps = []

for app in play_store_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Duplicate Apps:', len(duplicate_apps))
print('\n')
print('These are some examples:')
print('\n')
explore_data(duplicate_apps, 0, 3, True)
print('\n')
print('Unique Apps:', len(unique_apps))
print('\n')
print('These are some examples:')
print('\n')
explore_data(unique_apps, 0, 3, True)

## so there are quite a few duplicates
print('\n')
print(play_store_header)
print('\n')
for app in play_store_data:
    name = app[0]
    if name == 'Google My Business':
        print(app)
print('\n')
print(play_store_header)
print('\n')
for app in play_store_data:
    name = app[0]
    if name == 'Box':
        print(app)
print('\n')
print(duplicate_apps[16:24])
print('\n')
for app in play_store_data:
    name = app[0]
    if name == 'Google Analytics':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Duplicate Apps: 1181


These are some examples:


Quick PDF Scanner + OCR FREE


Box


Google My Business


Number of rows: 1181
Number of colum

# Google Play Store dataset cleaning

The Google Play Store dataset needs some cleaning.

It contains 1,181 duplicate entries and one malformed row.

Possible criteria for data removal:
* Index 3: keep the entry with the most reviews

Notes on other sampled entries: there are no obvious differences between the duplicate rows. Of the apps sampled, Instagram appears to be the only one whose duplicates report different numbers of reviews.

Among the duplicates, only the entry with the most reviews is kept, on the assumption that the highest review count marks the most recent data. This trims and unifies the dataset.



In [65]:
## expected number of rows after duplicate removals is 9659
## build the dictionary first

max_reviews = {}
##empty dictionary
for app in play_store_data:
    ## loops through play store data
    app_name = app[0]
    ## sets variable to the first index value of the dataset
    review_numbers = float(app[3])
    if app_name in max_reviews and max_reviews[app_name] < review_numbers:
        max_reviews[app_name] = review_numbers
    elif app_name not in max_reviews:
        max_reviews[app_name] = review_numbers
    ## what exactly are these accomplishing? I need to figure that out. See In[9] in solution book
    ## https://github.com/dataquestio/solutions/blob/master/Mission350Solutions.ipynb

print('Expected length:', len(play_store_data) - 1181)
print('Actual length:', len(max_reviews))
    
        
## okay, so the dictionary populated here, max_reviews, maps each app name
## to the highest review count seen for that name.
## the loop goes through the data set one row at a time
## the if branch handles a name that is already in the dictionary:
## when the current row has more reviews than the stored value, the stored value is overwritten
## so each name ends up with the review count of its most reviewed entry
## the elif (else if) branch adds names that are not yet in the dictionary, max_reviews.

        
    

Expected length: 9659
Actual length: 9659


In [66]:
## 5. Removing duplicate entries part two
## the dictionary built above maps each unique app name
## to the highest number of reviews for that app in play_store_data
## here: use that info to create a new data set with one entry per app
android_clean = []
already_added = []
for app in play_store_data:
    name = app[0]
    review_numbers = float(app[3])
    
    if (max_reviews[name] == review_numbers) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
        
print(android_clean[3])
print('The length of the new data set is:', len(android_clean))    


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']
The length of the new data set is: 9659


# What was done to the Google Play Store data

I looped through the initial dataset and found a total of 1,181 duplicate entries. I then built a new dataset that keeps, for each app, only the entry with the most reviews.
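The two-step process described here can also be condensed into a single pass. The following is a minimal sketch, not the guided project's method: `keep_most_reviewed` is a hypothetical helper, and the sample rows are shortened to four columns, with the 'Example App' rows made up for illustration.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Hypothetical helper: keep one row per app name, the one with the most reviews.

    Ties keep the first row encountered.
    """
    best = {}
    for row in rows:
        name = row[name_idx]
        reviews = float(row[reviews_idx])
        if name not in best or reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Shortened sample rows: name, category, rating, reviews.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Example App', 'TOOLS', '4.0', '100'],  # made-up row
    ['Example App', 'TOOLS', '4.0', '100'],  # made-up exact duplicate
]

deduped = keep_most_reviewed(sample)
for row in deduped:
    print(row)
```

Storing the whole row in the dictionary removes the need for a separate `already_added` list.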

A brief check of the Apple App Store data, seen below, shows that there are no duplicate app IDs.

In [67]:
duplicate_app_store = []
unique_app_store = []

for app in app_store_data:
    app_id = app[0]  # column 0 is the unique 'id'; the app name ('track_name') is column 1
    if app_id in unique_app_store:
        duplicate_app_store.append(app_id)
    else:
        unique_app_store.append(app_id)

print(len(duplicate_app_store))

0


# Culling dataset to better represent the goal

As the goal is to analyze free apps directed at English-speaking audiences, we will want to remove non-English apps from both datasets.
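The filter used in the next cell treats a name as English when it contains at most three characters outside the ASCII range. The following is a minimal sketch of that rule; the helper names `count_non_ascii` and `looks_english` are hypothetical, and the threshold of three is the notebook's own heuristic rather than a language test.

```python
def count_non_ascii(text):
    """Count characters whose code point falls outside the ASCII range (0 to 127)."""
    return sum(1 for char in text if ord(char) > 127)

def looks_english(text, max_non_ascii=3):
    """Heuristic only: allow a few symbols such as a trademark sign or an emoji."""
    return count_non_ascii(text) <= max_non_ascii

names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for name in names:
    print(name, ':', looks_english(name))
```

The first three names pass because they contain at most one non-ASCII character each, while the last one contains far more than three.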



In [68]:
# print(ord('a'))
# print(ord('A'))
# print(ord('B'))
# print(ord('5'))
# print(ord('爱'))
# print(ord('-'))
## the ord() function returns the number (Unicode code point) that represents a character
## Commonly used English language characters are in the range of 0 to 127 according to ASCII (American Standard Code for Information Interchange)
## Strings in Python are indexable and iterable

# this_string = 'abc'
# print('\n')
# print(this_string[0])
# print(this_string[1])
# print(this_string[2])
# print('\n')
# for char in this_string:
#     print(char)
# for char in this_string:
#     print(ord(char))
    
def is_it_english(a_string):
    outer_ascii = 0
    for char in a_string:
        if ord(char) > 127:
            outer_ascii += 1
    if outer_ascii > 3:
        return False
    else:
        return True

# print(is_it_english('Instagram'))
# print(is_it_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

## Though this works for identifying specifically English language characters, it misses some characters
## that may be used in an English language app's name

# print('Instagram')
# print(is_it_english('Instagram'))
# print('爱奇艺PPS -《欢乐颂2》电视剧热播')
# print(is_it_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
# print('Docs To Go™ Free Office Suite')
# print(is_it_english('Docs To Go™ Free Office Suite'))
# print('Instachat 😜')
# print(is_it_english('Instachat 😜'))

## These oddball characters are causing some issues, as seen in the output.
## We will lose useful data using the function we have defined
## To prevent excess data loss during this analysis, we will remove apps only if they have more than three
## characters corresponding to numbers falling outside the ASCII range

english_app_store = []
english_play_store = []

for app in android_clean:
    name = app[0]
    if is_it_english(name):
        english_play_store.append(app)

for app in app_store_data:
    name = app[1]
    if is_it_english(name):
        english_app_store.append(app)
print('The length of the App Store dataset, prior to expunging non-English language apps, is:', len(app_store_data))
print('\n')
print('Below is a sample post-cleaning')
print('\n')
explore_data(english_app_store, 0, 3, True)
print('\n')
print('The length of the Play Store dataset, prior to expunging non-English language apps, is:', len(android_clean))
print('\n')
print('Below is a sample post-cleaning')
print('\n')
explore_data(english_play_store, 0, 3, True)
print('\n')

The length of the App Store dataset, prior to expunging non-English language apps, is: 7197


Below is a sample post-cleaning


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


The length of the Play Store dataset, prior to expunging non-English language apps, is: 9659


Below is a sample post-cleaning


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '

In [69]:
## the next step will be to isolate the free apps into separate lists for each dataset
free_app_store = []
free_play_store = []

for app in english_app_store:
    price = app[4]
    if price == '0.0':
        free_app_store.append(app)
        
for app in english_play_store:
    price = app[7]
    if price == '0':
        free_play_store.append(app)
        
print('This is the final length of the Apple App Store dataset:', len(free_app_store))
print('\n')
print('This is the final length of the Google Play Store dataset:', len(free_play_store))
print('\n')

This is the final length of the Apple App Store dataset: 3222


This is the final length of the Google Play Store dataset: 8864




# Which apps best fit the profile

The end goal is to have an app that is popular, and therefore profitable, on both platforms. Let's have a look at genre frequency tables for each dataset.

## Validation strategy
1. Build a minimal Android version of the app to be added to the Play Store
2. If the app receives a good response, it will be developed further
3. If the app is profitable after a half year, build out an iOS version for the App Store.

The validation strategy will help to mitigate risks and costs.

In [70]:
## free_app_store: use the prime_genre column to generate a freq table
## free_play_store: use the Genres and Category columns to generate freq tables
## note: the way the dictionary is sorted below is not ideal

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
        
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print('This is a sorted table for App Store genre data:')
display_table(free_app_store, 11)
print('\n')
print('This is a sorted table for the Play Store category data:')
display_table(free_play_store, 1) #category
print('\n')
print('This is a sorted table for the Play Store genre data:')
display_table(free_play_store, 9) # genre (same column as index -4)

This is a sorted table for App Store genre data:
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


This is a sorted table for the Play Store category data:
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003

## A look at the data
# App Store

In this raw analysis, free English-language games make up the majority of the App Store apps, about 58.2%. The next most common genres are entertainment and photo and video, at about 7.9% and about 5%, respectively.

Apps geared toward entertainment are the most common in Apple's App Store, with practical apps appearing further down the table.

This frequency table, however, does not show the popularity or user base of these genres. With this alone, it would be difficult to recommend an app profile.
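For reference, the same kind of percentage table can be built with the standard library's `collections.Counter`. The following is a minimal sketch on made-up rows; `freq_table_pct` is a hypothetical name, not a function from the guided project.

```python
from collections import Counter

def freq_table_pct(rows, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Made-up rows for illustration: app name, genre.
sample_rows = [
    ['app one', 'Games'],
    ['app two', 'Games'],
    ['app three', 'Games'],
    ['app four', 'Education'],
]

table = freq_table_pct(sample_rows, 1)
for value, pct in table:
    print(value, ':', pct)
```

`Counter.most_common()` returns the entries already sorted by count, so no tuple-swapping step is needed.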

# Google Play Store

Google's Play Store is less clear-cut when looking at the most common apps for our profile. Family apps account for about 19% of the Play Store apps and mobile games for about 9.7%. On the Google Play Store, many of the family apps are games and activities geared toward children. Apps with utilitarian or practical purposes are also among the more frequent across the Play Store's categories and genres.

Side note: between the time this data was collected and now (March 2021), the "Family" category appears to have been abandoned in the Play Store in favor of other, more apt categories.

In [71]:
## finding the most popular
## using 'Installs' from play store and, as a proxy, 'rating_count_tot' in app store
## free_app_store[5]
## free_play_store[5]

## for iOS, isolate each genre -> sum the rating counts -> divide by number of apps in genre

app_store_genre = freq_table(free_app_store, 11)

for genre in app_store_genre:
    total = 0
    len_total = 0
    for app in free_app_store:
        the_genre = app[11]
        if the_genre == genre:
            ratings = float(app[5])
            total += ratings
            len_total += 1
    average_n_ratings = total / len_total  # average number of ratings per app, not an average star rating
    print(genre, ':', average_n_ratings)
    


Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


# iOS: An overview
Average number of ratings per app for the five highest genres:

* Social networking: 71,500
* Music: 57,300
* Reference: 74,900
* Weather: 52,300
* Navigation: 86,100

Social networking apps, representing some of the largest dot-com giants in the industry, have a high number of user ratings on average. Only navigation apps, such as Google Maps and Waze, and reference apps have a higher average within their respective genres.

In no particular order, social networking, music, reference, weather and navigation apps are the most rated genres in Apple's App Store, which suggests high usage rates and a large customer pool. These averages can be skewed by a few very large apps, so each genre is examined below.
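The per-genre averages above can also be computed in one pass over the data instead of re-scanning the dataset for every genre. The following is a minimal sketch; `average_by_group` is a hypothetical helper, and the sample rows are the six free Navigation apps listed in this notebook, shortened to three columns.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Hypothetical helper: average a numeric column per group in a single pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Shortened rows: track_name, prime_genre, rating_count_tot.
navigation_rows = [
    ['Waze - GPS Navigation, Maps & Real-time Traffic', 'Navigation', '345046'],
    ['Google Maps - Navigation & Transit', 'Navigation', '154911'],
    ['Geocaching®', 'Navigation', '12811'],
    ['CoPilot GPS – Car Navigation & Offline Maps', 'Navigation', '3582'],
    ['ImmobilienScout24: Real Estate Search in Germany', 'Navigation', '187'],
    ['Railway Route Search', 'Navigation', '5'],
]

averages = average_by_group(navigation_rows, 1, 2)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```

On the full `free_app_store` list, the equivalent call would be `average_by_group(free_app_store, 11, 5)`.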

In [72]:
for app in free_app_store:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


# Navigation

Few apps make up the Navigation marketplace in this dataset. Waze and Google Maps have a strong hold on it, with large existing user bases.

In [73]:
for app in free_app_store:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


# Reference

Reference apps consist of functional apps, of which many intersect with other categories and genres. There is strong potential in a reference app for a low-barrier entry that could appeal to an existing userbase, as evidenced by some of the entries listed above.

In [74]:
for app in free_app_store:
    if app[11] == 'Weather':
        print(app[1], ':', app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

# Weather
Weather apps are dominated by well-known names and by apps that use existing government data infrastructure. Some compete to be a user's chosen weather app, while others offer complementary functions, such as radar views, that likely sit alongside a user's chosen app.

There is potential to compete in this genre. The entry barrier, however, may be rather technical and potentially costly.

In [75]:
for app in free_app_store:
    if app[11] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

# Music

Music apps run the gamut from listening and learning to playing and reference. Licensing for music streaming may prove difficult to obtain at a reasonable cost, and such an app would then have to compete with Spotify, Apple Music, Pandora, et al.

However, building a reference-based or a musical-instrument based app would have a low barrier to entry and ought to be low cost when compared to some of the other options. Not many changes would be needed to keep a piano app or a music theory app up-to-date.

In [76]:
for app in free_app_store:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

# Social Networking

Social networking apps are popular among users. Many of the top apps in this genre began as small start-ups and developed into large-scale corporations. Many often have complex back-ends for data collection and other services they offer. This may be a viable avenue for long-term growth; however, the startup costs -- e.g. software engineers, data engineers, hardware costs -- may be higher than the fictional client is seeking.

# Google Play

Below, we will have a look at the Google Play Store.

In [77]:
display_table(free_play_store, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


# Google Play installs

The above data, while imprecise, shows a rough range of installs for various apps. This data can help us demonstrate which app genres attract the most users.
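Because the install counts are stored as labels such as `1,000,000+`, they have to be converted to numbers before any averaging. The following is a minimal sketch of that conversion; `parse_installs` is a hypothetical helper, and the result is treated as a lower bound since the label only says "at least this many".

```python
def parse_installs(label):
    """Convert an install label such as '1,000,000+' to an integer lower bound."""
    return int(label.replace(',', '').replace('+', ''))

# Labels taken from the frequency table above.
labels = ['1,000,000,000+', '1,000,000+', '10,000+', '0+', '0']

for label in labels:
    print(label, '->', parse_installs(label))
```

An app labelled `100,000+` might have anywhere from 100,000 to just under 500,000 installs, so averages built on these values are rough estimates.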

In [78]:
free_play_store_categories = freq_table(free_play_store, 1)

for category in free_play_store_categories:
    total = 0
    len_category = 0
    for app in free_play_store:
        play_store_cat = app[1]
        if play_store_cat == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    average_installs = total / len_category
    print(category, ':', average_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

# Installs

Art and design; books and reference; business; communication; education; entertainment; finance; food and drink; health and fitness; house and home; games; shopping; and photography are, to name a few, categories that average more than a million installs per app.

Communication apps, such as Skype, Kik, and Gmail, have the highest average, about 38.5 million installs per app.

In [79]:
for app in free_play_store:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

When the most popular apps (those with 100 million or more installs) are excluded, the average for communication apps drops roughly tenfold.

In [80]:
under_100 = []

for app in free_play_store:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100.append(float(n_installs))
        
sum(under_100) / len(under_100)

3603485.3884615386

This pattern is repeated across several categories dominated by companies such as Google, Facebook, and Microsoft. Competition is strong, and startup costs may be high.
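The effect of a few giant apps on an average can be seen by comparing the mean with the median. The following is a minimal sketch using the standard library's `statistics` module; the install counts are illustrative values only and are not taken from the dataset.

```python
from statistics import mean, median

# Illustrative install counts only (not taken from the dataset).
installs = [1_000_000_000, 500_000_000, 100_000_000, 1_000_000, 100_000, 10_000, 1_000]

overall_mean = mean(installs)
overall_median = median(installs)

# Same cutoff as the cell above: keep apps under 100 million installs.
capped = [n for n in installs if n < 100_000_000]
capped_mean = mean(capped)

print('Mean of all apps:', overall_mean)
print('Median of all apps:', overall_median)
print('Mean under 100 million installs:', capped_mean)
```

When the mean is far above the median, as in this made-up sample, a handful of very large apps is carrying the category average.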

In [81]:
for app in free_play_store:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

# Books and Reference

This genre includes religious reference material, dictionaries, foreign language dictionaries, and the like. There are, however, a few popular apps skewing the average.

In [None]:
for app in free_play_store:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

There are few apps in this category that exceed 100 million installs. This leaves quite a bit of potential for ingress into this market.

In [None]:
for app in free_play_store:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

This category consists of reference materials, ebook apps, and various library and dictionary apps. This is similar to the results for the iOS analysis.

# Conclusion

A reference app offers considerable potential for entry into the market, though competition is still stiff.

Appropriately selected material for a reference app, such as books or music, can offer a lower cost of entry and lower upkeep cost, particularly if the material is relatively static.

The app, be it for referencing music theory or a popular book, ought to set itself apart with special features. This could include regular quotes about the material or from the material, quizzes and tests, and some manner of user input.