# App Selection for Start Up
____
We take the role of a company deciding the direction of its next app. This company builds free apps and monetizes them through in-app ad revenue.

iOS is prioritized because 1) Android is fragmented, and 2) apps on the App Store tend to be more profitable. So we will explore iTunes first.

To develop a successful app, the following strategy is adopted:
1. Publish an iOS minimum viable product on the App Store.
2. If the app is well received, we develop it further.
3. If the app is profitable after six months, we develop the same app for the Google Play Store.

## Data Exploration

The datasets prepared are [Mobile App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) by Ramanathan Perumal, containing 7196 entries; and [Google Play Store Apps](https://www.kaggle.com/lava18/google-play-store-apps/home) by Lavanya Gupta, containing 10841 entries.

First, open the datasets using csv.reader.

In [1]:
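from csv import reader
# Use context managers so both files are closed once they have been read.
with open('AppleStore.csv', encoding = 'utf8') as ios_file:
    ios_db = list(reader(ios_file))
with open('googleplaystore.csv', encoding = 'utf8') as gplay_file:
    gplay_db = list(reader(gplay_file))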

Then let's explore the data. The following function allows us to format the selected data for readability.

In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
print('iTunes App Store')
explore_data(ios_db, 0, 3)  # explore_data prints directly and returns None
print('_________________________________________________________________________________')
print('Google Play Store')
explore_data(gplay_db, 0, 3)

iTunes App Store
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


_________________________________________________________________________________
Google Play Store
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moa

### Determine Irrelevant Information

Let's decide what content will be useful for our analysis.
For ios_db, we can ignore columns that don't provide meaningful information for our analysis.

1) `id`. The app ID carries no meaningful information beyond acting as a unique primary key, which would only matter if we were joining multiple databases.
2) `user_rating_ver`. Without ratings from previous versions, we cannot tell whether an app has become better or worse received over time, so this column is meaningless on its own.
3) `ver`. For the same reason as above, this column is meaningless.
4) `vpp_lic`. The Apple Volume Purchase Program allows companies and organizations to purchase and manage apps in bulk. Since we are interested in overall market performance, and since the consumer market dwarfs business use (except for certain apps), we are not interested in this column.

For the gplay_db, we can ignore the following columns:
1) `Last Updated`
2) `Current Ver`
3) `Android Ver`. Unlike iOS, which is a proprietary, closed operating system, Android is an open-source platform and is fragmented across device makers and devices. We intend to keep our app on the latest Android version anyway, so this column can be ignored.
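
The column choices above can be applied with a small helper. The following is only a sketch: `drop_columns` is a hypothetical function that is not part of this notebook, and it assumes every row has as many fields as the header row.

```python
def drop_columns(dataset, names_to_drop):
    """Return a copy of `dataset` (header row first) without the named columns.

    Hypothetical helper for illustration; assumes every row is as long
    as the header row.
    """
    header = dataset[0]
    # Indices of the columns we keep, in their original order.
    keep = [i for i, name in enumerate(header) if name not in names_to_drop]
    return [[row[i] for i in keep] for row in dataset]


# Tiny hand-made sample using header names from AppleStore.csv.
sample = [
    ['id', 'track_name', 'price', 'ver', 'vpp_lic'],
    ['284882215', 'Facebook', '0.0', '95.0', '1'],
]
trimmed = drop_columns(sample, {'id', 'ver', 'vpp_lic'})
print(trimmed)
```

The rest of this notebook keeps every column in place and simply ignores the irrelevant ones, because the later cells address columns by their original position (for example `app[4]` for the iOS price). Dropping columns would shift those indices.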

## Data Cleaning

### Removing Unknown Genre: Entry 10473

There is an error within gplay_db: entry 10473 has an empty genre and, as the printout below shows, only 12 fields against the 13 header columns, so its values are shifted out of place. To ensure the data is clean, we will remove this entry with the `del` statement. We must not run this command more than once, or we will delete more data than intended.
`del gplay_db[10473]`

`print(gplay_db[0])`
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', **'Genres'**, 'Last Updated', 'Current Ver', 'Android Ver']

`print(gplay_db[10473])`
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', **''**, 'February 11, 2018', '1.0.19', '4.0 and up']

`print(gplay_db[10473])` #post deletion
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']

We can also confirm that `del` worked as intended by checking the number of entries, although a matching count alone would not reveal whether we mistyped and deleted a different entry instead.
Since the first row is the header row, we exclude it from our app count.

In [4]:
print('Number of apps before deletion:', len(gplay_db[1:]))
del gplay_db[10473]
print('Number of apps post deletion:', len(gplay_db[1:]))

Number of apps before deletion: 10841
Number of apps post deletion: 10840


To confirm the troublesome data is deleted, let's check the entry content.

In [5]:
print(gplay_db[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
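
Because `del gplay_db[10473]` removes a different row each time it is re-run, a filter that can be applied repeatedly may be safer. The sketch below is an assumption rather than what this notebook does: `remove_malformed_rows` is a hypothetical helper, and it relies on the faulty row being recognizable by its field count (12 fields against 13 header columns, as the printout above suggests).

```python
def remove_malformed_rows(dataset):
    """Keep the header plus every row with as many fields as the header."""
    expected = len(dataset[0])
    return [row for row in dataset if len(row) == expected]


# Hand-made sample: the last row is missing its Category value.
sample = [
    ['App', 'Category', 'Rating'],
    ['Slack', 'BUSINESS', '4.4'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
cleaned = remove_malformed_rows(sample)
cleaned_again = remove_malformed_rows(cleaned)  # a second run changes nothing
print(len(sample), len(cleaned), len(cleaned_again))
```

Running the filter twice leaves the data unchanged, which avoids the accidental double deletion warned about earlier.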


### Identifying and Removing Duplicate Entries

Another problem with `gplay_db` is that it has duplicate entries. Here are two examples:

In [6]:
for app in gplay_db:
    name = app[0]
    if name == 'Slack':
        print(app)
print('\n')
for app in gplay_db:
    name = app[0]
    if name == 'Google Ads':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


Duplicate entries will distort the analysis. If the issue were small, say under 0.5% of the dataset, it might be more time efficient to leave the duplicates in. So let's explore how serious the problem is.
To do so, we will use a `for` loop to iterate over each app's name and store the unique and duplicate names in two different lists.

In [7]:
duplicate_apps = []
unique_apps = []
for app in gplay_db[1:]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of unique apps:', len(unique_apps))
print('Examples of unique apps:', unique_apps[:3])
print('\n')
print('Number of duplicate apps:', len(duplicate_apps))
print('Examples of duplicate apps:', duplicate_apps[:2])
print('\n')
print('Expected length:', len(gplay_db[1:])- len(duplicate_apps))

Number of unique apps: 9659
Examples of unique apps: ['Photo Editor & Candy Camera & Grid & ScrapBook', 'Coloring book moana', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps']


Number of duplicate apps: 1181
Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box']


Expected length: 9659


About 11% of entries in gplay_db (1181 of 10840) are duplicates. Such a significant share of duplicates would compromise our analysis.
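
As a cross-check on the figures above, the same counts can be obtained with `collections.Counter` from the standard library. This is a minimal sketch on a hand-made sample; `duplicate_summary` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter


def duplicate_summary(rows, name_index=0):
    """Return (unique names, surplus duplicate rows, duplicate share in %)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    n_extra = len(rows) - n_unique  # rows beyond the first of each name
    share = n_extra / len(rows) * 100
    return n_unique, n_extra, share


# Hand-made sample: three names, six rows.
sample = [['Slack'], ['Slack'], ['Slack'], ['Box'], ['Box'], ['Instagram']]
summary = duplicate_summary(sample)
print(summary)

# The share reported in this notebook: 1181 duplicates out of 10840 rows.
notebook_share = round(1181 / 10840 * 100, 1)
print(notebook_share)
```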

Looking at the duplicated rows, we find that they differ mainly in the number of `Reviews`.

The higher the number of reviews, the more recent the data should be. Since the entries are not timestamped, we assume the row with the highest number of reviews is the most up to date, and keep only that row rather than removing duplicates at random.

To do this, we first record the highest review count for each app in a dictionary. The loop creates an `app name: reviews` key-value pair when it encounters a new app (one not already in the dictionary). Every time the same app is encountered again, it updates the review count if the new count is greater than the one stored. Doing so guarantees that the dictionary holds the maximum review count for every app.

In [8]:
reviews_max = {}

for app in gplay_db[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


The `reviews_max` dictionary only holds app names and review counts, so we will use it as a reference to filter gplay_db. If a row's number of reviews matches `reviews_max`, the row is added to a new `gplay_db_unique` table.

Two duplicate rows can have the same number of reviews, so we need a condition that appends only one instance to the `gplay_db_unique` table.
We can check that the loop worked as intended by comparing the number of entries in `gplay_db_unique` and `gplay_db_already_added`.

Without this condition, the number of rows matching `reviews_max` is 10054, not 9659 as expected.

In [9]:
gplay_db_highest_reviews = []
gplay_db_unique = []
gplay_db_already_added = []

for app in gplay_db[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]):
        gplay_db_highest_reviews.append(app)
        
print('Apps with duplicates of the highest reviews:', len(gplay_db_highest_reviews))

for app in gplay_db[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in gplay_db_already_added):
        gplay_db_unique.append(app)
        gplay_db_already_added.append(name)

print('Unique apps with the highest reviews:', len(gplay_db_unique))

Apps with duplicates of the highest reviews: 10054
Unique apps with the highest reviews: 9659
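
The two passes above (build `reviews_max`, then filter) can also be folded into a single pass that stores the best row per app name directly. This is a sketch of an alternative, not the method used in this notebook; `keep_most_reviewed` is a hypothetical helper. On ties it keeps the first row seen. Rows come back in order of each name's first appearance, which may differ from the order of `gplay_db_unique`.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Return one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means the first row wins when review counts are tied.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Shortened versions of the duplicate rows printed earlier.
sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29331'],
]
deduplicated = keep_most_reviewed(sample)
for row in deduplicated:
    print(row)
```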


### Identifying and Removing Non-English Apps

There are non-English apps in both datasets. Here are a few examples:

In [10]:
print(ios_db[813][1])
print(ios_db[6731][1])
print(gplay_db_unique[7940][0])

BATTLE BEARS -1
Beast Poker
لعبة تقدر تربح DZ


We are not interested in non-English apps. To remove these entries, we will reference the ASCII (American Standard Code for Information Interchange) table.

ASCII defines code points 0-127, covering control and printable characters. Anything above 127 falls outside ASCII, including accented letters, symbols, emojis, and characters from other writing systems. So we will only keep apps whose names use code points 0-127.
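
To make the 0-127 boundary concrete, the built-in `ord()` returns the code point of a single character. The characters below are picked only for illustration:

```python
# Map a few sample characters to their code points.
code_points = {character: ord(character) for character in ['A', '~', '™', '爱', '😜']}

for character, code_point in code_points.items():
    # The last column is True only for characters inside the ASCII range.
    print(repr(character), code_point, code_point <= 127)
```

Only `'A'` (65) and `'~'` (126) fall within the ASCII range. The trademark sign, the Chinese character, and the emoji all have code points far above 127.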

We will loop through each app name, treating the string as an iterable, checking every character against the ASCII range, and collecting the apps that pass into a new list.

Some English app names contain characters outside ASCII, such as special symbols (™) and emojis (😜). Our function must allow for those edge cases, so the following code will not work.

In [11]:
def is_english(app_name):
    for character in app_name:
        if ord(character) > 127:
            return False
    return True  # no character fell outside the ASCII range
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


This function is too strict: it rejects English names that contain special characters. While it is possible to build an all-inclusive table of every acceptable character, doing so would increase the cost and time by orders of magnitude.
So a simpler method is devised. If an app name has more than 3 non-ASCII characters, we consider it to be non-English.

In [12]:
def is_english(app_name):
    non_ascii = 0
    for character in app_name:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True


Now let's filter both datasets so only English apps are kept.

In [13]:
ios_db_english = []
ios_db_non_english = []
gplay_db_english = []
gplay_db_non_english = []

for app in ios_db[1:]:
    if is_english(app[1]):
        ios_db_english.append(app)
    else:
        ios_db_non_english.append(app)

for app in gplay_db_unique:
    if is_english(app[0]):
        gplay_db_english.append(app)
    else:
        gplay_db_non_english.append(app)

print(len(ios_db_english))
print(len(gplay_db_english))

6183
9614


### Identifying and Removing Non-Free Apps

Lastly, we are only interested in free apps. So let's filter the data once more.

In [14]:
ios_db_free = []
ios_db_non_free = []
gplay_db_free = []
gplay_db_non_free = []

for app in ios_db_english:
    if float(app[4]) == 0:
        ios_db_free.append(app)
    else:
        ios_db_non_free.append(app)
        
for app in gplay_db_english:
    if app[6] == 'Free':
        gplay_db_free.append(app)
    else:
        gplay_db_non_free.append(app)

print('Number of free iOS apps:', len(ios_db_free))
print('Number of free Android apps:', len(gplay_db_free))

Number of free iOS apps: 3222
Number of free Android apps: 8863


## Determining The Most Popular App Genre

### Number of Apps

Before development can begin, we must decide the genre of our app. To do this, we will look at which genres are most popular in both stores.
Two metrics will be employed: 1) number of installs per genre, and 2) number of apps per genre.

`gplay_db` has both `Genres` and `Category`. Either could describe the nature of the app. The difference between the two is unclear at this point, so let's not pick one over the other yet.

We will create a function to count the frequency of each value in a column, then a different function to sort the result from largest to smallest. We are interested in percentages as well as absolute numbers.

In [15]:
def count_total_apps(database):
    total_apps = 0
    
    for row in database:
        total_apps += 1
    return total_apps

def count_freq_table_raw(dataset, count_index):
    count_frequency_table_raw = {}
    
    for row in dataset:
        col = row[count_index]
        if col in count_frequency_table_raw:
            count_frequency_table_raw[col] += 1
        else:
            count_frequency_table_raw[col] = 1
    
    return count_frequency_table_raw
        
def count_freq_table_percentage(database,count_index):
    count_frequency_table_percentage = {}
    table = count_freq_table_raw(database, count_index)
    total_apps = count_total_apps(database)
    for col in table:
        count_frequency_table_percentage[col] = (table[col] / total_apps * 100)
    return count_frequency_table_percentage

def count_sort_table(function, dataset, count_index = ""):
    table = function(dataset, count_index)
    table_sorted = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_sorted.append(key_val_as_tuple)

    table_sorted = sorted(table_sorted, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

ios_db_free_genre_frequency = count_freq_table_raw(ios_db_free, 11)
gplay_db_free_genre_frequency = count_freq_table_raw(gplay_db_free, 9)
gplay_db_free_category_frequency = count_freq_table_raw(gplay_db_free, 1)

print('iTunes App Store by genre frequency')
count_sort_table(count_freq_table_percentage, ios_db_free, 11)  # prints the table itself and returns None
print('\n')
print('Google Play by genre frequency')
count_sort_table(count_freq_table_percentage, gplay_db_free, 9)
print('\n')
print('Google Play by category frequency')
count_sort_table(count_freq_table_percentage, gplay_db_free, 1)

iTunes App Store by genre frequency
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Google Play by genre frequency
Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical :

Games are by far the most popular genre in the App Store. But that does not mean games have the most users.
`print('Sorted iOS Prime Genre frequency')`
`print(display_table(ios_db_free, 11))`
Games : 58.16263190564867
Entertainment : 7.883302296710118

In the Play Store, games are still popular as a Category, but nowhere near as dominant.
`print('Sorted Google Play Category frequency')`
`print(display_table(gplay_db_free, 1))`
FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657

The difference between `Genres` and `Category` in the Play Store is that `Genres` is a more granular breakdown of `Category`.
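
The frequency tables in this section can also be built with `collections.Counter`, whose `most_common()` method returns entries sorted from largest to smallest count. The following is a sketch on hand-made data; `freq_table_percent` is a hypothetical name, not a function from this notebook.

```python
from collections import Counter


def freq_table_percent(rows, index):
    """Return (value, percentage) pairs for one column, largest share first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Hand-made sample with the genre in column 0.
sample = [['Games'], ['Games'], ['Games'], ['Entertainment']]
table = freq_table_percent(sample, 0)
for value, percentage in table:
    print(value, ':', percentage)
```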

### Number of Ratings/Reviews

There is no field for user count in the iOS database. Therefore, we will use `rating_count_tot` as a proxy.

For Google Play Store, there is a field called `Installs`, but because the data comes in coarse, roughly logarithmic intervals, it is impossible to pinpoint whether an app has 1,000,001 or 4,999,999 installs. Therefore we will again use the number of `Reviews` as a proxy.
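
For reference, if the `Installs` buckets were ever wanted as a rough lower bound, the strings could be converted as sketched below. This notebook does not use this conversion; `installs_lower_bound` is a hypothetical helper, and the sample values are taken from the rows printed earlier.

```python
def installs_lower_bound(installs):
    """Convert a bucket such as '5,000,000+' to its lower bound as an int."""
    return int(installs.replace(',', '').replace('+', ''))


buckets = ['1,000+', '10,000+', '5,000,000+']
lower_bounds = [installs_lower_bound(bucket) for bucket in buckets]
for bucket, bound in zip(buckets, lower_bounds):
    print(bucket, '->', bound)
```

The result is only a floor: every app in the `5,000,000+` bucket maps to 5000000, whatever its true install count.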

While reviews and ratings may be biased towards really good or bad apps, we are interested in the quantity, not quality.

To do this we will need to sum the number of reviews. Luckily we can easily modify the previous functions to achieve this. But before we begin, we need to convert the counts from strings to integers.

In [16]:
for row in ios_db_free:
    ratings = row[5]
    ratings = int(ratings)
    row[5] = ratings

for row in gplay_db_free:
    reviews = row[3]
    reviews = int(reviews)
    row[3] = reviews

print('Ratings class:', type(ios_db_free[0][5]))
print('Reviews class:', type(gplay_db_free[0][3]))

Ratings class: <class 'int'>
Reviews class: <class 'int'>


In [17]:
def sum_total(database, sum_index):
    sum_col = []

    for row in database:
            sum_col.append(row[sum_index])
    return sum(sum_col)

def sum_table_raw(dataset, count_index, sum_index):
    table = {}
    
    for row in dataset:
        col = row[count_index]
        
        if col in table:
            table[col] += row[sum_index]
        else:
            table[col] = row[sum_index]
    
    return table
        
def sum_table_percentage(database, count_index, sum_index):
    table_percentage = {}
    table = sum_table_raw(database, count_index, sum_index)
    total_sum = sum_total(database, sum_index)
    for col in table:
        table_percentage[col] = (table[col] / total_sum * 100)
    return table_percentage

def sum_sort_table(function, dataset, count_index, sort_index):
    table = function(dataset, count_index, sort_index)
    table_sorted = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_sorted.append(key_val_as_tuple)

    table_sorted = sorted(table_sorted, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

ios_db_free_genre_ratings = sum_table_raw(ios_db_free, 11, 5)
gplay_db_free_category_reviews = sum_table_raw(gplay_db_free, 1, 3)
        
print('iTunes App Store by Ratings')
sum_sort_table(sum_table_percentage, ios_db_free, 11, 5)  # prints the table itself and returns None
print('\n')
print('Play Store by Reviews')
sum_sort_table(sum_table_percentage, gplay_db_free, 1, 3)

iTunes App Store by Ratings
Games : 53.39225622901802
Social Networking : 9.481896177948654
Photo & Video : 5.689352746228933
Music : 4.730306761290697
Entertainment : 4.455288795493973
Shopping : 2.8270862703306054
Sports : 1.9848817257966838
Health & Fitness : 1.8933111726001723
Utilities : 1.892148459242271
Weather : 1.8301320792365399
Reference : 1.6865069740297345
Productivity : 1.472258909509895
Finance : 1.4163173942418434
Travel : 1.412449184425342
News : 1.1422908603728785
Food & Drink : 1.0835513316693615
Lifestyle : 1.0511603879311853
Education : 1.033277106349015
Book : 0.6959014479156925
Navigation : 0.6457960035666545
Business : 0.15921546603801798
Catalogs : 0.020023674344242168
Medical : 0.004590842419583994


Play Store by Reviews
GAME : 28.229435358723187
COMMUNICATION : 13.690261155227141
TOOLS : 10.986128935507878
SOCIAL : 10.920798506662036
FAMILY : 9.085377009801904
PHOTOGRAPHY : 5.0530151164592105
VIDEO_PLAYERS : 3.2402976157400882
PRODUCTIVITY : 2.655212542

From our aggregation, we see that some genres are overrepresented in ratings and reviews, while others are underrepresented.

To see the degree of difference, we will compare `rating_count_tot` and `Reviews` on a per-app basis across the different genres (ios_db_free) and categories (gplay_db_free).

In [18]:
genre_ios = count_freq_table_raw(ios_db_free, 11)

for genre in genre_ios:
    total = 0
    len_genre = 0
    for row in ios_db_free:
        genre_app = row[11]
        if genre_app == genre:
            app_ratings = float(row[5])
            total += app_ratings
            len_genre += 1
    average_app_ratings_in_genre = total / len_genre
    print(genre, ':', average_app_ratings_in_genre)

Utilities : 18684.456790123455
Business : 7491.117647058823
Shopping : 26919.690476190477
Catalogs : 4004.0
Sports : 23008.898550724636
Productivity : 21028.410714285714
Education : 7003.983050847458
News : 21248.023255813954
Entertainment : 14029.830708661417
Photo & Video : 28441.54375
Medical : 612.0
Music : 57326.530303030304
Games : 22788.6696905016
Travel : 28243.8
Food & Drink : 33333.92307692308
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Finance : 31467.944444444445
Navigation : 86090.33333333333
Social Networking : 71548.34905660378
Lifestyle : 16485.764705882353
Book : 39758.5


To get the same averages in sorted order for both stores, we reuse the dictionaries built earlier:
1) Take the summed `rating`/`reviews` dictionary and the `frequency` dictionary.
2) For each key, divide the summed `value` by the `frequency` stored under the same key.
3) Store the new `value` in a new dictionary.
4) Sort the dictionary and print it.

In [19]:
#ios_db_free_genre_ratings
#gplay_db_free_category_reviews
#ios_db_free_genre_frequency
#gplay_db_free_category_frequency
def divide_two_dictionaries(num_dict, den_dict):
    # Divide matching entries by key, so the result does not depend on
    # the two dictionaries sharing the same insertion order.
    res = {}
    for key in num_dict:
        res[key] = num_dict[key] / den_dict[key]
    return res

ios_db_free_average_app_ratings_by_genre = divide_two_dictionaries(ios_db_free_genre_ratings, ios_db_free_genre_frequency)
gplay_db_free_average_app_reviews_by_category = divide_two_dictionaries(gplay_db_free_category_reviews, gplay_db_free_category_frequency)

def simple_sort_table(parameter):
    table = parameter
    table_sorted = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_sorted.append(key_val_as_tuple)

    table_sorted = sorted(table_sorted, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print('iTunes App Store average ratings per app by Genre:')
simple_sort_table(ios_db_free_average_app_ratings_by_genre)  # prints the table itself and returns None
print('\n')
print('Play Store average reviews per app by Category:')
simple_sort_table(gplay_db_free_average_app_reviews_by_category)

iTunes App Store average ratings per app by Genre:
Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


Play Store average reviews per app by Category:
COMMUNICATION : 995608.4634146341
SOCIAL : 965830.9872881356
GAME : 683523.8445475638
VIDEO_PLAYERS : 425350.08176100627
PHOTOGRAPHY : 404081.3754789272
TOOLS : 305732.8973333333
ENTERTAINMENT : 301752.24705882353
SHOPPING : 223887.34673366835
PERSONALIZA

## Conclusion

From the analysis, the following genres should be considered:
1) Social networking app: ranked #3 in iTunes and #2 in Android, and it possibly crosses into the `Communication` category in Android too. A social networking app should not be too technical to pilot, as adoption comes more from marketing pivots than from in-depth technical changes, so we can receive instant feedback. Lastly, social networking apps have a high chance of cross-platform success.
2) Weather app: ranked #5 in iTunes and #10 in Android. The MVP consists of interfacing with existing meteorology networks in real time. Differentiation, however, is the challenge.

Lastly, a music app may seem like a good idea, as it is ranked #4 in iTunes and #7 in Android under `Entertainment`. The challenge lies in securing licensing rights: leading apps such as Spotify and Pandora are still unprofitable and cash-flow negative. This genre is best avoided without a truly disruptive idea.