# App Store & Google Play Store Markets Profitable App Profiles
The objective of this project is to find apps that are profitable in the markets and use the findings to help in decision making on what kind of apps to be developed. 
Since the company develops only free apps, the source of revenue is solely app users who interact with the ads placed on company apps. So developing apps that attract traffic will increase the source of revenue.

## Opening and Exploring the Data
As of the first quarter of 2020, Android users had upto 2.56 million apps available in Google Playstore whereas Apple's App Store had about 1.85 million available apps for iOS users. 
Collecting such a huge data and cleaning it requires a significant amount of time and money, so there is need to get a sample and work with it instead. Fortunately, there are two datasets that seem appropriate for our objective. These datasets are:
* [A data set](https://www.kaggle.com/lava18/google-play-store-apps) contains data about approximately ten thousand Android Apps in Google Playstore.
* [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) contains data about approximately seven thousand iOS apps from Apple App Store. 

Let's start by opening the two datasets and explore a little. 

In [9]:
from csv import reader

### The Google Play Data set ###
opened_file = open(r"C:\Users\hp\Downloads\googleplaystore.csv",encoding='latin1')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### Apps Store Data set ###
opened_file = open(r"C:\Users\hp\Downloads\AppleStore.csv",encoding='latin1')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

To make it easier to explore the two data sets, a function named explore_data() is written. it can be used numerous times to explore rows in the more readable way. An option will be added to the function to show number of rows and columns for any data set. 

In [6]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty line between rows)
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite â\x80\x93 FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


From the above, it is clear that Google Playstore data set has 10841 apps and 13 columns. We can quickly see that the columns that could be useful for our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.
Now, lets have a look at the App Store data set. 

In [10]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


From the above, its vivid that there are 7197 iOS apps in the data set and the columns that are of interest for our analysis are: 'track_name', 'currency', 'price', 'rating_count-tot', 'rating_count_ver', and 'prime_genre'. Not all column names are self-explanatory in the data set above but the names' details can be found in the link provided in the intro. 

## Deleting Irrelevant/Wrong Data
After a quick scan of the Google Play data set, looks like there is an error in row 10472. We shall print it and compare it with another row that does not have an error. 

In [11]:
print(android[10472]) # the row with an error
print('\n')
print(android_header) # header
print('\n')
print(android[0]) # row without an error

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


From the output, it looks like the row 10472 corresponds to the app *Life Made WI-Fi Touchscreen Photo Frame*, and we can see that rating is 19. This is clearly erroneous since the maximum rating for a Google Play app is 5. This error is caused by a missing value in the 'Category' column. As such, the row should be deleted. 

In [12]:
print(len(android))
del android[10472] # dont run this more than once because it will result in deleting other rows
print(len(android)) # this is to ensure and track the number of rows deleted

10841
10840


## Removing Duplicate Entries
### Part One
If we explore the Google Playstore data set carefully enough, we will realise that some apps have more than one entry. For example, the application instagram has four entries:

In [13]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


There are 1,181 scenarios in which an app occurs more than once:

In [14]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


During the analysis, we want to remove duplicate entries and keep only one entry per app. So we need to remove all duplicates from the data sets. 
On examining the rows we printed above for Instagram app, the main difference happens on the fourth position of each row (it corresponds to the number of reviews). Different numbers show that the data was collected at different times. This can then be used to create a criterion for keeping rows. The higher the number of reviews, the more reliable the ratings. And so keeping this in mind, we dont need to then remove rows randomly but rather selectively by keeping the ones with the highest number. 

To do that:
- Create a dictionary where each key is a unique app name, and the value is the highestnumber of reviews of that app
- Use the dictionary to create a new data set, which will have only one entry per app (and we only select the apps with the highest number of reviews)

## Part Two
Lets start by building the dictionary.

In [15]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

The length of the dictionaty created above (of unique apps) should be equal to the difference between the length of our data set and 1,181 (scenarios where an app occurs more than once).

In [16]:
print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


We shall use the reviews_max dictionary to remove the duplicates in the code cell below by:

- Start by initializing two empty lists, android_clean and already_added
- Loop through the android data set, and for every iteration:
    - Isolate the name of the app and the number of reviews
    - Add the current row (app) to the android_clean list, and the app name (name) to the already_added list if:
        - the number of reviews of the current app matches the number reviews of the app as described in the reviews_max dictionary
        - the name of the app is not already in the already_added list. This condition is to account for scenarios where the highest number of reviews of duplicate app is the same for more than one entry. Checking for only reviews_max[name] == n_reviews will still maintain duplicate entries in the data

In [17]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) # this has to be inside the if block

Take a quick look at the new data set and confirm that the number of rows is 9,659.

In [18]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite â\x80\x93 FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


## Removing Non-English Apps
### Part One
From the data, some of the apps are not intended for use by english-speaking consumer. Some of the examples are in the code cell below:

In [20]:
print(ios[813][1]) # in ios data set
print(ios[6731][1])

print(android_clean[4412][0]) # in the google playstore data set
print(android_clean[7940][0])

ç±å¥èºPPS -ãæ¬¢ä¹é¢2ãçµè§å§ç­æ­
ãè±åºã²ã¼ã ãçµ¶å¯¾ã«æå¾ã¾ã§ãã¬ã¤ããªãã§ ãè¬è§£ãï¼ãã­ãã¯ããºã«ã
ä¸­å½èª AQãªã¹ãã³ã°
ÙØ¹Ø¨Ø© ØªÙØ¯Ø± ØªØ±Ø¨Ø­ DZ


Since we are not interested in such apps, we should remove them from the data set. Such apps can be identified by checking the app name. More often, non-english apps will contain symbol(s) that are not common in English text.

All characters that are used in English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 and we can use this to build a function that checks the app's name and indicate if it has non-ASCII characters.

We build the function below by using the built-in ord() function to find out the corresponding encoding number of each character.

In [21]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
        
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


The function above seems to be working but there are some app name that use emojis or non-ASCII characters but are intended for consumption by English-speakers. The function as it is could remove useful apps and we dont want that. So we need to modify it.  

### Part Two
To minimize data loss, we will only remove apps whose names have more than three non-ASCII characters:

In [23]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            
    if non_ascii > 3:
        return False
    else:
        return True
    
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True


The function above is not perfect because a few non_English apps might beat the filter but it is good enough for use now.

The is_english() function below is used to filter out the non-English apps for both data sets.

In [24]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite â\x80\x93 FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9500
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12'

### Separating Free Apps from non-free apps
As mentioned in the intro, the company builds only free apps and the source of revenue is through in-app ads. The data set contains both free and non-free apps but our analysis is only for the free apps so we need to isolate the free apps as below:

In [25]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8760
3169


So from the above, there are 8,760 Android apps and 3,169 iOS apps. 

## Most Common Apps by Genre
### Part One
Our main objective is to determine which kinds of app are likely to attract more users since our revenue depend entirely on the number of users that use the apps.

To minimize risks and everhead, the validation strategy for an app idea comprised of three steps:

- Build an Android App version and add it to the Google Playstore
- If there is positive feedback for the app, develop it further
- If the app is profitable after six months, build an iOS app version and add it to the App Store

Lets begin by looking at the most common genres for each market. Lets begin by building a frequency table for the prime_genre column of the App Store data set and the Genres and Category columns of the Google Playstore data set.

### Part Two

We will build two functions we can use to analyze the frequency table:

- One function to generate frequency tables that show percentages
- Another function to display the percentages in a descending order

In [26]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
        table_sorted = sorted(table_display, reverse = True)
        for entry in table_sorted:
            print(entry[1], ':', entry[0])

### Part Three
Lets start by examining the frequency for the prime_genre column of the App Store data set.

In [27]:
display_table(ios_final, -5)

Social Networking : 3.2817923635216157
Photo & Video : 5.0489113284947935
Social Networking : 3.2817923635216157
Games : 58.53581571473651
Photo & Video : 5.0489113284947935
Social Networking : 3.2817923635216157
Games : 58.53581571473651
Photo & Video : 5.0489113284947935
Social Networking : 3.2817923635216157
Music : 2.0511202272010096
Games : 58.53581571473651
Photo & Video : 5.0489113284947935
Social Networking : 3.2817923635216157
Music : 2.0511202272010096
Reference : 0.5364468286525718
Games : 58.53581571473651
Photo & Video : 5.0489113284947935
Social Networking : 3.2817923635216157
Music : 2.0511202272010096
Health & Fitness : 1.9880088355948247
Reference : 0.5364468286525718
Games : 58.53581571473651
Photo & Video : 5.0489113284947935
Social Networking : 3.2817923635216157
Music : 2.0511202272010096
Health & Fitness : 1.9880088355948247
Weather : 0.8520037866834964
Reference : 0.5364468286525718
Games : 58.53581571473651
Photo & Video : 5.0489113284947935
Social Networking : 

It is clear that App Store has majority of English apps that are designed for fun (entertainment, photo and video, games, social networking, sports, music, etc). Apps with practical purposes are few. 

Lets examine the genres and category columns of Google Play data set.

In [28]:
display_table(android_final, 1)

ART_AND_DESIGN : 0.6506849315068494
AUTO_AND_VEHICLES : 0.9246575342465754
ART_AND_DESIGN : 0.6506849315068494
AUTO_AND_VEHICLES : 0.9246575342465754
ART_AND_DESIGN : 0.6506849315068494
BEAUTY : 0.6050228310502284
BOOKS_AND_REFERENCE : 2.146118721461187
AUTO_AND_VEHICLES : 0.9246575342465754
ART_AND_DESIGN : 0.6506849315068494
BEAUTY : 0.6050228310502284
BUSINESS : 4.646118721461187
BOOKS_AND_REFERENCE : 2.146118721461187
AUTO_AND_VEHICLES : 0.9246575342465754
ART_AND_DESIGN : 0.6506849315068494
BEAUTY : 0.6050228310502284
BUSINESS : 4.646118721461187
BOOKS_AND_REFERENCE : 2.146118721461187
AUTO_AND_VEHICLES : 0.9246575342465754
ART_AND_DESIGN : 0.6506849315068494
BEAUTY : 0.6050228310502284
COMICS : 0.5821917808219178
BUSINESS : 4.646118721461187
COMMUNICATION : 3.2534246575342465
BOOKS_AND_REFERENCE : 2.146118721461187
AUTO_AND_VEHICLES : 0.9246575342465754
ART_AND_DESIGN : 0.6506849315068494
BEAUTY : 0.6050228310502284
COMICS : 0.5821917808219178
BUSINESS : 4.646118721461187
COMMUNI

In Google App Store, the representation is more equal across board. Though practical apps seem to have a better representation. 

In [29]:
display_table(android_final, -4)

Art & Design : 0.6050228310502284
Art & Design : 0.6050228310502284
Art & Design;Creativity : 0.0684931506849315
Auto & Vehicles : 0.9246575342465754
Art & Design : 0.6050228310502284
Art & Design;Creativity : 0.0684931506849315
Auto & Vehicles : 0.9246575342465754
Beauty : 0.6050228310502284
Art & Design : 0.6050228310502284
Art & Design;Creativity : 0.0684931506849315
Books & Reference : 2.146118721461187
Auto & Vehicles : 0.9246575342465754
Beauty : 0.6050228310502284
Art & Design : 0.6050228310502284
Art & Design;Creativity : 0.0684931506849315
Business : 4.646118721461187
Books & Reference : 2.146118721461187
Auto & Vehicles : 0.9246575342465754
Beauty : 0.6050228310502284
Art & Design : 0.6050228310502284
Art & Design;Creativity : 0.0684931506849315
Business : 4.646118721461187
Books & Reference : 2.146118721461187
Auto & Vehicles : 0.9246575342465754
Beauty : 0.6050228310502284
Art & Design : 0.6050228310502284
Comics : 0.5707762557077625
Art & Design;Creativity : 0.068493150684

The difference between the Genres and the Category columns is not so clear but one thing noticeable is is that the Genres column has more categoriess. 

Overall, from the analysis we have done thus far, we can say that the iOS App Store is dominated by apps designed for fun, while Google Play Store show more of a balanced landscape across board. 

### Most Popular Apps by Genre on the App Store

One way to find out which apps are most popular is to calculate the average number of installs for each app genre. We only have a challenge because in the App Store data, Installs column is missing. So solve this, we'll take the total number of user ratings as a proxy which can be found in the rating_count_tot app. 

In the code cell below, we can calculate the average number of user ratings per app genre on the App Store:

In [30]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 72916.54807692308
Photo & Video : 28441.54375
Games : 22985.211320754715
Music : 58205.03076923077
Reference : 79350.4705882353
Health & Fitness : 24037.634920634922
Weather : 54215.2962962963
Utilities : 19900.473684210527
Travel : 31358.5
Shopping : 27816.2
News : 21750.071428571428
Navigation : 86090.33333333333
Lifestyle : 16739.34693877551
Entertainment : 14364.774193548386
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 46384.916666666664
Finance : 32367.02857142857
Education : 7003.983050847458
Productivity : 21799.14814814815
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


From above, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:

In [31]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) #print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
GeocachingÂ® : 12811
CoPilot GPS â Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


This same pattern applies to social networking apps, where the average number is mostly influenced by a handful of major apps like Facebook, Pinterest, Skype and others. It is also observed in music apps where the average number is influenced by big fellows like Spotify, Pandora and Shazam. 

Though our main objective is to find the popular genres, music, social networking and navigation apps seem more popular than they really are. The average number of ratings seem to be skewed by very few apps which have hundreds of thousands of user ratings, while others can get past the ten thousand ratings threshold. To get a better picture, we can remove the extremely popular apps for each genre and then rework the averages.

Reference apps have 79350 user ratings on average though it's actually the Bible and Dictionary.com which skew up the average rating:

In [32]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD â¢ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for PokÃ©mon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


One noble idea is to take a popular book, change it into an app and add more features that could help keep users in the app for longer because this will directly contribution to our revenue. We could add features like an embeded in-app dictionary to help users look up meaning of words, an audio version of the app, quizzes about the app and many more. 

Other genres that seem popular include food and drink, finance and weather.

### Most Popular Apps by Genre on Google Play
Luckily, we have the data we need for this analysis so we should manage to get a clearer picture. However, the number of installs does not seem precise enough because most values are open-ended (100+, 1000+, 5000+, etc). 

In [33]:
display_table(android_final, 5) # the Install columns

10,000+ : 10.205479452054794
10,000+ : 10.205479452054794
5,000,000+ : 6.872146118721462
10,000+ : 10.205479452054794
5,000,000+ : 6.872146118721462
50,000,000+ : 2.28310502283105
100,000+ : 11.518264840182649
10,000+ : 10.205479452054794
5,000,000+ : 6.872146118721462
50,000,000+ : 2.28310502283105
100,000+ : 11.518264840182649
10,000+ : 10.205479452054794
5,000,000+ : 6.872146118721462
50,000+ : 4.7716894977168955
50,000,000+ : 2.28310502283105
1,000,000+ : 15.74200913242009
100,000+ : 11.518264840182649
10,000+ : 10.205479452054794
5,000,000+ : 6.872146118721462
50,000+ : 4.7716894977168955
50,000,000+ : 2.28310502283105
1,000,000+ : 15.74200913242009
100,000+ : 11.518264840182649
10,000,000+ : 10.60502283105023
10,000+ : 10.205479452054794
5,000,000+ : 6.872146118721462
50,000+ : 4.7716894977168955
50,000,000+ : 2.28310502283105
1,000,000+ : 15.74200913242009
100,000+ : 11.518264840182649
10,000,000+ : 10.60502283105023
10,000+ : 10.205479452054794
5,000,000+ : 6.872146118721462
50

With such open-ended value numbers, we will make an assumption that the value 1000+ installs has 1000 installs.

For the computations, we'll need to convert the numbers to float to remove the commas and the plus characters, otherwise the conversation will fail and raise an error. 
This will be done in the loop and compute the average number of installs for each genre (category) as shown in the code cell below.

In [34]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 654074.8271604938
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8329168.936170213
BUSINESS : 1712290.1474201474
COMICS : 859042.1568627451
COMMUNICATION : 38550548.03859649
DATING : 861409.5521472392
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11767380.952380951
EVENTS : 253542.22222222222
FINANCE : 1365500.4049079753
FOOD_AND_DRINK : 1951283.8055555555
HEALTH_AND_FITNESS : 4219697.055350553
HOUSE_AND_HOME : 1385541.463768116
LIBRARIES_AND_DEMO : 649314.0506329114
LIFESTYLE : 1447458.976676385
GAME : 15571586.690307328
FAMILY : 3716053.755274262
MEDICAL : 121161.87781350482
SOCIAL : 23628689.23275862
SHOPPING : 7103190.78680203
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3750580.6438356163
TRAVEL_AND_LOCAL : 14120454.07804878
TOOLS : 10902378.834454913
PERSONALIZATION : 5240358.986111111
PRODUCTIVITY : 16787331.344927534
PARENTING : 552875.1785714285
WEATHER : 5212877.101449275
VIDEO_PLAYERS : 24878048.860759493
NEWS_AND_MAGAZI

Communication apps have the most installs at 38,550,548. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail and Hangout), and a few others with over 100 and 500 million installs. 

In [35]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                     or app[5] == '500,000,000'
                                     or app[5] == '100,000,000'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger â Text and Video Chat for Free : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+


The average would be reduced by about ten times if we remove all the communication apps with over 100 million:

In [36]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)
    

3437620.895348837

We see the same pattern for the video players category, which is runner-up with 24,727,872 installs. The market is dominated by apps like Youtube, Google Play Movies & TV or MX Player. 

The game genre seems pretty popular, but previously we found out this part of the market seems a little saturated, so coming up with a different app recommendation is feasible. 

Let's take a look at some of the apps from this genre and their number of installs:

In [37]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra â free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [38]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                           or app[5] == '500,000,000+'
                                           or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Audiobooks from Audible : 100,000,000+


However, it looks like there are only a handful of popular apps. Let's try to get some app ideas based on kind of apps that are mid-level in terms of popularity since this market shows some potential. 

In [39]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                           or app[5] == '5,000,000+'
                                           or app[5] == '10,000,000+'
                                           or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra â free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+

The market seem more dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries. So it's probably not a good idea to develop app of the same kind because of existing competition. 

It also look like there are a few categories of the book Quran suggesting that building an app around a popular book or religious texts could be more profitable. 

Like in the iOS market, android market is full of libraries so we need some attractive features to the raw version such as embed audio version of the book, daily quotes from the book, and many others. 

### Conclusions
In project, we analyzed data about the App Store and Google Play mobile apps. The goal was to recommend an app profile that could be profitable for both markets. 

The conclusion from the analysis was that turning a popular book into an app could be profitable in both markets. To edge out competitors in the market, the app would need attractive features built into it. This features might include embeded dictionary, daily quotes, quizzes and audio version of the book just to mention a few.