# Finding Profitable Mobile Application Opportunities

## Introduction
> Much of modern life runs through mobile applications, from browsing social media to checking bank balances. That reach makes mobile apps attractive to monetize: applications that generate large amounts of traffic can be very profitable by selling advertising space. The aim of this project is to recommend profitable mobile app profiles to a mobile app developer client. Our client is interested strictly in free apps, whose revenue is heavily influenced by the number of people who use them. The goal of this project is therefore to figure out which types of apps are likely to attract more users. Throughout this project, we will extract our insights from the following two datasets:

> - appstore.csv - A data set containing data about approximately seven thousand iOS apps from the App Store.
> - playstore.csv - A data set containing data about approximately ten thousand Android apps from Google Play.

> The first part of this notebook focuses on preparing and cleaning the datasets for analysis, while the second part analyzes the cleaned data and synthesizes a recommendation.

In [1]:
# Importing libraries
import statistics as st
from csv import reader
import pandas as pd
pd.set_option('display.max_rows', None)

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Loading the datasets
def openfile(filename):
    # Use a context manager so the file handle is closed after reading
    with open(filename) as file:
        return list(reader(file))

appstore_data = openfile("/Users/omarstinner/Data Files/Python Projects/Files/Guided Project - Profitable App Profiles/appstore.csv")
playstore_data = openfile("/Users/omarstinner/Data Files/Python Projects/Files/Guided Project - Profitable App Profiles/playstore.csv")
appstore_header = appstore_data[0]
playstore_header = playstore_data[0]

## Part 1: Cleaning The Data

In [3]:
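def faulty_detector(data_set):
    # Indexes of rows whose length differs from the header row's length
    expected_length = len(data_set[0])
    return [i for i, row in enumerate(data_set) if len(row) != expected_length]

faulty_rows_playstore = faulty_detector(playstore_data)
faulty_rows_appstore = faulty_detector(appstore_data)

# Delete from the highest index down so earlier deletions do not shift later indexes
for index in sorted(faulty_rows_playstore, reverse=True):
    del playstore_data[index]
for index in sorted(faulty_rows_appstore, reverse=True):
    del appstore_data[index]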

> **Function:** To generate a list of indexes belonging to faulty rows in a dataset

> **What's Happening?** After creating a list of faulty rows (rows whose length differs from the header's) for both the Google Play Store and the Apple App Store, we use each list separately to remove the faulty rows from the corresponding dataset.
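> When rows are removed by index, the order of removal matters: each `pop` shifts every later row up by one position. The sketch below illustrates this on a small made-up dataset (the rows and the helper name `faulty_indexes` are hypothetical, not part of the project data).

```python
# Toy data: a header plus four rows, two of which are malformed (wrong length).
rows = [
    ["App", "Category", "Installs"],
    ["A", "TOOLS", "100+"],
    ["B", "1,000+"],   # malformed: one column missing
    ["C", "GAME"],     # malformed: one column missing
    ["D", "GAME", "10+"],
]

def faulty_indexes(data_set):
    """Return the positions of rows whose length differs from the header's."""
    expected = len(data_set[0])
    return [i for i, row in enumerate(data_set) if len(row) != expected]

bad = faulty_indexes(rows)

# Popping in ascending order removes the wrong row the second time,
# because the first pop shifts every later row up by one position.
ascending = [row[:] for row in rows]
for i in bad:
    ascending.pop(i)

# Popping from the highest index down leaves earlier positions untouched.
descending = [row[:] for row in rows]
for i in sorted(bad, reverse=True):
    descending.pop(i)

print([row[0] for row in ascending])   # malformed row "C" survives, valid row "D" is lost
print([row[0] for row in descending])  # only the valid rows "A" and "D" remain
```

> With a single faulty row the two orders behave the same, so the difference only shows up when several rows need to be dropped.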

In [4]:
from itertools import groupby 

sorted_playstore = sorted(playstore_data[1:], key=lambda x: (x[0], -int(x[3])))
sorted_appstore = sorted(appstore_data[1:], key=lambda x: (x[1], -int(x[5])))

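def remove_duplicates(data_set, column_index):
    # groupby merges only adjacent rows, so data_set must already be sorted on column_index
    return [next(group) for _, group in groupby(data_set, key=lambda row: row[column_index])]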

duplicate_free_playstore = remove_duplicates(sorted_playstore,0)
duplicate_free_appstore = remove_duplicates(sorted_appstore,1)

> **Function:** To generate a duplicate-free dataset

> **What's Happening?** We sort each dataset by app name (the "App" column for the Google Play Store and the "track_name" column for the Apple App Store), breaking ties by review or rating count in descending order, before feeding it into the function. The function only works on sorted data because `groupby` only groups adjacent rows. It keeps the first instance of each app, which after sorting is the entry with the highest count, and drops the duplicates that follow, leaving us with duplicate-free datasets.
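> To see why the sort has to come first, here is a minimal sketch on made-up rows (the app names and counts are hypothetical). `itertools.groupby` starts a new group every time the key changes, so duplicates that are not adjacent would end up in separate groups.

```python
from itertools import groupby

# Hypothetical rows: [name, review_count]
apps = [
    ["Alpha", "120"],
    ["Beta", "45"],
    ["Alpha", "150"],
    ["Beta", "40"],
]

def name_of(row):
    return row[0]

# Without sorting, every row forms its own group because equal names are not adjacent.
groups_unsorted = [key for key, _ in groupby(apps, key=name_of)]

# Sort by name, then by review count descending, so the best copy of each app comes first.
ordered = sorted(apps, key=lambda row: (row[0], -int(row[1])))
unique = [next(group) for _, group in groupby(ordered, key=name_of)]

print(groups_unsorted)  # four groups: duplicates were not merged
print(unique)           # one row per app, keeping the highest review count
```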

In [5]:
def non_english_remover(data_set, index):
    return [row for row in data_set if len([i for i in row[index] if ord(i) > 127]) <= 3]

english_playstore = non_english_remover(duplicate_free_playstore, 0)
english_appstore = non_english_remover(duplicate_free_appstore, 1)

> **Function:** To generate a list of strictly English applications

> **What's Happening?** The function removes applications whose names contain more than 3 characters with a code point above 127, that is, outside the ASCII range. We are then left with datasets of apps whose names are predominantly English.
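> The rule is a heuristic rather than true language detection. The sketch below applies the same threshold to a few made-up names (the helper name `looks_english` and the sample strings are assumptions for illustration) and shows one case where it misfires.

```python
def looks_english(name, max_non_ascii=3):
    """Return True when `name` has at most `max_non_ascii` characters above code point 127."""
    return sum(ord(ch) > 127 for ch in name) <= max_non_ascii

plain = looks_english("Photo Editor")            # no non-ASCII characters
trademark = looks_english("Docs Viewer™ Free")   # one symbol is tolerated
japanese = looks_english("日本語のアプリです")      # nine non-ASCII characters
emoji_heavy = looks_english("Fun Chat 😜😜😜😜")   # English name, but four emoji

print(plain, trademark, japanese, emoji_heavy)
```

> The last example is an English name that gets rejected, which is the price of allowing only three non-ASCII characters.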

In [6]:
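def paid_app_remover(data_set, playstore=False):
    if playstore:
        # Google Play prices are strings; paid apps carry a "$" sign
        return [row for row in data_set if "$" not in row[7]]
    # App Store prices are numeric strings; free apps have a price of 0
    return [row for row in data_set if float(row[4]) == 0]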

ready_playstore = paid_app_remover(english_playstore, True)
ready_appstore = paid_app_remover(english_appstore)
ready_playstore.insert(0, playstore_header)
ready_appstore.insert(0, appstore_header)

> **Function:** To generate a list of applications that are strictly free

> **What's Happening?** For the Google Play Store, the function removes any row that contains a "$" in the "Price" column. For the Apple App Store, it keeps only the rows whose "price" column equals 0. We are then left with datasets of only free applications. We then insert the respective headers at the beginning of each dataset, as they will be needed when we convert the datasets into pandas DataFrames.
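> The two stores encode price differently, which is why the function needs a flag. As an alternative sketch, a single helper could normalize both encodings before comparing to zero. The name `is_free` is hypothetical, and the helper assumes prices are either plain numbers or numbers prefixed with "$".

```python
def is_free(price):
    """Return True when a price string such as "0", "0.0" or "$4.99" represents zero."""
    return float(price.strip().lstrip("$")) == 0.0

samples = ["0", "0.0", "$4.99", "$0.00", "1.99"]
flags = [is_free(price) for price in samples]
print(dict(zip(samples, flags)))
```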

In [7]:
pandas_playstore = pd.DataFrame(ready_playstore[1:], columns = ready_playstore[0])
pandas_appstore = pd.DataFrame(ready_appstore[1:], columns = ready_appstore[0])

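# regex=False treats "+" and "," as literal characters rather than regex patterns
pandas_playstore["Installs"] = pandas_playstore["Installs"].str.replace("+", "", regex=False)
pandas_playstore["Installs"] = pandas_playstore["Installs"].str.replace(",", "", regex=False)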
pandas_playstore["Installs"] = pandas_playstore["Installs"].astype(int)
pandas_appstore["rating_count_tot"] = pandas_appstore["rating_count_tot"].astype(int)

def avg_users_per_genre(data_set, column1, column2):
    return {k : (data_set.loc[data_set[column1] == k, column2].sum())/(data_set[column1] == k).sum() for k in set(data_set[column1].tolist())}

playstore_category_avg_installs = avg_users_per_genre(pandas_playstore, "Category", "Installs")
appstore_primegenre_avg_ratingcounttot = avg_users_per_genre(pandas_appstore, "prime_genre", "rating_count_tot")

playstore_sorted = dict(sorted(playstore_category_avg_installs.items(), key=lambda item: item[1], reverse = True))
appstore_sorted =  dict(sorted(appstore_primegenre_avg_ratingcounttot.items(), key=lambda item: item[1], reverse = True))

for k,v in playstore_sorted.items():
    print(k, ":", v)
    
print("\n")

for k,v in appstore_sorted.items():
    print(k, ":", v)

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

> **Function:** To generate a dictionary of average users per genre/category

> **What's Happening?** Before we pass the datasets through the function, we first convert them into pandas DataFrames and clean them. Working with a DataFrame makes it easier to manipulate the data, such as removing parts of a string from a specific column and converting strings to integers, as shown above. After the datasets are passed through the function, we are left with a dictionary of the average number of users per genre for the Google Play Store and the Apple App Store.
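> The same per-category average can also be expressed with a pandas `groupby`. This is a sketch on a small made-up DataFrame (the values are hypothetical), not a replacement for the function above.

```python
import pandas as pd

# Hypothetical data with the same column names as the cleaned Play Store frame
toy = pd.DataFrame({
    "Category": ["TOOLS", "TOOLS", "GAME", "GAME", "GAME"],
    "Installs": [100, 300, 10, 20, 30],
})

# Mean installs per category, largest first
avg_by_category = (
    toy.groupby("Category")["Installs"]
       .mean()
       .sort_values(ascending=False)
)

print(avg_by_category.to_dict())
```

> Grouping once avoids re-filtering the DataFrame for every category, which is what the dictionary comprehension does.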

## Part 2: Analyzing The Data

> To develop a better understanding of which type of app is best to develop, we will further examine the average user counts of the genres listed above (installs for the Google Play Store, and total rating counts as a proxy for installs in the Apple App Store) to recommend a profitable application genre for both stores. The criteria for selecting such a genre will be:

> - A low skewness score in the user-count column ("Installs" or "rating_count_tot") for the specific genre.
> - A low count of well-established applications within the genre.
> - A genre that is popular.
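> The skewness score used throughout Part 2 is computed as 3 × (mean − median) / standard deviation, a formula commonly known as Pearson's second skewness coefficient. A minimal sketch follows. The function name is an assumption, and the second sample reuses the six "Navigation" rating counts that appear in this notebook's output.

```python
import statistics as st

def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sample std dev."""
    values = list(values)
    return 3 * (st.mean(values) - st.median(values)) / st.stdev(values)

# A symmetric sample has mean == median, so the score is 0
symmetric = pearson_median_skewness([1, 2, 3, 4, 5])

# Rating counts of the six free "Navigation" apps listed in this notebook
navigation_counts = [345046, 154911, 12811, 3582, 187, 5]
navigation = pearson_median_skewness(navigation_counts)

print(symmetric)
print(round(navigation, 2))  # about 1.66
```

> A score near 0 means the mean and the median agree. The larger the positive score, the more a few very large values pull the mean above the typical app.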

In [8]:
# Finding the most dominant "Navigation" apps
appstore_navigation = pandas_appstore[pandas_appstore["prime_genre"] == "Navigation"].sort_values("rating_count_tot", ascending = False)
appstore_navigation

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
2987,323229106,"Waze - GPS Navigation, Maps & Real-time Traffic",94139392,USD,0.0,345046,3040,4.5,4.5,4.24,4+,Navigation,37,5,36,1
1141,585027354,Google Maps - Navigation & Transit,120232960,USD,0.0,154911,1253,4.5,4.0,4.31.1,12+,Navigation,37,5,34,1
1085,329541503,Geocaching®,108166144,USD,0.0,12811,134,3.5,1.5,5.3,4+,Navigation,37,0,22,1
502,504677517,CoPilot GPS – Car Navigation & Offline Maps,82534400,USD,0.0,3582,70,4.0,3.5,10.0.0.984,4+,Navigation,38,5,25,1
1312,344176018,ImmobilienScout24: Real Estate Search in Germany,126867456,USD,0.0,187,0,3.5,0.0,9.5,4+,Navigation,37,5,3,1
2150,463431091,Railway Route Search,46950400,USD,0.0,5,0,3.0,0.0,3.17.1,4+,Navigation,37,0,1,1


In [9]:
# Calculating the skewness score of the "rating_count_tot" column for the "Navigation" genre
mean = pandas_appstore[pandas_appstore["prime_genre"] == "Navigation"]["rating_count_tot"].mean()
median = pandas_appstore[pandas_appstore["prime_genre"] == "Navigation"]["rating_count_tot"].median()
std_dev = st.stdev(pandas_appstore[pandas_appstore["prime_genre"] == "Navigation"]["rating_count_tot"])

# Skewness equation
(3*(mean - median)) / std_dev

1.6627029668430233

In [10]:
# Calculating the % decrease in total average after removing the well-established applications.
no_big_nav_apps = pandas_appstore[(pandas_appstore["prime_genre"] == "Navigation") & (~pandas_appstore["track_name"].isin(appstore_navigation["track_name"][:2]))]
((86090.33333333333 - (no_big_nav_apps["rating_count_tot"].mean())) / 86090.33333333333) * 100

95.1838379066949

In [11]:
# Calculating the count of big players in the genre
pandas_appstore[pandas_appstore["prime_genre"] == "Navigation"]["rating_count_tot"].value_counts().sort_index(ascending = False)

345046    1
154911    1
12811     1
3582      1
187       1
5         1
Name: rating_count_tot, dtype: int64

> Looking at the table above, we can see that the "Navigation" genre is dominated by well-established applications such as "Waze" and "Google Maps". The genre might seem popular because of its high average rating count (~86,090, our proxy for installs in the App Store), but that average is heavily skewed by these two applications. This is evident from the skewness score of 1.66 and the ~95% drop in the average after removing these "big" apps. The genre has only 6 applications, and most of the ratings come from the two well-established ones, which makes penetrating this market difficult and not feasible. Let us explore the second most popular genre: "Reference".
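> The percent decrease quoted above can be reproduced without hard-coding the genre average. The sketch below assumes a hypothetical helper, `pct_drop_without_top`, and feeds it the six "Navigation" rating counts from the output above.

```python
def pct_drop_without_top(values, top_n):
    """Percent decrease in the mean after dropping the `top_n` largest values."""
    ordered = sorted(values, reverse=True)
    full_mean = sum(ordered) / len(ordered)
    rest = ordered[top_n:]
    rest_mean = sum(rest) / len(rest)
    return (full_mean - rest_mean) / full_mean * 100

navigation_counts = [345046, 154911, 12811, 3582, 187, 5]
drop = pct_drop_without_top(navigation_counts, top_n=2)
print(round(drop, 2))  # 95.18
```

> Computing the full mean inside the helper keeps the result in sync with the data if the cleaning steps change.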

### Analyzing The "Reference" Genre

In [12]:
# Finding the most dominant "Reference" apps
appstore_reference = pandas_appstore[pandas_appstore["prime_genre"] == "Reference"].sort_values("rating_count_tot", ascending = False)
appstore_reference

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
256,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1
659,308750436,Dictionary.com Dictionary & Thesaurus,111275008,USD,0.0,200047,177,4.0,4.0,7.1.3,4+,Reference,37,0,1,1
660,364740856,Dictionary.com Dictionary & Thesaurus for iPad,165748736,USD,0.0,54175,10176,4.5,4.5,4.0,4+,Reference,24,5,9,1
1148,414706506,Google Translate,65281024,USD,0.0,26786,27,3.5,4.5,5.10.0,4+,Reference,37,5,59,1
1742,388389451,"Muslim Pro: Ramadan 2017 Prayer Times, Azan, Q...",100551680,USD,0.0,18418,706,4.5,5.0,9.2.1,4+,Reference,37,5,16,1
1830,1130829481,New Furniture Mods - Pocket Wiki & Game Tools ...,52959232,USD,0.0,17588,17588,4.5,4.5,1.0,4+,Reference,38,3,2,1
1651,399452287,Merriam-Webster Dictionary,155593728,USD,0.0,16849,1125,4.5,4.5,4.1,4+,Reference,38,1,12,1
1840,475772902,Night Sky,596499456,USD,0.0,12122,60,4.5,4.5,4.4.1,4+,Reference,37,5,29,1
484,1135575003,City Maps for Minecraft PE - The Best Maps for...,90124288,USD,0.0,8535,8535,4.0,4.0,1.0,4+,Reference,37,4,1,1
1478,1132715891,LUCKY BLOCK MOD ™ for Minecraft PC Edition - T...,86874112,USD,0.0,4693,4693,4.0,4.0,1.0,12+,Reference,37,4,1,1


In [13]:
# Calculating the skewness score of the "rating_count_tot" column for the "Reference" genre
mean = pandas_appstore[pandas_appstore["prime_genre"] == "Reference"]["rating_count_tot"].mean()
median = pandas_appstore[pandas_appstore["prime_genre"] == "Reference"]["rating_count_tot"].median()
std_dev = st.stdev(pandas_appstore[pandas_appstore["prime_genre"] == "Reference"]["rating_count_tot"])

# Skewness equation
(3*(mean - median)) / std_dev

0.8831739195424232

In [14]:
# Calculating the % decrease in total average after removing the well-established applications.
no_big_ref_apps = pandas_appstore[(pandas_appstore["prime_genre"] == "Reference") & (~pandas_appstore["track_name"].isin(appstore_reference["track_name"][:2]))]
((74942.11111111111 - (no_big_ref_apps["rating_count_tot"].mean())) / 74942.11111111111) * 100

86.40692482642159

In [15]:
# Finding the count of big players in the genre
pandas_appstore[pandas_appstore["prime_genre"] == "Reference"]["rating_count_tot"].value_counts().sort_index(ascending = False)

985920    1
200047    1
54175     1
26786     1
18418     1
17588     1
16849     1
12122     1
8535      1
4693      1
1497      1
826       1
762       1
718       1
14        1
8         1
0         2
Name: rating_count_tot, dtype: int64

> We see the same pattern in the "Reference" genre as in the "Navigation" genre: well-established apps ("Bible", "Dictionary.com") dominate the genre in terms of rating counts. However, the "Reference" genre has a much lower skewness score of ~0.88. Although this is still relatively high, it is well below that of the "Navigation" genre and indicates that the average is influenced by more than just a couple of applications. The percent decrease in the average after removing the two biggest apps has also improved relative to the "Navigation" genre (86% vs 95%), making a "Reference" application a viable option. The genre also has many more applications than "Navigation" (18 vs 6), further indicating greater potential. Coupled with the fact that the genre is popular in itself, "Reference" seems to be a feasible and practical option for new mobile development.

### Google Playstore

In [16]:
# Finding the most dominant "COMMUNICATION" apps
playstore_communication = pandas_playstore[(pandas_playstore["Category"] == "COMMUNICATION") & ((pandas_playstore["Installs"] == 500000000) | (pandas_playstore["Installs"] == 100000000) | (pandas_playstore["Installs"] == 1000000000))]
playstore_communication

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
641,Android Messages,COMMUNICATION,4.2,781810,Varies with device,100000000,Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device
872,BBM - Free Calls & Messages,COMMUNICATION,4.3,12843436,Varies with device,100000000,Free,0,Everyone,Communication,"August 2, 2018",Varies with device,4.0.3 and up
4127,Firefox Browser fast & private,COMMUNICATION,4.4,3075118,Varies with device,100000000,Free,0,Everyone,Communication,"July 10, 2018",Varies with device,Varies with device
4399,"GO SMS Pro - Messenger, Free Themes, Emoji",COMMUNICATION,4.4,2876500,24M,100000000,Free,0,Everyone,Communication,"August 1, 2018",7.73,4.0 and up
4502,Gmail,COMMUNICATION,4.3,4604483,Varies with device,1000000000,Free,0,Everyone,Communication,"August 2, 2018",Varies with device,Varies with device
4552,Google Chrome: Fast & Secure,COMMUNICATION,4.3,9643041,Varies with device,1000000000,Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device
4556,Google Duo - High Quality Video Calls,COMMUNICATION,4.6,2083237,Varies with device,500000000,Free,0,Everyone,Communication,"July 31, 2018",37.1.206017801.DR37_RC14,4.4 and up
4737,Hangouts,COMMUNICATION,4.0,3419513,Varies with device,1000000000,Free,0,Everyone,Communication,"July 21, 2018",Varies with device,Varies with device
5120,KakaoTalk: Free Calls & Text,COMMUNICATION,4.3,2546549,Varies with device,100000000,Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device
5166,Kik,COMMUNICATION,4.3,2451136,Varies with device,100000000,Free,0,Teen,Communication,"August 3, 2018",Varies with device,Varies with device


In [17]:
# Calculating the skewness score of the "Installs" column for the "COMMUNICATION" category
mean = pandas_playstore[pandas_playstore["Category"] == "COMMUNICATION"]["Installs"].mean()
median = pandas_playstore[pandas_playstore["Category"] == "COMMUNICATION"]["Installs"].median()
std_dev = st.stdev(pandas_playstore[pandas_playstore["Category"] == "COMMUNICATION"]["Installs"])

(3*(mean - median)) / std_dev

0.7274286367664985

In [18]:
# Calculating the percentage decrease in the average installs when the "big" COMMUNICATION apps are removed
no_big_comm_apps = pandas_playstore[(pandas_playstore["Category"] == "COMMUNICATION") & (~pandas_playstore["App"].isin(playstore_communication["App"]))]
((38456119.167247385 - (no_big_comm_apps["Installs"].mean())) / 38456119.167247385) * 100

90.62961768765638

In [19]:
# Finding the count of big players in the genre
pandas_playstore[pandas_playstore["Category"] == "COMMUNICATION"]["Installs"].value_counts().sort_index(ascending = False)

1000000000     6
500000000      5
100000000     16
50000000       7
10000000      43
5000000       22
1000000       40
500000         9
100000        16
50000         10
10000         20
5000          16
1000          19
500            8
100           28
50             5
10            14
5              2
1              1
Name: Installs, dtype: int64

> The most popular category in the Google Play Store is "COMMUNICATION". However, on closer inspection, the applications we would be competing with include "Gmail", "Google Chrome", "Hangouts", "Messenger", "Skype", and "WhatsApp". There are 27 apps (those with 100,000,000 or more installs) skewing the average for all 287 applications in this category. These applications hold the biggest share of the "COMMUNICATION" market, and it wouldn't make sense to try to penetrate a market with such established players. After removing them, the average installs for the category fell by about 91%, which indicates that most of the average came from these applications. Together with a skewness score of ~0.73, this makes the "COMMUNICATION" category an unfeasible option for profitable mobile development. Let us explore the second most popular category: "VIDEO_PLAYERS".
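> The "big" apps in this section are selected by matching three exact install buckets. Because "Installs" is bucketed, a single threshold comparison selects the same rows. Here is a sketch on made-up data (the app names, values, and the constant `BIG_APP_THRESHOLD` are assumptions for illustration).

```python
import pandas as pd

# Hypothetical rows using the same column names as the cleaned Play Store frame
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E"],
    "Category": ["COMMUNICATION"] * 4 + ["TOOLS"],
    "Installs": [1_000_000_000, 100_000_000, 50_000_000, 1_000, 500_000_000],
})

# Assumed cut-off: the smallest install bucket treated as "big" in this notebook
BIG_APP_THRESHOLD = 100_000_000

in_category = toy["Category"] == "COMMUNICATION"
big_apps = toy[in_category & (toy["Installs"] >= BIG_APP_THRESHOLD)]
other_apps = toy[in_category & (toy["Installs"] < BIG_APP_THRESHOLD)]

print(big_apps["App"].tolist())
print(other_apps["App"].tolist())
```

> Splitting on the threshold also yields the "everything else" group directly, without a second lookup by app name.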

### Analyzing The "VIDEO_PLAYERS" Category

In [20]:
# Finding the most dominant "VIDEO_PLAYERS" apps
playstore_video_players = pandas_playstore[(pandas_playstore["Category"] == "VIDEO_PLAYERS") & ((pandas_playstore["Installs"] == 500000000) | (pandas_playstore["Installs"] == 100000000) | (pandas_playstore["Installs"] == 1000000000))]
playstore_video_players

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
3220,Dubsmash,VIDEO_PLAYERS,4.2,1971777,29M,100000000,Free,0,Teen,Video Players & Editors,"May 11, 2018",2.35.8,4.1 and up
4571,Google Play Movies & TV,VIDEO_PLAYERS,3.7,906384,Varies with device,1000000000,Free,0,Teen,Video Players & Editors,"August 6, 2018",Varies with device,Varies with device
5518,MX Player,VIDEO_PLAYERS,4.5,6474672,Varies with device,500000000,Free,0,Everyone,Video Players & Editors,"August 6, 2018",Varies with device,Varies with device
5848,Motorola FM Radio,VIDEO_PLAYERS,3.9,54815,Varies with device,100000000,Free,0,Everyone,Video Players & Editors,"May 2, 2018",Varies with device,Varies with device
5849,Motorola Gallery,VIDEO_PLAYERS,3.9,121916,23M,100000000,Free,0,Everyone,Video Players & Editors,"January 25, 2016",Varies with device,Varies with device
8084,VLC for Android,VIDEO_PLAYERS,4.4,1032076,Varies with device,100000000,Free,0,Everyone,Video Players & Editors,"July 30, 2018",Varies with device,2.3 and up
8140,"VideoShow-Video Editor, Video Maker, Beauty Ca...",VIDEO_PLAYERS,4.6,4016834,Varies with device,100000000,Free,0,Everyone,Video Players & Editors,"July 23, 2018",Varies with device,Varies with device
8177,VivaVideo - Video Editor & Photo Movie,VIDEO_PLAYERS,4.6,9879473,40M,100000000,Free,0,Teen,Video Players & Editors,"August 4, 2018",7.2.1,4.1 and up
8485,YouTube,VIDEO_PLAYERS,4.3,25655305,Varies with device,1000000000,Free,0,Teen,Video Players & Editors,"August 2, 2018",Varies with device,Varies with device


In [21]:
# Calculating the skewness score of the "Installs" column for the "VIDEO_PLAYERS" category
mean = pandas_playstore[pandas_playstore["Category"] == "VIDEO_PLAYERS"]["Installs"].mean()
median = pandas_playstore[pandas_playstore["Category"] == "VIDEO_PLAYERS"]["Installs"].median()
std_dev = st.stdev(pandas_playstore[pandas_playstore["Category"] == "VIDEO_PLAYERS"]["Installs"])

(3*(mean - median)) / std_dev

0.5977613211001883

In [22]:
# Calculating the percentage decrease in the average installs when the "big" "VIDEO_PLAYERS" apps are removed
no_big_video_players_apps = pandas_playstore[(pandas_playstore["Category"] == "VIDEO_PLAYERS") & (~pandas_playstore["App"].isin(playstore_video_players["App"]))]
((24727872.452830188 - (no_big_video_players_apps["Installs"].mean())) / 24727872.452830188) * 100

77.57640434327497

In [23]:
# Finding the count of big players in the genre
pandas_playstore[pandas_playstore["Category"] == "VIDEO_PLAYERS"]["Installs"].value_counts().sort_index(ascending = False)

1000000000     2
500000000      1
100000000      6
50000000      10
10000000      26
5000000        7
1000000       33
500000         4
100000        12
50000          6
10000         15
5000          14
1000           8
500            6
100            7
10             2
Name: Installs, dtype: int64

> We see a similar pattern in the "VIDEO_PLAYERS" category as in the "COMMUNICATION" category: 9 apps skew the average installs for all 159 applications. After removing these "big" applications and recalculating, the category average fell by about 78%, which shows how strong a position apps such as "Dubsmash" and "Google Play Movies & TV" hold in the "VIDEO_PLAYERS" market. However, the skewness score improves to ~0.60, indicating that more applications influence the "Installs" average than in the "COMMUNICATION" category. Although this is an improvement, we also need a genre that is available in the Apple App Store. For that reason, and because a small number of apps still control the average install count, we will pick another category. Since the "Reference" genre in the Apple App Store was a viable option, let us explore the similar "BOOKS_AND_REFERENCE" category in the Google Play Store.

### Analyzing The "BOOKS_AND_REFERENCE" Category

In [24]:
# Finding the most dominant "BOOKS_AND_REFERENCE" apps
playstore_books_and_references = pandas_playstore[(pandas_playstore["Category"] == "BOOKS_AND_REFERENCE") & ((pandas_playstore["Installs"] == 500000000) | (pandas_playstore["Installs"] == 100000000) | (pandas_playstore["Installs"] == 1000000000))]
playstore_books_and_references

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
615,Amazon Kindle,BOOKS_AND_REFERENCE,4.2,814151,Varies with device,100000000,Free,0,Teen,Books & Reference,"July 27, 2018",Varies with device,Varies with device
770,Audiobooks from Audible,BOOKS_AND_REFERENCE,4.5,568922,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 1, 2018",Varies with device,Varies with device
1446,Bible,BOOKS_AND_REFERENCE,4.7,2440695,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 2, 2018",Varies with device,Varies with device
4569,Google Play Books,BOOKS_AND_REFERENCE,3.9,1433233,Varies with device,1000000000,Free,0,Teen,Books & Reference,"August 3, 2018",Varies with device,Varies with device
8264,Wattpad 📖 Free Books,BOOKS_AND_REFERENCE,4.6,2915189,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 1, 2018",Varies with device,Varies with device


In [25]:
# Calculating the skewness score of the "Installs" column for the "BOOKS_AND_REFERENCE" category
mean = pandas_playstore[pandas_playstore["Category"] == "BOOKS_AND_REFERENCE"]["Installs"].mean()
median = pandas_playstore[pandas_playstore["Category"] == "BOOKS_AND_REFERENCE"]["Installs"].median()
std_dev = st.stdev(pandas_playstore[pandas_playstore["Category"] == "BOOKS_AND_REFERENCE"]["Installs"])

(3*(mean - median)) / std_dev

0.35469872992266904

In [26]:
# Calculating the percentage decrease in the average installs when the "big" "BOOKS_AND_REFERENCE" apps are removed
no_big_books_and_references_apps = pandas_playstore[(pandas_playstore["Category"] == "BOOKS_AND_REFERENCE") & (~pandas_playstore["App"].isin(playstore_books_and_references["App"]))]
((8767811.894736841 - (no_big_books_and_references_apps["Installs"].mean())) / 8767811.894736841) * 100

83.60808564929468

In [27]:
# Finding the count of big players in the genre
pandas_playstore[pandas_playstore["Category"] == "BOOKS_AND_REFERENCE"]["Installs"].value_counts().sort_index(ascending = False)

1000000000     1
100000000      4
10000000      19
5000000        9
1000000       20
500000        16
100000        20
50000         11
10000         23
5000          14
1000          30
500            7
100            6
50             2
10             4
5              4
Name: Installs, dtype: int64

> We still see the same trend of a few applications influencing the average installs. This is evident from the drop in average installs (about 84%) after removing the "BOOKS_AND_REFERENCE" applications listed above. However, the "BOOKS_AND_REFERENCE" category has a significantly lower skewness score of ~0.35, making it a more attractive genre in which to develop a profitable mobile application. The category has only one application ("Google Play Books") with 1,000,000,000 installs, unlike the "COMMUNICATION" and "VIDEO_PLAYERS" categories. This shows that there are only a handful of very popular applications in this category, leaving us an opportunity to create an app. Developing a profitable app in this genre is a feasible option, as there are opportunities in both the Apple App Store and the Google Play Store. Now that we have settled on a genre, let's pick a specific type of application to develop. Let us take a look at the apps in these categories in both the Apple App Store and the Google Play Store.

In [28]:
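# Note: [:100] takes the first 100 rows of the category before sorting, so apps outside that slice are not shown
pandas_playstore[pandas_playstore["Category"] == "BOOKS_AND_REFERENCE"][:100].sort_values("Installs", ascending = False)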

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
615,Amazon Kindle,BOOKS_AND_REFERENCE,4.2,814151,Varies with device,100000000,Free,0,Teen,Books & Reference,"July 27, 2018",Varies with device,Varies with device
1446,Bible,BOOKS_AND_REFERENCE,4.7,2440695,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 2, 2018",Varies with device,Varies with device
770,Audiobooks from Audible,BOOKS_AND_REFERENCE,4.5,568922,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 1, 2018",Varies with device,Varies with device
2480,Cool Reader,BOOKS_AND_REFERENCE,4.5,246315,Varies with device,10000000,Free,0,Everyone,Books & Reference,"July 17, 2015",Varies with device,1.5 and up
3678,English Hindi Dictionary,BOOKS_AND_REFERENCE,4.4,384368,Varies with device,10000000,Free,0,Everyone,Books & Reference,"August 4, 2018",Varies with device,Varies with device
3675,English Dictionary - Offline,BOOKS_AND_REFERENCE,4.4,341234,30M,10000000,Free,0,Everyone 10+,Books & Reference,"March 20, 2018",3.9.1,4.2 and up
546,Al'Quran Bahasa Indonesia,BOOKS_AND_REFERENCE,4.6,361780,9.7M,10000000,Free,0,Everyone,Books & Reference,"May 30, 2018",4.1,2.3 and up
3015,Dictionary.com: Find Definitions for English W...,BOOKS_AND_REFERENCE,4.6,899010,Varies with device,10000000,Free,0,Everyone,Books & Reference,"July 30, 2018",Varies with device,Varies with device
543,Al Quran Indonesia,BOOKS_AND_REFERENCE,4.8,445756,16M,10000000,Free,0,Everyone,Books & Reference,"May 15, 2018",2.6.22,4.0 and up
3013,Dictionary - Merriam-Webster,BOOKS_AND_REFERENCE,4.5,454412,Varies with device,10000000,Free,0,Everyone,Books & Reference,"May 18, 2018",Varies with device,Varies with device


In [29]:
pandas_appstore[pandas_appstore["prime_genre"] == "Reference"][:100].sort_values("rating_count_tot", ascending = False)

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
256,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1
659,308750436,Dictionary.com Dictionary & Thesaurus,111275008,USD,0.0,200047,177,4.0,4.0,7.1.3,4+,Reference,37,0,1,1
660,364740856,Dictionary.com Dictionary & Thesaurus for iPad,165748736,USD,0.0,54175,10176,4.5,4.5,4.0,4+,Reference,24,5,9,1
1148,414706506,Google Translate,65281024,USD,0.0,26786,27,3.5,4.5,5.10.0,4+,Reference,37,5,59,1
1742,388389451,"Muslim Pro: Ramadan 2017 Prayer Times, Azan, Q...",100551680,USD,0.0,18418,706,4.5,5.0,9.2.1,4+,Reference,37,5,16,1
1830,1130829481,New Furniture Mods - Pocket Wiki & Game Tools ...,52959232,USD,0.0,17588,17588,4.5,4.5,1.0,4+,Reference,38,3,2,1
1651,399452287,Merriam-Webster Dictionary,155593728,USD,0.0,16849,1125,4.5,4.5,4.1,4+,Reference,38,1,12,1
1840,475772902,Night Sky,596499456,USD,0.0,12122,60,4.5,4.5,4.4.1,4+,Reference,37,5,29,1
484,1135575003,City Maps for Minecraft PE - The Best Maps for...,90124288,USD,0.0,8535,8535,4.0,4.0,1.0,4+,Reference,37,4,1,1
1478,1132715891,LUCKY BLOCK MOD ™ for Minecraft PC Edition - T...,86874112,USD,0.0,4693,4693,4.0,4.0,1.0,12+,Reference,37,4,1,1


> It seems that applications associated with religion and languages are very popular in both markets, suggesting that an app built around such material could be profitable in both stores. The popular applications center on the "Quran" and the "Bible", so creating an application for other religious texts could prove to be a profitable idea. Adding features such as translations and audio recordings could also help attract more downloads, leading to higher profitability.

## Conclusion
> The goal of this project was to find a profitable mobile app idea for both the Apple App Store and the Google Play Store. After analyzing both markets, we have narrowed down our findings to building a "Reference" application centered around a religious book. However, adding extra features to the application such as a translation and an audio recording option could help add to its uniqueness and generate more installs, leading to higher profits.

In [30]:
%%html
<style>
.nbviewer div.output_area {
  overflow-y: auto;
  max-height: 400px;
}
</style>