<h1>Profitable App Profiles for the App Store and Google Play Markets</h1>

Our goal for this project is to analyze data to help our developers understand what types of apps are likely to attract more users. We're working as data analysts for a company that builds Android and iOS mobile apps.<br>

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better.

<h2>Opening and Exploring the Data</h2>

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. Luckily, there are two data sets that seem suitable for our goals:<br><br>
- A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from <a id="https://dq-content.s3.amazonaws.com/350/googleplaystore.csv">this link.</a><br>
- A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from <a id="https://dq-content.s3.amazonaws.com/350/AppleStore.csv">this link.</a><br>

Let's start by opening the two data sets and then continue with exploring the data.

In [1]:
from csv import reader

#---- The Google Play data set ----#
opened_file = open('googleplaystore.csv', encoding='utf8')  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

#---- The App Store data set ----#
opened_file = open('AppleStore.csv', encoding='utf8')  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
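
The two loading steps above repeat the same pattern, so they could be wrapped in a small helper. The sketch below is only one possible way to do it: the name `load_csv` and its `encoding` parameter are assumptions and not part of the original notebook. It uses a `with` block so the file is closed once it has been read.

```python
from csv import reader


def load_csv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # newline='' is the setting the csv module documentation recommends
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]
```

With this helper, the cell above could be reduced to `android_header, android = load_csv('googleplaystore.csv')` and `ios_header, ios = load_csv('AppleStore.csv')`.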

To make it easier to explore the two data sets, we'll first write a function named explore_data() that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


**Now let's take a look at the App Store data set.**

In [3]:
print(ios_header)
print('\n')
explore_data(ios, 0, 4, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


We have 7197 iOS apps in this App Store data set, and the columns that could help us with our analysis are: 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'. Not all column names are self-explanatory in this case, but details about each column can be found in the data set <a id="https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home">documentation.</a>

**Now let's take a look at the Google Play data set.**

In [4]:
print(android_header)
print('\n')
explore_data(android, 0, 4, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


We see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful for the purpose of our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'. For further details, refer to this <a id="https://www.kaggle.com/lava18/google-play-store-apps">documentation.</a>

<h2>||Data Cleaning||</h2>

<h2>Deleting Wrong Data:</h2>

The Google Play data set has a dedicated <a id="https://www.kaggle.com/lava18/google-play-store-apps/discussion">discussion section</a>, and we can see that <a id="https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015">one of the discussions</a> outlines an error for row 10472. Let's print this row and compare it against the header and another row that is correct.

In [5]:
print(android[10472])  # erroneous row
print('\n')
print(android_header)  # header
print('\n')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


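Row 10472 corresponds to the app 'Life Made WI-Fi Touchscreen Photo Frame', and its rating appears to be 19, which is not possible because the maximum rating on Google Play is 5. The row has only 12 values against the header's 13: the 'Category' value is missing, so every later value is shifted one column to the left. We will therefore delete this row.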

In [6]:
print(len(android))
del android[10472]  # don't run this more than once
print(len(android))

10841
10840
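
The erroneous row above was found through the discussion section, but the same kind of problem can also be detected programmatically, because that row has 12 values while the header has 13. The following is a minimal sketch of such a check; the function name `find_malformed_rows` and the toy rows are assumptions made for illustration, not part of the original analysis.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose length differs from the header's."""
    return [i for i, row in enumerate(rows) if len(row) != len(header)]


# Toy data (not taken from the real data set): the second row lacks a column.
demo_header = ['App', 'Category', 'Rating']
demo_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]
print(find_malformed_rows(demo_header, demo_rows))  # [1]
```

Run as `find_malformed_rows(android_header, android)` before any deletion, a check like this would be expected to flag row 10472.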


<h2> Removing Duplicate Entries:</h2>

<h2>Part One</h2>

If we explore the Google Play data set long enough, we'll find that some apps have more than one entry. For instance, the application Instagram has four entries:

In [7]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


<h4>Counting the number of duplicate apps</h4>

In [8]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


If you examine the rows we printed two cells above for the Instagram app, the main difference is in the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. We can use this to build a criterion for keeping rows: rather than removing rows randomly, we'll keep the row that has the highest number of reviews, because the higher the number of reviews, the more reliable the ratings.


<h2>Part Two</h2>

This can be done with the help of dictionaries:

In [9]:
max_reviews = {}

for apps in android:
    name = apps[0]
    reviews = float(apps[3])
    
    if name in max_reviews and max_reviews[name] < reviews:
        max_reviews[name] = reviews
        
    elif name not in max_reviews:
        max_reviews[name] = reviews

In a previous code cell, we found that there are 1,181 cases where an app occurs more than once, so the length of our dictionary (of unique apps) should be equal to the difference between the length of our data set and 1,181.

In [10]:
print('Expected length:', len(android) - 1181)
print('Actual length:', len(max_reviews))

Expected length: 9659
Actual length: 9659


Now, let's use the max_reviews dictionary to remove the duplicates.

In [11]:
android_clean = []
already_added_android = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (max_reviews[name] == n_reviews) and (name not in already_added_android):
        android_clean.append(app)
        already_added_android.append(name) # make sure this is inside the if block

In [12]:
print(len(android_clean))

9659


In [13]:
explore_data(android_clean,0,4,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9659
Number of columns: 13


We have 9659 rows, as expected!
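
The duplicate removal above takes two passes and checks membership in a list, which gets slower as the list grows. As an alternative sketch, the same "keep the row with the most reviews" rule can be applied in a single pass by storing the best row per app name in a dictionary. The function name `dedupe_by_max_reviews`, its parameters, and the shortened toy rows are assumptions made for illustration.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when this one has strictly more reviews
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows shortened to four columns; the 'Box' values are made up.
demo_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
for row in dedupe_by_max_reviews(demo_rows):
    print(row)
```

Note that the result is ordered by the first appearance of each app name, so the row order may differ from that of android_clean even though the same rows are kept.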

<h2>Removing Non-English Apps:</h2>

<h2>Part one</h2>

If we explore the data long enough, we'll find that both data sets have apps with names that suggest they are not directed toward an English-speaking audience. Some examples from both data sets are:

In [14]:
print(ios[813][1])
print(ios[6731][1])

print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping these apps, so we'll remove them.

The characters commonly used in English text are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.

In [15]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(is_english('Facebook'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


The function seems to work fine, but some English app names use emojis or other symbols (™, — (em dash), – (en dash),😜 etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form.

In [21]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


<h2>Part Two</h2>

To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters. This is still not perfect, and a few non-English apps might get past our filter, but it seems good enough at this point in our analysis; we shouldn't spend too much time on optimization here.
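
The rule described above (tolerate up to three non-ASCII characters) could be sketched as follows. The name `is_english_lenient` and the `max_non_ascii` parameter are assumptions introduced for illustration; the notebook itself only shows the stricter `is_english()` defined earlier.

```python
def is_english_lenient(string, max_non_ascii=3):
    """Return False only if `string` has more than `max_non_ascii` non-ASCII characters."""
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            # Stop early once the allowed number is exceeded
            if non_ascii > max_non_ascii:
                return False
    return True


print(is_english_lenient('Docs To Go™ Free Office Suite'))  # True
print(is_english_lenient('Instachat 😜'))  # True
print(is_english_lenient('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
```

Since the only `is_english()` definition shown earlier rejects any name with a non-ASCII character, the threshold rule takes effect only if the filtering step calls a function along these lines.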

Below, we use the is_english() function to filter out the non-English apps for both data sets:

In [16]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9117
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '3

<h2>Isolating the Free Apps</h2>

In [17]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8408
2922


We're left with 8408 Android apps and 2922 iOS apps, which should be enough for our analysis.

<h2>||Data Analysis||</h2>

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

<h2>Most Common Apps by Genre</h2><br>
<h2>Part One</h2>

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we develop it further.
- If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.

<h2>Part Two</h2>

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function we can use to display the percentages in a descending order

In [18]:
def freq_table(dataset, index):
    table = {}
    total_rows = 0
    
    for row in dataset:
        total_rows += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentage = {}
    for key in table:
        percentage = (table[key] / total_rows) * 100
        table_percentage[key] = percentage 
    
    return table_percentage

    

In [19]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
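
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is only an alternative sketch, not the approach the notebook uses; the name `freq_table_counter` and the toy rows are assumptions.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


# Toy rows (made up): three games and one education app.
demo = [['a', 'Games'], ['b', 'Games'], ['c', 'Education'], ['d', 'Games']]
table = freq_table_counter(demo, 1)
# Sort by percentage, largest first, as display_table() does
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```

On the toy rows this prints `Games : 75.0` followed by `Education : 25.0`.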

Let's examine the frequency table for the prime_genre column of the App Store data set.

In [20]:
display_table(ios_final, -5)

Games : 59.171800136892536
Entertainment : 7.529089664613278
Photo & Video : 5.133470225872689
Education : 3.8329911019849416
Social Networking : 3.1143052703627654
Shopping : 2.4982888432580426
Utilities : 2.2587268993839835
Music : 2.1560574948665296
Sports : 2.0533880903490758
Health & Fitness : 1.9849418206707734
Productivity : 1.7111567419575633
Lifestyle : 1.4715947980835045
News : 1.3347022587268993
Travel : 1.1293634496919918
Finance : 1.0951403148528405
Weather : 0.8898015058179329
Food & Drink : 0.8898015058179329
Reference : 0.5133470225872689
Business : 0.5133470225872689
Book : 0.2737850787132101
Medical : 0.20533880903490762
Navigation : 0.13689253935660506
Catalogs : 0.10266940451745381


We can see that among the free English apps, more than half (59.17%) are games. Entertainment apps are close to 7.5%, followed by photo and video apps, which are close to 5%. Only 3.83% of the apps are designed for education, followed by social networking apps, which account for 3.11% of the apps in our data set.

The general impression is that the App Store is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users.

Next, let's examine the Genres and Category columns of the Google Play data set (two columns that seem to be related).

In [27]:
display_table(android_final, 1) # Category

FAMILY : 18.803520456707897
GAME : 9.60989533777355
TOOLS : 8.575166508087536
BUSINESS : 4.709800190294957
PRODUCTIVITY : 3.9724072312083734
LIFESTYLE : 3.8891531874405327
FINANCE : 3.73453853472883
MEDICAL : 3.6393910561370126
PERSONALIZATION : 3.306374881065652
SPORTS : 3.258801141769743
COMMUNICATION : 3.2231208372978117
HEALTH_AND_FITNESS : 3.1279733587059946
PHOTOGRAPHY : 3.0090390104662226
NEWS_AND_MAGAZINES : 2.7949571836346334
SOCIAL : 2.664129400570885
TRAVEL_AND_LOCAL : 2.3073263558515698
SHOPPING : 2.247859181731684
BOOKS_AND_REFERENCE : 2.1883920076117986
DATING : 1.8315889628924835
VIDEO_PLAYERS : 1.7602283539486203
MAPS_AND_NAVIGATION : 1.3558515699333968
FOOD_AND_DRINK : 1.2012369172216937
EDUCATION : 1.165556612749762
ENTERTAINMENT : 0.939581351094196
AUTO_AND_VEHICLES : 0.939581351094196
LIBRARIES_AND_DEMO : 0.9039010466222646
HOUSE_AND_HOME : 0.8087535680304472
WEATHER : 0.7968601332064701
EVENTS : 0.7136060894386299
ART_AND_DESIGN : 0.6660323501427212
PARENTING : 0.6

As we can see, there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) consists mostly of games for kids.

<img src="https://camo.githubusercontent.com/9bf24b9efc3d88a3d55f5c09e314987941f0bab5/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f64712d636f6e74656e742f3335302f7079316d385f66616d696c792e706e67" alt="Alt text that describes the graphic" title="Title text" />

The frequency table for the Genres column:

In [32]:
display_table(android_final, -4) # Genres

Tools : 8.563273073263558
Entertainment : 6.089438629876309
Education : 5.387725975261656
Business : 4.709800190294957
Productivity : 3.9724072312083734
Lifestyle : 3.8772597526165553
Finance : 3.73453853472883
Medical : 3.6393910561370126
Sports : 3.3301617507136063
Personalization : 3.306374881065652
Communication : 3.2231208372978117
Health & Fitness : 3.1279733587059946
Action : 3.116079923882017
Photography : 3.0090390104662226
News & Magazines : 2.7949571836346334
Social : 2.664129400570885
Travel & Local : 2.3073263558515698
Shopping : 2.247859181731684
Books & Reference : 2.1883920076117986
Simulation : 2.0813510941960036
Dating : 1.8315889628924835
Arcade : 1.8315889628924835
Casual : 1.7721217887725977
Video Players & Editors : 1.736441484300666
Maps & Navigation : 1.3558515699333968
Food & Drink : 1.2012369172216937
Puzzle : 1.1298763082778307
Racing : 1.0228353948620361
Role Playing : 0.939581351094196
Auto & Vehicles : 0.939581351094196
Strategy : 0.9039010466222646
Librar

If we compare the Google Play and App Store frequency tables, we can see that Google Play shows a more balanced landscape of practical and fun apps than the App Store does, as both the Genres and Category columns indicate.

<h2>Finding Most Popular Apps</h2>

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

<h3> - Most Popular Apps by Genre on the App Store</h3>

Calculating the average number of user ratings per app genre on the App Store:

In [21]:
genres_app_store=freq_table(ios_final,-5)
genre_ios=[]
for genre in genres_app_store:
    total=0
    len_genre = 0
    for row in ios_final:
        if row[-5]==genre:
            n_ratings = float(row[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    genre_ios.append((avg_n_ratings,genre))
genre_ios_sorted=sorted(genre_ios,reverse=True)  
for i in genre_ios_sorted:
    print(i[1],':',i[0])

Navigation : 125037.25
Reference : 89562.6
Social Networking : 78567.30769230769
Music : 55396.01587301587
Weather : 48275.57692307692
Travel : 34115.57575757576
Food & Drink : 33333.92307692308
Photo & Video : 29249.766666666666
Shopping : 28877.575342465752
Finance : 26038.6875
Sports : 25791.666666666668
News : 23382.17948717949
Productivity : 22842.22
Games : 21560.75072296125
Health & Fitness : 19418.620689655174
Lifestyle : 17260.53488372093
Book : 16671.0
Entertainment : 15006.227272727272
Utilities : 11571.69696969697
Business : 6839.6
Education : 6103.464285714285
Catalogs : 5195.0
Medical : 612.0


Navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:

In [22]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Our aim is to find popular genres, but navigation, social networking, or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by very few apps which have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold.
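
To make the skew concrete, the mean can be compared with the median, which is less sensitive to a few very large values. The sketch below uses the four Navigation rating counts printed above and the standard-library `statistics` module; using the median here is an illustration only and not a step the original analysis takes.

```python
from statistics import mean, median

# Rating counts of the four Navigation apps printed above
navigation_ratings = [345046, 154911, 187, 5]

print(mean(navigation_ratings))    # 125037.25, the figure in the genre table
print(median(navigation_ratings))  # 77549.0
```

With only four apps in the genre, either statistic is fragile, but the gap between the two already hints at how much Waze and Google Maps pull the average up.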

Reference apps have 89,562 user ratings on average, but it's actually the Bible and Dictionary.com that skew the average number of ratings upward:

In [23]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book.<br><br>
Another thing we have seen above is that the App Store is dominated by for-fun apps, yet the average number of user ratings for the Games genre is comparatively low. This suggests that a practical app might have a better chance of standing out among the huge number of apps on the App Store.<br><br>
Other genres that seem popular include weather, book, food and drink, and finance. The book genre seems to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:<br>

- Weather apps: people generally don't spend too much time in-app, and the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. So making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.

- Finance apps: these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

Let's explore the travel genre:<br>Here Google Earth leads, and most of the other apps have far fewer ratings.

In [24]:
for app in ios_final:
    if app[-5] == 'Travel':
        print(app[1], ':', app[5])

Google Earth : 446185
Yelp - Nearby Restaurants, Shopping & Services : 223885
GasBuddy : 145549
TripAdvisor Hotels Flights Restaurants : 56194
Uber : 49466
Lyft : 46922
HotelTonight - Great Deals on Last Minute Hotels : 32341
Hotels & Vacation Rentals by Booking.com : 31261
Southwest Airlines : 30552
Airbnb : 22302
Expedia Hotels, Flights & Vacation Package Deals : 10278
Fly Delta : 8094
Hopper - Predict, Watch & Book Flights : 6944
United Airlines : 5748
Viator Tours & Activities : 1839
iExit Interstate Exit Guide : 1798
Gogo Entertainment : 1482
Google Street View : 1450
HISTORY Here : 685
DB Navigator : 512
Mobike - Dockless Bike Share : 494
BlaBlaCar - Trusted Carpooling : 397
Six Flags : 353
Voyages-sncf.com : book train and bus tickets : 268
Trainline UK: Live Train Times, Tickets & Planner : 248
Urlaubspiraten : 188
Ryanair - Cheapest Fares : 175
Fleet Air Travel Guide & Airport Directory : 105
FlixBus - bus travel in Europe : 92
SNCF : 7
skyticket - Reserve Best Valued Air Tick

Now let's analyze the Google Play market a bit.

<h3>- Most Popular Apps by Genre on Google Play</h3>

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [25]:
display_table(android_final, 5) # the Installs column

1,000,000+ : 15.592293054234062
100,000+ : 11.596098953377735
10,000+ : 10.442435775451951
10,000,000+ : 10.323501427212179
1,000+ : 8.480019029495718
100+ : 7.088487155090391
5,000,000+ : 6.660323501427213
500,000+ : 5.5542340627973354
50,000+ : 4.7216936251189345
5,000+ : 4.5313986679353
10+ : 3.5442435775451955
500+ : 3.246907706945766
50,000,000+ : 2.2121788772597526
100,000,000+ : 2.1289248334919124
50+ : 1.9743101807802093
5+ : 0.8206470028544244
1+ : 0.5114176974310181
500,000,000+ : 0.285442435775452
1,000,000,000+ : 0.22597526165556614
0+ : 0.04757373929590866
0 : 0.011893434823977166


We don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes: we only want to find out which app genres attract the most users.<br>
So, we'll consider that an app with 100,000+ installs has 100,000 installs, an app with 1,000,000+ installs has 1,000,000 installs, and so on.

In [26]:
android_category=freq_table(android_final, 1)
category_gplay=[]
for category in android_category:
    total = 0
    len_category = 0
    for row in android_final:
        if row[1] == category:            
            installs = row[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            total += float(installs)
            len_category += 1
    avg_installs = total / len_category
    category_gplay.append((avg_installs,category))
category_gplay_sorted=sorted(category_gplay,reverse=True)  
for i in category_gplay_sorted:
    print(i[1],':',i[0])

COMMUNICATION : 36106662.328413285
VIDEO_PLAYERS : 25234606.216216218
SOCIAL : 24441088.17857143
PHOTOGRAPHY : 18099283.85375494
PRODUCTIVITY : 16972497.946107786
GAME : 15434835.816831684
TRAVEL_AND_LOCAL : 14487541.68041237
ENTERTAINMENT : 12346329.11392405
TOOLS : 11084333.292649098
NEWS_AND_MAGAZINES : 10006311.10638298
BOOKS_AND_REFERENCE : 8504745.97826087
SHOPPING : 7307823.2010582015
WEATHER : 5219216.7164179105
PERSONALIZATION : 5027006.791366907
MAPS_AND_NAVIGATION : 4304432.280701755
HEALTH_AND_FITNESS : 4263642.1749049425
SPORTS : 3647640.208029197
FAMILY : 3633707.342820999
FOOD_AND_DRINK : 1974937.1386138613
ART_AND_DESIGN : 1932519.642857143
EDUCATION : 1844897.9591836734
BUSINESS : 1602958.308080808
HOUSE_AND_HOME : 1391211.1911764706
LIFESTYLE : 1375297.3058103975
FINANCE : 1348224.9426751593
COMICS : 880440.625
DATING : 764959.4610389611
LIBRARIES_AND_DEMO : 674917.2368421053
AUTO_AND_VEHICLES : 645317.2278481013
PARENTING : 544745.6363636364
BEAUTY : 513151.886792452

On average, communication apps have the most installs: 36,106,662. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 million and 500 million installs:

In [27]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
BBM - Free Call

If we removed all the communication apps that have over 100 million installs, the average number of installs per app would drop sharply. These are the apps that would remain:

In [28]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] != '1,000,000,000+'
                                      and app[5] != '500,000,000+'
                                      and app[5] != '100,000,000+'):
        print(app[0], ':', app[5])

Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
Contacts : 50,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Calls & Text by Mo+ : 5,000,000+
free video calls and chat : 50,000,000+
Messaging+ SMS, MMS Free : 1,000,000+
chomp SMS : 10,000,000+
Glide - Video Chat Messenger : 10,000,000+
Text SMS : 10,000,000+
Talkray - Free Calls & Texts : 10,000,000+
GroupMe : 10,000,000+
mysms SMS Text Messaging Sync : 1,000,000+
2ndLine - Second Phone Number : 1,000,000+
Ninesky Browser 
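
To put a number on that drop, we could recompute the average while skipping the giants. The following is a minimal sketch, not part of the original analysis: `parse_installs()` and `average_installs()` are hypothetical helpers, the 100 million cutoff is an assumption, and `sample` is a tiny hand-picked subset of rows from the outputs above, padded to the same column layout as `android_final` (name at index 0, category at index 1, installs at index 5).

```python
def parse_installs(raw):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(raw.replace(',', '').replace('+', ''))


def average_installs(rows, category, max_installs=None):
    """Average installs for one category.

    If max_installs is given, apps with that many installs or more are skipped.
    """
    values = []
    for row in rows:
        if row[1] != category:
            continue
        installs = parse_installs(row[5])
        if max_installs is not None and installs >= max_installs:
            continue
        values.append(installs)
    return sum(values) / len(values) if values else 0.0


# Illustrative subset only; empty strings stand in for columns we don't use.
sample = [
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Messenger for SMS', 'COMMUNICATION', '', '', '', '10,000,000+'],
    ['My Tele2', 'COMMUNICATION', '', '', '', '5,000,000+'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
]

print(average_installs(sample, 'COMMUNICATION'))
print(average_installs(sample, 'COMMUNICATION', max_installs=100_000_000))
```

Running the same two calls on `android_final` instead of `sample` would show how much of the communication average comes from the apps with 100 million installs or more.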

We see the same pattern for the video players category, which is the runner-up with 25,234,606 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, and MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).<br><br>
Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.
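
One way to check how much a few giants inflate a genre is to compare the mean with the median, which is far less sensitive to extreme values. This is a sketch under stated assumptions rather than a step from the original analysis: `install_summary()` is a hypothetical helper, and the rows below are made-up placeholders (not real apps) that only mimic the column layout of `android_final`.

```python
from statistics import mean, median


def parse_installs(raw):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(raw.replace(',', '').replace('+', ''))


def install_summary(rows, category):
    """Return (mean, median) installs for the apps in one category."""
    values = [parse_installs(row[5]) for row in rows if row[1] == category]
    return mean(values), median(values)


# Made-up placeholder rows: two giants and three smaller apps.
sample = [
    ['Giant A', 'VIDEO_PLAYERS', '', '', '', '1,000,000,000+'],
    ['Giant B', 'VIDEO_PLAYERS', '', '', '', '500,000,000+'],
    ['Small C', 'VIDEO_PLAYERS', '', '', '', '10,000,000+'],
    ['Small D', 'VIDEO_PLAYERS', '', '', '', '5,000,000+'],
    ['Small E', 'VIDEO_PLAYERS', '', '', '', '1,000,000+'],
]

avg, mid = install_summary(sample, 'VIDEO_PLAYERS')
print('mean   :', avg)
print('median :', mid)
```

A mean far above the median, as in this toy sample, suggests that a handful of apps account for most of the installs in that genre.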

The game genre seems a bit saturated, so we'd like to come up with a different app recommendation if possible.<br>The travel genre, discussed above for the App Store (where Google Earth dominates), could be a good niche for a new app, but depending on the kind of app, it requires collecting a lot of information about places, hotels, and climate.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,504,745. This genre seems the most promising one to build an app in.<br><br>Let's take a look at some of the apps from this genre and their number of installs:

In [29]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
English translation from Bengali : 100,000

It seems there's still a small number of extremely popular apps that skew the average. Let's filter for them:

In [31]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Audiobooks from Audible : 100,000,000+


It looks like there are only a few very popular apps, so this market still shows potential.
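
To get app ideas, it may help to look at the middle of the market rather than the extremes. The sketch below is an illustration under assumptions, not the original notebook's code: `apps_in_range()` is a hypothetical helper, the bounds of 1,000,000 and 100,000,000 installs are an arbitrary choice, and `sample` is a small subset of rows taken from the outputs above, padded to the column layout of `android_final`.

```python
def parse_installs(raw):
    """Convert an install string such as '5,000,000+' to an int."""
    return int(raw.replace(',', '').replace('+', ''))


def apps_in_range(rows, category, low, high):
    """Return (name, installs) pairs for apps with low <= installs < high."""
    return [
        (row[0], row[5])
        for row in rows
        if row[1] == category and low <= parse_installs(row[5]) < high
    ]


# Illustrative subset only; empty strings stand in for columns we don't use.
sample = [
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Bible', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Cool Reader', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Ancestry', 'BOOKS_AND_REFERENCE', '', '', '', '5,000,000+'],
    ['Flybook', 'BOOKS_AND_REFERENCE', '', '', '', '500,000+'],
]

for name, installs in apps_in_range(sample, 'BOOKS_AND_REFERENCE',
                                    1_000_000, 100_000_000):
    print(name, ':', installs)
```

Calling `apps_in_range(android_final, 'BOOKS_AND_REFERENCE', 1_000_000, 100_000_000)` would list the mid-popularity apps in the full data set.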

Our company's main source of revenue is in-app ads, and we only build apps that are free to download and install. The books and reference genre seems the best fit: it does not require many resources, and people spend a lot of time in these apps reading books. We also notice that books like the Quran and the Bible are very popular, so it would likely be profitable to build apps around popular books and add special features such as an audio version of the book, daily quotes from the book, etc. The books and reference genre is also popular on both the App Store and Google Play.

<h2>Conclusions</h2>

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.<br>

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, etc.