# Profitable App Profiles for the App Store and Google Play Markets

This project aims to find profitable mobile app profiles for the App Store and Google Play markets. The goal is to analyze data that helps developers understand which types of Android and iOS applications are likely to attract more users. The analysis assumes that the app is free to download and is meant for English-speaking users.

This project is my first contact with data analytics in Python. My goal is to show that even the fundamentals of the Python programming language are enough to analyze data. It took me two days of consistent, continuous work to finish. I am very glad to have completed my first analytics project with Python, and I am looking forward to upcoming projects in data analytics.

## Opening data

According to statistics on __[statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)__, in the third quarter of 2019 there were 2.47 million apps on Google Play and 1.8 million apps on Apple's App Store, around 4.27 million apps in total. Collecting data for over 4 million apps would require a considerable amount of time and money. Therefore, two smaller data sets are used for this project:
-  __[Google Play market](https://www.kaggle.com/lava18/google-play-store-apps)__: around 10,000 apps.

-  __[Apple's App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)__: around 7,000 apps

In [2]:
# read in data
from csv import reader
opened_file = open('googleplaystore.csv', encoding = 'utf8')
read_file = reader(opened_file)
andr = list(read_file)
andr_header = andr[0]
andr = andr[1:]

opened_file = open('AppleStore.csv', encoding = 'utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
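
The two read-in blocks above follow the same steps, so they could be wrapped in a small helper. The sketch below is only one possible variant, not the code used in this project: `load_csv` is a hypothetical name, and it uses a `with` statement so that the file is closed after reading.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Read a CSV file and return (header, rows) as lists of strings."""
    # The with-statement closes the file even if reading fails.
    # newline='' is what the csv module documentation recommends for file objects.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]

# Hypothetical usage with the same file names as above:
# andr_header, andr = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```

The usage lines are commented out so that the sketch runs without the data files.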

## Exploring data

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    data_slice = dataset[start:end]
    for row in data_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of column:', len(dataset[0]))

In [4]:
print(andr_header)
print('\n')
explore_data(andr, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

In [5]:
print(ios_header)
print('\n')
explore_data(ios, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of column: 16


After exploring the data, these columns from the header rows should be taken into consideration:
-  Google Play: App, Category, Reviews, Installs, Type, Price and Genres
-  App Store: track_name, currency, price, rating_count_tot, prime_genre
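
The rest of the notebook refers to these columns by position (for example index 5 or -5). As a hedged alternative, the position can be looked up from the header row by name. `column_index` is a hypothetical helper, and the header lists are copied from the outputs above.

```python
def column_index(header, name):
    """Return the position of a column name in a header row."""
    return header.index(name)

# Header rows copied from the output above.
andr_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
               'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
               'Android Ver']
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

print(column_index(andr_header, 'Installs'))     # 5
print(column_index(ios_header, 'prime_genre'))   # 11
```

Index 11 in a 16-column row is the same position as index -5, which is the index used for prime_genre later on.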

## Data cleaning

### Delete data

The Google Play data set has a __[dedicated discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015)__, where a user pointed out that something is wrong with row number 10474.

In [6]:
print(len(andr))
del andr[10474]
print(len(andr))

10841
10840
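
Instead of relying only on a reported row number, malformed rows can also be searched for programmatically. The sketch below assumes that the error shows up as a wrong number of fields, which is an assumption and not something verified here. `find_malformed_rows` and the toy data are hypothetical.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Toy data, not taken from the real file.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['Alpha', 'TOOLS', '4.1'],
    ['Beta', '4.5'],            # a field is missing, so the values are shifted
    ['Gamma', 'GAME', '3.9'],
]
print(find_malformed_rows(toy_rows, toy_header))
```

Printing the reported rows before deleting anything makes it easier to confirm that the right index is being removed.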


### Remove Duplicate Entries


In [7]:
duplicate_apps = []
non_dup_apps = []

for app in andr:
    name = app[0]
    if name in non_dup_apps:
        duplicate_apps.append(name)
    else:
        non_dup_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10]) 

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
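
The same count can be obtained with `collections.Counter` from the standard library, which also avoids the repeated list membership checks. This is a sketch on toy rows, and `count_duplicates` is a hypothetical helper name.

```python
from collections import Counter

def count_duplicates(dataset, name_index=0):
    """Return (number of surplus rows, {name: occurrences}) for repeated names."""
    counts = Counter(row[name_index] for row in dataset)
    repeated = {name: n for name, n in counts.items() if n > 1}
    # Every occurrence after the first one counts as a duplicate.
    surplus = sum(n - 1 for n in repeated.values())
    return surplus, repeated

toy_rows = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zoom']]
print(count_duplicates(toy_rows))
```

With the toy rows, 'Box' appears three times, so two rows are counted as duplicates, the same way the loop above counts them.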


Next, I want to see how these duplicate rows differ for the same app name. Let's take Instagram as an example.

In [8]:
print(andr_header)
print('\n')
for app in andr:
    name = app[0]
    if name == 'Instagram':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


As shown above, the rows differ only in the number of reviews, which is probably because the data was collected at different times. I will therefore keep the row with the highest number of reviews, since the more reviews there are, the more reliable the rating. To do this, I first build a dictionary in which each key is a unique app name and the value is the highest number of reviews for that app. Then I create a new data set that has only one entry per application. When I first ran the code below, it raised a ValueError, so I also had to delete another row that contains a wrong value, row 10472.

In [9]:
del(andr[10472])

max_review = {}

for app in andr:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in max_review and max_review[name] < n_reviews:
        max_review[name] = n_reviews
    elif name not in max_review:
        max_review[name] = n_reviews

In [10]:
android_clean = []
already_added = []

for app in andr:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == max_review[name]) and (name not in already_added):
        android_clean.append(app) 
        already_added.append(name)
        
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9658
Number of column: 13


I added the second condition to the if statement to handle the case where the highest number of reviews of a duplicated app appears in more than one entry (e.g. the Box app).
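
The two passes above can also be folded into a single pass that stores the best row per name directly. This is a sketch under the same rule (keep the first row with the highest review count), not the code used for the results in this notebook. `keep_most_reviewed` is a hypothetical name, and the toy rows have shortened columns with made-up Box figures.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means a later row with an equal count does not replace the earlier one.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]
for row in keep_most_reviewed(toy_rows):
    print(row)
```

One difference to keep in mind: the result is ordered by the first appearance of each name, so the row order can differ from `android_clean` even though the same rows are kept.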

### Remove Non-English apps

For the purpose of this project, all applications that are not meant for English-speaking users are also removed. To reduce data loss, only applications whose names contain more than 3 non-ASCII characters are removed.

In [11]:
def English(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:  # ASCII code points range from 0 to 127
            non_ascii += 1
        
    if non_ascii > 3:
        return False
    else:
        return True

print(English('My God :)'))

True
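
For comparison, the same rule can be written more compactly with `sum` and a configurable threshold. This is only a sketch of the same heuristic. `is_english` and the `max_non_ascii` parameter are hypothetical names, and the example strings are illustrative.

```python
def is_english(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Docs To Go™ Free Office Suite'))  # True: one non-ASCII character
print(is_english('Instachat 😜'))                    # True: one non-ASCII character
print(is_english('爱奇艺电视剧热播'))                    # False: eight non-ASCII characters
```

The threshold keeps names with a trademark sign or a single emoji, at the cost of letting through some non-English names that happen to be written mostly in ASCII.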


In [12]:
android_eng = []
ios_eng = []

for app in android_clean:
    name = app[0]
    if English(name):
        android_eng.append(app)

for app in ios:
    name = app[1]
    if English(name):
        ios_eng.append(app)

explore_data(android_eng, 0, 3, True)
print('\n')
explore_data(ios_eng, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9613
Number of column: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', '

Total: 9613 Android apps and 6183 iOS apps left.

### Isolating the Free Apps

In [13]:
android_final = []
ios_final = []

for app in android_eng:
    price = app[7]
    if price == '0':
        android_final.append(app)

for app in ios_eng:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8863
3222


That leaves 8863 apps from the Google Play store and 3222 apps from the App Store for the final analysis.
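
The string comparisons above depend on the exact spelling of a zero price ('0' in one data set, '0.0' in the other). A more tolerant check converts the text to a number first. The sketch assumes that paid prices may carry a leading dollar sign, which is an assumption about the data, and `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Values that cannot be parsed as a number are treated as not free.
        return False

for text in ['0', '0.0', '$4.99', 'n/a']:
    print(text, '->', is_free(text))
```

With such a helper, the same condition could be used for both data sets.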

## Analyzing data

As mentioned at the beginning, the aim is to determine the types of applications that are likely to attract more users, since revenue depends heavily on the number of people using them. To minimize overhead and risk, the validation strategy for an app idea consists of 3 steps:
1. Build a minimal Android version of the app and add it to the Google Play store.
2. If the app gets a good response from users, develop it further.
3. If the app is profitable after 6 months, build an iOS version and add it to the App Store.

It is therefore important that the app profile is successful on both the Google Play and App Store markets.

### Most common apps by genre

Two functions will be built to analyze the frequency tables:
-  One function to generate frequency tables that show percentages
-  Another function that we can use to display the percentages in a descending order

In [14]:
def freq_table(dataset, index):
    table = {}
    total = 0 
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
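
As a point of comparison, the standard library's `collections.Counter` can produce the same kind of sorted percentage table in a few lines. This is a sketch on toy rows, and `freq_table_sorted` is a hypothetical name that is not used elsewhere in this notebook.

```python
from collections import Counter

def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs sorted by descending percentage."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]

toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, percentage in freq_table_sorted(toy_rows, 1):
    print(value, ':', percentage)
```

Values with equal counts may come out in a different order than in `display_table`, because `most_common` keeps first-seen order for ties while `display_table` sorts the tuples in reverse.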

Examining the frequency table for the prime_genre column (index -5) of the App Store data set:

In [15]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The result above shows that Games is by far the most common genre among the free English apps, at around 58%. The next two genres are Entertainment (around 8%) and Photo & Video (around 5%). This illustrates that the App Store is dominated by apps designed for fun, while apps with practical purposes, such as medical, book and education apps, are rarer.


Let's move on to the Genres column (index -4) of the Google Play data set.

In [16]:
display_table(android_final, -4)

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.226898341419384
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
St

The percentages are spread far more evenly across genres in the Google Play store. A good share of the applications are designed for practical purposes, with only around 6% in Entertainment. Let's examine the Category column (index 1) of the Google Play data set.

In [17]:
display_table(android_final, 1)

FAMILY : 18.910075595170937
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.226898341419384
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0.

Since the Genres column is more granular than Category and we are looking at the bigger picture, only the Category column of the Google Play data set is used from here on.

From the results above, it can be concluded that the App Store is dominated by apps meant for entertainment and fun, with 58% in the Games genre. Meanwhile, in the Google Play store, apps meant for practical purposes appear more frequently than those meant for entertainment.

### Most popular apps by genre on the App Store

In [18]:
genre_ios = freq_table(ios_final, -5)

for genre in genre_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total/len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Navigation apps have the highest average number of user ratings, followed by the Reference and Social Networking genres.
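
The nested loop above scans the whole data set once per genre. The averages can also be accumulated in a single pass with two dictionaries. This is a sketch on toy rows with shortened columns, and `average_by_group` is a hypothetical helper.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of float(value)} computed in one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows shaped like (track_name, rating_count_tot, prime_genre).
toy_rows = [
    ['Waze', '345046', 'Navigation'],
    ['Google Maps', '154911', 'Navigation'],
    ['Bible', '985920', 'Reference'],
]
averages = average_by_group(toy_rows, group_index=2, value_index=1)
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', average)
```

Sorting the result by average, as in the last loop, also makes the ranking easier to read than the unsorted output.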

In [19]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The Navigation average is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings. The same applies to social networking apps, where giants such as Facebook, Skype and Pinterest dominate, and to music apps, where a few big players such as Spotify and Shazam heavily influence the average. Let's take a look at the Reference genre:

In [20]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Once again, Bible and Dictionary.com skew the average upwards. The average number of ratings seems to be inflated by a few apps that have hundreds of thousands of user ratings, while the other apps may struggle to attract users. We could get a better picture by removing these giants from each genre and recomputing the averages, but that level of detail is beyond the scope of this project.
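
A lighter check that stays within the standard library is to compare the mean with the median, which is much less sensitive to a few very large values. The sketch below uses the six Navigation rating counts printed earlier and the `statistics` module. It is an illustration, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed earlier.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean  :', mean(navigation_ratings))
print('median:', median(navigation_ratings))   # 8196.5
```

The mean is about 86,090, matching the Navigation average above, while the median is 8196.5, roughly a tenth of that. This gap is the skew described in the text.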

Some other popular genres include Weather, Book, Food & Drink, Finance and Music.
- Weather apps: people do not spend much time in these apps, so the chance of making a profit from in-app ads is fairly low. Additionally, getting reliable live weather data may require connecting to non-free APIs.
- Food & Drink apps: these applications require actual cooking and a delivery service, which is outside the scope of this project.
- Finance apps: building these apps requires domain knowledge and a finance expert, which adds considerable effort to building an app.
- Music apps: these require an understanding of licensing and copyright, which is rather complicated.
- Book apps: this seems to be the most relevant genre. Ideally, the app would also include some sort of dictionary or gamification features.

Let's move on to the Google Play data set and see what the outcomes are there.

### Most popular apps by genre on Google Play 

In [21]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total/len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38590577.625874124
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

On average, communication apps have the highest number of installs (around 38.6 million), followed by video players (around 24.7 million), social (around 23.3 million) and photography (around 17.8 million) apps. Let's see which apps are in the communication category.

In [22]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

This again shows that a few giants dominate the whole category, which could steer the analysis in the wrong direction.
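
One way to gauge how much the giants distort the picture is to recompute a category's average while leaving out apps above some install threshold. The sketch below uses made-up toy rows and an arbitrary threshold of 100 million. `average_installs` and `toy_row` are hypothetical helpers.

```python
def average_installs(dataset, category, max_installs=None,
                     category_index=1, installs_index=5):
    """Mean installs for one category, optionally ignoring apps at or above max_installs."""
    values = []
    for row in dataset:
        if row[category_index] != category:
            continue
        n_installs = float(row[installs_index].replace('+', '').replace(',', ''))
        if max_installs is None or n_installs < max_installs:
            values.append(n_installs)
    return sum(values) / len(values) if values else 0.0

def toy_row(name, category, installs):
    # Pads the unused columns so that indexes 1 and 5 match the Google Play layout.
    return [name, category, '', '', '', installs]

toy_rows = [
    toy_row('Giant chat', 'COMMUNICATION', '1,000,000,000+'),
    toy_row('Small chat', 'COMMUNICATION', '1,000,000+'),
    toy_row('Tiny chat', 'COMMUNICATION', '10,000+'),
]
print(average_installs(toy_rows, 'COMMUNICATION'))
print(average_installs(toy_rows, 'COMMUNICATION', max_installs=100_000_000))
```

In the toy data, a single giant lifts the average from 505,000 to about 334 million. The real effect would have to be checked on `android_final`.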

Let's look at the Books and Reference category and see whether the market has potential.

In [23]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0], '', app[5])

Google Play Books  1,000,000,000+
Bible  100,000,000+
Amazon Kindle  100,000,000+
Wattpad 📖 Free Books  100,000,000+
Audiobooks from Audible  100,000,000+
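
The chained string comparisons in the cell above can also be expressed as a numeric range, which makes it easier to change the bounds. This is a sketch with toy rows that reuse a few names and install figures from the outputs in this notebook. `apps_in_install_range` is a hypothetical helper, and the unused columns are left empty.

```python
def apps_in_install_range(dataset, category, low, high,
                          name_index=0, category_index=1, installs_index=5):
    """Return (name, installs) pairs for a category with low <= installs <= high."""
    matches = []
    for row in dataset:
        if row[category_index] != category:
            continue
        n_installs = int(row[installs_index].replace('+', '').replace(',', ''))
        if low <= n_installs <= high:
            matches.append((row[name_index], n_installs))
    return matches

toy_rows = [
    ['Bible', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Book store', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Gmail', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
]
print(apps_in_install_range(toy_rows, 'BOOKS_AND_REFERENCE', 1_000_000, 50_000_000))
```

The bounds here cover the buckets from 1,000,000+ to 50,000,000+, so Bible (100,000,000+) and the communication app are left out.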


It does look promising, with very few highly popular applications. Let's check the somewhat less popular applications in the Books and Reference category.

In [24]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '50,000,000+' or app[5] == '10,000,000+' or app[5] == '5,000,000+' or app[5] == '1,000,000+'):
        print(app[0], '', app[5])

Wikipedia  10,000,000+
Cool Reader  10,000,000+
Book store  1,000,000+
FBReader: Favorite Book Reader  10,000,000+
Free Books - Spirit Fanfiction and Stories  1,000,000+
AlReader -any text book reader  5,000,000+
FamilySearch Tree  1,000,000+
Cloud of Books  1,000,000+
ReadEra – free ebook reader  1,000,000+
Ebook Reader  5,000,000+
Read books online  5,000,000+
eBoox: book reader fb2 epub zip  1,000,000+
All Maths Formulas  1,000,000+
Ancestry  5,000,000+
HTC Help  10,000,000+
Moon+ Reader  10,000,000+
English-Myanmar Dictionary  1,000,000+
Golden Dictionary (EN-AR)  1,000,000+
All Language Translator Free  1,000,000+
Aldiko Book Reader  10,000,000+
Dictionary - WordWeb  5,000,000+
50000 Free eBooks & Free AudioBooks  5,000,000+
Al-Quran (Free)  10,000,000+
Al Quran Indonesia  10,000,000+
Al'Quran Bahasa Indonesia  10,000,000+
Al Quran Al karim  1,000,000+
Al Quran : EAlim - Translations & MP3 Offline  5,000,000+
Koran Read &MP3 30 Juz Offline  1,000,000+
Hafizi Quran 15 lines per pag

The results above show that the category is dominated by software for processing and reading ebooks, as well as various libraries and dictionaries. It would therefore not be a good idea to build similar apps, as there would be significant competition.

There are several apps built around the Quran, which suggests that building an app around a popular book can be profitable for both markets. However, the application needs distinguishing features to stand out from other applications, especially libraries. Such features could include an audiobook version, daily quotes from the book, quizzes and a forum where people can discuss the book.

## Conclusions

I have analyzed web-scraped data sets of App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable in both markets.

I found that the book genre has considerable potential, with very few highly popular applications compared to other popular genres such as communication or photography. Taking a recently popular book and turning it into an app could be profitable in both the App Store and Google Play markets. Since the book app market is already full of libraries, the app should contain special features to make it stand out. Such features could be an audio version, daily quotes and quizzes, a discussion forum or a review section.