# App Profitability Analysis

This program is designed to analyze the success of certain apps published to Google Play and the App Store. The data will show the success rate of each app so that we may better analyze the revenue provided from ads.

By deriving market trends and success rates from the content of this data, we can develop an app that will be more likely to attract users. As of September 2018, the App Store had roughly 2 million apps available to the public while Google Play had roughly 2.1 million. With so much competition, it is crucial that we analyze the market and make adjustments in order to produce the most optimal marketing strategies.

In [1]:
from csv import reader

#AppStore data set
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
iOS = list(read_file)
iOS_header = iOS[0]
iOS = iOS[1:]

#Google Play data set
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
googlePlay = list(read_file)
googlePlay_header = googlePlay[0]
googlePlay = googlePlay[1:]


In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    data_slice = dataset[start:end]
    for row in data_slice:
        print(row)
        print("\n")
        
    if(rows_and_columns):
        print("Number of Rows: ", len(dataset))
        print("Number of Columns: ", len(dataset[0]))


In [3]:
print(iOS_header)
print("\n")
explore_data(iOS, 0, 2, True)
print("\n")
print(googlePlay_header)
print("\n")
explore_data(googlePlay, 0, 2, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of Rows:  7197
Number of Columns:  16


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Ever

# Deleting Incorrect Data
We can see below that there is an error in row corresponding with 10472. It has a rating of 19 even though the highest possible rating in the Google Play Store is 5.


In [4]:
print(googlePlay[10472])
print("\n")
print(googlePlay_header)
print("\n")
print(googlePlay[0])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


To fix this error, we will delete the row.

In [5]:
print(len(googlePlay))
del googlePlay[10472]
print(len(googlePlay))

10841
10840


# Deleting Multiple Entries
We cannot properly analyze data if the dataset contains multiple entries of the same application.


In [6]:
for app in googlePlay:
    name = app[0]
    if name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


For example, there are 4 instances where the data for Instagram is entered. 
Below, we can see how many apps in the Google Play Store have duplicate entries as well as some examples.

In [7]:
unique_apps = []
duplicate_apps = []

for app in googlePlay:
    name = app[0]    
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print("Number of Duplicated Apps: ", len(duplicate_apps))
print('\n')
print("Examples of Duplicated Apps: ", duplicate_apps[:10])

Number of Duplicated Apps:  1181


Examples of Duplicated Apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [8]:
print(len(googlePlay))
print(len(googlePlay) - 1181)

10840
9659


If we compare the data presented in the Instagram duplicates example, the key difference in entries lies in the number of ratings. Since we can base the app's reliability of the number of ratings, we will examine which keep the entry that has the most ratings rather than delete duplicates at random.

In [9]:
max_reviews = {}

for app in googlePlay:
    name = app[0]
    num_reviews = float(app[3])
    
    if name in max_reviews and max_reviews[name] < num_reviews:
        max_reviews[name] = num_reviews
    elif name not in max_reviews:
        max_reviews[name] = num_reviews

Our previous analysis showed that there were 1181 duplicate copies. Our new list length should be the old length minus the number of duplicate copies (1181).

In [10]:
print("Expected length: ", len(googlePlay) - 1181)
print("Actual length: ", len(max_reviews))

Expected length:  9659
Actual length:  9659


Since they match, we have successfully removed all duplicate copies and only kept the entries with the most reviews

In [11]:
googlePlay_clean = []
already_added = []

for app in googlePlay:
    name = app[0]
    num_reviews = float(app[3])
    if (max_reviews[name] == num_reviews) and (name not in already_added):
        googlePlay_clean.append(app)
        already_added.append(name)

In [12]:
explore_data(googlePlay_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of Rows:  9659
Number of Columns:  13


As expected, we have 9659 entries.

Now, we repeat the process for the iOS store:

In [13]:
unique_apps = []
duplicate_apps = []

for app in iOS:
    name = app[1]    
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

In [14]:
max_reviews = {}

for app in iOS:
    name = app[1]
    num_reviews = float(app[5])
    
    if name in max_reviews and max_reviews[name] < num_reviews:
        max_reviews[name] = num_reviews
    elif name not in max_reviews:
        max_reviews[name] = num_reviews

In [15]:
iOS_clean = []
already_added = []

for app in iOS:
    name = app[1]
    num_reviews = float(app[5])
    if (max_reviews[name] == num_reviews) and (name not in already_added):
        iOS_clean.append(app)
        already_added.append(name)

In [16]:
print('iOS Store with duplicates:', len(iOS))
print('iOS Store without duplicates:', len(iOS_clean))

iOS Store with duplicates: 7197
iOS Store without duplicates: 7195


In this scenario, the iOS store only had two duplicate applications. Nonetheless, we now have data sets for the GooglePlay Store and the iOS store without duplicates.

# Removing Non-English Apps

Both data sets contain applications that are not directed toward an English-speaking audience:

In [17]:
print(iOS_clean[813][1])
print('\n')
print(googlePlay_clean[4412][0])

爱奇艺PPS -《欢乐颂2》电视剧热播


中国語 AQリスニング


We are not interested in these kind of apps, so we can remove them. This is achieved by filtering characters that are outside the ASCII range of English characters.

In [18]:
def isEnglish(string):
    
    for character in string:
        if ord(character) > 127:
            return False
        
    return True
    
print(isEnglish('Instagram'))
print(isEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


This function seemingly works fine, <i> but </i> it ignores some exceptions. Some characters such as the trademark symbol (™) lie outside of our range for English character ASCII values.

In [19]:
print(isEnglish('Skip-Bo™ Pro - The Classic Family Card Game'))
print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instachat 😜'))

print(ord('™'))

False
False
False
8482


The application names above are in English, but our function indicates otherwise due to characters such as "™" whose values lie outside of our ASCII range.

To minimize this occurence, we can change the function to check for <i>multiple</i> occurrences. 

In [20]:
def isEnglish(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
        
    if non_ascii > 3:
        return False
    else:
        return True

print(isEnglish('Skip-Bo™ Pro - The Classic Family Card Game'))
print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instachat 😜'))
print(isEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
True
False


This is not a complete solution to the issue, but an English application is unlikely to contain more than three characters that are exceptions.

Now, we can filter both data sets for English apps:

In [21]:
googlePlay_english= []
iOS_english = []

for app in googlePlay_clean:
    name = app[0]
    if isEnglish(name):
        googlePlay_english.append(app)
        
for app in iOS_clean:
    name = app[1]
    if isEnglish(name):
        iOS_english.append(app)
        
explore_data(googlePlay_english, 0, 2, True)
print('\n')
explore_data(iOS_english, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of Rows:  9614
Number of Columns:  13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of Rows:  6181
Number of Columns:  16


We are left with 9,614 apps in the <u> GooglePlay Store</u> and 6,181
 apps in the <u> iOS Store</u>.

# Isolating Free Apps
As previously mentioned, we are analyzing the success of <b> free </b> apps in the GooglePlay and iOS stores. We must now isolate the free apps in both data sets to proceed with our analysis.

In [22]:
print(iOS_header)
print('\n')
explore_data(iOS_english, 0,1)
print(googlePlay_header)
print ('\n')


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']




In [23]:
googlePlay_final = []
iOS_final = []

for app in googlePlay_english:
    price = app[7]
    
    if price == '0':
        googlePlay_final.append(app)

for app in iOS_english:
    price = app[4]
    
    if price == '0.0':
        iOS_final.append(app)
        
print("GooglePlay:", len(googlePlay_final))
print("iOS:", len(iOS_final))

GooglePlay: 8864
iOS: 3220


We are now left with 8,864 apps in the <u> GooglePlay Store</u> and 3,220 apps in the <u> iOS Store </u> and can proceed with our analysis.

# Most Common Apps by Genre

Our aim is to identify which kind of apps are most likely to attract more users. To accomplish this effectively, our validation strategy for an app idea is comprised of three steps:
    1. Build a minimal Android version of the app.
    2. If the app has a good response from users, then develop it further.
    3. If the app is profitable after six months, then we also build an iOS version of the app.
    
Since our end goal is to add an app that is profitable on both the googlePlay and the iOS stores, we must find app profiles that are successful in both markets. 

To begin the analysis, we will compare what the most common genres are for each market using a frequency table.

In [24]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        keyVal_tuple = (table[key], key)
        table_display.append(keyVal_tuple)
    
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0],)

In [25]:
display_table(iOS_final, -5)

Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


We can see that amonng free English apps, more than half (58%) of the apps are <b>games</b>. The next highest category at approximately 8% is  entertainment, and following are photo & video, education, and social networking.

The general impression of the iOS App Store is that apps are designed for fun while the apps that are meant for practical purposes such as education, finance, productivity, etc. are more rare. 

However, we simply see that <i> fun </i> apps are the most numerous. This does not meant that they have the greatest number of users. Let's continue by examining related statistics in the GooglePlay Store.

In [26]:
display_table(googlePlay_final, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

We can see a significant difference in the preferences between platforms. The GooglePlay Store has more apps geared toward practical purposes (tools, productivity, etc.). It does not have nearly the demand for fun as we can see in the iOS App Store. 

Up to this point, we can see that the iOS App Store is dominated by apps for fun while the GooglePlay Store has apps for both fun and practical uses. Now, we will consider which apps have the most users.

# Most Popular Apps by Genre in the iOS App Store
To identify which genres have the most users, we can calculate the average number of installs for each app genre. The GooglePlay Store has the <i> installs </i> column, but the iOS App Store is missing the category. Rather, we must look at the total number of user ratings.

In [27]:
genres_iOS = freq_table(iOS_final, -5)

for genre in genres_iOS:
    total = 0
    len_genre = 0
    
    for app in iOS_final:
        genreApp = app[-5]
        if genreApp == genre:
            numRatings = float(app[5])
            total += numRatings
            len_genre += 1
        
    avgNumRatings = total / len_genre
    print(genre, ':', avgNumRatings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22812.92467948718
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


On average, navigation apps have the number of reviews. However, this is heavily due to influence from Google Maps and Waze, which account for roughly half a million reviews combined. The following navigation apps do not compare to the reviews set by the two. We can see this below:


In [28]:
for app in iOS_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


We can see this concept applied to other categories as well:

In [29]:
count = 0
for app in iOS_final:
    if app[-5] == 'Social Networking' and count < 4:
        print(app[1], ':', app[5])
        count += 1

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466


In [30]:
count = 0
for app in iOS_final:
    if app[-5] == 'Music' and count < 4:
        print(app[1], ':', app[5])
        count += 1

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228


In the <i> Social Networking </i> and <i> Music </i> categories seen above, few apps were very successful relative to others in the category. Because of this, they skewed the success of the genre in the same way that we observed in <i> Navigation</i>. <b> However, let's take a look at one more genre</b>.

In [31]:
for app in iOS_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


The <i> Reference</i> genre shows the same skew as the previous genres, but this influence is not as drastic. Based on the top-reviewed apps in this genre, it seems that users switch between an app for reading the Bible to apps for definitions and translations. 
Perhaps there is potential in a new app that allows to read another common book but with bonus features such as: an audiobook option, an imbedded dictionary, translations for common languages, notes, etc. This combined the success of the leading apps of the <i> Reference </i> genre in the iOS App Store.

Since the iOS App Store is already heavily saturated with fun apps, developing a practical app may be easier to spread awareness for. This idea also avoids genres that are difficult to develop and gain approval for:
 - Weather - people do not spend much time on this app so it is likely to be profittable. Also, it may be difficult to get reliable, live weather updates.
 - Finance - this genre requires much more in-depth analysis that would require additional approval and professionals.

Let's examine the GooglePlay Store to see how our idea fits.

# Most Popular Apps by Genre on Google Play
The GooglePlay Store provides us with the number of installations per app, giving us a clearer picture of genre popularity.

In [32]:
googlePlay_categories = freq_table(googlePlay_final, 1)

for category in googlePlay_categories:
    total = 0
    len_category = 0
    
    for app in googlePlay_final:
        numInstalls = app[5]
        numInstalls = numInstalls.replace(',', '')
        numInstalls = numInstalls.replace('+', '')
        total += float(numInstalls)
        len_category += 1
        
    avgNumInstalls = total / len_category
    print(category, ':', avgNumInstalls)

ART_AND_DESIGN : 8489513.914147113
AUTO_AND_VEHICLES : 8489513.914147113
BEAUTY : 8489513.914147113
BOOKS_AND_REFERENCE : 8489513.914147113
BUSINESS : 8489513.914147113
COMICS : 8489513.914147113
COMMUNICATION : 8489513.914147113
DATING : 8489513.914147113
EDUCATION : 8489513.914147113
ENTERTAINMENT : 8489513.914147113
EVENTS : 8489513.914147113
FINANCE : 8489513.914147113
FOOD_AND_DRINK : 8489513.914147113
HEALTH_AND_FITNESS : 8489513.914147113
HOUSE_AND_HOME : 8489513.914147113
LIBRARIES_AND_DEMO : 8489513.914147113
LIFESTYLE : 8489513.914147113
GAME : 8489513.914147113
FAMILY : 8489513.914147113
MEDICAL : 8489513.914147113
SOCIAL : 8489513.914147113
SHOPPING : 8489513.914147113
PHOTOGRAPHY : 8489513.914147113
SPORTS : 8489513.914147113
TRAVEL_AND_LOCAL : 8489513.914147113
TOOLS : 8489513.914147113
PERSONALIZATION : 8489513.914147113
PRODUCTIVITY : 8489513.914147113
PARENTING : 8489513.914147113
WEATHER : 8489513.914147113
VIDEO_PLAYERS : 8489513.914147113
NEWS_AND_MAGAZINES : 848951

Since the <i> Communication </i> category is the largest, there may be few apps that greatly skew the number of installations.

In [33]:
for app in googlePlay_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                     or app[5] == '500,000,000+'
                                     or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Indeed, this category is heavily skewed by apps such as Messenger, Skype, and WhatsApp Messenger. We can expect the same occurrence in popular categories such as Music and Video. 

However, we can examine the category that our previous idea fit into:

In [34]:
for app in googlePlay_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

This category contains a wide variety of apps that allow users to read and study various books, dictionaries, programming languages, etc. Like before, there are still few apps that skew the data.

In [35]:
for app in googlePlay_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '5,000,000+'
                                           or app[5] == '10,000,000+'
                                           or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
FBReader: Favorite Book Reader : 10,000,000+
AlReader -any text book reader : 5,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Quran for Android : 10,000,000+
Dictionary.com: Find Definitions for English Words : 10,000,000+
English Dictionary - Offline : 10,000,000+
Bible KJV : 5,000,000+
NOOK: Read eBooks & Magazines : 10,000,000+
Dictionary : 10,000,000+
Spanish English Translator : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
JW Library : 10,000,000+
Oxford Dictionary of English : Free : 10,000,000+
English Hindi Dictionary : 10,000,000+
English to Hindi Diction

This data is similar to what we have previously seen in the iOS App Store. There are various apps for dictionaries and libraries, so it is unlikely that a new app can compete against these established programs. However, we can enter the market by implementing multiple desired features into a new app to increase efficiency and convenience.

# Conclusion

In this guided project, I cleaned and analyzed data from the iOS App Store and GooglePlay free mobile apps.

I concluded that converting a popular book into an application can be profitable in both the iOS App Store and the GooglePlay markets. Both markets already possess a variety of libraries and dictionaries, so this new application will require special features to stand out. Some of these features may included an imbedded dictionary, translations, daily quotes, or a forum in which readers can discuss their thoughts.