# Profitable App Profiles for the App Store and Google Play Markets 

**Scenario**: You are a data analyst for a company that builds Android and iOS mobile apps, which are published on Google Play and the App Store. The company only builds apps that are free to download and install, so its main source of revenue is in-app ads. The number of users is therefore the primary revenue consideration for any given app: the more users who see and engage with the ads, the better.

**Goal**: Analyze the data to help the developers understand what type of apps are likely to attract more users. 

**Table of Contents**:
- Load app data 
- Remove erroneous data
- Remove duplicate apps
- Remove non-English apps
- Remove paid apps
- Most common apps by genre
- Most popular apps by genre
- Conclusion


## Load app data from Apple Store and Google Play Store .csv files

In [1]:
from csv import reader

opened_ios = open("applestore.csv")
read_ios = reader(opened_ios)
ios = list(read_ios)
ios_header = ios[0]
ios = ios[1:]

opened_gplay = open("googleplaystore.csv")
read_gplay = reader(opened_gplay)
gplay = list(read_gplay)
gplay_header = gplay[0]
gplay = gplay[1:]

In [2]:
print(f"Total* number of Apple Store apps: {len(ios)}")
print(f"Total* number of Google Play Store apps: {len(gplay)}")
print("*As provided")

Total* number of Apple Store apps: 7197
Total* number of Google Play Store apps: 10841
*As provided
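
The two loading steps above repeat the same pattern and leave the file handles open. A minimal sketch of a reusable loader follows, assuming the CSV files are UTF-8 encoded; `load_csv` is a hypothetical helper name and not part of the course material.

```python
from csv import reader


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file, closing the file afterwards."""
    # newline="" is the mode the csv module documentation recommends for files
    with open(path, encoding=encoding, newline="") as f:
        rows = list(reader(f))
    # The first row is the header; everything after it is data
    return rows[0], rows[1:]
```

With this helper, loading would look like `ios_header, ios = load_csv("applestore.csv")`.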


### Course-provided `explore_data` function serves to print rows from the data in a readable way

In [3]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print("\n")
    if rows_and_columns:
        print("Number of rows:", len(dataset))
        print("Number of columns:", len(dataset[0]))

#### Data snippets from Apple Store

In [4]:
explore_data(ios, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


#### Data snippets from Google Play Store

In [5]:
explore_data(gplay, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


## Remove erroneous data

According to a [Kaggle discussion](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) that points to app `10472` in the Google Play Store data, the category for the `Life Made WI-Fi Touchscreen Photo Frame` app is missing, so every subsequent value sits one column to the left of where it belongs. Consequently, this entry has 12 columns instead of 13.

The erroneous entry is found at index `10472` in our data (with the header row excluded).

In [6]:
print(gplay[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
print(len(gplay[10472]))

12


As one step in the data cleaning process, the erroneous row `10472` is removed.

In [8]:
del gplay[10472]

In [9]:
print(gplay[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


For completeness, the entire Google Play Store data is checked to see if the number of columns for any given row is not 13.

In [10]:
gplay_erroneous_length = []
for app in gplay:
    name = app[0]
    if len(app) != 13:
        gplay_erroneous_length.append(name)
print(gplay_erroneous_length)

[]


We then apply this method to the Apple Store data to isolate rows that do not have 16 columns. 

In [11]:
ios_erroneous_length = []
for app in ios:
    name = app[0]
    if len(app) != 16:
        ios_erroneous_length.append(name)
print(ios_erroneous_length)

[]
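
The two length checks differ only in the dataset and the expected column count, so they could share one helper. The sketch below is one way to do that; `find_malformed_rows` is a hypothetical name and the sample rows are made up for illustration.

```python
def find_malformed_rows(dataset, expected_columns):
    """Return (index, row) pairs whose column count differs from expected."""
    return [
        (index, row)
        for index, row in enumerate(dataset)
        if len(row) != expected_columns
    ]


# Toy data: the second row is missing one column
sample = [["A", "TOOLS", "4.1"], ["B", "4.0"], ["C", "GAME", "3.9"]]
print(find_malformed_rows(sample, 3))
```

Returning the index alongside the row makes it possible to locate an entry such as row `10472` without knowing about it in advance.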


In [12]:
print(f"Total* number of Apple Store apps: {len(ios)}")
print(f"Total* number of Google Play Store apps: {len(gplay)}")
print("*After check for data errors")

Total* number of Apple Store apps: 7197
Total* number of Google Play Store apps: 10840
*After check for data errors


## Remove duplicate apps

Another [Kaggle discussion](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps/discussion/90409) suggests that the Apple Store data may contain duplicates.

A Kaggler provided the [code](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps/discussion/106176#812842) below to list apps with duplicate names.

In [13]:
ios_unique_apps = [] 
ios_duplicate_apps = [] 

for app in ios: 
    name = app[1] 

    if name not in ios_unique_apps:
        ios_unique_apps.append(name)
    else:
        ios_duplicate_apps.append(app)
        
print(ios_duplicate_apps)

[['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1'], ['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']]


In [14]:
len(ios_duplicate_apps)

2

Upon closer inspection, the `Mannequin Challenge` and `VR Roller Coaster` entries are not duplicates but pairs of distinct apps that share a name. The listings in each pair differ in their `id`, app size (`size_bytes`), and total rating count (`rating_count_tot`), among other fields.

Therefore, the Apple Store data does not contain any duplicates. 

There are 7,197 unique apps on the Apple Store.

In [15]:
print(ios[0])
print("\n")

for app in ios:
    if app[1] == "Mannequin Challenge" or app[1] == "VR Roller Coaster":
        print(app)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


In [16]:
print(f"Total* number of Apple Store apps: {len(ios)}")
print("*After checks for errors and duplicates")

Total* number of Apple Store apps: 7197
*After checks for errors and duplicates


We now apply this same method of finding duplicates in the Google Play Store data.

In [17]:
gplay_unique_apps = [] 
gplay_duplicate_apps = [] 

for app in gplay: 
    name = app[0] 

    if name not in gplay_unique_apps:
        gplay_unique_apps.append(name)
    else:
        gplay_duplicate_apps.append(app)

# Print one sample duplicate row (index 5)
print(gplay_duplicate_apps[5])

['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [18]:
len(gplay_duplicate_apps)

1181

It appears that the Google Play Store data has `1,181` rows that repeat an app name seen earlier in the data.

Subtracting these from the `10,840` rows that remain after the error check, we expect `9,659` unique Google Play apps.

In [19]:
len(gplay)-len(gplay_duplicate_apps)

9659
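
Checking `name not in` against a list rescans that list for every row. As a sketch of an equivalent count, the standard library's `collections.Counter` can tally the names in one pass; `count_duplicate_rows` and the sample rows are illustrative assumptions.

```python
from collections import Counter


def count_duplicate_rows(dataset, name_index):
    """Return how many rows repeat a name that already appeared earlier."""
    name_counts = Counter(row[name_index] for row in dataset)
    # Each name contributes (occurrences - 1) duplicate rows
    return sum(count - 1 for count in name_counts.values())


# Toy rows: "Box" appears three times, so two rows are duplicates
sample = [["Box"], ["Slack"], ["Box"], ["Box"]]
print(count_duplicate_rows(sample, 0))
```

The number of unique names is then `len(dataset)` minus this count.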

Isolating the first app in the comprehensive list of duplicates above, we see the app `Quick PDF Scanner + OCR FREE` listed three times. There is a slight difference in the 3rd listing with `80804` reviews instead of `80805` as shown in the first two listings.

In [20]:
print(gplay_header)
print("\n")

for app in gplay:
    if app[0] == "Quick PDF Scanner + OCR FREE":
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


We assume that, among duplicate listings of the same app, the one with the highest number of reviews was collected most recently, since review counts generally grow over time. We therefore use the maximum number of reviews as the criterion for choosing which listing to keep. We create a dictionary called `reviews_max` to hold the maximum number of reviews for each app. As noted above, we should end up with `9,659` unique Google Play listings after removing duplicates.

In [21]:
reviews_max = {}
for app in gplay:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

len(reviews_max)

9659

Rather than deleting duplicates in place, we build a new list called `android_clean` that holds one row per unique Google Play Store app. The header row was already split off into `gplay_header`, so the loop runs over data rows only.

In the case where an app has two or more listings with the same maximum number of reviews, we use the `already_added` list so that once an app has been added, it is not added to the clean list a second time.

Because `android_clean` contains no header row, its length is the total number of apps.

In [22]:
android_clean = []
already_added = []

for app in gplay:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

print(f"Total* number of Google Play apps: {len(android_clean)}")
print("*After checks for errors and duplicates")

Total* number of Google Play apps: 9659
*After checks for errors and duplicates
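
The two passes above (one to build `reviews_max`, one to build `android_clean`) can also be expressed as a single pass that stores the best row per name. This is a sketch under the assumption that the app name is at index `0` and the review count at index `3`, as in the Google Play data; `keep_most_reviewed` is a hypothetical name and the sample rows are shortened for illustration.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows with only the first four columns filled in
sample = [
    ["Scanner", "BUSINESS", "4.2", "80805"],
    ["Box", "BUSINESS", "4.2", "159872"],
    ["Scanner", "BUSINESS", "4.2", "80805"],
    ["Scanner", "BUSINESS", "4.2", "80804"],
]
print(len(keep_most_reviewed(sample)))
```

One difference from the two-pass version: the result is ordered by each name's first appearance rather than by the position of the retained row.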


## Remove non-English apps

From the Dataquest lesson:
`The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. If the number is equal to or less than 127, then the character belongs to the set of common English characters.`

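We create a function that takes an app's name and counts the characters whose code point falls outside the range 0 to 127. It returns `False` if more than three such characters are found and `True` otherwise, so that names containing a few symbols or emoji (such as `™` or `😜`) are still treated as English.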

In [23]:
def is_english(string):
    not_ascii = 0
    for char in string:
        if ord(char) > 127:
            not_ascii += 1
    if not_ascii > 3:
        return False
    else:
        return True
        
# tests
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
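
The threshold of three non-ASCII characters is hard-coded in `is_english`. A compact variant with the threshold exposed as a parameter might look like the sketch below; `is_mostly_ascii` and `max_non_ascii` are hypothetical names introduced here.

```python
def is_mostly_ascii(string, max_non_ascii=3):
    """Return True if the string has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for char in string if ord(char) > 127)
    return non_ascii <= max_non_ascii


# Same test names as above
for name in ["Instagram", "爱奇艺PPS -《欢乐颂2》电视剧热播",
             "Docs To Go™ Free Office Suite", "Instachat 😜"]:
    print(is_mostly_ascii(name))
```

Passing `max_non_ascii=0` reproduces the strict ASCII-only rule described in the lesson quote.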


We apply this function to both the Apple Store and Google Play Store data sets to keep only the apps that are likely aimed at English speakers.

In [24]:
ios_english = []
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

gplay_english = []
for app in android_clean:
    name = app[0]
    if is_english(name):
        gplay_english.append(app)
        
print(f"Total* number of Apple Store apps: {len(ios_english)}")
print(f"Total* number of Google Play Store apps: {len(gplay_english)}")
print("*After removal of errors, duplicates, and non-ASCII app names")

Total* number of Apple Store apps: 6183
Total* number of Google Play Store apps: 9614
*After removal of errors, duplicates, and non-ASCII app names


## Remove paid apps

We print the app headers again to view indices.\
The `price` is in index `4` for Apple and `7` for Google Play.

In [25]:
print(ios_header)
print("\n")
print(gplay_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


We create new lists that hold only the apps whose `price` is `0`. Both the Apple Store and Google Play prices are stored as strings, so they are converted to `float` values. The Google Play `Price` can carry a leading `$`, which is stripped with the `lstrip` string method.

In [26]:
ios_free = []
gplay_free = []

for app in ios_english:
    price = float(app[4])
    if price == 0:
        ios_free.append(app)

for app in gplay_english:
    price = float(app[7].lstrip("$"))
    if price == 0:
        gplay_free.append(app)
        
print(f"Total* number of English Apple Store apps: {len(ios_free)}")
print(f"Total* number of English Google Play Store apps: {len(gplay_free)}")
print("*After removal of errors, duplicates, non-ASCII app names, and paid apps")

Total* number of English Apple Store apps: 3222
Total* number of English Google Play Store apps: 8864
*After removal of errors, duplicates, non-ASCII app names, and paid apps
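
The price handling for the two stores can share one small parser. The sketch below assumes prices look like `'0'`, `'0.0'`, or `'$4.99'`; `parse_price` and `is_free` are hypothetical helper names.

```python
def parse_price(raw_price):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    return float(raw_price.strip().lstrip("$"))


def is_free(raw_price):
    """Return True when the parsed price is exactly zero."""
    return parse_price(raw_price) == 0.0


print(is_free("0.0"), is_free("$4.99"))
```

Calling `lstrip("$")` on a string without a dollar sign leaves it unchanged, so the same function works for the Apple Store column.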


## Most common apps by genre

We first import `matplotlib.pyplot` (as `plt`) so we can visualize results graphically.

In [27]:
import matplotlib.pyplot as plt

To find the most common genres among free English Apple Store apps, we count the apps in each `prime_genre` and express each count as a percentage of the total.

In [28]:
ios_genre_counts = {}
for app in ios_free:
    genre = app[11]
    if genre in ios_genre_counts:
        ios_genre_counts[genre] += 1
    else:
        ios_genre_counts[genre] = 1

ios_genres = list(ios_genre_counts.keys())
ios_counts = list(ios_genre_counts.values())
ios_percentages = {genre: (ios_genre_counts[genre] * 100 / len(ios_free)) for genre in ios_genre_counts}
for row in sorted(ios_percentages.items(), key=lambda x:x[1], reverse=True):
    print(row)

('Games', 58.16263190564867)
('Entertainment', 7.883302296710118)
('Photo & Video', 4.9658597144630665)
('Education', 3.6623215394165114)
('Social Networking', 3.2898820608317814)
('Shopping', 2.60707635009311)
('Utilities', 2.5139664804469275)
('Sports', 2.1415270018621975)
('Music', 2.0484171322160147)
('Health & Fitness', 2.017380509000621)
('Productivity', 1.7380509000620732)
('Lifestyle', 1.5828677839851024)
('News', 1.3345747982619491)
('Travel', 1.2414649286157666)
('Finance', 1.1173184357541899)
('Weather', 0.8690254500310366)
('Food & Drink', 0.8069522036002483)
('Reference', 0.5586592178770949)
('Business', 0.5276225946617008)
('Book', 0.4345127250155183)
('Navigation', 0.186219739292365)
('Medical', 0.186219739292365)
('Catalogs', 0.12414649286157665)


`58%` of all free English Apple Store apps are in the `Games` category, followed by `Entertainment` at `7.9%` and `Photo & Video` at `5%`.
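
The counting and percentage steps can be wrapped in one function that works for either store. This is a sketch using `collections.Counter`; `genre_percentages` is a hypothetical name and the sample rows are made up.

```python
from collections import Counter


def genre_percentages(dataset, genre_index):
    """Return (genre, percentage) pairs sorted from most to least common."""
    counts = Counter(row[genre_index] for row in dataset)
    total = len(dataset)
    # most_common() already orders genres by descending count
    return [(genre, count * 100 / total) for genre, count in counts.most_common()]


sample = [["a", "Games"], ["b", "Games"], ["c", "Music"], ["d", "Games"]]
for genre, pct in genre_percentages(sample, 1):
    print(f"{genre}: {pct:.1f}%")
```

For the real data the calls would be `genre_percentages(ios_free, 11)` and `genre_percentages(gplay_free, 1)`.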

Similarly, we apply the same approach to the Google Play Store apps, using the `Category` column.

In [29]:
gplay_genre_counts = {}
for app in gplay_free:
    genre = app[1]
    if genre in gplay_genre_counts:
        gplay_genre_counts[genre] += 1
    else:
        gplay_genre_counts[genre] = 1


gplay_genres = list(gplay_genre_counts.keys())
gplay_counts = list(gplay_genre_counts.values())
gplay_percentages = {genre: (gplay_genre_counts[genre] * 100 / len(gplay_free)) for genre in gplay_genre_counts}
for row in sorted(gplay_percentages.items(), key=lambda x:x[1], reverse=True):
    print(row)

('FAMILY', 18.907942238267147)
('GAME', 9.724729241877256)
('TOOLS', 8.461191335740072)
('BUSINESS', 4.591606498194946)
('LIFESTYLE', 3.9034296028880866)
('PRODUCTIVITY', 3.892148014440433)
('FINANCE', 3.700361010830325)
('MEDICAL', 3.5311371841155235)
('SPORTS', 3.395758122743682)
('PERSONALIZATION', 3.3167870036101084)
('COMMUNICATION', 3.237815884476534)
('HEALTH_AND_FITNESS', 3.079873646209386)
('PHOTOGRAPHY', 2.9444945848375452)
('NEWS_AND_MAGAZINES', 2.7978339350180503)
('SOCIAL', 2.6624548736462095)
('TRAVEL_AND_LOCAL', 2.33528880866426)
('SHOPPING', 2.2450361010830324)
('BOOKS_AND_REFERENCE', 2.1435018050541514)
('DATING', 1.861462093862816)
('VIDEO_PLAYERS', 1.7937725631768953)
('MAPS_AND_NAVIGATION', 1.3989169675090252)
('FOOD_AND_DRINK', 1.2409747292418774)
('EDUCATION', 1.1620036101083033)
('ENTERTAINMENT', 0.9589350180505415)
('LIBRARIES_AND_DEMO', 0.9363718411552346)
('AUTO_AND_VEHICLES', 0.9250902527075813)
('HOUSE_AND_HOME', 0.8235559566787004)
('WEATHER', 0.80099277978

`19%` of free English Google Play apps are in the `Family` category, followed by `Games` at `9.7%` and `Tools` at `8.5%`.

In [136]:
gplay_family_apps = []
for app in gplay_free:
    if app[1] == "FAMILY":
        name = app[0]
        gplay_family_apps.append(name)
        
print(gplay_family_apps[0:20])

['Jewels Crush- Match 3 Puzzle', 'Coloring & Learn', 'Mahjong', 'Super ABC! Learning games for kids! Preschool apps', 'Toy Pop Cubes', 'Educational Games 4 Kids', 'Candy Pop Story', 'Princess Coloring Book', 'Hello Kitty Nail Salon', 'Candy Smash', 'Happy Fruits Bomb - Cube Blast', 'Princess Adventures Puzzles', 'Kids Educational Game 3 Free', 'Puzzle Kids - Animals Shapes and Jigsaw Puzzles', 'Coloring book moana', 'Baby Panda Care', 'Kids Educational :All in One', 'Number Counting games for toddler preschool kids', 'Learn To Draw Glow Flower', 'No. Color - Color by Number, Number Coloring']


Upon closer inspection, many of the Google Play Store apps tagged in the `Family` category are actually games.

Games are the most common category among free English Apple Store apps and, once the games filed under `Family` are counted, a leading category on Google Play as well, so we preliminarily propose narrowing the focus to gaming apps. To validate this proposal, we explore whether this category is also popular with users on both platforms.

## Most popular apps by genre

We pull in the two dataset headers again so we can retrieve the indices we need to analyze popular apps.

In [31]:
print(ios_header,"\n")
print(gplay_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Based on the header names, we will use `rating_count_tot` (index `5`) for the Apple Store as a proxy for the number of installs, since installs are not provided. For the Google Play Store, the number of installs is provided directly in the `Installs` column (also index `5`).

We first inspect the columns more closely.

In [32]:
ios_unique_genres = [] 

for app in ios_free: 
    name = app[11] 

    if name not in ios_unique_genres:
        ios_unique_genres.append(name)
        
print(len(ios_unique_genres))

23


There are 23 unique Apple Store genres.

In [33]:
gplay_unique_genres = [] 

for app in gplay_free: 
    name = app[1] 

    if name not in gplay_unique_genres:
        gplay_unique_genres.append(name)
        
print(len(gplay_unique_genres))

33


There are 33 unique Google Play genres.

In [34]:
print(ios_free[0:5])

[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]


It appears that the `rating_count_tot` column will need to be converted from a `str` format to a `float` format.

In [35]:
avg_num_ratings = {}
for genre in ios_unique_genres:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            user_ratings = float(app[5])
            total += user_ratings
            len_genre += 1
    avg_num_ratings[genre] = int(total/len_genre)
    
print(sorted(avg_num_ratings.items(), key=lambda x:x[1], reverse=True))

[('Navigation', 86090), ('Reference', 74942), ('Social Networking', 71548), ('Music', 57326), ('Weather', 52279), ('Book', 39758), ('Food & Drink', 33333), ('Finance', 31467), ('Photo & Video', 28441), ('Travel', 28243), ('Shopping', 26919), ('Health & Fitness', 23298), ('Sports', 23008), ('Games', 22788), ('News', 21248), ('Productivity', 21028), ('Utilities', 18684), ('Lifestyle', 16485), ('Entertainment', 14029), ('Business', 7491), ('Education', 7003), ('Catalogs', 4004), ('Medical', 612)]


`Navigation`, `Reference`, and `Social Networking` apps have the highest average number of user ratings per app on the Apple Store.
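
The nested loop above rescans the full app list once per genre. As a sketch, the same per-genre averages can be accumulated in a single pass; `average_by_genre` is a hypothetical name and the sample rows are made up.

```python
from collections import defaultdict


def average_by_genre(dataset, genre_index, value_index):
    """Return {genre: mean of the numeric column}, computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        genre = row[genre_index]
        totals[genre] += float(row[value_index])
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


sample = [["a", "Games", "10"], ["b", "Games", "30"], ["c", "Music", "5"]]
print(average_by_genre(sample, 1, 2))
```

For the Apple Store data the call would be `average_by_genre(ios_free, 11, 5)`. Unlike the cell above, this version does not truncate the averages to integers.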

In [42]:
for app in ios_free:
    if app[11] == "Navigation":
        print(f"{app[1]}: {app[5]}")

Waze - GPS Navigation, Maps & Real-time Traffic: 345046
Google Maps - Navigation & Transit: 154911
Geocaching®: 12811
CoPilot GPS – Car Navigation & Offline Maps: 3582
ImmobilienScout24: Real Estate Search in Germany: 187
Railway Route Search: 5


The `Navigation` category is dominated primarily by Waze and Google Maps.

In [47]:
for app in ios_free:
    if app[11] == "Reference":
        print(f"{app[1]}: {app[5]}")

Bible: 985920
Dictionary.com Dictionary & Thesaurus: 200047
Dictionary.com Dictionary & Thesaurus for iPad: 54175
Google Translate: 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran: 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition: 17588
Merriam-Webster Dictionary: 16849
Night Sky: 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE): 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools: 4693
GUNS MODS for Minecraft PC Edition - Mods Tools: 1497
Guides for Pokémon GO - Pokemon GO News and Cheats: 826
WWDC: 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free: 718
VPN Express: 14
Real Bike Traffic Rider Virtual Reality Glasses: 8
教えて!goo: 0
Jishokun-Japanese English Dictionary & Translator: 0


The `Reference` category is dominated by the Bible and Dictionary apps.

In [48]:
for app in ios_free:
    if app[11] == "Social Networking":
        print(f"{app[1]}: {app[5]}")

Facebook: 2974676
Pinterest: 1061624
Skype for iPhone: 373519
Messenger: 351466
Tumblr: 334293
WhatsApp Messenger: 287589
Kik: 260965
ooVoo – Free Video Call, Text and Voice: 177501
TextNow - Unlimited Text + Calls: 164963
Viber Messenger – Text & Call: 164249
Followers - Social Analytics For Instagram: 112778
MeetMe - Chat and Meet New People: 97072
We Heart It - Fashion, wallpapers, quotes, tattoos: 90414
InsTrack for Instagram - Analytics Plus More: 85535
Tango - Free Video Call, Voice and Chat: 75412
LinkedIn: 71856
Match™ - #1 Dating App.: 60659
Skype for iPad: 60163
POF - Best Dating App for Conversations: 52642
Timehop: 49510
Find My Family, Friends & iPhone - Life360 Locator: 43877
Whisper - Share, Express, Meet: 39819
Hangouts: 36404
LINE PLAY - Your Avatar World: 34677
WeChat: 34584
Badoo - Meet New People, Chat, Socialize.: 34428
Followers + for Instagram - Follower Analytics: 28633
GroupMe: 28260
Marco Polo Video Walkie Talkie: 27662
Miitomo: 23965
SimSimi: 23530
Grindr - G

The `Social Networking` category is dominated by a handful of very large apps such as Facebook, Pinterest, Skype, Messenger, Tumblr, WhatsApp, and Kik.

We analyze the Google Play apps to see if we can make a comparison.

In [38]:
print(gplay_free[0:5])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


Google Play install counts appear to be bucketed in a format such as `10,000+`. We will need to strip the formatting and convert them to `float` values.

In [39]:
gplay_free_unique_installs = []

for app in gplay_free:
    count_installs = app[5]
    if count_installs not in gplay_free_unique_installs:
        gplay_free_unique_installs.append(count_installs)
        
print(sorted(gplay_free_unique_installs))

['0', '0+', '1+', '1,000+', '1,000,000+', '1,000,000,000+', '10+', '10,000+', '10,000,000+', '100+', '100,000+', '100,000,000+', '5+', '5,000+', '5,000,000+', '50+', '50,000+', '50,000,000+', '500+', '500,000+', '500,000,000+']


Already we can see formatting issues arising from the commas and plus symbols in the text strings. We proceed by removing these from the list values.

In [49]:
avg_num_installs = {}
for genre in gplay_unique_genres:
    total = 0
    len_genre = 0
    for app in gplay_free:
        genre_app = app[1]
        if genre_app == genre:
            count_installs = app[5]
            count_installs = count_installs.replace("+", "")
            count_installs = count_installs.replace(",", "")
            total += float(count_installs)
            len_genre += 1
    avg_num_installs[genre] = int(total/len_genre)
    
print(sorted(avg_num_installs.items(), key=lambda x:x[1], reverse=True))

[('COMMUNICATION', 38456119), ('VIDEO_PLAYERS', 24727872), ('SOCIAL', 23253652), ('PHOTOGRAPHY', 17840110), ('PRODUCTIVITY', 16787331), ('GAME', 15588015), ('TRAVEL_AND_LOCAL', 13984077), ('ENTERTAINMENT', 11640705), ('TOOLS', 10801391), ('NEWS_AND_MAGAZINES', 9549178), ('BOOKS_AND_REFERENCE', 8767811), ('SHOPPING', 7036877), ('PERSONALIZATION', 5201482), ('WEATHER', 5074486), ('HEALTH_AND_FITNESS', 4188821), ('MAPS_AND_NAVIGATION', 4056941), ('FAMILY', 3695641), ('SPORTS', 3638640), ('ART_AND_DESIGN', 1986335), ('FOOD_AND_DRINK', 1924897), ('EDUCATION', 1833495), ('BUSINESS', 1712290), ('LIFESTYLE', 1437816), ('FINANCE', 1387692), ('HOUSE_AND_HOME', 1331540), ('DATING', 854028), ('COMICS', 817657), ('AUTO_AND_VEHICLES', 647317), ('LIBRARIES_AND_DEMO', 638503), ('PARENTING', 542603), ('BEAUTY', 513151), ('EVENTS', 253542), ('MEDICAL', 120550)]


The `COMMUNICATION` apps have been installed 38 million times on average, followed by 24.7 million for `VIDEO_PLAYERS` apps, and 23 million for `SOCIAL`.
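
The install-count cleanup is repeated in several cells of this notebook. One way to factor it out is a small parser like the sketch below, where `parse_installs` is a hypothetical helper name.

```python
def parse_installs(raw_installs):
    """Convert a bucket label such as '1,000,000+' to a float."""
    return float(raw_installs.replace(",", "").replace("+", ""))


for label in ["0", "10+", "1,000,000+"]:
    print(label, "->", parse_installs(label))
```

Since a label such as `10,000+` only gives the lower edge of a bucket, averages built from these values are best read as rough lower bounds rather than exact install counts.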

In [128]:
gplay_communication = {}
for app in gplay_free:

    if app[1] == "COMMUNICATION":
        name = app[0]
        count_installs = app[5]
        count_installs = count_installs.replace("+", "")
        count_installs = count_installs.replace(",", "")
        if name not in gplay_communication:
            gplay_communication[name] = float(count_installs)
            
print(sorted(gplay_communication.items(), key=lambda x:x[1], reverse=True)[0:10])
print(len(gplay_communication))

[('WhatsApp Messenger', 1000000000.0), ('Messenger – Text and Video Chat for Free', 1000000000.0), ('Skype - free IM & video calls', 1000000000.0), ('Google Chrome: Fast & Secure', 1000000000.0), ('Gmail', 1000000000.0), ('Hangouts', 1000000000.0), ('Google Duo - High Quality Video Calls', 500000000.0), ('imo free video calls and chat', 500000000.0), ('LINE: Free Calls & Messages', 500000000.0), ('UC Browser - Fast Download Private & Secure', 500000000.0)]
287


The `COMMUNICATION` category is dominated by a few giants on the Google Play Store: six of the top ten apps have 1 billion or more installs each, which heavily skews the average.

In [123]:
gplay_video_players = {}
for app in gplay_free:

    if app[1] == "VIDEO_PLAYERS":
        name = app[0]
        count_installs = app[5]
        count_installs = count_installs.replace("+", "")
        count_installs = count_installs.replace(",", "")
        if name not in gplay_video_players:
            gplay_video_players[name] = float(count_installs)
            
print(sorted(gplay_video_players.items(), key=lambda x:x[1], reverse=True)[0:10])
print(len(gplay_video_players))

[('YouTube', 1000000000.0), ('Google Play Movies & TV', 1000000000.0), ('MX Player', 500000000.0), ('Motorola Gallery', 100000000.0), ('VLC for Android', 100000000.0), ('Dubsmash', 100000000.0), ('VivaVideo - Video Editor & Photo Movie', 100000000.0), ('VideoShow-Video Editor, Video Maker, Beauty Camera', 100000000.0), ('Motorola FM Radio', 100000000.0), ('Vote for', 50000000.0)]
159


Similarly for `VIDEO_PLAYERS`, YouTube and Google Play Movies & TV heavily skew the average.
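
To see why a few very large apps can make a category average misleading, the mean can be compared with the median. The install counts below are made up purely for illustration and are not taken from the dataset.

```python
from statistics import mean, median

# Made-up install counts: one giant app among several small ones
installs = [1_000_000_000, 500_000, 100_000, 100_000, 10_000]

print(f"mean:   {mean(installs):,.0f}")
print(f"median: {median(installs):,.0f}")
```

When the mean is far above the median, as it is here, the average mostly reflects the largest app rather than a typical one.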

In [122]:
gplay_social = {}
for app in gplay_free:

    if app[1] == "SOCIAL":
        name = app[0]
        count_installs = app[5]
        count_installs = count_installs.replace("+", "")
        count_installs = count_installs.replace(",", "")
        if name not in gplay_social:
            gplay_social[name] = float(count_installs)
            
print(sorted(gplay_social.items(), key=lambda x:x[1], reverse=True)[0:10])
print(len(gplay_social))

[('Facebook', 1000000000.0), ('Google+', 1000000000.0), ('Instagram', 1000000000.0), ('Facebook Lite', 500000000.0), ('Snapchat', 500000000.0), ('Tumblr', 100000000.0), ('Pinterest', 100000000.0), ('Badoo - Free Chat & Dating App', 100000000.0), ('Tango - Live Video Broadcast', 100000000.0), ('LinkedIn', 100000000.0)]
236


Similarly for `SOCIAL`, Facebook, Google+, and Instagram each have 1 billion or more installs, followed by Facebook Lite and Snapchat with 500 million or more.

To compare with the Apple Store, we will want to explore the categories `BOOKS_AND_REFERENCE` and `MAPS_AND_NAVIGATION` on the Google Play Store.
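
The per-category cells in this section all follow the same steps: filter by category, clean the install count, and sort. A sketch of a reusable helper is shown below, assuming the Google Play column layout (name at index `0`, category at `1`, installs at `5`); `top_apps_by_installs` is a hypothetical name and the sample rows are made up.

```python
def top_apps_by_installs(dataset, category, n=10, name_index=0,
                         category_index=1, installs_index=5):
    """Return the n most-installed (name, installs) pairs in one category."""
    installs_by_name = {}
    for row in dataset:
        if row[category_index] != category:
            continue
        installs = float(row[installs_index].replace("+", "").replace(",", ""))
        # Keep the first listing seen for each name, as the surrounding cells do
        installs_by_name.setdefault(row[name_index], installs)
    ranked = sorted(installs_by_name.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Toy rows with only the name, category, and installs columns filled in
sample = [
    ["Waze", "MAPS_AND_NAVIGATION", "", "", "", "100,000,000+"],
    ["Compass", "MAPS_AND_NAVIGATION", "", "", "", "10,000,000+"],
    ["Gmail", "COMMUNICATION", "", "", "", "1,000,000,000+"],
]
print(top_apps_by_installs(sample, "MAPS_AND_NAVIGATION", n=1))
```

On the real data the call would be, for example, `top_apps_by_installs(gplay_free, "BOOKS_AND_REFERENCE")`.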

In [126]:
gplay_navigation = {}
for app in gplay_free:
    if app[1] == "MAPS_AND_NAVIGATION":
        name = app[0]
        count_installs = app[5]
        count_installs = count_installs.replace("+", "")
        count_installs = count_installs.replace(",", "")
        if name not in gplay_navigation:
            gplay_navigation[name] = float(count_installs)
            
print(sorted(gplay_navigation.items(), key=lambda x:x[1], reverse=True)[0:10])
print(len(gplay_navigation))

[('Waze - GPS, Maps, Traffic Alerts & Live Navigation', 100000000.0), ('Uber', 100000000.0), ('GPS Navigation & Offline Maps Sygic', 50000000.0), ('Free GPS Navigation', 50000000.0), ('MapQuest: Directions, Maps, GPS & Navigation', 10000000.0), ('Yahoo! transit guide free timetable, operation information, transfer search', 10000000.0), ('Yandex.Transport', 10000000.0), ('Compass', 10000000.0), ('Subway Terminator: Smarter Subway', 10000000.0), ('Moovit: Bus Time & Train Time Live Info', 10000000.0)]
124


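The `MAPS_AND_NAVIGATION` category is led by a few heavy hitters such as Waze and Uber, each with 100 million or more installs. Any disruptive or innovative feature that a newcomer introduces could most likely be added to one of these large apps in a future update, which makes the category hard to enter.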
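
The per-category cells in this section all repeat the same steps: filter on the category column, strip `+` and `,` from the installs string, convert it to a number, and keep the first entry per app name. A minimal sketch of a reusable helper is shown here. The names `installs_by_category`, `top_n`, and `sample_rows` are hypothetical and not part of the original notebook, and the sample rows are made-up stand-ins shaped like `gplay_free` (name at index 0, category at index 1, installs at index 5).

```python
def installs_by_category(dataset, category, name_idx=0, cat_idx=1, installs_idx=5):
    """Map app name -> install count for one Google Play category."""
    installs = {}
    for app in dataset:
        if app[cat_idx] != category:
            continue
        name = app[name_idx]
        if name in installs:
            continue  # keep the first entry seen, as the notebook cells do
        raw = app[installs_idx].replace("+", "").replace(",", "")
        installs[name] = float(raw)
    return installs


def top_n(installs, n=10):
    """Return the n (name, installs) pairs with the highest install counts."""
    return sorted(installs.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical rows for illustration only; real rows come from gplay_free.
sample_rows = [
    ["Waze", "MAPS_AND_NAVIGATION", "", "", "", "100,000,000+"],
    ["Compass", "MAPS_AND_NAVIGATION", "", "", "", "10,000,000+"],
    ["Waze", "MAPS_AND_NAVIGATION", "", "", "", "50,000,000+"],  # duplicate name
    ["Facebook", "SOCIAL", "", "", "", "1,000,000,000+"],
]

navigation = installs_by_category(sample_rows, "MAPS_AND_NAVIGATION")
print(top_n(navigation))
print(len(navigation))
```

With a helper like this, each category cell could reduce to a single call such as `installs_by_category(gplay_free, "SOCIAL")`, assuming `gplay_free` keeps the column layout used above.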

In [121]:
gplay_reference = {}
for app in gplay_free:
    if app[1] == "BOOKS_AND_REFERENCE":
        name = app[0]
        # Strip "+" and "," so a value like "1,000,000+" converts to a number
        count_installs = app[5].replace("+", "").replace(",", "")
        # Keep only the first entry seen for each app name
        if name not in gplay_reference:
            gplay_reference[name] = float(count_installs)

# Ten most-installed apps, then the number of unique apps in the category
print(sorted(gplay_reference.items(), key=lambda x: x[1], reverse=True)[0:10])
print(len(gplay_reference))

[('Google Play Books', 1000000000.0), ('Bible', 100000000.0), ('Amazon Kindle', 100000000.0), ('Wattpad 📖 Free Books', 100000000.0), ('Audiobooks from Audible', 100000000.0), ('Wikipedia', 10000000.0), ('Cool Reader', 10000000.0), ('FBReader: Favorite Book Reader', 10000000.0), ('HTC Help', 10000000.0), ('Moon+ Reader', 10000000.0)]
190


A handful of apps dominate the `BOOKS_AND_REFERENCE` category. What stands out, however, is the number of similar apps that offer a translation of a particular text or make it available for a specific region: there are quite a few dictionaries and Al Quran apps tied to individual languages or regions. With 100 million or more installs for the Bible alone, there is potentially a market in this category for an app built around a popular book or another religious text, with multiple language translations and regional variations handled inside a single app.
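
The observation about many similar dictionary and Quran apps could be quantified by counting app names that contain a given keyword. The sketch below is one way to do that. The function `count_by_keyword` is hypothetical, and `sample_names` holds made-up names for illustration; in the notebook the names would come from `gplay_reference.keys()`.

```python
def count_by_keyword(names, keywords):
    """Count how many app names contain each keyword (case-insensitive)."""
    counts = {}
    for keyword in keywords:
        needle = keyword.lower()
        counts[keyword] = sum(1 for name in names if needle in name.lower())
    return counts


# Made-up names for illustration only; use gplay_reference.keys() in the notebook.
sample_names = [
    "Bible",
    "Example Al Quran Reader",
    "Example English Dictionary",
    "Example Quran With Translation",
    "Example Ebook Reader",
    "Example French Dictionary",
]

keyword_counts = count_by_keyword(sample_names, ["bible", "quran", "dictionary"])
print(keyword_counts)
```

Substring matching is deliberately crude: it misses alternate spellings and can match unrelated names, so the counts are only a rough indication of how crowded each niche is.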

We want to cross-check book and reference apps on the Apple Store, where they fall under the separate `Book` and `Reference` genres.

In [139]:
for app in ios_free:
    if app[11] == "Book":
        print(f"{app[1]}: {app[5]}")

Kindle – Read eBooks, Magazines & Textbooks: 252076
Audible – audio books, original series & podcasts: 105274
Color Therapy Adult Coloring Book for Adults: 84062
OverDrive – Library eBooks and Audiobooks: 65450
HOOKED - Chat Stories: 47829
BookShout: Read eBooks & Track Your Reading Goals: 879
Dr. Seuss Treasury — 50 best kids books: 451
Green Riding Hood: 392
Weirdwood Manor: 197
MangaZERO - comic reader: 9
ikouhoushi: 0
MangaTiara - love comic reader: 0
謎解き: 0
謎解き2016: 0


In [140]:
for app in ios_free:
    if app[11] == "Reference":
        print(f"{app[1]}: {app[5]}")

Bible: 985920
Dictionary.com Dictionary & Thesaurus: 200047
Dictionary.com Dictionary & Thesaurus for iPad: 54175
Google Translate: 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran: 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition: 17588
Merriam-Webster Dictionary: 16849
Night Sky: 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE): 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools: 4693
GUNS MODS for Minecraft PC Edition - Mods Tools: 1497
Guides for Pokémon GO - Pokemon GO News and Cheats: 826
WWDC: 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free: 718
VPN Express: 14
Real Bike Traffic Rider Virtual Reality Glasses: 8
教えて!goo: 0
Jishokun-Japanese English Dictionary & Translator: 0
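
The two Apple Store cells above differ only in the genre they filter on. As a sketch, both listings could be produced in one pass and sorted explicitly by the count in column 5. The helper names `ratings_by_genre` and `make_row` are hypothetical, column 5 is assumed to be the total number of user ratings, and the sample rows reuse a few values printed earlier in this notebook.

```python
def ratings_by_genre(dataset, genres, name_idx=1, count_idx=5, genre_idx=11):
    """Return (name, count) pairs for apps in the given genres, highest count first."""
    rows = [
        (app[name_idx], int(app[count_idx]))
        for app in dataset
        if app[genre_idx] in genres
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def make_row(name, count, genre):
    """Build a 12-column placeholder row shaped like an ios_free row (illustration only)."""
    row = [""] * 12
    row[1], row[5], row[11] = name, str(count), genre
    return row


# Values copied from the outputs in this notebook; the other columns are left blank.
sample_ios = [
    make_row("Kindle – Read eBooks, Magazines & Textbooks", 252076, "Book"),
    make_row("Bible", 985920, "Reference"),
    make_row("Google Translate", 26786, "Reference"),
    make_row("Facebook", 2974676, "Social Networking"),
]

for name, count in ratings_by_genre(sample_ios, {"Book", "Reference"}):
    print(f"{name}: {count}")
```

Converting the count to `int` before sorting matters because the raw values are strings, and string comparison would rank `"985920"` above `"2974676"`.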


## Conclusion

The Bible has 100 million or more installs on the Google Play Store and close to 1 million user ratings on the Apple Store. Similarly, the Quran, another religious text, has multiple versions on Android, and the Muslim Pro app has about 18,000 user ratings on iOS. There may be a market for a similar app that houses multiple language translations and regional variations of a popular book or religious text. A recently released book, however popular, may not leave enough lead time, so one approach could be to build the app around a religious text first and then develop a template that can be applied to popular books. For a religious text, the app can let the user set daily or weekly reminders and alarms for prayer or any other user-entered purpose. For a popular book, the app can incorporate note-taking functions where applicable.

While heavy-hitting apps exist on both platforms today, there are still opportunities to carve out a niche in some lower-volume categories. Because the goal is an app that users interact with routinely, we propose developing a companion app for massively popular books or religious texts. Additional features, such as letting users choose among multiple languages or countries for the text, can help shift users from language- or region-specific apps to a single, scaled-up one-stop shop.

The next step would be to research the smaller apps that currently exist in the market and compile a list of the features that make them unique. What do they currently offer that other similar apps don't have? Do they have limitations that can be removed if scaled up? Which features are users looking for when they download one app rather than another?