<h1> AppStore & Google Store Application Analysis </h1> 

Author: 
- Sahil Klaeb 
- sahilklaeb@gmail.com 
 
Significance of Study:  
- In the realm of mobile, tablet and desktop applications, Apple's App Store and Google Play lead the application landscape. Both carry many of the same applications, alongside those that are exclusive to their own platforms. 

- In addition, most of the applications on these stores are free. I want to analyze the revenue and profitability of these free applications via in-application ads. 

- The logic for estimating revenue from free applications is that, because ads are the primary revenue source for such applications, the revenue of any given application is influenced by the number of users it has. 

Goal of Study: 
- The goal of this study is to help our developers understand what types of applications attract more users, so that our Product, Engineering, Operations and Communication verticals can prioritize these types of applications.

A Note from the author: 
- Individuals using Python will often utilize additional libraries such as Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, etc. Although some of these libraries would be helpful in this project, I will focus on using the generic "stock" Python language. 
- In my experience, it is good to rely on plain Python every once in a while so one is not so dependent on the various libraries.

In [4]:
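from csv import reader

# Read each CSV fully into a list of rows (lists of strings); row 0 is the header.
# The files are opened as UTF-8 because many app names contain non-ASCII characters.
with open('AppleStore.csv', encoding='utf8') as opened_file_as:
    apple_store_data = list(reader(opened_file_as))

with open('googleplaystore.csv', encoding='utf8') as opened_file_gs:
    google_store_data = list(reader(opened_file_gs))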

<h1> Function to Explore the Two Datasets </h1> 

My function will contain four parameters: 
1. dataset - a list of lists 
2. start and end - both expected to be integers, representing the starting and ending indices of a slice of the data 
3. rows_and_column - a Boolean with False as the default argument 

Outside of the parameters, the function will follow this logic: 
1. Slice the dataset using dataset[start:end] 
2. Loop through the slice and, for each iteration, print a row and add a new line after it using print('\n') 
3. Print the number of rows and columns if rows_and_column is True 


In [5]:
def explore_data(dataset, start, end, rows_and_column=False): 
    dataset_slice = dataset[start:end]
    for row in dataset_slice: 
        print(row)
        print('\n')
    if rows_and_column: 
        print('Number of rows:', len(dataset)) 
        print('Number of columns:', len(dataset[0]))
        

In [15]:
print(apple_store_data[0])
#For more descriptions on the column classifications, please visit 
# https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home#appleStore_description.csv

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [9]:
#These are the first five rows of the Apple Store data, including the header 
explore_data(apple_store_data,0,5,True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


Number of rows: 7198
Number of columns: 17


In [17]:
print(google_store_data[0])
#For more description on the Google Play Store Apps Columns, please visit: 
#https://www.kaggle.com/lava18/google-play-store-apps/home

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [12]:
explore_data(google_store_data,0,5,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13


<h3> Data Cleaning </h3> 

In the discussion sections of Kaggle, it is reported that some of the data contains errors. For example, there is a problem with row 10473 of the Google Play data. It is inaccurate, so I will exclude it from my analysis.

In [18]:
del google_store_data[10473]

<h3> Duplicate Values </h3> 

Checking for duplicate values. I will take a popular application name and see whether it appears more than once. This will help me determine whether I need to clean the data further to remove duplicate values. 

In [50]:
#Apple Store Data 
#Checking to see if the Apple Store Data might have duplicates for a popular app 
for apps in apple_store_data: 
    name = apps[2]
    if name == 'Facebook': 
        print(apps)

['17', '284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


In [51]:
#Apple Store Data Duplicate Count 
duplicate_values_as = [] 
unique_values_as = [] 

for apps in apple_store_data: 
    name = apps[2]
    if name in unique_values_as: 
        duplicate_values_as.append(name)
    else: 
        unique_values_as.append(name)

print('Number of Duplicate Applications:', len(duplicate_values_as))
print('\n')
print('Examples of Duplicate Applications:', duplicate_values_as[:20])

Number of Duplicate Applications: 2


Examples of Duplicate Applications: ['VR Roller Coaster', 'Mannequin Challenge']


In [176]:
duplicate_apps_as = []
unique_apps_as = [] 

for app in apple_store_data: 
    name = app[2]
    
    if name in unique_apps_as: 
        duplicate_apps_as.append(app)
    else: 
        # Note: this stores whole rows, so the `name in unique_apps_as` test above never matches a name
        unique_apps_as.append(app)


In [177]:
duplicate_apps_as 

[]

In [151]:
duplicate_apple_store_apps = 2 
entire_apple_store_dataset = len(apple_store_data[1:])
new_target = entire_apple_store_dataset - duplicate_apple_store_apps
print(new_target)

7195


In [148]:
apple_store_data 

[['',
  'id',
  'track_name',
  'size_bytes',
  'currency',
  'price',
  'rating_count_tot',
  'rating_count_ver',
  'user_rating',
  'user_rating_ver',
  'ver',
  'cont_rating',
  'prime_genre',
  'sup_devices.num',
  'ipadSc_urls.num',
  'lang.num',
  'vpp_lic'],
 ['1',
  '281656475',
  'PAC-MAN Premium',
  '100788224',
  'USD',
  '3.99',
  '21292',
  '26',
  '4',
  '4.5',
  '6.3.5',
  '4+',
  'Games',
  '38',
  '5',
  '10',
  '1'],
 ['2',
  '281796108',
  'Evernote - stay organized',
  '158578688',
  'USD',
  '0',
  '161065',
  '26',
  '4',
  '3.5',
  '8.2.2',
  '4+',
  'Productivity',
  '37',
  '5',
  '23',
  '1'],
 ['3',
  '281940292',
  'WeatherBug - Local Weather, Radar, Maps, Alerts',
  '100524032',
  'USD',
  '0',
  '188583',
  '2822',
  '3.5',
  '4.5',
  '5.0.0',
  '4+',
  'Weather',
  '37',
  '5',
  '3',
  '1'],
 ['4',
  '282614216',
  'eBay: Best App to Buy, Sell, Save! Online Shopping',
  '128512000',
  'USD',
  '0',
  '262241',
  '649',
  '4',
  '4.5',
  '5.10.0',
  '12+'

In [149]:
reviews_max_as = {} 

for app in apple_store_data[1:]: 
    name = app[2]
    n_reviews = float(app[6])
    
    if name in reviews_max_as and reviews_max_as[name]<n_reviews: 
        reviews_max_as[name] = n_reviews
    elif name not in reviews_max_as: 
        reviews_max_as[name] = n_reviews

reviews_max_as

{'PAC-MAN Premium': 21292.0,
 'Evernote - stay organized': 161065.0,
 'WeatherBug - Local Weather, Radar, Maps, Alerts': 188583.0,
 'eBay: Best App to Buy, Sell, Save! Online Shopping': 262241.0,
 'Bible': 985920.0,
 'Shanghai Mahjong': 8253.0,
 'PayPal - Send and request money safely': 119487.0,
 'Pandora - Music & Radio': 1126879.0,
 'PCalc - The Best Calculator': 1117.0,
 'Ms. PAC-MAN': 7885.0,
 'Solitaire by MobilityWare': 76720.0,
 'SCRABBLE Premium': 105776.0,
 'Google – Search made just for mobile': 479440.0,
 'Bank of America - Mobile Banking': 119773.0,
 'FreeCell': 6340.0,
 'TripAdvisor Hotels Flights Restaurants': 56194.0,
 'Facebook': 2974676.0,
 'Yelp - Nearby Restaurants, Shopping & Services': 223885.0,
 'Shazam - Discover music, artists, videos & lyrics': 402925.0,
 'Crash Bandicoot Nitro Kart 3D': 31456.0,
 'iQuran': 2929.0,
 ':) Sudoku +': 11447.0,
 'Yahoo Sports - Teams, Scores, News & Highlights': 137951.0,
 'Mileage Log | Fahrtenbuch': 8.0,
 'Cleartune - Chromatic T

In [150]:
len(reviews_max_as)

7195

In [174]:
ios_cleaned = [] 
already_added_ios = [] 

for app in apple_store_data[1:]: 
    name=app[2]
    n_reviews = float(app[6])  # convert to float so it can be compared with the values in reviews_max_as
    
    if (n_reviews == reviews_max_as[name]) and (name not in already_added_ios): 
        ios_cleaned.append(app)
        already_added_ios.append(name) 

In [31]:
#Quick check: does the Google Play data contain duplicate rows for a popular app? 
for app in google_store_data: 
    name = app[0]
    if name =='Instagram': 
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [35]:
duplicate_apps_gs = []
unique_apps_gs = []

for apps in google_store_data: 
    name = apps[0]
    if name in unique_apps_gs: 
        duplicate_apps_gs.append(name)
    else: 
        unique_apps_gs.append(name)

print('Duplicate Apps Count:', len(duplicate_apps_gs))
print('\n')
print('Examples of Duplicate Apps:', duplicate_apps_gs[:20])

Duplicate Apps Count: 1181


Examples of Duplicate Apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']


Since there are 1181 duplicate entries, I will exclude them from the analysis. The number of applications in scope therefore has to be adjusted accordingly. 

In [61]:
duplicate_app_count_gs = 1181 
adjusted_length_google_ds = len(google_store_data[1:])- duplicate_app_count_gs 
print(adjusted_length_google_ds)

9659
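
The duplicate count above is typed in by hand. As a cross-check, the expected number of unique applications could be derived directly from the names with a set. The following is a minimal sketch on a made-up list of rows; `count_unique_names` and `sample_rows` are illustrative names and are not part of the notebook.

```python
def count_unique_names(rows, name_index):
    """Return (unique_count, duplicate_count) for the name column of rows."""
    names = [row[name_index] for row in rows]
    unique_count = len(set(names))
    return unique_count, len(names) - unique_count


# Made-up rows for illustration only: [name, category, rating, reviews]
sample_rows = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159900'],
]

unique_count, duplicate_count = count_unique_names(sample_rows, 0)
print('Unique apps:', unique_count)
print('Duplicate rows:', duplicate_count)
```

On the notebook's data, the call would presumably be `count_unique_names(google_store_data[1:], 0)`, which skips the header row.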


<h3> Removing Duplicates Strategically </h3> 

Rather than removing duplicate rows at random, I would like to devise a small but effective strategy - for the purpose of this analysis - to remove the duplicate values. In the Instagram example above, the main difference between the duplicate rows is the number of reviews (4th column). 

That makes sense: the dataset likely contains multiple rows for Instagram because entries were collected at different times, after the number of reviews had changed. Following the same logic, the higher the number of reviews in a duplicate row, the more recent the data should be, and that is the row we should keep for our analysis. 

In [75]:
#Removing the Duplicates via Dictionary Method: 
#Each Dictionary Key will be a unique Application Name 
#Each Value will be its highest review count 
reviews_max_gs = {}
for app in google_store_data[1:]: 
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max_gs and reviews_max_gs[name]<n_reviews: 
        reviews_max_gs[name] = n_reviews
    elif name not in reviews_max_gs: 
        reviews_max_gs[name] = n_reviews

        
#Pseudo Code
#If the application name is already in the dictionary and its stored value is less than
#the current row's number of reviews, update the value to the current number of reviews.
#If the application name is not yet in the dictionary, add it with the current number of reviews.

In [74]:
reviews_max_gs

{'Photo Editor & Candy Camera & Grid & ScrapBook': 159.0,
 'Coloring book moana': 967.0,
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps': 87510.0,
 'Sketch - Draw & Paint': 215644.0,
 'Pixel Draw - Number Art Coloring Book': 967.0,
 'Paper flowers instructions': 167.0,
 'Smoke Effect Photo Maker - Smoke Editor': 178.0,
 'Infinite Painter': 36815.0,
 'Garden Coloring Book': 13791.0,
 'Kids Paint Free - Drawing Fun': 121.0,
 'Text on Photo - Fonteee': 13880.0,
 'Name Art Photo Editor - Focus n Filters': 8788.0,
 'Tattoo Name On My Photo Editor': 44829.0,
 'Mandala Coloring Book': 4326.0,
 '3D Color Pixel by Number - Sandbox Art Coloring': 1518.0,
 'Learn To Draw Kawaii Characters': 55.0,
 'Photo Designer - Write your name with shapes': 3632.0,
 '350 Diy Room Decor Ideas': 27.0,
 'FlipaClip - Cartoon animation': 194216.0,
 'ibis Paint X': 224399.0,
 'Logo Maker - Small Business': 450.0,
 "Boys Photo Editor - Six Pack & Men's Suit": 654.0,
 'Superheroes Wallpapers | 4K Backgrounds': 

<h3> Affirming our Dictionary Contains no Duplicate Values </h3> 

Our target number of rows after removing duplicates is 9659. The process above creates a dictionary where each key is a unique application name and each value is the highest number of reviews for that application. 

Because the length of the dictionary below matches our target, the dictionary holds exactly one entry per unique application, which confirms that the approach removes the duplicate values. 

In [60]:
#Inspecting the Dictionary 
print(len(reviews_max_gs))


9659


In [78]:
android_cleaned = [] #This list will store the new cleaned data 
already_added = [] #This list will store the names of the applications only. 

for app in google_store_data[1:]: 
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews ==reviews_max_gs[name]) and (name not in already_added): 
        android_cleaned.append(app)
        already_added.append(name)
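
The same two-pass idea is needed for both stores, so it could be wrapped in one reusable function. The following is a minimal sketch on made-up rows, not the notebook's own code; `dedupe_by_max_reviews` and `sample_rows` are hypothetical names, and a set is swapped in for the `already_added` list to make the membership test cheaper.

```python
def dedupe_by_max_reviews(rows, name_index, reviews_index):
    """Keep one row per app name: the first row carrying the highest review count."""
    # Pass 1: highest review count seen for each name.
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row that matches the maximum for its name.
    cleaned = []
    already_added = set()  # set membership tests do not scan the whole collection
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if n_reviews == reviews_max[name] and name not in already_added:
            cleaned.append(row)
            already_added.add(name)
    return cleaned


# Made-up rows for illustration only: [name, reviews]
sample_rows = [
    ['Instagram', '66577313'],
    ['Instagram', '66577446'],
    ['Spotify', '100'],
    ['Instagram', '66577446'],
]
print(dedupe_by_max_reviews(sample_rows, 0, 1))
```

The `already_added` check matters because two rows can share the same maximum review count, as the last sample row shows; without it both would be kept.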

In [79]:
#Explore this cleaned data: 
explore_data(android_cleaned, 0,5,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


In [83]:
duplicate_values_android_cleaned = [] 
unique_values_android_cleaned = []

for app in android_cleaned: 
    name = app[0]
    if name in unique_values_android_cleaned: 
        duplicate_values_android_cleaned.append(app)
    else: 
        unique_values_android_cleaned.append(name)  # store the name so the membership test above compares names
print('Duplicate values Updated:', len(duplicate_values_android_cleaned))
print(" ")
print("Now, I have affirmed that there are truly no duplicate values in the android application dataset.")

Duplicate values Updated: 0
 
Now, I have affirmed that there are truly no duplicate values in the android application dataset


<h3> Data Cleaning </h3> 

Another relevant cleaning step is removing applications created for a non-English audience. Since I am only interested in English applications - as this is the focus of development for our company - I will need to remove the non-English applications from both datasets. 

To do so, I will test each character in an application's name to determine whether the application is intended for an English audience. 

I will use the <b> ord </b> function to carry out this process, starting with a function called english_detector. 

english_detector will return True if the name of the application is English and False if it is not. 

According to the American Standard Code for Information Interchange, the numbers corresponding to the characters we commonly use in an English text are within the range of 0 to 127. 
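
To make the 0 to 127 range concrete, the following small sketch lists the characters of a name that fall outside it, together with their `ord` values. `non_ascii_characters` is an illustrative helper, not part of the notebook.

```python
def non_ascii_characters(name):
    """Return the characters of name whose code point lies outside 0-127."""
    return [character for character in name if ord(character) > 127]


for name in ['Spotify', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    outside = non_ascii_characters(name)
    print(name, '->', [(character, ord(character)) for character in outside])
```

Plain letters, digits and common punctuation all stay within the range, while symbols such as ™ and emojis have much larger code points.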

In [103]:
def english_detector(string): 
    for character in string: 
        if ord(character) > 127: 
            return False 
    return True 

In [106]:
#Testing the english_detector function
non_english_character = '爱奇艺PPS -《欢乐颂2》电视剧热播'
english_character = 'Spotify'
english_string_with_symbols = 'Docs To Go™ Free Office Suite'
english_string_with_emojis = 'Instachat 😜'
print(english_detector(non_english_character))
print(english_detector(english_character))
print(english_detector(english_string_with_symbols))
print(english_detector(english_string_with_emojis))

False
True
False
False


Though the function above correctly identifies plain-text English names, it handles neither English strings with symbols nor English strings with emojis. If I were to use it as is, I would end up losing useful data, since many English apps with special characters would be labeled as non-English. 

I will have to make a few adjustments. To minimize the impact of data loss, I'll only remove an app if its name has more than three characters falling outside the ASCII range of 0 to 127. 

Therefore, all English apps with up to three emojis or special characters will still be labeled as English apps. 

In [183]:
def english_detector(string): 
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

In [184]:
print(english_detector(non_english_character))
print(english_detector(english_character))
print(english_detector(english_string_with_symbols))
print(english_detector(english_string_with_emojis))

False
True
True
True


In [189]:
android_english = []
ios_english = []

for app in android_cleaned:
    name = app[0]
    if english_detector(name):
        android_english.append(app)
        
for app in apple_store_data:
    name = app[2]
    if english_detector(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'E

<h3> Isolating Free Apps </h3> 

This is essentially row filtering done in plain Python, without Pandas. 

In [198]:
android_free = [] 
ios_free = [] 

for app in android_english: 
    name = app[0]
    free_status = app[6]
    if free_status == 'Free': 
        android_free.append(app)


In [199]:
explore_data(android_free,0,5,True)
print(" ")


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8863
Number of columns: 13
 


In [200]:
ios_free = [] 

for app in ios_english: 
    name = app[2]
    free_status = app[5]
    if free_status == '0': 
        ios_free.append(app)




In [201]:
explore_data(ios_free,0,5,True)

['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['5', '282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


['7', '283646709', 'PayPal - Send and request money safely', '227795968', 'USD', '0', '119487', '879', '4', '4.5', '6.12.0', '4+', 'Finance', '37', '0', '19', '1']


Number of rows: 3222
Number of columns: 17


<h3> Checkpoint: Cleaning Complete </h3>

So far, I have spent the majority of the time cleaning the data and narrowing the scope to two datasets - ios_free and android_free. 

The Steps Completed Thus Far: 
- Removing Inaccurate Data 
- Removing Duplicate Applications 
- Maintaining Strategic Balance While Removing Duplicates 
- Removing Irrelevant Applications (Such as Non-English Apps) 
- Isolating Free Apps, as we want to assess the impact of in-application advertisements 

<h3> Goal Moving Forward </h3> 

The aim from this point onwards is to determine the types of applications that are likely to attract more users. Because revenue for free applications depends largely on the number of users, this will be an integral part of our focus. 

In any business, the goal is to minimize the risk and overhead costs associated with any given process. In this case, the process is application development. To minimize costs and mitigate risk, the validation strategy I am following is based on the following three assumptions: 
1. The company that is conducting this study builds a minimal Android version (MVP) of the application and adds it to Google Play. 
2. If the application has a good response from users, the company develops it further and iterates on its findings. 
3. If the application becomes profitable within six months of that positive response, the company will build an iOS version of the application and add it to the Apple App Store. 

Since the end goal for the company I am consulting is to add the application to both Google Play and the Apple App Store, I will need to identify application types that are successful on both platforms. 

<h3> Analyzing Application Genres By Platform </h3> 

Building two functions to help with this process: 
1. One function to generate frequency tables that show percentages 
2. Another function to display the percentages in descending order

In [213]:
def freq_table(dataset,index): 
    table = {} 
    total = 0 
    for row in dataset: 
        total+= 1
        value = row[index]
        if value in table: 
            table[value]+=1 
        else: 
            table[value]=1
            
    table_percentages = {} 
    for key in table: 
        percentage = (table[key]/total)*100
        table_percentages[key] = percentage 
    return table_percentages 

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
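
To illustrate what the two functions compute, here is a compact sketch on a made-up list of rows. It builds the same kind of percentage table and sorts it by value using `sorted` with a `key` function, instead of swapping each pair into a `(value, key)` tuple. `sorted_percentages` and `sample_rows` are illustrative names, not part of the notebook.

```python
def sorted_percentages(rows, index):
    """Return (value, percentage) pairs for one column, largest share first."""
    counts = {}
    for row in rows:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1

    total = len(rows)
    percentages = {value: count / total * 100 for value, count in counts.items()}
    # Sort by the percentage (second item of each pair), descending.
    return sorted(percentages.items(), key=lambda pair: pair[1], reverse=True)


# Made-up rows for illustration only: [name, genre]
sample_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Weather'],
    ['App D', 'Games'],
]
for genre, percentage in sorted_percentages(sample_rows, 1):
    print(genre, ':', percentage)
```

One difference from the tuple approach: when two values have the same percentage, this version keeps them in first-seen order rather than ordering them by name.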

In [214]:
display_table(ios_free,-5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


<h3> Takeaway from Apple Store Genres </h3> 

Among the free English applications in the Apple App Store, more than half (58.16%) are games. Entertainment apps make up the next biggest bucket at close to 8%, followed by photo and video apps at close to 5%. Only 3.66% of the apps are designed for education, and social networking apps account for 3.29% of the total applications in the dataset.  

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous does not imply that they also have the greatest number of users.

In [215]:
display_table(android_free,1)

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

<h3> Takeaway: Google Store Categories </h3> 

When analyzing the frequency distribution of Google Play applications, it becomes clear that it is quite different from the distribution of that of the Apple App Store. 

Unlike the App Store, there are not as many applications designed for fun. In fact, a good number of applications are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, on closer inspection, the family category (which accounts for almost 19% of the apps) consists mostly of games for kids. 

Nevertheless, practical applications do seem to have a better representation on Google Play compared to App Store. 

In [216]:
display_table(android_free,9)

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

<h3> Takeaway: Google Store Continued </h3> 

Within the Google Store dataset, there is another column labeled "Genres". Although its name resembles the genre column of the Apple Store dataset, it is much more granular than the "Category" column, which makes it harder to draw broad insights. For the purpose of this study, "Category" does a much better job for comparison with the Apple Store data. 

In [218]:
apple_store_data[0:10]

[['',
  'id',
  'track_name',
  'size_bytes',
  'currency',
  'price',
  'rating_count_tot',
  'rating_count_ver',
  'user_rating',
  'user_rating_ver',
  'ver',
  'cont_rating',
  'prime_genre',
  'sup_devices.num',
  'ipadSc_urls.num',
  'lang.num',
  'vpp_lic'],
 ['1',
  '281656475',
  'PAC-MAN Premium',
  '100788224',
  'USD',
  '3.99',
  '21292',
  '26',
  '4',
  '4.5',
  '6.3.5',
  '4+',
  'Games',
  '38',
  '5',
  '10',
  '1'],
 ['2',
  '281796108',
  'Evernote - stay organized',
  '158578688',
  'USD',
  '0',
  '161065',
  '26',
  '4',
  '3.5',
  '8.2.2',
  '4+',
  'Productivity',
  '37',
  '5',
  '23',
  '1'],
 ['3',
  '281940292',
  'WeatherBug - Local Weather, Radar, Maps, Alerts',
  '100524032',
  'USD',
  '0',
  '188583',
  '2822',
  '3.5',
  '4.5',
  '5.0.0',
  '4+',
  'Weather',
  '37',
  '5',
  '3',
  '1'],
 ['4',
  '282614216',
  'eBay: Best App to Buy, Sell, Save! Online Shopping',
  '128512000',
  'USD',
  '0',
  '262241',
  '649',
  '4',
  '4.5',
  '5.10.0',
  '12+'

In [223]:
genres_ios = freq_table(ios_free, 12)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[12]
        if genre_app == genre:            
            n_ratings = float(app[6])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0
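
The nested loop above rescans the whole dataset once for every genre. As a minimal sketch, assuming the same row layout (genre at index 12, rating count at index 6), the averages can also be accumulated in a single pass with two dictionaries. The names `average_by_key`, `make_row`, and `sample_rows` are hypothetical and exist only for this illustration; the sample rows are placeholders, not the full dataset.

```python
def average_by_key(dataset, key_index, value_index):
    """Return {key: mean of float(row[value_index])} using one pass over dataset."""
    totals = {}
    counts = {}
    for row in dataset:
        key = row[key_index]
        totals[key] = totals.get(key, 0.0) + float(row[value_index])
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


def make_row(genre, n_ratings):
    """Build a hypothetical 17-column row with only the two needed fields filled in."""
    row = [''] * 17
    row[6] = str(n_ratings)   # rating_count_tot
    row[12] = genre           # prime_genre
    return row


# Hypothetical sample rows (assumption: same column order as ios_free).
sample_rows = [
    make_row('Navigation', 345046),
    make_row('Navigation', 154911),
    make_row('Medical', 612),
]

averages = average_by_key(sample_rows, 12, 6)
for genre in averages:
    print(genre, ':', averages[genre])
```

With the real data, a call such as `average_by_key(ios_free, 12, 6)` would be expected to produce the same averages as the cell above, provided every row has a numeric rating count.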


On average, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:



In [241]:
#Navigational Applications iOS App Store 
for app in ios_free: 
    if app[12] =='Navigation': 
        print(app[2],':',app[6])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


In [245]:
#Social Networking Applications iOS App Store 
for app in ios_free: 
    if app[12] =='Social Networking': 
        print(app[2],':',app[6])

Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match™ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ™ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger – Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
Lobi : 36
WeChat : 34584
ooVoo – Free Video Call, Text and Voice : 177501
Pinterest : 1061624
知乎 : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
QQ HD : 5058
Messenger : 351466
eHarmony™ Dating App - Meet Singles : 11124
YouNow: Live Stream Video Chat : 12079
Cougar 

In [249]:
#Music Applications iOS App Store 
for app in ios_free: 
    if app[12] =='Music': 
        print(app[2],':',app[6])

Pandora - Music & Radio : 1126879
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
Deezer - Listen to your Favorite Music & Playlists : 4677
Sonos Controller : 48905
NRJ Radio : 38
radio.de - Der Radioplayer : 64
Spotify Music : 878563
SoundCloud - Music & Audio : 135744
Sing Karaoke Songs Unlimited with StarMaker : 26227
SoundHound Song Search & Music Player : 82602
Ringtones for iPhone & Ringtone Maker : 25403
Coach Guitar - Lessons & Easy Tabs For Beginners : 2416
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Magic Piano by Smule : 131695
QQ音乐HD : 224
The Singing Machine Mobile Karaoke App : 130
Bandsintown Concerts : 30845
PetitLyrics : 0
edjing Mix:DJ turntable to remix and scratch music : 13580
Smule Sing! : 119316
Amazon Music : 106235
AutoRap by Smule : 18202
My Mixtapez Music : 26286
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
Napster - Top Music 

The same pattern applies to social networking apps, where the average is heavily influenced by a few giants like Facebook, Pinterest, and Skype. The same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average.

The goal is to find popular genres, but navigation, social networking, or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by a very small number of apps with hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. 

Reference apps have 74,942 user ratings on average, but it is actually the Bible and Dictionary.com that pull the average up:
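
One way to make this skew visible is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below is not part of the original analysis; it uses the six rating counts printed in the Navigation listing above and the standard library's `statistics` module.

```python
from statistics import mean, median

# Rating counts copied from the Navigation listing above.
navigation_ratings = [345046, 12811, 187, 5, 3582, 154911]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

print('mean   :', nav_mean)
print('median :', nav_median)
```

The mean reproduces the 86,090 figure from the genre table, while the median is 8196.5, roughly a tenth of the mean. A gap of that size suggests that a typical navigation app receives far fewer ratings than the average implies.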

In [250]:
for app in ios_free: 
    if app[12] == 'Reference': 
        print(app[2],':',app[6])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


However, this niche seems to show some potential. One thing our business could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, our business could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Other genres that seem popular include weather, book, food and drink, and finance. The book genre seems to overlap a bit with the app idea described above, but the other genres don't seem too interesting to us:
- Weather apps: people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink: examples here include Starbucks, Dunkin' Donuts, and McDonald's. Making a popular food and drink app therefore requires actual cooking and a delivery service, which is outside the scope of our company.

- Finance apps: these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.



<h3> Analyzing the Google Play Store </h3> 

In [252]:
display_table(android_free, 5) # the Installs column


1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835


One problem with this data is that it is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes: we only want to get an idea of which app genres attract the most users, so perfect precision in the number of users is not required.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number to float — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).

In [255]:
categories_android = freq_table(android_free, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
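
The string cleaning inside the loop above can be isolated into a small helper, which makes the conversion easier to test and reuse. This is a minimal sketch; the name `parse_installs` is hypothetical, and it assumes every value has the form shown in the Installs table (digits, optional commas, and a trailing plus sign).

```python
def parse_installs(raw):
    """Convert an installs string such as '1,000,000+' to a float.

    Assumes the only non-digit characters are commas and a plus sign;
    any other content makes float() raise a ValueError.
    """
    return float(raw.replace(',', '').replace('+', ''))


for raw in ['0+', '500+', '1,000,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```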

In [257]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [259]:
under_100_m = []

for app in android_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386
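
The cutoff of 100,000,000 and the category name are hard-coded in the cell above. As a sketch under the same column assumptions (category at index 1, installs at index 5), the calculation can be wrapped in a function that takes both as parameters. The function name `average_installs_below` and the rows in `sample_apps` are hypothetical and are used only to show the behaviour.

```python
def average_installs_below(dataset, category, threshold,
                           category_index=1, installs_index=5):
    """Mean installs for apps in `category` with fewer than `threshold` installs.

    Returns None when no app qualifies, to avoid dividing by zero.
    """
    kept = []
    for app in dataset:
        if app[category_index] != category:
            continue
        n_installs = float(app[installs_index].replace(',', '').replace('+', ''))
        if n_installs < threshold:
            kept.append(n_installs)
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical rows: name at index 0, category at index 1, installs at index 5.
sample_apps = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '500,000+'],
    ['App D', 'TOOLS', '', '', '', '10,000+'],
]

trimmed_average = average_installs_below(sample_apps, 'COMMUNICATION', 100000000)
print(trimmed_average)
```

On the real data, `average_installs_below(android_free, 'COMMUNICATION', 100000000)` would be expected to match the value printed above, and the same call with `'VIDEO_PLAYERS'` or `'SOCIAL'` would allow a like-for-like comparison across categories.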

Communication apps have the highest average, 38,456,119 installs, but that figure is pulled up by a handful of apps with 100 million or more installs; once those are excluded, the average drops to about 3.6 million. The pattern is the same for the video players category, which is the runner-up with 24,727,872 installs. That market is dominated by apps like YouTube, Google Play Movies & TV, and MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and the primary goal is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.


In [260]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])


E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [262]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. Now, I will look at some applications that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [263]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
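
The chained string comparisons in the cell above can also be written as a numeric range check, which avoids listing every install bucket by hand. The sketch below assumes the same column positions (name at index 0, category at index 1, installs at index 5). The function name `apps_in_install_range` is hypothetical, and `sample_books` contains a handful of entries taken from the listings above, padded with empty placeholder fields.

```python
def apps_in_install_range(dataset, category, low, high,
                          category_index=1, installs_index=5):
    """Return (name, installs string) pairs where low <= installs < high."""
    matches = []
    for app in dataset:
        if app[category_index] != category:
            continue
        n_installs = float(app[installs_index].replace(',', '').replace('+', ''))
        if low <= n_installs < high:
            matches.append((app[0], app[installs_index]))
    return matches


# Names and install buckets from the listings above; other fields are placeholders.
sample_books = [
    ['E-Book Read - Read Book for free', 'BOOKS_AND_REFERENCE', '', '', '', '50,000+'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Ancestry', 'BOOKS_AND_REFERENCE', '', '', '', '5,000,000+'],
    ['Bible', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
]

mid_tier = apps_in_install_range(sample_books, 'BOOKS_AND_REFERENCE',
                                 1000000, 100000000)
for name, installs in mid_tier:
    print(name, ':', installs)
```

Because the upper bound is exclusive, apps in the 100,000,000+ bucket are left out, which matches the four buckets selected in the cell above.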

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

Also, there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.



<h3> Conclusions </h3> 


In this project, the primary objective was to analyze data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

In conclusion, a good strategy might be to take a popular book (perhaps a more recent book) and turn it into an app that could be profitable for both the Google Play and the Apple App Store markets. The markets are already full of libraries, so for this strategy to succeed, our developers will need to add special features beyond the raw version of the book. As previously mentioned, these could include the following: 
- daily quotes from the book 
- an audio version of the book 
- a forum where people can discuss the book 
- any other special features 
