# Profitable App Profiles for the App Store and Google Play Markets

If an app-development company wishes to build a business model around building free apps and selling in-app adverts, then the company needs to know which apps are likely to attract the most users, as this is what increases the company's income. Therefore, the goal of this project is to determine what sort of apps the company should focus its efforts on developing to boost profits.

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play. Here, we'll try to analyze a sample of the data.

Data relating to the Google Play Store can be found [here](https://www.kaggle.com/lava18/google-play-store-apps).  
Data relating to the Apple App Store can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

### Import the data

In [1]:
from csv import reader

# The Google Play data set
with open('Data/google-play-store-apps/googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))  # the with block closes the file once it has been read
android_header = android[0]
android = android[1:]

# The App Store data set
with open('Data/app-store-apple-data-set-10k-apps/AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

### Explore the data
Here we define a function for looking at the data. In this project, we won't use the pandas package; instead, we explore and analyse the data as lists of lists. (pandas is designed to make this sort of analysis easier.)  
Descriptions of each column can be found on the Kaggle pages for the datasets, linked above.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
print('Android data')
explore_data(android,0,5, True)
print('\niOS data')
explore_data(ios,0,5, True)

Android data
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13

iOS data
['1', '281

In [4]:
print('Android', android_header)
print('iOS', ios_header)

Android ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
iOS ['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


## Data Cleaning
It was reported on the discussion blog that the Android dataset has an error in row 10472. We can inspect this using the `explore_data` function on the surrounding rows.


In [5]:
explore_data(android,10471,10474,True)

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


Number of rows: 10841
Number of columns: 13


Indeed, we see that there is an error. There should be 13 columns for each row, but there are only 12 for the 'Life Made WI-Fi Touchscreen Photo Frame' app. On inspection, we see that the 'Category' column is missing.

We have two options:  

<ol>
<li>Delete the row</li>  
<li>Insert the correct category ('Lifestyle') into the row</li>
</ol>

For ease, we will delete the entire row using the `del` statement.
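
As a hedged alternative to deleting by a hard-coded index, a malformed row can be dropped by comparing its length with the expected number of columns. The helper name `drop_short_rows` and the toy rows below are illustrative assumptions, not part of the original notebook. Unlike `del`, this sketch can be rerun without removing a second, valid row.

```python
def drop_short_rows(dataset, expected_len):
    """Return (kept, dropped): rows with the expected column count, and the rest."""
    kept, dropped = [], []
    for row in dataset:
        if len(row) == expected_len:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


# Toy data standing in for the Android rows (3 columns expected here).
sample = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],             # 'Category' is missing, so the row is short
    ['App C', 'SOCIAL', '4.5'],
]
kept, dropped = drop_short_rows(sample, expected_len=3)
print(len(kept), len(dropped))
```

With the real data, `expected_len` would be `len(android_header)`.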


In [6]:
del android[10472] # only run once

In [7]:
explore_data(android,10471,10474,True)

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']


Number of rows: 10840
Number of columns: 13


Note that after running `explore_data` on the same rows, the 'Life Made WI-Fi Touchscreen Photo Frame' app no longer appears.

### Duplicates
If we run the code below, we see that the Instagram app appears 4 times in the Android dataset.

In [8]:
for app in android:
    if app[0]=='Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [9]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(f'Number of duplicate apps: {len(duplicate_apps)}')
print(f'Number of unique apps: {len(unique_apps)}')


Number of duplicate apps: 1181
Number of unique apps: 9659


Running the code above creates two lists: one of unique apps, where every app name is listed once, and one of duplicates, where each repeated occurrence of a name is listed.  
In the Instagram example, the duplicate rows differ only in the fourth column (number of reviews). We can use this criterion to remove the duplicates, keeping only the row with the highest number of reviews.

To remove the duplicates in this way, we first loop over the apps once and store the maximum number of reviews for each app name in a dictionary.
 

In [10]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max:
        reviews_max[name] = max(reviews_max[name],n_reviews)
    else:
        reviews_max[name] = n_reviews

In [11]:
reviews_max['Instagram']

66577446.0

As you can see, the entry for 'Instagram' in the `reviews_max` dictionary is the highest review count among the four Instagram rows in the original dataset.

We can now create a new list of lists containing just the row with the maximum number of reviews for each unique app. This is done below and stored in `android_clean`. Another list, `already_added`, keeps track of which apps have already been added, so that rows tied on the maximum are only added once.

In [12]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
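
The two passes above can also be collapsed into one. The following is a minimal sketch rather than the notebook's method: the function name `dedupe_by_max` and the four-column toy rows are assumptions made for illustration. It keeps, for each name, the first row seen with the highest count, and uses a dictionary lookup instead of the slower `name not in already_added` list search.

```python
def dedupe_by_max(dataset, name_index, count_index):
    """Keep one row per name: the first row seen with the highest count."""
    best = {}  # name -> row with the highest count seen so far
    for row in dataset:
        name = row[name_index]
        count = float(row[count_index])
        if name not in best or count > float(best[name][count_index]):
            best[name] = row
    return list(best.values())


# Toy rows (name, category, rating, reviews), shortened from the Android layout.
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
]
clean = dedupe_by_max(rows, name_index=0, count_index=3)
print(len(clean))
```

One difference to be aware of: the result is ordered by the first appearance of each name, whereas the loop above orders apps by the position of their maximum-review row.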

We can use the `explore_data` function to check that our cleaned data is what we expected. Note that we set the `rows_and_columns` argument to True. This tells us that we have 9,659 rows, which is what we expected: the same as the number of uniquely named apps.

In [13]:
explore_data(android_clean,0,5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


## Apple App Store Data

There should be no duplicate apps in the Apple App Store data. However, on inspection, we see that there are some duplicates. These can be removed by keeping the most recent version. Since version numbers often contain more than one decimal point, we cannot convert them to floats or integers. Here we compare the version strings directly, which works for the duplicates in this dataset (for example, '2.0.0' > '0.81' and '1.4' > '1.0.1'). Note that strings are compared character by character, so this is not reliable in general: as strings, '10.0' < '9.0'.
 
Note that there are a different number of columns, and the column numbers do not correspond to the Android store.
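
Because plain string comparison can misorder multi-digit version parts, a more robust option is to compare versions as tuples of integers. The sketch below is an assumption-laden illustration, not the approach used in this notebook: the helper name `version_key` is hypothetical, and it simply ignores any non-digit characters in the version string.

```python
import re


def version_key(version):
    """Turn '1.10.2' into (1, 10, 2) so that versions compare numerically."""
    return tuple(int(part) for part in re.findall(r'\d+', version))


print('10.0' > '9.0')                            # string comparison
print(version_key('10.0') > version_key('9.0'))  # numeric comparison
print(max(['2.0.0', '0.81'], key=version_key))   # newest of the two versions
```

The same key could be passed to `max` when choosing between duplicate rows.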

### Duplicates

In [14]:
ios_duplicates = []
ios_unique = []

for app in ios:
    name = app[2]
    if name in ios_unique:
        ios_duplicates.append(name)
    else:
        ios_unique.append(name)

print(f'Length of iOS dataset: {len(ios)}')
print(f'Number of unique apps: {len(ios_unique)}')
print(f'Number of duplicates: {len(ios_duplicates)}')

print('Duplicates',ios_duplicates)

Length of iOS dataset: 7197
Number of unique apps: 7195
Number of duplicates: 2
Duplicates ['VR Roller Coaster', 'Mannequin Challenge']


In [15]:
print('iOS columns', ios_header, '\n') # N.B. 'track_name' == 'App Name'
for app in ios:
    name = app[2]
    if name in ios_duplicates:
        print(app)

iOS columns ['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['4000', '952877179', 'VR Roller Coaster', '169523200', 'USD', '0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['7579', '1089824278', 'VR Roller Coaster', '240964608', 'USD', '0', '67', '44', '3.5', '4', '0.81', '4+', 'Games', '38', '0', '1', '1']
['10751', '1173990889', 'Mannequin Challenge', '109705216', 'USD', '0', '668', '87', '3', '3', '1.4', '9+', 'Games', '37', '4', '1', '1']
['10885', '1178454060', 'Mannequin Challenge', '59572224', 'USD', '0', '105', '58', '4', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']


In [16]:
lastest_version = {}

for app in ios:
    name = app[2]
    version = app[10]
    if name in lastest_version:
        lastest_version[name] = max(lastest_version[name], version)
    else:
        lastest_version[name] = version

print(f'Number of apps in new dictionary: {len(lastest_version)}')

Number of apps in new dictionary: 7195


In [17]:
ios_part_clean = []
ios_already_added = []

for app in ios:
    name = app[2]
    version = app[10]
    if lastest_version[name] == version and name not in ios_already_added:
        ios_part_clean.append(app)
        ios_already_added.append(name)

print(f'Length of cleaned data: {len(ios_part_clean)}')

Length of cleaned data: 7195


### Non-English apps

Additionally, some apps in this dataset are clearly not in English, which goes against the business requirements. These should be removed. The cell below shows some examples of non-English apps in the dataset.

In [18]:
print('iOS columns', ios_header) # N.B. 'track_name' == 'App Name'
explore_data(ios_part_clean,68,74,True)

iOS columns ['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['77', '299029654', '大辞林', '210088960', 'USD', '21.99', '64', '0', '4.5', '0', '4.1.1', '4+', 'Reference', '37', '5', '2', '1']


['79', '299515267', 'Allrecipes Dinner Spinner', '36399104', 'USD', '0', '109349', '1540', '3.5', '5', '6.3', '12+', 'Food & Drink', '37', '5', '1', '1']


['80', '299853944', '新浪新闻-阅读最新时事热门头条资讯视频', '115143680', 'USD', '0', '2229', '4', '3.5', '1', '6.2.1', '17+', 'News', '37', '0', '1', '1']


['81', '299949744', 'MotionX GPS', '56481792', 'USD', '1.99', '14970', '24', '3.5', '4.5', '24.2', '4+', 'Navigation', '37', '0', '1', '1']


['82', '300048137', 'AccuWeather - Weather for Life', '181941248', 'USD', '0', '144214', '2162', '3.5', '4', '10.4.1', '4+', 'Weather', '37', '1',

We can write a function to test whether a string is English or not. English text mostly uses ASCII characters, which have an ordinal number of 127 or less. Python has a built-in `ord()` function that returns the ordinal number of a given character. To allow for English app titles that contain emojis or other non-ASCII characters, the function only treats a title as non-English if it contains more than three non-ASCII characters.

In [19]:
def is_english(string):
    non_ascii_chars = 0
    for character in string:
        if ord(character) > 127:
            non_ascii_chars += 1
    
    if non_ascii_chars > 3:
        return False
    return True

In [20]:
print(is_english('Instachat üòú'))
print(is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

True
False
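
To make the threshold explicit, here is a hedged variant of the check with the cut-off exposed as a parameter. The name `is_mostly_ascii` and the `max_non_ascii` parameter are assumptions introduced for this sketch; the sample strings are written with Unicode escapes so the example does not depend on file encoding.

```python
def is_mostly_ascii(string, max_non_ascii=3):
    """Return True if the string has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Docs To Go\u2122 Free'))                   # one non-ASCII character
print(is_mostly_ascii('\u2122\u2122\u2122'))                      # exactly three
print(is_mostly_ascii('\u7231\u5947\u827a\u7535\u89c6\u5267'))    # six
```

A title with exactly three non-ASCII characters is still accepted; only a fourth one causes it to be rejected, which matches the `> 3` test in `is_english`.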


In [21]:
ios_clean = []
non_english = []

for app in ios_part_clean:
    name = app[2]
    if is_english(name):
        ios_clean.append(app)
    else:
        non_english.append(name)

print(ios_clean)

 '1.1.4', '4+', 'Games', '38', '5', '11', '1'], ['10510', '1164512642', 'Sago Mini Holiday Trucks and Diggers', '116524032', 'USD', '0', '56', '56', '4.5', '4.5', '1.0', '4+', 'Education', '38', '5', '1', '1'], ['10513', '1164765952', 'Dungeon Witcher', '102876160', 'USD', '0', '5', '3', '2.5', '3.5', '1.2.0', '12+', 'Games', '40', '5', '0', '1'], ['10516', '1164801111', 'AutoSleep. Auto Sleep Tracker for Watch', '7802880', 'USD', '2.99', '979', '368', '4.5', '4.5', '4.0.1', '4+', 'Health & Fitness', '13', '0', '9', '1'], ['10522', '1164891129', 'We’re Going on a Bear Hunt', '167348224', 'USD', '2.99', '3', '3', '3.5', '3.5', '1.0.3', '4+', 'Games', '37', '5', '13', '0'], ['10524', '1164956679', 'BRICEMOJI Brice de Nice', '72006656', 'USD', '0.99', '0', '0', '0', '0', '1.0.2', '4+', 'Entertainment', '37', '0', '1', '1'], ['10525', '1165109912', 'WitchSpring2', '956569600', 'USD', '3.99', '17', '9', '4.5', '5', '1.36', '9+', 'Games', '37', '5', '4', '1'], ['10528', '1165315677', 'My F

In [22]:
ios_clean[60:80][2]

['71',
 '297606951',
 'Amazon App: shop, scan, compare, and read reviews',
 '133688320',
 'USD',
 '0',
 '126312',
 '22',
 '3.5',
 '3',
 '9.10.0',
 '4+',
 'Shopping',
 '37',
 '5',
 '8',
 '1']

In [23]:
print(f'Number of iOS apps after initial clean: {len(ios_clean)}')
print(f'Number of Android apps after initial clean: {len(android_clean)}')

Number of iOS apps after initial clean: 6181
Number of Android apps after initial clean: 9659


Since the company is only interested in free apps, we must now filter the data so that only free apps remain. Note that the price can be found at index 7 for Android and index 5 for iOS.
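
Prices are stored as strings, and paid Android apps carry a leading `$` (for example `'$3.99'`). One way to avoid listing every spelling of zero is to convert the price to a number first. This is a sketch under the assumption that every price is a number optionally prefixed with `$`; the helper names `parse_price` and `is_free` are hypothetical.

```python
def parse_price(price):
    """Convert a price string such as '0', '0.00' or '$3.99' to a float."""
    return float(price.replace('$', '').strip())


def is_free(price):
    """Return True if the price string represents zero."""
    return parse_price(price) == 0.0


for price in ['0', '0.00', '$3.99', '0.99']:
    print(price, is_free(price))
```

Any value that is not a number (such as a misplaced text field) would raise a `ValueError`, which is a useful signal that a row needs cleaning.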

In [24]:
print("Android",android_header)
print("iOS", ios_header)

Android ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
iOS ['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [25]:
print(android_clean[1])
print(ios_clean[1])

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


In [26]:
android_free = []
ios_free = []

for app in android_clean:
    price = app[7]
    if price == '0' or price == '0.00' or price == '0.0':
        android_free.append(app)

for app in ios_clean:
    price = app[5]
    if price == '0' or price == '0.00' or price == '0.0':
        ios_free.append(app)

print(f'Android: {len(android_free)}')
print(f'iOS: {len(ios_free)}')

Android: 8905
iOS: 3220


In [27]:
count = 0
print('Android')
for app in android_clean[625:635]: # Alter for loop through android_free for comparison
    count += 1
    print(f'{count} Title: {app[0]}; Price: {app[7]}')

print('\niOS')
count = 0
for app in ios_clean[:15]: # Alter for loop through ios_free for comparison
    count += 1
    print(f'{app[0]} Title: {app[2]}; Price: {app[5]}')


Android
1 Title: Kids Learn Languages by Mondly; Price: 0
2 Title: Blinkist - Nonfiction Books; Price: 0
3 Title: Sago Mini Hat Maker; Price: $3.99
4 Title: Fuzzy Numbers: Pre-K Number Foundation; Price: $5.99
5 Title: Toca Life: Hospital; Price: $3.99
6 Title: Complete Spanish Movies; Price: 0
7 Title: Pluto TV - It’s Free TV; Price: 0
8 Title: Mobile TV; Price: 0
9 Title: TV+; Price: 0
10 Title: Digital TV; Price: 0

iOS
1 Title: PAC-MAN Premium; Price: 3.99
2 Title: Evernote - stay organized; Price: 0
3 Title: WeatherBug - Local Weather, Radar, Maps, Alerts; Price: 0
4 Title: eBay: Best App to Buy, Sell, Save! Online Shopping; Price: 0
5 Title: Bible; Price: 0
6 Title: Shanghai Mahjong; Price: 0.99
7 Title: PayPal - Send and request money safely; Price: 0
8 Title: Pandora - Music & Radio; Price: 0
9 Title: PCalc - The Best Calculator; Price: 9.99
10 Title: Ms. PAC-MAN; Price: 3.99
11 Title: Solitaire by MobilityWare; Price: 4.99
12 Title: SCRABBLE Premium; Price: 7.99
13 Title: Go

## Business Strategy

If the business wants to develop a particular app which will be most profitable, then it needs to be popular in both the Android and iOS markets. The strategy the business wants to employ is the following:
<ol>
    <li> Build a minimal Android version of the app, and add it to Google Play.</li>
    <li> If the app has a good response from users, we develop it further.</li>
    <li> If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.</li>
</ol> 

In [28]:
print(f'Android{android_header}')
print(f'iOS{ios_header}')

Android['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
iOS['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Columns that could be used to determine which categories are popular on <b>Android</b>:
- Rating
- Reviews
- Installs
- Category
- Genres

Columns that could be used to determine which categories are popular on <b>iOS</b>:
- rating_count_tot
- user_rating
- prime_genre

For this task, we will choose to build a frequency table for the prime_genre column of the App Store data and for the Genres and Category columns of the Google Play data.

In [29]:
def freq_table(dataset, index):
    table = {}
    total = len(dataset)
    for row in dataset:
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    percentage_table = {}
    for key in table:
        percentage_table[key] = (table[key] / total) * 100

    return percentage_table

In [30]:
# Function for displaying frequency tables in sorted (descending) order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
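
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is an alternative sketch, not the notebook's implementation: the function name `freq_table_counter` and the toy rows are assumptions. `Counter.most_common()` returns the values sorted by count, so no separate sorting step is needed.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Toy rows: (name, genre)
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, percentage in freq_table_counter(rows, 1):
    print(value, ':', percentage)
```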

#### iOS App Store
The most common genre on the App Store is Games (58.1%), followed by Entertainment (7.9%).  
In general, "fun" apps appear to make up most of the marketplace, whereas practical apps are less common.

In [31]:
display_table(ios_free,-5) # percentage frequency table of ios prime_genre

Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


#### Google Play Store

The most common category of app on the Google Play Store is Family (19.0%), followed by Game (9.7%) and Tools (8.4%), with the remaining categories at lower percentages.

If "Family" includes some games aimed at kids, then this is similar to the pattern we saw on the iOS App Store. However, there is a higher proportion of tools and productivity apps, as highlighted by the genres frequency table.

In [32]:
display_table(android_free,1) # percentage frequency table of android categories

FAMILY : 18.97810218978102
GAME : 9.70241437394722
TOOLS : 8.433464345873105
BUSINESS : 4.581695676586187
LIFESTYLE : 3.9303761931499155
PRODUCTIVITY : 3.885457608085345
FINANCE : 3.6833239752947784
MEDICAL : 3.5148792813026386
SPORTS : 3.3801235261089273
PERSONALIZATION : 3.312745648512072
COMMUNICATION : 3.2341381246490735
HEALTH_AND_FITNESS : 3.065693430656934
PHOTOGRAPHY : 2.9421673217293653
NEWS_AND_MAGAZINES : 2.829870859067939
SOCIAL : 2.6501965188096577
TRAVEL_AND_LOCAL : 2.3245367770915215
SHOPPING : 2.2459292532285233
BOOKS_AND_REFERENCE : 2.1785513756316677
DATING : 1.8528916339135317
VIDEO_PLAYERS : 1.7967434025828188
MAPS_AND_NAVIGATION : 1.4149354295339696
FOOD_AND_DRINK : 1.235261089275688
EDUCATION : 1.167883211678832
ENTERTAINMENT : 0.9545199326221224
LIBRARIES_AND_DEMO : 0.9320606400898372
AUTO_AND_VEHICLES : 0.9208309938236946
HOUSE_AND_HOME : 0.8197641774284109
WEATHER : 0.7973048848961257
EVENTS : 0.7074677147669848
PARENTING : 0.6513194834362718
ART_AND_DESIGN : 0

In [33]:
display_table(android_free,-4) # percentage frequency table of android genres

Tools : 8.422234699606962
Entertainment : 6.086468276249298
Education : 5.390230207748456
Business : 4.581695676586187
Lifestyle : 3.9191465468837734
Productivity : 3.885457608085345
Finance : 3.6833239752947784
Medical : 3.5148792813026386
Sports : 3.4475014037057834
Personalization : 3.312745648512072
Communication : 3.2341381246490735
Action : 3.0881527231892196
Health & Fitness : 3.065693430656934
Photography : 2.9421673217293653
News & Magazines : 2.829870859067939
Social : 2.6501965188096577
Travel & Local : 2.313307130825379
Shopping : 2.2459292532285233
Books & Reference : 2.1785513756316677
Simulation : 2.0662549129702414
Dating : 1.8528916339135317
Arcade : 1.8416619876473892
Video Players & Editors : 1.7742841100505335
Casual : 1.7518248175182483
Maps & Navigation : 1.4149354295339696
Food & Drink : 1.235261089275688
Puzzle : 1.1229646266142617
Racing : 0.9882088714205502
Role Playing : 0.9320606400898372
Libraries & Demo : 0.9320606400898372
Strategy : 0.9208309938236946
Au

#### Takeaways

The iOS App Store is dominated by apps which are fun, whereas the Google Play Store has a more even balance between fun and practical apps.

Clearly, the frequency of apps available only tells one part of the story. It tells us the make-up of the app store, but not how many users each app has.

## Most Popular Apps

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
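
The Installs values in the Google Play data are strings such as `'10,000+'`, so they need converting before they can be averaged. A minimal sketch, assuming every value is a number with optional thousands separators and a trailing `+` (the helper name `parse_installs` is hypothetical):

```python
def parse_installs(installs):
    """Convert an Installs string such as '10,000+' to a float."""
    return float(installs.replace(',', '').replace('+', ''))


for value in ['10,000+', '5,000,000+', '1,000,000,000+']:
    print(value, '->', parse_installs(value))
```

Note that these figures are open-ended buckets, so `'10,000+'` is treated as exactly 10,000, which understates the true number of installs.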

### iOS App Store

In [34]:
ios_genres = freq_table(ios_free,-5) # frequency table of prime_genre in the ios dataset

for genre in ios_genres:
    total = 0 # total number of ratings given to apps in that genre
    len_genre = 0
    for app in ios_free:
        genre_app = app[-5]
        if genre_app == genre:
            total += float(app[6]) # the rating_count_tot column
            len_genre += 1
    
    avg_num_ratings = total / len_genre

    print(genre, ':', avg_num_ratings)

Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22812.92467948718
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0


Navigation has the highest average number of ratings per app in the iOS App Store.
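
The per-genre averages can also be computed in a single pass over the data, rather than looping over every app once per genre. The sketch below is illustrative only: the function name `average_by_group` and the three-column toy rows are assumptions, not the notebook's code.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the value column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows: (name, genre, number of ratings)
rows = [
    ['a', 'Games', '10'],
    ['b', 'Games', '30'],
    ['c', 'Music', '5'],
]
print(average_by_group(rows, group_index=1, value_index=2))
```

For the iOS data in this notebook, the genre is at index -5 and `rating_count_tot` is at index 6.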

In [35]:
for app in ios_free:
    if app[-5] == 'Navigation':
        print(app[2],':', app[6])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


In [36]:
for app in ios_free:
    if app[-5] == 'Social Networking':
        print(app[2],':', app[6])

Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match™ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ™ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger – Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
Lobi : 36
WeChat : 34584
ooVoo – Free Video Call, Text and Voice : 177501
Pinterest : 1061624
知乎 : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
QQ HD : 5058
Messenger : 351466
eHarmony™ Dating App - Meet Singles : 11124
YouNow: Live Stream Video Chat :

In [37]:
for app in ios_free:
    if app[-5] == 'Music':
        print(app[2],':', app[6])

Pandora - Music & Radio : 1126879
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
Deezer - Listen to your Favorite Music & Playlists : 4677
Sonos Controller : 48905
NRJ Radio : 38
radio.de - Der Radioplayer : 64
Spotify Music : 878563
SoundCloud - Music & Audio : 135744
Sing Karaoke Songs Unlimited with StarMaker : 26227
SoundHound Song Search & Music Player : 82602
Ringtones for iPhone & Ringtone Maker : 25403
Coach Guitar - Lessons & Easy Tabs For Beginners : 2416
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Magic Piano by Smule : 131695
QQÈü≥‰πêHD : 224
The Singing Machine Mobile Karaoke App : 130
Bandsintown Concerts : 30845
PetitLyrics : 0
edjing Mix:DJ turntable to remix and scratch music : 13580
Smule Sing! : 119316
Amazon Music : 106235
AutoRap by Smule : 18202
My Mixtapez Music : 26286
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
Napster - Top 

In [38]:
for app in ios_free:
    if app[-5] == 'Weather':
        print(app[2],':', app[6])

WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
The Weather Channel: Forecast, Radar & Alerts : 495626
AccuWeather - Weather for Life : 144214
MyRadar NOAA Weather Radar Forecast : 150158
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
Météo-France : 24
Yurekuru Call : 53
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
FEMA : 128
Weather Underground: Custom Forecast & Local Radar : 49192
JaxReady : 22
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
Hurricane by American Red Cross : 1158
Weather & Radar : 37
WRAL Weather Alert : 25
Yahoo Weather : 112603
Weather Live Free - Weather Forecast & Alerts : 35702
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
iWeather - World weather forecast : 80
Almanac Long-Range Weather Forecast : 12
TodayAir : 0
Weather - Radar - Storm with Morecast App : 78
Storm Radar : 22792
WarnWetter : 0
wetter.com : 0
Forecast Bar : 375
Freddy

In [39]:
for app in ios_free:
    if app[-5] == 'Games':
        print(app[2],':', app[6])

witch : 1405
Flappy Bird : original version ! : 516
Fireboy and Watergirl: Online in the Forest Temple - Multiplayer Running and Adventure Game : 3965
Twisty Arrow - Shoot the Circle Wheel : 4440
Puzzle Monster Quest - New MultiPlayer : 894
Ball Escape! : 48
Loop Mania : 534
Independence Day Resurgence: Battle Heroes : 523
VR HORROR : 241
Plummet Dash : 194
KINGDOM HEARTS Union χ[Cross] : 2984
Dancing with the Stars: The Official Game : 1098
High School Crush - My First Love : 2003
LEGO® DC Super Heroes Mighty Micros : 172
Snakebird : 401
Move the Match - Matchstick Puzzles for Free : 441
Sloomy : 241
Warp Shift : 1178
Sausage Legend - Fighting Game : 90
Brio - Don’t Fall! : 56
Ridiculous Parking Simulator a Real Crazy Multi Car Driving Racing Game : 43
Tap Hero : 1444
War Tortoise : 10555
Emoji Blitz : 999
Dash Heroes : 2
Batman v Superman: Who Will Win : 314
Disco Dave : 444
Blair's Fashion Boutique - School Style : 299
Puppy Life - Secret Pet Party : 343
Escape Game: Relief : 49

In [40]:
for app in ios_free:
    if app[-5] == 'Reference':
        print(app[2],':', app[6])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


In [41]:
for app in ios_free:
    if app[-5] == 'Health & Fitness':
        print(app[2],':', app[6])

Lifesum – Inspiring healthy lifestyle app : 5795
Lose It! – Weight Loss Program and Calorie Counter : 373835
Nike+ Training Club - Workouts & Fitness Plans : 33969
Sleep Cycle alarm clock : 104539
Period Tracker Lite : 53620
Weight Watchers : 136833
My Cycles Period and Ovulation Tracker : 7469
Runtastic Running, Jogging and Walking Tracker : 10298
Calorie Counter & Diet Tracker by MyFitnessPal : 507706
Waterlogged - Daily Hydration Tracker : 5000
WebMD for iPad : 9142
Fooducate - Lose Weight, Eat Healthy,Get Motivated : 11875
My Score Plus Weight Loss, Food & Exercise Tracker : 467
VIBO RealMassager : 6
Fitbit : 90496
Headspace : 12819
Charity Miles: Walking & Running Distance Tracker : 3115
Sworkit - Custom Workouts for Exercise & Fitness : 16819
Fitstar Personal Trainer : 7496
Garmin Connect™ Mobile : 8341
Smart Alarm Clock : sleep cycle & snoring recorder : 3779
Plant Nanny - Water Reminder with Cute Plants : 27421
Sleep Meister - Sleep Cycle Alarm Lite : 445
ameli, l'Assuran

The average number of ratings in many of the categories above is heavily skewed by a few apps that appear to hold the majority of the market share. However, 'Health & Fitness' and 'Reference' appear to be less skewed, with most of the apps in these categories having a fairly high number of ratings.
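
One way to check this skew without reading every list is to compare the mean number of ratings in a genre with the median: a mean far above the median suggests that a few apps dominate. The following is only a minimal sketch. The helpers `rating_skew` and `make_row` and the sample rows are made up for illustration; the column positions (genre at index -5, rating count at index 6) follow the cells above.

```python
from statistics import mean, median

def rating_skew(dataset, genre, genre_index=-5, ratings_index=6):
    """Return (mean, median) of the rating counts for one genre, or None if empty."""
    counts = [float(row[ratings_index]) for row in dataset
              if row[genre_index] == genre]
    if not counts:
        return None
    return mean(counts), median(counts)

def make_row(name, genre, ratings):
    # Hypothetical helper: builds a row wide enough that indices 2, 6 and -5
    # point at different columns, as in the lists used above.
    row = [''] * 17
    row[2] = name
    row[6] = str(ratings)
    row[-5] = genre
    return row

# Made-up rows, not taken from the real dataset
sample_rows = [
    make_row('App A', 'Reference', 10),
    make_row('App B', 'Reference', 20),
    make_row('App C', 'Reference', 30),
    make_row('App D', 'Reference', 1000),
    make_row('App E', 'Games', 5),
]

print(rating_skew(sample_rows, 'Reference'))
```

With the real data, the same call could be made as `rating_skew(ios_free, 'Reference')`, assuming `ios_free` is the list of free iOS apps built earlier.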

### Google Play Store

Now to analyse the Google Play Store data.

For this dataset, we can use the 'Installs' column (index 5) as a measure of the number of users. However, the install numbers are not precise: they are open-ended brackets (for example 100,000+ or 1,000,000+), so we only know a lower bound for each app.
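
Because the values are strings such as `1,000,000+`, they need to be converted before any arithmetic. A minimal sketch of that conversion is shown here; `parse_installs` is a hypothetical helper name, and the result should be read as a lower bound rather than an exact count.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to the integer 1000000.

    The '+' only tells us the true figure is at least this large,
    so the returned value is a lower bound.
    """
    return int(installs.replace('+', '').replace(',', ''))

# Example labels in the same format as the 'Installs' column
for label in ['0', '0+', '500+', '1,000,000+']:
    print(label, '->', parse_installs(label))
```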

In [42]:
display_table(android_free, 5)

1,000,000+ : 15.687815833801237
100,000+ : 11.577765300393038
10,000,000+ : 10.499719258843346
10,000+ : 10.252667040988209
1,000+ : 8.422234699606962
100+ : 6.917462099943853
5,000,000+ : 6.816395283548568
500,000+ : 5.53621560920831
50,000+ : 4.817518248175182
5,000+ : 4.525547445255475
10+ : 3.537338573834924
500+ : 3.2341381246490735
50,000,000+ : 2.2908478382930935
100,000,000+ : 2.1224031443009546
50+ : 1.9090398652442448
5+ : 0.7860752386299831
1+ : 0.5165637282425604
500,000,000+ : 0.26951151038742277
1,000,000,000+ : 0.22459292532285235
0+ : 0.044918585064570464
0 : 0.011229646266142616


In [43]:
android_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [44]:
# freq_table(android_free, 1)

google_play_cats_installs = {}
for category in freq_table(android_free,1):
    total = 0 # store the sum of installs specific to each category
    len_category = 0 # store the number of apps in the category
    for app in android_free:
        category_app = app[1]
        if category_app ==  category:
            installs_app = app[5]
            installs_app = installs_app.replace('+','')
            installs_app = installs_app.replace(',','')
            installs_app = int(installs_app)
            total += installs_app
            len_category += 1
    
    avg_num_installs = total / len_category
    google_play_cats_installs[category] = avg_num_installs
    print(f'{category} : {avg_num_installs}')

ART_AND_DESIGN : 1952105.1724137932
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8587351.855670104
BUSINESS : 1708215.906862745
COMICS : 803234.8214285715
COMMUNICATION : 38322625.697916664
DATING : 854028.8303030303
EDUCATION : 1825480.7692307692
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1436126.94
GAME : 15551995.891203703
FAMILY : 3668870.823076923
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7001693.425
PHOTOGRAPHY : 17772018.759541985
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10787009.952063914
PERSONALIZATION : 5183850.806779661
PRODUCTIVITY : 16738957.554913295
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24573948.25
NEWS_AND_MAGAZINES : 9401635.95

In [45]:
def sort_dict(dictionary):
    # Build (value, key) tuples so that sorting orders the entries by value
    tuples_list = [(value, key) for key, value in dictionary.items()]
    return sorted(tuples_list, reverse=True)

In [46]:
sort_dict(google_play_cats_installs)

[(38322625.697916664, 'COMMUNICATION'),
 (24573948.25, 'VIDEO_PLAYERS'),
 (23253652.127118643, 'SOCIAL'),
 (17772018.759541985, 'PHOTOGRAPHY'),
 (16738957.554913295, 'PRODUCTIVITY'),
 (15551995.891203703, 'GAME'),
 (13984077.710144928, 'TRAVEL_AND_LOCAL'),
 (11640705.88235294, 'ENTERTAINMENT'),
 (10787009.952063914, 'TOOLS'),
 (9401635.952380951, 'NEWS_AND_MAGAZINES'),
 (8587351.855670104, 'BOOKS_AND_REFERENCE'),
 (7001693.425, 'SHOPPING'),
 (5183850.806779661, 'PERSONALIZATION'),
 (5074486.197183099, 'WEATHER'),
 (4188821.9853479853, 'HEALTH_AND_FITNESS'),
 (3993339.603174603, 'MAPS_AND_NAVIGATION'),
 (3668870.823076923, 'FAMILY'),
 (3638640.1428571427, 'SPORTS'),
 (1952105.1724137932, 'ART_AND_DESIGN'),
 (1924897.7363636363, 'FOOD_AND_DRINK'),
 (1825480.7692307692, 'EDUCATION'),
 (1708215.906862745, 'BUSINESS'),
 (1436126.94, 'LIFESTYLE'),
 (1387692.475609756, 'FINANCE'),
 (1331540.5616438356, 'HOUSE_AND_HOME'),
 (854028.8303030303, 'DATING'),
 (803234.8214285715, 'COMICS'),
 (647317

The categories with the greatest average number of installs are communication, video players, social, photography, productivity, game, travel and local, entertainment and tools (in descending order).

As with the iOS data, we can investigate the skew of each category by iterating over the apps in that category and looking at the number of installs.

In [47]:
def android_app_in_cat(chosen_category):
    for app in android_free:
        category = app[1]
        if category == chosen_category:
            print(f'{app[0]} : {app[5]}')

In [48]:
android_app_in_cat('COMMUNICATION')

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+

In [49]:
android_app_in_cat('VIDEO_PLAYERS')

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community :

In [50]:
android_app_in_cat('SOCIAL')

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love

We see that most of these categories are indeed skewed by a few apps (WhatsApp, Facebook, YouTube, etc.).
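
The skew can also be put into a number by asking what share of a category's installs comes from its few largest apps. The sketch below assumes rows shaped like the Google Play data (category at index 1, installs at index 5); `top_share` is a hypothetical helper and the sample rows are made up for illustration.

```python
def top_share(dataset, category, top_n=3, category_index=1, installs_index=5):
    """Percentage of a category's total installs held by its top_n apps."""
    installs = sorted(
        (int(row[installs_index].replace('+', '').replace(',', ''))
         for row in dataset if row[category_index] == category),
        reverse=True,
    )
    total = sum(installs)
    if total == 0:
        return None  # no apps (or no installs) in this category
    return 100 * sum(installs[:top_n]) / total

# Made-up rows: [App, Category, Rating, Reviews, Size, Installs]
sample_rows = [
    ['App A', 'DEMO', '', '', '', '900+'],
    ['App B', 'DEMO', '', '', '', '50+'],
    ['App C', 'DEMO', '', '', '', '30+'],
    ['App D', 'DEMO', '', '', '', '20+'],
]

print(top_share(sample_rows, 'DEMO', top_n=1))
```

A value close to 100 for a small `top_n` would indicate that the category average says little about a typical app. With the real data, the call might look like `top_share(android_free, 'COMMUNICATION')`.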

In [51]:
android_app_in_cat('PHOTOGRAPHY')

TouchNote: Cards & Gifts : 1,000,000+
FreePrints – Free Photos Delivered : 1,000,000+
Groovebook Photo Books & Gifts : 500,000+
Moony Lab - Print Photos, Books & Magnets ™ : 50,000+
LALALAB prints your photos, photobooks and magnets : 1,000,000+
Snapfish : 1,000,000+
Motorola Camera : 50,000,000+
HD Camera - Best Cam with filters & panorama : 5,000,000+
LightX Photo Editor & Photo Effects : 10,000,000+
Sweet Snap - live filter, Selfie photo edit : 10,000,000+
HD Camera - Quick Snap Photo & Video : 1,000,000+
B612 - Beauty & Filter Camera : 100,000,000+
Waterfall Photo Frames : 1,000,000+
Photo frame : 100,000+
Huji Cam : 5,000,000+
Unicorn Photo : 1,000,000+
HD Camera : 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera : 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor : 1,000,000+
Moto Photo Editor : 5,000,000+
InstaBeauty -Makeup Selfie Cam : 50,000,000+
Garden Photo Frames - Garden Photo Editor : 500,000+
Photo Frame : 10,000,000+
Selfie Camera - Photo E

Further investigation shows that photography and books and reference both combine a high average number of installs with a good spread of apps, so the average is less skewed by a few dominant apps. Since the reference category also performed well in the iOS App Store dataset, books and reference might be a good bet.

In [52]:
# Books and reference with over 100M installs
for app in android_free:
    installs = int(app[5].replace('+','').replace(',',''))
    category = app[1]
    if category == 'BOOKS_AND_REFERENCE' and installs >= 100_000_000:
        print(f'{app[0]} : {app[5]}')

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


In [53]:
# Books and reference with between 1M and 100M downloads
for app in android_free:
    installs = int(app[5].replace('+','').replace(',',''))
    category = app[1]
    if category == 'BOOKS_AND_REFERENCE' and installs >= 1_000_000 and installs < 100_000_000:
        print(f'{app[0]} : {app[5]}')

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+

Excluding the few apps in the books and reference category with very large install counts, we see that there are plenty of apps with more than 1M installs. This list contains many Quran apps and eBook readers. From this we might suggest that the business builds an app around a classic book, with some eBook features such as audio reading.
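
To back up a statement like "plenty of apps" with a figure, the apps in each install range can be counted rather than printed. This is a rough sketch under the same column assumptions as the cells above (category at index 1, installs at index 5); `count_in_range` is a hypothetical helper and the sample rows are invented.

```python
def count_in_range(dataset, category, low, high=None,
                   category_index=1, installs_index=5):
    """Count apps in a category with low <= installs < high (no upper limit if high is None)."""
    count = 0
    for row in dataset:
        if row[category_index] != category:
            continue
        installs = int(row[installs_index].replace('+', '').replace(',', ''))
        if installs >= low and (high is None or installs < high):
            count += 1
    return count

# Made-up rows: [App, Category, Rating, Reviews, Size, Installs]
sample_rows = [
    ['App A', 'DEMO', '', '', '', '500+'],
    ['App B', 'DEMO', '', '', '', '1,000,000+'],
    ['App C', 'DEMO', '', '', '', '5,000,000+'],
    ['App D', 'DEMO', '', '', '', '100,000,000+'],
    ['App E', 'OTHER', '', '', '', '10,000,000+'],
]

print(count_in_range(sample_rows, 'DEMO', 1_000_000, 100_000_000))
print(count_in_range(sample_rows, 'DEMO', 100_000_000))
```

On the real data, `count_in_range(android_free, 'BOOKS_AND_REFERENCE', 1_000_000, 100_000_000)` would give the size of the middle tier discussed above.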

## Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.