# Mobile App Profiles Recommendation based on Google and Apple Stores

The aim of this project is to recommend the most profitable mobile app profile for the Google Play and App Store markets. I am working as a data analyst for a company that builds Android and iOS mobile apps, and my mission is to analyze data to help our developers understand what types of apps are likely to attract more users.

The company only builds apps that are free to download and install, and the main source of revenue consists of in-app ads. This means the revenue for any given app is mostly influenced by the number of users who use it, so the more users that see and engage with the ads, the better.

## Opening and exploring the data set

As of the fourth quarter of 2019, Android users were able to choose between 2.57 million apps, making Google Play the app store with the biggest number of available apps. Apple's App Store was the second-largest app store, with almost 1.84 million available apps for iOS.
*Source: [Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)*

The following two data sets (Source: Kaggle) were used for the purpose of this project:
* A [data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from [this link](https://www.kaggle.com/lava18/google-play-store-apps/download).
* A [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from [this link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/download).

Let's start by opening the two data sets and then continue with exploring the data.

In [1]:
## Import required library
from csv import reader

## Open Apple Store dataset
### Create a list of lists for Apple Store data
### Read as UTF-8 and close the file automatically once it has been read
with open('apple_store_data/AppleStore.csv', encoding='utf8') as opened_file:
    ios_data = list(reader(opened_file))
## Remove the first index column
for row in ios_data:
    del row[0]
### Keep only the header of the dataset
ios_header = ios_data[0]
### Remove the header from the dataset
ios_data = ios_data[1:]

## Open Google Play Store dataset
### Create a list of lists for Google Play Store data
### Read as UTF-8 and close the file automatically once it has been read
with open('google_store_data/googleplaystore.csv', encoding='utf8') as opened_file:
    google_data = list(reader(opened_file))
### Keep only the header of the dataset
google_header = google_data[0]
### Remove the header from the dataset
google_data = google_data[1:]

In [2]:
## Function that will help print the rows of the dataset in a more readable way
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
## Print the header, first 5 rows of Apple Store dataset. Also, print the total number of columns and rows
print(ios_header)
print('\n')
explore_data(ios_data, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


Number of

In [4]:
## Print the header, first 5 rows of Google Play Store dataset. Also, print the total number of columns and rows
print(google_header)
print('\n')
explore_data(google_data, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

#### Apple Store Column description

| Column          | Description                                     |
|:----------------|-------------------------------------------------|
| id              | App ID                                          |
| track_name      | App Name                                        |
| size_bytes      | Size (in Bytes)                                 |
| currency        | Currency Type                                   |
| price           | Price amount                                    |
| rating_count_tot | User rating count (for all versions)           |
| rating_count_ver | User rating count (for current version)        |
| user_rating     | Average user rating value (for all versions)    |
| user_rating_ver | Average user rating value (for current version) |
| ver             | Latest version code                             |
| cont_rating     | Content Rating                                  |
| prime_genre     | Primary Genre                                   |
| sup_devices.num | Number of supporting devices                    |
| ipadSc_urls.num | Number of screenshots shown for display         |
| lang.num        | Number of supported languages                   |
| vpp_lic         | Vpp Device Based Licensing Enabled              |

#### Google Play Store Column description

| Column          | Description                                     |
|:----------------|-------------------------------------------------|
| App             | Application Name                                |
| Category        | Category the app belongs to                     |
| Rating          | Overall user rating of the app                  |
| Reviews         | Number of user reviews for the app              |
| Size            | Size of the app                                 |
| Installs        | Number of user downloads/installs for the app   |
| Type            | Paid or Free                                    |
| Price           | Price of the app                                |
| Content Rating  | Age group the app is targeted at                |
| Genres          | An app can belong to multiple genres (apart from its main category) |
| Last Updated    | Date when the app was last updated on Play Store |
| Current Ver     | Current version of the app                      |
| Android Ver     | Min required Android version                    |

Based on the above, the columns that are of interest are:

* Apple Store dataset:
   * track_name
   * price (we are interested only in free apps)
   * rating_count_tot
   * rating_count_ver
   * user_rating
   * user_rating_ver
   * cont_rating
   * prime_genre

* Google Play Store dataset:
   * App
   * Category
   * Rating
   * Reviews
   * Installs
   * Type (we are interested in free apps)
   * Genres
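
Since the cells below address columns by position (for example `app[1]` or `app[5]`), it can help to look the positions up by name once. The following is a minimal sketch, not part of the original analysis: `column_indices` is a hypothetical helper, and the two header lists are copied from the outputs above so that the block runs on its own.

```python
# Headers copied from the outputs printed above.
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
google_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                 'Current Ver', 'Android Ver']


def column_indices(header, wanted):
    """Hypothetical helper: map each wanted column name to its position."""
    return {name: header.index(name) for name in wanted}


ios_cols = column_indices(
    ios_header, ['track_name', 'price', 'rating_count_tot', 'prime_genre'])
google_cols = column_indices(
    google_header, ['App', 'Category', 'Reviews', 'Installs', 'Type', 'Genres'])

print(ios_cols)
print(google_cols)
```

With this mapping, `app[ios_cols['rating_count_tot']]` reads the same value as `app[5]`, but states which column is meant.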

### Deleting wrong data

Based on [this discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), row 10472 in the Google Play Store dataset is wrong. Let's verify this and delete it from our dataset.

In [5]:
## Print the header, the wrong row and a correct row of Google Play Store dataset
print(google_header)
print('\n')
print(google_data[10472])
print('\n')
print(google_data[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


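As can be seen, the following information is missing from row 10472:
- Category (the row has 12 fields instead of 13, so every value after the app name is shifted one column to the left)
- Genres (the value is an empty string)

Let's proceed with deleting this entry from the dataset.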

In [6]:
## Delete row 10472 from google_data list
del google_data[10472]
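
Note that deleting by a hard-coded index removes whatever row currently sits at that position, so running the cell above a second time would delete a valid row. A more defensive option is to look for rows whose length differs from the header's. The sketch below is an illustration on toy data, not part of the original notebook: `find_malformed_rows`, `toy_header`, and `toy_rows` are hypothetical names, and the toy rows only mimic the shape of the real data.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data standing in for google_header / google_data (three columns only).
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],  # Category is missing
    ['Box', 'BUSINESS', '4.2'],
]

malformed = find_malformed_rows(toy_header, toy_rows)
for index, row in malformed:
    print(index, row)

# Delete from the end so that the remaining indices stay valid.
for index, _ in reversed(malformed):
    del toy_rows[index]

print('Rows left:', len(toy_rows))
```

Because the check is based on row length rather than position, running it again on the cleaned list finds nothing and deletes nothing.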

### Removing duplicate entries
The next step is to identify and remove duplicate entries in both datasets.

In [7]:
## Identify if there are any duplicates in Google Play Store dataset
duplicate_apps = []
unique_apps = []

for app in google_data:
    name = app[0]
    if name in unique_apps:
        if name not in duplicate_apps:
            duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Number of unique apps: ', len(unique_apps))
print('\n')
print('Example of duplicate apps: ', duplicate_apps[:10])

Number of duplicate apps:  798


Number of unique apps:  9659


Example of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Zenefits', 'Google Ads', 'Slack', 'FreshBooks Classic', 'Insightly CRM']


In [8]:
## Identify if there are any duplicates in Apple Store dataset
duplicate_apps = []
unique_apps = []

for app in ios_data:
    name = app[1]
    if name in unique_apps:
        if name not in duplicate_apps:
            duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Number of unique apps: ', len(unique_apps))
print('\n')
print('Example of duplicate apps: ', duplicate_apps[:10])

Number of duplicate apps:  2


Number of unique apps:  7195


Example of duplicate apps:  ['VR Roller Coaster', 'Mannequin Challenge']
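
The two cells above test membership in Python lists, which rescans the list for every row. As an alternative sketch (an assumption, not the method used in this notebook), `collections.Counter` from the standard library can count every name in a single pass. `count_duplicates` and the toy rows are hypothetical and only illustrate the idea.

```python
from collections import Counter


def count_duplicates(rows, name_index):
    """Return the number of unique names and the names that occur more than once."""
    counts = Counter(row[name_index] for row in rows)
    duplicates = [name for name, n in counts.items() if n > 1]
    return len(counts), duplicates


# Toy rows: [name, review count]; the values are made up for illustration.
toy_rows = [
    ['App A', '10'],
    ['App B', '25'],
    ['App A', '10'],
    ['App A', '12'],
    ['App C', '7'],
]

n_unique, duplicates = count_duplicates(toy_rows, 0)
print('Number of duplicate apps: ', len(duplicates))
print('Number of unique apps: ', n_unique)
print('Example of duplicate apps: ', duplicates[:10])
```

On the real data the call would be `count_duplicates(google_data, 0)` or `count_duplicates(ios_data, 1)`, matching the name columns used above.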


In [9]:
## Identify a pattern (Google Play Store) in order to justify a way to remove duplicate records
duplicate_apps = []
unique_apps = []

for app in google_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

for app in duplicate_apps:
    print(app)
    for google_entry in google_data:
        if app == google_entry[0]:
            print(google_entry)
    print('\n')

Quick PDF Scanner + OCR FREE
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


Box
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Vari


Skype - free IM & video calls
['Skype - free IM & video calls', 'COMMUNICATION', '4.1', '10484169', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Skype - free IM & video calls', 'COMMUNICATION', '4.1', '10484169', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Skype - free IM & video calls', 'COMMUNICATION', '4.1', '10484169', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 3, 2018', 'Varies with device', 'Varies with device']


WeChat
['WeChat', 'COMMUNICATION', '4.2', '5387333', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Communication', 'July 31, 2018', 'Varies with device', 'Varies with device']
['WeChat', 'COMMUNICATION', '4.2', '5387446', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Communication', 'July



Free Cam Girls - Live Webcam
['Free Cam Girls - Live Webcam', 'DATING', '3.5', '35', '16M', '1,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 16, 2018', '2.4', '4.0.3 and up']
['Free Cam Girls - Live Webcam', 'DATING', '3.5', '35', '16M', '1,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 16, 2018', '2.4', '4.0.3 and up']


Random Video Chat App With Strangers
['Random Video Chat App With Strangers', 'DATING', 'NaN', '3', '4.8M', '1,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 17, 2018', '1.', '4.0 and up']
['Random Video Chat App With Strangers', 'DATING', 'NaN', '3', '4.8M', '1,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 17, 2018', '1.', '4.0 and up']


Live Girls Talk - Free Video Chat
['Live Girls Talk - Free Video Chat', 'DATING', '4.8', '125', '4.7M', '5,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 8, 2018', '8.2', '4.0.3 and up']
['Live Girls Talk - Free Video Chat', 'DATING', '4.8', '125', '4.7M', '5,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 8, 20

['Nick', 'FAMILY', '4.2', '123309', '25M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Entertainment;Music & Video', 'January 24, 2018', '2.0.8', '4.4 and up']


STARZ
['STARZ', 'ENTERTAINMENT', '4.3', '88185', 'Varies with device', '10,000,000+', 'Free', '0', 'Mature 17+', 'Entertainment', 'June 20, 2018', 'Varies with device', 'Varies with device']
['STARZ', 'ENTERTAINMENT', '4.3', '88185', 'Varies with device', '10,000,000+', 'Free', '0', 'Mature 17+', 'Entertainment', 'June 20, 2018', 'Varies with device', 'Varies with device']
['STARZ', 'ENTERTAINMENT', '4.3', '88185', 'Varies with device', '10,000,000+', 'Free', '0', 'Mature 17+', 'Entertainment', 'June 20, 2018', 'Varies with device', 'Varies with device']


Hulu: Stream TV, Movies & more
['Hulu: Stream TV, Movies & more', 'ENTERTAINMENT', '4.0', '319692', 'Varies with device', '10,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'August 3, 2018', 'Varies with device', '5.0 and up']
['Hulu: Stream TV, Movies & more', 'ENTERTAIN

['Lose It! - Calorie Counter', 'HEALTH_AND_FITNESS', '4.4', '69395', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Lose It! - Calorie Counter', 'HEALTH_AND_FITNESS', '4.4', '69395', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Lose It! - Calorie Counter', 'HEALTH_AND_FITNESS', '4.4', '69395', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'August 3, 2018', 'Varies with device', 'Varies with device']


Calorie Counter - MyNetDiary
['Calorie Counter - MyNetDiary', 'HEALTH_AND_FITNESS', '4.5', '27439', '19M', '1,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'July 16, 2018', '6.5.1', '5.0 and up']
['Calorie Counter - MyNetDiary', 'HEALTH_AND_FITNESS', '4.5', '27439', '19M', '1,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'July 16, 2018',


Angry Birds Classic
['Angry Birds Classic', 'GAME', '4.4', '5566669', '97M', '100,000,000+', 'Free', '0', 'Everyone', 'Arcade', 'May 24, 2018', '7.9.3', '4.1 and up']
['Angry Birds Classic', 'GAME', '4.4', '5566805', '97M', '100,000,000+', 'Free', '0', 'Everyone', 'Arcade', 'May 24, 2018', '7.9.3', '4.1 and up']
['Angry Birds Classic', 'GAME', '4.4', '5566889', '97M', '100,000,000+', 'Free', '0', 'Everyone', 'Arcade', 'May 24, 2018', '7.9.3', '4.1 and up']
['Angry Birds Classic', 'GAME', '4.4', '5566908', '97M', '100,000,000+', 'Free', '0', 'Everyone', 'Arcade', 'May 24, 2018', '7.9.3', '4.1 and up']
['Angry Birds Classic', 'GAME', '4.4', '5565856', '97M', '100,000,000+', 'Free', '0', 'Everyone', 'Arcade', 'May 24, 2018', '7.9.3', '4.1 and up']


Flow Free
['Flow Free', 'GAME', '4.3', '1295557', '11M', '100,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 11, 2018', '4.0', '4.1 and up']
['Flow Free', 'GAME', '4.3', '1295606', '11M', '100,000,000+', 'Free', '0', 'Everyone', 'Puzzle


Homescapes
['Homescapes', 'GAME', '4.6', '3093358', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 13, 2018', '1.8.0.900', '4.0.3 and up']
['Homescapes', 'GAME', '4.6', '3093932', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 13, 2018', '1.8.0.900', '4.0.3 and up']


Wordscapes
['Wordscapes', 'GAME', '4.8', '230710', '87M', '10,000,000+', 'Free', '0', 'Everyone', 'Word', 'August 2, 2018', '1.0.47', '4.1 and up']
['Wordscapes', 'GAME', '4.8', '230727', '87M', '10,000,000+', 'Free', '0', 'Everyone', 'Word', 'August 2, 2018', '1.0.47', '4.1 and up']
['Wordscapes', 'GAME', '4.8', '230727', '87M', '10,000,000+', 'Free', '0', 'Everyone', 'Word', 'August 2, 2018', '1.0.47', '4.1 and up']
['Wordscapes', 'GAME', '4.8', '230849', '87M', '10,000,000+', 'Free', '0', 'Everyone', 'Word', 'August 2, 2018', '1.0.47', '4.1 and up']


My Talking Angela
['My Talking Angela', 'GAME', '4.5', '9881829', '99M', '100,000,000+', 'Free', '0', 'Ever

['Angry Birds Classic', 'GAME', '4.4', '5565856', '97M', '100,000,000+', 'Free', '0', 'Everyone', 'Arcade', 'May 24, 2018', '7.9.3', '4.1 and up']


Best Fiends - Free Puzzle Game
['Best Fiends - Free Puzzle Game', 'GAME', '4.6', '1480189', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Casual', 'August 2, 2018', '5.8.1', 'Varies with device']
['Best Fiends - Free Puzzle Game', 'GAME', '4.6', '1480182', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Casual', 'August 2, 2018', '5.8.1', 'Varies with device']


Hill Climb Racing 2
['Hill Climb Racing 2', 'GAME', '4.6', '2750410', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Racing', 'August 2, 2018', '1.17.2', '4.2 and up']
['Hill Climb Racing 2', 'GAME', '4.6', '2750645', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Racing', 'August 2, 2018', '1.17.2', '4.2 and up']


Swamp Attack
['Swamp Attack', 'GAME', '4.4', '2119218', '70M', '50,000,000+', 'Free', '0', 'Everyone 1

['Vargo Anesthesia Mega App', 'MEDICAL', '4.6', '92', '32M', '1,000+', 'Paid', '$79.99', 'Everyone', 'Medical', 'June 18, 2018', '19.0', '4.0.3 and up']


Monash Uni Low FODMAP Diet
['Monash Uni Low FODMAP Diet', 'MEDICAL', '4.2', '1135', '12M', '100,000+', 'Paid', '$9.00', 'Everyone', 'Medical', 'July 16, 2018', '2.0.7', '4.0 and up']
['Monash Uni Low FODMAP Diet', 'MEDICAL', '4.2', '1135', '12M', '100,000+', 'Paid', '$9.00', 'Everyone', 'Medical', 'July 16, 2018', '2.0.7', '4.0 and up']


mySugr: the blood sugar tracker made just for you
['mySugr: the blood sugar tracker made just for you', 'MEDICAL', '4.6', '21189', '36M', '1,000,000+', 'Free', '0', 'Everyone', 'Medical', 'August 6, 2018', '3.52.1', '5.0 and up']
['mySugr: the blood sugar tracker made just for you', 'MEDICAL', '4.6', '21189', '36M', '1,000,000+', 'Free', '0', 'Everyone', 'Medical', 'August 6, 2018', '3.52.1', '5.0 and up']
['mySugr: the blood sugar tracker made just for you', 'MEDICAL', '4.6', '21187', '35M', '1,000

['MeetMe: Chat & Meet New People', 'SOCIAL', '4.2', '1259723', '76M', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'August 3, 2018', 'Varies with device', '4.1 and up']


Meetup
['Meetup', 'SOCIAL', '4.2', '79129', '23M', '5,000,000+', 'Free', '0', 'Teen', 'Social', 'August 2, 2018', '3.10.26', '4.4 and up']
['Meetup', 'SOCIAL', '4.2', '79129', '23M', '5,000,000+', 'Free', '0', 'Teen', 'Social', 'August 2, 2018', '3.10.26', '4.4 and up']
['Meetup', 'SOCIAL', '4.2', '79130', '23M', '5,000,000+', 'Free', '0', 'Teen', 'Social', 'August 2, 2018', '3.10.26', '4.4 and up']
['Meetup', 'SOCIAL', '4.2', '79130', '23M', '5,000,000+', 'Free', '0', 'Teen', 'Social', 'August 2, 2018', '3.10.26', '4.4 and up']


Text Free: WiFi Calling App
['Text Free: WiFi Calling App', 'SOCIAL', '4.2', '83488', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Social', 'July 24, 2018', 'Varies with device', 'Varies with device']
['Text Free: WiFi Calling App', 'SOCIAL', '4.2', '83488', 'Varies w

['B612 - Beauty & Filter Camera', 'PHOTOGRAPHY', '4.4', '5276983', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 30, 2018', '7.6.5', '4.3 and up']


BeautyPlus - Easy Photo Editor & Selfie Camera
['BeautyPlus - Easy Photo Editor & Selfie Camera', 'PHOTOGRAPHY', '4.4', '3158047', '53M', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 31, 2018', '6.9.031', '4.1 and up']
['BeautyPlus - Easy Photo Editor & Selfie Camera', 'PHOTOGRAPHY', '4.4', '3157936', '53M', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 31, 2018', '6.9.031', '4.1 and up']
['BeautyPlus - Easy Photo Editor & Selfie Camera', 'PHOTOGRAPHY', '4.4', '3158151', '53M', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 31, 2018', '6.9.031', '4.1 and up']
['BeautyPlus - Easy Photo Editor & Selfie Camera', 'PHOTOGRAPHY', '4.4', '3158151', '53M', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 31, 2018', '6.9.031', '4.1 and up']
['BeautyPl

['Yahoo Fantasy Sports - #1 Rated Fantasy App', 'SPORTS', '4.2', '277904', 'Varies with device', '5,000,000+', 'Free', '0', 'Mature 17+', 'Sports', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Yahoo Fantasy Sports - #1 Rated Fantasy App', 'SPORTS', '4.2', '277904', 'Varies with device', '5,000,000+', 'Free', '0', 'Mature 17+', 'Sports', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Yahoo Fantasy Sports - #1 Rated Fantasy App', 'SPORTS', '4.2', '277939', 'Varies with device', '5,000,000+', 'Free', '0', 'Mature 17+', 'Sports', 'August 2, 2018', 'Varies with device', 'Varies with device']


ESPN
['ESPN', 'SPORTS', '4.2', '521138', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Sports', 'July 19, 2018', 'Varies with device', '5.0 and up']
['ESPN', 'SPORTS', '4.2', '521138', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Sports', 'July 19, 2018', 'Varies with device', '5.0 and up']
['ESPN', 'SPORTS', '4.2', '521138

['Google Keep', 'PRODUCTIVITY', '4.4', '691474', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']
['Google Keep', 'PRODUCTIVITY', '4.4', '691474', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']
['Google Keep', 'PRODUCTIVITY', '4.4', '691474', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']
['Google Keep', 'PRODUCTIVITY', '4.4', '691474', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']


Evernote – Organizer, Planner for Notes & Memos
['Evernote – Organizer, Planner for Notes & Memos', 'PRODUCTIVITY', '4.6', '1488396', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 3, 2018', 'Varies

B612 - Beauty & Filter Camera
['B612 - Beauty & Filter Camera', 'PHOTOGRAPHY', '4.4', '5282578', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 30, 2018', '7.6.5', '4.3 and up']
['B612 - Beauty & Filter Camera', 'PHOTOGRAPHY', '4.4', '5282558', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 30, 2018', '7.6.5', '4.3 and up']
['B612 - Beauty & Filter Camera', 'PHOTOGRAPHY', '4.4', '5276983', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Photography', 'July 30, 2018', '7.6.5', '4.3 and up']


Block Craft 3D: Building Simulator Games For Free
['Block Craft 3D: Building Simulator Games For Free', 'GAME', '4.5', '946926', '57M', '50,000,000+', 'Free', '0', 'Everyone', 'Simulation', 'March 5, 2018', '2.10.2', '4.0.3 and up']
['Block Craft 3D: Building Simulator Games For Free', 'GAME', '4.5', '946926', '57M', '50,000,000+', 'Free', '0', 'Everyone', 'Simulation', 'March 5, 2018', '2.10.2', '4.0.3 and up

['slither.io', 'GAME', '4.4', '5231553', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Action', 'November 14, 2017', 'Varies with device', '2.3 and up']


POF Free Dating App
['POF Free Dating App', 'SOCIAL', '4.2', '1175794', 'Varies with device', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['POF Free Dating App', 'SOCIAL', '4.2', '1175815', 'Varies with device', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['POF Free Dating App', 'SOCIAL', '4.2', '1175188', 'Varies with device', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Skype - free IM & video calls
['Skype - free IM & video calls', 'COMMUNICATION', '4.1', '10484169', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 3, 2018', 'Varies with device', 'Varies with device']


Youper - AI Therapy
['Youper - AI Therapy', 'MEDICAL', '4.6', '2006', '69M', '50,000+', 'Free', '0', 'Everyone', 'Medical', 'August 3, 2018', '6.02.000', '6.0 and up']
['Youper - AI Therapy', 'MEDICAL', '4.6', '2014', '69M', '50,000+', 'Free', '0', 'Everyone', 'Medical', 'August 3, 2018', '6.02.000', '6.0 and up']
['Youper - AI Therapy', 'MEDICAL', '4.6', '1976', '69M', '50,000+', 'Free', '0', 'Everyone', 'Medical', 'August 3, 2018', '6.02.000', '6.0 and up']


Animal Jam - Play Wild!
['Animal Jam - Play Wild!', 'FAMILY', '4.6', '361970', '58M', '5,000,000+', 'Free', '0', 'Everyone', 'Casual;Pretend Play', 'August 2, 2018', '28.0.14', '4.1 and up']
['Animal Jam - Play Wild!', 'FAMILY', '4.6', '361734', '58M', '5,000,000+', 'Free', '0', 'Everyone', 'Casual;Pretend Play', 'August 2, 2018', '28.0.14', '4.1 and up']


RULES OF SURVIVAL
['RULES OF SURVIVAL', 'GAME', '4.2', '1343866', '56M', '10,000,000+', 'Free', '0', 'Teen', 'Action', 'August 1, 2018', '1.180271.184729', '4.0 and up']
['RU

BP Journal - Blood Pressure Diary
['BP Journal - Blood Pressure Diary', 'MEDICAL', '5.0', '6', '26M', '1,000+', 'Free', '0', 'Everyone', 'Medical', 'May 25, 2018', '1.0.32', '4.4 and up']
['BP Journal - Blood Pressure Diary', 'MEDICAL', '5.0', '6', '26M', '1,000+', 'Free', '0', 'Everyone', 'Medical', 'May 25, 2018', '1.0.32', '4.4 and up']


Blood Pressure Monitor
['Blood Pressure Monitor', 'MEDICAL', '4.3', '17', '6.0M', '10,000+', 'Free', '0', 'Everyone', 'Medical', 'February 25, 2017', '1.0.1', '4.4 and up']
['Blood Pressure Monitor', 'MEDICAL', '4.3', '17', '6.0M', '10,000+', 'Free', '0', 'Everyone', 'Medical', 'February 25, 2017', '1.0.1', '4.4 and up']


Blood Pressure Companion
['Blood Pressure Companion', 'MEDICAL', '4.2', '178', '4.8M', '1,000+', 'Paid', '$0.99', 'Everyone', 'Medical', 'July 22, 2018', '4.1.5 (Steglitz)', '4.1 and up']
['Blood Pressure Companion', 'MEDICAL', '4.2', '178', '4.8M', '1,000+', 'Paid', '$0.99', 'Everyone', 'Medical', 'July 22, 2018', '4.1.5 (Stegli

Cache Cleaner-DU Speed Booster (booster & cleaner)
['Cache Cleaner-DU Speed Booster (booster & cleaner)', 'TOOLS', '4.5', '12759663', '15M', '100,000,000+', 'Free', '0', 'Everyone', 'Tools', 'July 25, 2018', '3.1.2', '4.0 and up']
['Cache Cleaner-DU Speed Booster (booster & cleaner)', 'TOOLS', '4.5', '12759815', '15M', '100,000,000+', 'Free', '0', 'Everyone', 'Tools', 'July 25, 2018', '3.1.2', '4.0 and up']


DU Browser—Browse fast & fun
['DU Browser—Browse fast & fun', 'COMMUNICATION', '4.3', '1133501', '4.7M', '10,000,000+', 'Free', '0', 'Everyone', 'Communication', 'April 1, 2016', '6.4.0.4', '4.0 and up']
['DU Browser—Browse fast & fun', 'COMMUNICATION', '4.3', '1133539', '4.7M', '10,000,000+', 'Free', '0', 'Everyone', 'Communication', 'April 1, 2016', '6.4.0.4', '4.0 and up']


Real Racing 3
['Real Racing 3', 'FAMILY', '4.5', '354384', '71M', '10,000,000+', 'Free', '0', 'Everyone', 'Racing;Action & Adventure', 'July 2, 2018', '6.4.0', '4.1 and up']
['Real Racing 3', 'FAMILY', '4.5

['Slickdeals: Coupons & Shopping', 'SHOPPING', '4.5', '33599', '12M', '1,000,000+', 'Free', '0', 'Everyone', 'Shopping', 'July 30, 2018', '3.9', '4.4 and up']


AAFP
['AAFP', 'MEDICAL', '3.8', '63', '24M', '10,000+', 'Free', '0', 'Everyone', 'Medical', 'June 22, 2018', '2.3.1', '5.0 and up']
['AAFP', 'MEDICAL', '3.8', '63', '24M', '10,000+', 'Free', '0', 'Everyone', 'Medical', 'June 22, 2018', '2.3.1', '5.0 and up']




In [10]:
## Identify a pattern (Apple Store) in order to justify a way to remove duplicate records
duplicate_apps = []
unique_apps = []

for app in ios_data:
    name = app[1]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

for app in duplicate_apps:
    print(app)
    for ios_entry in ios_data:
        if app == ios_entry[1]:
            print(ios_entry)
    print('\n')

VR Roller Coaster
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0', '67', '44', '3.5', '4', '0.81', '4+', 'Games', '38', '0', '1', '1']


Mannequin Challenge
['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0', '668', '87', '3', '3', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0', '105', '58', '4', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']




As can be seen above:
- There are 2 app names with duplicate entries in the Apple Store dataset
- There are 798 app names with duplicate entries in the Google Play dataset

To remove the duplicate entries from both the Apple Store and Google Play Store datasets, we keep, for each app name, the record with the highest number of reviews (`rating_count_tot` for the Apple Store, `Reviews` for Google Play). In case of a tie, the first such record in the dataset is kept.

As can be seen, the total number of reviews is typically what distinguishes the duplicate records. We can infer that the record with the highest review count is the latest one and should be the one that is kept.

To perform this:
- A dictionary should be prepared, where each key is a unique app name, and the value is the highest number of reviews of that app
- The dictionary should be used to create a new cleaned dataset, which will have only one entry per app
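
Before applying these two steps to the full datasets, here is a minimal sketch on toy data. The rows reuse a few name and review-count pairs from the Google Play output above; `toy_rows`, `max_reviews`, `clean`, and `seen` are hypothetical names, and a set is used instead of a list for the names already added, which is a small deviation from the cells below.

```python
# Toy rows: [name, review count], taken from the duplicate listing above.
toy_rows = [
    ['Angry Birds Classic', '5566669'],
    ['Angry Birds Classic', '5566908'],
    ['Box', '159872'],
    ['Box', '159872'],
    ['Angry Birds Classic', '5565856'],
]

# Step 1: record the highest review count seen for each app name.
max_reviews = {}
for name, reviews in toy_rows:
    n_reviews = int(reviews)
    if name not in max_reviews or n_reviews > max_reviews[name]:
        max_reviews[name] = n_reviews

# Step 2: keep the first row that reaches that maximum for its name.
clean = []
seen = set()
for row in toy_rows:
    name, n_reviews = row[0], int(row[1])
    if n_reviews == max_reviews[name] and name not in seen:
        clean.append(row)
        seen.add(name)

print(max_reviews)
print(clean)
```

The `seen` check is what resolves ties: both `Box` rows have the same review count, and only the first one is kept.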

In [11]:
## Build the dictionary for Apple Store
max_reviews_ios = {}

for app in ios_data:
    name = app[1]
    n_reviews = float(app[5])
    if name in max_reviews_ios and n_reviews > max_reviews_ios[name]:
        max_reviews_ios[name] = n_reviews
    elif name not in max_reviews_ios:
        max_reviews_ios[name] = n_reviews

## Print the length of the dictionary (expected output is 7195)
print(len(max_reviews_ios))

7195


In [12]:
## Build the dictionary for Google Play Store
max_reviews_google = {}

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in max_reviews_google and n_reviews > max_reviews_google[name]:
        max_reviews_google[name] = n_reviews
    elif name not in max_reviews_google:
        max_reviews_google[name] = n_reviews

## Print the length of the dictionary (expected output is 9659)
print(len(max_reviews_google))

9659


In [13]:
## Remove the duplicates using the dictionary max_reviews_ios
ios_clean = []
already_added = []

for app in ios_data:
    name = app[1]
    n_reviews = float(app[5])
    if n_reviews == max_reviews_ios[name] and name not in already_added:
        ios_clean.append(app)
        already_added.append(name)
        
## Explore the cleaned data set (print first 5 rows)
explore_data(ios_clean, 0, 5, True)

['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


Number of rows: 7195
Number of columns: 16


The number of rows is 7195, as expected.

In [14]:
## Remove the duplicates using the dictionary max_reviews_google
google_clean = []
already_added = []

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == max_reviews_google[name] and name not in already_added:
        google_clean.append(app)
        already_added.append(name)
        
## Explore the cleaned data set (print first 5 rows)
explore_data(google_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


The number of rows is 9659 as expected.
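
The two-step procedure used in the cells above (build a dictionary of maximum review counts, then keep one row per name) is the same for both stores, so it could be wrapped in a single helper. The following is only a sketch: the function name `remove_duplicates` and the toy rows are illustrative assumptions and are not part of the original notebook.

```python
def remove_duplicates(dataset, name_index, reviews_index):
    """Keep one row per app name: the first row holding the highest review count."""
    # Pass 1: highest review count seen for each name
    max_reviews = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in max_reviews or n_reviews > max_reviews[name]:
            max_reviews[name] = n_reviews

    # Pass 2: keep the first row that matches the maximum for its name
    cleaned = []
    already_added = set()  # a set gives constant-time membership checks
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if n_reviews == max_reviews[name] and name not in already_added:
            cleaned.append(row)
            already_added.add(name)
    return cleaned


# Hypothetical toy rows in the Apple Store layout
# (name at index 1, review count at index 5)
sample = [
    ['1', 'App A', '', '', '0', '107'],
    ['2', 'App A', '', '', '0', '67'],
    ['3', 'App B', '', '', '0', '10'],
    ['4', 'App B', '', '', '0', '10'],  # tie: the first row wins
]
deduped = remove_duplicates(sample, 1, 5)
print([row[0] for row in deduped])  # ['1', '3']
```

With the real data this would presumably be called as `remove_duplicates(ios_data, 1, 5)` and `remove_duplicates(google_data, 0, 3)`, matching the column indices used above.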

### Removing non-English applications

Because our company builds applications for English speakers only, any application whose name is not in English should be removed from the dataset.

In [15]:
## Function that takes a string and returns "False" if there are at least 5 characters
## outside the ASCII range (0 - 127) in the string, otherwise returns "True"
def eng_check(input_name):
    count_non_english = 0
    for character in input_name:
        if ord(character) > 127:
            count_non_english += 1
    if count_non_english >= 5:
        return False
    else:
        return True
    

## Use the following examples to check the function
print(eng_check('Instagram'))
print(eng_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_check('Docs To Go™ Free Office Suite'))
print(eng_check('Instachat 😜'))

True
False
True
True


Using the function above, both the Google and Apple datasets should be filtered to remove any non-English applications.
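
As a side note, the same check can be written more compactly. The sketch below assumes that a configurable threshold might be useful; the name `is_english` and the parameter `max_non_ascii` are illustrative and do not appear in the original notebook. With the default of 4 it returns the same results as `eng_check()` for the four examples above.

```python
def is_english(name, max_non_ascii=4):
    """Return True when at most `max_non_ascii` characters fall outside ASCII (0 - 127)."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


# Same examples as used for eng_check()
print(is_english('Instagram'))                      # True
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
print(is_english('Docs To Go™ Free Office Suite'))  # True
print(is_english('Instachat 😜'))                   # True
```

Allowing a few non-ASCII characters keeps names with symbols such as ™ or a single emoji, at the cost of letting through some short non-English names.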

In [16]:
## Google Play Store (cleaned dataset)
### Remove the non-English apps
google_english_only = []
already_added = []

for app in google_clean:
    name = app[0]
    if eng_check(name) and name not in already_added:
        google_english_only.append(app)
        already_added.append(name)
        
## Explore the cleaned data set (print first 5 rows)
explore_data(google_english_only, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9619
Number of columns: 13


In [17]:
## Apple Store (cleaned dataset)
### Remove the non-English apps
ios_english_only = []
already_added = []

for app in ios_clean:
    name = app[1]
    if eng_check(name) and name not in already_added:
        ios_english_only.append(app)
        already_added.append(name)
        
## Explore the cleaned data set (print first 5 rows)
explore_data(ios_english_only, 0, 5, True)

['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


Number of rows: 6238
Number of columns: 16


As can be seen, after filtering out the non-English applications, the total number of unique applications for each store is:
* Google: 9619
* Apple: 6238

This means that the number of non-English applications removed is:
* Google: 40
* Apple: 957
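
These figures follow directly from the row counts printed before and after the filter. A minimal check, with the counts copied by hand from the outputs above (the dictionary names are illustrative):

```python
# Row counts after duplicate removal and after the English-only filter,
# copied from the explore_data() outputs above
rows_before = {'Google': 9659, 'Apple': 7195}
rows_after = {'Google': 9619, 'Apple': 6238}

removed = {store: rows_before[store] - rows_after[store] for store in rows_before}
print(removed)  # {'Google': 40, 'Apple': 957}
```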

### Filter out non-free applications

Because our company only creates free applications, any application that costs money to download and install should be removed from the dataset.

In [18]:
## Remove the non-free apps from Apple Store dataset
ios_final = []
already_added = []

for app in ios_english_only:
    name = app[1]
    price = float(app[4])
    if price == 0.00 and name not in already_added:
        ios_final.append(app)
        already_added.append(name)
        
## Explore the final data set (print first 5 rows)
explore_data(ios_final, 0, 5, True)

['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


['283646709', 'PayPal - Send and request money safely', '227795968', 'USD', '0', '119487', '879', '4', '4.5', '6.12.0', '4+', 'Finance', '37', '0', '19', '1']


Number of rows: 3261
Number of columns: 16


In [19]:
## Remove the non-free apps from Google Play Store dataset
google_final = []
already_added = []

for app in google_english_only:
    name = app[0]
    ### We need to remove the $ symbol from some entries to be able to convert them to float numbers
    price = float(app[7].replace('$', ''))
    if price == 0.00 and name not in already_added:
        google_final.append(app)
        already_added.append(name)
        
## Explore the final data set (print first 5 rows)
explore_data(google_final, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8869
Number of columns: 13


#### Final data set

Our final datasets contain only unique, English, free applications from the Google Play Store and the Apple Store. The number of entries in each dataset is:
* **Apple**: 3261 apps
* **Google**: 8869 apps

### Most common apps by Genre

As was mentioned in the introduction, the aim is to determine the kinds of apps that are likely to attract more users because the revenue is highly influenced by the number of people using the company's apps.

To minimize risks and overhead, the validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play Store.
2. If the app has a good response from users, the company develops it further.
3. If the app is profitable after six months, the company builds an iOS version of the app and adds it to the Apple Store.

Because the end goal is to publish the app on both the Google Play Store and the Apple Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres in each market.

In [20]:
## Function that returns the frequency table (as a dictionary) for any column of the dataset specified
## Frequencies are expressed in percentages
def freq_table(dataset, index):
    ### Create a dictionary for total count
    freq_dict = {}
    total = 0
    
    for row in dataset:
        total += 1
        category = row[index]
        if category in freq_dict:
            freq_dict[category] += 1
        else:
            freq_dict[category] = 1
            
    ### Create a dictionary for percentages
    freq_percent_dict = {}
    
    for key in freq_dict:
        percentage = (freq_dict[key] / total) * 100
        freq_percent_dict[key] = percentage
        
    return freq_percent_dict


## Function that generates a frequency table using the freq_table() function. 
## Transforms the frequency table into a list of tuples, then sorts the list in a descending order.
## Prints the entries of the frequency table in descending order.
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
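
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is only a sketch of an alternative: `freq_table_counter` and `display_sorted` are hypothetical names, and the sample rows are made up for illustration.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}


def display_sorted(table):
    """Print the table entries from the highest to the lowest value."""
    for key, value in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(key, ':', value)


# Hypothetical toy rows: app name at index 0, genre at index 1
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Weather'], ['d', 'Games']]
table = freq_table_counter(sample, 1)
display_sorted(table)
# Games : 75.0
# Weather : 25.0
```

One difference from `display_table()`: entries with equal percentages keep their insertion order here, whereas sorting `(value, key)` tuples in reverse orders ties by key in descending order.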

In [21]:
## Examine the frequency table (expressed in percentages) of prime_genre in Apple Store dataset
display_table(ios_final, 11)

Games : 57.712358172339776
Entertainment : 7.973014412756823
Photo & Video : 4.937135847899418
Education : 3.6185219257896346
Social Networking : 3.2812020852499235
Shopping : 2.667893284268629
Utilities : 2.5145660840233055
Sports : 2.1159153633854646
Music : 2.02391904323827
Health & Fitness : 1.9932536031892059
Productivity : 1.778595522845753
Lifestyle : 1.6252683226004292
News : 1.3186139221097823
Travel : 1.2879484820607177
Finance : 1.2572830420116528
Weather : 0.8892977614228765
Food & Drink : 0.8892977614228765
Reference : 0.5519779208831647
Business : 0.5519779208831647
Book : 0.45998160073597055
Navigation : 0.24532352039251765
Medical : 0.18399264029438822
Catalogs : 0.12266176019625882


In [22]:
## Examine the frequency table (expressed in percentages) of Category in Google Play Store dataset
display_table(google_final, 1)

FAMILY : 18.919833126620812
GAME : 9.719246814747999
TOOLS : 8.456421242530162
BUSINESS : 4.5890179276130345
LIFESTYLE : 3.9125042282106213
PRODUCTIVITY : 3.889953771563874
FINANCE : 3.698274890066524
MEDICAL : 3.5291464652159203
SPORTS : 3.3938437253354383
PERSONALIZATION : 3.314917127071823
COMMUNICATION : 3.235990528808208
HEALTH_AND_FITNESS : 3.0781373322809786
PHOTOGRAPHY : 2.942834592400496
NEWS_AND_MAGAZINES : 2.79625662419664
SOCIAL : 2.6609538843161573
TRAVEL_AND_LOCAL : 2.333972262938324
SHOPPING : 2.243770436351336
BOOKS_AND_REFERENCE : 2.153568609764348
DATING : 1.8604126733566355
VIDEO_PLAYERS : 1.792761303416394
MAPS_AND_NAVIGATION : 1.4094035404216936
FOOD_AND_DRINK : 1.2402751155710903
EDUCATION : 1.1613485173074753
ENTERTAINMENT : 0.9583944074867515
LIBRARIES_AND_DEMO : 0.9358439508400046
AUTO_AND_VEHICLES : 0.924568722516631
HOUSE_AND_HOME : 0.823091667606269
WEATHER : 0.800541210959522
EVENTS : 0.7103393843725335
PARENTING : 0.6539632427556658
ART_AND_DESIGN : 0.6426

#### Conclusions for most common apps by genre

Games dominate the Apple Store, accounting for almost 58% of the free English apps. Entertainment apps come second with almost 8%.
On the Google Play Store the distribution is more even. The most common category is "Family", which accounts for almost 19% of the apps, followed by Games with almost 10%.

**Note**: The "Family" category in the Google Play Store consists mostly of games for kids.

This means that **Games** could be considered the most common genre. However, being the most common genre does not mean that it has the largest user base.

The data should be explored further to find the most popular apps (those with the most users) by genre.

### Most popular (highest user count) apps by Genre

The way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, this information is in the 'Installs' column, but this information is missing for the Apple Store dataset. As a workaround, the total number of user ratings will be used as a proxy, which can be found in the rating_count_tot column of the Apple Store dataset.

#### Most popular apps by Genre for Apple Store

In [23]:
## Function that returns the average rating count by genre (as dictionary) using the freq_table() function
## Note: the rating count is read from column 5 (rating_count_tot), so this is specific to the Apple Store dataset
def avg_rating_table(dataset, index):
    store_genre = freq_table(dataset, index)
    avg_genre_rating = {}
    for genre in store_genre:
        total = 0
        genre_rating_count = 0
        for app in dataset:
            ### Use the index argument instead of a hard-coded column
            genre_app = app[index]
            if genre_app == genre:
                genre_rating_count += float(app[5])
                total += 1
        avg_genre_rating[genre] = genre_rating_count / total
    return avg_genre_rating


## Function that generates an average rating count by genre table using the avg_rating_table() function. 
## Transforms the table into a list of tuples, then sorts the list in descending order.
## Prints the entries of the table in descending order.
def display_avg_rating_table(dataset, index):
    table = avg_rating_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
        
display_avg_rating_table(ios_final, 11)

Reference : 74942.11111111111
Social Networking : 70884.73831775702
Navigation : 64667.375
Music : 57326.530303030304
Weather : 50477.137931034486
Book : 37217.73333333333
Food & Drink : 29885.758620689656
Photo & Video : 28264.888198757762
Finance : 27638.243902439026
Travel : 26925.166666666668
Shopping : 25996.32183908046
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22691.71041445271
News : 21248.023255813954
Productivity : 20360.241379310344
Utilities : 18460.353658536584
Lifestyle : 15863.77358490566
Entertainment : 13727.292307692307
Business : 7075.333333333333
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


The most popular app genre seems to be **"Reference"** apps. The metric used to determine the most popular app genre is the average user rating count by genre.
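
The per-genre average used as the metric here can also be computed in a single pass over the dataset, instead of rescanning all rows once per genre. A minimal sketch, assuming a generic helper named `avg_by_group` (hypothetical, not part of the original notebook) and made-up sample rows:

```python
def avg_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical toy rows: name, genre, rating count
sample = [
    ['App A', 'Reference', '100'],
    ['App B', 'Reference', '50'],
    ['App C', 'Social Networking', '30'],
]
averages = avg_by_group(sample, 1, 2)
print(averages)  # {'Reference': 75.0, 'Social Networking': 30.0}
```

For the Apple Store data the equivalent call would presumably be `avg_by_group(ios_final, 11, 5)`. Keep in mind that an average over a small genre can be driven by a single very large app.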

Let's take a closer look at the "Reference", "Social Networking" and "Entertainment" genres.

In [40]:
table_display = []
for app in ios_final:
    if app[11] == 'Reference':
        key_val_as_tuple = (float(app[5]), app[1])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 18
Bible : 985920.0
Dictionary.com Dictionary & Thesaurus : 200047.0
Dictionary.com Dictionary & Thesaurus for iPad : 54175.0
Google Translate : 26786.0
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418.0
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588.0
Merriam-Webster Dictionary : 16849.0
Night Sky : 12122.0
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535.0
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693.0
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497.0
Guides for Pokémon GO - Pokemon GO News and Cheats : 826.0
WWDC : 762.0
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718.0
VPN Express : 14.0
Real Bike Traffic Rider Virtual Reality Glasses : 8.0
教えて!goo : 0.0
Jishokun-Japanese English Dictionary & Translator : 0.0


In [39]:
table_display = []
for app in ios_final:
    if app[11] == 'Social Networking':
        key_val_as_tuple = (float(app[5]), app[1])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 107
Facebook : 2974676.0
Pinterest : 1061624.0
Skype for iPhone : 373519.0
Messenger : 351466.0
Tumblr : 334293.0
WhatsApp Messenger : 287589.0
Kik : 260965.0
ooVoo – Free Video Call, Text and Voice : 177501.0
TextNow - Unlimited Text + Calls : 164963.0
Viber Messenger – Text & Call : 164249.0
Followers - Social Analytics For Instagram : 112778.0
MeetMe - Chat and Meet New People : 97072.0
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414.0
InsTrack for Instagram - Analytics Plus More : 85535.0
Tango - Free Video Call, Voice and Chat : 75412.0
LinkedIn : 71856.0
Match™ - #1 Dating App. : 60659.0
Skype for iPad : 60163.0
POF - Best Dating App for Conversations : 52642.0
Timehop : 49510.0
Find My Family, Friends & iPhone - Life360 Locator : 43877.0
Whisper - Share, Express, Meet : 39819.0
Hangouts : 36404.0
LINE PLAY - Your Avatar World : 34677.0
WeChat : 34584.0
Badoo - Meet New People, Chat, Socialize. : 34428.0
Followers + for Instagram - F

In [41]:
table_display = []
for app in ios_final:
    if app[11] == 'Entertainment':
        key_val_as_tuple = (float(app[5]), app[1])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 260
Netflix : 308844.0
Fandango Movies - Times + Tickets : 291787.0
Colorfy: Coloring Book for Adults : 247809.0
IMDb Movies & TV - Trailers and Showtimes : 183425.0
TRUTH or DARE!!! - FREE : 171055.0
Mad Libs : 117889.0
Twitch : 109549.0
Action Movie FX : 101222.0
Voice Changer Plus : 98777.0
iFunny :) : 98344.0
The CW : 97368.0
The Moron Test : 88613.0
DIRECTV : 81006.0
ABC – Watch Live TV & Stream Full Episodes : 78890.0
Xbox : 72187.0
Redbox : 60236.0
Talking Tom Cat 2 for iPad : 56399.0
Hulu: Watch TV Shows & Stream the Latest Movies : 56170.0
NBC – Watch Now and Stream Full TV Episodes : 55950.0
Emoji> : 55338.0
DIRECTV App for iPad : 47506.0
Amazon Prime Video : 43667.0
CBS Full Episodes and Live TV : 39436.0
FOX NOW - Watch Full Episodes and Stream Live TV : 39391.0
Talking Angela for iPad : 32763.0
Recolor - Coloring Book : 31180.0
Talking Ben the Dog for iPad : 31116.0
Talking Tom Cat for iPad : 29492.0
YouTube Kids : 28560.0
Tom's Love Let

#### Most popular apps by Genre for Google Play Store

In [26]:
## Examine the frequency table (expressed in percentages) of Installs in Google Play Store dataset
display_table(google_final, 5)

1,000,000+ : 15.7289435111061
100,000+ : 11.557109031457886
10,000,000+ : 10.542338482354268
10,000+ : 10.215356860976435
1,000+ : 8.400045100913294
100+ : 6.911714962227985
5,000,000+ : 6.821513135640997
500,000+ : 5.55868756342316
50,000+ : 4.769421580787011
5,000+ : 4.51009132934942
10+ : 3.540421693539294
500+ : 3.2472657571315815
50,000,000+ : 2.3001465779682038
100,000,000+ : 2.131018153117601
50+ : 1.916788814973503
5+ : 0.7892659826361484
1+ : 0.5073852745518097
500,000,000+ : 0.27060547976096516
1,000,000,000+ : 0.22550456646747097
0+ : 0.045100913293494194
0 : 0.011275228323373548


As can be seen above, the install counts are not exact numbers but open-ended buckets (e.g. 1,000,000+, 100,000+).

To compute the average number of installs per genre, these values are converted to floats by removing the commas and the plus sign, so each app is counted at the lower bound of its bucket.
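
The conversion itself is small enough to isolate. The helper name `parse_installs` below is an illustrative assumption; the string handling is the same `replace()` chain that the following cell applies inline.

```python
def parse_installs(raw):
    """Convert an install bucket such as '1,000,000+' to a float (its lower bound)."""
    return float(raw.replace(',', '').replace('+', ''))


for raw in ['1,000,000+', '500+', '0']:
    print(raw, '->', parse_installs(raw))
# 1,000,000+ -> 1000000.0
# 500+ -> 500.0
# 0 -> 0.0
```

Because each bucket is open-ended, the resulting averages are lower-bound estimates rather than exact install counts.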

In [27]:
## Function that returns the average installs by genre (as dictionary) using the freq_table() function
## Note: installs are read from column 5 (Installs), so this is specific to the Google Play Store dataset
def avg_installs_table(dataset, index):
    store_genre = freq_table(dataset, index)
    avg_genre_installs = {}
    for genre in store_genre:
        total = 0
        genre_installs = 0
        for app in dataset:
            ### Use the index argument instead of a hard-coded column
            genre_app = app[index]
            if genre_app == genre:
                genre_installs += float(app[5].replace(',', '').replace('+', ''))
                total += 1
        avg_genre_installs[genre] = genre_installs / total
    return avg_genre_installs


## Function that generates an average installs by genre table using the avg_installs_table() function. 
## Transforms the table into a list of tuples, then sorts the list in descending order.
## Prints the entries of the table in descending order.
def display_avg_installs_table(dataset, index):
    table = avg_installs_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
        
display_avg_installs_table(google_final, 1)

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8721959.47643979
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4025286.24
FAMILY : 3691833.545887962
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1433701.5244956773
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245

On average, Communication apps have the most installs: 38456119 (more than 38 million). 

Let's take a closer look at the genres with the highest averages.

**Note**: Because the company launches on the Google Play Store first, the Google Play Store data should take precedence over the Apple Store data when making the decision.
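
The per-genre listings that follow repeat the same loop with a different category name each time. A small helper could remove that repetition. This is a sketch only: `top_apps` and its parameters are hypothetical names, and the sample rows are invented to mimic the Google Play column layout (name at index 0, category at index 1, installs at index 5).

```python
def top_apps(dataset, genre, genre_index, name_index, value_index, limit=10):
    """Return up to `limit` (installs, name) tuples for one genre, largest first."""
    rows = []
    for app in dataset:
        if app[genre_index] == genre:
            value = float(app[value_index].replace(',', '').replace('+', ''))
            rows.append((value, app[name_index]))
    rows.sort(reverse=True)
    return rows[:limit]


# Hypothetical toy rows in the Google Play layout
sample = [
    ['App A', 'SOCIAL', '', '', '', '1,000+'],
    ['App B', 'SOCIAL', '', '', '', '500,000+'],
    ['App C', 'TOOLS', '', '', '', '10,000+'],
]
for installs, name in top_apps(sample, 'SOCIAL', 1, 0, 5):
    print(name, ':', installs)
# App B : 500000.0
# App A : 1000.0
```

Limiting the output to the top entries also keeps the notebook readable, since a full listing for a genre can run to several hundred lines.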

In [28]:
table_display = []
for app in google_final:
    if app[1] == 'COMMUNICATION':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 287
WhatsApp Messenger : 1000000000.0
Skype - free IM & video calls : 1000000000.0
Messenger – Text and Video Chat for Free : 1000000000.0
Hangouts : 1000000000.0
Google Chrome: Fast & Secure : 1000000000.0
Gmail : 1000000000.0
imo free video calls and chat : 500000000.0
Viber Messenger : 500000000.0
UC Browser - Fast Download Private & Secure : 500000000.0
LINE: Free Calls & Messages : 500000000.0
Google Duo - High Quality Video Calls : 500000000.0
imo beta free calls and text : 100000000.0
Yahoo Mail – Stay Organized : 100000000.0
Who : 100000000.0
WeChat : 100000000.0
UC Browser Mini -Tiny Fast Private & Secure : 100000000.0
Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000.0
Telegram : 100000000.0
Opera Mini - fast web browser : 100000000.0
Opera Browser: Fast and Secure : 100000000.0
Messenger Lite: Free Calls & Messages : 100000000.0
Kik : 100000000.0
KakaoTalk: Free Calls & Text : 100000000.0
GO SMS Pro - Messenger, Free Themes, Em

There are 287 apps in the Communication genre. Several huge "players" (e.g. WhatsApp, Skype, Messenger, Hangouts, Google Chrome and Gmail) dominate this genre and will be hard to compete against.
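
One way to see how strongly a handful of giants can pull a genre's average upward is to recompute the average with the largest apps left out. The sketch below uses invented numbers, not values from the dataset, and the cut-off of 100 million installs is an arbitrary assumption chosen for illustration; `avg_installs` is a hypothetical helper.

```python
def avg_installs(installs, cap=None):
    """Mean of the install counts, optionally ignoring apps at or above `cap`."""
    kept = [n for n in installs if cap is None or n < cap]
    return sum(kept) / len(kept) if kept else 0.0


# Invented install counts: two giants and three small apps
toy_installs = [1_000_000_000, 500_000_000, 100_000, 50_000, 10_000]

print(avg_installs(toy_installs))                   # 300032000.0
print(avg_installs(toy_installs, cap=100_000_000))  # roughly 53333.33
```

In this toy case the average drops by several orders of magnitude once the two giants are excluded, which is why a high genre average alone does not show that a typical app in that genre attracts many users.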

In [29]:
table_display = []
for app in google_final:
    if app[1] == 'VIDEO_PLAYERS':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 159
YouTube : 1000000000.0
Google Play Movies & TV : 1000000000.0
MX Player : 500000000.0
VivaVideo - Video Editor & Photo Movie : 100000000.0
VideoShow-Video Editor, Video Maker, Beauty Camera : 100000000.0
VLC for Android : 100000000.0
Motorola Gallery : 100000000.0
Motorola FM Radio : 100000000.0
Dubsmash : 100000000.0
Vote for : 50000000.0
Vigo Video : 50000000.0
VMate : 50000000.0
Samsung Video Library : 50000000.0
Ringdroid : 50000000.0
MiniMovie - Free Video and Slideshow Editor : 50000000.0
LIKE – Magic Video Maker & Community : 50000000.0
KineMaster – Pro Video Editor : 50000000.0
HD Video Downloader : 2018 Best video mate : 50000000.0
DU Recorder – Screen Recorder, Video Editor, Live : 50000000.0
video player for android : 10000000.0
iMediaShare – Photos & Music : 10000000.0
YouTube Studio : 10000000.0
Video Player All Format : 10000000.0
Video Downloader - for Instagram Repost App : 10000000.0
Video Downloader : 10000000.0
Ustream : 100000

In [30]:
table_display = []
for app in google_final:
    if app[1] == 'SOCIAL':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 236
Instagram : 1000000000.0
Google+ : 1000000000.0
Facebook : 1000000000.0
Snapchat : 500000000.0
Facebook Lite : 500000000.0
VK : 100000000.0
Tumblr : 100000000.0
Tik Tok - including musical.ly : 100000000.0
Tango - Live Video Broadcast : 100000000.0
Pinterest : 100000000.0
LinkedIn : 100000000.0
Badoo - Free Chat & Dating App : 100000000.0
BIGO LIVE - Live Stream : 100000000.0
ooVoo Video Calls, Messaging & Stories : 50000000.0
Zello PTT Walkie Talkie : 50000000.0
SKOUT - Meet, Chat, Go Live : 50000000.0
POF Free Dating App : 50000000.0
MeetMe: Chat & Meet New People : 50000000.0
textPlus: Free Text & Calls : 10000000.0
magicApp Calling & Messaging : 10000000.0
YouNow: Live Stream Video Chat : 10000000.0
We Heart It : 10000000.0
Waplog - Free Chat, Dating App, Meet Singles : 10000000.0
TextNow - free text + calls : 10000000.0
Text free - Free Text + Call : 10000000.0
Text Me: Text Free, Call Free, Second Phone Number : 10000000.0
Tapatalk - 100,00

In [31]:
table_display = []
for app in google_final:
    if app[1] == 'PHOTOGRAPHY':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 261
Google Photos : 1000000000.0
Z Camera - Photo Editor, Beauty Selfie, Collage : 100000000.0
YouCam Perfect - Selfie Photo Editor : 100000000.0
YouCam Makeup - Magic Selfie Makeovers : 100000000.0
Sweet Selfie - selfie camera, beauty cam, photo edit : 100000000.0
S Photo Editor - Collage Maker , Photo Collage : 100000000.0
Retrica : 100000000.0
PicsArt Photo Studio: Collage Maker & Pic Editor : 100000000.0
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100000000.0
Photo Editor Pro : 100000000.0
Photo Editor Collage Maker Pro : 100000000.0
Photo Collage Editor : 100000000.0
LINE Camera - Photo editor : 100000000.0
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100000000.0
Candy Camera - selfie, beauty camera, photo editor : 100000000.0
Camera360: Selfie Photo Editor with Funny Sticker : 100000000.0
BeautyPlus - Easy Photo Editor & Selfie Camera : 100000000.0
B612 - Beauty & Filter Camera : 100000000.0
AR effect : 100000000.0
Video Editor

DP Status 2017 : 50000.0
BlitzWolf Shutter - BW Shutter : 50000.0
B&W Photo Filter Editor : 50000.0
Z Camera : 10000.0
Square DP For Whatsapp : 10000.0
Profile w/o crop for Telegram : 10000.0
Night Camera Blur Effect : 10000.0
Leica Q : 10000.0
FN Cam : 10000.0
FH WiFiCam : 10000.0
DSLR camera - Auto Focus and Blur Professional : 10000.0
DP Photo Editor : 10000.0
DF Night Selfies : 10000.0
DENVER ACTION CAM 3 : 10000.0
Be Fabulous PHOTO BOOTH : 10000.0
BL 1-Click Camera - Free : 10000.0
Insta Square Profile DP : 5000.0
FL Drone 2 : 5000.0
DP Editor : 5000.0
CB Edits PNG & CB Backgrounds : 5000.0
Photo Editor - BPhoto : 1000.0
Photo BG Changer : 1000.0
PIP-Camera FN Photo Effect : 1000.0
Magical Insta DP : 1000.0
Live DV : 1000.0
Light Meter - EV : 1000.0
Leica CL : 1000.0
FB Photographie : 1000.0
Dp For FB : 1000.0
DV-4036 by Somikon : 1000.0
DS-L4 Viewer : 1000.0
BK Photography : 1000.0
Auto Dslr Photo Effect : Auto Focus Effect : 1000.0
All Types DP & Status Maker : 1000.0
AEE AP : 1

In [32]:
table_display = []
for app in google_final:
    if app[1] == 'PRODUCTIVITY':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 345
Google Drive : 1000000000.0
Microsoft Word : 500000000.0
Google Calendar : 500000000.0
Dropbox : 500000000.0
Cloud Print : 500000000.0
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100000000.0
SwiftKey Keyboard : 100000000.0
Samsung Notes : 100000000.0
Microsoft PowerPoint : 100000000.0
Microsoft Outlook : 100000000.0
Microsoft OneNote : 100000000.0
Microsoft OneDrive : 100000000.0
Microsoft Excel : 100000000.0
Google Slides : 100000000.0
Google Sheets : 100000000.0
Google Keep : 100000000.0
Google Docs : 100000000.0
Evernote – Organizer, Planner for Notes & Memos : 100000000.0
ES File Explorer File Manager : 100000000.0
ColorNote Notepad Notes : 100000000.0
CamScanner - Phone PDF Creator : 100000000.0
Adobe Acrobat Reader : 100000000.0
myAT&T : 50000000.0
Verizon Cloud : 50000000.0
QR Droid : 50000000.0
My Airtel-Online Recharge, Pay Bill, Wallet, UPI : 50000000.0
Mobizen Screen Recorder - Record, Capture, Edit : 50000000.0
MEGA : 50000000

Ek Vote : 500.0
EY Events Switzerland : 500.0
EY ATL Fuel Calculator : 500.0
DT Manager : 500.0
DN Events : 500.0
CP Connect 2.0 : 500.0
CJ Wilson's ZoomZoomnation : 500.0
BZ Dealer : 500.0
iReadMe : 100.0
ec-Work : 100.0
eG Monitor : 100.0
Town of Princeton, BC : 100.0
Tips & Tricks Dynamics AX 365 : 100.0
Thistletown CI : 100.0
Somos CG : 100.0
SCS eC : 100.0
RoutePlan.cz : 100.0
Register.ca Mobile : 100.0
MiAI (Artificial Intelligence) Assistant : 100.0
Mat|r viewer : 100.0
K-App Mitarbeiter Galeria Kaufhof : 100.0
J. Polep Plus Mobile : 100.0
Fort Myers FL : 100.0
ES Billing System (Offline App) : 100.0
EP Home Energy Hub : 100.0
EG : 100.0
EF Staff : 100.0
DG Users : 100.0
DF-Server Mobile : 100.0
Cx Wize : 100.0
CJ'S TIRE AND AUTO INC. : 100.0
CI Time : 100.0
CI Remote for Go : 100.0
CI CAFETERIAS UBER : 100.0
Builder (by Engineer.ai) : 100.0
BW-IVMS : 100.0
BV Mobile Apps : 100.0
Ag Trucking Mobile App : 100.0
Ag Guardian : 100.0
My Ag Report : 50.0
MY GULFPORT FL : 50.0
EY Team

In [33]:
table_display = []
for app in google_final:
    if app[1] == 'GAME':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 862
Subway Surfers : 1000000000.0
Temple Run 2 : 500000000.0
Pou : 500000000.0
My Talking Tom : 500000000.0
Candy Crush Saga : 500000000.0
slither.io : 100000000.0
Zombie Tsunami : 100000000.0
Yes day : 100000000.0
Vector : 100000000.0
Trivia Crack : 100000000.0
Traffic Racer : 100000000.0
Temple Run : 100000000.0
Talking Tom Gold Run : 100000000.0
Super Mario Run : 100000000.0
Sonic Dash : 100000000.0
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 100000000.0
Smash Hit : 100000000.0
Skater Boy : 100000000.0
Shadow Fight 2 : 100000000.0
Score! Hero : 100000000.0
Roll the Ball® - slide puzzle : 100000000.0
Pokémon GO : 100000000.0
Plants vs. Zombies FREE : 100000000.0
Piano Tiles 2™ : 100000000.0
PAC-MAN : 100000000.0
My Talking Angela : 100000000.0
Modern Combat 5: eSports FPS : 100000000.0
Mobile Legends: Bang Bang : 100000000.0
Lep's World 2 🍀🍀 : 100000000.0
Jetpack Joyride : 100000000.0
Hungry Shark Evolution : 100000000.0
Hill Climb Racing 2 

Bullet Force : 10000000.0
Bubble Shooter : 10000000.0
Bounce Classic : 10000000.0
Bomber Friends : 10000000.0
Blossom Blast Saga : 10000000.0
Block Strike : 10000000.0
Bike Rivals : 10000000.0
Bike Race - Bike Blast Rush : 10000000.0
Bike Mayhem Free : 10000000.0
Big Hunter : 10000000.0
Big Fish Casino – Play Slots & Vegas Games : 10000000.0
Best Fiends - Free Puzzle Game : 10000000.0
Battlefield™ Companion : 10000000.0
Baseball Boy! : 10000000.0
BLACKJACK! : 10000000.0
BEYBLADE BURST app : 10000000.0
Asphalt Xtreme: Rally Racing : 10000000.0
Arrow.io : 10000000.0
Arena of Valor: 5v5 Arena Game : 10000000.0
Alto's Adventure : 10000000.0
1LINE – One Line with One Touch : 10000000.0
Word Crossy - A crossword game : 5000000.0
Will it Crush? : 5000000.0
Wheelie Challenge : 5000000.0
War and Order : 5000000.0
Toughest Game Ever 2 : 5000000.0
Tomb of the Mask : 5000000.0
Tiny Archers : 5000000.0
The Visitor : 5000000.0
The Cube : 5000000.0
Texas Holdem Poker Pro : 5000000.0
THE KING OF FIGHT

AE Gun Ball: arcade ball games : 1000000.0
AE Coin Mania : Arcade Fun : 1000000.0
7 Nights at Pixel Pizzeria - 2 : 1000000.0
4x4 Jeep Racer : 1000000.0
Zombie Sniper 3D III : 500000.0
Zlax.io Zombs Luv Ax : 500000.0
ZOMBIE RIPPER : 500000.0
World of Warriors: Duel : 500000.0
Who am I? (Biblical) : 500000.0
U-48 Submarine Commander Free : 500000.0
Toy Attack : 500000.0
The Visitor: Ep.1 - Kitty Cat Carnage : 500000.0
The Great Wobo Escape Ep. 1 : 500000.0
The Grand Way : 500000.0
The Gang Sniper V. Pocket Edition. : 500000.0
Texas Hold’em Poker + | Social : 500000.0
SuperBikers 2 : 500000.0
Super Dancer VN : 500000.0
Sports Car Driving Simulator 2018 : 500000.0
Solitaire: Decked Out Ad Free : 500000.0
Sid Story : 500000.0
Shoot`Em Down: Shooting game : 500000.0
Route Z : 500000.0
Rage Z: Multiplayer Zombie FPS Online Shooter : 500000.0
Punch em : 500000.0
Project Grand Auto Town Sandbox Beta : 500000.0
PokerStars Play: Free Texas Holdem Poker Game : 500000.0
Pacific Navy Fighter C.E. (A

Woodman Deluxe : 5000.0
Virtual Dice EX : 5000.0
Ultimate Fighter Z : 5000.0
Tamago egg : 5000.0
Roulette Advisor LITE : 5000.0
Robot Fighting Games™ - Real Boxing Champions 3D : 5000.0
Race Manager FP : 5000.0
Heirs & Graces : 5000.0
CNY Slots : Gong Xi Fa Cai 发财机 : 5000.0
BW-GnuGo : 5000.0
BJ Bridge Acol Beginner 2018 : 5000.0
Axe Knight : 5000.0
Axe Champion : 5000.0
VR AG Racing for Cardboard : 1000.0
Super ball DZ : 1000.0
Quiz DC : 1000.0
Puppy Shooting an AK-47: Platformer Zombie Game : 1000.0
Guess the song of J Balvin : 1000.0
Guess the Class 🔥 AQW : 1000.0
FP Runner : 1000.0
FJ Final Join , Circles Game : 1000.0
EP Gem Hunter : 1000.0
DG Mobile : 1000.0
DB Ultra Saiyan Battle : 1000.0
Crayola Color Blaster : 1000.0
Clash of Axe: Flippy Lumberjack Action X : 1000.0
C3-C4-PİCASSO-ELYSEE RACİNG : 1000.0
Bu Hangi Youtuber ? : 1000.0
Bu Hangi Dizi ? : 1000.0
Blackjack aj Poker : 1000.0
BZ Zombie VR : 1000.0
BW-DGS plugin : 1000.0
BJ Bridge Standard American 2018 : 1000.0
B@dL!bs L

In [34]:
table_display = []
for app in google_final:
    if app[1] == 'TRAVEL_AND_LOCAL':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 207
Maps - Navigate & Explore : 1000000000.0
Google Street View : 1000000000.0
TripAdvisor Hotels Flights Restaurants Attractions : 100000000.0
Google Earth : 100000000.0
Booking.com Travel Deals : 100000000.0
trivago: Hotels & Travel : 50000000.0
VZ Navigator : 50000000.0
MAPS.ME – Offline Map and Travel Navigation : 50000000.0
2GIS: directory & navigator : 50000000.0
easyJet: Travel App : 10000000.0
Yelp: Food, Shopping, Services Nearby : 10000000.0
Yatra - Flights, Hotels, Bus, Trains & Cabs : 10000000.0
XE Currency : 10000000.0
Where is my Train : Indian Railway & PNR Status : 10000000.0
Skyscanner : 10000000.0
PagesJaunes - local search : 10000000.0
NTES : 10000000.0
MakeMyTrip-Flight Hotel Bus Cab IRCTC Rail Booking : 10000000.0
Live Camera Viewer ★ World Webcam & IP Cam Streams : 10000000.0
KakaoMap - Map / Navigation : 10000000.0
KAYAK Flights, Hotels & Cars : 10000000.0
Hotels.com: Book Hotel Rooms & Find Vacation Deals : 10000000.0
Goibibo 

In [35]:
table_display = []
for app in google_final:
    if app[1] == 'ENTERTAINMENT':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 85
Talking Ben the Dog : 100000000.0
Talking Angela : 100000000.0
Netflix : 100000000.0
IMDb Movies & TV : 100000000.0
Hotstar : 100000000.0
Twitch: Livestream Multiplayer Games & Esports : 50000000.0
Talking Ginger 2 : 50000000.0
PlayStation App : 50000000.0
Amazon Prime Video : 50000000.0
ivi - movies and TV shows in HD : 10000000.0
WWE : 10000000.0
Vudu Movies & TV : 10000000.0
Viki: Asian TV Dramas & Movies : 10000000.0
Tubi TV - Free Movies & TV : 10000000.0
SketchBook - draw and paint : 10000000.0
STARZ : 10000000.0
Redbox : 10000000.0
Movies by Flixster, with Rotten Tomatoes : 10000000.0
Motorola Spotlight Player™ : 10000000.0
Mobile TV : 10000000.0
MEGOGO - Cinema and TV : 10000000.0
Imgur: Find funny GIFs, memes & watch viral videos : 10000000.0
Fandango Movies - Times + Tickets : 10000000.0
FOX : 10000000.0
Crunchyroll - Everything Anime : 10000000.0
Crackle - Free TV & Movies : 10000000.0
Colorfy: Coloring Book for Adults - Free : 10000000

In [36]:
table_display = []
for app in google_final:
    if app[1] == 'TOOLS':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 750
Google : 1000000000.0
Security Master - Antivirus, VPN, AppLock, Booster : 500000000.0
SHAREit - Transfer & Share : 500000000.0
Google Translate : 500000000.0
Gboard - the Google Keyboard : 500000000.0
Clean Master- Space Cleaner & Antivirus : 500000000.0
Tiny Flashlight + LED : 100000000.0
Speedtest by Ookla : 100000000.0
Share Music & Transfer Files - Xender : 100000000.0
Samsung Smart Switch Mobile : 100000000.0
Samsung Calculator : 100000000.0
Lookout Security & Antivirus : 100000000.0
Google Now Launcher : 100000000.0
Google Korean Input : 100000000.0
GO Keyboard - Cute Emojis, Themes and GIFs : 100000000.0
Device Help : 100000000.0
DU Battery Saver - Battery Charger & Battery Life : 100000000.0
Calculator : 100000000.0
Cache Cleaner-DU Speed Booster (booster & cleaner) : 100000000.0
CM Locker - Security Lockscreen : 100000000.0
Battery Doctor-Battery Life Saver & Battery Cooler : 100000000.0
Avast Mobile Security 2018 - Antivirus & App Lock

Lock Screen : 1000000.0
Limbo PC Emulator QEMU ARM x86 : 1000000.0
LG AV REMOTE : 1000000.0
Inf VPN - Global Proxy & Unlimited Free WIFI VPN : 1000000.0
I Screen Dialer : 1000000.0
I Can't Wake Up! Alarm Clock : 1000000.0
HTC Sense Input - ES : 1000000.0
Graphing Calculator : 1000000.0
Free antivirus and VPN : 1000000.0
Free & Premium VPN - FinchVPN : 1000000.0
Force LTE Only : 1000000.0
Flashlight X : 1000000.0
Flashlight Ultimate : 1000000.0
Flash Light on Call & SMS : 1000000.0
Fingerprint Quick Action : 1000000.0
Fingerprint Lock Screen Prank : 1000000.0
File Viewer for Android : 1000000.0
Fast Secure VPN : 1000000.0
Fast Download Manager : 1000000.0
Farsi Keyboard : 1000000.0
FREEDOME VPN Unlimited anonymous Wifi Security : 1000000.0
EasyNote Notepad | To Do List : 1000000.0
ES Dark Theme for free : 1000000.0
ES Classic Theme : 1000000.0
ES App Locker : 1000000.0
Dr. Battery - Fast Charger - Super Cleaner 2018 : 1000000.0
Download Manager - File & Video : 1000000.0
Download Accele

GTS-M : 5000.0
FreedomPop Friends for Free Data : 5000.0
FR Roster : 5000.0
FJ WiFi HDD : 5000.0
FE Connect : 5000.0
F-Secure SENSE : 5000.0
Elif Ba Oyunu : 5000.0
EZ-VPNGate : 5000.0
EZ Utilities Extension : 5000.0
EZ Screenshot : 5000.0
EZ Pass : 5000.0
ES File Explorer & File Manager 2018 : 5000.0
EH kontrollrakendus : 5000.0
DW Missed call cleaner patch : 5000.0
DW Contacts Wear : 5000.0
DNS Changer - BEST (Gprs/Edge/3G/H/H+/4G) : 5000.0
DM HiDisk : 5000.0
Convert degree Celsius to Fahrenheit or °F to °C : 5000.0
CL Keyboard - Myanmar Keyboard (No Ads) : 5000.0
BW Smart : 5000.0
BC Wildfire : 5000.0
Axxess AX-DSP : 5000.0
AZ Screen Recorder pro : 5000.0
AP Installer : 5000.0
AI Benchmark : 5000.0
df : 1000.0
bi-Cube Mobile Token : 1000.0
Zetup, print in one click : 1000.0
Walk Freely (Ec Shlire) : 1000.0
Vote 4 DC : 1000.0
TuenMun BM : 1000.0
Texas Hold'em EV Calculator : 1000.0
Selfie DV : 1000.0
Schengen/EU App : 1000.0
Sam.BN : 1000.0
SOLEM AG : 1000.0
Roland DG Mobile Panel : 1

In [37]:
table_display = []
for app in google_final:
    if app[1] == 'NEWS_AND_MAGAZINES':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 248
Google News : 1000000000.0
Twitter : 500000000.0
Flipboard: News For Our Time : 500000000.0
Dailyhunt (Newshunt) - Latest News, Viral Videos : 50000000.0
detikcom - Latest & Most Complete News : 10000000.0
Updates for Samsung - Android Update Versions : 10000000.0
Topbuzz: Breaking News, Videos & Funny GIFs : 10000000.0
SmartNews: Breaking News Headlines : 10000000.0
Reddit: Social News, Trending Memes & Funny Videos : 10000000.0
Read- Latest News, Information, Gossip and Politics : 10000000.0
Pulse Nabd - World News, Urgent : 10000000.0
Pocket : 10000000.0
Opera News - Trending news and videos : 10000000.0
Newsroom: News Worth Sharing : 10000000.0
NewsDog - Latest News, Breaking News, Local News : 10000000.0
News by The Times of India Newspaper - Latest News : 10000000.0
News Republic : 10000000.0
NYTimes - Latest News : 10000000.0
NEW - Read Newspaper, News 24h : 10000000.0
Fox News – Breaking News, Live Video & News Alerts : 10000000.0
CNN Bre

Conclusions on the following Genres:
* **Video Players**: There are 159 apps in this genre, which means that there could be room for a new app. However, there are "big player" apps, such as YouTube and Google Play, which have more than 1 billion downloads each. MX Player comes third with 500 million downloads, followed by VivaVideo, VideoShow, VLC, Motorola Gallery, Motorola FM Radio and Dubsmash with 100 million each.
* **Social**: There are 236 apps in this genre. This market is dominated by Instagram, Facebook, Google+ and Snapchat. It will be hard to compete against these apps.
* **Photography**: There are 261 apps in this genre. The market is dominated by Google Photos, followed by a large number of camera and photo editor apps. Because the category is this crowded, the company should avoid it unless it can partner with a "big name" camera brand (such as Canon, Nikon or Sony) and use that name for branding and extra exposure. The Motorola Camera app, which has more than 50 million downloads, has benefited from a similar effect.
* **Productivity**: There are 345 apps in this genre. This genre is dominated by Google, with apps like Google Drive and Google Calendar. Other applications are related to office work, such as Word, Excel and PowerPoint. Cloud services such as Dropbox and Cloud Print also dominate this genre. There is little room to improve on the existing apps, and the cost of such a project would not justify the possible gains.
* **Game**: There are 862 apps in this genre. Although this field is the easiest to get into, there are so many apps that the category could be considered saturated. As a result, a new gaming app should be avoided.
* **Travel and Local**: There are 207 apps in this genre. This field is dominated by Google, with apps like Google Maps and Google Street View. Travel-related applications such as TripAdvisor and Booking.com will be very hard to compete against, as they have already established a significant user base.
* **Entertainment**: There are 85 apps in this genre, which means that there could be room for a new app. Only 5 apps reach the 100 million downloads bracket (Talking Ben the Dog, Talking Angela, Netflix, IMDb Movies & TV and Hotstar), and none reaches 500 million or more. Also, there are no entertainment apps with under 10 thousand downloads. This means that this could potentially be a good field for a new app.
* **Tools**: There are 750 apps in this genre. Like the "Game" category, this genre seems saturated, and a new tool app should be avoided.
* **News and Magazines**: There are 248 apps in this genre. The most popular app is Google News with more than 1 billion downloads. Next come Twitter and Flipboard with 500 million downloads each, followed by Dailyhunt with 50 million. To enter this field, the company would need to collaborate with a well-known newspaper or magazine that does not already have its own app, which could be difficult to achieve.
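
The per-genre counts quoted above can be cross-checked with a small helper instead of one copied cell per genre. The following is a minimal sketch, not the notebook's own code: `parse_installs`, `installs_summary` and the `sample` rows are hypothetical names and illustrative data, and it assumes the same column layout as `google_final` (name at index 0, category at index 1, installs at index 5).

```python
from collections import Counter


def parse_installs(raw):
    """Convert an install string such as '100,000,000+' to an int."""
    return int(raw.replace(',', '').replace('+', ''))


def installs_summary(dataset, category, cat_idx=1, installs_idx=5):
    """Return (number of apps, Counter mapping install bracket -> app count)."""
    brackets = Counter()
    for app in dataset:
        if app[cat_idx] == category:
            brackets[parse_installs(app[installs_idx])] += 1
    return sum(brackets.values()), brackets


# Illustrative rows only (assumption): columns 2-4 are left empty because
# the helper does not read them. Replace `sample` with `google_final`.
sample = [
    ['Netflix', 'ENTERTAINMENT', '', '', '', '100,000,000+'],
    ['Hotstar', 'ENTERTAINMENT', '', '', '', '100,000,000+'],
    ['PlayStation App', 'ENTERTAINMENT', '', '', '', '50,000,000+'],
    ['WWE', 'ENTERTAINMENT', '', '', '', '10,000,000+'],
    ['Google News', 'NEWS_AND_MAGAZINES', '', '', '', '1,000,000,000+'],
]

total, brackets = installs_summary(sample, 'ENTERTAINMENT')
print('Total number of apps in this Genre:', total)
for bracket in sorted(brackets, reverse=True):
    # Install figures are bracket lower bounds, hence the trailing '+'.
    print(f'{bracket:,}+ : {brackets[bracket]} apps')
```

Grouping by bracket makes statements such as "only 5 apps reach 100 million downloads" a single lookup rather than a count over a long printed list.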

In [38]:
table_display = []
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        key_val_as_tuple = (float(app[5].replace(',', '').replace('+', '')), app[0])
        table_display.append(key_val_as_tuple)
table_sorted = sorted(table_display, reverse = True)
print('Total number of apps in this Genre:' , len(table_sorted))
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Total number of apps in this Genre: 191
Google Play Books : 1000000000.0
Wattpad 📖 Free Books : 100000000.0
Bible : 100000000.0
Audiobooks from Audible : 100000000.0
Amazon Kindle : 100000000.0
Wikipedia : 10000000.0
Spanish English Translator : 10000000.0
Quran for Android : 10000000.0
Oxford Dictionary of English : Free : 10000000.0
NOOK: Read eBooks & Magazines : 10000000.0
Moon+ Reader : 10000000.0
JW Library : 10000000.0
HTC Help : 10000000.0
FBReader: Favorite Book Reader : 10000000.0
English Hindi Dictionary : 10000000.0
English Dictionary - Offline : 10000000.0
Dictionary.com: Find Definitions for English Words : 10000000.0
Dictionary - Merriam-Webster : 10000000.0
Dictionary : 10000000.0
Cool Reader : 10000000.0
Aldiko Book Reader : 10000000.0
Al-Quran (Free) : 10000000.0
Al'Quran Bahasa Indonesia : 10000000.0
Al Quran Indonesia : 10000000.0
Read books online : 5000000.0
English to Hindi Dictionary : 5000000.0
Ebook Reader : 5000000.0
Dictionary - WordWeb : 5000000.0
Bible KJV

The genre above, "Books and Reference", was the most popular one in the App Store. For the Google Play Store:
* There are 191 apps in this genre. The most popular is Google Play Books with more than 1 billion downloads. After that come Wattpad, Bible, Audiobooks from Audible and Amazon Kindle, each in the 100 million downloads bracket. This means that, apart from Google, there could be room to compete with the other apps in this category.
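
One rough way to quantify "dominated by one app" is the share of installs held by the top app. The sketch below is an illustration under stated assumptions: `top_app_share` and `books_sample` are hypothetical names, the sample contains only the first six rows of the output printed above (not all 191 apps), and install figures are bracket lower bounds, so the result is an approximation.

```python
def top_app_share(apps):
    """Return (name, share) for the app with the most installs.

    `apps` is a list of (name, installs) pairs; share is that app's
    installs divided by the sum of all installs in the list.
    """
    total = sum(installs for _, installs in apps)
    name, installs = max(apps, key=lambda pair: pair[1])
    return name, installs / total


# First rows of the BOOKS_AND_REFERENCE output above, as (name, installs).
books_sample = [
    ('Google Play Books', 1_000_000_000),
    ('Wattpad 📖 Free Books', 100_000_000),
    ('Bible', 100_000_000),
    ('Audiobooks from Audible', 100_000_000),
    ('Amazon Kindle', 100_000_000),
    ('Wikipedia', 10_000_000),
]

name, share = top_app_share(books_sample)
print(f'{name}: {share:.1%} of installs in the sample')

# Average installs once the top app is set aside.
rest = [installs for app, installs in books_sample if app != name]
print(f'Average for the remaining apps: {sum(rest) / len(rest):,.0f}')
```

Running the same function on the full list of (name, installs) pairs for a genre would show how much of that genre's audience sits outside the single largest app.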

#### App profile recommendation

Based on the above information, the most suitable app profile falls under "Entertainment". As seen in the Google Play Store data, there is room to develop a new app in this category.
The company should build an app that meets a demand not fully covered by existing apps.
One idea is an app that combines the features of the most popular entertainment applications. In this app, the user will be able to:
1. Stream their favourite movies/series/anime.
2. Create a talking avatar to interact with. The avatar will remember the user's favourite movies/series/anime and, based on them, suggest other material the user might like. The avatar will also keep a series calendar and inform the user when the next episode of an ongoing series will be released.
3. Rate movies/series/anime and write reviews, which other users can rate in turn. The user's avatar will be able to "talk" to the user and point out the most positive and the most negative review based on those ratings.

Furthermore, another feature could be that, based on their activity, users can unlock some movies/series/anime for free.

The main source of income will be in-app ads. The company should negotiate with the companies that hold the rights to the material that is going to be streamed in this app, so that there is an agreement to stream their products.

## Conclusion

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets (giving priority to Google Play Store, as the app will first be launched there).

We concluded that a streaming application with a talking avatar (helper) could be profitable for both the Google Play Store and the App Store. There is no application that combines the streaming features of Netflix, a talking avatar (like Talking Tom) and the rating system of IMDb. So, this could be a chance to build an all-in-one app that could compete with each individual application.
The most challenging part would be reaching an agreement with the companies that hold the rights to the movies/series/anime that are going to be streamed in this app.