# Highest Revenue Ad-Based Google Play and Apple Store Apps Guided Project

This project will analyze the profitability of free apps downloaded from the Apple Store and Google Play Store. The main source of revenue for free apps is in-app ads, so revenue is predominantly influenced by the number of app users and the time they spend inside the app.

The goal of this project is to determine which types of apps have the highest number of users. 

We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

### 1. Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play. 

This project will look at a sample of these apps using two data sets:

[Data set 1](https://dq-content.s3.amazonaws.com/350/AppleStore.csv) - Approximately 7,000 apps from the Apple Store

[Data set 2](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv) - Approximately 10,000 apps from the Google Play Store


In [1]:
# open the data sets
# save the header row
# save the body as a list of lists

from csv import reader

# Apple data
# encoding='utf8' is set explicitly because app names contain non-ASCII characters
with open('AppleStore.csv', encoding='utf8') as opened_file_apple:
    apple_apps_data = list(reader(opened_file_apple))
apple_header = apple_apps_data[0]
apple_data = apple_apps_data[1:]

# Google data
with open('googleplaystore.csv', encoding='utf8') as opened_file_google:
    google_apps_data = list(reader(opened_file_google))
google_header = google_apps_data[0]
google_data = google_apps_data[1:]

In [2]:
# function provided by Dataquest to print a slice of a data set

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
#check a subset of the apple data
#find number of rows and columns

print(apple_header)
print('\n')
explore_data(apple_data, 0, 4, rows_and_columns=True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


There are 7197 Apple Store apps in this data set. The columns that could be helpful in answering the project question, 'which free apps have the highest number of users?', are:
- track_name
- currency
- price
- rating_count_tot
- rating_count_ver
- prime_genre

Documentation explaining column descriptions can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)

In [4]:
#check a subset of the google data
#find number of rows and columns

print(google_header)
print('\n')
explore_data(google_data, 0, 4, rows_and_columns=True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


There are 10841 Google Play Store apps in this data set. The columns that could be helpful in answering the project question, 'which free apps have the highest number of users?', are:

- App
- Category
- Rating
- Installs
- Type
- Price
- Genres

Documentation explaining column descriptions can be found [here](https://www.kaggle.com/lava18/google-play-store-apps)

### 2. Deleting Incorrect Data

The Google Play [discussion forum for this data set](https://www.kaggle.com/lava18/google-play-store-apps/discussion) mentions a row with incorrect data, described in [this thread](https://www.kaggle.com/lava18/google-play-store-apps/discussion/164101).

In [5]:
print(google_header) # header row
print('\n')
print(google_data[10472]) # incorrect row
print('\n')
print(google_data[0]) # correct row
print('\n')
print(google_data[1]) # correct  row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


The incorrect row is missing its Category value, so every value after the app name has shifted one column to the left and the row has 12 values instead of 13.

The cell below deletes the incorrect row and checks the length to confirm that it was deleted.

In [6]:
print(len(google_data))
del google_data[10472]
print(len(google_data))

10841
10840
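
A row with a missing value can also be found without knowing its index in advance, because it has fewer columns than the header. The following is a minimal sketch under that assumption. The function name `find_malformed_rows` and the sample rows are hypothetical and introduced only for illustration; they are not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Small illustrative sample (not the real data set)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Good app', 'TOOLS', '4.1'],
    ['Shifted app', '1.9'],  # the 'Category' value is missing
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)
```

Run against `google_header` and `google_data` before the deletion, a check like this would be expected to flag the shifted row, although that has not been verified here.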


### 3. Duplicates

In [7]:
# this shows that Instagram has 4 entries

print(google_header)
print('\n')
for app in google_data:
    if app[0] == 'Instagram':
        print(app)
        print('\n')

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




In [8]:
duplicate_apps = []
unique_apps = []

for app in google_data:
    app_name = app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print('Number of duplicate apps:', len(duplicate_apps))
print('Number of unique apps:', len(unique_apps))

Number of duplicate apps: 1181
Number of unique apps: 9659
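
Checking membership in a list gets slower as the list grows, so the same two counts can be obtained in a single pass with `collections.Counter` from the standard library. This is a sketch only; `count_duplicates` and the sample names are hypothetical and used for illustration.

```python
from collections import Counter


def count_duplicates(names):
    """Return (number of duplicate entries, number of unique names)."""
    counts = Counter(names)
    n_unique = len(counts)
    # every occurrence beyond the first one counts as a duplicate entry
    n_duplicates = sum(counts.values()) - n_unique
    return n_duplicates, n_unique


sample_names = ['Instagram', 'Instagram', 'Slack', 'Instagram', 'Slack', 'Waze']
n_duplicates, n_unique = count_duplicates(sample_names)
print('Number of duplicate apps:', n_duplicates)
print('Number of unique apps:', n_unique)
```

For the notebook data, the input would be the app names, for example `[app[0] for app in google_data]`.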


In [9]:
# taking a closer look at the 'Reviews' column (index 3)
# we assume the record with the highest number of reviews is the most recent one

instagram_reviews = []
for app in google_data:
    if app[0] == 'Instagram':
        instagram_reviews.append(app[3])
print(instagram_reviews)
# compare numerically: max() on strings would compare them character by character
print(max(instagram_reviews, key=int))

['66577313', '66577446', '66577313', '66509917']
66577446


In [10]:
# create a dictionary
# loop through the google data to create entries in the dictionary with the highest reviews

reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
print(len(reviews_max))

9659


In [11]:
# use the dictionary to create a list of unique apps by the highest number of reviews

google_clean = []
already_added = []

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        google_clean.append(app)
        already_added.append(name)
explore_data(google_clean, 0, 4, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9659
Number of columns: 13
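
The two-step approach above (build a dictionary of maximum review counts, then filter) can also be sketched as a single pass that stores the best row per app name. This is an alternative sketch, not the notebook's method. `keep_most_reviewed` is a hypothetical name, and the 'Slack' row is made-up sample data. Note that the result is ordered by the first appearance of each name, which may differ from the order of `google_clean`.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = int(row[reviews_index])
        # strict '>' keeps the earlier row when two rows tie
        if name not in best or n_reviews > int(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Slack', 'BUSINESS', '4.4', '51507'],  # illustrative values only
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

deduplicated = keep_most_reviewed(sample_rows)
for row in deduplicated:
    print(row)
```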


In [12]:
# check for duplicates in the apple data

apple_duplicates = []
apple_unique = []

for app in apple_data:
    app_id = app[0]  # renamed from 'id' to avoid shadowing the built-in id()
    if app_id in apple_unique:
        apple_duplicates.append(app_id)
    else:
        apple_unique.append(app_id)
print(len(apple_duplicates))
print(len(apple_unique))
print(len(apple_data))

0
7197
7197


No duplicates were found in the Apple data.

### Remove Non-English Apps

In the ASCII system, the characters commonly used in English text have codes in the range 0 to 127. For this project we will assume that any app name containing more than 3 characters with a code above 127 is a non-English app.

In [13]:
# function to check the code of each character in a string (via ord)
# only strings with 3 or fewer characters above 127 will return True

def check_english(string):
    count_non_english = 0
    for character in string:
        if ord(character) > 127:
            count_non_english += 1
    return count_non_english <= 3

print(check_english('Instagram'))  
print(check_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_english('Docs To Go™ Free Office Suite'))
print(check_english('Instachat 😜'))

True
False
True
True
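
The cut-off of 3 is a judgment call, so it can help to make it a parameter and see how the result changes. The sketch below assumes a hypothetical helper named `is_probably_english` with a `max_non_ascii` argument; neither is part of the original notebook.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii characters above code 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


names = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
]

for name in names:
    # compare the default threshold with a strict one that allows no such characters
    print(name, is_probably_english(name), is_probably_english(name, max_non_ascii=0))
```

With a threshold of 0, names that contain a trademark symbol or an emoji are rejected, which is why a small tolerance is used in this project.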


In [14]:
# create an apple data list and google data list of apps we have defined as english
# using the function in the cell above
apple_english = []
google_english = []

for app in apple_data:
    if check_english(app[1]):
        apple_english.append(app)

for app in google_clean:
    if check_english(app[0]):
        google_english.append(app)

explore_data(apple_english, 0, 4, rows_and_columns=True)
print('\n')
explore_data(google_english, 0, 4, rows_and_columns=True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 6183
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketc

### Isolating the Free Apps

In [15]:
apple_final_clean = []
google_final_clean = []

for app in apple_english:
    if app[4] == '0.0':
        apple_final_clean.append(app)
        
for app in google_english:
    if app[6] == 'Free':
        google_final_clean.append(app)
        
print(len(apple_final_clean))
print(len(google_final_clean))

3222
8863
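
The filter above compares the Apple price to the exact string `'0.0'` and relies on the Google `Type` column. A numeric comparison is a little more tolerant of formatting differences. The sketch below assumes a hypothetical helper `is_free`, and it assumes that paid prices may carry a leading `$` (for example `'$4.99'`), which is not shown in the rows printed in this notebook.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    cleaned = price_text.replace('$', '').strip()
    return float(cleaned) == 0.0


# illustrative price strings; '$4.99' is an assumed format for paid apps
for price in ['0.0', '0', '$4.99']:
    print(price, is_free(price))
```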


### Finding The Most Common Apps by Genre

This analysis will be used to build an app that will eventually be released on both the Google Play Store and the Apple Store.

Therefore, an analysis of both the apple and google markets is necessary.

In [16]:
print(apple_header)
print('\n')
print(google_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


##### Create frequency tables to determine the most common genres

From the Apple data, use the column **prime_genre**.

From the Google data, use the columns **Category** and **Genres**.

In [17]:
# the freq_table function creates a dictionary with column values as the keys and their frequencies as the values
# finally it returns a dictionary with the frequencies expressed as percentages

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages

# the display_table function uses the freq_table function and converts the percentages dictionary
# to a list of (percentage, value) tuples so that it can be sorted
# finally it prints the column values by frequency in descending order

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
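
The same kind of percentage table can be sketched more compactly with `collections.Counter`, whose `most_common()` method already returns values sorted by descending count. The name `freq_table_pct` and the sample rows are hypothetical and used only for illustration.

```python
from collections import Counter


def freq_table_pct(rows, index):
    """Return {value: percentage of rows}, ordered from most to least common."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.most_common()}


sample_rows = [
    ['app a', 'Games'],
    ['app b', 'Games'],
    ['app c', 'Music'],
    ['app d', 'Games'],
]

sample_table = freq_table_pct(sample_rows, 1)
for value, percentage in sample_table.items():
    print(value, ':', percentage)
```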

In [18]:
# apple store 'prime_genre' column
# frequency of apps by genre as percentage 

display_table(apple_final_clean, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Approximately 78% of the free English apps in this Apple Store sample are based on a 'fun' element, coming from the genres:

* Games 
* Entertainment 
* Photo & Video
* Social Networking
* Sports
* Music

This doesn't imply they have the most users.
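
The 78% figure can be checked by summing the percentages printed above for the six genres. This is a small arithmetic sketch; the dictionary name `fun_genres` is introduced here for illustration, and the values are copied from the output of the previous cell.

```python
# percentages copied from the display_table output for the Apple data
fun_genres = {
    'Games': 58.16263190564867,
    'Entertainment': 7.883302296710118,
    'Photo & Video': 4.9658597144630665,
    'Social Networking': 3.2898820608317814,
    'Sports': 2.1415270018621975,
    'Music': 2.0484171322160147,
}

fun_share = sum(fun_genres.values())
print('Share of fun-oriented genres: {:.1f}%'.format(fun_share))
```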

In [19]:
# google store 'category' column
# frequency of apps by category as percentage

display_table(google_final_clean, 1)

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

The most common categories in the Google Play Store differ from those in the Apple Store. The Family category consists mainly of kids' games, so we group it with the apps that are based on a 'fun' element. The following categories sum to approximately 38%:

* Family
* Game
* Sports
* Photography
* Social

The following categories have a more practical element. They account for approximately 47%.

* Tools
* Business
* Lifestyle
* Productivity
* Finance
* Medical
* Personalization
* Communication
* Health and Fitness
* News and Magazines
* Travel and Local
* Shopping
* Books and Reference

In [20]:
# google store 'genres' column
# frequency of apps by genre as percentage

display_table(google_final_clean, -4)

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

The Genres column from the Google data splits the categories further. Moving forward, we'll use the Category column to give a broader overview.

### Finding the Most Popular Apps by Genre

We will do this by calculating average installs for each genre. For the Google data we can use the column 'Installs'. For the apple data we'll have to use the number of ratings from the 'rating_count_tot' column since there is no 'installs' data.
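
Averaging a numeric column per genre can be sketched in a single pass by keeping a running total and a count for each group. The helper `average_by_group` and the sample rows are hypothetical; for the Apple data the group column would be index 11 and the value column index 5.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# illustrative rows: [name, genre, number of ratings]
sample_rows = [
    ['app a', 'Games', '100'],
    ['app b', 'Games', '300'],
    ['app c', 'Music', '50'],
]

sample_averages = average_by_group(sample_rows, 1, 2)
print(sample_averages)
```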

##### Apple Store Data

In [21]:
apple_genres = freq_table(apple_final_clean, 11)

genre_freq = {}

# generates an average amount of ratings for each genre
for genre in apple_genres:
    total = 0
    len_genre = 0
    for app in apple_final_clean:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    genre_freq[genre] = round(avg_n_ratings)
    print(genre, ':', round(avg_n_ratings))

Social Networking : 71548
Photo & Video : 28442
Games : 22789
Music : 57327
Reference : 74942
Health & Fitness : 23298
Weather : 52280
Utilities : 18684
Travel : 28244
Shopping : 26920
News : 21248
Navigation : 86090
Lifestyle : 16486
Entertainment : 14030
Food & Drink : 33334
Sports : 23009
Book : 39758
Finance : 31468
Education : 7004
Productivity : 21028
Business : 7491
Catalogs : 4004
Medical : 612


In [22]:
# top 5 genres by average number of ratings given
sorted(genre_freq.items(), key=lambda x: x[1], reverse=True)[:5]

[('Navigation', 86090),
 ('Reference', 74942),
 ('Social Networking', 71548),
 ('Music', 57327),
 ('Weather', 52280)]

The top 5 genres by average number of ratings are:

1. Navigation : 86090
2. Reference : 74942
3. Social Networking : 71548
4. Music : 57327
5. Weather : 52280

In [23]:
# generates the number of apps in each genre
number_of_apps_in_genre = {}
for app in apple_final_clean:
    genre = app[11]
    if genre in number_of_apps_in_genre:
        number_of_apps_in_genre[genre] += 1
    else:
        number_of_apps_in_genre[genre] = 1
print('Navigation', ':', number_of_apps_in_genre.get('Navigation'))
print('Reference', ':', number_of_apps_in_genre.get('Reference'))
print('Social Networking', ':', number_of_apps_in_genre.get('Social Networking'))
print('Music', ':', number_of_apps_in_genre.get('Music'))
print('Weather', ':', number_of_apps_in_genre.get('Weather'))

Navigation : 6
Reference : 18
Social Networking : 106
Music : 66
Weather : 28


In [24]:
# apps in the Navigation genre and a break down of the number of ratings given
for app in apple_final_clean:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


There are only 6 apps in the Navigation genre. Two of those, Waze and Google Maps, dominate the genre with approximately half a million ratings between them.

Therefore, it may be risky to develop an app in Navigation unless you have confidence in an innovative idea.
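
One way to quantify how much two apps skew the genre average is to compare the mean with the median and to compute the share of ratings held by the top two apps. The sketch below uses the six rating counts printed above; the variable names are introduced here for illustration.

```python
from statistics import mean, median

# rating counts copied from the Navigation output above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

total_ratings = sum(navigation_ratings)
top_two = sum(sorted(navigation_ratings, reverse=True)[:2])
top_two_share = top_two / total_ratings * 100

print('Mean:', round(mean(navigation_ratings)))
print('Median:', median(navigation_ratings))
print('Share of ratings held by the top two apps: {:.1f}%'.format(top_two_share))
```

A median far below the mean suggests that the genre average reflects a couple of very large apps rather than a typical app.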

In [25]:
# apps in the Reference genre and a break down of the number of ratings given
for app in apple_final_clean:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


The Reference genre is not completely dominated by two or three apps, but users of the dictionary and map type apps in this genre may not spend enough time in them to watch ads.

In [26]:
# apps in the Social Networking genre and a break down of the number of ratings given
for app in apple_final_clean:
    if (app[11] == 'Social Networking') and int(app[5]) > 10000:
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The Social Networking genre looks promising. There are a lot of apps with over 10,000 ratings, but the genre might have made the top 5 list because of a few apps with an extremely high number of ratings.

The Music and Weather genres below are similar to Social Networking: they have a few apps with a very high number of ratings.

In [27]:
# apps in the Music genre and a break down of the number of ratings given
for app in apple_final_clean:
    if (app[11] == 'Music') and int(app[5]) > 10000:
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118


In [28]:
# apps in the Weather genre and a break down of the number of ratings given
for app in apple_final_clean:
    if (app[11] == 'Weather') and int(app[5]) > 10000:
        print(app[1], ':', app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792


From this analysis I would suggest a Social Networking app. This genre has apps that users spend a lot of time in when they are relaxing. Users may be more likely to watch and click through ads in these apps than in apps in the Navigation, Reference, Music and Weather genres, which they open for a specific purpose.

##### Google Play Store Data

In [29]:
# google store 'installs' column
# frequency of apps by number of installs as percentage

display_table(google_final_clean, 5)

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835


In [30]:
# installs column values in descending order
installs_values = []
for value in google_final_clean:
    n_installs = value[5]
    n_installs = n_installs.replace('+', '')
    n_installs = n_installs.replace(',', '')
    n_installs = int(n_installs)
    if n_installs not in installs_values:
        installs_values.append(n_installs)
sorted(installs_values, reverse=True)

[1000000000,
 500000000,
 100000000,
 50000000,
 10000000,
 5000000,
 1000000,
 500000,
 100000,
 50000,
 10000,
 5000,
 1000,
 500,
 100,
 50,
 10,
 5,
 1,
 0]

Since we only need an idea of which categories have the most users, we will treat each value in the Installs column as the actual number of installs. For instance, 5,000+ will represent 5,000 installs and 100,000+ will represent 100,000 installs.

Firstly, we will calculate the average amount of installs for each category.
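
The conversion from an Installs string to a number is repeated in several cells, so it could be factored into one small helper. This is a sketch; `parse_installs` is a hypothetical name that does not appear in the original notebook.

```python
def parse_installs(text):
    """Convert an Installs value such as '5,000+' to the integer 5000."""
    return int(text.replace(',', '').replace('+', ''))


for value in ['5,000+', '100,000+', '1,000,000,000+', '0+']:
    print(value, '->', parse_installs(value))
```

The result is a lower bound, since '5,000+' covers any count from 5,000 up to the next bucket.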

In [45]:
google_category = freq_table(google_final_clean, 1)

category_freq = {}

#generates an average amount of installs for each category
for category in google_category:
    total = 0
    len_category = 0
    for row in google_final_clean:
        category_app = row[1]
        if category_app == category:
            n_installs = row[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_installs = total / len_category
    category_freq[category] = round(avg_installs)
    print(category, ':', round(avg_installs))

ART_AND_DESIGN : 1986335
AUTO_AND_VEHICLES : 647318
BEAUTY : 513152
BOOKS_AND_REFERENCE : 8767812
BUSINESS : 1712290
COMICS : 817657
COMMUNICATION : 38456119
DATING : 854029
EDUCATION : 1833495
ENTERTAINMENT : 11640706
EVENTS : 253542
FINANCE : 1387692
FOOD_AND_DRINK : 1924898
HEALTH_AND_FITNESS : 4188822
HOUSE_AND_HOME : 1331541
LIBRARIES_AND_DEMO : 638504
LIFESTYLE : 1437816
GAME : 15588016
FAMILY : 3697848
MEDICAL : 120551
SOCIAL : 23253652
SHOPPING : 7036877
PHOTOGRAPHY : 17840110
SPORTS : 3638640
TRAVEL_AND_LOCAL : 13984078
TOOLS : 10801391
PERSONALIZATION : 5201483
PRODUCTIVITY : 16787331
PARENTING : 542604
WEATHER : 5074486
VIDEO_PLAYERS : 24727872
NEWS_AND_MAGAZINES : 9549178
MAPS_AND_NAVIGATION : 4056942


In [32]:
# top 5 categories by average number of installs
sorted(category_freq.items(), key=lambda i: i[1], reverse=True)[:5]

[('COMMUNICATION', 38456119),
 ('VIDEO_PLAYERS', 24727872),
 ('SOCIAL', 23253652),
 ('PHOTOGRAPHY', 17840110),
 ('PRODUCTIVITY', 16787331)]

In [33]:
# number of apps per category
number_apps_per_category = {}
for app in google_final_clean:
    category = app[1]
    if category in number_apps_per_category:
        number_apps_per_category[category] += 1
    else:
        number_apps_per_category[category] = 1
        
print('COMMUNICATION', ':', number_apps_per_category.get('COMMUNICATION'))
print('VIDEO_PLAYERS', ':', number_apps_per_category.get('VIDEO_PLAYERS'))
print('SOCIAL', ':', number_apps_per_category.get('SOCIAL'))
print('PHOTOGRAPHY', ':', number_apps_per_category.get('PHOTOGRAPHY'))
print('PRODUCTIVITY', ':', number_apps_per_category.get('PRODUCTIVITY'))

COMMUNICATION : 287
VIDEO_PLAYERS : 159
SOCIAL : 236
PHOTOGRAPHY : 261
PRODUCTIVITY : 345


In [34]:
# function to show apps with more than 10,000 installs by category

def installs_above_ten_thousand(category_string):
    installs = {}
    for app in google_final_clean:
        n_installs = app[5]
        n_installs = n_installs.replace('+', '')
        n_installs = n_installs.replace(',', '')
        n_installs = int(n_installs)
        if (app[1] == category_string) and (n_installs > 10000):
            installs[app[0]] = n_installs
    return installs
installs_above_ten_thousand('COMMUNICATION')

{'WhatsApp Messenger': 1000000000,
 'Messenger for SMS': 10000000,
 'My Tele2': 5000000,
 'imo beta free calls and text': 100000000,
 'Contacts': 50000000,
 'Call Free – Free Call': 5000000,
 'Web Browser & Explorer': 5000000,
 'Browser 4G': 10000000,
 'MegaFon Dashboard': 10000000,
 'ZenUI Dialer & Contacts': 10000000,
 'Cricket Visual Voicemail': 10000000,
 'TracFone My Account': 1000000,
 'Xperia Link™': 10000000,
 'TouchPal Keyboard - Fun Emoji & Android Keyboard': 10000000,
 'Skype Lite - Free Video Call & Chat': 5000000,
 'My magenta': 1000000,
 'Android Messages': 100000000,
 'Google Duo - High Quality Video Calls': 500000000,
 'Seznam.cz': 1000000,
 'Antillean Gold Telegram (original version)': 100000,
 'AT&T Visual Voicemail': 10000000,
 'GMX Mail': 10000000,
 'Omlet Chat': 10000000,
 'My Vodacom SA': 5000000,
 'Microsoft Edge': 5000000,
 'Messenger – Text and Video Chat for Free': 1000000000,
 'imo free video calls and chat': 500000000,
 'Calls & Text by Mo+': 5000000,
 'free
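
The install counts in the data set are strings such as `'10,000+'`, so they have to be cleaned before any numeric comparison. A minimal sketch of that step on its own follows; `parse_installs` is a hypothetical helper name, not part of the notebook.

```python
def parse_installs(raw):
    """Convert an install string such as '10,000+' to an int."""
    # Strip the trailing plus sign and the thousands separators.
    return int(raw.replace('+', '').replace(',', ''))

print(parse_installs('10,000+'))
print(parse_installs('1,000,000,000+'))
```

Note that `'10,000+'` parses to exactly 10000, so a strict `> 10000` comparison leaves that bucket out.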

In [35]:
# function to show the number and the percentage of apps in a category
# with more than 10,000 installs

def freq_and_percent(category):
    n_above = len(installs_above_ten_thousand(category))
    # divide by this category's own app count rather than a fixed total
    n_total = number_apps_per_category[category]
    print('Number of apps in {} category with > 10,000 installs:'.format(category), n_above)
    print('Percent of apps in {} category with > 10,000 installs:'.format(category), round(n_above / n_total * 100, 2), '%')

In [36]:
freq_and_percent('COMMUNICATION')

Number of apps in COMMUNICATION category with > 10,000 installs: 174
Percent of apps in COMMUNICATION category with > 10,000 installs: 60.63 %


In [37]:
installs_above_ten_thousand('VIDEO_PLAYERS')

{'YouTube': 1000000000,
 'All Video Downloader 2018': 1000000,
 'Video Downloader': 10000000,
 'HD Video Player': 1000000,
 'Iqiyi (for tablet)': 1000000,
 'Video Player All Format': 10000000,
 'Motorola Gallery': 100000000,
 'Free TV series': 100000,
 'Video Player All Format for Android': 500000,
 'VLC for Android': 100000000,
 'Code': 10000000,
 'Vote for': 50000000,
 'XX HD Video downloader-Free Video Downloader': 1000000,
 'OBJECTIVE': 1000000,
 'Music - Mp3 Player': 10000000,
 'HD Movie Video Player': 1000000,
 'YouCut - Video Editor & Video Maker, No Watermark': 5000000,
 'Video Editor,Crop Video,Movie Video,Music,Effects': 1000000,
 'YouTube Studio': 10000000,
 'video player for android': 10000000,
 'Vigo Video': 50000000,
 'Google Play Movies & TV': 1000000000,
 'HTC Service － DLNA': 10000000,
 'VPlayer': 1000000,
 'MiniMovie - Free Video and Slideshow Editor': 50000000,
 'Samsung Video Library': 50000000,
 'OnePlus Gallery': 1000000,
 'LIKE – Magic Video Maker & Community': 5

In [38]:
freq_and_percent('VIDEO_PLAYERS')

Number of apps in VIDEO_PLAYERS category with > 10,000 installs: 107
Percent of apps in VIDEO_PLAYERS category with > 10,000 installs: 67.3 %


In [39]:
installs_above_ten_thousand('SOCIAL')

{'Facebook': 1000000000,
 'Facebook Lite': 500000000,
 'Tumblr': 100000000,
 'Social network all in one 2018': 100000,
 'Pinterest': 100000000,
 'TextNow - free text + calls': 10000000,
 'Google+': 1000000000,
 'The Messenger App': 1000000,
 'Messenger Pro': 1000000,
 'Free Messages, Video, Chat,Text for Messenger Plus': 1000000,
 'Telegram X': 5000000,
 'The Video Messenger App': 100000,
 'Jodel - The Hyperlocal App': 1000000,
 'Hide Something - Photo, Video': 5000000,
 'Love Sticker': 1000000,
 'Web Browser & Fast Explorer': 5000000,
 'LiveMe - Video chat, new friends, and make money': 10000000,
 'VidStatus app - Status Videos & Status Downloader': 5000000,
 'Love Images': 1000000,
 'Web Browser ( Fast & Secure Web Explorer)': 500000,
 'SPARK - Live random video chat & meet new people': 5000000,
 'Golden telegram': 50000,
 'Facebook Local': 1000000,
 'Meet – Talk to Strangers Using Random Video Chat': 5000000,
 'MobilePatrol Public Safety App': 1000000,
 '💘 WhatsLov: Smileys of love,

In [40]:
freq_and_percent('SOCIAL')

Number of apps in SOCIAL category with > 10,000 installs: 145
Percent of apps in SOCIAL category with > 10,000 installs: 61.44 %


In [41]:
installs_above_ten_thousand('PHOTOGRAPHY')

{'TouchNote: Cards & Gifts': 1000000,
 'FreePrints – Free Photos Delivered': 1000000,
 'Groovebook Photo Books & Gifts': 500000,
 'Moony Lab - Print Photos, Books & Magnets ™': 50000,
 'LALALAB prints your photos, photobooks and magnets': 1000000,
 'Snapfish': 1000000,
 'Motorola Camera': 50000000,
 'HD Camera - Best Cam with filters & panorama': 5000000,
 'LightX Photo Editor & Photo Effects': 10000000,
 'Sweet Snap - live filter, Selfie photo edit': 10000000,
 'HD Camera - Quick Snap Photo & Video': 1000000,
 'B612 - Beauty & Filter Camera': 100000000,
 'Waterfall Photo Frames': 1000000,
 'Photo frame': 100000,
 'Huji Cam': 5000000,
 'Unicorn Photo': 1000000,
 'HD Camera': 5000000,
 'Makeup Editor -Beauty Photo Editor & Selfie Camera': 1000000,
 'Makeup Photo Editor: Makeup Camera & Makeup Editor': 1000000,
 'Moto Photo Editor': 5000000,
 'InstaBeauty -Makeup Selfie Cam': 50000000,
 'Garden Photo Frames - Garden Photo Editor': 500000,
 'Photo Frame': 10000000,
 'Selfie Camera - Photo

In [42]:
freq_and_percent('PHOTOGRAPHY')

Number of apps in PHOTOGRAPHY category with > 10,000 installs: 208
Percent of apps in PHOTOGRAPHY category with > 10,000 installs: 79.69 %


In [43]:
installs_above_ten_thousand('PRODUCTIVITY')

{'Microsoft Word': 500000000,
 'All-In-One Toolbox: Cleaner, Booster, App Manager': 10000000,
 'AVG Cleaner – Speed, Battery & Memory Booster': 10000000,
 'QR Scanner & Barcode Scanner 2018': 10000000,
 'Chrome Beta': 10000000,
 'Microsoft Outlook': 100000000,
 'Google PDF Viewer': 10000000,
 'My Claro Peru': 5000000,
 'Power Booster - Junk Cleaner & CPU Cooler & Boost': 1000000,
 'Google Assistant': 10000000,
 'Microsoft OneDrive': 100000000,
 'Calculator - unit converter': 50000000,
 'Microsoft OneNote': 100000000,
 'Metro name iD': 10000000,
 'Google Keep': 100000000,
 'Archos File Manager': 5000000,
 'ES File Explorer File Manager': 100000000,
 'ASUS SuperNote': 10000000,
 'HTC File Manager': 10000000,
 'MyMTN': 1000000,
 'Dropbox': 500000000,
 'ASUS Quick Memo': 10000000,
 'HTC Calendar': 10000000,
 'Google Docs': 100000000,
 'ASUS Calling Screen': 10000000,
 'lifebox': 5000000,
 'Yandex.Disk': 5000000,
 'Content Transfer': 5000000,
 'HTC Mail': 10000000,
 'Advanced Task Killer': 

In [44]:
freq_and_percent('PRODUCTIVITY')

Number of apps in PRODUCTIVITY category with > 10,000 installs: 206
Percent of apps in PRODUCTIVITY category with > 10,000 installs: 59.71 %
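
A share is only comparable across categories when each count is divided by that category's own total. As a check, the sketch below recomputes the five shares from the counts printed in cells 33 to 44. The dictionary and function names (`apps_in_category`, `apps_above_threshold`, `share_above_threshold`) are hypothetical and introduced only for this illustration.

```python
# Total apps per category, copied from the output of cell 33.
apps_in_category = {
    'COMMUNICATION': 287,
    'VIDEO_PLAYERS': 159,
    'SOCIAL': 236,
    'PHOTOGRAPHY': 261,
    'PRODUCTIVITY': 345,
}

# Apps with more than 10,000 installs, copied from the freq_and_percent outputs.
apps_above_threshold = {
    'COMMUNICATION': 174,
    'VIDEO_PLAYERS': 107,
    'SOCIAL': 145,
    'PHOTOGRAPHY': 208,
    'PRODUCTIVITY': 206,
}

def share_above_threshold(category):
    """Percentage of a category's apps that exceed 10,000 installs."""
    return round(apps_above_threshold[category] / apps_in_category[category] * 100, 2)

shares = {category: share_above_threshold(category) for category in apps_in_category}

# Print the categories from the largest share to the smallest.
for category, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(category, ':', share, '%')
```

With these counts, PHOTOGRAPHY has the largest share (79.69 %) and PRODUCTIVITY the smallest (59.71 %).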


### Conclusion

From this analysis, I would suggest developing a social media app, for the following reasons:

1. Social media apps are successful in both the Apple Store and Google Play Store markets.

2. Users may be more receptive to in-app ads in social media apps, which they tend to use while relaxing. Apps in genres/categories such as Navigation, Weather, Communication, Productivity, Photography, and Reference are generally opened for a short time and for a specific purpose, so their users are in a hurry to leave.