# Google and Apple Apps Data Analysis - Part 2

Recall the goal for this project: to analyze datasets for [Google](https://www.kaggle.com/lava18/google-play-store-apps/version/6) and [Apple](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) apps and advise our client on which type of app would perform well in both markets.

In the first notebook, we cleaned and transformed our data. In this second part we analyze calculate some metrics by app genre to make a recommendation on which genres tend to be most successful on both platforms.

## I. Import and Review Transformed Data.

In [1]:
from csv import reader

open_file_google = open('google_data_transformed.csv', encoding="utf-8")
read_file_google = reader(open_file_google)
google_data = list(read_file_google)

open_file_apple = open('apple_data_transformed.csv', encoding="utf-8")
read_file_apple = reader(open_file_apple)
apple_data = list(read_file_apple)

In [2]:
# We use the same function we defined in the first notebook to explore the data

def explore_data(dataset, start, end, rows_and_columns=False, header=False):  
    
    if rows_and_columns:
        # print number of rows
        if header:
            print('Number of rows with data:', len(dataset)-1) # account for header row by subtracting one from total row count
        else:
            print('Number of rows:', len(dataset))
        
        # print number of columns
        print('Number of columns:', len(dataset[0]))
        print('\n')
    
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

In [3]:
explore_data(google_data, 0, 3, rows_and_columns=True, header=True)

Number of rows with data: 8848
Number of columns: 13


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




In [4]:
explore_data(apple_data, 0, 3, rows_and_columns=True, header=True)

Number of rows with data: 4054
Number of columns: 16


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']




## II. Investigate Variables

We look at what the most common genres are.

The Google data has two fields that may give us this information - "Categories" and "Genres". We examine both.

In the Apple data there is a field labelled "prime_genre" which seems to be exactly what we are looking for.

In [5]:
# Look at Google data field "Genres"

genres_google = []
for row in google_data[1:]:
    genre = row[9]
    if genre not in genres_google:
        genres_google.append(genre)

print(len(genres_google))
genres_google

114


['Art & Design',
 'Art & Design;Creativity',
 'Auto & Vehicles',
 'Beauty',
 'Books & Reference',
 'Business',
 'Comics',
 'Comics;Creativity',
 'Communication',
 'Dating',
 'Education',
 'Education;Creativity',
 'Education;Education',
 'Education;Pretend Play',
 'Education;Brain Games',
 'Entertainment',
 'Entertainment;Brain Games',
 'Entertainment;Creativity',
 'Entertainment;Music & Video',
 'Events',
 'Finance',
 'Food & Drink',
 'Health & Fitness',
 'House & Home',
 'Libraries & Demo',
 'Lifestyle',
 'Lifestyle;Pretend Play',
 'Card',
 'Arcade',
 'Puzzle',
 'Racing',
 'Sports',
 'Casual',
 'Simulation',
 'Adventure',
 'Trivia',
 'Action',
 'Word',
 'Role Playing',
 'Strategy',
 'Board',
 'Music',
 'Action;Action & Adventure',
 'Casual;Brain Games',
 'Educational;Creativity',
 'Puzzle;Brain Games',
 'Educational;Education',
 'Casual;Pretend Play',
 'Educational;Brain Games',
 'Art & Design;Pretend Play',
 'Educational;Pretend Play',
 'Entertainment;Education',
 'Casual;Education',

In [6]:
# Look at Google data field "Category"

categories_google = []
for row in google_data[1:]:
    category = row[1]
    if category not in categories_google:
        categories_google.append(category)

print(len(categories_google))
categories_google

33


['ART_AND_DESIGN',
 'AUTO_AND_VEHICLES',
 'BEAUTY',
 'BOOKS_AND_REFERENCE',
 'BUSINESS',
 'COMICS',
 'COMMUNICATION',
 'DATING',
 'EDUCATION',
 'ENTERTAINMENT',
 'EVENTS',
 'FINANCE',
 'FOOD_AND_DRINK',
 'HEALTH_AND_FITNESS',
 'HOUSE_AND_HOME',
 'LIBRARIES_AND_DEMO',
 'LIFESTYLE',
 'GAME',
 'FAMILY',
 'MEDICAL',
 'SOCIAL',
 'SHOPPING',
 'PHOTOGRAPHY',
 'SPORTS',
 'TRAVEL_AND_LOCAL',
 'TOOLS',
 'PERSONALIZATION',
 'PRODUCTIVITY',
 'PARENTING',
 'WEATHER',
 'VIDEO_PLAYERS',
 'NEWS_AND_MAGAZINES',
 'MAPS_AND_NAVIGATION']

In [7]:
# Look at Apple data field "prime_genre"

genres_apple = []
for row in apple_data[1:]:
    genre = row[11]
    if genre not in genres_apple:
        genres_apple.append(genre)

print(len(genres_apple))
genres_apple

23


['Social Networking',
 'Photo & Video',
 'Games',
 'Music',
 'Reference',
 'Health & Fitness',
 'Weather',
 'Utilities',
 'Travel',
 'Shopping',
 'News',
 'Navigation',
 'Lifestyle',
 'Entertainment',
 'Food & Drink',
 'Sports',
 'Book',
 'Finance',
 'Education',
 'Productivity',
 'Business',
 'Catalogs',
 'Medical']

In comparing the values in the fields "Category" and "Genres" of the Google dataset, we note that there is significant overlap. It seems that "Genre" breaks down some variables in Category into more precise classifications, and sometimes includes a secondary subgenre label in the string value.

See examples below; the Category values "GAMES" and "FAMILY", for instance, have different values in the Genre field, whereas "BEAUTY"/"Beauty" is the same in both fields.

In [8]:
# Compare google_data fields "Category" and "Genres"

for row in google_data:
    print([row[1], row[9]])

['Category', 'Genres']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design;Creativity']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design;Creativity']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design;Creativity']
['ART_AND_DESIGN', 'Art & Design']
['ART_AND_DESIGN', 'Art & Design']

['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment;Brain Games']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment;Creativity']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment']
['ENTERTAINMENT', 'Entertainment

['HEALTH_AND_FITNESS', 'Health & Fitness']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOME', 'House & Home']
['HOUSE_AND_HOM

['GAME', 'Action']
['GAME', 'Puzzle']
['GAME', 'Puzzle']
['GAME', 'Puzzle']
['GAME', 'Arcade']
['GAME', 'Arcade']
['GAME', 'Arcade']
['GAME', 'Action']
['GAME', 'Word']
['FAMILY', 'Casual;Brain Games']
['FAMILY', 'Educational;Creativity']
['FAMILY', 'Puzzle;Brain Games']
['FAMILY', 'Educational;Education']
['FAMILY', 'Casual;Brain Games']
['FAMILY', 'Educational;Education']
['FAMILY', 'Casual;Brain Games']
['FAMILY', 'Education;Creativity']
['FAMILY', 'Casual;Pretend Play']
['FAMILY', 'Casual;Brain Games']
['FAMILY', 'Casual;Brain Games']
['FAMILY', 'Education;Creativity']
['FAMILY', 'Educational;Education']
['FAMILY', 'Educational;Brain Games']
['FAMILY', 'Art & Design;Pretend Play']
['FAMILY', 'Educational;Pretend Play']
['FAMILY', 'Education;Education']
['FAMILY', 'Educational;Education']
['FAMILY', 'Entertainment;Education']
['FAMILY', 'Entertainment;Brain Games']
['FAMILY', 'Casual;Education']
['FAMILY', 'Educational;Education']
['FAMILY', 'Casual;Brain Games']
['FAMILY', 'Casual;

['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['SHOPPING', 'Shopping']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photog

['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PRODUCTI

['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['VIDEO_PLAYERS', 'Video Players & Editors']
['COMMUNICATION', 'Communication']
['GAME', 'Action']
['GAME', 'Arcade']
['VIDEO_PLAYERS', 'Video Players & Editors']
['GAME', 'Racing']
['GAME', 'Arcade']
['GAME', 'Racing']
['GAME', 'Action']
['GAME', 'Arcade']
['VIDEO_PLAYERS', 'Video Players & Editors']
['FAMILY', 'Simulation']
['GAME', 'Racing']
['FAMILY', 'Role Playing']
['SPORTS', 'Sports']
['PERSONALIZATION', 'Personalization']
['GAME', 'Arcade']
['FAMILY', 'Puzzle']
['GAME', 'Arcade']
['FAMILY', 'Casual']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['FAMILY', 'Simulation']
['FAMILY', 'Entertainment']
['NEWS_AND_MAGAZINES', 'News & Magazines']
['FAMILY', 'Action;Action & Adventure']
['FAMILY', 'Entertainment']
['FAMILY', 'Role Playing']
['FAMILY', 'Entertainment']
['LIFESTYLE',

['PHOTOGRAPHY', 'Photography']
['TOOLS', 'Tools']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['GAME', 'Music']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['COMMUNICATION', 'Communication']
['BUSINESS', 'Business']
['TOOLS', 'Tools']
['FOOD_AND_DRINK', 'Food & Drink']
['FINANCE', 'Finance']
['FAMILY', 'Education']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['TOOLS', 'Tools']
['FAMILY', 'Education']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Education']
['FAMILY', 'Puzzle']
['FAMILY', 'Simulation']
['FAMILY', 'Simulation']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Education']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Education']
['TOOLS', 'Tools']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Education']
['FAMILY', 'Education']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Simulation']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Education']
['FAMILY', 'Education']

['PRODUCTIVITY', 'Productivity']
['COMICS', 'Comics']
['TOOLS', 'Tools']
['COMMUNICATION', 'Communication']
['PRODUCTIVITY', 'Productivity']
['FAMILY', 'Education']
['VIDEO_PLAYERS', 'Video Players & Editors']
['PRODUCTIVITY', 'Productivity']
['BUSINESS', 'Business']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['COMMUNICATION', 'Communication']
['GAME', 'Card']
['WEATHER', 'Weather']
['LIFESTYLE', 'Lifestyle']
['BUSINESS', 'Business']
['TOOLS', 'Tools']
['BUSINESS', 'Business']
['TOOLS', 'Tools']
['PRODUCTIVITY', 'Productivity']
['FAMILY', 'Puzzle']
['SHOPPING', 'Shopping']
['FAMILY', 'Entertainment']
['VIDEO_PLAYERS', 'Video Players & Editors']
['FAMILY', 'Education']
['FAMILY', 'Puzzle']
['PRODUCTIVITY', 'Productivity']
['GAME', 'Action']
['FAMILY', 'Casual']
['SPORTS', 'Sports']
['GAME', 'Casino']
['GAME', 'Action']
['GAME', 'Card']
['GAME', 'Casino']
['FAMILY', 'Casual']
['GAME', 'Arcade']
['GAME', 'Arcade']
['GAME', 'Casino']
['GAME', 'Card']
['SPORTS', 'Sports']
['GAME', 'Card']
['GAME

['FAMILY', 'Puzzle']
['FAMILY', 'Strategy']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['FINANCE', 'Finance']
['LIFESTYLE', 'Lifestyle']
['TOOLS', 'Tools']
['FAMILY', 'Entertainment']
['PRODUCTIVITY', 'Productivity']
['GAME', 'Action']
['FAMILY', 'Simulation']
['COMMUNICATION', 'Communication']
['VIDEO_PLAYERS', 'Video Players & Editors']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['FAMILY', 'Entertainment']
['FAMILY', 'Strategy']
['FAMILY', 'Strategy']
['GAME', 'Action']
['GAME', 'Action']
['FAMILY', 'Entertainment']
['GAME', 'Action']
['FAMILY', 'Strategy']
['FAMILY', 'Simulation']
['FAMILY', 'Simulation']
['VIDEO_PLAYERS', 'Video Players & Editors']
['FAMILY', 'Educational']
['GAME', 'Action']
['FAMILY', 'Entertainment']
['FAMILY', 'Simulation']
['GAME', 'Adventure']
['GAME', 'Adventure']
['FAMILY', 'Entertainment']
['GAME', 'Action']
['FAMILY', 'Entertainment']
['GAME', 'Adventure']
['GAME', 'Action']
['GAME', 'Action']
['GAME', 'Adventure']
['GAME', 'Action']
['GAME', 'Action']
['

['FINANCE', 'Finance']
['FAMILY', 'Education']
['FAMILY', 'Entertainment']
['FAMILY', 'Education']
['BUSINESS', 'Business']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['FAMILY', 'Education']
['COMMUNICATION', 'Communication']
['LIFESTYLE', 'Lifestyle']
['COMMUNICATION', 'Communication']
['COMMUNICATION', 'Communication']
['NEWS_AND_MAGAZINES', 'News & Magazines']
['FINANCE', 'Finance']
['GAME', 'Trivia']
['PHOTOGRAPHY', 'Photography']
['LIFESTYLE', 'Lifestyle']
['FAMILY', 'Simulation']
['SOCIAL', 'Social']
['PERSONALIZATION', 'Personalization']
['FAMILY', 'Entertainment']
['FAMILY', 'Casual']
['SPORTS', 'Sports']
['PERSONALIZATION', 'Personalization']
['PHOTOGRAPHY', 'Photography']
['WEATHER', 'Weather']
['FAMILY', 'Entertainment']
['FAMILY', 'Casual']
['FAMILY', 'Entertainment']
['SPORTS', 'Sports']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['SPORTS', 'Sports']
['FAMILY', 'Casual']
['PHOTOGRAPHY', 'Photography']
['COMICS', 'Comics']
['FAMILY', 'Ente

['GAME', 'Action']
['TOOLS', 'Tools']
['BUSINESS', 'Business']
['BUSINESS', 'Business']
['BUSINESS', 'Business']
['LIFESTYLE', 'Lifestyle']
['FAMILY', 'Strategy']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['TRAVEL_AND_LOCAL', 'Travel & Local']
['TOOLS', 'Tools']
['GAME', 'Board']
['PHOTOGRAPHY', 'Photography']
['PHOTOGRAPHY', 'Photography']
['GAME', 'Board']
['GAME', 'Board']
['LIFESTYLE', 'Lifestyle']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['PERSONALIZATION', 'Personalization']
['FINANCE', 'Finance']
['FINANCE', 'Finance']
['BUSINESS', 'Business']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['GAME', 'Board']
['TOOLS', 'Tools']
['EVENTS', 'Events']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['PHOTOGRAPHY', 'Photography']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['PERSONALIZATION', 'Personalization']
['PERSONALIZATION', 'Personalization']
['TRAVEL_AND_LOCAL', 'Travel & Local']
['LIFESTYLE', 'Lifestyle']
['TRAVEL_AND_LOCAL', 'Travel & Local']
['FAMILY',

['FAMILY', 'Education']
['TOOLS', 'Tools']
['LIFESTYLE', 'Lifestyle']
['LIFESTYLE', 'Lifestyle']
['LIFESTYLE', 'Lifestyle']
['PERSONALIZATION', 'Personalization']
['COMMUNICATION', 'Communication']
['FINANCE', 'Finance']
['LIFESTYLE', 'Lifestyle']
['TOOLS', 'Tools']
['FINANCE', 'Finance']
['PERSONALIZATION', 'Personalization']
['LIFESTYLE', 'Lifestyle']
['LIFESTYLE', 'Lifestyle']
['PERSONALIZATION', 'Personalization']
['PERSONALIZATION', 'Personalization']
['FINANCE', 'Finance']
['PERSONALIZATION', 'Personalization']
['TOOLS', 'Tools']
['MEDICAL', 'Medical']
['PERSONALIZATION', 'Personalization']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['LIFESTYLE', 'Lifestyle']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['GAME', 'Action']
['LIFESTYLE', 'Lifestyle']
['SPORTS', 'Sports']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['FAMILY', 'Education']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['LIFESTYLE', 'Lifes

['FAMILY', 'Entertainment']
['FAMILY', 'Entertainment']
['FAMILY', 'Entertainment']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['FAMILY', 'Strategy']
['FAMILY', 'Education']
['AUTO_AND_VEHICLES', 'Auto & Vehicles']
['TOOLS', 'Tools']
['PERSONALIZATION', 'Personalization']
['FAMILY', 'Entertainment']
['FAMILY', 'Entertainment']
['FAMILY', 'Entertainment']
['TOOLS', 'Tools']
['FAMILY', 'Education']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['AUTO_AND_VEHICLES', 'Auto & Vehicles']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['HEALTH_AND_FITNESS', 'Health & Fitness']
['TOOLS', 'Tools']
['BUSINESS', 'Business']
['PRODUCTIVITY', 'Productivity']
['PRODUCTIVITY', 'Productivity']
['GAME', 'Action']
['GAME', 'Action']
['GAME', 'Action']
['GAME', 'Action']
['COMMUNICATION', 'Communication']
['GAME', 'Action']
['FAMILY', 'Education']
['GAME', 'Trivia']
['GAME', 'Action']
['GAME', 'Action']
['BUSINESS', 'Business']
['FAMILY', 'Entertainment']
['FAMILY', 'Education']
['GAME', 'Trivia']
['FAMILY', 'Entertainme

['FAMILY', 'Entertainment']
['SPORTS', 'Sports']
['PHOTOGRAPHY', 'Photography']
['GAME', 'Arcade']
['FAMILY', 'Casual']
['GAME', 'Action']
['GAME', 'Trivia']
['FAMILY', 'Casual']
['PERSONALIZATION', 'Personalization']
['TOOLS', 'Tools']
['FAMILY', 'Educational;Pretend Play']
['GAME', 'Arcade']
['GAME', 'Arcade']
['GAME', 'Adventure']
['GAME', 'Music']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['GAME', 'Action']
['LIBRARIES_AND_DEMO', 'Libraries & Demo']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['TOOLS', 'Tools']
['FAMILY', 'Entertainment']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['NEWS_AND_MAGAZINES', 'News & Magazines']
['TOOLS', 'Tools']
['FAMILY', 'Simulation']
['FAMILY', 'Education']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['TRAVEL_AND_LOCAL', 'Travel & Local']
['PRODUCTIVITY', 'Productivity']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['BUSINESS', 'Business']
['TRAVEL_AND_LOC

['TOOLS', 'Tools']
['PERSONALIZATION', 'Personalization']
['PERSONALIZATION', 'Personalization']
['PHOTOGRAPHY', 'Photography']
['FAMILY', 'Education']
['SPORTS', 'Sports']
['GAME', 'Action']
['FAMILY', 'Education']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Entertainment']
['PRODUCTIVITY', 'Productivity']
['FAMILY', 'Education']
['FAMILY', 'Entertainment']
['FAMILY', 'Education;Education']
['PRODUCTIVITY', 'Productivity']
['FAMILY', 'Entertainment']
['TOOLS', 'Tools']
['VIDEO_PLAYERS', 'Video Players & Editors']
['VIDEO_PLAYERS', 'Video Players & Editors']
['LIFESTYLE', 'Lifestyle']
['PHOTOGRAPHY', 'Photography']
['AUTO_AND_VEHICLES', 'Auto & Vehicles']
['FAMILY', 'Entertainment']
['TOOLS', 'Tools']
['FAMILY', 'Entertainment']
['TOOLS', 'Tools']
['PHOTOGRAPHY', 'Photography']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['PHOTOGRAPHY', 'Photography']
['HOUSE_AND_HOME', 'House & Home']
['BOOKS_AND_REFERENCE', 'Books & Reference

['COMMUNICATION', 'Communication']
['FAMILY', 'Casual']
['FAMILY', 'Education']
['FINANCE', 'Finance']
['TOOLS', 'Tools']
['BUSINESS', 'Business']
['PRODUCTIVITY', 'Productivity']
['FAMILY', 'Education']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['GAME', 'Trivia']
['FAMILY', 'Education']
['SPORTS', 'Sports']
['GAME', 'Arcade']
['FAMILY', 'Simulation']
['TOOLS', 'Tools']
['FAMILY', 'Puzzle']
['FAMILY', 'Casual']
['FAMILY', 'Casual']
['FAMILY', 'Education']
['GAME', 'Action']
['FAMILY', 'Puzzle']
['FAMILY', 'Casual']
['FAMILY', 'Casual']
['SPORTS', 'Sports']
['FAMILY', 'Education']
['FAMILY', 'Casual']
['FAMILY', 'Casual;Pretend Play']
['FAMILY', 'Casual;Action & Adventure']
['FAMILY', 'Puzzle']
['NEWS_AND_MAGAZINES', 'News & Magazines']
['FAMILY', 'Entertainment']
['NEWS_AND_MAGAZINES', 'News & Magazines']
['FAMILY', 'Entertainment;Action & Adventure']
['NEWS_AND_MAGAZINES', 'News & Magazines']
['GAME', 'Arcade']
['NEWS_AND_MAGAZINES', 'News & Magazines']
['SPORTS', 'Sports']
['FAMILY', 'Si

['WEATHER', 'Weather']
['TOOLS', 'Tools']
['VIDEO_PLAYERS', 'Video Players & Editors']
['BUSINESS', 'Business']
['TRAVEL_AND_LOCAL', 'Travel & Local']
['COMMUNICATION', 'Communication']
['TOOLS', 'Tools']
['PRODUCTIVITY', 'Productivity']
['FAMILY', 'Education']
['TOOLS', 'Tools']
['FINANCE', 'Finance']
['LIFESTYLE', 'Lifestyle']
['FINANCE', 'Finance']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['VIDEO_PLAYERS', 'Video Players & Editors']
['SPORTS', 'Sports']
['TOOLS', 'Tools']
['DATING', 'Dating']
['COMMUNICATION', 'Communication']
['SOCIAL', 'Social']
['SOCIAL', 'Social']
['BUSINESS', 'Business']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['FAMILY', 'Entertainment']
['SPORTS', 'Sports']
['TOOLS', 'Tools']
['SPORTS', 'Sports']
['GAME', 'Casino']
['FAMILY', 'Entertainment']
['GAME', 'Casino']
['SOCIAL', 'Social']
['FAMILY', 'Entertainment']
['MEDICAL', 'Medical']
['FINANCE', 'Finance']
['FAMILY', 'Casual']
['FAMILY', 'Educatio

['PERSONALIZATION', 'Personalization']
['SPORTS', 'Sports']
['BOOKS_AND_REFERENCE', 'Books & Reference']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['PHOTOGRAPHY', 'Photography']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['LIBRARIES_AND_DEMO', 'Libraries & Demo']
['MEDICAL', 'Medical']
['FAMILY', 'Entertainment']
['TOOLS', 'Tools']
['PERSONALIZATION', 'Personalization']
['COMMUNICATION', 'Communication']
['BUSINESS', 'Business']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['GAME', 'Racing']
['COMMUNICATION', 'Communication']
['PHOTOGRAPHY', 'Photography']
['TOOLS', 'Tools']
['LIFESTYLE', 'Lifestyle']
['FAMILY', 'Entertainment']
['FINANCE', 'Finance']
['FINANCE', 'Finance']
['TOOLS', 'Tools']
['PERSONALIZATION', 'Personalization']
['COMMUNICATION', 'Communication']
['GAME', 'Arcade']
['TOOLS', 'Tools']
['FAMILY', 'Education']
['FINANCE', 'Finance']
['SHOPPING', 'Shopping']
['BUSINESS', 'Business']
['MAPS_AND_NAVIGATION', 'Maps & Navigation']
['TOOLS', 'Tools']
['TOOLS', 'Tools']
['TOOLS'

We will perform our analysis with both Google data fields "Genre" and "Category", but may find that the "Category" field is simplest and sufficient to draw conclusions from.

## III. Analysis

We will look at two metrics. First we look at the frequency of each genre. Then we assess a different metric that may be a better measure of app genre popularity.

### III.a Examine Frequency of Apps per Genre

We start by building a dictionary, where the keys are the genre labels and the values are the percent of apps in that genre. Then we transform this into an ordered list of tuples to easily identify the genre with the highest percent of apps.

In [9]:
# Build function that, given dataset and column index, returns a frequency dictionary.

def build_freq_table(data, index_genre):
    freq_dict = {}
    
    for row in data[1:]:
        genre = row[index_genre]
        if genre in freq_dict.keys():
            freq_dict[genre] += 1
        else:
            freq_dict[genre] = 1
    
    # Convert to percentages
    total_num_apps = len(data) - 1  # We will divide by the total number of rows
    for genre in freq_dict:
        freq_dict[genre] /= total_num_apps  # Calculate proportion
        freq_dict[genre] *= 100   # Turn the proportion into a percentage
    
    return freq_dict

In [10]:
# Test that this works. Use the Google data, field "Categories" (index 1)

build_freq_table(google_data, 1)

{'ART_AND_DESIGN': 0.6442133815551537,
 'AUTO_AND_VEHICLES': 0.9267631103074141,
 'BEAUTY': 0.599005424954792,
 'BOOKS_AND_REFERENCE': 2.1360759493670884,
 'BUSINESS': 4.599909584086799,
 'COMICS': 0.6103074141048824,
 'COMMUNICATION': 3.2323688969258586,
 'DATING': 1.8648282097649187,
 'EDUCATION': 1.164104882459313,
 'ENTERTAINMENT': 0.9606690777576853,
 'EVENTS': 0.7120253164556962,
 'FINANCE': 3.7070524412296564,
 'FOOD_AND_DRINK': 1.2432188065099457,
 'HEALTH_AND_FITNESS': 3.0854430379746836,
 'HOUSE_AND_HOME': 0.8024412296564195,
 'LIBRARIES_AND_DEMO': 0.9380650994575045,
 'LIFESTYLE': 3.887884267631103,
 'GAME': 9.697106690777577,
 'FAMILY': 18.942133815551536,
 'MEDICAL': 3.5375226039783,
 'SOCIAL': 2.667269439421338,
 'SHOPPING': 2.2490958408679926,
 'PHOTOGRAPHY': 2.949819168173599,
 'SPORTS': 3.390596745027125,
 'TRAVEL_AND_LOCAL': 2.3395117540687163,
 'TOOLS': 8.453887884267631,
 'PERSONALIZATION': 3.322784810126582,
 'PRODUCTIVITY': 3.899186256781193,
 'PARENTING': 0.65551

In [11]:
# Now we turn the dictionary pairs into tuples, which will allow us to sort these by value.

def sort_freq_table(data, index_genre):
    freq_dict = build_freq_table(data, index_genre)
    tuples_list = []
    
    for key in freq_dict:
        new_tuple = (freq_dict[key], key)
        tuples_list.append(new_tuple)
    
    sorted_list = sorted(tuples_list, reverse=True)  # We want the order to be from greatest to least
    
#    return sorted_list

**Review results.** 

Let's look at the ordered lists for each data set. Then we will note some observations.

In [12]:
# Google data. Field "Categories" (index 1).

sort_freq_table(google_data, 1)

In [13]:
# Google data. Field "Genres" (index 9).

sort_freq_table(google_data, 9)

In [14]:
# Google data. Field "prime_genre" (index 11).

sort_freq_table(apple_data, 11)

From these results we see that the most common Apple apps tend to be for enjoyment ("Games", "Entertainment", and "Photo and Video" are top three), while the most common Google apps include both apps for enjoyment and those for pragmatic purposes ("Tools" is one of the top genres/categories).

However, the most common genre does not necessarily tell us what is the most popular genre amongst users. So we need to ask, what field can we use as an indicator of the number of users using the apps? 

### III.b Analyze Genre Popularity with Average Number of Reviews

There are a couple data points we may look at as a proxy for number of users using the apps. The Google data has a field called "Installs" - the number of times the app was installed on a device. However, this field is categorical, with values like "10,000+" - we will need to convert these to numeric values before we can perform any calculations with the numbers. We will do this by simply removing the "+" and "," symbols in the string and using the number remaining - it will be an approximation, but that's good enough for our purposes.

In the Apple data there is no field that counts number of installs, so we will use the number of reviews as a proxy for th number of users using the app. This can be found in the field "rating_count_tot".

Simply counting frequency of the reviews per genre will not give us a good measure of popularity - the number of genres that have more apps are likely to also have more reviews. So instead we will find the average number of reviews, by calculating the number of reviews per number of apps in each genre.

In [15]:
# Calculate number of installs or reviews per number of apps for each genre
# Build this as a dictionary, then convert to an ordered list of tuples.

def avg_num_reviews(data, index_genre, index_count):
    genres_dict = build_freq_table(data, index_genre)
    avg_reviews_dict = {}
    
    for genre in genres_dict:
        count_apps = genres_dict[genre]
        count_reviews = 0
        
        for row in data:
            if row[index_genre] == genre:
                  # Select the relevant column value - be it number of installs or number of reviews
                add_count = row[index_count]
                  # Use str.replace to remove '+' and ',' from the values in the case of the Google data
                add_count = add_count.replace('+','').replace(',','')
                add_count = float(add_count)
                count_reviews += add_count
        
        avg_reviews = count_reviews / count_apps
        avg_reviews_dict[genre] = avg_reviews
    
    # Turn the dictionary value-key pairs into tuples then sort by value.
    tuples_list = []
    
    for key in avg_reviews_dict:
        new_tuple = (avg_reviews_dict[key], key)
        tuples_list.append(new_tuple)
    
    sorted_list = sorted(tuples_list, reverse=True)  # We want the order to be from greatest to least
    
    return sorted_list

In [16]:
# Google data; group by 'Categories' (index 1); calculating average 'Installs' (index 5)

avg_num_reviews(google_data, 1, 5)

[(3414494614.614266, 'COMMUNICATION'),
 (2187922154.626415, 'VIDEO_PLAYERS'),
 (2057483140.2074578, 'SOCIAL'),
 (1578492968.3954022, 'PHOTOGRAPHY'),
 (1485343077.3991885, 'PRODUCTIVITY'),
 (1375334403.8881118, 'GAME'),
 (1237311195.7936232, 'TRAVEL_AND_LOCAL'),
 (1029969656.4705883, 'ENTERTAINMENT'),
 (958260694.3576471, 'TOOLS'),
 (844911310.8258065, 'NEWS_AND_MAGAZINES'),
 (779880397.2740741, 'BOOKS_AND_REFERENCE'),
 (622622904.5266331, 'SHOPPING'),
 (460227181.53142864, 'PERSONALIZATION'),
 (455278289.28000003, 'WEATHER'),
 (370626969.26358974, 'HEALTH_AND_FITNESS'),
 (358279819.6292683, 'MAPS_AND_NAVIGATION'),
 (326990388.21670645, 'FAMILY'),
 (323005289.43946666, 'SPORTS'),
 (175750928.5614035, 'ART_AND_DESIGN'),
 (170314951.71345454, 'FOOD_AND_DRINK'),
 (162227650.4854369, 'EDUCATION'),
 (151503432.24373466, 'BUSINESS'),
 (127956079.64511627, 'LIFESTYLE'),
 (122783030.24195123, 'FINANCE'),
 (120385714.77859156, 'HOUSE_AND_HOME'),
 (75564470.90521212, 'DATING'),
 (73669676.8888889

In [17]:
# Google data; group by 'Genres' (index 9); calculating average 'Installs' (index 5)

avg_num_reviews(google_data, 9, 5)

[(3414494614.614266, 'Communication'),
 (3126293333.3333335, 'Adventure;Action & Adventure'),
 (2207340271.24586, 'Video Players & Editors'),
 (2057483140.2074578, 'Social'),
 (2037532606.4490795, 'Arcade'),
 (1731484727.3948717, 'Casual'),
 (1625082666.6666667, 'Puzzle;Action & Adventure'),
 (1578492968.3954022, 'Photography'),
 (1505634666.6666667, 'Educational;Action & Adventure'),
 (1485343077.3991885, 'Productivity'),
 (1407773929.9272728, 'Racing'),
 (1243274609.3654368, 'Travel & Local'),
 (1142866666.6666667, 'Casual;Action & Adventure'),
 (1103089505.2963505, 'Action'),
 (990967375.9308643, 'Strategy'),
 (958359035.3139492, 'Tools'),
 (884800000.0, 'Tools;Education'),
 (884800000.0, 'Role Playing;Brain Games'),
 (884800000.0, 'Lifestyle;Pretend Play'),
 (884800000.0, 'Casual;Music & Video'),
 (884800000.0, 'Card;Action & Adventure'),
 (884800000.0, 'Adventure;Education'),
 (844911310.8258065, 'News & Magazines'),
 (835745213.3333333, 'Music'),
 (829500000.0, 'Educational;Prete

In [18]:
# Apple data; group by 'prime_genre' (index 11); calculating average 'rating_count_tot' (index 5)

avg_num_reviews(apple_data, 11, 5)

[(2734337.866, 'Reference'),
 (2289781.4901492535, 'Music'),
 (2151790.0579020977, 'Social Networking'),
 (1914336.724516129, 'Weather'),
 (1104710.6304191616, 'Photo & Video'),
 (1052906.907, 'Navigation'),
 (819557.3639285713, 'Travel'),
 (818060.4311627909, 'Food & Drink'),
 (816028.6336708859, 'Sports'),
 (808866.8821052632, 'Health & Fitness'),
 (772444.5829032258, 'Productivity'),
 (767884.2481330377, 'Games'),
 (759990.3133884298, 'Shopping'),
 (644291.0365517242, 'News'),
 (567969.4911926605, 'Utilities'),
 (548192.4976190475, 'Finance'),
 (438762.84209580836, 'Entertainment'),
 (363980.62702127657, 'Lifestyle'),
 (344522.43333333335, 'Book'),
 (258150.612, 'Business'),
 (254037.15333333335, 'Education'),
 (72143.18222222221, 'Catalogs'),
 (18638.264999999996, 'Medical')]

## IV. Draw Conclusions

We already decided that popularity of app genres - approximated by the number of installs or reviews - would be a better metric to look at than number of apps in each genre. So we draw our conclusions from the latter analysis.

Social networking, photo, and video apps seem to be a common popular genres on both platforms. "Social Networking" and "Photo and Video" are in the top five most popular Apple app genres, and "Social", "Photo", and "Video Players" are in the top five categories on Google apps.  

From this, we may decide on a recommendation for what type of app our clients should build that would be successful on both platforms: Build a social networking app, or perhaps a photo or video app that has a social component.