# Analysing Mobile App Data

Our company builds Android and iOS mobile apps, available on Google Play and the App Store. The company's apps are free, and revenue is generated through in-app ads. 

This project aims to understand what types of apps are likely to attract more users. Under the company's business model, the number of users determines the revenue of a given app.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
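from csv import reader

# Read each CSV file into a list of lists; the first row is the header.
# The with statement closes each file once it has been read.
with open('AppleStore.csv', encoding='utf8') as opened_file:
    apple_apps_data = list(reader(opened_file))

with open('googleplaystore.csv', encoding='utf8') as opened_file_2:
    google_apps_data = list(reader(opened_file_2))

# Only the last expression in a cell is displayed
apple_apps_data[:4]
google_apps_data[:4]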

[['App',
  'Category',
  'Rating',
  'Reviews',
  'Size',
  'Installs',
  'Type',
  'Price',
  'Content Rating',
  'Genres',
  'Last Updated',
  'Current Ver',
  'Android Ver'],
 ['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up'],
 ['Coloring book moana',
  'ART_AND_DESIGN',
  '3.9',
  '967',
  '14M',
  '500,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design;Pretend Play',
  'January 15, 2018',
  '2.0.0',
  '4.0.3 and up'],
 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps',
  'ART_AND_DESIGN',
  '4.7',
  '87510',
  '8.7M',
  '5,000,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'August 1, 2018',
  '1.2.4',
  '4.0.3 and up']]

In [3]:
explore_data(google_apps_data,0,4)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




In [4]:
explore_data(apple_apps_data,0,4)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']




The code below prints the number of columns in each dataset.

In [5]:
print(len(google_apps_data[0]))

13


In [6]:
print(len(apple_apps_data[0]))

16


The code below calculates the number of apps in each dataset, omitting the header.

In [7]:
number_google_apps = 0
for row in google_apps_data[1:]:
    number_google_apps += 1
    
print(number_google_apps)

number_apple_apps = 0
for row in apple_apps_data[1:]:
    number_apple_apps += 1
    
print(number_apple_apps)

10841
7197


Let's look at the column names to see which ones are relevant:

In [8]:
print(google_apps_data[0])
print(apple_apps_data[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


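Before starting any analysis we must clean the datasets. We have identified an error in entry 10472 of the Google Play dataset (index 10473 once the header row is counted) that needs fixing: the row has only 12 values instead of 13 because its Category value is missing.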

In [9]:
print(google_apps_data[10473])
print(len(google_apps_data[10473]))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
12


In [10]:
del(google_apps_data[10473])
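
A more general way to spot rows like this one is to compare each row's length with the header's length. The sketch below is illustrative only: `find_malformed_rows` is a hypothetical helper that is not part of the original notebook, and the sample rows are made up.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(dataset[0])  # the header defines the expected column count
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Toy data (made up): the last row is missing its 'Category' value
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
]
malformed = find_malformed_rows(sample)
print(malformed)
```

Run on `google_apps_data` before the deletion, a check like this would be expected to flag index 10473, the 12-column row printed above.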

The Google Play dataset has some duplicate entries. We have included an example below to illustrate the issue:

In [11]:
for app in google_apps_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


If we dig a bit further, we can find out how widespread this issue is, and how many duplicate apps should be removed.

In [12]:
duplicate_apps = []
unique_apps = []

for app in google_apps_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
            
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


To remove duplicates, we will keep one entry per app: the one with the highest number of reviews, which should be the most recent record. The loop below builds a dictionary that maps each unique app name to the highest number of reviews seen for that name.

If the app is already in the dictionary and the stored number of reviews is lower than that of the current row, the value is updated.

If the app is not yet in the dictionary, it is added. Otherwise, nothing happens. This way, the lower review counts never stay in the dictionary.

In [13]:
reviews_max = {}
for app in google_apps_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name]= n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


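We will now build the cleaned list of lists, called android_clean. Another for loop adds, for each app, the row whose number of reviews matches the maximum stored in reviews_max. The already_added list makes sure an app is added only once when several duplicate rows share the same maximum.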

In [14]:
android_clean = []
already_added = []

for app in google_apps_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name]== n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
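
The two-pass approach above could also be sketched as a single pass that stores the best row per app name in a dictionary. This is an alternative under stated assumptions, not the notebook's method: `keep_most_reviewed` is a hypothetical helper and the rows are made up.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        reviews = float(row[reviews_idx])
        # Strict '>' keeps the earlier row when review counts tie
        if name not in best or reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Toy rows (made up), following the Google Play layout: name at 0, reviews at 3
rows = [
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Box', 'BUSINESS', '4.2', '50'],
    ['Instagram', 'SOCIAL', '4.5', '300'],
    ['Instagram', 'SOCIAL', '4.5', '300'],
]
deduped = keep_most_reviewed(rows)
print(deduped)
```

Dictionary lookups avoid rescanning a list for every row, which is what `name not in already_added` does. One difference to keep in mind: the result is ordered by where each name first appears, not by where the retained row sits in the dataset.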

In [15]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


Because our company only develops apps in English, we would like to analyse only the apps designed for an English-speaking audience.

Each character we use in a string has a number associated with it. We can use the built-in function ord() to get the corresponding number of each character. 

The numbers corresponding to the characters commonly used in English (the ASCII range) run from 0 to 127. Therefore, if an app name contains a character whose number is greater than 127, the app probably has a non-English name.
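
As a quick illustration of the idea, the snippet below looks up the number for a handful of characters and checks whether each one falls outside the 0 to 127 range. The sample characters are chosen here for illustration only.

```python
# Map a few sample characters to their code points
sample_characters = ['a', 'Z', '5', '™', '爱']
code_points = {character: ord(character) for character in sample_characters}

for character, number in code_points.items():
    # Numbers above 127 fall outside the ASCII range
    print(repr(character), number, number > 127)
```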

In [16]:
# We are defining a function that returns False whenever an app has
# characters that are non-English characters. We have added an if
# clause to allow for up to three non-English characters, for example
# emojis, so that these apps are still included in our dataset.

def english_character(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            
    if non_ascii > 3:
        return False
    else:
        return True

print(english_character('Docs To Go™ Free Office Suite'))
print(english_character('Instachat 😜'))

True
True


We will test out the function with various examples to see how it works:

In [17]:
print(english_character('爱奇艺PPS -《欢乐颂2》电视剧热'))

False


In [18]:
print(english_character('Docs To Go™ Free Office Suite'))

True


In [19]:
print(english_character('Instagram'))

True


In [20]:
print(english_character('Instachat 😜'))

True


In [21]:
print(ord('™'))

8482


Now we will use the function we created to keep only the English-language apps in a new list of lists. We will use a for loop to achieve this.

In [22]:
android_english_apps = []

for app in android_clean:
    name = app[0]
    if english_character(name):
        android_english_apps.append(app)

In [23]:
print(android_english_apps[:5])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


We will explore this dataset and repeat the loop above with the iOS apps:

In [24]:
explore_data(android_english_apps, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


In [25]:
ios_english_apps = []

for app in apple_apps_data[1:]:
    name = app[1]
    if english_character(name):
        ios_english_apps.append(app)

explore_data(ios_english_apps,0,3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


We can see that there are 6183 iOS apps and 9614 Android apps in our dataset.

Since we are only interested in free apps for our comparison (our apps are all free to download and install, and our main source of revenue is in-app ads), we will isolate only the free apps for the analysis.

In [26]:
ios_free_apps = []
for app in ios_english_apps:
    price = app[4]
    
    # A free app is recorded with a price of either '0.0' or '0'
    if price in ('0.0', '0'):
        ios_free_apps.append(app)

explore_data(ios_free_apps,0,3,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


In [27]:
android_free_apps = []
for app in android_english_apps:
    price = app[7]
    
    # A free app is recorded with a price of either '0.0' or '0'
    if price in ('0.0', '0'):
        android_free_apps.append(app)

explore_data(android_free_apps,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13
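
Matching the exact strings '0' and '0.0' works for these two datasets. A slightly more defensive variant would parse the price as a number instead. The sketch below assumes that paid prices may carry a leading '$' sign; `is_free` is a hypothetical helper, not part of the original notebook.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' is zero."""
    cleaned = price.strip().lstrip('$')  # assumption: '$' is the only symbol used
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # prices that cannot be parsed are treated as not free


print(is_free('0'), is_free('0.0'), is_free('$4.99'))
```

The filter in the cells above could then read `if is_free(price):` for both datasets.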


# Most Common Apps by Genre

Our goal is to determine the kinds of apps that are likely to attract more users: the more users we get, the more revenue we will bring in.

Our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app, and add it to Google Play
2. If the app has a good response from users, we develop it further
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store

Our end goal is to add the app on both Google Play and the App Store - we want to find apps that are successful in both markets. 

We will begin the analysis by determining the most common genres for each market using frequency tables.

In [28]:
# To make a frequency table, we will create an empty dictionary
# to store the values

freq_genre = {}
# The for loop checks whether the genre is in the dictionary
# If the key is found, then the value stored increases by 1
# If the key is not found, it is added to the dictionary with a value
# of 1
for app in ios_free_apps:
    genre = app[11]
    
    if genre in freq_genre:
        freq_genre[genre] += 1
    else:
        freq_genre[genre] = 1
        
print(freq_genre)

{'Social Networking': 106, 'Photo & Video': 160, 'Games': 1874, 'Music': 66, 'Reference': 18, 'Health & Fitness': 65, 'Weather': 28, 'Utilities': 81, 'Travel': 40, 'Shopping': 84, 'News': 43, 'Navigation': 6, 'Lifestyle': 51, 'Entertainment': 254, 'Food & Drink': 26, 'Sports': 69, 'Book': 14, 'Finance': 36, 'Education': 118, 'Productivity': 56, 'Business': 17, 'Catalogs': 4, 'Medical': 6}


It appears that the most common genres for iOS apps are Games, Entertainment, Photo & Video, Education and Social Networking.

Now let's create a frequency table for the Android dataset:

In [29]:
freq_genre = {}

for app in android_free_apps:
    genre = app[1]
    
    if genre in freq_genre:
        freq_genre[genre] += 1
    else:
        freq_genre[genre] = 1
        
print(freq_genre)

{'ART_AND_DESIGN': 57, 'AUTO_AND_VEHICLES': 82, 'BEAUTY': 53, 'BOOKS_AND_REFERENCE': 190, 'BUSINESS': 407, 'COMICS': 55, 'COMMUNICATION': 287, 'DATING': 165, 'EDUCATION': 103, 'ENTERTAINMENT': 85, 'EVENTS': 63, 'FINANCE': 328, 'FOOD_AND_DRINK': 110, 'HEALTH_AND_FITNESS': 273, 'HOUSE_AND_HOME': 73, 'LIBRARIES_AND_DEMO': 83, 'LIFESTYLE': 346, 'GAME': 862, 'FAMILY': 1676, 'MEDICAL': 313, 'SOCIAL': 236, 'SHOPPING': 199, 'PHOTOGRAPHY': 261, 'SPORTS': 301, 'TRAVEL_AND_LOCAL': 207, 'TOOLS': 750, 'PERSONALIZATION': 294, 'PRODUCTIVITY': 345, 'PARENTING': 58, 'WEATHER': 71, 'VIDEO_PLAYERS': 159, 'NEWS_AND_MAGAZINES': 248, 'MAPS_AND_NAVIGATION': 124}


The categories for Android differ from the App Store genres. The most frequent category appears to be Family, followed by Game, Tools, Business, Lifestyle, Productivity, and Finance. There are two columns in the dataset that could be used for the analysis: Genres and Category.

In [30]:
# We will build two functions to analyse the frequency tables:
# 1: To generate frequency tables that show percentages;
# 2: To display the percentages in descending order

def freq_table(dataset,index):
    table = {}
    total = 0
    # Above, we created an empty dictionary called table to store
    # the frequency table, and a variable called total to count
    # the number of rows in the dataset

    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    # For each row in the dataset, we are adding 1 to the total
    # We are then using an if statement like in the frequency tables
    # that we created before, to add each unique value as equal to 1,
    # and adding 1 for every value that already existed in the table
    
    # To turn this into percentages, we create a new dictionary that
    # uses the keys from the previous dictionary and creates the 
    # percentages using the value stored by the key and dividing it 
    # by the total number of elements that we calculated previously
    # and multiplying that by 100
    
    table_percentages = {}
    for key in table:
        percentage = (table[key]/total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

The function below takes two parameters, dataset and index, and generates a frequency table using the freq_table function above. It transforms the frequency table into a list of tuples and sorts the list in descending order. This is necessary because a dictionary cannot be sorted by its values directly: calling sorted() on a dictionary sorts only its keys.

We are interested in seeing the most popular genres, which is why we need to sort the dictionary results. 

In [31]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
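
For comparison, the same kind of sorted percentage table can be sketched with `collections.Counter` from the standard library. This is an alternative, not the approach used in this notebook; `sorted_percentages` is a hypothetical name and the rows are made up.

```python
from collections import Counter


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs sorted by percentage, highest first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() already orders entries from most to least frequent
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Toy rows (made up) with the genre in column 0
toy = [['Games'], ['Games'], ['Games'], ['Music']]
for value, percentage in sorted_percentages(toy, 0):
    print(value, ':', percentage)
```

Note that ties are ordered by first appearance here, while display_table breaks ties by comparing the keys.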

In [32]:
display_table(ios_free_apps, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Games is the most common genre in the App Store dataset for free English-language apps, accounting for almost 60% of the apps. The next most common genres are Entertainment, with almost 8%, and Photo & Video, with almost 5%. 

This suggests that most of the free English-language apps in the App Store dataset are designed for entertainment rather than practical purposes.

However, the frequency table does not tell us how popular these apps are with users. With this information, we only know that there is a large number of entertainment apps in the dataset.

In [33]:
display_table(android_free_apps, 1)
print('\n')
display_table(android_free_apps, 9)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The Google Play dataset has many more categories and genres. As a result, the percentages are more evenly spread. Games still figure prominently as the second most common category (with almost 10% of the apps belonging in this category), but they are much less common than the almost 60% that we saw in the App Store dataset.

The most common category is Family, and aside from Games, Tools, Business, Lifestyle, Productivity and Finance sit at the top of the frequency table. For the genres, Tools tops the list, followed by Entertainment, Education, Business, and Productivity. 

These results are very different from the ones for the App Store dataset: the Google Play dataset has more apps focused on practical purposes, such as education, utilities, productivity and lifestyle. 

We will have to explore these datasets further to uncover which apps are the most popular with users.

# Most Popular Apps by Genre on the App Store

To find out what genres are the most popular, we can calculate the average number of installs for each app genre. For the Google Play dataset, we can find this information in the Installs column. For the App Store dataset, this column is missing, but we can take the total number of user ratings as a proxy.

In [34]:
ios_unique_genres = freq_table(ios_free_apps, 11)
    
for genre in ios_unique_genres:
    total = 0
    len_genre = 0
    for app in ios_free_apps:
        genre_app = app[11]
        if genre_app == genre:
            num_ratings = float(app[5])
            total += num_ratings
            len_genre += 1
    avg_num_ratings = total / len_genre
    print(genre, ':', avg_num_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
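
The nested loops above scan the whole dataset once per genre. As a sketch of an alternative, the averages can be accumulated in a single pass with two dictionaries. `average_by_group` is a hypothetical helper and the rows are made up.

```python
def average_by_group(dataset, group_idx, value_idx):
    """Average a numeric column per group in a single pass over the rows."""
    totals = {}  # group -> running sum of the numeric column
    counts = {}  # group -> number of rows seen for that group
    for row in dataset:
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows (made up): genre in column 0, rating count in column 1
toy = [['Games', '10'], ['Games', '30'], ['Music', '5']]
print(average_by_group(toy, 0, 1))
```

For the App Store data, the call would be `average_by_group(ios_free_apps, 11, 5)`.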


Navigation, Reference and Social Networking have the highest average numbers of user ratings. However, the Navigation average rests on only 6 apps, and Social Networking apps are difficult to launch and require many thousands (if not millions) of users to succeed. A few very large apps are likely skewing these averages. The same is probably true for Music.

Reference, Health & Fitness, and Utilities may be more accessible genres, since they appear less dominated by a handful of apps that account for most of the users.
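
To illustrate how a few very large apps can pull an average upwards, the sketch below compares the mean with the median on made-up rating counts. The numbers are invented for illustration and do not come from the dataset.

```python
from statistics import mean, median

# Made-up rating counts for one genre: one giant app and four small ones
rating_counts = [2_000_000, 1_200, 900, 700, 400]

print('mean:', mean(rating_counts))
print('median:', median(rating_counts))
```

When the mean sits far above the median, as it does here, the average mostly reflects the largest app and says little about a typical one.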

For the Google Play store, the install counts are not precise: the values are open-ended (100+, 1,000+, 5,000+, etc.). For this exercise, we will treat each value as its lower bound, so an app with 100,000+ installs counts as having 100,000 installs. This will give us a rough idea of each category's popularity.
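
The conversion described above can be sketched as a small function that strips the plus sign and the thousands separators before converting to a number. `parse_installs` is a hypothetical name used only for this illustration.

```python
def parse_installs(installs):
    """Convert a string such as '100,000+' to the float 100000.0."""
    return float(installs.replace('+', '').replace(',', ''))


print(parse_installs('100,000+'))
```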

In [35]:
android_unique_genres = freq_table(android_free_apps, 1)
    
for category in android_unique_genres:
    total = 0
    len_category = 0
    for app in android_free_apps:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1
    # Divide by the number of apps in this category
    avg_num_installs = total / len_category
    print(category, ':', round(avg_num_installs,2))

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 647317.82
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 8767811.89
BUSINESS : 1712290.15
COMICS : 817657.27
COMMUNICATION : 38456119.17
DATING : 854028.83
EDUCATION : 1833495.15
ENTERTAINMENT : 11640705.88
EVENTS : 253542.22
FINANCE : 1387692.48
FOOD_AND_DRINK : 1924897.74
HEALTH_AND_FITNESS : 4188821.99
HOUSE_AND_HOME : 1331540.56
LIBRARIES_AND_DEMO : 638503.73
LIFESTYLE : 1437816.27
GAME : 15588015.6
FAMILY : 3695641.82
MEDICAL : 120550.62
SOCIAL : 23253652.13
SHOPPING : 7036877.31
PHOTOGRAPHY : 17840110.4
SPORTS : 3638640.14
TRAVEL_AND_LOCAL : 13984077.71
TOOLS : 10801391.3
PERSONALIZATION : 5201482.61
PRODUCTIVITY : 16787331.34
PARENTING : 542603.62
WEATHER : 5074486.2
VIDEO_PLAYERS : 24727872.45
NEWS_AND_MAGAZINES : 9549178.47
MAPS_AND_NAVIGATION : 4056941.77


Communication apps have the highest average number of installs: 38,456,119. However, this figure is again heavily skewed by a few very popular apps.

In [36]:
for app in android_free_apps:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

We can see above that some apps have over a billion installs or over 100 million. Most apps in Communication will not be so popular. 
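
Since the same check is useful for other categories, it could be wrapped in a small helper. The sketch below is one possible shape, not the notebook's code: `popular_apps` is a hypothetical function, the column positions follow the Google Play layout used in this notebook, and the rows are made up.

```python
def popular_apps(dataset, category, min_installs=100_000_000):
    """Return (name, installs) pairs for one category at or above a threshold."""
    results = []
    for app in dataset:
        # Name at index 0, category at index 1, installs at index 5
        installs = float(app[5].replace('+', '').replace(',', ''))
        if app[1] == category and installs >= min_installs:
            results.append((app[0], app[5]))
    return results


# Toy rows (made up) in the Google Play column order
toy = [
    ['Chat A', 'COMMUNICATION', '4.0', '10', '5M', '1,000,000,000+'],
    ['Chat B', 'COMMUNICATION', '4.0', '10', '5M', '1,000+'],
    ['Map A', 'MAPS_AND_NAVIGATION', '4.0', '10', '5M', '500,000,000+'],
]
print(popular_apps(toy, 'COMMUNICATION'))
```

A numeric threshold replaces the three explicit string comparisons, on the assumption that 100,000,000+, 500,000,000+ and 1,000,000,000+ are the only install values at or above 100 million.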

In [37]:
for app in android_free_apps:
    if app[1] == 'AUTO_AND_VEHICLES' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
        
for app in android_free_apps:
    if app[1] == 'LIFESTYLE' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])    
        
for app in android_free_apps:
    if app[1] == 'FOOD_AND_DRINK' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])       

Tinder : 100,000,000+


The categories above could be good candidates for our company. We want to develop apps that can get as many downloads as possible. For these categories, we can see there is only one app that has over 10