# App Store Analysis

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

We are working with 2 distinct datasets here, each containing data on Android or iOS apps.

## Introduction

In [2]:
from csv import reader

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

To make it easier to explore the two data sets, we'll first write a function named explore_data() that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

In [3]:
with open('AppleStore.csv', encoding='utf8') as f:
    apple = reader(f)
    apple = list(apple)
    apple_header = apple[0]
    apple_data = apple[1:]
    explore_data(apple, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16


We see that the App Store data set has 7198 rows (7197 apps plus the header row, since we passed the full list to explore_data()) and 16 columns. At a quick glance, the columns that might be useful for the purpose of our analysis are 'track_name', 'price', 'rating_count_tot', and 'prime_genre'.

In [4]:
with open('googleplaystore.csv', encoding='utf8') as f:
    android = reader(f)
    android = list(android)
    android_header = android[0]
    android_data = android[1:]
    explore_data(android_data, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


We see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful for the purpose of our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.
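Both data sets are loaded with the same pattern (open the file, read it with reader, split off the header). As a minimal sketch, that pattern could be wrapped in one helper; the name open_dataset() and its has_header parameter are assumptions made for illustration and are not part of the original notebook.

```python
from csv import reader

def open_dataset(path, has_header=True):
    """Read a CSV file and return a (header, rows) pair."""
    # newline='' is the mode the csv module recommends for file objects
    with open(path, encoding='utf8', newline='') as f:
        rows = list(reader(f))
    if has_header:
        return rows[0], rows[1:]
    return [], rows
```

With this helper, a call such as `android_header, android_data = open_dataset('googleplaystore.csv')` would give the same two lists as the cell above.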
 
## Deleting wrong data
 
In the discussion board for the Android dataset, 1 row is identified as having an incorrect rating (https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015). 

In [5]:
print(android_header)
print(android_data[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


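Printing this row shows that it has only 12 values instead of 13: the 'Category' value is missing, so every later column is shifted one position to the left. As a result the 'Rating' field reads 19, which is impossible because the maximum rating an Android app can have is 5. We will remove this row.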

In [6]:
del android_data[10472]
print(android_data[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
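The faulty row was found through the discussion board, but rows with a missing value can also be found programmatically by comparing each row's length with the header's length. The sketch below uses toy data in place of android_header and android_data; the function name find_malformed_rows() is hypothetical.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Toy data standing in for android_header / android_data
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Good app', 'TOOLS', '4.2'],
    ['Shifted app', '19'],            # 'Category' missing, columns shift left
    ['Another app', 'GAME', '3.9'],
]

bad_rows = find_malformed_rows(sample_header, sample_rows)
print(bad_rows)
```

If several rows were reported, deleting them from the highest index to the lowest would keep the remaining indexes valid.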


## Removing duplicate data

We will start here by identifying any duplicate apps in this dataset.

In [7]:
duplicate_apps = []
unique_apps = []

for app in android_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Specifically, we will look at the Instagram app to see why these duplicates have occurred.

In [8]:
print(android_header)
for app in android_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Here we see the four entries for 'Instagram'; the Reviews column is the only one with differing data. As the review counts differ between rows (although two of the rows are identical), we can guess that this data was collected repeatedly at different times and then collated.

We want to remove the duplicates but keep one data row per app, the most relevant one. This would be the most recently collected row, which we take to be the one with the highest number of reviews.

In [12]:
def count_reviews(dataset, name_index, review_index):
    reviews_max = {}
    for app in dataset:
        name = app[name_index]
        n_reviews = float(app[review_index])
        if name in reviews_max:
            if reviews_max[name] < n_reviews:
                reviews_max[name] = n_reviews
        else:
            reviews_max[name] = n_reviews
    return reviews_max

reviews_max = count_reviews(android_data, 0, 3)
n_items = {k: reviews_max[k] for k in list(reviews_max)[:10]}
n_items

{'HTC Mail': 6572.0,
 'League of Gamers - Be an E-Sports Legend!': 68072.0,
 'Mobile CS': 13.0,
 'Photo Collage - Layout Editor': 285788.0,
 'Relax Melodies P: Sleep Sounds': 19543.0,
 'Slickdeals: Coupons & Shopping': 33599.0,
 'Strava Training: Track Running, Cycling & Swimming': 328469.0,
 'Telstra': 4260.0,
 'Tiny Flashlight + LED': 4254879.0,
 'Zara': 95905.0}

The dictionary created here contains the maximum number of reviews for each app, so we can now use this dictionary to remove the duplicate rows with fewer reviews i.e. the older duplicate data. 

In [13]:
android_clean = []
already_added = []
for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
explore_data(android_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13
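The two-step approach above (build a dictionary of maximum review counts, then filter) checks membership in a list for every row, which gets slow as the data grows. As a hedged alternative, the same idea can be sketched in a single pass that stores the best row per app name in a dictionary. The function name dedupe_by_max_reviews() and the toy rows are assumptions for illustration.

```python
def dedupe_by_max_reviews(rows, name_index, review_index):
    """Keep, for each app name, the row with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[review_index])
        # Strict '>' keeps the first row when two rows tie on reviews
        if name not in best or n_reviews > float(best[name][review_index]):
            best[name] = row
    return list(best.values())

# Toy rows: name, category, rating, reviews
toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Toy app', 'TOOLS', '4.0', '10'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
toy_clean = dedupe_by_max_reviews(toy_rows, 0, 3)
```

Note that this sketch orders the result by the first appearance of each app name, so the row order can differ from the list built in the cell above even though the same rows are kept.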


## Removing non-English apps

Exploring the data, we find that both data sets have apps whose names suggest that they are not directed toward an English-speaking audience, e.g. '爱奇艺PPS -《欢乐颂2》电视剧热播'.

As we are not interested in these apps, we will remove them from the dataset.

In [14]:
def check_eng(name):
    count = 0
    for char in name:
        if ord(char) > 127:  # code point outside the ASCII range
            count += 1
            if count > 3: # allows contingency for a few non-Eng chars
                return False
    return True

names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']
for name in names:
    print(check_eng(name))

True
False
True
True


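This function lets us check whether a name is written mostly in English (ASCII) characters: it returns False once more than three characters fall outside the ASCII range, which tolerates names containing an emoji or a symbol such as ™. We can use this to remove the irrelevant data rows.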

In [15]:
android_eng = []
already_added = []
for app in android_clean:
    name = app[0]
    if check_eng(name) and name not in already_added:
        android_eng.append(app)
        already_added.append(name)
        
explore_data(android_eng, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9614
Number of columns: 13


Down from 9659 rows, we are left with 9614 rows containing English-named apps.
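How many rows survive depends on how many non-ASCII characters the name check tolerates. As a small sketch, the tolerance can be exposed as a parameter so that different thresholds are easy to try; the name is_mostly_ascii() and the max_non_ascii parameter are hypothetical and not part of the notebook.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

sample_names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播',
                'Docs To Go™ Free Office Suite', 'Instachat 😜']
results = [is_mostly_ascii(name) for name in sample_names]
```

With the default of three, the sketch agrees with the check used above on these four names; setting max_non_ascii=0 would reject any name containing a symbol or emoji.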

We will also isolate the free apps, to continue with the analysis. 

In [16]:
android_free = []
already_added = []
for app in android_eng:
    name = app[0]
    app_type = app[6]
    if app_type == 'Free' and name not in already_added:
        android_free.append(app)
        already_added.append(name)
        
explore_data(android_free, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8863
Number of columns: 13


Finally, we are left with 8863 cleaned rows, down from 10841. We removed almost 2000 rows by cleaning the data in the following ways:
- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps
- Isolated the free apps

We are now at a point to start analysing this data for answers to our problem. 

## Most common apps by genre

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of what are the most common genres for each market.

In [17]:
def freq_table(dataset, index):
    # Count how many rows hold each distinct value in the given column
    table = {}
    for app in dataset:
        cat = app[index]
        if cat in table:
            table[cat] += 1
        else:
            table[cat] = 1
    return table

category = freq_table(android_free, 1)
print(category)

{'SHOPPING': 199, 'ART_AND_DESIGN': 57, 'BEAUTY': 53, 'NEWS_AND_MAGAZINES': 248, 'LIBRARIES_AND_DEMO': 83, 'LIFESTYLE': 346, 'PERSONALIZATION': 294, 'PARENTING': 58, 'WEATHER': 71, 'VIDEO_PLAYERS': 159, 'HEALTH_AND_FITNESS': 273, 'HOUSE_AND_HOME': 73, 'PRODUCTIVITY': 345, 'SPORTS': 301, 'GAME': 862, 'COMMUNICATION': 287, 'BOOKS_AND_REFERENCE': 190, 'TRAVEL_AND_LOCAL': 207, 'AUTO_AND_VEHICLES': 82, 'TOOLS': 750, 'COMICS': 55, 'EDUCATION': 103, 'BUSINESS': 407, 'EVENTS': 63, 'MEDICAL': 313, 'ENTERTAINMENT': 85, 'PHOTOGRAPHY': 261, 'SOCIAL': 236, 'FAMILY': 1675, 'DATING': 165, 'FOOD_AND_DRINK': 110, 'FINANCE': 328, 'MAPS_AND_NAVIGATION': 124}


We are now able to generate frequency tables for a given dataset and column. The next function will allow us to view the table in an ordered form. 

In [18]:
def display_table(table):
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [21]:
print('Category column from Android dataset')
display_table( freq_table(android_free, 1) )

Category column from Android dataset
FAMILY : 1675
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


In [22]:
print('Genre column from Android dataset')
display_table( freq_table(android_free, 9) )

Genre column from Android dataset
Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 80
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adv

From the Android data, we see that, of the top 10 genres, 7 are more practical use cases.

Also, the largest category is 'FAMILY' with 1675 apps, almost twice as many as the next category ('GAME' with 862), so this is also likely to be a useful marker for the user profile.
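Raw counts are hard to compare between two stores of different sizes, so it can help to express a frequency table as percentages. The sketch below is one way to do that with collections.Counter from the standard library; the function name freq_table_pct() and the toy rows are assumptions for illustration.

```python
from collections import Counter

def freq_table_pct(rows, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, round(count / total * 100, 2))
            for value, count in counts.most_common()]

# Toy rows: name, category
toy_apps = [
    ['a', 'FAMILY'],
    ['b', 'FAMILY'],
    ['c', 'GAME'],
    ['d', 'TOOLS'],
]
toy_pct = freq_table_pct(toy_apps, 1)
```

Calling `freq_table_pct(android_free, 1)` would give the share of each category in the cleaned Android data.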

In [23]:
# CLEANING APPLE DATA: removing duplicate, non english, and paid apps
reviews_max = count_reviews(apple_data, 1, 5)
apple_cleaned = []
already_added = []
for app in apple_data:
    name = app[1]
    price = float(app[4])
    n_reviews = float(app[5])
    if check_eng(name) and name not in already_added and price == 0 and n_reviews == reviews_max[name]:
        apple_cleaned.append(app)
        already_added.append(name)
    
explore_data(apple_cleaned, 0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 3220
Number of columns: 16


As we had primarily been focusing on the Android data, the Apple data had not yet been cleaned. After cleaning, fewer than half of the Apple rows remain (3220 of 7197). We can now continue and build the genre frequency table for the Apple data.

In [24]:
print('prime_genre column from Apple dataset')
display_table( freq_table(apple_cleaned, 11) )

prime_genre column from Apple dataset
Games : 1872
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


The most common genre for Apple is 'Games', by a long way with 1872 entries, followed by 'Entertainment' at 254. Following the top category, the distribution of the other genres falls steadily. 

Past the top 'Games' category, there seems to be a relatively equal split between practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) and fun (entertainment, photo and video, social networking, sports, music, etc.).

Based on the business model for phone apps, we can assume that the number of apps in a genre will be positively correlated with the number of users in that genre. 

Therefore, when compared with the Android data, we can say that a large proportion of iPhone users are either younger or specifically choose Apple for personal use, whereas Android users may be older, given the number of apps in the 'Family' category and the large proportion of practical and business apps.

## Most popular apps by genre

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the 'Installs' column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the 'rating_count_tot' column.

In [25]:
genre_freq = freq_table(apple_cleaned, 11)
genre_users = {}
for genre in genre_freq:
    total = 0
    len_genre = 0
    for app in apple_cleaned:
        genre_app = app[11]
        if genre_app == genre:
            total += float(app[5])
            len_genre += 1
    genre_users[genre] = round(total / len_genre, 1)

display_table(genre_users)

Navigation : 86090.3
Reference : 74942.1
Social Networking : 71548.3
Music : 57326.5
Weather : 52279.9
Book : 39758.5
Food & Drink : 33333.9
Finance : 31467.9
Photo & Video : 28441.5
Travel : 28243.8
Shopping : 26919.7
Health & Fitness : 23298.0
Sports : 23008.9
Games : 22812.9
News : 21248.0
Productivity : 21028.4
Utilities : 18684.5
Lifestyle : 16485.8
Entertainment : 14029.8
Business : 7491.1
Education : 7004.0
Catalogs : 4004.0
Medical : 612.0


Starting with the Apple data, we have calculated the average number of user ratings per app in each genre, which we use as a proxy for the number of users.
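The per-genre average above loops over the whole data set once for every genre. As a hedged sketch, the same averages can be computed in one pass by accumulating a running total and a count per group; the function name average_by_group() and the toy rows are hypothetical.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the value column}, rounded to one decimal."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 1) for group in totals}

# Toy rows: genre, rating count
toy_ratings = [
    ['Games', '100'],
    ['Games', '300'],
    ['Music', '50'],
]
toy_avg = average_by_group(toy_ratings, 0, 1)
```

The resulting dictionary has the same shape as genre_users, so it could be passed to display_table() in the same way.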

In [26]:
android_cat = freq_table(android_free, 1)
cat_installs = {}
for category in android_cat:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            installs = app[5].replace('+', '').replace(',', '')
            total += float(installs)
            len_category += 1
    cat_installs[category] = round( total / len_category, 1 )

display_table(cat_installs)

COMMUNICATION : 38456119.2
VIDEO_PLAYERS : 24727872.5
SOCIAL : 23253652.1
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.3
GAME : 15588015.6
TRAVEL_AND_LOCAL : 13984077.7
ENTERTAINMENT : 11640705.9
TOOLS : 10801391.3
NEWS_AND_MAGAZINES : 9549178.5
BOOKS_AND_REFERENCE : 8767811.9
SHOPPING : 7036877.3
PERSONALIZATION : 5201482.6
WEATHER : 5074486.2
HEALTH_AND_FITNESS : 4188822.0
MAPS_AND_NAVIGATION : 4056941.8
FAMILY : 3697848.2
SPORTS : 3638640.1
ART_AND_DESIGN : 1986335.1
FOOD_AND_DRINK : 1924897.7
EDUCATION : 1833495.1
BUSINESS : 1712290.1
LIFESTYLE : 1437816.3
FINANCE : 1387692.5
HOUSE_AND_HOME : 1331540.6
DATING : 854028.8
COMICS : 817657.3
AUTO_AND_VEHICLES : 647317.8
LIBRARIES_AND_DEMO : 638503.7
PARENTING : 542603.6
BEAUTY : 513151.9
EVENTS : 253542.2
MEDICAL : 120550.6
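The 'Installs' values are strings such as '10,000+', so the cell above strips the plus sign and the commas before converting. As a small sketch, that cleaning step can be isolated in a helper so it is easy to test on its own; the name parse_installs() is an assumption for illustration.

```python
def parse_installs(text):
    """Convert an installs string such as '10,000+' to an integer."""
    return int(text.replace('+', '').replace(',', ''))

examples = [parse_installs(s) for s in ['10,000+', '1,000,000,000+', '0']]
```

Because the values are open-ended buckets ('10,000+' means at least 10,000), the converted numbers are lower bounds rather than exact install counts.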


When comparing the Apple and Android data, we clearly see that the Android numbers are much higher. This is because we use actual install data for Android but only ratings data for Apple, and ratings counts are likely much lower since only a small proportion of users leave ratings. Therefore, we cannot directly compare the platforms on users.

Despite this, we can still make generalisations on the user profile from this data based on app usad