# User Attractions on Mobile Applications
---
This notebook explores datasets describing the content of the Apple App Store and the Google Play Store (limited to apps directed toward an English-speaking audience) in order to help developers choose the type of application that attracts more users and, therefore, increases profit from advertisements.

## Opening and Exploring Datasets from CSV Files

### Opening Datasets:

In [1]:
from csv import reader

#Apple App Store
openedFile1 = open('AppleStore.csv')
readFile1 = reader(openedFile1)
appleList = list(readFile1)
appleHeader = appleList[0]
appleList = appleList[1:]

#Google Play Store
openedFile2 = open('googleplaystore.csv')
readFile2 = reader(openedFile2)
googleList = list(readFile2)
googleHeader = googleList[0]
googleList = googleList[1:]
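
The two loading blocks above follow the same pattern, so they could be folded into one helper. The sketch below is only an illustration: `loadCsv` is a hypothetical name, and the `utf8` encoding is an assumption about how the CSV files were saved. Using a `with` block also closes the file once it has been read.

```python
from csv import reader

def loadCsv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`."""
    # The with-block closes the file even if reading fails part-way.
    with open(path, encoding=encoding, newline='') as openedFile:
        rows = list(reader(openedFile))
    # The first row holds the column names, the rest are data rows.
    return rows[0], rows[1:]
```

With this helper, the cell above would reduce to `appleHeader, appleList = loadCsv('AppleStore.csv')` and `googleHeader, googleList = loadCsv('googleplaystore.csv')`.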

### Exploring Datasets:

This step identifies the fields of interest and explores the data in each dataset. It helps us understand the data we're going to work with and prepares for the data cleaning step.

*The function below is used to explore the datasets:*

In [2]:
def exploreData(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

#### Apple App Store:

In [3]:
print(appleHeader,end='\n\n')
exploreData(appleList, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


The interesting columns in the above dataset are `track_name`, `price`, `rating_count_tot`, and `prime_genre`. These columns are detailed in the Apple App Store dataset [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

#### Google Play Store:

In [4]:
print(googleHeader,end='\n\n')
exploreData(googleList, 0, 1, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The interesting columns in the above dataset are `App`, `Category`, `Rating`, `Reviews`, `Installs`, `Price`, and `Genres`.

A detailed explanation of the datasets can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) for the Apple App Store, and [here](https://www.kaggle.com/lava18/google-play-store-apps) for the Google Play Store.

## Data Cleaning
---
In this step, we're going to perform the following actions:
1. Detect inaccurate data, and correct or remove it.
2. Detect duplicate data, and remove the duplicates.
3. Remove non-English apps, like 爱奇艺PPS -《欢乐颂2》电视剧热播.
4. Remove apps that aren't free.

### Finding and Removing Errors:
Errors in a dataset are often reported in the discussion forum of the dataset's source.

#### Google Play Store Dataset:
The Google Play Store dataset has a discussion section (which can be found [here](https://www.kaggle.com/lava18/google-play-store-apps/discussion)). Based on findings from [this link](http://kaggle.com/lava18/google-play-store-apps/discussion/66015), the row at index 10472 is missing its `Category` value, which leaves it one column short.

In [5]:
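# enumerate() yields the position directly, so there is no need to
# search the list again with googleList.index(item)
for index, item in enumerate(googleList):
    if len(item) < 13:
        print('index ' + str(index) + '\n' + str(item))
        print('Length: ' + str(len(item)), end='\n\n')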

index 10472
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Length: 12



This row will be deleted. (The cell below must not be run more than once, because each run deletes whatever row is currently at index 10472.)

In [6]:
del(googleList[10472])

The row at index 10472 has now changed (the row formerly at index 10473 took its place):

In [7]:
print(googleList[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [8]:
exploreData(googleList,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10840
Number of columns: 13


The total number of rows has decreased from 10841 to 10840.

#### Apple App Store Dataset:
Since the Google Play Store dataset has one record with a missing column, we will also check the number of columns in each row of the Apple App Store dataset.

In [9]:
for item in appleList:
    if len(item) != 16:
        print(item)

The loop above printed nothing, which means every row has all 16 fields.
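
The length checks above, and the warning about deleting by index only once, suggest a variant that is safe to rerun: filter rows by their column count instead of deleting a fixed index. This is a minimal sketch rather than the notebook's method; `findMalformedRows` and `dropMalformedRows` are hypothetical helper names, and `sample` is made-up data.

```python
def findMalformedRows(dataset, expectedLength):
    """Return (index, row) pairs whose column count differs from expectedLength."""
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expectedLength]

def dropMalformedRows(dataset, expectedLength):
    """Return a new list holding only the well-formed rows."""
    return [row for row in dataset if len(row) == expectedLength]

# Tiny made-up dataset: the second row is missing one column.
sample = [['a', 'X', '1'], ['b', '2'], ['c', 'Y', '3']]
print(findMalformedRows(sample, 3))  # prints [(1, ['b', '2'])]

# Filtering twice gives the same result as filtering once.
cleaned = dropMalformedRows(dropMalformedRows(sample, 3), 3)
print(len(cleaned))  # prints 2
```

Applied to this notebook, the calls would be `dropMalformedRows(googleList, 13)` and `dropMalformedRows(appleList, 16)`.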

### Finding and Removing Duplicates:
If we look through the Google Play Store dataset, or look at the [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section, we'll notice that some apps have duplicate entries. For example, Instagram has 4 entries.



In [10]:
print(googleHeader,end='\n\n')
for app in googleList:
    if app[0] == 'Instagram':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


The records show that the only difference is in the number of reviews. These duplicates will not be removed at random; instead, the entry with the highest number of reviews will be kept and the rest will be removed.

First, we need to count the total number of duplicates:

In [11]:
googleDuplicateApps = []
googleUniqueApps = []

for app in googleList:
    if app[0] in googleUniqueApps:
        googleDuplicateApps.append(app[0])
    else:
        googleUniqueApps.append(app[0])

print('Number of duplicate applications:', len(googleDuplicateApps))

Number of duplicate applications: 1181


We see from the previous output that the Google Play Store dataset has many duplicate entries.

Examples of duplicates:

In [12]:
print(googleDuplicateApps[:15])

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


If we remove those duplicate records (while keeping only the one with the highest number of reviews), we will get the total below:

In [13]:
# Use the computed duplicate count instead of hard-coding 1181
print('Expected length after removing duplicates:',
      len(googleList) - len(googleDuplicateApps))

Expected length after removing duplicates: 9659


To remove duplicates, we will:
1. Create a dictionary with an application name as key, and highest number of reviews as value.
2. Create a new dataset for all the unique applications based on the previous dictionary.

In [14]:
dict = {}

for app in googleList:
    if app[0] not in dict:
        dict[app[0]] = float(app[3])
    elif float(app[3]) > dict[app[0]]:
        dict[app[0]] = float(app[3])

Comparing results:

In [15]:
print('Expected number of unique apps:', len(googleList) - len(googleDuplicateApps))
print('Actual number of unique apps:', len(dict))

Expected number of unique apps: 9659
Actual number of unique apps: 9659


From the above results, we can conclude that the dictionary holds exactly one entry per unique app, which is what we need for the next step.

In [16]:
print(dict['Instagram'])

66577446.0


The above result is an example for one application (whose records we analyzed earlier), showing that the stored reviews value is the highest one.

Now we create two empty lists, one for the end result of cleaned data, and one for comparison purposes. Then we loop through the applications in the main list and check two conditions. If both are true, the application is added to the cleaned list:
1. The number of reviews in the dictionary (the highest number of reviews) matches the number of reviews in the current app row.
2. The name of the app is not yet in the temporary comparison list. This keeps track of apps already added to the final cleaned list.

In [17]:
googleListClean = []
alreadyAdded = []

for app in googleList:
    name = app[0]
    numOfReviews = float(app[3])
    if dict[name] == numOfReviews and name not in alreadyAdded:
        googleListClean.append(app)
        alreadyAdded.append(name)

In [18]:
exploreData(googleListClean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


The number of rows for the cleaned data matches our expectation.
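
The two passes above (first the dictionary of highest review counts, then the filtered list) can also be merged into a single pass that stores the whole row instead of only the review count. The sketch below is an alternative under stated assumptions, not the notebook's method: `keepMostReviewed` is a hypothetical name, and the sample rows are shortened to four columns, with the `Box` row made up for illustration.

```python
def keepMostReviewed(dataset, nameIndex=0, reviewsIndex=3):
    """Return one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[nameIndex]
        # Strict '>' keeps the earlier row when review counts are tied.
        if name not in best or float(row[reviewsIndex]) > float(best[name][reviewsIndex]):
            best[name] = row
    return list(best.values())

# Shortened rows: name, category, rating, reviews.
sampleRows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]
for row in keepMostReviewed(sampleRows):
    print(row)
```

Rows come back ordered by the first appearance of each app name, so the order may differ from `googleListClean` even though the same rows are kept.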

### Finding non-English content
This step requires a function that checks the `ord()` value of each letter of an application name. This function will return `True` if the name has a maximum of 3 non-ASCII characters, otherwise, it will return `False`.

*ASCII function:*

In [19]:
def checkIfEnglish(string):
    counter = 0
    for letter in string:
        if ord(letter) > 127:
            counter += 1
            if counter > 3:
                return False
    return True

Below, we check the function using different application names.

In [20]:
print(checkIfEnglish('Instagram'))
print(checkIfEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(checkIfEnglish('Docs To Go™ Free Office Suite'))
print(checkIfEnglish('Instachat 😜'))

True
False
True
True
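
To see why the threshold of 3 lets names with a symbol or an emoji through, it can help to look at the actual count of non-ASCII characters in each name. The sketch below is an illustrative variant, not a replacement for `checkIfEnglish`: `countNonAscii` and `checkIfEnglishWithLimit` are hypothetical names, and the adjustable `maxNonAscii` parameter is an assumption added for experimentation.

```python
def countNonAscii(string):
    """Count the characters whose code point falls outside the ASCII range (0-127)."""
    return sum(1 for letter in string if ord(letter) > 127)

def checkIfEnglishWithLimit(string, maxNonAscii=3):
    """Return True if the name has at most maxNonAscii non-ASCII characters."""
    return countNonAscii(string) <= maxNonAscii

for name in ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, ':', countNonAscii(name))
```

With the default limit of 3, this variant gives the same answers as `checkIfEnglish` for the four names tested above.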


In [21]:
appleListEnglish = []
googleListEnglish = []

for app in appleList:
    if checkIfEnglish(app[1]):
        appleListEnglish.append(app)

for app in googleListClean:
    if checkIfEnglish(app[0]):
        googleListEnglish.append(app)

Now, we explore the new datasets:

In [22]:
exploreData(appleListEnglish,0,3,True)
print('\n\n')
exploreData(googleListEnglish,0,3,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16



['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Var

### Isolating free apps
We are interested only in free applications, because the company builds free apps and relies on advertisements for its revenue.

In [23]:
appleListFinal = []
googleListFinal = []

for app in appleListEnglish:
    if app[4] == '0.0':
        appleListFinal.append(app)

for app in googleListEnglish:
    if app[7] == '0':
        googleListFinal.append(app)
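
The comparisons above match the exact strings `'0.0'` and `'0'`. A slightly more tolerant check converts the price to a number first. This is a sketch under assumptions: `parsePrice` and `isFree` are hypothetical helpers, and the leading `$` on paid prices (for example `'$4.99'`) is an assumption about how the Google Play Store dataset formats them.

```python
def parsePrice(text):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    # Assumption: paid prices may carry a leading dollar sign.
    return float(text.strip().lstrip('$'))

def isFree(text):
    """Return True when the parsed price equals zero."""
    return parsePrice(text) == 0.0

for price in ['0', '0.0', '$4.99']:
    print(price, ':', isFree(price))
```

A price field that is not numeric at all would raise a `ValueError` here, which surfaces malformed rows instead of silently skipping them.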

Now, we explore our final datasets:

In [24]:
print(appleHeader,end='\n\n\n')
exploreData(appleListFinal,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


In [25]:
print(googleHeader,end='\n\n\n')
exploreData(googleListFinal,0,3,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


## Analysis
---
The analysis phase of this document concentrates on application profiles in both the Apple App Store and Google Play Store markets, in order to determine the kinds of applications that are likely to attract more users, since revenue is tied to the number of users.

The validation strategy for an application idea comprises 3 steps:
1. Build a minimal Android version of the application, and add it to the Google Play Store.
2. If the application gets a good response from users, the developers go ahead and develop it further.
3. If the application is profitable after 6 months, the developers build an iOS version of the application and add it to the Apple App Store.

### Common Genres
To find the most common genres in each market, we first identify the columns to build frequency tables from. For that, we'll create two functions:
1. A frequency table generator function, which computes the frequency of each value of a field in a dataset as a percentage.
2. A display table function, which displays the frequency table in descending order.

*Frequency Table Generator Function:*

In [26]:
def freqTable(dataset, index):
    tempDict = {}
    finalDict = {}
    total = len(dataset)
    for row in dataset:
        if row[index] not in tempDict:
            tempDict[row[index]] = 1
        else:
            tempDict[row[index]] += 1
    for key in tempDict:
        finalDict[key] = (tempDict[key]/total)*100
    return finalDict

*Display Frequency Table Function:*

In [27]:
def displayTable(dataset, index):
    table = freqTable(dataset, index)
    tableDisplay = []
    for key in table:
        keyValAsTuple = (table[key],key)
        tableDisplay.append(keyValAsTuple)
    tableSorted = sorted(tableDisplay, reverse=True)
    for entry in tableSorted:
        print(entry[1], ':', entry[0])
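
The same percentages can also be computed with `collections.Counter` from the standard library, which does the counting in one step. The sketch below is an alternative for comparison only: `freqTableCounter` and `sortedTable` are hypothetical names, and `sampleGenres` is made-up data.

```python
from collections import Counter

def freqTableCounter(dataset, index):
    """Return {value: percentage of rows holding that value in column `index`}."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: (count / total) * 100 for key, count in counts.items()}

def sortedTable(dataset, index):
    """Return (value, percentage) pairs, highest percentage first."""
    table = freqTableCounter(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)

# Made-up rows: app name, genre.
sampleGenres = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percentage in sortedTable(sampleGenres, 1):
    print(genre, ':', percentage)
```

Sorting with a `key` function avoids building the intermediate list of `(value, key)` tuples, although values with equal percentages may come out in a different order than in `displayTable`.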

*Apple Store - Prime Genre*

In [28]:
displayTable(appleListFinal, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


*Google Play Store Dataset - Category*

In [29]:
displayTable(googleListFinal, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

*Google Play Store Dataset - Genre*

In [30]:
displayTable(googleListFinal, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Keeping in mind that we're using the cleaned datasets, we conclude the following from the above:

Apple App Store:
1. Most Common Prime Genre: Games

Google Play Store:
1. Most Common Category: Family (a further check shows that these are mostly games for kids)
2. Most Common Genre: Tools (practical applications seem to have a better representation in this store compared to the Apple App Store)

For the Apple App Store, the recommended application profile is on the entertainment side.
As for the Google Play Store, the recommended application profile leans a little more towards the practical side.

### Number of Installs
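Next, we compute the average number of installs per genre. The `Installs` column will be used for the Google Play Store dataset. The Apple App Store dataset has no install counts, so `rating_count_tot` (the total number of user ratings) will be used as a proxy.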

**Apple App Store:** 

In this step, we will:
1. Isolate the applications of each genre.
2. Sum up the number of user ratings for the applications of that genre.
3. Divide the sum by the number of applications belonging to that genre (not by the total number of applications).

In [31]:
applePrimeGenre = freqTable(appleListFinal, 11)

In [32]:
for genre in applePrimeGenre:
    total=0      #stores the sum of number of user ratings
    lenGenre = 0 #stores the number of apps specific to each genre
    for app in appleListFinal:
        genreApp = app[11]
        if genreApp == genre:
            total += float(app[5])
            lenGenre += 1
    avgNumOfUserRating = total / lenGenre
    print(genre, ':', avgNumOfUserRating)

Productivity : 21028.410714285714
Social Networking : 71548.34905660378
Weather : 52279.892857142855
Catalogs : 4004.0
Medical : 612.0
News : 21248.023255813954
Reference : 74942.11111111111
Travel : 28243.8
Lifestyle : 16485.764705882353
Business : 7491.117647058823
Photo & Video : 28441.54375
Navigation : 86090.33333333333
Games : 22788.6696905016
Finance : 31467.944444444445
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Utilities : 18684.456790123455
Shopping : 26919.690476190477
Book : 39758.5
Music : 57326.530303030304
Health & Fitness : 23298.015384615384
Education : 7003.983050847458


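On average, Navigation and Social Networking apps have among the highest numbers of user ratings, which we use here as a proxy for installs. These averages are likely influenced by a few very popular apps, so we need to look into more options (e.g. the Reference genre).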
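
The nested loop above scans the whole dataset once per genre. The same averages can be collected in a single pass by keeping a running total and a running count for each genre. This is a minimal sketch, not the notebook's code: `averageByGroup` is a hypothetical name, and `sampleRatings` is made-up data with the genre in column 0 and the rating count in column 1.

```python
from collections import defaultdict

def averageByGroup(dataset, groupIndex, valueIndex):
    """Return {group: mean of column valueIndex over the rows in that group}."""
    totals = defaultdict(float)  # group -> sum of values
    counts = defaultdict(int)    # group -> number of rows
    for row in dataset:
        totals[row[groupIndex]] += float(row[valueIndex])
        counts[row[groupIndex]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up rows: genre, number of user ratings.
sampleRatings = [['Games', '100'], ['Games', '300'], ['Music', '50']]
print(averageByGroup(sampleRatings, 0, 1))  # prints {'Games': 200.0, 'Music': 50.0}
```

For the Apple App Store data in this notebook, the call would be `averageByGroup(appleListFinal, 11, 5)`.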

**Google Play Store:**

In [33]:
googleCategory = freqTable(googleListFinal, 1)

In [34]:
for item in googleHeader:
    print(googleHeader.index(item), ':', item)

0 : App
1 : Category
2 : Rating
3 : Reviews
4 : Size
5 : Installs
6 : Type
7 : Price
8 : Content Rating
9 : Genres
10 : Last Updated
11 : Current Ver
12 : Android Ver


In [38]:
tempDict2 = {}
tempList2 = []
for category in googleCategory:
    total = 0 # sum of installs specific to each category
    lenCategory = 0 # number of apps specific to each category
    for app in googleListFinal:
        categoryApp = app[1]
        if categoryApp == category:
            numOfInstalls = app[5]
            numOfInstalls = numOfInstalls.replace('+','')
            numOfInstalls = numOfInstalls.replace(',','')
            total += float(numOfInstalls)
            lenCategory += 1
    avgNumInstalls = total / lenCategory
    print(category, ':', avgNumInstalls)    
    tempDict2[avgNumInstalls] = category
    tempList2.append(avgNumInstalls)

GAME : 15588015.603248259
SPORTS : 3638640.1428571427
HOUSE_AND_HOME : 1331540.5616438356
FOOD_AND_DRINK : 1924897.7363636363
BOOKS_AND_REFERENCE : 8767811.894736841
MEDICAL : 120550.61980830671
ENTERTAINMENT : 11640705.88235294
ART_AND_DESIGN : 1986335.0877192982
LIBRARIES_AND_DEMO : 638503.734939759
FAMILY : 3695641.8198090694
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
DATING : 854028.8303030303
TRAVEL_AND_LOCAL : 13984077.710144928
MAPS_AND_NAVIGATION : 4056941.7741935486
NEWS_AND_MAGAZINES : 9549178.467741935
SOCIAL : 23253652.127118643
WEATHER : 5074486.197183099
EDUCATION : 1833495.145631068
BEAUTY : 513151.88679245283
PARENTING : 542603.6206896552
LIFESTYLE : 1437816.2687861272
PRODUCTIVITY : 16787331.344927534
EVENTS : 253542.22222222222
HEALTH_AND_FITNESS : 4188821.9853479853
TOOLS : 10801391.298666667
FINANCE : 1387692.475609756
VIDEO_PLAYERS : 24727872.452830188
COMMUNICATION : 38456119.167247

In [39]:
tempList2.sort()
for item in tempList2:
    print(tempDict2[item],':',item)

MEDICAL : 120550.61980830671
EVENTS : 253542.22222222222
BEAUTY : 513151.88679245283
PARENTING : 542603.6206896552
LIBRARIES_AND_DEMO : 638503.734939759
AUTO_AND_VEHICLES : 647317.8170731707
COMICS : 817657.2727272727
DATING : 854028.8303030303
HOUSE_AND_HOME : 1331540.5616438356
FINANCE : 1387692.475609756
LIFESTYLE : 1437816.2687861272
BUSINESS : 1712290.1474201474
EDUCATION : 1833495.145631068
FOOD_AND_DRINK : 1924897.7363636363
ART_AND_DESIGN : 1986335.0877192982
SPORTS : 3638640.1428571427
FAMILY : 3695641.8198090694
MAPS_AND_NAVIGATION : 4056941.7741935486
HEALTH_AND_FITNESS : 4188821.9853479853
WEATHER : 5074486.197183099
PERSONALIZATION : 5201482.6122448975
SHOPPING : 7036877.311557789
BOOKS_AND_REFERENCE : 8767811.894736841
NEWS_AND_MAGAZINES : 9549178.467741935
TOOLS : 10801391.298666667
ENTERTAINMENT : 11640705.88235294
TRAVEL_AND_LOCAL : 13984077.710144928
GAME : 15588015.603248259
PRODUCTIVITY : 16787331.344927534
PHOTOGRAPHY : 17840110.40229885
SOCIAL : 23253652.127118643
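
The cells above key `tempDict2` by the average itself, so two categories with exactly the same average would overwrite one another. The sketch below keeps `(category, average)` pairs instead and sorts them by the average. It is an illustration under assumptions: `parseInstalls` and `averageInstallsByCategory` are hypothetical names, and `sampleApps` is made-up data laid out like the Google Play Store rows (category at index 1, installs at index 5).

```python
def parseInstalls(text):
    """Convert an install bucket such as '10,000+' to the float 10000.0."""
    return float(text.replace('+', '').replace(',', ''))

def averageInstallsByCategory(dataset, categoryIndex=1, installsIndex=5):
    """Return (category, average installs) pairs, lowest average first."""
    totals = {}  # category -> sum of installs
    counts = {}  # category -> number of apps
    for row in dataset:
        category = row[categoryIndex]
        totals[category] = totals.get(category, 0.0) + parseInstalls(row[installsIndex])
        counts[category] = counts.get(category, 0) + 1
    averages = [(category, totals[category] / counts[category]) for category in totals]
    return sorted(averages, key=lambda pair: pair[1])

# Made-up rows; only the category and installs columns are filled in.
sampleApps = [
    ['A', 'TOOLS', '', '', '', '1,000+'],
    ['B', 'TOOLS', '', '', '', '3,000+'],
    ['C', 'SOCIAL', '', '', '', '10,000+'],
    ['D', 'EVENTS', '', '', '', '500+'],
]
for category, average in averageInstallsByCategory(sampleApps):
    print(category, ':', average)
```

Because the `Installs` values are open-ended buckets such as `10,000+`, these averages are lower bounds rather than exact install counts.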

Based on the above results, we can conclude that the application profile should lean towards the entertainment/games categories. For a more average level of use, the BOOKS_AND_REFERENCE category is also an option.