# Analyzing Mobile App Data For Profitable Apps

We are working as data analysts for a company that builds `free` Android and iOS mobile apps for an `English-speaking audience`. The company's main source of revenue is in-app ads, so the revenue for any given app is largely determined by its number of users.

Our goal for this project is to analyze the data to help the app developers understand what types of apps are likely to attract more users on Google Play and the App Store.

## Table of Contents

> #### 1. Data Sources
> #### 2. Open the datasets
> #### 3. Explore the datasets
> #### 4. Deleting wrong data
> #### 5. Removing duplicate entries: part 1
> #### 6. Removing Duplicate Entries: Part 2
> #### 7. Removing Non-English Apps: Part 1
> #### 8. Removing Non-English Apps: Part 2
> #### 9. Isolating the Free Apps
> #### 10. Most Common Apps by Genre: Part 1
> #### 11. Most Common Apps by Genre: Part 2
> #### 12. Most Common Apps by Genre: Part 3
> #### 13. Most Popular Apps by Genre on the App Store
> #### 14. Most Popular Apps by Genre on Google Play
> #### 15. Conclusion

---------------------

### Data Sources

In this project, we use two readily available data sets which are suitable for our goals:

1. A dataset containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from [Google Play data](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
2. A dataset containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from [App Store data](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

### Open the datasets

In [2]:
from csv import reader

### Opening the Apple App Store data ###
open_apple_file = open("AppleStore.csv", encoding="utf8")
apple_file_reader = reader(open_apple_file)
# convert the data to a list of lists
apple_apps_data = list(apple_file_reader)

### Opening the Google Play Store data ###
open_google_file = open("googleplaystore.csv", encoding="utf8")
google_reader = reader(open_google_file)
google_apps_data = list(google_reader)
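
The cell above leaves both files open after reading them. As a minimal sketch, assuming the same CSV files sit next to the notebook, a small helper can read a file inside a `with` block so that it is closed automatically. The name `load_csv` is ours and is not part of the original project.

```python
from csv import reader


def load_csv(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(path, encoding="utf8", newline="") as csv_file:
        return list(reader(csv_file))
```

With this helper, `apple_apps_data = load_csv("AppleStore.csv")` should give the same list of lists as the cell above.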

### Explore the datasets

In [3]:
def explore_data(dataset, start,end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row,"\n")
        
    if rows_and_columns:
        print("Number of rows:", len(dataset[1:]))
        print("Number of columns:", len(dataset[0]))     

In [4]:
### Explore the Apple Apps Store data ###

explore_data(apple_apps_data,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] 

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] 

Number of rows: 7197
Number of columns: 16


In [5]:
# get the column names of the Apple App Store data
applestore_header = apple_apps_data[0]
print(applestore_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Exploring the Apple App Store dataset above, we made the following findings:
* The Apple App Store dataset has `7197 app records`, stored in `16 columns`.
* The first row is the header, which names each column and hints at the type of data that column holds.

We printed the first 3 rows of the data set, where the first row is the header.
To answer the question behind our goal, we highlight the following columns of interest:

|Column|Description|
|:---|:---|
|track_name|App name|
|price|Price amount|
|rating_count_tot|User rating counts (for all versions)|
|rating_count_ver|User rating counts (for current version)|
|user_rating|Average user rating value (for all versions)|
|cont_rating|Content rating|
|prime_genre|App primary genre|

---------------------------

In [6]:
### Explore the Google Play Store data ###
explore_data(google_apps_data,0,3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] 

Number of rows: 10841
Number of columns: 13


In [7]:
# get the column names of the Google Play Store data
googletore_header = google_apps_data[0]
print(googletore_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Exploring the Google Play Store dataset above, we made the following findings:
* The Google Play Store dataset has `10841 app records`, stored in `13 columns`.
* The first row is the header, which names each column and hints at the type of data that column holds.

We printed the first 3 rows of the data set, where the first row is the header.
To answer the question behind our goal, we highlight the following columns of interest:

|Column|Description|
|:---|:---|
|App|Name of the app|
|Category|The category of the app|
|Reviews|Reviews made on the app|
|Installs|Total number of app installations|
|Type|Whether app is free or paid|
|Price|Price of the app|
|Genres|The app genre|

---------------------------

### Deleting wrong data

It is crucial that the data we analyze is clean. We will therefore perform the following steps:
* Detect inaccurate data, and correct or remove it.
* Detect duplicate data, and remove the duplicates.

From the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) we learn that there is an error at `row 10473` of the Google Play Store data: its rating is 19, which is outside the valid rating range of 1 to 5. Note that this index counts the header row as row 0. As the output below shows, the row has only 12 values instead of 13: the `Category` value is missing, so every later value is shifted one column to the left.

In [8]:
print(googletore_header,"\n")
# The row with inaccurate data (rating)
print(google_apps_data[10473])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [9]:
# remove the row with rating 19 at index 10473 (run this cell only once)
del google_apps_data[10473]

In [10]:
print(google_apps_data[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
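
The faulty row printed above has 12 values while the header has 13. As a sketch of how such rows could be found without relying on the discussion forum, the hypothetical helper `find_malformed_rows` below reports every row whose length differs from the header's. The sample data is made up for illustration.

```python
def find_malformed_rows(dataset):
    """Return (index, row length) for every row whose length differs from the header's."""
    expected = len(dataset[0])
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]


# toy data: the second data row is missing one value
sample = [
    ["App", "Category", "Rating"],
    ["Sketch - Draw & Paint", "ART_AND_DESIGN", "4.5"],
    ["Some Broken App", "1.9"],
]
print(find_malformed_rows(sample))  # [(2, 2)]
```

A length check like this only catches rows with missing or extra values. It would not catch a rating that is out of range in an otherwise complete row.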


### Removing duplicate entries: part 1

Going through this [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) we discover there are duplicate apps in the Google Play Store data.


In [11]:
# check duplicate of Instagram
count = 0
for app in google_apps_data:
    app_name = app[0]
    if app_name == "Instagram":
        print(app,"\n")
        count +=1
    
print("Instagram duplicates: ",count)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'] 

['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'] 

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'] 

['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'] 

Instagram duplicates:  4


From the output above, we see that `Instagram` appears `4 times`. This might be the case with other apps, so we need to check all the apps.

-------------

In [12]:
# loop through entire Google Play Store data to check duplicates
duplicate_apps = []
unique_apps = []
for app in google_apps_data[1:]:
    app_name = app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print("Unique apps:", len(unique_apps))
print("Duplicate apps:", len(duplicate_apps))

Unique apps: 9659
Duplicate apps: 1181


We observe that there are `1181 duplicate rows` in our Google Play Store data.

However, we will not remove these rows at random. If we look at the Instagram rows printed above, the main difference between them is the `Reviews` value, which indicates that the data was collected at different times. The higher the number of reviews, the more recent the data should be, so for each app we will keep the row with the highest number of reviews.
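
Checking `app_name in unique_apps` against a list gets slower as the list grows. One possible alternative, sketched here on made-up sample rows, counts the names with `collections.Counter` from the standard library.

```python
from collections import Counter

# toy data in the same layout as the Google Play rows (name at index 0)
sample = [
    ["Instagram", "SOCIAL"],
    ["Instagram", "SOCIAL"],
    ["Wikipedia", "BOOKS_AND_REFERENCE"],
    ["Instagram", "SOCIAL"],
]

name_counts = Counter(row[0] for row in sample)
n_unique = len(name_counts)
# every occurrence after the first one counts as a duplicate
n_duplicates = sum(count - 1 for count in name_counts.values())
print("Unique apps:", n_unique)         # Unique apps: 2
print("Duplicate apps:", n_duplicates)  # Duplicate apps: 2
```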

-------------

### Removing Duplicate Entries: Part 2

In [13]:
reviews_max = dict()
for app in google_apps_data[1:]:
    app_name = app[0]
    n_reviews = float(app[3])
    if app_name in reviews_max and reviews_max[app_name] < n_reviews:
        reviews_max[app_name] = n_reviews
    if app_name not in reviews_max:
        reviews_max[app_name] = n_reviews

In [14]:
# number of unique app names in the dictionary
print(len(reviews_max))

9659


In [15]:
# Using the reviews_max dictionary to remove duplicate rows
android_clean = []
already_added = []
for app in google_apps_data[1:]:
    app_name = app[0]
    n_reviews = float(app[3])
   
    if n_reviews == reviews_max[app_name] and app_name not in already_added:
        android_clean.append(app)
        already_added.append(app_name)
        

We use the supplementary condition `app_name not in already_added` above to account for cases where more than one entry of a duplicated app has the same highest number of reviews.
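
The two passes above can also be folded into one. The sketch below is an alternative under our own assumptions, not the method used in this project: the hypothetical helper `dedupe_by_max_reviews` keeps, for each name, the first row that has the highest review count. The second app in the sample is invented.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the earliest row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# toy rows: [name, category, rating, reviews]
sample = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Some Other App", "TOOLS", "4.0", "100"],
]
result = dedupe_by_max_reviews(sample)
print(len(result))   # 2
print(result[0][3])  # 66577446
```

The result is ordered by the first appearance of each name, so the row order can differ from `android_clean`, although the same rows should be kept.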

In [16]:
# explore the android_clean data
print(android_clean[:3])
print("\nNumber of rows:", len(android_clean))
print("Number of columns:", len(android_clean[0]))  

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]

Number of rows: 9659
Number of columns: 13


Now we are left with `9659 unique rows` in our dataset. However, since our `target audience is English-speaking`, we will also remove all apps that are not aimed at English speakers.

------------

### Removing Non-English Apps: Part 1

Since we are only interested in apps that are designed for an English-speaking audience, we will explore the datasets to check whether they contain any non-English apps and then remove them.

In [17]:
# some rows that contain non-English apps
print(apple_apps_data[814][1])
print(apple_apps_data[6732][1])
print("\n")
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


Above, we see that the apps at `index 814 and 6732` of the Apple App Store dataset are `non-English apps`. The same is true of the apps at `index 4412 and 7940` of the `android_clean` dataset.

There are possibly many more of these non-English apps in our datasets. We will proceed to remove them.

------------

In [18]:
# get the corresponding number for each character.
print(ord("a"))
print(ord("A"))
print(ord("爱"))
print(ord("5"))
print(ord("+"))

97
65
29233
53
43


The numbers corresponding to the characters we commonly use in English text are all in the range 0 to 127, according to the [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange) system.

We will use this number range to build a function that detects whether a character belongs to the set of common English characters or not.

------------

In [19]:
# take a string and determine whether all its characters are in the ASCII range
def char_check(string):
    for character in string:
        if ord(character) >127:
            return False
    return True

In [20]:
print(char_check('Instagram'))
print(char_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(char_check('Docs To Go™ Free Office Suite'))
print(char_check('Instachat 😜'))

True
False
False
False


Above we wrote a function that takes in a string and `returns True if it contains only common English characters` and `False otherwise`. The function confirms that `Instagram is an English app`, while the second print statement returns False because the name contains characters that are not common English characters.

However, `"Docs To Go™ Free Office Suite"` and `"Instachat 😜"` are wrongly flagged as non-English apps. We will need to find a way around this.
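
For reference, on Python 3.7 or later the same all-ASCII test is available as the built-in string method `str.isascii()`. The wrapper name `char_check_builtin` below is ours, used only to mirror `char_check`.

```python
def char_check_builtin(string):
    """Return True if every character in the string is ASCII (code point 0 to 127)."""
    return string.isascii()


print(char_check_builtin("Instagram"))     # True
print(char_check_builtin("Instachat 😜"))  # False
```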

------------

### Removing Non-English Apps: Part 2

Looking at the step above, we can find out why our function didn't recognize some English apps.

In [21]:
print(ord("™"))
print(ord("😜"))

8482
128540


We see that the `emoji and ™` characters fall outside the ASCII range, with corresponding numbers `over 127`. This implies that using our function above will lead to data loss, as many English apps will be incorrectly labeled as non-English.

We can minimize the impact of data loss by removing only apps whose `name has more than three characters with corresponding numbers falling outside the ASCII range`. With this, all English apps with up to three emojis or other special characters will still be labeled as English. This is not perfect but `it's fairly effective.`

---------

In [22]:
# label an app as non-English if it has more than 3 characters outside the ASCII range
def english_app(string):
    non_english = 0
    for character in string:
        if ord(character) >127:
            non_english += 1
    if non_english > 3:
        return False
    else:
        return True

In [23]:
print(english_app("Docs To Go™ Free Office Suite"))
print(english_app("Instachat 😜"))
print(english_app("爱奇艺PPS -《欢乐颂2》电视剧热播"))

True
True
False


Now we get a better result. We will then use this function to filter out all the non-English apps from the datasets.
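
The cut-off of three characters is a judgment call. As a sketch, the hypothetical variant `is_english` below takes the cut-off as a parameter (`max_non_ascii`, our name), which makes it easy to see how the choice changes the result.

```python
def is_english(name, max_non_ascii=3):
    """Return True if the name has at most `max_non_ascii` characters above code point 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_english("Docs To Go™ Free Office Suite"))  # True
print(is_english("Instachat 😜", max_non_ascii=0))   # False
print(is_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))     # False
```

With `max_non_ascii=0` the function behaves like the strict all-ASCII check from Part 1.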

-----------

In [24]:
android_english = []
apple_english = []

# filter all non-English apps from the android_clean dataset
for app in android_clean:
    app_name = app[0]
    if english_app(app_name):
        android_english.append(app)

# filter all non-English apps from the apple apps store dataset
# we start looping from index 1 since the dataset contains the header
for app in apple_apps_data[1:]:
    app_name = app[1]
    if english_app(app_name):
        apple_english.append(app)

In [25]:
# We explore the apple_english data
print(apple_english[:3])
print("\nNumber of rows:", len(apple_english))
print("Number of columns:", len(apple_english[0]))

[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]

Number of rows: 6183
Number of columns: 16


In [26]:
# We explore the android_english data
print(android_english[:3])
print("\nNumber of rows:", len(android_english))
print("Number of columns:", len(android_english[0]))

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]

Number of rows: 9614
Number of columns: 13


From the output above, we can see that `apple_apps_data`, which had 7197 rows, has been cleaned, and we are now left with `6183 English apps` in the `apple_english` dataset.

Also, we initially had 9659 rows in the `android_clean` dataset. After removing non-English apps, we are left with `9614 English apps` in the `android_english` dataset.

----------

### Isolating the Free Apps

We have cleaned our data by removing inaccurate data, duplicate entries and non-English apps.

However, we are only interested in `free apps`, as our company only builds apps that are `free to download and install`.

In [27]:
android_free = []
apple_free = []

# keep only the free apps from the apple_english dataset
for app in apple_english:
    price = app[4]
    if price == '0.0':
        apple_free.append(app)

# keep only the free apps from the android_english dataset
for app in android_english:
    price = app[7]
    if price == '0':
        android_free.append(app)

In [28]:
print("\nNumber of Android Free apps:", len(android_free))
print("\nNumber of Apple Free apps:", len(apple_free))


Number of Android Free apps: 8864

Number of Apple Free apps: 3222


From the result above, we have filtered out the paid apps and are now left with
> * 8864 free Android apps stored in the `android_free` dataset
> * 3222 free iOS apps stored in the `apple_free` dataset.

We will use these datasets for our analysis.
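
The filter above compares price strings directly (`'0.0'` for the App Store and `'0'` for Google Play), so it depends on how each file writes a zero price. A more tolerant sketch converts the text to a number first. The helper name `is_free` is ours, and the leading `$` on paid prices is an assumption about the data rather than something shown in the output above.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' represents zero."""
    # the leading '$' is an assumption about how paid prices may be written
    return float(price_text.strip().lstrip("$")) == 0.0


print(is_free("0"))      # True
print(is_free("0.0"))    # True
print(is_free("$4.99"))  # False
```

The function raises `ValueError` for text that is not a number, which would make malformed price values visible instead of silently dropping them.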

-------------

### Most Common Apps by Genre: Part 1

Now we turn to our goal of determining the kinds of apps that are likely to attract more users, since the number of people using our apps affects the company's revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. A profile that works well for both markets might be a productivity app.

We will thus build a `frequency table` to determine the most common genres in each dataset. Looking at the dataset headers above, the genre is in the `prime_genre` column (index 11) of the Apple App Store dataset, and in the `Category` and `Genres` columns (index 1 and 9 respectively) of the Google Play Store dataset.

### Most Common Apps by Genre: Part 2

In [29]:
# function for generating frequency tables
def freq_table(dataset, index):
    genre_freq_table = dict()
    app_total = 0
    
    for app in dataset:
        app_genre = app[index]
        app_total += 1
        if app_genre in genre_freq_table:
            genre_freq_table[app_genre] += 1
        else:
            genre_freq_table[app_genre] = 1

    # convert frequencies to percentages
    percentage_table = dict()
    for key in genre_freq_table:
        percentage = (genre_freq_table[key] / app_total) * 100
        percentage = round(percentage, 4)
        percentage_table[key] = percentage

    return percentage_table

# function to print a percentage table sorted in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
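
As an alternative sketch, the counting and sorting can be done with `collections.Counter` from the standard library. The function name `freq_table_counter` and the sample rows are ours, for illustration only.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, ordered from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {
        value: round(count / total * 100, 4)
        for value, count in counts.most_common()
    }


# toy rows: only the genre column (index 0) matters here
sample = [["Games"], ["Games"], ["Games"], ["Education"]]
print(freq_table_counter(sample, 0))  # {'Games': 75.0, 'Education': 25.0}
```

When two values have the same count, `most_common()` keeps them in first-seen order, whereas `display_table` breaks such ties by name, so the order of tied entries can differ.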

### Most Common Apps by Genre: Part 3

In [30]:
# display the frequency table of prime_genre column of Apple apps
display_table(apple_free, 11)

Games : 58.1626
Entertainment : 7.8833
Photo & Video : 4.9659
Education : 3.6623
Social Networking : 3.2899
Shopping : 2.6071
Utilities : 2.514
Sports : 2.1415
Music : 2.0484
Health & Fitness : 2.0174
Productivity : 1.7381
Lifestyle : 1.5829
News : 1.3346
Travel : 1.2415
Finance : 1.1173
Weather : 0.869
Food & Drink : 0.807
Reference : 0.5587
Business : 0.5276
Book : 0.4345
Navigation : 0.1862
Medical : 0.1862
Catalogs : 0.1241


From the result, we notice that the `Games` genre constitutes `58.1626%` of all the free English apps, which is more than half of them. `Entertainment` apps follow with close to `8%`, then `Photo & Video with close to 5%` and `Education with close to 4%`.

Going by this result, we can say that apps developed for fun (Games, Entertainment, Photo & Video) dominate the free English segment of the App Store. Note that a genre having the most apps does not by itself mean it has the most users.

Let's examine the Android apps too.

In [31]:
# display the frequency table of Category column of Google apps
display_table(android_free, 1)

FAMILY : 18.9079
GAME : 9.7247
TOOLS : 8.4612
BUSINESS : 4.5916
LIFESTYLE : 3.9034
PRODUCTIVITY : 3.8921
FINANCE : 3.7004
MEDICAL : 3.5311
SPORTS : 3.3958
PERSONALIZATION : 3.3168
COMMUNICATION : 3.2378
HEALTH_AND_FITNESS : 3.0799
PHOTOGRAPHY : 2.9445
NEWS_AND_MAGAZINES : 2.7978
SOCIAL : 2.6625
TRAVEL_AND_LOCAL : 2.3353
SHOPPING : 2.245
BOOKS_AND_REFERENCE : 2.1435
DATING : 1.8615
VIDEO_PLAYERS : 1.7938
MAPS_AND_NAVIGATION : 1.3989
FOOD_AND_DRINK : 1.241
EDUCATION : 1.162
ENTERTAINMENT : 0.9589
LIBRARIES_AND_DEMO : 0.9364
AUTO_AND_VEHICLES : 0.9251
HOUSE_AND_HOME : 0.8236
WEATHER : 0.801
EVENTS : 0.7107
PARENTING : 0.6543
ART_AND_DESIGN : 0.6431
COMICS : 0.6205
BEAUTY : 0.5979


From the result, we notice that unlike in the iOS dataset, where the Games genre made up more than half of the free English apps, the `FAMILY` category tops the Android apps market with close to `19%`. It is followed by `GAME` apps with close to `10%`, `TOOLS with close to 8%` and `BUSINESS with close to 5%`.

Finally, let's examine the Android `Genres` column.

In [32]:
# display the frequency table of Genres column of Google apps
display_table(android_free, 9)

Tools : 8.4499
Entertainment : 6.0695
Education : 5.3475
Business : 4.5916
Productivity : 3.8921
Lifestyle : 3.8921
Finance : 3.7004
Medical : 3.5311
Sports : 3.4634
Personalization : 3.3168
Communication : 3.2378
Action : 3.1024
Health & Fitness : 3.0799
Photography : 2.9445
News & Magazines : 2.7978
Social : 2.6625
Travel & Local : 2.324
Shopping : 2.245
Books & Reference : 2.1435
Simulation : 2.042
Dating : 1.8615
Arcade : 1.8502
Video Players & Editors : 1.7712
Casual : 1.7599
Maps & Navigation : 1.3989
Food & Drink : 1.241
Puzzle : 1.1282
Racing : 0.9928
Role Playing : 0.9364
Libraries & Demo : 0.9364
Auto & Vehicles : 0.9251
Strategy : 0.9138
House & Home : 0.8236
Weather : 0.801
Events : 0.7107
Adventure : 0.6769
Comics : 0.6092
Beauty : 0.5979
Art & Design : 0.5979
Parenting : 0.4964
Card : 0.4513
Casino : 0.4287
Trivia : 0.4174
Educational;Education : 0.3949
Board : 0.3836
Educational : 0.3723
Education;Education : 0.3384
Word : 0.2595
Casual;Pretend Play : 0.2369
Music : 0.20

From the result, we get a more granular picture of the Android app genres. Here the `Tools` genre tops the list with close to `8%`. It is followed by `Entertainment` apps with close to `6%`, `Education with close to 5%` and `Business with close to 5%`.

----------------

### Most Popular Apps by Genre on the App Store

In the last section, we saw that apps designed for fun dominate the App Store, while Google Play shows a more balanced landscape of both practical and fun apps.

Now, we will determine the kinds of apps that have the most users.

We will start with the App Store dataset. Since this dataset does not contain an Installs column, we will use the total number of user ratings (the `rating_count_tot` column) as a proxy.

In [33]:
# calculate average number of user ratings per app genre on the App Store

# generate frequency table for the prime_genre to get the unique app genres
genre_freq = freq_table(dataset=apple_free, index=11)

for genre in genre_freq:
    # stores the sum of the number of user ratings specific to each genre.
    total = 0 
    # store the number of apps specific to each genre.
    len_genre = 0
    
    for app in apple_free:
        genre_app = app[11]
        if genre_app == genre:
            rating_count = float(app[5])
            total += rating_count
            len_genre += 1
    avg_rating = total / len_genre
    print( genre, ":" ,avg_rating)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


From the averages above, `Navigation apps` have the highest average number of user ratings `(86090)`. The `Waze - GPS Navigation, Maps & Real-time Traffic` app (with `345046` user ratings) drives much of this value. At first glance, this makes Navigation a good profile recommendation for the App Store.

Also, we see that `Reference` apps are second, with an average of `74942` user ratings.
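
The loop in the cell above scans `apple_free` once for every genre. A single-pass sketch, using `collections.defaultdict` and a hypothetical helper name `average_by_genre`, collects the totals and counts for all genres together. The sample rows are made up.

```python
from collections import defaultdict


def average_by_genre(dataset, genre_index, value_index):
    """Return {genre: average of the numeric column} using a single pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        genre = row[genre_index]
        totals[genre] += float(row[value_index])
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


# toy rows: [genre, number of ratings]
sample = [["Navigation", "100"], ["Navigation", "300"], ["Medical", "50"]]
print(average_by_genre(sample, 0, 1))  # {'Navigation': 200.0, 'Medical': 50.0}
```

For the App Store data, the call would be `average_by_genre(apple_free, 11, 5)`.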

------------------

In [34]:
# print number of ratings for  Navigation apps
for app in apple_free:
    if app[11] == "Navigation":
        print(app[1], ":", app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


This suggests that Navigation apps are the most popular apps.
However, only two apps (Waze and Google Maps) are pushing this value up. If we removed these two apps, the average would fall sharply.
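
To put a number on the effect of Waze and Google Maps, the short calculation below uses the six rating counts printed above. It compares the mean of all six apps with the mean of the remaining four and with the median, which is less sensitive to a few very large values.

```python
from statistics import median

# rating counts of the six Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

mean_all = sum(navigation_ratings) / len(navigation_ratings)
# drop the two largest values (Waze and Google Maps)
without_top_two = sorted(navigation_ratings)[:-2]
mean_rest = sum(without_top_two) / len(without_top_two)

print(round(mean_all, 2))          # 86090.33
print(mean_rest)                   # 4146.25
print(median(navigation_ratings))  # 8196.5
```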

At this point, we will explore the Reference apps, which had the second highest average number of user ratings. This might give us a clearer picture of app popularity in the App Store.

----------

In [35]:
# print number of ratings for Reference apps
for app in apple_free:
    if app[11] == "Reference":
        print(app[1], ":", app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


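This result shows a somewhat less skewed trend, although `Bible` and `Dictionary.com` still account for most of the ratings. Several of the remaining apps are guides for games such as Minecraft and Pokémon GO, which aligns with the idea above that fun apps dominate the App Store. Therefore, we could see Reference apps as a better recommendation for the App Store market.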

-----------

### Most Popular Apps by Genre on Google Play

For the Google Play data set, we can find information about the most popular apps (those with the most users) in the Installs column.

In [36]:
# the Installs column with its frequencies
display_table(android_free, 5)

1,000,000+ : 15.7265
100,000+ : 11.5523
10,000,000+ : 10.5483
10,000+ : 10.1986
1,000+ : 8.3935
100+ : 6.9156
5,000,000+ : 6.8254
500,000+ : 5.5618
50,000+ : 4.7721
5,000+ : 4.5126
10+ : 3.5424
500+ : 3.2491
50,000,000+ : 2.3014
100,000,000+ : 2.1322
50+ : 1.9179
5+ : 0.7897
1+ : 0.5077
500,000,000+ : 0.2708
1,000,000,000+ : 0.2256
0+ : 0.0451
0 : 0.0113


Although the values in the Installs column end with a plus sign (for example, 100,000+) and so are not precise, we're going to leave the numbers as they are. This means that we'll consider an app with 100,000+ installs to have 100,000 installs, an app with 1,000,000+ installs to have 1,000,000 installs, and so on.

-----------

In [37]:
# Calculate average number of apps installs

# generate frequency table for the Category to get the unique app genres
genre_freq = freq_table(dataset=android_free, index=1)

for genre in genre_freq:
    # stores the sum of installs specific to each genre.
    total = 0 
    # stores the number of apps specific to each genre.
    len_genre = 0
    
    for app in android_free:
        category_app = app[1]
        if category_app == genre:
            n_installs = app[5]
            # Remove any + or , character from the string.
            n_installs = n_installs.replace(",", "")
            n_installs = n_installs.replace("+", "")
            n_installs = float(n_installs)
            total += n_installs
            len_genre += 1
            
    avg_installs = total / len_genre
    print(genre, ":", avg_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

From the averages above, `COMMUNICATION` apps have the highest average number of installs `(38456119)`. However, we should check whether all communication apps have this many installs, or whether the value is skewed by a few giant apps with billions of installs.

In [40]:
# print number of Installs for  Communication apps
for app in android_free:
    if app[1] == "COMMUNICATION":                            
        print(app[0], ":", app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

In [None]:
# print number of Installs for the giant Communication apps
for app in android_free:
    if app[1] == "COMMUNICATION" and (app[5] == "1,000,000,000+" or app[5] == "500,000,000+" or app[5] == "100,000,000+"):                            
        print(app[0], ":", app[5])

The result shows exactly what we suspected above. A few giant communication apps like `WhatsApp Messenger`, `Messenger`, `Skype`, and `Google Chrome: Fast & Secure` are responsible for the high average number of installs in this category on the Google Play market.
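
One way to see how much the giants matter is to recompute the category average while leaving out apps at or above some install cap. The sketch below is only an illustration: the helper `average_installs`, its `max_installs` parameter, and the sample rows are all ours, and the cap of 100,000,000 is an arbitrary choice.

```python
def average_installs(rows, category, max_installs=None):
    """Average installs for one category, optionally ignoring apps at or above a cap."""
    values = []
    for row in rows:
        if row[1] != category:
            continue
        n_installs = float(row[5].replace(",", "").replace("+", ""))
        if max_installs is None or n_installs < max_installs:
            values.append(n_installs)
    return sum(values) / len(values) if values else 0.0


# toy rows with only the columns used here filled in (index 1 and index 5)
sample = [
    ["Giant App", "COMMUNICATION", "", "", "", "1,000,000,000+"],
    ["Small App A", "COMMUNICATION", "", "", "", "1,000,000+"],
    ["Small App B", "COMMUNICATION", "", "", "", "5,000,000+"],
]
print(average_installs(sample, "COMMUNICATION"))               # dominated by the giant app
print(average_installs(sample, "COMMUNICATION", 100_000_000))  # 3000000.0
```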

Since in our project we seek popularity in both the App Store and Google Play, it will be interesting to explore the reference apps on Google Play, since this genre has already shown great popularity in the App Store.

In [41]:
# print number of Installs for Books and Reference apps
for app in android_free:
    if app[1] == "BOOKS_AND_REFERENCE":                            
        print(app[0], ":", app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

This result gives us a better view. Looking at the result, we see a less skewed trend with a high number of installs. This means BOOKS_AND_REFERENCE apps are also very popular on the Google Play market. Since this popularity holds in both the App Store and the Google Play market, we can consider recommending book apps to our company.

-----------

### Conclusion

Since book and reference apps are highly popular in both the App Store and the Google Play market, we recommend book apps to our company. To stand out in this competitive genre, we also recommend adding attractive features that set the app apart.