# Profitable App Profiles for the App Store and Google Play Markets
## Introduction
In this scenario, we act as data analysts for a company that builds Android and iOS mobile apps. The project aims to support data-driven decisions about the mobile app market.

The company builds apps that are free to download and install, and makes them available on Google Play and the App Store. Its main source of revenue is in-app ads, so revenue depends largely on the number of users who see and interact with those ads. The goal of this project is to analyze data to help developers understand which types of apps are likely to attract the most users and therefore be the most profitable.

## Data Collection and Exploration
As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over 4 million apps would require a significant amount of time and money, so a sample of the data is analyzed instead. Two data sets available on Kaggle are used for the analysis:

* A [data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data of about 10,000 Android apps from Google Play collected in August 2018
* A [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) containing data of about 7,000 iOS apps from the App Store collected in July 2017

In [6]:
# Import dependencies
from csv import reader

# Read App Store data set (the with block closes the file after reading)
with open('Resources/AppleStore.csv', encoding='utf-8') as open_file_apple:
    ios = list(reader(open_file_apple))
ios_head = ios[0]
ios_data = ios[1:]

# Read Google Play data set
with open('Resources/googleplaystore.csv', encoding='utf-8') as open_file_android:
    android = list(reader(open_file_android))
android_head = android[0]
android_data = android[1:]

In [7]:
# Exploring both data sets by creating a function
# that displays desired rows of data
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [8]:
# Exploring ios data set
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


In the iOS data set, the first row contains the column names. The columns of interest for this project are:

* 'track_name' (app name)
* 'currency' (currency type)
* 'price' (app price)
* 'rating_count_tot' (total rating count)
* 'prime_genre' (app genre)

Furthermore, the iOS data set contains 7,198 rows (one header row plus 7,197 apps) and 16 columns describing each app.



In [9]:
# Exploring android data set
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


In the Android data set, the first row contains the column names. The columns of interest for this project are:

* 'App' (app name)
* 'Category' (app category)
* 'Rating' (overall app rating)
* 'Reviews' (number of user reviews)
* 'Installs' (number of app installs)
* 'Type' (paid or free app)
* 'Price' (app price)
* 'Genres' (app genre)

Furthermore, the Android data set contains 10,842 rows (one header row plus 10,841 apps) and 13 columns describing each app.



## Data Cleaning
### Removing Inaccurate Data
The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and in [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), an error was reported for a certain row (entry 10472) in the Android data set.

In [10]:
# Preview the row that may contain errors, followed by
# a row with correct entries for comparison
print(android_head)
print('\n')
print(android_data[10472])
print('\n')
print(android_data[10471])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


In the comparison above (header row, row 10472, and row 10471), row 10472 shows '1.9' in the 'Category' column rather than a text category such as 'PERSONALIZATION' in row 10471. It also shows a 'Rating' of '19', which is impossible because app ratings go up to a maximum of 5. The row holds only 12 values against 13 column names: the category value is missing, so every later value has shifted one column to the left (the empty string, for example, belongs under 'Genres' but sits under 'Content Rating'). Row 10472 therefore contains misplaced data and needs to be removed from the data set.
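
A shifted row like this one can also be found without relying on a discussion thread, because it has fewer values than the header. The following is a minimal sketch of that check; the function name `find_malformed_rows` is hypothetical and not part of the original notebook, and the two sample rows are copied from the output above.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [index for index, row in enumerate(rows) if len(row) != expected]


# Header and two rows copied from the output above: one shifted, one well-formed
header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
          'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
          'Android Ver']
sample_rows = [
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+',
     'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'],
    ['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+',
     'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0',
     '4.0.3 and up'],
]

print(find_malformed_rows(header, sample_rows))  # [0]
```

Applied to the full data as `find_malformed_rows(android_head, android_data)`, the same call would list every row with a missing or extra value.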

In [11]:
# Omit the row that contains bad data entries
print('# of rows before deletion: ', len(android_data))
del android_data[10472]
print('# of rows after deletion: ', len(android_data))

# of rows before deletion:  10841
# of rows after deletion:  10840


### Removing Duplicate App Entries

Further inspection of the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) shows that the Android data set contains duplicate entries. For example, there are four entries for the 'Instagram' app.

In [12]:
# Locate all rows that contain 'Instagram' app
print(android_head)
print('\n')

for app in android_data:
    name = app[0]
    if name == 'Instagram':
        print(app)
        print('\n')

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




In [13]:
# Locate all duplicate entries
duplicate_android_apps = []
unique_android_apps = []

for app in android_data:
    name = app[0]
    if name in unique_android_apps:
        duplicate_android_apps.append(name)
    else:
        unique_android_apps.append(name)
        
print('# of duplicate android apps: ', len(duplicate_android_apps))
print('# of unique android apps: ', len(unique_android_apps))

# of duplicate android apps:  1181
# of unique android apps:  9659


In [14]:
# Locate all duplicate iOS entries, identified by app id (column 0)
duplicate_ios_apps = []
unique_ios_apps = []

for app in ios_data:
    app_id = app[0]
    if app_id in unique_ios_apps:
        duplicate_ios_apps.append(app_id)
    else:
        unique_ios_apps.append(app_id)
        
print('# of duplicate iOS apps: ', len(duplicate_ios_apps))
print('# of unique iOS apps: ', len(unique_ios_apps))

# of duplicate iOS apps:  0
# of unique iOS apps:  7197


Finding the unique Android apps is not enough when duplicates show conflicting data. For example, the duplicate 'Instagram' entries above are identical except for the number of reviews in the 'Reviews' column.

To avoid random selection, the entry with the highest review count is kept, on the reasoning that the more reviews an entry has, the more reliable its rating should be.

In [15]:
# Create an empty dictionary
reviews_max = {}

# Loop through android data set
for app in android_data:
    # Assign app name
    name = app[0]
    # Convert reviews into float data type
    n_reviews = float(app[3])
    # If current app is in the dictionary and
    # the current review count is greater than the previous review count,
    # Update to the current review count
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    # If current app is not in the dictionary, add the app and review count to it
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [16]:
# Count the # of unique apps with max # of review counts
print('# of unique android apps with updated review counts: ', len(reviews_max))

# of unique android apps with updated review counts:  9659


Using the dictionary created above, we can update the Android data so that it contains only one entry per app.

To remove the duplicates from *android_data*, two new lists are created: *android_clean* for the cleaned version of *android_data*, and *already_added* for keeping track of the apps that have been added.

While looping through the *android_data* rows, we compare the number of reviews in the current row with the maximum stored for the same app in the *reviews_max* dictionary.

The *already_added* list keeps track of the apps already added, which avoids keeping duplicates when several entries of an app share the same maximum number of reviews.
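
The same rule (keep one row per app, the one with the most reviews) can also be sketched in a single pass by storing whole rows in a dictionary. This is an illustrative alternative rather than the notebook's method: the function name `keep_most_reviewed` and its parameters are hypothetical, and the sample rows are trimmed to the first four columns of rows shown earlier.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Store the row if the app is new or this row has strictly more reviews
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Rows trimmed to App, Category, Rating, Reviews
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

for row in keep_most_reviewed(sample):
    print(row)
```

One difference in behavior: the result is ordered by the first appearance of each app name, whereas the two-list approach orders rows by the position of the row that is kept. A dictionary lookup also avoids scanning a growing list for every row.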

In [17]:
# Create two empty lists: one for cleaned data set, one for storing app names
android_clean = []
already_added = []

# Loop through android data set
for app in android_data:
    # Assign app name
    name = app[0]
    # Convert reviews into float data type
    n_reviews = float(app[3])
    # Keep the row if it holds the maximum review count and the app has not been added yet
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

In [18]:
# Exploring android data set
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


### Removing non-English Apps
Both app stores, Google Play and the App Store, contain apps whose names include non-English characters. For the scope of this project, the company creates apps directed only toward an English-speaking audience.

Characters commonly used in English text are all covered by [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange), which assigns them code numbers in the range 0 to 127.

However, some English app names contain characters outside the ASCII range, such as emoji and symbols like ™. If every app with such a character were removed, we would lose a lot of useful data, since many English apps would be incorrectly labeled as non-English. Although not perfect, a workable rule is to treat an app as English as long as its name contains no more than three non-ASCII characters.
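
To make the criterion concrete, the short sketch below counts the characters in a name whose code number is above 127, using Python's built-in `ord`. The helper name `count_non_ascii` is hypothetical and only serves as an illustration.

```python
def count_non_ascii(text):
    """Count the characters whose code number falls outside the ASCII range."""
    return sum(1 for character in text if ord(character) > 127)


print(ord('a'))   # 97, inside the ASCII range
print(ord('™'))   # 8482, outside the ASCII range

for name in ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, '->', count_non_ascii(name))
```

Each of these three names contains at most one non-ASCII character, so a limit of three keeps them all, while a name written mostly in another script exceeds the limit.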

In [19]:
# Create a function that decides whether an app name is English
# by counting characters whose code number falls outside the ASCII range
def is_english(app_name):
    not_english = 0
    for character in app_name:
        # Code numbers greater than 127 are outside the ASCII range
        if ord(character) > 127:
            not_english += 1
    # Allow up to three non-ASCII characters (emoji, ™, etc.)
    return not_english <= 3

In [20]:
# Test the is_english function on sample app names
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True


In [21]:
# Create a new list to hold english android apps
english_android_app = []

# Loop through android_clean data set
for app in android_clean:
    # Assign app name
    name = app[0]
    # Update the empty lists if app name is english
    if is_english(name):
        english_android_app.append(app)

# Exploring android data set
explore_data(english_android_app, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


In [22]:
# Create a new list to hold english ios apps
english_ios_app = []

# Loop through the ios data set
# (ios still includes the header row; it passes the check here
# and is dropped later by the price filter)
for app in ios:
    # Assign app name
    name = app[1]
    # Update the empty list if app name is english
    if is_english(name):
        english_ios_app.append(app)

# Exploring ios data set
explore_data(english_ios_app, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6184
Number of columns: 16


### Isolating Free Apps
As mentioned in the introduction, the company only builds free apps, so we isolate the free apps in both the Google Play and App Store data sets.
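
The two data sets write a zero price differently ('0' on Google Play, '0.0' on the App Store), so each filter needs its own string comparison. As a hedged alternative, the sketch below compares prices numerically. The helper name `is_free` is hypothetical, and the '$4.99' format for paid apps is an assumption, since no paid row appears in the output shown here.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    # Assumption: paid prices may carry a leading '$', e.g. '$4.99'
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Non-numeric text, such as a header cell, is not treated as free
        return False


print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
print(is_free('price'))  # False
```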

In [23]:
# Create a new list to hold free android apps
android_free = []

# Loop through previous english_android_app data set
for app in english_android_app:
    # Assign app price
    price = app[7]
    # Update the empty list if app is free
    if price == '0':
        android_free.append(app)

# Exploring android data set
explore_data(android_free, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


In [24]:
# Create a new list to hold free ios apps
ios_free = []

# Loop through previous english_ios_app data set
for app in english_ios_app:
    # Assign app price
    price = app[4]
    # Update the empty list if app is free
    if price == '0.0':
        ios_free.append(app)

# Exploring ios data set
explore_data(ios_free, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


## Data Analysis
So far, we have cleaned both the Android and iOS data sets by removing inaccurate data, removing duplicate app entries, removing non-English apps, and isolating the free apps. We can now analyze the clean data sets to address the project goal: determining the types of apps that are most profitable and attractive to the majority of mobile app users.

The validation strategy for an app idea consists of:
1. Building an Android version of the app.
2. Testing and monitoring the Android app for user responses.
3. Building the iOS version if the Android app is deemed profitable.

Because an app should ultimately succeed in both markets, we will generate frequency tables to find the most common genres in each market. The *'prime_genre'* column of the iOS data set and the *'Category'* and *'Genres'* columns of the Android data set will be used to build the frequency tables.

In [25]:
# Create a function that performs a frequency count
def freq_table(dataset, index):
    frequency_table = {}
    total_count = 0
    
    for row in dataset:
        total_count += 1 
        value = row[index]
        
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    
    # Convert count values into percentages
    table_percent = {}
    
    for key in frequency_table:
        percent = (frequency_table[key] / total_count) * 100
        table_percent[key] = percent
    
    return table_percent

In [26]:
# Create a function that converts the dictionary into a list of tuples and
# prints the entries sorted in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
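
As a point of comparison, the same percentage table can be sketched with `collections.Counter` from the standard library, whose `most_common()` method already returns entries from most to least frequent. The function name `freq_table_percent` and the sample rows are hypothetical and used only for illustration.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Hypothetical rows: app name in column 0, genre in column 1
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

for value, percent in freq_table_percent(sample, 1):
    print(value, ':', percent)
# Games : 75.0
# Music : 25.0
```

Ties are ordered differently from the tuple-sorting approach: `most_common()` keeps tied values in the order they were first seen, while sorting `(percentage, key)` tuples in reverse breaks ties by key.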

In [27]:
# View the iOS genre frequencies
display_table(ios_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The most common genre in the App Store data set is *'Games'*, which dominates the free English iOS apps at 58%, with *'Entertainment'* in second place at just under 8%. Generally, most of the iOS apps appear to be for entertainment purposes (games, photo and video, social networking, sports, music) rather than practical purposes (education, shopping, utilities, productivity, lifestyle). This finding can be misleading on its own, however: a large number of apps in a genre does not mean that the genre has a large user base.

In [28]:
# View the android category frequencies
display_table(android_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The most common value in the *'Category'* column of the Google Play data set is *'FAMILY'* at about 19%, with *'GAME'* second at about 10%. This differs from what the App Store data would lead one to expect, where *'Games'* alone accounts for more than half of the apps.

In [29]:
# View the android genre frequencies
display_table(android_free, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Upon further inspection, the *'Genres'* column of the Android data set supports the finding that the Google Play market leans toward practical apps rather than entertainment, while the App Store shows the opposite, favoring entertainment apps.

### Top iOS Apps for App Store

In [30]:
# Create a frequency table for the 'prime_genre' column
# to get unique app genres from the App Store
ios_genre = freq_table(ios_free, 11)

for genre in ios_genre:
    total = 0 # sum of rating counts for the apps in each genre
    len_genre = 0 # number of apps specific to each genre
    
    for app in ios_free:
        genre_app = app[11]
        
        if genre_app == genre:
            ios_rating = float(app[5]) # convert 'rating_count_tot' into float
            total += ios_rating
            len_genre += 1
    avg_ios_rating = total / len_genre
    
    print(genre, ':', avg_ios_rating)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


From the average number of ratings per genre above, the three most popular iOS genres seem to be:
1. *'Navigation'* with the most ratings on average, at about 86,000.
2. *'Reference'* at around 75,000.
3. *'Social Networking'* at around 72,000.
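
The per-genre averages can also be accumulated in a single pass over the rows instead of rescanning the data set once per genre. The following is a minimal sketch under that assumption; the function name `average_by_group` and the sample rows are hypothetical.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average a numeric column per group in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical rows: app name, genre, rating count
sample = [
    ['A', 'Games', '100'],
    ['B', 'Games', '300'],
    ['C', 'Music', '50'],
]

averages = average_by_group(sample, 1, 2)
# Print the groups from highest to lowest average
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', average)
# Games : 200.0
# Music : 50.0
```

With the notebook's data this would be called as `average_by_group(ios_free, 11, 5)`; sorting the result makes the top genres easier to read than the unordered printout.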

### Top Android Apps for Google Play

In [31]:
# Create a frequency table for the 'Category' column
# to get unique app genres from Google Play
android_category = freq_table(android_free, 1)

for category in android_category:
    total = 0 # sum of installs specific to each category
    len_category = 0 # number of apps specific to each category
    
    for app in android_free:
        category_app = app[1]
        
        if category_app == category:
            # Locate the 'Installs' column and convert string into float
            app_installs = app[5]
            app_installs = app_installs.replace('+', '')
            app_installs = app_installs.replace(',', '')
            app_installs = float(app_installs)
            total += app_installs
            len_category += 1
    avg_android_install = total / len_category
    
    print(category, ':', avg_android_install)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

From the average number of installs per category above, the three most popular Android categories seem to be:
1. *'COMMUNICATION'* with the most installs on average, at about 38 million.
2. *'VIDEO_PLAYERS'* at around 25 million.
3. *'SOCIAL'* at around 23 million.
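
The install figures come from strings such as '10,000+', which have to be cleaned before they can be averaged. The conversion step can be isolated as in the sketch below; the helper name `parse_installs` is hypothetical.

```python
def parse_installs(installs_text):
    """Convert an 'Installs' string such as '10,000+' into an integer."""
    return int(installs_text.replace('+', '').replace(',', ''))


print(parse_installs('10,000+'))         # 10000
print(parse_installs('1,000,000,000+'))  # 1000000000
```

Because each value is an open-ended bucket ('10,000+' means at least 10,000 installs), the parsed number is a lower bound, and the category averages are best read as rough estimates rather than exact install counts.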

## Conclusions
In this project, sample data sets of iOS App Store and Android Google Play apps were collected from Kaggle, cleaned, and analyzed. The goal of the project was to determine which types of apps are popular with users and therefore profitable for a business based on in-app ads.

The type of app that ranks near the top in both the App Store and Google Play results revolves around social networking or social media. Therefore, it may be profitable for a company to design its own free social app that offers something new and trendy, so as to avoid competing directly with already popular brands such as Facebook, YouTube, Instagram, and Snapchat.

### Analysis Next Steps
* Analyze the frequency table for the 'Genres' column of the Google Play data set and look for useful patterns.
* Assuming revenue could also come from in-app purchases and subscriptions, find out which genres users seem to like the most, for example by examining app ratings.