# Profitable App Profiles for the App Store and Google Play Markets

## About this project
Company: Builds free Android and iOS apps; the revenue source is in-app ads.

## Goal of project
Analyze Apple Store and Google Play data to help the company's developers understand what types of apps are likely to attract more users

In [1]:
## Apple Store data set ##
from csv import reader

opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_header = ios[0]
ios = ios[1:]

In [2]:
## Google Play data set ##
from csv import reader

opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()
android_header = android[0]
android = android[1:]
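
The two loading cells above repeat the same steps, so they could be folded into one helper. The following is only a sketch: `load_dataset` is a hypothetical name, not part of the guided project, and the small temporary file stands in for the real CSV files so the example runs on its own.

```python
import csv
import os
import tempfile


def load_dataset(path, encoding="utf8"):
    """Return (header, rows) for a CSV file; the file is closed afterwards."""
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Tiny stand-in file so the sketch runs without the real data sets.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows([["App", "Price"], ["Demo, the app", "0"]])
    sample_header, sample_rows = load_dataset(sample_path)

print(sample_header)  # ['App', 'Price']
print(sample_rows)    # [['Demo, the app', '0']]
```

With the real files, this would be called as `ios_header, ios = load_dataset('AppleStore.csv')`, assuming the file sits in the working directory.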

Exploration Function
Explores data a few lines at a time to be more readable. The function also has an option to show the number of rows and columns for the data set.

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        

In [4]:
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Exploration of Apple Store Data (ios) using explore_data function

In [5]:
print(ios_header)
print('\n')
explore_data(ios, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


The Apple Store Data Set contains 7,197 rows and 16 columns. At first glance the following columns may be helpful with the analysis:
   * track_name
   * price
   * rating_count_tot
   * user_rating
   * cont_rating
   * prime_genre
    
As a resource, I have included the documentation of the column descriptions [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps#AppleStore.csv).

Next, I will look at the Android data set.
  

In [6]:
print(android_header)
print('\n')
explore_data(android, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

The Google Play data set contains 10,841 rows and 13 columns. At first glance the following columns may be helpful with the analysis:
 * App
 * Category
 * Rating
 * Installs
 * Type
 * Content Rating
 * Genres
 
As a resource, I have included the documentation of the column descriptions [here](https://www.kaggle.com/lava18/google-play-store-apps).

## Reviewing data for errors
While reviewing Kaggle for this dataset, I found that one user noted an incorrect rating for entry 10472 in the [discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015). The user stated that the incorrect rating caused all the data in the columns to shift incorrectly. Next, I will take a look at that entry to determine if there is an error and document next steps.
   

In [7]:
print(android[10472]) ## incorrect row; if the delete cell below has already been run, a different app shows here
print('\n')
print(android_header) ## header
print('\n')
print(android[0])     ## correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


After reviewing the data for Life Made WI-Fi Touchscreen Photo Frame, I agree that there is an error in the line. The row has only 12 values instead of 13: the Category appears to be missing, so every later value is shifted one column to the left and the rating (1.9) lands in the Category column. As a result, the Rating column shows 19, when the maximum rating for an app is 5. I will delete this row below, as the guided project suggests. In the future, however, I may either look up the category for that app or insert a blank value, depending on how many similar errors exist.
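
A row with a missing value is shorter than the header, so one way to look for other rows like this is to compare lengths. This is a minimal sketch on made-up rows; `find_malformed_rows` is a hypothetical helper, and it only catches rows with a missing or extra field, not values that are wrong but in the right column.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Made-up sample: the second row is missing its Category value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Good App', 'TOOLS', '4.1'],
    ['Shifted App', '19'],
]

for index, row in find_malformed_rows(sample_rows, sample_header):
    print(index, row)  # 1 ['Shifted App', '19']
```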

In [8]:
print(len(android))
del android[10472]  # do not run this more than once
print(len(android))

10841
10840


Again, after reviewing the solution code, I like the way they printed the length before and after the deletion. Before deletion, print(len(android)) had an output of 10841; after deletion it is 10840. They ran everything in one cell, which I can do in the future.

Next, I will review the Apple Store data [discussion](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion). Based on the review of the discussion, there are no identified errors in the Apple Store data.

## Duplicate Entries

From reading the discussions for the Google Play Store data, it is likely that there are duplicate lines for apps. I will next perform a search for duplicates to confirm using a loop to create two lists: a list of duplicate apps and a list of unique apps. 

In [9]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


From that, I have noted 1,181 duplicate apps (out of the 10,840 rows left after the deletion above).
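
The same count can also be sketched with `collections.Counter` from the standard library, which avoids searching a list on every iteration. The rows below are made up, and `count_duplicates` is a hypothetical helper name.

```python
from collections import Counter


def count_duplicates(dataset, name_index):
    """Return (number of extra rows, names that appear more than once)."""
    counts = Counter(row[name_index] for row in dataset)
    repeated = [name for name, n in counts.items() if n > 1]
    extra_rows = sum(counts[name] - 1 for name in repeated)
    return extra_rows, repeated


# Made-up sample rows: [name, reviews]
sample = [['Box', '10'], ['Slack', '5'], ['Box', '12'], ['Box', '12']]
extra, names = count_duplicates(sample, 0)
print(extra)  # 2
print(names)  # ['Box']
```

Here `extra` counts every repeated row after the first one, which is the same quantity the loop-based count reports.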

I do not want to double count apps, so I intend to remove duplicate entries. However, I want to understand what the difference is between the duplicate lines for the same app. So next, I will search for Instagram, an app with known duplicates, to review the differences and determine a deletion criterion.

In [10]:
for app in android:
    name = app[0]
    if name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Instagram has 4 lines within the Android data set. After reviewing each entry, all the data is the same except for the number of reviews, the 4th element. The different numbers suggest that the data was collected at different times. Because the Current Ver column only shows 'Varies with device' for these rows, I am deducing that the higher the number of reviews, the more recent the data should be. Additionally, a higher number of reviews will increase the reliability of ratings.

Therefore, instead of randomly removing duplicates, I will try to capture the most recent entry by keeping the line with the highest number of reviews. The remaining duplicates will be deleted.

To do that, I will:
 * Create a dictionary where each key is a unique app name, and the value is the highest number of reviews for that app
 * Use the dictionary to create a new data set, which will have only one entry per app (the entry with the highest number of reviews)

In [11]:
print('Expected length:', len(android) - 1181)

Expected length: 9659


In [12]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


I can tell the dictionary is correct because the expected length (10,840 - 1,181 = 9,659) equals the length of the dictionary reviews_max. Next, I will remove the duplicates and keep the entries with the highest number of reviews.

To do this in the code below:
1. Start by initializing two empty lists, android_clean and already_added
2. Loop through the android data set, and for every iteration:
    - Isolate the name of the app and the number of reviews
    - Add the current row (app) to the android_clean list, and the app name to the already_added list, if:
        - The number of reviews of the current app matches the number of reviews of that app per the reviews_max dictionary; and
        - The name of the app is not in the already_added list. This supplementary condition accounts for cases where the highest number of reviews of a duplicate app is the same for more than one entry. If we only check for reviews_max[name] == n_reviews, there will still be duplicate entries for some apps
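
As a point of comparison, the two steps can be merged into a single pass by storing the whole row in the dictionary instead of only the review count. This is a sketch of that alternative, not the guided project's approach; `keep_most_reviewed` is a hypothetical name and the rows are made up.

```python
def keep_most_reviewed(dataset, name_index, reviews_index):
    """Keep one row per app name: the first row with the most reviews."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up sample rows: [name, category, rating, reviews]
sample = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '120'],
    ['Box', 'BUSINESS', '4.2', '120'],
]
for row in keep_most_reviewed(sample, 0, 3):
    print(row)
```

One difference to keep in mind: the result is ordered by the first appearance of each app name, so the row order may not match the two-list approach even though the same rows are kept.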

In [13]:
android_clean = [] ##will store new cleaned data set
already_added = [] ##will just store app names

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))

9659


I confirmed that the number of apps in the clean data set is as expected, 9,659. Below, I will run a few lines using the explore function from above to check the data.

In [14]:
explore_data(android_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


In [15]:
for app in android_clean:
    name = app[0]
    if name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


I also validated that there is now only 1 entry for Instagram instead of 4. 

Although the guided exercise does not mention this, I am now also curious about duplicates in the Apple Store data, so I will check for duplicates there next.

In [16]:
duplicate_apps_apple = []
unique_apps_apple = []

for app in ios:
    name = app[1]
    if name in unique_apps_apple:
        duplicate_apps_apple.append(name)
    else:
        unique_apps_apple.append(name) 

print('Number of duplicate apps:', len(duplicate_apps_apple))
print('\n')
print('Examples of duplicate apps:', duplicate_apps_apple[:10])

Number of duplicate apps: 2


Examples of duplicate apps: ['Mannequin Challenge', 'VR Roller Coaster']


I have noted only 2 duplicates in the Apple Store Data. I would like to see what they look like next.

In [17]:
print(ios_header)

for app in ios:
    name = app[1]
    if name == "Mannequin Challenge" or name == 'VR Roller Coaster':
        print(app)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


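Because the Apple Store data includes a version number, I can keep the latest version and delete the older one. I will perform similar steps to isolate the duplicates and keep only the latest version. Note that the duplicate entries have different ids, so they may be different apps that happen to share a name.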
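
One caveat when comparing version numbers stored as text: Python compares strings character by character, so '10.23' sorts before '9.24.12'. A hedged sketch of a numeric comparison follows. `version_key` is a hypothetical helper, and treating non-numeric pieces as 0 is a simplifying assumption rather than a rule taken from the data set.

```python
def version_key(version):
    """Turn '9.24.12' into (9, 24, 12); non-numeric pieces count as 0."""
    parts = []
    for piece in version.split('.'):
        parts.append(int(piece) if piece.isdecimal() else 0)
    return tuple(parts)


print('10.23' > '9.24.12')                            # False
print(version_key('10.23') > version_key('9.24.12'))  # True
```

For the two duplicated names in this data set, the plain string comparison and the numeric comparison pick the same rows.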

In [18]:
## keep the highest version string per app name; note that the strings are compared character by character, not numerically
version_max = {}

for app in ios:
    name = app[1]
    version_n = (app[9])
    
    if name in version_max and version_max[name] < version_n:
        version_max[name] = version_n
        
    elif name not in version_max:
        version_max[name] = version_n
        
print(len(version_max))


ios_clean = [] ##will store new cleaned data set
already_added_ios = [] ##will just store app names

for app in ios:
    name = app[1]
    version_n = (app[9])
    
    if (version_max[name] == version_n) and (name not in already_added_ios):
        ios_clean.append(app)
        already_added_ios.append(name)

print(len(ios_clean))


7195
7195


In [19]:
print(len(ios))

explore_data(ios_clean, 0, 5, True)

7197
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7195
Number of columns: 16


In [20]:
for app in ios_clean:
    name = app[1]
    if name == "Mannequin Challenge" or name == 'VR Roller Coaster':
        print(app)

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']


## Removing Non-English Apps

The numbers corresponding to the characters commonly used in English text range from 0 to 127 according to the ASCII (American Standard Code for Information Interchange) system.

To identify app names containing characters with numbers greater than 127, I will first create a function below. This function takes the name of the app as 'string' and uses the built-in ord() function on each character: if any character's number is greater than 127, it returns False; otherwise it returns True.

In [21]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
    

True
False
False
False


It appears that our function struggles with characters like ™ or emojis but works correctly for non-English app names. If I use this function as is, I will incorrectly remove English apps that get labeled as non-English. To minimize this loss, I will remove an app only if its name has more than 3 characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English.

Below I modified the function above. I created the variable non_ascii. For each character whose number is greater than 127, 1 is added to this variable. If the variable is greater than 3, the function returns False (non-English app). Otherwise it returns True.

In [22]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True
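
The cutoff of 3 is a judgment call, so it can help to make it a parameter and see how the result changes. This is only a sketch; `is_mostly_ascii` and `max_non_ascii` are hypothetical names.

```python
def is_mostly_ascii(string, max_non_ascii=3):
    """True when at most `max_non_ascii` characters fall outside ASCII."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Docs To Go™ Free Office Suite'))   # True
print(is_mostly_ascii('Instachat 😜'))                     # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))     # False
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))    # False
```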


In [23]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios_clean:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

print(android_header)
explore_data(android_english, 0, 3, True)
print('\n')
print(ios_header)
explore_data(ios_english, 0, 3, True)    

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'F

## Isolating Free Apps

Because our company only builds apps that are free to download and install, I next want to filter out any paid apps so that we are left with free apps. I will do this by making two lists for each data set, one free and one paid, and looping through each data set, checking for apps where the price equals 0.

 * Index number of Price for Apple is 4
 * Index number of Price for Google is 7

In [24]:
free_apps_android = []
paid_apps_android = []

for app in android_english:
    price = (app[7])
    if price == '0':
        free_apps_android.append(app)
    else:
        paid_apps_android.append(app) 

print(len(free_apps_android))

8864


In [25]:
free_apps_apple = []
paid_apps_apple = []

for app in ios_english:
    price = float(app[4])
    if price == 0:
        free_apps_apple.append(app)
    else:
        paid_apps_apple.append(app) 

print(len(free_apps_apple))

3220


We are now left with 8,864 Free, English Apps for Google and 3,220 Free, English Apps for Apple.
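
Since both data sets need the same split, the logic could be shared in one function that takes the price index. The sketch below rests on an assumption about the price formats (plain numbers such as '0' or '0.0', or a dollar-prefixed value such as '$4.99'); `split_free_paid` is a hypothetical helper and the rows are made up.

```python
def split_free_paid(dataset, price_index):
    """Return (free, paid) lists based on the price column."""
    free, paid = [], []
    for app in dataset:
        # Assumption: prices look like '0', '0.0' or '$4.99'.
        price = float(app[price_index].replace('$', ''))
        if price == 0:
            free.append(app)
        else:
            paid.append(app)
    return free, paid


# Made-up sample rows: [name, price]
sample = [['Free App', '0'], ['Paid App', '$4.99'], ['Also Free', '0.0']]
free, paid = split_free_paid(sample, 1)
print(len(free), len(paid))  # 2 1
```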

## Most Common Apps by Genre

Because our company relies on in-app ad revenue, we want to build an app that is likely to attract more users because our revenue is highly influenced by the number of people using our apps. 

The current validation strategy for an app idea involves the following:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

With that strategy in mind, we need to find app profiles that are successful on both markets. I will begin by getting a sense of the most common genres for each market.

I will build frequency tables to perform this analysis. After reviewing the data sets, I believe the following columns will help identify the most common genres:

Android:
- Ratings
- Reviews
- Installs
- Genres (Index Number 9)
- Category (Index Number 1)

Apple:
- Rating Count Total
- User Rating
- Prime Genre (Index Number 11)

Having identified the key columns, I will next build two functions to analyze the frequency tables:
- One function to generate frequency tables that show percentages
- Another function we can use to display the percentages in a descending order

In [26]:
## function provided by DataQuest to create Tuple and Sort
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [27]:
def freq_table(data_set, index):
    table = {}
    total = 0
    
    for row in data_set:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages
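
For comparison, the counting and sorting done by the two functions above can also be sketched with `collections.Counter`, whose `most_common()` method returns values from most to least frequent. `freq_table_counter` is a hypothetical name and the rows are made up.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, n / total * 100) for value, n in counts.most_common()]


# Made-up sample rows: [name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percentage in freq_table_counter(sample, 1):
    print(genre, ':', percentage)
# Games : 75.0
# Music : 25.0
```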



# Analysis of Apple iOS Prime Genre

In [28]:
display_table(free_apps_apple, 11) ## Apple Prime Genre

Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


Note: this analysis only applies to Free English Apps and may not represent the entire App Store patterns. 

## What is the most common genre? Runner up?
Games represent more than half at 58.14%. The runner-up is Entertainment at 8%, followed by Photo & Video at 5%. Only 3.7% of apps are related to education, followed by Social Networking at 3.3%.


## What is the general impression - are most of the apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) or more for entertainment (games, photo and video, social networking, sports, music)? 
Entertainment apps account for 79.71% of the apps. This shows that apps designed for fun are more predominant than apps with practical purposes.

## Can you recommend an app profile for the App Store market based on this frequency table alone? If there's a large number of apps for a particular genre, does that also imply that apps of that genre generally have a large number of users?
No, I cannot recommend an app profile for the App Store market based on this frequency table alone. A large number of apps for a particular genre does not imply that apps of that genre generally have a large number of users. For example, while there are not a large number of navigation apps, many users probably use one of the few (Apple Maps, Google Maps, Waze). With so many games available, users have a lot of apps to choose from.

# Google Play Store 
## Analysis of Category

In [42]:
display_table(free_apps_android, 1) ##Android Category


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

## Analysis of Genre

In [29]:
display_table(free_apps_android, 9) ##Android Genres

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Note: After comparing the result of Category to Genres for the Google Play Store data, I noticed that there are many more genres than categories. Also, in Genres there appear to be duplicate genres, and there appear to be subgenres, such as for games. To have a more aggregated view and to compare more easily with the Apple Store, I will use Category to perform my analysis.
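
Some Genres values in the data combine two labels with a semicolon, for example 'Art & Design;Pretend Play'. If the Genres column were used instead of Category, one option would be to keep only the first label. This sketch assumes the part before the semicolon is the main genre, which the data set documentation does not confirm; `primary_genre` is a hypothetical helper.

```python
def primary_genre(genres_value):
    """Return the part before the first ';' in a Genres value."""
    return genres_value.split(';')[0]


print(primary_genre('Art & Design;Pretend Play'))  # Art & Design
print(primary_genre('Tools'))                      # Tools
```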

# Analysis of Google Play Store Category

Note: this analysis only applies to Free English Apps and may not represent the entire Google Play Store patterns. 


## What are the most common genres? 
Family is the most common with 19%, followed by Game at 9.7%. Practical genres like Tools (8.5%), Business (4.6%) and Lifestyle (4%) round out the top 5. Because I was unclear what "Family" meant, I went to the Google Play Store and found that most Family apps are also game apps.

## What other patterns do you see?
In the Google Play market, practical apps appear to be more common; however, game apps (Family + Game) still appear to be the most frequent.

## Compare the patterns you see for the Google Play market with those you saw for the App Store Market
In the Google Play Market, practical apps appear to be more common. Some genres less popular at Apple like medical (0.19%) are more common at Google (3.53%). Games are less frequent at Google but still quite popular. 

## Can you recommend an app profile based on what you found so far? Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?

No, I cannot recommend an app profile based on what has been found so far. The frequency tables do not reveal which genres have the most users, only the most frequent app genres. A genre could contain a low number of apps but a high number of users. 


## Conclusion of Genre Analysis
Based on analysis so far, it appears that for Free, English Apps, the Apple App Store contains many apps designed for fun, while Google Play has a more balanced assortment of practical and fun apps. However, as mentioned above, we need to find the number of users for apps to determine which are most popular. 

# Most Popular App by Genre on the App Store

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. 

For the App Store data set, the number of installs is not available, so we will use the total number of user ratings (rating_count_tot) as a proxy. To calculate the average number of user ratings per genre, I will use a loop inside of another loop to:

- Isolate the apps of each genre
- Sum up the number of user ratings for the apps of that genre
- Divide the sum by the number of apps belonging to that genre (not the total number of apps)
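
The same averages can also be computed in one pass over the data by keeping a running total and a running count per genre. This is a sketch of that alternative on made-up rows; `average_by_genre` is a hypothetical helper.

```python
def average_by_genre(dataset, genre_index, value_index):
    """Return {genre: average of the numeric column} in one pass."""
    totals = {}
    counts = {}
    for app in dataset:
        genre = app[genre_index]
        totals[genre] = totals.get(genre, 0) + float(app[value_index])
        counts[genre] = counts.get(genre, 0) + 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


# Made-up sample rows: [name, genre, rating count]
sample = [['a', 'Games', '100'], ['b', 'Games', '300'], ['c', 'Music', '50']]
print(average_by_genre(sample, 1, 2))  # {'Games': 200.0, 'Music': 50.0}
```

The nested-loop version scans the whole data set once per genre, while this version scans it once in total; for data sets of this size either is fast enough.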


In [30]:
genres_ios = freq_table(free_apps_apple, 11)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in free_apps_apple:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22812.92467948718
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Although Navigation has the highest average number of ratings, with Reference and Social Networking close behind, the Navigation and Social Networking averages are driven by a few well-known apps: Waze and Google Maps for Navigation, Facebook and Skype for Social Networking. For our company it may be difficult to compete with such large, established apps. Note that apps like Instagram and Snapchat fall into the Photo & Video genre, which would also be difficult to compete with.

In [31]:
for app in free_apps_apple:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
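
One way to put a number on "a few big players" is the share of a genre's ratings held by its top apps. The sketch below uses the rating counts copied from the Navigation output above; the function name `top_share` and the choice of the top two apps are assumptions made for illustration.

```python
def top_share(rating_counts, top_n=2):
    """Fraction of all ratings held by the top_n most-rated apps."""
    ordered = sorted(rating_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Rating counts copied from the Navigation output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

share = top_share(navigation_ratings, top_n=2)
print('Top 2 share of Navigation ratings: {:.1%}'.format(share))
```

Waze and Google Maps together hold roughly 97% of the Navigation ratings, which is why the genre average says little about what a new app could expect.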


In [32]:
for app in free_apps_apple:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [33]:
for app in free_apps_apple:
    if app[11] == 'Photo & Video':
        print(app[1], ':', app[5])

Instagram : 2161558
Snapchat : 323905
YouTube - Watch Videos, Music, and Live Streams : 278166
Pic Collage - Picture Editor & Photo Collage Maker : 123433
Funimate video editor: add cool effects to videos : 123268
musical.ly - your video social network : 105429
Photo Collage Maker & Photo Editor - Live Collage : 93781
Vine Camera : 90355
Google Photos - unlimited photo and video storage : 88742
Flipagram : 79905
Mixgram - Picture Collage Maker - Pic Photo Editor : 54282
Shutterfly: Prints, Photo Books, Cards Made Easy : 51427
Pic Jointer – Photo Collage, Camera Effects Editor : 51330
Color Pop Effects - Photo Editor & Picture Editing : 45320
Photo Grid - photo collage maker & photo editor : 40531
iSwap Faces LITE : 39722
MOLDIV - Photo Editor, Collage & Beauty Camera : 39501
Photo Editor by Aviary : 39501
Photo Lab: Picture Editor, effects & fun face app : 34585
Rookie Cam - Photo Editor & Filter Camera : 33921
FotoRus -Camera & Photo Editor & Pic Collage Maker : 32558
PicsArt Photo St

## Apple Store App Review

After reviewing both the average number of users for each genre and the % of apps by genre, I have eliminated the following genres because a large number of users combined with a small % of apps indicates a few big players:
 - Navigation
 - Social Networking
 - Music
 - Weather
 - Photo & Video

Other genres were eliminated because their apps are often run by the companies themselves, such as restaurants, hotels, and airlines:
 - Food & Drink
 - Travel
 - Shopping
 
Across the 23 genre averages printed above, the mean number of users per genre is about 30,872 and the median is about 23,298. To stay at or above the median number of users, the remaining categories include:
 - Reference
 - Book
 - Health & Fitness
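
The mean and median across genres can also be recomputed directly in Python. The sketch below assumes the genre averages are copied, rounded to two decimals, from the output of the nested loop above; the variable names are illustrative. It lists every genre above the median before any manual eliminations are applied.

```python
from statistics import mean, median

# Genre averages copied (rounded to two decimals) from the output above.
genre_averages = {
    'Social Networking': 71548.35, 'Photo & Video': 28441.54,
    'Games': 22812.92, 'Music': 57326.53, 'Reference': 74942.11,
    'Health & Fitness': 23298.02, 'Weather': 52279.89,
    'Utilities': 18684.46, 'Travel': 28243.80, 'Shopping': 26919.69,
    'News': 21248.02, 'Navigation': 86090.33, 'Lifestyle': 16485.76,
    'Entertainment': 14029.83, 'Food & Drink': 33333.92,
    'Sports': 23008.90, 'Book': 39758.50, 'Finance': 31467.94,
    'Education': 7003.98, 'Productivity': 21028.41, 'Business': 7491.12,
    'Catalogs': 4004.00, 'Medical': 612.00,
}

values = list(genre_averages.values())
mean_users = mean(values)
median_users = median(values)

# Genres strictly above the median, before any manual eliminations.
above_median = [g for g, avg in genre_averages.items() if avg > median_users]

print('Mean:', round(mean_users))
print('Median:', round(median_users))
print('Above median:', above_median)
```

Because the inputs are rounded, results may differ slightly from figures computed on the unrounded values.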

In [34]:
for app in free_apps_apple:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [35]:
for app in free_apps_apple:
    if app[11] == 'Book':
        print(app[1], ':', app[5])

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


In [36]:
for app in free_apps_apple:
    if app[11] == 'Health & Fitness':
        print(app[1], ':', app[5])

Calorie Counter & Diet Tracker by MyFitnessPal : 507706
Lose It! – Weight Loss Program and Calorie Counter : 373835
Weight Watchers : 136833
Sleep Cycle alarm clock : 104539
Fitbit : 90496
Period Tracker Lite : 53620
Nike+ Training Club - Workouts & Fitness Plans : 33969
Plant Nanny - Water Reminder with Cute Plants : 27421
Sworkit - Custom Workouts for Exercise & Fitness : 16819
Clue Period Tracker: Period & Ovulation Tracker : 13436
Headspace : 12819
Fooducate - Lose Weight, Eat Healthy,Get Motivated : 11875
Runtastic Running, Jogging and Walking Tracker : 10298
WebMD for iPad : 9142
8fit - Workouts, meal plans and personal trainer : 8730
Garmin Connect™ Mobile : 8341
Record by Under Armour, connects with UA HealthBox : 7754
Fitstar Personal Trainer : 7496
My Cycles Period and Ovulation Tracker : 7469
Seven - 7 Minute Workout Training Challenge : 6808
RUNNING for weight loss: workout & meal plans : 6407
Lifesum – Inspiring healthy lifestyle app : 5795
Waterlogged - Daily Hydration Tr

## Apple Store App Profile Recommendation

After reviewing Reference, Book, and Health & Fitness, I have the following app recommendations:

Reference: Minecraft guides appear to be popular within the Reference genre. Although this is a niche market, it could be an opportunity for ad revenue. Guides for other games are a possibility as well.

Books: There are not many book apps in the App Store, which could be a result of Apple's own reading app. However, I think there is an opportunity for an app that recommends books to users. It could also track all the books a user has read, not just those read through Apple, and let users build a list of books they want to read.

Health and Fitness: Although several such apps already exist, a fitness or weight loss app could be popular and a source of ad revenue. However, it would likely require a health and fitness expert to guide the product, which our company does not currently have.

# Most Popular Apps by Genre on the Google Play Store


Note: From DataQuest
We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we'll need to convert each install number from string to float.

In [37]:
categories_android = freq_table(free_apps_android, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in free_apps_android:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', "")
            n_installs = n_installs.replace('+', "")
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
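
The unsorted printout makes it hard to see which categories lead. A minimal sketch of ranking the averages follows; it uses only a subset of the values copied (rounded) from the output above, so the ranking it prints applies to that subset and not to the full data set. The helper name `top_categories` is an assumption.

```python
def top_categories(averages, n=3):
    """Return the n (category, average) pairs with the highest averages."""
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Subset of category averages copied (rounded) from the output above.
category_averages = {
    'ART_AND_DESIGN': 1986335.09,
    'COMMUNICATION': 38456119.17,
    'GAME': 15588015.60,
    'MEDICAL': 120550.62,
    'SOCIAL': 23253652.13,
    'VIDEO_PLAYERS': 24727872.45,
}

top_three = top_categories(category_averages, n=3)
for category, avg in top_three:
    print(category, ':', avg)
```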

For the same reasons as with Apple, I have eliminated the following categories because a large number of users combined with a small % of apps indicates a few big players:
 - Maps & Navigation
 - Social
 - Weather

Other genres were eliminated because their apps are often run by the companies themselves, such as restaurants, hotels, and airlines:
 - Food & Drink
 - Travel
 - Shopping
 

In [39]:
for app in free_apps_android:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

The Communication category has the highest average number of installs at 38,456,119. It includes messaging apps, internet browsers, and video call apps. Because it also contains a relatively high % of apps (3.24%) and this pattern does not match what we saw for Apple, Communication is likely not a good profile for our company to explore.

Similarly, because I would like to suggest one app profile that can succeed in both Google's Play Store and Apple's App Store, I would eliminate Video Players.
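
To gauge how much a handful of giant apps inflate a category average such as Communication's, the average can be recomputed with those apps left out. The sketch below uses five rows copied from the Communication output above; the 100,000,000 cutoff and the function names are assumptions chosen for illustration.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(text.replace(',', '').replace('+', ''))


def average_installs(apps, max_installs=None):
    """Average installs over (name, installs) pairs.

    If max_installs is given, apps at or above that value are excluded.
    """
    values = [parse_installs(installs) for _, installs in apps]
    if max_installs is not None:
        values = [v for v in values if v < max_installs]
    return sum(values) / len(values) if values else 0.0


# Five rows copied from the Communication output above.
sample_apps = [
    ('WhatsApp Messenger', '1,000,000,000+'),
    ('Messenger for SMS', '10,000,000+'),
    ('My Tele2', '5,000,000+'),
    ('imo beta free calls and text', '100,000,000+'),
    ('Contacts', '50,000,000+'),
]

print('All apps:', average_installs(sample_apps))
print('Under 100M installs:', average_installs(sample_apps, 100_000_000))
```

On this small sample, removing the two largest apps cuts the average by roughly a factor of ten, which illustrates how a few very large apps can dominate a category average.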

Using Excel, I calculated the average number of installs per category (7,281,532) and the median (3,695,641). As with Apple, it would be preferable to recommend a category that is, at a minimum, above the median and close to or above the average. Note: the average is skewed by a few categories with very high installs. Using these criteria, the following categories remain:
 - Photography
 - Productivity
 - Game
 - Entertainment
 - Tools
 - News and Magazines
 - Books and Reference
 
I will note that this data, per [kaggle](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home), was scraped in 2017. I would suspect that as of 2020 the number of installs and users for photography apps is higher, given the rise of Instagram influencers and photo editing apps. Although several smaller editing apps are available, it would be difficult to differentiate a new one.
 
Because Books and Reference also appeared in the Apple App Store analysis, I would like to look at the apps in this Google Play Store category next:

In [40]:
for app in free_apps_android:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

## Reference
While field guides for games such as Minecraft were popular on the Apple App Store, the same type of app does not appear in the Google Play Store. The question is whether there is a market for these guides that simply has not been served for Android users, or whether, as noted earlier, games are more popular on Apple devices while Android is more balanced between practical and fun apps. Because fun apps are still popular, it would be worth researching the popularity of field guides for popular games further.
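
A first step in that research could be a keyword search over app names in the Google Play data. The sketch below is a minimal version under stated assumptions: the function name `find_apps_by_keywords`, the keyword list, and the third sample row are hypothetical, and the app name is assumed to sit at index 0 as in the loops above.

```python
def find_apps_by_keywords(apps, keywords, name_index=0):
    """Return rows whose app name contains any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    for app in apps:
        name = app[name_index].lower()
        if any(keyword in name for keyword in lowered):
            matches.append(app)
    return matches


# Two names copied from the Books and Reference output, plus one
# hypothetical row to show what a match looks like.
sample_apps = [
    ['Wikipedia', 'BOOKS_AND_REFERENCE'],
    ['Cool Reader', 'BOOKS_AND_REFERENCE'],
    ['Hypothetical Minecraft Guide', 'BOOKS_AND_REFERENCE'],
]

matches = find_apps_by_keywords(sample_apps, ['minecraft', 'pokemon'])
for app in matches:
    print(app[0])
```

Run against `free_apps_android`, an empty result would support the observation that game guides are missing from this data set, though it would not show whether demand for them exists.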

## Books
Again, the Google Play Store does have its own reader app. However, the possibility of a book recommendation app still stands for the Google Play Store.

In [41]:
for app in free_apps_android:
    if app[1] == 'HEALTH_AND_FITNESS':
        print(app[0], ':', app[5])

Step Counter - Calorie Counter : 500,000+
Lose Belly Fat in 30 Days - Flat Stomach : 5,000,000+
Pedometer - Step Counter Free & Calorie Burner : 1,000,000+
Six Pack in 30 Days - Abs Workout : 10,000,000+
Lose Weight in 30 Days : 10,000,000+
Pedometer : 10,000,000+
LG Health : 10,000,000+
Step Counter - Pedometer Free & Calorie Counter : 10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App : 10,000,000+
Sportractive GPS Running Cycling Distance Tracker : 1,000,000+
30 Day Fitness Challenge - Workout at Home : 10,000,000+
Home Workout for Men - Bodybuilding : 1,000,000+
Fat Burning Workout - Home Weight lose : 100,000+
Buttocks and Abdomen : 500,000+
Walking for Weight Loss - Walk Tracker : 100,000+
Running & Jogging : 500,000+
Sleep Sounds : 1,000,000+
Fitbit : 10,000,000+
Lose Belly Fat-Home Abs Fitness Workout : 50,000+
Cycling - Bike Tracker : 500,000+
Abs Training-Burn belly fat : 100,000+
Calorie Counter - EasyFit free : 1,000,000+
Aunjai i lert u : 500,000+
Garmin Connect

There are several Health and Fitness apps, including step counters and calorie counters. Due to the high number of apps to choose from, it may be difficult to differentiate our app. And as noted previously, we would need to hire a specialist for a health-related app.

# Final App Profile Recommendations:

After reviewing first the popularity of apps by genre (%) and then the number of installs (Google) and users (Apple) for free, English apps in both the Apple App Store and Google Play Store data sets, I would recommend the following app profiles:

Books: A book recommendation app, for both Google and Apple, where users can enter the books they have read, mark what they would like to read, recommend books, and get recommendations generated from their preferences. The Book genre has an above-average number of users in both data sets and contains neither so many apps that differentiation is difficult nor so few that we would be competing against a handful of large, established apps.

Reference: While reviewing Reference apps in the Apple Store, I found field guides for popular games like Minecraft. These apps, however, did not appear in the Google Play Store. I recommend further research to determine whether there is a market for these field guides in the Google Play Store that has simply not been served yet, or whether field guides are less popular with Android users. Although this is a niche market, it is an opportunity for targeted ads should we continue with this app.