# Profitable App Profiles for the App Store and Google Play Markets

The goal of this project is to determine which types of **free apps** are likely to attract more users and therefore generate more revenue through in-app ads.

We will be using the following datasets:
* A [dataset](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018
* A [dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017

## Open and explore datasets

The first step is to create a custom function, `explore_data()`, that prints rows in a readable way.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

The `explore_data()` function does the following:
* Takes in four parameters:
    * `dataset`, which will be a list of lists
    * `start` and `end`, which will both be integers and represent the starting and the ending indices of a slice from the dataset
    * `rows_and_columns`, which will be a Boolean and has `False` as a default argument
* Slices the dataset using `dataset[start:end]`
* Loops through the slice, and for each iteration, prints a row and adds a new line after that row using `print('\n')`
    * The `\n` in `print('\n')` is a special character that isn't displayed as text. Instead, it produces a line break, and we use `print('\n')` to add some blank space between rows
* Prints the number of rows and columns if `rows_and_columns` is `True`
    * `dataset` shouldn't include a header row, or the function will report one row more than the actual number of data rows








Open the datasets:

In [2]:
from csv import reader

# App Store (iOS) data set
opened_file = open('AppleStore (1).csv')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_header = ios[0]  # first row holds the column names
ios = ios[1:]

# Google Play (Android) data set
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()
android_header = android[0]  # first row holds the column names
android = android[1:]
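
Both files are loaded with the same steps, so the pattern could be wrapped in a small helper. The sketch below is one possible version and is not part of the original notebook: `load_csv` is a hypothetical name, and the `encoding='utf8'` default is an assumption about how the files are encoded. The `with` statement closes the file automatically.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`."""
    # newline='' is how the csv module documentation recommends opening files
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]

# Example usage (file names as used in this notebook):
# ios_header, ios = load_csv('AppleStore (1).csv')
# android_header, android = load_csv('googleplaystore.csv')
```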


Explore the datasets using the `explore_data()` function:

In [3]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


Columns of interest for this analysis:
* track_name
* currency
* price
* rating_count_tot
* rating_count_ver
* prime_genre

In [4]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


Columns of interest for this analysis:
* App
* Category
* Reviews
* Installs
* Type
* Price
* Genres

## Data cleaning


Since this scenario considers only free apps aimed at an English-speaking audience, we will remove the following:
* non-English apps
* non-free apps

In [5]:
print(android_header)
print(android[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [6]:
print(len(android_header))

13


In [7]:
print(len(android[10472]))

12


The row printed above (index 10472) has only 12 values, while the header has 13 columns. This row was flagged on the dataset's discussion forum: its `Category` value is missing, which shifts every following value one column to the left (for example, `'1.9'` sits in the `Category` position).
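To make the shift visible, one option is to pair each column name with the value found in the same position. The sketch below hard-codes the header and the row printed above so that it runs on its own; `itertools.zip_longest` pads the shorter list with `None`.

```python
from itertools import zip_longest

android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']
bad_row = ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M',
           '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018',
           '1.0.19', '4.0 and up']

# Pair every column name with the value in the same position;
# the missing 13th value shows up as None.
paired = list(zip_longest(android_header, bad_row))
for column, value in paired:
    print(column, '->', value)
```

The second line of the output pairs `Category` with `'1.9'`, and the last line pairs `Android Ver` with `None`, which matches the one-column shift described above.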

We can check for rows with missing columns with the code below:

In [8]:
for index, row in enumerate(android):
    if len(row) != len(android_header):
        print(row)
        print("\n")
        print("Index position is:", index)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Index position is: 10472


Let's also check the ios dataset:

In [9]:
for index, row in enumerate(ios):
    if len(row) != len(ios_header):
        print(row)
        print("\n")
        print("Index position is:", index)

It looks like the `ios` dataset doesn't have any rows with missing columns.

Delete row 10472 from the `android` dataset:

In [10]:
del android[10472]

Now let's re-check the android dataset using the checker code:

In [11]:
for index, row in enumerate(android):
    if len(row) != len(android_header):
        print(row)
        print("\n")
        print("Index position is:", index)

Looks good!

Now, let's check for duplicate data in both datasets.
We will use the following code:

In [12]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Number of unique apps:', len(unique_apps))

Number of duplicate apps: 1181


Number of unique apps: 9659


Above, we did the following:

* Created two lists: one for storing the name of duplicate apps, and one for storing the name of unique apps.
* Looped through the `android` data set (the Google Play data set), and for each iteration, we did the following:
    * We saved the app name to a variable named `name`.
    * If `name` was already in the `unique_apps` list, we appended `name` to the `duplicate_apps` list.
    * `Else` (if `name` wasn't already in the `unique_apps` list), we appended `name` to the `unique_apps` list.

Let's wrap the loop from cell `In [12]` in a function, `check_for_duplicates()`, so we can easily reuse it. Besides the data set, it takes `id_column`, the index of the column to check (usually a unique identifier such as the app name or app id), and `print_uniques`, which prints the list of unique values when set to `True`. The default `id_column = 0` fits both data sets: column 0 is `App` in `android` and `id` in `ios`.

In [13]:
def check_for_duplicates(data_set, id_column = 0, print_uniques = False):
    duplicate_apps = []
    unique_apps = []

    for app in data_set:
        name = app[id_column]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
        
    print('Number of duplicate apps:', len(duplicate_apps))
    print('\n')
    print('Number of unique apps:', len(unique_apps))
    
    if print_uniques:
        print(unique_apps)

In [14]:
check_for_duplicates(android)

Number of duplicate apps: 1181


Number of unique apps: 9659


In [15]:
check_for_duplicates(ios)

Number of duplicate apps: 0


Number of unique apps: 7197


We can see that the `android` dataset contains 1181 duplicate entries, while the `ios` dataset has none.
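Checking `name in unique_apps` against a list gets slower as the list grows. As an alternative sketch (not the notebook's original approach), the same two counts can be computed with a `set`, which has fast membership tests. The function name `count_duplicates` and the sample rows are made up for illustration.

```python
def count_duplicates(data_set, id_column=0):
    """Return (number of duplicate entries, number of unique ids)."""
    seen = set()
    n_duplicates = 0
    for row in data_set:
        key = row[id_column]
        if key in seen:
            n_duplicates += 1   # already seen: count as a duplicate
        else:
            seen.add(key)       # first occurrence: remember it
    return n_duplicates, len(seen)

# Made-up sample rows: 'Instagram' appears three times.
sample = [['Instagram', '10'], ['Box', '5'], ['Instagram', '12'], ['Instagram', '10']]
print(count_duplicates(sample))
```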

We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.

If we print the rows for the Instagram app, as in the cell below, the main difference between them is in the fourth position of each row, which corresponds to the number of reviews. The different numbers suggest the data was collected at different times.

In [16]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app.

After removing duplicates, our `android` dataset should contain the following number of rows:

In [17]:
print('Expected length:', len(android) - 1181)

Expected length: 9659


To remove the duplicates, we will do the following:

* Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
* Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [18]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews


Check if Instagram has been updated to show the max number of reviews:

In [19]:
print(reviews_max['Instagram'])

66577446.0


Check that the length of `reviews_max` matches the expected length of 9659:

In [20]:
len(reviews_max)

9659

We can now use the new dictionary to remove row duplicates:

First, we create two empty lists:
* `android_clean` which will store our new cleaned dataset
* `already_added` which will just store app names

We loop through the `android` dataset and:
* if `n_reviews` matches the `n_reviews` in the `reviews_max` dictionary for the corresponding key AND the app `name` is not present in the `already_added` list:
    * we `.append` the whole app row to the `android_clean` list
    * we `.append` the app `name` to the `already_added` list to keep track of the apps that were already checked. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for `reviews_max[name] == n_reviews`, we'll still end up with duplicate entries for some apps.

In [21]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app) #adds the whole row to the list
        already_added.append(name) #keeps track of already added apps


Let's explore the newly created `android_clean` dataset to verify everything went as expected:

In [22]:
explore_data(android_clean, 0, 6, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+

Running the `check_for_duplicates()` function:

In [23]:
check_for_duplicates(android_clean)

Number of duplicate apps: 0


Number of unique apps: 9659


It looks like we got rid of the duplicates.
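The two passes used above (first building `reviews_max`, then filtering) can also be combined. The sketch below is an alternative, not the notebook's method: for each app name it keeps the first row seen with the highest review count. `dedupe_by_max_reviews` is a hypothetical name and the sample rows are made up. The result is ordered by each app's first appearance, which can differ from the order of `android_clean`.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best_rows = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when this one has strictly more reviews,
        # so ties keep the earliest entry.
        if name not in best_rows or n_reviews > float(best_rows[name][reviews_index]):
            best_rows[name] = row
    return list(best_rows.values())

# Made-up rows that mimic the first four Google Play columns.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Box', 'BUSINESS', '4.2', '50'],
    ['Instagram', 'SOCIAL', '4.5', '120'],
    ['Box', 'BUSINESS', '4.2', '50'],
]
print(dedupe_by_max_reviews(sample))
```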

### Removing non-English apps

One way to do this is to remove each app whose name contains a symbol that isn't commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

Each character we use in a string has a corresponding number associated with it. For instance, the corresponding number for character `'a'` is 97, character `'A'` is 65, and character `'爱'` is 29,233. We can get the corresponding number of each character using the `ord()` built-in function.

The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. If the number is less than or equal to 127, the character belongs to the set of common English characters. If an app name contains a character whose number is greater than 127, the app probably has a non-English name.

In Python, strings are indexable and iterable, which means we can use indexing to select an individual character, and we can also iterate on the string using a for loop.
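A quick way to see these ideas together is to loop over a string and print each character's number. The snippet below uses only the built-in `ord()` function; the sample string `'Go™'` is arbitrary.

```python
# Numbers mentioned in the text above
print(ord('a'))    # 97
print(ord('A'))    # 65
print(ord('爱'))   # 29233

# Strings are iterable, so we can inspect every character in turn
for character in 'Go™':
    code = ord(character)
    print(character, code, code <= 127)

# Strings are also indexable
print('Instagram'[0])  # I
```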

We will now write an example function that takes in a string and returns `False` if there's any character in the string that doesn't belong to the set of common English characters; otherwise, the function returns `True`:

In [24]:
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
        
    return True

In [25]:
is_english('Instagram')

True

In [26]:
is_english('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [27]:
is_english('Docs To Go™ Free Office Suite')

False

In [28]:
is_english('Instachat 😜')

False

The function couldn't correctly identify certain English app names like `'Docs To Go™ Free Office Suite'` and `'Instachat 😜'`. This is because emojis and characters like `™` fall outside the ASCII range and have corresponding numbers over 127.

If we're going to use the function we've created, we'll lose useful data since many English apps will be incorrectly labeled as non-English. To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. Our filter function is still not perfect, but it should be fairly effective.

In [29]:
def is_english(string):
    n_outside_range = 0
    for character in string:
        if ord(character) > 127:
            n_outside_range += 1

    # allow up to three characters outside the ASCII range (emoji, ™, etc.)
    return n_outside_range <= 3


In [30]:
is_english('Instagram')

True

In [31]:
is_english('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [32]:
is_english('Docs To Go™ Free Office Suite')

True

In [33]:
is_english('Instachat 😜')

True

In [34]:
is_english('Instachat 😜😜😜')

True

In [35]:
is_english('Instachat 😜😜😜😜😜😜😜')

False

Looking at the tests above, our new function works as intended.
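The cut-off of three characters is a judgment call, so it can be convenient to make it a parameter. A minimal sketch, assuming a hypothetical function name `is_probably_english` and parameter `max_non_ascii` (neither is part of the notebook):

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters outside ASCII."""
    # ord(c) > 127 is True or False; summing booleans counts the True values
    n_outside_range = sum(ord(character) > 127 for character in name)
    return n_outside_range <= max_non_ascii

print(is_probably_english('Docs To Go™ Free Office Suite'))   # True
print(is_probably_english('Instachat 😜😜😜😜😜😜😜'))          # False
print(is_probably_english('Instachat 😜', max_non_ascii=0))   # False
```

Setting `max_non_ascii=0` reproduces the strict first version of the filter.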

We will now use `is_english()` to filter both data sets, appending the English apps to separate lists.

In [36]:
android_english = []

for app in android_clean:
    name = app[0]  # App column
    if is_english(name):
        android_english.append(app)


In [37]:
explore_data(android_english, 0, 6, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+

In [38]:
print('Number of removed non-English apps:', len(android_clean) - len(android_english))

Number of removed non-English apps: 45


In [39]:
ios_english = []

for app in ios:
    name = app[1]  # track_name column
    if is_english(name):
        ios_english.append(app)

In [40]:
explore_data(ios_english, 0, 6, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1']


Number of rows: 6183
Number of columns: 16


In [41]:
print('Number of removed non-English apps:', len(ios) - len(ios_english))

Number of removed non-English apps: 1014


We can see that we're left with 9614 Android apps and 6183 iOS apps.



### Isolating the free apps only

In [42]:
ios_free = []
android_free = []

for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_free.append(app)
        
for app in android_english:
    price = app[7]
    if price == '0':
        android_free.append(app)
        
    
explore_data(ios_free, 0, 3, True)
print('\n')
explore_data(android_free, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Vari

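To decide which string represents a free app in each data set, we explore the price columns below. Because the prices are stored as strings, the comparisons in the cell above must use the exact string values: `'0'` for Google Play and `'0.0'` for the App Store. We reuse `check_for_duplicates()` here only to list the unique values of a column, so in its output "unique apps" means unique price values.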

In [43]:
check_for_duplicates(android_english, 7, True)

Number of duplicate apps: 9522


Number of unique apps: 92
['0', '$4.99', '$3.99', '$1.49', '$2.99', '$7.99', '$5.99', '$3.49', '$1.99', '$6.99', '$9.99', '$7.49', '$0.99', '$9.00', '$5.49', '$10.00', '$11.99', '$79.99', '$16.99', '$14.99', '$1.00', '$29.99', '$2.49', '$24.99', '$10.99', '$1.50', '$19.99', '$15.99', '$33.99', '$74.99', '$39.99', '$3.95', '$4.49', '$1.70', '$8.99', '$2.00', '$3.88', '$25.99', '$399.99', '$17.99', '$400.00', '$3.02', '$1.76', '$4.84', '$4.77', '$1.61', '$2.50', '$1.59', '$6.49', '$1.29', '$5.00', '$13.99', '$299.99', '$379.99', '$37.99', '$18.99', '$389.99', '$19.90', '$8.49', '$1.75', '$14.00', '$4.85', '$46.99', '$109.99', '$154.99', '$3.08', '$2.59', '$4.80', '$1.96', '$19.40', '$3.90', '$4.59', '$15.46', '$3.04', '$12.99', '$4.29', '$2.60', '$3.28', '$4.60', '$28.99', '$2.95', '$2.90', '$1.97', '$200.00', '$89.99', '$2.56', '$30.99', '$3.61', '$394.99', '$1.26', '$1.20', '$1.04']


In [44]:
check_for_duplicates(ios_english, 4, True)

Number of duplicate apps: 6149


Number of unique apps: 34
['0.0', '1.99', '0.99', '6.99', '2.99', '7.99', '4.99', '9.99', '3.99', '8.99', '5.99', '14.99', '13.99', '19.99', '17.99', '15.99', '24.99', '20.99', '29.99', '12.99', '39.99', '74.99', '16.99', '249.99', '11.99', '27.99', '49.99', '59.99', '22.99', '18.99', '99.99', '21.99', '34.99', '299.99']


We have now created the lists `ios_free` and `android_free`, which contain the cleaned data for free, English-only apps:
* 3222 apps in `ios_free`
* 8864 apps in `android_free`
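
Comparing against the exact strings `'0'` and `'0.0'` works, but it depends on how each store formats its prices. One possible alternative is to convert the price strings to numbers first. The helper below is a sketch with a hypothetical name, `parse_price`; it assumes the only non-numeric character in a price is a leading `$`, which holds for the unique values printed above.

```python
def parse_price(price_text):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    return float(price_text.strip().lstrip('$'))

print(parse_price('0'))       # 0.0
print(parse_price('0.0'))     # 0.0
print(parse_price('$4.99'))   # 4.99

# A row is free when its parsed price equals zero, for example:
# android_free = [app for app in android_english if parse_price(app[7]) == 0]
```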

## Checkpoint

In [45]:
ios_final = ios_free.copy()
android_final = android_free.copy()

In [46]:
explore_data(ios_final, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


In [47]:
explore_data(android_final, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


### Steps done so far:

* Removed inaccurate data
* Removed duplicate entries
* Removed non-English apps
* Isolated free apps

The goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

We will begin the analysis by determining the most common genres for each market.

In [48]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [49]:
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


The columns that look useful for building frequency tables are:
* prime_genre / Genres
* Category

## Build a frequency table for Genres

The next step is to build frequency tables for the `prime_genre` column of the `ios` data set and for the `Genres` and `Category` columns of the `android` data set.

In [50]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
        
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages 

Below we create a function, `display_table()`, which converts the frequency dictionary into a list of tuples so that it can be sorted by frequency. Passing a dictionary directly to the built-in `sorted()` function doesn't help here, because it iterates over the keys only.

The `display_table()` function does the following:
* takes in two parameters: `dataset` and `index`
* generates a frequency table using the `freq_table()` function
* transforms the frequency table into a list of `(percentage, key)` tuples, then sorts the list in descending order
* prints the entries of the frequency table in descending order

In [51]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
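
An alternative to building `(value, key)` tuples is to sort the dictionary's items with a `key` function. The sketch below uses made-up percentages rather than the real frequency table.

```python
# Made-up frequency table for illustration
table = {'Games': 58.2, 'Education': 3.7, 'Entertainment': 7.9}

# sorted() on the dictionary itself only sees the keys
print(sorted(table))   # ['Education', 'Entertainment', 'Games']

# Sort the (key, value) pairs by value, largest first
table_sorted = sorted(table.items(), key=lambda item: item[1], reverse=True)
for genre, percentage in table_sorted:
    print(genre, ':', percentage)
```

When two entries have the same value, this version keeps them in the dictionary's insertion order, whereas sorting `(value, key)` tuples breaks ties by key.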

In [52]:
display_table(android_final, 1) #Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [53]:
display_table(android_final, 9) #Genre

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [54]:
display_table(ios_final, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The frequency tables show that apps designed for fun dominate the App Store, while Google Play has a more balanced mix of practical and fun apps. Next, we'd like to determine which kinds of apps have the most users.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to do the following:

* Isolate the apps of each genre
* Add up the user ratings for the apps of that genre
* Divide the sum by the number of apps belonging to that genre (not by the total number of apps)

In [55]:
genres_ios = freq_table(ios_final, 11)

In [56]:
for genre in genres_ios:
    total = 0 #stores the sum of user ratings for each genre
    len_genre = 0 #stores the number of apps specific to each genre
    
    for row in ios_final:
        genre_app = row[11]
        if genre_app == genre:
            n_ratings = float(row[5])
            total += n_ratings #adding the number of ratings if genres match
            len_genre += 1 #increase by 1 for each app that matches genre
    
    avg_n_rating_genre = total / len_genre
    print(genre, ' : ', avg_n_rating_genre)
        
    

Social Networking  :  71548.34905660378
Photo & Video  :  28441.54375
Games  :  22788.6696905016
Music  :  57326.530303030304
Reference  :  74942.11111111111
Health & Fitness  :  23298.015384615384
Weather  :  52279.892857142855
Utilities  :  18684.456790123455
Travel  :  28243.8
Shopping  :  26919.690476190477
News  :  21248.023255813954
Navigation  :  86090.33333333333
Lifestyle  :  16485.764705882353
Entertainment  :  14029.830708661417
Food & Drink  :  33333.92307692308
Sports  :  23008.898550724636
Book  :  39758.5
Finance  :  31467.944444444445
Education  :  7003.983050847458
Productivity  :  21028.410714285714
Business  :  7491.117647058823
Catalogs  :  4004.0
Medical  :  612.0
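
The nested loop in cell `In [56]` scans `ios_final` once per genre. As a sketch of an alternative (not the notebook's approach), the totals and counts can be accumulated in a single pass with two dictionaries. The function name `average_by_group` and the sample rows are made up for illustration.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the values in that group}."""
    totals = defaultdict(float)   # sum of values per group
    counts = defaultdict(int)     # number of rows per group
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up rows: name, number of ratings, genre
sample = [
    ['App A', '100', 'Games'],
    ['App B', '300', 'Games'],
    ['App C', '50', 'Music'],
]
print(average_by_group(sample, group_index=2, value_index=1))
# For the App Store data the call would be average_by_group(ios_final, 11, 5)
```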


On average, navigation apps have the highest number of user ratings. Let's examine this genre to see why the average is so high when it accounts for such a small share of the apps (about 0.19%).

In [57]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The breakdown above shows that Waze and Google Maps together account for almost 500,000 ratings, which heavily inflates the average number of ratings per app in the Navigation genre.
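
To see how strongly the two largest apps pull the mean, the arithmetic can be redone on the six rating counts printed above. This is a small sketch: the `navigation_ratings` list is copied by hand from that output, and the mean and median come from Python's standard `statistics` module.

```python
from statistics import mean, median

# rating counts copied from the Navigation output above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

mean_all = mean(navigation_ratings)                  # average over all six apps
mean_without_top_two = mean(navigation_ratings[2:])  # drop Waze and Google Maps
median_all = median(navigation_ratings)              # middle value, less sensitive to outliers

print(round(mean_all, 2))
print(mean_without_top_two)
print(median_all)
```

Without the two giants, the mean falls from roughly 86,090 to 4,146.25, and the median of all six apps is 8,196.5.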

Let's look at some other genres with a high average number of user ratings:

In [58]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [59]:
for app in ios_final:
    if app[-5] == 'Music':
        print(app[1], ':', app[5]) # print name and number of ratings

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

The same pattern applies to social networking apps, where the average is heavily influenced by a few giants like Facebook, Pinterest, and Skype. The same applies to music apps, where a few big players like Pandora, Spotify, and Shazam dominate the average.

Reference apps have 74,942 user ratings on average, but the Bible and Dictionary.com alone skew that average upward:



In [60]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


However, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.



Other genres that seem popular include Weather, Book, Food & Drink, and Finance. The Book genre seems to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:



In [61]:
for app in ios_final:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5]) # print name and number of ratings

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

Weather apps: people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.



In [62]:
for app in ios_final:
    if app[-5] == 'Food & Drink':
        print(app[1], ':', app[5]) # print name and number of ratings

Starbucks : 303856
Domino's Pizza USA : 258624
OpenTable - Restaurant Reservations : 113936
Allrecipes Dinner Spinner : 109349
DoorDash - Food Delivery : 25947
UberEATS: Uber for Food Delivery : 17865
Postmates - Food Delivery, Faster : 9519
Dunkin' Donuts - Get Offers, Coupons & Rewards : 9068
Chick-fil-A : 5665
McDonald's : 4050
Deliveroo: Restaurant Delivery - Order Food Nearby : 1702
SONIC Drive-In : 1645
Nowait Guest : 1625
7-Eleven, Inc. : 1356
Outback : 805
Bon Appetit : 750
Starbucks Keyboard : 457
Whataburger : 197
Delish Eatmoji Keyboard : 154
Lieferheld - Delicious food delivery service : 29
Lieferando.de : 29
McDo France : 22
Chefkoch - Rezepte, Kochen, Backen & Kochbuch : 20
Youmiam : 9
Marmiton Twist : 2
Open Food Facts : 1


Food and drink: examples here include Starbucks, Dunkin' Donuts, and McDonald's. Making a popular food and drink app therefore requires actual cooking and a delivery service, which is outside the scope of our company.



In [63]:
for app in ios_final:
    if app[-5] == 'Finance':
        print(app[1], ':', app[5]) # print name and number of ratings

Chase Mobile℠ : 233270
Mint: Personal Finance, Budget, Bills & Money : 232940
Bank of America - Mobile Banking : 119773
PayPal - Send and request money safely : 119487
Credit Karma: Free Credit Scores, Reports & Alerts : 101679
Capital One Mobile : 56110
Citi Mobile® : 48822
Wells Fargo Mobile : 43064
Chase Mobile : 34322
Square Cash - Send Money for Free : 23775
Capital One for iPad : 21858
Venmo : 21090
USAA Mobile : 19946
TaxCaster – Free tax refund calculator : 17516
Amex Mobile : 11421
TurboTax Tax Return App - File 2016 income taxes : 9635
Bank of America - Mobile Banking for iPad : 7569
Wells Fargo for iPad : 2207
Stash Invest: Investing & Financial Education : 1655
Digit: Save Money Without Thinking About It : 1506
IRS2Go : 1329
Capital One CreditWise - Credit score and report : 1019
U by BB&T : 790
Paribus - Rebates When Prices Drop : 768
KeyBank Mobile : 623
VyStar Mobile Banking for iPhone : 434
Sparkasse - Your mobile branch : 77
VyStar Mobile Banking for iPad : 57
Zaim : 4

Finance apps: these apps involve banking, paying bills, money transfers, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

In [64]:
for app in ios_final:
    if app[-5] == 'Health & Fitness':
        print(app[1], ':', app[5]) # print name and number of ratings

Calorie Counter & Diet Tracker by MyFitnessPal : 507706
Lose It! – Weight Loss Program and Calorie Counter : 373835
Weight Watchers : 136833
Sleep Cycle alarm clock : 104539
Fitbit : 90496
Period Tracker Lite : 53620
Nike+ Training Club - Workouts & Fitness Plans : 33969
Plant Nanny - Water Reminder with Cute Plants : 27421
Sworkit - Custom Workouts for Exercise & Fitness : 16819
Clue Period Tracker: Period & Ovulation Tracker : 13436
Headspace : 12819
Fooducate - Lose Weight, Eat Healthy,Get Motivated : 11875
Runtastic Running, Jogging and Walking Tracker : 10298
WebMD for iPad : 9142
8fit - Workouts, meal plans and personal trainer : 8730
Garmin Connect™ Mobile : 8341
Record by Under Armour, connects with UA HealthBox : 7754
Fitstar Personal Trainer : 7496
My Cycles Period and Ovulation Tracker : 7469
Seven - 7 Minute Workout Training Challenge : 6808
RUNNING for weight loss: workout & meal plans : 6407
Lifesum – Inspiring healthy lifestyle app : 5795
Waterlogged - Daily Hydration Tr

The Health & Fitness genre seems interesting. Although the average number of ratings is inflated by popular apps like MyFitnessPal, Lose It!, and Sleep Cycle, there is some potential for tracker apps, simple workout templates, and similar tools. This genre might be worth exploring.

In [66]:
for app in ios_final:
    if app[-5] == 'Book':
        print(app[1], ':', app[5]) # print name and number of ratings

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


Let's explore how users rate apps in each genre. Higher ratings might indicate higher user retention: users who rate an app well probably use it more often.

In [67]:
print(ios_header)
print('\n')
print(ios_final[:3])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]


In [68]:
for genre in genres_ios:
    total = 0 # stores the sum of the average user ratings (`user_rating`) for each genre
    len_genre = 0 # stores the number of apps specific to each genre
    
    for row in ios_final:
        genre_app = row[11]
        if genre_app == genre:
            ratings = float(row[7])
            total += ratings # add the app's rating if genres match
            len_genre += 1 # increase by 1 for each app that matches genre
    
    avg_rating_genre = total / len_genre
    print(genre, ' : ', avg_rating_genre)
        
    

Social Networking  :  3.5943396226415096
Photo & Video  :  3.903125
Games  :  4.037086446104589
Music  :  3.946969696969697
Reference  :  3.6666666666666665
Health & Fitness  :  3.769230769230769
Weather  :  3.482142857142857
Utilities  :  3.5308641975308643
Travel  :  3.4875
Shopping  :  3.9702380952380953
News  :  3.244186046511628
Navigation  :  3.8333333333333335
Lifestyle  :  3.411764705882353
Entertainment  :  3.5393700787401574
Food & Drink  :  3.6346153846153846
Sports  :  3.0652173913043477
Book  :  3.0714285714285716
Finance  :  3.375
Education  :  3.635593220338983
Productivity  :  4.0
Business  :  3.9705882352941178
Catalogs  :  4.125
Medical  :  3.0


We can see that the highest-rated genres are Catalogs, Games, Productivity, Business, Shopping, Music, and Photo & Video. We have already determined that Music is not of interest to us, since it is dominated by streaming giants such as Spotify and Pandora.

Let's explore Business, Games and Productivity further:


In [69]:
for app in ios_final:
    if app[-5] == 'Business':
        print(app[1], ':', app[5]) # print name and number of ratings

Indeed Job Search : 38681
Flashlight ◎ : 24744
Adobe Acrobat Reader: View, Create, & Convert PDFs : 20069
Scanner App - PDF Document Scan : 11696
SayHi Translate : 8623
ADP Mobile Solutions : 8324
Sideline - 2nd Phone Number : 7907
Uber Driver : 3289
AirWatch Agent : 1150
VPN Go - Safe Fast & Stable VPN Proxy : 881
Cisco AnyConnect : 825
GreenVPN - Free & fast VPN with unlimited traffic : 464
iPlum Business Phone Number for Calling & Texting : 392
OPEN Forum : 200
Pulse Secure : 53
DingTalk : 40
Mon Espace - Pôle emploi : 11


The Business genre is of no interest: the rating counts are dominated by apps such as Indeed Job Search, Flashlight, and Adobe Acrobat Reader. We cannot be competitive here.

In [70]:
for app in ios_final:
    if app[-5] == 'Games':
        print(app[1], ':', app[5]) # print name and number of ratings

Clash of Clans : 2130805
Temple Run : 1724546
Candy Crush Saga : 961794
Angry Birds : 824451
Subway Surfers : 706110
Solitaire : 679055
CSR Racing : 677247
Crossy Road - Endless Arcade Hopper : 669079
Injustice: Gods Among Us : 612532
Hay Day : 567344
PAC-MAN : 508808
DragonVale : 503230
Head Soccer : 481564
Despicable Me: Minion Rush : 464312
The Sims™ FreePlay : 446880
Sonic Dash : 418033
8 Ball Pool™ : 416736
Tiny Tower - Free City Building : 414803
Jetpack Joyride : 405647
Bike Race - Top Motorcycle Racing Games : 405007
Kim Kardashian: Hollywood : 397730
Trivia Crack : 393469
WordBrain : 391401
Sniper 3D Assassin: Shoot to Kill Gun Game : 386521
Flow Free : 373857
Geometry Dash Lite : 370370
▻Sudoku : 359832
Fruit Ninja® : 327025
Pixel Gun 3D : 301182
Temple Run 2 : 295211
My Horse : 293857
Word Cookies! : 287095
Dragon City Mobile : 277268
The Simpsons™: Tapped Out : 274501
Plants vs. Zombies™ 2 : 267394
Clash Royale : 266921
Pokémon GO : 257627
CSR Racing 2 : 257100
Star Wars™: 

Games are interesting because so many different games have a high number of ratings. Combined with an average user rating of 4.04, this suggests that an interesting, addictive game has good potential for high ad revenue. Some ideas might be retro console emulators that include many older fan-favorite games (nostalgia factor), simple addictive games for killing time, clickers/tappers, etc.

In [71]:
for app in ios_final:
    if app[-5] == 'Productivity':
        print(app[1], ':', app[5]) # print name and number of ratings

Evernote - stay organized : 161065
Gmail - email by Google: secure, fast & organized : 135962
iTranslate - Language Translator & Dictionary : 123215
Yahoo Mail - Keeps You Organized! : 113709
Google Docs : 64259
Google Drive - free online storage : 59255
Dropbox : 49578
Microsoft Word : 47999
Microsoft OneNote : 39638
Microsoft Outlook - email and calendar : 32807
Hotspot Shield Free VPN Proxy & Wi-Fi Privacy : 32499
Documents 6 - File manager, PDF reader and browser : 29110
Google Sheets : 24602
Microsoft Excel : 24430
Inbox by Gmail : 21561
T-Mobile : 19977
Paper by FiftyThree - Sketch, Diagram, Take Notes : 18219
MyScript Calculator - Handwriting calculator : 16555
VPN Proxy Master - Unlimited WiFi security VPN : 13674
Microsoft OneDrive – File & photo cloud storage : 12797
Ever - Capture Your Memories : 12755
Speak & Translate － Voice and Text Translator : 12062
Tayasui Sketches : 11505
Drawing Desk - Draw, Paint, Doodle & Sketch board : 11040
Microsoft PowerPoint : 10939
Email - F

Productivity can also be of interest. While the rating count is affected by giants such as Evernote and Google's apps (Gmail, Docs, Drive...), there are other smaller apps with a significant rating count. This genre is very flexible and can support many different ideas: note-taking apps, to-do lists, calculators, sketching and drawing apps, calendars, habit trackers. If we come up with an innovative idea, such as gamifying a basic task and packaging it in a polished graphical interface, we might see good revenue.

### Let's move on to Google Play Store apps

Explore the `Installs` column:

In [72]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


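The install numbers are open-ended buckets (100+, 1,000+, 5,000+, etc.) rather than exact counts, so we'll treat each bucket as its lower bound: an app with 100,000+ installs counts as 100,000 installs. To perform computations, we'll need to convert each install number from a string to a float. This means we need to remove the commas and the plus characters, or the conversion will fail and cause an error.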

To remove characters from strings, we can use the `str.replace(old, new)` method.
To remove certain characters, we can replace them with the empty string `''`
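
As a minimal sketch of that cleanup, the hypothetical helper `parse_installs` below (not defined in the notebook) chains two `str.replace` calls before converting to `float`:

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to a float."""
    cleaned = installs.replace(',', '')  # '1,000,000+' -> '1000000+'
    cleaned = cleaned.replace('+', '')   # '1000000+'   -> '1000000'
    return float(cleaned)


print(parse_installs('1,000,000+'))
print(parse_installs('0'))
```

Values that contain neither character, such as `'0'`, pass through both `replace` calls unchanged, so the same helper covers every bucket label listed above.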

We will focus on the `Category` column from the `android_final` dataset, since `Genres` is more granular. The `Category` column has fewer distinct values, which makes it easier to look at the bigger picture.

In [73]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [74]:
categories_android = freq_table(android_final, 1)

In [75]:
for category in categories_android:
    total = 0 # sum of all installs per category
    len_category = 0 #stores the number of apps specific to each category
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    
    avg_installs = total / len_category
    print(category, ' : ', avg_installs)

ART_AND_DESIGN  :  1986335.0877192982
AUTO_AND_VEHICLES  :  647317.8170731707
BEAUTY  :  513151.88679245283
BOOKS_AND_REFERENCE  :  8767811.894736841
BUSINESS  :  1712290.1474201474
COMICS  :  817657.2727272727
COMMUNICATION  :  38456119.167247385
DATING  :  854028.8303030303
EDUCATION  :  1833495.145631068
ENTERTAINMENT  :  11640705.88235294
EVENTS  :  253542.22222222222
FINANCE  :  1387692.475609756
FOOD_AND_DRINK  :  1924897.7363636363
HEALTH_AND_FITNESS  :  4188821.9853479853
HOUSE_AND_HOME  :  1331540.5616438356
LIBRARIES_AND_DEMO  :  638503.734939759
LIFESTYLE  :  1437816.2687861272
GAME  :  15588015.603248259
FAMILY  :  3695641.8198090694
MEDICAL  :  120550.61980830671
SOCIAL  :  23253652.127118643
SHOPPING  :  7036877.311557789
PHOTOGRAPHY  :  17840110.40229885
SPORTS  :  3638640.1428571427
TRAVEL_AND_LOCAL  :  13984077.710144928
TOOLS  :  10801391.298666667
PERSONALIZATION  :  5201482.6122448975
PRODUCTIVITY  :  16787331.344927534
PARENTING  :  542603.6206896552
WEATHER  :  50

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps with over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts) and a few others with over 100 million or 500 million installs:

In [76]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we removed all the communication apps that have 100 million or more installs, the average would drop by roughly a factor of ten:



In [77]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386

We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.



The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.



The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.



Let's take a look at some of the apps from this genre and their number of installs:



In [78]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ' : ', app[5])

E-Book Read - Read Book for free  :  50,000+
Download free book with green book  :  100,000+
Wikipedia  :  10,000,000+
Cool Reader  :  10,000,000+
Free Panda Radio Music  :  100,000+
Book store  :  1,000,000+
FBReader: Favorite Book Reader  :  10,000,000+
English Grammar Complete Handbook  :  500,000+
Free Books - Spirit Fanfiction and Stories  :  1,000,000+
Google Play Books  :  1,000,000,000+
AlReader -any text book reader  :  5,000,000+
Offline English Dictionary  :  100,000+
Offline: English to Tagalog Dictionary  :  500,000+
FamilySearch Tree  :  1,000,000+
Cloud of Books  :  1,000,000+
Recipes of Prophetic Medicine for free  :  500,000+
ReadEra – free ebook reader  :  1,000,000+
Anonymous caller detection  :  10,000+
Ebook Reader  :  5,000,000+
Litnet - E-books  :  100,000+
Read books online  :  5,000,000+
English to Urdu Dictionary  :  500,000+
eBoox: book reader fb2 epub zip  :  1,000,000+
English Persian Dictionary  :  500,000+
Flybook  :  500,000+
All Maths Formulas  :  1,000

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:



In [79]:
# install buckets of 50 million and up
top_buckets = ('1,000,000,000+', '500,000,000+', '100,000,000+', '50,000,000+')

for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in top_buckets:
        print(app[0], ' : ', app[5])

Google Play Books  :  1,000,000,000+
Bible  :  100,000,000+
Amazon Kindle  :  100,000,000+
Wattpad 📖 Free Books  :  100,000,000+
Audiobooks from Audible  :  100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [80]:
# install buckets from 1 million up to (but not including) 100 million
mid_buckets = ('1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+')

for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in mid_buckets:
        print(app[0], ' : ', app[5])

Wikipedia  :  10,000,000+
Cool Reader  :  10,000,000+
Book store  :  1,000,000+
FBReader: Favorite Book Reader  :  10,000,000+
Free Books - Spirit Fanfiction and Stories  :  1,000,000+
AlReader -any text book reader  :  5,000,000+
FamilySearch Tree  :  1,000,000+
Cloud of Books  :  1,000,000+
ReadEra – free ebook reader  :  1,000,000+
Ebook Reader  :  5,000,000+
Read books online  :  5,000,000+
eBoox: book reader fb2 epub zip  :  1,000,000+
All Maths Formulas  :  1,000,000+
Ancestry  :  5,000,000+
HTC Help  :  10,000,000+
Moon+ Reader  :  10,000,000+
English-Myanmar Dictionary  :  1,000,000+
Golden Dictionary (EN-AR)  :  1,000,000+
All Language Translator Free  :  1,000,000+
Aldiko Book Reader  :  10,000,000+
Dictionary - WordWeb  :  5,000,000+
50000 Free eBooks & Free AudioBooks  :  5,000,000+
Al-Quran (Free)  :  10,000,000+
Al Quran Indonesia  :  10,000,000+
Al'Quran Bahasa Indonesia  :  10,000,000+
Al Quran Al karim  :  1,000,000+
Al Quran : EAlim - Translations & MP3 Offline  :  5,

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.



However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.



In [81]:
for app in android_final:
    if app[1] == 'HEALTH_AND_FITNESS':
        print(app[0], ' : ', app[5])

Step Counter - Calorie Counter  :  500,000+
Lose Belly Fat in 30 Days - Flat Stomach  :  5,000,000+
Pedometer - Step Counter Free & Calorie Burner  :  1,000,000+
Six Pack in 30 Days - Abs Workout  :  10,000,000+
Lose Weight in 30 Days  :  10,000,000+
Pedometer  :  10,000,000+
LG Health  :  10,000,000+
Step Counter - Pedometer Free & Calorie Counter  :  10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App  :  10,000,000+
Sportractive GPS Running Cycling Distance Tracker  :  1,000,000+
30 Day Fitness Challenge - Workout at Home  :  10,000,000+
Home Workout for Men - Bodybuilding  :  1,000,000+
Fat Burning Workout - Home Weight lose  :  100,000+
Buttocks and Abdomen  :  500,000+
Walking for Weight Loss - Walk Tracker  :  100,000+
Running & Jogging  :  500,000+
Sleep Sounds  :  1,000,000+
Fitbit  :  10,000,000+
Lose Belly Fat-Home Abs Fitness Workout  :  50,000+
Cycling - Bike Tracker  :  500,000+
Abs Training-Burn belly fat  :  100,000+
Calorie Counter - EasyFit free  :  1,000,00

In [82]:
# install buckets from 1 million up to (but not including) 100 million
mid_buckets = ('1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+')

for app in android_final:
    if app[1] == 'HEALTH_AND_FITNESS' and app[5] in mid_buckets:
        print(app[0], ' : ', app[5])

Lose Belly Fat in 30 Days - Flat Stomach  :  5,000,000+
Pedometer - Step Counter Free & Calorie Burner  :  1,000,000+
Six Pack in 30 Days - Abs Workout  :  10,000,000+
Lose Weight in 30 Days  :  10,000,000+
Pedometer  :  10,000,000+
LG Health  :  10,000,000+
Step Counter - Pedometer Free & Calorie Counter  :  10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App  :  10,000,000+
Sportractive GPS Running Cycling Distance Tracker  :  1,000,000+
30 Day Fitness Challenge - Workout at Home  :  10,000,000+
Home Workout for Men - Bodybuilding  :  1,000,000+
Sleep Sounds  :  1,000,000+
Fitbit  :  10,000,000+
Calorie Counter - EasyFit free  :  1,000,000+
Garmin Connect™  :  10,000,000+
BetterMe: Weight Loss Workouts  :  5,000,000+
Bike Computer - GPS Cycling Tracker  :  1,000,000+
Running Distance Tracker +  :  1,000,000+
Runkeeper - GPS Track Run Walk  :  10,000,000+
Walking: Pedometer diet  :  1,000,000+
8fit Workouts & Meal Planner  :  10,000,000+
Keep Trainer - Workout Trainer & Fitn

The Health and Fitness category seems to be dominated by tracking and counting apps, fitness programs, and workout plans. One option is to turn a popular workout program into an app that includes workout tracking, timers, calorie counters, etc. Various workout programs have not yet been turned into apps. Other interesting ideas might be meditation/relaxation apps, sleep sounds, etc.

Let's check the ratings:

In [83]:
print(android_header)
print('\n')
print(android_final[:3])
print(categories_android)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
{'ART_AND_DESIGN': 0.6430505415162455, 'AUTO_AND_VEHICLES': 0.9250902527075812, 'BEAUTY': 0.5979241877256317, 'BOOKS_AND_REFERENCE': 2.1435018050541514, 'BUSINESS': 4.591606498194946, 'COMICS': 0.6204873646209386, 'COMMUNICATION': 3.2378158844765346, 'DATING': 1.861462093862816, 'EDUCATIO

In [92]:
for category in categories_android:
    total = 0 #stores the sum of user ratings for each category
    len_category = 0 #stores the number of apps specific to each category
    
    for row in android_final:
        category_app = row[1]
        if category_app == category:
            ratings = float(row[2])
            total += ratings #add this app's rating to the category total
            len_category += 1 #increase by 1 for each app that matches category
    
    avg_rating_category = total / len_category
    print(category, ' : ', avg_rating_category)

ART_AND_DESIGN  :  nan
AUTO_AND_VEHICLES  :  nan
BEAUTY  :  nan
BOOKS_AND_REFERENCE  :  nan
BUSINESS  :  nan
COMICS  :  nan
COMMUNICATION  :  nan
DATING  :  nan
EDUCATION  :  nan
ENTERTAINMENT  :  4.118823529411763
EVENTS  :  nan
FINANCE  :  nan
FOOD_AND_DRINK  :  nan
HEALTH_AND_FITNESS  :  nan
HOUSE_AND_HOME  :  nan
LIBRARIES_AND_DEMO  :  nan
LIFESTYLE  :  nan
GAME  :  nan
FAMILY  :  nan
MEDICAL  :  nan
SOCIAL  :  nan
SHOPPING  :  nan
PHOTOGRAPHY  :  nan
SPORTS  :  nan
TRAVEL_AND_LOCAL  :  nan
TOOLS  :  nan
PERSONALIZATION  :  nan
PRODUCTIVITY  :  nan
PARENTING  :  nan
WEATHER  :  nan
VIDEO_PLAYERS  :  nan
NEWS_AND_MAGAZINES  :  nan
MAPS_AND_NAVIGATION  :  nan
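
Almost every category prints `nan` here. A likely explanation is that `float('NaN')` is a valid conversion in Python and that any sum containing `nan` is itself `nan`; ENTERTAINMENT would then be the only category without a missing rating. The short sketch below illustrates the behaviour with made-up rating strings (the list `ratings` is an assumption for illustration, not taken from the dataset).

```python
import math

# Illustrative rating strings; 'NaN' stands for a missing rating
ratings = ['4.1', 'NaN', '4.7']

total = 0
for r in ratings:
    total += float(r)  # float('NaN') does not raise, it returns nan

print(total)              # nan
print(math.isnan(total))  # True
```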


In [91]:
# Print the raw rating strings per category to see why the averages are nan
for category in categories_android:
    for row in android_final:
        category_app = row[1]
        if category_app == category:
            ratings = row[2]
            print(ratings)


4.1
4.7
4.5
4.3
4.4
3.8
4.1
4.4
4.7
4.4
4.4
4.2
4.6
4.4
3.2
4.7
4.5
4.3
4.6
4.0
4.1
4.7
4.7
4.8
4.7
4.1
3.9
4.1
4.2
4.1
4.5
4.2
4.7
3.8
4.1
4.7
4.0
4.2
4.5
3.8
4.2
4.7
4.6
4.2
4.3
4.8
4.4
4.7
3.4
4.8
4.0
4.8
NaN
4.2
4.3
NaN
5.0
4.2
4.0
3.8
4.6
3.9
4.3
4.9
4.4
4.2
4.0
3.9
4.6
4.9
4.3
4.6
4.9
3.9
4.0
4.3
3.9
4.2
4.8
3.6
4.2
4.8
4.8
4.6
4.5
4.3
4.5
4.9
3.9
4.4
4.0
4.3
3.7
4.4
4.3
3.2
4.6
4.6
4.5
3.7
4.6
4.6
4.6
4.0
4.4
4.0
4.9
4.5
4.0
4.4
NaN
NaN
NaN
4.3
4.8
NaN
4.5
4.8
4.2
3.5
4.2
4.0
2.6
NaN
4.2
4.4
4.0
3.5
NaN
3.2
NaN
3.0
NaN
3.1
2.1
3.9
4.6
NaN
NaN
4.7
4.9
4.7
3.9
3.9
4.2
4.6
4.3
4.7
4.7
4.8
4.2
4.3
4.5
4.1
NaN
4.2
4.5
4.4
4.0
4.1
4.1
4.4
4.6
4.5
NaN
3.9
4.4
NaN
4.6
3.8
NaN
NaN
4.0
4.3
4.5
NaN
4.1
3.7
4.7
4.2
3.1
NaN
4.0
NaN
4.0
NaN
4.5
NaN
NaN
4.0
4.7
3.9
4.5
4.6
4.4
4.5
4.5
4.4
4.5
4.6
4.8
3.9
4.6
4.2
4.7
4.3
3.3
4.6
4.8
NaN
4.1
4.6
4.1
4.6
4.7
4.5
3.9
4.4
4.3
4.2
4.5
4.4
3.4
4.9
4.6
4.4
NaN
4.4
4.4
4.4
3.5
NaN
4.7
4.2
4.7
4.0
4.2
4.3
4.2
4.3
3.9
NaN
5.0
NaN
NaN
NaN
4.3
4.6
4.6
4.6


We see that some apps have `NaN` as their rating, and a single `NaN` is enough to turn the whole category average into `nan`. To clean up those values, we will replace each `NaN` with `0.0`.

In [86]:
import copy
android_final_clean = copy.deepcopy(android_final)

In [87]:
for app in android_final_clean:
    rating = app[2]
    if rating == 'NaN':
        app[2] = '0.0'

    

In [88]:
print(android_final_clean[:3])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]


In [89]:
# Print the cleaned rating strings per category to confirm NaN is now 0.0
for category in categories_android:
    for row in android_final_clean:
        category_app = row[1]
        if category_app == category:
            ratings = row[2]
            print(ratings)

4.1
4.7
4.5
4.3
4.4
3.8
4.1
4.4
4.7
4.4
4.4
4.2
4.6
4.4
3.2
4.7
4.5
4.3
4.6
4.0
4.1
4.7
4.7
4.8
4.7
4.1
3.9
4.1
4.2
4.1
4.5
4.2
4.7
3.8
4.1
4.7
4.0
4.2
4.5
3.8
4.2
4.7
4.6
4.2
4.3
4.8
4.4
4.7
3.4
4.8
4.0
4.8
0.0
4.2
4.3
0.0
5.0
4.2
4.0
3.8
4.6
3.9
4.3
4.9
4.4
4.2
4.0
3.9
4.6
4.9
4.3
4.6
4.9
3.9
4.0
4.3
3.9
4.2
4.8
3.6
4.2
4.8
4.8
4.6
4.5
4.3
4.5
4.9
3.9
4.4
4.0
4.3
3.7
4.4
4.3
3.2
4.6
4.6
4.5
3.7
4.6
4.6
4.6
4.0
4.4
4.0
4.9
4.5
4.0
4.4
0.0
0.0
0.0
4.3
4.8
0.0
4.5
4.8
4.2
3.5
4.2
4.0
2.6
0.0
4.2
4.4
4.0
3.5
0.0
3.2
0.0
3.0
0.0
3.1
2.1
3.9
4.6
0.0
0.0
4.7
4.9
4.7
3.9
3.9
4.2
4.6
4.3
4.7
4.7
4.8
4.2
4.3
4.5
4.1
0.0
4.2
4.5
4.4
4.0
4.1
4.1
4.4
4.6
4.5
0.0
3.9
4.4
0.0
4.6
3.8
0.0
0.0
4.0
4.3
4.5
0.0
4.1
3.7
4.7
4.2
3.1
0.0
4.0
0.0
4.0
0.0
4.5
0.0
0.0
4.0
4.7
3.9
4.5
4.6
4.4
4.5
4.5
4.4
4.5
4.6
4.8
3.9
4.6
4.2
4.7
4.3
3.3
4.6
4.8
0.0
4.1
4.6
4.1
4.6
4.7
4.5
3.9
4.4
4.3
4.2
4.5
4.4
3.4
4.9
4.6
4.4
0.0
4.4
4.4
4.4
3.5
0.0
4.7
4.2
4.7
4.0
4.2
4.3
4.2
4.3
3.9
0.0
5.0
0.0
0.0
0.0
4.3
4.6
4.6
4.6


2.8
0.0
0.0
4.3
3.9
0.0
0.0
3.9
4.4
4.6
0.0
4.1
0.0
0.0
3.8
4.6
3.5
4.7
4.3
4.5
4.4
3.4
4.3
0.0
4.3
4.6
4.6
0.0
0.0
0.0
4.6
0.0
0.0
4.4
3.9
4.5
3.8
4.3
3.7
5.0
3.2
4.5
4.7
4.0
0.0
4.1
3.8
4.7
4.5
0.0
4.2
0.0
4.1
3.1
4.3
4.2
4.2
4.1
4.0
4.4
3.9
4.2
4.0
4.0
4.1
4.4
4.0
3.9
3.6
0.0
4.4
4.3
4.1
3.8
4.1
3.7
3.9
4.4
4.2
0.0
0.0
0.0
0.0
5.0
0.0
4.5
4.2
0.0
4.1
4.3
4.1
4.1
4.3
4.0
4.6
4.4
4.4
4.3
4.6
3.2
4.5
4.6
4.6
4.3
3.0
4.4
4.4
4.3
3.5
4.3
4.5
3.7
4.5
4.2
4.3
4.6
4.5
3.9
4.3
3.3
4.2
3.2
4.4
3.5
3.7
3.8
3.8
3.7
4.0
3.6
3.8
4.1
4.7
3.9
4.1
4.6
4.2
4.6
4.2
4.4
4.6
4.1
3.9
4.0
4.1
4.6
4.1
4.1
4.1
4.4
4.3
4.4
4.2
4.4
4.3
4.1
4.0
4.2
4.1
3.9
4.4
4.6
4.5
4.5
4.8
4.4
4.5
4.5
4.6
4.5
3.1
1.7
4.5
4.7
3.4
3.8
3.0
3.0
3.1
4.0
3.4
3.5
3.8
2.9
2.6
3.5
4.0
4.1
3.8
4.0
4.2
4.1
4.0
4.4
2.9
2.8
0.0
1.9
3.4
4.3
4.4
5.0
4.1
0.0
5.0
3.8
5.0
0.0
0.0
3.8
4.6
4.7
4.6
4.7
4.6
4.3
4.5
4.0
4.2
4.3
4.0
3.5
4.3
4.4
4.4
3.1
4.7
0.0
0.0
4.1
3.8
4.5
4.1
0.0
4.6
0.0
0.0
4.1
4.4
3.9
4.3
3.9
3.7
4.7
0.0
2.3
4.2
5.0
4.9
0.0


In [90]:
for category in categories_android:
    total = 0 #stores the sum of user ratings for each category
    len_category = 0 #stores the number of apps specific to each category
    
    for row in android_final_clean:
        category_app = row[1]
        if category_app == category:
            ratings = float(row[2])
            total += ratings #add this app's rating to the category total
            len_category += 1 #increase by 1 for each app that matches category
    
    avg_rating_category = total / len_category
    print(category, ' : ', avg_rating_category)

ART_AND_DESIGN  :  4.185964912280701
AUTO_AND_VEHICLES  :  3.674390243902439
BEAUTY  :  3.3905660377358484
BOOKS_AND_REFERENCE  :  3.638421052631579
BUSINESS  :  2.5511056511056505
COMICS  :  4.025454545454546
COMMUNICATION  :  3.364808362369337
DATING  :  3.161818181818181
EDUCATION  :  4.298058252427182
ENTERTAINMENT  :  4.118823529411763
EVENTS  :  3.168253968253969
FINANCE  :  3.6375000000000006
FOOD_AND_DRINK  :  3.4854545454545454
HEALTH_AND_FITNESS  :  3.615384615384615
HOUSE_AND_HOME  :  3.4602739726027405
LIBRARIES_AND_DEMO  :  3.2216867469879515
LIFESTYLE  :  3.291618497109824
GAME  :  4.030742459396756
FAMILY  :  3.6934964200477376
MEDICAL  :  3.021405750798721
SOCIAL  :  3.6220338983050833
SHOPPING  :  3.781407035175881
PHOTOGRAPHY  :  3.957088122605364
SPORTS  :  3.3308970099667774
TRAVEL_AND_LOCAL  :  3.517874396135265
TOOLS  :  3.5284000000000004
PERSONALIZATION  :  3.4078231292517014
PRODUCTIVITY  :  3.4182608695652217
PARENTING  :  3.5913793103448284
WEATHER  :  3.8718

We see that the highest-rated categories are Art and Design, Comics, Education, Entertainment and Game.
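
Note that the averages above count every unrated app as `0.0`, which lowers the average of any category with many unrated apps. As an alternative sketch (not the approach used in this notebook), unrated apps can be left out of the average entirely. The function name `average_rating` and the three sample rows are hypothetical.

```python
import math


def average_rating(dataset, category, category_index=1, rating_index=2):
    """Average rating for `category`, skipping rows whose rating is NaN.

    Returns None if the category has no rated apps.
    """
    total = 0.0
    count = 0
    for row in dataset:
        if row[category_index] != category:
            continue
        rating = float(row[rating_index])
        if math.isnan(rating):
            continue  # leave unrated apps out instead of counting them as 0.0
        total += rating
        count += 1
    return total / count if count else None


# Made-up rows: name, category, rating
sample = [
    ['App A', 'BUSINESS', '4.0'],
    ['App B', 'BUSINESS', 'NaN'],
    ['App C', 'BUSINESS', '3.0'],
]

print(average_rating(sample, 'BUSINESS'))  # 3.5
```

For the same three rows, replacing `NaN` with `0.0` would give 7.0 / 3, roughly 2.33, so the two approaches can rank categories differently.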

In [93]:
for app in android_final_clean:
    if app[1] == 'ART_AND_DESIGN':
        print(app[0], ' : ', app[5])

Photo Editor & Candy Camera & Grid & ScrapBook  :  10,000+
U Launcher Lite – FREE Live Cool Themes, Hide Apps  :  5,000,000+
Sketch - Draw & Paint  :  50,000,000+
Pixel Draw - Number Art Coloring Book  :  100,000+
Paper flowers instructions  :  50,000+
Smoke Effect Photo Maker - Smoke Editor  :  50,000+
Infinite Painter  :  1,000,000+
Garden Coloring Book  :  1,000,000+
Kids Paint Free - Drawing Fun  :  10,000+
Text on Photo - Fonteee  :  1,000,000+
Name Art Photo Editor - Focus n Filters  :  1,000,000+
Tattoo Name On My Photo Editor  :  10,000,000+
Mandala Coloring Book  :  100,000+
3D Color Pixel by Number - Sandbox Art Coloring  :  100,000+
Learn To Draw Kawaii Characters  :  5,000+
Photo Designer - Write your name with shapes  :  500,000+
350 Diy Room Decor Ideas  :  10,000+
FlipaClip - Cartoon animation  :  5,000,000+
ibis Paint X  :  10,000,000+
Logo Maker - Small Business  :  100,000+
Boys Photo Editor - Six Pack & Men's Suit  :  100,000+
Superheroes Wallpapers | 4K Backgrounds 

In [94]:
for app in android_final_clean:
    if app[1] == 'EDUCATION':
        print(app[0], ' : ', app[5])

English Communication - Learn English for Chinese (Learn English for Chinese)  :  100,000+
Khan Academy  :  5,000,000+
Ai La Trieu Phu - ALTP Free  :  100,000+
Learn Spanish - Español  :  1,000,000+
Speed Reading  :  500,000+
English for beginners  :  1,000,000+
Mermaids  :  5,000,000+
Learn Japanese, Korean, Chinese Offline & Free  :  1,000,000+
Kids Mode  :  500,000+
Dinosaurs Coloring Pages  :  500,000+
Cars Coloring Pages  :  1,000,000+
Math Tricks  :  10,000,000+
Learn English Words Free  :  5,000,000+
Japanese / English one-shop search dictionary - Free Japanese - English - Japanese dictionary application  :  50,000+
English speaking texts  :  1,000,000+
Thai Handwriting  :  1,000,000+
THAI DICT 2018  :  1,000,000+
Kanji test · Han search Kanji training (free version)  :  1,000,000+
Flippy Campus - Buy & sell on campus at a discount  :  500,000+
Free intellectual training game application |  :  1,000,000+
ABC Preschool Free  :  5,000,000+
PINKFONG Baby Shark  :  1,000,000+
Englis

In [95]:
for app in android_final_clean:
    if app[1] == 'ENTERTAINMENT':
        print(app[0], ' : ', app[5])

Complete Spanish Movies  :  1,000,000+
Pluto TV - It’s Free TV  :  1,000,000+
Mobile TV  :  10,000,000+
TV+  :  5,000,000+
Digital TV  :  5,000,000+
Motorola Spotlight Player™  :  10,000,000+
Vigo Lite  :  5,000,000+
Hotstar  :  100,000,000+
Peers.TV: broadcast TV channels First, Match TV, TNT ...  :  5,000,000+
The green alien dance  :  1,000,000+
Spectrum TV  :  5,000,000+
H TV  :  5,000,000+
StarTimes - Live International Champions Cup  :  1,000,000+
Cinematic Cinematic  :  1,000,000+
MEGOGO - Cinema and TV  :  10,000,000+
Talking Angela  :  100,000,000+
DStv Now  :  5,000,000+
ivi - movies and TV shows in HD  :  10,000,000+
Radio Javan  :  1,000,000+
Talking Ginger 2  :  50,000,000+
Girly Lock Screen Wallpaper with Quotes  :  5,000,000+
🔥 Football Wallpapers 4K | Full HD Backgrounds 😍  :  1,000,000+
Movies by Flixster, with Rotten Tomatoes  :  10,000,000+
Low Poly – Puzzle art game  :  1,000,000+
BBC Media Player  :  10,000,000+
Amazon Prime Video  :  50,000,000+
Adult Glitter Colo

In [99]:
# Install buckets between one million and fifty million
popular_installs = ('1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+')
for app in android_final_clean:
    if app[1] == 'GAME' and app[5] in popular_installs:
        print(app[0], ' : ', app[5])

Solitaire  :  10,000,000+
Bubble Witch 3 Saga  :  50,000,000+
Race the Traffic Moto  :  10,000,000+
Marble - Temple Quest  :  10,000,000+
Shooting King  :  10,000,000+
Geometry Dash World  :  10,000,000+
Jungle Marble Blast  :  5,000,000+
Block Craft 3D: Building Simulator Games For Free  :  50,000,000+
Farm Fruit Pop: Party Time  :  1,000,000+
Love Balls  :  50,000,000+
Paint Hit  :  10,000,000+
Snake VS Block  :  50,000,000+
Rolly Vortex  :  10,000,000+
Woody Puzzle  :  1,000,000+
Stack Jump  :  10,000,000+
The Cube  :  5,000,000+
Bricks n Balls  :  1,000,000+
The Fish Master!  :  1,000,000+
Color Road  :  10,000,000+
Draw In  :  10,000,000+
Looper!  :  1,000,000+
Will it Crush?  :  5,000,000+
Tomb of the Mask  :  5,000,000+
Baseball Boy!  :  10,000,000+
Hello Stars  :  10,000,000+
Tank Stars  :  10,000,000+
Hole.io  :  10,000,000+
Mini Golf King - Multiplayer Game  :  5,000,000+
Flip the Gun - Simulator Game  :  10,000,000+
Mad Skills BMX 2  :  1,000,000+
MMX Hill Dash 2 – Offroad T

## Conclusions

In this project, we analyzed data about App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable in both markets. The factors we considered were the number of ratings, the number of installs, the market share of each category and, finally, the average user rating.

We concluded that genres and categories with a high number of installs or ratings are much more likely to generate ad revenue because of the sheer number of users trying the apps. To account for user retention, we also looked at which categories users tend to rate better. Books can be a good option if we can come up with an app that takes a popular recent book and adds new features to it, such as daily quotes, summaries, quizzes and similar.

Another avenue might be Productivity apps, such as note-taking apps, to-do lists, habit trackers and other productivity tools. They tend to rate better on the App Store, though. This might be because Android is particularly prominent in lower-income areas and developing nations, while iOS users typically have higher incomes and education levels, engage more and spend more per app. iOS users are also more likely to subscribe to a good-quality productivity app.

Games and Entertainment apps fare well in both markets, so developing a free app of that kind might be a good revenue stream. Some examples include simple addictive games, clickers, idle games, card or board games, and emulators or remakes of older fan favorites whose licensing may be free.
Games are also well suited to in-app purchases.

I hope you have enjoyed going through this analysis. I took some extra steps to explore other key factors and did some additional data cleaning along the way.