# Project 1: Bringing more value from free apps based on ads revenue
***
## What is this project and its goal?
This project simulates the role of a data analyst working at a company that develops apps for Android and iOS. The business model is one of free apps, with revenue coming in through in-app ads.

We are in charge of using app data to help the app developers understand what types of apps attract the most users.

This is valuable since more users means more ad revenue.

The data we will be working with are "play" data sets: one for the Android environment and another for Apple. The data sets, and their documentation, can be found here: [Google](https://www.kaggle.com/lava18/google-play-store-apps) and [Apple](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

# First Objective
## Open and read the data, and understand its size

In [1]:
# First, let's open and read the CSV files with the data from both the Google
# Play Store and the Apple App Store
from csv import reader

# The "with" blocks close each file as soon as it has been read
with open(r"C:\Users\fv.eco\Documents\Jupyter Docs\DQ Project 1\Data\googleplaystore.csv", encoding='utf8') as opfil_g:
    google_data = list(reader(opfil_g))
with open(r"C:\Users\fv.eco\Documents\Jupyter Docs\DQ Project 1\Data\AppleStore.csv", encoding='utf8') as opfil_a:
    apple_data = list(reader(opfil_a))

# The working data sets are now called "google_data" and "apple_data". They
# both include a header row.

In [2]:
# Now let's see what the data sets look like, plus their size

# Print just the header
print(google_data[0])
print('\n')
print(apple_data[0])
print('\n')

# Print two first rows of data
print(google_data[1:3])
print('\n')
print(apple_data[1:3])
print('\n')

# Now let's measure the width (number of columns / features) and length (how 
# many data points) of the data sets
print('Google, width')
print(len(google_data[0]))
print('Apple, width')
print(len(apple_data[0]))
print('\n')
print('Google, length')
print(len(google_data[1:]))
print('Apple, length')
print(len(apple_data[1:]))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']]


[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Pho

## Short description of data sets
The Google data set has 13 variables (columns) and 10841 observations (rows).

The Apple data set has 16 variables and 7197 observations.

Note that the variables in the two data sets do not match, and that all data entries appear to be stored as strings.
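
These numbers can be reproduced with a small helper. The following is only a minimal sketch: `dataset_shape` and `all_strings` are hypothetical names, the toy rows stand in for `google_data`, and the first row is assumed to be the header.

```python
def dataset_shape(dataset, has_header=True):
    """Return (number of data rows, number of columns) for a list of rows."""
    rows = dataset[1:] if has_header else dataset
    n_cols = len(dataset[0]) if dataset else 0
    return len(rows), n_cols


def all_strings(dataset):
    """Check that every cell in every row is a str, as csv.reader produces."""
    return all(isinstance(cell, str) for row in dataset for cell in row)


# Toy stand-in for google_data (header + two rows), not the real file.
toy = [
    ['App', 'Category', 'Rating'],
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9'],
]
print(dataset_shape(toy))   # (2, 3)
print(all_strings(toy))     # True
```

Because every value is a string, numeric columns such as the number of reviews have to be converted (for example with `float`) before they can be compared or averaged.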

# Second Objective
## Identify variables that are likely useful for our goal
This identification will be done manually, using our own reasoning instead of leaning on statistical/econometric techniques.

The names of some variables are not enough on their own to understand exactly what they describe. For those cases, the documentation for the data sets can be found here: [Google](https://www.kaggle.com/lava18/google-play-store-apps) and [Apple](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

Here are the tables with the selected variables.
For Google:

| Variable | Description |
| -------- | ----------- |
| App      | Application name |
| Category | Category the app belongs to |
| Rating   | Overall user rating of the app (as when scraped) |
| Reviews  | Number of user reviews for the app (as when scraped) |
| Size     | Size of the app (as when scraped) |
| Installs | Number of user downloads/installs for the app (as when scraped) |
| Type     | Paid or Free |
| Price    | Price of the app (as when scraped) |
| Content Rating | Age group the app is targeted at - Children / Mature 21+ / Adult |
| Genres | An app can belong to multiple genres (apart from its main category) |

For Apple:

| Variable | Description |
| -------- | ----------- |
| id      | App ID |
| track_name | App Name |
| size_bytes   | Size (in bytes) |
| currency  | Currency Type |
| price     | Price amount |
| rating_count_tot | User Rating counts (for all versions) |
| user_rating | Average User Rating value (for all versions) |
| cont_rating    | Content Rating |
| prime_genre | Primary Genre |
| ipadSc_urls.num | Number of screenshots shown for display |
| lang.num | Number of supported languages |

# Third Objective - Data cleaning

We are now going to detect inaccurate data, correct or remove it, and remove duplicate entries. We will also remove non-English apps from the data sets.

## Rows with problems

According to a discussion forum about the data set, one row in the Google Play data set has a problem: row 10473. I also noticed a potential problem in row 13.
Let's check both rows and see what is going on:

In [3]:
# Print the potentially problematic rows from the Google data set (with the
# header row for reference):
print('\n')
print(google_data[0])
print('\n')
print(google_data[10473])
print('\n')
print(google_data[13])
print('\n')



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Tattoo Name On My Photo Editor', 'ART_AND_DESIGN', '4.2', '44829', '20M', '10,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'April 2, 2018', '3.8', '4.1 and up']




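The data set indeed has a problem on row 10473: it has only 12 fields instead of 13 because the 'Category' value is missing, so every later value sits one column to the left. There is no problem on row 13.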
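
A row like this can also be found without relying on a forum post, by comparing each row's length with the header's. Below is a minimal sketch of that check; `find_malformed_rows` is a hypothetical helper and the toy rows are shortened stand-ins for the real data.

```python
def find_malformed_rows(dataset):
    """Return (index, row length) for rows whose length differs from the header's."""
    expected = len(dataset[0])
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]


# Toy data: the last row is missing its 'Category' value.
toy = [
    ['App', 'Category', 'Rating', 'Reviews'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19'],
]
print(find_malformed_rows(toy))  # [(2, 3)]
```

This only catches rows with a missing or extra field. A row with the right number of fields but wrong values would need a different check.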

We will just delete the problematic row:

In [4]:
# Run only once: a second run would delete the next row, now at index 10473
del google_data[10473]

## Next, duplicate entries

We will now find all duplicate entries. For this, we will comb through all app names and store duplicate names in a specific list for that purpose:

In [5]:
unique_apps = []
duplicate_apps = []

for app in google_data[1:]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print(len(duplicate_apps))
print(len(unique_apps))

1181
9659


We find that there are 1181 duplicate values. Let's see a few of those apps:

In [6]:
for dup in duplicate_apps[:2]:
    for app in google_data[1:]:
        if dup == app[0]:
            print(app)
            print('\n')

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,

We see that some duplicates differ in the number of reviews, while others are perfect copies.
We will now define a strategy to handle duplicates.

If the duplicates are perfect copies of each other:
- We will keep the first copy found in the data set and remove the rest.
The reason is that if the rows are identical, it doesn't matter which one we pick, and keeping the first copy found is the easiest to code.

If the duplicates are not perfect copies:
- We will keep the record with the most reviews.
The reason is that review counts grow over time, so the record with the most reviews ought to be the most recent one.

Now comes the code that implements this strategy. We will start by building a dictionary where each key is the name of an app and each value is the highest number of reviews recorded for that app. We can then use these values in a "for" loop to select the right row for each app, according to the strategy defined just above, and clean the duplicates out of our data set.

In [7]:
reviews_max = {}
for app in google_data[1:]:
    if app[0] not in reviews_max:
        reviews_max[app[0]] = float(app[3])
    elif float(app[3]) > reviews_max[app[0]]:
        reviews_max[app[0]] = float(app[3])

Let's check how the dictionary looks. We can print a few values to see if everything is looking as expected:

In [8]:
dict_vals = list(reviews_max.items())
print(dict_vals[:5])
del dict_vals

[('Photo Editor & Candy Camera & Grid & ScrapBook', 159.0), ('Coloring book moana', 974.0), ('U Launcher Lite – FREE Live Cool Themes, Hide Apps', 87510.0), ('Sketch - Draw & Paint', 215644.0), ('Pixel Draw - Number Art Coloring Book', 967.0)]


The dictionary seems to look like it should.

We will now remove duplicates from our original data set:

In [9]:
# Create an empty list to hold our new data with no duplicates, and another
# list to keep track of the apps already added:
android_clean = []
already_added = []

# Now loop through the original data set and add only non-duplicates to the
# new list "android_clean". The "already_added" check keeps just the first
# row when several rows share the same maximum number of reviews.

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
    

Let's now check if things went as expected. Check a few lines of the new data set and check its length:

In [10]:
for n in range(0,5):
    print(android_clean[n])
    print("\n")

print(len(android_clean))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


9659


The first few rows of the new data set look right, and its length matches the number of unique app names found earlier (9659).
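
Beyond eyeballing a few rows, the two properties we care about can be checked directly: every app name appears exactly once, and the row kept for each app carries that app's highest review count. The following is a rough sketch on toy data; `check_dedup` is a hypothetical helper, and the column positions (name at index 0, reviews at index 3) are assumed to match the Android data set.

```python
def check_dedup(original_rows, clean_rows, name_idx=0, reviews_idx=3):
    """Return True if clean_rows has one row per app, each with the app's max reviews."""
    max_reviews = {}
    for row in original_rows:
        name, n = row[name_idx], float(row[reviews_idx])
        if name not in max_reviews or n > max_reviews[name]:
            max_reviews[name] = n

    names = [row[name_idx] for row in clean_rows]
    one_row_per_app = len(names) == len(set(names)) == len(max_reviews)
    kept_max = all(
        float(row[reviews_idx]) == max_reviews.get(row[name_idx])
        for row in clean_rows
    )
    return one_row_per_app and kept_max


# Toy rows without a header: one perfect duplicate, one near duplicate.
original = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Quick PDF Scanner', 'BUSINESS', '4.2', '80805'],
    ['Quick PDF Scanner', 'BUSINESS', '4.2', '80804'],
]
print(check_dedup(original, [original[0], original[2]]))  # True
print(check_dedup(original, [original[0], original[3]]))  # False
```

On the real data, the equivalent call would be `check_dedup(google_data[1:], android_clean)`.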

## Remove non-English apps from data set

In our hypothetical scenario, the company is only interested in English apps, so we should filter out any non-English ones.

What we will do now is create a function that flags strings containing non-English characters. We can do so by looking at each character in a string and checking its code point with `ord`. ASCII, which covers regular English text, only uses code points from 0 to 127, so any character with a code point above 127 is not a regular English character.

Our strategy is to count an app as non-English if its name contains more than three such characters. This tolerates English names that include the occasional emoji or a symbol such as ™.

Here is the checker function:

In [11]:
def eng_check(st):
    count = 0
    for char in st:
        if count == 4:
            return False
        elif ord(char)>127:
            count += 1
    return True

Let's check that the function is working as intended with a few hand-made examples:

In [12]:
print(eng_check('Instagram'))
print(eng_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_check('Docs To Go™ Free Office Suite'))
print(eng_check('Instachat 😜'))
print(eng_check('Clean Title!!!!!'))

True
False
True
True
True


The function above seems to be working as intended.
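
One edge case is not covered by the examples above. `eng_check` only tests `count == 4` at the start of the next loop iteration, so a name whose fourth non-ASCII character is also its last character still returns `True`. The sketch below shows a possible stricter variant; `eng_check_strict` and its `max_non_ascii` parameter are hypothetical, and the test name is made up. The row counts reported in this notebook come from the original `eng_check`, so the stricter variant may give different totals.

```python
def eng_check(st):
    """Original checker from this notebook, copied here for comparison."""
    count = 0
    for char in st:
        if count == 4:
            return False
        elif ord(char) > 127:
            count += 1
    return True


def eng_check_strict(st, max_non_ascii=3):
    """Hypothetical variant: False once more than max_non_ascii chars are non-ASCII."""
    count = 0
    for char in st:
        if ord(char) > 127:
            count += 1
            if count > max_non_ascii:
                return False
    return True


name = 'Chat 爱奇艺热'  # made-up name that ends on its fourth non-ASCII character
print(eng_check(name))          # True
print(eng_check_strict(name))   # False
print(eng_check_strict('Docs To Go™ Free Office Suite'))  # True
```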

We can now use it to clean our data sets:

In [13]:
android_eng = []
apple_eng = []
for app in android_clean:
    if eng_check(app[0]):
        android_eng.append(app)
for app in apple_data[1:]:
    if eng_check(app[1]):
        apple_eng.append(app)

print(android_eng[0:3])
print(apple_eng[0:3])
print(len(android_eng))
print(len(apple_eng))

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]
9616
6226


Let's see how many rows were removed. Before this step we had 9659 rows in the Android data set and 7197 in the Apple one. We now have 9616 rows in the Android data set and 6226 in the Apple data set.

## Removing non-free apps from the data set

Since our company only gets revenue from in-app ads, we also want to remove any non-free apps from the data sets.

For this we will loop over the data sets and copy all free apps to new lists, in the same style we have used before. We then check a few rows and the overall length of the new data sets.

In [14]:
android_free = []
apple_free = []

for app in android_eng:
    type_p = app[6]
    if type_p == 'Free':
        android_free.append(app)

for app in apple_eng:
    price = float(app[4])
    if price == 0:
        apple_free.append(app)

print(android_free[:3])
print(apple_free[:3])
print(len(android_free))
print(len(apple_free))

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]
8865
3253


We have 8865 free apps on the Android platform and 3253 free apps on the Apple one. This is the data we will use to derive value for our app developer colleagues.

# Fourth Objective - Analysis

We now want to analyze the data to derive value from it. Recall that our objective is to bring valuable information to the developers at the company so that they can focus on app characteristics that are popular. This way, a larger user base can be achieved, and more ad revenue will follow.

The company has a validation strategy for developing app ideas. It is a minimum-viable-product type of strategy, as follows:

- The idea is first developed for the Android platform with a minimum viable set of features.
- If the app has a good response from users, it is developed further.
- If the app is profitable after six months, it is ported to the Apple platform.

An alternative strategy would be to plan and execute an app idea with a broad set of features from the start. Though this strategy can work, it is a risky one, since many more resources are put into a product that may not be popular. The strategy defined above carries the risk of offering an underwhelming app that lacks features, but it also means that only ideas with a good core are pursued. Such ideas can be explored easily since the overhead for a deployable product is low.

## Apps by genre

We will now look at which app genres are most common. The Android data set has one column for app category and another for genre, in the 2nd and 10th columns (indices 1 and 9). The Apple data set has only a genre column, the 12th (index 11).

We need two tools: first, one that summarizes a column into a frequency table (the *freq_table* function), and second, one that displays the results in order (the *display_table* function). We start with the function that extracts the relevant information from the data set, followed by the function that puts it into the right display format.

In [15]:
def freq_table(dataset, index):
    tot_rows = len(dataset)
    fr_tb = {}
    for dp in dataset:
        if dp[index] in fr_tb:
            fr_tb[dp[index]] += 1
        else:
            fr_tb[dp[index]] = 1
    for diel in fr_tb:
        fr_tb[diel] /= tot_rows
        fr_tb[diel] *= 100
        fr_tb[diel] = round(fr_tb[diel], 2)
    return fr_tb

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
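
For comparison, the same kind of percentage table can be sketched with `collections.Counter` from the standard library. This is an alternative illustration on toy data, not the function used in the rest of this notebook; `freq_table_counter` is a hypothetical name.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage frequency table for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, round(n / total * 100, 2)) for value, n in counts.most_common()]


# Toy rows: (name, category)
toy = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'GAME']]
for value, pct in freq_table_counter(toy, 1):
    print(value, ':', pct)
# GAME : 75.0
# TOOLS : 25.0
```

One difference to keep in mind: `most_common` leaves tied values in the order they were first seen, whereas `display_table` breaks ties by sorting on the key in reverse order.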

We can now show the three columns of interest: "Category" and "Genres" for Android, and "prime_genre" for Apple:

In [16]:
# display_table prints the table itself and returns None, so we call it
# directly instead of wrapping it in print()
print("For Android, the 'Category' column:")
display_table(android_free, 1)
print("\n")
print("For Android, the 'Genre' column:")
display_table(android_free, 9)
print("\n")
print("For Apple, the 'Genre' column:")
display_table(apple_free, 11)
print("\n")

For Android, the 'Category' column:
FAMILY : 18.89
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.91
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.24
BOOKS_AND_REFERENCE : 2.15
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.92
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


For Android, the 'Genre' column:
Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Lifestyle : 3.9
Productivity : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
S

 ### For the analysis of the Apple data set:
 
 - What is the most common genre?
     - Games first.
     - Then Entertainment.

 - What other patterns do we see?
     - Games are by far the most common type of app.
     - Every other app genre has a relatively low market share.
     
 - What is the general impression?
     - Most apps are geared towards entertainment, and not practical purposes.
     
 - What recommendations can be made from this table alone?
     - Not a whole lot. We need more information to derive value from the data, such as:
         - How large is the user base for each app genre?
         - Is the market saturated for a given app genre?


 ### For the analysis of the Android data set:
 
 - What is the most common genre?
     - Tools first.
     - Then Entertainment.

 - What other patterns do we see?
     - The market share is quite evenly spread out.
     - The most popular app genre is close to the rest of the stack.
     
 - How do the Android results compare to the Apple ones?
     - There are a lot more genres.
     - There seems to be a stronger preference for practical purpose apps than entertainment.
     
 - What recommendations can be made from this table alone?
     - Same answer as before. We need other complementary information to derive value from these results, such as the user base of each app genre.

***********

## Grabbing extra information - the user base

We will now extract information about the user base. The Android data set records the number of downloads/installs, but the Apple data set is missing this information. As a proxy, we will use the total number of user ratings there (the "rating_count_tot" column).

We also want the user base numbers per app genre.

We will first obtain the unique genre names, and then compute the average user base for each unique genre name.

In [17]:
apple_un_gen = freq_table(apple_free, 11)
rk_1 = []
for genre in apple_un_gen:
    total = 0
    len_genre = 0
    for app in apple_free:
        genre_app = app[11]
        if genre_app == genre:
            count = float(app[5])  # rating_count_tot
            total += count
            len_genre += 1
    avg_nr = round(total / len_genre, 1)
    
    rk_1.append([avg_nr, genre])

rk_1.sort(reverse=True)

for el in rk_1:
    print(el)       

[86090.3, 'Navigation']
[74942.1, 'Reference']
[71548.3, 'Social Networking']
[57326.5, 'Music']
[50477.1, 'Weather']
[37217.7, 'Book']
[29885.8, 'Food & Drink']
[28441.5, 'Photo & Video']
[27638.2, 'Finance']
[26925.2, 'Travel']
[25996.3, 'Shopping']
[23298.0, 'Health & Fitness']
[23008.9, 'Sports']
[22691.8, 'Games']
[21248.0, 'News']
[20702.2, 'Productivity']
[18460.4, 'Utilities']
[16168.7, 'Lifestyle']
[13831.3, 'Entertainment']
[7075.3, 'Business']
[7004.0, 'Education']
[4004.0, 'Catalogs']
[612.0, 'Medical']


Navigation apps have the highest average user base, followed by Reference and Social Networking.

Given that these three genres had really low market shares, this implies that the market for these genres is quite concentrated. That presents possibilities for obtaining a large user base, but it may also be challenging to disrupt the oligopoly in place.

Trying to strike a balance between a high user base and a low number of apps already in the market, our app recommendation is a **Weather** app.
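
The per-genre averages in this section were computed with a nested loop that rescans the whole data set once per genre. As a possible alternative, the sketch below collects totals and counts in a single pass. `average_by_group` is a hypothetical helper, and the toy rows and their numbers are made up for illustration.

```python
def average_by_group(dataset, group_idx, value_idx):
    """Return [(average, group), ...] sorted from highest to lowest average."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    ranking = [(round(totals[g] / counts[g], 1), g) for g in totals]
    return sorted(ranking, reverse=True)


# Toy rows: (genre, number of ratings as a string)
toy = [
    ['Weather', '100'],
    ['Weather', '300'],
    ['Games', '50'],
]
print(average_by_group(toy, 0, 1))  # [(200.0, 'Weather'), (50.0, 'Games')]
```

On the real data, the equivalent call would be `average_by_group(apple_free, 11, 5)`.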

## User base for the Android data set

We will now do the same for the Android data set.

In [18]:
android_cat = freq_table(android_free, 1)
rk_2 = []
for cat in android_cat:
    total = 0
    len_cat = 0
    for app in android_free:
        cat_app = app[1]
        if cat_app == cat:
            nr_inst = app[5]
            nr_inst = float(nr_inst.replace("+","").replace(",",""))
            total += nr_inst
            len_cat += 1
    avg_nr = round(total / len_cat, 1)
    rk_2.append([avg_nr, cat])

rk_2.sort(reverse=True)

for el in rk_2:
    print(el) 

[38456119.2, 'COMMUNICATION']
[24727872.5, 'VIDEO_PLAYERS']
[23253652.1, 'SOCIAL']
[17840110.4, 'PHOTOGRAPHY']
[16787331.3, 'PRODUCTIVITY']
[15588015.6, 'GAME']
[13984077.7, 'TRAVEL_AND_LOCAL']
[11640705.9, 'ENTERTAINMENT']
[10801391.3, 'TOOLS']
[9549178.5, 'NEWS_AND_MAGAZINES']
[8721959.5, 'BOOKS_AND_REFERENCE']
[7036877.3, 'SHOPPING']
[5201482.6, 'PERSONALIZATION']
[5074486.2, 'WEATHER']
[4188822.0, 'HEALTH_AND_FITNESS']
[4056941.8, 'MAPS_AND_NAVIGATION']
[3697848.2, 'FAMILY']
[3638640.1, 'SPORTS']
[1986335.1, 'ART_AND_DESIGN']
[1924897.7, 'FOOD_AND_DRINK']
[1833495.1, 'EDUCATION']
[1712290.1, 'BUSINESS']
[1433701.5, 'LIFESTYLE']
[1387692.5, 'FINANCE']
[1331540.6, 'HOUSE_AND_HOME']
[854028.8, 'DATING']
[817657.3, 'COMICS']
[647317.8, 'AUTO_AND_VEHICLES']
[638503.7, 'LIBRARIES_AND_DEMO']
[542603.6, 'PARENTING']
[513151.9, 'BEAUTY']
[253542.2, 'EVENTS']
[120550.6, 'MEDICAL']
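
The install counts in the Android data set are stored as strings such as `'10,000+'`, which is why the loop above strips the `+` and the commas before converting to `float`. That cleaning step can be isolated in a small helper; the sketch below uses a hypothetical name, `parse_installs`.

```python
def parse_installs(text):
    """Convert an install bucket such as '10,000+' into a float."""
    return float(text.replace('+', '').replace(',', ''))


print(parse_installs('10,000+'))      # 10000.0
print(parse_installs('5,000,000+'))   # 5000000.0
```

Since a value like `'10,000+'` means "at least 10,000 installs", the parsed numbers are lower bounds, and the averages above should be read as rough indicators rather than exact user counts.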


Given that we want an app to be successful on both Android and Apple systems, we now check whether our **Weather** app recommendation still makes sense.

It seems that it does: on Android, Weather apps have a relatively high average number of installs (about 5.07 million) while making up only 0.8% of the free English apps.

# Result of Analysis

**Our recommendation is to build a weather app for the purpose of raising revenue through in-app ads.**