# Determining Profitable Apps for the Smartphone Market

We are trying to find a profile for an app that will succeed in both the App Store and Google Play markets: one that is free to download yet still brings in a lot of revenue.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

The function above takes in 4 parameters: `dataset`, `start`, `end`, and `rows_and_columns`.

It prints the slice of the data we want to view and, when `rows_and_columns` is `True`, also reports the number of rows and columns in the dataset.
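
As a small illustration of the slicing behaviour described above, here is a sketch on a made-up dataset. The `toy_data` list and its values are hypothetical, and the function is repeated only so the example runs on its own.

```python
# Hypothetical toy dataset: a header row followed by three data rows.
toy_data = [
    ['name', 'price'],
    ['App A', '0.0'],
    ['App B', '1.99'],
    ['App C', '0.0'],
]

def explore_data(dataset, start, end, rows_and_columns=False):
    """Same logic as the notebook's function, repeated so this runs alone."""
    for row in dataset[start:end]:
        print(row)
        print('\n')  # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

# start is inclusive and end is exclusive, so this prints rows 1 and 2 only.
explore_data(toy_data, 1, 3, rows_and_columns=True)

# The same slice, kept in a variable for inspection.
rows_shown = toy_data[1:3]
```

Note that the reported row count is taken from the whole dataset passed in, not from the slice, so it includes the header row whenever the header is still part of the list.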

In [2]:
from csv import reader

# Load the App Store dataset (header row included)
with open('AppleStore.csv') as opened_file:
    apple_apps_data = list(reader(opened_file))

# Load the Google Play dataset (header row included)
with open('googleplaystore.csv') as opened_file:
    google_apps_data = list(reader(opened_file))

In [3]:
explore_data(apple_apps_data, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16


In [4]:
explore_data(google_apps_data, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13


In [5]:
print(apple_apps_data[0])

print(google_apps_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


# Data Cleaning Process

Some rows of data need to be removed, as pointed out by users on Kaggle. Before analyzing anything, we need to find the rows that would distort the analysis and remove them.

In [19]:
# We use this to find rows that cause anomalies,
# such as rows whose length does not match the number of columns
i = 1
for row in google_apps_data[1:]:
    if len(row) != len(google_apps_data[0]):
        print(i)
    i += 1

In [18]:
# Now we can delete the data row 10473
del google_apps_data[10473]
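
One caveat with deleting by a hard-coded index is that rerunning the cell removes a different (valid) row each time. A minimal sketch of an alternative that filters by row length instead is shown below; the helper name `split_malformed` and the toy rows are hypothetical, not part of the original notebook.

```python
def split_malformed(dataset):
    """Return (clean_rows, bad_indices) for a dataset whose first row is the header."""
    expected = len(dataset[0])
    clean_rows = [dataset[0]]
    bad_indices = []
    for index, row in enumerate(dataset[1:], start=1):
        if len(row) == expected:
            clean_rows.append(row)
        else:
            bad_indices.append(index)  # index within the full dataset
    return clean_rows, bad_indices

# Hypothetical toy data with one row that is a column short.
toy = [
    ['App', 'Category', 'Rating'],
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

clean, bad = split_malformed(toy)
print(bad)
print(len(clean))

# Running the filter again on the cleaned data removes nothing further.
clean_again, bad_again = split_malformed(clean)
```

Because the filter depends only on row length, it can be rerun safely, which is not the case for `del` with a fixed index.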

Next, we need to remove duplicate entries.

In [27]:
# Let's determine how many dupes we have
duplicate_apps = []
unique_apps = []
for app in google_apps_data[::-1]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of Duplicates:', len(duplicate_apps))
print('Examples:', duplicate_apps[:12])

Number of Duplicates: 1181
Examples: ['Maps & GPS Navigation — OsmAnd', 'Moovit: Bus Time & Train Time Live Info', 'Uber', 'Mapy.cz - Cycling & Hiking offline maps', 'Transit: Real-Time Transit App', 'Waze - GPS, Maps, Traffic Alerts & Live Navigation', 'Google News', 'NYTimes - Latest News', 'AC - Tips & News for Android™', 'Twitter', 'Newsroom: News Worth Sharing', 'CNN Breaking US & World News']


The count above walks the list in reverse order, but row order alone does not tell us which duplicate is the most recent. To keep the most up-to-date data for each duplicated app, we keep the row with the highest number of reviews.

In [28]:
reviews_max = {}
for app in google_apps_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews # Updates to new max value
    # If not in the dictionary, just make an entry for it
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('The length of the dictionary is:', len(reviews_max))

The length of the dictionary is: 9659


In [29]:
# We will use the dictionary we created to remove duplicate rows
google_clean = []
already_added = []
for app in google_apps_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        google_clean.append(app)
        already_added.append(name)
        
print('Length of Google Clean:', len(google_clean))

Length of Google Clean: 9659


In [31]:
google_clean[:3]

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up'],
 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps',
  'ART_AND_DESIGN',
  '4.7',
  '87510',
  '8.7M',
  '5,000,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'August 1, 2018',
  '1.2.4',
  '4.0.3 and up'],
 ['Sketch - Draw & Paint',
  'ART_AND_DESIGN',
  '4.5',
  '215644',
  '25M',
  '50,000,000+',
  'Free',
  '0',
  'Teen',
  'Art & Design',
  'June 8, 2018',
  'Varies with device',
  '4.2 and up']]

In order to remove duplicates, we had two steps:

- The first was determining which row among the duplicates to keep. We decided to keep the row with the highest number of reviews, because it is likely the most recent data we have on a particular app.
    - To execute that, an empty dictionary was created. Then every row of the data was cycled through to determine which app row was important. The number of reviews was noted in order to find the max number. The length was checked.
- Then the dictionary was used to cross reference apps that were already added into the cleaned data set and to match with the number of reviews already determined. 

We checked the lengths after each step, and everything matched.
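
The two steps above can be sketched on a handful of made-up rows. The function name `keep_most_reviewed`, the column positions, and the toy rows are assumptions for illustration only; the logic mirrors the two notebook cells.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=1):
    """Keep one row per app name: the first row holding that app's max review count."""
    # Step 1: record the highest review count seen for each app name.
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Step 2: keep the first row that matches that maximum for each name.
    clean = []
    already_added = set()
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if n_reviews == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean

# Hypothetical rows: [app name, number of reviews]
toy_rows = [
    ['Uber', '100'],
    ['Uber', '250'],
    ['Uber', '250'],  # exact duplicate of the best row
    ['Twitter', '40'],
]

deduped = keep_most_reviewed(toy_rows)
print(deduped)
```

The `already_added` check matters for the third row: without it, two rows sharing the same maximum review count would both be kept. A set is used here instead of a list only because membership tests on a set are faster.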

In [46]:
# We are targeting an English-speaking audience, so apps that are not
# in English must be removed from the data
def english(string):
    i = 0 # The counter
    for char in string:
        if ord(char) > 127:
            i += 1
            if i > 3:
                return False
    return True

In [52]:
print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))

True
False
True
True


In [55]:
# Remove the non English apps
english_google_clean = []
for app in google_clean:
    if english(app[0]):
        english_google_clean.append(app)
        
explore_data(english_google_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


In [57]:
# Also for the iOS apps too
english_apple_clean = []
for app in apple_apps_data[1:]:
    if english(app[1]):
        english_apple_clean.append(app)
explore_data(english_apple_clean, 0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 6183
Number of columns: 16


To narrow the data down to English-only apps, we had to create a function that does the following:

- It takes in a string.
- It initializes a counter; if the counter exceeds 3, the string is deemed non-English.
    - The function iterates through each character of the string and uses the `ord()` function to check whether the character falls in the ASCII range (code point 127 or lower), which covers standard English text.
        - If the character is outside that range, the counter increases by 1.
        - If not, the function moves on to the next character.
- The function then deems a string, or app title, English (`True`) or non-English (`False`).

Once the function was complete, we iterated through the Google data cleaned earlier and the App Store data, and used it to keep only the English apps.
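
To make the counting rule concrete, the sketch below separates the count from the decision. The helper `count_non_ascii` and the `max_non_ascii` parameter are hypothetical names; the threshold of 3 is the one used in the notebook.

```python
def count_non_ascii(string):
    """Count the characters whose code point falls outside the ASCII range."""
    return sum(1 for char in string if ord(char) > 127)

def english(string, max_non_ascii=3):
    """Treat a title as English if it has at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(string) <= max_non_ascii

titles = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for title in titles:
    print(title, '|', count_non_ascii(title), '|', english(title))
```

Allowing up to three non-ASCII characters keeps titles that contain a trademark sign or an emoji, at the cost of letting through a few short non-English titles.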

In [92]:
# We are focused on free app data, so we need to figure out how to
# remove the paid for apps, as well.
print(' Google Columns')
print(google_apps_data[0])
print(' Apple Columns')
print(apple_apps_data[0])

 Google Columns
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
 Apple Columns
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [73]:
print(type(english_google_clean[1][7]))
print(type(english_apple_clean[1][4]))

<class 'str'>
<class 'str'>


In [77]:
print(english_apple_clean[510][4])
print(english_google_clean[101][7])

0.99
0
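
The checks above show that both price columns hold strings, and that the two stores write a zero price differently (`'0'` versus `'0.0'`). A hedged sketch of a single test that handles both is given below; `parse_price` and `is_free` are hypothetical helpers, and the handling of a leading `$` is an assumption about how paid prices may be written, not something shown in the output above.

```python
def parse_price(raw_price):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float.

    Assumption: a paid price may carry a leading dollar sign.
    """
    return float(raw_price.replace('$', ''))

def is_free(raw_price):
    """Return True when the price string represents zero."""
    return parse_price(raw_price) == 0.0

for raw in ['0', '0.0', '0.99', '$4.99']:
    print(repr(raw), is_free(raw))
```

Comparing numbers rather than strings means the same check can be applied to both datasets without remembering which spelling of zero each one uses.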


In [87]:
free_google = []
free_apple = []
for app in english_google_clean:
    if app[7] == '0':
        free_google.append(app)
        
for app in english_apple_clean:
    if app[4] == '0.0':
        free_apple.append(app)
        
print(' Google Data')
explore_data(free_google, 0, 5, True)
print('\n', 'Apple Data')
explore_data(free_apple, 0, 5, True)

 Google Data
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8864
Number of columns: 13

 Apple Data
['284882215', 'Faceboo

# Insights

Now that we have cleaned our data, we want to determine which kinds of apps are likely to attract the most users, because the number of users affects revenue.

Our validation strategy has three steps:
- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we develop it further.
- If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

We are looking for an app profile that works in both the Apple and Google ecosystems, so that the same app can succeed on both. Google has relatively open rules for app development, which makes Google Play a good place to get started; on Apple's App Store, by contrast, every app must be vetted.

As the cleaning steps show, the Google Play data contained many rows that had to be removed, so we isolated the market we are after before drawing insights. Apple's data was comparatively clean. Apple knows which apps are allowed on its store and how much revenue can be made.

To execute the validation strategy for an app idea, we will build two functions: one to generate frequency tables, and another to display the percentages in descending order.

In [105]:
def freq_table(dataset, index):
    res = {}
    unique = []
    i = 0
    for row in dataset:
        name = row[index]
        i += 1
        if name in res:
            res[name] += 1
        else:
            res[name] = 1
            unique.append(name)
    
    for item in unique:
        res[item] /= i
        res[item] *= 100
#         res[item] = round(res[item], 2)
    return res

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
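
To see what the two functions produce, here is a compact sketch on a made-up dataset. The toy rows are hypothetical, and this `freq_table` is a shortened variant written for the example rather than the notebook's exact code.

```python
def freq_table(dataset, index):
    """Return {value: percentage of rows holding that value in column `index`}."""
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}

# Hypothetical rows: [app name, genre]
toy_apps = [
    ['A', 'Games'],
    ['B', 'Games'],
    ['C', 'Music'],
    ['D', 'Games'],
]

table = freq_table(toy_apps, 1)

# Display in descending order of percentage.
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```

Sorting with a `key` function avoids building the intermediate list of tuples. One difference from the tuple approach is tie-breaking: tuples with equal percentages are ordered by name, whereas here ties keep their insertion order.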

# The Tables

Apple's Prime Genre Column

In [107]:
display_table(free_apple, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The most common genre among the free English apps on the Apple App Store is Games, at about 58%. There are also few apps in genres such as Weather, Food & Drink, Reference, Business, Book, Navigation, Medical, and Catalogs. As free apps, these seem to be focused on making revenue up front rather than through ads.

Most of the apps are focused on entertaining the user or connecting with others rather than on practical purposes. Since there are so many Games apps, making an app in that market wouldn't ensure revenue. Making an app in a genre with less competition, which includes most of the practical genres, could help revenue.

Google's "Genres" and "Category" columns

In [108]:
# Genres
print('Genres')
display_table(free_google, 9)
print('\n')
# Categories
print('Category')
display_table(free_google, 1)

Genres
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.925090

On the Google Play store, the top 5 most common genres are Tools, Entertainment, Education, Business, and Productivity. The top 5 most common categories are Family, Game, Tools, Business, and Lifestyle. Compared with the Apple App Store, the Google Play store has a more even distribution between entertainment and practical apps.

However, the Genres column has a lot of gray areas. Some apps fall into more than one genre: rather than being strictly Entertainment or strictly Educational, an app can do both and is labeled as such.

It is difficult to recommend an app profile when apps are spread this evenly across genres, because there is no obvious niche to approach. Recommending an app profile for Apple is easy; for Android, not so much.

In [109]:
# Let's see which kind of apps have the most users
prime_genre = freq_table(free_apple, 11)
print(prime_genre)

{'Social Networking': 3.2898820608317814, 'Photo & Video': 4.9658597144630665, 'Games': 58.16263190564867, 'Music': 2.0484171322160147, 'Reference': 0.5586592178770949, 'Health & Fitness': 2.0173805090006205, 'Weather': 0.8690254500310366, 'Utilities': 2.5139664804469275, 'Travel': 1.2414649286157666, 'Shopping': 2.60707635009311, 'News': 1.3345747982619491, 'Navigation': 0.186219739292365, 'Lifestyle': 1.5828677839851024, 'Entertainment': 7.883302296710118, 'Food & Drink': 0.8069522036002483, 'Sports': 2.1415270018621975, 'Book': 0.4345127250155183, 'Finance': 1.1173184357541899, 'Education': 3.662321539416512, 'Productivity': 1.7380509000620732, 'Business': 0.5276225946617008, 'Catalogs': 0.12414649286157665, 'Medical': 0.186219739292365}


In [111]:
# Remind ourselves of the column names
print(' Google Columns')
print(google_apps_data[0])
print(' Apple Columns')
print(apple_apps_data[0])

 Google Columns
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
 Apple Columns
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [120]:
for key in prime_genre:
    total = 0
    len_genre = 0
    for app in free_apple:
        genre_app = app[11]
        if genre_app == key:
            number = float(app[5])
            total += number
            len_genre += 1
    average = total / len_genre
    print('Average Rating Count for', key, ': \t\t', round(average, 2))

Average Rating Count for Social Networking : 		 71548.35
Average Rating Count for Photo & Video : 		 28441.54
Average Rating Count for Games : 		 22788.67
Average Rating Count for Music : 		 57326.53
Average Rating Count for Reference : 		 74942.11
Average Rating Count for Health & Fitness : 		 23298.02
Average Rating Count for Weather : 		 52279.89
Average Rating Count for Utilities : 		 18684.46
Average Rating Count for Travel : 		 28243.8
Average Rating Count for Shopping : 		 26919.69
Average Rating Count for News : 		 21248.02
Average Rating Count for Navigation : 		 86090.33
Average Rating Count for Lifestyle : 		 16485.76
Average Rating Count for Entertainment : 		 14029.83
Average Rating Count for Food & Drink : 		 33333.92
Average Rating Count for Sports : 		 23008.9
Average Rating Count for Book : 		 39758.5
Average Rating Count for Finance : 		 31467.94
Average Rating Count for Education : 		 7003.98
Average Rating Count for Productivity : 		 21028.41
Average Rating Count fo

Navigation has the highest average number of ratings, but that figure most likely comes from a few very popular apps such as Google Maps, Waze, or Apple's built-in Maps app.
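
One way to check whether a genre's average is being pulled up by a few giants is to compare the mean with the median. The sketch below does this on made-up rows; the function `summarize_by_group`, the toy data, and the use of the median are additions for illustration and were not part of the original analysis.

```python
from statistics import mean, median

def summarize_by_group(rows, group_index, value_index):
    """Return {group: (mean, median)} of a numeric column, grouped by another column."""
    values_by_group = {}
    for row in rows:
        group = row[group_index]
        values_by_group.setdefault(group, []).append(float(row[value_index]))
    return {
        group: (mean(values), median(values))
        for group, values in values_by_group.items()
    }

# Hypothetical rows: [app name, genre, rating count]
toy_rows = [
    ['Big Maps', 'Navigation', '1000000'],
    ['Tiny Nav', 'Navigation', '200'],
    ['Mini Nav', 'Navigation', '100'],
    ['Chess', 'Games', '50'],
    ['Checkers', 'Games', '70'],
]

summary = summarize_by_group(toy_rows, 1, 2)
for genre, (avg, mid) in summary.items():
    print(genre, '| mean:', round(avg, 2), '| median:', mid)
```

In the toy Navigation group a single large app lifts the mean far above the median, which is the pattern suspected in the real data. When the two values are close, as in the toy Games group, the average is a fairer summary of a typical app.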

In [115]:
# Now we look at how the Android apps are distributed across the Installs column (in %)
display_table(free_google, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343
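
The install counts above are open-ended strings such as `1,000,000+`, so they have to be converted to numbers before they can be averaged. A minimal sketch of that conversion follows; the helper name `parse_installs` is hypothetical.

```python
def parse_installs(raw_installs):
    """Convert an installs string such as '1,000,000+' to a float.

    The '+' means "at least", so the result is a lower bound on real installs.
    """
    return float(raw_installs.replace('+', '').replace(',', ''))

for raw in ['1,000,000+', '500+', '0+', '0']:
    print(repr(raw), '->', parse_installs(raw))
```

Because each value is only a lower bound, any average built from these numbers understates the true install counts, though it is still useful for comparing categories with one another.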


In [116]:
category = freq_table(free_google, 1)
print(category)

{'ART_AND_DESIGN': 0.6430505415162455, 'AUTO_AND_VEHICLES': 0.9250902527075812, 'BEAUTY': 0.5979241877256317, 'BOOKS_AND_REFERENCE': 2.1435018050541514, 'BUSINESS': 4.591606498194946, 'COMICS': 0.6204873646209386, 'COMMUNICATION': 3.2378158844765346, 'DATING': 1.861462093862816, 'EDUCATION': 1.1620036101083033, 'ENTERTAINMENT': 0.9589350180505415, 'EVENTS': 0.7107400722021661, 'FINANCE': 3.7003610108303246, 'FOOD_AND_DRINK': 1.2409747292418771, 'HEALTH_AND_FITNESS': 3.0798736462093865, 'HOUSE_AND_HOME': 0.8235559566787004, 'LIBRARIES_AND_DEMO': 0.9363718411552346, 'LIFESTYLE': 3.9034296028880866, 'GAME': 9.724729241877256, 'FAMILY': 18.907942238267147, 'MEDICAL': 3.531137184115524, 'SOCIAL': 2.6624548736462095, 'SHOPPING': 2.2450361010830324, 'PHOTOGRAPHY': 2.944494584837545, 'SPORTS': 3.395758122743682, 'TRAVEL_AND_LOCAL': 2.33528880866426, 'TOOLS': 8.461191335740072, 'PERSONALIZATION': 3.3167870036101084, 'PRODUCTIVITY': 3.892148014440433, 'PARENTING': 0.6543321299638989, 'WEATHER': 

In [139]:
for key in category:
    total = 0
    len_category = 0
    for app in free_google:
        category_app = app[1]
        if category_app == key:
            number = app[5]
            number = number.replace('+','')
            number = number.replace(',','')
            number = float(number)
            total += number
            len_category += 1
    average = total / len_category
    print('Average Installs of {}: {:>10,}'.format(key, round(average, 2)))

Average Installs of ART_AND_DESIGN: 1,986,335.09
Average Installs of AUTO_AND_VEHICLES: 647,317.82
Average Installs of BEAUTY: 513,151.89
Average Installs of BOOKS_AND_REFERENCE: 8,767,811.89
Average Installs of BUSINESS: 1,712,290.15
Average Installs of COMICS: 817,657.27
Average Installs of COMMUNICATION: 38,456,119.17
Average Installs of DATING: 854,028.83
Average Installs of EDUCATION: 1,833,495.15
Average Installs of ENTERTAINMENT: 11,640,705.88
Average Installs of EVENTS: 253,542.22
Average Installs of FINANCE: 1,387,692.48
Average Installs of FOOD_AND_DRINK: 1,924,897.74
Average Installs of HEALTH_AND_FITNESS: 4,188,821.99
Average Installs of HOUSE_AND_HOME: 1,331,540.56
Average Installs of LIBRARIES_AND_DEMO: 638,503.73
Average Installs of LIFESTYLE: 1,437,816.27
Average Installs of GAME: 15,588,015.6
Average Installs of FAMILY: 3,695,641.82
Average Installs of MEDICAL: 120,550.62
Average Installs of SOCIAL: 23,253,652.13
Average Installs of SHOPPING: 7,036,877.31
Average Insta

Communication has the highest number of installs per app. This might be heavily influenced by those that use FB Messenger, WhatsApp, or other communication apps that are popular around the world. There also seems to be lots of h

# In Conclusion

Things to consider to improve this project:
- Analyze the frequency table for the Genre column of the Google Play dataset, and see if you can find useful patterns.
- Assume we could also make revenue via in-app purchases and subscriptions, and try to determine which genres seem to be liked the most by users — you could examine app ratings here.
- Refine your project using our data science project [style guide](https://www.dataquest.io/blog/data-science-project-style-guide/).