# Guided Project: Profitable App Profiles for the App Store and Google Play Markets

In this project, I act as a data analyst for a company that builds Android and iOS mobile apps. The apps are free to download, and in-app ads are the source of revenue, so the more users who see and engage with the ads, the better. The goal is to identify which types of apps are most likely to attract users.

For this guided project, the goal is to apply everything covered in the missions of the Fundamentals of Python for Data Science course on Dataquest. In this course, I learned the basics of programming in Python: lists and for loops, conditional statements, dictionaries and frequency tables, functions, and Jupyter Notebook.

## Opening and Exploring the Data

In this section, I open the provided CSV files for the App Store and Google Play. To explore the data, I use an `explore_data()` function that can be called repeatedly to print rows in a readable way.

From the headers printed in this section, we can identify the columns that are useful for evaluating which apps are most likely to attract users. For the App Store data set, `price`, `rating_count_tot`, `user_rating`, and `cont_rating` are some examples. Full documentation can be accessed [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

In [1]:
from csv import reader

# To open AppleStore.csv
opened_file = open('AppleStore.csv', encoding="utf8")
read_file = reader(opened_file)
iOS_apps_data = list(read_file)
iOS_header = iOS_apps_data[0]  
iOS_body = iOS_apps_data[1:]

# To open googleplaystore.csv
opened_file = open('googleplaystore.csv', encoding="utf8")
read_file = reader(opened_file)
android_apps_data = list(read_file)
android_header = android_apps_data[0]
android_body = android_apps_data[1:]
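
The cell above leaves both files open after reading them. As a minimal sketch (not part of the original notebook), the same loading step can be wrapped in a helper that closes the file automatically. The name `load_csv` and the tiny temporary file are assumptions made only so the example runs without the real data sets.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for the CSV file at `path` (hypothetical helper)."""
    # The with-statement closes the file even if reading raises an error.
    with open(path, encoding=encoding, newline="") as opened_file:
        rows = list(csv.reader(opened_file))
    return rows[0], rows[1:]


# Tiny stand-in file (made-up content) so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as sample_file:
        csv.writer(sample_file).writerows([["App", "Price"], ["Demo App", "0"]])
    sample_header, sample_body = load_csv(sample_path)

print(sample_header)
print(sample_body)
```

With the real files, the call would look like `iOS_header, iOS_body = load_csv('AppleStore.csv')`.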

In [2]:
# Exploring the Data in Apple Store
def explore_data(dataset, start, end):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # Will add an empty line after each row
        
    print('Number of rows:', len(dataset))
    print('Number of columns:', len(dataset[0]))

print(iOS_header)
print('\n')
explore_data(iOS_body,0,3)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [3]:
# Exploring the Data in Google Playstore
# explore_data() was defined in the previous cell, so it is reused here
print(android_header) 
print('\n')
explore_data(android_body,0,3)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


## Deleting Wrong Data

Since the company only targets an *English-speaking audience* and builds *free-to-download apps*, we need to detect and remove apps that are not in English and apps that are not free. Before that, we start the *data cleaning* by removing incorrect data.

In this section, the Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and we can see that [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error for a certain row.

In [4]:
# Find rows whose length differs from the header and report their index in android_body
for index, row in enumerate(android_body):
    if len(row) != len(android_header):
        print(row)
        print(index)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


Now that we have identified and verified the index of the row that has an error, we can remove it with the `del` statement. This cell should be run only once, since running it again would delete a different row.

In [5]:
print('Number of rows before deletion of error: ', len(android_body))
del android_body[10472]
print('Number of rows after deletion of error: ', len(android_body))

Number of rows before deletion of error:  10841
Number of rows after deletion of error:  10840


## Removing Duplicate Entries: Part One

In the Google Play data set, some applications have duplicate entries. The code below shows how many duplicate entries there are and prints some examples to confirm them.

In [6]:
duplicate_apps = []
unique_apps = []

for app in android_body:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:10])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


However, we will not remove the duplicates at random. The output below shows that the rows for a single app, say *Google Ads*, differ in their number of reviews (the fourth value in each row). As we can observe, the third row has the highest number of reviews. What we will do is compare the rows and keep only the one with the highest number of reviews.

In [7]:
for app in android_body:
    name = app[0]
    if name == 'Google Ads':
        print(app)
        print('\n')

['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']




For now, let us compute how many applications should remain after the duplicates are removed.

In [8]:
print('Expected length: ', len(android_body) - len(duplicate_apps))

Expected length:  9659


## Removing Duplicate Entries: Part Two

In this section, we will build a dictionary and use the information stored in it to remove the duplicates. Afterwards, the data set should contain 9,659 rows, since 1,181 of the 10,840 rows are duplicates.

To remove the duplicates:
- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [9]:
reviews_max = {}
for app in android_body:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

The code below compares the expected and actual lengths of the data.

In [10]:
print('Expected length: ', len(android_body) - len(duplicate_apps))
print('Actual length: ', len(reviews_max))

Expected length:  9659
Actual length:  9659


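In the next cell, we remove the duplicate rows. The `already_added` check handles apps whose duplicate rows share the same highest number of reviews, so that only one of those rows is kept.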

In [11]:
android_clean = [] # will store new cleaned data set
already_added = [] # will just store app names

for app in android_body:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
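
The two-step approach above (a dictionary of maximum review counts, then a second loop with a list lookup) can also be done in one pass. The following is a sketch under assumptions, not the notebook's method: the helper name `keep_most_reviewed` is hypothetical, the sample rows are shortened to four columns, and the `Box` review count is made up for illustration.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best_row = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strictly greater: on a tie, the first row seen is kept.
        if name not in best_row or n_reviews > float(best_row[name][reviews_index]):
            best_row[name] = row
    return list(best_row.values())


# Shortened sample rows: name, category, rating, reviews.
sample_rows = [
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Box', 'BUSINESS', '4.2', '1000'],  # review count is a made-up value
    ['Google Ads', 'BUSINESS', '4.3', '29331'],
]
deduplicated = keep_most_reviewed(sample_rows)
print(len(deduplicated))
print(deduplicated[0])
```

Because dictionary lookups do not scan a list, this avoids the repeated `name not in already_added` search. The rows come out ordered by the first appearance of each app name, which can differ from the order produced by the cell above.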
    

To explore the `android_clean` data set, we use the `explore_data` function.

In [12]:
explore_data(android_clean, 0,3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


## Removing Non-English Apps: Part One

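The Google Play data set contained duplicates, which is not the case for the App Store data set. The remaining problem is that the data includes apps that are not aimed at English speakers, and the goal now is to remove them. As a first attempt, the function below treats a name as English only if every character is in the ASCII range (code points 0 to 127).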

In [13]:
def english_language(a_string):
    for character in a_string:
        if ord(character) > 127:
            return False
    return True

print(english_language(a_string = 'Instagram'))
print(english_language(a_string = '爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_language(a_string = 'Docs To Go™ Free Office Suite'))
print(english_language(a_string = 'Instachat 😜'))

True
False
False
False


## Removing Non-English Apps: Part Two

In part one, the function wrongly flagged Docs To Go™ Free Office Suite and Instachat 😜 as non-English, because the ™ symbol and the emoji fall outside the ASCII range.

In [14]:
print(ord('™')) # returns the Unicode code point, which is above the ASCII range
print(ord('😜')) # returns the Unicode code point, which is above the ASCII range

8482
128540


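This time, we will edit the function created earlier so that a name is flagged as non-English only when it contains more than three characters outside the ASCII range. This is not perfect, but it keeps English names that contain a few symbols or emojis.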

In [15]:
def check_english_language(a_string):
    count_input = 0
    for character in a_string:
        if ord(character) > 127:
            count_input += 1
            if count_input > 3:
                return False
        
    return True

print(check_english_language(a_string = '爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_english_language(a_string = 'Docs To Go™ Free Office Suite'))
print(check_english_language(a_string = 'Instachat 😜'))

False
True
True
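
For comparison, the same rule can be written more compactly by counting the non-ASCII characters first. This is only a sketch: the name `mostly_ascii` and the `max_non_ascii` parameter are assumptions, and unlike the function above it does not stop early once the limit is exceeded.

```python
def mostly_ascii(a_string, max_non_ascii=3):
    """Return True if at most `max_non_ascii` characters fall outside ASCII."""
    non_ascii_count = sum(1 for character in a_string if ord(character) > 127)
    return non_ascii_count <= max_non_ascii


print(mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
print(mostly_ascii('Docs To Go™ Free Office Suite'))  # True
print(mostly_ascii('Instachat 😜'))  # True
```

Making the threshold a parameter makes it easy to try stricter or looser limits on the same data.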


Next, we will use the new function created to filter out non-English apps from both data sets.

In [16]:
check_english_iOS = []
check_english_android = []

for app in iOS_body:
    name = app[1]
    if check_english_language(name):
        check_english_iOS.append(app)
        
for app in android_clean:
    name = app[0]
    if check_english_language(name):
        check_english_android.append(app)

print('New Data in iOS (English Apps):', len(check_english_iOS))
explore_data(check_english_iOS,0,3)
print('\n')
print('New Data in Google Playstore (English Apps):', len(check_english_android))
explore_data(check_english_android,0,3)

New Data in iOS (English Apps): 6183
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


New Data in Google Playstore (English Apps): 9614
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '21

## Isolating the Free Apps

In this part, we will isolate the free apps, since we only build apps that are free to download and install. Since the data sets contain both free and non-free apps, we will remove the non-free apps.

In [17]:
check_final_iOS = []
check_final_android = []

for app in check_english_iOS:
    price = app[4]
    if price == '0.0':
        check_final_iOS.append(app)
        
for app in check_english_android:
    price = app[7]
    if price == '0':
        check_final_android.append(app)
        
print('New Data in iOS (English and Free Apps):', len(check_final_iOS))
print('\n')
print('New Data in Google Playstore (English and Free Apps):', len(check_final_android))

New Data in iOS (English and Free Apps): 3222


New Data in Google Playstore (English and Free Apps): 8864
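
The comparison above relies on the exact strings `'0.0'` and `'0'`. A slightly more defensive check converts the price to a number first. This is a hedged sketch: the helper name `is_free` is hypothetical, and the `'$4.99'` format for paid Google Play apps is an assumption, since no paid rows appear in the output shown here.

```python
def is_free(price_text):
    """Return True when a price string represents zero (hypothetical helper)."""
    # Assumption: paid prices may carry a leading dollar sign, e.g. '$4.99'.
    return float(price_text.lstrip('$')) == 0.0


print(is_free('0.0'))    # App Store style free price
print(is_free('0'))      # Google Play style free price
print(is_free('$4.99'))  # assumed paid-price format
```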


## Most Common Apps by Genre: Part One

So far, we have:
- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps
- Isolated the free apps

As stated earlier, the goal is to identify the types of apps that attract the most users, because our revenue depends on the number of people using our apps. The idea now is to build frequency tables for a few columns in the data sets.

The columns that can be useful in generating frequency tables are `prime_genre` for App Store data set and `Genres` and `Category` for Google Playstore data set.

## Most Common Apps by Genre: Part Two

Now, we will build two functions that we can use to analyze the frequency tables:
- One function to generate frequency tables that show percentages.
- Another function to display the percentages in descending order.

The built-in `sorted()` function returns a list of the elements of an iterable sorted in ascending or descending order. Calling `sorted()` on a dictionary only sorts its keys, so the frequency table is first converted into a list of tuples, where each tuple contains a dictionary value followed by its dictionary key. With the value first, `sorted()` orders the tuples by percentage.
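
As a small illustration of how `sorted()` behaves with tuples, the sketch below uses three rounded percentages from the iOS genre table. The variable names are chosen only for this example, and the second variant with `key=` is an alternative, not what the notebook does.

```python
genre_percentages = {'Entertainment': 7.88, 'Games': 58.16, 'Photo & Video': 4.97}

# Approach used in this notebook: (value, key) tuples, so the value drives the order.
as_tuples = [(percentage, genre) for genre, percentage in genre_percentages.items()]
sorted_tuples = sorted(as_tuples, reverse=True)
print(sorted_tuples)

# Alternative: keep (key, value) pairs and tell sorted() to compare by the value.
sorted_items = sorted(genre_percentages.items(), key=lambda item: item[1], reverse=True)
print(sorted_items)
```

Both variants list Games first. When two percentages are equal, the tuple approach falls back to comparing the names, while the `key=` variant keeps tied entries in their original order.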

In [27]:
# Generic helpers: they work for any data set and column index (iOS or Google Play)

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for app in dataset:
        total += 1
        value = app[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key]/total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0]) 
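
The counting loop in `freq_table` can also be expressed with `collections.Counter` from the standard library. The sketch below is an alternative under assumptions: the function name `freq_table_counter` and the four sample rows are made up for illustration.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)  # value -> number of rows
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


# Made-up sample rows: app name, genre.
sample_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
sample_table = freq_table_counter(sample_rows, 1)
print(sample_table)  # {'Games': 75.0, 'Music': 25.0}
```

`Counter` also provides `most_common()`, which returns the counts already sorted in descending order.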

## Most Common Apps by Genre: Part Three

For this part, we will analyze the calculated data.

As the output below shows, the most common `prime_genre` among the free English iOS apps is *Games* with 58.16%, followed by *Entertainment* with 7.88%. Games is far ahead of every other genre, and the gaps between the remaining genres are small. We can also observe that most of the apps here are designed for entertainment. Note that this table counts how many apps exist in each genre, not how many users they have; still, entertainment-oriented apps are worth a close look.

In [19]:
display_table(check_final_iOS, 11) # prime_genre (iOS)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


As the results below show, the most common value in the `Genres` column of the Google Play data set is *Tools* with 8.45%, followed by *Entertainment* with 6.07%. The gap between them is small, unlike the gap between Games and Entertainment on the App Store. In the `Category` column, *Family* has the highest share with 18.91%, followed by *Game* with 9.72%. Overall, we should focus on apps designed for games and entertainment, since these rank highly on both the App Store and Google Play.

In [20]:
display_table(check_final_android, 9) # Genres (Google Playstore)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [21]:
display_table(check_final_android, 1) # Category (Google Playstore)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

## Most Popular Apps by Genre on the App Store

In this section, we will identify the most popular genres on the App Store. Earlier, we learned that the genres that dominate the App Store are mostly designed for fun. To measure popularity, we will use the total number of user ratings, which is stored in the `rating_count_tot` column.

What we will do here:
- Isolate the apps of each genre.
- Sum up the user ratings for the apps of that genre.
- Divide the sum by the number of apps belonging to that genre (not by the total number of apps).

The result is the average number of user ratings per app for each genre.

In [22]:
prime_genre = freq_table(check_final_iOS, -5) # prime_genre (iOS)

for genre in prime_genre:
    total = 0
    len_genre = 0
    
    for app in check_final_iOS:
        genre_app = app[-5]
        if genre_app == genre:
            user_rating = float(app[5])
            total += user_rating
            len_genre += 1
            
    avg_num_user_rating = total/len_genre    
    print(genre, ": ", avg_num_user_rating)

Social Networking :  71548.34905660378
Photo & Video :  28441.54375
Games :  22788.6696905016
Music :  57326.530303030304
Reference :  74942.11111111111
Health & Fitness :  23298.015384615384
Weather :  52279.892857142855
Utilities :  18684.456790123455
Travel :  28243.8
Shopping :  26919.690476190477
News :  21248.023255813954
Navigation :  86090.33333333333
Lifestyle :  16485.764705882353
Entertainment :  14029.830708661417
Food & Drink :  33333.92307692308
Sports :  23008.898550724636
Book :  39758.5
Finance :  31467.944444444445
Education :  7003.983050847458
Productivity :  21028.410714285714
Business :  7491.117647058823
Catalogs :  4004.0
Medical :  612.0
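
The cell above loops over the whole data set once per genre. As a sketch of an alternative (not the notebook's method), the sums and counts can be accumulated in a single pass. The helper name `average_by_group` is hypothetical, and the sample rows are shortened to three columns using names and rating counts that appear in this notebook's output.

```python
def average_by_group(rows, group_index, value_index):
    """Return {group: average of the numeric column} in one pass over rows."""
    totals = {}  # group -> running sum
    counts = {}  # group -> number of rows
    for row in rows:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Shortened sample rows: name, genre, rating count.
sample_rows = [
    ['Pandora - Music & Radio', 'Music', '1126879'],
    ['Spotify Music', 'Music', '878563'],
    ['Facebook', 'Social Networking', '2974676'],
]
averages = average_by_group(sample_rows, group_index=1, value_index=2)
print(averages)
```

With the real data, the equivalent call would be `average_by_group(check_final_iOS, 11, 5)`.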


Based on these averages, we will consider *Music* as an app profile recommendation for the App Store. The output below shows that Pandora - Music & Radio has the highest number of ratings among the Music apps.

In [23]:
for app in check_final_iOS:
    if app[11] == 'Music':
        print(app[1], ": ", app[5]) # name and number of ratings

Pandora - Music & Radio :  1126879
Spotify Music :  878563
Shazam - Discover music, artists, videos & lyrics :  402925
iHeartRadio – Free Music & Radio Stations :  293228
SoundCloud - Music & Audio :  135744
Magic Piano by Smule :  131695
Smule Sing! :  119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music :  110420
Amazon Music :  106235
SoundHound Song Search & Music Player :  82602
Sonos Controller :  48905
Bandsintown Concerts :  30845
Karaoke - Sing Karaoke, Unlimited Songs! :  28606
My Mixtapez Music :  26286
Sing Karaoke Songs Unlimited with StarMaker :  26227
Ringtones for iPhone & Ringtone Maker :  25403
Musi - Unlimited Music For YouTube :  25193
AutoRap by Smule :  18202
Spinrilla - Mixtapes For Free :  15053
Napster - Top Music & Radio :  14268
edjing Mix:DJ turntable to remix and scratch music :  13580
Free Music - MP3 Streamer & Playlist Manager Pro :  13443
Free Piano app by Yokee :  13016
Google Play Music :  10118
Certified Mixtapes - Hip Hop Albums & Mixtapes :  997

## Most Popular Apps by Genre on Google Play

In this section, we will evaluate the most popular apps by category on Google Play. The problem is that the install numbers are not precise, since most values are open-ended (for example, 100,000+). Refer to the output below.

In [24]:
display_table(check_final_android, 5) # the Installs column

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


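The code below computes the average number of installs per category on Google Play. To do the arithmetic, the commas and plus signs are removed from each `Installs` value and the result is converted to a float, so an app listed as 100,000+ is counted as having 100,000 installs.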
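
To make the conversion of the `Installs` strings concrete, here is a minimal sketch of that step in isolation. The helper name `parse_installs` is an assumption made for this example.

```python
def parse_installs(installs_text):
    """Convert a value such as '1,000,000+' to a float lower bound."""
    # Remove the thousands separators and the trailing plus sign.
    return float(installs_text.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```

Because every value is a lower bound, the averages built from it understate the true install counts, but they are still useful for comparing categories.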

In [25]:
category_android = freq_table(check_final_android, 1) # Category (Google Playstore)

for category in category_android:
    total = 0
    len_category = 0
    
    for app in check_final_android:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
            
    avg_num_install = total/len_category    
    print(category, ": ", avg_num_install)

ART_AND_DESIGN :  1986335.0877192982
AUTO_AND_VEHICLES :  647317.8170731707
BEAUTY :  513151.88679245283
BOOKS_AND_REFERENCE :  8767811.894736841
BUSINESS :  1712290.1474201474
COMICS :  817657.2727272727
COMMUNICATION :  38456119.167247385
DATING :  854028.8303030303
EDUCATION :  1833495.145631068
ENTERTAINMENT :  11640705.88235294
EVENTS :  253542.22222222222
FINANCE :  1387692.475609756
FOOD_AND_DRINK :  1924897.7363636363
HEALTH_AND_FITNESS :  4188821.9853479853
HOUSE_AND_HOME :  1331540.5616438356
LIBRARIES_AND_DEMO :  638503.734939759
LIFESTYLE :  1437816.2687861272
GAME :  15588015.603248259
FAMILY :  3695641.8198090694
MEDICAL :  120550.61980830671
SOCIAL :  23253652.127118643
SHOPPING :  7036877.311557789
PHOTOGRAPHY :  17840110.40229885
SPORTS :  3638640.1428571427
TRAVEL_AND_LOCAL :  13984077.710144928
TOOLS :  10801391.298666667
PERSONALIZATION :  5201482.6122448975
PRODUCTIVITY :  16787331.344927534
PARENTING :  542603.6206896552
WEATHER :  5074486.197183099
VIDEO_PLAYERS 

For the app profile recommendation on Google Play, I will consider the *SOCIAL* category, which has an average of about 23,253,652 installs per app. The output below shows that Facebook alone accounts for 1,000,000,000+ installs in this category.

In [26]:
for app in check_final_android:
    if app[1] == 'SOCIAL':
        print(app[0], ": ", app[5]) # name and number of installs

Facebook :  1,000,000,000+
Facebook Lite :  500,000,000+
Tumblr :  100,000,000+
Social network all in one 2018 :  100,000+
Pinterest :  100,000,000+
TextNow - free text + calls :  10,000,000+
Google+ :  1,000,000,000+
The Messenger App :  1,000,000+
Messenger Pro :  1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus :  1,000,000+
Telegram X :  5,000,000+
The Video Messenger App :  100,000+
Jodel - The Hyperlocal App :  1,000,000+
Hide Something - Photo, Video :  5,000,000+
Love Sticker :  1,000,000+
Web Browser & Fast Explorer :  5,000,000+
LiveMe - Video chat, new friends, and make money :  10,000,000+
VidStatus app - Status Videos & Status Downloader :  5,000,000+
Love Images :  1,000,000+
Web Browser ( Fast & Secure Web Explorer) :  500,000+
SPARK - Live random video chat & meet new people :  5,000,000+
Golden telegram :  50,000+
Facebook Local :  1,000,000+
Meet – Talk to Strangers Using Random Video Chat :  5,000,000+
MobilePatrol Public Safety App :  1,000,000+
💘 Whats

## Conclusion

This guided project applied the fundamentals of Python to clean and analyze the data. First, we deleted the incorrect data. Second, we removed the duplicate entries from the Google Play data set. Third, we removed the non-English apps. Next, we kept only the free apps. After all these steps, we identified the most common and most popular types of apps on the App Store and Google Play. Overall, I can say that the apps most attractive to users are the ones that entertain them (such as games, music, and social media apps).