# Google Play and App Store Analysis

I am an analyst working for a company that builds Android and iOS apps. The apps are free to download, so the company earns its revenue through in-app ads. This means the company's profitability depends on the number of people who use its apps.

The goal of this project is to find out which kinds of apps attract the most users by studying data from Google Play and the App Store.

In [0]:
from csv import reader

In [0]:
## The App Store Dataset
# 'with' closes the file for us; an explicit encoding avoids platform-dependent decoding
with open('AppleStore.csv', encoding='utf8') as file:
    ios = list(reader(file))
ios_header = ios[0]
ios = ios[1:]


## The Google Play Dataset
with open('googleplaystore.csv', encoding='utf8') as file:
    andrd = list(reader(file))
andrd_header = andrd[0]
andrd = andrd[1:]



Now that I have loaded both datasets into Python, let us find out a little more about them.

In [0]:
## A helper function that prints a slice of a dataset and, optionally, its dimensions
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [0]:
explore_data(ios, 0, 3 , rows_and_columns=True)
print('\n')
ios_header

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16




['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

Here we see that our App Store dataset contains 7197 records and 16 variables. Some variables that will help during our analysis are 'track_name', 'price', 'rating_count_tot', 'user_rating', and 'prime_genre'.

In [0]:
explore_data(andrd, 0, 3, rows_and_columns=True)
print('\n')
andrd_header

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13




['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

The Google Play dataset has 10841 records and 13 variables. A few variables that will be useful during our analysis are 'App', 'Category', 'Rating', 'Installs', and 'Price'.

# Data Clean Up

Now that we have imported our datasets, let us clean them up a bit. First, I am going to look for records with missing values. There is a discussion board online for our Google Play dataset, and one of the comments describes an error in the record at index 10472.

In [0]:
print(andrd[10472])
print("\n")
andrd_header

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

If you look at the 'Category' variable, you will see it says 1.9, which makes no sense. This row has only 12 values instead of 13: the category is missing, so 1.9 is really the rating and every value after the name has shifted one column to the left. Let's go ahead and delete this record.

In [0]:
del andrd[10472]  # run this cell only once, or a valid row will be deleted too

In [0]:
andrd[10472]

['osmino Wi-Fi: free WiFi',
 'TOOLS',
 '4.2',
 '134203',
 '4.1M',
 '10,000,000+',
 'Free',
 '0',
 'Everyone',
 'Tools',
 'August 7, 2018',
 '6.06.14',
 '4.4 and up']
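
The shifted row above was found thanks to a forum comment, but the same kind of problem can be searched for directly. Here is a minimal sketch, assuming a malformed row always has a different number of values than the header. The function name `find_malformed_rows` and the toy rows are my own and are for illustration only.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Toy data for illustration only (not taken from the real CSV files).
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],            # 'Category' is missing, so values shift left
    ['App C', 'SOCIAL', '4.5'],
]

bad = find_malformed_rows(sample_rows, sample_header)
print(bad)

# Rebuilding the list avoids the index shifts that repeated `del` calls cause.
bad_indexes = {i for i, _ in bad}
cleaned = [row for i, row in enumerate(sample_rows) if i not in bad_indexes]
print(len(cleaned))
```

Filtering into a new list is also safe to re-run, unlike deleting by a fixed index.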

# Deleting Duplicates

Another comment pointed out that Instagram has a couple of duplicates in this dataset, so let's check it out.

In [0]:
for app in andrd:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Whoa, there are more Instagram duplicates than I thought there would be. Just to be safe, let's check the whole dataset for duplicates.


In [0]:
duplicate_apps = []
unique_apps = []

for app in andrd:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


Looks like we have 1181 duplicates in the Google Play dataset. We're going to need to remove those before we move forward. We could just randomly choose which duplicate to keep, but I feel there is a more logical approach. If you look at the fourth variable (index 3) for each record, it shows a count of all the reviews submitted. Review counts only grow over time, so the higher the count, the more recent the entry should be. We shall use this criterion to remove the older duplicates and retain the newest.

In [0]:
reviews_max = {}

for app in andrd:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

What I have done here can be explained in two steps:

  1. I created a dictionary that maps each unique app name to 
  a number of reviews.
    
  2. As the loop visits each record, it updates that entry whenever 
  it finds a higher number of reviews, so every name ends up with its maximum.

We had 1181 duplicates. If we subtract that from the count of our dataset, we should be able to verify if things went as expected.

In [0]:
print(len(andrd) - 1181)        
print(len(reviews_max))

9659
9659


Now that we have set ourselves up, let's actually remove the duplicates from the Google Play dataset.

In [0]:
andrd_clean = []
already_added = []

for app in andrd:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        andrd_clean.append(app)
        already_added.append(name)

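Using a loop, I have created a new dataset free of duplicates! The name not in already_added check is there in case several duplicates share the same maximum review count; without it, all of them would be kept. Let's explore this new dataset and verify its length to make sure I did everything right.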
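
For comparison, the two loops above can be folded into a single pass. This is only a sketch of an alternative, not the approach used in this notebook: the function name `keep_most_reviewed` is hypothetical, and the rows are shortened toy data (the Box review count is made up).

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # name -> row with the highest review count seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means that, on a tie, the earlier row is kept.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Shortened toy rows for illustration only.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],   # exact tie: only one copy survives
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]

deduped = keep_most_reviewed(sample)
for row in deduped:
    print(row)
```

One difference to keep in mind: this version orders the result by the first appearance of each name, while the loop in the notebook orders it by the position of the row that was kept.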

In [0]:
explore_data(andrd_clean, 0, 3, True)
print("\n")
print(len(reviews_max))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


9659


# Deleting Non-English Apps

Some of the apps in our datasets have names that are not in English. Here are some examples.

In [0]:
print(ios[813][1])
print(ios[6731][1])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


Our company produces apps for an English-speaking market, so we would like to limit our analysis to apps aimed at English speakers. To that end, I'm going to create a function that can differentiate between English and non-English text.

In [0]:
def is_english(string):
    
    for character in string:   
        if ord(character) > 127:
            return False
    
    return True

In [0]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


Looks like we've done it, but any app names with emojis or special characters are also classified as non-English.

In [0]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

False
False


If we used the function I have just created, we could lose valuable data. To minimize that loss, we will change the function so that a name is only treated as non-English when it contains more than three characters outside the ASCII range (code points above 127).

In [0]:
def is_english(string):
    non_english = 0
    
    for char in string:
        if ord(char) > 127:
            non_english += 1
    if non_english > 3:
        return False
    else:
        return True

In [0]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False
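
The same rule can be written more compactly with the threshold exposed as a parameter, which makes it easy to try stricter or looser cut-offs. This is a sketch only; the name `is_mostly_ascii` and the `max_non_ascii` parameter are my own and do not appear elsewhere in this notebook.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters above code point 127."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Docs To Go™ Free Office Suite'))
print(is_mostly_ascii('Instachat 😜'))
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))
```

Keep in mind that this is a character-set check, not real language detection: a name written in another language that uses mostly ASCII letters will still pass.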


The edit worked! Now let's use our function to identify and remove all the records we don't want from our datasets.

In [0]:
andrd_cleaner = []
ios_clean = []

for app in andrd_clean:
    name = app[0]
    
    if is_english(name):
        andrd_cleaner.append(app)

explore_data(andrd_cleaner, 0, 3, True)
print('\n')

for app in ios:
    name = app[1]
    
    if is_english(name):
        ios_clean.append(app)

explore_data(ios_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

# Removing Non-Free Apps

We have now reduced our Google Play dataset to 9614 records and filtered the App Store dataset in the same way. Feels like a lot, but we're almost done cleaning these datasets up. Just to recap, we have removed some inaccurate data and duplicate entries, and now we have removed the non-English apps.


Next we need to remove all non-free apps from the datasets, because we only build free apps and our main source of revenue is in-app ads. Let's isolate the free apps.

In [0]:
andrd_free = []
ios_free = []

for app in andrd_cleaner:
    price = app[7]
    if price == '0':
        andrd_free.append(app)
        
for app in ios_clean:
    price = app[4]
    if price == '0.0':
        ios_free.append(app)
        

explore_data(andrd_free, 0, 1, True)
print('\n')
explore_data(ios_free, 0, 1, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows: 3222
Number of columns: 16
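
The filter above compares price strings exactly ('0' for Google Play, '0.0' for the App Store), which works for these files but breaks if a free price is ever written differently. A slightly more forgiving sketch converts the price to a number first. The helper names `parse_price` and `is_free` are hypothetical, and I am assuming that paid prices may carry a leading dollar sign such as '$4.99'.

```python
def parse_price(raw):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    # Assumption: the only non-numeric decoration is an optional leading '$'.
    return float(raw.strip().lstrip('$'))


def is_free(raw):
    """Return True when the parsed price is zero."""
    return parse_price(raw) == 0.0


# Toy rows for illustration only: [name, price]
sample = [['App A', '0'], ['App B', '0.0'], ['App C', '$4.99'], ['App D', '2.99']]
free_apps = [row for row in sample if is_free(row[1])]
print(free_apps)
```

With a helper like this, the same test could be applied to both datasets instead of two different string comparisons.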


# Analyzing the Dataset

We have finally cleaned the data to the best of our ability. Now it's time to look for insights. Let's remember that our company makes its revenue through in-app ads. Our goal is to understand which kinds of apps attract more users, because our revenue is influenced by the number of people who use the app.

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

  1. Build a minimal Android version of the app, and add it to 
  Google Play.
  2. If the app has a good response from users, we develop it further.
  3. If the app is profitable after six months, we build an iOS 
  version of the app and add it to the App Store.
    
    
Since the end goal is to add the app to both Google Play and the App Store, we need to look for app profiles that are successful in both markets. Let's begin the analysis by figuring out what the most common genres are. We're going to look at the 'prime_genre' variable in the App Store dataset, and at the 'Genres' and 'Category' variables in the Google Play dataset.

In [0]:
# Column indexes: prime_genre is 11 (ios); Category is 1 and Genres is 9 (andrd)

def freq_table(dataset, index):
    # Count how often each value appears in the chosen column
    counts = {}
    total = 0
    for app in dataset:
        total += 1
        value = app[index]
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1

    # Convert the raw counts to percentages of all rows
    freq_percent = {}
    for key in counts:
        percentage = (counts[key] / total) * 100
        freq_percent[key] = percentage

    return freq_percent
        

In [0]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

Above I have created two functions:
    
    1. freq_table returns a frequency table of percentages
    2. display_table prints those percentages in descending order
    
Now we can see which genre of app occurs most often!
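
As an aside, the standard library can do most of this work for us. The sketch below is an alternative to the two functions above, not a replacement used in the rest of this notebook; the name `freq_table_sorted` and the toy rows are assumptions for illustration.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Toy rows for illustration only: [name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

result = freq_table_sorted(sample, 1)
for genre, percentage in result:
    print(genre, ':', percentage)
```

Here `Counter.most_common()` handles both the counting and the descending sort in one step.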

In [0]:
display_table(ios_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


With 58% of all free English apps, Games is by far the most common genre in our App Store dataset. Entertainment is in second place with 7.9%. It seems that most of these apps are designed for entertainment purposes. Now let's look at the Google Play dataset by examining both variables, Genres and Category.

In [0]:
display_table(andrd_free, 1) # Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

At first glance, you can tell that these apps are more tailored towards productivity than entertainment, unlike our last dataset. Family is the largest category among these free English apps at about 19%, and Game is the runner-up with almost 10%. It is likely that some of those Family apps are also games.

In [0]:
display_table(andrd_free, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The Genres column tells the same story as the Category column: the apps still lean towards practical uses such as productivity. These frequency tables show which kinds of apps are most common, but not which ones attract the most users, so they are not enough to draw genuine insights. We need to take this a little further by analyzing which apps are the most popular.

# Most Popular Apps by Genre

For Google Play, the Installs column tells us how popular an app is, but the App Store dataset has no equivalent column. As a workaround, we'll use the total number of user ratings, found in the rating_count_tot column, as a proxy. For each genre we will calculate the average popularity per app: average installs on Google Play and average number of ratings on the App Store. Let's start with the average number of user ratings per genre on the App Store.

In [0]:
table = freq_table(ios_free, 11)
for genre in table:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            tot_user_ratings = float(app[5]) 
            total += tot_user_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre  # average number of ratings per app, not the average rating
    print("{} : {}".format(int(avg_n_ratings), genre))
            


18684 : Utilities
21028 : Productivity
33333 : Food & Drink
14029 : Entertainment
4004 : Catalogs
26919 : Shopping
28441 : Photo & Video
57326 : Music
23298 : Health & Fitness
23008 : Sports
22788 : Games
74942 : Reference
31467 : Finance
7003 : Education
21248 : News
7491 : Business
28243 : Travel
39758 : Book
86090 : Navigation
52279 : Weather
71548 : Social Networking
16485 : Lifestyle
612 : Medical
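
The nested loop above rescans the whole dataset once per genre. That is fine at this size, but the same averages can be collected in a single pass. The sketch below is one hypothetical way to do it; `average_by_group` and the toy rows are my own names and data, used only to illustrate the idea.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Return {group: average of the numeric column} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows for illustration only: [name, genre, number of ratings]
sample = [
    ['a', 'Games', '100'],
    ['b', 'Games', '300'],
    ['c', 'Music', '50'],
]

averages = average_by_group(sample, 1, 2)
# Print the genres from the highest to the lowest average
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(int(avg), ':', genre)
```

Sorting the result before printing also makes the top genres easier to spot than in the unsorted output above.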


Navigation, Social Networking, and Reference apps have the most user ratings per app among free English apps. On these numbers alone, I would recommend creating a Navigation app. However, Google and Waze have amazing products that pretty much corner the market, so a Navigation app is not the best route for us. The same goes for Social Networking: Facebook, Instagram, and Twitter hold a large piece of the market. We could still make supplemental apps that improve the user experience on social media. Let's analyze our Google Play dataset before making any concrete decision, though. Remember, we need the app to be successful in both markets.


In [0]:
table = freq_table(andrd_free, 1)
for category in table:
    total = 0
    len_category = 0
    
    for app in andrd_free:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace('+', '')
            installs = int(installs.replace(',', '') )
            total += installs
            len_category += 1
            
    avg_installs = total / len_category  # average installs per app in this category
    print(int(avg_installs), ':', category)

513151 : BEAUTY
120550 : MEDICAL
3638640 : SPORTS
15588015 : GAME
3695641 : FAMILY
1331540 : HOUSE_AND_HOME
5201482 : PERSONALIZATION
1924897 : FOOD_AND_DRINK
1986335 : ART_AND_DESIGN
11640705 : ENTERTAINMENT
10801391 : TOOLS
13984077 : TRAVEL_AND_LOCAL
4188821 : HEALTH_AND_FITNESS
17840110 : PHOTOGRAPHY
38456119 : COMMUNICATION
253542 : EVENTS
16787331 : PRODUCTIVITY
24727872 : VIDEO_PLAYERS
8767811 : BOOKS_AND_REFERENCE
1833495 : EDUCATION
638503 : LIBRARIES_AND_DEMO
4056941 : MAPS_AND_NAVIGATION
854028 : DATING
817657 : COMICS
542603 : PARENTING
1712290 : BUSINESS
7036877 : SHOPPING
1387692 : FINANCE
23253652 : SOCIAL
1437816 : LIFESTYLE
5074486 : WEATHER
9549178 : NEWS_AND_MAGAZINES
647317 : AUTO_AND_VEHICLES


Communication, Video Players, and Social apps have the highest average installs per app of all free English apps in the Google Play dataset. There are some differences from the App Store here, but also some similarities. The Navigation category is not on top like in our App Store dataset. This might be because Google Maps comes preinstalled on Android phones. Communication apps lead Google Play with the most average installs per app. Let's find out why real quick.

In [0]:
for app in andrd_free:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

Facebook Messenger, WhatsApp, and Skype each have at least a billion installs, which inflates the average for the whole category. We can actually see this same pattern in the Video Players category.

In [0]:
for app in andrd_free:
    if app[1] == 'VIDEO_PLAYERS' and app[5] == '1,000,000,000+':
        print(app[0], ':', app[5])
    

YouTube : 1,000,000,000+
Google Play Movies & TV : 1,000,000,000+
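
To see how much a single giant app can distort an average, it helps to put the mean next to the median. The sketch below uses the first five Communication install values printed earlier; `parse_installs` is a hypothetical helper that mirrors the string clean-up used in the loop above, and the median is my own suggestion rather than part of the original analysis.

```python
from statistics import median


def parse_installs(raw):
    """Convert an installs string such as '10,000,000+' to an int."""
    return int(raw.replace('+', '').replace(',', ''))


# First five Communication install values from the output above
sample_installs = ['1,000,000,000+', '10,000,000+', '5,000,000+',
                   '100,000,000+', '50,000,000+']

values = [parse_installs(raw) for raw in sample_installs]
mean_installs = sum(values) / len(values)
median_installs = median(values)

print('Mean:', int(mean_installs))
print('Median:', median_installs)
```

The one billion-install app pulls the mean well above the typical app in this sample, while the median stays close to the middle of the pack.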


Both the Google Play and App Store datasets show high user engagement with social media. As I stated previously, I believe a supplemental app that improves the user experience on social media is a great option. For example, we could create an app that shows users how they were feeling on a certain day. We would need to acquire the user's archive and use sentiment analysis to show the user their past, well, sentiments. There are many successful supplemental apps with over 1 million downloads, such as Friendly for Facebook, TwitCasting Live, and EZ Video Download for Facebook.

In [0]:
for app in andrd_free:
    if app[1] == 'SOCIAL' and app[5] == '1,000,000+':
        print(app[0], ':', app[5])

The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Jodel - The Hyperlocal App : 1,000,000+
Love Sticker : 1,000,000+
Love Images : 1,000,000+
Facebook Local : 1,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, stickers and GIF : 1,000,000+
Family GPS tracker KidControl + GPS by SMS Locator : 1,000,000+
Moment : 1,000,000+
TwitCasting Live : 1,000,000+
Banjo : 1,000,000+
Frontback - Social Photos : 1,000,000+
Couple - Relationship App : 1,000,000+
B-Messenger Video Chat : 1,000,000+
FollowMeter for Instagram : 1,000,000+
pixiv : 1,000,000+
U LIVE – Video Chat & Stream : 1,000,000+
VMate Lite - Funny Short Videos Social Network : 1,000,000+
GUYZ - Gay Chat & Gay Dating : 1,000,000+
Snaappy – 3D fun AR core communication platform : 1,000,000+
Lesbian Chat & Dating - SPICY : 1,000,000+
BOO! - Next Generation Messenger : 1,000,000+
Wishbone - Compare Anything : 1,000,000+
Fiesta by Tango 

# Conclusion

Wow, what a journey, right?

We successfully cleaned our data by removing inaccuracies and duplicates, along with non-English and non-free apps.

Then we started our analysis, which brought us some useful insights. We found that an app that supports an existing social media platform could be a large success.

Many companies have already been successful in this field, which suggests there is real demand for this kind of app.