# Application Market Analysis

In this project, we analyse two mobile application stores, the App Store and Google Play, to find the type of application likely to attract the largest user base, with an ad-supported app in mind.

### Step 1 : Creating a function to explore data sets

In [4]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end] #Rows from start up to, but not including, end
    for row in dataset_slice:
        print(row)
        print('\n') #Prints two blank lines between rows
    if rows_and_columns:
        print('Length of row is : ', len(dataset)) #Number of rows (including the header)
        print('Length of column is : ', len(dataset[0])) #Number of columns

### Step 2 : Reading our data sets using the function
Reading both CSV files with csv.reader and printing the first few rows of each with explore_data().

In [8]:
open_apple_data = open('AppleStore.csv', encoding='utf8') #Open the Apple data set for reading
open_google_data = open('googleplaystore.csv', encoding='utf8') #Open the Google Play data set for reading

from csv import reader #reader parses each CSV line into a list of strings

read_apple_data = reader(open_apple_data) #Reader over AppleStore.csv
read_google_data = reader(open_google_data) #Reader over googleplaystore.csv

apple_data = list(read_apple_data) #Store the AppleStore rows in a list
google_data = list(read_google_data) #Store the googleplaystore rows in a list

open_apple_data.close() #The rows are now in memory, so the files can be closed
open_google_data.close()

explore_data(apple_data, 1, 5, True) #Print rows 1 to 4, then the row and column counts
print('\n\n\n') #3 blank lines
explore_data(google_data, 1, 5, True) #Same for the Google Play data

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Length of row is :  7198
Length of column is :  16




['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite 
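
The loading step can also be written so that each file is closed automatically. The following is a minimal sketch, assuming UTF-8 encoded files; `load_csv` is a hypothetical helper that is not part of the notebook above.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Hypothetical helper: return (header, rows) for the CSV file at `path`."""
    # The with-block closes the file even if reading fails part-way
    with open(path, encoding=encoding, newline='') as f:
        data = list(reader(f))
    # Split the header from the data rows
    return data[0], data[1:]
```

A call such as `apple_header, apple_rows = load_csv('AppleStore.csv')` would return the header separately, so the rows would start at index 0 rather than index 1 as they do in `apple_data`.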

### Step 3 : Explore the header columns of dataset

- To know more about AppleStore.csv, visit -> [Link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)
- To know more about Googleplaystore.csv, visit -> [Link](https://www.kaggle.com/lava18/google-play-store-apps/home)

In [12]:
explore_data(apple_data,0,1) #The slice 0:1 prints only row 0, the header
print('\n')
explore_data(google_data,0,1)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']




['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']




### Step 4 : Data Cleaning

- Deleting rows with missing column values

In [15]:
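print(google_data[10473]) #Row with a missing Category value, so its columns are shifted left (source: forums)
del google_data[10473] #Delete that row; run this cell only once, or another row will be deleted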

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
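
A row like this one can also be found programmatically, because it has fewer values than the header. The following is a minimal sketch on a small made-up data set; `find_malformed_rows` is a hypothetical helper, not part of the notebook.

```python
def find_malformed_rows(dataset):
    """Hypothetical helper: return indices of rows whose length differs from the header's."""
    expected = len(dataset[0])  # dataset[0] is the header row
    return [i for i, row in enumerate(dataset) if len(row) != expected]

# Tiny made-up data set: the second data row is missing one value
sample = [['App', 'Category', 'Rating'],
          ['A', 'TOOLS', '4.1'],
          ['B', '3.9']]
print(find_malformed_rows(sample))  # [2]
```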


- Finding rows with duplicate entries

In [16]:
duplicate_apps = [] #List to store duplicate app name
unique_apps = [] #List to store unique app name

#Add each app name to the appropriate list
for row in google_data: 
    name = row[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps : ', len(duplicate_apps)) 
print('Name of duplicate apps : ', duplicate_apps)

Number of duplicate apps :  1181
Name of duplicate apps :  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express', 'Accounting App - Zoho Books', 'Invoice & Time Tracking - Zoho', 'join.me - Simple Meetings', 'Invoice 2go — Professional Invoices and Estimates', 'SignEasy | Sign and Fill PDF and other Documents', 'Quick PDF Scanner + OCR FREE', 'Genius Scan - PDF Scanner', 'Tiny Scanner - PDF Scanner App', 'Fast Scanner : Free PDF Scan', 'Mobile Doc Scanner (MDScan) Lite', 'TurboScan: scan documents and receipts in PDF', 'Tiny Scanner Pro: PDF Doc Scan', 'Docs To Go™ Fr
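
Checking `name in unique_apps` scans a list on every iteration, which gets slow as the data grows. As an alternative sketch, `collections.Counter` from the standard library counts all names in one pass; `count_duplicates` and the sample names below are illustrative only.

```python
from collections import Counter

def count_duplicates(names):
    """Hypothetical helper: map each repeated name to its number of extra occurrences."""
    counts = Counter(names)
    return {name: n - 1 for name, n in counts.items() if n > 1}

sample_names = ['Box', 'Slack', 'Box', 'Zenefits', 'Box', 'Slack']
extra = count_duplicates(sample_names)
print(extra)                # {'Box': 2, 'Slack': 1}
print(sum(extra.values()))  # 3, the same total the loop above would report
```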

   - Since only the most recent entry for each application is needed, we keep the entry with the highest number of reviews (the "Reviews" column, index 3) and delete the older ones. A higher review count suggests the row was collected more recently.

In [30]:
rating_dict = {} #Maps each app name to its highest number of reviews

#If the app is already in the dictionary with fewer reviews, update the count;
#if it is not present, add it
for row in google_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if name in rating_dict:
        if rating_dict[name] < n_reviews:
            rating_dict[name] = n_reviews
    else:
        rating_dict[name] = n_reviews
        
android_clean = [] #Only has unique entry for each app (cleaned dataset)
already_added = [] #(names)To prevent already added apps from being added again

#If the row has the highest review count for its app and the app is not added yet, add it to both lists
for row in google_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if (rating_dict[name] == n_reviews) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
        
print(len(android_clean))

9659
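
The same result can be reached in a single pass by storing the best row for each name directly. This is a minimal sketch, assuming the name is in column 0 and the review count in column 3 as in the Google Play data; `keep_most_reviewed` is a hypothetical helper and the sample review counts are made up.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Hypothetical helper: keep, for each app name, the row with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_idx]
        # Strict '>' keeps the first row seen when review counts tie
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Made-up rows: 'Box' appears twice with different review counts
sample = [['Box', 'BUSINESS', '4.2', '159872'],
          ['Slack', 'BUSINESS', '4.4', '51507'],
          ['Box', 'BUSINESS', '4.2', '159900']]
for row in keep_most_reviewed(sample):
    print(row)
```

Because a dictionary lookup replaces the `name not in already_added` list scan, this version should scale better on larger data sets.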


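- Removing apps with non-English names. We treat a name as non-English if it contains more than three characters outside the ASCII range 0 to 127, so that English names containing a few symbols or emoji (such as ™) are kept.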

In [51]:
#Takes a name and returns False if more than 3 of its characters have a code point above 127
def check_name(name):
    count = 0
    #Loop through each character of the name
    for char in name:
        if ord(char) > 127:
            count += 1
        if count > 3:
            return False

    return True

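noeng_google=[] #Google rows with English names
noeng_apple=[] #Apple rows with English names

#Keeping only rows with English names (the header rows pass the check and stay at index 0)
#Note: this loop reads google_data, so the duplicates filtered out of android_clean above are still included
for row in google_data: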
    name = row[0]
    if check_name(name):
        noeng_google.append(row)
        
for row in apple_data:
    name = row[1]
    if check_name(name):
        noeng_apple.append(row) 
        

- Since we only need free apps, paid apps are removed.

In [74]:
#Removing paid apps
final_google=[]
final_apple=[]

#Keep Google apps whose Type is 'Free' ([1:] skips the header row)
for row in noeng_google[1:]:
    if row[6] == 'Free':
        final_google.append(row)

#Keep Apple apps whose price is 0.0 ([1:] skips the header row)
for row in noeng_apple[1:]:
    if float(row[4]) == 0.0:
        final_apple.append(row)
        

[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1'], ['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1'], ['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1'], ['553834731', '
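
The two stores are filtered on different columns (Type for Google Play, price for the App Store). If one wanted to filter both on price instead, a sketch like the following could normalise the price strings first. It assumes that paid Google Play prices are written with a leading dollar sign, such as '$4.99'; `is_free` is a hypothetical helper.

```python
def is_free(price):
    """Hypothetical helper: True if a price string such as '0', '0.0' or '$4.99' equals zero."""
    # Assumption: the only non-numeric character is an optional leading '$'
    return float(price.strip().lstrip('$')) == 0.0

print(is_free('0.0'))    # True
print(is_free('0'))      # True
print(is_free('$4.99'))  # False
```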

### Step 5 : Building a frequency table of genres
Because our end goal is to publish the app on both the App Store and Google Play, we need to find app genres that are successful in both markets.

In [55]:
google_genre_freq={}
apple_genre_freq={}

#final_apple and final_google no longer contain a header row, so we loop over every row
for row in final_apple:
    genre = row[11] #prime_genre
    if genre in apple_genre_freq:
        apple_genre_freq[genre] += 1
    else:
        apple_genre_freq[genre] = 1

for row in final_google:
    genre = row[9] #Genres
    if genre in google_genre_freq:
        google_genre_freq[genre] += 1
    else:
        google_genre_freq[genre] = 1
        
print(google_genre_freq)
print(apple_genre_freq)

{'Parenting': 44, 'Casual;Music & Video': 2, 'Educational;Pretend Play': 14, 'Casual;Pretend Play': 25, 'Music;Music & Video': 2, 'Card;Action & Adventure': 1, 'Tools': 763, 'Health & Fitness;Action & Adventure': 1, 'Board;Brain Games': 8, 'Casual;Education': 2, 'Video Players & Editors;Creativity': 2, 'Arcade;Action & Adventure': 12, 'Weather': 74, 'Word': 29, 'Beauty': 53, 'Food & Drink': 125, 'Dating': 227, 'Entertainment;Education': 1, 'Arcade': 200, 'Health & Fitness': 325, 'Entertainment;Action & Adventure': 3, 'Libraries & Demo': 84, 'Education;Action & Adventure': 4, 'Simulation;Education': 1, 'Comics': 58, 'Books & Reference': 199, 'Parenting;Music & Video': 6, 'News & Magazines': 277, 'Communication': 359, 'Board;Action & Adventure': 2, 'Productivity': 395, 'Role Playing;Pretend Play': 5, 'Lifestyle': 358, 'Social': 292, 'Shopping': 257, 'Parenting;Education': 7, 'Strategy;Creativity': 1, 'Business': 445, 'Music & Audio;Music & Video': 1, 'Card;Brain Games': 1, 'Travel & Loca

### Step 6 : Finding the most common genres
Now we sort the frequency tables in descending order to see which genres contain the most apps.

In [86]:
#Builds a frequency table {value: count} for the column at position index
def freq_table(dataset,index):
    freq={}
    for row in dataset:
        if row[index] in freq:
            freq[row[index]] += 1
        else:
            freq[row[index]] = 1
    return freq

#Prints the frequency table sorted by count in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key) #Count first, so sorting orders by count
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
display_table(final_apple,11)
print('\n')
display_table(final_google,9)

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


Tools : 763
Entertainment : 600
Education : 513
Business : 445
Productivity : 395
Sports : 374
Communication : 359
Lifestyle : 358
Medical : 354
Finance : 349
Action : 341
Health & Fitness : 325
Photography : 312
Personalization : 308
Social : 292
News & Magazines : 277
Shopping : 257
Travel & Local : 245
Dating : 227
Arcade : 200
Books & Reference : 199
Simulation : 188
Casual : 184
Video Players & Editors : 168
Maps & Navigation : 130
Food & Drink : 125
Puzzle : 121
Racing : 95
Strategy : 92
House & Home : 88
Role Playing : 87
Libraries & Demo : 84
Auto & Vehicles : 82
Weather : 74
Events : 63
Adventure : 62
Comics : 58
Art & Design : 54
Beaut
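
Raw counts are hard to compare between two stores of different sizes, so percentages can be more informative. The following is a minimal sketch of a percentage version of the frequency table; `freq_table_pct` is a hypothetical helper and the sample rows are made up.

```python
def freq_table_pct(dataset, index):
    """Hypothetical helper: return {value: percentage of rows} for the column at `index`."""
    counts = {}
    for row in dataset:
        counts[row[index]] = counts.get(row[index], 0) + 1
    total = len(dataset)
    # Convert each count into its share of all rows
    return {value: count / total * 100 for value, count in counts.items()}

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
print(freq_table_pct(sample, 1))  # {'Games': 75.0, 'Music': 25.0}
```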

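### Step 7 : Calculating average number of user ratings per genre for AppleStore
The App Store data set has no install counts, so we use the total number of user ratings (rating_count_tot) as a proxy for popularity. For each genre, we divide the sum of its apps' rating counts by the number of apps in that genre.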

In [92]:
apple_genre = freq_table(final_apple,11) #Keys are the unique genres (prime_genre)

for genre in apple_genre:
    total_apple_rating=0 #Sum of rating_count_tot for this genre
    apple_len_genre=0 #Number of apps in this genre
    for row in final_apple:
        if row[11] == genre:
            total_apple_rating += float(row[5])
            apple_len_genre += 1
    avg_apple_rating = total_apple_rating / apple_len_genre
    print(genre,' ',avg_apple_rating)
            

Medical   612.0
Catalogs   4004.0
News   21248.023255813954
Social Networking   71548.34905660378
Lifestyle   16485.764705882353
Shopping   26919.690476190477
Utilities   18684.456790123455
Education   7003.983050847458
Health & Fitness   23298.015384615384
Book   39758.5
Photo & Video   28441.54375
Productivity   21028.410714285714
Games   22788.6696905016
Travel   28243.8
Entertainment   14029.830708661417
Food & Drink   33333.92307692308
Navigation   86090.33333333333
Weather   52279.892857142855
Finance   31467.944444444445
Sports   23008.898550724636
Reference   74942.11111111111
Music   57326.530303030304
Business   7491.117647058823
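
The nested loop scans the whole data set once per genre. As an alternative sketch, the totals and counts can be accumulated in one pass and divided at the end; `average_by_key` is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict

def average_by_key(rows, key_idx, value_idx):
    """Hypothetical helper: average of column `value_idx`, grouped by column `key_idx`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key_idx]] += float(row[value_idx])
        counts[row[key_idx]] += 1
    return {key: totals[key] / counts[key] for key in totals}

sample = [['Games', '100'], ['Games', '300'], ['Music', '50']]
print(average_by_key(sample, 0, 1))  # {'Games': 200.0, 'Music': 50.0}
```

With the App Store data, the equivalent call would be `average_by_key(final_apple, 11, 5)`.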


### Step 8 : Calculating average installs per category for PlayStore
We want the category with the most installs per app on average, which we get by dividing each category's total installs by its number of apps. In the Play Store data, the Installs column holds open-ended ranges such as 1,000,000+, so we remove the + and , characters and treat the remaining number as the exact install count. For example, 1,000,000+ installs becomes 1000000 installs.

In [108]:
google_genre = freq_table(final_google,1) #Keys are the unique categories (Category)

for genre in google_genre:
    total_google_install = 0 #Sum of installs for this category
    google_len_genre = 0 #Number of apps in this category
    for row in final_google:
        if row[1] == genre:
            google_len_genre += 1
            install = row[5] #Installs, for example '1,000,000+'
            install = install.replace('+','')
            install = install.replace(',','')
            total_google_install += float(install)
    avg_google_install = total_google_install / google_len_genre
    print(genre,'',avg_google_install)

BEAUTY  513151.88679245283
PHOTOGRAPHY  32321374.407051284
GAME  33111302.596789423
NEWS_AND_MAGAZINES  27058831.263537906
HEALTH_AND_FITNESS  4869225.852307692
MEDICAL  147563.28813559323
DATING  1164270.7356828193
EVENTS  253542.22222222222
BUSINESS  2250454.1348314607
SOCIAL  48184458.56849315
BOOKS_AND_REFERENCE  9655197.28643216
AUTO_AND_VEHICLES  647317.8170731707
SPORTS  4860918.563888889
FOOD_AND_DRINK  2190710.008
EDUCATION  5760596.026490066
ENTERTAINMENT  19516734.69387755
SHOPPING  12637504.221789883
FINANCE  2511355.6790830945
PERSONALIZATION  7533233.402597402
COMMUNICATION  90935671.86908078
ART_AND_DESIGN  2038050.8196721312
PRODUCTIVITY  35885137.50379747
TRAVEL_AND_LOCAL  27921561.32520325
LIBRARIES_AND_DEMO  749950.119047619
COMICS  950443.220338983
LIFESTYLE  1479956.6267409471
FAMILY  5787370.152887883
MAPS_AND_NAVIGATION  5569698.307692308
VIDEO_PLAYERS  36599010.11764706
HOUSE_AND_HOME  1917187.0568181819
PARENTING  542603.6206896552
TOOLS  14988276.79842932
WEAT
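
The string clean-up for the Installs column can be isolated in a small function, which makes it easy to check on its own. This is a minimal sketch; `parse_installs` is a hypothetical helper name.

```python
def parse_installs(installs):
    """Hypothetical helper: convert a string such as '1,000,000+' into the integer 1000000."""
    # Remove the thousands separators and the trailing plus sign
    return int(installs.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000
print(parse_installs('500+'))        # 500
```

Since each value is a lower bound ("at least this many installs"), the averages computed from it are likely to underestimate the true figures.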

### Conclusion

By comparing the average number of user ratings per App Store genre with the average number of installs per Google Play category, we conclude that the Photography or Navigation genre is a good fit for an ad-revenue-based application.