# Project 1: Popular Apps
## Python (Fundamentals)

In this project, we use Python to find the most popular genres/categories among free and paid apps on the Apple Store and the Google Play Store.

Datasets used in this project:
* [Apple Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/version/2)
* [Google Play Store](https://www.kaggle.com/lava18/google-play-store-apps)



We first import the data and define a simple function to peek at the contents of the datasets:

In [1]:
from csv import reader

In [2]:
# Read each CSV into a list of rows; the first row is the header.
with open('AppleStore.csv',encoding='utf8') as opened_file:
    apple = list(reader(opened_file))

with open('googleplaystore.csv',encoding='utf8') as opened_file:
    google = list(reader(opened_file))
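
Since both files are loaded the same way, the loading step could be factored into a small helper. This is only a minimal sketch: `load_csv` is a hypothetical name that does not appear in the original notebook.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Read a CSV file into a list of rows (header row included)."""
    # newline='' is the setting the csv module recommends for file objects;
    # the with-block closes the file even if reading fails.
    with open(path, encoding=encoding, newline='') as f:
        return list(reader(f))
```

With this helper, the two datasets would be loaded as `apple = load_csv('AppleStore.csv')` and `google = load_csv('googleplaystore.csv')`.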

In [3]:
def peek_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
peek_data(apple,1,3,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


In [5]:
peek_data(google,1,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


We start cleaning the data by deleting rows with missing fields, that is, rows whose length differs from the header's:

In [6]:
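def del_missing_datas(dataset,exist=False):
    # Loop over a copy (the slice) so deleting from `dataset` is safe.
    for row in dataset[1:]:
        # A row whose length differs from the header's has a missing field.
        if len(row)!=len(dataset[0]):
            exist=True
            missing=dataset.index(row)
            print('Deleted Row:',missing)
            del dataset[missing]
    if not exist:
        print('No Deleted Row')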

In [7]:
del_missing_datas(apple)

No Deleted Row


In [8]:
del_missing_datas(google)

Deleted Row: 10473
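
The same check can also be written without modifying the list in place. The sketch below is an alternative, not the method used in this notebook: `drop_malformed_rows` is a hypothetical helper that returns a new list plus the positions of the dropped rows. Those positions refer to the original list, so they can differ from the indices printed above when more than one row is removed.

```python
def drop_malformed_rows(dataset):
    """Return (clean_rows, dropped_indices); the input list is left unchanged."""
    n_cols = len(dataset[0])           # the header defines the expected width
    clean, dropped = [dataset[0]], []
    for i, row in enumerate(dataset[1:], start=1):
        if len(row) == n_cols:
            clean.append(row)
        else:
            dropped.append(i)          # index in the original list
    return clean, dropped

# Tiny made-up sample: the row ['b'] is missing its rating.
sample = [['name', 'rating'], ['a', '4.1'], ['b'], ['c', '3.9']]
clean, dropped = drop_malformed_rows(sample)
print(dropped)  # [2]
```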


The Google Play Store dataset is known to contain multiple rows with the same app name, so for each app we keep only the row with the most reviews:

In [9]:
google_unique=[]

In [10]:
def clean_duplicates(old_list,clean_list):
    apps=['Header']  # placeholder so that apps[i] lines up with clean_list[i]
    clean_list.append(old_list[0])
    for row in old_list[1:]:
        if row[0] not in apps:
            clean_list.append(row)
            apps.append(row[0])      
        elif int(row[3])>int(clean_list[apps.index(row[0])][3]):
            clean_list[apps.index(row[0])]=row
    
    print('Number of Rows:',len(clean_list))

In [11]:
clean_duplicates(google,google_unique)

Number of Rows: 9660
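
Looking up each name in a list makes the function above slow on large inputs. A dictionary-based variant is sketched here as an alternative. `dedupe_by_max_reviews` and its parameter names are hypothetical, and the sample rows are made up. Like the original, it keeps the first row when review counts tie, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the one with the highest review count."""
    header, body = rows[0], rows[1:]
    best = {}  # app name -> best row seen so far
    for row in body:
        name = row[name_idx]
        if name not in best or int(row[reviews_idx]) > int(best[name][reviews_idx]):
            best[name] = row
    return [header] + list(best.values())

sample = [
    ['App', 'Category', 'Rating', 'Reviews'],
    ['X', 'A', '4.0', '10'],
    ['Y', 'B', '3.0', '5'],
    ['X', 'A', '4.0', '25'],   # duplicate of X with more reviews
]
unique = dedupe_by_max_reviews(sample)
print(len(unique))  # 3
```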


We also keep only English apps (names with at most 3 non-ASCII characters):

In [114]:
eng_apple=[]
eng_google=[]

In [13]:
def select_eng(old_list,eng_list,name):
    eng_list.append(old_list[0])
    for row in old_list[1:]:
        count=0
        for char in row[name]:
            if ord(char)>127:
                count+=1
        if count<=3:
            eng_list.append(row)
    
    print('Number of Rows:',len(eng_list))

In [14]:
select_eng(apple,eng_apple,name=1)

Number of Rows: 6184


In [15]:
select_eng(google_unique,eng_google,name=0)

Number of Rows: 9615
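
To make the threshold concrete, the check can be isolated and tried on a few names. This is a minimal sketch: `count_non_ascii` and `looks_english` are hypothetical helpers that mirror the `ord(char)>127` test and the `count<=3` cutoff used above.

```python
def count_non_ascii(text):
    """Number of characters outside the ASCII range (code point > 127)."""
    return sum(1 for ch in text if ord(ch) > 127)

def looks_english(text, max_non_ascii=3):
    """True when the name has at most `max_non_ascii` non-ASCII characters."""
    return count_non_ascii(text) <= max_non_ascii

print(looks_english('Instagram'))         # True  (0 non-ASCII characters)
print(looks_english('Docs To Go™ Free'))  # True  (1: the trademark sign)
print(looks_english('中文应用商店'))        # False (6 non-ASCII characters)
print(looks_english('爱奇艺'))             # True  (only 3, so it slips through)
```

The last case shows a limit of the heuristic: a short non-English name stays under the cutoff and is kept.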


Now that the data is clean, we separate the free and paid apps:

In [16]:
free_apple=[]
free_google=[]
nfree_apple=[]
nfree_google=[]

In [17]:
def free_apps(old_list,free_list,nfree_list,price):
    free_list.append(old_list[0])
    nfree_list.append(old_list[0])
    for row in old_list[1:]:
        if row[price]=='0.0' or row[price]=='0':
            free_list.append(row)
        else:
            nfree_list.append(row)
    
    print('Number of Rows (free):',len(free_list))
    print('Number of Rows (non_free):',len(nfree_list))

In [18]:
free_apps(eng_apple,free_apple,nfree_apple,4)

Number of Rows (free): 3223
Number of Rows (non_free): 2962


In [19]:
free_apps(eng_google,free_google,nfree_google,7)

Number of Rows (free): 8865
Number of Rows (non_free): 751
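
The string comparison above only recognises the exact values `'0.0'` and `'0'`. A numeric check is sketched below as a possible alternative. `is_free` is a hypothetical helper, and it assumes that paid prices in the Google Play file carry a leading `$` (for example `'$4.99'`). That format is not shown in the rows printed earlier, so it should be verified against the actual data.

```python
def is_free(price_text):
    """True when the price parses to zero; unparseable values count as not free."""
    cleaned = price_text.strip().lstrip('$')  # assumption: optional leading '$'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```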


Finally, using the functions below, we compute each genre's percentage share of the total rating count (Apple Store) or the total installs (Google Play Store):

In [105]:
def insight_table(dataset,index,tot):
    d1={}
    for row in dataset[1:]:
        # Strip ',' and '+' so values like '10,000+' parse as integers.
        x=int((row[tot].replace(',','')).replace('+',''))
        if row[index] in d1:
            d1[row[index]]+=x
        else:
            d1[row[index]]=x

    return d1

In [106]:
def display_table(dataset, index, tot):
    table = insight_table(dataset, index, tot)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    totsum = sum([row[0] for row in table_sorted])
    for entry in table_sorted:
        print(entry[1], ':', round(((entry[0]*100)/totsum),2),'%')
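
The arithmetic behind these two functions can be checked on a tiny made-up table. The sketch below uses hypothetical names (`parse_count`, `genre_shares`) and three invented rows. It assumes the table contains at least one non-zero count, since the grand total is used as a divisor.

```python
def parse_count(text):
    """Turn a string such as '10,000+' into the integer 10000."""
    return int(text.replace(',', '').replace('+', ''))

def genre_shares(rows, genre_idx, count_idx):
    """Map each genre to its percentage share of the summed counts."""
    totals = {}
    for row in rows[1:]:  # skip the header row
        genre = row[genre_idx]
        totals[genre] = totals.get(genre, 0) + parse_count(row[count_idx])
    grand_total = sum(totals.values())
    return {g: round(t * 100 / grand_total, 2) for g, t in totals.items()}

sample = [
    ['App', 'Category', 'Installs'],
    ['a', 'GAME', '1,000+'],
    ['b', 'GAME', '500+'],
    ['c', 'TOOLS', '500+'],
]
shares = genre_shares(sample, 1, 2)
print(shares)  # {'GAME': 75.0, 'TOOLS': 25.0}
```

Here GAME accounts for 1,500 of the 2,000 summed installs, hence 75 %.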

Results for free apps on the Apple Store:

In [107]:
display_table(free_apple,-5,5)

Games : 53.39 %
Social Networking : 9.48 %
Photo & Video : 5.69 %
Music : 4.73 %
Entertainment : 4.46 %
Shopping : 2.83 %
Sports : 1.98 %
Health & Fitness : 1.89 %
Utilities : 1.89 %
Weather : 1.83 %
Reference : 1.69 %
Productivity : 1.47 %
Finance : 1.42 %
Travel : 1.41 %
News : 1.14 %
Food & Drink : 1.08 %
Lifestyle : 1.05 %
Education : 1.03 %
Book : 0.7 %
Navigation : 0.65 %
Business : 0.16 %
Catalogs : 0.02 %
Medical : 0.0 %


Results for paid apps on the Apple Store:

In [108]:
display_table(nfree_apple,-5,5)

Games : 80.16 %
Photo & Video : 3.61 %
Entertainment : 3.28 %
Health & Fitness : 2.11 %
Productivity : 1.99 %
Music : 1.54 %
Education : 1.48 %
Utilities : 1.38 %
Business : 1.15 %
Weather : 1.05 %
Reference : 0.66 %
News : 0.43 %
Lifestyle : 0.34 %
Navigation : 0.2 %
Book : 0.1 %
Travel : 0.1 %
Finance : 0.09 %
Food & Drink : 0.08 %
Medical : 0.08 %
Sports : 0.07 %
Social Networking : 0.06 %
Shopping : 0.02 %
Catalogs : 0.01 %


Results for free apps on the Google Play Store:

In [112]:
display_table(free_google,1,5)

GAME : 17.86 %
COMMUNICATION : 14.67 %
TOOLS : 10.77 %
FAMILY : 8.23 %
PRODUCTIVITY : 7.7 %
SOCIAL : 7.29 %
PHOTOGRAPHY : 6.19 %
VIDEO_PLAYERS : 5.22 %
TRAVEL_AND_LOCAL : 3.85 %
NEWS_AND_MAGAZINES : 3.15 %
BOOKS_AND_REFERENCE : 2.21 %
PERSONALIZATION : 2.03 %
SHOPPING : 1.86 %
HEALTH_AND_FITNESS : 1.52 %
SPORTS : 1.46 %
ENTERTAINMENT : 1.31 %
BUSINESS : 0.93 %
MAPS_AND_NAVIGATION : 0.67 %
LIFESTYLE : 0.66 %
FINANCE : 0.6 %
WEATHER : 0.48 %
FOOD_AND_DRINK : 0.28 %
EDUCATION : 0.25 %
DATING : 0.19 %
ART_AND_DESIGN : 0.15 %
HOUSE_AND_HOME : 0.13 %
AUTO_AND_VEHICLES : 0.07 %
LIBRARIES_AND_DEMO : 0.07 %
COMICS : 0.06 %
MEDICAL : 0.05 %
PARENTING : 0.04 %
BEAUTY : 0.04 %
EVENTS : 0.02 %


Results for paid apps on the Google Play Store:

In [113]:
display_table(nfree_google,1,5)

FAMILY : 36.87 %
GAME : 36.61 %
PERSONALIZATION : 5.68 %
PHOTOGRAPHY : 3.28 %
TOOLS : 3.01 %
PRODUCTIVITY : 2.46 %
COMMUNICATION : 2.37 %
SPORTS : 2.17 %
LIFESTYLE : 2.06 %
WEATHER : 1.42 %
MEDICAL : 0.98 %
HEALTH_AND_FITNESS : 0.83 %
BUSINESS : 0.37 %
ENTERTAINMENT : 0.35 %
FINANCE : 0.32 %
TRAVEL_AND_LOCAL : 0.32 %
MAPS_AND_NAVIGATION : 0.21 %
EDUCATION : 0.18 %
VIDEO_PLAYERS : 0.12 %
FOOD_AND_DRINK : 0.1 %
PARENTING : 0.09 %
AUTO_AND_VEHICLES : 0.09 %
BOOKS_AND_REFERENCE : 0.04 %
ART_AND_DESIGN : 0.03 %
DATING : 0.02 %
SHOPPING : 0.02 %
SOCIAL : 0.01 %
NEWS_AND_MAGAZINES : 0.01 %
LIBRARIES_AND_DEMO : 0.0 %
EVENTS : 0.0 %


End. Thank you!