# Analysing app data from iOS and Google Play
 In this project we analyze app data from the iOS App Store and Google Play to find out which kinds of apps attract the most users and are therefore most likely to bring in revenue
 
 Download links for the datasets: 
 1. [Apple iOS][1]
 2. [Google Play][2]
 
[1]:https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps   
[2]:https://www.kaggle.com/lava18/google-play-store-apps

1. Let's define a function `read_file` that reads a CSV file into a list of lists

In [431]:
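from csv import reader

def read_file(filename):
    # assumes the CSV files are UTF-8 encoded; newline='' is what the csv module recommends
    with open(filename, encoding='utf8', newline='') as f:
        all_data = list(reader(f))
    return all_data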

2. Let's read the `AppleStore.csv` and `googleplaystore.csv` files

In [432]:
ios = read_file('AppleStore.csv')
gplay = read_file('googleplaystore.csv')

3. Let's define a function to explore the data

In [433]:
def explore_data(data_set,start,end,print_rows_columns=False, print_header=False):
    data_slice = data_set[start:end]
    if print_header:
        print('Header:',data_set[0],sep='\n')
        
    print('\nData ' + f'from {start} to {end} rows:')
    for row in data_slice:
        print(row)
        
    if print_rows_columns:
        print('\nNumber of rows:' , len(data_set)-1)
        print('Number of columns:' , len(data_set[0]))
        


4. Let's explore the datasets we read into lists

In [434]:
explore_data(ios,2,5,True, True)

Header:
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

Data from 2 to 5 rows:
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']

Number of rows: 7197
Number of columns: 16


In [435]:
explore_data(gplay,2,5,True, True)

Header:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

Data from 2 to 5 rows:
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']

Number of rows: 10841
Number of columns: 13


5. Let's delete row 10473 from the gplay dataset: it has 12 fields instead of 13 because its Category value is missing

In [436]:
print(gplay[10473], gplay[10474], len(gplay[10473]), len(gplay[10474]))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up'] 12 13


In [437]:
del gplay[10473]
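
The malformed row above was found by inspection. As a minimal sketch of how such rows could be located automatically, assuming a well-formed row has exactly as many fields as the header (`find_malformed_rows` is a hypothetical helper, not part of the original notebook, and the sample data is a cut-down stand-in for `gplay`):

```python
def find_malformed_rows(data_set):
    """Return (index, field_count) for every row whose length differs from the header's."""
    expected = len(data_set[0])
    return [(i, len(row)) for i, row in enumerate(data_set) if len(row) != expected]

# toy stand-in for gplay: the last row is missing its Category field
sample = [
    ['App', 'Category', 'Rating'],
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
bad_rows = find_malformed_rows(sample)
print(bad_rows)
```

Running a check like this before `del` also guards against deleting the wrong index if the file changes.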

In [438]:
print(gplay[0])
for index,app in enumerate(gplay):
    if app[0] == 'Instagram':
        print(index, app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
2546 ['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
2605 ['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
2612 ['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
3910 ['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [439]:
def find_non_english_names(data_set, index, charcount=3, charpct=None):
    english_app_names=[]
    non_english_app_names=[]
    data_set_cleaned = [data_set[0]]
    
    # skip the header row
    for row in data_set[1:]:
        app_name = row[index]
        
        # characters outside the ASCII range (code point > 127)
        non_english_chars = [c for c in app_name if ord(c) > 127]
        
        # decide by absolute count if charcount is set, otherwise by percentage
        if charcount is not None:
            is_english = len(non_english_chars) <= charcount
        elif charpct is not None:
            # guard against empty names to avoid dividing by zero
            pct = len(non_english_chars)*100/len(app_name) if app_name else 0
            is_english = pct < charpct
        else:
            # no threshold given: keep every row
            is_english = True
        
        if is_english:
            data_set_cleaned.append(row)
            if app_name not in english_app_names:
                english_app_names.append(app_name)
        elif app_name not in non_english_app_names:
            non_english_app_names.append(app_name)
            
    # print the stats
    print(f'No. of Apps: {len(data_set[1:])}')
    print(f'No. of English Name Apps: {len(english_app_names)}')
    print(f'No. of Non-English Name Apps: {len(non_english_app_names)}')

    # return both name lists and the cleaned dataset
    return non_english_app_names,english_app_names,data_set_cleaned
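
To make the threshold rule concrete, here is a small sketch of the same test applied to single names. It assumes, as the function above does, that any character with a code point above 127 counts as non-English and that up to three such characters are tolerated. `count_non_ascii` and `looks_english` are hypothetical helpers and the sample names are made up for illustration:

```python
def count_non_ascii(name):
    """Number of characters whose code point lies outside the ASCII range."""
    return sum(1 for c in name if ord(c) > 127)

def looks_english(name, max_non_ascii=3):
    """Treat a name as English if it has at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(name) <= max_non_ascii

# illustrative names, not taken from the datasets
names = ['Instagram', 'Docs To Go™', 'Fun Chat 😜', '中文应用程序名']
results = [looks_english(name) for name in names]
print(results)
```

Allowing a few non-ASCII characters keeps English names that contain a trademark sign or an emoji, at the cost of letting through very short non-English names.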

In [440]:
ios_ne_names,ios_eng_names,ios_cleaned = find_non_english_names(ios,1)

No. of Apps: 7197
No. of English Name Apps: 6181
No. of Non-English Name Apps: 1014


In [441]:
len(ios_cleaned)

6184

In [442]:
gplay_ne_names,gplay_eng_names,gplay_cleaned = find_non_english_names(gplay,0)

No. of Apps: 10840
No. of English Name Apps: 9614
No. of Non-English Name Apps: 45


In [443]:
len(gplay_cleaned)

10796

6. Let's create functions to find and remove duplicate app names

In [444]:
def dupe_check_columns(data_set, col_index_list):
    if not isinstance(col_index_list, list):
        col_index_list = [col_index_list]
    
    # one dict per requested column, mapping each value to its row indexes
    dupes = [{} for _ in col_index_list]
    unique = [{} for _ in col_index_list]
    
    for row_index, row in enumerate(data_set[1:], start=1):
        for col_num,index in enumerate(col_index_list):
            col_value = row[index]
            if col_value in unique[col_num]:
                if col_value in dupes[col_num]:
                    dupes[col_num][col_value].append(row_index)
                else:
                    dupes[col_num][col_value] = [unique[col_num][col_value][0], row_index]
            else:
                unique[col_num][col_value] = [row_index]
    
    return dupes, unique
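
For a single column the same bookkeeping can be written more compactly. This is a sketch using `collections.defaultdict`, not the notebook's own function; `find_dupes` is a hypothetical name and the sample rows are a cut-down stand-in for the Google Play data:

```python
from collections import defaultdict

def find_dupes(data_set, col_index):
    """Map each repeated value in col_index to the row indexes where it appears."""
    positions = defaultdict(list)
    # start=1 so the indexes match positions in data_set (row 0 is the header)
    for row_index, row in enumerate(data_set[1:], start=1):
        positions[row[col_index]].append(row_index)
    return {value: rows for value, rows in positions.items() if len(rows) > 1}

sample = [
    ['App', 'Reviews'],
    ['Box', '159872'],
    ['Instagram', '66577313'],
    ['Box', '159872'],
    ['Instagram', '66577446'],
    ['Sketch - Draw & Paint', '215644'],
]
sample_dupes = find_dupes(sample, 0)
print(sample_dupes)
```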

In [445]:
def dedupe_columns(data_set, col_index, col_crit=None):
    # For each value in col_index keep one row and collect the others as dupes.
    # With col_crit set, the kept row is the one with the highest value in that column.
    dupes = {}   # value -> list of [row_index, ...] entries to drop
    unique = {}  # value -> [row_index] or [row_index, crit_value] of the row to keep
    use_crit = col_crit is not None  # explicit check so that column 0 also works
    
    # loop through the rows (skipping the header) and collect all the dupes
    for row_index, row in enumerate(data_set[1:], start=1):
        col_value = row[col_index]
        entry = [row_index, float(row[col_crit])] if use_crit else [row_index]
        if col_value not in unique:
            unique[col_value] = entry
        elif use_crit and entry[1] > unique[col_value][1]:
            # the new row wins: move the previously kept row to the dupes
            dupes.setdefault(col_value, []).append(unique[col_value])
            unique[col_value] = entry
        else:
            dupes.setdefault(col_value, []).append(entry)
    
    # build a set of all the row indexes to drop (fast membership test)
    dupe_rows = {entry[0] for entries in dupes.values() for entry in entries}
    
    # no. of dupes count
    print(f'No. of dupes:{len(dupe_rows)}')
    
    # deduped dataset (the header at index 0 is never in dupe_rows)
    dedupe = [row for index,row in enumerate(data_set) if index not in dupe_rows]
    
    return dedupe
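
The "keep the row with the highest criterion value" rule can also be illustrated on its own. The sketch below is a simplified stand-in, not the function above: `keep_best` is a hypothetical helper that returns kept rows in order of first appearance of each name rather than at the position of the winning row. The review counts are the four Instagram values printed earlier:

```python
def keep_best(data_set, key_index, crit_index):
    """Keep, per key, the row with the largest numeric value in crit_index."""
    best = {}
    for row in data_set[1:]:
        key = row[key_index]
        if key not in best or float(row[crit_index]) > float(best[key][crit_index]):
            best[key] = row
    # header first, then one row per key
    return [data_set[0]] + list(best.values())

sample = [
    ['App', 'Reviews'],
    ['Instagram', '66577313'],
    ['Instagram', '66577446'],
    ['Instagram', '66577313'],
    ['Instagram', '66509917'],
    ['Box', '159872'],
]
deduped = keep_best(sample, 0, 1)
print(deduped)
```

Using the review count as the criterion is a proxy for recency: the entry with the most reviews was presumably scraped last.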


In [446]:
def print_dupes(data_set,gdupes,start=0,end=5):
    alldupes = []
    for dupes in list(gdupes[0].values())[start:end]:
        alldupes += dupes
    for index in alldupes:
        print(f'Index[{index}]: {data_set[index]}')

7. Find dupes in the iOS dataset

In [447]:
dupes, unique = dupe_check_columns(ios_cleaned,1)

In [448]:
dupes

[{'Mannequin Challenge': [2901, 4350], 'VR Roller Coaster': [4329, 4700]}]

In [449]:
print_dupes(ios_cleaned,dupes)

Index[2901]: ['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
Index[4350]: ['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
Index[4329]: ['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
Index[4700]: ['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


In [450]:
dedupe_ios = dedupe_columns(ios_cleaned,1)

No. of dupes:2


In [464]:
print(len(ios),len(ios_cleaned), len(dedupe_ios))

7198 6184 6182


In [452]:
print(dedupe_ios[4349],ios_cleaned[4350])

['625411864', 'Sproggiwood', '438601728', 'USD', '4.99', '105', '37', '4.5', '4.5', '1.2.10', '12+', 'Games', '40', '5', '1', '1'] ['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']


Below are the dupes in the Google Play dataset, by app name

In [453]:
gdupes, gunique = dupe_check_columns(gplay_cleaned,0)

In [454]:
gdupes[0]['Box']

[205, 237, 266]

In [455]:
print_dupes(gplay_cleaned,gdupes,0,2)

Index[223]: ['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
Index[230]: ['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
Index[286]: ['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
Index[205]: ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
Index[237]: ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
Index[266]: ['Box

In [456]:
dedupe_gplay = dedupe_columns(gplay_cleaned,0,col_crit=3)

No. of dupes:1181


In [463]:
print(gplay_cleaned[0])
for index,app in enumerate(gplay_cleaned):
    if app[0] == 'Instagram':
        print(index, app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
2544 ['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
2603 ['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
2610 ['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
3907 ['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [462]:
print(dedupe_gplay[0])
for index,app in enumerate(dedupe_gplay):
    if app[0] == 'Instagram':
        print(index, app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
1915 ['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [465]:
print(len(gplay), len(gplay_cleaned), len(dedupe_gplay))

10841 10796 9615


In [467]:
print(dedupe_gplay[:5])

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'], ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']]


In [475]:
def find_free_apps(data_set,col_index):
    free_data_set=[data_set[0]]
    for row in data_set[1:]:
        if float(row[col_index].replace('$','')) == 0.0:
            free_data_set.append(row)
    
    return free_data_set
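
The price check above relies on stripping a possible `$` sign before converting to a number. A minimal sketch of that step in isolation, assuming prices appear either as plain numbers (`'0'`, `'0.0'`) or with a leading dollar sign (`parse_price` and `is_free` are hypothetical helpers, and `'$4.99'` is an illustrative value):

```python
def parse_price(text):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    return float(text.replace('$', ''))

def is_free(text):
    return parse_price(text) == 0.0

prices = ['0', '0.0', '$4.99', '4.99']
free_flags = [is_free(p) for p in prices]
print(free_flags)
```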

8. Find free iOS apps

In [471]:
free_ios = find_free_apps(dedupe_ios,4)

In [472]:
print(len(ios),len(ios_cleaned), len(dedupe_ios),len(free_ios))

7198 6184 6182 3221


In [473]:
print(free_ios[:10])

[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1'], ['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624',

9. Find free Google Play apps

In [476]:
free_gplay = find_free_apps(dedupe_gplay,7)

In [477]:
print(len(gplay), len(gplay_cleaned), len(dedupe_gplay),len(free_gplay))

10841 10796 9615 8865


In [478]:
print(free_gplay[:10])

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'], ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyon

10. Let's create a frequency table for the datasets

In [491]:
def freq_table(data_set, col_index):
    freq={}
    total = len(data_set[1:])
    for row in data_set[1:]:
        col = row[col_index]
        if col in freq:
            freq[col] += 100/total
        else:
            freq[col] = 100/total
            
    return freq
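
Adding `100/total` once per row accumulates floating-point rounding error, which is likely why the percentages printed later carry long noisy tails. A possible alternative, shown here only as a sketch, counts first and converts to a percentage once per value (`freq_table_counted` is a hypothetical name and the sample data is made up):

```python
def freq_table_counted(data_set, col_index):
    """Percentage of rows per distinct value in col_index (header row excluded)."""
    counts = {}
    for row in data_set[1:]:
        value = row[col_index]
        counts[value] = counts.get(value, 0) + 1
    total = len(data_set) - 1
    # a single division per value keeps rounding error to a minimum
    return {value: count * 100 / total for value, count in counts.items()}

sample = [
    ['name', 'genre'],
    ['a', 'Games'],
    ['b', 'Games'],
    ['c', 'Music'],
    ['d', 'Games'],
]
table = freq_table_counted(sample, 1)
print(table)
```

Swapping this in would change the trailing digits of the outputs shown in this notebook, so the original function is left as it is.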

In [506]:
def display_table(dataset, index):
    if index is not None:
        table = freq_table(dataset, index)
    else:
        table = dataset
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [493]:
prime_genre = freq_table(free_ios,11)
genres = freq_table(free_gplay,9)
category = freq_table(free_gplay,1)

In [494]:
display_table(free_ios,11)

Games : 58.13664596273457
Entertainment : 7.888198757764019
Photo & Video : 4.968944099378889
Education : 3.66459627329192
Social Networking : 3.2919254658385046
Shopping : 2.6086956521739095
Utilities : 2.5155279503105556
Sports : 2.14285714285714
Music : 2.0496894409937862
Health & Fitness : 2.0186335403726683
Productivity : 1.7391304347826066
Lifestyle : 1.5838509316770168
News : 1.3354037267080732
Travel : 1.2422360248447193
Finance : 1.1180124223602474
Weather : 0.8695652173913038
Food & Drink : 0.8074534161490678
Reference : 0.5590062111801242
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


11. Among the free English iOS apps, Games is by far the largest genre with about 58% of all apps. The next largest genres are Entertainment (about 8%), Photo & Video (about 5%) and Education (about 4%)

In [496]:
display_table(free_gplay,9)

Tools : 8.449909747292507
Entertainment : 6.069494584837599
Education : 5.34747292418777
Business : 4.591606498194979
Productivity : 3.8921480144404565
Lifestyle : 3.8921480144404565
Finance : 3.7003610108303455
Medical : 3.5311371841155417
Sports : 3.46344765342962
Personalization : 3.3167870036101235
Communication : 3.2378158844765483
Action : 3.1024368231047053
Health & Fitness : 3.079873646209398
Photography : 2.944494584837555
News & Magazines : 2.7978339350180583
Social : 2.6624548736462152
Travel & Local : 2.3240072202166075
Shopping : 2.2450361010830324
Books & Reference : 2.14350180505415
Simulation : 2.041967509025268
Dating : 1.861462093862813
Arcade : 1.8501805054151597
Video Players & Editors : 1.771209386281586
Casual : 1.7599277978339327
Maps & Navigation : 1.398916967509025
Food & Drink : 1.2409747292418778
Puzzle : 1.1281588447653441
Racing : 0.9927797833935037
Role Playing : 0.9363718411552363
Libraries & Demo : 0.9363718411552363
Auto & Vehicles : 0.9250902527075828


In [497]:
display_table(free_gplay,1)

FAMILY : 18.907942238266926
GAME : 9.724729241877363
TOOLS : 8.46119133574016
BUSINESS : 4.591606498194979
LIFESTYLE : 3.90342960288811
PRODUCTIVITY : 3.8921480144404565
FINANCE : 3.7003610108303455
MEDICAL : 3.5311371841155417
SPORTS : 3.3957581227436986
PERSONALIZATION : 3.3167870036101235
COMMUNICATION : 3.2378158844765483
HEALTH_AND_FITNESS : 3.079873646209398
PHOTOGRAPHY : 2.944494584837555
NEWS_AND_MAGAZINES : 2.7978339350180583
SOCIAL : 2.6624548736462152
TRAVEL_AND_LOCAL : 2.335288808664261
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.14350180505415
DATING : 1.861462093862813
VIDEO_PLAYERS : 1.7937725631768928
MAPS_AND_NAVIGATION : 1.398916967509025
FOOD_AND_DRINK : 1.2409747292418778
EDUCATION : 1.1620036101083042
ENTERTAINMENT : 0.9589350180505433
LIBRARIES_AND_DEMO : 0.9363718411552363
AUTO_AND_VEHICLES : 0.9250902527075828
HOUSE_AND_HOME : 0.8235559566787015
WEATHER : 0.8009927797833946
EVENTS : 0.7107400722021667
PARENTING : 0.6543321299638993
ART_AND_DESIGN : 0.6

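12. Compared to iOS, free Google Play apps are spread more evenly across categories: FAMILY (about 19%) and GAME (about 10%) lead, followed by TOOLS (about 8%), with BUSINESS, LIFESTYLE and PRODUCTIVITY at around 4% each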

In [501]:
ios_genres = prime_genre.keys()
ios_genres

dict_keys(['Social Networking', 'Photo & Video', 'Games', 'Music', 'Reference', 'Health & Fitness', 'Weather', 'Utilities', 'Travel', 'Shopping', 'News', 'Navigation', 'Lifestyle', 'Entertainment', 'Food & Drink', 'Sports', 'Book', 'Finance', 'Education', 'Productivity', 'Business', 'Catalogs', 'Medical'])

In [507]:
# row[5] is rating_count_tot, so this is the average number of user ratings
# per genre (a proxy for popularity), not the average star rating
avg_rating={}
genre_totals={}
for row in free_ios[1:]:
    genre = row[11]
    avg_rating[genre] = avg_rating.get(genre, 0) + float(row[5])
    genre_totals[genre] = genre_totals.get(genre, 0) + 1

for genre in ios_genres:
    avg_rating[genre] = avg_rating[genre]/genre_totals[genre]
    
display_table(avg_rating,None)

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22812.92467948718
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
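
The per-genre average above follows a general "sum and count per group, then divide" pattern. A reusable sketch of that pattern, under the assumption that the value column always holds numeric strings (`average_by_group` is a hypothetical helper; the sample rows reuse four rating counts printed earlier in a cut-down three-column layout):

```python
def average_by_group(data_set, group_index, value_index):
    """Average of the numeric column value_index for each value of group_index."""
    totals = {}
    counts = {}
    for row in data_set[1:]:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

sample = [
    ['track_name', 'rating_count_tot', 'prime_genre'],
    ['Facebook', '2974676', 'Social Networking'],
    ['Instagram', '2161558', 'Photo & Video'],
    ['Clash of Clans', '2130805', 'Games'],
    ['Temple Run', '1724546', 'Games'],
]
averages = average_by_group(sample, 2, 1)
print(averages['Games'])
```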


In [512]:
def display_apps(data_set,col_index,col_values, needed_cols=(1,5)):
    fil_data_set={}
    for row in data_set[1:]:
        col_value = row[col_index]
        row_dis = [col for index,col in enumerate(row) if index in needed_cols ]
        if col_value in fil_data_set:
            fil_data_set[col_value].append(row_dis)
        else:
            fil_data_set[col_value] = [row_dis]
            
    for col_value in col_values:
        print(f'{col_value}: \n {fil_data_set[col_value]}')
        

In [514]:
display_apps(free_ios, 11, ['Navigation','Reference','Social Networking'])

Navigation: 
 [['Waze - GPS Navigation, Maps & Real-time Traffic', '345046'], ['Google Maps - Navigation & Transit', '154911'], ['Geocaching®', '12811'], ['CoPilot GPS – Car Navigation & Offline Maps', '3582'], ['ImmobilienScout24: Real Estate Search in Germany', '187'], ['Railway Route Search', '5']]
Reference: 
 [['Bible', '985920'], ['Dictionary.com Dictionary & Thesaurus', '200047'], ['Dictionary.com Dictionary & Thesaurus for iPad', '54175'], ['Google Translate', '26786'], ['Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran', '18418'], ['New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition', '17588'], ['Merriam-Webster Dictionary', '16849'], ['Night Sky', '12122'], ['City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)', '8535'], ['LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools', '4693'], ['GUNS MODS for Minecraft PC Edition - Mods Tools', '1497'], ['Guides for Pokémon GO - Pokemon GO News

13. Navigation has very few apps but the highest average number of ratings, mostly because of Waze and Google Maps
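
How strongly two apps dominate the Navigation genre can be checked with a little arithmetic on the six rating counts printed above. The snippet is only a worked example and the variable names are made up for it:

```python
# rating_count_tot values of the six free Navigation apps listed in the output above
navigation_counts = [345046, 154911, 12811, 3582, 187, 5]

total = sum(navigation_counts)
mean = total / len(navigation_counts)
# share of all Navigation ratings that belongs to the two largest apps
top_two_share = sum(sorted(navigation_counts, reverse=True)[:2]) / total

print(f'total ratings: {total}')
print(f'mean per app: {mean:.2f}')
print(f'share of top two apps: {top_two_share:.1%}')
```

The mean reproduces the 86090.33 reported for Navigation, and the two largest apps account for roughly 97% of all ratings in the genre, so the average says little about a typical Navigation app.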

In [516]:
gplay_category = category.keys()
gplay_category

dict_keys(['ART_AND_DESIGN', 'AUTO_AND_VEHICLES', 'BEAUTY', 'BOOKS_AND_REFERENCE', 'BUSINESS', 'COMICS', 'COMMUNICATION', 'DATING', 'EDUCATION', 'ENTERTAINMENT', 'EVENTS', 'FINANCE', 'FOOD_AND_DRINK', 'HEALTH_AND_FITNESS', 'HOUSE_AND_HOME', 'LIBRARIES_AND_DEMO', 'LIFESTYLE', 'GAME', 'FAMILY', 'MEDICAL', 'SOCIAL', 'SHOPPING', 'PHOTOGRAPHY', 'SPORTS', 'TRAVEL_AND_LOCAL', 'TOOLS', 'PERSONALIZATION', 'PRODUCTIVITY', 'PARENTING', 'WEATHER', 'VIDEO_PLAYERS', 'NEWS_AND_MAGAZINES', 'MAPS_AND_NAVIGATION'])

In [521]:
display_table(free_gplay,5)

1,000,000+ : 15.726534296029072
100,000+ : 11.552346570397244
10,000,000+ : 10.548285198556075
10,000+ : 10.198555956678813
1,000+ : 8.393501805054239
100+ : 6.915613718411619
5,000,000+ : 6.82536101083039
500,000+ : 5.561823104693188
50,000+ : 4.772111913357437
5,000+ : 4.512635379061404
10+ : 3.5424187725631953
500+ : 3.249097472924202
50,000,000+ : 2.3014440433213004
100,000,000+ : 2.1322202166064965
50+ : 1.9178700361010799
5+ : 0.7897111913357411
1+ : 0.5076714801444041
500,000,000+ : 0.2707581227436822
1,000,000,000+ : 0.22563176895306852
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [519]:
avg_installs={}
category_totals={}
# cat_name avoids rebinding `category`, the frequency table built earlier
for row in free_gplay[1:]:
    cat_name = row[1]
    # '1,000,000+' -> 1000000.0
    installs = float(row[5].replace(',','').replace('+',''))
    avg_installs[cat_name] = avg_installs.get(cat_name, 0) + installs
    category_totals[cat_name] = category_totals.get(cat_name, 0) + 1

for cat_name in gplay_category:
    avg_installs[cat_name] = avg_installs[cat_name]/category_totals[cat_name]
    
display_table(avg_installs,None)

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315
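
The averages above depend on turning Installs strings into numbers. A small sketch of that conversion, assuming every value has the form seen in the frequency table (digits with thousands separators and an optional trailing `+`); `parse_installs` is a hypothetical helper:

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to an int."""
    return int(text.replace(',', '').replace('+', ''))

samples = ['1,000,000,000+', '500,000+', '0+', '0']
parsed = [parse_installs(s) for s in samples]
print(parsed)
```

Because each string is an open-ended bucket ("at least this many"), the resulting averages are rough lower-bound estimates rather than exact install counts.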

In [520]:
display_apps(free_gplay, 1, ['COMMUNICATION','VIDEO_PLAYERS','SOCIAL'],needed_cols=[0,5])

COMMUNICATION: 
 [['WhatsApp Messenger', '1,000,000,000+'], ['Messenger for SMS', '10,000,000+'], ['My Tele2', '5,000,000+'], ['imo beta free calls and text', '100,000,000+'], ['Contacts', '50,000,000+'], ['Call Free – Free Call', '5,000,000+'], ['Web Browser & Explorer', '5,000,000+'], ['Browser 4G', '10,000,000+'], ['MegaFon Dashboard', '10,000,000+'], ['ZenUI Dialer & Contacts', '10,000,000+'], ['Cricket Visual Voicemail', '10,000,000+'], ['TracFone My Account', '1,000,000+'], ['Xperia Link™', '10,000,000+'], ['TouchPal Keyboard - Fun Emoji & Android Keyboard', '10,000,000+'], ['Skype Lite - Free Video Call & Chat', '5,000,000+'], ['My magenta', '1,000,000+'], ['Android Messages', '100,000,000+'], ['Google Duo - High Quality Video Calls', '500,000,000+'], ['Seznam.cz', '1,000,000+'], ['Antillean Gold Telegram (original version)', '100,000+'], ['AT&T Visual Voicemail', '10,000,000+'], ['GMX Mail', '10,000,000+'], ['Omlet Chat', '10,000,000+'], ['My Vodacom SA', '5,000,000+'], ['M

14. Given these numbers, a game with social networking features looks like a promising app profile