# Android apps data analysis
In this assignment we're given a dataset of 10,841 Android apps from the Google Play Store, with information such as Rating, Genre and number of Reviews. The dataset is available here: [AndroidApps_data](https://drive.google.com/file/d/1RCx964NxP7zpdgTyhVtWIXgX2FDnvRAe/view?usp=sharing)

Now let's create a **function** that lets us explore the dataset conveniently. The function prints the column names and the first data row and, if asked, also prints the number of rows and columns in the dataset.


In [55]:
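from csv import reader

# Open the file in a 'with' block so it is closed once the rows are read
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    read_file = reader(opened_file)
    android_data = list(read_file)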

In [56]:
def explore(dataset, count_rc = False) :
    print('The following are the columns:')
    print('\n')
    
    for element in dataset[0]:
        print(element)
        print('\n')
        
    print('The first row')
    print('\n')
    
    for row in dataset[1:2]:
        print(row)
        print('\n')
        
    if count_rc :
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [57]:
explore(android_data, count_rc = True)

The following are the columns:


App


Category


Rating


Reviews


Size


Installs


Type


Price


Content Rating


Genres


Last Updated


Current Ver


Android Ver


The first row


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13
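
The `explore` function above prints only the first data row. If you want to look at a few more, a minimal sketch of a variant could take the number of rows as a parameter. The name `explore_rows`, the `n_rows` parameter and the sample data are our own illustrations, not part of the original notebook; note also that this version counts rows without the header.

```python
# Hypothetical variant of explore() that prints the first n data rows
def explore_rows(dataset, n_rows=3, count_rc=False):
    header, rows = dataset[0], dataset[1:]
    print('Columns:', ', '.join(header))
    # Print only the first n_rows data rows (the header is skipped)
    for row in rows[:n_rows]:
        print(row)
    if count_rc:
        # Unlike explore(), this row count excludes the header row
        print('Number of rows:', len(rows))
        print('Number of columns:', len(header))


# Made-up sample data in the same list-of-lists shape as android_data
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'ART_AND_DESIGN', '4.1'],
    ['App B', 'TOOLS', '3.9'],
]
explore_rows(sample, n_rows=1, count_rc=True)
```

On the real data it would be called as `explore_rows(android_data, n_rows=3, count_rc=True)`.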


Row 10472 seems to be anomalous. Since index 0 of `android_data` holds the header, that row sits at index 10473. Let's check it out.

In [58]:
print(android_data[10473])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


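This row has only 12 values instead of 13: the Category value is missing, so every later value has shifted one column to the left. For example, the Rating position holds '19', which is impossible on a 5-point scale. Let's delete this row.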

In [59]:
del android_data[10473]  # run this cell only once: re-running it deletes another row

In [60]:
print(len(android_data))

10841


The faulty row has been deleted: `android_data` now has 10,841 rows, including the header.
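
Rather than spotting a faulty row by eye, a small check can list every row whose number of values differs from the header. Below is a minimal sketch; the helper name `find_malformed_rows` and the sample rows are made up for illustration.

```python
def find_malformed_rows(dataset):
    # Expected number of values per row = number of header columns
    expected = len(dataset[0])
    # Return the index of every row that has too few or too many values
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# Made-up sample: the third row is missing its Category value
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'ART_AND_DESIGN', '4.1'],
    ['App B', '1.9'],
    ['App C', 'TOOLS', '3.9'],
]
print(find_malformed_rows(sample))  # [2]
```

Running `find_malformed_rows(android_data)` before the deletion should report the row inspected above, along with any other row that has a missing or extra value.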

Let's check whether any app appears more than once.

In [61]:
uniqueApp_name = []
duplicate_name = []
for app in android_data:
    name = app[0]
    if name in uniqueApp_name:
        duplicate_name.append(name)
    else :
        uniqueApp_name.append(name)
        
print(len(uniqueApp_name))

9660
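
The loop above tests `name in uniqueApp_name` against a list, which rescans the list for every row. As an alternative, a minimal sketch with `collections.Counter` counts every name in one pass and reports which names repeat. The helper name `duplicate_names` and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_names(rows):
    # Count how many rows each app name appears in
    counts = Counter(row[0] for row in rows)
    # Keep only names that occur more than once, with their counts
    return {name: n for name, n in counts.items() if n > 1}


# Made-up rows: [name, category]
sample = [
    ['App A', 'SOCIAL'],
    ['App B', 'BUSINESS'],
    ['App A', 'SOCIAL'],
    ['App A', 'SOCIAL'],
]
print(duplicate_names(sample))  # {'App A': 3}
```

On the real data, `duplicate_names(android_data[1:])` skips the header row, and summing `n - 1` over the returned counts gives the number of surplus rows.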


So there are 9,659 unique apps (the count of 9,660 includes the header row), which means the remaining 1,181 rows are duplicates. Let's remove them. For each duplicated app we keep the entry with the most reviews: since an app's review count grows over time, that entry is most likely the most recent one.

In [62]:
# Map each app name to the highest number of reviews recorded for it
unique_entries = {}
for row in android_data[1:]:
    name = row[0]
    reviews = float(row[3])
    if (name not in unique_entries) or (unique_entries[name] < reviews):
        unique_entries[name] = reviews
print(len(unique_entries))

9659


In [63]:
unique_dataset = []
already_added = set()  # names already copied into unique_dataset
for row in android_data[1:]:
    name = row[0]
    reviews = float(row[3])
    # Keep the first row of each app that has its maximum review count
    if name not in already_added and reviews == unique_entries[name]:
        unique_dataset.append(row)
        already_added.add(name)

len(unique_dataset)

9659
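
The two cells above make two passes over the data. A minimal single-pass sketch of the same rule (keep, for each name, the first row with the most reviews) stores the best row per name in a dictionary. The function name and sample rows are hypothetical. One difference is ordering: apps come out in the order their name is first seen, which can differ from the order of `unique_dataset`.

```python
def dedupe_keep_most_reviews(rows, name_index=0, reviews_index=3):
    # For each app name, remember the first row with the highest review count
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strict '>' keeps the earlier row when two entries tie
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    # Dicts preserve insertion order, so apps come out in first-seen order
    return list(best.values())


# Made-up rows: [name, category, rating, reviews]
sample = [
    ['App A', 'BUSINESS', '4.4', '51507'],
    ['App A', 'BUSINESS', '4.4', '51510'],
    ['App A', 'BUSINESS', '4.4', '51510'],
    ['App B', 'BUSINESS', '4.1', '100'],
]
result = dedupe_keep_most_reviews(sample)
print(len(result))   # 2
print(result[0][3])  # 51510
```

Applied as `dedupe_keep_most_reviews(android_data[1:])`, it should select the same rows as the two cells above, possibly in a different order.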

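Since we are only interested in English apps, let's filter out the non-English ones. The check below is a heuristic: an app name counts as non-English if it contains more than three characters outside the ASCII range (code point above 127). Allowing up to three such characters keeps English names that include an emoji or a symbol such as ™.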

In [64]:
def english_only(string) :
    count = 0
    for element in string :
        if ord(element) > 127 :
            count += 1  
    if count > 3 :
        return False 
    
    return True 
     

In [65]:
englishApps_dataset = []
for app in unique_dataset :
    name = app[0]
    if english_only(name):
        englishApps_dataset.append(app) 
        
print(len(englishApps_dataset))

9614
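
The rule in `english_only` can be written more compactly with `sum()` over a generator. In this sketch the threshold is exposed as a parameter (`max_non_ascii`, a name we made up) and the example names are illustrative. Being a heuristic, it misclassifies some names: a name in a language written mostly in Latin letters, such as Spanish, passes the check, while an English name with four or more emoji is rejected.

```python
def is_english(name, max_non_ascii=3):
    # Same rule as english_only(): count characters with a code point above 127
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_english('Docs To Go™ Free Office Suite'))  # True (1 non-ASCII char)
print(is_english('Instachat 😜'))                   # True (1 non-ASCII char)
print(is_english('中文应用程序'))                     # False (6 non-ASCII chars)
```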


This is the cleaned dataset: 9,614 unique English apps. For the analysis we keep only the free apps, i.e. those whose Price is '0'.

In [66]:
freeApps = []
for element in englishApps_dataset:
    price = element[7]
    if price == '0':
        freeApps.append(element)
print(len(freeApps))

8864


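Now let's work out which genres the **most** popular apps belong to. First, let's get a feel for the market. The helper functions below build a percentage frequency table for any column. Applied to the Installs column (index 5), they show how the free apps are spread across install brackets.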

In [71]:
def gen_freq_table(dataset, index):
    # Count how often each value of column `index` occurs
    freq = {}
    for row in dataset:
        value = row[index]
        if value in freq:
            freq[value] += 1
        else:
            freq[value] = 1

    # Turn the counts into percentages of all rows
    for value in freq:
        freq[value] = (freq[value] / len(dataset)) * 100
    return freq


def display_table(dataset, index):
    table = gen_freq_table(dataset, index)
    # Build (percentage, value) pairs so sorting puts the most common values first
    table_display = []
    for key in table:
        table_display.append((table[key], key))

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

display_table(freeApps, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343
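
`gen_freq_table` and `display_table` together count values, convert the counts to percentages and sort them. The standard library's `collections.Counter` can do the counting and sorting; a minimal sketch follows (the function name and sample rows are ours). Ties are ordered differently: `most_common()` keeps values with equal counts in the order they were first seen, while `display_table` sorts them by value in reverse.

```python
from collections import Counter


def freq_table_pct(dataset, index):
    # Count each value of column `index`, then convert counts to percentages
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() with no argument returns every value, highest count first
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Made-up rows: [name, type]
sample = [['App A', 'Free'], ['App B', 'Free'], ['App C', 'Paid'], ['App D', 'Free']]
for value, pct in freq_table_pct(sample, 1):
    print(value, ':', pct)
# Free : 75.0
# Paid : 25.0
```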

Next, let's list every distinct value of the Genres column (index 9) among the free apps.

In [68]:
list_genre = []
for app in freeApps :
    genre = app[9]
    if genre not in list_genre:
        list_genre.append(genre)
for name_genre in list_genre:
    print(name_genre)
    print('\n')

Art & Design


Art & Design;Creativity


Auto & Vehicles


Beauty


Books & Reference


Business


Comics


Comics;Creativity


Communication


Dating


Education


Education;Creativity


Education;Education


Education;Pretend Play


Education;Brain Games


Entertainment


Entertainment;Brain Games


Entertainment;Creativity


Entertainment;Music & Video


Events


Finance


Food & Drink


Health & Fitness


House & Home


Libraries & Demo


Lifestyle


Lifestyle;Pretend Play


Card


Arcade


Puzzle


Racing


Sports


Casual


Simulation


Adventure


Trivia


Action


Word


Role Playing


Strategy


Board


Music


Action;Action & Adventure


Casual;Brain Games


Educational;Creativity


Puzzle;Brain Games


Educational;Education


Casual;Pretend Play


Educational;Brain Games


Art & Design;Pretend Play


Educational;Pretend Play


Entertainment;Education


Casual;Education


Casual;Creativity


Casual;Action & Adventure


Music;Music & Video


Arcade;Pretend Play


Adventure;Act
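
Many Genres values combine two genres separated by `;` (for example `Education;Creativity`), and some repeat the same one (`Education;Education`). The notebook treats every combination as a separate genre. If you would rather look at the individual genres, a minimal sketch could split each value; the helper name `split_genres` is hypothetical, and this is an alternative view, not what the rest of the analysis does.

```python
def split_genres(genre_values):
    # Collect the individual parts of values such as 'Education;Creativity'
    parts = set()
    for value in genre_values:
        parts.update(value.split(';'))
    return sorted(parts)


sample = ['Art & Design', 'Art & Design;Creativity', 'Education;Education', 'Puzzle']
print(split_genres(sample))
# ['Art & Design', 'Creativity', 'Education', 'Puzzle']
```

On the notebook's data this would be called as `split_genres(list_genre)`.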
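
Now let's add up the installs of the free apps in each genre. The Installs column only gives brackets such as 1,000,000+, so the code strips the + and , characters and sums the lower bound of each bracket; the totals are therefore lower-bound estimates.
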
In [89]:
genre_installs = []

for name in list_genre:
    total = 0
    for app in freeApps:
        if app[9] == name:
            # Installs holds brackets such as '1,000,000+': drop '+' and ','
            # and add up the lower bound of each bracket
            installs = app[5].replace('+', '').replace(',', '')
            total += float(installs)
    genre_installs.append((total, name))

genre_installs_sorted = sorted(genre_installs, reverse = True)
for element in genre_installs_sorted :
    print(element[1], ':', element[0])
    print('\n')

Communication : 11036906201.0


Tools : 8091043474.0


Productivity : 5791629314.0


Social : 5487861902.0


Photography : 4656268815.0


Video Players & Editors : 3916731720.0


Arcade : 3753691940.0


Action : 3465986940.0


Casual : 3052798570.0


Entertainment : 3014302513.0


Travel & Local : 2894604086.0


News & Magazines : 2368196260.0


Books & Reference : 1665884260.0


Personalization : 1529235888.0


Sports : 1411230683.0


Shopping : 1400338585.0


Racing : 1400136820.0


Health & Fitness : 1143548402.0


Strategy : 907192105.0


Puzzle : 830286191.0


Business : 696902090.0


Simulation : 629062620.0


Maps & Navigation : 503060780.0


Lifestyle : 487484429.0


Finance : 455163132.0


Weather : 360288520.0


Role Playing : 329148570.0


Adventure : 295367120.0


Education : 260787900.0


Food & Drink : 211738751.0


Word : 209172550.0


Music : 170020500.0


Board : 161813110.0


Casual;Action & Adventure : 155000000.0


Card : 152618500.0


Casual;Pretend Play : 14610000
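
The cell that sums installs loops over all free apps once for every genre. A minimal single-pass sketch accumulates the totals in a `collections.defaultdict` instead. The helper names are ours, and the default indices (9 for Genres, 5 for Installs) follow the column order printed at the top of the notebook. With those defaults, `installs_by_genre(freeApps)` should give the same totals as the loop above, as (genre, total) pairs.

```python
from collections import defaultdict


def installs_lower_bound(value):
    # '1,000,000+' -> 1000000.0, the lower bound of the install bracket
    return float(value.replace('+', '').replace(',', ''))


def installs_by_genre(rows, genre_index=9, installs_index=5):
    # One pass over the rows, adding each app's installs to its genre's total
    totals = defaultdict(float)
    for row in rows:
        totals[row[genre_index]] += installs_lower_bound(row[installs_index])
    # Largest total first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up rows: [name, genre, installs]
sample = [
    ['App A', 'Tools', '1,000,000+'],
    ['App B', 'Social', '500,000+'],
    ['App C', 'Tools', '10,000+'],
]
print(installs_by_genre(sample, genre_index=1, installs_index=2))
# [('Tools', 1010000.0), ('Social', 500000.0)]
```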

# Conclusion 
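The output above lists the genres of free English apps in decreasing order of total installs. Communication leads with roughly 11.0 billion installs, followed by Tools (about 8.1 billion), Productivity (about 5.8 billion) and Social (about 5.5 billion). Because the Installs column only gives the lower bound of each bracket, these totals are lower-bound estimates, and compound values such as Casual;Action & Adventure are counted as separate genres.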