# Profitable App Profiles for the App Store and Google Play Markets

**The aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. I'm working as data analysts for a company that builds Android and iOS mobile apps, and the job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.**

**The goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.**


In [2]:
def explore_data(dataset, start,end, rows_columns = False):
    data_slice = dataset[start:end]
    for row in data_slice:
        print(row)
        print('\n')
        
    if rows_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Takes in four parameters:
dataset, which is expected to be a list of lists.
start and end, which are both expected to be integers and represent the starting and the ending indices of a slice from the data set.
rows_and_columns, which is expected to be a Boolean and has False as a default argument.
Slices the data set using dataset[start:end].
Loops through the slice, and for each iteration, prints a row and adds a new line after that row using print('\n').
The \n in print('\n') is a special character and won't be printed. Instead, the \n character adds a new line, and we use print('\n') to add some blank space between rows.
Prints the number of rows and columns if rows_and_columns is True.
dataset shouldn't have a header row, otherwise the function will print the wrong number of rows (one more row compared to the actual length).


**Instruction**

1. Open the two data sets we mentioned above, and save both as lists of lists.
	
The App Store data set is stored in a CSV file named `AppleStore.csv`, and the Google Play data set is stored in a CSV file named `googleplaystore.csv.`
Both CSV files can be opened directly in the Jupyter Notebook interface you see on the right of the screen.
If you run into an error named UnicodeDecodeError, add encoding="utf8" to the open() function (for instance, use `open('AppleStore.csv', encoding='utf8'`)).


2. Explore both data sets using the `explore_data()` function.

Print the first few rows of each data set.
Find the number of rows and columns of each data set (recall that the function assumes the argument for the dataset parameter doesn't have a header row).


3. Print the column names and try to identify the columns that could help us with our analysis. Use the documentation of the data sets if you're having trouble understanding what a column describes. Add a link to the documentation for readers if you think the column names are not descriptive enough



In [3]:
from csv import reader

### The Google Play data set ###
opened_file = open('googleplaystore.csv', encoding = 'utf8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open('AppleStore.csv', encoding = 'utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]


In [4]:
explore_data(ios, 0,9, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1']


['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


['5538347

In [5]:
explore_data(android, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


1. There is an error for a certain row find out what the index of the row is 
2. Print the row at the index to check if it's incorrect
3. Use the del statement to remove the row that has an error

In [6]:
print(android[10472])  # incorrect row
print('\n')
print(android_header)  # header
print('\n')
print(android[0]) #correct row


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


In [7]:
print(len(android))
print('\n')
del android[10472]
print('\n')
print(len(android))

10841




10840


In [8]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
    

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.

If you examine the rows we printed two cells above for the Instagram app, the main difference happens on the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. We can use this to build a criterion for keeping rows. We won't remove rows randomly, but rather we'll keep the rows that have the highest number of reviews because the higher the number of reviews, the more reliable the ratings.


In [9]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicates apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate app:' ,duplicate_apps[:19])

Number of duplicates apps: 1181


Examples of duplicate app: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics']


**Above, we:**

1. Created two lists: one for storing the name of duplicate apps, and one for storing the name of unique apps.
2. Looped through the `android` data set (the Google Play data set), and for each iteration:
We saved the app name to a variable named `name`.
If name was already in the `unique_apps` list, we appended `name` to the `duplicate_apps` list.
Else (if `name` wasn't already in the `unique_apps` list), we appended `name` to the `unique_apps list`.


In [10]:
print("Expected length:", len(android)-1181)

Expected length: 9659


**To remove the duplicates, we will:**

1. Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
2. Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).


In [11]:
print('z' in ['a','b','c'])
print('z' not in ['a','b','c'])

False
True


In [12]:
name_and_reviews = {'Instagram':66577313, 'Facebook':68786565}
print('LinkedIn' not in name_and_reviews)
print('LinkedIn'  in name_and_reviews)

True
False


**Instructions**

1. Create a dictionary where each key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
        Start by creating an empty dictionary named `reviews_max`.
        Loop through the Google Play data set (make sure you don't include the header row). For each iteration:
            Assign the app name to a variable named `name`.
            Convert the number of reviews to `float`. Assign it to a variable named `n_reviews`.
            If `name` already exists as a key in the `reviews_max` dictionary and `reviews_max[name] <    n_reviews`, update the number of reviews for that entry in the `reviews_max` dictionary.
           
     If `name` is not in the `reviews_max` dictionary as a key, create a new entry in the dictionary where the key is the app name, and the value is the number of reviews. Make sure you don't use an `else` clause here, otherwise the number of reviews will be incorrectly updated whenever `reviews_max[name] < n_reviews` evaluates to `False`.
     
Inspect the dictionary to make sure everything went as expected. Measure the length of the dictionary ‚Äî remember that the expected length is 9,659 entries.


2.  Use the dictionary you created above to remove the duplicate rows:

       Start by creating two empty lists: `android_clean` (which will store our new cleaned data set) and `already_added` (which will just store app names).
       Loop through the Google Play data set (make sure you don't include the header row), and for each iteration:
            Assign the app name to a variable named `name`.
            Convert the number of reviews to `float`, and assign it to a variable named `n_reviews`.
           
       If `n_reviews` is the same as the number of maximum reviews of the app `name` (the number can be found in the `reviews_max` dictionary) and `name` is not already in the list already_added (read the solution notebook to find out why we need this supplementary condition):
           
       Append the entire row to the `android_clean` list (which will eventually be a list of list and store our cleaned data set).
       
       Append the name of the app `name` to the `already_added` list ‚Äî this helps us to keep track of apps that we already added.


In [13]:
review_max = {}
for apps in android:
    name = apps[0]
    n_reviews = float(apps[3])
    if name in review_max:
        review_max[name] < n_reviews
        review_max[name] = n_reviews
    elif name not in review_max:
         review_max[name] = n_reviews
            
android_clean = []
already_added = []

for apps in android:
    name = apps[0]
    n_reviews = float(apps[3])
    

    if (review_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(apps)
        already_added.append(name)
    

In [14]:

len(android_clean)

9659

In [15]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


**Removing Non-English Apps**

In [16]:
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠
„ÄêËÑ±Âá∫„Ç≤„Éº„É†„ÄëÁµ∂ÂØæ„Å´ÊúÄÂæå„Åæ„Åß„Éó„É¨„Ç§„Åó„Å™„ÅÑ„Åß „ÄúË¨éËß£„ÅçÔºÜ„Éñ„É≠„ÉÉ„ÇØ„Éë„Ç∫„É´„Äú


Just a Line - Draw Anywhere, with AR
DEM DZ


In [17]:
print(ios[1][1])

Instagram


In [18]:
print(ord('a'))
print(ord('‰πê'))
print(ord('Áà±'))



97
20048
29233


The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. If the number is equal to or less than 127, then the character belongs to the set of common English characters.

The ord() function in Python accepts a string of length 1 as an argument and returns the unicode code point  
for example ord('B') returns 66

**Instructions**

1. Write a function that takes in a string and returns `False` if there's any character in the string that doesn't belong to the set of common English characters, otherwise it returns `True`.

Inside the function, iterate over the input string. For each iteration check whether the number associated with the character is greater than 127. When a character is greater than 127, the function should immediately `return False` ‚Äî the app name is probably non-English since it contains a character that doesn't belong to the set of common English characters.
If the loop finishes running without the return statement being executed, then it means no character had a corresponding number over 127 ‚Äî the app name is probably English, so the functions should return `True`.

2. Use your function to check whether these app names are detected as English or non-English:

'Instagram'
'Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'


In [19]:
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
        else:
            return True


In [20]:
is_english('Instagram')

True

In [21]:
is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠')

False

In [22]:
is_english('üòú')

False

In [23]:
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii +=1
    if non_ascii > 3:
        return False
    else:
        return True

In [24]:
is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠')

False

In [25]:
print(is_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_english('Instachat üòúüòú'))

True
True


In [26]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)


In [27]:
explore_data(android_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


In [28]:
explore_data(ios_english, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16



So far in the data cleaning process, we:

1. Removed inaccurate data
2. Removed duplicate app entries
3. Removed non-English apps


In [29]:
print(android[0][6])

Free


**Instructions**
1. Loop through each data set to isolate the free apps in separate lists. Make sure you identify the columns
   describing the app price ccorrectly
2. After you isolate the free apps, check the length of each datasets to seehow many you have remaining

In [51]:
free_appss = []
for apps in android_english:
    free_apps = apps[6]
    if free_apps == 'Free':
        free_appss.append(apps)
        

In [52]:
len(free_appss)

8863

In [61]:
android_final = []
for apps in android_english:
    price = apps[7]
    if price == '0':
        android_final.append(apps)
    
    
    

In [63]:
ios_final= []
for apps in ios_english:
    ios_price = apps[4]
    if ios_price == '0.0':
        ios_final.append(apps)
    

In [64]:
print(len(android_final))
print(len(ios_final))

8864
3222


So far, we  have
1. Removed Inaccurate data
2. Removed duplicate app entries
3. Removed non - english apps
4. Isolated the free apps

In [67]:
a_list = [50,20,100]
print(sorted(a_list))
print(sorted(a_list, reverse = True))

[20, 50, 100]
[100, 50, 20]


The sorted() doesn't function doesn't work too well with dictionaries because it only considers and return dictionary key

In [71]:
freq_table = {'Genre':50, 'Genre_2':20, 'Genre_1':100}
freq_table_as_tuple = [(50,'Genre'), (20,'Genre_2'),(100,'Genre_1')]
sorted(freq_table)
sorted(freq_table_as_tuple)

[(20, 'Genre_2'), (50, 'Genre'), (100, 'Genre_1')]

1. Takes in two parameters: dataset and index. dataset is expected to be a list of lists, and index is expected to be an integer.
2. Generates a frequency table using the freq_table() function (which you're going to write as an exercise).
3. Transforms the frequency table into a list of tuples, then sorts the list in a descending order.
4. Prints the entries of the frequency table in descending order.


In [None]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key],key)
        table_display.append(key_val_as_tuple)
        table_sorted = sorted(table_display, reverse = True)
        for entry in table_sorted:
            print(entry[1], ':', entry[0])

In [73]:
freq_table = {'Genre':50, 'Genre_2':20, 'Genre_1':100}

sorted(freq_table)

['Genre', 'Genre_1', 'Genre_2']

In [74]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


In [75]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])


In [91]:
print(freq_table(ios_final, 11))
print(' ')
print(' ')
print(display_table(ios_final, -5))

{'Social Networking': 3.2898820608317814, 'Photo & Video': 4.9658597144630665, 'Games': 58.16263190564867, 'Music': 2.0484171322160147, 'Reference': 0.5586592178770949, 'Health & Fitness': 2.0173805090006205, 'Weather': 0.8690254500310366, 'Utilities': 2.5139664804469275, 'Travel': 1.2414649286157666, 'Shopping': 2.60707635009311, 'News': 1.3345747982619491, 'Navigation': 0.186219739292365, 'Lifestyle': 1.5828677839851024, 'Entertainment': 7.883302296710118, 'Food & Drink': 0.8069522036002483, 'Sports': 2.1415270018621975, 'Book': 0.4345127250155183, 'Finance': 1.1173184357541899, 'Education': 3.662321539416512, 'Productivity': 1.7380509000620732, 'Business': 0.5276225946617008, 'Catalogs': 0.12414649286157665, 'Medical': 0.186219739292365}
 
 
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.14152700186

We can see that among the free English apps, more than a half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps which amount for 3.29% of the apps in our data set.

The general impression is that App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are more rare. However, the fact that fun apps are the most numerous doesn't also imply that they also have the greatest number of users ‚Äî the demand might not be the same as the offer.


The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.



Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the Genres column:


In [96]:
print(display_table(android_final, -4))

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.580324909747293
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.5424187725631766
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2490974729241873
Action : 3.1024368231046933
Health & Fitness : 3.068592057761733
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.861462093862816
Video Players & Editors : 1.782490974729242
Casual : 1.7486462093862816
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.925090252707581

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.


In [93]:
some_strings = ['FIRST', 'SECOND']
some_integers = [1,2,3,4,5]

for string in some_strings:
    print(string)
    for integer in some_integers:
        print(integer)

FIRST
1
2
3
4
5
SECOND
1
2
3
4
5


In [106]:
genre_ios = freq_table(ios_final, -5)

for genre in genre_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            user_rating = float(app[5])
            total += user_rating
            len_genre += 1
            
    average_rating = total/len_genre
    print(genre,':', average_rating)
        



Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


On average, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:



In [107]:
display_table(android_final,5)

1,000,000+ : 15.749097472924186
100,000+ : 11.563628158844766
10,000,000+ : 10.503158844765343
10,000+ : 10.209837545126353
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [110]:
n_installs = '100,000+'
print(n_installs.replace('+','plus'))
print(n_installs.replace('1','one'))
print(n_installs.replace('&','ampersand1'))

100,000plus
one00,000+
100,000+


In [111]:
n_installs = '100,000+'
print(n_installs.replace('+',''))


100,000


In [115]:
n_installs = '100,000+'
n_installs = n_installs.replace('+','')

In [116]:
print(n_installs.replace(',',''))

100000


In [140]:
genre_android = freq_table(android_final, 1)

for category in genre_android:
    total = 0
    len_category = 0
    
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            save_install = app[5]
            save_install = save_install.replace(',','')
            save_install = save_install.replace('+','')
            total += float(save_install)
            len_category += 1
    average_number = total/len_category
        
    print(category,':', average_number)
        

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1704192.3399014778
COMICS : 817657.2727272727
COMMUNICATION : 38326063.197916664
DATING : 854028.8303030303
EDUCATION : 1768500.0
ENTERTAINMENT : 9146923.076923076
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4167457.3602941176
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 12914435.883748516
FAMILY : 5180161.789906103
MEDICAL : 123064.7898089172
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 4274688.722772277
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16772838.591304347
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24790074.17721519
NEWS_AND_MAGAZINES : 

1. NEWS_AND_MAGAZINES : 9549178.467741935
2. MAPS_AND_NAVIGATION : 4056941.7741935486


On average, communication apps have the most installs: 38,456,119. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:


