# What kind of Apps generate the most profit?

## Introduction:
   We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better.
   As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

![blog-12](blog-12.jpg)

## Goal:
   Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

## Resources:
   Collecting data for over 4 million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we first look for relevant existing data that is available at no cost. Luckily, two data sets seem suitable for our goals:
- A dataset containing ~10,000 Android apps from Google Play. Data was collected in August 2018. [Here is the link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)
- A dataset containing ~7,000 iOS apps from the App Store. Data was collected in July 2017. [Here is the link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)

---

## Exploring the Data

Let's create an `explore_data` function to look into the datasets.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    '''
    Prints a slice of the dataset, one row at a time.
    
    Args:
        dataset: list of lists, one inner list per row
        start: int, index of the first row of the slice
        end: int, index at which the slice stops (exclusive)
        rows_and_columns: bool, False by default
        
    Returns:
        None. Prints the selected rows. If `rows_and_columns`
        is True, it also prints the number of rows and columns.
    '''
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

### Let's use the csv module to load the datasets

In [2]:
from csv import reader 
# The Google Play data set 
open_file = open('googleplaystore.csv',encoding="utf8")
read_file = reader(open_file)
android = list(read_file)
android_header = android[0] # Separate the header row
android = android[1:]

# The App Store data set
open_file = open('AppleStore.csv',encoding="utf8")
read_file = reader(open_file)
apple = list(read_file)
apple_header = apple[0]    # Separate the header row
apple = apple[1:]
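
The loading code above leaves both files open. A minimal sketch of the same step wrapped in a helper that closes the file automatically; `load_csv` is a hypothetical name, not part of the original notebook.

```python
from csv import reader

def load_csv(path, encoding="utf8"):
    """Return (header, rows) for the CSV file at `path` (hypothetical helper)."""
    # The `with` block closes the file even if reading fails.
    with open(path, encoding=encoding, newline="") as open_file:
        rows = list(reader(open_file))
    return rows[0], rows[1:]

# Usage, assuming the two CSV files sit next to the notebook:
# android_header, android = load_csv('googleplaystore.csv')
# apple_header, apple = load_csv('AppleStore.csv')
```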

In [3]:
explore_data(android,0,5, True) # The PlayStore Data

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


In [4]:
explore_data(apple, 0, 5, True) # The AppStore Data

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


In [5]:
print(android_header) 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [6]:
print(apple_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


## Data Cleaning

#### Detect **inaccurate data**, and correct or remove it.

The Google Play data set has a [dedicated discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section where an error in one row is described. Row 10472 is missing its `Category` value, which shifts all the following columns one position to the left.
As a result, the entry has only 12 values instead of 13, and its `Rating` column holds the value that belongs to `Reviews`.

In [7]:
print(android[10472])
print('\n')
print(android_header)
print(len(android[10472]))
print(len(android_header))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
12
13


Because of the shift, the `Rating` column reads 19, which is invalid: the maximum rating on Google Play is 5.

In [8]:
# Removal of the entry
del android[10472]
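
Row 10472 was found through the discussion page, but the same check can be automated. A minimal sketch, assuming a malformed row shows up as a row whose length differs from the header's; `find_bad_rows` is a hypothetical helper and the sample data is made up.

```python
def find_bad_rows(dataset, header):
    """Return (index, row) pairs whose column count differs from the header's."""
    return [(i, row) for i, row in enumerate(dataset) if len(row) != len(header)]

# Tiny illustrative sample (not the real data): the second row lacks a column.
sample_header = ['App', 'Category', 'Rating']
sample = [['A', 'TOOLS', '4.1'], ['B', '1.9'], ['C', 'GAME', '3.9']]
bad = find_bad_rows(sample, sample_header)
print(bad)  # [(1, ['B', '1.9'])]
```

When several rows have to be deleted by index, deleting from the highest index down keeps earlier deletions from shifting the later ones.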

#### Detect **duplicate data**, and remove the duplicates.

It's important to check for duplicates in our data.

In [9]:
# lets check our data for duplicate entries through a for loop

duplicate_entries = [] # a new list for duplicate entries
unique_entries = [] # a list to contain unique entries

for row in android: # iterate through the rows, 
    name = row[0]   # within each row, go for the name.
    if name in unique_entries: # if name already exists in unique_entries... ,
        duplicate_entries.append(name) # append it to duplicate_entries.
    else: # Or if it never existed in unique_entries, then it must be unique.
        unique_entries.append(name) # Add it into the list of unique_entries
        
print('Number of Duplicate entries:\n{}'.format(len(
    duplicate_entries)))
print('\n')
print('Starting 5 entries:\n{}'.format(duplicate_entries[:5]))

Number of Duplicate entries:
1181


Starting 5 entries:
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


- We have 1181 duplicate entries. Let's look further into one of them.
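
Checking `name in unique_entries` scans a list on every iteration, which gets slow as the list grows. A sketch of the same count with `collections.Counter`, shown on a small made-up list of names rather than the real dataset; `count_duplicates` is a hypothetical helper.

```python
from collections import Counter

def count_duplicates(names):
    """Return {name: extra_copies} for names that occur more than once."""
    counts = Counter(names)
    return {name: n - 1 for name, n in counts.items() if n > 1}

sample_names = ['Box', 'Slack', 'Box', 'Box', 'Zoom', 'Slack']
extras = count_duplicates(sample_names)
print(extras)                # {'Box': 2, 'Slack': 1}
print(sum(extras.values()))  # 3 duplicate entries in total
```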

In [10]:
for row in android: # Iterate through every row
    name = row[0] # The name is the first item of the row
    if name == 'Quick PDF Scanner + OCR FREE': # If the name matches...
        print(row) # ...print the entire row

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


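The three rows differ only in the number of `reviews`. Since a rating backed by more reviews is more reliable (and the row with the most reviews is probably the most recent snapshot), we want to keep the row with the maximum number of `reviews`. For that purpose, let's build a dictionary that maps each app `name` to its **maximum** number of `reviews`.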

In [11]:
reviews_max = {} # Maps each app name to its highest review count.
for row in android:  # Iterate through each row of the dataset
    name = row[0] # The first element of each row is the app name.
    n_reviews = float(row[3]) # The element at index 3 is the number of reviews; convert it to a float.
    if name in reviews_max and reviews_max[name] < n_reviews: # The name is already recorded with a lower review count...
        reviews_max[name] = n_reviews # ...so replace it with the bigger number.
    elif name not in reviews_max: # First time we see this name,
        reviews_max[name] = n_reviews # so record its review count.
        

In [16]:
reviews_max

{'Photo Editor & Candy Camera & Grid & ScrapBook': 159.0,
 'Coloring book moana': 974.0,
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps': 87510.0,
 'Sketch - Draw & Paint': 215644.0,
 'Pixel Draw - Number Art Coloring Book': 967.0,
 'Paper flowers instructions': 167.0,
 'Smoke Effect Photo Maker - Smoke Editor': 178.0,
 'Infinite Painter': 36815.0,
 'Garden Coloring Book': 13791.0,
 'Kids Paint Free - Drawing Fun': 121.0,
 'Text on Photo - Fonteee': 13880.0,
 'Name Art Photo Editor - Focus n Filters': 8788.0,
 'Tattoo Name On My Photo Editor': 44829.0,
 'Mandala Coloring Book': 4326.0,
 '3D Color Pixel by Number - Sandbox Art Coloring': 1518.0,
 'Learn To Draw Kawaii Characters': 55.0,
 'Photo Designer - Write your name with shapes': 3632.0,
 '350 Diy Room Decor Ideas': 27.0,
 'FlipaClip - Cartoon animation': 194216.0,
 'ibis Paint X': 224399.0,
 'Logo Maker - Small Business': 450.0,
 "Boys Photo Editor - Six Pack & Men's Suit": 654.0,
 'Superheroes Wallpapers | 4K Backgrounds': 

Using our dictionary `reviews_max` (which maps each app `name` to its maximum number of `reviews`), let's copy the full rows into a new dataset.

The dictionary helps us pick the 'right duplicate' from the original dataset. By 'right duplicate' we mean the row with the highest number of reviews for that app.

We will add that row, in full, to our new dataset called **android_clean**.
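
The two steps (build `reviews_max`, then filter) can also be folded into one pass. A minimal sketch, assuming the name is at index 0 and the review count at index 3 as in the Google Play data; `dedupe_keep_max_reviews` is a hypothetical helper and the sample rows are made up.

```python
def dedupe_keep_max_reviews(dataset, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_idx]
        # Strict '>' keeps the first row when review counts tie.
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

sample = [
    ['App A', 'TOOLS', '4.2', '100'],
    ['App B', 'GAME', '4.0', '50'],
    ['App A', 'TOOLS', '4.2', '120'],
    ['App B', 'GAME', '4.0', '50'],
]
clean = dedupe_keep_max_reviews(sample)
print(clean)
```

One difference from the two-step approach: the rows come out in the order in which each name first appears, not in the position of the row that was kept.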


In [17]:
android_clean = [] # Stores our new cleaned dataset
already_added = [] # Names already copied, in case several rows share the same maximum number of reviews

for row in android: 
    name = row[0]
    n_reviews = float(row[3])
    # Keep the row only if it holds the maximum number of reviews for this
    # app (per reviews_max) and the app has not been added yet.
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(row) # build the new dataset
        already_added.append(name) # guards against entries that tie on the maximum
print(len(android_clean))
print('\n')
explore_data(android_clean,0,10)

9659


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '

In [18]:
# Lets do the same for AppStore data
duplicate_entries = []
unique_entries = []

for row in apple:
    if row[0] in unique_entries:
        duplicate_entries.append(row[0])
    else :
        unique_entries.append(row[0])
        
print('Number of unique apps: ' ,len(unique_entries))
print('Number of duplicate apps: ' , len(duplicate_entries))
print('\n')
print('Examples of duplicate apps:', duplicate_entries[:15])

Number of unique apps:  7197
Number of duplicate apps:  0


Examples of duplicate apps: []


- There are no duplicate entries in the AppStore dataset, so no special treatment is required.

### Removing Non-English Apps

<font color='red'>The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. If the number is equal to or less than 127, then the character belongs to the set of common English characters.

If an app name contains a character whose number is greater than 127, the app probably has a non-English name.
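
To make the range concrete, Python's built-in `ord()` returns the number behind a character. A short illustration using characters that appear in the app names tested in this section:

```python
# ord() returns the Unicode code point of a single character.
for char in ['a', 'A', '™', '😜', '爱']:
    code = ord(char)
    print(repr(char), code, 'ASCII' if code <= 127 else 'non-ASCII')

# 'a' 97 ASCII
# 'A' 65 ASCII
# '™' 8482 non-ASCII
# '😜' 128540 non-ASCII
# '爱' 29233 non-ASCII
```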

In [19]:
def is_english(string):
    '''
    Tells whether a string contains only ASCII characters
    
    Args:
        string: String
        
    Returns:
        bool
    '''
    for element in string:
        if ord(element) > 127:
            return False
    return True # Reached only if no character is above 127
    

Let's check whether the function works correctly or not:


In [20]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


The function works as designed, but if we use it as is, we'll lose useful data, since many English apps will be incorrectly labeled as non-English. This is because emojis and characters like ™ fall outside the ASCII range and have corresponding numbers over 127, so 'Docs To Go™ Free Office Suite' and 'Instachat 😜' are both rejected.

Let's change the function to make it a bit more lenient.

In [21]:
def is_english(string):
    '''
    Lenient version of is_english(): it allows up to three non-ASCII characters.
    
    Args:
        string: String
        
    Returns:
        bool
    '''
    non_ascii = 0
    for element in string:
        if ord(element) > 127:
            non_ascii += 1
    if non_ascii > 3: # Allowing 3 non-ASCII characters
        return False
    else:
        return True

Let's test the newer version of the function we just made.

In [22]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))

True
True
False


- This works much better. By relaxing the condition, we keep some data we would otherwise have lost.

Next, we apply the function to our datasets to remove apps whose names contain more than three non-ASCII characters.

In [23]:
english_android = []
english_apple = []

for app in android_clean:
    name = app[0] # index of the name of an App
    if is_english(name):
        english_android.append(app)
        
for app in apple:
    name = app[1] # index of the name of an App
    if is_english(name):
        english_apple.append(app)
        
print('After applying the `is_english()` function, we are left with:\n {} PlayStore apps\n'.format(len(english_android)))   
print('\n')
print('After applying the `is_english()` function, we are left with:\n {} AppStore apps\n'.format(len(english_apple)))        



After applying the `is_english()` function, we are left with:
 9614 PlayStore apps



After applying the `is_english()` function, we are left with:
 6183 AppStore apps



### Isolating Free English apps 

As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [24]:
free_english_android = []
free_english_apple = []

for row in english_android:
    app_type = row[6] # Index 6 is the `Type` column; `Price` is at index 7
    
    if app_type == 'Free': # 'Free' marks apps that cost $ 0.0 in this dataset
        free_english_android.append(row)


for row in english_apple:
    price = row[4] # Index 4 is the `price` column
    
    if price == '0.0': # '0.0' marks $ 0.0 in the dataset
        free_english_apple.append(row) 

print('Android apps those are in english and are free:\n', len(free_english_android))
print('\n')
print('Apple apps those are in english and are free:\n', len(free_english_apple))

Android apps those are in english and are free:
 8863


Apple apps those are in english and are free:
 3222
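
Comparing against the exact string `'0.0'` works for this data set, but it would miss a free app recorded as `'0'` or `'0.00'`. A more defensive sketch, assuming prices may carry a leading `$` sign; `is_free` is a hypothetical helper, not part of the original notebook.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' is zero."""
    try:
        return float(price.strip().lstrip('$')) == 0.0
    except ValueError:
        # Non-numeric text (for example a shifted column) is treated as not free.
        return False

print(is_free('0.0'))       # True
print(is_free('0'))         # True
print(is_free('$4.99'))     # False
print(is_free('Everyone'))  # False: not a number
```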


---

## Data Analysis

Because our end goal is to publish an app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. 

Let's begin the analysis by getting a sense of the **most common genres** in each market. For this, we'll need to build frequency tables for a few columns in our data sets.

For genres, the PlayStore data has two columns, `Genres` and `Category`, whereas the AppStore data has just one, `prime_genre`. 

To dig deeper into them, let's build two functions to analyze the frequencies:

- `freq_table` to generate frequency tables that show percentages
- `display_table` to display the percentages in descending order

In [25]:
def freq_table(dataset, index):
    '''
    Generates frequency tables
    
    Args:
        dataset: Dataset
        index: Index of the column frequencies are to be drawn from
        
    Returns:
        dict mapping each value in the column to its frequency as a percentage
    '''
    freq_apps = {}
    total = 0 
    
    for row in dataset:
        total += 1
        val = row[index]
        if val in freq_apps:
            freq_apps[val] += 1
        else:
            freq_apps[val] = 1
    
    # Convert the frequencies to percentages
    table_percentages = {}
    for key in freq_apps:
        percentage = (freq_apps[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

In [26]:
def display_table(dataset, index):
    '''
    Prints a frequency table sorted in descending order
    
    Args:
        dataset: Dataset
        index: Index of the column frequencies are to be drawn from
        
    Returns:
        None. Prints the entries of the frequency table in descending order
    '''
    table = freq_table(dataset, index) # the percentages generated by `freq_table()`
    table_display = [] # initialize a list
    for key in table:
        key_val_as_tuple = (table[key], key) # (percentage, value) tuples, so that sorting orders by percentage
        table_display.append(key_val_as_tuple) # append to the list

    table_sorted = sorted(table_display, reverse = True) # sorts the list in descending order
    for entry in table_sorted:
        print(entry[1], ':', entry[0]) # print value : percentage
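
For comparison, the standard library can do most of this work. A sketch that mirrors `freq_table` and `display_table` with `collections.Counter`; `freq_table_counter` is a hypothetical name and the sample rows are made up.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs sorted by descending percentage."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() already sorts by count, highest first.
    return [(value, count / total * 100) for value, count in counts.most_common()]

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, pct in freq_table_counter(sample, 1):
    print(value, ':', pct)
# Games : 75.0
# Music : 25.0
```

Values with equal counts may come out in a different order than with the tuple sort, since `most_common()` keeps first-seen order for ties.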

Let's check both functions on our datasets.

In [27]:
# To make sure we get correct Index numbers, Lets print our headers we extracted in the beginning of this project
print('Android Header:\n',android_header,'\n\n','Apple Header:\n',apple_header)

Android Header:
 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

 Apple Header:
 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


**Free English AppStore Apps:**

Examining the frequency table for the `prime_genre` column of the AppStore data set.

In [28]:
display_table(free_english_apple, -5) # Index -5: `prime_genre`

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


**Free English PlayStore Apps:**

Examining the frequency table for the `Genres` column of the PlayStore data set.

In [29]:
display_table(free_english_android, 9) # Index 9: `Genres`

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.580841701455489
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.542818458761142
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2494640640866526
Action : 3.102786866749408
Health & Fitness : 3.068938282748505
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8616721200496444
Video Players & Editors : 1.7826920907142052
Casual : 1.7488435067133026
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
St

Examining the frequency table for the `Category` column of the PlayStore data set.

In [30]:
display_table(free_english_android, 1)

FAMILY : 19.21471285117906
GAME : 9.511452104253639
TOOLS : 8.462146000225657
BUSINESS : 4.580841701455489
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.542818458761142
SPORTS : 3.4187069840911652
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2494640640866526
HEALTH_AND_FITNESS : 3.068938282748505
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7826920907142052
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.128286133363421
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
ENTERTAINMENT : 0.8800631840234684
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0.64

In [31]:
# AppStore's Data does not have a Category

##### Unlike PlayStore's data, AppStore's data does not have a `Category` column. 

The difference between the `Genres` and `Category` columns is not clear. However, the `Category` column gives a better bird's-eye view: its groups are more clearly defined than those of `Genres`, and it runs roughly parallel to the `prime_genre` column of the AppStore data. We might need the `Genres` column for a deeper look later. For now, let's use the `Category` column.

- In Free English Apps (shares of the apps on offer, not of users),
    - Apple's AppStore is dominated by apps designed for fun, 
        - The top three genres, *Games*, *Entertainment* and *Photo & Video*, account for about 71% of apps
        - *Games* alone accounts for 58% of the apps
    
    - Google's PlayStore has a more balanced mix, with no single category dominating.
        - *Family*, *Game*, *Tools*, and *Business* account for approximately 19%, 10%, 8%, and 5% respectively
          

## Most Popular Free English Apps by Genre 


One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. 
- For the Google Play data set, we can find this information in the `Installs` column. 
- For the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.
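
The averaging described above can be sketched as a single pass over the data, instead of rescanning every row once per genre. `average_by_group` is a hypothetical helper, and the three sample rows are made up.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Return {group: mean of the numeric column at value_idx}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: totals[group] / counts[group] for group in totals}

sample = [['Games', '100'], ['Music', '40'], ['Games', '300']]
print(average_by_group(sample, 0, 1))  # {'Games': 200.0, 'Music': 40.0}
```

For the App Store data the call would look like `average_by_group(free_english_apple, -5, 5)`, assuming `prime_genre` at index -5 and `rating_count_tot` at index 5.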

#### on Apple's App Store

In [32]:
genres_apple = freq_table(free_english_apple, -5)
genres_table = []
count = 0

for genre in genres_apple:
    total = 0
    len_genre = 0
    
    for row in free_english_apple:
        genre_app = row[-5]
        
        if genre_app == genre:            
            n_ratings = float(row[5])
            total += n_ratings
            len_genre += 1
            
    avg_n_ratings = total / len_genre
    key_val_tuple = (avg_n_ratings, genre)
    genres_table.append(key_val_tuple)
    count += avg_n_ratings
    
sorted_table = sorted(genres_table, reverse = True)

for element in sorted_table:
    print(element[1], ':', round(element[0]))
    

Navigation : 86090
Reference : 74942
Social Networking : 71548
Music : 57327
Weather : 52280
Book : 39758
Food & Drink : 33334
Finance : 31468
Photo & Video : 28442
Travel : 28244
Shopping : 26920
Health & Fitness : 23298
Sports : 23009
Games : 22789
News : 21248
Productivity : 21028
Utilities : 18684
Lifestyle : 16486
Entertainment : 14030
Business : 7491
Education : 7004
Catalogs : 4004
Medical : 612


**Observations**

**Apparently**, the following are the top 5 and bottom 5 genres of **Free English Apps** on the **AppStore**, ranked by average number of ratings.

The top 5 Genres in Free English apps of Apple's AppStore are 
1. Navigation 
2. Reference
3. Social Networking
4. Music
5. Weather

Least Popular Genres in Free English apps of Apple's AppStore are (Lowest first)
1. Medical
2. Catalogs
3. Education
4. Business
5. Entertainment

**Inference**
- We cannot base a recommendation on these results without exploring what influences them and probing what they mean. For that, let's do a quick exploration of the top 5 genres.

In [33]:
apparently_popular_genres_ios = ['Navigation', 'Reference', 'Social Networking', 'Music', 'Weather']

for genre in apparently_popular_genres_ios: # loops over each genre in the list
    print('\n','----',genre,'----','\n') # prints a header
    for row in free_english_apple: # loops over the rows
        if row[-5] == genre: # index -5 is `prime_genre`
            print(row[1], ':',row[5]) # index 1 is the app name, index 5 is the total rating count
    


 ---- Navigation ---- 

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5

 ---- Reference ---- 

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for

- We can clearly see that the *Navigation* average is heavily influenced by "Waze - GPS Navigation, Maps & Real-time Traffic", "Google Maps - Navigation & Transit", and "Geocaching®" (Waze and Google Maps alone hold about 97% of the genre's ratings)
- Apps under the genre of *Social Networking* are heavily influenced by giants like "Facebook" and "Pinterest"
- The same pattern repeats itself in *Music*, where "Pandora", "Spotify", and "Shazam" heavily skew the average
- Even in the *Reference* genre, "Bible" and "Dictionary.com" dominate the numbers.
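
One way to quantify this skew is to compare the mean with the median, which is far less sensitive to a few giants. A sketch using the six *Navigation* rating counts from the output above:

```python
from statistics import mean, median

# rating_count_tot for the six free English Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings)))  # 86090, the average reported earlier
print(median(navigation_ratings))       # 8196.5
```

The median is roughly a tenth of the mean, which suggests that a typical Navigation app is much less popular than the average implies.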



Let's move on to the PlayStore for now.

#### on Android's PlayStore

In [34]:
android_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [35]:
genres_android = freq_table(free_english_android, 1)
genres_table = []
count2 = 0

for genre in genres_android:
    total = 0
    len_genre = 0
    
    for row in free_english_android:
        genre_app = row[1]
        if genre_app == genre:            
            n_installs = row[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_genre += 1
            
    avg_n_installs = total / len_genre
    key_val_tuple2 = (avg_n_installs, genre)
    genres_table.append(key_val_tuple2)
    count2 += avg_n_installs
    
sorted_table2 = sorted(genres_table, reverse = True)

for element in sorted_table2:
    print(element[1], ':', round(element[0]))
    
    

COMMUNICATION : 38326063
VIDEO_PLAYERS : 24790074
SOCIAL : 23253652
PHOTOGRAPHY : 17840110
PRODUCTIVITY : 16772839
TRAVEL_AND_LOCAL : 13984078
GAME : 12914436
TOOLS : 10801391
NEWS_AND_MAGAZINES : 9549178
ENTERTAINMENT : 9146923
BOOKS_AND_REFERENCE : 8767812
SHOPPING : 7036877
PERSONALIZATION : 5201483
FAMILY : 5183204
WEATHER : 5074486
SPORTS : 4274689
HEALTH_AND_FITNESS : 4167457
MAPS_AND_NAVIGATION : 4056942
ART_AND_DESIGN : 1986335
FOOD_AND_DRINK : 1924898
EDUCATION : 1768500
BUSINESS : 1704192
LIFESTYLE : 1437816
FINANCE : 1387692
HOUSE_AND_HOME : 1331541
DATING : 854029
COMICS : 817657
AUTO_AND_VEHICLES : 647318
LIBRARIES_AND_DEMO : 638504
PARENTING : 542604
BEAUTY : 513152
EVENTS : 253542
MEDICAL : 123065
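
The install-count cleanup appears in more than one cell, so it can be pulled into a small helper. A sketch; `parse_installs` is a hypothetical name. Note that the `Installs` values are open-ended buckets, so `'10,000+'` means at least 10,000 and the parsed number is a lower bound.

```python
def parse_installs(installs):
    """Convert a string such as '5,000,000+' to the integer 5000000."""
    return int(installs.replace(',', '').replace('+', ''))

print(parse_installs('10,000+'))         # 10000
print(parse_installs('1,000,000,000+'))  # 1000000000
print(parse_installs('0'))               # 0
```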


**Observations:**

Top 5 categories in Free English apps of Google PlayStore are: 
1. Communication
2. Video Players
3. Social
4. Photography
5. Productivity 

Least popular Free English apps of Google's PlayStore: (lowest on top)
1. Medical
2. Events
3. Beauty
4. Parenting
5. Libraries and Demo

Let's dig further into the top 5 categories to check whether the giants are skewing the PlayStore averages as well.

In [36]:
apparently_popular_genres_android = ['COMMUNICATION', 'VIDEO_PLAYERS', 'SOCIAL', 'PHOTOGRAPHY', 'PRODUCTIVITY']
communication = {}
video_players = {}
social = {}
photography = {}
productivity = {}

for genre in apparently_popular_genres_android: # loops over each category in the list
    for row in free_english_android: # loops over the rows
        if row[1] == genre: # index 1 is the `Category` column
            n_installs = row[5] # index 5 is the `Installs` column, e.g. '10,000+'
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            n_installs = int(n_installs)
            if genre == 'COMMUNICATION':
                communication[row[0]] = n_installs
            elif genre == 'VIDEO_PLAYERS':
                video_players[row[0]] = n_installs
            elif genre == 'SOCIAL':
                social[row[0]] = n_installs
            elif genre == 'PHOTOGRAPHY':
                photography[row[0]] = n_installs
            elif genre == 'PRODUCTIVITY':
                productivity[row[0]] = n_installs
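
The five separate dictionaries and the `if`/`elif` chain can be replaced by one dictionary keyed by category. A sketch, assuming the same column positions (name at 0, `Category` at 1, `Installs` at 5); `installs_by_category` is a hypothetical helper and the sample rows are made up.

```python
def installs_by_category(dataset, categories):
    """Return {category: {app_name: installs}} for the requested categories."""
    result = {category: {} for category in categories}
    for row in dataset:
        category = row[1]
        if category in result:
            installs = int(row[5].replace(',', '').replace('+', ''))
            result[category][row[0]] = installs
    return result

sample = [
    ['App A', 'SOCIAL', '', '', '', '1,000+'],
    ['App B', 'TOOLS', '', '', '', '500+'],
    ['App C', 'SOCIAL', '', '', '', '10,000+'],
]
print(installs_by_category(sample, ['SOCIAL', 'PHOTOGRAPHY']))
# {'SOCIAL': {'App A': 1000, 'App C': 10000}, 'PHOTOGRAPHY': {}}
```

This also makes a single pass over the rows instead of one pass per category.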
    

To sort the results, we introduce another function. It will help us inspect the rankings in order.

In [37]:
def sort_dictionary(dictionary, reverse=True):
    '''
    Sorts a dictionary by its values. It does so by
    swapping each key-value pair into a (value, key)
    tuple and sorting the tuples.
    
    Args:
        dictionary: dict. A dictionary with numerical values
        reverse: bool. True by default
        
    Returns:
        List. A list of (value, key) tuples ordered by the 
        values of the input dictionary. 
            If reverse = True (default), the list is in
            descending order.
            If reverse = False, the list is in ascending
            order.
    '''
    table = []
    for item in dictionary.items():
        key_val_tup = (item[1], item[0]) 
        table.append(key_val_tup)
    return sorted(table, reverse=reverse)
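
As an aside, Python's built-in `sorted` accepts a `key` argument, so the same ordering can be obtained without comparing whole tuples. The sketch below is an alternative, not the function used in the rest of this notebook; `sort_by_value` and `sample` are hypothetical names, and the three install counts are copied from the communication ranking in this notebook. One difference from `sort_dictionary`: apps with equal install counts keep their original dictionary order instead of being ordered by name.

```python
from operator import itemgetter


def sort_by_value(dictionary, reverse=True):
    """Return (value, key) tuples ordered by value only."""
    pairs = sorted(dictionary.items(), key=itemgetter(1), reverse=reverse)
    return [(value, key) for key, value in pairs]


sample = {'Gmail': 1000000000, 'Viber Messenger': 500000000, 'WeChat': 100000000}
print(sort_by_value(sample))                 # descending by installs
print(sort_by_value(sample, reverse=False))  # ascending by installs
```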

Ranking of free English Android apps in PlayStore's `communication` category

In [38]:
sort_dictionary(communication)

[(1000000000, 'WhatsApp Messenger'),
 (1000000000, 'Skype - free IM & video calls'),
 (1000000000, 'Messenger – Text and Video Chat for Free'),
 (1000000000, 'Hangouts'),
 (1000000000, 'Google Chrome: Fast & Secure'),
 (1000000000, 'Gmail'),
 (500000000, 'imo free video calls and chat'),
 (500000000, 'Viber Messenger'),
 (500000000, 'UC Browser - Fast Download Private & Secure'),
 (500000000, 'LINE: Free Calls & Messages'),
 (500000000, 'Google Duo - High Quality Video Calls'),
 (100000000, 'imo beta free calls and text'),
 (100000000, 'Yahoo Mail – Stay Organized'),
 (100000000, 'Who'),
 (100000000, 'WeChat'),
 (100000000, 'UC Browser Mini -Tiny Fast Private & Secure'),
 (100000000, 'Truecaller: Caller ID, SMS spam blocking & Dialer'),
 (100000000, 'Telegram'),
 (100000000, 'Opera Mini - fast web browser'),
 (100000000, 'Opera Browser: Fast and Secure'),
 (100000000, 'Messenger Lite: Free Calls & Messages'),
 (100000000, 'Kik'),
 (100000000, 'KakaoTalk: Free Calls & Text'),
 (10000000

Ranking of free English Android apps in PlayStore's `video_players` category

In [39]:
sort_dictionary(video_players)

[(1000000000, 'YouTube'),
 (1000000000, 'Google Play Movies & TV'),
 (500000000, 'MX Player'),
 (100000000, 'VivaVideo - Video Editor & Photo Movie'),
 (100000000, 'VideoShow-Video Editor, Video Maker, Beauty Camera'),
 (100000000, 'VLC for Android'),
 (100000000, 'Motorola Gallery'),
 (100000000, 'Motorola FM Radio'),
 (100000000, 'Dubsmash'),
 (50000000, 'Vote for'),
 (50000000, 'Vigo Video'),
 (50000000, 'VMate'),
 (50000000, 'Samsung Video Library'),
 (50000000, 'Ringdroid'),
 (50000000, 'MiniMovie - Free Video and Slideshow Editor'),
 (50000000, 'LIKE – Magic Video Maker & Community'),
 (50000000, 'KineMaster – Pro Video Editor'),
 (50000000, 'HD Video Downloader : 2018 Best video mate'),
 (50000000, 'DU Recorder – Screen Recorder, Video Editor, Live'),
 (10000000, 'video player for android'),
 (10000000, 'iMediaShare – Photos & Music'),
 (10000000, 'YouTube Studio'),
 (10000000, 'Video Player All Format'),
 (10000000, 'Video Downloader - for Instagram Repost App'),
 (10000000, 'V

Ranking of free English Android apps in PlayStore's `social` category

In [40]:
sort_dictionary(social)

[(1000000000, 'Instagram'),
 (1000000000, 'Google+'),
 (1000000000, 'Facebook'),
 (500000000, 'Snapchat'),
 (500000000, 'Facebook Lite'),
 (100000000, 'VK'),
 (100000000, 'Tumblr'),
 (100000000, 'Tik Tok - including musical.ly'),
 (100000000, 'Tango - Live Video Broadcast'),
 (100000000, 'Pinterest'),
 (100000000, 'LinkedIn'),
 (100000000, 'Badoo - Free Chat & Dating App'),
 (100000000, 'BIGO LIVE - Live Stream'),
 (50000000, 'ooVoo Video Calls, Messaging & Stories'),
 (50000000, 'Zello PTT Walkie Talkie'),
 (50000000, 'SKOUT - Meet, Chat, Go Live'),
 (50000000, 'POF Free Dating App'),
 (50000000, 'MeetMe: Chat & Meet New People'),
 (10000000, 'textPlus: Free Text & Calls'),
 (10000000, 'magicApp Calling & Messaging'),
 (10000000, 'YouNow: Live Stream Video Chat'),
 (10000000, 'We Heart It'),
 (10000000, 'Waplog - Free Chat, Dating App, Meet Singles'),
 (10000000, 'TextNow - free text + calls'),
 (10000000, 'Text free - Free Text + Call'),
 (10000000, 'Text Me: Text Free, Call Free, Se

Ranking of free English Android apps in PlayStore's `photography` category

In [41]:
sort_dictionary(photography)

[(1000000000, 'Google Photos'),
 (100000000, 'Z Camera - Photo Editor, Beauty Selfie, Collage'),
 (100000000, 'YouCam Perfect - Selfie Photo Editor'),
 (100000000, 'YouCam Makeup - Magic Selfie Makeovers'),
 (100000000, 'Sweet Selfie - selfie camera, beauty cam, photo edit'),
 (100000000, 'S Photo Editor - Collage Maker , Photo Collage'),
 (100000000, 'Retrica'),
 (100000000, 'PicsArt Photo Studio: Collage Maker & Pic Editor'),
 (100000000, 'PhotoGrid: Video & Pic Collage Maker, Photo Editor'),
 (100000000, 'Photo Editor Pro'),
 (100000000, 'Photo Editor Collage Maker Pro'),
 (100000000, 'Photo Collage Editor'),
 (100000000, 'LINE Camera - Photo editor'),
 (100000000, 'Cymera Camera- Photo Editor, Filter,Collage,Layout'),
 (100000000, 'Candy Camera - selfie, beauty camera, photo editor'),
 (100000000, 'Camera360: Selfie Photo Editor with Funny Sticker'),
 (100000000, 'BeautyPlus - Easy Photo Editor & Selfie Camera'),
 (100000000, 'B612 - Beauty & Filter Camera'),
 (100000000, 'AR effec

Ranking of free English Android apps in PlayStore's `productivity` category

In [42]:
sort_dictionary(productivity)

[(1000000000, 'Google Drive'),
 (500000000, 'Microsoft Word'),
 (500000000, 'Google Calendar'),
 (500000000, 'Dropbox'),
 (500000000, 'Cloud Print'),
 (100000000, 'WPS Office - Word, Docs, PDF, Note, Slide & Sheet'),
 (100000000, 'SwiftKey Keyboard'),
 (100000000, 'Samsung Notes'),
 (100000000, 'Microsoft PowerPoint'),
 (100000000, 'Microsoft Outlook'),
 (100000000, 'Microsoft OneNote'),
 (100000000, 'Microsoft OneDrive'),
 (100000000, 'Microsoft Excel'),
 (100000000, 'Google Slides'),
 (100000000, 'Google Sheets'),
 (100000000, 'Google Keep'),
 (100000000, 'Google Docs'),
 (100000000, 'Evernote – Organizer, Planner for Notes & Memos'),
 (100000000, 'ES File Explorer File Manager'),
 (100000000, 'ColorNote Notepad Notes'),
 (100000000, 'CamScanner - Phone PDF Creator'),
 (100000000, 'Adobe Acrobat Reader'),
 (50000000, 'myAT&T'),
 (50000000, 'Verizon Cloud'),
 (50000000, 'QR Droid'),
 (50000000, 'My Airtel-Online Recharge, Pay Bill, Wallet, UPI'),
 (50000000, 'Mobizen Screen Recorder -

In [44]:
# Average installs of Communication apps with fewer than 100 million installs,
# i.e. the category average once the giants are left out
under_100_m = []

for app in free_english_android:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))

sum(under_100_m) / len(under_100_m)

3593510.3486590036
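
The same calculation can be repeated for the other categories by wrapping it in a function. This is a sketch under assumptions rather than part of the original analysis: `average_installs` and `sample_rows` are hypothetical, the sample rows reuse three install counts from the communication ranking above with the remaining columns left empty, and the `cap` threshold is an illustrative parameter.

```python
def average_installs(rows, category, cap=None, category_idx=1, installs_idx=5):
    """Average installs for one category, ignoring apps at or above `cap` if given."""
    counts = []
    for row in rows:
        if row[category_idx] != category:
            continue
        installs = int(row[installs_idx].replace(',', '').replace('+', ''))
        if cap is None or installs < cap:
            counts.append(installs)
    # Return 0.0 instead of raising ZeroDivisionError when nothing matches.
    return sum(counts) / len(counts) if counts else 0.0


# Placeholder rows: only the name (0), category (1) and installs (5) are filled in.
sample_rows = [
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Viber Messenger', 'COMMUNICATION', '', '', '', '500,000,000+'],
    ['WeChat', 'COMMUNICATION', '', '', '', '100,000,000+'],
]

print(average_installs(sample_rows, 'COMMUNICATION'))
print(average_installs(sample_rows, 'COMMUNICATION', cap=1_000_000_000))
```

Comparing the capped and uncapped averages for a category gives a rough sense of how much a handful of very large apps pull the average up.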

In [45]:
display_table(free_english_android, -5)

Everyone : 81.43969310617173
Teen : 11.057204106961525
Mature 17+ : 4.185941554778292
Everyone 10+ : 3.2607469254202868
Adults only 18+ : 0.033848584000902626
Unrated : 0.022565722667268417


### Inferences:
1. The terms used to define the genres in each dataset are not standardized and are even ambiguous to a considerable extent. Let's look at the most popular genres of both stores again.
    - In AppStore, the following genres of free English apps are on top:
        1. Navigation
        2. Reference
        3. Social Networking
        4. Music
        5. Weather
    - Whereas in PlayStore, the following genres of free English apps are the most popular:
        1. Communication
        2. Video Players
        3. Social Networking
        4. Photo
        5. Productivity

If we compare the lists, `Social Networking` and `Music` are not entirely exclusive of `Communication` and `Video Players`.

2. However, we can clearly see that `Navigation`, which tops AppStore's free English genres, ranks very low among PlayStore's free English genres. Similarly, `Productivity` ranks among the highest in PlayStore but very low in AppStore.

3. Both the `Education` and `Business` genres perform poorly on the popularity charts of free English apps in both AppStore and PlayStore.
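
The overlap between the two top-5 lists can be checked mechanically. The sketch below uses the genre labels exactly as listed in the first inference; the variable names are illustrative. Because it matches labels literally, it cannot detect genres that overlap in content under different names, which is the ambiguity described in the first inference.

```python
app_store_top = ['Navigation', 'Reference', 'Social Networking', 'Music', 'Weather']
play_store_top = ['Communication', 'Video Players', 'Social Networking', 'Photo', 'Productivity']

# Keep the original ranking order by filtering the lists instead of using sets.
shared = [genre for genre in app_store_top if genre in play_store_top]
only_app_store = [genre for genre in app_store_top if genre not in play_store_top]
only_play_store = [genre for genre in play_store_top if genre not in app_store_top]

print('In both top-5 lists:', shared)
print('Only in AppStore top 5:', only_app_store)
print('Only in PlayStore top 5:', only_play_store)
```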






## Conclusions:
- Based on the above analysis, the Social (Networking) genre is the most popular among the free English apps of both AppStore and Google PlayStore, accounting for 10.8% and 9.88% of the most popular apps, respectively.
- Another encouraging reason to consider the social genre is that it is quite underrepresented: only 3.28% of the free English apps on iOS and only 2.66% on Android belong to it.
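
To make the underrepresentation argument concrete, the popularity share can be divided by the share of apps in the genre. This is a rough illustration using only the four percentages quoted in the conclusions; the ratio itself and the names `shares` and `ratios` are introduced here and are not a metric computed elsewhere in the notebook.

```python
# Percentages quoted in the conclusions for the social genre.
shares = {
    'iOS': {'popularity_pct': 10.8, 'apps_pct': 3.28},
    'Android': {'popularity_pct': 9.88, 'apps_pct': 2.66},
}

ratios = {}
for store, share in shares.items():
    # A ratio above 1 means the genre draws more popularity than its share of apps.
    ratios[store] = share['popularity_pct'] / share['apps_pct']
    print(f"{store}: {ratios[store]:.2f}")
```

On both stores the ratio comes out above 3, i.e. the genre's share of popularity is more than three times its share of available apps.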