**PROFITABLE APPS FOR iOS AND ANDROID**

The aim of this project is to identify an app profile that would be most marketable to both Android and iOS users.

Our company builds free apps for an *English-speaking* audience. The analysis will be used by, among others, the development team, which focuses on building highly profitable apps. The main source of revenue in these apps will be in-app advertisements.

In [1]:
from csv import reader

opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
appstore_data = list(read_file)  # this is a list of lists
appstore_header = appstore_data[0]  # header row
appstore_data = appstore_data[1:]  # excludes header row


opened_file2 = open('googleplaystore.csv', encoding='utf8')
read_file2 = reader(opened_file2)
googleplay_data = list(read_file2)
googleplay_header = googleplay_data[0]
googleplay_data = googleplay_data[1:]

**Exploring The Data**
- Defining a function 'explore_data()' that prints a slice of a dataset so we can inspect it.
- The function takes four parameters: the dataset, two slice indices ('start' and 'end'), and 'rows_and_columns'.
- 'rows_and_columns' is a boolean that controls whether the number of rows and columns is printed. It defaults to 'False', so the counts are printed only when we pass 'True'.

Inside the function:
1. The 'for' loop iterates over the slice, and 'print(row)' prints each row.
2. 'print('\n')' prints blank lines that separate consecutive rows.
3. The 'if' statement prints the number of rows and the number of columns of the whole dataset whenever 'rows_and_columns' is 'True'.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    sliced_data = dataset[start:end]
    for row in sliced_data:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Now let's explore what the function does for each dataset. 

In [3]:
explore_data(appstore_data, 2, 5, True)

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


In [4]:
explore_data(googleplay_data, 2, 5, True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


**Key Columns For The Analysis**

The columns that will provide the most insight to our data analysis include:
- In the appstore dataset: **'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.** The appstore dataset can be downloaded [here](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).
- In the googleplay dataset:**'App', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.** The googleplay dataset can be downloaded [here](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

In [5]:
print(appstore_header)
print('\n')
print(googleplay_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


**Removing Incorrect Data**

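Row 10472 holds the app 'Life Made WI-Fi Touchscreen Photo Frame'. It has only 12 values instead of 13: the 'Category' value is missing, so every later value is shifted one column to the left. As a result, the 'Rating' column reads 19, which is impossible because the maximum rating for a Google Play app is 5.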

In [6]:
print(googleplay_header)
print('\n')
print(googleplay_data[10472])
print('\n')
print(googleplay_data[9])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Kids Paint Free - Drawing Fun', 'ART_AND_DESIGN', '4.7', '121', '3.1M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'July 3, 2018', '2.8', '4.0.3 and up']


Therefore, we'll delete row 10472.

In [7]:
print(len(googleplay_data))
del googleplay_data[10472]  # caution: run this cell only once
print(len(googleplay_data))

10841
10840
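A shifted row like this one can also be located programmatically instead of by index. The sketch below is only an illustration: 'find_malformed_rows' is a hypothetical helper that is not part of the notebook, and the sample rows are trimmed toy data. It flags every row whose length differs from the header's.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data (hypothetical, trimmed to three columns):
# the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '19'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)
```

Calling the helper with 'googleplay_header' and 'googleplay_data' before the deletion would be one way to confirm which row indices need attention.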


**Detecting and Removing Duplicates**

Some apps, such as 'Slack', have more than one entry.

In [8]:
for app in googleplay_data:
    app_name = app[0]
    if app_name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


Now let's count the duplicates in the entire Google Play dataset by:
1. Creating two lists, 'duplicates' and 'uniques'.
2. Looping over the dataset and checking whether each app name is already in 'uniques': if it is, the name is appended to 'duplicates'; otherwise it is appended to 'uniques'.
3. Taking the length of the 'duplicates' list.

In [9]:
duplicates = []
uniques = []

for app in googleplay_data:
    app_name = app[0]
    if app_name in uniques:
        duplicates.append(app_name)
    else:
        uniques.append(app_name)
        
print('Count of duplicates:', len(duplicates))
print('\n')
print('Some of the duplicates:', duplicates[:10])        

Count of duplicates: 1181


Some of the duplicates: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
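Checking membership in a list gets slower as the list grows, so on larger datasets a set is a common alternative for the "already seen" check. The following is a minimal sketch of the same counting logic under that assumption; 'split_duplicates' and the sample names are hypothetical and not part of the notebook.

```python
def split_duplicates(names):
    """Return (unique_names, duplicate_names), preserving first-seen order."""
    seen = set()          # fast membership test
    unique_names = []
    duplicate_names = []
    for name in names:
        if name in seen:
            duplicate_names.append(name)
        else:
            seen.add(name)
            unique_names.append(name)
    return unique_names, duplicate_names


# Toy list of app names (hypothetical sample).
sample_names = ['Slack', 'Box', 'Slack', 'Zenefits', 'Box', 'Slack']
uniq, dups = split_duplicates(sample_names)
print('Count of duplicates:', len(dups))
print('Duplicates:', dups)
```

The result should match the list-based version above; only the lookup structure differs.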


Rather than removing duplicates at random, we'll keep, for each app, the entry with the highest number of reviews, on the assumption that the more reviews an entry has, the more reliable its rating is.
In addition, 'Reviews' is one of the columns whose values actually differ between duplicate entries, as the 'Slack' rows above show.

- We use a dictionary to record, for each app, the highest number of reviews found among its entries.
- Each key-value pair has the form 'app_name: reviews'.
- The 'if' branch updates the stored value when the name is already a key and the stored value is lower than the current entry's number of reviews. The 'elif' branch adds the name when it is not yet in the dictionary.

In [10]:
highest_reviews = {}
for app in googleplay_data:
    app_name = app[0]
    reviews = float(app[3])
    
    
    if app_name in highest_reviews and highest_reviews[app_name]<reviews:
        highest_reviews[app_name] = reviews
        
    elif app_name not in highest_reviews:
        highest_reviews[app_name] = reviews
        

Previously we found 1181 duplicates. Therefore, the length of our new dictionary should equal the length of the dataset minus the number of duplicates.

In [11]:
print('Expected length of uniques:', len(googleplay_data) - 1181)
print('Actual length of dictionary:', len(highest_reviews))

Expected length of uniques: 9659
Actual length of dictionary: 9659


We'll use the 'highest_reviews' dictionary to remove the duplicates. For duplicated apps, we keep only the entry with the highest number of reviews.

As we loop through the Google Play dataset, we:
- isolate the name of the app and its number of reviews.
- check whether the number of reviews matches the highest number recorded for that app in the dictionary, and whether the name is not yet in the 'already_existing' list.
- if both conditions hold, append the current row ('app') to 'clean_googleplay' and the app name to 'already_existing'. The second condition matters when several duplicate entries share the same highest number of reviews: only the first one is kept.

In [12]:
clean_googleplay =[]
already_existing = []

for app in googleplay_data:
    app_name = app[0]
    reviews = float(app[3])
    
    if  highest_reviews[app_name] == reviews and app_name not in already_existing:
        clean_googleplay.append(app)
        already_existing.append(app_name)
        

In [13]:
explore_data(clean_googleplay, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
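The two passes above (building 'highest_reviews', then filtering) could also be folded into a single pass that stores the best row per app name. This is a sketch under stated assumptions, not the notebook's method: 'keep_most_reviewed' is a hypothetical helper, the sample rows are trimmed to four columns, and the 'Box' values are made up for illustration.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strict '>' means that on a tie the earlier row is kept.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Trimmed toy rows: [App, Category, Rating, Reviews].
sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159872'],   # hypothetical values
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]

deduped = keep_most_reviewed(sample)
for row in deduped:
    print(row)
```

One difference to be aware of: the rows come back in the order in which each name first appears, which may differ from the order of 'clean_googleplay'.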


**Removing *Non-English* Apps**

Some of the apps are meant for a *non-English-speaking* audience, which is not our target. See the examples below:

In [14]:
print(clean_googleplay[4412][0])
print(clean_googleplay[7940][0])


中国語 AQリスニング
لعبة تقدر تربح DZ


- Remove the *non-English* apps by checking whether every character in the app name falls within the ASCII range (0-127), and drop the names that contain characters outside that range.
- The check is performed on each individual character of the app name.

In [15]:
def english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))

True
False
False
False


- Some *English* apps would be lost if we removed every app whose name contains a character outside the ASCII range, because emojis and symbols such as '™' also fall outside it.
- To avoid this, we'll remove only the apps whose names have more than three characters outside the ASCII range.

In [16]:
def english(string):
    outside_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            outside_ascii += 1
    
    if outside_ascii > 3:
        return False
    else:
        return True
            
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))    
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False
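The cut-off of three characters is a heuristic, so it can help to make it a parameter and see how the result changes. The sketch below is one possible variant; 'is_probably_english' and 'max_non_ascii' are hypothetical names that do not appear in the notebook.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if at most `max_non_ascii` characters fall outside ASCII."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))
print(is_probably_english('Instachat 😜'))
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
# A stricter threshold reproduces the first version of the check.
print(is_probably_english('Instachat 😜', max_non_ascii=0))
```

With the default threshold the function behaves like the second 'english()' definition above; the heuristic is still imperfect, since a short non-English name with three or fewer non-ASCII characters would pass.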


Now let's remove the *non-English* apps from both the App Store and Google Play datasets.

In [17]:
googleplay_english = []
appstore_english = []

for app in clean_googleplay:
    app_name = app[0]
    if english(app_name):
        googleplay_english.append(app)
        
for app in appstore_data:
    app_name = app[1]
    if english(app_name):
        appstore_english.append(app)
        
explore_data(googleplay_english, 2, 5, True)
print('\n')
explore_data(appstore_english, 2, 5, True)

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9614
Number of columns: 13


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number 

**Isolating Free Apps**

In [18]:
free_android_apps = []
free_ios_apps = []

for app in googleplay_english:
    price = app[7]
    if price == '0':
        free_android_apps.append(app)
        
for app in appstore_english:
    price = app[4]
    if price == '0.0':
        free_ios_apps.append(app)
        
print(len(free_android_apps))
print(len(free_ios_apps))

8864
3222
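The filter above compares price strings, which works only as long as free apps are written exactly as '0' or '0.0'. A more tolerant check could convert the price to a number first. This sketch assumes that paid prices may carry a leading '$' (for example '$4.99'), which is an assumption about the data rather than something shown above; 'is_free' is a hypothetical helper.

```python
def is_free(price_text):
    """Return True when the price string represents zero.

    Assumes prices look like '0', '0.0' or '$4.99'; any other
    format raises ValueError from float().
    """
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0


for price in ['0', '0.0', '$4.99', '2.99']:
    print(repr(price), '->', is_free(price))
```

Because the same function handles both price formats, it could be applied to column 7 of the Google Play rows and column 4 of the App Store rows alike.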


**What App Genres Are Most Popular Among Both The Android and iOS Users?**
- The idea is to focus on the most popular genres, because these offer the most obvious opportunity to boost revenue. For instance, it would make sense to place more ads in the most popular free apps.
- To find the most common genres, we'll build frequency tables based on the **'prime_genre'** column (App Store data set) and the **'Genres'** and **'Category'** columns (Google Play data set).

In [19]:
print(googleplay_header)
print('\n')
print(googleplay_data[1])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


In [20]:
print(appstore_header)
print('\n')
print(appstore_data[1])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


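Now let's build the frequency tables and express them as percentages. 'frequency_table()' returns a dictionary that maps each value in a column to its share of the rows, and 'display_table()' prints those shares in descending order.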

In [21]:
def frequency_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = frequency_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
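The standard library's 'collections.Counter' offers a shorter route to the same kind of table. The sketch below is an alternative, not the notebook's implementation; 'frequency_percentages' and the sample rows are hypothetical.

```python
from collections import Counter


def frequency_percentages(dataset, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100)
            for value, count in counts.most_common()]


# Toy rows: [name, genre] (hypothetical sample).
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = frequency_percentages(sample, 1)
for value, percentage in table:
    print(value, ':', percentage)
```

Values with equal counts may appear in a different order than in 'display_table()', which breaks ties by sorting on the key as well.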

Analyzing the frequency table for the **'prime_genre'** column:

In [22]:
display_table(free_ios_apps, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


From the table we see that:
- 'Games' is the most common genre among the free *English* apps, while 'Entertainment' is the runner-up.
- 'Games' accounts for more than half (about 58%) of the apps.
- 'Catalogs', 'Medical' and 'Navigation' are the least common genres.

Generally, most free *English* apps on iOS are built for entertainment purposes.

We would not rely on this frequency table alone to recommend an app profile for the App Store market. Having more apps of a certain genre does not necessarily mean that those apps attract the most users. This still needs to be investigated further.

Analyzing the frequency tables for the **'Genres'** and **'Category'** columns:

In [23]:
display_table(free_android_apps, -4) # Genres

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [24]:
display_table(free_android_apps, 1) # Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

From the tables we see that:
- The most common genres at the category level are 'Family', 'Game' and 'Tools'. The 'Category' column gives a more general picture, whereas the 'Genres' column provides a more granular breakdown of the apps.
- The split between practical and entertainment apps seems to be more balanced in Google Play.

Two differences stand out when comparing the App Store with Google Play:
- Unlike in the App Store, no single genre accounts for more than half of the free *English* apps.
- Practical apps are not as poorly represented in Google Play as they are in the App Store.

As previously mentioned, frequency tables alone are not sufficient when recommending an app profile. We also need to consider the number of users that the apps attract. 

**Apps Having The Most Users**

We'll need to:
1. Calculate the average number of installs for each genre in Google Play (using the 'Category' column).
2. Calculate the average number of user ratings per genre in the App Store (since this dataset does not include the number of installs).

Average number of user ratings per genre:

In [25]:
appstore_genres = frequency_table(free_ios_apps, -5)

for genre in appstore_genres:
    total = 0
    len_genre = 0
    for app in free_ios_apps:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
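The nested loop above scans the whole dataset once per genre. The same averages can be collected in a single pass by keeping a running total and a count per genre. This is a minimal sketch of that idea; 'average_by_group' is a hypothetical helper, and the sample rows are trimmed to three columns using rating counts that appear elsewhere in this notebook's output.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` for each value of `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Trimmed toy rows: [track_name, prime_genre, rating_count_tot].
sample = [
    ['Waze', 'Navigation', '345046'],
    ['Google Maps', 'Navigation', '154911'],
    ['Duolingo', 'Education', '162701'],
]

averages = average_by_group(sample, 1, 2)
for genre, avg in averages.items():
    print(genre, ':', avg)
```

On the full dataset the call would be along the lines of 'average_by_group(free_ios_apps, -5, 5)'.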


Navigation apps have the highest average number of user ratings.

In [26]:
for app in free_ios_apps:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) 

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Although navigation apps have the highest average number of ratings, almost all of those ratings belong to two apps, Waze and Google Maps. The average is therefore not representative of the 'Navigation' genre as a whole.
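The skew can be quantified from the six rating counts printed above. The sketch below computes the share of ratings held by the two largest apps and compares the mean with the median; 'top_share' is a hypothetical helper, and the numbers are copied from the output of the previous cell.

```python
from statistics import mean, median

# Rating counts copied from the Navigation output above.
navigation_ratings = {
    'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
    'Google Maps - Navigation & Transit': 154911,
    'Geocaching®': 12811,
    'CoPilot GPS – Car Navigation & Offline Maps': 3582,
    'ImmobilienScout24: Real Estate Search in Germany': 187,
    'Railway Route Search': 5,
}


def top_share(counts, n):
    """Fraction of the total held by the n largest values."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total if total else 0.0


values = list(navigation_ratings.values())
share_top_two = top_share(values, 2)
print('Share of top two apps (%):', round(share_top_two * 100, 1))
print('Mean:', mean(values))
print('Median:', median(values))
```

The two largest apps hold roughly 97% of the ratings, and the median (8196.5) is far below the mean (about 86090), which is why the genre average is a poor guide here.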

Consider 'Education' apps:

In [27]:
for app in free_ios_apps:
    if app[-5] == 'Education':
        print(app[1], ':', app[5]) 

Duolingo - Learn Spanish, French and more : 162701
Guess My Age  Math Magic : 123190
Lumosity - Brain Training : 96534
Elevate - Brain Training and Games : 58092
Fit Brains Trainer : 46363
ClassDojo : 35440
Memrise: learn languages : 20383
Peak - Brain Training : 20322
Canvas by Instructure : 19981
ABCmouse.com - Early Learning Academy : 18749
Quizlet: Study Flashcards, Languages & Vocabulary : 16683
Photomath - Camera Calculator : 16523
iTunes U : 15801
Blackboard Mobile Learn™ : 13567
Star Chart : 13482
Remind: Fast, Efficient School Messaging : 9796
PBS KIDS Video : 8651
Toca Kitchen Monsters : 8062
Toca Hair Salon - Christmas Gift : 8049
Edmodo : 7197
Prodigy Math Game : 6683
Epic! - Unlimited Books for Kids : 6676
ChineseSkill -Learn Mandarin Chinese Language Free : 6077
Google Classroom : 5942
TED : 5782
Khan Academy: you can learn anything : 5459
Got It - Homework Help Math, Chem, Physics Solver : 4903
PowerSchool Mobile : 4547
SkyView® Free - Explore the Universe : 4188
Hopsco

**Recommended App Profile in Appstore**

I would recommend investing in an app that falls within this genre. The most attractive aspect of this market is that the key players each cover a different area of education, so there is great potential for a new, differentiated app to earn a reputable place in the genre. For example, 'Duolingo - Learn Spanish, French and more' is a language app, whereas 'Guess My Age  Math Magic' and 'Lumosity - Brain Training' are intellectual game apps. Yet the three have comparable numbers of ratings, which suggests healthy competition.

More specifically, I would recommend a language-learning app. Although the app with the most ratings is a language app, there is a huge gap between it and the next language app, 'Memrise: learn languages' (162701 ratings against 20383).

We can build a language app that incorporates more interesting languages alongside the most commonly spoken ones. We can also include games, dictionaries, short stories and songs that help people learn new languages!

Average number of installs for each category in Google Play:

In [28]:
categories_android = frequency_table(free_android_apps, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in free_android_apps:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
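The install counts are stored as strings such as '1,000,000+', so they have to be cleaned before any arithmetic. The cleaning step used in the loop above can be isolated in a small function; 'parse_installs' is a hypothetical name introduced here for illustration.

```python
def parse_installs(text):
    """Convert an install string such as '5,000,000+' to an integer."""
    return int(text.replace(',', '').replace('+', ''))


for raw in ['1,000,000,000+', '5,000,000+', '100,000+']:
    print(raw, '->', parse_installs(raw))
```

Note that a value like '100,000+' is treated as exactly 100,000 installs, so the resulting averages are best read as rough estimates rather than precise figures.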

Communication apps have the highest average number of installs, but the category is dominated by a few market giants such as WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts.

In [29]:
for app in free_android_apps:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 
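To see how much a few giants pull the average up, the average can be recomputed after leaving out the apps at or above some install threshold. The sketch below does this on five rows copied from the output above; 'average_installs' and the cap of 100,000,000 are hypothetical choices made for illustration.

```python
# Five rows copied from the COMMUNICATION output above.
sample_installs = {
    'WhatsApp Messenger': '1,000,000,000+',
    'Messenger for SMS': '10,000,000+',
    'My Tele2': '5,000,000+',
    'imo beta free calls and text': '100,000,000+',
    'Contacts': '50,000,000+',
}


def average_installs(install_texts, cap=None):
    """Average install counts, optionally ignoring apps with `cap` or more installs."""
    values = [int(t.replace(',', '').replace('+', '')) for t in install_texts]
    if cap is not None:
        values = [v for v in values if v < cap]
    return sum(values) / len(values) if values else 0.0


overall = average_installs(sample_installs.values())
without_giants = average_installs(sample_installs.values(), cap=100_000_000)
print('All five apps:', overall)
print('Below 100,000,000 installs:', without_giants)
```

On this small sample the average drops from 233 million to under 22 million once the two largest apps are excluded, which illustrates how strongly a category average can depend on a handful of apps.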

In [30]:
for app in free_android_apps:
    if app[1] == 'EDUCATION':
        print(app[0], ':', app[5])

English Communication - Learn English for Chinese (Learn English for Chinese) : 100,000+
Khan Academy : 5,000,000+
Ai La Trieu Phu - ALTP Free : 100,000+
Learn Spanish - Español : 1,000,000+
Speed Reading : 500,000+
English for beginners : 1,000,000+
Mermaids : 5,000,000+
Learn Japanese, Korean, Chinese Offline & Free : 1,000,000+
Kids Mode : 500,000+
Dinosaurs Coloring Pages : 500,000+
Cars Coloring Pages : 1,000,000+
Math Tricks : 10,000,000+
Learn English Words Free : 5,000,000+
Japanese / English one-shop search dictionary - Free Japanese - English - Japanese dictionary application : 50,000+
English speaking texts : 1,000,000+
Thai Handwriting : 1,000,000+
THAI DICT 2018 : 1,000,000+
Kanji test · Han search Kanji training (free version) : 1,000,000+
Flippy Campus - Buy & sell on campus at a discount : 500,000+
Free intellectual training game application | : 1,000,000+
ABC Preschool Free : 5,000,000+
PINKFONG Baby Shark : 1,000,000+
English words application mikan : 500,000+
Learn E

**Recommended App Profile in Googleplay**

I would still recommend building an education app for Google Play. This fits the initial goal of finding an app profile that attracts users on both iOS and Android.

In this category, installs appear to be spread across more apps than in 'COMMUNICATION'.
Most importantly, there is plenty of room for innovation. We can follow the same approach recommended for the App Store: a language app that incorporates more interesting languages alongside the most commonly spoken ones, with games, dictionaries, short stories and songs that help people learn new languages!