## Google Play Store and App Store Data Analysis

### Table of Contents
- [Introduction](#introduction)
- [Data Cleaning](#data_cleaning)
- [Analysis](#analysis)
- [Conclusion](#conclusion)

## <div id="introduction"></div>Introduction

The two datasets contain information extracted from Google Play and the Apple App Store about the apps listed on those platforms.
Rather than relying on data-analysis libraries such as pandas, this analysis uses only the standard-library `csv` reader together with plain functions and loops to explore the data and suggest which kinds of apps to build.
## Background Information
I play the role of a data analyst for an organization that creates Android and iOS mobile apps. Our apps are distributed through Google Play and the App Store.
My company only creates apps that are free to download and are intended for the English-speaking population. We generate income via in-app advertisements, so the more time users spend using an app, the more money we make.
The developer team has asked the data analyst team to examine the platforms' data in order to gain knowledge and offer suggestions about which applications to develop next.

#### Exploring the datasets

In [242]:
# opening the files with the csv library
from csv import reader
# opening the apple dataset
opened_file = open('AppleStore.csv', encoding='utf8')
apple_r = reader(opened_file)
apple = list(apple_r)
opened_file.close()
# saving the header row of the dataset
apple_header = apple[0]
# keeping the remaining rows as the data
apple = apple[1:]

# opening the google dataset
open_file_2 = open('googleplaystore.csv', encoding='utf8')
google_r = reader(open_file_2)
google = list(google_r)
open_file_2.close()
# saving the header row of the google dataset
google_header = google[0]
# keeping the remaining rows as the data
google = google[1:]
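
The loading steps above are the same for both files, so they could be wrapped in one helper. The following is only a sketch: `load_csv` is a hypothetical name that does not appear in the notebook, and the tiny temporary file stands in for the real datasets so that the example runs on its own.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file and close the file afterwards."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# A tiny stand-in file (hypothetical content) so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows([["App", "Price"], ["Demo, the app", "0"]])
    sample_header, sample_rows = load_csv(sample_path)

print(sample_header)  # ['App', 'Price']
print(sample_rows)    # [['Demo, the app', '0']]
```

With the real files, the call would look like `apple_header, apple = load_csv('AppleStore.csv')`.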

In [243]:
# creating a function that explores a dataset
def explore_data(dataset,start,end,row_column=False):
    """ Print a slice of the dataset; accepts 4 variables
    dataset = the dataset as a list of rows, without the header row
    start = index of the first row to print
    end = index where the slice ends (this row is not printed)
    row_column = boolean, False by default; if True, also print the number of rows and columns

    """
    dataset_slice= dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if row_column:
        print('number of row:',len(dataset))
        print('number of column:',len(dataset[0]))

In [244]:
# printing the header row for apple dataset
print(apple_header)
print('\n')
# exploring the first 2 rows of the dataset; the header is excluded because it was removed in the code above
explore_data(apple,0,2, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


number of row: 7197
number of column: 16


In [245]:
# printing the header row for google dataset
print(google_header)
print('\n')
explore_data(google,0,5,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

According to the `explore_data` function, the Apple dataset contains 7197 rows and 16 columns, and the Google dataset contains 10841 rows and 13 columns.
The columns are:

| Apple Dataset | Info | Google Dataset | Info |
|---------------|------|----------------|------|
| id | App identification number | App | The app name |
| track_name | App name | Category | App category |
| size_bytes | Size (in bytes) | Rating | Average user rating |
| currency | App payment currency | Reviews | Number of user reviews |
| price | Price amount | Size | Size of the app (as when scraped) |
| rating_count_tot | User rating count (for all versions) | Installs | Number of user downloads |
| rating_count_ver | User rating count (for current version) | Type | Paid or Free |
| user_rating | Average user rating (for all versions) | Price | Price of the app |
| user_rating_ver | Average user rating (for current version) | Content Rating | Age group the app is targeted at |
| ver | Latest version code | Genres | An app can belong to multiple genres (apart from its main category) |
| cont_rating | Content rating | Last Updated | Date of the last update |
| prime_genre | Primary genre | Current Ver | Current version |
| sup_devices.num | Number of supported devices | Android Ver | Required Android version |
| ipadSc_urls.num | Number of screenshots shown for display | | |
| lang.num | Number of supported languages | | |
| vpp_lic | VPP device-based licensing enabled | | |

## <div id="data_cleaning"></div>Data Cleaning
1. Remove the row whose number of columns does not match the header
2. Drop duplicated rows
3. Remove apps with non-English names
4. Remove non-free apps


### 1. Removing the malformed row

In [246]:
# A Kaggle discussion about this dataset points out that one row is missing a column.
# Since I am not using a pandas DataFrame, I look for it by comparing row lengths.
# save the number of header columns in a variable
google_head = len(google_header)
# loop over every row in the google dataset together with its index
for index, row in enumerate(google):
    # if the row does not have the same number of columns as the header
    if len(row) != google_head:
        # print the index number
        print(index)
        # print the row in full
        print(row)

10472
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


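From the result we can see that the printed row has only 12 values while the header has 13 columns. The category value appears to be missing, so every value after the app name is shifted one column to the left.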

In [247]:
# delete the row with the del statement (run this cell only once, otherwise a valid row is deleted too)
del google[10472]
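
Because `del google[10472]` removes a different row every time the cell is run again, a filter that rebuilds the list may be safer. The sketch below is an alternative, not the notebook's method; `split_by_width` and the demo rows are made up for illustration.

```python
def split_by_width(header, rows):
    """Separate rows whose column count matches the header from those that do not."""
    good, bad = [], []
    for index, row in enumerate(rows):
        if len(row) == len(header):
            good.append(row)
        else:
            bad.append((index, row))  # keep the position for reporting
    return good, bad


# Hypothetical miniature dataset: the second row is one column short.
demo_header = ["App", "Category", "Rating"]
demo_rows = [
    ["A", "TOOLS", "4.1"],
    ["B", "1.9"],
    ["C", "GAME", "4.5"],
]
good_rows, bad_rows = split_by_width(demo_header, demo_rows)
print(len(good_rows))  # 2
print(bad_rows)        # [(1, ['B', '1.9'])]
```

Running the filter a second time on `good_rows` changes nothing, which is the property the `del` statement lacks.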

### 2. Checking for duplicates and dropping the duplicated rows

In [248]:
# loop through the rows and print every row whose app name is Instagram
for row in google:
    dup=row[0]
    if dup == "Instagram":
        print(row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


From this it is evident that there are duplicated rows in the dataset, so I split the app names into two lists: unique names and duplicated names.

In [249]:
# here I create two empty lists to store the duplicated and the unique app names
duplicated_value=[]
unique_value = []
# loop through every row in google dataset
for row in google:
    # the app name is the first column in each row
    dup_name = row[0]
    # if the app name is already in the unique list
    if dup_name in unique_value:
        # append the name to the duplicated list
        duplicated_value.append(dup_name)
    # if it is not present yet, save it as a unique value
    else:
        unique_value.append(dup_name)

In [250]:
# print the number of duplicated values
print(' The number of duplicated values are', len(duplicated_value))
# print the first 10 examples
print(' Example of the duplicated values are', duplicated_value[:10])

 The number of duplicated values are 1181
 Example of the duplicated values are ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
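
The same count can be cross-checked with `collections.Counter` from the standard library. This is only a sketch on a made-up list of names; with the real data the list would be built from the first column of `google`.

```python
from collections import Counter

# Hypothetical app names; 'Box' appears three times and 'Slack' twice.
demo_names = ["Box", "Slack", "Box", "Zenefits", "Box", "Slack"]

counts = Counter(demo_names)
n_unique = len(counts)
# Every occurrence after the first one counts as a duplicate,
# which matches how duplicated_value is filled in the loop above.
n_duplicates = sum(c - 1 for c in counts.values())
print(n_unique, n_duplicates)  # 3 3

# Names that occur more than once, with their number of occurrences.
repeated = {name: c for name, c in counts.items() if c > 1}
print(repeated)  # {'Box': 3, 'Slack': 2}
```

A `Counter` looks names up in a dictionary, so it avoids the repeated list scans of `if dup_name in unique_value`.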


### Removing duplicates and keeping the row with the highest review count
The duplicated rows are not dropped at random; I need a criterion to decide which one to keep. For example, when I print all rows that have Instagram as their app name, they are identical except for the review count. The higher the review count, the more recent the data should be, so for each app I keep only the row with the highest review count and drop the others.

In [251]:
# loop through the google dataset and save the highest review count for each app name
reviews_max = {}
# for each app, save the name column and the review column (as a float) in variables
for app in google:
    name = app[0]
    n_reviews = float(app[3])
    # if the name is already in reviews_max and this row has more reviews, update the stored maximum
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    # if the name is not in reviews_max yet, store this review count
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [252]:
# check that the dictionary has the expected number of entries
print('The expected number of unique apps is', len(google) - 1181)
print('The length of the reviews_max dictionary is', len(reviews_max))

The expected number of unique apps is 9659
The length of the reviews_max dictionary is 9659


In [253]:
# remove the duplicated apps from the google dataset and keep only the clean rows
# create 2 empty lists
app_cleaned_google = []
already_in_app = []
# for each row in the google dataset, convert the review count to float
for row in google:
    name = row[0]
    n_reviews = float(row[3])
    # if the review count equals the maximum for this name and the name has not been added yet, append the row to app_cleaned_google
    if (reviews_max[name] == n_reviews) and (name not in already_in_app):
        app_cleaned_google.append(row)
        already_in_app.append(name) 

In [254]:
# checking that the code worked as expected
explore_data(app_cleaned_google,0,2,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


number of row: 9659
number of column: 13
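
The two loops above (first find the maximum, then collect the rows) can also be written as a single pass. The sketch below is an alternative under stated assumptions: `keep_most_reviewed` is a hypothetical helper, the demo rows have only four columns, and the "Demo App" row is invented. Unlike the notebook's loop, the result is ordered by the first appearance of each name.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        # The strict '>' keeps the earliest row when review counts tie.
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Shortened rows: name, category, rating, reviews.
demo_rows = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Demo App", "TOOLS", "4.0", "10"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Instagram", "SOCIAL", "4.5", "66509917"],
]
deduplicated = keep_most_reviewed(demo_rows)
for row in deduplicated:
    print(row)
# ['Instagram', 'SOCIAL', '4.5', '66577446']
# ['Demo App', 'TOOLS', '4.0', '10']
```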


### 3. Removing non-English app names
The Google dataset is now free of duplicates. Next we need to remove apps whose names are not in English, since the company targets only English speakers.
* According to ASCII (the American Standard Code for Information Interchange), the characters commonly used in English text have code points from 0 to 127, and the built-in `ord` function returns the code point of a character.
* With this, we can flag app names that contain characters whose code point is greater than 127.

In [255]:
# creating a function that identifies strings containing non-English (non-ASCII) characters
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True
# check that 'Instagram' is treated as English
print(is_english('Instagram'))
# check a Korean word that means fine
print(is_english('벌금'))

True
False


However, some English app names contain characters such as trademark symbols, dashes, and emojis, which fall outside the ASCII range even though the name is in English.
* For this reason, we only remove app names that have more than 3 non-ASCII characters.
* FIRST: I create a function that loops through a string and returns `True` only if it has at most 3 non-ASCII characters.
* NEXT: I loop through both datasets and build new datasets that do not contain the rejected names.

In [256]:
# creating a function that counts the characters in a string that are not ASCII
def to_english(string):
    num_cha = 0
    for characters in string:
        if ord(characters) > 127:
            num_cha += 1
    # reject any name with more than 3 such characters
    if num_cha > 3:
        return False
    else:
        return True   
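
To see what the threshold does in practice, here is a compact sketch of the same rule. The names `count_non_ascii` and `looks_english` are hypothetical, and the sample strings are illustrative rather than taken from the cell outputs.

```python
def count_non_ascii(text):
    """Number of characters whose code point lies outside the ASCII range."""
    return sum(1 for ch in text if ord(ch) > 127)


def looks_english(text, max_non_ascii=3):
    """Same rule as to_english: allow at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(text) <= max_non_ascii


print(ord("a"), ord("™"))  # 97 8482

samples = [
    "Instagram",
    "Docs To Go™ Free Office Suite",  # one trademark symbol
    "Instachat 😜",                    # one emoji
    "爱奇艺PPS -《欢乐颂2》电视剧热播",   # mostly Chinese characters
]
for name in samples:
    print(looks_english(name), name)
```

The first three names pass and the last one is rejected. The threshold is a heuristic: a short non-English name with three or fewer non-ASCII characters would still pass.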

I now use this function on both datasets and keep only the apps whose names contain no more than 3 non-ASCII characters.

In [257]:
google_cleaned=[]
apple_cleaned =[]
# for the google data set
for row in app_cleaned_google:
    name=row[0]
    if to_english(name):
        google_cleaned.append(row)

# the same is done for the apple dataset

for row in apple:
    name = row[1]
    if to_english(name):
        apple_cleaned.append(row)
        

In [258]:
# check the datasets by exploring them with our function
explore_data(google_cleaned,0,2,True)
print('\n')
explore_data(apple_cleaned,0,2,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


number of row: 9614
number of column: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


number of row: 6183
number of column: 16


After removing the apps with non-English names, about 1,000 rows (7197 to 6183) were dropped from the Apple dataset, but only 45 rows (9659 to 9614) were dropped from the Google dataset.

### 4. Removing non-free apps
* Since our organization's main focus is on free apps, I drop the apps that are not free.

In [259]:
print(apple_header)
print('\n')
print(google_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [260]:
# create an empty list for android
android= []
# if the value in the price column is equal to 0, append the row
for row in google_cleaned:
    price = row[7]
    if price == '0' or price == '0.0':
        android.append(row)
# the same applies to the price column of the ios data
ios = []
for row in apple_cleaned:
    price = row[4]
    if price == '0' or price == '0.0':
        ios.append(row)
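
Comparing against the exact strings `'0'` and `'0.0'` works for the values shown in this notebook. A slightly more tolerant check could parse the price as a number instead. This is a sketch only: `is_free` is a hypothetical helper, and the `$` prefix it strips is an assumption about how paid prices may be written, not something shown in the outputs above.

```python
def is_free(price_text):
    """Return True when a price string represents zero cost."""
    # Assumption: a paid price may carry a currency sign such as '$4.99'.
    cleaned = price_text.strip().lstrip("$")
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Values that cannot be parsed are treated as not free.
        return False


for price in ["0", "0.0", "$0.00", "$4.99", "Everyone"]:
    print(repr(price), is_free(price))
```

Only the first three values are reported as free.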

In [261]:
explore_data(android,0,2,True)
print('\n')
explore_data(ios,0,2,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


number of row: 8864
number of column: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


number of row: 3222
number of column: 16


* For the Android dataset, 750 rows (9614 to 8864) were not free and were dropped.
* For the iOS dataset, almost half of the rows (6183 to 3222) were dropped, so nearly half of the English iOS apps are paid.

## <div id="analysis"></div>Analysis
*   First, we want to find the genres that are most common on iOS and Android.
*   Then, we want to find the most downloaded kinds of apps on iOS and on Android.

#### To get the list
* we create a frequency table that counts how many times a particular value occurs in whichever column we want to check
* then we convert the counts into percentages
* we use the `sorted` function to arrange them in descending order

In [262]:
# frequency table function: counts how often each value occurs in a column
# and returns the counts as percentages
def freq_table(dataset,index):
    # start with an empty dictionary and a total of zero
    table = {}
    total = 0
    # loop through the rows in the dataset and increase the total
    for row in dataset:
        total += 1
        value = row[index]
        # if the value is already in the dictionary increase its count by one, else start counting from one
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    # create an empty dictionary to store the percentages
    table_percentage = {}
    for key in table:
        # divide each count by the total and round to two decimal places
        percentage = (table[key]/total)*100
        table_percentage[key] = round(percentage,2)
    # return the percentage table
    return table_percentage

# a second function that prints the table in descending order
def table_sort(dataset,index):
    table= freq_table(dataset,index)
    arranged_table = []
    # store (percentage, name) tuples in a list so that they can be sorted by percentage
    for row in table:
        value_tuple=(table[row],row)
        arranged_table.append(value_tuple)

    table_sorted= sorted(arranged_table,reverse= True)
    for entry in table_sorted:
        print(entry[1],':',entry[0])
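
For comparison, the counting and sorting can be sketched with `collections.Counter`, whose `most_common` method returns the values from most to least frequent. `freq_table_sorted` and the demo rows are hypothetical. One difference from `table_sort`: values with equal counts keep their order of first appearance instead of being ordered by name.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [
        (value, round(count / total * 100, 2))
        for value, count in counts.most_common()
    ]


# Hypothetical rows: app name and genre.
demo_rows = [["a", "Games"], ["b", "Games"], ["c", "Music"], ["d", "Games"]]
demo_table = freq_table_sorted(demo_rows, 1)
print(demo_table)  # [('Games', 75.0), ('Music', 25.0)]
```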



In [263]:
# to print the frequency table for the ios prime_genre column
table_sort(ios,-5)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


In [264]:
# to print the frequency table for the android Category column
table_sort(android,1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


In [265]:
# to print the frequency table for the android genre column
table_sort(android,9)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

### Result:
After looking at `prime_genre` in the iOS dataset and at `Genres` and `Category` in the Android dataset:
* The iOS apps lean toward fun: more than half (58.16%) are games, followed by Entertainment (7.88%) and Photo & Video (4.97%). We can conclude that the majority of free English apps on iOS are made for fun.
* The Android dataset has two columns that look similar, `Category` and `Genres`, but `Genres` is more fine-grained than `Category`.
* Unlike on iOS, the Android apps are spread more evenly and lean toward practical apps: by `Genres`, Tools leads with 8.45%, followed by Entertainment (6.07%), Education (5.35%) and Business (4.59%). By `Category`, FAMILY (18.91%) and GAME (9.72%) come first, just ahead of TOOLS (8.46%).
* These conclusions apply only to apps that are free and in English.

### Most Popular Apps by Genre on iOS

In [266]:
# using the frequency table function to get the list of iOS genres
prime_gener=freq_table(ios,-5)
# for each genre in the table
for genre in prime_gener:
    total = 0
    len_genre= 0
    # add up the rating_count_tot column (converted to float) for the apps in this genre
    for row in ios:
        gener_app= row[-5]
        if gener_app == genre:
            n_rating = float(row[5])
            total += n_rating
            len_genre +=1

    avg_rating = total/len_genre
    print(genre,':',avg_rating)



Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
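
The averages above are printed in dictionary order, which makes the ranking hard to read. A sketch of a single-pass version that also sorts the result is shown here; `average_by_group` and the demo numbers are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Average a numeric column per group, highest average first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: genre and rating count.
demo_rows = [["Games", "100"], ["Games", "300"], ["Music", "500"], ["Weather", "50"]]
demo_ranking = average_by_group(demo_rows, 0, 1)
print(demo_ranking)  # [('Music', 500.0), ('Games', 200.0), ('Weather', 50.0)]
```

With the real data the call would be `average_by_group(ios, -5, 5)`. It visits each row once instead of looping over the whole dataset for every genre.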


The output above is not sorted. Ranked by the average number of user ratings per app, which serves here as a proxy for downloads, the top genres are Navigation (about 86,090), Reference (74,942), Social Networking (71,548), Music (57,327) and Weather (52,280). Social Networking and Music fit the earlier observation that free English iOS apps lean toward fun.
* Next, I look at the individual apps in one of these top genres to get more insight.

In [267]:
# getting the apps that makes up the social networking
for row in ios:
    if row[-5] == "Social Networking":
        print(row[1],':',row[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Although social networking apps are among the most popular, the list for this genre shows that a few apps, such as Facebook, Pinterest and Skype, push the average up. I would therefore not recommend this type of app to my organization: the genre is dominated by apps that are already popular, and competing with them requires large resources.
* Next, I look into a less popular genre.

In [268]:
# more insight into health and fitness
for row in ios:
    if row[-5] == "Health & Fitness":
        print(row[1],':',row[5])

Calorie Counter & Diet Tracker by MyFitnessPal : 507706
Lose It! – Weight Loss Program and Calorie Counter : 373835
Weight Watchers : 136833
Sleep Cycle alarm clock : 104539
Fitbit : 90496
Period Tracker Lite : 53620
Nike+ Training Club - Workouts & Fitness Plans : 33969
Plant Nanny - Water Reminder with Cute Plants : 27421
Sworkit - Custom Workouts for Exercise & Fitness : 16819
Clue Period Tracker: Period & Ovulation Tracker : 13436
Headspace : 12819
Fooducate - Lose Weight, Eat Healthy,Get Motivated : 11875
Runtastic Running, Jogging and Walking Tracker : 10298
WebMD for iPad : 9142
8fit - Workouts, meal plans and personal trainer : 8730
Garmin Connect™ Mobile : 8341
Record by Under Armour, connects with UA HealthBox : 7754
Fitstar Personal Trainer : 7496
My Cycles Period and Ovulation Tracker : 7469
Seven - 7 Minute Workout Training Challenge : 6808
RUNNING for weight loss: workout & meal plans : 6407
Lifesum – Inspiring healthy lifestyle app : 5795
Waterlogged - Daily Hydration Tr

I could recommend a utility-type app to my organization, but I want one that keeps users engaged for longer, since the majority of our revenue comes from ads. An example is a fitness tracker with a period tracker added to it.

### Most Popular Apps by Genre on Android
* Unlike the iOS dataset, which has a column for the total rating count, the Google dataset does not contain such a column.
* We therefore use the number of installs instead, which requires some clean-up because values such as `1,000,000+` are stored as text.

In [269]:
# using the frequency table function to get the list of categories
categories_android = freq_table(android, 1)
# for each category add up the Installs column of its apps
for category in categories_android:
    total = 0
    len_category = 0
    for app in android:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            # strip the ',' and '+' characters, then convert to float before adding
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    # compute the average for the category and print it
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
        

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
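
The clean-up of the `Installs` text can be isolated in a small helper, which makes it easy to check on its own. `parse_installs` is a hypothetical name; the sketch assumes values are formatted like the ones printed earlier, for example `10,000+`.

```python
def parse_installs(text):
    """Turn an Installs value such as '1,000,000+' into an integer."""
    return int(text.replace(",", "").replace("+", ""))


for raw in ["10,000+", "1,000,000+", "1,000,000,000+"]:
    print(raw, "->", parse_installs(raw))
# 10,000+ -> 10000
# 1,000,000+ -> 1000000
# 1,000,000,000+ -> 1000000000
```

Note that these values are lower bounds (`10,000+` means at least 10,000 installs), so averages built from them are approximate.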

I now look at the Health & Fitness category, since that is what I examined in the iOS section, to see whether my recommendation also applies to the Google dataset.

In [270]:
# checking the health and fitness category for the apps with between 1,000,000+ and 50,000,000+ installs
for app in android:
    if app[1] == 'HEALTH_AND_FITNESS' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Lose Belly Fat in 30 Days - Flat Stomach : 5,000,000+
Pedometer - Step Counter Free & Calorie Burner : 1,000,000+
Six Pack in 30 Days - Abs Workout : 10,000,000+
Lose Weight in 30 Days : 10,000,000+
Pedometer : 10,000,000+
LG Health : 10,000,000+
Step Counter - Pedometer Free & Calorie Counter : 10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App : 10,000,000+
Sportractive GPS Running Cycling Distance Tracker : 1,000,000+
30 Day Fitness Challenge - Workout at Home : 10,000,000+
Home Workout for Men - Bodybuilding : 1,000,000+
Sleep Sounds : 1,000,000+
Fitbit : 10,000,000+
Calorie Counter - EasyFit free : 1,000,000+
Garmin Connect™ : 10,000,000+
BetterMe: Weight Loss Workouts : 5,000,000+
Bike Computer - GPS Cycling Tracker : 1,000,000+
Running Distance Tracker + : 1,000,000+
Runkeeper - GPS Track Run Walk : 10,000,000+
Walking: Pedometer diet : 1,000,000+
8fit Workouts & Meal Planner : 10,000,000+
Keep Trainer - Workout Trainer & Fitness Coach : 1,000,000+
PumpUp — Fitness Co
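
The chain of `or` comparisons in the cell above selects the install buckets from `1,000,000+` to `50,000,000+`. The same selection can be sketched as a numeric range check. `installs_between` is a hypothetical helper, the demo rows hold only a name and an install count, and the two "Demo" rows are invented.

```python
def installs_between(rows, low, high, installs_idx=5):
    """Keep rows whose install count lies between low and high (inclusive)."""
    selected = []
    for row in rows:
        n_installs = int(row[installs_idx].replace(",", "").replace("+", ""))
        if low <= n_installs <= high:
            selected.append(row)
    return selected


demo_rows = [
    ["Fitbit", "10,000,000+"],
    ["Sleep Sounds", "1,000,000+"],
    ["Demo Small", "100,000+"],
    ["Demo Huge", "500,000,000+"],
]
mid_sized = installs_between(demo_rows, 1_000_000, 50_000_000, installs_idx=1)
print([row[0] for row in mid_sized])  # ['Fitbit', 'Sleep Sounds']
```

A range check keeps working if the dataset contains a bucket that was not listed by hand.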

Unlike in the App Store list, I did not see a period tracker among these apps, but most of the others resemble the iOS ones: weight loss apps, step counters, and so on.

## <div id="conclusion"></div>Conclusion
There are many kinds of apps the company could concentrate on, and I believe the choice mostly depends on the genre it is interested in. However, as the company's analyst, I would advise against building social or entertainment apps, because a small number of apps, such as Facebook, Instagram, and Spotify, dominate those markets.
I would suggest a fitness app with a period tracker, because it encourages users to spend more time in the app, which increases the opportunity for in-app adverts, which is how we make money.