# Dataquest App Profiles for App Store and Google Play Store

Purpose: To help developers understand what type of apps are likely to attract more users on Google Play and the App Store

In [1]:
# function to open datasets

def open_sesame(data):
    opened_file=open(data)
    from csv import reader
    read_file=reader(opened_file)
    dataset=list(read_file)
    return dataset

Apple=open_sesame('AppleStore.csv')
Android=open_sesame('googleplaystore.csv')

In [2]:
# function to explore dataset 

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

explore_data(Apple,0,5,rows_and_columns=True)
explore_data(Android,0,5,rows_and_columns=True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Ed

## Data Cleaning

- Removing missing entries

- Removing duplicate entries

- Removing non-English apps

- Removing apps that are not free

In [3]:
# to find out which rows have missing data

for row in Android:
    header_length = len(Android[0])
    row_length = len(row)
    if row_length != header_length:
        print(row)
        print('Row '+ str(Android.index(row)) + ' has missing data!!')
        

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Row 10473 has missing data!!


In [4]:
# found out Row 10473's Category data and Genres data are missing
# used loop to delete row so that won't delete more than once 

for row in Android:
    header_length = len(Android[0])
    row_length = len(row)
    if row_length != header_length:
        print('Row '+ str(Android.index(row))+ ' has been deleted due to missing data!')
        del row

Row 10473 has been deleted due to missing data!


In [5]:
# created a function to find missing data and then delete the row

def missing_data_deleted(dataset):
    for row in dataset:
        header_length = len(dataset[0])
        row_length = len(row)
        if row_length != header_length:
            print(row)
            row_index=dataset.index(row)
            print('Row '+ str(row_index) + ' has missing data!!')
            print('Row '+ str(row_index)+ ' from' + ' dataset'+ ' has been deleted due to missing data!')
            del dataset[row_index]
    return dataset


In [6]:
Android=missing_data_deleted(Android)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Row 10473 has missing data!!
Row 10473 from dataset has been deleted due to missing data!


In [7]:
Apple=missing_data_deleted(Apple)

In [8]:
# confirm if there is indeed a reduction in rows in the dataset

explore_data(Apple,0,3,True)
explore_data(Android,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone

Successfully removed rows with missing data. Now, I gotta identify and remove the duplicate entries

In [9]:
Android_unique=[] # empty list to hold names of unique entries 
Android_duplicates=[] # empty list to hold names of duplicate entries

for i in Android[1:]:
    name=i[0]
    if name not in Android_unique:
        Android_unique.append(name)
    else:
        Android_duplicates.append(name)

print('There are {} duplicate apps'.format(len(Android_duplicates)))
print('Some examples are {}'.format(Android_duplicates[:5]))

There are 1181 duplicate apps
Some examples are ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


I will be keeping the most updated duplicate entry which will be based on highest number of reviews

#### Cleaning for Android dataset first

In [10]:
print('Expected length of updated dataset:',len(Android[1:])-1181)

Expected length of updated dataset: 9659


In [11]:
# store values of highest ratings for apps

reviews_max={}

for i in Android[1:]:
    name=i[0]
    n_reviews=i[3]
    if name in reviews_max and reviews_max[name]< n_reviews:
        reviews_max[name]=n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews

# Check if it matches expected length of updated dataset

print(len(reviews_max))


9659


In [12]:
Android_unique=[]
already_added=[]

# add to the Android_unique list if the number of reviews is the highest and was not in the already_added list

for i in Android[1:]:
    name=i[0]
    n_reviews=i[3]
    if n_reviews==reviews_max[name] and name not in already_added:
        Android_unique.append(i)
        already_added.append(name)

# verify if number of rows is as expected

explore_data(Android_unique,0,4,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9659
Number of columns: 13


As shown above, the Android dataset no longer has any duplicates and no missing data

#### Check for duplicates in Apple dataset, similar to what I did for Android dataset

In [13]:
Apple_unique=[] # empty list to hold names of unique entries 
Apple_duplicates=[] # empty list to hold names of duplicate entries

for i in Apple[1:]:
    name=i[1]
    if name not in Apple_unique:
        Apple_unique.append(name)
    else:
        Apple_duplicates.append(name)

print('There are {} duplicate apps'.format(len(Apple_duplicates)))
print('Some examples are {}'.format(Apple_duplicates[:5]))

There are 2 duplicate apps
Some examples are ['Mannequin Challenge', 'VR Roller Coaster']


#### Cleaning for Apple dataset

In [14]:
print('Expected length of updated dataset:',len(Apple[1:])-2)

Expected length of updated dataset: 7195


In [15]:
# store values of highest ratings for apps

reviews_max_apple={}

for i in Apple[1:]:
    name=i[1]
    n_reviews=i[5]
    if name in reviews_max_apple and reviews_max_apple[name]< n_reviews:
        reviews_max_apple[name]=n_reviews
    elif name not in reviews_max_apple:
        reviews_max_apple[name]=n_reviews

# Check if it matches expected length of updated dataset

print(len(reviews_max_apple))


7195


In [16]:
Apple_unique=[]
already_added_apple=[]

# add to the Apple_unique list if the number of reviews is the highest and was not in the already_added_apple list

for i in Apple[1:]:
    name=i[1]
    n_reviews=i[5]
    if n_reviews==reviews_max_apple[name] and name not in already_added_apple:
        Apple_unique.append(i)
        already_added_apple.append(name)

# verify if number of rows is as expected

explore_data(Apple_unique,0,4,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7195
Number of columns: 16


Now both Android and Apple datasets have no missing data and no duplicates

#### Remove non-English Apps

In [17]:
# function to check if any characters in string are not English
# if there are more than 3 characters not in ASCII range, the function returns False

def check_english(a_string):
    count=0
    for i in a_string:
        if ord(i) >127:
            count+=1
    if count>3:
        return False
    return True 
print(check_english('Instachat 😜'))
print(check_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


In [18]:
Android_unique_ENG=[] # list of only english apps for android dataset

for i in Android_unique:
    name=i[0]
    if check_english(name):
        Android_unique_ENG.append(i)

explore_data(Android_unique_ENG,0,4, True)        


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9614
Number of columns: 13


In [19]:
Apple_unique_ENG=[] # list of only english apps for Apple dataset

for i in Apple_unique:
    name=i[1]
    if check_english(name):
        Apple_unique_ENG.append(i)
explore_data(Apple_unique_ENG,0,4, True) 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 6181
Number of columns: 16


#### Both Android and Apple datasets now only have English apps, do not have missing data and do not have duplicates

In [20]:
Android_free_ENG=[]
Apple_free_ENG=[]

for i in Android_unique_ENG:
    price=(i[6])
    if price == 'Free':
        Android_free_ENG.append(i)

for i in Apple_unique_ENG:
    price=float(i[4])
    if price == 0:
        Apple_free_ENG.append(i) 


Android_header_row=Android[0]
Apple_header_row=Apple[0]

# add back header rows to the cleaned data

Android_free_ENG.insert(0,Android_header_row)
Apple_free_ENG.insert(0,Apple_header_row)

Both datasets have now been completely cleaned

## Analysing what type of app will be popular in both the App Store and Google Play store 

In [21]:
# function to generate frequency table(%) for any column of interest for any dataset

def freq_table(dataset,index):
    total=0
    dict={}
    for i in dataset[1:]:
        total+=1
        interest=i[index]
        if interest not in dict:
            dict[interest]=1
        else:
            dict[interest]+=1
    for i in dict:
        dict[i]/= (total/100)
        dict[i]=round(dict[i],2)
    
    return dict

In [22]:
# create a function that sorts the frequency in descending order

def display_table(dataset,index):
    sorted_dict={}
    sorted_dict_keys=sorted(freq_table(dataset,index), key=freq_table(dataset,index).get, reverse=True)

    for i in sorted_dict_keys:
        sorted_dict[i]=freq_table(dataset,index)[i]

    return sorted_dict

### Analysis of Apple Store apps 

In [23]:
# Based on prime_genre column

display_table(Apple_free_ENG,11)

{'Games': 58.14,
 'Entertainment': 7.89,
 'Photo & Video': 4.97,
 'Education': 3.66,
 'Social Networking': 3.29,
 'Shopping': 2.61,
 'Utilities': 2.52,
 'Sports': 2.14,
 'Music': 2.05,
 'Health & Fitness': 2.02,
 'Productivity': 1.74,
 'Lifestyle': 1.58,
 'News': 1.34,
 'Travel': 1.24,
 'Finance': 1.12,
 'Weather': 0.87,
 'Food & Drink': 0.81,
 'Reference': 0.56,
 'Business': 0.53,
 'Book': 0.43,
 'Navigation': 0.19,
 'Medical': 0.19,
 'Catalogs': 0.12}

The most common app genre is Games, followed by Entertainment. On the other hand, apps that are for more practical purposes are less common. 

Unfortunately, the Apple app store dataset does not provide the number of installs to provide more insights. 

### Analysis of Android Google Play Store apps

In [24]:
# Based on genre column

display_table(Android_free_ENG,9)

{'Tools': 8.44,
 'Entertainment': 6.07,
 'Education': 5.35,
 'Business': 4.59,
 'Lifestyle': 3.89,
 'Productivity': 3.89,
 'Finance': 3.7,
 'Medical': 3.52,
 'Sports': 3.46,
 'Personalization': 3.32,
 'Communication': 3.24,
 'Action': 3.1,
 'Health & Fitness': 3.08,
 'Photography': 2.95,
 'News & Magazines': 2.8,
 'Social': 2.66,
 'Travel & Local': 2.32,
 'Shopping': 2.25,
 'Books & Reference': 2.14,
 'Simulation': 2.04,
 'Dating': 1.86,
 'Arcade': 1.85,
 'Video Players & Editors': 1.77,
 'Casual': 1.75,
 'Maps & Navigation': 1.4,
 'Food & Drink': 1.24,
 'Puzzle': 1.13,
 'Racing': 0.99,
 'Libraries & Demo': 0.94,
 'Role Playing': 0.94,
 'Auto & Vehicles': 0.93,
 'Strategy': 0.9,
 'House & Home': 0.82,
 'Weather': 0.8,
 'Events': 0.71,
 'Adventure': 0.68,
 'Comics': 0.61,
 'Art & Design': 0.6,
 'Beauty': 0.6,
 'Parenting': 0.5,
 'Card': 0.44,
 'Casino': 0.43,
 'Trivia': 0.42,
 'Educational;Education': 0.39,
 'Board': 0.37,
 'Educational': 0.37,
 'Education;Education': 0.34,
 'Word': 0.2

In [25]:
# Based on the category column

display_table(Android_free_ENG,1)

{'FAMILY': 18.93,
 'GAME': 9.69,
 'TOOLS': 8.45,
 'BUSINESS': 4.59,
 'LIFESTYLE': 3.9,
 'PRODUCTIVITY': 3.89,
 'FINANCE': 3.7,
 'MEDICAL': 3.52,
 'SPORTS': 3.4,
 'PERSONALIZATION': 3.32,
 'COMMUNICATION': 3.24,
 'HEALTH_AND_FITNESS': 3.08,
 'PHOTOGRAPHY': 2.95,
 'NEWS_AND_MAGAZINES': 2.8,
 'SOCIAL': 2.66,
 'TRAVEL_AND_LOCAL': 2.34,
 'SHOPPING': 2.25,
 'BOOKS_AND_REFERENCE': 2.14,
 'DATING': 1.86,
 'VIDEO_PLAYERS': 1.79,
 'MAPS_AND_NAVIGATION': 1.4,
 'FOOD_AND_DRINK': 1.24,
 'EDUCATION': 1.17,
 'ENTERTAINMENT': 0.96,
 'LIBRARIES_AND_DEMO': 0.94,
 'AUTO_AND_VEHICLES': 0.93,
 'HOUSE_AND_HOME': 0.82,
 'WEATHER': 0.8,
 'EVENTS': 0.71,
 'PARENTING': 0.65,
 'ART_AND_DESIGN': 0.64,
 'COMICS': 0.62,
 'BEAUTY': 0.6}

In comparison to the Apple App Store, the Android Google Play Store has a higher proportion of apps that fulfill practical purposes. The Google Play Store also has a smaller percentage of games and entertainment apps.

### Comparing number of installs 

The Apple App store dataset does not have information on the number of installs. Hence, I will be using the total number of user ratings as a proxy for the number of installs and then calculate the average number of user ratings for each genre.  

In [26]:
# Number of installs for each genre for Apple App Store

genre_apple_installs={}

for i in display_table(Apple_free_ENG,11):
    if i not in genre_apple_installs:
        genre_apple_installs[i]=0

for genre in genre_apple_installs:
    total=0
    len_genre=0
    for row in Apple_free_ENG[1:]:
        if row[11]==genre:
            total+=int(row[5]) # column 5 is total rating count
            len_genre+=1
    genre_apple_installs[genre]=round(total/len_genre)

genre_apple_installs



{'Games': 22813,
 'Entertainment': 14030,
 'Photo & Video': 28442,
 'Education': 7004,
 'Social Networking': 71548,
 'Shopping': 26920,
 'Utilities': 18684,
 'Sports': 23009,
 'Music': 57327,
 'Health & Fitness': 23298,
 'Productivity': 21028,
 'Lifestyle': 16486,
 'News': 21248,
 'Travel': 28244,
 'Finance': 31468,
 'Weather': 52280,
 'Food & Drink': 33334,
 'Reference': 74942,
 'Business': 7491,
 'Book': 39758,
 'Navigation': 86090,
 'Medical': 612,
 'Catalogs': 4004}

The Android Google Play Store has information on the number of installs as seen in the Installs column. However, it does not have the absolute number of installs. Rather, it shows 100,000+ or 10,000+ installs for example. 

To work around this, I will just ignore the '+' and use the value as a proxy (eg. 100,000+ as 100,000)

In [27]:
# Number of installs for each genre for Android Google Play Store

genre_android_installs={}

for i in display_table(Android_free_ENG,1): # here I used the Category column instead of Genre column
    if i not in genre_android_installs:
        genre_android_installs[i]=0

for genre in genre_android_installs:
    for row in Android_free_ENG[1:]:
        installs=row[5].replace('+','')
        installs=installs.replace(',','')
        installs=int(installs) # need to convert the string to integer
        if row[1]==genre:
            genre_android_installs[genre]+=installs # column 5 is total rating count

genre_android_installs



{'FAMILY': 6198995690,
 'GAME': 13366869450,
 'TOOLS': 8001043474,
 'BUSINESS': 696902090,
 'LIFESTYLE': 497484429,
 'PRODUCTIVITY': 5791629314,
 'FINANCE': 455163132,
 'MEDICAL': 37632344,
 'SPORTS': 1095230683,
 'PERSONALIZATION': 1529235888,
 'COMMUNICATION': 11036906201,
 'HEALTH_AND_FITNESS': 1143548402,
 'PHOTOGRAPHY': 4647268815,
 'NEWS_AND_MAGAZINES': 2368196260,
 'SOCIAL': 5487861902,
 'TRAVEL_AND_LOCAL': 2894704086,
 'SHOPPING': 1400338585,
 'BOOKS_AND_REFERENCE': 1665884260,
 'DATING': 140914757,
 'VIDEO_PLAYERS': 3931731720,
 'MAPS_AND_NAVIGATION': 503060780,
 'FOOD_AND_DRINK': 211738751,
 'EDUCATION': 189350000,
 'ENTERTAINMENT': 989460000,
 'LIBRARIES_AND_DEMO': 52995810,
 'AUTO_AND_VEHICLES': 53080061,
 'HOUSE_AND_HOME': 97202461,
 'WEATHER': 360288520,
 'EVENTS': 15973160,
 'PARENTING': 31471010,
 'ART_AND_DESIGN': 113221100,
 'COMICS': 44971150,
 'BEAUTY': 27197050}

In [28]:
# create a function that sorts the frequency in descending order for dictionaries

def sort_dictionary(dictionary):
    sorted_dict={}
    sorted_dict_keys=sorted(dictionary, key=dictionary.get, reverse=True)

    for i in sorted_dict_keys:
        sorted_dict[i]=dictionary[i]

    return sorted_dict

In [29]:
sort_dictionary(genre_apple_installs)

{'Navigation': 86090,
 'Reference': 74942,
 'Social Networking': 71548,
 'Music': 57327,
 'Weather': 52280,
 'Book': 39758,
 'Food & Drink': 33334,
 'Finance': 31468,
 'Photo & Video': 28442,
 'Travel': 28244,
 'Shopping': 26920,
 'Health & Fitness': 23298,
 'Sports': 23009,
 'Games': 22813,
 'News': 21248,
 'Productivity': 21028,
 'Utilities': 18684,
 'Lifestyle': 16486,
 'Entertainment': 14030,
 'Business': 7491,
 'Education': 7004,
 'Catalogs': 4004,
 'Medical': 612}

In [30]:
sort_dictionary(genre_android_installs)

{'GAME': 13366869450,
 'COMMUNICATION': 11036906201,
 'TOOLS': 8001043474,
 'FAMILY': 6198995690,
 'PRODUCTIVITY': 5791629314,
 'SOCIAL': 5487861902,
 'PHOTOGRAPHY': 4647268815,
 'VIDEO_PLAYERS': 3931731720,
 'TRAVEL_AND_LOCAL': 2894704086,
 'NEWS_AND_MAGAZINES': 2368196260,
 'BOOKS_AND_REFERENCE': 1665884260,
 'PERSONALIZATION': 1529235888,
 'SHOPPING': 1400338585,
 'HEALTH_AND_FITNESS': 1143548402,
 'SPORTS': 1095230683,
 'ENTERTAINMENT': 989460000,
 'BUSINESS': 696902090,
 'MAPS_AND_NAVIGATION': 503060780,
 'LIFESTYLE': 497484429,
 'FINANCE': 455163132,
 'WEATHER': 360288520,
 'FOOD_AND_DRINK': 211738751,
 'EDUCATION': 189350000,
 'DATING': 140914757,
 'ART_AND_DESIGN': 113221100,
 'HOUSE_AND_HOME': 97202461,
 'AUTO_AND_VEHICLES': 53080061,
 'LIBRARIES_AND_DEMO': 52995810,
 'COMICS': 44971150,
 'MEDICAL': 37632344,
 'PARENTING': 31471010,
 'BEAUTY': 27197050,
 'EVENTS': 15973160}

It seems that in the Apple App Store, even though productive practical apps are less common, they have the highest number of users and much higher than entertainment apps

On the other hand, although the Android Google Play Store did not have a high proportion of games, the 'Games' category had the highest number of installations. 

### Recommended App Genre

Based on the above findings, it is recommended to develop an app that has a **practical value**, such as a **navigation app**. It has the highest number of users in the Apple app store and although not the highest in the Google Play Store, there are still many installations at 503 million.

This app genre is likely to have a high potential for being profitable in both the App Store and Google Play store. 



In [31]:
# Examine Navigation apps in Apple App Store

Navigation_apple={}
for i in Apple_free_ENG[1:]:
    name=i[1]
    if i[11]=='Navigation':
        Navigation_apple[name]=0

for app_name in Navigation_apple:
    for row in Apple_free_ENG[1:]:
        if row[1]==app_name:
            Navigation_apple[app_name]+=int(row[5])

Navigation_apple

{'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
 'Google Maps - Navigation & Transit': 154911,
 'Geocaching®': 12811,
 'CoPilot GPS – Car Navigation & Offline Maps': 3582,
 'ImmobilienScout24: Real Estate Search in Germany': 187,
 'Railway Route Search': 5}

In [32]:
# Examine Navigation apps in Android Google Play Store

Navigation_android={}
for i in Android_free_ENG[1:]:
    name=i[0]
    if i[1]=='MAPS_AND_NAVIGATION':
        Navigation_android[name]=0

for app_name in Navigation_android:
    for row in Android_free_ENG[1:]:
        if row[0]==app_name:
            installs=row[5].replace('+','')
            installs=installs.replace(',','')
            installs=int(installs)
            Navigation_android[app_name]+=installs

sort_dictionary(Navigation_android)

{'Waze - GPS, Maps, Traffic Alerts & Live Navigation': 100000000,
 'Uber': 100000000,
 'GPS Navigation & Offline Maps Sygic': 50000000,
 'Free GPS Navigation': 50000000,
 'MapQuest: Directions, Maps, GPS & Navigation': 10000000,
 'Yahoo! transit guide free timetable, operation information, transfer search': 10000000,
 'Yandex.Transport': 10000000,
 'Compass': 10000000,
 'Subway Terminator: Smarter Subway': 10000000,
 'Moovit: Bus Time & Train Time Live Info': 10000000,
 'AT&T DriveMode': 10000000,
 'TomTom GPS Navigation Traffic': 10000000,
 'DB Navigator': 10000000,
 'Maps, GPS Navigation & Directions, Street View': 10000000,
 'T map (te map, T map, navigation)': 5000000,
 '乗換NAVITIME\u3000Timetable & Route Search in Japan Tokyo': 5000000,
 'Transit: Real-Time Transit App': 5000000,
 'My Location: GPS Maps, Share & Save Places': 5000000,
 'Yanosik: "antyradar", traffic jams, navigation, camera': 5000000,
 'NAVITIME - Map & Transfer Navi': 5000000,
 'Sygic Car Navigation': 5000000,
 'G

## Conclusion 

Although Navigation apps have the highest installations in the Apple app store, there are only 6 apps, dominated by traffic navigation apps. 

The Android Google Play Store has a much larger variety of navigation apps. It also seems to be dominated by traffic navigation apps and transport timing apps. 

Therefore, I would suggest app developers to consider creating more niche navigation apps for more targeted groups. I listed 2 examples of navigation apps below:

1. Navigation apps for delivery services. For example, even though giants like Waze and Google Maps are widely used, they suggest routes for cars. They do not have features that consider optimal routes for other larger modes of transport like lorries or heavy trucks. 
For delivery lorries/trucks, certain routes may not be accessible due to height restrictions and following Google Maps or Waze may lead to unnecessary detours as a result. 

2. Navigation apps for trekking/ outdoor sports such as cycling. In forrested areas, satellite map views may not be helpful and internet connection may be poor. Offline trail maps that show views of the route from the ground may be better. Also, not all terrains may be suited for cycling and can get more dangerous after bad weather conditions. The app could have community features that enable outdoor enthusiasts to contribute suggestions and images of trekking/cycling routes and provide recommendations on difficulty level for completion. 

### Limitations

Both datasets were scraped in 2018. There may have been different trends as of 2022

The current analysis is only for English apps that are free. It is unclear if charging a price would render the app to be more profitable overall. 

More research needs to be done to determine the type of navigation app to be developed