# Android and iOS mobile apps Project

Company X, which builds Android and iOS mobile apps, wants to increase its revenue. Since all of its apps are free, the company's main source of revenue is in-app ads. The company is therefore seeking to attract and engage more users: the more users who see and engage with the ads, the more money the company will make.

The goal of this project is to identify the features of apps that are likely to attract more users.

In [1]:
from csv import reader

# Open AppleStore.csv and save it as a list of lists
# (the with block closes the file once it has been read)
with open('AppleStore.csv', encoding="utf8") as opened_file:
    ios_apps = list(reader(opened_file))

# Open googleplaystore.csv and save it as a list of lists
with open('googleplaystore.csv', encoding="utf8") as opened_file:
    android_apps = list(reader(opened_file))

# Save the headers
ios_header = ios_apps[0]
ios_apps = ios_apps[1:]
android_header = android_apps[0]
android_apps = android_apps[1:]

In [2]:
# Function explore_data() to help visualise the datasets

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

# Let's explore the data sets using the explore_data() function
print("First lines of Apple apps:\n")
explore_data(ios_apps, 0, 4, rows_and_columns=True)
print("First lines of Android apps:\n")
explore_data(android_apps, 0, 4, rows_and_columns=True)

First lines of Apple apps:

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16
First lines of Android apps:

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018',

Now it is time to see the column names and try to identify which columns could be the most relevant to the analysis. More information on the content of each column can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) for Apple apps and [here](https://www.kaggle.com/lava18/google-play-store-apps) for Android apps.

In [3]:
# Print the column names and try to identify
# which columns are the most relevant to the analysis
print("ios_header:")
print(ios_header)
print('\n')
print("android_header:")
print(android_header)

ios_header:
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


android_header:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


The most important columns in `ios_apps` seem to be the following:

- track_name (app name)
- price (price)
- user_rating (average user rating value)
- prime_genre (genre)

The most important columns in `android_apps` seem to be the following:

- App (app name)
- Category (category)
- Rating (overall user rating)
- Price (price)
- Genres (an app can belong to multiple genres)
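
Because each row is a plain list, every column has to be addressed by its position. As a small sketch (the helper name `column_indexes` is hypothetical and not part of the original notebook), the header row can be turned into a name-to-index lookup so that positions do not have to be counted by hand:

```python
def column_indexes(header):
    """Map each column name in a header row to its position."""
    return {name: index for index, name in enumerate(header)}

# Header copied from the googleplaystore.csv output above
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']

android_cols = column_indexes(android_header)
print(android_cols['Category'], android_cols['Price'], android_cols['Genres'])
```

With this header, `Category`, `Price` and `Genres` sit at positions 1, 7 and 9.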





Row 10472 of `android_apps` is missing its Category value, so it has only 12 fields instead of 13 (as demonstrated below).

In [4]:
header_length = len(android_header)
print("Header length =", header_length)

# enumerate() gives the true position of each row; list.index() returns the
# first matching row, which can be the wrong one when rows are duplicated
for index, row in enumerate(android_apps):
    row_length = len(row)
    if header_length != row_length:
        print(row)
        print("Row length =", row_length)
        print("Row index =", index)

Header length = 13
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Row length = 12
Row index = 10472


So I will use the `del` statement to remove that row from the data set.

In [5]:
# Run this cell only once: running it again would delete another (valid) row
del android_apps[10472]
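
Deleting by a hard-coded index works, but it is not safe to repeat. A possible alternative, sketched here on toy data with a hypothetical helper `drop_malformed_rows`, is to keep only the rows whose length matches the header. This can be run any number of times with the same result:

```python
def drop_malformed_rows(rows, expected_length):
    """Return only the rows that have the expected number of fields."""
    return [row for row in rows if len(row) == expected_length]

# Toy data standing in for android_apps: the second row is one field short
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

cleaned = drop_malformed_rows(sample_rows, expected_length=3)
print(len(cleaned))
```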

From this [discussion thread](https://www.kaggle.com/lava18/google-play-store-apps/discussion), we know that the Google Play data set has duplicate entries. Some of the apps that have duplicates are **Instagram**, **Box** and **ZOOM Cloud Meetings**.

In [6]:
# Print every row for a few apps that are known to have duplicates
for app_name in ['Instagram', 'Box', 'ZOOM Cloud Meetings']:
    for row in android_apps:
        if row[0] == app_name:
            print(row, end='\n\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 20

From the rows above, we can see that the main difference among the duplicates is the fourth value, which is the number of reviews. I will first count the duplicates using lists. Then, for each app, I will keep only the row with the highest number of reviews (presumably the most recent entry) and remove its other entries.

In [7]:
unique_apps = []
duplicate_apps = []

for row in android_apps:
    name = row[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print("Number of unique apps: ", len(unique_apps))
print("Number of duplicates: ", len(duplicate_apps))

Number of unique apps:  9659
Number of duplicates:  1181
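
As a cross-check on the counts above, the same numbers can be obtained with `collections.Counter` from the standard library. This is only a sketch on a toy list of names, not the notebook's data:

```python
from collections import Counter

# Toy list of app names standing in for the first column of android_apps
names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Instagram', 'Box']

name_counts = Counter(names)
n_unique = len(name_counts)
n_duplicates = len(names) - n_unique  # extra rows beyond the first one per app

print("Number of unique apps:", n_unique)
print("Number of duplicates:", n_duplicates)
```

Counting this way avoids the repeated `name in unique_apps` list lookups, which get slower as the list grows.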


In [8]:
# Create a dictionary that maps each app name to its highest number of reviews
apps_max_reviews = {}

for row in android_apps:
    name = row[0]
    reviews = float(row[3])
    if name in apps_max_reviews:
        if reviews > apps_max_reviews[name]:
            apps_max_reviews[name] = reviews
    else:
        apps_max_reviews[name] = reviews

print("Expected length of my dictionary: 9659")
print("Length of my dictionary: ", len(apps_max_reviews))
print("")

# Create a new data set without the duplicates
unique_android_apps = []
already_added = []

for row in android_apps:
    name = row[0]
    reviews = float(row[3])
    max_reviews = apps_max_reviews[name]
    # The second condition keeps a single row when several duplicates
    # share the same highest number of reviews
    if (reviews == max_reviews) and (name not in already_added):
        unique_android_apps.append(row)
        already_added.append(name)

# Check the length of my new data set without duplicates
print("Expected number of rows of the new data set: 9659")
print("Number of rows of the new data set: ", len(unique_android_apps))

Expected length of my dictionary: 9659
Length of my dictionary:  9659

Expected number of rows of the new data set: 9659
Number of rows of the new data set:  9659
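
The two passes above could also be folded into one. The sketch below uses a hypothetical helper, `keep_most_reviewed`, on toy rows, and assumes that the review counts are strings holding whole numbers. It stores the best row seen so far for each app name:

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best_rows = {}
    for row in rows:
        name = row[name_index]
        reviews = int(row[reviews_index])
        # Strict '>' keeps the earlier row when several rows tie
        if name not in best_rows or reviews > int(best_rows[name][reviews_index]):
            best_rows[name] = row
    return list(best_rows.values())

# Toy rows using the same positions as the Google Play data (name, ..., reviews)
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]

deduplicated = keep_most_reviewed(sample)
for row in deduplicated:
    print(row)
```

One difference from the two-pass version: the result is ordered by the first appearance of each app name rather than by the position of the row that was kept.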


In this analysis, we are interested only in the apps directed toward an English-speaking audience. So the next step is removing all the apps that contain non-English characters.

The numbers corresponding to the characters that are commonly used in English are all in the range 0 to 127, according to [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange).
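
To make the range concrete, Python's built-in `ord()` returns the code point of a single character. The short example below (the sample characters are chosen only for illustration) prints the code of a few characters and whether each one falls within the 0 to 127 range:

```python
# ord() returns the Unicode code point of a single character
characters = ['a', 'Z', '5', '+', '™', '爱', '😜']
codes = {char: ord(char) for char in characters}

for char, code in codes.items():
    print(repr(char), code, code <= 127)
```

Letters, digits and common punctuation stay below 128, while `™`, `爱` and `😜` are far above it.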


If a character's code is greater than 127, then it is probably a non-English character. The function below tells us whether a string contains such a character.

In [9]:
def hasNonEnglishChar(string):
    for char in string:
        int_char = ord(char)
        if int_char > 127:
            return True
    return False

print(hasNonEnglishChar('Instagram'))
print(hasNonEnglishChar('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(hasNonEnglishChar('Docs To Go™ Free Office Suite'))
print(hasNonEnglishChar('Instachat 😜'))

False
True
True
True


However, this function is not perfect. It treats emojis (😜) and symbols such as the trademark sign (™) as non-English characters. If we used it to clean the data, we would lose a lot of English apps.

To minimize the impact of data loss, we will only remove an app if it has more than three characters greater than 127. This filter is still not perfect, but it should be fine for this analysis.

Let's modify the previous function and see how it works. In this new function, we will say that an app is English if its name has three or fewer characters greater than 127.

In [10]:
def isEnglish(string):
    counter = 0
    for char in string:
        int_char = ord(char)
        if int_char > 127:
            counter += 1
            if counter == 4:  # case when it's not English
                return False
    return True

print(isEnglish('Instagram'))
print(isEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instachat 😜'))

True
False
True
True


The new function `isEnglish` seems to work much better than the previous one. 
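
For comparison, the same rule can be written more compactly by counting the characters above 127 in one expression. This is a sketch with a hypothetical name, `is_english_compact`, and a hypothetical `max_non_ascii` parameter that makes the threshold of three explicit:

```python
def is_english_compact(name, max_non_ascii=3):
    """Return True if name has at most max_non_ascii characters above code 127."""
    non_ascii_count = sum(1 for char in name if ord(char) > 127)
    return non_ascii_count <= max_non_ascii

print(is_english_compact('Instagram'))
print(is_english_compact('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english_compact('Docs To Go™ Free Office Suite'))
print(is_english_compact('Instachat 😜'))
```

Unlike `isEnglish`, this version always scans the whole name instead of stopping at the fourth such character, which makes no practical difference for short app names.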

Now we will use it to filter out non-English apps from both data sets.

In [11]:
# Remove non-English apps from Android data set
english_android_apps = []

for row in unique_android_apps:
    name = row[0]
    if isEnglish(name):
        english_android_apps.append(row)

# Remove non-English apps from Apple data set
english_ios_apps = []

for row in ios_apps:
    name = row[1]
    if isEnglish(name):
        english_ios_apps.append(row)

android_rows = len(english_android_apps)
ios_rows = len(english_ios_apps)
print("Number of rows in new Android data set: ", android_rows)
print(">>> Before: ", len(unique_android_apps))
print("Number of rows in new Apple data set: ", ios_rows)
print(">>> Before: ", len(ios_apps))

Number of rows in new Android data set:  9614
>>> Before:  9659
Number of rows in new Apple data set:  6183
>>> Before:  7197


So far in the data cleaning process, we:

- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps

Since we want to analyse only free apps, the next and last step will be removing all non-free apps from both data sets.
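
The prices are stored as strings, and the two data sets write a free app differently (`'0'` on Google Play, `'0.0'` on the App Store). As a sketch, a single hypothetical helper `is_free` could compare the numeric value instead of the exact text. It assumes that paid prices are plain numbers, optionally prefixed with a `$` sign:

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$4.99' is zero."""
    # Assumption: the only non-numeric character in a price is a leading '$'
    return float(price.replace('$', '')) == 0.0

print(is_free('0'), is_free('0.0'), is_free('$4.99'))
```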

In [12]:
clean_android = []
clean_ios = []

for row in english_android_apps:
    price = row[7]  # 'Price' column: '0' for free apps
    if price == '0':
        clean_android.append(row)

for row in english_ios_apps:
    price = row[4]
    if price == '0.0':
        clean_ios.append(row)
        
android_rows = len(clean_android)
ios_rows = len(clean_ios)
        
print("Number of rows in clean Android data set: ", android_rows)
print(">>> Before: ", len(english_android_apps))
print("Number of rows in clean Apple data set: ", ios_rows)
print(">>> Before: ", len(english_ios_apps))

Number of rows in clean Android data set:  8864
>>> Before:  9614
Number of rows in clean Apple data set:  3222
>>> Before:  6183


Now we begin the analysis.
First, let's get a sense of which genres are most common in each market (Android and Apple). After all, we are interested in apps that are likely to attract more users, because our revenue depends heavily on the number of people using the apps.

In [13]:
android_category_freq = {}
android_genre_freq = {}
ios_genre_freq = {}

# Receives a dataset and a column index and
# returns a dictionary {column value: frequency}
def build_freq_col(data, col_index):
    
    value_freq = {}
    
    for row in data:
        value = row[col_index]
        if value in value_freq:
            value_freq[value] += 1
        else:
            value_freq[value] = 1
            
    return value_freq

android_category_index = 1 
android_genre_index = 9
ios_genre_index = 11 

android_category_freq = build_freq_col(clean_android, android_category_index)
android_genre_freq = build_freq_col(clean_android, android_genre_index)
ios_genre_freq = build_freq_col(clean_ios, ios_genre_index)

print('Android categories:')
print(android_category_freq)
print('\n')
print('Android genres:')
print(android_genre_freq)
print('\n')
print('iOS genres:')
print(ios_genre_freq)

Android categories:
{'ART_AND_DESIGN': 57, 'AUTO_AND_VEHICLES': 82, 'BEAUTY': 53, 'BOOKS_AND_REFERENCE': 190, 'BUSINESS': 407, 'COMICS': 55, 'COMMUNICATION': 287, 'DATING': 165, 'EDUCATION': 103, 'ENTERTAINMENT': 85, 'EVENTS': 63, 'FINANCE': 328, 'FOOD_AND_DRINK': 110, 'HEALTH_AND_FITNESS': 273, 'HOUSE_AND_HOME': 73, 'LIBRARIES_AND_DEMO': 83, 'LIFESTYLE': 346, 'GAME': 862, 'FAMILY': 1676, 'MEDICAL': 313, 'SOCIAL': 236, 'SHOPPING': 199, 'PHOTOGRAPHY': 261, 'SPORTS': 301, 'TRAVEL_AND_LOCAL': 207, 'TOOLS': 750, 'PERSONALIZATION': 294, 'PRODUCTIVITY': 345, 'PARENTING': 58, 'WEATHER': 71, 'VIDEO_PLAYERS': 159, 'NEWS_AND_MAGAZINES': 248, 'MAPS_AND_NAVIGATION': 124}


Android genres:
{'Art & Design': 53, 'Art & Design;Creativity': 6, 'Auto & Vehicles': 82, 'Beauty': 53, 'Books & Reference': 190, 'Business': 407, 'Comics': 54, 'Comics;Creativity': 1, 'Communication': 287, 'Dating': 165, 'Education': 474, 'Education;Creativity': 4, 'Education;Education': 30, 'Education;Pretend Play': 5, 'Educat

Instead of looking at absolute numbers, let's create a table with the percentage of each genre in descending order.

In [14]:
android_category_percent = {}
android_genre_percent = {}
ios_genre_percent = {}

def percent_table(dataset, freq_dict):
    
    # Named differently from the function to avoid shadowing it
    percentages = {}
    rows = len(dataset)
    
    for key in freq_dict:
        freq = freq_dict[key]
        percentages[key] = freq/rows * 100
            
    return percentages

android_category_percent = percent_table(clean_android, android_category_freq)
android_genre_percent = percent_table(clean_android, android_genre_freq)
ios_genre_percent = percent_table(clean_ios, ios_genre_freq)
    
def display_descending_table(my_dict):
    display_table = []
    
    for key in my_dict:
        val_key_as_tuple = (my_dict[key], key)
        display_table.append(val_key_as_tuple)
        
    table_sorted = sorted(display_table, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print("Android: Category | Percentage")
display_descending_table(android_category_percent)
print('')
print("Android: Genre | Percentage")
display_descending_table(android_genre_percent)
print('')
print("iOS: Genre | Percentage")
display_descending_table(ios_genre_percent)

Android: Category | Percentage
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.654
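
The frequency, percentage and sorting steps can also be combined with `collections.Counter`, whose `most_common()` method returns the entries from most to least frequent. The following is a sketch on a toy genre column, with a hypothetical function name `percentage_table`:

```python
from collections import Counter

def percentage_table(values):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(values)
    total = len(values)
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Toy genre column standing in for one column of the cleaned data
genres = ['Games', 'Games', 'Games', 'Social', 'Tools', 'Games', 'Tools', 'Tools']

table = percentage_table(genres)
for genre, percent in table:
    print(genre, ':', percent)
```

Here `Games` accounts for 4 of the 8 entries (50%), `Tools` for 3 (37.5%) and `Social` for 1 (12.5%).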

Up to this point, we have found that the App Store is dominated by apps designed for fun (more than 65% of the apps are labeled as Games or Entertainment), while Google Play shows a more balanced landscape of both practical and for-fun apps (Family: 19%, Game: 10%, Tools: 8%, Business: 5%). Now we'd like to get an idea of the kinds of apps that have the most users.

For the Android apps, we can find this information in the `Installs` column. There is no corresponding column in the App Store data, so we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

Let's start with calculating the average number of user ratings per app genre on the App Store.

In [15]:
genre_avg_ratings = {}

for row in clean_ios:
    genre = row[11]
    ratings = int(row[5])
    if genre not in genre_avg_ratings:
        ratings_list = []
        ratings_list.append(ratings)
        genre_avg_ratings[genre] = ratings_list
    else:
        genre_avg_ratings[genre].append(ratings)

# Now we have to sum up the ratings lists and divide by the number of apps belonging to each genre

for genre in genre_avg_ratings:
    ratings_list = genre_avg_ratings[genre]
    ratings_sum = sum(ratings_list)
    size = len(ratings_list)
    genre_avg_ratings[genre] = ratings_sum/size

print("App Store: Genre | Average number of ratings")
display_descending_table(genre_avg_ratings)


App Store: Genre | Average number of ratings
Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


Navigation apps have the highest average number of user ratings. Which apps are they, and how many user ratings does each one have?

In [16]:
for row in clean_ios:
    genre = row[11]
    if genre == 'Navigation':
        name = row[1]
        ratings = row[5]
        print(name, ": ", ratings)

Waze - GPS Navigation, Maps & Real-time Traffic :  345046
Google Maps - Navigation & Transit :  154911
Geocaching® :  12811
CoPilot GPS – Car Navigation & Offline Maps :  3582
ImmobilienScout24: Real Estate Search in Germany :  187
Railway Route Search :  5


As we can see, the `Navigation` market is dominated by giants like Waze and Google. The same applies to other genres such as `Reference`, `Social Networking` and `Music`.
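
To see how strongly a couple of apps pull the average up, we can compare the mean with the median for the six Navigation apps listed above. The sketch below uses the standard `statistics` module; the variable names are only illustrative:

```python
from statistics import mean, median

# Rating counts of the Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)
# Share of all Navigation ratings that belongs to the two largest apps
top_two_share = sum(navigation_ratings[:2]) / sum(navigation_ratings) * 100

print(round(navigation_mean, 2))   # 86090.33, the genre average shown earlier
print(navigation_median)           # 8196.5
print(round(top_two_share, 1))
```

The median is roughly a tenth of the mean, and the two largest apps hold the vast majority of the ratings, so the genre average says little about a typical Navigation app.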

It would be a better idea to invest in other genres that have more growth potential and still a high number of ratings, such as `Book` and `Photo & Video`.

Now let's analyze the Google Play market. We can use the `Installs` column to find out which apps are used the most. This column contains strings with ranges, like '100+', '1,000+', '1,000,000+'. Let's treat each range as its minimum. For example, all the apps in the range '1,000+' will be assigned the value 1000.

To do that, we'll need to convert each install number from a string to an integer. To remove the '+' and ',' characters, we'll use `str.replace(old, new)`.

`
installs = '1,000+'
new_installs = installs.replace('+', '') 
new_installs = new_installs.replace(',', '')
`
(new_installs will be '1000')

Next, we'll have to convert the string into an integer and perform the calculations.

In [17]:
def cleanString(text):
    # The parameter is named text so that it does not shadow the built-in str
    new_string = text.replace('+', '')
    new_string = new_string.replace(',', '')
    return new_string

category_installs = {}

for row in clean_android:
    category = row[1]
    installs = row[5]
    installs = cleanString(installs)
    installs = int(installs)
    
    if category not in category_installs:
        installs_list = []
        installs_list.append(installs)
        category_installs[category] = installs_list
    else:
        category_installs[category].append(installs)

# Now we have to sum up the installs and divide by the number of apps belonging to each category

for category in category_installs:
    installs_list = category_installs[category]
    installs_sum = sum(installs_list)
    size = len(installs_list)
    category_installs[category] = installs_sum/size
    
print("Google Play: Category | Average number of installs")
display_descending_table(category_installs)

Google Play: Category | Average number of installs
COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734

The `COMMUNICATION` category has the highest average number of installs per app: 38,456,119. It is followed by `VIDEO_PLAYERS` (24,727,872) and `SOCIAL` (23,253,652).

What are the `COMMUNICATION` apps and how many installs do they have?

In [18]:
for row in clean_android:
    category = row[1]
    if category == 'COMMUNICATION':
        name = row[0]
        installs = row[5]
        print(name, ": ", installs)
    

WhatsApp Messenger :  1,000,000,000+
Messenger for SMS :  10,000,000+
My Tele2 :  5,000,000+
imo beta free calls and text :  100,000,000+
Contacts :  50,000,000+
Call Free – Free Call :  5,000,000+
Web Browser & Explorer :  5,000,000+
Browser 4G :  10,000,000+
MegaFon Dashboard :  10,000,000+
ZenUI Dialer & Contacts :  10,000,000+
Cricket Visual Voicemail :  10,000,000+
TracFone My Account :  1,000,000+
Xperia Link™ :  10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard :  10,000,000+
Skype Lite - Free Video Call & Chat :  5,000,000+
My magenta :  1,000,000+
Android Messages :  100,000,000+
Google Duo - High Quality Video Calls :  500,000,000+
Seznam.cz :  1,000,000+
Antillean Gold Telegram (original version) :  100,000+
AT&T Visual Voicemail :  10,000,000+
GMX Mail :  10,000,000+
Omlet Chat :  10,000,000+
My Vodacom SA :  5,000,000+
Microsoft Edge :  5,000,000+
Messenger – Text and Video Chat for Free :  1,000,000,000+
imo free video calls and chat :  500,000,000+
Calls & Tex

What are the `VIDEO_PLAYERS` apps and how many installs do they have?

In [19]:
for row in clean_android:
    category = row[1]
    if category == 'VIDEO_PLAYERS':
        name = row[0]
        installs = row[5]
        print(name, ": ", installs)

YouTube :  1,000,000,000+
All Video Downloader 2018 :  1,000,000+
Video Downloader :  10,000,000+
HD Video Player :  1,000,000+
Iqiyi (for tablet) :  1,000,000+
Video Player All Format :  10,000,000+
Motorola Gallery :  100,000,000+
Free TV series :  100,000+
Video Player All Format for Android :  500,000+
VLC for Android :  100,000,000+
Code :  10,000,000+
Vote for :  50,000,000+
XX HD Video downloader-Free Video Downloader :  1,000,000+
OBJECTIVE :  1,000,000+
Music - Mp3 Player :  10,000,000+
HD Movie Video Player :  1,000,000+
YouCut - Video Editor & Video Maker, No Watermark :  5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects :  1,000,000+
YouTube Studio :  10,000,000+
video player for android :  10,000,000+
Vigo Video :  50,000,000+
Google Play Movies & TV :  1,000,000,000+
HTC Service － DLNA :  10,000,000+
VPlayer :  1,000,000+
MiniMovie - Free Video and Slideshow Editor :  50,000,000+
Samsung Video Library :  50,000,000+
OnePlus Gallery :  1,000,000+
LIKE – Magic Vi

The most used categories are dominated by a few giants, like WhatsApp, Messenger and YouTube. It may be a better idea to invest in a new Photography or Productivity app, because these are more diverse niches that still show high numbers of installs.
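
One way to check how much the giants inflate a category average is to recompute it while leaving out the largest apps. The sketch below uses a handful of install values copied from the `COMMUNICATION` output above; the helper names and the 100,000,000 cut-off are assumptions made for illustration, and the function expects at least one app to remain below the cut-off:

```python
def to_install_count(installs):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(installs.replace('+', '').replace(',', ''))

def average_installs(install_strings, below=None):
    """Average install count, optionally ignoring apps at or above `below`."""
    counts = [to_install_count(text) for text in install_strings]
    if below is not None:
        counts = [count for count in counts if count < below]
    return sum(counts) / len(counts)

# A few COMMUNICATION install values copied from the output above
sample_installs = ['1,000,000,000+', '10,000,000+', '5,000,000+',
                   '100,000,000+', '50,000,000+']

average_all = average_installs(sample_installs)
average_without_giants = average_installs(sample_installs, below=100_000_000)
print(average_all)
print(round(average_without_giants, 2))
```

On this small sample, dropping the two apps with 100,000,000 or more installs lowers the average from 233,000,000 to about 21,666,667.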

## Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app genre that can be profitable for both markets. 

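A photography app would be a good recommendation: on Google Play, `PHOTOGRAPHY` has one of the highest average numbers of installs, and on the App Store, `Photo & Video` has a relatively high average number of user ratings. It could also be a good bet for gaining market share, since this niche appears to be less dominated by a few giants.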