# Profitable App Profiles for the App Store and Google Play Markets

This project looks for app profiles that are likely to be profitable on the App Store and Google Play.
The goal is to help developers understand what types of apps are likely to attract more users on Google Play and the App Store. The analysis focuses only on free apps.

### Summary of results
We analyzed data about App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable in both markets. We concluded that a weather app could be profitable on the App Store and a personalization app could be profitable on Google Play.

## Reading in data

In [36]:
# The Google Play data set
from csv import reader

# encoding='utf8' assumes the CSV files are UTF-8 encoded (app names contain non-ASCII characters)
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    an_apps_data = list(reader(opened_file))
an_apps_header = an_apps_data[0]
an_apps_data = an_apps_data[1:]

# The App Store data set
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios_apps_data = list(reader(opened_file))
ios_apps_header = ios_apps_data[0]
ios_apps_data = ios_apps_data[1:]

## Exploring data

In [2]:
def explore_data(dataset, start, end, rows_and_column=False):
    """Print rows dataset[start:end] and, optionally, the dataset dimensions."""
    data_slice = dataset[start:end]
    for row in data_slice:
        print(row)
        print('\n')

    if rows_and_column:
        print('Number of rows: ', len(dataset))
        print('Number of columns: ', len(dataset[0]))

In [3]:
# explore data in Google Play data set
print('Google play store:')
print('\n')
print(an_apps_header)
print('\n')
explore_data(an_apps_data, 0, 2, True)

Google play store:


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows:  10841
Number of columns:  13


We can see that the Google Play data set has 10841 apps and 13 columns.

In [4]:
# explore data in App store data set
print('App store:')
print('\n')
print(ios_apps_header)
print('\n')
explore_data(ios_apps_data, 0, 2, True)

App store:


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows:  7197
Number of columns:  16


We can see that the App Store data set has 7197 apps and 16 columns. For the meaning of each column name, please refer to the [documentation][1].

[1]: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home?select=AppleStore.csv

## Deleting wrong data

**Google Play Store:**

In [5]:
# find rows whose length differs from the header length
for index, row in enumerate(an_apps_data):
    if len(row) != len(an_apps_header):
        print(row)
        print(index)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


In the Android data set, comparing each row's length with the header length shows that entry 10472 has only 12 fields instead of 13: the Category value is missing, so the remaining values are shifted one column to the left. This entry will therefore be deleted.

In [6]:
# delete incorrect row
del an_apps_data[10472]

**App Store:**

In [7]:
# find rows whose length differs from the header length
for index, row in enumerate(ios_apps_data):
    if len(row) != len(ios_apps_header):
        print(row)
        print(index)

No entries with missing data were found in the iOS data set.

## Removing duplicate entries
**Google Play Store:**

In [8]:
# find out duplicate entries
unique_apps = []
duplicate_apps = []

for app in an_apps_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate app in Android dataset ', len(duplicate_apps))
print('\n')
print('Example: ', duplicate_apps[0:5])

Number of duplicate app in Android dataset  1181


Example:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


1181 entries with a repeated app name were found in the Android data set. The duplicates will be deleted, and only the entry with the highest number of reviews will be kept for each app.

In [9]:
# calculate expected number of entries after cleaning
print('Expected Number of entries after cleaning: ', len(an_apps_data) - len(duplicate_apps))

Expected Number of entries after cleaning:  9659


In [10]:
# remove duplicated entries
reviews_max = {}
for app in an_apps_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
android_clean = []
already_added = []

for app in an_apps_data:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
print('Length of actual dataset after cleaning: ', len(android_clean))

Length of actual dataset after cleaning:  9659


The *reviews_max* dictionary was used to remove the duplicates. For each duplicated app, only the entry with the highest number of reviews was kept.
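
As a hedged alternative, the same de-duplication could be sketched in a single pass by storing the best row itself rather than only its review count. The helper name `keep_most_reviewed` and the sample rows (including their review counts) are hypothetical and only serve as an illustration. Note that the resulting row order follows the first appearance of each app name, which may differ from the order produced by the cell above.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict comparison: on a tie, the first row encountered is kept
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Hypothetical sample rows: [name, category, rating, reviews]
sample = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159880'],
    ['Gmail', 'COMMUNICATION', '4.3', '4604324'],
]
deduplicated = keep_most_reviewed(sample, 0, 3)
print(len(deduplicated))  # 2
```

Because membership tests on a dictionary do not scan the whole collection, this sketch avoids the repeated `name not in already_added` list lookups, which can matter on larger data sets.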

In [11]:
# confirm result
explore_data(android_clean, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  9659
Number of columns:  13


We have 9659 rows, just as expected.

**App Store:**

In [12]:
# find out duplicated entries
unique_apps = []
duplicate_apps = []

for app in ios_apps_data:
    name = app[1]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate app in IOS dataset: ', len(duplicate_apps))
print('\n')
print('Example: ', duplicate_apps[0:5])

Number of duplicate app in IOS dataset:  2


Example:  ['Mannequin Challenge', 'VR Roller Coaster']


2 entries with a repeated app name were found in the iOS data set. The duplicates will be deleted, and only the entry with the highest number of ratings will be kept for each app.

In [13]:
# calculate expected number after cleaning
print('Expected Number of entries after cleaning: ', len(ios_apps_data) - len(duplicate_apps))

Expected Number of entries after cleaning:  7195


In [14]:
# remove duplicated entries
reviews_max = {}
for app in ios_apps_data:
    name = app[1]
    n_reviews = float(app[5])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
ios_clean = []
already_added = []

for app in ios_apps_data:
    name = app[1]
    n_reviews = float(app[5])
    if n_reviews == reviews_max[name] and name not in already_added:
        ios_clean.append(app)
        already_added.append(name)
        
print('Length of actual dataset after cleaning: ', len(ios_clean))

Length of actual dataset after cleaning:  7195


In [15]:
# confirm result
explore_data(ios_clean, 0, 2, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows:  7195
Number of columns:  16


We have 7195 rows, just as expected.

## Removing non-English apps

In [16]:
def eng_check(name):
    # define function to find out non-english app
    for cha in name:
        if ord(cha) > 127:
            return False
    return True
        
print(eng_check('Instagram'))
print(eng_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_check('Docs To Go™ Free Office Suite'))
print(eng_check('Instachat 😜'))

True
False
False
False


Non-English apps are not our target, so the *eng_check* function was built to find entries whose names contain non-English characters, based on the ASCII standard. Each ASCII character has a corresponding number between 0 and 127, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.


The function seems to work fine, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form.

In [17]:
def eng_check(name):
    # improve non english function
    non_eng = 0
    for cha in name:
        if ord(cha) > 127:
            non_eng += 1
    if non_eng > 3:
        return False
    else:
        return True
    
print(eng_check('Docs To Go™ Free Office Suite'))
print(eng_check('Instachat 😜'))

True
True


The function has been improved so that it flags an app as non-English only when its name contains more than three non-ASCII characters.
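
The threshold check can also be written more compactly. The sketch below is one possible variant, not a replacement for the cell above; the function name `is_probably_english` and the `max_non_ascii` parameter are hypothetical names introduced for illustration.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))  # True
print(is_probably_english('Instachat 😜'))                    # True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))     # False
print(is_probably_english('Météo-France'))                   # True
```

The last call shows a limitation of this heuristic: names written in other Latin-script languages contain few non-ASCII characters, so they still pass the filter.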

**Google Play Store:**

In [18]:
# remove non english app
an_clean_eng = []
for app in android_clean:
    if eng_check(app[0]):
        an_clean_eng.append(app)
        
explore_data(an_clean_eng, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  9614
Number of columns:  13


After removing non-English apps, there are 9614 entries in the Android data set.

**App Store:**

In [19]:
# remove non english app
ios_clean_eng = []
for app in ios_clean:
    if eng_check(app[1]):
        ios_clean_eng.append(app)
        
explore_data(ios_clean_eng, 0, 2, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows:  6181
Number of columns:  16


After removing non-English apps, there are 6181 entries in the iOS data set.

# Isolating free apps 

As mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

**Google Play Store:**

In [20]:
# extract free app
an_final = []
for app in an_clean_eng:
    if app[6] == 'Free':
        an_final.append(app)
        
explore_data(an_final, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  8863
Number of columns:  13


After removing non-free apps, there are 8863 entries in the Android data set.

**App Store:**

In [21]:
# extract free app
ios_final = []
for app in ios_clean_eng:
    if app[4] == '0.0':
        ios_final.append(app)
        
explore_data(ios_final, 0, 2, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows:  3220
Number of columns:  16


After removing non-free apps, there are 3220 entries in the iOS data set.

---

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

We will begin the analysis by getting a sense of the most common genres for each market.

In [22]:
def freq_table(dataset, index):
    # define frequency table
    table = {}
    total = 0
    for row in dataset:
        total += 1
        key = row[index]
        if key in table:
            table[key] += 1
        else:
            table[key] = 1
    
    table_percent = {}
    for key in table: 
        table_percent[key] = (table[key]/total)*100
    
    return table_percent

In [23]:
def display_table(dataset, index):
    # define function to create frequency table
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
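
For comparison, the two helper functions above could be sketched with `collections.Counter` from the standard library. This is only an illustrative alternative; the function name `percentage_table` and the sample rows are hypothetical.

```python
from collections import Counter


def percentage_table(dataset, index):
    """Return (value, percentage) pairs sorted from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Hypothetical sample rows: [name, genre]
sample = [['A', 'Games'], ['B', 'Games'], ['C', 'Weather'], ['D', 'Games']]
for value, percentage in percentage_table(sample, 1):
    print(value, ':', percentage)
```

One difference to keep in mind: `most_common()` orders equal counts by first appearance, whereas sorting `(percentage, key)` tuples in reverse breaks ties by key, so tied entries may be listed in a different order.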

**Google Play Store:**

In [24]:
# print frequency table by category
print('By Category:')
display_table(an_final, 1)

By Category:
FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_A

We can see that in the Google Play Store, about 18.90% of the free English apps are in the Family category. Game apps account for about 9.73%, followed by Tools apps at about 8.46%. These are the three largest categories.

In [25]:
# print frequency table by genres
print('By Genres:')
display_table(an_final, 9)

By Genres:
Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946

If we look at the Genres column instead, we can see that 8.45% of the free English apps belong to Tools, followed by Entertainment apps at 6.07% and Education apps at 5.35%.

**App Store:**

In [27]:
# print frequency table by prime genre
print('By prime_genre:')
display_table(ios_final, 11)

By prime_genre:
Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


For the App Store, more than half (58.14%) of the free English apps are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not be the same as the offer.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

# Most Popular Apps by Genre on the App Store

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

In [28]:
# compute the average number of user ratings per genre
table = {}
for genre in freq_table(ios_final, 11):
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            total += float(app[5])
            len_genre += 1
    average_num_rating = total/len_genre
    table[genre] = average_num_rating

table_display = []
for genre in table:
    gen_val_as_tuple = (table[genre], genre)
    table_display.append(gen_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22812.92467948718
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


On average, navigation apps have the highest number of user ratings (86090), but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user ratings together:

In [29]:
# explore navigation apps
for app in ios_final:
    if app[11] == 'Navigation':
        print(app[1],":",app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. The same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.

Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by very few apps which have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. We could get a better picture by removing these extremely popular apps for each genre and then reworking the averages, but we'll leave this level of detail for later.
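
To give a rough idea of how strong the skew is, the sketch below compares the mean with the median for the six Navigation apps printed above, and recomputes the mean without the most popular apps. The 100,000-rating cutoff is an assumption chosen only for illustration; it is not part of the analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

# Hypothetical cutoff for "extremely popular" apps (an assumption)
cutoff = 100_000
without_giants = [count for count in navigation_ratings if count < cutoff]

print('Mean:', mean(navigation_ratings))
print('Median:', median(navigation_ratings))
print('Mean without giants:', mean(without_giants))
```

The mean matches the 86090.33 reported for Navigation, while the median is 8196.5 and the mean without the two giants is 4146.25, which suggests that a typical navigation app is far less popular than the genre average implies.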

In [30]:
# explore weather apps
for app in ios_final:
    if app[11] == 'Weather':
        print(app[1],":",app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

The Weather genre shows potential: no single app seems to dominate the market so far. However, providing reliable live weather data may require us to connect our app to non-free APIs. This expense would need to be covered by other income, such as ads.

# Most Popular Apps by Genre on the Google Play Store

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture of genre popularity. However, the install numbers are not precise: most values are open-ended (100+, 1,000+, 5,000+, etc.).
We're going to leave the numbers as they are, which means that we'll treat an app with 100,000+ installs as having 100,000 installs, an app with 1,000,000+ installs as having 1,000,000 installs, and so on.
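
The convention described above can be sketched as a small helper that turns an Installs string into its lower bound. The name `parse_installs` is hypothetical and only used for illustration.

```python
def parse_installs(installs):
    """Convert an Installs string such as '1,000,000+' to its lower bound."""
    return int(installs.replace(',', '').replace('+', ''))


print(parse_installs('100,000+'))    # 100000
print(parse_installs('1,000,000+'))  # 1000000
```

Since every value is a lower bound, averages computed this way underestimate the true install counts, but they remain usable for comparing categories with each other.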

In [31]:
# compute the average number of installs per category
table = {}
for category in freq_table(an_final, 1):
    total = 0
    len_category = 0
    for app in an_final:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace('+','')
            installs = installs.replace(',','')
            total += float(installs)
            len_category += 1
    average_install = total/len_category
    table[category] = average_install

table_display = []
for category in table:
    cat_val_as_tuple = (table[category], category)
    table_display.append(cat_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3697848.1731343283
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 million or 500 million installs:

In [32]:
# explore communication apps
for app in an_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

We see the same pattern for the video players category, which is the runner-up with an average of 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.), books and reference apps (Google Play Books, Bible, Amazon Kindle, etc.), and shopping apps (letgo, Lazada, OLX, The birth, etc.).

These niches seem to be dominated by a few giants who are hard to compete against.

In [33]:
# explore personalization apps
for app in an_final:
    if app[1] == 'PERSONALIZATION':
        print(app[0], ':', app[5])

Nova Launcher : 50,000,000+
Funny Ringtones : 1,000,000+
ZEDGE™ Ringtones & Wallpapers : 100,000,000+
XOS - Launcher,Theme,Wallpaper : 5,000,000+
3D Live Neon Weed Launcher : 100,000+
Evie Launcher : 5,000,000+
Golden Launcher : 5,000,000+
Launcher : 1,000,000+
CM Launcher 3D - Theme, Wallpapers, Efficient : 100,000,000+
4K Wallpapers and Ultra HD Backgrounds : 500,000+
OnePlus Launcher : 1,000,000+
Birds Sounds Ringtones & Wallpapers : 1,000,000+
Funny Alarm Clock Ringtones : 1,000,000+
ZenUI Launcher : 50,000,000+
Color Call - Caller Screen, LED Flash : 1,000,000+
New Launcher 2018 : 10,000,000+
Diamond Zipper Lock Screen : 10,000,000+
Emoji Keyboard - Cute Emoji,GIF, Sticker, Emoticon : 10,000,000+
3D Blue Glass Water Keyboard Theme : 10,000,000+
Backgrounds (HD Wallpapers) : 10,000,000+
New 2018 Keyboard : 10,000,000+
APUS Launcher - Theme, Wallpaper, Hide Apps : 100,000,000+
ZenUI Themes – Stylish Themes : 10,000,000+
Keyboard - wallpapers , photos : 10,000,000+
Lovely Cute Pink K

The personalization category includes a variety of apps: launchers, ringtones, wallpapers, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [34]:
# list the most-installed personalization apps (100,000,000+ installs or more)
for app in an_final:
    if app[1] == 'PERSONALIZATION' and (app[5] == '1,000,000,000+'
                                        or app[5] == '500,000,000+'
                                        or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

ZEDGE™ Ringtones & Wallpapers : 100,000,000+
CM Launcher 3D - Theme, Wallpapers, Efficient : 100,000,000+
APUS Launcher - Theme, Wallpaper, Hide Apps : 100,000,000+
Hola Launcher- Theme,Wallpaper : 100,000,000+
Backgrounds HD (Wallpapers) : 100,000,000+
GO Keyboard - Emoticon keyboard, Free Theme, GIF : 100,000,000+
Parallel Space - Multiple accounts & Two face : 100,000,000+
GO Launcher - 3D parallax Themes & HD Wallpapers : 100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):
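
The middle tier could also be selected with a numeric comparison instead of listing every install bucket by hand. The sketch below assumes a hypothetical helper `in_install_range` and uses three reduced rows (name and installs only) taken from the personalization output above.

```python
def in_install_range(installs, low, high):
    """Return True if low <= installs < high, treating 'N+' as N."""
    count = int(installs.replace(',', '').replace('+', ''))
    return low <= count < high


# Reduced rows: [name, installs], taken from the output above
sample = [
    ['Nova Launcher', '50,000,000+'],
    ['ZEDGE™ Ringtones & Wallpapers', '100,000,000+'],
    ['3D Live Neon Weed Launcher', '100,000+'],
]
middle = [name for name, installs in sample
          if in_install_range(installs, 1_000_000, 100_000_000)]
print(middle)  # ['Nova Launcher']
```

The upper bound is exclusive here so that apps in the 100,000,000+ bucket stay out of the middle tier.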

In [35]:
# list personalization apps in the middle of the popularity range
for app in an_final:
    if app[1] == 'PERSONALIZATION' and (app[5] == '1,000,000+'
                                        or app[5] == '5,000,000+'
                                        or app[5] == '10,000,000+'
                                        or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Nova Launcher : 50,000,000+
Funny Ringtones : 1,000,000+
XOS - Launcher,Theme,Wallpaper : 5,000,000+
Evie Launcher : 5,000,000+
Golden Launcher : 5,000,000+
Launcher : 1,000,000+
OnePlus Launcher : 1,000,000+
Birds Sounds Ringtones & Wallpapers : 1,000,000+
Funny Alarm Clock Ringtones : 1,000,000+
ZenUI Launcher : 50,000,000+
Color Call - Caller Screen, LED Flash : 1,000,000+
New Launcher 2018 : 10,000,000+
Diamond Zipper Lock Screen : 10,000,000+
Emoji Keyboard - Cute Emoji,GIF, Sticker, Emoticon : 10,000,000+
3D Blue Glass Water Keyboard Theme : 10,000,000+
Backgrounds (HD Wallpapers) : 10,000,000+
New 2018 Keyboard : 10,000,000+
ZenUI Themes – Stylish Themes : 10,000,000+
Keyboard - wallpapers , photos : 10,000,000+
Microsoft Launcher : 10,000,000+
Goku Wallpaper Art : 1,000,000+
Door Lock Screen : 1,000,000+
Yandex Browser with Protect : 50,000,000+
Cute wallpapers & kawaii backgrounds images : 1,000,000+
ASUS Cover for ZenFone 2 : 10,000,000+
Live 3D Neon Blue Love Heart Keyboard 

No single type of app seems to dominate this niche. Still, it's probably not a good idea to build an app that closely resembles the existing ones, since there will be significant competition.

# Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that a weather app could be profitable on the App Store and a personalization app could be profitable on Google Play.