# Profitable App Profiles for the App Store and Google Play Markets

We have received a data analysis task from a company that builds free English-language Android and iOS mobile apps.
The company's main source of revenue is in-app ads, which means revenue is highly influenced by the number of people using the apps. The validation strategy for an app idea is comprised of three steps:

* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we develop it further.
* If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Our aim in this project is to determine the kinds of apps that are likely to attract more users. To do this, we'll need to collect and analyze data about mobile apps available on Google Play and the App Store.

# 1. Opening and Exploring the Data


To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our goals:

* The Google Play data set contains data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this [link](https://www.kaggle.com/lava18/google-play-store-apps).
* The App Store data set contains data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this [link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

To make the data sets easier to explore, we will define a function, open_data(), that opens a CSV file and imports its rows into a list. We use it to load the two data sets mentioned above.

In [71]:
from csv import reader

def open_data(filename):
    # Read a CSV file and return its rows as a list of lists
    with open(filename, encoding="utf8", newline="") as open_file:
        return list(reader(open_file))

Apple_list = open_data('AppleStore.csv')
google_list = open_data('googleplaystore.csv')

# Separate the header row from the data rows
Apple_header = Apple_list[0]
Apple_list = Apple_list[1:]
google_header = google_list[0]
google_list = google_list[1:]

To explore both data sets, we can define and call an explore_data() function to:

* Print the first few rows of each data set.
* Find the number of rows and columns of each data set.
* Print the column names and try to identify the columns that could help us with our analysis.

In [72]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(Apple_header)    
print('\n')
explore_data(Apple_list,0,3,True)    
print('\n')
print(google_header)
print('\n')
explore_data(google_list,0,3,True) 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 

# 2. Data Cleaning

Before beginning our analysis, we need to make sure the data we analyze is accurate; otherwise, the results of our analysis will be wrong.

This means that we need to:

1. Remove inaccurate data
2. Remove the duplicates
3. Remove non-English apps, like 爱奇艺PPS -《欢乐颂2》电视剧热播
4. Isolate the free apps

## 2.1 Remove inaccurate data

In [73]:
# The Kaggle discussion of the Google Play data reports an error in row 10472 of google_list:
# its 'Category' value is missing, so every later field is shifted one column to the left
print("The normal row:" , google_list[2])
print("\n")
print("The error row:" , google_list[10472])

The normal row: ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


The error row: ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [74]:
# Remove the malformed row using del
del google_list[10472]  # run this cell only once; running it again would delete a valid row
print(google_header) 
explore_data(google_list,0,2,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10840
Number of columns: 13
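
Rather than relying on a known row index, rows with a missing field can also be found by comparing each row's length with the header's. The following is a minimal sketch under that assumption; `find_malformed_rows` and the sample data are hypothetical and not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return the indices of rows whose field count differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical sample: the second row is missing its 'Category' value
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]
bad_indices = find_malformed_rows(sample_header, sample_rows)
print(bad_indices)
```

If several rows were flagged, deleting them in reverse index order would keep the remaining indices valid.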


## 2.2 Remove the duplicates

The Google Play data set has duplicate entries. For instance, Instagram has four entries:

In [75]:
for app in google_list:
    if app[0]=='Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


The code below builds two lists, one for duplicate apps and one for unique apps:

* Create two lists: one for storing the names of duplicate apps, and one for storing the names of unique apps.
* Loop through the google_list data set and, for each iteration, save the app name to a variable named app.
* If app is already in the unique_apps list, append it to the duplicate_apps list.
* Otherwise (if app isn't in the unique_apps list yet), append it to the unique_apps list.

In [76]:
unique_apps= []
duplicate_apps = []

for row in google_list: 
    app = row [0]
    if app in unique_apps: 
        duplicate_apps.append(app)
    else:
        unique_apps.append(app)

In [77]:
print('Number of duplicate apps:', len(duplicate_apps))
print('Example of duplicate apps:', duplicate_apps[:10])

print('Number of unique apps:', len(unique_apps))
print('Example of unique apps:', unique_apps[:10])

Number of duplicate apps: 1181
Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
Number of unique apps: 9659
Example of unique apps: ['Photo Editor & Candy Camera & Grid & ScrapBook', 'Coloring book moana', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint', 'Pixel Draw - Number Art Coloring Book', 'Paper flowers instructions', 'Smoke Effect Photo Maker - Smoke Editor', 'Infinite Painter', 'Garden Coloring Book', 'Kids Paint Free - Drawing Fun']
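
Checking `app in unique_apps` scans a list on every iteration, which gets slower as the list grows. A possible alternative, sketched here with a hypothetical `split_duplicates` helper and made-up sample names, tracks the names already seen in a set while producing the same two lists.

```python
def split_duplicates(names):
    """Split names into first occurrences and repeated occurrences."""
    seen = set()  # set membership checks take constant time on average
    unique, duplicates = [], []
    for name in names:
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
            unique.append(name)
    return unique, duplicates

# Hypothetical sample names
sample_names = ['Box', 'Slack', 'Box', 'Zenefits', 'Box']
unique_sample, duplicate_sample = split_duplicates(sample_names)
print(unique_sample)
print(duplicate_sample)
```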


Next, we will create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.


In [78]:
reviews_max ={}

for row in google_list: 
    app = row [0]
    n_reviews =  float(row [3])
    
    if app in reviews_max: 
        if reviews_max[app] < n_reviews:
            reviews_max[app]=n_reviews
    else:
        reviews_max[app]=n_reviews
print(len(reviews_max))        

9659


Using the information stored in the dictionary, we can create a new data set with only one entry per app (for each app, we select the entry with the highest number of reviews):

* Loop through google_list, and for every iteration:
* Assign the app name and the number of reviews from the row to the variables app and n_reviews.
* If the number of reviews matches the maximum stored in the reviews_max dictionary, and the app name is not yet in already_added, append the row to google_clean and the app name to already_added. The already_added check matters when several rows of the same app share the maximum review count.

In [79]:
google_clean =[]
already_added  =[]

for row in google_list:
    app = row [0]
    n_reviews =  float(row [3])
    
    if reviews_max[app] == n_reviews and app not in already_added:
        google_clean.append(row)
        already_added.append(app)

In [80]:
# Explore the google_clean, expect 9659 entries
len(google_clean)

9659
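
The two passes above can also be folded into one by storing the best row per app directly. This is only a sketch: `keep_most_reviewed` is a hypothetical helper, the sample rows are shortened for illustration, and ties keep the first row seen, which matches the effect of the `already_added` check.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Shortened illustrative rows: name, category, rating, reviews
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
clean_sample = keep_most_reviewed(sample_rows)
print(len(clean_sample))
```

Note that the result is ordered by each name's first appearance, which can differ from the row order produced by the two-pass version.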

## 2.3 Remove non-English apps, like 爱奇艺PPS -《欢乐颂2》电视剧热播

In [81]:
# Check whether an app name is written mostly in common English characters.
# If the name contains more than 3 characters outside the ASCII range (0 - 127),
# return False; otherwise return True.

def english_name(name):
    non_ascii = 0
    for character in name:
        if ord(character) > 127:
            non_ascii += 1
    # A threshold of 3 tolerates emoji and symbols such as ™,
    # though some non-English names can still pass the check
    return non_ascii <= 3

print (english_name('Instagram'))
print (english_name('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print (english_name('Docs To Go™ Free Office Suite'))
print (english_name('Instachat 😜'))

True
False
True
True
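
The same threshold rule can be written more compactly with `sum` over a generator, and the threshold can be made adjustable. The sketch below is an alternative formulation rather than the notebook's own code; `is_english` and its `max_non_ascii` parameter are hypothetical names.

```python
def is_english(name, max_non_ascii=3):
    """Return True if name has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
```

Lowering `max_non_ascii` makes the filter stricter, at the cost of rejecting names that contain a single emoji or trademark symbol.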


In [82]:
# Loop through each data set. If an app name is identified as English, append the whole row to a separate list.
def english_filter(app_list, column):
    english_list =[]
    for row in app_list:
        app = row[column]
        if english_name(app):
            english_list.append(row)
    return english_list

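# Note: column 2 of the App Store data is size_bytes (track_name is column 1),
# so this call checks a numeric string and every iOS row passes
Apple_english = english_filter(Apple_list,2)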
google_english = english_filter(google_clean,0)

print('Number of Android apps left: ', len(google_english))
print('Number of iOS apps left: ',len(Apple_english))

Number of Android apps left:  9614
Number of iOS apps left:  7197
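
Hard-coded column indices are easy to get wrong. One way to guard against that, assuming the header rows printed earlier, is to look up each index by column name. `column_index` is a hypothetical helper, and the headers below are shortened copies used only for illustration.

```python
def column_index(header, column_name):
    """Return the position of column_name in header (ValueError if missing)."""
    return header.index(column_name)

# Shortened copies of the two headers shown earlier
apple_header_sample = ['id', 'track_name', 'size_bytes', 'currency', 'price']
google_header_sample = ['App', 'Category', 'Rating', 'Reviews']

apple_name_idx = column_index(apple_header_sample, 'track_name')
google_name_idx = column_index(google_header_sample, 'App')
print(apple_name_idx, google_name_idx)
```

With the full headers, a call such as `english_filter(Apple_list, column_index(Apple_header, 'track_name'))` would filter on the app name itself.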


## 2.4 Isolate the free apps

In [83]:
# Loop through each data set to isolate the free apps in separate lists.
# After isolating the free apps, check the length of each list to see how many apps remain.


google_free =[]
for row in google_english:
        price = row[7]
        if price == '0':
            google_free.append(row)

Apple_free =[]
for row in Apple_english:
        price = row[4]
        if price == '0.0':
            Apple_free.append(row)
print(len(Apple_free))
print(len(google_free))

4056
8864


* We have 8864 Android apps and 4056 iOS apps left. That should be enough for our analysis.
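
The two loops compare price strings against `'0'` and `'0.0'`, which depends on how each data set formats its prices. A more format-tolerant sketch converts the price to a number first. `is_free` is a hypothetical helper, and the assumption that some prices carry a leading `$` is not confirmed by the output shown above.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price_text.strip().lstrip('$')  # assumed optional currency sign
    return float(cleaned) == 0.0

sample_prices = ['0', '0.0', '$4.99', '2.99']
free_flags = [is_free(p) for p in sample_prices]
print(free_flags)
```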

# 3. Most Common Apps by Genre

## Part 1

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:
* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we develop it further.
* If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

## Part 2

Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets. Let's begin the analysis by getting a sense of the most common genres in each market. We'll build two functions to analyze the frequency tables:
* One function to generate frequency tables that show percentages
* Another function to display the percentages in descending order

In [84]:
def freq_table(dataset, index):
    # Build {value: percentage of rows} for the column at the given index
    frequency_table = {}
    percentage = 100 / len(dataset)  # share contributed by a single row
    for row in dataset:
        key = row[index]
        if key in frequency_table:
            frequency_table[key] += percentage
        else:
            frequency_table[key] = percentage
    return frequency_table

In [85]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (round(table[key],2), key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    print(table_sorted) # printed as a single list to save screen space
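
As an alternative to the manual dictionary updates, the same percentages can be derived from counts with `collections.Counter`. This is a sketch with hypothetical names (`freq_table_pct`, `sorted_table`) and invented sample rows; it divides once per key instead of adding `100/len(dataset)` repeatedly, which avoids accumulating floating-point error.

```python
from collections import Counter

def freq_table_pct(dataset, index):
    """Return {value: percentage of rows} for the column at index."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}

def sorted_table(dataset, index):
    """Return (value, rounded percentage) pairs, most frequent first."""
    table = freq_table_pct(dataset, index)
    pairs = [(key, round(pct, 2)) for key, pct in table.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Invented sample rows: name, genre
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
result = sorted_table(sample, 1)
print(result)
```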

## Part 3

In [86]:
# Display the frequency table for the prime_genre column (index -5) of the App Store data set.
display_table(Apple_free, -5)

[(55.65, 'Games'), (8.23, 'Entertainment'), (4.12, 'Photo & Video'), (3.53, 'Social Networking'), (3.25, 'Education'), (2.98, 'Shopping'), (2.69, 'Utilities'), (2.32, 'Lifestyle'), (2.07, 'Finance'), (1.95, 'Sports'), (1.87, 'Health & Fitness'), (1.65, 'Music'), (1.63, 'Book'), (1.53, 'Productivity'), (1.43, 'News'), (1.38, 'Travel'), (1.06, 'Food & Drink'), (0.76, 'Weather'), (0.49, 'Reference'), (0.49, 'Navigation'), (0.49, 'Business'), (0.22, 'Catalogs'), (0.2, 'Medical')]


Now, we can analyze the frequency table generated for the prime_genre column of the App Store data set.

* What is the most common genre? Games, at 55.65%.
* What are the runners-up? Entertainment, Photo & Video, and Social Networking.
* What other patterns do you see? The App Store is dominated by apps made for fun.
* What is the general impression? Most of the apps are designed for entertainment (games, photo and video, social networking, sports, music).

In [87]:
# Display Genres and Category columns of the Google Play data set 
display_table(google_free, 1) # Category

[(18.91, 'FAMILY'), (9.72, 'GAME'), (8.46, 'TOOLS'), (4.59, 'BUSINESS'), (3.9, 'LIFESTYLE'), (3.89, 'PRODUCTIVITY'), (3.7, 'FINANCE'), (3.53, 'MEDICAL'), (3.4, 'SPORTS'), (3.32, 'PERSONALIZATION'), (3.24, 'COMMUNICATION'), (3.08, 'HEALTH_AND_FITNESS'), (2.94, 'PHOTOGRAPHY'), (2.8, 'NEWS_AND_MAGAZINES'), (2.66, 'SOCIAL'), (2.34, 'TRAVEL_AND_LOCAL'), (2.25, 'SHOPPING'), (2.14, 'BOOKS_AND_REFERENCE'), (1.86, 'DATING'), (1.79, 'VIDEO_PLAYERS'), (1.4, 'MAPS_AND_NAVIGATION'), (1.24, 'FOOD_AND_DRINK'), (1.16, 'EDUCATION'), (0.96, 'ENTERTAINMENT'), (0.94, 'LIBRARIES_AND_DEMO'), (0.93, 'AUTO_AND_VEHICLES'), (0.82, 'HOUSE_AND_HOME'), (0.8, 'WEATHER'), (0.71, 'EVENTS'), (0.65, 'PARENTING'), (0.64, 'ART_AND_DESIGN'), (0.62, 'COMICS'), (0.6, 'BEAUTY')]


In [88]:
display_table(google_free, 9) # Genres

[(8.45, 'Tools'), (6.07, 'Entertainment'), (5.35, 'Education'), (4.59, 'Business'), (3.89, 'Productivity'), (3.89, 'Lifestyle'), (3.7, 'Finance'), (3.53, 'Medical'), (3.46, 'Sports'), (3.32, 'Personalization'), (3.24, 'Communication'), (3.1, 'Action'), (3.08, 'Health & Fitness'), (2.94, 'Photography'), (2.8, 'News & Magazines'), (2.66, 'Social'), (2.32, 'Travel & Local'), (2.25, 'Shopping'), (2.14, 'Books & Reference'), (2.04, 'Simulation'), (1.86, 'Dating'), (1.85, 'Arcade'), (1.77, 'Video Players & Editors'), (1.76, 'Casual'), (1.4, 'Maps & Navigation'), (1.24, 'Food & Drink'), (1.13, 'Puzzle'), (0.99, 'Racing'), (0.94, 'Role Playing'), (0.94, 'Libraries & Demo'), (0.93, 'Auto & Vehicles'), (0.91, 'Strategy'), (0.82, 'House & Home'), (0.8, 'Weather'), (0.71, 'Events'), (0.68, 'Adventure'), (0.61, 'Comics'), (0.6, 'Beauty'), (0.6, 'Art & Design'), (0.5, 'Parenting'), (0.45, 'Card'), (0.43, 'Casino'), (0.42, 'Trivia'), (0.39, 'Educational;Education'), (0.38, 'Board'), (0.37, 'Education

Now, we can analyze the frequency tables generated for the Category and Genres columns of the Google Play data set.

* What are the most common genres? FAMILY, GAME, TOOLS, and BUSINESS in the Category column; Tools, Entertainment, and Education in the Genres column.
* What other patterns do you see? There are more practical apps than on the App Store.
* Google Play shows a more balanced landscape of both practical and for-fun apps.

# 4. Most Popular Apps by Number of Ratings per Genre on the App Store

The frequency tables we analyzed in the previous section showed us that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to get an idea about the kind of apps with the most users.

For the Google Play data set, we can find this information in the <strong>Installs</strong> column, but this information is missing for the App Store data set. As a workaround, we'll take the number of user ratings as a proxy. The App Store header lists two rating counts, <strong>rating_count_tot</strong> (index 5, all versions) and <strong>rating_count_ver</strong> (index 6, current version); the code below reads index 6.

In [89]:
apple_genre = freq_table(Apple_free, -5)

for genre in apple_genre:
    total = 0
    len_genre = 0
    for row in Apple_free:
        genre_app = row[-5]
        if genre_app == genre:
            n_rating = float(row[6])
            total += n_rating 
            len_genre +=1
    average_n_rating = int(total  / len_genre)
    print (genre, ":",average_n_rating)   

Social Networking : 646
Photo & Video : 417
Games : 609
Music : 618
Reference : 2471
Health & Fitness : 234
Weather : 1624
Utilities : 1618
Travel : 175
Shopping : 714
News : 62
Navigation : 228
Lifestyle : 1007
Entertainment : 179
Food & Drink : 457
Sports : 177
Book : 106
Finance : 298
Education : 683
Productivity : 225
Business : 183
Catalogs : 364
Medical : 36
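
The nested loops above rescan the whole data set once per genre. A single-pass version is sketched below; `average_by_group` is a hypothetical helper and the sample rows are invented, so treat it as an illustration of the idea rather than the notebook's method.

```python
def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the numeric column} computed in a single pass."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Invented sample rows: genre, number of ratings
sample = [
    ['Games', '100'],
    ['Games', '300'],
    ['Weather', '50'],
]
averages = average_by_group(sample, 0, 1)
print(averages)
```

Because a group only appears in `counts` after at least one row has been added, the division can never hit a zero count.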


On average, Reference, Weather, and Utilities apps have the highest number of user ratings, followed by Lifestyle, Shopping, and Education.

Let's take a closer look at individual genres, starting with Navigation.

In [90]:
for app in Apple_free:
    if app[-5] == 'Navigation':
        print(app[2], ':', app[6]) # app[2] is size_bytes and app[6] is rating_count_ver; track_name is app[1]

94139392 : 3040
120232960 : 1253
108166144 : 134
82534400 : 70
166025216 : 5
213586944 : 23
107509760 : 22
126867456 : 0
103106560 : 3
36229120 : 13
46950400 : 0
55756800 : 0
39891968 : 0
117473280 : 0
25631744 : 0
27655168 : 0
5177344 : 0
41207059 : 0
125665280 : 0
48708608 : 0


The Navigation average is heavily influenced by two apps with 3040 and 1253 ratings, while 11 of the 20 apps have no ratings at all. <br/>
The same pattern applies to Social Networking apps. Below is the detail for Reference apps.


In [91]:
for app in Apple_free:
    if app[-5] == 'Reference':
        print(app[2], ':', app[6]) # app[2] is size_bytes and app[6] is rating_count_ver; track_name is app[1]

92774400 : 5320
111275008 : 177
165748736 : 10176
65281024 : 27
100551680 : 706
52959232 : 17588
155593728 : 1125
596499456 : 60
90124288 : 8535
86874112 : 4693
85424128 : 245
27029504 : 58
34959360 : 7
89927680 : 718
10645504 : 1
125990912 : 2
44208128 : 0
57025536 : 0
225522688 : 0
31080448 : 0


The Reference genre seems to show some potential. <br/>
Other genres that seem popular include Book and Games.


Next, we exclude all apps with 30,000 or more ratings; the averages below show the popularity of second-tier iOS apps.

In [92]:
for genre in apple_genre:
    total = 0
    len_genre = 0
    for row in Apple_free:
        genre_app = row[-5]
        if genre_app == genre:
            n_rating = float(row[6])
            if n_rating < 30000:
                total += n_rating 
                len_genre +=1
    average_n_rating = int(total  / len_genre)
    print (genre, ":",average_n_rating)

Social Networking : 135
Photo & Video : 417
Games : 490
Music : 618
Reference : 2471
Health & Fitness : 234
Weather : 1624
Utilities : 640
Travel : 175
Shopping : 392
News : 62
Navigation : 228
Lifestyle : 66
Entertainment : 179
Food & Drink : 457
Sports : 177
Book : 106
Finance : 298
Education : 163
Productivity : 225
Business : 183
Catalogs : 364
Medical : 36


Among the second-tier apps, the top three are Reference, Weather, and Utilities. Runners-up include Music, Games, and Food & Drink.

# 5. Most Popular Apps by Number of Installs per Genre on Google Play

For the Google Play market, we can use the number of installs to understand app popularity.

In [93]:
display_table(google_free, 5) # install numbers

[(15.73, '1,000,000+'), (11.55, '100,000+'), (10.55, '10,000,000+'), (10.2, '10,000+'), (8.39, '1,000+'), (6.92, '100+'), (6.83, '5,000,000+'), (5.56, '500,000+'), (4.77, '50,000+'), (4.51, '5,000+'), (3.54, '10+'), (3.25, '500+'), (2.3, '50,000,000+'), (2.13, '100,000,000+'), (1.92, '50+'), (0.79, '5+'), (0.51, '1+'), (0.27, '500,000,000+'), (0.23, '1,000,000,000+'), (0.05, '0+'), (0.01, '0')]


In [94]:
google_genre = freq_table(google_free, 1) # keys are the values of the Category column (index 1)

for category in google_genre:
    total = 0
    len_category = 0
    for row in google_free:
        category_app = row[1]
        if category_app ==  category:
            n_installs =row[5]
            n_installs =n_installs.replace(',', '')
            n_installs =n_installs.replace('+', '')
            total +=float(n_installs)
            len_category +=1 
    average_installs = total / len_category
    print (category, ":" , round(average_installs,2) )       

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 647317.82
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 8767811.89
BUSINESS : 1712290.15
COMICS : 817657.27
COMMUNICATION : 38456119.17
DATING : 854028.83
EDUCATION : 1833495.15
ENTERTAINMENT : 11640705.88
EVENTS : 253542.22
FINANCE : 1387692.48
FOOD_AND_DRINK : 1924897.74
HEALTH_AND_FITNESS : 4188821.99
HOUSE_AND_HOME : 1331540.56
LIBRARIES_AND_DEMO : 638503.73
LIFESTYLE : 1437816.27
GAME : 15588015.6
FAMILY : 3695641.82
MEDICAL : 120550.62
SOCIAL : 23253652.13
SHOPPING : 7036877.31
PHOTOGRAPHY : 17840110.4
SPORTS : 3638640.14
TRAVEL_AND_LOCAL : 13984077.71
TOOLS : 10801391.3
PERSONALIZATION : 5201482.61
PRODUCTIVITY : 16787331.34
PARENTING : 542603.62
WEATHER : 5074486.2
VIDEO_PLAYERS : 24727872.45
NEWS_AND_MAGAZINES : 9549178.47
MAPS_AND_NAVIGATION : 4056941.77


The top three categories are communication, video players, and social, followed by photography, productivity, and game.

The average for communication apps is heavily skewed by a few very popular apps. <br/>
Below are two functions that show the apps in a category with at least, or fewer than, 100 million installs.

In [95]:
# function to show google apps with over 100M installs
def app_100m_plus(category):
    for app in google_free:
        if app[1] ==  category and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
           
            print(app[0], ':', app[5])
            
# function to show apps with less than 100M installs            
def app_100m_less(category):
    for app in google_free:
        if app[1] ==  category and (app[5] != '1,000,000,000+' 
                                    and app[5] != '500,000,000+' 
                                    and app[5] != '100,000,000+'):
            
            print(app[0], ':', app[5])
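
The two functions above list the install buckets explicitly. A more general sketch compares the parsed number against a threshold instead, so other cut-offs can be tried without editing the conditions. `apps_by_installs` and the sample rows are hypothetical; the rows only mimic the Google Play column order (App at index 0, Category at 1, Installs at 5).

```python
def apps_by_installs(rows, category, threshold=100_000_000, at_least=True):
    """Return (name, installs) pairs in category at/above or below threshold."""
    selected = []
    for row in rows:
        if row[1] != category:
            continue
        installs = float(row[5].replace(',', '').replace('+', ''))
        # Keep the row when its side of the threshold matches the request
        if (installs >= threshold) == at_least:
            selected.append((row[0], row[5]))
    return selected

# Invented rows; unused columns are left empty
sample = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '50,000,000+'],
    ['App C', 'SOCIAL', '', '', '', '500,000,000+'],
]
big = apps_by_installs(sample, 'COMMUNICATION')
small = apps_by_installs(sample, 'COMMUNICATION', at_least=False)
print(big)
print(small)
```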

Applying the function app_100m_plus, we find that the communication average is heavily influenced by a few apps with over 100 million installs. <br/>
We see the same pattern for the video players and social categories.

In [96]:
app_100m_plus("COMMUNICATION")

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Next, we exclude all apps with 100 million or more installs; the averages below show the popularity of second-tier Google Play apps.

In [97]:
for category in google_genre:
    total = 0
    len_category = 0
    for row in google_free:
        category_app = row[1]
        if category_app ==  category :
            n_installs =row[5]
            n_installs =n_installs.replace(',', '')
            n_installs =n_installs.replace('+', '')
            if float(n_installs) < 100000000:
                total +=float(n_installs)
                len_category +=1 
    average_installs = total / len_category
    print (category, ":" , round(average_installs,2) )  

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 647317.82
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 1437212.22
BUSINESS : 1226918.74
COMICS : 817657.27
COMMUNICATION : 3603485.39
DATING : 854028.83
EDUCATION : 1833495.15
ENTERTAINMENT : 6118250.0
EVENTS : 253542.22
FINANCE : 1086125.79
FOOD_AND_DRINK : 1924897.74
HEALTH_AND_FITNESS : 2005713.66
HOUSE_AND_HOME : 1331540.56
LIBRARIES_AND_DEMO : 638503.73
LIFESTYLE : 1152128.78
GAME : 6272564.69
FAMILY : 2342897.53
MEDICAL : 120550.62
SOCIAL : 3084582.52
SHOPPING : 4640920.54
PHOTOGRAPHY : 7670532.29
SPORTS : 2994082.55
TRAVEL_AND_LOCAL : 2944079.63
TOOLS : 3191461.13
PERSONALIZATION : 2549775.83
PRODUCTIVITY : 3379657.32
PARENTING : 542603.62
WEATHER : 5074486.2
VIDEO_PLAYERS : 5544878.13
NEWS_AND_MAGAZINES : 1502841.88
MAPS_AND_NAVIGATION : 2484104.75


Among the second-tier apps, the top three are photography, game, and entertainment. Runners-up include video players, weather, and shopping.

From the above analysis, we notice that photography apps have potential: the category ranks near the top both with and without the most-installed apps.

# 6. Conclusions

In this project, we analyzed data about App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable in both markets.

We concluded that photography, game, and book/reference apps have potential.