# Profitable App Profiles for the App Store and Google Play Markets

* Our company builds only free apps aimed at an English-speaking audience.
* Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## 1. Opening and Exploring the Data


To avoid spending resources on collecting new data ourselves, we should first see whether any relevant existing data is available at no cost. Luckily, there are two data sets that seem suitable for our goals:

* The Google Play data set contains data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this [link](https://www.kaggle.com/lava18/google-play-store-apps).

* The App Store data set contains data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this [link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

Define a function open_data() that opens each of the two data sets mentioned above and imports the data from the CSV file into a list of rows.

In [1]:
from csv import reader

def open_data(filename):
    # Read the CSV file and return its rows as a list of lists;
    # the with block closes the file once reading is done
    with open(filename, encoding="utf8") as open_file:
        return list(reader(open_file))

Apple_list = open_data('AppleStore.csv')
google_list = open_data('googleplaystore.csv')

# Separate the header row from the data rows
Apple_header = Apple_list[0]
Apple_list = Apple_list[1:]
google_header = google_list[0]
google_list = google_list[1:]

Explore both data sets using the explore_data() function defined below.
* Print the first few rows of each data set.
* Find the number of rows and columns of each data set.
* Print the column names and try to identify the columns that could help us with our analysis.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(Apple_header)    
print('\n')
explore_data(Apple_list,0,3,True)    
print('\n')
print(google_header)
print('\n')
explore_data(google_list,0,3,True) 

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,

# Data Cleaning

1. Removed inaccurate data
2. Detect duplicate data, and remove the duplicates
3. Removed non-English apps,like Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠.
4. Isolated the free apps

###  1. Removed inaccurate data

In [3]:
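# From the discussion section of the Google Play data set, we know that row 10472 of google_list
# is malformed: its Category value is missing, so every later column is shifted left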
print("The normal row:" , google_list[2])
print("\n")
print("The error row:" , google_list[10472])

The normal row: ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


The error row: ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [4]:
# Remove the malformed row using del
del google_list[10472]  # run this cell only once; rerunning it would delete a different, valid row
print(google_header) 
explore_data(google_list,0,2,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10840
Number of columns: 13
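
Deleting by a hard-coded index only works if the cell runs exactly once. As a hedged alternative, the malformed row can be detected by its length, since it has fewer fields than the header. The sketch below uses made-up toy rows, and `drop_malformed_rows` is a hypothetical helper, not part of the original notebook.

```python
def drop_malformed_rows(rows, header):
    """Split rows into those matching the header's length and the rest."""
    expected = len(header)
    kept, dropped = [], []
    for row in rows:
        if len(row) == expected:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped

# Toy data, made up for illustration; the second row lacks its Category value
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

kept_rows, dropped_rows = drop_malformed_rows(sample_rows, sample_header)
print(len(kept_rows), len(dropped_rows))  # prints: 2 1
```

Because the check depends only on the data, rerunning it is harmless, unlike a repeated del.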


### 2. Detect duplicate data, and remove the duplicates

In [5]:
# Some apps have duplicate entries. For instance, Instagram has four entries:
for app in google_list:
    if app[0]=='Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


The Google Play data set has duplicate entries. In the code below, we:
* Create two lists: one for storing the names of duplicate apps, and one for storing the names of unique apps.
* Loop through the google_list data set and, on each iteration, save the app name to a variable named app.
* If app is already in the unique_apps list, append it to the duplicate_apps list.
* Otherwise (if app isn't in the unique_apps list yet), append it to the unique_apps list.

In [6]:
unique_apps= []
duplicate_apps = []

for row in google_list: 
    app = row [0]
    if app in unique_apps: 
        duplicate_apps.append(app)
    else:
        unique_apps.append(app)

In [7]:
print('Number of duplicate apps:', len(duplicate_apps))
print('Example of duplicate apps:', duplicate_apps[:10])

print('Number of unique apps:', len(unique_apps))
print('Example of unique apps:', unique_apps[:10])

Number of duplicate apps: 1181
Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
Number of unique apps: 9659
Example of unique apps: ['Photo Editor & Candy Camera & Grid & ScrapBook', 'Coloring book moana', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint', 'Pixel Draw - Number Art Coloring Book', 'Paper flowers instructions', 'Smoke Effect Photo Maker - Smoke Editor', 'Infinite Painter', 'Garden Coloring Book', 'Kids Paint Free - Drawing Fun']


To remove the duplicates, we will:

* Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.


In [8]:
reviews_max ={}

for row in google_list: 
    app = row [0]
    n_reviews =  float(row [3])
    
    if app in reviews_max: 
        if reviews_max[app] < n_reviews:
            reviews_max[app]=n_reviews
    else:
        reviews_max[app]=n_reviews

* Use the information stored in the dictionary to create a new data set with only one entry per app (for each app, we keep only the entry with the highest number of reviews).
* Loop through google_list, and for every iteration:
* Assign the app name and the number of reviews from the row to the variables app and n_reviews.
* If the number of reviews matches the maximum stored in the reviews_max dictionary and the app name is not yet in already_added, append the row to google_clean and the app name to already_added.

In [9]:
google_clean =[]
already_added  =[]

for row in google_list:
    app = row [0]
    n_reviews =  float(row [3])
    
    if reviews_max[app] == n_reviews and app not in already_added:
        google_clean.append(row)
        already_added.append(app)

In [10]:
# Explore the google_clean, expect 9659 entries
len(google_clean)

9659
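
The two-pass approach above (build reviews_max, then filter) can also be sketched as a single pass that keeps the best row per app in a dictionary. This is only an illustrative alternative: `dedupe_by_max_reviews` is a hypothetical name, the rows are trimmed to four columns, and the Box row is made up. The kept rows come out in order of each app's first appearance, which can differ from the row order of google_clean.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict > keeps the earliest row when several rows tie for the maximum
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Toy rows trimmed to four columns; the Box row is invented for illustration
toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '1000'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

deduped = dedupe_by_max_reviews(toy_rows)
for row in deduped:
    print(row)
```

Dictionary lookups avoid the repeated list scans of `app not in already_added`, which may matter on larger data sets.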

### 3. Removed non-English apps,like Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠.

In [11]:
# Check if app name is in common English characters,
# If app name contains 3+  characters that fall outside the ASCII range (0 - 127), return False, else return True

def english_name(str):
    no_english = 0
    for charactor in str:
        if ord(charactor)>127:
            no_english +=1
    
    if  no_english >= 2:  # in solution, the condition is >3. it keeps many none english apps
        return False
    else:
        return True

print (english_name('Instagram'))
print (english_name('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))
print (english_name('Docs To Go‚Ñ¢ Free Office Suite'))
print (english_name('Instachat üòú'))

True
False
True
True


In [12]:
# Loop through each data set. If an app name is identified as English, append the whole row to a separate list.
def english_filter(app_list, column):
    english_list =[]
    for row in app_list:
        app = row[column]
        if english_name(app):
            english_list.append(row)
    return english_list

Apple_english = english_filter(Apple_list,2)
google_english = english_filter(google_clean,0)

print(Apple_header) 
explore_data (Apple_english, 0, 3, True)
print ("\n")
print(google_header) 
explore_data (google_english, 0, 3, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 6100
Number of columns: 17


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+

### 4. Isolated the free apps

In [13]:
# Loop through each data set to isolate the free apps in separate lists.
# After isolating the free apps, check the length of each list to see how many apps remain.

Apple_free = []
for row in Apple_english:
    price = float(row[5])
    if price == 0.0:
        Apple_free.append(row)

google_free = []
for row in google_english:
    app_type = row[6]
    if app_type == 'Free':
        google_free.append(row)

print(len(Apple_free))
print(len(google_free))

3169
8780


* We have 8,780 Android apps and 3,169 iOS apps left. That should be enough for our analysis.

## Most Common Apps by Genre

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:
* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we develop it further.
* If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For this, let's begin the analysis by getting a sense of what are the most common genres for each market. We'll build two functions we can use to analyze the frequency tables:
* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in a descending order

In [14]:
def freq_table(dataset, index):
    frequency_table = {}
    percentage = 100 / len(dataset)  # share contributed by a single row
    for row in dataset:
        key = row[index]
        if key in frequency_table:
            frequency_table[key] += percentage
        else:
            frequency_table[key] = percentage
    return frequency_table

In [15]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (round(table[key], 2), key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    print(table_sorted)  # print the whole list on one line to save screen space
    # To print one entry per line instead:
    # for entry in table_sorted:
    #     print(entry[1], ':', entry[0])
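
For comparison, the same two steps (count, then sort by percentage) can be sketched with collections.Counter from the standard library. The function names and the toy rows below are assumptions for illustration, not part of the original notebook.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Map each value in column `index` to its percentage of all rows."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}

def sorted_percentages(dataset, index):
    """Return (percentage, value) tuples in descending order of percentage."""
    table = freq_table_counter(dataset, index)
    return sorted(((round(pct, 2), key) for key, pct in table.items()), reverse=True)

# Toy rows, made up for illustration
toy_apps = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
print(sorted_percentages(toy_apps, 1))  # prints: [(75.0, 'Games'), (25.0, 'Music')]
```

Dividing each count once at the end, rather than adding a per-row fraction repeatedly, also avoids accumulating small floating-point errors.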

In [16]:
# Display the frequency table for the prime_genre column of the App Store data set.
display_table(Apple_free, -5)  # display_table() prints the table and returns None, so there is nothing to assign

[(58.54, 'Games'), (7.83, 'Entertainment'), (5.05, 'Photo & Video'), (3.72, 'Education'), (3.28, 'Social Networking'), (2.52, 'Shopping'), (2.4, 'Utilities'), (2.18, 'Sports'), (2.05, 'Music'), (1.99, 'Health & Fitness'), (1.7, 'Productivity'), (1.55, 'Lifestyle'), (1.33, 'News'), (1.14, 'Travel'), (1.1, 'Finance'), (0.85, 'Weather'), (0.82, 'Food & Drink'), (0.54, 'Reference'), (0.54, 'Business'), (0.38, 'Book'), (0.19, 'Navigation'), (0.19, 'Medical'), (0.13, 'Catalogs')]


Now we can analyze the frequency table generated for the prime_genre column of the App Store data set.

* What is the most common genre? Games, with 58.54% of the free English apps.
* What is the runner-up? Entertainment (7.83%), followed by Photo & Video (5.05%), Education (3.72%), and Social Networking (3.28%).
* What other patterns do you see? The App Store is dominated by apps for fun.
* What is the general impression? Most of the apps are designed for entertainment (games, photo and video, social networking, sports, music).

In [17]:
# Display the Category and Genres columns of the Google Play data set
display_table(google_free, 1) # Category

[(18.94, 'FAMILY'), (9.66, 'GAME'), (8.46, 'TOOLS'), (4.64, 'BUSINESS'), (3.93, 'PRODUCTIVITY'), (3.91, 'LIFESTYLE'), (3.71, 'FINANCE'), (3.54, 'MEDICAL'), (3.33, 'SPORTS'), (3.3, 'PERSONALIZATION'), (3.26, 'COMMUNICATION'), (3.1, 'HEALTH_AND_FITNESS'), (2.97, 'PHOTOGRAPHY'), (2.8, 'NEWS_AND_MAGAZINES'), (2.69, 'SOCIAL'), (2.33, 'TRAVEL_AND_LOCAL'), (2.26, 'SHOPPING'), (2.15, 'BOOKS_AND_REFERENCE'), (1.86, 'DATING'), (1.8, 'VIDEO_PLAYERS'), (1.38, 'MAPS_AND_NAVIGATION'), (1.23, 'FOOD_AND_DRINK'), (1.17, 'EDUCATION'), (0.96, 'ENTERTAINMENT'), (0.93, 'LIBRARIES_AND_DEMO'), (0.92, 'AUTO_AND_VEHICLES'), (0.8, 'HOUSE_AND_HOME'), (0.79, 'WEATHER'), (0.72, 'EVENTS'), (0.65, 'ART_AND_DESIGN'), (0.64, 'PARENTING'), (0.6, 'BEAUTY'), (0.58, 'COMICS')]


In [18]:
display_table(google_free, 9) # Genres

[(8.45, 'Tools'), (6.07, 'Entertainment'), (5.38, 'Education'), (4.64, 'Business'), (3.93, 'Productivity'), (3.9, 'Lifestyle'), (3.71, 'Finance'), (3.54, 'Medical'), (3.39, 'Sports'), (3.3, 'Personalization'), (3.26, 'Communication'), (3.1, 'Health & Fitness'), (3.1, 'Action'), (2.97, 'Photography'), (2.8, 'News & Magazines'), (2.69, 'Social'), (2.32, 'Travel & Local'), (2.26, 'Shopping'), (2.15, 'Books & Reference'), (2.05, 'Simulation'), (1.86, 'Dating'), (1.83, 'Arcade'), (1.78, 'Video Players & Editors'), (1.74, 'Casual'), (1.38, 'Maps & Navigation'), (1.23, 'Food & Drink'), (1.14, 'Puzzle'), (1.0, 'Racing'), (0.95, 'Role Playing'), (0.93, 'Libraries & Demo'), (0.92, 'Auto & Vehicles'), (0.91, 'Strategy'), (0.8, 'House & Home'), (0.79, 'Weather'), (0.72, 'Events'), (0.66, 'Adventure'), (0.6, 'Beauty'), (0.6, 'Art & Design'), (0.57, 'Comics'), (0.49, 'Parenting'), (0.44, 'Card'), (0.41, 'Casino'), (0.4, 'Trivia'), (0.4, 'Educational;Education'), (0.39, 'Board'), (0.38, 'Educational'

Now we can analyze the frequency tables generated for the Category and Genres columns of the Google Play data set.

* What are the most common genres? By Category: FAMILY, GAME, TOOLS, and BUSINESS. By Genres: Tools, Entertainment, Education, and Business.
* What other patterns do you see? There are more practical apps than on the App Store.
* Google Play shows a more balanced landscape of both practical and for-fun apps.

## Most Popular Apps by Number of Ratings per Genre on the App Store

The frequency tables we analyzed in the previous section showed us that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to get an idea about the kinds of apps with the most users.

We will generate a frequency table to calculate average number of user ratings per app genre on the App Store:

In [19]:
apple_genre = freq_table(Apple_free, -5)

for genre in apple_genre:
    total = 0
    len_genre = 0
    for row in Apple_free:
        genre_app = row[-5]
        if genre_app == genre:
            n_rating = float(row[6])
            total += n_rating 
            len_genre +=1
    average_n_rating = int(total  / len_genre)
    print (genre, ":",average_n_rating)      


Productivity : 21799
Weather : 54215
Shopping : 27816
Reference : 79350
Finance : 32367
Music : 58205
Utilities : 19900
Travel : 31358
Social Networking : 72916
Sports : 23008
Health & Fitness : 24037
Games : 22985
Food & Drink : 33333
News : 21750
Book : 46384
Photo & Video : 28441
Entertainment : 14364
Business : 7491
Lifestyle : 16739
Education : 7003
Navigation : 86090
Medical : 612
Catalogs : 4004
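
The nested loop above rescans the whole data set once per genre. As a hedged sketch, the same averages can be accumulated in a single pass with two dictionaries. `average_by_key` is a hypothetical helper and the rows are toy data.

```python
from collections import defaultdict

def average_by_key(rows, key_idx, value_idx):
    """Average the numeric column value_idx for each distinct value of key_idx."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[key_idx]
        totals[key] += float(row[value_idx])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Toy rows (genre, number of ratings), made up for illustration
toy_ratings = [['Games', '100'], ['Games', '300'], ['Music', '50']]
averages = average_by_key(toy_ratings, 0, 1)
print(averages)  # prints: {'Games': 200.0, 'Music': 50.0}
```

With the notebook's data, a call such as average_by_key(Apple_free, -5, 6) would be the analogous usage.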


On average, Navigation, Reference, and Social Networking apps have the highest number of user ratings, followed by Music, Weather, Book, Food & Drink, Finance, Travel, Photo & Video, and Shopping. Let's take a closer look at the apps in the most popular genres.

In [20]:
for app in Apple_free:
    if app[-5] == 'Navigation':
        print(app[2], ':', app[6]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


Navigation apps have the highest average number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user ratings together. The same pattern applies to Social Networking apps. Below is the detail for Reference apps.


In [21]:
for app in Apple_free:
    if app[-5] == 'Reference':
        print(app[2], ':', app[6]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


The Reference genre seems to show some potential, although Bible and Dictionary.com account for most of its ratings.  <BR/>
Other genres that seem popular include Weather, Book, Food & Drink, and Finance.
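
Since a couple of giants dominate some genre averages, the median can serve as a rough sanity check. The sketch below takes the six Navigation rating counts printed earlier and compares the two statistics. Using the median is an addition for illustration, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps listed above
navigation_ratings = [345046, 12811, 187, 5, 3582, 154911]

nav_mean = round(mean(navigation_ratings), 2)
nav_median = median(navigation_ratings)

print('mean:', nav_mean)      # prints: mean: 86090.33
print('median:', nav_median)  # prints: median: 8196.5
```

The mean is roughly ten times the median here, which is consistent with Waze and Google Maps pulling the average up.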

Next, we exclude any app with 100 million or more ratings and recompute the average number of ratings per genre on the App Store. No iOS app in the data set reaches that threshold, so the averages below match the ones above.

In [22]:
for genre in apple_genre:
    total = 0
    len_genre = 0
    for row in Apple_free:
        genre_app = row[-5]
        if genre_app == genre:
            n_ratings = float(row[6])  # rating_count_tot
            if n_ratings < 100000000:
                total += n_ratings
                len_genre += 1
    average_ratings = total / len_genre
    print(genre, ":", round(average_ratings, 2))

Productivity : 21799.15
Weather : 54215.3
Shopping : 27816.2
Reference : 79350.47
Finance : 32367.03
Music : 58205.03
Utilities : 19900.47
Travel : 31358.5
Social Networking : 72916.55
Sports : 23008.9
Health & Fitness : 24037.63
Games : 22985.21
Food & Drink : 33333.92
News : 21750.07
Book : 46384.92
Photo & Video : 28441.54
Entertainment : 14364.77
Business : 7491.12
Lifestyle : 16739.35
Education : 7003.98
Navigation : 86090.33
Medical : 612.0
Catalogs : 4004.0


Since the filter removes no apps, the ranking is unchanged: the top three genres are Navigation, Reference, and Social Networking. <BR>
Runners-up include Music, Weather, Book, Food & Drink, Finance, Travel, Photo & Video, and Shopping. <BR>

## Most Popular Apps by Number of Installs per Category on Google Play

For the Google Play market, we can use the number of installs to understand app popularity.

In [23]:
display_table(google_free, 5) # install numbers

[(15.75, '1,000,000+'), (11.51, '100,000+'), (10.63, '10,000,000+'), (10.19, '10,000+'), (8.37, '1,000+'), (6.94, '100+'), (6.87, '5,000,000+'), (5.56, '500,000+'), (4.77, '50,000+'), (4.49, '5,000+'), (3.51, '10+'), (3.2, '500+'), (2.3, '50,000,000+'), (2.14, '100,000,000+'), (1.92, '50+'), (0.79, '5+'), (0.51, '1+'), (0.27, '500,000,000+'), (0.23, '1,000,000,000+'), (0.05, '0+')]
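
The install counts above are open-ended buckets stored as strings (for example, '1,000,000+' means at least one million installs), so they need to be converted before averaging. A minimal sketch of that conversion follows. `parse_installs` is a hypothetical helper name, and treating each bucket as its lower bound is an assumption that understates the true install counts.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to its lower bound as an int."""
    return int(text.replace(',', '').replace('+', ''))

for bucket in ['1,000,000+', '500+', '0+']:
    print(bucket, '->', parse_installs(bucket))
```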


In [24]:
google_genre = freq_table(google_free, 1)  # index 1 is the Category column

for category in google_genre:
    total = 0
    len_category = 0
    for row in google_free:
        category_app = row[1]
        if category_app ==  category:
            n_installs =row[5]
            n_installs =n_installs.replace(',', '')
            n_installs =n_installs.replace('+', '')
            total +=float(n_installs)
            len_category +=1 
    average_installs = total / len_category
    print (category, ":" , round(average_installs,2) )       

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 654074.83
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 8814199.79
BUSINESS : 1712290.15
COMICS : 859042.16
COMMUNICATION : 38590581.09
DATING : 861409.55
EDUCATION : 1833495.15
ENTERTAINMENT : 11767380.95
EVENTS : 253542.22
FINANCE : 1365500.4
FOOD_AND_DRINK : 1951283.81
HEALTH_AND_FITNESS : 4204220.23
HOUSE_AND_HOME : 1380033.73
LIBRARIES_AND_DEMO : 645070.85
LIFESTYLE : 1447458.98
GAME : 15593824.69
FAMILY : 3719532.88
MEDICAL : 121161.88
SOCIAL : 23253652.13
SHOPPING : 7072366.59
PHOTOGRAPHY : 17840110.4
SPORTS : 3750580.64
TRAVEL_AND_LOCAL : 14120454.08
TOOLS : 10902378.83
PERSONALIZATION : 5273184.1
PRODUCTIVITY : 16787331.34
PARENTING : 552875.18
WEATHER : 5212877.1
VIDEO_PLAYERS : 24878048.86
NEWS_AND_MAGAZINES : 9626407.36
MAPS_AND_NAVIGATION : 4115374.21


The top three categories are COMMUNICATION, VIDEO_PLAYERS, and SOCIAL, followed by PHOTOGRAPHY, PRODUCTIVITY, GAME, TRAVEL_AND_LOCAL, ENTERTAINMENT, TOOLS, NEWS_AND_MAGAZINES, BOOKS_AND_REFERENCE, and SHOPPING.

The average number of installs for communication apps is heavily skewed by a few very popular apps. <br/>
Below are two functions that list the apps with at least 100 million installs and the apps with fewer than 100 million installs.

In [25]:
# Install buckets that correspond to 100M installs or more
INSTALLS_100M_PLUS = ('1,000,000,000+', '500,000,000+', '100,000,000+')

# function to show Google Play apps with 100M or more installs
def app_100m_plus(category):
    for app in google_free:
        if app[1] == category and app[5] in INSTALLS_100M_PLUS:
            print(app[0], ':', app[5])

# function to show Google Play apps with fewer than 100M installs
def app_100m_less(category):
    for app in google_free:
        if app[1] == category and app[5] not in INSTALLS_100M_PLUS:
            print(app[0], ':', app[5])

Applying app_100m_plus(), we find that the COMMUNICATION average is heavily influenced by a few apps with 100 million or more installs. <br/>
We see the same pattern for the video players and social categories.


In [26]:
app_100m_plus("COMMUNICATION")

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Me

Next, we exclude all apps with 100 million or more installs. The averages below show the popularity of the remaining, second-tier Google Play apps.

In [27]:
for category in google_genre:
    total = 0
    len_category = 0
    for row in google_free:
        category_app = row[1]
        if category_app ==  category :
            n_installs =row[5]
            n_installs =n_installs.replace(',', '')
            n_installs =n_installs.replace('+', '')
            if float(n_installs) < 100000000:
                total +=float(n_installs)
                len_category +=1 
    average_installs = total / len_category
    print (category, ":" , round(average_installs,2) )  

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 654074.83
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 1445020.43
BUSINESS : 1226918.74
COMICS : 859042.16
COMMUNICATION : 3617398.42
DATING : 861409.55
EDUCATION : 1833495.15
ENTERTAINMENT : 6183037.97
EVENTS : 253542.22
FINANCE : 1062009.64
FOOD_AND_DRINK : 1951283.81
HEALTH_AND_FITNESS : 2013140.38
HOUSE_AND_HOME : 1380033.73
LIBRARIES_AND_DEMO : 645070.85
LIFESTYLE : 1159293.65
GAME : 6232358.66
FAMILY : 2356326.97
MEDICAL : 121161.88
SOCIAL : 3084582.52
SHOPPING : 4664914.95
PHOTOGRAPHY : 7670532.29
SPORTS : 3086791.54
TRAVEL_AND_LOCAL : 2973465.43
TOOLS : 3221943.24
PERSONALIZATION : 2585898.54
PRODUCTIVITY : 3379657.32
PARENTING : 552875.18
WEATHER : 5212877.1
VIDEO_PLAYERS : 5575380.67
NEWS_AND_MAGAZINES : 1514799.22
MAPS_AND_NAVIGATION : 2503867.9


Among the second-tier apps, the top three categories are PHOTOGRAPHY, GAME, and ENTERTAINMENT. <BR>
Runners-up include VIDEO_PLAYERS, WEATHER, SHOPPING, COMMUNICATION, PRODUCTIVITY, TOOLS, SPORTS, and SOCIAL. <BR>

From the above analysis, we notice that photography apps have potential: the category ranks fourth by average installs across all apps and first among the second-tier apps.

# Conclusions

In this project, we analyzed data about App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable in both markets. We concluded that a photography app has potential.