# Profitable App Store Project
---
This project will determine the most profitable apps on the iOS App Store and Android's Google Play Store. I will be garnering profitability insights from the app store/ google play store datasets. Skills from this project include:
    1. Data Wrangling: How to import and clean a data set.
    2. Data Analytics: How to use data science techniques to retrieve insights from data sets. 

# Step 1: Import Data 
---
- In the first step of this project we look to import the data for analysis. Both Google Play Store data for Android and AppStore data for iOS. 

In [71]:
from csv import reader

### Google Play data set ###
opened_file = open("googleplaystore.csv", encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

In [73]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  #prints new line for space
        
    if rows_and_columns:
        print('Number of Rows:', len(dataset))
        print('Number of Columns:', len(dataset[0]))

#See first 3 rows of each data set. 
print(ios_header)
print('\n')
explore_data(ios,0,3, True)  


print(android_header)
print('\n')
explore_data(android,0,3, True)  

    



['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of Rows: 7197
Number of Columns: 17
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,00

In [3]:
print(len(android))
del android[10472] 
print(len(android))

10841
10840


# Step 2A: Clean Data 
---
- In the first step of the project, we clean the data in this data set. 
- I then explore the data using the **explore function** and determine incorrect or missing data and delete them. 
    - From the data source, it was determined that there was missing average reviews on line 1072, shifting all columns in that line to the left. It was deleted. 
- We also identifying duplicate apps located within the data set, and then determining what makes them different. The main differences between the duplicares are the amount of reviews, which also identify when this data was gathered. As a result, the most recent iteration of the duplicate will be kept in the data set, and the others will be deleted. I will do this by deleting the variants that have the least amount of reviews. 


In [69]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

    
print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))


Expected length: 9659
Actual length: 9659


In [68]:
android_clean = []
already_added = []

for appname in android:
    name = appname[0]
    n_reviews = float(appname[3])

    if (n_reviews == reviews_max[name]) and (name not in already_added):
            android_clean.append(appname)
            already_added.append(name)

explore_data(android_clean, 0, 3, True)
#print('Number of duplicate apps:',len(duplicates))
#print('Number of remaining apps:',len(cleaned))
#print('Duplicate Apps:', duplicates[:3])



['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of Rows: 9659
Number of Columns: 13


Step 2B: Removing Non-English Apps
---
___
After removing the apps that are duplicated and apps with missing data, we now focus on removing non english apps. We will do this by putting to use the ASCII: American Standard Code for Information Interchange 

- Without diving too much into the ASCII, more information on it can be seen [here](https://www.ascii-code.com/). We will be using it to determine which characters are not a part of the english language, which app names include them in the title, and use this factor to determine non english apps and remove them from the dataset. 

In [67]:
def english_test(string):
    
    for character in string: 
        if ord(character) > 127:
            return False
    return True 
    
print(english_test('Instagram'))
print(english_test('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_test('Instachat 😜'))

True
False
False


As mentioned previously, the ASCII codes characters as a respective ASCII number. 'a' is 97 and 'A' is 65. all characters used within the english language end at 127. Thus, anything over 127 is tagged as a non english character as it iterates over each character in the input string passed through the function. 

We can see that there's a slight problem however, as emojis are over 127 in their ASCII designation. Thus we will have to make slight edits to this rule. 

In [66]:
def english_app(string):
    non_ascii = 0
    for character in string: 
        if ord(character) > 127:
            non_ascii += 1 
    if non_ascii > 3:
        return False
    return True 
print(english_app('Instagram'))
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_app('Instachat 😜'))


True
False
True


- I have decided to update the criteria of what is classified as an english app and what isn't, increasing the counter to 3 instead of only one character over 127. 
- As a result, apps with up to 3 emojis will be kept as English apps. Otherwise, app titles with fall names in another language (more than 3 characters within the name) will be classified as non english. 
- We will now apply it to our data set and determine how many apps remain that are english 

In [140]:
android_english = []
android_nonenglish = []
ios_english = []
ios_nonenglish = []

for app in android_clean:
    name = app[0] 
    if english_app(name) == True:
        android_english.append(app)
    else:
        android_nonenglish.append(app)

for app in ios:
    name = app[0] 
    if english_app(name) == True:
        ios_english.append(app)
    else:
        ios_nonenglish.append(app)

explore_data(android_english, 0,1, True)
#explore_data(android_nonenglish, 0,1, True)
print('\n')
explore_data(ios_english, 0,1, True)
print('\n')

#explore_data(ios_nonenglish, 0,1, True)
print('Number of English Android Apps:', len(android_english))
print('Number of Non - English Android Apps:', len(android_nonenglish))
print('\n')

print('Number of English ios Apps:', len(ios_english))
print('Number of Non - English ios Apps:', len(ios_nonenglish))
            

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of Rows: 9614
Number of Columns: 13


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


Number of Rows: 7197
Number of Columns: 17


Number of English Android Apps: 9614
Number of Non - English Android Apps: 45


Number of English ios Apps: 7197
Number of Non - English ios Apps: 0


Step 2C: Removing Paid Apps
---
___
This analysis will be focused on apps that are free and gain revenue form in app purchases and advertising. I've chosen this decision as there will need to be a deeper analysis per genre to determine profitability of apps per genre. As someone is more than likely to buy certain apps more than others. Paying 5 dollars for a Bible App might be viewed as less valuable than paying 5 dollars for a well made financial responsibility app. The quality of the app begins to become a more significant factor in the price than the genre itself, which is our focus. 

In addition, it is not certain that paid apps bring in more revenue than non paid apps. There are microtrasactions in free mobile games that has generated hundreds of millions, and there are paid apps that have less than a million purchases. There are also paid apps with in app purchases which affects our analysis in determining app profitability given the data we are currently using.  

As a result, I will remove paid apps and focus on analysing free apps. This will be done by identifying the columns associated with the app cost on both android and iOS. 


In [114]:
ios_freeapps = []
android_freeapps = []

for app in android_english:
    app_cost = app[6]
    if app_cost == 'Free': 
        android_freeapps.append(app)
for app in ios_english:
    app_cost = app[5]
    if app_cost == '0':
        ios_freeapps.append(app)
        
print(android_freeapps[0])
print(ios_freeapps[0])
print('Number of Free Android Apps:', len(android_freeapps))
print('Number of Free iOS apps:', len(ios_freeapps))
android_final = android_freeapps
ios_final = ios_freeapps

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']
Number of Free Android Apps: 8863
Number of Free iOS apps: 4056


Step 3A: Profitability Analysis 
---
___
As mentioned previously This analysis will be focused on apps that are free and gain revenue form in app purchases and advertising. 

To minimize the risk of developing an unsuccessful app we will be looking to focus on 3 seperate steps. 
   1. Build minimal version of an android app for Google Play
   2. Determine reception, if positive develop further. 
   3. If app is profitable after 6 months develop an app for iOS and add to the App Store. 
   
Since our goal is to be as profitable as possible in our app development, we will look to find the most successful app genres on both platforms. As this might be an indicator of a uniqely successful idea.

In [307]:
android_final = android_freeapps
ios_final = ios_freeapps
def freq_table(dataset, index):
    table = {}
    total = 0 
    for row in dataset: 
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else : 
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key]/total) * 100
        table_percentages[key] = percentage
        
    return table_percentages
            
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
display_table(ios_final, -5)


Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


In the above we use the frequency table to determine the percentage of iOS apps per genre. iOS apps can be seen as more of a games and entertainment platform. Where below, we see that the highest catogory is family. At first glance Android might seem to be more of a family platform, however taking a second look at the family category we see that this might not be the case. As a variety of games apps are categorized as family. This highlights the issue with the google play categorizations, and the lack of clearly defined delineations.  

![Image](https://www.lowyat.net/wp-content/uploads/2015/06/Google-Play-Store-Family-Category.jpg)	

In [123]:
display_table(android_final, 1) # Category column


FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

In [308]:
display_table(android_final, -4) #genres column 


Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

We see that in the second column included that identifies category on the google play story, it includes many more categories listed. With the highest percentage of apps in a particular genre being tools. More information on how Google expects developers to identify categories can be seen [here](https://support.google.com/googleplay/android-developer/answer/113475?hl=en). 

Most Popular Users By genre (App Store
---
___
To start, we'll be performing an analys is on the most popular apps by genre. Due to the iOS data set not including install data, we will aim to use the average number of ratings  per genre as a proxy. As a result, we will need to name a frequency table in order to calculate the averages manually, as average number of ratings per genre  is not included in the data set. 

In [302]:
import operator
genres_ios = freq_table(ios_final, -5)
genres_data = {}

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[6])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
    
       

    

Productivity : 19053.887096774193
Weather : 47220.93548387097
Shopping : 18746.677685950413
Reference : 67447.9
Finance : 13522.261904761905
Music : 56482.02985074627
Utilities : 14010.100917431193
Travel : 20216.01785714286
Social Networking : 53078.195804195806
Sports : 20128.974683544304
Health & Fitness : 19952.315789473683
Games : 18924.68896765618
Food & Drink : 20179.093023255813
News : 15892.724137931034
Book : 8498.333333333334
Photo & Video : 27249.892215568863
Entertainment : 10822.961077844311
Business : 6367.8
Lifestyle : 8978.308510638299
Education : 6266.333333333333
Navigation : 25972.05
Medical : 459.75
Catalogs : 1779.5555555555557


Popular iOS App
---
___

Based on this list, we can see that the top 3 categories that have the highest average number of ratings are Reference, Music and social networking. Let's take a closer look at this category to see the respective apps that lie within them. 

In [142]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[2], ':', app[6]) #print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
彩库宝典-【官方版】 : 0
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8
無料で音楽や写真・カメラの裏技アプリ for iPhone7 : 0


In [303]:
for app in ios_final:
    if app[-5] == 'Music':
        print(app[2], ':', app[6]) # print name and number of ratings
        

Pandora - Music & Radio : 1126879
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
Deezer - Listen to your Favorite Music & Playlists : 4677
Sonos Controller : 48905
NRJ Radio : 38
radio.de - Der Radioplayer : 64
Spotify Music : 878563
SoundCloud - Music & Audio : 135744
Sing Karaoke Songs Unlimited with StarMaker : 26227
SoundHound Song Search & Music Player : 82602
Ringtones for iPhone & Ringtone Maker : 25403
Coach Guitar - Lessons & Easy Tabs For Beginners : 2416
QQ音乐-来这里“发现・音乐” : 745
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Magic Piano by Smule : 131695
QQ音乐HD : 224
The Singing Machine Mobile Karaoke App : 130
Bandsintown Concerts : 30845
PetitLyrics : 0
edjing Mix:DJ turntable to remix and scratch music : 13580
Smule Sing! : 119316
Amazon Music : 106235
AutoRap by Smule : 18202
My Mixtapez Music : 26286
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
Karaoke - Sing Karaoke, Unlimited Songs! : 2860

In [143]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[2], ':', app[6]) #print name and number of ratings
        

Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match™ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ™ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger – Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
出会い系アプリ i-Mail（アイメール） : 0
Lobi : 36
WeChat : 34584
YY- 小全民手机直播交友软件 : 624
ooVoo – Free Video Call, Text and Voice : 177501
Pinterest : 1061624
知乎 : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
MOMO陌陌-开启视频社交,用直播分享生活 : 1862
QQ HD : 5058
Messenger : 351466
eHarmony™ Dat

It seems quite clear that for profitability on iOS, these app categories should be considered first. Some of the most popular apps in these categories are religious apps, dictionaries, music streaming services, music identification, and messaging apps. 

Most Popular Users By genre (Google Play Store)
---
___

In this scenario, google play data set does indeed have user install data. As a result we will use this as a method to determine popularity. 

In [306]:
display_table(android_final, 5) # the Installs columns

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835


Though the user install data is a great start to identify popularity, it's is not precise enough to determine where within each bracket these percentages lie. For example, we are unsure of how many of 100,000+ percentages lies at 200,000, 300,000 or 400,000. However for our purposes, this gives a good snapshot of the breakdown in installs. Let's now breakdown the average number of installs by category to determine which category has the highest average number  of user installs. 


In [309]:
categories_android = freq_table(android_final, 1)
for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Based on the android install data within the android data set the following apps stand out with the highest average number of user installs:
    - Communication
    - Social
    - Photography
    - Productivity
    - Games
    - Entertainment 
    - Books & Reference 

You can already begin to see some similarities. The iOS data showed that Books and Reference, communication, and music were at the top within the google play store. Depending on categorization styles, both social and communication can be seen as the same category, and music can be considered within an entertainment category (which is not included within the android data set. 

In [311]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])# A look at communication apps for android

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [315]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                      or app[5] == '5,000,000+'
                                      or app[5] == '10,000,000+'):
        print(app[0], ':', app[5]) # A look at the books and reference apps for android.

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

In [316]:
for app in android_final:
    if app[1] == 'GAME' and (app[5] == '1,000,000+'
                                      or app[5] == '5,000,000+'
                                      or app[5] == '10,000,000+'):
        print(app[0], ':', app[5]) # A look at the gaming apps for android.

Solitaire : 10,000,000+
Race the Traffic Moto : 10,000,000+
Marble - Temple Quest : 10,000,000+
Shooting King : 10,000,000+
Geometry Dash World : 10,000,000+
Jungle Marble Blast : 5,000,000+
Farm Fruit Pop: Party Time : 1,000,000+
Paint Hit : 10,000,000+
Rolly Vortex : 10,000,000+
Woody Puzzle : 1,000,000+
Stack Jump : 10,000,000+
The Cube : 5,000,000+
Bricks n Balls : 1,000,000+
The Fish Master! : 1,000,000+
Color Road : 10,000,000+
Draw In : 10,000,000+
Looper! : 1,000,000+
Will it Crush? : 5,000,000+
Tomb of the Mask : 5,000,000+
Baseball Boy! : 10,000,000+
Hello Stars : 10,000,000+
Tank Stars : 10,000,000+
Hole.io : 10,000,000+
Mini Golf King - Multiplayer Game : 5,000,000+
Flip the Gun - Simulator Game : 10,000,000+
Mad Skills BMX 2 : 1,000,000+
MMX Hill Dash 2 – Offroad Truck, Car & Bike Racing : 1,000,000+
Word Link : 10,000,000+
Last Day on Earth: Survival : 10,000,000+
Partymasters - Fun Idle Game : 10,000,000+
Harry Potter: Hogwarts Mystery : 10,000,000+
Offroad Outlaws : 1,0

Texas Holdem Poker Pro : 5,000,000+
Texas Holdem Poker : 10,000,000+
Big Fish Casino – Play Slots & Vegas Games : 10,000,000+
TRANSFORMERS: Forged to Fight : 10,000,000+
Texas Holdem & Omaha Poker: Pokerist : 10,000,000+
Governor of Poker 2 - OFFLINE POKER GAME : 5,000,000+
Squadron - Bullet Hell Shooter : 10,000,000+
Words With Friends – Play Free : 10,000,000+
Classic Words Solo : 5,000,000+
The Visitor: Ep.2 - Sleepover Slaughter : 1,000,000+
Oggy : 5,000,000+
Speed Racing Ultimate 2 : 1,000,000+
I Know Stuff : 5,000,000+
Water Surfer Racing In Moto : 1,000,000+
Rescue Robots Survival Games : 5,000,000+
Bike Race - Bike Blast Rush : 10,000,000+
ETERNITY WARRIORS 2 : 5,000,000+
Snes9x EX+ : 5,000,000+
Eyes - The Scary Horror Game Adventure : 10,000,000+
Sudoku Master : 1,000,000+
Jungle Monkey Run : 10,000,000+
Golden HoYeah Slots - Real Casino Slots : 5,000,000+
FaFaFa™ Gold Casino: Free slot machines : 1,000,000+
Heart of Vegas™ Slots – Free Slot Casino Games : 10,000,000+
BROTHER 

In [155]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'GAME') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

6272564.694894147

We can see that there are some similarities between iOS data and Google Play Store data, however there are a much higher number of gaming apps on Android than there are on iOS. Outside of this difference, both platforms show that communication, social media, and books and reference are lucrative categories to get into, as they draw the largest number of users. 

Conclusion
---
___


This initial project began with a desire to develop a successful mobile app. Before embarking on this endeavor we begin by determining the most profitable apps, with a focus on users leading to profitability. Data sets were obtained from both Apple's App store and google's Google play store to get a wide enough understanding on what genres are popular irrespective of platform. The focus was on free apps, as paid apps bring a factor of quality into the equation that we were not looking to analyse for this project. To minimize the risk of developing an unsuccessful app we looked to focus on 3 seperate steps.

1. Build minimal version of an android app for Google Play
2. Determine reception, if positive develop further.
3. If app is profitable after 6 months develop an app for iOS and add to the App Store. 

The first step began with an exraction of the data and a cleaning of the data. The data was cleaned by removing missing or incorrect indeces, removing duplicate entries seen within the data, removing as many non - english apps as possible, and removing paid apps. All of this was done on both the iOS data set and android data set. 

An analysis was then performed. It was determined that irrespective of the two platforms, communication, social media, and books/reference were 3 of the most popular categories. Outliers included gaming apps, which was significantly prevalent on the Google Play store. Due to communication and social media apps being highly competitive currently, an app focusing on books/reading has a high chance of being successful, with a focus on religion proving to be the most popular within this category. Features such as daily quotes, progress tracking, and more would need to be added to said app in order to remain competitive. With this data based approach, the likelihood of developing a successful app has increased.  