# App Investigation

In this project, We are going to analyze app date which includes many details about apps on AppStore and Goolge Play in order to form a better understanding of the relation between the genre of an app and the number of users of that particular app. 

Main purpose of this work is to create a solid proposal for the company about the type of a new app to be put in market based on the criterion that the company makes money from in-app adverts yielding the conclusion of more number of users mean more income.

In [1]:
from csv import reader

opened_file_AppStore = open("AppleStore.csv", encoding="utf8")
appStore_data = list (reader(opened_file_AppStore))

opened_file_GooglePlay = open("googleplaystore.csv", encoding="utf8")
googlePlay_data = list (reader(opened_file_GooglePlay))

In [2]:
# This function is to explore the data conveniently
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        

In [3]:
# First let us explore the column names for the datasets
print("AppStore Dataset Columns")
print(appStore_data[0])
print("\n")
print("GooglePlayStore Dataset Columns")
print(googlePlay_data[0])

AppStore Dataset Columns
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


GooglePlayStore Dataset Columns
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


-------------------
It seems that two data sets have different column names but pretty much the same information. Although it seems self explanatory, you can find detailed information about the columns at links below.

[AppStore](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)

[GooglePlayStore](https://www.kaggle.com/lava18/google-play-store-apps/home)

In [4]:
explore_data(appStore_data[1:], 0,3,rows_and_columns=True)

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17


In [5]:
explore_data(googlePlay_data[1:], 0, 3, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


## DATA CLEANING
### Line with missing information

In [6]:
# An issue has been observed in discussions at Kaggle that the entry 10472 has a missing column information. 
print(googlePlay_data[0])
print( "\n")
print(googlePlay_data[3])
print( "\n")
print(googlePlay_data[10473])
# It is seen below that "Life Made WI-Fi..." entry has no "Category" information and a columns shift occurs after. 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
#Incorrect entry is deleted
del googlePlay_data[10473]

In [8]:
# Checked that deletion is in place
print(googlePlay_data[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


### Duplicated lines

In [9]:
# More on data explatory analysis, now we need to check any duplicated data for both appStore_data and googlePlay_data

unique_appStore =[]
duplicated_appStore = []

for app in appStore_data[1:]:
    #app[2] includes name of the app in appStore_data
    if app[2] in unique_appStore:
        duplicated_appStore.append(app[2])
    else:
        unique_appStore.append(app[2])
        
unique_googlePlayStore =[]
duplicated_googlePlayStore = []

for app in googlePlay_data[1:]:
    #app[0] includes name of the app in googlePlay_data
    if app[0] in unique_googlePlayStore:
        duplicated_googlePlayStore.append(app[0])
    else:
        unique_googlePlayStore.append(app[0])

In [10]:
print(len(duplicated_appStore))
print("\n")
print(duplicated_appStore)

# It is observed that there are two duplicated entries in appStore_data. However, it is confirmed from 
# documentation page that those are different apps with different producers but same names.

2


['VR Roller Coaster', 'Mannequin Challenge']


In [11]:
len(duplicated_googlePlayStore)
#The duplication case is different in googlePlayStore date. We need to deal with these duplications. 

1181

In [12]:
# One example is chosen to illustrate the duplication. Let us pick the 1000th entry of duplicated_googlePlayStore list.
# We will also print the header for convenience

dup_android = duplicated_googlePlayStore[1000]
print(googlePlay_data[0])

for app in googlePlay_data[1:]:
    if app[0] == dup_android:
        print("\n")   
        print(app)
       



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Moco - Chat, Meet People', 'DATING', '4.2', '313724', 'Varies with device', '10,000,000+', 'Free', '0', 'Mature 17+', 'Dating', 'August 4, 2018', '2.6.141', '4.1 and up']


['Moco - Chat, Meet People', 'DATING', '4.2', '313724', 'Varies with device', '10,000,000+', 'Free', '0', 'Mature 17+', 'Dating', 'August 4, 2018', '2.6.141', '4.1 and up']


['Moco - Chat, Meet People', 'DATING', '4.2', '313769', 'Varies with device', '10,000,000+', 'Free', '0', 'Mature 17+', 'Dating', 'August 4, 2018', '2.6.142', '4.1 and up']


In [13]:
# The difference between lines are "Reviews" columns. Appearently, the date has been retrieved from Google Playstore
# at different time instances. It is possible to randomly remove the duplicated lines but keeping the line that has
# the highest "Reviews" count seems a better idea. 

reviews_max = {}

for app in googlePlay_data[1:]:
        #app[0] includes the name of the app
        name = app[0]
        n_reviews = float(app[3])
        
        if (name in reviews_max and reviews_max[name]<n_reviews) or (name not in reviews_max):
            reviews_max[name]=n_reviews
        

In [14]:
#There were 10840 total apps in googlePlay_data and 1181 of them were duplicates, which end up with 9656 unique data.
len(reviews_max)

9659

In [15]:
# Now is the time to remove the duplicated data

googlePlay_data_CLEANED = []
already_added = []

for app in googlePlay_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        # since some of the duplicated apps have the same number of reviews (a full duplicate) this second condition should
        # be added.
        googlePlay_data_CLEANED.append(app)
        already_added.append(name)
        
len(googlePlay_data_CLEANED)
# The duplicated data seems to be removed, but just to make sure, we will apply the same algorithm above to CLEANED data

9659

In [16]:
unique_googlePlayStore_CLEANED =[]
duplicated_googlePlayStore_CLEANED = []

for app in googlePlay_data_CLEANED[1:]:
    #app[0] includes name of the app in googlePlay_data
    if app[0] in unique_googlePlayStore_CLEANED:
        duplicated_googlePlayStore_CLEANED.append(app[0])
    else:
        unique_googlePlayStore_CLEANED.append(app[0])
        
len(duplicated_googlePlayStore_CLEANED)
# As it is seen, there is no duplicated data left.

0

### Cleaning Non-English Apps

In [17]:
# We observed some apps are not for English speaking customers which we do not interested in. In order to clean them out,
# first we need to write a function to identify if there is a non-English letter in the string passed in.

def check_English(string):
    counter = 0
    for letter in string:
        
        if ord(letter)>127:
            counter +=1
    if counter >3:
        return False
    else:
        return True

In [18]:
# We should apply the check_English function to two of our datasets. We will create new sets for English apps.
appStore_data_ENG = []

for app in appStore_data[1:]: #title row still in place
    name = app[2]  #name of the app is at 3rd index for appStore_data
    if check_English(name):
        appStore_data_ENG.append(app)

googlePlay_data_CLEANED_ENG = []
for app in googlePlay_data_CLEANED: # title row is not in place
    name = app[0]  #name of the app is at 1st index for googlePlay_data
    if check_English(name):
        googlePlay_data_CLEANED_ENG.append(app)

print ("English apps at Appstore is " + str(len(appStore_data_ENG)) +"\n")
print ("Non-Engish apps at Appstore is " + str(len(appStore_data)-len(appStore_data_ENG)-1)+ "\n") #title row should be subtracted
print ("English apps at Google PlayStore is " +str(len(googlePlay_data_CLEANED_ENG))+"\n")
print ("Non-Engish apps at Google PlayStore is " + str(len(googlePlay_data_CLEANED)-len(googlePlay_data_CLEANED_ENG))+ "\n")

# Notice that both of last datasets do not include title rows

English apps at Appstore is 6183

Non-Engish apps at Appstore is 1014

English apps at Google PlayStore is 9614

Non-Engish apps at Google PlayStore is 45



In [19]:
# Now is the time to filter non-free apps
appStore_data_ENG_FREE = []
for app in appStore_data_ENG:
    price = float(app[5])    # price information is at index 5 for appStore data
    if price == 0:
        appStore_data_ENG_FREE.append(app)

googlePlay_data_CLEANED_ENG_FREE = []
for app in googlePlay_data_CLEANED_ENG:
    price = app[7]    # price information is at index 7 for googlePlaystore data, first character ($) excluded if not free
    if price[0] == "$":
        price = float(price[1:])
    else:
        price = float(price)
    
    if price == 0:
        googlePlay_data_CLEANED_ENG_FREE.append(app)
        
print ("Free and English apps at Appstore is " + str(len(appStore_data_ENG_FREE)) +"\n")
print ("Non-free and English apps at Appstore is " + str(len(appStore_data_ENG)-len(appStore_data_ENG_FREE))+ "\n")
print ("Free and English apps at Google PlayStore is " +str(len(googlePlay_data_CLEANED_ENG_FREE))+"\n")
print ("Non-free and English apps at Google PlayStore is " + str(len(googlePlay_data_CLEANED_ENG)-len(googlePlay_data_CLEANED_ENG_FREE))+ "\n")


Free and English apps at Appstore is 3222

Non-free and English apps at Appstore is 2961

Free and English apps at Google PlayStore is 8864

Non-free and English apps at Google PlayStore is 750



### App Profile Criteria
The purpose of this work is to define a profile of apps that are mostly prefered by the users, since our company produces free to download apps and make money from in-app ads. In order to reach as much customers as possible, we need to define a profile which is successful on both markets (AppStore and Google Play). Now first, we need to find which columns should be used to understand which genres exist in stores most.

In [20]:
# Title columns are given to find the appropriate columns including genre information
print("AppStore Dataset Columns")
print(appStore_data[0])
print("\n")
print("GooglePlayStore Dataset Columns")
print(googlePlay_data[0])

AppStore Dataset Columns
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


GooglePlayStore Dataset Columns
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [21]:
# "prime genre" at AppStore and "Category" and "Genres" columns at Google Play data seem appropriate
# First we will create a function, taking a dataset and an index as parameter and returns the frequency table in dict form
# if "title_row" is True, function excludes it
def freq_table(dataset, index, title_row = False):
    
    if title_row:
        dataset=dataset[1:]
    my_dict = {}
    for lst in dataset:
        if lst[index] in my_dict:
            my_dict[lst[index]] +=1
        else:
            my_dict[lst[index]] = 1
    return my_dict

In [22]:
# This function is to conveniently sort and display frequency tables
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [23]:
# Frequency tables for appStore "prime genre" column and google Play "Category" and "Genres" columns
print("APPSTORE 'PRIME GENRE' FREQUENCY TABLE")
display_table (appStore_data_ENG_FREE,12)

APPSTORE 'PRIME GENRE' FREQUENCY TABLE
Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


The mostly existing free english apps are "games" in Appstore. The runner-up genre is "Entertainment" and is way behind. The difference between the first and the second is much higher than the difference between the second and the rest. In general, one can say that app store is dominated by apps produced for enternetainment purposes (Games, Entertainment, Photo & Video, Social Networking, Music, Sports etc) rather than practical purposes (education, shopping, utilities, productivity etc. Although, existence of a particular genre does not necessitate higher number of downloads, the existence of a correlation is undeniable.

In [24]:
print("GOOGLE PLAYSTORE 'CATEGORY' FREQUENCY TABLE")
display_table (googlePlay_data_CLEANED_ENG_FREE,1)

GOOGLE PLAYSTORE 'CATEGORY' FREQUENCY TABLE
FAMILY : 1676
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


Google Playstore data "CATEGORY" column give a rather different pattern. At this frequency table, "FAMILY" genre apps have the majority followed by "GAMES". Practical purpose apps have higher frequency in general at Google Play Store. Although a corelation is expected between these frequencies and number of downloads, the level of that corelation should be further investigated.

In [25]:
print("GOOGLE PLAYSTORE 'GENRES' FREQUENCY TABLE")
display_table (googlePlay_data_CLEANED_ENG_FREE,9)

GOOGLE PLAYSTORE 'GENRES' FREQUENCY TABLE
Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 81
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Acti

"GENRES" column of Google Play dataset gives a similar pattern as the "CATEGORY" column. On the other hand, this column is constituted as naming sub-branches as different genres (i.e. there are many items including Education as well as only Education itself). Therefore this frequency is not useful in that sense. 

__As a result, it is not convenient to offer a certain genre type with the above information.__

### App Store Popularity Analysis

To make a more decent offer, average number of downloads for each genre should be examined. Since such information does not exist in App Store data, "rating_count_tot" columns which gives the total number of ratings for a certain app, will be used. 

In [26]:
freq_table_appStore = freq_table (appStore_data_ENG_FREE,12)

app_Store_rating_tot = {}

for app in appStore_data_ENG_FREE:
    genre = app[12]    # prime genre data is at index 12 of appStore data
    if genre in app_Store_rating_tot:
        app_Store_rating_tot[genre] += float(app[6])    # rating_count_tot is at index 6 of appStore data
    else:
        app_Store_rating_tot[genre] = float(app[6])

app_Store_rating_avg = {}
for item in app_Store_rating_tot:
    app_Store_rating_avg[item] = app_Store_rating_tot[item] / freq_table_appStore [item] 

In [27]:
app_Store_rating_avg

{'Productivity': 21028.410714285714,
 'Weather': 52279.892857142855,
 'Shopping': 26919.690476190477,
 'Reference': 74942.11111111111,
 'Finance': 31467.944444444445,
 'Music': 57326.530303030304,
 'Utilities': 18684.456790123455,
 'Travel': 28243.8,
 'Social Networking': 71548.34905660378,
 'Sports': 23008.898550724636,
 'Health & Fitness': 23298.015384615384,
 'Games': 22788.6696905016,
 'Food & Drink': 33333.92307692308,
 'News': 21248.023255813954,
 'Book': 39758.5,
 'Photo & Video': 28441.54375,
 'Entertainment': 14029.830708661417,
 'Business': 7491.117647058823,
 'Lifestyle': 16485.764705882353,
 'Education': 7003.983050847458,
 'Navigation': 86090.33333333333,
 'Medical': 612.0,
 'Catalogs': 4004.0}

The highest number of ratings is on Navigation genre. However this is mainly because of a couple of very popular navigation apps like GoogleMaps and Waze

In [28]:
for app in appStore_data_ENG_FREE:
    if app[12] == "Navigation":
        print(app[2] + " : " + app[6])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


In [29]:
#Second most popular genre is "Reference".
for app in appStore_data_ENG_FREE:
    if app[12] == "Reference":
        print(app[2] + " : " + app[6])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


With regards to "Reference" apps, the situation is pretty much the same with "Navigation" apps. Namely, the most popular books in the world have already been allocated with other apps and the books that does not have much popularity have quite low amount of ratings. 

Third most popular genre is "Social Networking" whose average has been heavily influenced by Pinterest, Facebook and Skype.
On the other hand, I see a potential on "Games" apps

In [30]:
#Second most popular genre is "Social Networking".
for app in appStore_data_ENG_FREE:
    if app[12] == "Games":
        print(app[2] + " : " + app[6])

Hangman. : 42316
Blackjack by MobilityWare : 180087
PAC-MAN : 508808
Beer Pong Game : 187315
NYTimes Crossword - Daily Word Puzzle Game : 53465
Sky Burger - Build & Match Food Free : 114096
Pixel Starships™ : 8Bit MMORPG : 5565
Angry Birds : 824451
Glow Hockey 2 FREE : 186653
Solitaire : 679055
Angry Birds HD : 89110
▻Sudoku : 359832
Virtual Regatta Offshore : 209
Jenga : 32527
Spider Solitaire Free by MobilityWare : 128739
Angry Birds Seasons HD : 75714
Fruit Ninja® : 327025
Temple Run : 1724546
Angry Birds Rio : 170843
Angry Birds Rio HD : 70122
My Horse : 293857
Tiny Tower - Free City Building : 414803
ROBLOX : 183621
Nyan Cat: Lost In Space : 76392
Fishing Kings Free+ : 35058
Werewolf "Nightmare in Prison" : 78
Magic Jigsaw Puzzles : 187666
DragonVale : 503230
Racing Penguin Free - Top Flying and Diving Game : 141224
Phoenix HD : 13106
Sonic CD : 16358
White Tiles 2 : Piano Master ( Don't Hit The White Tiles 4 ) - Free Music Game : 60
Doodle Jump FREE - BE WARNED: Insanely addictiv

LINE Bubble 2 : 436
MARVEL Contest of Champions : 233599
Baby Airlines - Airport Adventures : 321
Seashine : 2043
Wheel of Fortune: Game Show Word Puzzles : 11876
Empires & Allies : 22902
Call of Duty®: Heroes : 179416
Bullet Boy : 1697
EA SPORTS™ UFC® : 19354
Knights of Pen & Paper 2 : 790
Guess The Song - New music quiz! : 4050
My Newborn Baby - Mommy & Baby Care : 570
Prop Hunt : 1728
Troll Face Quest Video Memes : 3770
Hero Simulator: clicker game and idle adventure : 1182
Sir Match-a-Lot – Match 3 Puzzle Game : 1546
aa : 158845
Daddy Long Legs : 76622
Monster Strike : 1381
Run!!! : 7891
Paper Girl - Morning Madness Adventures : 274
Digimon Heroes! : 3144
My Talking Angela : 54549
Rival Stars Basketball : 4383
Boom Boom Football : 2048
The Maze Runner ™ : 3370
Clicker Heroes : 7495
1010! : 53496
Dawn of Titans : 18735
Tiny Archers : 568
Fist of Fury : 6813
SimCity BuildIt : 198338
Just Dance Controller : 440
Amazing Thief : 21984
One More Line : 8392
Fatal : 124
Run Sackboy! Run! :

Dream League Soccer 2017 : 66004
My Singing Monsters: Dawn of Fire : 7749
Design Home : 23298
Splash Pop : 63
QB Hero : 304
Disney Jigsaw Puzzles! : 1546
Tap Quest : Gate Keeper : 1252
Give It Up! 2 : 453
Would You Rather? : 6479
Bakery Story 2 : 5753
METAL SLUG ATTACK : 1324
Cat Kitty Kitten Coloring Pages - Free Girl Games : 10278
Tropical Wars - Pirate Battles : 1465
Star Cheerleader - High School Tryouts : 263
KSI Unleashed : 854
Peanuts: Snoopy's Town Tale : 4260
Bus Driving Taxi Parking Simulator Real Extreme Car Racing Sim : 199
Pokémon Shuffle Mobile : 5805
Circle Swing : 2
360 Degree : 1460
Marry Me - Perfect Wedding Day! : 795
Viva™ Slots Las Vegas Classic Casino Games : 58738
Battle Golf : 455
Haywire Hospital : 285
Fusion Pictures - A Multiplayer Game with 4 Hidden Pics Object & 1 Word Puzzle : 0
Kuro no Sekai - It's a Darkness World : 0
Disney Emoji Blitz : 41036
Skylanders SuperChargers : 1296
Osmo Numbers : 52
Lost in Harmony : 1002
Athletics 2: Summer Sports - Free : 25

Bumper Jump : 166
Vector 2 : 554
Evil Dead: Endless Nightmare : 171
KleptoCats : 13300
Monster Busters: Link Flash : 42
Leap Day : 3763
Walls & Balls : 110
Teeter – Endless Arcade Balancer : 50
Gravity Square! : 58
Letter Soup - Word Game : 18358
Fear the Walking Dead: Dead Run–Tactical Runner : 2533
Versus Run : 766
Shopping Mall Car Parking Simulator a Real Driving Racing Game : 205
Card Wars Kingdom - Adventure Time : 4283
Minescape : 84
King of Avalon: Dragon Warfare : 3326
Crazy Truck! : 14
Final Kick VR - Virtual Reality free soccer game for Google Cardboard : 18
Hackers - Join the Cyberwar! : 1759
Oz: Broken Kingdom™ : 4074
SLAAAASH ! -Cut and Smash ! refreshing Puzzle- : 1
Hex Crush! : 1202
Bernie Sandwiches - Run For The White House : 472
YO-KAI WATCH Wibble Wobble : 4275
Room Escape [SECRET CODE 2] : 20
Smash Slots : 663
Escape from the ICU room. : 5
Snail Bob 2 : 276
Curvulate : 43
FarmVille: Tropic Escape - Harvest in Paradise : 16457
Cool Jigsaw Puzzles : 63
Gravity Switch

Masky : 69
White Tiles 4: Piano Master (All mini games in 1) : 3168
Monster Craft GO - Find and Catch pixelmon CarToon : 4808
My Teacher - School Classroom Play & Learn : 234
Hill Climb Racing 2 : 33854
Bold Moves : 4980
Stony Road : 373
The Trail : 12573
Bubble Island 2 - Pop Bubble Shooter : 3480
Tidal Rider : 284
European War 5: Empire : 104
Clashy Colors : 854
Limo Driving School a Valet Driver License Test Parking Simulator : 38
Knights Fight: Medieval Arena : 1501
Osteya: Adventures : 107
Ketchapp Tennis : 830
Rubin.io : 16
NHL SuperCard 2K17 : 116
Super Fashion Show - Girls Makeup, Dressup Games : 125
Evil Zombie Graveyard Apocalypse Shooting VR Games : 22
Nekosan : 905
Jelly Blast: New Exciting Match 3 : 2557
Midnight Calling: Jeronimo : 127
Christmas Stories: The Gift of the Magi : 227
Crystal Rush! Color Shoot Arcade Game : 81
Break Liner : 721
Grandpa, what the fone? : 14
Puzzlepops! Trick or Treat : 61
Twisty Board : 800
splix.io! : 415
Toon Shooters 2: The Freelancers : 10

There are not any highly popular and hard to compete game apps and the average rating numbers are more or less balanced. Besides, building a Game app idea fits the general tendency in App Store being dominated by entertainment apps. People's gaming tastes may differ widely and we think that there could be found some room for a new and popular app.

### Google Play Store Popularity Analysis
Now is the time to analyize Play Store data. We will use "Category" column for the genre information.

Although there is a distinct "Installs" column in Google Play, the data is not precise. However, for our purpose, exact precision is not needed. Considering 100,000+ downloads as 100.000 will be more than enough. 

In [31]:
print(googlePlay_data_CLEANED_ENG_FREE[454][5])

100+


In [32]:
display_table(googlePlay_data_CLEANED_ENG_FREE, 5)

1,000,000+ : 1394
100,000+ : 1024
10,000,000+ : 935
10,000+ : 904
1,000+ : 744
100+ : 613
5,000,000+ : 605
500,000+ : 493
50,000+ : 423
5,000+ : 400
10+ : 314
500+ : 288
50,000,000+ : 204
100,000,000+ : 189
50+ : 170
5+ : 70
1+ : 45
500,000,000+ : 24
1,000,000,000+ : 20
0+ : 4
0 : 1


In [33]:
# We should first get rid of plus signs and commas in order to convert the "Installs" column into float.
# We will use str.replace method for this purpose.

googlePlay_data_CLEANED_ENG_FREE_INSTALLS = []
for app in googlePlay_data_CLEANED_ENG_FREE:
    app[5]=app[5].replace("+","")
    app[5]=float(app[5].replace(",",""))
    
    googlePlay_data_CLEANED_ENG_FREE_INSTALLS.append(app)
display_table(googlePlay_data_CLEANED_ENG_FREE_INSTALLS, 5)

1000000.0 : 1394
100000.0 : 1024
10000000.0 : 935
10000.0 : 904
1000.0 : 744
100.0 : 613
5000000.0 : 605
500000.0 : 493
50000.0 : 423
5000.0 : 400
10.0 : 314
500.0 : 288
50000000.0 : 204
100000000.0 : 189
50.0 : 170
5.0 : 70
1.0 : 45
500000000.0 : 24
1000000000.0 : 20
0.0 : 5


In [34]:
# Since all number of installs are float type, we can calculate average downloads with this data, just like how we did for Apple Store data.
freq_table_googlePlay = freq_table (googlePlay_data_CLEANED_ENG_FREE,1)

googlePlay_installs_tot = {}

for app in googlePlay_data_CLEANED_ENG_FREE_INSTALLS:
    category = app[1]    # Category data is at index 1 of google Playstore data
    if category in googlePlay_installs_tot:
        googlePlay_installs_tot[category] += app[5]    # installs is at index 5 of google playstore data
    else:
        googlePlay_installs_tot[category] = app[5]

googlePlay_installs_avg = {}
for item in googlePlay_installs_tot:
    googlePlay_installs_avg[item] = googlePlay_installs_tot[item] / freq_table_googlePlay [item]
    
googlePlay_installs_avg

{'ART_AND_DESIGN': 1986335.0877192982,
 'AUTO_AND_VEHICLES': 647317.8170731707,
 'BEAUTY': 513151.88679245283,
 'BOOKS_AND_REFERENCE': 8767811.894736841,
 'BUSINESS': 1712290.1474201474,
 'COMICS': 817657.2727272727,
 'COMMUNICATION': 38456119.167247385,
 'DATING': 854028.8303030303,
 'EDUCATION': 1833495.145631068,
 'ENTERTAINMENT': 11640705.88235294,
 'EVENTS': 253542.22222222222,
 'FINANCE': 1387692.475609756,
 'FOOD_AND_DRINK': 1924897.7363636363,
 'HEALTH_AND_FITNESS': 4188821.9853479853,
 'HOUSE_AND_HOME': 1331540.5616438356,
 'LIBRARIES_AND_DEMO': 638503.734939759,
 'LIFESTYLE': 1437816.2687861272,
 'GAME': 15588015.603248259,
 'FAMILY': 3695641.8198090694,
 'MEDICAL': 120550.61980830671,
 'SOCIAL': 23253652.127118643,
 'SHOPPING': 7036877.311557789,
 'PHOTOGRAPHY': 17840110.40229885,
 'SPORTS': 3638640.1428571427,
 'TRAVEL_AND_LOCAL': 13984077.710144928,
 'TOOLS': 10801391.298666667,
 'PERSONALIZATION': 5201482.6122448975,
 'PRODUCTIVITY': 16787331.344927534,
 'PARENTING': 5426

The most installed genres are "COMMUNICATION", "VIDEO PLAYERS", "SOCIAL", "PHOTOGRAPHY", "TOOLS", "GAME", "ENTERTAINMENT" and "NEWS_AND_MAGAZINES". 

First 5 of those genres are dominated by apps like Youtube, Whatsapp, Instagram, Google Photos, Google etc. To compete those apps are far beyond our company's ability. However on the other hand, gameing apps are in wide variaty and the number of downloads are balanced. We think that there is a room for this genre and it might be possible for such an app to be succesfull on both Google Play Store and App Store.

In [35]:
for app in googlePlay_data_CLEANED_ENG_FREE_INSTALLS:
    if app[1] == "GAME":
        print(app[0] + " : " + str(app[5]))

Solitaire : 10000000.0
Sonic Dash : 100000000.0
PAC-MAN : 100000000.0
Bubble Witch 3 Saga : 50000000.0
Race the Traffic Moto : 10000000.0
Marble - Temple Quest : 10000000.0
Shooting King : 10000000.0
Geometry Dash World : 10000000.0
Jungle Marble Blast : 5000000.0
Roll the Ball® - slide puzzle : 100000000.0
Block Craft 3D: Building Simulator Games For Free : 50000000.0
Farm Fruit Pop: Party Time : 1000000.0
Love Balls : 50000000.0
Piano Tiles 2™ : 100000000.0
Pokémon GO : 100000000.0
Paint Hit : 10000000.0
Snake VS Block : 50000000.0
Rolly Vortex : 10000000.0
Woody Puzzle : 1000000.0
Stack Jump : 10000000.0
The Cube : 5000000.0
Extreme Car Driving Simulator : 100000000.0
Bricks n Balls : 1000000.0
The Fish Master! : 1000000.0
Color Road : 10000000.0
Draw In : 10000000.0
PLANK! : 500000.0
Looper! : 1000000.0
Trivia Crack : 100000000.0
Will it Crush? : 5000000.0
Tomb of the Mask : 5000000.0
Baseball Boy! : 10000000.0
Hello Stars : 10000000.0
Tank Stars : 10000000.0
Hole.io : 10000000.0
M

NDS Emulator - For Android 6 : 1000000.0
Free DS Emulator : 1000000.0
nds4droid : 10000000.0
MegaNDS (NDS Emulator) : 500000.0
EmuBox - Fast Retro Emulator : 50000.0
Simple x3DS Emulator - BETA : 50000.0
DS Tower Defence : 100000.0
RetroArch : 1000000.0
Wheelie Challenge : 5000000.0
Sword of Chaos - Lame du Chaos : 100000.0
Hotel Insanity : 100000.0
Shoot! DX - lights for FREE : 100.0
American Sniper City Fight Shooting Assassin : 50000.0
Street Skater 3D : 10000000.0
DZ-JOKER : 100.0
Super ball DZ : 1000.0
Ludo Dz : 500.0
Need for Speed™ No Limits : 50000000.0
Peggle Blast : 5000000.0
Heroes of Dragon Age : 1000000.0
SCRABBLE : 5000000.0
Asphalt Xtreme: Rally Racing : 10000000.0
Modern Combat 5: eSports FPS : 100000000.0
Mass Effect: Andromeda APEX HQ : 100000.0
Mirror’s Edge™ Companion : 100000.0
Gear.Club - True Racing : 1000000.0
Mental Hospital:EB 2 Lite : 100000.0
EC Mover : 10.0
EF Jumper : 100.0
E.G. Chess Free : 10000.0
L.A. Crime Stories 2 Mad City Crime : 100000.0
LA Stories

In this project, our purpose was to examine the apps data taken from App Store and Google Play in order to offer a genre for our company to build a new app for the market. Our company gathers most of its income through in-app ads and English speaking countries, therefore we focused on those apps filtering the rest from the data.

We concluded that, apps which have high number of users have genres like communication, video streaming etc and those fields are heavily dominated by huge companies already like Facebook and Google. Therefore we offered the company to build a game app which we see more room to get success in both markets.