PROJECT TITLE
------------------------
Profitable App Profiles for the App Store and Google Play Markets


PROJECT INTRODUCTION
--------------------------------------
A guided DQ project applying what was learned in STEP 1: Introduction to Python.


PROJECT GOAL
-------------------------
Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

Opening and Exploring the Data
----------------------------------------------
As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first check whether relevant existing data is available at no cost. Luckily, there are two data sets that seem suitable for our purpose.

The [Google data set](https://www.kaggle.com/lava18/google-play-store-apps) contains data about approximately ten thousand Android apps from Google Play.

The [Apple data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) contains data about approximately seven thousand iOS apps from the App Store.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
                
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

# google data-set
google_data = open('../datasets/googleplaystore.csv', encoding="utf8")
gread_file = reader(google_data)
google_datalist = list(gread_file)

# retrieve the header
google_header = google_datalist[0]

# retrieve the data-rows
google_datarows = google_datalist[1:]

# print the header
print(google_header)
print('\n')

# print the first rows of the data set using explore_data()
explore_data(google_datarows, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

The columns that look most useful for our analysis are 'App', 'Category', 'Rating', 'Installs', 'Content Rating', 'Type', 'Price', and 'Genres'.


In [3]:
from csv import reader

apple_data = open('../datasets/AppleStore.csv',  encoding="utf8")
aread_file = reader(apple_data)
apple_datalist = list(aread_file)

# retrieve the header
apple_header = apple_datalist[0]

# retrieve the data-rows
apple_datarows = apple_datalist[1:]

# print the header
print(apple_header)
print('\n')

# print the first rows of the data set using explore_data()
explore_data(apple_datarows, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


The columns that look most useful for our analysis are 'track_name', 'price', 'rating_count_tot', 'user_rating', 'cont_rating', and 'prime_genre'.
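The two loading cells above follow the same steps, so they could be folded into one helper. The sketch below is an assumption rather than part of the original notebook: `load_csv` is a hypothetical name, and it uses a `with` block so that the file is closed once it has been read.

```python
from csv import reader


def load_csv(path, encoding="utf8"):
    """Return (header, data_rows) for the CSV file at `path`.

    Hypothetical helper, not part of the original notebook.
    """
    # newline="" is what the csv module recommends when opening files
    with open(path, encoding=encoding, newline="") as f:
        rows = list(reader(f))
    # The first row is the header, everything after it is data
    return rows[0], rows[1:]
```

With this helper, a call such as `google_header, google_datarows = load_csv('../datasets/googleplaystore.csv')` would replace the open/read/slice steps for either data set.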


Deleting rows with wrong data for Google data set
------------------------------------------------------------------------

To confirm the discussion outlining the error for row 10472, we check the length of every row in the data set. If a column value is missing, the row has fewer than 13 fields, which is the length of a complete row.

In [4]:
for i, val in enumerate(google_datarows): 
    # List all the rows of data that have missing columns
    if len(val) < 13:
        print('ROW DATA')        
        print(val, '\n')
        print('ROW INDEX')
        print(i) 

ROW DATA
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] 

ROW INDEX
10472


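Row 10472 is missing its 'Category' value, so it should be deleted. Note that the `del` statement in the cell below is commented out, which is why both printed lengths are 10841; uncommenting it (and running the cell only once) removes the row.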

In [5]:
print(len(google_datarows))
#del google_datarows[10472]
print(len(google_datarows))

10841
10841
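Deleting by index is fragile: running `del google_datarows[10472]` twice would remove a second, valid row. A minimal sketch of an alternative, assuming that every good row has exactly as many fields as the header, is to rebuild the list by row length. `keep_complete_rows` is a hypothetical name, not a function from the notebook, and the sample rows are made up for illustration.

```python
def keep_complete_rows(rows, expected_len):
    """Return only the rows that have exactly `expected_len` fields."""
    return [row for row in rows if len(row) == expected_len]


# Small illustrative data, not taken from the real data set
sample = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '4.0'],          # one field is missing
    ['App C', 'GAME', '3.9'],
]
cleaned = keep_complete_rows(sample, expected_len=3)
print(len(sample), len(cleaned))
```

Because the function filters instead of deleting in place, running it again on its own output leaves the data unchanged.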


Removing duplicate data rows
--------------------------------------------
We create a data set with unique app names, keeping for each app the row with the maximum number of reviews.

#### STEP 1: Identify the total number of duplicate app rows


In [6]:
duplicate_apps = []
unique_apps = []

for app in google_datarows:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
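Checking `name in unique_apps` scans a list on every iteration, which gets slower as the list grows. As a sketch of an alternative (the helper name `count_duplicates` and the sample names are assumptions), `collections.Counter` can tally the names in one pass. An app that appears n times contributes n - 1 duplicates, which matches how the loop above counts them.

```python
from collections import Counter


def count_duplicates(names):
    """Count the extra occurrences of each name beyond its first."""
    counts = Counter(names)
    return sum(n - 1 for n in counts.values())


# 'Box' appears three times and 'Slack' twice, so there are 2 + 1 duplicates
sample_names = ['Box', 'Slack', 'Box', 'Box', 'Zenefits', 'Slack']
print(count_duplicates(sample_names))  # prints 3
```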


#### STEP 2: Create a unique dictionary with [AppName: MaxReviewCount]

In [7]:
uniqueapps_maxreview_dict = {}
for approw in google_datarows:
    name = approw[0]
    review_val = approw[3]  # review count, still a string at this point
    if name in uniqueapps_maxreview_dict:
        curr_rev_val = uniqueapps_maxreview_dict[name]
        # NOTE: both values are strings, so this comparison is lexicographic
        # (e.g. '9' > '10'); convert to numbers for a true maximum.
        if review_val > curr_rev_val:
            uniqueapps_maxreview_dict[name] = review_val
    else:
        uniqueapps_maxreview_dict[name] = review_val

print('[KEY]', '\t',  '[VALUE]', '\n')
for k in uniqueapps_maxreview_dict:
    print(k, '\t', uniqueapps_maxreview_dict[k])    
print('\n')

[KEY] 	 [VALUE] 

Photo Editor & Candy Camera & Grid & ScrapBook 	 159
Coloring book moana 	 974
U Launcher Lite – FREE Live Cool Themes, Hide Apps 	 87510
Sketch - Draw & Paint 	 215644
Pixel Draw - Number Art Coloring Book 	 967
Paper flowers instructions 	 167
Smoke Effect Photo Maker - Smoke Editor 	 178
Infinite Painter 	 36815
Garden Coloring Book 	 13791
Kids Paint Free - Drawing Fun 	 121
Text on Photo - Fonteee 	 13880
Name Art Photo Editor - Focus n Filters 	 8788
Tattoo Name On My Photo Editor 	 44829
Mandala Coloring Book 	 4326
3D Color Pixel by Number - Sandbox Art Coloring 	 1518
Learn To Draw Kawaii Characters 	 55
Photo Designer - Write your name with shapes 	 3632
350 Diy Room Decor Ideas 	 27
FlipaClip - Cartoon animation 	 194216
ibis Paint X 	 224399
Logo Maker - Small Business 	 450
Boys Photo Editor - Six Pack & Men's Suit 	 654
Superheroes Wallpapers | 4K Backgrounds 	 7699
Mcqueen Coloring pages 	 65
HD Mickey Minnie Wallpapers 	 118
Harley Quinn wallpapers HD 

Ready4 SAT (Prep4 SAT) 	 13612
Socratic - Math Answers & Homework Help 	 37862
Ready4 GMAT (Prep4 GMAT) 	 18372
Pocket GMAT Math 	 656
GMAT Question Bank 	 240
GRE Tutor 	 275
GRE Flashcards 	 13791
play2prep: ACT, SAT prep 	 3692
SAT Test 	 2363
GMAT Math Flashcards 	 1769
Pocket SAT Math 	 430
TOEFL Prep & Practice from Magoosh 	 756
GRE Prep & Practice by Magoosh 	 3963
GRE® Flashcards by Kaplan 	 316
SAT Vocabulary 	 642
Magoosh GMAT Prep & Practice 	 1058
SAT Flashcards: Prep & Vocabulary 	 2277
Rosetta Stone: Learn to Speak & Read New Languages 	 172508
Google Classroom 	 69498
LinkedIn Learning: Online Courses to Learn Skills 	 7973
Learn English with Phrases 	 5695
Free english course 	 142632
Learn 50 languages 	 55256
Babbel – Learn Spanish 	 54798
Mango Languages: Lovable Language Courses 	 4815
Learn English with Aco 	 75112
Learn to Speak English 	 33646
busuu: Learn Languages - Spanish, English & More 	 207294
My Class Schedule: Timetable 	 9348
Study Checker 	 3816
My St

Super Slime Simulator - Satisfying Slime App 	 53652
Caf - My Account 	 18961
H Pack 	 9412
Family convenience store FamilyMart 	 9663
သိင်္ Astrology - Min Thein Kha BayDin 	 2225
w UN map 	 23164
Official Matsumoto Kiyoshi application 	 3031
Galaxy Gift 	 95557
PASS by KT (formerly KT certified) 	 7869
Safety stepping stone 	 4212
Rate Guide Bill Letter 	 17368
US Mission - buy gourmet movie KTV 	 6554
OK cashbag [point of pleasure] 	 33264
JOANN - Crafts & Coupons 	 34802
MK eCatalog 	 6676
Vaniday - Beauty Booking App 	 1067
Fashion in Vogue 	 1797
Mirror 	 367505
StyleSeat 	 20304
Wedding Countdown Widget 	 7376
My Day - Countdown Calendar 🗓️ 	 49147
justWink Greeting Cards 	 69177
Wedding LookBook by The Knot 	 3448
Big Days - Events Countdown 	 39724
Wedding Planner by WeddingWire - Venues, Checklist 	 3788
Been Together (Ad) - D-day 	 95736
WedMeGood - Wedding Planner 	 1658
DIY Garden Ideas 	 3309
Brit + Co 	 987
Creative Ideas - DIY & Craft 	 5208
Homestyler Interior Design &

Fotor Photo Editor - Photo Collage & Photo Effects 	 597068
Snapseed 	 823109
Font Studio- Photo Texts Image 	 197295
Add Text To Photo 	 21578
Phonto - Text on Photos 	 307453
Collage&Add Stickers papelook 	 32896
Photo Collage - InstaMag 	 542561
Meitu – Beauty Cam, Easy Photo Editor 	 462702
ESPN 	 521140
Free Sports TV 	 1802
LiveScore: Live Sport Updates 	 283662
MLB At Bat 	 82883
NFL 	 459797
theScore: Live Sports Scores, News, Stats & Videos 	 133833
Onefootball - Soccer Scores 	 911995
Cristiano Ronaldo Wallpaper 	 1733
FIFA - Tournaments, Soccer News & Live Scores 	 342912
Futbol24 	 31908
kicker football news 	 56270
Football Live Scores 	 107724
Pro 2018 - Series A and B 	 101455
BeSoccer - Soccer Live Score 	 152780
Sport.pl LIVE 	 21733
FotMob - Live Soccer Scores 	 410395
Yahoo Fantasy Sports - #1 Rated Fantasy App 	 277939
CBS Sports App - Scores, News, Stats & Watch Live 	 91035
The Team - Live Sport: football, tennis, rugby .. 	 112725
MARCA - Sports Leader Diary 	 76

Spreaker Podcast Radio 	 17703
BeyondPod Podcast Manager 	 32121
Dezeen Magazine RSS Reader 	 350
issuu - Read Magazines, Catalogs, Newspapers. 	 74425
BuzzFeed: News, Tasty, Quizzes 	 131028
Fast News 	 84957
LA Times: Your California News 	 3311
The Washington Post Classic 	 23158
Chicago Tribune 	 1380
USA TODAY 	 49259
World Newspapers 	 185884
The Wall Street Journal: Business & Market News 	 40975
Financial Times 	 27104
The Guardian 	 247992
NYTimes - Latest News 	 63647
Digg 	 6105
News360: Personalized News 	 30722
RT News (Russia Today) 	 56524
NPR News 	 24790
Reuters News 	 13169
Bloomberg: Market & Financial News 	 61692
Haystack TV: Local & World News - Free 	 3684
ABC News - US & World News 	 18976
NBC News 	 63020
Sync for reddit 	 62740
AP Mobile - Breaking News 	 76677
HuffPost - News 	 78154
News Republic 	 479594
Newsroom: News Worth Sharing 	 201737
SmartNews: Breaking News Headlines 	 233305
Updates for Samsung - Android Update Versions 	 80368
AC - Tips & News fo

R Programming Tutorial 	 5
R Programing Offline Tutorial 	 4
Learn R Language Easy Way 	 17
.R 	 259
R-TYPE II 	 5682
Guide for R Programming 	 0
R Programming Solution 	 169
Learn R - Programming Concepts 	 6
Learn R Programming 	 0
Car Parking Nissan GT-R R35 Simulator 	 513
R Quick Reference Big Data 	 6
Learn R Programming Free EBook 	 1
RMEduS - 음성인식을 활용한 R 프로그래밍 실습 시스템 	 4
R Programming Language (Paperset 2) MCQ Quiz 	 0
R-net for Android 	 48
R File Manager 	 17
R-TYPE 	 7687
R Studio 	 23
Day R Premium 	 51068
Join R, Community Engagement 	 144
R Bank 	 90
Elemental Knights R Platinum 	 2925
Wonder5 Masters R 	 1655
Neon-R (Red) 	 25
Mat|r viewer 	 5
R. Lee Ermey's Official Sound 	 1696
Tutorials for R Programming Offline 	 2
R Programming Language (Paperset 1) MCQ Quiz 	 1
S Launcher for Galaxy TouchWiz 	 11244
360 Security - Free Antivirus, Booster, Cleaner 	 16771865
S Player - Lightest and Most Powerful Video Player 	 14224
Offroad Pickup Truck S 	 5178
S Launcher Pro for G

AG Contacts, Premium edition 	 88
Ag PhD Field Guide 	 114
AG Drive 3D 	 164
Apps for SportsBєtting.ag - Bitcoin Welcome here! 	 0
AG Contacts, Lite edition 	 185
Ag Weather Tools 	 0
Trimble Ag Mobile 	 85
Ag Tools 	 2
Ag PhD Deficiencies 	 20
AG Screen Recorder 	 7
SOLEM AG 	 1
AG Subway Simulator Mobile 	 623
AG Subway Simulator Lite 	 6738
Elim AG 	 7
EZ Ag Mobile 	 19
AG Subway Simulator Pro 	 0
BRL AG 	 0
Oklahoma Ag Co-op Council 	 0
AG Fast Service Automotive 	 5
Ag PhD Soils 	 3
All States Ag Parts 	 68
VR AG Racing for Cardboard 	 13
Safe Ag Systems™ 	 3
Ag PhD Planting Population Calculator 	 0
Ag Trucking Mobile App 	 0
Lakeside AG Moultrie 	 3
Ag Guardian 	 1
Wind & Weather Meter for Ag 	 3
Ag Across America 	 0
Border Ag & Energy 	 0
Ag-Pro Companies 	 0
My Ag Report 	 1
Alabama Ag Credit Ag Banking 	 0
AG EMS Tour 	 3
Mix Tank – Tank Mixing Ag App 	 64
Eternal Light AG 	 30
AG test 	 3
West Central Ag 	 3
Platincoin Wallet - PLC Group AG 	 375
Tri-Ag (WV) FCU 	 0
Ag PhD 

Scanning body and undressing people 	 1012
Ultimate Tennis 	 183004
Legion of Heroes 	 143087
Soccer Star 2018 Top Leagues · MLS Soccer Games 	 652940
Heroes Arena 	 336386
virtual lover 3D 	 5195
Range Master: Sniper Academy 	 91935
FC Barcelona Official App 	 92522
Can Your Pet? : Returns - Teen 	 45370
GUYZ - Gay Chat & Gay Dating 	 41269
Bubbu – My Virtual Pet 	 394842
3D Tennis 	 1008012
Start the Hunt for the Lost Treasure 	 302
Treasure Hunt Hidden Objects Adventure Game 	 1231
Scanning under clothes (prank) 	 1443
WGT Golf Game by Topgolf 	 148083
3D Holograms Joke 	 31596
Weaphones™ Gun Sim Free Vol 1 	 598975
Hidden Objects Treasure Hunt Adventure Games 	 275
Live Camera Viewer ★ World Webcam & IP Cam Streams 	 64164
Body scanner (prank) 	 16063
AP ENPS Mobile 	 22
AP® Guide 	 3
WiFi Access Point (hotspot) 	 684
AP Mobile 	 117
AP Manager 	 177
AP App for Android™ 	 188
Access Point Finder 	 22
AP English Language: Practice Tests and Flashcards 	 19
AP Mobile 104 	 0
AP Plann

Hunt Buddy BC 	 172
Survival Mobile:10,000 BC 	 7441
iHunter BC 	 55
iFish BC 	 25
BC Connect 	 11
Ice Crush 10.000 B.C. 	 31
BC Navigator 	 21
BC Lotto Check 	 10
BC Liquor Stores 	 492
BC MVA Fines 	 5
British Columbia Transit Info 	 0
BC Hospital Wait Times 	 8
Town of Princeton, BC 	 4
Explore British Columbia - BC Travel Guide 	 7
Bridge Constructor Stunts FREE 	 36151
BC's Pizza 	 113
Baby Connect (activity log) 	 8343
BC Wildfire 	 27
BC Slots - The Lost Reels FREE 	 209
BC Mobile Intro - Americas 	 0
Car Driving Theory Test BC 	 0
Truck Driving Test Class 3 BC 	 1
BC Highways - Road Conditions 	 382
BC Camera 	 5
BC iptv player 	 8
Business Calendar Pro 	 27135
Victoria, BC | Tour City 	 20
BC Wildflowers 	 7
Bc Vod 	 1
BC Hockey 	 20
Outdoor Movies BC 	 7
British Columbia Tourist Places (Guide) 	 1
British Columbia License 	 14
BC browser 	 1
PPCNP-BC Pocket Prep 	 3
AP Calculus BC Practice Test 	 4
BackCountry Navigator TOPO GPS PRO 	 6230
BC Pizza 	 3
Railroad Radio Vancouve

BJ's Wholesale Club 	 4099
BJ’s Mobile App 	 4722
BJ's Express Scan 	 69
BJ's Bingo & Gaming Casino 	 501
BJ-UFO 	 226
BJ Bridge Pro 2018 	 17
BJ Bridge Free (2018) 	 20
BJ & Jamie 	 13
The Daily BJ 	 38
BJ-FPV 	 16
BJ Bridge Acol Beginner 2018 	 11
BJ Memo Widget 	 616
BJ Grand Salon Mobile App 	 1
BJ Bridge Standard American 2018 	 1
BJ Strategy Tester 	 1
Eating Show - Food BJ 	 103
Virtual DJ Sound Mixer 	 4010
Cards Casino:Video Poker & BJ 	 325
DJ Music Pad 	 35121
BJ-DRONE 	 1
Real Casino:Slot,Keno,BJ,Poker 	 341
AfreecaTV 	 381023
BJ - Confidential 	 0
Basic Strategy Training BJ 21 	 0
HON. B.J. ACS COLLEGE ALE 	 3
One Launcher 	 26601
BJ Toys 	 55
Alumni BJ 	 2
뽕티비 - 개인방송, 인터넷방송, BJ방송 	 414
BJ card game blackjack 	 3
BMH-BJ Congregation 	 0
BJ's Community SoundBoard 	 0
Real DJ Simulator 	 68664
BJ TIKET 	 3
BLACKJACK! 	 524467
Blackjack Verite Drills 	 17
BJ Foods 	 3
Blackjack 	 52199
3D DJ – DJ Mixer 2018 	 6333
BJ Adams 	 1
Phonics Puzzles 	 4
Spider grinder in screen funn

Bu Hangi Firma? 	 8
BU Syllabus 	 38
Bu Hangi Ünlü? 	 1
Bu Nedir ? 	 0
BU Bison Nation 	 4
Kim Bu Youtuber? 	 11
BU HANGİ ŞARKI ? - 2018 	 4
BU Library 	 4
SkyTest BU/GU Lite 	 28
BU Alsace 	 3
Bu Hangi Oyun ? 	 2
Catholic La Bu Zo Kam 	 23
Bu Hangi Youtuber ? 	 31
Nedir Bu ? 	 0
Bu Hangi Film ? 	 3
Khrifa Hla Bu (Solfa) 	 0
Bu Hangi Dizi ? 	 14
Kristian Hla Bu 	 238
SA HLA BU 	 119
Bubble Witch 2 Saga 	 2838064
Büfe BU 	 14
BV Mobile Apps 	 3
BV 	 3
Bacterial Vaginosis 	 1
Bacterial Vaginosis 🇺🇸 	 1
Bacterial Vaginosis Symptoms & Treatment 	 0
Bacteria Vaginosis 	 4015
Hilverda De Boer B.V. App 	 5
BV Smart 	 28
BV Forest 	 0
Bacterial Vaginosis Symptoms 	 698
BV Sridhara Maharaj 	 8
BV Aventure 	 0
Bacterial vaginosis Treatment - Sexual disease 	 2
BV Link 	 10
Bacterial Vaginosis Treatment 	 0
BV Bombers 	 0
Schulman B.V. 	 6
BV MAAp 	 9
BV Taxi - Driver 	 1
BV Taxi Sudan 	 8
van Gennip Textiles BV 	 1
bacterial vaginosis 	 0
PIO bv App 	 2
Meu Cartão BV 	 15057
BV Rando 	 0
Kovax E

iOBD2-CF 	 5
CF PD 	 4
Themes DAF CF Trucks 	 2
CF Calculator 	 25
CF Climb 	 2
Squeezy CF 	 0
Wallpapers DAF CF Trucks 	 1
Wallpapers DAF CF 85 Trucks 	 0
CF 	 2
Wallpapers Truck DAF CF 	 19
CF cal 	 123
CF GeneE 	 16
Themes DAF CF 85 	 3
CF SPOT 	 0
CF Talenti 	 4
Villarreal CF Wallpapers 4 Fans 	 2
Thrive CF 	 6
Special Forces Group 2 	 1432809
Valencia CF Wallpapers 4 Fans 	 4
Cacique CF 	 19
CF Townsville 	 4
CF Church 	 1
CF Etowah 	 0
Cystic Fibrosis Symptoms, Doctors & Treatments 	 6
Málaga CF Wallpapers 4 Fans 	 2
Cluster CF 	 10
CF Themis 	 0
Imperium CF 	 9
Azulones Getafe CF Fans 	 4
CF Riga 	 12
Casa CF 	 12
CampGladiator 	 293
CG Creative Sets: 2D/3D Artist 	 117
Motocross Motorbike Simulator Offroad 	 51366
CG Daily News 	 65
CG - Calendars Add-On 	 49
CG Fit Scale 	 13
All Info about Cg 	 15
cg guruji 	 170
CG Yojna & Jansampark 	 54
Offline Jízdní řády CG Transit 	 7314
CG Prints 	 1
CG Job Alerts 	 116
CG - Conference Call Add-On 	 6
Somos CG 	 2
CG Samanya Gyan 	 145

News 8 San Diego 	 1137
Learn Morse Code - G0HYN Learn Morse 	 27
SYFY 	 3069
Morse Notifier Free 	 79
AMC 	 20843
Cartoon Wars 3 	 137674
ABC – Live TV & Full Episodes 	 50428
Oracle CX Cloud Mobile 	 48
CX-10WiFi 	 1419
CX-OF 	 18
cx-32wifi 	 62
CX-17WIFI 	 49
Cathay Pacific 	 4069
Cx File Explorer 	 175
cx-33wifi 	 63
CX-WiFi720P 	 47
CX-60 	 19
ACTIVEON CX & CX GOLD 	 439
CX North America 	 10
CX-10DS 	 12
CX-37 	 13
Avaya CX 	 21
TI-Nspire CX Calculator Manual 	 11
CX_WiFiUFO 	 818
CX watcher 	 4
Theme For Techno Camon CX 	 42
CX-95WIFI 	 6
Theme for Tecno Camon CX / C8 	 147
CX Carrier Lite 	 0
Oración CX 	 103
CX-40 	 2
Cx Wize 	 3
Ring 	 517
Racing CX 	 4
cx advance call blocker 	 3
CX Ram Booster 	 12
CX-42 	 0
CX Capture 	 1
CONNECT: The Mobile CX Summit 	 0
Hyundai CX Conference 	 0
CX-41 	 1
CX Monthly Tech News 	 2
CXmodel-ufo 	 19
CX Summit 	 1
Secret Codes For Android 	 6060
Absolute RC Heli Simulator 	 654
WiFi FPV 	 316
CX Elevated 	 0
go41cx 	 171
FlexRelease CX 	 4
A

DM security - Dragon Mobile 	 27
DM 24/7 	 7
DM Storage (for twitter) 	 6
DM Transfers Dalaman Transfers 	 51
DM Tracker 	 11
DM Tuning 	 64
DM Die Roller 9000 	 0
DM Magazine 	 0
DM הפקות 	 2
DM TrackMan 	 10
Aster DM Healthcare 	 32
Fake Chat (Direct Message) 	 5985
Interactive NPC DM Tool 	 5
DM Buddy » Learn Digital Marketing 	 3
DM Accounting and Payroll 	 0
Basketball Dynasty Manager 16 	 426
DM Collection 	 3
DM AirDisk HDD 	 4
DM AirDisk NAS 	 0
Shaggy's DM Assistant 	 0
Otto DM 	 0
DM Adventure 	 0
Ramdor DM Mobile 	 1
Disciple Maker’s (DM) Lab 	 3
Ultimate DM 	 0
DN 	 78
DN eAvis 	 61
DN Sync 	 56
DN Reader 	 18
DN Managed Mobility App 	 0
e-DN - den digitala tidningen från Dagens Nyheter 	 160
MultiPicture Live Wallpaper dn 	 505
DN Blog 	 20
Dagens Nyheter 	 2055
DN Events 	 4
DN.VR 	 29
DN - Diário de Notícias 	 794
DN Employee 	 1
DN Advanced Service Coder 	 0
DN Premium Hookah Lounge 	 1
PlayMotiv dn edition 	 0
Cossack Dictionary (DN) 	 7
DN Calculators 	 12
DN Diamonds

DU Flashlight - Brightest LED & Flashlight Free 	 73821
Upgrade for Android DU Master 	 12993
Modern: DU Launcher Theme 	 868
Weather Forecast Pro 	 14051
Night: DU Launcher Theme 	 1419
DU Collage Maker - Photo Collage & Grid & Layout 	 4595
Du Chinese – Mandarin Lessons 	 3390
Portes du Soleil 	 500
Sword of Chaos - Lame du Chaos 	 23599
Hadith du jour 	 276
Bible du Semeur-BDS (French) 	 313
La citadelle du musulman 	 314
Voyance du pyromancien 	 22
Clean My Android 	 101163
Citation du Jour - Motivation 	 321
La Poupée du Voyant 	 265
Star Chart 	 128808
APUS Booster - Space Cleaner & Booster 	 1048766
Officiel du SCRABBLE LAROUSSE 	 116
Proverbes du monde 	 38
GO Security－AntiVirus, AppLock, Booster 	 1251479
iSmart DV 	 5692
dv Prompter 	 321
Sports DV 	 86
Live DV 	 16
Mini DV 	 12
DV-2019 Results 	 158
AKASO DV 	 99
DV-LOTTERY 2019 REGISTRATION 	 97
Selfie DV 	 29
DV-4036 by Somikon 	 17
DV 2019 Entry Guide 	 92
Touch DV 	 1
Porch DV 	 5
DV 2019 - EDV Photo & Form 	 314
SDV Cam

Verdant EI 	 0
Tamago egg 	 49
Ei-ij Spelling Dutch 	 20
Esporte Interativo - Notícias e Resultados Ao Vivo 	 4600
Let's Poke The Egg 	 9051
Egg 	 72
ei Calc 	 2
Disaster Will Strike 	 14692
Egg: clicker 	 40678
Egg for Pou 	 59096
EI App 2 	 0
Survival: Prison Escape 	 127810
Crack the blue angry birds egg 	 85
Tamago Tap Clicker Egg 	 209
Tap The Easter Egg! 	 111
Esporte Interativo Plus 	 27179
EI国际 	 15
Egg Baby 	 351607
Surprise Eggs 	 81543
Crack Attack 	 31970
Aaj Bangla: ei samay er khobor 	 34
Break the Egg 	 366
Kolkata News:Anandbazar Patrika,ei samay&AllRating 	 10
PJ Masks: HQ 	 13731
Ei Somoy & Dainik Sangbad & Statesman (PDF) 	 4
TAMAGO Monsters Returns 	 104583
EJ.by 	 10
EJ Elite Prospects League 	 3
Super Sport Car Simulator 	 58553
EJ Insight 	 33
Ambulance Rescue Simulator 17 	 6669
EJ Trívia Game 	 1
Bad Piggies HD 	 764967
EJ messenger 	 1
Traffic Sniper Counter Attack 	 5339
Rope Superhero Unlimited 	 6735
Painel EJ SSH - INTERNET GRÁTIS 	 10
EJ Ecuador 	 9
Ej-bu

Get SMART ER/LA Opioids 	 0
ER Doctor City Emergency - Surgery City Doc FREE 	 1909
ER Hospital Simulator 	 286
DMC ER Now 	 6
Nose Surgery ER Simulator Lite 	 269
FAHREDDİN er-RÂZİ TEFSİRİ 	 9
Cardinal Glennon ER Reference 	 19
Scanner Radio - Fire and Police Scanner 	 175509
Fill 'er Up 	 58
SnakeBite911 ER 	 1
Ankle Surgery ER Emergency 	 10449
Voxer Walkie Talkie Messenger 	 230564
ES File Explorer/Manager PRO 	 81614
ES Task Manager (Task Killer ) 	 171771
ES Audio Player ( Shortcut ) 	 1236
ES Disk Analyzer - Storage Space 	 5867
ES App Locker 	 32207
ES Chromecast plugin 	 23859
ES Material Theme for Pro 	 14428
ES Holo Theme for Pro 	 4737
ES File Explorer 	 278
ES Dark Theme for free 	 7851
ES Classic Theme 	 20865
ES Themes -- Classic Theme 	 77609
ES Holo Theme 	 11449
ES File Explorer & Manager, Locker Xplorer 2018 	 11
ES Summer Chill Theme for Free 	 940
Furrion ES Control 	 26
OpenGL ES Extensions - The OpenGL Utility 	 513
File Ex - ES File Explorer 	 24
OpenGL ES CapsV

TownWiFi | Wi-Fi Everywhere 	 2372
Jazz Wi-Fi 	 49
Xposed Wi-Fi-Pwd 	 1042
Life Made WI-Fi Touchscreen Photo Frame 	 3.0M
Sat-Fi Voice 	 37
Wi-Fi Visualizer 	 132
Lennox iComfort Wi-Fi 	 552
Sci-Fi Sounds and Ringtones 	 128
Sci Fi Sounds 	 4
Free Wi-fi HotspoT 	 382
FJ 4x4 Cruiser Offroad Driving 	 3543
FJ 4x4 Cruiser Snow Driving 	 1619
Wallpapers Toyota FJ Cruiser 	 78
New Wallpapers Toyota FJ Cruiser Theme 	 1
FJ Final Join , Circles Game 	 32
HD Wallpaper - Toyota FJ Cruiser 	 2
FJ Drive: Mercedes-Benz Lease 	 107
Driving n Parking School 2017 	 15
FJ WiFi HDD 	 40
Offroad Cruiser 	 42432
HD Themes Toyota Cruiser 70 	 86
Toyota Cruisers & Trucks Mag 	 10
4 x4 Offroad SUV 3D Truck Simulator Driving 2017 	 32
Cake Shop - Kids Cooking 	 30668
HD Themes Toyota Cruiser 60 	 0
HD Themes Toyota Cruiser70 	 0
HD Themes Toyota Cruiser 80 	 2
HD Themes Toyota Cruiser 50 	 1
OFF-ROAD SIMULATOR 4x4 : REAL 	 109
HD Themes Toyota Cruiser 100VX 	 2
HD Themes Toyota Cruiser 200 	 3
HD Themes Toyo
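One caveat about the comparison in STEP 2: the review counts are read from the CSV as strings, so `review_val > curr_rev_val` compares them character by character (for example, `'9' > '10'` is `True`). A minimal sketch of a numeric version follows. The function name is hypothetical, and skipping rows whose review field is not a whole number (such as the `'3.0M'` value listed above) is an assumption about how such rows should be handled.

```python
def max_reviews_by_name(rows, name_idx=0, reviews_idx=3):
    """Map each app name to its highest review count, compared as integers."""
    best = {}
    for row in rows:
        try:
            reviews = int(row[reviews_idx])
        except ValueError:
            # Assumption: rows with a non-numeric review field are skipped
            continue
        name = row[name_idx]
        if name not in best or reviews > best[name]:
            best[name] = reviews
    return best


# Illustrative rows only; the two middle columns are placeholders
sample_rows = [
    ['App A', 'x', 'x', '9'],
    ['App A', 'x', 'x', '10'],
    ['App B', 'x', 'x', '3.0M'],
]
print(max_reviews_by_name(sample_rows))
```

With string comparison, 'App A' would keep `'9'`; with integer comparison it keeps 10.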

#### STEP 3: Confirm that the unique count equals the total row count minus the number of duplicate rows

In [8]:
print('Total count of google data set:', len(google_datarows))
print('Total count of duplicate app rows:', len(duplicate_apps))
print('Difference of the above two:', (len(google_datarows) - len(duplicate_apps)))
print('Unique Dict rows:', len(uniqueapps_maxreview_dict))

Total count of google data set: 10841
Total count of duplicate app rows: 1181
Difference of the above two: 9660
Unique Dict rows: 9660


#### STEP 4: Create a separate data set that is cleaned of duplicate rows and has unique app names with the highest review counts

In [9]:
clean_google_datalist = []
checkappname_samereviewcount_list = []

for approw in google_datarows:
    appname = approw[0]
    reviewval = approw[3]
    
    if uniqueapps_maxreview_dict[appname] == reviewval:
        if appname not in checkappname_samereviewcount_list:
            clean_google_datalist.append(approw)
            checkappname_samereviewcount_list.append(appname)
            
            
explore_data(clean_google_datalist, 0, 5, True)            
    

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9660
Number of columns: 13


The unique dictionary row count and the row count of the newly cleaned data set are both equal to 9660.

Removing non-English names
-------------------------------------------

We identify app names that contain non-English characters by checking each character's code point with `ord()`. Characters commonly used in English text fall in the ASCII range, 0 to 127.

In [10]:
def is_english(a_string):
    for ch in a_string:
        if ord(ch) > 127:
            return False
    return True
    
    
print(is_english('dfdsf爱abchk'))
print(is_english('Instagram'))  
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))  
print(is_english('Instachat 😜'))  

False
True
False
False
False
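The last two results are `False` even though both names are English, because `™` and the emoji have code points above 127. A small sketch that lists the offending characters makes this visible (`non_ascii_chars` is a hypothetical helper name, not part of the notebook):

```python
def non_ascii_chars(a_string):
    """Return (character, code point) pairs for characters outside ASCII."""
    return [(ch, ord(ch)) for ch in a_string if ord(ch) > 127]


print(non_ascii_chars('Docs To Go™ Free Office Suite'))
print(non_ascii_chars('Instachat 😜'))
print(non_ascii_chars('Instagram'))
```

Each of the two problem names contains exactly one such character, which is why tolerating a small number of non-ASCII characters is a reasonable next step.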


Improving the function to allow up to three non-ASCII characters

In [11]:
def is_english(a_string):
    non_ascii_count = 0
    
    for ch in a_string:
        if ord(ch) > 127:
            non_ascii_count += 1
            
    # Names with more than three non-ASCII characters are treated as non-English
    return non_ascii_count <= 3
    
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True


Applying the above function to filter non-English apps out of both data sets

In [12]:
google_eng = []
apple_eng = []

for row in clean_google_datalist:
    name = row[0]
    if is_english(name):
        google_eng.append(row)
        
for row in apple_datarows:
    name = row[1]
    if is_english(name):
        apple_eng.append(row)
        
explore_data(google_eng, 0, 3, True)
print('\n')
explore_data(apple_eng, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9615
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

Isolating Free Apps
----------------------------

In [13]:
google_result = []
apple_result = []

for row in google_eng:
    price = row[7]
    if price == '0':        
        google_result.append(row)            
        
        
for row in apple_eng:
    price = row[4]
    if price == '0.0':
        apple_result.append(row)

print(len(google_result))
print(len(apple_result))

8862
3222


Android Apps - 8862

iOS Apps - 3222
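The string comparisons above work because free apps are recorded as exactly `'0'` in the Google Play file and `'0.0'` in the App Store file. A sketch of a format-independent check is shown below. It assumes that prices may carry a leading `$` and that non-numeric values should be treated as not free; `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True when a price string parses to zero."""
    try:
        return float(price_text.strip().lstrip('$')) == 0.0
    except ValueError:
        # Assumption: a value that is not a number is not a free price
        return False


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(price, is_free(price))
```

The same function can then be used for both data sets, with only the price column index differing (7 for Google Play, 4 for the App Store).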

Grouping Apps by Genre
------------------------------------

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the <span style="color: red;">prime_genre</span> column of the App Store data set, and the <span style="color: red;">Genres and Category</span> columns of the Google Play data set.

#### STEP 1: Function to generate frequency tables showing percentages

In [14]:
def freq_table(dataset, index):
    # Count how many rows contain each distinct value in the given column
    freq_t = {}
    for row in dataset:
        row_value = row[index]
        if row_value in freq_t:
            freq_t[row_value] += 1
        else:
            freq_t[row_value] = 1

    # Convert the raw counts to percentages of the whole data set
    freq_t_percent = {}
    for k in freq_t:
        percent = (freq_t[k] / len(dataset)) * 100
        freq_t_percent[k] = percent

    return freq_t_percent

#### STEP 2: Function to display the percentages in a descending order

In [15]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
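
For comparison, the two steps above can also be sketched with the standard library's `collections.Counter` and a sort key, which avoids swapping keys and values into tuples. This is an alternative formulation, not the notebook's own code. The names `freq_table_counter` and `sorted_percentages` and the toy rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Map each value in column `index` to its percentage of rows."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs, largest percentage first."""
    table = freq_table_counter(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Toy rows (hypothetical data): [name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percent in sorted_percentages(sample, 1):
    print(genre, ':', percent)
```

One difference worth noting: when two values have the same percentage, this version keeps them in first-seen order, whereas sorting `(percentage, key)` tuples in reverse breaks ties by key in descending order.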

#### STEP 3: Examine the frequency table for the "prime_genre" column of the App Store data set

In [16]:
# For prime_genre Column
display_table(apple_result, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The data above shows that more than half (58.16%) of the free English apps are games. Entertainment apps account for close to 8%, followed by photo and video apps at close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users: the demand might not be the same as the offer.

#### STEP 4: Examine the frequency tables for the "Genres" and "Category" columns of the Google Play data set

In [17]:
# For Category Column
display_table(google_result, 1)

FAMILY : 18.934777702550214
GAME : 9.693071541412774
TOOLS : 8.451816745655607
BUSINESS : 4.5926427443015125
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5206499661475967
SPORTS : 3.39652448657188
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.238546603475513
HEALTH_AND_FITNESS : 3.080568720379147
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.7941773865944481
MAPS_AND_NAVIGATION : 1.399232678853532
FOOD_AND_DRINK : 1.2412547957571656
EDUCATION : 1.1735499887158656
ENTERTAINMENT : 0.9591514330850823
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8237418190024826
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
PARENTING : 0.6544798013992327
ART_AND_DESIGN : 0.

The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.


In [18]:
# For Genres Column
display_table(google_result, -4)

Tools : 8.440532611148726
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.5926427443015125
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5206499661475967
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.238546603475513
Action : 3.1031369893929135
Health & Fitness : 3.080568720379147
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8505980591288649
Video Players & Editors : 1.7716091175806816
Casual : 1.7490408485669149
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.92529902956443

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.
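
The granularity claim can be checked by counting the distinct values in each column. The sketch below does this on toy rows. The helper name `count_distinct` and the sample data are hypothetical.

```python
def count_distinct(dataset, index):
    """Return how many different values appear in column `index`."""
    return len({row[index] for row in dataset})


# Toy rows (hypothetical data): [name, category, genres]
sample = [
    ['a', 'GAME', 'Action'],
    ['b', 'GAME', 'Arcade'],
    ['c', 'GAME', 'Puzzle'],
    ['d', 'TOOLS', 'Tools'],
]
print(count_distinct(sample, 1))  # 2
print(count_distinct(sample, 2))  # 4
```

On the real data, comparing `count_distinct(google_result, 1)` with `count_distinct(google_result, -4)` would show how many more values the Genres column has.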


Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have the most users.

Most Popular Apps by Genre on the App Store
---------------------------------------------------------------

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Below, we calculate the average number of user ratings per app genre on the App Store:

In [19]:
apple_genres = freq_table(apple_result, -5)

for g in apple_genres:
    # total user ratings of a specific genre
    uratings_total = 0
    
    # number of apps in apple_result that belong to this genre
    genre_total = 0   
    
    for row in apple_result:
        row_genre = row[-5]
        if row_genre == g:
            u_ratings = float(row[5])
            uratings_total += u_ratings
            genre_total += 1
            
    avg_ratings = uratings_total / genre_total
    print(g, ':', avg_ratings)
   

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
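
The nested loop above scans the whole data set once per genre. As a minimal sketch, the same averages can be accumulated in a single pass with two dictionaries. The function name `average_by_group` and the toy rows are hypothetical.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows (hypothetical data): [name, rating_count, genre]
sample = [['a', '100', 'Games'], ['b', '300', 'Games'], ['c', '50', 'Music']]
averages = average_by_group(sample, 2, 1)
print(averages)  # {'Games': 200.0, 'Music': 50.0}
```

For the App Store data the equivalent call would be `average_by_group(apple_result, -5, 5)`.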


On average, navigation apps have the highest number of user ratings (about 86,090):

In [20]:
for row in apple_result:
    if row[-5] == 'Navigation':
        # print AppName : NoOfRatings
        print(row[1], ':', row[5]) 

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


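The navigation average is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings (345,046 + 154,911 = 499,957). The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. The same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.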
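
One way to see how strongly a few large apps pull an average is to compare the mean with the median. The sketch below uses the six Navigation rating counts printed earlier in this section. Using the median here is an illustration added for comparison, not a step from the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33
print(median(navigation_ratings))  # 8196.5
```

The mean is roughly ten times the median, so the typical navigation app has far fewer ratings than the average suggests.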

In [21]:
for row in apple_result:
    if row[-5] == 'Reference':
        # print AppName : NoOfRatings
        print(row[1], ':', row[5]) 

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Reference apps have 74,942 user ratings on average, but the Bible and Dictionary.com skew this average upward.

Still, the reference genre seems to show some potential, especially given the fact that the App Store is dominated by for-fun apps. That dominance suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Most Popular Apps by Genre on Google Play
----------------------------------------------------------------

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [22]:
# For Install Column
display_table(google_result, 5)

1,000,000+ : 15.741367637102236
100,000+ : 11.554953735048521
10,000,000+ : 10.516813360415256
10,000+ : 10.200857594222523
1,000+ : 8.395396073121193
100+ : 6.917174452719477
5,000,000+ : 6.838185511171294
500,000+ : 5.574362446400361
50,000+ : 4.773188896411646
5,000+ : 4.513653802753328
10+ : 3.5432182351613632
500+ : 3.2498307379823967
50,000,000+ : 2.2906793048973144
100,000,000+ : 2.1214172872940646
50+ : 1.9183028661701647
5+ : 0.7898894154818324
1+ : 0.5077860528097494
500,000,000+ : 0.2708192281651997
1,000,000,000+ : 0.22568269013766643
0+ : 0.045136538027533285
0 : 0.011284134506883321


One problem with this data is that it is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes: we only want to get an idea of which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number to float — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).
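
The cleaning step described above can be sketched in isolation as a small helper. The name `parse_installs` is hypothetical. It treats each label as its lower bound, following the convention stated earlier.

```python
def parse_installs(text):
    """Convert an install label such as '1,000,000+' to a float lower bound."""
    return float(text.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```

Without the two `replace` calls, `float('1,000,000+')` raises a `ValueError`.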

In [23]:
google_category = freq_table(google_result, 1)

for c in google_category:
    total_uinstall = 0
    category_total = 0
    for row in google_result:
        row_category = row[1]
        if row_category == c:
            u_install = row[5]
            u_install = u_install.replace(',','')
            u_install = u_install.replace('+', '')
            total_uinstall += float(u_install)
            category_total += 1
    avg_u_install = total_uinstall / category_total
    print(c, ':', avg_u_install)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1820673.076923077
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15560965.599534342
FAMILY : 3694276.334922527
MEDICAL : 120616.48717948717
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17805627.643678162
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10682301.033377837
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

In [24]:
for app in google_result:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [25]:
under_100_m = []

for app in google_result:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386

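Once we exclude the communication apps with 100 million or more installs, the average drops to about 3.6 million, roughly a tenth of the original figure. Again, the main concern is that such app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants that are hard to compete against.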
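
The effect of leaving out the largest apps can be illustrated on a toy list. This is a minimal sketch. The function name `average_installs`, its `cap` parameter, and the numbers are hypothetical.

```python
def average_installs(install_counts, cap=None):
    """Mean of the counts, optionally keeping only values below `cap`."""
    if cap is not None:
        install_counts = [n for n in install_counts if n < cap]
    return sum(install_counts) / len(install_counts)


# Toy install counts (hypothetical data): one giant among smaller apps
toy = [1_000_000_000, 5_000_000, 1_000_000, 100_000]
print(average_installs(toy))                   # 251525000.0
print(average_installs(toy, cap=100_000_000))  # about 2033333.33
```

A single app with a billion installs raises the toy average by two orders of magnitude, which is the same kind of distortion seen in the communication category.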

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [26]:
for app in google_result:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The books and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [27]:
for app in google_result:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):
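
The middle-range selection described above could also be written as a numeric comparison instead of a chain of string equality checks. This is a sketch under the assumption that every install label parses cleanly once commas and plus signs are removed. The function name `installs_between` and the toy rows are hypothetical.

```python
def installs_between(dataset, name_index, installs_index, low, high):
    """Return (name, installs) pairs with low <= installs < high."""
    selected = []
    for row in dataset:
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        if low <= installs < high:
            selected.append((row[name_index], row[installs_index]))
    return selected


# Toy rows (hypothetical data): [name, installs]
sample = [
    ['Reader', '10,000,000+'],
    ['Giant', '1,000,000,000+'],
    ['Tiny', '50,000+'],
    ['Dictionary', '1,000,000+'],
]
for name, installs in installs_between(sample, 0, 1, 1_000_000, 100_000_000):
    print(name, ':', installs)
```

A range check keeps working if the data contains a bucket that was not listed explicitly, which a chain of `or` comparisons would silently skip.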

In [28]:
for app in google_result:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

Conclusions
------------------

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

TO DO
----------
- Analyze the frequency table for the Genres column of the Google Play data set, and see whether you can find useful patterns.
- Assume we could also make revenue via in-app purchases and subscriptions, and try to find out which genres seem to be liked the most by users. You could examine app ratings here.