# Popular App Profiles Among Users for the App Store and Google Play Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the number of users of any given app determines our revenue from it: the more users who see and engage with the ads, the better.

Our goal for this project is to analyze data that will help our developers understand which types of apps are likely to attract more users on Google Play and the iOS App Store.

In [1]:
from csv import reader

In [2]:
with open("googleplaystore.csv", encoding="utf8", newline="") as f:
    gplay=list(reader(f))

In [3]:
with open("AppleStore.csv", encoding="utf8", newline="") as f:
    ios=list(reader(f))

In [4]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [5]:
explore_data(ios,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


In [6]:
explore_data(gplay,0,3,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


## Data Cleaning

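1. Remove a malformed row. As pointed out by a contributor, Shivendra Yadav, the row at index 10473 is missing its Category value, so its remaining values are shifted one column to the left: for example, its Rating reads 19, which is impossible on a 5-point scale.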

In [7]:
gplay[10473]

['Life Made WI-Fi Touchscreen Photo Frame',
 '1.9',
 '19',
 '3.0M',
 '1,000+',
 'Free',
 '0',
 'Everyone',
 '',
 'February 11, 2018',
 '1.0.19',
 '4.0 and up']

In [8]:
# delete the malformed row (run this cell only once: re-running it would delete another row)
del gplay[10473]
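Rather than locating the shifted row by hand, a quick length check can flag every row whose number of values differs from the header. The sketch below uses a hypothetical helper, `find_malformed_rows`, on a toy dataset; if run on `gplay` before the deletion above, its result should include the row at index 10473, which has 12 values against 13 header columns.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row.

    Hypothetical helper: assumes dataset[0] is the header.
    """
    header_length = len(dataset[0])
    return [
        (index, row)
        for index, row in enumerate(dataset)
        if len(row) != header_length
    ]


# Toy example with a 3-column header; the second data row is missing a field.
sample = [
    ["App", "Category", "Rating"],
    ["Instagram", "SOCIAL", "4.5"],
    ["Broken App", "1.9"],
]
print(find_malformed_rows(sample))
```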

In [9]:
dup_apps=[]
unique_apps=[]
for row in gplay:
    if row[0] in unique_apps:
        dup_apps.append(row[0])
    else:
        unique_apps.append(row[0])

print("No. of duplicates: ",len(dup_apps))

No. of duplicates:  1181


### There are 1,181 duplicate entries

In [10]:
print("Example of duplicate apps:" ,dup_apps[:5])

Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


There are a large number of duplicate records, and it is better for the analysis to remove them. Let us explore the duplicate records in more depth, starting with one of the examples above, Quick PDF Scanner + OCR FREE.

In [11]:
print(gplay[0])
for row in gplay:
    if row[0]=="Quick PDF Scanner + OCR FREE":
        print(row)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


The rows are almost identical but differ in the Reviews column. We can use this to decide which rows to delete: instead of removing duplicates at random, we will keep the row with the highest number of reviews and delete all others.
The row with the most reviews is likely the most recent snapshot of the app's data.


In [12]:
print("Expected length of records after removal of duplicates: ",len(gplay[1:])-len(dup_apps))

Expected length of records after removal of duplicates:  9659


In [13]:
highest_rev={}
for row in gplay[1:]:
    name=row[0]
    rev=float(row[3])
    # keep the highest review count seen so far for each app
    if name not in highest_rev or rev>highest_rev[name]:
        highest_rev[name]=rev

In [14]:
len(highest_rev)

9659

We inspect each row of the Android data. If the row's review count equals the highest review count recorded for that app (in the highest_rev dictionary) and the app's name is not yet in the already_added list, we append the row to the android_clean dataset. The already_added list keeps track of the apps that have already been added to the clean dataset; it is needed because some duplicates share the same highest review count (like the first two Quick PDF Scanner + OCR FREE rows above).

In [15]:
android_clean=[]
already_added=[]
for row in gplay[1:]:
    name=row[0]
    rev=float(row[3])
    if rev==highest_rev[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)

In [16]:
len(android_clean)

9659
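The two-pass approach above (first record the maximum, then select rows) can also be written as a single pass that keeps, for each app name, the row with the most reviews seen so far. This is a minimal sketch: `dedupe_by_max_reviews` is a hypothetical helper, and it assumes the name is in column 0 and the review count in column 3, as in the Google Play data. Using a strict `>` keeps the first row among ties, which matches the `already_added` logic.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count.

    Hypothetical helper; assumes the review count is a numeric string.
    """
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # strict '>' means a later row with an equal count does not replace the first
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    # dicts keep insertion order, so apps appear in order of first occurrence
    return list(best.values())


# Toy rows: the three Quick PDF Scanner rows from above (truncated to 4 columns)
# plus two made-up rows for a hypothetical "Example App".
rows = [
    ["Quick PDF Scanner + OCR FREE", "BUSINESS", "4.2", "80805"],
    ["Quick PDF Scanner + OCR FREE", "BUSINESS", "4.2", "80805"],
    ["Quick PDF Scanner + OCR FREE", "BUSINESS", "4.2", "80804"],
    ["Example App", "TOOLS", "4.0", "10"],
    ["Example App", "TOOLS", "4.0", "12"],
]
result = dedupe_by_max_reviews(rows)
for row in result:
    print(row)
```

Applied to `gplay[1:]`, it should keep the same 9,659 rows as `android_clean`, although not necessarily in the same order.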

Let us check for duplicates in the iOS App Store dataset, using the id column.

In [17]:
dup_apps=[]
unique_apps=[]
for row in ios[1:]:
    if row[0] in unique_apps:
        dup_apps.append(row[0])
    else:
        unique_apps.append(row[0])

print("No. of duplicates: ",len(dup_apps))

No. of duplicates:  0


In [18]:
print(ios[-1][1])
print(ios[-4][1])

みんなのお弁当 by クックパッド ~お弁当をレシピ付きで記録・共有~
【謎解き】ヤミすぎ彼女からのメッセージ


Some iOS apps have names that are not in English. Because this analysis covers only English-language apps, we will remove them.

To do this, let us first look at the numbers that the built-in ord() function returns for a few English and non-English characters. (ord() returns a character's Unicode code point, which coincides with its ASCII code for ASCII characters.)

In [19]:
print(ord("電"))
print(ord("み"))
print(ord("a"))
print(ord("A"))
print(ord("z"))

38651
12415
97
65
122


English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. 

Now, we create a function called is_eng that checks each character of a string and returns False as soon as it encounters a non-ASCII character (one whose code point is above 127).

In [20]:
def is_eng(string):
    for character in string:
        if ord(character)>127:
            return False
    return True
is_eng("Docs To Go™ Free Office Suite")

False

In [21]:
is_eng("Instagram")

True
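Since Python 3.7, strings also have a built-in `str.isascii()` method that performs the same check as `is_eng`: it returns True only if every character is in the ASCII range (and also for the empty string). The sketch below compares it with a hypothetical `is_eng_manual` function that mirrors `is_eng`.

```python
def is_eng_manual(string):
    # mirrors is_eng: the string is "English" only if no code point exceeds 127
    return all(ord(character) <= 127 for character in string)


samples = ["Instagram", "Docs To Go™ Free Office Suite", "Instachat 😜"]
for text in samples:
    # str.isascii() (Python 3.7+) performs the same range check in one call
    print(text, text.isascii(), is_eng_manual(text))
```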

If we use this function as is, we will lose useful data, because many English apps will be incorrectly labeled as non-English: their names contain symbols such as ™ or emoji, which fall outside the ASCII range. To minimize data loss, we will remove an app only if its name has more than three characters outside the ASCII range.
Now, we create a function called is_english that counts the non-ASCII characters in a string and returns False only if there are more than three of them.

In [22]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

In [23]:
is_english("Instachat 😜")

True

In [24]:
is_english("Docs To Go™ Free Office Suite")

True

In [25]:
is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠')

False

Let us apply this function to the iOS dataset and keep only the apps with English names.

In [26]:
ios_eng=[]
for row in ios[1:]:
    if is_english(row[1]):
        ios_eng.append(row)
explore_data(ios_eng,0,3,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


Let us apply the same function to the Android dataset.

In [27]:
android_eng=[]
for row in android_clean:
    if is_english(row[0]):
        android_eng.append(row)
explore_data(android_eng,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


Now ios_eng and android_eng contain only apps with English names.

Next, we isolate the free apps for both iOS and Android, because our company builds only free apps and relies on in-app ads for revenue.

In [28]:
free_ios=[]
free_android=[]

for row in android_eng:
    if row[7]=='0.0' or row[7]=='0':
        free_android.append(row)

for row in ios_eng:
    if row[4]=='0.0' or row[4]=='0':
        free_ios.append(row)
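The cell above compares the price strings to '0' and '0.0' literally, so any other spelling of zero would slip through. A slightly more robust sketch converts the price to a number first. `parse_price` and `is_free` are hypothetical helpers, and the sketch assumes that paid Google Play prices carry a leading dollar sign (for example '$4.99'); any other non-numeric value would raise a ValueError.

```python
def parse_price(price):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float.

    Hypothetical helper; assumes the only non-numeric character is a leading '$'.
    """
    return float(price.replace("$", ""))


def is_free(price):
    # compare numerically, so '0', '0.0' and '0.00' are all treated as free
    return parse_price(price) == 0.0


for price in ["0", "0.0", "$4.99"]:
    print(price, is_free(price))
```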

In [29]:
explore_data(free_android,0,3,True)
explore_data(free_ios,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16

We are now left with 8,864 rows of data for Android and 3,222 rows for the App Store.
This concludes the data cleaning; next, we proceed to the analysis.

## Analysis for Determining Profitable Apps

Our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app and add it to Google Play.
2. If the app gets a good response from users, develop it further.
3. If the app is profitable after six months, build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets.

Let's begin the analysis by determining the **most common genres** for each market. For this, we'll need to build frequency tables for a few columns in our datasets.

In [30]:
explore_data(free_ios,0,3)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']




In [31]:
explore_data(free_android,0,3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']




In [32]:
def freq_table(data,index):
    # count how many rows have each value in column `index`
    freq={}
    for row in data:
        value=row[index]
        freq[value]=freq.get(value,0)+1

    # convert the counts to percentages of all rows
    total=len(data)
    return {value: count/total*100 for value, count in freq.items()}



In [33]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
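`freq_table` can also be built on `collections.Counter` from the standard library, which does the counting for us. This is a sketch with hypothetical names (`freq_table_counter`, `display_table_counter`). Unlike `display_table`, which breaks ties by comparing the keys, it keeps tied values in first-seen order, so ties such as Productivity and Lifestyle may print in a different order.

```python
from collections import Counter


def freq_table_counter(data, index):
    """Percentage of rows for each distinct value in column `index`."""
    counts = Counter(row[index] for row in data)
    total = len(data)
    return {value: count / total * 100 for value, count in counts.items()}


def display_table_counter(data, index):
    table = freq_table_counter(data, index)
    # sort by percentage, highest first; ties keep their first-seen order
    for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', pct)


# Toy data: column 1 holds the category.
toy = [["a", "GAME"], ["b", "GAME"], ["c", "TOOLS"], ["d", "FAMILY"]]
display_table_counter(toy, 1)
```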

Frequency table of Android apps by Category

In [34]:
display_table(free_android,1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Frequency table of Android Apps by Genres.

In [35]:
display_table(free_android,-4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

### Inferences for Android Apps.

1. The most common app category is **FAMILY** (**18.9%**), while the most common genre is **Tools** (**8.45%**).
2. The next most common categories are **GAME (9.7%), TOOLS (8.5%), BUSINESS (4.6%)**, and so on.
3. The next most common genres are **Entertainment (6.1%), Education (5.3%), Business (4.6%)**, and so on.

Browsing the **FAMILY** category on Google Play shows that it consists mostly of games for kids. So, although the most common categories lean toward gaming and fun, the most common genres lean toward productivity, business, and other practical applications.

Therefore, we can infer that **Google Play offers a balanced mix of free practical and fun apps in English.**




Frequency table of iOS apps by prime genre

In [36]:
display_table(free_ios,11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


### Inferences for the IOS Apps.

1. The most common app genre is **Games**, at a whopping **58.2%**.
2. The next most common genres are **Entertainment (7.88%)** and **Photo & Video (4.97%)**, and so on.

Therefore, we can infer that the **App Store leans heavily toward free fun and entertainment apps in English.**



Although we have reached some conclusions about the most common genres in both markets, we still don't know which types of apps attract the most users.
One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each genre.

The App Store dataset has no install counts, so we'll use the total number of user ratings (the rating_count_tot column) as a proxy. Let's start by calculating the average number of user ratings per genre on the App Store.

In [69]:
unique_genre_ios=freq_table(free_ios,-5)

In [70]:
genre_avg_rating={}
for genre in unique_genre_ios:
    total=0
    len_genre=0
    for row in free_ios:
        if row[-5]==genre:
            total+=float(row[5])
            len_genre+=1
    genre_avg_rating[genre]=total/len_genre

dict(sorted(genre_avg_rating.items(), key=lambda item: item[1],reverse = True))

{'Navigation': 86090.33333333333,
 'Reference': 74942.11111111111,
 'Social Networking': 71548.34905660378,
 'Music': 57326.530303030304,
 'Weather': 52279.892857142855,
 'Book': 39758.5,
 'Food & Drink': 33333.92307692308,
 'Finance': 31467.944444444445,
 'Photo & Video': 28441.54375,
 'Travel': 28243.8,
 'Shopping': 26919.690476190477,
 'Health & Fitness': 23298.015384615384,
 'Sports': 23008.898550724636,
 'Games': 22788.6696905016,
 'News': 21248.023255813954,
 'Productivity': 21028.410714285714,
 'Utilities': 18684.456790123455,
 'Lifestyle': 16485.764705882353,
 'Entertainment': 14029.830708661417,
 'Business': 7491.117647058823,
 'Education': 7003.983050847458,
 'Catalogs': 4004.0,
 'Medical': 612.0}
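The cell above rescans `free_ios` once for every genre. The same averages can be computed in a single pass by accumulating a running total and count per group. `average_by_group` is a hypothetical helper; the sketch assumes the value column already holds plain numeric strings (the Google Play install counts would first need their '+' and ',' removed). Called as `average_by_group(free_ios, -5, 5)`, it should reproduce `genre_avg_rating`.

```python
from collections import defaultdict


def average_by_group(data, group_index, value_index):
    """Average of column `value_index` for each value of column `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in data:  # a single pass over the data
        key = row[group_index]
        totals[key] += float(row[value_index])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy rows of the form [name, genre, rating count]. The Navigation counts are the
# six values listed for that genre in this notebook; the "Other" row is made up.
toy = [
    ["Waze", "Navigation", "345046"],
    ["Google Maps", "Navigation", "154911"],
    ["Geocaching", "Navigation", "12811"],
    ["CoPilot GPS", "Navigation", "3582"],
    ["ImmobilienScout24", "Navigation", "187"],
    ["Railway Route Search", "Navigation", "5"],
    ["Example App", "Other", "100"],
]
averages = average_by_group(toy, 1, 2)
print(averages)
```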

We see that the **Navigation** genre has the highest average number of user ratings, followed by **Reference** and **Social Networking**.
Let us drill down into these genres to see whether their averages are skewed by a few very popular apps.

In [71]:
for row in free_ios:
    if row[-5]=="Navigation":
        print(row[1],row[5])

Waze - GPS Navigation, Maps & Real-time Traffic 345046
Google Maps - Navigation & Transit 154911
Geocaching® 12811
CoPilot GPS – Car Navigation & Offline Maps 3582
ImmobilienScout24: Real Estate Search in Germany 187
Railway Route Search 5


Most of the ratings come from just two apps, Waze and Google Maps. A new navigation app would face cut-throat competition from these two apps, which are far more popular than the other apps in the genre.
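One quick way to confirm the skew is to compare the mean with the median, and to recompute the mean without the two dominant apps. This sketch uses the six Navigation rating counts printed above with Python's standard `statistics` module.

```python
from statistics import mean, median

# rating_count_tot for the six free English Navigation apps listed above,
# from most to fewest ratings (Waze, Google Maps, Geocaching, ...)
navigation_counts = [345046, 154911, 12811, 3582, 187, 5]

print("mean:", mean(navigation_counts))
print("median:", median(navigation_counts))
# drop the two dominant apps (Waze and Google Maps) and average the rest
print("mean without top two:", mean(navigation_counts[2:]))
```

The median (8,196.5) and the mean without Waze and Google Maps (4,146.25) are far below the overall mean of about 86,090, which supports the point that two apps dominate this genre.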

Let us explore the **Reference** genre.

In [72]:
for row in free_ios:
    if row[-5]=="Reference":
        print(row[1],row[5])

Bible 985920
Dictionary.com Dictionary & Thesaurus 200047
Dictionary.com Dictionary & Thesaurus for iPad 54175
Google Translate 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition 17588
Merriam-Webster Dictionary 16849
Night Sky 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools 4693
GUNS MODS for Minecraft PC Edition - Mods Tools 1497
Guides for Pokémon GO - Pokemon GO News and Cheats 826
WWDC 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free 718
VPN Express 14
Real Bike Traffic Rider Virtual Reality Glasses 8
教えて!goo 0
Jishokun-Japanese English Dictionary & Translator 0


In the App Store, religious apps (such as the Bible and Muslim Pro) are very popular, as are dictionaries and apps like Google Translate. This suggests an App Store app that helps people improve their language skills: for instance, an interactive app that offers a word of the day, grammar tips, and a dictionary.
The dictionary could be made more fun with some animations.

In [73]:
for row in free_ios:
    if row[-5]=="Social Networking":
        print(row[1],row[5])

Facebook 2974676
Pinterest 1061624
Skype for iPhone 373519
Messenger 351466
Tumblr 334293
WhatsApp Messenger 287589
Kik 260965
ooVoo – Free Video Call, Text and Voice 177501
TextNow - Unlimited Text + Calls 164963
Viber Messenger – Text & Call 164249
Followers - Social Analytics For Instagram 112778
MeetMe - Chat and Meet New People 97072
We Heart It - Fashion, wallpapers, quotes, tattoos 90414
InsTrack for Instagram - Analytics Plus More 85535
Tango - Free Video Call, Voice and Chat 75412
LinkedIn 71856
Match™ - #1 Dating App. 60659
Skype for iPad 60163
POF - Best Dating App for Conversations 52642
Timehop 49510
Find My Family, Friends & iPhone - Life360 Locator 43877
Whisper - Share, Express, Meet 39819
Hangouts 36404
LINE PLAY - Your Avatar World 34677
WeChat 34584
Badoo - Meet New People, Chat, Socialize. 34428
Followers + for Instagram - Follower Analytics 28633
GroupMe 28260
Marco Polo Video Walkie Talkie 27662
Miitomo 23965
SimSimi 23530
Grindr - Gay and same sex guys chat

The Social Networking genre is likewise dominated by giants such as Facebook, Pinterest, Skype, and WhatsApp. By the same argument as for Navigation, we do not recommend an app in this genre.



Now let's calculate the average number of installs per app genre for the Google Play dataset. 

The install numbers are not precise: most values are open-ended (100+, 1,000+, 5,000+, etc.). We'll leave the numbers as they are, which means we'll consider that an app with 100,000+ installs has 100,000 installs, an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we need to convert each install number from a string to a float. This means removing the commas and the plus signs first; otherwise the conversion will fail with an error.

We can use the str.replace() method to replace these unwanted characters with the empty string ''.
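The conversion can be wrapped in a small helper. `parse_installs` is a hypothetical name; following the convention above, it treats '1,000,000+' as exactly 1,000,000 installs.

```python
def parse_installs(installs):
    """Convert an install string such as '1,000,000+' to a float.

    Hypothetical helper: the '+' and ',' characters are removed, so '1,000,000+'
    is treated as exactly 1,000,000 installs.
    """
    return float(installs.replace("+", "").replace(",", ""))


for raw in ["10,000+", "5,000,000+", "1,000,000,000+"]:
    print(raw, "->", parse_installs(raw))
```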

In [75]:
explore_data(free_android,0,3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']




Let us build a frequency table of the categories; its keys give us the unique categories in the dataset.

In [80]:
unique_category_android=freq_table(free_android,1)

In [90]:
avg_installs={}
for cg in unique_category_android:
    total=0
    len_cg=0
    for row in free_android:
        if row[1]==cg:
            installs=row[5].replace('+','')
            installs=installs.replace(',','')
            len_cg+=1
            total+=float(installs)
    avg_installs[cg]=total/len_cg

dict(sorted(avg_installs.items(), key=lambda item: item[1],reverse = True))


{'COMMUNICATION': 38456119.167247385,
 'VIDEO_PLAYERS': 24727872.452830188,
 'SOCIAL': 23253652.127118643,
 'PHOTOGRAPHY': 17840110.40229885,
 'PRODUCTIVITY': 16787331.344927534,
 'GAME': 15588015.603248259,
 'TRAVEL_AND_LOCAL': 13984077.710144928,
 'ENTERTAINMENT': 11640705.88235294,
 'TOOLS': 10801391.298666667,
 'NEWS_AND_MAGAZINES': 9549178.467741935,
 'BOOKS_AND_REFERENCE': 8767811.894736841,
 'SHOPPING': 7036877.311557789,
 'PERSONALIZATION': 5201482.6122448975,
 'WEATHER': 5074486.197183099,
 'HEALTH_AND_FITNESS': 4188821.9853479853,
 'MAPS_AND_NAVIGATION': 4056941.7741935486,
 'FAMILY': 3695641.8198090694,
 'SPORTS': 3638640.1428571427,
 'ART_AND_DESIGN': 1986335.0877192982,
 'FOOD_AND_DRINK': 1924897.7363636363,
 'EDUCATION': 1833495.145631068,
 'BUSINESS': 1712290.1474201474,
 'LIFESTYLE': 1437816.2687861272,
 'FINANCE': 1387692.475609756,
 'HOUSE_AND_HOME': 1331540.5616438356,
 'DATING': 854028.8303030303,
 'COMICS': 817657.2727272727,
 'AUTO_AND_VEHICLES': 647317.8170731707

The most popular categories by average installs are:
  1. COMMUNICATION: 38456119
  2. VIDEO_PLAYERS: 24727872
  3. SOCIAL: 23253652
  4. PHOTOGRAPHY: 17840110

Let us explore the **COMMUNICATION** category.


In [92]:
for row in free_android:
    if row[1]=="COMMUNICATION":
        print(row[0],row[5])

WhatsApp Messenger 1,000,000,000+
Messenger for SMS 10,000,000+
My Tele2 5,000,000+
imo beta free calls and text 100,000,000+
Contacts 50,000,000+
Call Free – Free Call 5,000,000+
Web Browser & Explorer 5,000,000+
Browser 4G 10,000,000+
MegaFon Dashboard 10,000,000+
ZenUI Dialer & Contacts 10,000,000+
Cricket Visual Voicemail 10,000,000+
TracFone My Account 1,000,000+
Xperia Link™ 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard 10,000,000+
Skype Lite - Free Video Call & Chat 5,000,000+
My magenta 1,000,000+
Android Messages 100,000,000+
Google Duo - High Quality Video Calls 500,000,000+
Seznam.cz 1,000,000+
Antillean Gold Telegram (original version) 100,000+
AT&T Visual Voicemail 10,000,000+
GMX Mail 10,000,000+
Omlet Chat 10,000,000+
My Vodacom SA 5,000,000+
Microsoft Edge 5,000,000+
Messenger – Text and Video Chat for Free 1,000,000,000+
imo free video calls and chat 500,000,000+
Calls & Text by Mo+ 5,000,000+
free video calls and chat 50,000,000+
Skype - free IM &

As expected, this category is dominated by giants such as Google (Duo), Facebook (Messenger), and WhatsApp, each with 500,000,000+ or even 1,000,000,000+ installs. It would be quite tough to introduce a new app into such a category.

Let us explore the **VIDEO_PLAYERS** category.

In [93]:
for row in free_android:
    if row[1]=="VIDEO_PLAYERS":
        print(row[0],row[5])

YouTube 1,000,000,000+
All Video Downloader 2018 1,000,000+
Video Downloader 10,000,000+
HD Video Player 1,000,000+
Iqiyi (for tablet) 1,000,000+
Video Player All Format 10,000,000+
Motorola Gallery 100,000,000+
Free TV series 100,000+
Video Player All Format for Android 500,000+
VLC for Android 100,000,000+
Code 10,000,000+
Vote for 50,000,000+
XX HD Video downloader-Free Video Downloader 1,000,000+
OBJECTIVE 1,000,000+
Music - Mp3 Player 10,000,000+
HD Movie Video Player 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects 1,000,000+
YouTube Studio 10,000,000+
video player for android 10,000,000+
Vigo Video 50,000,000+
Google Play Movies & TV 1,000,000,000+
HTC Service － DLNA 10,000,000+
VPlayer 1,000,000+
MiniMovie - Free Video and Slideshow Editor 50,000,000+
Samsung Video Library 50,000,000+
OnePlus Gallery 1,000,000+
LIKE – Magic Video Maker & Community 50,000,000+
HTC Service—Video Player 5,000,000+
Play 

This category is also dominated by a few giants, such as YouTube, Google Play Movies & TV, and Viva Video.

Let us explore the **SOCIAL** category.

In [94]:
for row in free_android:
    if row[1]=="SOCIAL":
        print(row[0],row[5])

Facebook 1,000,000,000+
Facebook Lite 500,000,000+
Tumblr 100,000,000+
Social network all in one 2018 100,000+
Pinterest 100,000,000+
TextNow - free text + calls 10,000,000+
Google+ 1,000,000,000+
The Messenger App 1,000,000+
Messenger Pro 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus 1,000,000+
Telegram X 5,000,000+
The Video Messenger App 100,000+
Jodel - The Hyperlocal App 1,000,000+
Hide Something - Photo, Video 5,000,000+
Love Sticker 1,000,000+
Web Browser & Fast Explorer 5,000,000+
LiveMe - Video chat, new friends, and make money 10,000,000+
VidStatus app - Status Videos & Status Downloader 5,000,000+
Love Images 1,000,000+
Web Browser ( Fast & Secure Web Explorer) 500,000+
SPARK - Live random video chat & meet new people 5,000,000+
Golden telegram 50,000+
Facebook Local 1,000,000+
Meet – Talk to Strangers Using Random Video Chat 5,000,000+
MobilePatrol Public Safety App 1,000,000+
💘 WhatsLov: Smileys of love, stickers and GIF 1,000,000+
HTC Social Plugin - 

As expected, a few giants such as Facebook and Pinterest account for most of the installs in this category, which inflates its average.

Let us explore the **PHOTOGRAPHY** category.

In [95]:
for row in free_android:
    if row[1]=="PHOTOGRAPHY":
        print(row[0],row[5])

TouchNote: Cards & Gifts 1,000,000+
FreePrints – Free Photos Delivered 1,000,000+
Groovebook Photo Books & Gifts 500,000+
Moony Lab - Print Photos, Books & Magnets ™ 50,000+
LALALAB prints your photos, photobooks and magnets 1,000,000+
Snapfish 1,000,000+
Motorola Camera 50,000,000+
HD Camera - Best Cam with filters & panorama 5,000,000+
LightX Photo Editor & Photo Effects 10,000,000+
Sweet Snap - live filter, Selfie photo edit 10,000,000+
HD Camera - Quick Snap Photo & Video 1,000,000+
B612 - Beauty & Filter Camera 100,000,000+
Waterfall Photo Frames 1,000,000+
Photo frame 100,000+
Huji Cam 5,000,000+
Unicorn Photo 1,000,000+
HD Camera 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor 1,000,000+
Moto Photo Editor 5,000,000+
InstaBeauty -Makeup Selfie Cam 50,000,000+
Garden Photo Frames - Garden Photo Editor 500,000+
Photo Frame 10,000,000+
Selfie Camera - Photo Editor & Filter & Sticker 50,000,000+
Sweet Sna

This category is also dominated by a few apps, such as S PHOTO EDITOR and BOOMERANG.

All of these categories face intense competition. Let us look at the BOOKS_AND_REFERENCE category instead: its average of about 8,767,811 installs is impressive.

In [96]:
for row in free_android:
    if row[1]=="BOOKS_AND_REFERENCE":
        print(row[0],row[5])

E-Book Read - Read Book for free 50,000+
Download free book with green book 100,000+
Wikipedia 10,000,000+
Cool Reader 10,000,000+
Free Panda Radio Music 100,000+
Book store 1,000,000+
FBReader: Favorite Book Reader 10,000,000+
English Grammar Complete Handbook 500,000+
Free Books - Spirit Fanfiction and Stories 1,000,000+
Google Play Books 1,000,000,000+
AlReader -any text book reader 5,000,000+
Offline English Dictionary 100,000+
Offline: English to Tagalog Dictionary 500,000+
FamilySearch Tree 1,000,000+
Cloud of Books 1,000,000+
Recipes of Prophetic Medicine for free 500,000+
ReadEra – free ebook reader 1,000,000+
Anonymous caller detection 10,000+
Ebook Reader 5,000,000+
Litnet - E-books 100,000+
Read books online 5,000,000+
English to Urdu Dictionary 500,000+
eBoox: book reader fb2 epub zip 1,000,000+
English Persian Dictionary 500,000+
Flybook 500,000+
All Maths Formulas 1,000,000+
Ancestry 5,000,000+
HTC Help 10,000,000+
English translation from Bengali 100,000+
Pdf Book Down

Besides expected giants such as Wattpad and Wikipedia, this category includes many apps valued for their practical use: e-book readers and dictionaries, such as English Grammar Complete Handbook and several offline English dictionaries, are popular.

So, for Google Play too, we can recommend the fun, interactive dictionary app that we proposed for the App Store.

## Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that an app based on grammar, dictionaries, or translation could be profitable in both markets. To reiterate, the app should be free and in English. We can add animations, a word of the day, and quizzes to help users engage more.