# Profitable App Profiles for the App Store and Google Play Markets
Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

*The data sets used here cover 2017-2018 and come from Kaggle.com.*

In [3]:
from csv import reader
import math
import numpy as np
import pandas as pd

In [11]:

# Load the Google Play and App Store data sets
android = pd.read_csv('C:/Python/Data_analysis/googleplaystore.csv')
iso = pd.read_csv('C:/Python/Data_analysis/AppleStore.csv')

print(android.head())
print(iso.head())

                                                 App        Category  Rating  \
0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
1                                Coloring book moana  ART_AND_DESIGN     3.9   
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   

  Reviews  Size     Installs  Type Price Content Rating  \
0     159   19M      10,000+  Free     0       Everyone   
1     967   14M     500,000+  Free     0       Everyone   
2   87510  8.7M   5,000,000+  Free     0       Everyone   
3  215644   25M  50,000,000+  Free     0           Teen   
4     967  2.8M     100,000+  Free     0       Everyone   

                      Genres      Last Updated         Current Ver  \
0               Art & Design   January 7, 2018               1.0.0   
1  Art & Design;Pretend 

# Data Cleaning
We only build apps that are free to download and install, and we design them for an English-speaking audience. This means that we'll need to remove non-English apps and non-free apps.

**First step**, check whether any data is missing or wrong

In [22]:
android.info() # Rating has far fewer non-null values than the other columns (9367 out of 10841 before any rows are dropped)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10840 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10840 non-null  object 
 1   Category        10840 non-null  object 
 2   Rating          9366 non-null   float64
 3   Reviews         10840 non-null  object 
 4   Size            10840 non-null  object 
 5   Installs        10840 non-null  object 
 6   Type            10839 non-null  object 
 7   Price           10840 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10840 non-null  object 
 10  Last Updated    10840 non-null  object 
 11  Current Ver     10832 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.4+ MB


In [8]:
android.describe() # some ratings are incorrect (max = 19, but the maximum should be 5.0)

Unnamed: 0,Rating
count,9367.0
mean,4.193338
std,0.537431
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,19.0


In [10]:
android[android['Rating']>5] # only one row: its Category value is missing, so the other fields are shifted one column to the left

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,


In [16]:
android.drop(index=10472, inplace=True)
android.describe()

Unnamed: 0,Rating
count,9366.0
mean,4.191757
std,0.515219
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,5.0
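
The range check used above can also be written as a reusable mask. The sketch below runs on a small made-up frame (the app names and values are illustrative, not taken from the data set) and flags ratings outside the 1 to 5 range while leaving missing ratings alone:

```python
import pandas as pd

# Toy data standing in for the Google Play frame (values are made up).
toy = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Rating': [4.1, 19.0, None, 5.0],
})

# between() is inclusive on both ends; NaN evaluates to False there,
# so missing ratings are allowed explicitly with isna().
valid = toy['Rating'].between(1, 5) | toy['Rating'].isna()

bad_rows = toy[~valid]    # rows to inspect or drop
toy_clean = toy[valid]    # rows that pass the check
print(bad_rows)
```

Inspecting `bad_rows` before dropping anything helps confirm that the problem is a malformed row rather than a genuine value.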


In [20]:
iso.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7197 entries, 0 to 7196
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                7197 non-null   int64  
 1   track_name        7197 non-null   object 
 2   size_bytes        7197 non-null   int64  
 3   currency          7197 non-null   object 
 4   price             7197 non-null   float64
 5   rating_count_tot  7197 non-null   int64  
 6   rating_count_ver  7197 non-null   int64  
 7   user_rating       7197 non-null   float64
 8   user_rating_ver   7197 non-null   float64
 9   ver               7197 non-null   object 
 10  cont_rating       7197 non-null   object 
 11  prime_genre       7197 non-null   object 
 12  sup_devices.num   7197 non-null   int64  
 13  ipadSc_urls.num   7197 non-null   int64  
 14  lang.num          7197 non-null   int64  
 15  vpp_lic           7197 non-null   int64  
dtypes: float64(3), int64(8), object(5)
memory 

In [21]:
iso.describe()

Unnamed: 0,id,size_bytes,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
count,7197.0,7197.0,7197.0,7197.0,7197.0,7197.0,7197.0,7197.0,7197.0,7197.0,7197.0
mean,863131000.0,199134500.0,1.726218,12892.91,460.373906,3.526956,3.253578,37.361817,3.7071,5.434903,0.993053
std,271236800.0,359206900.0,5.833006,75739.41,3920.455183,1.517948,1.809363,3.737715,1.986005,7.919593,0.083066
min,281656500.0,589824.0,0.0,0.0,0.0,0.0,0.0,9.0,0.0,0.0,0.0
25%,600093700.0,46922750.0,0.0,28.0,1.0,3.5,2.5,37.0,3.0,1.0,1.0
50%,978148200.0,97153020.0,0.0,300.0,23.0,4.0,4.0,37.0,5.0,1.0,1.0
75%,1082310000.0,181924900.0,1.99,2793.0,140.0,4.5,4.5,38.0,5.0,8.0,1.0
max,1188376000.0,4025970000.0,299.99,2974676.0,177050.0,5.0,5.0,47.0,5.0,75.0,1.0


**Second step**, check and remove duplicates

In [35]:
android.drop_duplicates(subset=['App'], keep='first', inplace=True)

In [36]:
android.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9659 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             9659 non-null   object 
 1   Category        9659 non-null   object 
 2   Rating          8196 non-null   float64
 3   Reviews         9659 non-null   object 
 4   Size            9659 non-null   object 
 5   Installs        9659 non-null   object 
 6   Type            9658 non-null   object 
 7   Price           9659 non-null   object 
 8   Content Rating  9659 non-null   object 
 9   Genres          9659 non-null   object 
 10  Last Updated    9659 non-null   object 
 11  Current Ver     9651 non-null   object 
 12  Android Ver     9657 non-null   object 
dtypes: float64(1), object(12)
memory usage: 1.0+ MB


In [38]:
iso[iso.duplicated()] # no duplicated rows

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
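
The Google Play cell in this step uses `keep='first'`, which retains whichever duplicate happens to come first in the file. An alternative, which is not what the notebook does, is to keep the entry with the most reviews, on the assumption that a higher review count marks the most recent snapshot of an app. A minimal sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'App': ['Chat', 'Chat', 'Maps', 'Chat'],
    'Reviews': ['100', '250', '40', '180'],  # text, like the object-typed Reviews column
})

# Convert to numbers so that sorting is numeric rather than lexical.
toy['Reviews'] = pd.to_numeric(toy['Reviews'])

deduped = (
    toy.sort_values('Reviews', ascending=False)    # highest review count first
       .drop_duplicates(subset='App', keep='first')
       .sort_index()                               # restore the original row order
)
print(deduped)
```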


**Third step**, remove non-English apps

One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).

All these characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters. We will use the built-in function **ord()**, which returns the code point number of a character.

Some English app names contain emoji or symbols such as ™ that fall outside the ASCII range. To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:

In [39]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

In [43]:
android_english = []
ios_english = []

for app in android['App']:
    if is_english(app):
        android_english.append(app)
        
for app in iso['track_name']:
    if is_english(app):
        ios_english.append(app)

In [47]:
print(len(android_english))
print(len(ios_english))

9614
6183


In [53]:
android_eng=android[android['App'].isin(android_english)]
android_eng.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9614 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             9614 non-null   object 
 1   Category        9614 non-null   object 
 2   Rating          8166 non-null   float64
 3   Reviews         9614 non-null   object 
 4   Size            9614 non-null   object 
 5   Installs        9614 non-null   object 
 6   Type            9613 non-null   object 
 7   Price           9614 non-null   object 
 8   Content Rating  9614 non-null   object 
 9   Genres          9614 non-null   object 
 10  Last Updated    9614 non-null   object 
 11  Current Ver     9606 non-null   object 
 12  Android Ver     9612 non-null   object 
dtypes: float64(1), object(12)
memory usage: 1.0+ MB


In [54]:
ios_eng=iso[iso['track_name'].isin(ios_english)]
ios_eng.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6183 entries, 0 to 7195
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                6183 non-null   int64  
 1   track_name        6183 non-null   object 
 2   size_bytes        6183 non-null   int64  
 3   currency          6183 non-null   object 
 4   price             6183 non-null   float64
 5   rating_count_tot  6183 non-null   int64  
 6   rating_count_ver  6183 non-null   int64  
 7   user_rating       6183 non-null   float64
 8   user_rating_ver   6183 non-null   float64
 9   ver               6183 non-null   object 
 10  cont_rating       6183 non-null   object 
 11  prime_genre       6183 non-null   object 
 12  sup_devices.num   6183 non-null   int64  
 13  ipadSc_urls.num   6183 non-null   int64  
 14  lang.num          6183 non-null   int64  
 15  vpp_lic           6183 non-null   int64  
dtypes: float64(3), int64(8), object(5)
memory 

**Fourth step**, remove non-free apps

In [56]:
android_final = android_eng[android_eng['Price']=='0']
android_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8862 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             8862 non-null   object 
 1   Category        8862 non-null   object 
 2   Rating          7564 non-null   float64
 3   Reviews         8862 non-null   object 
 4   Size            8862 non-null   object 
 5   Installs        8862 non-null   object 
 6   Type            8861 non-null   object 
 7   Price           8862 non-null   object 
 8   Content Rating  8862 non-null   object 
 9   Genres          8862 non-null   object 
 10  Last Updated    8862 non-null   object 
 11  Current Ver     8856 non-null   object 
 12  Android Ver     8861 non-null   object 
dtypes: float64(1), object(12)
memory usage: 969.3+ KB
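
Comparing `Price` with the string `'0'` relies on free apps being stored exactly that way, since the column has the `object` dtype. A more defensive filter converts the column to numbers first. The sketch below assumes that paid prices are written with a leading dollar sign (for example `'$4.99'`); the outputs above do not show a paid row, so treat that format as an assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    'App': ['Free app', 'Paid app', 'Another free app'],
    'Price': ['0', '$4.99', '0'],   # assumed text format for prices
})

# Strip the currency sign (literal match, not a regex) and convert to float.
price_num = pd.to_numeric(toy['Price'].str.replace('$', '', regex=False))

toy_free = toy[price_num == 0]
print(toy_free)
```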


In [57]:
ios_final = ios_eng[ios_eng['price']==0.0]
ios_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3222 entries, 0 to 7195
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                3222 non-null   int64  
 1   track_name        3222 non-null   object 
 2   size_bytes        3222 non-null   int64  
 3   currency          3222 non-null   object 
 4   price             3222 non-null   float64
 5   rating_count_tot  3222 non-null   int64  
 6   rating_count_ver  3222 non-null   int64  
 7   user_rating       3222 non-null   float64
 8   user_rating_ver   3222 non-null   float64
 9   ver               3222 non-null   object 
 10  cont_rating       3222 non-null   object 
 11  prime_genre       3222 non-null   object 
 12  sup_devices.num   3222 non-null   int64  
 13  ipadSc_urls.num   3222 non-null   int64  
 14  lang.num          3222 non-null   int64  
 15  vpp_lic           3222 non-null   int64  
dtypes: float64(3), int64(8), object(5)
memory 

# Main analysis

In [63]:
android_final['Category'].value_counts()

FAMILY                 1635
GAME                    875
TOOLS                   748
BUSINESS                407
LIFESTYLE               346
PRODUCTIVITY            345
FINANCE                 328
MEDICAL                 312
SPORTS                  301
PERSONALIZATION         294
COMMUNICATION           287
HEALTH_AND_FITNESS      273
PHOTOGRAPHY             261
NEWS_AND_MAGAZINES      248
SOCIAL                  236
TRAVEL_AND_LOCAL        207
SHOPPING                199
BOOKS_AND_REFERENCE     190
DATING                  165
VIDEO_PLAYERS           158
MAPS_AND_NAVIGATION     124
EDUCATION               114
FOOD_AND_DRINK          110
ENTERTAINMENT           100
LIBRARIES_AND_DEMO       83
AUTO_AND_VEHICLES        82
HOUSE_AND_HOME           74
WEATHER                  71
EVENTS                   63
ART_AND_DESIGN           60
PARENTING                58
COMICS                   55
BEAUTY                   53
Name: Category, dtype: int64

In [66]:
android_final['Genres'].value_counts() # Genres is not used going forward: it is fragmented into 114 values and its connection with Category is unclear. Only Category is kept for the rest of the analysis.

Tools                          747
Entertainment                  538
Education                      474
Business                       407
Lifestyle                      345
                              ... 
Strategy;Action & Adventure      1
Art & Design;Pretend Play        1
Arcade;Pretend Play              1
Entertainment;Education          1
Strategy;Creativity              1
Name: Genres, Length: 114, dtype: int64

In [105]:
android_final['installs']=android_final['Installs'].str.replace(',','')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  android_final['installs']=android_final['Installs'].str.replace(',','')


In [109]:
android_final['installs']=android_final['installs'].str.replace('+','').astype(float)

  android_final['installs']=android_final['installs'].str.replace('+','').astype(float)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  android_final['installs']=android_final['installs'].str.replace('+','').astype(float)


In [114]:
android_final.drop(columns='Installs', inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  android_final.drop(columns='Installs', inplace=True)
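
The `SettingWithCopyWarning` messages above appear because `android_final` was produced by filtering another frame, so pandas cannot tell whether the assignment is meant to reach the original data. One common way to avoid the warning, sketched here on made-up rows, is to take an explicit `.copy()` after filtering and then do the conversion in a single chain. Passing `regex=False` makes `','` and `'+'` literal characters.

```python
import pandas as pd

source = pd.DataFrame({
    'App': ['A', 'B', 'C'],
    'Price': ['0', '0', '$1.99'],
    'Installs': ['10,000+', '5,000,000+', '100+'],
})

# .copy() yields an independent frame, so later assignments are unambiguous.
free = source[source['Price'] == '0'].copy()

free['installs'] = (
    free['Installs']
    .str.replace(',', '', regex=False)   # drop thousands separators
    .str.replace('+', '', regex=False)   # drop the trailing plus sign
    .astype(float)
)
free = free.drop(columns='Installs')     # returns a new frame instead of inplace=True
print(free)
```

Note that values such as `10,000+` are lower bounds of install ranges, so the resulting numbers are approximations rather than exact counts.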


Now we can calculate the average number of installs per category.

In [118]:
android_final.groupby(by='Category').mean(numeric_only=True).sort_values('installs', ascending=False) # numeric_only=True skips the text columns

Category,Rating,installs
COMMUNICATION,4.126923,38456120.0
VIDEO_PLAYERS,4.043056,24852730.0
SOCIAL,4.252736,23253650.0
ENTERTAINMENT,4.126,21134600.0
PHOTOGRAPHY,4.166129,17805630.0
PRODUCTIVITY,4.181915,16787330.0
GAME,4.235252,15837570.0
TRAVEL_AND_LOCAL,4.068156,13984080.0
TOOLS,4.027023,10695250.0
NEWS_AND_MAGAZINES,4.104545,9549178.0


In [126]:
android_final[android_final['Category']=='COMMUNICATION'] # this niche is mostly occupied by big companies

Unnamed: 0,App,Category,Rating,Reviews,Size,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,installs
335,Messenger – Text and Video Chat for Free,COMMUNICATION,4.0,56642847,Varies with device,Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,1.000000e+09
336,WhatsApp Messenger,COMMUNICATION,4.4,69119316,Varies with device,Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,1.000000e+09
337,Messenger for SMS,COMMUNICATION,4.3,125257,17M,Free,0,Teen,Communication,"June 6, 2018",1.8.9,4.1 and up,1.000000e+07
338,Google Chrome: Fast & Secure,COMMUNICATION,4.3,9642995,Varies with device,Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,1.000000e+09
339,Messenger Lite: Free Calls & Messages,COMMUNICATION,4.4,1429035,Varies with device,Free,0,Everyone,Communication,"July 25, 2018",37.0.0.7.163,2.3 and up,1.000000e+08
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10709,FO AIRBUS Nantes,COMMUNICATION,,0,17M,Free,0,Everyone,Communication,"December 14, 2017",1.0,4.1 and up,1.000000e+02
10734,FP Connect,COMMUNICATION,,0,22M,Free,0,Teen,Communication,"December 15, 2017",3.15.1,4.1 and up,1.000000e+02
10739,FreedomPop Messaging Phone/SIM,COMMUNICATION,3.6,9894,39M,Free,0,Everyone,Communication,"July 26, 2018",23.01.1265.0712,4.1 and up,5.000000e+05
10748,FP Live,COMMUNICATION,,0,3.3M,Free,0,Teen,Communication,"November 3, 2017",1.2.4,4.2 and up,1.000000e+01


In [127]:
android_final[android_final['Category']=='VIDEO_PLAYERS'] # this niche is mostly occupied by YouTube

Unnamed: 0,App,Category,Rating,Reviews,Size,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,installs
3665,YouTube,VIDEO_PLAYERS,4.3,25655305,Varies with device,Free,0,Teen,Video Players & Editors,"August 2, 2018",Varies with device,Varies with device,1.000000e+09
3666,All Video Downloader 2018,VIDEO_PLAYERS,4.3,7557,5.6M,Free,0,Everyone,Video Players & Editors,"July 25, 2018",1.0.1,4.4 and up,1.000000e+06
3667,Video Downloader,VIDEO_PLAYERS,4.2,59089,5.4M,Free,0,Everyone,Video Players & Editors,"August 3, 2018",1.0.8,4.4 and up,1.000000e+07
3668,HD Video Player,VIDEO_PLAYERS,4.3,1551,2.9M,Free,0,Everyone,Video Players & Editors,"July 25, 2018",1.1,4.1 and up,1.000000e+06
3669,Iqiyi (for tablet),VIDEO_PLAYERS,3.6,12764,25M,Free,0,Teen,Video Players & Editors,"July 11, 2018",7.1,4.0 and up,1.000000e+06
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10226,Video Downloader for FB : Video Download with ...,VIDEO_PLAYERS,4.5,1060,3.2M,Free,0,Everyone,Video Players & Editors,"December 20, 2017",2.1,4.0 and up,1.000000e+05
10237,HD VideoDownlaoder For Fb : XXVideo Downloader,VIDEO_PLAYERS,4.2,61,6.1M,Free,0,Everyone,Video Players & Editors,"July 30, 2018",1.0.3,4.0.3 and up,1.000000e+04
10318,HD Video Download for Facebook,VIDEO_PLAYERS,4.4,20755,17M,Free,0,Everyone,Video Players & Editors,"March 16, 2018",4.0.3,4.1 and up,1.000000e+06
10519,Art of F J Taylor,VIDEO_PLAYERS,,2,1.5M,Free,0,Everyone,Video Players & Editors,"June 23, 2015",1.0.2,4.3 and up,1.000000e+01
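
A category average can be pulled up by a handful of apps with a billion installs, which is what the two listings above suggest. One way to quantify this, shown here on made-up numbers, is to compare the mean with the median and to compute the share of installs held by the single largest app in each category.

```python
import pandas as pd

toy = pd.DataFrame({
    'Category': ['COMMUNICATION'] * 4 + ['WEATHER'] * 4,
    'installs': [1e9, 1e7, 1e5, 1e2, 5e6, 1e6, 1e6, 5e5],   # illustrative values
})

grouped = toy.groupby('Category')['installs']
summary = pd.DataFrame({
    'mean': grouped.mean(),
    'median': grouped.median(),
    # Fraction of all installs in the category held by its largest app.
    'top_app_share': grouped.max() / grouped.sum(),
})
print(summary.sort_values('mean', ascending=False))
```

A large gap between mean and median, or a `top_app_share` close to 1, indicates that the average says little about what a typical app in the category can expect.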


Since we have also analyzed the iOS data (see below) and the main question is which app profile would be profitable on both markets, the candidates worth checking are weather, food, finance, and book apps.

In [61]:
ios_final['prime_genre'].value_counts()

Games                1874
Entertainment         254
Photo & Video         160
Education             118
Social Networking     106
Shopping               84
Utilities              81
Sports                 69
Music                  66
Health & Fitness       65
Productivity           56
Lifestyle              51
News                   43
Travel                 40
Finance                36
Weather                28
Food & Drink           26
Reference              18
Business               17
Book                   14
Navigation              6
Medical                 6
Catalogs                4
Name: prime_genre, dtype: int64

In [119]:
ios_final.groupby(by='prime_genre').mean(numeric_only=True).sort_values('rating_count_tot', ascending=False) # numeric_only=True skips the text columns

prime_genre,id,size_bytes,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
Navigation,425013800.0,96481790.0,0.0,86090.333333,749.5,3.833333,2.25,37.166667,3.333333,20.166667,1.0
Reference,771769700.0,120077200.0,0.0,74942.111111,2746.555556,3.666667,3.861111,36.555556,3.5,10.277778,1.0
Social Networking,733306200.0,93665390.0,0.0,71548.349057,867.858491,3.59434,2.985849,36.216981,1.95283,12.566038,0.990566
Music,726257400.0,86460170.0,0.0,57326.530303,628.075758,3.94697,3.931818,36.5,3.742424,8.545455,1.0
Weather,632221600.0,89602890.0,0.0,52279.892857,1798.5,3.482143,3.017857,36.821429,3.678571,11.321429,1.0
Book,824671200.0,118487000.0,0.0,39758.5,485.428571,3.071429,3.142857,37.285714,3.071429,4.642857,1.0
Food & Drink,704776700.0,73952260.0,0.0,33333.923077,755.807692,3.634615,3.25,36.615385,1.230769,4.346154,1.0
Finance,584580700.0,94659440.0,0.0,31467.944444,697.222222,3.375,2.847222,35.472222,2.472222,1.805556,1.0
Photo & Video,820967000.0,84955280.0,0.0,28441.54375,435.7375,3.903125,3.384375,36.475,2.61875,11.5875,1.0
Travel,592205000.0,97872030.0,0.0,28243.8,244.075,3.4875,2.7375,37.325,2.375,10.2,0.975


In [122]:
ios_final[ios_final['prime_genre']=='Navigation'] # the output below shows that this category is mostly occupied by big companies (Waze, Google Maps)

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
49,323229106,"Waze - GPS Navigation, Maps & Real-time Traffic",94139392,USD,0.0,345046,3040,4.5,4.5,4.24,4+,Navigation,37,5,36,1
130,585027354,Google Maps - Navigation & Transit,120232960,USD,0.0,154911,1253,4.5,4.0,4.31.1,12+,Navigation,37,5,34,1
881,329541503,Geocaching®,108166144,USD,0.0,12811,134,3.5,1.5,5.3,4+,Navigation,37,0,22,1
1633,504677517,CoPilot GPS – Car Navigation & Offline Maps,82534400,USD,0.0,3582,70,4.0,3.5,10.0.0.984,4+,Navigation,38,5,25,1
3987,344176018,ImmobilienScout24: Real Estate Search in Germany,126867456,USD,0.0,187,0,3.5,0.0,9.5,4+,Navigation,37,5,3,1
6033,463431091,Railway Route Search,46950400,USD,0.0,5,0,3.0,0.0,3.17.1,4+,Navigation,37,0,1,1


In [124]:
ios_final[ios_final['prime_genre']=='Reference'] # mostly Bible and dictionary apps

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
6,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1
90,308750436,Dictionary.com Dictionary & Thesaurus,111275008,USD,0.0,200047,177,4.0,4.0,7.1.3,4+,Reference,37,0,1,1
335,364740856,Dictionary.com Dictionary & Thesaurus for iPad,165748736,USD,0.0,54175,10176,4.5,4.5,4.0,4+,Reference,24,5,9,1
551,414706506,Google Translate,65281024,USD,0.0,26786,27,3.5,4.5,5.10.0,4+,Reference,37,5,59,1
715,388389451,"Muslim Pro: Ramadan 2017 Prayer Times, Azan, Q...",100551680,USD,0.0,18418,706,4.5,5.0,9.2.1,4+,Reference,37,5,16,1
738,1130829481,New Furniture Mods - Pocket Wiki & Game Tools ...,52959232,USD,0.0,17588,17588,4.5,4.5,1.0,4+,Reference,38,3,2,1
757,399452287,Merriam-Webster Dictionary,155593728,USD,0.0,16849,1125,4.5,4.5,4.1,4+,Reference,38,1,12,1
913,475772902,Night Sky,596499456,USD,0.0,12122,60,4.5,4.5,4.4.1,4+,Reference,37,5,29,1
1106,1135575003,City Maps for Minecraft PE - The Best Maps for...,90124288,USD,0.0,8535,8535,4.0,4.0,1.0,4+,Reference,37,4,1,1
1451,1132715891,LUCKY BLOCK MOD ™ for Minecraft PC Edition - T...,86874112,USD,0.0,4693,4693,4.0,4.0,1.0,12+,Reference,37,4,1,1


In [125]:
ios_final[ios_final['prime_genre']=='Social Networking'] # mostly big companies

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
5,429047995,Pinterest,74778624,USD,0.0,1061624,1814,4.5,4.0,6.26,12+,Social Networking,37,5,27,1
43,304878510,Skype for iPhone,133238784,USD,0.0,373519,127,3.5,4.0,6.35.1,4+,Social Networking,37,0,32,1
48,454638411,Messenger,275729408,USD,0.0,351466,892,3.0,3.0,119.0,4+,Social Networking,37,1,33,1
51,305343404,Tumblr,151573504,USD,0.0,334293,919,4.0,4.0,8.6,17+,Social Networking,37,5,16,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6380,1011114397,BestieBox,48274432,USD,0.0,0,0,0.0,0.0,3.2.1,4+,Social Networking,37,0,5,1
6848,1056011241,MATCH ON LINE chat,9462784,USD,0.0,0,0,0.0,0.0,1.3,17+,Social Networking,37,0,1,1
6878,1148502570,niconico ch,55334912,USD,0.0,0,0,0.0,0.0,1.0.9,17+,Social Networking,37,0,2,1
7090,1171367187,LINE BLOG,101159936,USD,0.0,0,0,0.0,0.0,1.5.0,12+,Social Networking,37,0,1,1


Conclusion for iOS: the candidate genres are the following five: Music, Weather, Book, Finance, and Food & Drink.

# Conclusion
We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.