# Project: Profitable App Profiles for the App Store and Google Play Markets

## <font color=green>Aim: Find mobile app profiles that are profitable for the App Store and Google Play markets. As a data analyst, my goal is to analyze data to help developers understand which types of apps are likely to attract more users.</font>

At this company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## Read files

In [1]:
import pandas as pd

apple_data = pd.read_csv('AppleStore.csv')
android_data = pd.read_csv('googleplaystore.csv')

In [2]:
apple_data.head()

Unnamed: 0.1,Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,1,281656475,PAC-MAN Premium,100788224,USD,3.99,21292,26,4.0,4.5,6.3.5,4+,Games,38,5,10,1
1,2,281796108,Evernote - stay organized,158578688,USD,0.0,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1
2,3,281940292,"WeatherBug - Local Weather, Radar, Maps, Alerts",100524032,USD,0.0,188583,2822,3.5,4.5,5.0.0,4+,Weather,37,5,3,1
3,4,282614216,"eBay: Best App to Buy, Sell, Save! Online Shop...",128512000,USD,0.0,262241,649,4.0,4.5,5.10.0,12+,Shopping,37,5,9,1
4,5,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1


In [3]:
android_data.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


#### The Google Play data set has a dedicated discussion section, and we can see that one of the discussions outlines an error for row 10472. Let's print this row and compare it against the header and another row that is correct.

In [4]:
print(android_data.iloc[10472])

App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                              3.0M
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object
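
A shifted row like this one can also be caught without knowing its index in advance. The sketch below is only an illustration on toy data: the `toy` frame is hypothetical, and the check assumes that valid ratings lie between 0 and 5, so a value such as 19 signals that the columns have shifted.

```python
import pandas as pd

# Hypothetical toy data: one correct row and one shifted row
toy = pd.DataFrame({
    'App': ['Xposed Wi-Fi-Pwd', 'Life Made WI-Fi Touchscreen Photo Frame'],
    'Category': ['PERSONALIZATION', '1.9'],
    'Rating': [3.5, 19.0],
})

# Coerce to numbers so that any non-numeric entry becomes NaN instead of raising
ratings = pd.to_numeric(toy['Rating'], errors='coerce')

# Assumption: a valid rating is between 0 and 5
suspicious = toy[(ratings < 0) | (ratings > 5)]
print(suspicious.index.tolist())
```

On the real data, the same mask could be applied to `android_data['Rating']` to list candidate rows before dropping them.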


#### Row 10471 is a correct row, as shown below:

In [5]:
print(android_data.iloc[10471])

App               Xposed Wi-Fi-Pwd
Category           PERSONALIZATION
Rating                         3.5
Reviews                       1042
Size                          404k
Installs                  100,000+
Type                          Free
Price                            0
Content Rating            Everyone
Genres             Personalization
Last Updated        August 5, 2014
Current Ver                  3.0.0
Android Ver           4.0.3 and up
Name: 10471, dtype: object


#### Now, let's find the length of the dataset "android_data", delete the inconsistent row (10472), and then check the length again to make sure the row has been deleted.

In [6]:
print(len(android_data))
android_data.drop([10472], axis=0, inplace=True)
len(android_data)

10841


10840

#### Let's double-check that row 10472 has been deleted

In [7]:
android_data[10470:10474]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10470,Jazz Wi-Fi,COMMUNICATION,3.4,49,4.0M,"10,000+",Free,0,Everyone,Communication,"February 10, 2017",0.1,2.3 and up
10471,Xposed Wi-Fi-Pwd,PERSONALIZATION,3.5,1042,404k,"100,000+",Free,0,Everyone,Personalization,"August 5, 2014",3.0.0,4.0.3 and up
10473,osmino Wi-Fi: free WiFi,TOOLS,4.2,134203,4.1M,"10,000,000+",Free,0,Everyone,Tools,"August 7, 2018",6.06.14,4.4 and up
10474,Sat-Fi Voice,COMMUNICATION,3.4,37,14M,"1,000+",Free,0,Everyone,Communication,"November 21, 2014",2.2.1.5,2.2 and up


In [8]:
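# First pass: count how many rows repeat an app name that was already seen
u = []  # app names seen so far
d = []  # app names seen more than once

for index, row in android_data.iterrows():
    app = row['App']
    if app in u:
        d.append(app)
    else:
        u.append(app)
print(len(d))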

1181


In [9]:
rslt_df = android_data[android_data['App'] == "Instagram"] 
rslt_df[0:]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
2545,Instagram,SOCIAL,4.5,66577313,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
2604,Instagram,SOCIAL,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
2611,Instagram,SOCIAL,4.5,66577313,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
3909,Instagram,SOCIAL,4.5,66509917,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device


In [10]:
unique_apps = []
duplicate_apps = []

for idx, row in android_data.iterrows():
    name = row['App']
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print(len(duplicate_apps))
print(len(unique_apps))

1181
9659


In [11]:
reviews_max = {}
for idx, row in android_data.iterrows():
    name = row['App']
    n_reviews = float(row['Reviews'])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
len(reviews_max)

9659

In [12]:
android_clean = []  # list of cleaned rows
already_added = []  # names of the apps already added to android_clean

for idx, row in android_data.iterrows():
    name = row['App']
    n_reviews = float(row['Reviews'])

    # Keep only the first row that carries the app's highest review count
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(row)
        already_added.append(name)  # must stay inside the if block

len(android_clean)

9659
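
The loop-based deduplication above could also be expressed with pandas built-ins. This is a minimal sketch on hypothetical toy data (the `toy` frame is not part of the notebook), assuming the `Reviews` strings convert cleanly to numbers; how it breaks ties between equal review counts may differ from the loop.

```python
import pandas as pd

# Hypothetical toy data standing in for android_data
toy = pd.DataFrame({
    'App': ['Instagram', 'Instagram', 'Sketch', 'Instagram'],
    'Reviews': ['66577313', '66577446', '215644', '66509917'],
})

# Convert review counts so they compare numerically, not as text
toy['Reviews'] = pd.to_numeric(toy['Reviews'])

# Put the highest review count first, keep one row per app,
# then restore the original row order
deduped = (
    toy.sort_values('Reviews', ascending=False, kind='stable')
       .drop_duplicates(subset='App', keep='first')
       .sort_index()
)
print(deduped)
```

The result is already a DataFrame, so the later conversion from a list of rows would not be needed with this approach.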

In [13]:
android_clean[0:4]

[App               Photo Editor & Candy Camera & Grid & ScrapBook
 Category                                          ART_AND_DESIGN
 Rating                                                       4.1
 Reviews                                                      159
 Size                                                         19M
 Installs                                                 10,000+
 Type                                                        Free
 Price                                                          0
 Content Rating                                          Everyone
 Genres                                              Art & Design
 Last Updated                                     January 7, 2018
 Current Ver                                                1.0.0
 Android Ver                                         4.0.3 and up
 Name: 0, dtype: object,
 App               U Launcher Lite – FREE Live Cool Themes, Hide ...
 Category                                       

In [14]:
clean_android_data = pd.DataFrame(android_clean)

In [15]:
clean_android_data.index

Int64Index([    0,     2,     3,     4,     5,     6,     7,     8,     9,
               10,
            ...
            10831, 10832, 10833, 10834, 10835, 10836, 10837, 10838, 10839,
            10840],
           dtype='int64', length=9659)

In [16]:
len(clean_android_data)

9659

In [17]:
clean_android_data[4400:4420]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
5500,AQ Math Facts,FAMILY,,1,16M,1+,Paid,$2.99,Everyone,Educational,"May 17, 2017",1.0.3,4.0 and up
5501,Adventure Quest World Mobile Quiz,FAMILY,4.6,28,26M,500+,Free,0,Everyone,Puzzle,"January 16, 2018",3.1.6z,4.0.3 and up
5502,Aqw&3d Design Notes Manager,FAMILY,4.5,11,2.9M,500+,Free,0,Everyone,Entertainment,"August 15, 2017",3.0,4.0.3 and up
5503,AQ Aspergers Test,MEDICAL,3.9,265,714k,"10,000+",Free,0,Everyone,Medical,"October 14, 2015",1.4.0,2.3.3 and up
5504,Guess the Class 🔥 AQW,GAME,4.7,57,12M,"1,000+",Free,0,Everyone,Word,"November 1, 2017",1.0.0,4.1 and up
5505,Puffin Web Browser,COMMUNICATION,4.3,541661,Varies with device,"10,000,000+",Free,0,Everyone,Communication,"July 9, 2018",7.5.3.20547,4.1 and up
5506,AQ Ria Retail,FAMILY,5.0,4,52M,50+,Free,0,Everyone,Education,"April 3, 2018",1.1,4.1 and up
5507,Accounting Quiz (AQ) Malaysia,FAMILY,5.0,25,Varies with device,"1,000+",Free,0,Everyone,Education,"January 29, 2018",Varies with device,4.0 and up
5508,AQ Guards,PRODUCTIVITY,,0,3.2M,10+,Free,0,Everyone,Productivity,"July 6, 2018",2.1.22,4.4 and up
5509,Wowkwis aq Ka'qaquj,FAMILY,5.0,1,49M,10+,Free,0,Everyone,Education;Education,"February 16, 2018",1.0,4.0.3 and up


In [18]:
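# Note: reset_index returns a new DataFrame; without assignment, clean_android_data keeps its old index
clean_android_data.reset_index(drop=True)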

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
2,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
3,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
4,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6M,"50,000+",Free,0,Everyone,Art & Design,"March 26, 2017",1.0,2.3 and up
5,Smoke Effect Photo Maker - Smoke Editor,ART_AND_DESIGN,3.8,178,19M,"50,000+",Free,0,Everyone,Art & Design,"April 26, 2018",1.1,4.0.3 and up
6,Infinite Painter,ART_AND_DESIGN,4.1,36815,29M,"1,000,000+",Free,0,Everyone,Art & Design,"June 14, 2018",6.1.61.1,4.2 and up
7,Garden Coloring Book,ART_AND_DESIGN,4.4,13791,33M,"1,000,000+",Free,0,Everyone,Art & Design,"September 20, 2017",2.9.2,3.0 and up
8,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up
9,Text on Photo - Fonteee,ART_AND_DESIGN,4.4,13880,28M,"1,000,000+",Free,0,Everyone,Art & Design,"October 27, 2017",1.0.4,4.1 and up


In [19]:
clean_android_data[4410:4415]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
5510,AQ Coach,SPORTS,,0,28M,5+,Free,0,Everyone,Sports,"May 25, 2018",1.1.0,4.4 and up
5511,AQ Dentals,HEALTH_AND_FITNESS,,0,12M,10+,Free,0,Everyone,Health & Fitness,"December 22, 2017",1.0.1,4.1 and up
5513,中国語 AQリスニング,FAMILY,,21,17M,"5,000+",Free,0,Everyone,Education,"June 22, 2016",2.4.0,4.0 and up
5514,ClanHQ,COMMUNICATION,2.7,560,37M,"10,000+",Free,0,Everyone,Communication,"July 25, 2018",1.0.21,4.4 and up
5515,QuickShortcutMaker,PERSONALIZATION,4.6,41000,2.0M,"1,000,000+",Free,0,Everyone,Personalization,"February 23, 2014",2.4.0,1.6 and up


In [20]:
def is_english(string):
    # Treat any character outside the ASCII range (0-127) as non-English
    for ch in string:
        if ord(ch) > 127:
            return False
    return True
    
s = "Instagram"
print(is_english(s))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('لعبة تقدر تربح DZ'))
print(is_english('Maninder'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
True
False
False


In [21]:
def is_english(string):
    # Allow up to three non-ASCII characters (emoji, ™, ...) before rejecting the name
    special_char = 0
    for ch in string:
        if ord(ch) > 127:
            special_char += 1
        if special_char > 3:
            return False
    return True
    
s = "Instagram"
print(is_english(s))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('لعبة تقدر تربح DZ'))
print(is_english('Maninder'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
True
True
True


In [22]:
# TODO: apply the same English-name filter to the Apple data

android_eng = []
for idx, row in clean_android_data.iterrows():
    name = row['App']

    if is_english(name):
        android_eng.append(row)

len(android_eng)

9614
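
The same filter can be sketched as a boolean mask instead of a row-by-row loop. The example below runs on a hypothetical three-row `toy` frame and redefines `is_english` so that it is self-contained. It also assigns the result of `reset_index`, which is needed for the new index to persist.

```python
import pandas as pd

def is_english(string):
    # Allow up to three non-ASCII characters (emoji, ™, etc.)
    non_ascii = sum(1 for ch in string if ord(ch) > 127)
    return non_ascii <= 3

# Hypothetical toy data standing in for clean_android_data
toy = pd.DataFrame({
    'App': ['Instagram', 'Instachat 😜', '爱奇艺PPS -《欢乐颂2》电视剧热播'],
})

# True for rows whose name passes the filter
mask = toy['App'].apply(is_english)

# Assign the result so the fresh 0..n-1 index is kept
english_only = toy[mask].reset_index(drop=True)
print(english_only)
```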

In [23]:
android_english = pd.DataFrame(android_eng)

In [24]:
# Note: the result is only displayed here; assign it to keep the new index
android_english.reset_index(drop=True)


Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
2,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
3,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
4,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6M,"50,000+",Free,0,Everyone,Art & Design,"March 26, 2017",1.0,2.3 and up
5,Smoke Effect Photo Maker - Smoke Editor,ART_AND_DESIGN,3.8,178,19M,"50,000+",Free,0,Everyone,Art & Design,"April 26, 2018",1.1,4.0.3 and up
6,Infinite Painter,ART_AND_DESIGN,4.1,36815,29M,"1,000,000+",Free,0,Everyone,Art & Design,"June 14, 2018",6.1.61.1,4.2 and up
7,Garden Coloring Book,ART_AND_DESIGN,4.4,13791,33M,"1,000,000+",Free,0,Everyone,Art & Design,"September 20, 2017",2.9.2,3.0 and up
8,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up
9,Text on Photo - Fonteee,ART_AND_DESIGN,4.4,13880,28M,"1,000,000+",Free,0,Everyone,Art & Design,"October 27, 2017",1.0.4,4.1 and up


In [25]:
android_eng_free = []
for idx, row in android_english.iterrows():
    price = row['Price']
    
    if price == '0':
        android_eng_free.append(row)
        
len(android_eng_free)

8864

In [41]:
and_data = pd.DataFrame(android_eng_free)

In [42]:
data = and_data['Category']

Next step: create a function `freq_function` that
- takes a column as input (`dataset['column']`)
- returns the frequency table as a dictionary
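
One possible shape for this function, as a minimal sketch: it assumes the input is a pandas Series and relies on `Series.value_counts`. The `sample` frame is hypothetical toy data standing in for `and_data`.

```python
import pandas as pd

def freq_function(column):
    """Return a frequency table for a pandas Series as a plain dict."""
    # dropna=False keeps missing values visible in the table
    return column.value_counts(dropna=False).to_dict()

# Hypothetical toy data standing in for and_data['Category']
sample = pd.DataFrame({'Category': ['GAME', 'TOOLS', 'GAME', 'FAMILY', 'GAME']})
table = freq_function(sample['Category'])
print(table)
```

Returning a dictionary makes it straightforward to loop over categories later, for example to compute an average per category.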


In [43]:
pd.value_counts(data.values.ravel()) 

FAMILY                 1676
GAME                    862
TOOLS                   750
BUSINESS                407
LIFESTYLE               346
PRODUCTIVITY            345
FINANCE                 328
MEDICAL                 313
SPORTS                  301
PERSONALIZATION         294
COMMUNICATION           287
HEALTH_AND_FITNESS      273
PHOTOGRAPHY             261
NEWS_AND_MAGAZINES      248
SOCIAL                  236
TRAVEL_AND_LOCAL        207
SHOPPING                199
BOOKS_AND_REFERENCE     190
DATING                  165
VIDEO_PLAYERS           159
MAPS_AND_NAVIGATION     124
FOOD_AND_DRINK          110
EDUCATION               103
ENTERTAINMENT            85
LIBRARIES_AND_DEMO       83
AUTO_AND_VEHICLES        82
HOUSE_AND_HOME           73
WEATHER                  71
EVENTS                   63
PARENTING                58
ART_AND_DESIGN           57
COMICS                   55
BEAUTY                   53
dtype: int64

In [44]:
data = and_data['Genres']
pd.value_counts(data.values.ravel()) 

Tools                                    749
Entertainment                            538
Education                                474
Business                                 407
Productivity                             345
Lifestyle                                345
Finance                                  328
Medical                                  313
Sports                                   307
Personalization                          294
Communication                            287
Action                                   275
Health & Fitness                         273
Photography                              261
News & Magazines                         248
Social                                   236
Travel & Local                           206
Shopping                                 199
Books & Reference                        190
Simulation                               181
Dating                                   165
Arcade                                   164
Video Play

In [None]:
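# Draft, not yet run: average number of user ratings per iOS genre.
# Assumes freq_table and ios_final (a list of rows) are defined in earlier cells.
# Verify that index 5 is rating_count_tot and index -5 is prime_genre for the loaded file.
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)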