## App Installation Research Project

We only build apps that are free to download and install, so our revenue for any given app depends mostly on how many users see and engage with its ads. The more users, the better. __Our goal is to understand what types of iOS and Play apps are likely to attract more users so we can lean into these app types in the future.__

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

# Read each CSV into a list of rows; the first row is the header.
with open('AppleStore.csv', encoding='utf8') as open_file:
    ios = list(reader(open_file))
ios_header = ios[0]
ios_list = ios[1:]

with open('googleplaystore.csv', encoding='utf8') as open_file:
    play = list(reader(open_file))
play_header = play[0]
play_list = play[1:]

In [3]:
explore_data(play,1,10,rows_and_columns = True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+

In [4]:
print(len(play_list))
print(len(ios_list))

10841
7197


In [5]:
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [6]:
print(play_list[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
print(play_list[149])

['FBReader: Favorite Book Reader', 'BOOKS_AND_REFERENCE', '4.5', '203130', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Books & Reference', 'June 28, 2018', 'Varies with device', 'Varies with device']


In [8]:
print(play[10472])

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


In [9]:
del play[10472]  # `play` includes the header row, so this is the row printed in In [8], not play_list[10472]

In [10]:
print(play_list[149])

['FBReader: Favorite Book Reader', 'BOOKS_AND_REFERENCE', '4.5', '203130', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Books & Reference', 'June 28, 2018', 'Varies with device', 'Varies with device']


In [11]:
print(play_list[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
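
The row printed above has 12 fields, one fewer than the 13-column Play header, because its category value is missing. A length check is one way to find such rows without knowing their index in advance. The sketch below runs on a small made-up dataset; `find_malformed_rows` is a hypothetical helper, not part of the original notebook.

```python
def find_malformed_rows(rows, header):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Toy data for illustration only (not taken from googleplaystore.csv).
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # category missing, so the fields shift left
    ['App C', 'GAME', '3.9'],
]

bad = find_malformed_rows(sample_rows, sample_header)
print(bad)  # [(1, ['App B', '1.9'])]
```

Indices shift after each deletion, and a list that still contains the header row is offset by one from a list that does not, so it is safer to look up the row again right before deleting it or to build a new filtered list.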


## Removing Duplicates


The Play dataset contains duplicate entries. For example, Slack is listed three times and Facebook twice, as the cells below show.

We counted 1,181 duplicate entries. Rather than removing duplicates at random, we keep only the entry with the highest review count for each app, since it is likely the most recent one and has the most up-to-date review info. We use loops to identify the duplicates and then remove them.

In [12]:
for app in play_list:
    name = app[0]
    if name == 'Facebook':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


In [13]:
for app in play:
    name = app[0]
    if name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


In [14]:
unique_apps = []
duplicate_apps = []

for app in play:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Dups: ', len(duplicate_apps))
print('Uniques: ', len(unique_apps))

Dups:  1181
Uniques:  9660


In [15]:
print(duplicate_apps[:10])

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [16]:
print(len(play) - len(duplicate_apps))

9660


In [17]:
del(play[10473])

In [18]:
for app in play_list:
    name = app[0]
    n_reviews = app[3]
    if '3.0M' in n_reviews or 'Life Made WI-Fi Touchscreen Photo Frame' in name:
        print(app)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [19]:
del play_list[10472]  # drop the malformed row printed in In [18] (its category value is missing)

In [20]:
reviews_max = { }

for app in play_list:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews

In [21]:
android_clean = []
already_added = []

for app in play_list:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
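
The two cells above can also be combined into a single pass that keeps, for each name, the row with the highest review count. This is only a sketch on made-up rows, assuming the same layout as the Play data (name at index 0, review count at index 3); `dedupe_by_max_reviews` is a hypothetical name.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    # Dicts keep insertion order (Python 3.7+), so names stay in first-seen order.
    return list(best.values())

# Toy rows for illustration only.
sample = [
    ['App A', 'TOOLS', '4.4', '51507'],
    ['App B', 'TOOLS', '4.2', '120'],
    ['App A', 'TOOLS', '4.4', '51510'],
]
clean = dedupe_by_max_reviews(sample)
print(clean)
```

Looking names up in a dictionary avoids the repeated list scans that `name not in already_added` performs, which can matter on datasets with many thousands of rows.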

## Removing Non-English Apps

In [22]:
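def is_english(string):
    # Count characters outside the ASCII range (code points above 127).
    non_ascii = 0
    for char in string:
        if ord(char) > 127:
            non_ascii += 1
    # Allow up to 2 non-ASCII characters (e.g. ™ or an emoji) before rejecting.
    return non_ascii < 3
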

In [23]:
is_english('Docs To Go™ Free Office Suite')

True

In [24]:
is_english('爱奇艺PPS -《欢乐颂2》电视剧热播')

False
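
One way the check could be applied to the datasets is to filter rows on the app-name column (index 0 in the Play data and index 1 in the iOS data, per the headers). The sketch below uses two toy rows with placeholder second columns; `keep_english` is a hypothetical helper, and `is_english` is restated so the block runs on its own.

```python
def is_english(string):
    """Treat a name as English if it has fewer than 3 non-ASCII characters."""
    non_ascii = sum(1 for char in string if ord(char) > 127)
    return non_ascii < 3

def keep_english(rows, name_idx):
    """Return only the rows whose name column passes is_english."""
    return [row for row in rows if is_english(row[name_idx])]

# Toy rows: real app names from the cells above, placeholder second column.
sample_rows = [
    ['Docs To Go™ Free Office Suite', 'placeholder'],
    ['爱奇艺PPS -《欢乐颂2》电视剧热播', 'placeholder'],
]
english_rows = keep_english(sample_rows, name_idx=0)
print(len(english_rows))  # 1
```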

In [25]:
print(play_header)
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [26]:
print(len(ios_list))
print(len(play_list))

7197
10840


## Isolating to Free Apps

In [27]:
ios_is_free = []
ios_not_free = []

for row in ios_list:
    ios_price = float(row[4].replace('$',''))
    ios_name = row[1]
    if ios_price > 0:
        ios_not_free.append(ios_name)
    else:
        ios_is_free.append(ios_name)
        
print('iOS free: ', len(ios_is_free))
print('iOS not free : ', len(ios_not_free))


play_is_free = []
play_not_free = []


for row in play_list:
    play_price = float(row[7].replace('$',''))
    play_name = row[0]
    if play_price > 0:
        play_not_free.append(play_name)
    else:
        play_is_free.append(play_name)
        
print('Play free: ', len(play_is_free))
print('Play not free : ', len(play_not_free))



iOS free:  4056
iOS not free :  3141
Play free:  10040
Play not free :  800
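
The cell above collects only app names. If later steps need the other columns of the free apps, the full rows can be kept instead. A minimal sketch on toy data, with `split_by_price` as a hypothetical helper:

```python
def split_by_price(rows, price_idx):
    """Split rows into (free, paid) lists based on the price column."""
    free, paid = [], []
    for row in rows:
        price = float(row[price_idx].replace('$', ''))
        if price > 0:
            paid.append(row)
        else:
            free.append(row)
    return free, paid

# Toy rows for illustration only: [name, price].
sample_rows = [
    ['App A', '0'],
    ['App B', '$4.99'],
    ['App C', '0.0'],
]
free_rows, paid_rows = split_by_price(sample_rows, price_idx=1)
print(len(free_rows), len(paid_rows))  # 2 1
```

With the notebook's data this could be called as `split_by_price(ios_list, 4)` or `split_by_price(play_list, 7)`, matching the price columns used above.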


## Finding the Most Successful Apps in Both Markets

Next, we look for app categories that are successful on both Play and iOS. Our end goal is to publish apps in both markets, so we want to see which kinds perform best.

In [42]:
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        genre = row[index]
        if genre in table:
            table[genre] += 1
        else:
            table[genre] = 1

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
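
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is an alternative sketch on toy rows, not the notebook's implementation; `freq_table_counter` is a hypothetical name.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Percentage of rows per distinct value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}

# Toy rows for illustration only: [name, genre].
sample_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_counter(sample_rows, 1)

# Print from most to least frequent, like display_table does.
for key, pct in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
    print(key, ':', pct)
```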


In [43]:
# ios_table = freq_table(ios_list,11)
# play_table = freq_table(play_list,1)

## What are the most popular iOS app types?

The share of iOS apps in each genre, as a percentage of all apps in the dataset, is listed below:

In [45]:
display_table(ios_list,11)

Games : 53.66124774211477
Entertainment : 7.433652910935113
Education : 6.294289287203002
Photo & Video : 4.849242740030569
Utilities : 3.4458802278727245
Health & Fitness : 2.501042100875365
Productivity : 2.473252744198972
Social Networking : 2.3204112824788106
Lifestyle : 2.0008336807002918
Music : 1.9174656106711132
Shopping : 1.6951507572599693
Sports : 1.5839933305543976
Book : 1.5562039738780047
Finance : 1.445046547172433
Travel : 1.1254689453939142
News : 1.0421008753647354
Weather : 1.0004168403501459
Reference : 0.8892594136445742
Food & Drink : 0.8753647353063776
Business : 0.7919966652771988
Navigation : 0.6391552035570377
Medical : 0.31957760177851885
Catalogs : 0.1389467833819647


Over half of the iOS apps (about 54%) are games.

## What are the most popular types of Play apps?

In [46]:
display_table(play_list,1)

FAMILY : 18.19188191881919
GAME : 10.55350553505535
TOOLS : 7.776752767527675
MEDICAL : 4.271217712177122
BUSINESS : 4.243542435424354
PRODUCTIVITY : 3.911439114391144
PERSONALIZATION : 3.616236162361624
COMMUNICATION : 3.5701107011070112
SPORTS : 3.5424354243542435
LIFESTYLE : 3.5239852398523985
FINANCE : 3.3763837638376386
HEALTH_AND_FITNESS : 3.1457564575645756
PHOTOGRAPHY : 3.0904059040590406
SOCIAL : 2.7214022140221403
NEWS_AND_MAGAZINES : 2.61070110701107
SHOPPING : 2.3985239852398523
TRAVEL_AND_LOCAL : 2.3800738007380073
DATING : 2.158671586715867
BOOKS_AND_REFERENCE : 2.1309963099630997
VIDEO_PLAYERS : 1.6143911439114391
EDUCATION : 1.4391143911439115
ENTERTAINMENT : 1.3745387453874538
MAPS_AND_NAVIGATION : 1.2638376383763839
FOOD_AND_DRINK : 1.1715867158671587
HOUSE_AND_HOME : 0.8118081180811807
LIBRARIES_AND_DEMO : 0.7841328413284132
AUTO_AND_VEHICLES : 0.7841328413284132
WEATHER : 0.7564575645756457
ART_AND_DESIGN : 0.5996309963099631
EVENTS : 0.5904059040590406
PARENTING : 

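Play apps are spread more evenly across categories than iOS apps, and a larger share of them are designed for utility (for example Tools, Business, and Productivity).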

In [57]:
print(play_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [48]:
display_table(play_list,9)

Tools : 7.767527675276753
Entertainment : 5.747232472324723
Education : 5.064575645756458
Medical : 4.271217712177122
Business : 4.243542435424354
Productivity : 3.911439114391144
Sports : 3.671586715867159
Personalization : 3.616236162361624
Communication : 3.5701107011070112
Lifestyle : 3.5147601476014763
Finance : 3.3763837638376386
Action : 3.367158671586716
Health & Fitness : 3.1457564575645756
Photography : 3.0904059040590406
Social : 2.7214022140221403
News & Magazines : 2.61070110701107
Shopping : 2.3985239852398523
Travel & Local : 2.370848708487085
Dating : 2.158671586715867
Books & Reference : 2.1309963099630997
Arcade : 2.029520295202952
Simulation : 1.8450184501845017
Casual : 1.7804428044280445
Video Players & Editors : 1.595940959409594
Puzzle : 1.2915129151291513
Maps & Navigation : 1.2638376383763839
Food & Drink : 1.1715867158671587
Role Playing : 1.0055350553505535
Strategy : 0.9870848708487084
Racing : 0.904059040590406
House & Home : 0.8118081180811807
Libraries & 

The Genres column is more granular and more varied than Category. However, Tools ranks near the top of both.
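
Part of the extra granularity in Genres comes from rows that list several genres separated by `;` (for example `'Art & Design;Pretend Play'` in the output of In [3]). If each genre should be counted on its own, the values can be split first. This is one possible treatment, sketched on toy rows; it is not what the notebook does, and `genre_counts` is a hypothetical helper.

```python
from collections import Counter

def genre_counts(rows, genres_idx):
    """Count each genre separately when a row lists several joined by ';'."""
    counts = Counter()
    for row in rows:
        for genre in row[genres_idx].split(';'):
            counts[genre.strip()] += 1
    return counts

# Toy rows for illustration only: [name, genres].
sample_rows = [
    ['App A', 'Art & Design'],
    ['App B', 'Art & Design;Pretend Play'],
    ['App C', 'Art & Design;Creativity'],
]
counts = genre_counts(sample_rows, genres_idx=1)
print(counts.most_common())
```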

In [49]:
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [50]:
freq_table(ios_list, 11)

{'Book': 1.5562039738780047,
 'Business': 0.7919966652771988,
 'Catalogs': 0.1389467833819647,
 'Education': 6.294289287203002,
 'Entertainment': 7.433652910935113,
 'Finance': 1.445046547172433,
 'Food & Drink': 0.8753647353063776,
 'Games': 53.66124774211477,
 'Health & Fitness': 2.501042100875365,
 'Lifestyle': 2.0008336807002918,
 'Medical': 0.31957760177851885,
 'Music': 1.9174656106711132,
 'Navigation': 0.6391552035570377,
 'News': 1.0421008753647354,
 'Photo & Video': 4.849242740030569,
 'Productivity': 2.473252744198972,
 'Reference': 0.8892594136445742,
 'Shopping': 1.6951507572599693,
 'Social Networking': 2.3204112824788106,
 'Sports': 1.5839933305543976,
 'Travel': 1.1254689453939142,
 'Utilities': 3.4458802278727245,
 'Weather': 1.0004168403501459}

## Most Popular Apps by Genre in App Store

Social Networking apps receive about 45k ratings per app on average. Music comes in second with about 29k.


In [58]:
prime_genre_freq = freq_table(ios_list, 11)

for genre in prime_genre_freq:
    total = 0
    len_genre = 0
    
    for row in ios_list:
        genre_app = row[11]
        
        if genre_app == genre:
            ratings = float(row[5])
            total += ratings
            len_genre += 1

    avg_ratings = total / len_genre
    print(genre, ':', avg_ratings)            

Finance : 11047.653846153846
Travel : 14129.444444444445
Health & Fitness : 9913.172222222222
Business : 4788.087719298245
Book : 5125.4375
Social Networking : 45498.89820359281
Weather : 22181.027777777777
Medical : 592.7826086956521
Shopping : 18615.32786885246
Games : 13691.996633868463
Productivity : 8051.3258426966295
Utilities : 6863.822580645161
Education : 2239.2295805739514
Entertainment : 7533.678504672897
Sports : 14026.929824561403
Navigation : 11853.95652173913
Music : 28842.021739130436
Photo & Video : 14352.280802292264
Food & Drink : 13938.619047619048
Reference : 22410.84375
Catalogs : 1732.5
News : 13015.066666666668
Lifestyle : 6161.763888888889
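
The nested loops above scan the whole dataset once per genre. The same averages can be accumulated in a single pass, as in this sketch on toy rows; `average_by_group` is a hypothetical helper, and the column indices are passed in rather than hard-coded.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Average of a numeric column per distinct value of a grouping column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows for illustration only: [genre, rating count].
sample_rows = [['Music', '100'], ['Music', '300'], ['Weather', '50']]
averages = average_by_group(sample_rows, group_idx=0, value_idx=1)
print(averages)  # {'Music': 200.0, 'Weather': 50.0}
```

With the notebook's data, `average_by_group(ios_list, 11, 5)` would correspond to the iOS cell above (genre at index 11, total rating count at index 5).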


In [52]:
print(play_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


## Most Popular Apps by Genre in Play Store

In [53]:
prime_genre_freq = freq_table(play_list, 1)

for genre in prime_genre_freq:
    total = 0
    len_genre = 0
    
    for row in play_list:
        genre_app = row[1]
        
        if genre_app == genre:
            ratings = float(row[3])
            total += ratings
            len_genre += 1

    avg_ratings = total / len_genre
    print(genre)            
    print(avg_ratings)


FAMILY
208025.5223123732
EDUCATION
253819.14102564103
SOCIAL
2105903.125423729
NEWS_AND_MAGAZINES
192229.19787985866
ART_AND_DESIGN
26376.0
PRODUCTIVITY
269143.80896226416
PHOTOGRAPHY
637363.1343283582
GAME
1385858.6966783216
FOOD_AND_DRINK
69947.48031496063
DATING
31159.30769230769
WEATHER
178106.5243902439
PERSONALIZATION
227923.82653061225
MEDICAL
3425.4319654427645
LIBRARIES_AND_DEMO
12201.388235294118
VIDEO_PLAYERS
630743.9314285715
PARENTING
15972.183333333332
MAPS_AND_NAVIGATION
223790.17518248176
EVENTS
2515.90625
LIFESTYLE
33724.56544502618
BUSINESS
30335.982608695653
AUTO_AND_VEHICLES
13690.188235294117
COMICS
56387.933333333334
TOOLS
324062.9228944247
HEALTH_AND_FITNESS
111125.34604105572
SHOPPING
442466.23846153845
FINANCE
47952.8087431694
ENTERTAINMENT
397168.8187919463
TRAVEL_AND_LOCAL
242705.11240310076
BOOKS_AND_REFERENCE
95060.90476190476
HOUSE_AND_HOME
45186.193181818184
BEAUTY
7476.226415094339
COMMUNICATION
2107137.622739018
SPORTS
184453.56510416666


In [54]:
prime_cat_freq = freq_table(play_list, 1)

for category in prime_cat_freq:
    total = 0
    len_category = 0
    
    for row in play_list:
        category_app = row[1]
        
        if category_app == category:
            installs = row[5]
            installs = installs.replace('+', '')
            installs = installs.replace(',', '')
            installs = float(installs)
            total += installs
            len_category += 1
    
    avg_cat = total / len_category
    print(category)
    print(avg_cat)

FAMILY
5201959.181034483
EDUCATION
5586230.769230769
SOCIAL
47694467.46440678
NEWS_AND_MAGAZINES
26488755.335689045
ART_AND_DESIGN
1912893.8461538462
PRODUCTIVITY
33434177.75707547
PHOTOGRAPHY
30114172.10447761
GAME
30669601.761363637
FOOD_AND_DRINK
2156683.0787401577
DATING
1129533.3632478632
WEATHER
5196347.804878049
PERSONALIZATION
5932384.647959184
MEDICAL
115026.86177105832
LIBRARIES_AND_DEMO
741128.3529411765
VIDEO_PLAYERS
35554301.25714286
PARENTING
525351.8333333334
MAPS_AND_NAVIGATION
5286729.124087592
EVENTS
249580.640625
LIFESTYLE
1407443.8193717278
BUSINESS
2178075.7934782607
AUTO_AND_VEHICLES
625061.305882353
COMICS
934769.1666666666
TOOLS
13585731.809015421
HEALTH_AND_FITNESS
4642441.3841642225
SHOPPING
12491726.096153846
FINANCE
2395215.120218579
ENTERTAINMENT
19256107.382550336
TRAVEL_AND_LOCAL
26623593.58914729
BOOKS_AND_REFERENCE
8318050.112554112
HOUSE_AND_HOME
1917187.0568181819
BEAUTY
513151.88679245283
COMMUNICATION
84359886.95348836
SPORTS
4560350.255208333
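
The install counts in the Play data are strings such as `'10,000+'`, so the cell above strips the `+` and the commas before converting. That step can be factored into a small function, sketched below with `parse_installs` as a hypothetical name. The values appear to be open-ended buckets, so the averages above are best read as rough lower bounds rather than exact install counts.

```python
def parse_installs(value):
    """Convert an Installs string such as '10,000+' to a float."""
    return float(value.replace('+', '').replace(',', ''))

print(parse_installs('10,000+'))     # 10000.0
print(parse_installs('5,000,000+'))  # 5000000.0
```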
