TITLE: APPLE STORE and GOOGLE PLAY

The goal of this project is to help developers understand what type of apps are likely to attract more users on Google Play and the App Store.

This a guided project from Dataquest.  The function `explore_data` was provided by Dataquest

Further information on the Google data set can be found at
https://www.kaggle.com/lava18/google-play-store-apps/home

Further information on the Apple data set can be found at

https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home


In [None]:
open_apple_data = open('AppleStore.csv', encoding = 'utf8')
open_google_data = open('googleplaystore.csv', encoding = 'utf8')
from csv import reader
read_apple_data = reader(open_apple_data)
read_google_data = reader(open_google_data)
list_apple_data = list(read_apple_data)
list_google_data = list(read_google_data)



In [None]:
def explore_data(dataset, start, end, 
rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        

In [None]:
explore_data(list_google_data, 0, 3, True)

In [None]:
explore_data(list_apple_data, 0, 3, True)

In [None]:
print(list_apple_data[0], '\n', '\n', list_google_data[0])

Based on the headers above and information in the Apple and Google file documentation, the data that at this point appears potentially useful for meeting the goal is:

*Apple*

- price
- user_rating
- cont_rating
- prime_genre

*Google*

- Price
- Content Rating
- Rating (user rating)
- Category
- Genre



In [None]:
#Category missing from google entry 10473, delete entry

print(list_google_data[0], '\n', '\n', list_google_data[10473])
del(list_google_data[10473]) #DON'T RUN ran once already
print(list_google_data[10473])

Discussion board indicates app "Subway Surfer", "Instagram" has replicate rows. The following function, modified from code provided by Dataquest, looks for how many duplicate entries are in the Apple and Google data sets.

In [None]:
def id_duplicate_apps(dataset,app_name_index):
    duplicate_apps = []
    unique_apps = []
    for app in dataset:
        app_name = app[app_name_index]
        if app_name not in unique_apps:
            unique_apps.append(app_name)
        else:
            duplicate_apps.append(app_name)
    
    if len(duplicate_apps) < 5:
        return len(duplicate_apps), duplicate_apps[::]
    else:
        return len(duplicate_apps), duplicate_apps[:5]

In [None]:
id_duplicate_apps(list_google_data, 0)

In [None]:
id_duplicate_apps(list_apple_data, 1)

In [None]:
#Determine how the Apple replicates differ (this code not needed)
#for app in list_apple_data:
#        name = app[1]
#        if name xxxxxxx:
#            print(app)
#            print('\n')

In [None]:
#Determine how the Google replicates differ
for app in list_google_data:
        name = app[0]
        if name in ('Quick PDF Scanner + OCR FREE','Box',
                    'Google My Business') :
            print(app)
            print('\n')


Of the sampled Google replicate entries no differences in an entry were noted. For this code the entry with the maximum number of reviews ( `Reviews`) will be included.

Create dictionary `reviews_max` of app name (key) max # of reviews (value) from the Google list. Remove Google 
list entries where the number of reviews is < max # of reviews for a given app name.



In [None]:
reviews_max = {}
for app in list_google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if (name in reviews_max and 
    (reviews_max[name] < n_reviews)):
        reviews_max[name] = n_reviews
        
    if name not in reviews_max:
        reviews_max.update({name: n_reviews})

In [None]:
len(reviews_max)

Create a list `google_clean` of the entire list_google_data entry for the apps in `reviews_max`

In [None]:
google_clean = []
google_already_added = []
for app in list_google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if (n_reviews == reviews_max[name] and  
       name not in google_already_added):
            
        google_clean.append(app)
        google_already_added.append(name)

In [None]:
len(google_clean)

In [None]:
#function check if string is English

def ascii_english(my_string):
    no_characters_nonenglish = 0
    for letter in my_string:
        if ord(letter) > 127:
            no_characters_nonenglish += 1
            if no_characters_nonenglish == 3:
                return False
    
    return True

In [None]:
ascii_english('Instachat 😜热')

In [None]:
#Make a no replicates, english only google list using
#the `ascii_english` function

google_clean_english = []
def row_english(dataset):
    for row in dataset:
        if ascii_english(row[0]):
            google_clean_english.append(row)

In [None]:
row_english(google_clean)
len(google_clean_english)

In [None]:
len(list_google_data)

Create the `_clean_english` file for Apple by replicating the code used for Google.

In [None]:
#Step 1

apple_reviews_max = {}
for app in list_apple_data[1:]:
    name = app[1]
    ratings_count_ver = float(app[6])
    
    if (name in apple_reviews_max and 
        (apple_reviews_max[name] < ratings_count_ver)):
        apple_reviews_max[name] = ratings_count_ver

    if name not in apple_reviews_max:
        apple_reviews_max.update({name: ratings_count_ver})

#Step 2
apple_clean = []
apple_already_added = []
for app in list_apple_data[1:]:
    name = app[1]
    ratings_count_ver = float(app[6])
    if (ratings_count_ver == apple_reviews_max[name]
    and name not in apple_already_added):
            
        apple_clean.append(app)
        apple_already_added.append(name)

#Step 3

apple_clean_english = []
def apple_row_english(dataset):
    for row in dataset:
        if ascii_english(row[0]):
            apple_clean_english.append(row)

In [None]:
apple_row_english(apple_clean)
len(apple_clean_english)

In [None]:
len(list_apple_data)

In [None]:
#Separate free from non free apps for both google 
#and apple

google_free = []
for app in google_clean_english:
    if(app[6]) == 'Free':
        google_free.append(app) 
        
apple_free = []
for app in apple_clean_english:
    if(app[5]) == '0':
        apple_free.append(app) 

In [None]:
len(google_free)

In [None]:
len(apple_free)

In [None]:
print('APPLE', '\n', list_apple_data[0], '\n', '\n', apple_free[1],
      '\n', '\n', '\n', 'GOOGLE', '\n', list_google_data[0], '\n', '\n', 
      google_free[1])

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

To find out what are the most common types of apps in the market, we'll use the following columns:

*Apple* - 'rating_count_tot', 'prime_genre'

*Google* -'Category', 'Installs', 'Genres'

In [None]:
def freq_table(dataset, index):
    freq_table_dict = {}
    for key in dataset:
        if key[index] in freq_table_dict:
            freq_table_dict[key[index]] += 1
        else:
            freq_table_dict[key[index]] = 1
            
    key_total = 0
    for index in freq_table_dict:
        key_total += freq_table_dict[index]
    
    for index in freq_table_dict:
        freq_table_dict[index] = (freq_table_dict[index] / 
                                  key_total *100)
    
    return freq_table_dict

In [None]:
# code provided by Dataquest

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [41]:
display_table(google_free, 1)

FAMILY : 18.932971628800725
GAME : 9.698202780603594
TOOLS : 8.45484344975698
BUSINESS : 4.600429524132474
PRODUCTIVITY : 3.8996269922007465
LIFESTYLE : 3.888323725556686
FINANCE : 3.7074714592517237
MEDICAL : 3.537922459590822
SPORTS : 3.39097999321804
PERSONALIZATION : 3.3231603933536795
COMMUNICATION : 3.2327342602011986
HEALTH_AND_FITNESS : 3.0857917938284163
PHOTOGRAPHY : 2.950152594099695
NEWS_AND_MAGAZINES : 2.803210127726913
SOCIAL : 2.6675709279981916
TRAVEL_AND_LOCAL : 2.3397761953204474
SHOPPING : 2.2493500621679665
BOOKS_AND_REFERENCE : 2.136317395727365
DATING : 1.8650389962699219
VIDEO_PLAYERS : 1.797219396405561
MAPS_AND_NAVIGATION : 1.3903017972193965
FOOD_AND_DRINK : 1.2433593308466147
EDUCATION : 1.1642364643381937
ENTERTAINMENT : 0.9607776647451114
LIBRARIES_AND_DEMO : 0.938171131456991
AUTO_AND_VEHICLES : 0.9268678648129309
HOUSE_AND_HOME : 0.8025319317282694
WEATHER : 0.7912286650842093
EVENTS : 0.7121057985757884
PARENTING : 0.6555894653554878
ART_AND_DESIGN : 0.6

In [42]:
display_table(google_free, 9)

Tools : 8.44354018311292
Entertainment : 6.081157454504352
Education : 5.357748389284503
Business : 4.600429524132474
Productivity : 3.8996269922007465
Lifestyle : 3.8770204589126256
Finance : 3.7074714592517237
Medical : 3.537922459590822
Sports : 3.458799593082401
Personalization : 3.3231603933536795
Communication : 3.2327342602011986
Action : 3.0970950604724763
Health & Fitness : 3.0857917938284163
Photography : 2.950152594099695
News & Magazines : 2.803210127726913
Social : 2.6675709279981916
Travel & Local : 2.3284729286763874
Shopping : 2.2493500621679665
Books & Reference : 2.136317395727365
Simulation : 2.045891262574884
Dating : 1.8650389962699219
Arcade : 1.8424324629818016
Video Players & Editors : 1.7746128631174407
Casual : 1.763309596473381
Maps & Navigation : 1.3903017972193965
Food & Drink : 1.2433593308466147
Puzzle : 1.1303266644060133
Racing : 0.9946874646772917
Role Playing : 0.938171131456991
Libraries & Demo : 0.938171131456991
Auto & Vehicles : 0.9268678648129309

In [43]:
display_table(apple_free, 12)

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


For Google Play, the `apps` column gives the top 3 categories of free apps as FAMILY, GAME and TOOLS. The `Genres` column top 3 are TOOLS, ENTERTAINMENT and EDUCATION.

For free apps on Apple, per the `prime_genre` column GAMES are by far (55% of the total) most common app type, the 2nd most common being ENTERTAINMENT at 8%.

CONCLUSION:

The free apps most likely to attract users are in the GAME/ENTERTAINMENT genre.