# Analysing Mobile App Data
### Determining the profiles of apps on the Google Play Store (Android) and the App Store (Apple)

The data analysis below will help us determine which app profiles are the most successful in the different app stores.

Open and read the data from the Apple App Store and the Google Play Store.

In [3]:
from csv import reader

# Read each CSV file into a list of rows; the first row is the header.
# The 'with' blocks close the files once the data has been read.
with open('AppleStore.csv', encoding='utf8') as apple_file:
    apple_data = list(reader(apple_file))

with open('googleplaystore.csv', encoding='utf8') as google_file:
    google_data = list(reader(google_file))

The function below prints a preview of the rows in a given range and, optionally, the number of rows and columns in the dataset.

In [4]:
def explore_data(dataset, start, end, rows_and_columns=True):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        print('\n')

##### The columns we will be focusing on for our analysis are:

**Apple Store Data Columns**: ***'price', 'user_rating', 'prime_genre'***

**Google Play Data Columns**: ***'Rating', 'Reviews', 'Price', 'Genres'***

In [5]:
# explore_data only prints and returns None, so its result is not stored
print('Apple Data: ', '\n\n')
explore_data(apple_data, 0, 6)

print('\n\nGoogle Data: ', '\n\n')
explore_data(google_data, 0, 6)

Apple Data:  


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7198
Number of columns: 16

###### Delete incorrect data

The row at index 10473 is missing its 'Category' value, so it is one field shorter than the header row. This row is deleted below.

In [6]:
del google_data[10473]

# Check that no malformed rows remain: the loop below prints nothing
# if every row has the same number of fields as the header

for row in google_data[1:]:
    if len(row) != len(google_data[0]):
        print(google_data.index(row))
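
One caveat about the check above: `list.index(row)` returns the position of the first row equal to `row`, so it can report the wrong position when the dataset contains identical rows. A minimal sketch of a position-based check follows. It uses made-up data, and `find_malformed_rows` and `sample` are hypothetical names that are not part of this notebook.

```python
def find_malformed_rows(dataset):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(dataset[0])
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# Made-up data for illustration: the row at index 2 is missing a field
sample = [
    ['App', 'Category', 'Rating'],
    ['Alpha', 'TOOLS', '4.1'],
    ['Beta', '4.5'],
    ['Gamma', 'GAME', '3.9'],
]

bad_rows = find_malformed_rows(sample)
print('Malformed row indices:', bad_rows)

# Delete from the end so that earlier indices stay valid
for i in reversed(bad_rows):
    del sample[i]

print('Remaining malformed rows:', find_malformed_rows(sample))
```

Collecting the indices first also avoids deleting a second, valid row if the cell is run twice.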


###### Delete duplicates
We will find and remove the duplicate entries in the Google Play Store data.

Rather than removing rows at random, we will keep, for each app, the row with the highest number of reviews, because the higher the number of reviews, the more reliable the rating.

In [7]:
duplicate_apps = []
unique_apps = []

# Loop through the dataset to identify duplicates
# Add the duplicates to the 'duplicate_apps' list

for app in google_data[1:]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

        
print('Number of unique apps: ', len(unique_apps))
print('\n')
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:10])


Number of unique apps:  9659


Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
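
The counts above can be cross-checked with `collections.Counter` from the standard library, which also avoids repeated membership tests on a list. This is only a sketch on made-up names; in the notebook, the names would come from the first column of `google_data[1:]`.

```python
from collections import Counter

# Made-up app names for illustration
names = ['Box', 'Slack', 'Box', 'Zenefits', 'Box', 'Slack']

counts = Counter(names)

n_unique = len(counts)                 # number of distinct names
n_duplicates = len(names) - n_unique   # extra rows beyond each first occurrence
repeated = {name: n for name, n in counts.items() if n > 1}

print('Number of unique apps: ', n_unique)
print('Number of duplicate apps: ', n_duplicates)
print('Names that occur more than once: ', repeated)
```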


In [8]:
reviews_max = {}

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
# Use the duplicate count computed above instead of a hard-coded number
print('Expected length:', len(google_data[1:]) - len(duplicate_apps))
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


1) Initialize two empty lists, google_clean and already_added.

2) Loop through google_data (skipping the header row), and for every iteration:

3) Isolate the name of the app and the number of reviews.

4) Add the current row (app) to the google_clean list, and the app name (name) to the already_added list, if:

   a) the number of reviews of the current app matches the number of reviews of that app as recorded in the reviews_max dictionary; and

   b) the name of the app is not already in the already_added list. This second check matters when several duplicate rows share the same highest review count, since otherwise each of them would be added.

In [9]:
google_clean = []
already_added = []

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        google_clean.append(app)
        already_added.append(name)
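
As an alternative to the two passes above, the same result can be sketched in a single pass by storing the best row per app name in a dictionary. The function name `keep_most_reviewed` and the sample rows are made up for illustration. The rows come back in order of each name's first appearance, which may differ from the order of `google_clean`.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row seen when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Made-up rows for illustration (name, category, rating, reviews)
sample = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159904'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]

clean = keep_most_reviewed(sample)
for row in clean:
    print(row)
```

A dictionary lookup takes roughly constant time, whereas `name not in already_added` scans a list that grows with every app added.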

In [10]:
explore_data(google_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13




###### Remove non-English apps


In [11]:
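def is_english(sentence):
    # Treat a name as English unless it contains more than three
    # non-ASCII characters (code points above 127), so that names with
    # a few emoji or symbols such as ™ are still kept.
    non_ascii = 0
    
    for char in sentence:
        if ord(char) > 127:
            non_ascii += 1
        if non_ascii > 3:
            return False
        
    return True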

###### Separate the English and non-English apps for both datasets

In [12]:
android_english = []
apple_english = []

for app in google_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in apple_data:
    name = app[1]
    if is_english(name):
        apple_english.append(app)

In [13]:
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(apple_english, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13




['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Insta

In [15]:
google_free = []
apple_free = []
    
for app in android_english:
    price = app[7]
    if price == '0':
        google_free.append(app)
        

for app in apple_english:
    price = app[4]
    if price == '0.0':
        apple_free.append(app)
        
        
print('Number of free Apple Store apps: ', len(apple_free))
print('Number of free Google Play Store apps: ', len(google_free))


Number of free Apple Store apps:  3222
Number of free Google Play Store apps:  8864
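
The comparison above relies on the exact strings '0' and '0.0'. A slightly more tolerant check could parse the price as a number instead. This is only a sketch: `is_free` is a hypothetical helper, and it assumes that paid prices may carry a leading '$' (for example '$4.99').

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    cleaned = price.strip().lstrip('$')   # assumption: optional leading '$'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that cannot be parsed as a number is not treated as free
        return False


for price in ['0', '0.0', '$4.99', '2.99', 'n/a']:
    print(repr(price), '->', is_free(price))
```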


##### Determine the most used apps

To identify which kinds of apps attract the most users, we will determine the profile of the most common apps on both the Google Play Store and the Apple App Store. Our goal is to create a minimal version of an app and publish it on the Google Play Store; if it is successful after six months, we will launch an iOS version of the app.

In [16]:
def freq_table(dataset, index):
    
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
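
For comparison, the same percentage table can be sketched with `collections.Counter`, sorting by percentage directly instead of building (value, key) tuples. The name `freq_table_counter` and the sample rows are made up for illustration.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}


# Made-up rows for illustration (name, genre)
sample = [
    ['a', 'Games'],
    ['b', 'Games'],
    ['c', 'Music'],
    ['d', 'Games'],
]

table = freq_table_counter(sample, 1)

# Sort by percentage, highest first
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```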
    
    

In [23]:
print('The App Store: ')
display_table(apple_free, -5)  # fifth column from the end: prime_genre
print('\n')
print('Google Play Store: ')
display_table(google_free, 1)  # second column: the app category

The App Store: 
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Google Play Store: 
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PE

#### From the above:

Among free apps on the Apple App Store, 'Games' is by far the most common genre, at about 58%. On the Google Play Store, the most common categories are 'Family' at about 19%, followed by 'Game' at about 10% and 'Tools' at about 8%.