# Analyzing Mobile App Data

* The goal of this project is to analyze Android and iOS mobile applications and determine which types of apps attract a large number of users.
* Revenue depends on the number of users who view and engage with in-app ads, so the company is looking for strategic guidance on the types of applications likely to attract a user base large enough to be served these ads.
* Datasets used: [Android Apps](https://www.kaggle.com/datasets/lava18/google-play-store-apps), [iOS Apps](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps?select=AppleStore.csv)

In [3]:
from csv import reader

### Android data set ###
# Read with an explicit encoding; the with-block closes the file afterwards.
with open('/Users/hannahpaxton/Desktop/Coursework/googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### iOS data set ###
with open('/Users/hannahpaxton/Desktop/Coursework/AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

In [5]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [9]:
# explore_data prints its output directly and returns None, so it is called without print().
explore_data(android, 0, 5, True)
explore_data(ios, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13
['284882215', 'Facebook', '389

In [11]:
print(android_header)
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


### Data Cleaning 

* Checking to see if a [row with incorrect values](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) was deleted

In [20]:
print(android[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
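
A more general way to look for rows like the one described in the linked discussion is to compare each row's field count with the header's. The following is a minimal sketch, assuming the faulty row shows up as a row with a different number of fields; `find_malformed_rows` and the sample data are hypothetical and not part of the original notebook.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Hypothetical miniature data set: the second row is missing its category.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '4.5'],
    ['App C', 'GAME', '3.9'],
]

malformed = find_malformed_rows(sample_rows, sample_header)
for index, row in malformed:
    print('Row', index, 'has', len(row), 'fields:', row)
```

On the real data the call would be `find_malformed_rows(android, android_header)`. If several rows are reported, deleting them from the highest index down keeps the remaining indices valid.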


In [24]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of Duplicate Apps:', len(duplicate_apps))
print('\n')
print('Example of Duplicate Apps:', duplicate_apps[:15])

Number of Duplicate Apps: 1181


Example of Duplicate Apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
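
The loop above checks membership in a list, which scans the whole list on every lookup. As a sketch of an alternative, the same count can be obtained with a set for the names already seen. `find_duplicate_names` and the sample rows are hypothetical names introduced here for illustration.

```python
def find_duplicate_names(dataset, name_index=0):
    """Return every repeated occurrence of a name, in the order encountered."""
    seen = set()
    duplicates = []
    for row in dataset:
        name = row[name_index]
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates


# Hypothetical rows: 'Box' appears three times, so it is reported twice.
sample_apps = [['Box', '10'], ['Slack', '20'], ['Box', '12'], ['Box', '12']]
sample_duplicates = find_duplicate_names(sample_apps)
print('Number of Duplicate Apps:', len(sample_duplicates))
```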


* Removing duplicate rows for apps (keeping the record with the largest number of reviews, on the assumption that it is the most recent record)

In [39]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [45]:
android_clean = []
already_added = []

for app in android: 
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
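
The two loops above can also be folded into one pass. The sketch below is one possible variant, not the notebook's method: `remove_duplicates` is a hypothetical helper that keeps, for each name, the first row holding that name's highest review count. It selects the same rows as the loops above, but returns them in order of each name's first appearance, which may differ from the order in `android_clean`.

```python
def remove_duplicates(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with that name's highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means ties keep the earlier row.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Hypothetical rows in the Android column order (name, category, rating, reviews).
sample = [
    ['Box', 'BUSINESS', '4.2', '159'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '161'],
    ['Box', 'BUSINESS', '4.2', '161'],
]
cleaned = remove_duplicates(sample)
print('Rows before:', len(sample), 'Rows after:', len(cleaned))
```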

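* Removing non-English apps from the data set (as the company only develops applications in English). Uses the [ASCII system](https://en.wikipedia.org/wiki/ASCII) to detect characters outside those commonly used in English. A name is treated as non-English only if it contains more than three such characters, so names with an occasional symbol or emoji are kept.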

In [111]:
def character_detect(name):
    non_english_char = [] 

    for character in name:
        if ord(character) > 127:
            non_english_char.append(character)
                
    if len(non_english_char) > 3:
        return False
    else: 
        return True

In [113]:
print(character_detect('Instagram'))
print(character_detect('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(character_detect('Docs To Go™ Free Office Suite'))
print(character_detect('Instachat 😜'))

True
False
True
True
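
To see why the four names above come out as they do, it can help to list the characters that fall outside the ASCII range (code points 0 to 127). This is a small illustrative sketch; `non_ascii_characters` is a hypothetical helper, not part of the original notebook.

```python
def non_ascii_characters(name):
    """Return the characters of name whose code point is above 127."""
    return [character for character in name if ord(character) > 127]


names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in names:
    outside = non_ascii_characters(name)
    print(name, '->', len(outside), 'non-ASCII character(s):', outside)
```

Only the last name has more than three such characters, which matches the single `False` in the output above.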


In [138]:
def find_english_apps(dataset, name_index):

    english_apps = []
    non_english_apps = []

    for app in dataset: 
        name = app[name_index]

        if character_detect(name):
            english_apps.append(app)
        else:
            non_english_apps.append(app)
    return english_apps, non_english_apps

<b>English iOS Apps</b> <i>(did not require any duplicate removal)</i>

In [236]:
ios_english_apps, ios_non_english_apps = find_english_apps(ios, 1)

print('Length of total iOS apps:', len(ios))
print('\n')
print('Length of English iOS apps:', len(ios_english_apps))
print('Examples of English iOS apps:', ios_english_apps[:5])
print('\n')
print('Length of non-English iOS  apps:', len(ios_non_english_apps))
print('Examples of non-English iOS apps:', ios_non_english_apps[:5])                                                          

Length of total iOS apps: 7197


Length of English iOS apps: 6183
Examples of English iOS apps: [['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]


Length of non-English iOS  apps: 1014
Examples of non-English iOS apps: [['445375097', '爱奇艺PPS -《欢乐颂2》电视剧热播', '224617472', 'USD', '0.0', '14844', '0', '4.0', '0.0', '6.3.3', '17+', 'Entertainmen

<b>English Android Apps</b>

In [234]:
android_english_apps, android_non_english_apps = find_english_apps(android_clean, 0)

print('Length of total Android apps:', len(android_clean))
print('\n')
print('Length of English Android apps:', len(android_english_apps))
print('Examples of English Android apps:', android_english_apps[:5])
print('\n')
print('Length of non-English Android  apps:', len(android_non_english_apps))
print('Examples of non-English Android apps:', android_non_english_apps[:5])

Length of total Android apps: 9659


Length of English Android apps: 9614
Examples of English Android apps: [['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2

### Isolating Free Apps

In [240]:
def find_free_apps(dataset, price_index):

    free_apps = []
    non_free_apps = []

    for app in dataset: 
        price = app[price_index]

        if price == str(0.0) or price == 'Free':
            free_apps.append(app)
        else:
            non_free_apps.append(app)
    return free_apps, non_free_apps
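
The comparison above relies on the exact strings `'0.0'` (iOS `price`) and `'Free'` (Android `Type`). A slightly more tolerant check could parse the value instead. The sketch below is an assumption about how such a check might look; `is_free` is a hypothetical helper, and the handling of a leading `$` assumes prices may be written like `'$4.99'`.

```python
def is_free(value):
    """Return True for 'Free' or for any price string that parses to zero."""
    text = value.strip()
    if text.lower() == 'free':
        return True
    try:
        # Drop a leading currency sign, if any, before converting.
        return float(text.lstrip('$')) == 0.0
    except ValueError:
        # Anything that is not a number (for example 'Paid') counts as not free.
        return False


for candidate in ['0.0', '0', 'Free', '1.99', '$4.99', 'Paid']:
    print(repr(candidate), '->', is_free(candidate))
```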

<b>Free English iOS Apps</b>

In [242]:
ios_free_apps, ios_non_free_apps = find_free_apps(ios_english_apps, 4)

print('Length of total iOS English apps:', len(ios_english_apps))
print('\n')
print('Length of free English iOS apps:', len(ios_free_apps))
print('Examples of free English iOS apps:', ios_free_apps[:5])
print('\n')
print('Length of non-free English iOS apps:', len(ios_non_free_apps))
print('Examples of non-free English iOS apps:', ios_non_free_apps[:5])

Length of total iOS English apps: 6183


Length of free English iOS apps: 3222
Examples of free English iOS apps: [['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]


Length of non-free English iOS apps: 2961
Examples of non-free English iOS apps: [['362949845', 'Fruit Ninja Classic', '104590336', 'USD', '1.99', '698516', '132', '

<b>Free English Android Apps</b>

In [244]:
android_free_apps, android_non_free_apps = find_free_apps(android_english_apps, 6)

print('Length of total Android English apps:', len(android_english_apps))
print('\n')
print('Length of free English Android apps:', len(android_free_apps))
print('Examples of free English Android apps:', android_free_apps[:5])
print('\n')
print('Length of non-free English Android apps:', len(android_non_free_apps))
print('Examples of non-free English Android apps:', android_non_free_apps[:5])

Length of total Android English apps: 9614


Length of free English Android apps: 8863
Examples of free English Android apps: [['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26

In [278]:
def freq_table(dataset, index): 
    cat_frequency = {}

    for app in dataset:
        category = app[index]

        if category in cat_frequency:
            cat_frequency[category] += 1
        else:
            cat_frequency[category] = 1

    for category in cat_frequency:
        cat_frequency[category] /= len(dataset)
        cat_frequency[category] *= 100

    return cat_frequency
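
The same percentage table can be built with `collections.Counter` from the standard library. This is offered as an alternative sketch rather than a replacement; `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows}, ordered from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() yields (value, count) pairs sorted by descending count.
    return {value: count / total * 100 for value, count in counts.most_common()}


# Hypothetical rows with the genre in column 0.
sample_rows = [['Games'], ['Games'], ['Music'], ['Games']]
sample_table = freq_table_counter(sample_rows, 0)
for genre, percentage in sample_table.items():
    print(genre, ':', percentage)
```

Because `most_common()` already orders the entries by count, the resulting dictionary can be printed directly without a separate sorting step.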

In [280]:
print(freq_table(android_free_apps, 1))

{'ART_AND_DESIGN': 0.6431230960171499, 'AUTO_AND_VEHICLES': 0.9251946293580051, 'BEAUTY': 0.5979916506826132, 'BOOKS_AND_REFERENCE': 2.1437436533904997, 'BUSINESS': 4.592124562789123, 'COMICS': 0.6205573733498815, 'COMMUNICATION': 3.2381812027530184, 'DATING': 1.8616721200496444, 'EDUCATION': 1.1621347173643235, 'ENTERTAINMENT': 0.9590432133589079, 'EVENTS': 0.7108202640189552, 'FINANCE': 3.7007785174320205, 'FOOD_AND_DRINK': 1.241114746699763, 'HEALTH_AND_FITNESS': 3.0802211440821394, 'HOUSE_AND_HOME': 0.8236488773552973, 'LIBRARIES_AND_DEMO': 0.9364774906916393, 'LIFESTYLE': 3.9038700214374367, 'GAME': 9.725826469592688, 'FAMILY': 18.898792733837304, 'MEDICAL': 3.5315355974275078, 'SOCIAL': 2.6627552747376737, 'SHOPPING': 2.245289405393208, 'PHOTOGRAPHY': 2.944826808078529, 'SPORTS': 3.396141261423897, 'TRAVEL_AND_LOCAL': 2.335552296062281, 'TOOLS': 8.462146000225657, 'PERSONALIZATION': 3.317161232088458, 'PRODUCTIVITY': 3.8925871601038025, 'PARENTING': 0.6544059573507841, 'WEATHER':

In [282]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

<b>Android Category & Genre Frequency</b>

In [288]:
# display_table prints each entry itself and returns None, so it is called without print().
print("Android Category")
display_table(android_free_apps, 1)
print("\n")
print("Android Genre")
display_table(android_free_apps, 9)

Android Category
FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
A

<b>iOS Prime Genre Frequency</b>

In [294]:
print("iOS Prime Genre")
# display_table prints each entry itself and returns None, so it is called without print().
display_table(ios_free_apps, 11)

iOS Prime Genre
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


<b>iOS Prime Genre Average Number of User Ratings</b> <i>Used as a proxy for number of downloads</i>

In [310]:
genre_frequency = freq_table(ios_free_apps, 11)

def avg_user_rating(frequency_table, dataset, index_rating, index_genre): 

    for genre in frequency_table:
        total = 0
        len_genre = 0

        for app in dataset:
            genre_app = app[index_genre]

            if genre_app == genre:
                num_user_ratings = float(app[index_rating])
                total += num_user_ratings
                len_genre += 1

        average_user_ratings = total / len_genre
        print(genre, ":", average_user_ratings)

In [314]:
# avg_user_rating prints each genre's average itself and returns None, so it is called without print().
avg_user_rating(genre_frequency, ios_free_apps, 5, 11)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
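
The averages above are printed in dictionary order, which makes the ranking hard to read. As a sketch of one way to compute the same averages in a single pass over the data and list them from highest to lowest, the hypothetical helper `average_rating_count_by_genre` below returns a dictionary instead of printing. The sample rows are made up for illustration.

```python
from collections import defaultdict


def average_rating_count_by_genre(dataset, index_rating, index_genre):
    """Return {genre: average number of user ratings per app in that genre}."""
    totals = defaultdict(float)  # genre -> summed rating counts
    counts = defaultdict(int)    # genre -> number of apps
    for app in dataset:
        genre = app[index_genre]
        totals[genre] += float(app[index_rating])
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


# Hypothetical rows: genre in column 0, rating count in column 1.
sample = [['Games', '100'], ['Games', '300'], ['Music', '50']]
averages = average_rating_count_by_genre(sample, 1, 0)
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', average)
```

On the notebook's data the equivalent call would be `average_rating_count_by_genre(ios_free_apps, 5, 11)`.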
