# Profitable App Profiles for the App Store and Google Play Markets

The aim of the project is to help developers understand what type of apps are likely to attract more users on Google Play and the App Store.  
  
We build apps that are free to download and install, and our main source of revenue is in-app ads. This means that the revenue from any given app depends mostly on the number of people who use it.  
  
For this reason, we want to analyze the data to help our developers understand which kinds of apps are likely to attract more users.

## Opening the datasets

In [1]:
import csv

In [2]:
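# Read the whole file (closing it afterwards), then split off the header row
with open('AppleStore.csv', encoding='utf8', newline='') as apple_store_data:
    read_apple_store_data = list(csv.reader(apple_store_data))
apple_store = read_apple_store_data[1:]
apple_store_header = read_apple_store_data[0]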

In [3]:
# Same pattern for the Google Play file
with open('googleplaystore.csv', encoding='utf8', newline='') as google_play_data:
    read_google_play_data = list(csv.reader(google_play_data))
google_play = read_google_play_data[1:]
google_play_header = read_google_play_data[0]

## Exploring datasets 

To make exploring the datasets easier, I will use the **explore_data()** function. It displays a specified slice of a dataset and optionally prints the dataset's dimensions.

In [4]:
def explore_data(dataset, start, end, rows_and_columns=False):
    
    """
    Displays a specified slice of the dataset and optionally prints the dataset's dimensions.
    
        Parameters:
            dataset (list of lists): The dataset to be explored, where each inner list represents a row.
            start (int): Starting index of the slice.
            end (int): Ending index of the slice (non-inclusive).
            rows_and_columns (bool, optional): If True, prints the number of rows and columns in the dataset.

        Returns:
            None
            
    """
        
    dataset_slice = dataset[start:end] 
    
    for row in dataset_slice:
        print(row)
        print('\n') 

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

### Apple Store Data

The documentation of the Apple Store Data can be found here: [Apple Store Dataset](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)  
The dataset can be downloaded directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)

In [5]:
print(apple_store_header)
print("\n")
explore_data(apple_store, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


The dataset has 7197 rows, not counting the header. Each row contains information about a single app.  
  
There are 16 columns. The ones that might be useful for our analysis are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.

### Google Play Data

The documentation of the Google Play Data can be found here: [Google Play Dataset](https://www.kaggle.com/datasets/lava18/google-play-store-apps)  
The dataset can be downloaded directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)

In [6]:
print(google_play_header)
print("\n")
explore_data(google_play, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

The Google Play dataset has 10841 rows, not counting the header. Each row again contains information about a single app.  
  
Of the 13 available columns, the ones we can use for our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.
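Later cells refer to columns by position, for example `app[5]`, `app[-5]` and `app[7]`. The sketch below turns a header into a lookup from column name to position. With this lookup, a misspelled column name raises a `KeyError` instead of silently reading the wrong column. The helper `column_index` is hypothetical, and the headers are copied from the outputs above.

```python
def column_index(header):
    """Map each column name in a CSV header to its position (hypothetical helper)."""
    return {name: position for position, name in enumerate(header)}


# Headers copied from the outputs above
apple_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
                'rating_count_tot', 'rating_count_ver', 'user_rating',
                'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
                'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
google_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                 'Current Ver', 'Android Ver']

apple_cols = column_index(apple_header)
google_cols = column_index(google_header)

print(apple_cols['prime_genre'])  # 11, the same column as index -5
print(google_cols['Installs'])    # 5
```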

## Cleaning the Data

The discussion section of the Google Play dataset on Kaggle reports that one row contains errors.

In [7]:
print(google_play_header)
print("\n")
print(google_play[10472])
print("\n")
print(google_play[10471]) # to compare

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


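Row 10472 is missing its 'Category' value, so every following value is shifted one column to the left. The row has 12 values instead of 13, and its 'Rating' column contains 19, which is impossible on a 5-point scale.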

In [8]:
del google_play[10472]
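Instead of relying only on the forum report, we can list every row whose number of values differs from the header. Row 10472 has 12 values instead of 13, so a length check catches it. The sketch below uses toy rows, and the helper name `find_malformed_rows` is hypothetical.

In the notebook it could be run as `find_malformed_rows(google_play_header, google_play)` before the `del` cell. Checking first is safer because `del` shifts every later row up by one: re-running that cell would silently delete a valid row.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data: the second row is missing a value, like row 10472
header = ['App', 'Category', 'Rating']
rows = [
    ['Good app', 'TOOLS', '4.1'],
    ['Shifted app', '1.9'],
    ['Another app', 'GAME', '4.5'],
]

for index, row in find_malformed_rows(header, rows):
    print(index, row)  # 1 ['Shifted app', '1.9']
```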

## Removing Duplicate Entries

### Checking if there are duplicate apps

In [9]:
duplicate_apps = []
unique_apps = set()  # a set makes the "already seen?" check fast

for app in google_play:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)

print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:15])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


The Google Play dataset contains 1181 duplicate entries, meaning rows whose app name already appeared earlier in the dataset. One example is ZOOM Cloud Meetings:
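The count of 1181 is the number of redundant rows, not the number of distinct apps. For example, 'Box' and 'Google My Business' each appear twice in the sample above. The sketch below uses `collections.Counter` to separate the two numbers. `duplicate_summary` is a hypothetical helper, and the rows are toy data.

```python
from collections import Counter


def duplicate_summary(rows, name_index=0):
    """Return the distinct duplicated names and the number of redundant rows."""
    counts = Counter(row[name_index] for row in rows)
    duplicated_names = [name for name, n in counts.items() if n > 1]
    extra_rows = sum(n - 1 for n in counts.values())  # each name's first row is not "extra"
    return duplicated_names, extra_rows


rows = [['Box'], ['Slack'], ['Box'], ['Instagram'], ['Instagram'], ['Instagram']]
names, extra = duplicate_summary(rows)
print(names)  # ['Box', 'Instagram']
print(extra)  # 3
```

As a sanity check, 10,840 rows remain after deleting row 10472. Subtracting the 1,181 redundant rows leaves 9,659, which matches the number of rows in `google_play_clean` below.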

### Checking how duplicate rows differ

In [10]:
for app in google_play:
    name = app[0]
    if name == "ZOOM Cloud Meetings":
        print(app)
        print('\n')

['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']


['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']




These two rows are identical.

In [11]:
for app in google_play:
    name = app[0]
    if name == "Xero Accounting Software":
        print(app)
        print('\n')

['Xero Accounting Software', 'BUSINESS', '3.5', '2111', 'Varies with device', '100,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', 'Varies with device', 'Varies with device']


['Xero Accounting Software', 'BUSINESS', '3.5', '2111', 'Varies with device', '100,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', 'Varies with device', 'Varies with device']




These rows are identical as well.

### Deciding which rows will be deleted

In [12]:
for app in google_play:
    name = app[0]
    if name == "Instagram":
        print(app)
        print('\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




The only difference between these rows is the number of reviews. This suggests that the data was collected at different times.  
Keeping all of these rows would count the same app several times, so we should keep only one row per app.  
I will use the number of reviews as the criterion. For each app, I keep the row with the highest number of reviews, because it is most likely the most recent data.



In [13]:
reviews_max = {}
for app in google_play:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        

In [14]:
print(reviews_max["Instagram"])

66577446.0


In [15]:
google_play_clean = []
already_added = []


In [16]:
for app in google_play:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        google_play_clean.append(app)
        already_added.append(name)

In the code above, I use the **reviews_max** dictionary to remove duplicate apps:
* I initialize two lists: **google_play_clean** and **already_added**.
* I loop through the **google_play** dataset and isolate the name and number of reviews of each app.
* I add the app to **google_play_clean** (and its name to **already_added**) only if both conditions hold:
  * its number of reviews equals the value stored in **reviews_max**;
  * its name is not yet in **already_added**.

  The second check matters when several rows of the same app share the maximum number of reviews, as with the identical ZOOM Cloud Meetings rows.
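The two passes can also be merged into a single pass that stores the best row seen so far for each name. Currently, one pass builds `reviews_max` and a second pass filters with `already_added`. This is a sketch of an alternative, not the method used above. `dedupe_keep_max` is a hypothetical name. Apart from the Instagram review counts copied from the output above, the rows are toy data.

```python
def dedupe_keep_max(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the most reviews.

    On ties, the first row with the maximum is kept. Apps are listed in
    the order in which their names first appear in `rows`.
    """
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict ">" keeps the earlier row when review counts are equal
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Toy app', 'TOOLS', '4.0', '10'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
clean = dedupe_keep_max(rows)
print([(row[0], row[3]) for row in clean])
# [('Instagram', '66577446'), ('Toy app', '10')]
```

This keeps the same rows as the two-pass version. However, it lists apps in the order their names first appear, which can differ slightly from the order in `google_play_clean`.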

In [17]:
explore_data(google_play_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


## Removing Non-English Apps

### Checking whether a name is English

To check whether an app is English, I will use the **check_if_english()** function below.

In [18]:
def check_if_english(a_string):
    '''
    Check if a string is predominantly in English by counting non-ASCII characters.

            Parameters:
                    a_string (str): The string to be checked.

            Returns:
                    bool: Returns True if the string contains 3 or fewer non-ASCII characters (code point > 127),
                          suggesting it is primarily in English. Returns False if it contains more than 3 non-ASCII characters.
    '''
    
    non_ascii = 0
    
    for character in a_string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True


Some English app names contain emojis or symbols such as "™". For this reason, I will only remove apps whose names contain more than three non-ASCII characters. This way we don't lose English apps just because their names include a few emojis or similar symbols. The filter is not perfect: a non-English name with three or fewer non-ASCII characters still passes.

In [19]:
print(check_if_english('Instagram'))
print(check_if_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_if_english('Docs To Go™ Free Office Suite'))
print(check_if_english('Instachat 😜'))

True
False
True
True
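`check_if_english()` counts Unicode code points, not visible characters. Most emojis are a single code point. Some, such as flags, are built from two, so a name with two flag emojis already counts as four non-ASCII characters. The sketch below makes the count and the threshold explicit. `count_non_ascii` and `is_mostly_ascii` are hypothetical names, and the default threshold of 3 is the one used above.

```python
def count_non_ascii(text):
    """Count code points above 127 (Python iterates over code points)."""
    return sum(1 for character in text if ord(character) > 127)


def is_mostly_ascii(text, max_non_ascii=3):
    """Same rule as check_if_english(), with the threshold as a parameter."""
    return count_non_ascii(text) <= max_non_ascii


print(count_non_ascii('Instachat 😜'))          # 1
print(count_non_ascii('\U0001F1FA\U0001F1F8'))  # 2: one flag emoji, two code points
print(is_mostly_ascii('Docs To Go™ Free Office Suite'))  # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
```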


### Removing non-English apps

Below, I use the **check_if_english()** function to filter the Google Play and Apple Store datasets. Then I explore the data once more.

In [20]:
google_play_only_english = []
for app in google_play_clean:
    name = app[0]
    if check_if_english(name):
        google_play_only_english.append(app)

In [21]:
explore_data(google_play_only_english, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9614
Number of columns: 13


In [22]:
apple_store_only_english = []

for app in apple_store:
    name = app[1]
    if check_if_english(name):
        apple_store_only_english.append(app)

In [23]:
explore_data(apple_store_only_english, 0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 6183
Number of columns: 16


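## Isolating the Free Apps

We only build apps that are free to download and install, so I keep only the free apps in both datasets.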

In [24]:
google_play_final = []
for app in google_play_only_english:
    price = app[7]
    
    if price == '0':
        google_play_final.append(app)
    

In [25]:
explore_data(google_play_final, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8864
Number of columns: 13


In [26]:
apple_store_final = []
for app in apple_store_only_english:
    price = app[4]
    
    if price == '0.0':
        apple_store_final.append(app)

In [27]:
explore_data(apple_store_final, 0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 3222
Number of columns: 16
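The two filters above compare strings ('0' for Google Play, '0.0' for the Apple Store). This works only because each dataset writes zero in one consistent way. The sketch below compares prices numerically, which handles both formats. It assumes prices are plain numbers, optionally prefixed with a dollar sign such as '$0.99' (assumed here for non-free Google Play prices). `is_free` is a hypothetical helper.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.99' means free.

    Assumes a plain number, optionally prefixed with '$'.
    """
    return float(price.replace('$', '')) == 0.0


print(is_free('0'))      # True  (Google Play format)
print(is_free('0.0'))    # True  (Apple Store format)
print(is_free('$0.99'))  # False
```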


## Most common apps by genre

In [28]:
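def freq_table(dataset, index):
    '''
    Build a frequency table, in percentages, for one column of the dataset.

            Parameters:
                    dataset (list of lists): The dataset to be analyzed, where each inner list represents a row.
                    index (int): The index of the column to count.

            Returns:
                    dict: Maps each unique value in the column to the percentage of rows that contain it.
    '''
    
    # Count how many rows contain each value
    total = 0
    counts = {}
    for row in dataset:
        total += 1
        feature = row[index]
        if feature in counts:
            counts[feature] += 1
        else:
            counts[feature] = 1
    
    # Convert the counts into percentages of all rows
    table_percentages = {}
    for key in counts:
        percentage = (counts[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages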
    

In [29]:
def display_table(dataset, index):
    '''
    Display the frequency table of a specified column in the dataset, sorted in descending order.

            Parameters:
                    dataset (list of lists): The dataset to be analyzed, where each inner list represents a row.
                    index (int): The index of the column to generate the frequency table for.

            Returns:
                    None: Prints each unique entry in the specified column along with its frequency, in descending order.
    '''
    
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
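The two functions above can be condensed with `collections.Counter`, whose `most_common()` method already sorts values by count. One difference: ties are listed in the order they were first encountered. `display_table()` instead breaks ties by reverse alphabetical order. This is a sketch of a hypothetical variant, `freq_table_percent`, shown on toy rows.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Toy rows: [name, genre]
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, percentage in freq_table_percent(rows, 1):
    print(value, ':', percentage)
# Games : 75.0
# Music : 25.0
```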

In [30]:
display_table(apple_store_final, -5) # prime_genre

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The percentages show which free English apps are the most common. Games lead with more than half of all apps (58.16%), followed by entertainment apps (7.88%) and photo and video apps (4.97%).  
  
Apps with a practical purpose (education, business, productivity, medical, etc.) are far less common than apps designed for fun (such as games, music, or social networking).  
  
However, a large number of fun apps doesn't necessarily mean they also have the largest user base: demand might not match supply.

In [31]:
display_table(google_play_final, 1) # Category 

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317

Among Google Play categories, the most common are family apps (18.91%), followed by games (9.72%) and tools (8.46%).  
  
On Google Play, apps with a more practical purpose (such as business, productivity, or finance) make up a larger share of all apps than on the Apple Store.

In [32]:
display_table(google_play_final, 9) # Genre

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Grouped by genre, the largest group of Google Play apps is tools (8.45%), followed by entertainment (6.07%) and education (5.35%).
  
These percentages also confirm that practical apps make up a larger share of apps on Google Play than on the Apple Store.

However, supply does not necessarily match demand. Next, I will investigate which kinds of apps have the most users.

## Most popular apps by genre on the Apple Store

In [33]:
genres_apple_store = freq_table(apple_store_final, -5)

In [34]:
genres_apple_store

{'Social Networking': 3.2898820608317814,
 'Photo & Video': 4.9658597144630665,
 'Games': 58.16263190564867,
 'Music': 2.0484171322160147,
 'Reference': 0.5586592178770949,
 'Health & Fitness': 2.0173805090006205,
 'Weather': 0.8690254500310366,
 'Utilities': 2.5139664804469275,
 'Travel': 1.2414649286157666,
 'Shopping': 2.60707635009311,
 'News': 1.3345747982619491,
 'Navigation': 0.186219739292365,
 'Lifestyle': 1.5828677839851024,
 'Entertainment': 7.883302296710118,
 'Food & Drink': 0.8069522036002483,
 'Sports': 2.1415270018621975,
 'Book': 0.4345127250155183,
 'Finance': 1.1173184357541899,
 'Education': 3.662321539416512,
 'Productivity': 1.7380509000620732,
 'Business': 0.5276225946617008,
 'Catalogs': 0.12414649286157665,
 'Medical': 0.186219739292365}

The Apple Store dataset has no information about the number of installs. For this reason, I will use the **rating_count_tot** column, the total number of user ratings, as a proxy for the popularity of each app genre.  
  
Below, I calculate the average number of user ratings per app for each genre on the App Store:

In [35]:
for genre in genres_apple_store:
    total = 0 
    len_genre = 0
    for app in apple_store_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_user_rating = float(app[5])
            total += n_user_rating
            len_genre += 1
    average_number = total / len_genre
    print(genre, ":", average_number)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
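The cell above scans `apple_store_final` once for every genre and prints the results unsorted. The sketch below accumulates totals and counts in a single pass and returns the averages sorted from highest to lowest, so the ranking can be read directly. `average_by_group` is a hypothetical helper, shown on toy rows.

In the notebook, it could be called as `average_by_group(apple_store_final, group_index=-5, value_index=5)`. For the Google Play installs, it would need a `parse` function that strips ',' and '+' before converting to a number.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index, parse=float):
    """Average a numeric column per group, sorted from highest to lowest.

    `parse` turns the raw string into a number (float by default).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += parse(row[value_index])
        counts[group] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy rows: [name, rating count, genre]
rows = [
    ['App A', '100', 'Navigation'],
    ['App B', '300', 'Navigation'],
    ['App C', '50', 'Games'],
]
for genre, average in average_by_group(rows, group_index=2, value_index=1):
    print(genre, ':', average)
# Navigation : 200.0
# Games : 50.0
```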


### App recommendation - Travel Together App

On average, navigation apps have the highest number of user ratings (about 86,090 per app), followed by reference (74,942) and social networking (71,548) apps. However, navigation is a very small genre: only about 0.19% of the free English apps, i.e. 6 apps. Its average therefore depends heavily on a few apps. Creating an app that combines the navigation and social networking genres might still be profitable because both show high user engagement and demand.  
  
One example of an app that connects these two genres is an app for planning trips together with friends. It could offer features such as:
* arranging group travel and coordinating plans with friends;
* sharing restaurant recommendations and points of interest;
* travel-related forums and messaging for tips and reviews;
* real-time, location-based travel recommendations.
  
Such an app would be used mainly for practical purposes, but planning a trip and traveling together can also be a lot of fun!


### Why not a weather app?


Weather apps seem popular judging by the number of user ratings, but people usually do not spend much time in them. They take a quick look at the current weather and the forecast, then close the app. Since we want to earn money from in-app ads, a weather app is not a great idea.

### Why not a food or drink app?

In [36]:
for app in apple_store_final:
    genre = app[-5]
    if genre == "Food & Drink":
        print(app[1], ": ", app[5])

Starbucks :  303856
Domino's Pizza USA :  258624
OpenTable - Restaurant Reservations :  113936
Allrecipes Dinner Spinner :  109349
DoorDash - Food Delivery :  25947
UberEATS: Uber for Food Delivery :  17865
Postmates - Food Delivery, Faster :  9519
Dunkin' Donuts - Get Offers, Coupons & Rewards :  9068
Chick-fil-A :  5665
McDonald's :  4050
Deliveroo: Restaurant Delivery - Order Food Nearby :  1702
SONIC Drive-In :  1645
Nowait Guest :  1625
7-Eleven, Inc. :  1356
Outback :  805
Bon Appetit :  750
Starbucks Keyboard :  457
Whataburger :  197
Delish Eatmoji Keyboard :  154
Lieferheld - Delicious food delivery service :  29
Lieferando.de :  29
McDo France :  22
Chefkoch - Rezepte, Kochen, Backen & Kochbuch :  20
Youmiam :  9
Marmiton Twist :  2
Open Food Facts :  1


As we can see, the most popular Food & Drink apps belong to companies like Starbucks and Domino's Pizza. This means that creating a popular and profitable Food & Drink app might require running an actual cooking and delivery service.
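One way to quantify how much a genre depends on a few big names is to compute the share of all ratings held by its largest apps. The sketch below does this with a hypothetical helper, `top_k_share`. The numbers are the 26 Food & Drink rating counts printed above.

```python
def top_k_share(values, k):
    """Fraction of the total contributed by the k largest values."""
    total = sum(values)
    if total == 0:
        return 0.0
    return sum(sorted(values, reverse=True)[:k]) / total


# Rating counts of the 26 Food & Drink apps listed above
ratings = [303856, 258624, 113936, 109349, 25947, 17865, 9519, 9068, 5665,
           4050, 1702, 1645, 1625, 1356, 805, 750, 457, 197, 154, 29, 29,
           22, 20, 9, 2, 1]
print(round(top_k_share(ratings, 2), 2))  # 0.65
```

With these numbers, Starbucks and Domino's Pizza together account for about 65% of all ratings in the Food & Drink genre.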

## Most popular apps by genre on Google Play

In [37]:
categories_google_play = freq_table(google_play_final, 1) # Categories

In [38]:
categories_google_play

{'ART_AND_DESIGN': 0.6430505415162455,
 'AUTO_AND_VEHICLES': 0.9250902527075812,
 'BEAUTY': 0.5979241877256317,
 'BOOKS_AND_REFERENCE': 2.1435018050541514,
 'BUSINESS': 4.591606498194946,
 'COMICS': 0.6204873646209386,
 'COMMUNICATION': 3.2378158844765346,
 'DATING': 1.861462093862816,
 'EDUCATION': 1.1620036101083033,
 'ENTERTAINMENT': 0.9589350180505415,
 'EVENTS': 0.7107400722021661,
 'FINANCE': 3.7003610108303246,
 'FOOD_AND_DRINK': 1.2409747292418771,
 'HEALTH_AND_FITNESS': 3.0798736462093865,
 'HOUSE_AND_HOME': 0.8235559566787004,
 'LIBRARIES_AND_DEMO': 0.9363718411552346,
 'LIFESTYLE': 3.9034296028880866,
 'GAME': 9.724729241877256,
 'FAMILY': 18.907942238267147,
 'MEDICAL': 3.531137184115524,
 'SOCIAL': 2.6624548736462095,
 'SHOPPING': 2.2450361010830324,
 'PHOTOGRAPHY': 2.944494584837545,
 'SPORTS': 3.395758122743682,
 'TRAVEL_AND_LOCAL': 2.33528880866426,
 'TOOLS': 8.461191335740072,
 'PERSONALIZATION': 3.3167870036101084,
 'PRODUCTIVITY': 3.892148014440433,
 'PARENTING': 0.6

In [39]:
for category in categories_google_play:
    total = 0 
    len_category = 0 
    for app in google_play_final:
        category_app = app[1]
        if category_app == category: 
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    average_n_installs = total / len_category
    print(category, ": ", average_n_installs)

ART_AND_DESIGN :  1986335.0877192982
AUTO_AND_VEHICLES :  647317.8170731707
BEAUTY :  513151.88679245283
BOOKS_AND_REFERENCE :  8767811.894736841
BUSINESS :  1712290.1474201474
COMICS :  817657.2727272727
COMMUNICATION :  38456119.167247385
DATING :  854028.8303030303
EDUCATION :  1833495.145631068
ENTERTAINMENT :  11640705.88235294
EVENTS :  253542.22222222222
FINANCE :  1387692.475609756
FOOD_AND_DRINK :  1924897.7363636363
HEALTH_AND_FITNESS :  4188821.9853479853
HOUSE_AND_HOME :  1331540.5616438356
LIBRARIES_AND_DEMO :  638503.734939759
LIFESTYLE :  1437816.2687861272
GAME :  15588015.603248259
FAMILY :  3695641.8198090694
MEDICAL :  120550.61980830671
SOCIAL :  23253652.127118643
SHOPPING :  7036877.311557789
PHOTOGRAPHY :  17840110.40229885
SPORTS :  3638640.1428571427
TRAVEL_AND_LOCAL :  13984077.710144928
TOOLS :  10801391.298666667
PERSONALIZATION :  5201482.6122448975
PRODUCTIVITY :  16787331.344927534
PARENTING :  542603.6206896552
WEATHER :  5074486.197183099
VIDEO_PLAYERS 
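The same cleanup of the **Installs** strings is needed wherever installs are averaged. A hypothetical helper, `parse_installs`, could keep it in one place. The trailing '+' marks an open-ended bucket, so the parsed value is a lower bound rather than an exact install count.

```python
def parse_installs(installs):
    """Convert an Installs bucket such as '1,000,000+' into a number.

    The '+' marks an open-ended bucket, so the result is a lower bound,
    not an exact install count.
    """
    return float(installs.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000,000+'))  # 1000000000.0
print(parse_installs('10,000+'))         # 10000.0
```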

Below are the categories sorted from the highest to the lowest average number of installs per app. The **Installs** column contains open-ended buckets such as '1,000,000+', so these averages are only approximate:   
  
COMMUNICATION : 38,456,119.17  
VIDEO_PLAYERS : 24,727,872.45  
SOCIAL : 23,253,652.13  
PHOTOGRAPHY : 17,840,110.40  
PRODUCTIVITY : 16,787,331.34  
GAME : 15,588,015.60  
TRAVEL_AND_LOCAL : 13,984,077.71  
ENTERTAINMENT : 11,640,705.88  
TOOLS : 10,801,391.30  
NEWS_AND_MAGAZINES : 9,549,178.47  
BOOKS_AND_REFERENCE : 8,767,811.89  
SHOPPING : 7,036,877.31  
PERSONALIZATION : 5,201,482.61  
WEATHER : 5,074,486.20  
HEALTH_AND_FITNESS : 4,188,821.99  
MAPS_AND_NAVIGATION : 4,056,941.77  
FAMILY : 3,695,641.82  
SPORTS : 3,638,640.14  
ART_AND_DESIGN : 1,986,335.09  
FOOD_AND_DRINK : 1,924,897.74  
EDUCATION : 1,833,495.15  
BUSINESS : 1,712,290.15  
LIFESTYLE : 1,437,816.27  
FINANCE : 1,387,692.48  
HOUSE_AND_HOME : 1,331,540.56  
DATING : 854,028.83  
COMICS : 817,657.27  
AUTO_AND_VEHICLES : 647,317.82  
LIBRARIES_AND_DEMO : 638,503.73  
PARENTING : 542,603.62  
BEAUTY : 513,151.89  
EVENTS : 253,542.22  
MEDICAL : 120,550.62  

As we can see, the category with the greatest average number of installs per app is COMMUNICATION.

In [40]:
for app in google_play_final:
    if app[1] == "COMMUNICATION":
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

### Why not a communication app?

Communication apps are dominated by giants like WhatsApp and Messenger. 
It would be very difficult to create a communication app that beats these giants and becomes one of the most popular in its category.   
  
Even if we entered this market, a new app would probably win only a small share of users, so it would be hard to earn much from in-app ads.  
  
In this category, the few giants mentioned above heavily skew the average number of installs upward. This means that communication apps seem more popular than they really are.   
  
We can remove all communication apps with 100 million installs or more. The average then drops roughly tenfold, from about 38.5 million to about 3.6 million:

In [41]:
under_100_m = []

for app in google_play_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386
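The 100-million cutoff above is one way to limit the influence of a few giants. A common alternative, not used in this analysis, is to compare the mean with the median, because the median is barely affected by a handful of extreme values. The install counts below are toy numbers, not taken from the dataset.

```python
from statistics import mean, median

# Toy install counts: several mid-sized apps plus two giants
installs = [1_000_000, 5_000_000, 5_000_000, 10_000_000, 10_000_000,
            500_000_000, 1_000_000_000]

print(mean(installs))    # pulled far up by the two giants
print(median(installs))  # 10000000
```

Applied to the install counts of each category, a mean far above the median would flag the same giant-dominated categories discussed here.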

We see the same pattern in the video players category, the runner-up with an average of 24,727,872 installs per app. This market is dominated by apps like YouTube, Google Play Movies & TV, and MX Player. The pattern repeats for social apps (with giants like Facebook, Instagram, and Google+), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

### Why a Travel Together App?

As I suggested above, it might be a good idea to create an app that connects two categories.  
  
A travel together app combines practical purposes with fun. Moreover, people often spend a lot of time planning their trips, so it might be easier to earn substantial revenue from in-app ads.   
  
Traveling involves booking accommodation, finding transport to the destination, and looking for great restaurants and attractions. For that reason, many hotels, restaurants, transport companies, etc. might decide to collaborate with us in exchange for advertising.

## Conclusions

In this project, I analyzed data about App Store and Google Play mobile apps. The goal was to recommend an app profile that can be profitable on both markets.  
  
I found that many categories are dominated by a few giant apps with huge numbers of installs and users, so it might be very hard for a new app to break through.  
  
I concluded that a good idea might be to connect two popular categories, for example social networking with navigation or tools.  
  
The social networking category is mainly about fun, and it is dominated by a few giant apps. However, we can create an app that also has practical purposes, such as planning a trip, navigating to the destination, arranging transport and accommodation, or creating a travel group.  
  
So I suggest creating a Travel Together App. It would also allow us to collaborate with many companies that offer accommodation, transport, attractions, and food in exchange for advertising.