# App Profile Recommendation.

This project analyses app data to help the developers of a company that builds Android and iOS mobile apps understand what types of apps are likely to attract more users.

The company only builds apps that are free to download and install, and its main source of revenue is in-app ads. This means that the number of users of an app determines the revenue it generates: the more users who see and engage with the ads, the better.

### Packages import section.

In [1]:
import csv

### Opening and Exploring the data.
We use the `extract_data()` function to access the file and save it in our code as a list of lists. Then we use the `explore_data()` function to explore the data.

In [2]:
# open the file and return its content as a list of lists
def extract_data(data_set):
    # the with-statement closes the file once it has been read
    with open(data_set, encoding='utf8') as opened_file:
        reader = list(csv.reader(opened_file))

    return reader

# display the rows of a dataset from a starting index to an ending index.
# the caller can choose to also show the number of rows and columns.
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        # adds a new (empty) line after each row
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        print('\n')

*iOS apps*

Opening and exploring apps from AppleStore.csv.

In [3]:
ios_apps = extract_data("AppleStore.csv")

explore_data(ios_apps, 0, 1, rows_and_columns=True)
explore_data(ios_apps, 1, 5)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Number of rows: 7198
Number of columns: 16


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']




As part of our analysis we will focus on the following columns : 'track_name', 'price', 'rating_count_tot', 'user_rating', 'prime_genre'.
The documentation about this dataset can be found [here](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps).

*Android apps*

Opening and exploring apps from googleplaystore.csv.

In [4]:
android_apps = extract_data("googleplaystore.csv")

explore_data(android_apps, 0, 1, rows_and_columns=True)
explore_data(android_apps, 1, 5)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Number of rows: 10842
Number of columns: 13


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']




As part of our analysis we will focus on the following columns: 'App', 'Category', 'Reviews', 'Type', 'Price', 'Genres'. The documentation about this dataset can be found [here](https://www.kaggle.com/datasets/lava18/google-play-store-apps).

### Data cleaning

##### Deleting rows with missing entries.

*Android apps*

According to [one of the discussions](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) about the Google Play dataset, the entry at index 10472 has a missing value. Let's check it and, if this is true, delete that row.
Since we don't know whether the user who reported the error had removed the header row, we will check rows **10472** and **10473**.

In [5]:
print(android_apps[10472])
print(android_apps[10473])

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


After checking, we noticed that, at row 10473 in our dataset, the **category** information is missing. We'll therefore delete that row. 

In [6]:
del android_apps[10473]

*iOS apps*

No missing entries were noticed in that dataset.
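
The manual check above can be generalized. The following is a minimal sketch, assuming a hypothetical helper named `find_malformed_rows` (it is not part of the original notebook), that reports every row whose length differs from the header's. It could be run on both `android_apps` and `ios_apps` before deleting anything; the sample below is toy data.

```python
def find_malformed_rows(dataset):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(dataset[0])  # the header row defines the column count
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# Toy data: the last row is missing its 'Category' value.
sample = [
    ['App', 'Category', 'Rating'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
print(find_malformed_rows(sample))  # [2]
```

A row with a shifted column always has the wrong length, so this check catches the kind of error found at index 10473 without relying on a forum report.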

##### Removing duplicate rows.

*Android apps*

Now, let's check whether the Google Play dataset has duplicate rows.
To do that, we will do the following:
* Create two lists: one for storing the names of duplicate apps, and one for storing the names of unique apps.
* Loop through the Android dataset (the Google Play dataset), and for each iteration:
    * Save the app name to a variable named `name`.
    * If `name` is already in the `unique_apps` list, append `name` to the `duplicate_apps` list.
    * Else (if `name` isn't already in the `unique_apps` list), append `name` to the `unique_apps` list.

In [7]:
duplicate_apps = []
unique_apps = []

for app in android_apps:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(f"Number of duplicate apps : {len(duplicate_apps)}")

Number of duplicate apps : 1181
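
Looking up a name in a list gets slower as the list grows. As an alternative, here is a minimal sketch that counts the surplus rows with `collections.Counter`; the helper name `count_duplicates` and the sample rows are illustrative assumptions.

```python
from collections import Counter


def count_duplicates(dataset, name_index=0):
    """Count the rows that repeat an app name that has already been seen."""
    counts = Counter(row[name_index] for row in dataset)
    # an app listed n times contributes n - 1 duplicate rows
    return sum(n - 1 for n in counts.values() if n > 1)


sample = [['Slack'], ['Box'], ['Slack'], ['Slack'], ['Zenefits']]
print(count_duplicates(sample))  # 2
```

This returns the same quantity as `len(duplicate_apps)` in the loop above, because each extra occurrence of a name is counted once.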


The `duplicate_apps` list is not empty, which shows that `android_apps` contains duplicate apps. Let's look at some of them.

In [8]:
print(f"Examples of duplicate apps: {duplicate_apps[:15]}")

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Let's take 1 or 2 of these apps, and see their structure.

In [9]:
for app in android_apps:
    name = app[0]
    if name == 'Slack':
        print(app)

print()

for app in android_apps:
    name = app[0]
    if name == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+

We notice that the main difference within each group of rows is in the fourth position of each row, which corresponds to the number of reviews.

We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app.

To remove the duplicates, we will do the following:
* Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
* Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

Once the header row and the faulty row are set aside, 10840 rows remain, so we expect a dictionary with a length of 10840 - 1181 = 9659.

In [10]:
reviews_max = {}
for app in android_apps[1:]:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews :
        reviews_max[name] = n_reviews
    elif name not in reviews_max :
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


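Now let's remove the duplicate rows.
To do that, we create two lists: `android_clean`, which stores our clean dataset, and `already_added`, which stores the names of the apps already added to `android_clean`. The `already_added` check matters when several rows of the same app share the highest number of reviews: only the first of them is kept.
`len(android_clean)` should be 9659.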

In [11]:
# store our new cleaned data set
android_clean = []
# store app names
already_added = []
for app in android_apps[1:]:
    name = app[0]
    n_reviews = float(app[3])

    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))
    

9659
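
The two passes above (building `reviews_max`, then filtering) can also be merged. This is a minimal sketch of a single-pass variant, assuming a hypothetical helper `keep_most_reviewed` and toy rows; it keeps, for each name, the first row that has the highest review count.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the earliest row when review counts are tied
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows with the review count at index 3, as in the Google Play dataset.
sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159494'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]
for row in keep_most_reviewed(sample):
    print(row)
```

One difference to keep in mind: the result is ordered by the first appearance of each name, so the row order may differ from `android_clean` even though the same rows are kept.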


*iOS apps*

We don't need to remove the duplicate app entries because there are no duplicates. This dataset uses a unique id to identify each entry.

##### Removing non-English apps

Our company targets an English-speaking audience, so let's remove non-English apps.  
To do that, we will remove each app whose name contains more than 3 characters that aren't commonly used in English text. Each character in a string has a corresponding number associated with it. The numbers corresponding to the characters commonly used in English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system.  
We first build a function that detects whether a name is made up of common English characters. We can get the number corresponding to each character using the built-in `ord()` function.

In [12]:
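# take a string and return False when it contains more than 3 non-ASCII characters,
# else return True.
# Note: the limit is checked on the character that follows the fourth non-ASCII one,
# so a name that ends with its fourth non-ASCII character still returns True.
# This function is also used for the iOS apps.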
def english(a_string):
    non_english_char = 0
    for char in a_string:
        if ord(char) > 127 and non_english_char <= 3:
            non_english_char += 1
        elif non_english_char > 3:
            return False
    else:
        return True

# Testing that the function works well.
print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))

True
False
True
True
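
One edge case is worth noting: in `english`, the limit is only checked when a further character is read, so a name whose fourth non-ASCII character is also its last one (for example `快看漫画`) is still accepted. Below is a minimal sketch of a stricter variant, assuming a hypothetical name `is_english` and a hypothetical `max_non_ascii` parameter. The counts reported in this notebook were produced with the original `english` function, so they would likely change if this variant were used instead.

```python
def is_english(a_string, max_non_ascii=3):
    """Return False as soon as more than max_non_ascii non-ASCII characters are seen."""
    non_ascii = 0
    for char in a_string:
        if ord(char) > 127:
            non_ascii += 1
            # check the limit immediately, not on the next character
            if non_ascii > max_non_ascii:
                return False
    return True


print(is_english('Instagram'))                      # True
print(is_english('Docs To Go™ Free Office Suite'))  # True
print(is_english('Instachat 😜'))                   # True
print(is_english('快看漫画'))                        # False
```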


*Android apps*

Now let's use the `english` function to filter our Android dataset.

In [13]:
android_english_apps = []
for app in android_clean:
    name = app[0]
    if english(name):
        android_english_apps.append(app)

print(len(android_english_apps))
print(android_english_apps[:5])

9616
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


*iOS apps*

As we did for the Android apps, we will use the `english` function to filter the iOS dataset.

In [14]:
ios_english_apps = []
for app in ios_apps[1:]:
    name = app[1]
    if english(name):
        ios_english_apps.append(app)

print(len(ios_english_apps))
print(ios_english_apps[:5])

6226
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]


##### Isolating the Free Apps

Our company builds apps that are free to download and install, so we will remove the apps that aren't free.

*Android apps*

In [15]:
cleaned_android = []
for app in android_english_apps:
    app_type = app[6]

    if app_type == 'Free':
        cleaned_android.append(app)

print(len(cleaned_android))
print(cleaned_android[:5])

8865
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


*iOS apps*

In [16]:
cleaned_ios = []
for app in ios_english_apps:
    price = app[4]

    if price == '0.0':
        cleaned_ios.append(app)

print(len(cleaned_ios))
print(cleaned_ios[:5])

3253
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]
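
The two filters above rely on different columns (`Type` for Android, `price` for iOS) and on exact string matches. Here is a minimal sketch of a numeric check that could serve both price columns, assuming a hypothetical helper `is_free`. The handling of a leading `$` is an assumption about how paid prices may be written in the raw data; it is not shown in the rows above.

```python
def is_free(price):
    """Return True when a price string such as '0' or '0.0' equals zero."""
    # Assumption: paid prices may carry a leading '$', e.g. '$4.99'.
    return float(price.strip().lstrip('$')) == 0.0


print(is_free('0'))      # True  (Google Play style)
print(is_free('0.0'))    # True  (App Store style)
print(is_free('$4.99'))  # False
```

Comparing numbers instead of strings avoids missing a free app whose price happens to be written as `0.00` rather than `0.0`.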


### Most Common Genre in each market (App Store and Google Play markets)

As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users on both Google Play and the App Store, because the number of people using our apps affects our revenue.  

To minimize risks and overhead, our validation strategy for an app idea has three steps:
* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we develop it further.
* If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

To find the kind of app that fits our interest, we will begin the analysis by determining the most common genres in each market. For this, we'll need to build frequency tables for a few columns in our datasets.   
For the Android dataset, we'll build frequency tables for the `Genres` and `Category` columns; for the iOS dataset, we'll use the `prime_genre` column.

We'll build two functions we can use to analyze the frequency tables:
* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in a descending order


In [17]:
# generate frequency tables that show percentages
def freq_table(dataset, index):
    table = {}
    for row in dataset:
        key = row[index]
        if key in table:
            table[key] += 1
        else:
            table[key] = 1

    for key in table:
        table[key] = table[key] * 100 / len(dataset)

    return table

# function we can use to display the percentages in a descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_sorted = sorted(table.items(), reverse = True, key = lambda kv : kv[1])
    for key, value in table_sorted:
        print(f"{key} : {value}")
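
For comparison, the same percentages can be obtained with `collections.Counter`, which also handles the descending sort. This is a minimal sketch on toy data; `freq_table_counter` is a hypothetical name, not a function used elsewhere in this notebook.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage}, ordered from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() yields (value, count) pairs sorted by descending count
    return {key: count * 100 / total for key, count in counts.most_common()}


sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, pct in freq_table_counter(sample, 1).items():
    print(f"{genre} : {pct:.2f}")
# Games : 75.00
# Music : 25.00
```

Formatting with `:.2f` is a display choice only; the stored percentages keep their full precision.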

*Android apps*

Let's use our `display_table` function to display the frequency tables of the `Genres` and `Category` columns.

In [18]:
# display frequency table of Genres
display_table(cleaned_android, -4)

Tools : 8.448956570783983
Entertainment : 6.068809926677947
Education : 5.346869712351946
Business : 4.591088550479413
Lifestyle : 3.902989283699944
Productivity : 3.8917089678510997
Finance : 3.699943598420756
Medical : 3.5307388606880994
Sports : 3.4630569655950367
Personalization : 3.3164128595600677
Communication : 3.2374506486181613
Action : 3.102086858432036
Health & Fitness : 3.0795262267343486
Photography : 2.9441624365482233
News & Magazines : 2.7975183305132543
Social : 2.662154540327129
Travel & Local : 2.323745064861816
Shopping : 2.2447828539199097
Books & Reference : 2.1545403271291597
Simulation : 2.0417371686407217
Dating : 1.8612521150592216
Arcade : 1.849971799210378
Video Players & Editors : 1.7710095882684715
Casual : 1.7597292724196278
Maps & Navigation : 1.3987591652566271
Food & Drink : 1.2408347433728144
Puzzle : 1.1280315848843767
Racing : 0.9926677946982515
Libraries & Demo : 0.9362662154540327
Role Playing : 0.9362662154540327
Auto & Vehicles : 0.924985899605

In [19]:
# display frequency table of Category
display_table(cleaned_android, 1)

FAMILY : 18.89452904681331
GAME : 9.723632261703328
TOOLS : 8.460236886632826
BUSINESS : 4.591088550479413
LIFESTYLE : 3.9142695995487875
PRODUCTIVITY : 3.8917089678510997
FINANCE : 3.699943598420756
MEDICAL : 3.5307388606880994
SPORTS : 3.395375070501974
PERSONALIZATION : 3.3164128595600677
COMMUNICATION : 3.2374506486181613
HEALTH_AND_FITNESS : 3.0795262267343486
PHOTOGRAPHY : 2.9441624365482233
NEWS_AND_MAGAZINES : 2.7975183305132543
SOCIAL : 2.662154540327129
TRAVEL_AND_LOCAL : 2.33502538071066
SHOPPING : 2.2447828539199097
BOOKS_AND_REFERENCE : 2.1545403271291597
DATING : 1.8612521150592216
VIDEO_PLAYERS : 1.793570219966159
MAPS_AND_NAVIGATION : 1.3987591652566271
FOOD_AND_DRINK : 1.2408347433728144
EDUCATION : 1.161872532430908
ENTERTAINMENT : 0.9588268471517203
LIBRARIES_AND_DEMO : 0.9362662154540327
AUTO_AND_VEHICLES : 0.924985899605189
HOUSE_AND_HOME : 0.8234630569655951
WEATHER : 0.8009024252679074
EVENTS : 0.7106598984771574
PARENTING : 0.6542583192329385
ART_AND_DESIGN : 0.

In the Google Play Store, `FAMILY` is the most common genre if we consider the `Category` column, and `Tools` is the most common if we consider the `Genres` column. Here the frequencies are fairly balanced: no category accounts for more than 19% of the apps.
This ranking, although useful, doesn't tell us whether apps in these genres are popular. Many of the available apps belong to these genres, but do they attract many users?

*iOS apps*  
We display the frequency table of the `prime_genre` column.

In [20]:
display_table(cleaned_ios, -5)

Games : 57.854288349216105
Entertainment : 7.931140485705503
Photo & Video : 4.918536735321242
Education : 3.627420842299416
Social Networking : 3.258530587150323
Shopping : 2.6744543498309254
Utilities : 2.5207500768521367
Sports : 2.1211189671072854
Music : 2.028896403320012
Health & Fitness : 1.9981555487242546
Productivity : 1.7522287119581925
Lifestyle : 1.5985244389794036
News : 1.3218567476175838
Travel : 1.291115893021826
Finance : 1.2603750384260684
Weather : 0.8914847832769751
Food & Drink : 0.8914847832769751
Reference : 0.5533353827236397
Business : 0.5533353827236397
Book : 0.4611128189363664
Navigation : 0.18444512757454656
Medical : 0.18444512757454656
Catalogs : 0.12296341838303104


In the App Store, `Games` is the most common genre, with nearly 58% of the apps in our cleaned dataset belonging to it. It's followed by `Entertainment` and `Photo & Video`, with shares of nearly 8% and 5% respectively. The gap between the first genre and the second is huge.  
Apps designed for entertainment dominate this ranking.  
This ranking, although useful, doesn't tell us whether apps in the `Games` genre are popular. Many of the available apps are in this genre, but do they attract many users?

### Most Popular Genre in each Market
We will find what genres are the most popular (have the most users) by calculating the average number of installs for each app genre.  
For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.  
To calculate the average number of user ratings per app genre, we'll do the following :
* Isolate the apps of each genre
* Add up the user ratings for the apps of that genre
* Divide the sum by the number of apps belonging to that genre (not by the total number of apps)

*iOS apps*

In [21]:
# generate a frequency table for the prime_genre column
table_genre = freq_table(cleaned_ios, -5)
table_user_ratings_number = {}
for genre in table_genre:
    total = 0
    len_genre = 0
    for row in cleaned_ios:
        genre_app = row[-5]
        if genre_app == genre :
            total += float(row[5])
            len_genre += 1
    
    avg_user_ratings_number = total / len_genre
    table_user_ratings_number[genre] = avg_user_ratings_number

table_user_ratings_number_sorted = sorted(table_user_ratings_number.items(), reverse = True, 
                                          key = lambda kv : kv[1])
for key, value in table_user_ratings_number_sorted:
    print(f"{key}: {value}")

Navigation: 86090.33333333333
Reference: 74942.11111111111
Social Networking: 71548.34905660378
Music: 57326.530303030304
Weather: 50477.137931034486
Book: 37217.73333333333
Food & Drink: 29885.758620689656
Photo & Video: 28441.54375
Finance: 27638.243902439026
Travel: 26925.166666666668
Shopping: 25996.32183908046
Health & Fitness: 23298.015384615384
Sports: 23008.898550724636
Games: 22691.801806588734
News: 21248.023255813954
Productivity: 20702.19298245614
Utilities: 18460.353658536584
Lifestyle: 16168.73076923077
Entertainment: 13831.282945736433
Business: 7075.333333333333
Education: 7003.983050847458
Catalogs: 4004.0
Medical: 612.0
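
The cell above rescans the whole dataset once per genre. A single pass is enough if the totals and counts are accumulated together. Below is a minimal sketch, assuming a hypothetical helper `average_by_key`; the toy rows use a simplified column layout, not the real App Store layout.

```python
from collections import defaultdict


def average_by_key(dataset, key_index, value_index):
    """Average the numeric column value_index for each distinct key, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        key = row[key_index]
        totals[key] += float(row[value_index])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy rows: id, name, genre, number of ratings.
sample = [
    ['1', 'Waze', 'Navigation', '345046'],
    ['2', 'Google Maps', 'Navigation', '154911'],
    ['3', 'Kindle', 'Book', '252076'],
]
print(average_by_key(sample, 2, 3))
```

With the real data, the call would presumably be `average_by_key(cleaned_ios, -5, 5)`, matching the indexes used in the cell above.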


`Navigation` is the most popular genre in the App Store, followed by `Reference` and `Social Networking`.  
Let's check which apps are the most popular in these genres.

In [22]:
apps = {}
for row in cleaned_ios:
    genre_app = row[-5]
    app_name = row[1]
    nb_of_ratings = row[5]
    if genre_app == 'Navigation':
        apps[app_name] = nb_of_ratings

for key, value in apps.items():
    print(f"{key}: {value}")
        

Waze - GPS Navigation, Maps & Real-time Traffic: 345046
Google Maps - Navigation & Transit: 154911
Geocaching®: 12811
CoPilot GPS – Car Navigation & Offline Maps: 3582
ImmobilienScout24: Real Estate Search in Germany: 187
Railway Route Search: 5


The popularity of the `Navigation` genre is mainly driven by one app: `Waze - GPS Navigation, Maps & Real-time Traffic`. Likewise, the popularity of `Reference` is mainly due to `Bible`. In the three most popular genres, the popularity comes from a few apps, so the averages hide large disparities.
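
One way to see how much a single app skews a genre is to compare the mean with the median, which is less sensitive to outliers. This is a small sketch using the six `Navigation` rating counts printed above; using the median is a suggestion, not something the original analysis does.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(f"mean:   {mean(navigation_ratings):.1f}")    # mean:   86090.3
print(f"median: {median(navigation_ratings):.1f}")  # median: 8196.5
```

The mean is about ten times the median, which confirms that the genre's average is carried by its two largest apps.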

In [23]:
apps = {}
for row in cleaned_ios:
    genre_app = row[-5]
    app_name = row[1]
    nb_of_ratings = row[5]
    if genre_app == 'Book':
        apps[app_name] = nb_of_ratings

for key, value in apps.items():
    print(f"{key}: {value}")

Kindle – Read eBooks, Magazines & Textbooks: 252076
Audible – audio books, original series & podcasts: 105274
Color Therapy Adult Coloring Book for Adults: 84062
OverDrive – Library eBooks and Audiobooks: 65450
HOOKED - Chat Stories: 47829
快看漫画: 1647
BookShout: Read eBooks & Track Your Reading Goals: 879
Dr. Seuss Treasury — 50 best kids books: 451
Green Riding Hood: 392
Weirdwood Manor: 197
MangaZERO - comic reader: 9
ikouhoushi: 0
MangaTiara - love comic reader: 0
謎解き: 0
謎解き2016: 0


Here too, apps for reading are very popular.
One option is to create a digital version of a popular book. And since religious books seem to be popular, it could be a digital version of a well-known religious book.

*Android apps*

In [24]:
# generate a frequency table for the Category column
table_category = freq_table(cleaned_android, 1)
table_install_number = {}
for category in table_category:
    total = 0
    len_category = 0
    for row in cleaned_android:
        category_app = row[1]
        if category_app == category :
            total += float(row[5].strip("+").replace(",",""))
            len_category += 1
    
    avg_install_number = total / len_category
    table_install_number[category] = avg_install_number

table_install_number_sorted = sorted(table_install_number.items(), reverse = True, 
                                          key = lambda kv : kv[1])
for key, value in table_install_number_sorted:
    print(f"{key}: {value}")

COMMUNICATION: 38456119.167247385
VIDEO_PLAYERS: 24727872.452830188
SOCIAL: 23253652.127118643
PHOTOGRAPHY: 17840110.40229885
PRODUCTIVITY: 16787331.344927534
GAME: 15588015.603248259
TRAVEL_AND_LOCAL: 13984077.710144928
ENTERTAINMENT: 11640705.88235294
TOOLS: 10801391.298666667
NEWS_AND_MAGAZINES: 9549178.467741935
BOOKS_AND_REFERENCE: 8721959.47643979
SHOPPING: 7036877.311557789
PERSONALIZATION: 5201482.6122448975
WEATHER: 5074486.197183099
HEALTH_AND_FITNESS: 4188821.9853479853
MAPS_AND_NAVIGATION: 4056941.7741935486
FAMILY: 3697848.1731343283
SPORTS: 3638640.1428571427
ART_AND_DESIGN: 1986335.0877192982
FOOD_AND_DRINK: 1924897.7363636363
EDUCATION: 1833495.145631068
BUSINESS: 1712290.1474201474
LIFESTYLE: 1433701.5244956773
FINANCE: 1387692.475609756
HOUSE_AND_HOME: 1331540.5616438356
DATING: 854028.8303030303
COMICS: 817657.2727272727
AUTO_AND_VEHICLES: 647317.8170731707
LIBRARIES_AND_DEMO: 638503.734939759
PARENTING: 542603.6206896552
BEAUTY: 513151.88679245283
EVENTS: 253542.222

`COMMUNICATION` is the most popular category, followed by `VIDEO_PLAYERS` and `SOCIAL`.
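
The `Installs` values are open-ended buckets such as `10,000+`, so the averages above are lower-bound estimates rather than exact counts. The conversion used in the cell above can be isolated in a small helper; `parse_installs` is a hypothetical name introduced for this sketch.

```python
def parse_installs(installs):
    """Convert a bucket label such as '5,000,000+' to its lower bound as an int."""
    return int(installs.replace(',', '').rstrip('+'))


for label in ['10,000+', '5,000,000+', '1,000,000,000+']:
    print(label, '->', parse_installs(label))
```

Returning an `int` rather than a `float` keeps the values exact, which is convenient if they are later compared or grouped.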

In [25]:
apps = {}
for row in cleaned_android:
    category = row[1]
    app_name = row[0]
    nb_of_install = row[5]
    if category == 'COMMUNICATION':
        apps[app_name] = nb_of_install

for key, value in apps.items():
    print(f"{key}: {value}")

WhatsApp Messenger: 1,000,000,000+
Messenger for SMS: 10,000,000+
My Tele2: 5,000,000+
imo beta free calls and text: 100,000,000+
Contacts: 50,000,000+
Call Free – Free Call: 5,000,000+
Web Browser & Explorer: 5,000,000+
Browser 4G: 10,000,000+
MegaFon Dashboard: 10,000,000+
ZenUI Dialer & Contacts: 10,000,000+
Cricket Visual Voicemail: 10,000,000+
TracFone My Account: 1,000,000+
Xperia Link™: 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard: 10,000,000+
Skype Lite - Free Video Call & Chat: 5,000,000+
My magenta: 1,000,000+
Android Messages: 100,000,000+
Google Duo - High Quality Video Calls: 500,000,000+
Seznam.cz: 1,000,000+
Antillean Gold Telegram (original version): 100,000+
AT&T Visual Voicemail: 10,000,000+
GMX Mail: 10,000,000+
Omlet Chat: 10,000,000+
My Vodacom SA: 5,000,000+
Microsoft Edge: 5,000,000+
Messenger – Text and Video Chat for Free: 1,000,000,000+
imo free video calls and chat: 500,000,000+
Calls & Text by Mo+: 5,000,000+
free video calls and chat: 50,000

Here again, the disparities among apps in the most popular category are too large. The market is largely dominated by big companies. Since we already recommended creating an app in the `Book` genre for the App Store, let's check the corresponding category here.

In [26]:
apps = {}
for row in cleaned_android:
    category = row[1]
    app_name = row[0]
    nb_of_install = row[5]
    if category == 'BOOKS_AND_REFERENCE':
        apps[app_name] = nb_of_install

for key, value in apps.items():
    print(f"{key}: {value}")


It looks like there are only a few very popular apps, so this market still shows potential.   
Taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

### Conclusion

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.