# Profitable App Profiles for the App Store and Google Play Markets
Our project objective is to identify profitable mobile app profiles suitable for both the App Store and Google Play markets. As data analysts within a company specializing in Android and iOS app development, our role is to empower our developers with data-driven insights for strategic decision-making.

In our company, we exclusively focus on creating apps that are free to download and install, generating revenue primarily through in-app advertisements. The success of each app is closely tied to user engagement. Therefore, our aim in this analysis is to provide valuable data that guides our developers in creating apps likely to attract a larger user base.

### Opening and Exploring the Data
As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play. Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost.


In [1]:
#opening datasets
from csv import reader
#open Google Store dataset
with open("googleplaystore.csv", encoding="utf8") as google_file:
    read_file = reader(google_file)
    android = list(read_file)
    android_header = android[0]
    android = android[1:]

#open AppStore dataset
with open("AppleStore.csv", encoding="utf8") as app_file:
    read_file = reader(app_file)
    ios = list(read_file)
    ios_header = ios[0]
    ios = ios[1:]
    



To make the datasets easier to explore, we define a function called explore_data(). We can call it repeatedly to print rows in a clear and readable format.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
explore_data(ios,0,3, rows_and_columns=False)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']




In [4]:
explore_data(android,0,3, rows_and_columns=False)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




### Determining the Number of Rows and Columns in the Datasets
- Google Play store: 13 columns, 10841 rows
- App Store: 16 columns, 7197 rows.

In [5]:
#Google Play Store
# Number of columns is the length of the header
num_columns = len(android_header)
print(f'Number of columns:{num_columns}')

# Number of rows is the length of the data
num_rows = len(android)
print(f'Number of rows: {num_rows}')

Number of columns:13
Number of rows: 10841


In [6]:
#Apple Store
# Number of columns is the length of the header
num_columns = len(ios_header)
print(f'Number of columns: {num_columns}')

# Number of rows is the length of the data
num_rows = len(ios)
print(f'Number of rows: {num_rows}')

Number of columns: 16
Number of rows: 7197


### Analyzing Column Names


In [7]:
#Google Play store columns
print(android_header)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


The Google Play dataset comprises 10,841 apps and includes 13 columns. Upon initial inspection, the columns that look most useful for our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'. More information about each column can be found in the dataset [documentation](https://www.kaggle.com/datasets/lava18/google-play-store-apps).

In [8]:
#Appstore Columns
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


There are 7,197 iOS apps within this dataset, featuring columns such as 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre' that appear to be of particular interest. While not all column names are immediately clear, comprehensive details about each column can be referenced in the dataset [documentation](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps).

### Data Cleaning
We need to make sure the data we analyze is accurate, or the results of our analysis will be wrong. This means that we need to:

- Detect inaccurate data, and correct or remove it.
- Detect duplicate data, and remove the duplicates.

Also, our goal is to make _free_ apps for an _English-speaking_ audience. Thus, we need to:
- Remove non-English apps
- Remove paid apps

### Deleting Wrong Data
Row 10472 of the Google Play dataset, which corresponds to the app 'Life Made WI-Fi Touchscreen Photo Frame', contains an error. Its rating is listed as 19, which is clearly wrong because the maximum rating on Google Play is 5.

In [9]:
print(android_header)
print(android[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [10]:
# Using the zip() function to demonstrate that the "Rating" value for entry 10472 is off.
x = zip(android_header, android[10472])
print(list(x))

[('App', 'Life Made WI-Fi Touchscreen Photo Frame'), ('Category', '1.9'), ('Rating', '19'), ('Reviews', '3.0M'), ('Size', '1,000+'), ('Installs', 'Free'), ('Type', '0'), ('Price', 'Everyone'), ('Content Rating', ''), ('Genres', 'February 11, 2018'), ('Last Updated', '1.0.19'), ('Current Ver', '4.0 and up')]


In [11]:
header_len = len(android_header)
for index, row in enumerate(android):
    row_len = len(row)
    if row_len != header_len:
        print(row)
        print('The number of columns for this entry:', row_len)
        print('Index position is:', index)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
The number of columns for this entry: 12
Index position is: 10472


We found that entry __10472__ in the Google Play dataset has only 12 values instead of 13. As the zip() output above shows, the 'Category' value is missing, so every later value is shifted one column to the left, which is why 'Rating' reads 19.
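The same length check can be wrapped in a small reusable helper. The sketch below is only an illustration: `find_malformed_rows` is a hypothetical name, and the sample header and rows are made up rather than taken from the datasets.

```python
def find_malformed_rows(header, rows):
    """Return (index, row_length) pairs for rows whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]


# Made-up sample: the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]

print(find_malformed_rows(sample_header, sample_rows))
```

Returning the indices instead of printing them makes it easier to review every malformed row before deciding which ones to delete.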

### Deleting the Row That Has an Error

In [12]:
print(len(android))
del android[10472]
print(len(android))

10841
10840


### Removing Duplicate Entries
After exploring the Google Play dataset, we notice that it has duplicate entries. For example, _"Instagram"_ appears four times and _"ZOOM Cloud Meetings"_ appears twice.

In [13]:
count = 0
for app in android:
    name = app[0]
    if name == "Instagram":
        count += 1
        print(app)
print(f"There are {count} duplicates.")


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
There are 4 duplicates.


In [14]:
count = 0
for app in android:
    name = app[0]
    if name == "ZOOM Cloud Meetings":
        count += 1
        print(app)
print(f"There are {count} duplicates.")


['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']
['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']
There are 2 duplicates.


In [15]:
duplicate_apps = []
unique_apps = []
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print(f'Number of duplicate apps: {len(duplicate_apps)}')
print('\n')
print(f'Example of duplicate apps: {duplicate_apps[:10]}')

Number of duplicate apps: 1181


Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


The Google Play dataset contains __1181__ duplicate entries. 
We will not remove the duplicates randomly. Instead, we keep the entry with the highest number of reviews: the higher the number of reviews, the more recent the data should be.

In [42]:
count = 0
for app in android:
    name = app[0]
    if name == "Instagram":
        count += 1
        print("The number of reviews:", app[3])

The number of reviews: 66577313
The number of reviews: 66577446
The number of reviews: 66577313
The number of reviews: 66509917


To remove duplicate entries and keep only one entry per app, the one with the highest number of reviews, we can follow these steps:
- Create an empty dictionary to store unique app names as keys and their highest number of reviews as values.
- Iterate through the dataset and, for each app:
    - If the app is not already in the dictionary, add it with its number of reviews as the value.
    - If the app is already in the dictionary, update the value to the maximum of the current number of reviews and the existing value.

Once we have the dictionary, we create a new dataset that keeps, for each unique app name, only the row with the highest number of reviews.
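As an alternative sketch of the steps above, the dictionary can map each app name to its whole best row instead of just the review count, so the clean list comes out of a single pass. The function name `keep_most_reviewed` is an assumption made for illustration, the Instagram rows are shortened to four columns, and 'Example App' is made-up data.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Replace the stored row only when this one has strictly more reviews,
        # so the first of several tied rows is the one that is kept.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Example App', 'TOOLS', '4.0', '120'],   # made-up row
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Example App', 'TOOLS', '4.0', '120'],   # exact duplicate
]

for row in keep_most_reviewed(sample):
    print(row)
```

Because dictionaries preserve insertion order, the result keeps apps in the order in which they first appear in the dataset.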

In [17]:
#Creating dictionary where each key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews


We found earlier that the Google Play dataset has 1181 duplicates, so we inspect the reviews_max dictionary to make sure everything went as expected.


In [18]:
print(f'Expected length: {len(android)-1181}')
print(f'Length of the unique apps dictionary: {len(reviews_max)}')

Expected length: 9659
Length of the unique apps dictionary: 9659


In [19]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

print(android_clean[:5])
print('\n')
print(already_added[:5])
        
    
    


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


['Photo Editor & Candy Camera & Grid & ScrapBook', 'U Launcher Lite – FREE Live Cool Themes, 

In [20]:
explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


### Data Cleaning: Removing Non-English Apps
Our objective is to examine applications intended for an English-speaking audience. Upon closer inspection of the provided datasets, it is evident that both datasets include apps designed for a non-English speaking audience.

In [21]:
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


The first step is to write a function that takes a string and returns _False_ if more than three of its characters fall outside the ASCII range (0-127); otherwise, the function returns _True_. Allowing up to three such characters keeps English names that contain an emoji or a symbol such as ™.

In [22]:
def is_english(data_string):
    count = 0
    for ch in data_string:
        if ord(ch) > 127:
            count += 1
        if count > 3:
            return False
    return True        

Testing the function to check whether these app names are detected as English or non-English:

In [23]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
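To see how much the result depends on the three-character tolerance, the same idea can be written with the threshold as a parameter. This is a sketch only: `is_mostly_ascii` and its `max_non_ascii` parameter are hypothetical names, not part of the analysis above.

```python
def is_mostly_ascii(text, max_non_ascii=3):
    """Return True if at most `max_non_ascii` characters fall outside ASCII (0-127)."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii <= max_non_ascii


# With the default tolerance, a single symbol or emoji is accepted.
print(is_mostly_ascii('Docs To Go™ Free Office Suite'))
# With zero tolerance, the same kind of name is rejected.
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))
```

A stricter threshold would discard English apps whose names contain emoji or trademark symbols, which is why some tolerance is useful here.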


We use the is_english() function to filter out the non-English apps for both data sets:

In [24]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

### Data Cleaning: Isolating the Free Apps
We exclusively develop applications that users can download and install for free. Our primary revenue comes from in-app advertisements. Our datasets include both free and non-free apps, and for our analysis, we need to focus solely on the free apps.

In [25]:
# Google Play: isolating the free apps.
android_free = []
android_not_free = []  # kept only to cross-check the counts
for app in android_english:
    app_type = app[6]  # the 'Type' column, which reads 'Free' for free apps
    if app_type == "Free":
        android_free.append(app)
    else:
        android_not_free.append(app)
        
explore_data(android_free, 0, 2, True)
print('\n')
explore_data(android_not_free, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 8863
Number of columns: 13


['TurboScan: scan documents and receipts in PDF', 'BUSINESS', '4.7', '11442', '6.8M', '100,000+', 'Paid', '$4.99', 'Everyone', 'Business', 'March 25, 2018', '1.5.2', '4.0 and up']


['Tiny Scanner Pro: PDF Doc Scan', 'BUSINESS', '4.8', '10295', '39M', '100,000+', 'Paid', '$4.99', 'Everyone', 'Business', 'April 11, 2017', '3.4.6', '3.0 and up']


Number of rows: 751
Number of columns: 13


The _android_free_ dataset contains __8863__ rows.

In [26]:
print(f'Length of the english only android dataset: {len(android_english)}')
print(f'Length of the english only and free android dataset: {len(android_free)}')
print(f'Length of the english only and not free android dataset: {len(android_not_free)}')


Length of the english only android dataset: 9614
Length of the english only and free android dataset: 8863
Length of the english only and not free android dataset: 751


In [27]:
# App Store: isolating the free apps.
ios_free = []
ios_not_free = []
for app in ios_english:
    price = app[4]
    if price == "0.0":
        ios_free.append(app)
    else:
        ios_not_free.append(app)
        
explore_data(ios_free, 0, 2, True)
print('\n')
explore_data(ios_not_free, 0, 2, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 3222
Number of columns: 16


['362949845', 'Fruit Ninja Classic', '104590336', 'USD', '1.99', '698516', '132', '4.5', '4.0', '2.3.9', '4+', 'Games', '38', '5', '13', '1']


['500116670', 'Clear Vision (17+)', '37879808', 'USD', '0.99', '541693', '69225', '4.5', '4.5', '1.1.3', '17+', 'Games', '43', '5', '1', '1']


Number of rows: 2961
Number of columns: 16


The final iOS dataset (_ios_free_) contains __3222__ rows.

In [28]:
print(f'Length of the english only ios dataset: {len(ios_english)}')
print(f'Length of the english only and free ios dataset: {len(ios_free)}')
print(f'Length of the english only and not free ios dataset: {len(ios_not_free)}')

Length of the english only ios dataset: 6183
Length of the english only and free ios dataset: 3222
Length of the english only and not free ios dataset: 2961
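The filters above compare strings ('Free' in the Google Play Type column, '0.0' in the App Store price column). A possible alternative, sketched here under the assumption that price strings always look like the ones shown in the outputs ('0', '0.0', '$4.99', '1.99'), is to parse the price as a number. The helper name `is_free` is hypothetical.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$4.99' parses to zero."""
    cleaned = price_text.strip().lstrip('$')
    # float() raises ValueError for non-numeric text, which surfaces bad rows
    # instead of silently classifying them as paid.
    return float(cleaned) == 0.0


for price in ['0', '0.0', '$4.99', '1.99']:
    print(price, '->', is_free(price))
```

With this helper, the same test could be applied to the Google Play Price column (index 7) and the App Store price column (index 4), rather than relying on two different string literals.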


### Selecting the Most Common Apps by Genre
Our objective is to identify app types that have a high potential to attract a larger user base, as our revenue is closely tied to app usage. Since our ultimate goal is to launch the app on both Google Play and the App Store, we aim to discover successful app profiles that resonate well with users on both platforms.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

> 1. Build a minimal Android version of the app, and add it to Google Play.
> 2. If the app has a good response from users, we develop it further.
> 3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

We need to build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function we can use to display the percentages in a descending order

In [29]:
#A function for generating frequency tables
def freq_table(dataset, index):
    if index < 0 or index >= len(dataset[0]):  # To ensure that the input index is valid and within the bounds of the dataset
        raise ValueError("Invalid index")
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentage = {}
    
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentage[key] = percentage
    
    return table_percentage

# A function that converts the frequency table into a list of tuples, sorts it in descending order, and prints it.
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])        
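
For comparison, the standard library's `collections.Counter` can build the same kind of percentage table in a few lines. This is a sketch rather than a replacement for the functions above: `freq_table_counter` is a hypothetical name and the sample rows are made up.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by count, descending.
    return {value: count / total * 100 for value, count in counts.most_common()}


sample_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

for value, percentage in freq_table_counter(sample_rows, 1).items():
    print(value, ':', percentage)
```

Because the dictionary is built in `most_common()` order, no separate sorting step is needed before printing.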
       

In [30]:
display_table(ios_free,11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Among the free English apps, over half (58.16%) are games. Entertainment apps come a distant second at around 8%, photo and video apps make up nearly 5%, education apps represent only 3.66%, and social networking apps account for 3.29% of the dataset.

The overall trend suggests that the App Store, specifically the section featuring free English apps, is predominantly filled with leisure-oriented applications (games, entertainment, photo and video, social networking, sports, music, etc.). In contrast, apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are comparatively less common. It's important to note that while fun apps may dominate in quantity, it doesn't necessarily correlate with having the highest number of users, as demand may differ from the available offerings.

In [31]:
display_table(android_free, 1)  # Category

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

The picture is noticeably different on Google Play, where far fewer apps are designed for fun. Instead, a substantial number of apps seem to cater to practical purposes such as family, tools, business, lifestyle, and productivity. However, a closer look at the data shows that the family category, which accounts for nearly 19% of the apps, consists mostly of games for kids.



Even so, practical apps appear to be better represented on Google Play than on the App Store:

In [32]:
display_table(android_free, 9) #Genres

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

Comparing the Genres and Category columns, we notice that the Genres column is more granular and has many more values. We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we can say that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. 


### Most Popular Apps by Genre on the App Store
To find out which genres are the most popular, we should calculate the average number of installs for each app genre. For the Google Play dataset, this information is in the `Installs` column, but the App Store dataset is missing it. As a workaround, we use the total number of user ratings, found in the `rating_count_tot` column, as a proxy.
The steps are as follows:
* Isolate the apps of each genre
* Add up the number of user ratings for the apps of that genre
* Divide the sum by the number of apps belonging to that genre
* Sort the results to show the most popular genres.

In [33]:
genres_ios = freq_table(ios_free, 11)
avg_ratings_list_apple = []  # (genre, average number of user ratings) pairs, sorted below

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])  # rating_count_tot
            total += n_ratings
            len_genre += 1
    
    avg_n_ratings = total / len_genre

    avg_ratings_list_apple.append((genre, avg_n_ratings))

# Sort the list of tuples by the average number of user ratings, in descending order
sorted_avg_ratings = sorted(avg_ratings_list_apple, key=lambda x: x[1], reverse=True)

# Print the sorted result
for genre, avg_n_ratings in sorted_avg_ratings:
    print(f'{genre}: {avg_n_ratings}')
        

Navigation: 86090.33333333333
Reference: 74942.11111111111
Social Networking: 71548.34905660378
Music: 57326.530303030304
Weather: 52279.892857142855
Book: 39758.5
Food & Drink: 33333.92307692308
Finance: 31467.944444444445
Photo & Video: 28441.54375
Travel: 28243.8
Shopping: 26919.690476190477
Health & Fitness: 23298.015384615384
Sports: 23008.898550724636
Games: 22788.6696905016
News: 21248.023255813954
Productivity: 21028.410714285714
Utilities: 18684.456790123455
Lifestyle: 16485.764705882353
Entertainment: 14029.830708661417
Business: 7491.117647058823
Education: 7003.983050847458
Catalogs: 4004.0
Medical: 612.0


We can see that the most popular apps on the App Store belong to `Navigation`, followed by `Reference` and `Social Networking`. 

On average, navigation apps have the highest number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings:

In [34]:
for app in ios_free:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings


Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


We can see a similar pattern in the Social Networking genre, where the average is heavily influenced by a few giants such as Facebook, Skype, and Messenger.

In [35]:
for app in ios_free:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2
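
Because a handful of giants pull these averages up, the median can give a more typical figure for a genre. The sketch below uses the six Navigation rating counts printed earlier; using the median is a suggestion for illustration, not a step of the original analysis.

```python
from statistics import mean, median

# Rating counts of the free Navigation apps, copied from the output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('Mean:  ', mean(navigation_ratings))
print('Median:', median(navigation_ratings))
```

The mean matches the Navigation average reported above (about 86,090), while the median is 8196.5, roughly a tenth of that, which shows how strongly Waze and Google Maps dominate the genre.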

Exploring opportunities in the Health and Fitness domain, we're considering the development of an app featuring on-demand training videos, nutritional guidance, and a social hub. This venture could involve collaboration with a fitness influencer to enhance its appeal and effectiveness.

In [36]:
for app in ios_free:
    if app[11] == 'Health & Fitness':
        print(app[1], ':', app[5]) # print name and number of ratings

Calorie Counter & Diet Tracker by MyFitnessPal : 507706
Lose It! – Weight Loss Program and Calorie Counter : 373835
Weight Watchers : 136833
Sleep Cycle alarm clock : 104539
Fitbit : 90496
Period Tracker Lite : 53620
Nike+ Training Club - Workouts & Fitness Plans : 33969
Plant Nanny - Water Reminder with Cute Plants : 27421
Sworkit - Custom Workouts for Exercise & Fitness : 16819
Clue Period Tracker: Period & Ovulation Tracker : 13436
Headspace : 12819
Fooducate - Lose Weight, Eat Healthy,Get Motivated : 11875
Runtastic Running, Jogging and Walking Tracker : 10298
WebMD for iPad : 9142
8fit - Workouts, meal plans and personal trainer : 8730
Garmin Connect™ Mobile : 8341
Record by Under Armour, connects with UA HealthBox : 7754
Fitstar Personal Trainer : 7496
My Cycles Period and Ovulation Tracker : 7469
Seven - 7 Minute Workout Training Challenge : 6808
RUNNING for weight loss: workout & meal plans : 6407
Lifesum – Inspiring healthy lifestyle app : 5795
Waterlogged - Daily Hydration Tr

### Most Popular Apps by Genre on Google Play
The dataset from Google Play offers information on the number of installs, but the figures are open-ended ranges (100+, 1,000+, 5,000+, and so on) rather than exact counts, so they give only a rough picture of genre popularity.

In [37]:
display_table(android_free, 5)  # the Installs column

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835
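
Because the install counts are stored as strings, they have to be converted to numbers before any averaging. A minimal sketch of that conversion follows; `parse_installs` is a hypothetical helper name that does not appear in this notebook, and it assumes each range can be treated as its lower bound (so `100,000+` counts as exactly 100,000 installs).

```python
def parse_installs(installs):
    """Convert an install range such as '1,000,000+' to a float.

    Hypothetical helper, not part of the original notebook. The result is
    the lower bound of the range, so it understates the true install count.
    """
    cleaned = installs.replace('+', '').replace(',', '')
    return float(cleaned)


print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0+'))          # 0.0
```

Treating a range as its lower bound is a simplification, but it is applied to every app in the same way, so comparisons between categories remain meaningful.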


In [38]:
categories_android = freq_table(android_free, 1)
avg_installs_list = []  # Collect (category, average installs) pairs for sorting

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1
    
    avg_installs = total / len_category
    avg_installs_list.append((category, avg_installs))
    
# Sort the list of tuples based on average installs in descending order
sorted_avg_installs = sorted(avg_installs_list, key=lambda x: x[1], reverse=True)

# Print the sorted result
for category, avg_installs in sorted_avg_installs:
    print(f'{category}: {avg_installs}')

            


COMMUNICATION: 38456119.167247385
VIDEO_PLAYERS: 24727872.452830188
SOCIAL: 23253652.127118643
PHOTOGRAPHY: 17840110.40229885
PRODUCTIVITY: 16787331.344927534
GAME: 15588015.603248259
TRAVEL_AND_LOCAL: 13984077.710144928
ENTERTAINMENT: 11640705.88235294
TOOLS: 10801391.298666667
NEWS_AND_MAGAZINES: 9549178.467741935
BOOKS_AND_REFERENCE: 8767811.894736841
SHOPPING: 7036877.311557789
PERSONALIZATION: 5201482.6122448975
WEATHER: 5074486.197183099
HEALTH_AND_FITNESS: 4188821.9853479853
MAPS_AND_NAVIGATION: 4056941.7741935486
FAMILY: 3697848.1731343283
SPORTS: 3638640.1428571427
ART_AND_DESIGN: 1986335.0877192982
FOOD_AND_DRINK: 1924897.7363636363
EDUCATION: 1833495.145631068
BUSINESS: 1712290.1474201474
LIFESTYLE: 1437816.2687861272
FINANCE: 1387692.475609756
HOUSE_AND_HOME: 1331540.5616438356
DATING: 854028.8303030303
COMICS: 817657.2727272727
AUTO_AND_VEHICLES: 647317.8170731707
LIBRARIES_AND_DEMO: 638503.734939759
PARENTING: 542603.6206896552
BEAUTY: 513151.88679245283
EVENTS: 253542.22

On average, communication apps have the most installs: 38,456,119. However, this figure is heavily skewed by a relatively small number of apps. Some of them, such as WhatsApp, Messenger, Skype, Google Chrome, Gmail, and Hangouts, have more than a billion installs.

In [39]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess
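
One way to see how much a handful of giants can inflate a category average is to recompute it with the largest apps left out. The sketch below does this on a small illustrative list of install ranges; the list is not the real distribution of the COMMUNICATION category, and `average_installs` and its `cap` parameter are hypothetical names introduced for this example.

```python
def average_installs(buckets, cap=None):
    """Average the lower bounds of install ranges such as '1,000,000+'.

    If cap is given, apps with cap or more installs are left out.
    Hypothetical helper, not part of the original notebook.
    """
    values = [float(b.replace('+', '').replace(',', '')) for b in buckets]
    if cap is not None:
        values = [v for v in values if v < cap]
    if not values:  # nothing left after filtering
        return 0.0
    return sum(values) / len(values)


# Illustrative sample only: three giants and three smaller apps.
sample_buckets = [
    '1,000,000,000+', '500,000,000+', '100,000,000+',
    '1,000,000+', '100,000+', '10,000+',
]

print(average_installs(sample_buckets))                   # average with the giants
print(average_installs(sample_buckets, cap=100_000_000))  # 370000.0
```

In the notebook, the same idea could be applied by passing the install column of the COMMUNICATION rows in `android_free` instead of `sample_buckets`.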

The communication and game genres might seem more popular, but these niches appear to be dominated by a few giants who are hard to compete against.
The Health and Fitness genre looks fairly popular, with an average of 4,188,821 installs. Our goal is to recommend an app genre that shows potential, so we can explore this genre in more depth.

Some of the apps in the Health and Fitness genre:

In [40]:
for app in android_free:
    if app[1] == 'HEALTH_AND_FITNESS':
        print(app[0],': ',app[5])

Step Counter - Calorie Counter :  500,000+
Lose Belly Fat in 30 Days - Flat Stomach :  5,000,000+
Pedometer - Step Counter Free & Calorie Burner :  1,000,000+
Six Pack in 30 Days - Abs Workout :  10,000,000+
Lose Weight in 30 Days :  10,000,000+
Pedometer :  10,000,000+
LG Health :  10,000,000+
Step Counter - Pedometer Free & Calorie Counter :  10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App :  10,000,000+
Sportractive GPS Running Cycling Distance Tracker :  1,000,000+
30 Day Fitness Challenge - Workout at Home :  10,000,000+
Home Workout for Men - Bodybuilding :  1,000,000+
Fat Burning Workout - Home Weight lose :  100,000+
Buttocks and Abdomen :  500,000+
Walking for Weight Loss - Walk Tracker :  100,000+
Running & Jogging :  500,000+
Sleep Sounds :  1,000,000+
Fitbit :  10,000,000+
Lose Belly Fat-Home Abs Fitness Workout :  50,000+
Cycling - Bike Tracker :  500,000+
Abs Training-Burn belly fat :  100,000+
Calorie Counter - EasyFit free :  1,000,000+
Aunjai i lert u :  

The Health and Fitness category consists of a variety of apps.

We can look at the apps that are somewhere in the middle in terms of installs. 

In [41]:
for app in android_free:
    if app[1] == 'HEALTH_AND_FITNESS' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Lose Belly Fat in 30 Days - Flat Stomach : 5,000,000+
Pedometer - Step Counter Free & Calorie Burner : 1,000,000+
Six Pack in 30 Days - Abs Workout : 10,000,000+
Lose Weight in 30 Days : 10,000,000+
Pedometer : 10,000,000+
LG Health : 10,000,000+
Step Counter - Pedometer Free & Calorie Counter : 10,000,000+
Pedometer, Step Counter & Weight Loss Tracker App : 10,000,000+
Sportractive GPS Running Cycling Distance Tracker : 1,000,000+
30 Day Fitness Challenge - Workout at Home : 10,000,000+
Home Workout for Men - Bodybuilding : 1,000,000+
Sleep Sounds : 1,000,000+
Fitbit : 10,000,000+
Calorie Counter - EasyFit free : 1,000,000+
Garmin Connect™ : 10,000,000+
BetterMe: Weight Loss Workouts : 5,000,000+
Bike Computer - GPS Cycling Tracker : 1,000,000+
Running Distance Tracker + : 1,000,000+
Runkeeper - GPS Track Run Walk : 10,000,000+
Walking: Pedometer diet : 1,000,000+
8fit Workouts & Meal Planner : 10,000,000+
Keep Trainer - Workout Trainer & Fitness Coach : 1,000,000+
PumpUp — Fitness Co

The Health and Fitness category offers a diverse range of apps, encompassing fitness trackers, calorie counters, step trackers, personalized workout apps, gym-related applications, and more. While there is a plethora of options for physical health, there is a relatively limited selection of apps specifically tailored to mental health, meditation, and yoga. 
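
A rough way to check this impression is to count how many app names mention a given theme. The sketch below is only a proxy, since an app's name does not always describe what it does; `count_by_keyword` is a hypothetical helper, and the sample names are a handful taken from the output above rather than the full category.

```python
def count_by_keyword(names, keywords):
    """Count how many app names contain each keyword (case-insensitive).

    Hypothetical helper, not part of the original notebook.
    """
    counts = {}
    for keyword in keywords:
        counts[keyword] = sum(keyword.lower() in name.lower() for name in names)
    return counts


# A few names from the Health and Fitness output above (not the full list).
sample_names = [
    'Pedometer - Step Counter Free & Calorie Burner',
    'Six Pack in 30 Days - Abs Workout',
    'Sleep Sounds',
    'Fitbit',
    'Runkeeper - GPS Track Run Walk',
    '8fit Workouts & Meal Planner',
]

print(count_by_keyword(sample_names, ['workout', 'step', 'meditation', 'yoga']))
# {'workout': 2, 'step': 1, 'meditation': 0, 'yoga': 0}
```

In the notebook, `sample_names` could be replaced with `[app[0] for app in android_free if app[1] == 'HEALTH_AND_FITNESS']` to run the same count over the whole category.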

### Conclusion 
In our project, we examined data pertaining to mobile apps on the App Store and Google Play. Our objective was to suggest an app profile that could be profitable in both markets. Our findings indicate that developing a health-related app, particularly one centered around mental health, could be a profitable venture for both platforms. However, to make an informed decision, a more thorough analysis of these mental health and wellness apps is needed.