**Mobile App Profitability Analysis** 

This is a data-driven project focused on identifying profitable app profiles in the App Store and Google Play markets. By analyzing key metrics and trends, the project aims to guide developers in making informed decisions about the types of apps to build, maximizing potential revenue and market success.

In [14]:
from csv import reader

# Google Play dataset
opened_csv_file = open('googleplaystore.csv') # open the CSV file in read mode (the default)
read_csv_file = reader(opened_csv_file) # create a reader object that iterates over the rows
android_list = list(read_csv_file) # convert the iterable reader into a list of lists (one inner list per row)
android_list_header = android_list[0] # the first row holds the column headers
android_list = android_list[1:] # drop the header row from the dataset

In [20]:
# App Store dataset
opened_csv_file = open('AppleStore.csv') # open the CSV file in read mode (the default)
read_csv_file = reader(opened_csv_file) # create a reader object that iterates over the rows
ios_list = list(read_csv_file) # convert the iterable reader into a list of lists
ios_list_header = ios_list[0] # the column headers are at index 0
ios_list = ios_list[1:] # drop the header row (index 0) from the dataset
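
The two loading cells above repeat the same steps, so they could be folded into one helper. The following is only a sketch: `load_dataset` is a hypothetical name that does not appear in this notebook, and the `utf8` default is an assumption about how the CSV files are encoded.

```python
from csv import reader

def load_dataset(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`.

    Hypothetical helper: the name and the utf8 default are assumptions.
    """
    # The with-block closes the file even if reading fails.
    # newline='' is what the csv module documentation recommends
    # for file objects that are passed to reader().
    with open(path, encoding=encoding, newline='') as csv_file:
        rows = list(reader(csv_file))
    # The first row holds the column names, the rest is the data.
    return rows[0], rows[1:]

# Possible usage with the files from this notebook:
# android_list_header, android_list = load_dataset('googleplaystore.csv')
# ios_list_header, ios_list = load_dataset('AppleStore.csv')
```

Returning the header and the data rows together keeps the two from drifting apart when a cell is re-run.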

In [11]:
def explore_data(dataset, start, end, rows_and_columns=False):
    """
    Print a slice of a dataset so that both datasets can be explored.

    dataset: the dataset to explore, as a list of lists without the header row
    start: the index where the slice begins
    end: the index where the slice ends (exclusive)
    rows_and_columns: set to True to also print the number of rows and columns
    """
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # prints blank lines to separate the rows

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

We will use the explore_data() function to explore our datasets.

In [32]:
print(android_list_header) # show the Android dataset's header row
print("\n")
explore_data(android_list, 0, 4, True) # print the first four rows plus the row and column counts

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10840
Number of columns: 13


The Android dataset contains at least one incorrect row. We can print such a row and compare it with the header and with a correct row.

In [27]:
print(android_list[10472]) # the row that contains incorrect data
print("\n")
print(android_list_header)
print("\n")
print(android_list[0]) # the first row, used as a correct row for comparison

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


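Row 10472 of the Android dataset clearly contains incorrect data: its "Rating" column holds "19", while the highest possible rating on Google Play is "5". The row also has only 12 values instead of 13: the "Category" value is missing, so every later value is shifted one column to the left. We will therefore delete this row.
N.B.: do not run the code cell below more than once, because each run deletes whatever row is currently at index 10472.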

In [28]:
print(len(android_list)) # number of rows before the deletion
del android_list[10472] # delete the row at index 10472 from android_list
print(len(android_list)) # number of rows after the deletion

10841
10840
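
Because `del android_list[10472]` removes a different row every time it runs, a filter that can be re-run safely may be preferable. The sketch below assumes that a malformed row can be recognized by its column count, which holds for the row shown above (12 values against 13 headers) but would not catch a bad row that has the right length. `split_malformed_rows` is a hypothetical helper, not part of the original notebook, and the sample rows are made up for illustration.

```python
def split_malformed_rows(dataset, header):
    """Return (clean_rows, malformed), where malformed holds (index, row) pairs.

    Hypothetical helper: a row counts as malformed only when its length
    differs from the header's length.
    """
    expected = len(header)
    clean_rows = []
    malformed = []
    for index, row in enumerate(dataset):
        if len(row) == expected:
            clean_rows.append(row)
        else:
            malformed.append((index, row))
    return clean_rows, malformed

# Toy data (made up): the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]
clean, bad = split_malformed_rows(sample_rows, sample_header)
print('Malformed rows:', bad)
print('Rows kept:', len(clean))
```

Running the filter a second time on `clean` returns the same rows and an empty `malformed` list, so the cell can be executed repeatedly without losing valid data.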


**DATA CLEANING** In this part we are going to perform the following data cleaning operations:
1. Deleting rows or columns with missing data that are not critical for the analysis
2. Identifying and correcting errors in the data entries, such as typos
3. Ensuring consistency in the data formats
4. Identifying and removing duplicates from the datasets
5. Finally, removing data that is irrelevant to the analysis

We need to check for duplicate entries in our dataset:
1. Create two lists: one that stores the names of duplicate apps and another that stores the names of unique apps.
2. For each app, check whether its name already exists in the unique_apps list.
3. If it exists, append it to the duplicate_apps list.
4. If it does not exist, add it to the unique_apps list, since this is the first time we have seen that app name.


In [36]:
duplicate_apps = []
unique_apps = []

for app in android_list:
    app_name = app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print("Number of duplicate apps: ", len(duplicate_apps))
print("\n")
print("Some of the duplicate apps ", duplicate_apps[:15])

Number of duplicate apps:  1181


Some of the duplicate apps  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
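
The check `app_name in unique_apps` scans a list, so its cost grows with every unique name collected. As a sketch of an alternative, the same steps can use a set for the membership test, and `collections.Counter` can report how often each name occurs. `find_duplicates` and `count_occurrences` are hypothetical names, and the sample rows are made up for illustration.

```python
from collections import Counter

def find_duplicates(dataset, name_index=0):
    """Return (duplicates, seen) using a set for the membership test."""
    seen = set()        # names encountered at least once
    duplicates = []     # one entry per repeated occurrence of a name
    for row in dataset:
        name = row[name_index]
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates, seen

def count_occurrences(dataset, name_index=0):
    """Return a Counter that maps each app name to its number of rows."""
    return Counter(row[name_index] for row in dataset)

# Toy data (made up): 'Box' appears three times.
sample_rows = [['Box'], ['Slack'], ['Box'], ['Box']]
duplicates, seen = find_duplicates(sample_rows)
print('Number of duplicate apps:', len(duplicates))
print('Occurrences:', count_occurrences(sample_rows))
```

As in the cell above, an app that appears three times contributes two entries to the duplicates list, so the length of that list counts surplus rows rather than distinct duplicated apps.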
