 Profitable App Profiles for the App Store and Google Play Markets

In this data analysis project, we act as a mobile app development company and explore data to understand which factors attract more users to our Android and iOS apps, available on Google Play and the App Store.

As a company that heavily relies on in-app ads for revenue, our success is directly tied to the number of users we acquire. With a larger user base, we can enhance engagement with our in-app ads and generate higher revenue. Hence, the main objective of this project is to employ data analysis techniques to assist our developers in understanding the types of apps that are more likely to draw a larger audience.

By examining factors such as app category, user ratings, app size, and price, we aim to uncover patterns that can guide our development team toward apps with a higher potential for popularity. The analysis should support strategic decisions in our development process and, in turn, increase user engagement and ad revenue.

Open both CSV files using the reader from Python's built-in csv module.

In [8]:
from csv import reader 

In [59]:
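# Read the App Store data; the file is closed automatically after reading
with open('AppleStore.csv', encoding='utf8') as open_file:
    ios = list(reader(open_file))
ios_header = ios[0]   # column names
ios_main = ios[1:]    # data rows without the header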

In [60]:
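# Read the Google Play data; the file is closed automatically after reading
with open('googleplaystore.csv', encoding='utf8') as open_file:
    android = list(reader(open_file))
android_header = android[0]   # column names
android_main = android[1:]    # data rows without the header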

Define a function named explore_data() that can be reused to print rows in a readable way.

In [61]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Print the first few rows of each dataset.

In [22]:
print(ios_main[:2])

[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']]


In [31]:
print(android_main[:2])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']]


Print the number of rows and columns of each dataset.

In [24]:
explore_data(ios_main, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [25]:
explore_data(android_main, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


To familiarize ourselves with the data, we print the column names of each dataset.

In [26]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [27]:
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


One row of the Google Play dataset contains an error, so we delete it to keep the analysis accurate.

To confirm that the deletion took effect, we print the length of the dataset after removing the row and compare it with the 10,841 rows reported earlier: the count should drop by exactly one.

In [48]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [64]:
del android_main[10472]

In [65]:
print(len(android_main))

10840
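
A row like the one removed above can also be located programmatically rather than by inspection. The sketch below is one possible check, not the method used in this project: it assumes a malformed row shows up as a row whose number of columns differs from the header's. The helper name find_malformed_rows and the sample data are hypothetical.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Hypothetical sample data, not taken from googleplaystore.csv
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '3.9'],            # one column is missing
    ['App C', 'GAMES', '4.7'],
]

bad = find_malformed_rows(sample_header, sample_rows)
for index, row in bad:
    print('Malformed row at index', index, ':', row)
```

Called as find_malformed_rows(android_header, android_main) before the deletion, a check of this kind would list candidate rows to inspect. It only catches rows with a wrong column count, not rows with wrong values.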


Before we commence with our data analysis, it is essential to perform data cleaning tasks. One method we can employ is removing duplicates. However, before proceeding with duplicate removal, we need to identify whether there are any duplicates present in our dataset.

In [66]:
duplicate_apps = []  # names seen more than once
unique_apps = []     # names seen at least once
for app in android_main:
    name = app[0]
    if name not in unique_apps:
        unique_apps.append(name)
    else:
        duplicate_apps.append(name)
print("Number of unique apps is :", len(unique_apps))
print("Example of duplicate apps are:", duplicate_apps[:5])

Number of unique apps is : 9659
Example of duplicate apps are: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
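
The loop above checks membership in a list, which rescans the list for every app. As an alternative sketch (our assumption, not part of the original analysis), collections.Counter from the standard library counts every name in a single pass. The sample names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows; only the name column (index 0) matters here
sample_rows = [['Box'], ['Zoom'], ['Box'], ['Slack'], ['Box'], ['Zoom']]

# Count how many times each app name occurs
name_counts = Counter(row[0] for row in sample_rows)

# Names that occur more than once, and how many extra rows they contribute
duplicated = {name: n for name, n in name_counts.items() if n > 1}
extra_rows = sum(n - 1 for n in duplicated.values())

print('Number of unique apps:', len(name_counts))
print('Duplicated names:', duplicated)
print('Rows that would be removed:', extra_rows)
```

The same idea applies to android_main by replacing sample_rows with it.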


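As we can see, the dataset contains many duplicate entries: 10,840 rows but only 9,659 unique app names, so 1,181 rows are duplicates. Rather than removing duplicates at random, we keep, for each app, the entry with the highest number of reviews, on the assumption that it is the most recent record.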

In [70]:
# Map each app name to the highest review count seen for it
reviews_max = {}
for app in android_main:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [71]:
print(len(android_main))

10840


In [74]:
android_clean = []   # one row per app: the one with the most reviews
already_added = []   # names already kept, in case several rows tie on reviews
for app in android_main:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

In [76]:
print(len(android_clean))

9659
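
The two-step procedure above can also be condensed into one pass that stores the best row per name directly. This is a minimal sketch under the same criterion (keep the entry with the most reviews and, on ties, the first one seen). The function name dedupe_by_reviews and the sample rows are hypothetical.

```python
def dedupe_by_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Hypothetical sample rows: name, category, rating, reviews
sample_rows = [
    ['Box', 'BUSINESS', '4.2', '159'],
    ['Zoom', 'BUSINESS', '4.4', '31614'],
    ['Box', 'BUSINESS', '4.2', '967'],
]

clean = dedupe_by_reviews(sample_rows)
print(len(clean))
```

One difference from the cells above: the result is ordered by the first appearance of each name, not by the position of the kept row, so the two approaches may return the same rows in a different order.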
