# App Store Analysis

## Using Data Analytics to Inform Business Strategy

### Company Background

The company XYZ is in the business of building Android and iOS mobile apps. These apps are free to download and install, and they are distributed through Google Play Store and the Apple App Store.

The company's main source of revenue is in-app advertising, which makes the business model volume-driven: the more users who see and engage with the ads, the greater the revenue opportunity.

### Business Challenge

The senior management team is meeting for its annual strategy event to decide on resource allocation and the future app development roadmap. The team is seeking input from the business strategy group to help the company maximize return on investment (ROI).

### Project Design

Our goal for this project is to offer actionable insights that are backed by data. Based on our understanding of the company's business model, the biggest driver of ROI is the number of users of an app, because the revenue opportunity grows in proportion to the user base. We will focus our exploration on this topic.

Our project scope is to analyze app store data and identify the types of apps that are likely to attract more users. Such intelligence can help the company optimize revenue by focusing on the kinds of apps that are popular.

Our key requirements are as follows:

- We are interested in free apps only
- We are interested in English-language apps only


### Sources

Apple App Store Data (July 2017): https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

Google Play Store Data (August 2018): https://www.kaggle.com/lava18/google-play-store-apps

In [2]:
# Define a helper function that makes it easy to print a slice of a dataset

def explore_data(dataset, start, end, rows_and_columns=False):
    """Print rows start..end-1 of a list of lists, optionally with its shape."""
    dataset_slice = dataset[start:end]

    for row in dataset_slice:
        print(row)
        print("\n")  # Separate consecutive rows with blank lines

    if rows_and_columns:
        print("Number of rows: " + str(len(dataset)))
        # The column count is taken from the first row
        print("Number of columns: " + str(len(dataset[0])))

In [13]:
# Open the Google Play Store and Apple App Store datasets and save each as a list of lists.

from csv import reader

# The encoding is set explicitly (assuming the CSV files are UTF-8) so that
# reading does not depend on the platform default. The with-blocks close the files.

# Apple App Store dataset
with open("./AppleStore.csv", encoding="utf8", newline="") as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

# Google Play Store dataset
with open("./googleplaystore.csv", encoding="utf8", newline="") as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### Getting our datasets ready

Our two datasets are currently stored as lists of lists, but we cannot use them right away. The data needs to be cleaned and prepared so that our analysis does not produce wrong results. As per our requirements, we also need to remove all paid apps and all non-English apps.

We will focus on the following three steps, which are integral to any data cleaning process:

- remove or correct wrong data
- remove duplicate data
- modify the data to fit the purpose of our analysis
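
As a rough sketch of the first two steps, the helpers below drop rows whose length does not match the header and then remove duplicate entries. The function names, the rule of keeping the duplicate with the highest value in a chosen column, and the sample rows are assumptions made for illustration. They are not the datasets' actual contents.

```python
def drop_malformed(rows, expected_columns):
    """Remove rows whose column count differs from the header's."""
    return [row for row in rows if len(row) == expected_columns]


def drop_duplicates(rows, name_index, rank_index):
    """Keep one row per app name: the one with the highest numeric value
    at rank_index (an assumed tie-breaking rule, e.g. a review count)."""
    best = {}
    for row in rows:
        name = row[name_index]
        value = float(row[rank_index])
        if name not in best or value > float(best[name][rank_index]):
            best[name] = row
    return list(best.values())


# Made-up data in the form [name, reviews, price].
header = ["App", "Reviews", "Price"]
rows = [
    ["Chat Now", "100", "0"],
    ["Chat Now", "250", "0"],   # duplicate name with more reviews
    ["Broken Row", "0"],        # one column missing
    ["Photo Fun", "40", "0"],
]

well_formed = drop_malformed(rows, len(header))
cleaned = drop_duplicates(well_formed, name_index=0, rank_index=1)

print("Rows before cleaning:", len(rows))
print("Rows after cleaning:", len(cleaned))
```

Dropping malformed rows is the bluntest option. When the correct values are known, fixing the row in place preserves more data.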

