# <a id='toc1_'></a>[Data Analysis](#toc0_)

Name  
Topic  
email  
June 4th, 2023  


**Table of contents**<a id='toc0_'></a>    
- [Data Analysis](#toc1_)    
- [1. Introduction](#toc2_)    
  - [1.1. Key Questions](#toc2_1_)    
  - [1.2. Assumptions and Methods](#toc2_2_)    
- [2. Setup and Data Collection](#toc3_)    
  - [2.1. Setup Libraries](#toc3_1_)    
  - [2.2. Load Data](#toc3_2_)    
- [3. Assess, Clean, and Feature Engineer](#toc4_)    
  - [3.1. Assess](#toc4_1_)    
  - [3.2. Clean](#toc4_2_)    
  - [3.3. Feature Engineering](#toc4_3_)    
- [4. Exploratory Data Analysis](#toc5_)    
  - [4.1. Statistical Analysis](#toc5_1_)    
  - [4.2. Visual Analysis](#toc5_2_)    
- [5. Model 1: Preparation, Initiation, and Evaluation](#toc6_)    
  - [5.1. Model 1: Hypothesis Formation](#toc6_1_)    
  - [5.2. Model 1: Data Preparation](#toc6_2_)    
  - [5.3. Model 1: Assumptions](#toc6_3_)    
  - [5.4. Model 1: Initiation and Evaluation](#toc6_4_)    
  - [5.5. Model 1: Assessing Residuals](#toc6_5_)    
  - [5.6. Model 1: Iteration](#toc6_6_)    
- [6. Model 2: Preparation, Initiation, and Evaluation](#toc7_)    
  - [6.1. Model 2: Hypothesis Formation](#toc7_1_)    
  - [6.2. Model 2: Data Preparation](#toc7_2_)    
  - [6.3. Model 2: Assumptions](#toc7_3_)    
  - [6.4. Model 2: Initiation and Evaluation](#toc7_4_)    
  - [6.5. Model 2: Assessing Residuals](#toc7_5_)    
  - [6.6. Model 2: Iteration](#toc7_6_)    
- [7. Key Findings](#toc8_)    
- [8. Recommendations](#toc9_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc2_'></a>[1. Introduction](#toc0_)



## <a id='toc2_1_'></a>[1.1. Key Questions](#toc0_)

## <a id='toc2_2_'></a>[1.2. Assumptions and Methods](#toc0_)


# <a id='toc3_'></a>[2. Setup and Data Collection](#toc0_)



## <a id='toc3_1_'></a>[2.1. Setup Libraries](#toc0_)

```python
# Import necessary libraries
import pandas as pd
import numpy as np
import datetime as dt
from functools import reduce
import sys

# Visualization libraries
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import matplotlib.cm as cm
from matplotlib.ticker import ScalarFormatter

%matplotlib inline
pd.plotting.register_matplotlib_converters()

# Statistics and ML libraries
from scipy import stats
import statsmodels.api as sm
import sklearn as sk

# # Optional libraries
# import re
# from collections import defaultdict
# from datetime import timedelta
# from dateutil.relativedelta import relativedelta
# import functools
# from IPython.display import display, Markdown
# import math
# import os
# os.environ["PYTHONHASHSEED"] = "123"  # affects only processes started afterwards, not this kernel

# Seed NumPy's global pseudo-random generator for reproducibility
np.random.seed(43)

# Styling parameters
sns.set_style("whitegrid")
sns.set_context("notebook")

plt.rcParams["figure.figsize"] = (9, 6)  # default figure size in inches
plt.rcParams["xtick.direction"] = "in"
plt.rcParams["ytick.direction"] = "in"
plt.rcParams["font.size"] = 11.0

pd.set_option("display.max_colwidth", 1000)
pd.set_option("display.max_columns", 50)
```
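
`np.random.seed` seeds NumPy's legacy global generator. Since NumPy 1.17, a local `Generator` created with `np.random.default_rng` is the recommended alternative. The sketch below is an optional illustration, not this notebook's method: it shows that two generators built from the same seed (43, reused from the setup cell) produce identical draws.

```python
import numpy as np

# Two independent Generators built from the same seed
rng_a = np.random.default_rng(43)
rng_b = np.random.default_rng(43)

draws_a = rng_a.normal(size=5)
draws_b = rng_b.normal(size=5)

# Same seed, same sequence of pseudo-random numbers
print(np.array_equal(draws_a, draws_b))
```

Passing such an `rng` explicitly to any function that needs randomness keeps results reproducible without relying on global state.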


```python
print("Versions used in this notebook:")
print(f"Python version: {sys.version}")

print(f"Pandas version: {pd.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"Seaborn version: {sns.__version__}")
print(f"Matplotlib version: {mpl.__version__}")
# The setup cell imports only scipy.stats and statsmodels.api; to print these
# versions, import the top-level packages first:
# import scipy, statsmodels
# print(f"Scipy version: {scipy.__version__}")
# print(f"Statsmodels version: {statsmodels.__version__}")
print(f"SKLearn version: {sk.__version__}")
```

```
Versions used in this notebook:
Python version: 3.9.16 (main, Mar  8 2023, 10:39:24) [MSC v.1916 64 bit (AMD64)]
Pandas version: 2.0.2
Numpy version: 1.23.4
Seaborn version: 0.12.1
Matplotlib version: 3.6.2
SKLearn version: 1.2.2
```


## <a id='toc3_2_'></a>[2.2. Load Data](#toc0_)
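
A minimal loading sketch, assuming the source is a CSV file with a date column. The column names and values below are hypothetical; an in-memory string stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the project's real data file
csv_text = """date,region,units,revenue
2023-01-02,north,10,250.0
2023-01-03,south,,180.5
2023-01-03,south,,180.5
2023-01-04,north,7,
"""

# With a real file, pass its path instead, e.g. pd.read_csv("data.csv", parse_dates=["date"])
df_raw = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df_raw.shape)
print(df_raw.dtypes)
```

Parsing dates at load time avoids a separate `pd.to_datetime` conversion later.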


# <a id='toc4_'></a>[3. Assess, Clean, and Feature Engineer](#toc0_)

This section assesses the raw data for quality issues, cleans it, and engineers the features needed for analysis.


## <a id='toc4_1_'></a>[3.1. Assess](#toc0_)
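
One way to start the assessment is a per-column summary of types, missingness, and cardinality, plus a duplicate-row count. The helper `assess_frame` and the small frame below are hypothetical placeholders for the loaded data.

```python
import numpy as np
import pandas as pd

def assess_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missingness, and cardinality for each column (hypothetical helper)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().mul(100).round(1),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })

# Small synthetic frame standing in for the loaded data
df = pd.DataFrame({
    "region": ["north", "south", "south", None],
    "units": [10, np.nan, np.nan, 7],
})

summary = assess_frame(df)
print(summary)
print("duplicate rows:", df.duplicated().sum())
```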

## <a id='toc4_2_'></a>[3.2. Clean](#toc0_)
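
A hedged cleaning sketch: normalize column names, drop exact duplicates, and impute missing numeric values with column medians. Median imputation is only one option and is an assumption here; the helper `clean_frame` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass: tidy column names, drop duplicates, impute numeric gaps."""
    out = df.copy()
    # Normalize column names: trim whitespace, lowercase, spaces to underscores
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Remove exact duplicate rows, then renumber the index
    out = out.drop_duplicates().reset_index(drop=True)
    # Fill missing numeric values with each column's median
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    return out

raw = pd.DataFrame({
    " Region ": ["north", "south", "south", "east"],
    "Unit Price": [2.5, np.nan, np.nan, 4.0],
})
clean = clean_frame(raw)
print(clean)
```

Dropping duplicates before imputing keeps repeated rows from skewing the medians.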

## <a id='toc4_3_'></a>[3.3. Feature Engineering](#toc0_)
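
Two common feature-engineering patterns, sketched on hypothetical columns: calendar features extracted from a timestamp and a ratio of two existing columns.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-07", "2023-02-15"]),
    "units": [10, 4, 8],
    "revenue": [250.0, 100.0, 220.0],
})

# Calendar features derived from the timestamp
df["month"] = df["date"].dt.month
df["day_of_week"] = df["date"].dt.dayofweek  # Monday = 0, Sunday = 6
df["is_weekend"] = df["day_of_week"] >= 5

# Ratio feature: average price per unit
df["price_per_unit"] = df["revenue"] / df["units"]
print(df)
```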


# <a id='toc5_'></a>[4. Exploratory Data Analysis](#toc0_)

This section explores the data through summary statistics and visualizations.


## <a id='toc5_1_'></a>[4.1. Statistical Analysis](#toc0_)
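
A minimal statistical summary on synthetic data (the variables `x` and `y` are hypothetical): descriptive statistics, a correlation matrix, and a Pearson test for one pair using `scipy.stats.pearsonr`, which the notebook's `stats` import already provides.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data: y depends linearly on x plus noise
rng = np.random.default_rng(43)
x = rng.normal(size=200)
df = pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(scale=0.5, size=200)})

# Summary statistics for every numeric column
print(df.describe().round(2))

# Pairwise Pearson correlations, plus a significance test for one pair
print(df.corr(method="pearson").round(2))
r, p_value = stats.pearsonr(df["x"], df["y"])
print(f"r = {r:.3f}, p = {p_value:.3g}")
```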

## <a id='toc5_2_'></a>[4.2. Visual Analysis](#toc0_)
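
A sketch of two basic exploratory plots on synthetic data: a histogram for one variable's distribution and a scatter plot for a pairwise relationship. It uses plain Matplotlib; `sns.histplot` and `sns.scatterplot` are seaborn equivalents that pick up the styling set in the setup cell.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data standing in for real columns
rng = np.random.default_rng(43)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(9, 4))

# Distribution of a single variable
ax_hist.hist(x, bins=20, edgecolor="white")
ax_hist.set(title="Distribution of x", xlabel="x", ylabel="count")

# Relationship between two variables
ax_scatter.scatter(x, y, s=10, alpha=0.6)
ax_scatter.set(title="y vs. x", xlabel="x", ylabel="y")

fig.tight_layout()
```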


# <a id='toc6_'></a>[5. Model 1: Preparation, Initiation, and Evaluation](#toc0_)

This section forms a hypothesis for Model 1, prepares the data, checks the model's assumptions, fits and evaluates the model, assesses its residuals, and iterates on it.


## <a id='toc6_1_'></a>[5.1. Model 1: Hypothesis Formation](#toc0_)
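
Assuming, hypothetically, that Model 1 is a multiple linear regression of a target $y$ on $p$ predictors, the hypotheses could be stated formally as follows; the symbols are placeholders for the project's actual variables.

$$
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

$$
H_0:\ \beta_1 = \beta_2 = \cdots = \beta_p = 0 \qquad \text{vs.} \qquad H_1:\ \beta_j \neq 0 \ \text{for at least one } j
$$

The overall F-test addresses this joint hypothesis; each coefficient's t-test addresses the narrower $H_0: \beta_j = 0$.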



## <a id='toc6_2_'></a>[5.2. Model 1: Data Preparation](#toc0_)
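
A data-preparation sketch on a synthetic frame with hypothetical columns `x1`, `x2`, and `y`: hold out 20% of rows for testing, then standardize features using training-set statistics only, so no information from the test set leaks into the model. `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=43)` is an equivalent way to split.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(43)
df = pd.DataFrame({
    "x1": rng.normal(size=100),
    "x2": rng.normal(size=100),
    "y": rng.normal(size=100),
})

# Hold out 20% of rows for testing; random_state makes the split reproducible
train = df.sample(frac=0.8, random_state=43)
test = df.drop(train.index)

X_train, y_train = train[["x1", "x2"]], train["y"]
X_test, y_test = test[["x1", "x2"]], test["y"]

# Standardize with training-set statistics only, to avoid leaking test information
mu, sigma = X_train.mean(), X_train.std()
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma
print(len(train), len(test))
```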



## <a id='toc6_3_'></a>[5.3. Model 1: Assumptions](#toc0_)
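
If Model 1 is a linear regression (an assumption), one assumption to check before fitting is low multicollinearity among predictors. The sketch computes the variance inflation factor, $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the others. The helper `vif` is hypothetical; `statsmodels.stats.outliers_influence.variance_inflation_factor` offers the same measure. A common rule of thumb treats values above 5 to 10 as a concern.

```python
import numpy as np
import pandas as pd

def vif(X: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per column: 1 / (1 - R^2_j), where R^2_j comes from
    regressing column j on all other columns plus an intercept (hypothetical helper)."""
    out = {}
    for col in X.columns:
        y = X[col].to_numpy()
        A = np.column_stack([np.ones(len(X)), X.drop(columns=col).to_numpy()])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out, name="VIF")

rng = np.random.default_rng(43)
x1 = rng.normal(size=500)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.3, size=500),  # strongly collinear with x1
    "x3": rng.normal(size=500),                  # independent of the others
})
vif_scores = vif(X)
print(vif_scores.round(2))
```

The remaining regression assumptions (linearity, constant variance, normal errors) are easier to judge from the fitted model's residuals.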



## <a id='toc6_4_'></a>[5.4. Model 1: Initiation and Evaluation](#toc0_)
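
A fit-and-evaluate sketch, again assuming a linear regression on synthetic data: ordinary least squares on the training rows via `np.linalg.lstsq`, then RMSE and $R^2$ on held-out rows. With the notebook's imports, `sm.OLS(y_train, sm.add_constant(X_train)).fit()` fits the same model and its `.summary()` adds standard errors and the F-test.

```python
import numpy as np

rng = np.random.default_rng(43)
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# First 160 rows train, last 40 test (rows are already in random order)
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]

def add_intercept(X):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(X)), X])

# Ordinary least squares fit on the training data
beta, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# Evaluate on held-out data
y_pred = add_intercept(X_test) @ beta
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print("coefficients:", beta.round(2))
print(f"test RMSE = {rmse:.3f}, test R^2 = {r2:.3f}")
```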



## <a id='toc6_5_'></a>[5.5. Model 1: Assessing Residuals](#toc0_)
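
A residual-diagnostics sketch for a linear fit on synthetic data: confirm the residuals center on zero, run a Shapiro-Wilk normality test with `scipy.stats.shapiro`, and plot residuals against fitted values. A Q-Q plot via `stats.probplot(residuals, dist="norm", plot=ax)` is a common complement.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(43)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 3.0]) + rng.normal(scale=1.0, size=200)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# With an intercept, OLS residuals average to (numerically) zero
print(f"mean residual: {residuals.mean():.2e}")

# Shapiro-Wilk: a small p-value is evidence against normally distributed residuals
w_stat, p_normal = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_normal:.3f}")

# Residuals vs. fitted: look for curvature (non-linearity) or funnels (heteroscedasticity)
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(fitted, residuals, s=10, alpha=0.6)
ax.axhline(0, color="grey", linestyle="--")
ax.set(xlabel="fitted values", ylabel="residuals", title="Residuals vs. fitted")
```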



## <a id='toc6_6_'></a>[5.6. Model 1: Iteration](#toc0_)
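
One hedged way to iterate is to compare candidate specifications with a criterion that penalizes extra predictors, such as adjusted $R^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The helper `adjusted_r2` and the candidate feature sets below are hypothetical; cross-validation (e.g. `sklearn.model_selection.cross_val_score`) is a more robust alternative.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS with an intercept and return adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(43)
n = 200
x1, noise_feat = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + 0.5 * x1 ** 2 + rng.normal(scale=0.5, size=n)

# Candidate specifications: baseline, with a squared term, with an irrelevant feature
candidates = {
    "linear": np.column_stack([x1]),
    "linear + x1^2": np.column_stack([x1, x1 ** 2]),
    "linear + noise": np.column_stack([x1, noise_feat]),
}
scores = {name: adjusted_r2(X, y) for name, X in candidates.items()}
for name, score in scores.items():
    print(f"{name:>15}: adjusted R^2 = {score:.3f}")
```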



# <a id='toc7_'></a>[6. Model 2: Preparation, Initiation, and Evaluation](#toc0_)

This section forms a hypothesis for Model 2, prepares the data, checks the model's assumptions, fits and evaluates the model, assesses its residuals, and iterates on it.


## <a id='toc7_1_'></a>[6.1. Model 2: Hypothesis Formation](#toc0_)



## <a id='toc7_2_'></a>[6.2. Model 2: Data Preparation](#toc0_)



## <a id='toc7_3_'></a>[6.3. Model 2: Assumptions](#toc0_)



## <a id='toc7_4_'></a>[6.4. Model 2: Initiation and Evaluation](#toc0_)



## <a id='toc7_5_'></a>[6.5. Model 2: Assessing Residuals](#toc0_)



## <a id='toc7_6_'></a>[6.6. Model 2: Iteration](#toc0_)


# <a id='toc8_'></a>[7. Key Findings](#toc0_)

Summarize the findings from the analyses.



# <a id='toc9_'></a>[8. Recommendations](#toc0_)

List actionable recommendations based on the findings.