<a href="https://colab.research.google.com/github/mchhab/Side_Projects/blob/main/Home_Depot_Case_Study_Sheet_Manik_Chhabra.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##**Importing the necessary libraries for this business case study**

In [3]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math

pd.set_option('display.max_rows', None) #displays all rows

##**Understanding the structure of the data**

In [2]:
df = pd.read_excel('Case_Study.xlsx') #reads the data
df.head() #returns the first 5 rows of the dataset

Unnamed: 0,week_start,page_type,taxonomy,key,L1,holiday,page_views
0,2022-10-31,CLP,appliances,CLP>appliances,appliances,none,281729
1,2022-11-07,CLP,appliances,CLP>appliances,appliances,none,307537
2,2022-11-14,CLP,appliances,CLP>appliances,appliances,none,299777
3,2022-11-21,CLP,appliances,CLP>appliances,appliances,thanksgiving,612575
4,2022-11-28,CLP,appliances,CLP>appliances,appliances,none,285861


In [5]:
df.tail() #returns the last 5 rows of the dataset

Unnamed: 0,week_start,page_type,taxonomy,key,L1,holiday,page_views
249412,2024-09-23,TY,thank you,TY>thank you,thank you,none,1258642
249413,2024-09-30,TY,thank you,TY>thank you,thank you,none,1331796
249414,2024-10-07,TY,thank you,TY>thank you,thank you,none,1256319
249415,2024-10-14,TY,thank you,TY>thank you,thank you,none,1292454
249416,2024-10-21,TY,thank you,TY>thank you,thank you,none,1425259


In [4]:
df.shape #returns the number of rows and columns respectively

(249417, 7)

#The dataset has 249417 rows and 7 columns.

In [7]:
df.info() #prints a concise summary of the dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 249417 entries, 0 to 249416
Data columns (total 7 columns):
 #   Column      Non-Null Count   Dtype         
---  ------      --------------   -----         
 0   week_start  249417 non-null  datetime64[ns]
 1   page_type   249417 non-null  object        
 2   taxonomy    249417 non-null  object        
 3   key         249417 non-null  object        
 4   L1          249417 non-null  object        
 5   holiday     249417 non-null  object        
 6   page_views  249417 non-null  int64         
dtypes: datetime64[ns](1), int64(1), object(5)
memory usage: 13.3+ MB


#The summary lists each column, its non-null count, and its data type. The data types are appropriate for each column: week_start is a datetime, page_views is an integer, and the remaining columns hold text. Every column has 249417 non-null values, which equals the row count, so the dataset contains no null values.
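The dtype check can also be written as code instead of read off the summary. The following is a minimal sketch, assuming a small hand-made frame `sample` that mirrors the column names of the case-study data (its values are illustrative and not taken from the file). The names `expected_checks` and `dtype_mismatches` are hypothetical helpers introduced only for this example.

```python
import pandas as pd

# Illustrative frame with the same column names as the case-study data (values are made up)
sample = pd.DataFrame({
    "week_start": pd.to_datetime(["2022-10-31", "2022-11-07"]),
    "page_type": ["CLP", "CLP"],
    "taxonomy": ["appliances", "appliances"],
    "key": ["CLP>appliances", "CLP>appliances"],
    "L1": ["appliances", "appliances"],
    "holiday": ["none", "none"],
    "page_views": [100, 200],
})

# Hypothetical mapping: the dtype test each column is expected to pass
expected_checks = {
    "week_start": pd.api.types.is_datetime64_any_dtype,
    "page_views": pd.api.types.is_integer_dtype,
}

def dtype_mismatches(frame, checks):
    """Return the columns whose dtype fails its expected check."""
    return [col for col, check in checks.items() if not check(frame[col])]

mismatches = dtype_mismatches(sample, expected_checks)
print(mismatches)  # an empty list means every checked column has the expected dtype
```

Applied to the real data, the same helper could be called as `dtype_mismatches(df, expected_checks)`.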

In [8]:
df.isnull().sum() #finds the null values per column

Unnamed: 0,0
week_start,0
page_type,0
taxonomy,0
key,0
L1,0
holiday,0
page_views,0


#In summary, there are no null values in any column of the dataset.
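One caveat about this check: `isnull()` only counts true missing markers such as `NaN` or `None`. A placeholder string like 'none' in the holiday column, or a 0 in page_views, is an ordinary value and is not counted. The sketch below illustrates the difference on a small made-up frame (`sample` and its values are assumptions for illustration, not rows from the case-study file).

```python
import numpy as np
import pandas as pd

# Made-up frame: one real missing value per column, plus a 'none' string and a zero
sample = pd.DataFrame({
    "holiday": ["none", "thanksgiving", None],
    "page_views": [120.0, 0.0, np.nan],
})

null_counts = sample.isnull().sum()                  # missing values per column
has_nulls = bool(sample.isnull().any().any())        # True if any cell is missing
zero_views = int((sample["page_views"] == 0).sum())  # zeros are values, not nulls
none_labels = int((sample["holiday"] == "none").sum())  # 'none' strings are values, not nulls

print(null_counts)
print(has_nulls, zero_views, none_labels)
```

Whether zeros or 'none' labels should be treated as missing depends on how the data was collected, so that decision is left open here.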

In [11]:
df.describe(include='all').transpose() #creates a statistical summary of the numerical and categorical data

Unnamed: 0,count,unique,top,freq,mean,min,25%,50%,75%,max,std
week_start,249417.0,,,,2023-11-03 12:46:27.370548480,2022-10-31 00:00:00,2023-05-08 00:00:00,2023-11-06 00:00:00,2024-05-06 00:00:00,2024-10-21 00:00:00,
page_type,249417.0,5.0,PLP,234181.0,,,,,,,
taxonomy,249417.0,2399.0,appliances,208.0,,,,,,,
key,249417.0,2542.0,CLP>appliances,104.0,,,,,,,
L1,249417.0,20.0,outdoors,47257.0,,,,,,,
holiday,249417.0,8.0,none,215992.0,,,,,,,
page_views,249417.0,,,,28849.321454,0.0,2137.0,4954.0,13799.0,39945497.0,613077.313215


#Analyzing by column:

1.   **week_start** - The earliest week start date is 10/31/2022. The latest week start date is 10/21/2024.
2.   **page_type** - Contains 5 unique values. 'PLP' is the most frequent value and appears 234181 times.
3.   **taxonomy** - Contains 2399 unique values. The most frequent value is 'appliances', which appears 208 times.
4.   **key** - Contains 2542 unique values. The most frequent value is 'CLP>appliances', which appears 104 times.
5.   **L1** - Contains 20 unique values. The most frequent value is 'outdoors', which appears 47257 times.
6.   **holiday** - Contains 8 unique values. The most frequent value is 'none', which appears 215992 times.
7.   **page_views** - The mean number of page views is 28849 per row (rounded). The minimum is 0 and the maximum is 39945497. The median is 4954, far below the mean, which indicates a strongly right-skewed distribution.
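Two quick pieces of arithmetic help interpret these figures. The sketch below uses only numbers copied from the `describe()` output; the variable names are introduced for illustration. It compares the mean of page_views with its median and 75th percentile, and counts the weekly periods between the first and last week_start dates.

```python
from datetime import date

# Figures copied from the describe() output for page_views
mean_views = 28849.321454
median_views = 4954.0
q75_views = 13799.0
std_views = 613077.313215

mean_to_median = mean_views / median_views   # how far the mean sits above the median
coeff_of_variation = std_views / mean_views  # spread relative to the mean
mean_above_q75 = mean_views > q75_views      # True when the mean exceeds the 75th percentile

# Number of weekly periods from the first to the last week_start, inclusive
first_week = date(2022, 10, 31)
last_week = date(2024, 10, 21)
n_weeks = (last_week - first_week).days // 7 + 1

print(round(mean_to_median, 2), round(coeff_of_variation, 2), mean_above_q75)
print(n_weeks)  # 104, which equals the frequency of the most common key
```

A mean above the 75th percentile suggests that a small number of very high-traffic rows pull the average up, so the median is likely the more representative "typical" value. The week count matching the top key frequency is consistent with that key having one row per week, although this sketch does not verify it against the data.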

