# Day 1

# <u>*Business Problem Understanding*</u>
## <u>1. Project Objective</u>
* Predict next month's product-level sales for a retail chain to optimize inventory management and reduce stockouts/overstock situations.
## <u>2. Business Context</u>
1. Why forecast: Retailers typically lose 8-10% of sales due to stockouts and 10-15% due to overstocking.

2. Use Case: Monthly purchase order planning, warehouse staffing, promotional planning

3. Impact: Every 1% improvement in forecast accuracy can reduce inventory costs by 2-3%

## <u>3. Scope & Constraints</u>
1. Forecast Horizon: 1 month ahead

2. Frequency: Monthly predictions

3. Granularity: Product-level forecasting (may aggregate to category level if the data are sparse)

4. Time Frame: Use 2-3 years of historical data

5. Assumption: No major business disruptions (store openings/closures, pandemics)
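
One way to turn the 1-month horizon into a supervised learning target is to shift each sales series back by one month, so every row carries the following month's value. The sketch below uses made-up numbers and hypothetical column names (`Month`, `Target_next_month`); it illustrates the idea under those assumptions and is not the notebook's own pipeline.

```python
import pandas as pd

# Toy monthly sales for two hypothetical stores (values are made up for illustration).
toy = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Month": pd.period_range("2012-01", periods=3, freq="M").tolist() * 2,
    "Monthly_Sales": [100.0, 110.0, 120.0, 200.0, 190.0, 210.0],
})

# Sort so that shifting within each store follows calendar order.
toy = toy.sort_values(["Store", "Month"]).reset_index(drop=True)

# Target = sales one month ahead, computed separately for each store.
toy["Target_next_month"] = toy.groupby("Store")["Monthly_Sales"].shift(-1)

# The last month of each store has no known future value, so it is dropped for training.
train = toy.dropna(subset=["Target_next_month"])
```

Grouping before shifting matters: without it, the last month of one store would receive the first month of the next store as its target.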

# Day 2



In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('Walmart.csv')

In [3]:
df.shape

(6435, 8)

In [9]:
df.sample(5)

| | Store | Date | Weekly_Sales | Holiday_Flag | Temperature | Fuel_Price | CPI | Unemployment |
|---|---|---|---|---|---|---|---|---|
| 5930 | 42 | 20-05-2011 | 524559.95 | 0 | 72.62 | 3.99 | 129.075677 | 8.494 |
| 1286 | 9 | 26-10-2012 | 549731.49 | 0 | 69.52 | 3.506 | 227.232807 | 4.954 |
| 3305 | 24 | 28-05-2010 | 1473868.15 | 0 | 69.59 | 3.046 | 132.293936 | 8.211 |
| 1861 | 14 | 19-02-2010 | 2204556.7 | 0 | 31.27 | 2.745 | 182.034782 | 8.992 |
| 2455 | 18 | 23-07-2010 | 1032908.23 | 0 | 75.71 | 2.784 | 132.582581 | 9.342 |


In [5]:
df.columns

Index(['Store', 'Date', 'Weekly_Sales', 'Holiday_Flag', 'Temperature',
       'Fuel_Price', 'CPI', 'Unemployment'],
      dtype='object')

## Cleaning Data

In [7]:
df.isnull().sum() # No missing values

Store           0
Date            0
Weekly_Sales    0
Holiday_Flag    0
Temperature     0
Fuel_Price      0
CPI             0
Unemployment    0
dtype: int64

In [None]:
# Convert the Date column (day-month-year strings) to pandas datetime.
# An explicit format makes dayfirst redundant, so it is omitted.
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
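
The reason for passing an explicit format can be seen on a small example. The strings below are illustrative values in the same day-month-year layout as the `Date` column, not rows taken from the file; the sketch assumes pandas' default month-first reading for ambiguous strings.

```python
import pandas as pd

# Illustrative day-month-year strings (hypothetical, same layout as the Date column).
raw = pd.Series(["05-02-2010", "19-02-2010"])

# Explicit format: unambiguous, and a value that does not match raises an error.
parsed = pd.to_datetime(raw, format="%d-%m-%Y")

# Without a format or dayfirst=True, an ambiguous string is read month-first.
ambiguous = pd.to_datetime("05-02-2010")

print(parsed.dt.month.tolist())  # months extracted with the explicit format
print(ambiguous.month)           # month extracted when pandas has to guess
```

With the explicit format both strings land in February, whereas the unguided parse reads the first one as a date in May.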

## Handling seasonality

### 1. Convert Weekly to Monthly Data

In [34]:
df['Year_month'] = df['Date'].dt.to_period('M')

In [35]:
monthly_data = df.groupby(['Store','Year_month']).agg({
    'Weekly_Sales': 'sum',          # Total sales in month
    'Temperature': 'mean',           # Average temperature
    'Fuel_Price': 'mean',            # Average fuel price
    'CPI': 'mean',                   # Average CPI
    'Unemployment': 'mean',          # Average unemployment
    'Holiday_Flag': 'max'            # If ANY week had holiday, month has holiday
}).reset_index()
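
Because a calendar month contains either four or five weekly observations, summed monthly sales are not directly comparable across months. A possible sanity check, sketched here on made-up data for a single hypothetical store, is to count the weekly rows behind each monthly total. `Weeks_in_month` and `Avg_weekly_sales` are illustrative names, not columns from the notebook.

```python
import pandas as pd

# Toy weekly data: one hypothetical store, 13 consecutive Fridays, constant sales.
weeks = pd.date_range("2010-02-05", periods=13, freq="W-FRI")
toy = pd.DataFrame({"Store": 1, "Date": weeks, "Weekly_Sales": 1000.0})
toy["Year_month"] = toy["Date"].dt.to_period("M")

# Named aggregation: total sales plus the number of weekly rows behind each total.
check = (
    toy.groupby(["Store", "Year_month"])
       .agg(Monthly_Sales=("Weekly_Sales", "sum"),
            Weeks_in_month=("Weekly_Sales", "count"))
       .reset_index()
)

# A per-week average removes the 4-week vs 5-week effect from the monthly total.
check["Avg_weekly_sales"] = check["Monthly_Sales"] / check["Weeks_in_month"]
print(check)
```

In this toy series the weekly sales never change, yet the five-week month shows a higher monthly total than the four-week months. The same count also exposes partial months at the start or end of a dataset.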

In [36]:
monthly_data.head()

| | Store | Year_month | Weekly_Sales | Temperature | Fuel_Price | CPI | Unemployment | Holiday_Flag |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010-02 | 6307344.1 | 41.845 | 2.54875 | 211.236828 | 8.106 | 1 |
| 1 | 1 | 2010-03 | 5871293.98 | 52.58 | 2.686 | 211.241116 | 8.106 | 0 |
| 2 | 1 | 2010-04 | 7422801.92 | 65.34 | 2.7744 | 210.552135 | 7.808 | 0 |
| 3 | 1 | 2010-05 | 5929938.64 | 76.0525 | 2.8185 | 210.547812 | 7.808 | 0 |
| 4 | 1 | 2010-06 | 6084081.46 | 82.3925 | 2.66575 | 211.356237 | 7.808 | 0 |


In [37]:
# Rename the weekly sales column to reflect the monthly totals
monthly_data = monthly_data.rename(columns={'Weekly_Sales': 'Monthly_Sales'})

In [38]:
# Convert Year_month back to a proper date (end of month)
monthly_data['Date'] = monthly_data['Year_month'].dt.to_timestamp('M')

In [39]:
monthly_data.head()

| | Store | Year_month | Monthly_Sales | Temperature | Fuel_Price | CPI | Unemployment | Holiday_Flag | Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010-02 | 6307344.1 | 41.845 | 2.54875 | 211.236828 | 8.106 | 1 | 2010-02-28 |
| 1 | 1 | 2010-03 | 5871293.98 | 52.58 | 2.686 | 211.241116 | 8.106 | 0 | 2010-03-31 |
| 2 | 1 | 2010-04 | 7422801.92 | 65.34 | 2.7744 | 210.552135 | 7.808 | 0 | 2010-04-30 |
| 3 | 1 | 2010-05 | 5929938.64 | 76.0525 | 2.8185 | 210.547812 | 7.808 | 0 | 2010-05-31 |
| 4 | 1 | 2010-06 | 6084081.46 | 82.3925 | 2.66575 | 211.356237 | 7.808 | 0 | 2010-06-30 |


### 2. Creating Seasonal Features

In [40]:
# Extract time components
monthly_data['Year'] = monthly_data['Date'].dt.year
monthly_data['Month'] = monthly_data['Date'].dt.month
monthly_data['Quarter'] = monthly_data['Date'].dt.quarter

In [41]:
# Cyclic encoding for month: maps months onto a circle so that
# December (12) and January (1) are adjacent, which raw month numbers do not capture
monthly_data['Month_sin'] = np.sin(2 * np.pi * monthly_data['Month'] / 12)
monthly_data['Month_cos'] = np.cos(2 * np.pi * monthly_data['Month'] / 12)
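
The effect of the sine/cosine pair can be checked numerically. The sketch below is a standalone illustration: the helper `encoded_distance` is hypothetical and not part of the notebook. It compares distances between months in the encoded space.

```python
import numpy as np

# Encode months 1..12 as points on the unit circle, as in the cell above.
months = np.arange(1, 13)
angle = 2 * np.pi * months / 12
points = np.column_stack([np.sin(angle), np.cos(angle)])

def encoded_distance(m1, m2):
    """Euclidean distance between two months in (sin, cos) space."""
    return float(np.linalg.norm(points[m1 - 1] - points[m2 - 1]))

dec_jan = encoded_distance(12, 1)  # neighbours across the year boundary
jun_jul = encoded_distance(6, 7)   # neighbours in mid-year
dec_jun = encoded_distance(12, 6)  # opposite sides of the circle

print(dec_jan, jun_jul, dec_jun)
```

With raw month numbers, December and January differ by 11. In the encoded space they are as close as any other pair of consecutive months, while months half a year apart are the farthest from each other.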

In [44]:
monthly_data.columns

Index(['Store', 'Year_month', 'Monthly_Sales', 'Temperature', 'Fuel_Price',
       'CPI', 'Unemployment', 'Holiday_Flag', 'Date', 'Year', 'Month',
       'Quarter', 'Month_sin', 'Month_cos'],
      dtype='object')