# **Retail Sales Forecasting for Strategic Inventory and Revenue Planning**

In retail, understanding future sales trends is essential for managing inventory, planning marketing campaigns, and maximizing revenue. Overstocking ties up capital, while understocking leads to missed sales opportunities and poor customer satisfaction. Accurate sales forecasting helps retailers make data-driven decisions that reduce operational costs and optimize performance.

## **Objective**

This project aims to develop a predictive model that accurately forecasts monthly sales based on historical transaction data. Using time series techniques and feature engineering, the goal is to anticipate future sales performance and evaluate forecast accuracy using standard KPIs such as Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Forecast Bias.

### **Business Values:**
- An accurate forecast empowers retail decision-makers to:
- Optimize inventory levels
- Allocate resources efficiently
- Improve financial planning and budgeting
- Reduce waste and avoid stockouts
- Align promotions with expected demand

**Key Questions to Answer:**

* How can historical sales data be used to predict future demand?
* Which forecasting model best captures seasonality and trend components?
* How accurate are the predictions compared to actual sales?
* What KPIs should be monitored to evaluate model performance?


# Frame the Problem and Look at the Big Picture
1. Define the objective in business terms.
2. How will your solution be used?
3. What are the current solutions/workarounds (if any)?
4. How should you frame this problem (supervised/unsupervised, online/offline, ...)?
5. How should performance be measured? Is the performance measure aligned with the business objective?
6. What would be the minimum performance needed to reach the business objective?
7. What are comparable problems? Can you reuse experience or tools?
8. Is human expertise available?
9. How would you solve the problem manually?
10. List the assumptions you (or others) have made so far. Verify assumptions if possible.

# **Get The Data**

This dataset is sourced from the [Superstore Sales Forecasting Dataset on Kaggle](https://www.kaggle.com/datasets/rohitsahoo/sales-forecasting), which includes transaction-level sales data from a fictional office supply retailer.

The data includes order dates, shipping details, customer segments, product categories, and sales amounts.

### Features of Interest:
- `Order Date`: Date of purchase (used as the time index for forecasting)
- `Sales`: Revenue generated from each transaction (forecast target)
- `Category`, `Segment`, `Region`: Optional groupings for segmentation
- `Product ID`, `Sub-Category`: Useful for granular forecasting

In [9]:
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [4]:
df = pd.read_csv('train.csv', parse_dates=["Order Date", "Ship Date"])
df.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
0,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,12/06/2017,16/06/2017,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


In [7]:
df.describe()

Unnamed: 0,Row ID,Postal Code,Sales
count,9800.0,9789.0,9800.0
mean,4900.5,55273.322403,230.769059
std,2829.160653,32041.223413,626.651875
min,1.0,1040.0,0.444
25%,2450.75,23223.0,17.248
50%,4900.5,58103.0,54.49
75%,7350.25,90008.0,210.605
max,9800.0,99301.0,22638.48
