# Task 4

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
df = pd.read_csv("BAWAHA_CLEANED_SALES.csv", parse_dates=["Date_Vente"])

In [None]:
df["Year"] = df["Date_Vente"].dt.year
df["Month"] = df["Date_Vente"].dt.month

In [None]:
y = df["Quantité_Vendue"]

In [None]:
X = df[[
    "Prix_Unitaire",
    "Remise",
    "Month",
    "Year",
    "SKU",
    "Collection"
]]

In [None]:
categorical_features = ["SKU", "Collection"]
numerical_features = ["Prix_Unitaire", "Remise", "Month", "Year"]

In [None]:
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
        ("num", "passthrough", numerical_features)
    ]
)

In [None]:
model = Pipeline(steps=[
    ("preprocessing", preprocessor),
    ("regressor", RandomForestRegressor(
        n_estimators=200,
        random_state=42,
        max_depth=10
    ))
])

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
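
Because every row carries a sale date, a random split lets the model train on sales that happened after some of the test rows. A time-ordered holdout is one common alternative. The sketch below is only an illustration: the `chronological_split` helper and the toy values are hypothetical, and only the column names `Date_Vente` and `Quantité_Vendue` come from the notebook.

```python
import pandas as pd

def chronological_split(frame, date_col, test_size=0.3):
    """Hold out the most recent `test_size` fraction of rows (hypothetical helper)."""
    ordered = frame.sort_values(date_col).reset_index(drop=True)
    n_test = int(round(len(ordered) * test_size))
    cutoff = len(ordered) - n_test
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]

# Toy data standing in for the cleaned sales file (values are made up).
toy = pd.DataFrame({
    "Date_Vente": pd.to_datetime([
        "2024-03-01", "2024-01-10", "2025-04-16", "2024-10-05", "2024-09-03",
        "2024-05-06", "2024-07-21", "2025-01-02", "2024-12-24", "2024-02-14",
    ]),
    "Quantité_Vendue": [2, 1, 3, 2, 2, 1, 4, 2, 3, 1],
})

train_part, test_part = chronological_split(toy, "Date_Vente", test_size=0.3)

# Every training row predates every test row.
print(train_part["Date_Vente"].max(), "<=", test_part["Date_Vente"].min())
```

With this split, the test score measures how well the model predicts later sales from earlier ones, which is closer to how a forecast would be used.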

In [None]:
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"RMSE: {rmse:.3f}")
print(f"R-squared: {r2:.3f}")

RMSE: 0.797
R-squared: -0.024


Random Forest Regression was selected for sales and stock forecasting due to its suitability for complex retail demand data. Sales volumes are influenced by multiple interacting factors such as pricing, product attributes, and seasonality, which often exhibit non-linear relationships. Random Forest is capable of capturing these non-linear patterns without requiring explicit assumptions about the underlying data distribution.

Additionally, tree ensembles are comparatively robust to noisy features and to outliers in the predictors, both of which are common in transactional sales data. Unlike linear models, Random Forest needs no feature scaling, handles numeric and one-hot-encoded categorical predictors together, and remains stable even when individual predictors have weak or inconsistent relationships with the target variable.

The ensemble nature of Random Forest, which averages predictions from many decision trees, tends to reduce overfitting compared with a single tree and can improve generalization on unseen data. This makes it a reasonable candidate for short-term demand forecasting and inventory planning, where prediction accuracy is more important than model interpretability.

Overall, Random Forest Regression provides a reliable balance between predictive accuracy, robustness, and practical applicability, making it a suitable choice for forecasting sales quantities, identifying stock-out risks, and estimating reorder quantities.

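The model achieved an RMSE of 0.797 units on the held-out test set, so a typical prediction error is just under one unit. RMSE alone does not show whether the model is useful, however. The R-squared value was -0.024, and a negative R-squared means the model's squared error on the test set is slightly larger than that of predicting the mean test-set quantity for every row. In other words, the current features (price, discount, month, year, SKU, and collection) explain essentially none of the variation in units sold. This may reflect the highly stochastic nature of retail demand and the presence of unobserved external factors, but it also means the forecasts should be treated as roughly equivalent to a constant average until the model outperforms that baseline.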
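
An R-squared of -0.024 is easiest to read against a mean baseline, since R-squared is defined as 1 - SS_res / SS_tot and falls below zero whenever a model's squared error exceeds that of a constant prediction. The sketch below uses scikit-learn's `DummyRegressor` on synthetic Poisson counts, not on the sales data; the variable names and the synthetic target are assumptions made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)

# Synthetic low-count demand; the features carry no information on purpose.
y_train_toy = rng.poisson(lam=2.0, size=700).astype(float)
y_test_toy = rng.poisson(lam=2.0, size=300).astype(float)
X_train_toy = np.zeros((700, 1))
X_test_toy = np.zeros((300, 1))

# Baseline: always predict the mean of the training target.
baseline = DummyRegressor(strategy="mean").fit(X_train_toy, y_train_toy)
baseline_pred = baseline.predict(X_test_toy)

baseline_rmse = np.sqrt(mean_squared_error(y_test_toy, baseline_pred))
baseline_r2 = r2_score(y_test_toy, baseline_pred)

print(f"Baseline RMSE: {baseline_rmse:.3f}")
print(f"Baseline R-squared: {baseline_r2:.3f}")
```

A constant prediction can never score above zero on R-squared, so a model is only adding value when its test-set RMSE is lower, and its R-squared higher, than this baseline's on the same split.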

# Predictions

In [None]:
# Predict for every row; this includes the 70% of rows the model was trained on
df["Predicted_Sales"] = model.predict(X)

Stock-out risk flag

In [None]:
# Flag rows whose predicted sales exceed the fixed threshold of 5 units
df["StockOutRisk"] = (df["Predicted_Sales"] > 5).astype(int)

Optimal reorder quantity

In [None]:
df["Reorder_Quantity"] = (df["Predicted_Sales"] * 1.2).round()
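
The two rules above depend only on predicted sales, so they cannot tell a well-stocked SKU from an empty shelf. If on-hand inventory were available, both quantities could be computed relative to it. The following is a minimal sketch under that assumption: the `On_Hand` column and all values are hypothetical, and `np.ceil` is swapped in for `round()` so the target stock never falls below the buffered forecast.

```python
import numpy as np
import pandas as pd

SAFETY_FACTOR = 1.2  # same 20% buffer as the reorder rule above

# Toy rows; On_Hand is a hypothetical column that is not in the notebook's data.
toy = pd.DataFrame({
    "SKU": ["SP-129", "CHE-90", "QAM-69"],
    "Predicted_Sales": [1.95, 2.03, 2.20],
    "On_Hand": [5, 2, 0],
})

# Stock level needed to cover the forecast plus the safety buffer.
target_stock = np.ceil(toy["Predicted_Sales"] * SAFETY_FACTOR)

# At risk when forecast demand exceeds what is currently on the shelf.
toy["StockOutRisk"] = (toy["Predicted_Sales"] > toy["On_Hand"]).astype(int)

# Order only the shortfall, never a negative quantity.
toy["Reorder_Quantity"] = (target_stock - toy["On_Hand"]).clip(lower=0).astype(int)

print(toy)
```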

Select final columns to store

In [None]:
forecast_df = df[[
    "Date_Vente",
    "SKU",
    "Predicted_Sales",
    "StockOutRisk",
    "Reorder_Quantity"
]]

forecast_df.head()

| | Date_Vente | SKU | Predicted_Sales | StockOutRisk | Reorder_Quantity |
|---|---|---|---|---|---|
| 0 | 2024-01-10 | SP-129 | 1.949204 | 0 | 2.0 |
| 1 | 2025-04-16 | CHE-90 | 2.032294 | 0 | 2.0 |
| 2 | 2024-10-05 | QAM-69 | 2.202162 | 0 | 3.0 |
| 3 | 2024-09-03 | QAM-56 | 1.969382 | 0 | 2.0 |
| 4 | 2024-05-06 | LIN-113 | 1.981644 | 0 | 2.0 |


# Store Predictions in SQL Server

In [None]:
forecast_df.to_csv("Sales_Forecasts.csv", index=False)

Forecasted sales, stock-out risk flags, and reorder quantities are exported to Sales_Forecasts.csv for loading into a dedicated SQL Server table; the load itself is not performed in this notebook. Once the table is populated, the forecasts persist and are accessible for downstream analytics. This enables reporting through BI tools and supports data-driven operational planning by allowing stakeholders to anticipate demand and proactively manage inventory levels.
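
Writing the forecasts to a database table can be sketched with pandas' `DataFrame.to_sql`. The example below uses an in-memory SQLite database so that it runs anywhere. For SQL Server, `to_sql` would instead need a SQLAlchemy engine pointing at the server, and that connection is an assumption not shown in the notebook. The table name `Sales_Forecasts` and the toy rows are illustrative.

```python
import sqlite3
import pandas as pd

# Toy rows shaped like forecast_df (values are illustrative only).
toy_forecasts = pd.DataFrame({
    "Date_Vente": ["2024-01-10", "2025-04-16"],
    "SKU": ["SP-129", "CHE-90"],
    "Predicted_Sales": [1.95, 2.03],
    "StockOutRisk": [0, 0],
    "Reorder_Quantity": [2.0, 2.0],
})

# In-memory SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")

# Replace the table on each run so reruns do not append duplicate rows.
toy_forecasts.to_sql("Sales_Forecasts", conn, if_exists="replace", index=False)

# Read the rows back to confirm the write.
stored = pd.read_sql("SELECT * FROM Sales_Forecasts", conn)
conn.close()

print(stored)
```

Using `if_exists="replace"` keeps the table in sync with the latest run. Using `"append"` would instead retain earlier forecasts, which may be preferable if forecast history is needed for reporting.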