# Project 4: Inventory Demand Forecasting

**Type:** Regression | **Level:** Intermediate

**Goal:** Forecast product demand (order item quantity) from historical supply chain data to help optimize stock levels.

**Model:** Random Forest Regressor / XGBoost

**Dataset:** DataCo Smart Supply Chain (or any inventory CSV)

## 1. Setup & Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
import joblib

sns.set_style('whitegrid')
print("✅ Regression environment ready!")

## 2. Load and Inspect Data

In [None]:
# Load the dataset (make sure the file is in the data folder)
# If you are using DataCoSupplyChain, you may need encoding='ISO-8859-1'
df = pd.read_csv('../data/DataCoSupplyChainDataset.csv', encoding='ISO-8859-1')

# Keep the columns relevant for predicting demand (Order Item Quantity):
# shipping times, order value, product category and order region
cols_to_keep = ['Days for shipping (real)', 'Days for shipment (scheduled)', 'Benefit per order', 
                'Sales per customer', 'Category Name', 'Order Region', 'Order Item Quantity']

df = df[cols_to_keep]
print(f"Dataset loaded with {df.shape[0]} rows.")
df.head()
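Before preprocessing, a few quick checks show which columns are still text, where values are missing and how the target is distributed. A minimal sketch of those checks follows; to keep it self-contained it runs on a made-up five-row frame (`toy`, with hypothetical values) that reuses three of the selected column names. In the notebook, the same calls apply to `df`.

```python
import pandas as pd

# Hypothetical toy rows that mimic three of the selected DataCo columns.
# In the notebook, run the same checks on the real `df` instead of `toy`.
toy = pd.DataFrame({
    "Category Name": ["Cleats", "Cleats", "Fishing", None, "Fishing"],
    "Order Region": ["Western Europe", "Caribbean", "Caribbean", "Western Europe", "South Asia"],
    "Order Item Quantity": [1, 5, 1, 2, 1],
})

# Column types: object (text) columns must be encoded before modelling
print(toy.dtypes)

# Number of missing values per column
missing = toy.isna().sum()
print(missing)

# Target distribution: how many order lines have each quantity
target_counts = toy["Order Item Quantity"].value_counts().sort_index()
print(target_counts)
```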

## 3. Data Preprocessing

In [None]:
# Handle missing values first, so the encoders only see valid categories
df = df.dropna()

# Convert the categorical columns to integer codes.
# Use one encoder per column so each mapping can be inverted or reused later.
encoders = {}
for col in ['Category Name', 'Order Region']:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Define the target (demand = Order Item Quantity) and the features
X = df.drop('Order Item Quantity', axis=1)
y = df['Order Item Quantity']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("✅ Data splitting complete.")
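The cell above turns categories into integer codes with `LabelEncoder`, which scikit-learn documents as a tool for encoding target labels. For feature columns, `OrdinalEncoder` is the usual alternative: it encodes several columns at once, keeps a separate category list per column, and can map categories it never saw during fitting to a fixed value instead of raising an error. The sketch below fits it on training rows only, so the test rows cannot influence the encoding. The toy frames and the unseen `"Golf Balls"` category are made up for illustration, and `handle_unknown="use_encoded_value"` assumes scikit-learn 0.24 or newer.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training and test rows with the notebook's categorical columns
train_demo = pd.DataFrame({
    "Category Name": ["Cleats", "Fishing", "Cleats"],
    "Order Region": ["Caribbean", "South Asia", "Caribbean"],
})
test_demo = pd.DataFrame({
    "Category Name": ["Fishing", "Golf Balls"],  # "Golf Balls" is not in the training rows
    "Order Region": ["South Asia", "Caribbean"],
})

cat_cols = ["Category Name", "Order Region"]

# One encoder for both columns; categories are learned per column from training data only.
# Categories unseen during fit are encoded as -1 instead of raising an error.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_demo[cat_cols] = enc.fit_transform(train_demo[cat_cols])
test_demo[cat_cols] = enc.transform(test_demo[cat_cols])

print(enc.categories_)
print(test_demo)
```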

## 4. Model Training (Random Forest Regressor)

In [None]:
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("✅ Model trained successfully!")
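`joblib` is imported in the setup cell but never used. A likely purpose is persisting the trained model so it can be reused without retraining; the sketch below shows a save/load round trip on a small synthetic model (`X_demo`, `y_demo` and `demo_model` are illustrative names chosen so they do not overwrite the notebook's variables). The file name and temporary directory are assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic regression problem standing in for X_train / y_train
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 2 * X_demo[:, 0] + rng.normal(scale=0.1, size=50)

demo_model = RandomForestRegressor(n_estimators=20, random_state=42)
demo_model.fit(X_demo, y_demo)

# Save the fitted model and load it back (the file name is illustrative)
path = os.path.join(tempfile.mkdtemp(), "demand_rf.joblib")
joblib.dump(demo_model, path)
restored = joblib.load(path)

# The reloaded model should reproduce the original predictions
same = np.allclose(demo_model.predict(X_demo), restored.predict(X_demo))
print("Predictions identical after reload:", same)
```

In the notebook the same two calls would be `joblib.dump(model, path)` and `joblib.load(path)`, with `path` pointing to a directory that already exists.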

## 5. Evaluation & Visualization
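For reference, these are the standard definitions of the regression metrics used in this section, where $y_i$ is the actual quantity of test row $i$, $\hat{y}_i$ the predicted quantity, $\bar{y}$ the mean actual quantity and $n$ the number of test rows:

$$
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

MAE and RMSE are expressed in units of quantity, and RMSE penalizes large misses more heavily. $R^2$ is not an accuracy: it equals 1 for perfect predictions, 0 for a model no better than always predicting $\bar{y}$, and it can be negative.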

In [None]:
y_pred = model.predict(X_test)

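mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"📊 Mean Absolute Error: {mae:.2f}")
print(f"📊 Root Mean Squared Error: {rmse:.2f}")
print(f"📊 R² Score (coefficient of determination): {r2:.4f}")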

# Visualization: predicted vs. actual demand
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.3, color='blue')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual Demand')
plt.ylabel('Predicted Demand')
plt.title('Actual vs Predicted Inventory Demand')
plt.show()
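The metrics above are easier to judge next to a trivial baseline. A minimal sketch, on synthetic data rather than the DataCo file, that compares the random forest with scikit-learn's `DummyRegressor` (which always predicts the mean training quantity) using the same MAE, RMSE and R² calls. The data-generating formula and the `evaluate` helper are illustrative assumptions.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the demand data: quantity depends on the first feature only
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(400, 4))
y_demo = np.clip(np.round(3 + 1.5 * X_demo[:, 0] + rng.normal(scale=0.5, size=400)), 1, None)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

def evaluate(estimator):
    """Fit on the training split and return (MAE, RMSE, R2) on the test split."""
    estimator.fit(Xd_train, yd_train)
    pred = estimator.predict(Xd_test)
    mae = mean_absolute_error(yd_test, pred)
    rmse = np.sqrt(mean_squared_error(yd_test, pred))
    return mae, rmse, r2_score(yd_test, pred)

# Baseline that always predicts the mean training quantity
baseline = evaluate(DummyRegressor(strategy="mean"))
forest = evaluate(RandomForestRegressor(n_estimators=100, random_state=42))

for name, (mae, rmse, r2) in [("mean baseline", baseline), ("random forest", forest)]:
    print(f"{name:>14}: MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

A mean baseline scores an R² of at most 0 on the test split, so a model is only adding value when its MAE and RMSE are clearly below the baseline's.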

## 6. Feature Importance
See which features the model relies on most when predicting demand.

In [None]:
importances = model.feature_importances_
feature_names = X.columns
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': importances}).sort_values(by='Importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=feature_importance_df, palette='viridis')
plt.title('Key Drivers of Product Demand')
plt.show()
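`feature_importances_` is the impurity-based importance computed during training, and the scikit-learn documentation notes it can favour features with many distinct values, such as label-encoded categories. Permutation importance on held-out data is a common complementary check: shuffle one column at a time and measure how much the score (R² by default for regressors) drops. A minimal sketch on synthetic data, where only the hypothetical `signal` column actually drives the target:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only "signal" drives the target; the other two columns are noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=500),
    "noise_a": rng.normal(size=500),
    "noise_b": rng.integers(0, 20, size=500),  # integer codes, like an encoded category
})
y_demo = 2.0 * X_demo["signal"] + rng.normal(scale=0.3, size=500)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
demo_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xd_train, yd_train)

# Shuffle one column at a time on the held-out set and record the drop in R2
result = permutation_importance(demo_model, Xd_test, yd_test, n_repeats=10, random_state=0)

perm_df = pd.DataFrame({
    "Feature": X_demo.columns,
    "Importance": result.importances_mean,
    "Std": result.importances_std,
}).sort_values(by="Importance", ascending=False)
print(perm_df)
```

In the notebook, the equivalent call would be `permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)`. If a column derived from order value, such as 'Sales per customer', dominates either ranking, it is worth checking whether it is computed from the quantity itself, since that would let the model read the target directly from the features.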