# 🛍️ Retail Price Prediction & Recommendation
A step‑by‑step Jupyter Notebook that trains a Random Forest regressor on retail pricing data, evaluates it on a held‑out test set, and turns its predictions into *Raise / Keep / Discount* price recommendations.

**Walk‑through outline**
1. Data load & preview
2. Feature engineering (one‑hot encoding)
3. Train/test split & model training (Random Forest)
4. Evaluation metrics (RMSE & R²)
5. Business rule → *Raise / Keep / Discount*
6. Visual diagnostics

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score


## 1 · Load the data

In [None]:

DATA_PATH = 'retail_price.csv'  # adjust if located elsewhere
df = pd.read_csv(DATA_PATH)
print(f"Data shape: {df.shape}")
df.head()


## 2 · Define target and encode categorical features

In [None]:

target_col = 'unit_price'
X = df.drop(columns=[target_col])
y = df[target_col]

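# One‑hot encode categorical columns; drop_first=True drops one level per
# feature so the remaining dummy columns are not redundant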
X_encoded = pd.get_dummies(X, drop_first=True)
print(f"Encoded matrix shape: {X_encoded.shape}")
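
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a made‑up frame. The column names `category` and `qty` are assumptions for illustration only and are not taken from `retail_price.csv`.

```python
import pandas as pd

# Hypothetical toy data; not the real retail dataset
toy = pd.DataFrame({
    "category": ["garden", "health", "garden", "toys"],
    "qty": [3, 1, 2, 5],
})

# Levels are sorted, and the first one ("garden") is dropped,
# so it becomes the implicit baseline encoded as all zeros
toy_encoded = pd.get_dummies(toy, drop_first=True)
print(toy_encoded)
```

Numeric columns such as `qty` pass through unchanged; only the text column is expanded into indicator columns.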


## 3 · Train/test split & model training

In [None]:

X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42)

# 300 trees, fixed seed for reproducibility, n_jobs=-1 uses all CPU cores
rf = RandomForestRegressor(n_estimators=300, random_state=42, n_jobs=-1)
rf.fit(X_train, y_train)


## 4 · Evaluation

In [None]:

y_pred = rf.predict(X_test)
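# RMSE is the square root of the MSE; taking the root explicitly avoids the
# `squared=False` argument, which recent scikit-learn releases no longer accept
rmse = np.sqrt(mean_squared_error(y_test, y_pred))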
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.3f}")
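
As a sanity check on what the two metrics measure, the sketch below computes RMSE and R² by hand with NumPy. The four price pairs are made up for illustration and have nothing to do with the model's real output.

```python
import numpy as np

# Made-up values purely for illustration
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 37.0])

residuals = actual - predicted

# RMSE: root of the mean squared residual, in the same units as the price
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.2f}")
print(f"R²: {r2_manual:.3f}")
```

RMSE reads as a typical error in price units, while R² is the share of price variance the predictions account for relative to always predicting the mean.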


## 5 · Generate price recommendations

In [None]:

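# Tolerance band: predictions within ±5% of the actual price count as "Keep price"
band = 0.05
actions = []
for actual, pred in zip(y_test, y_pred):
    if pred >= actual * (1 + band):
        # Model values the item at least 5% above its current price
        actions.append('Raise price')
    elif pred <= actual * (1 - band):
        # Model values the item at least 5% below its current price
        actions.append('Discount')
    else:
        actions.append('Keep price')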

results = pd.DataFrame({
    'Actual Price': y_test.values,
    'Predicted Price': y_pred,
    'Error': y_test.values - y_pred,
    'Action': actions
})
results.head(20)
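
The same banding rule can also be written without an explicit loop. The following is a sketch of one possible vectorized variant using `np.select`; the helper name `recommend_actions` and the demo prices are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def recommend_actions(actual, predicted, band=0.05):
    """Hypothetical helper: label each item by comparing predicted vs actual price."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    conditions = [
        predicted >= actual * (1 + band),  # predicted at least `band` above actual
        predicted <= actual * (1 - band),  # predicted at least `band` below actual
    ]
    choices = ["Raise price", "Discount"]
    # Rows matching neither condition fall inside the band
    return np.select(conditions, choices, default="Keep price")

# Made-up prices for illustration
demo = pd.DataFrame({
    "actual": [100.0, 100.0, 100.0],
    "predicted": [110.0, 102.0, 90.0],
})
demo["action"] = recommend_actions(demo["actual"], demo["predicted"])
print(demo)
```

With the notebook's own variables, the call would be `recommend_actions(y_test, y_pred, band)`, assuming both have the same length and order.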


## 6 · Visual diagnostics

In [None]:

# Actual vs Predicted
plt.figure()
plt.scatter(y_test, y_pred)
# Diagonal reference line: points on it are predicted exactly
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()],
         color='red', linestyle='--')
plt.xlabel('Actual Unit Price')
plt.ylabel('Predicted Unit Price')
plt.title('Actual vs Predicted Unit Price')
plt.grid(True)
plt.tight_layout()
plt.show()

# Residuals histogram
plt.figure()
plt.hist(y_test - y_pred, bins=30)
plt.xlabel('Residual (Actual - Predicted)')
plt.ylabel('Frequency')
plt.title('Residual Distribution')
plt.grid(True)
plt.tight_layout()
plt.show()

# Feature importances
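importances = rf.feature_importances_
# argsort is ascending, so the last 15 indices are the 15 largest importances;
# barh draws bottom-to-top, which puts the most important feature at the top
indices = np.argsort(importances)[-15:]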
plt.figure()
plt.barh(range(len(indices)), importances[indices])
plt.yticks(range(len(indices)), X_encoded.columns[indices])
plt.xlabel('Importance')
plt.title('Top 15 Feature Importances')
plt.tight_layout()
plt.show()
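
The `np.argsort(...)[-k:]` idiom used for the importance chart is easy to misread, so here is a small sketch with made‑up importances and placeholder feature names showing which indices it selects and in what order.

```python
import numpy as np

# Made-up importances and placeholder feature names for illustration
importances_demo = np.array([0.10, 0.40, 0.05, 0.30, 0.15])
names_demo = np.array(["a", "b", "c", "d", "e"])

top_k = 3
# argsort returns indices in ascending order of importance,
# so the last top_k entries are the largest, smallest of them first
top_idx = np.argsort(importances_demo)[-top_k:]

plot_order = names_demo[top_idx].tolist()          # order passed to barh (bottom to top)
ranked_order = names_demo[top_idx][::-1].tolist()  # most important first

print(plot_order)
print(ranked_order)
```

Reversing the slice gives a ranked list for tables or reports, while the unreversed order suits a horizontal bar chart.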
