# Sales Forecasting with Linear Regression
To forecast monthly sales, we treat **ad spend**, **discount percentage**, and **customer footfall** as input features. Linear regression fits one coefficient per feature, plus an intercept, by minimising the sum of squared errors between predicted and actual sales. **5-Fold Cross-Validation** then evaluates the model on five different train/test splits, so the reported scores do not depend on a single lucky partition.
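
To make "minimising squared errors" concrete, here is a minimal sketch that fits a single-feature line with NumPy's least-squares solver and checks that nudging the fitted slope only increases the error. It uses just the ad spend and sales values from this notebook's data; the one-feature simplification and the helper `sse` are assumptions made for illustration, not part of the model built below.

```python
import numpy as np

# Illustrative one-feature subset of the data used in this notebook.
ad_spend_k = np.array([5, 6, 7, 8, 9, 10, 11, 12, 13, 14], dtype=float)
sales_k = np.array([50, 55, 60, 63, 68, 74, 78, 83, 88, 94], dtype=float)

# Design matrix: a column of ones for the intercept plus the feature.
A = np.column_stack([np.ones_like(ad_spend_k), ad_spend_k])

# Least-squares solution: the (intercept, slope) pair with the smallest SSE.
coef, *_ = np.linalg.lstsq(A, sales_k, rcond=None)


def sse(params):
    """Sum of squared errors for a given (intercept, slope) pair (hypothetical helper)."""
    residuals = sales_k - A @ params
    return float(residuals @ residuals)


best_sse = sse(coef)
nudged_sse = sse(coef + np.array([0.0, 0.1]))  # same intercept, slightly steeper line

print(f"intercept={coef[0]:.2f}, slope={coef[1]:.2f}")
print(f"SSE at fitted line: {best_sse:.2f}")
print(f"SSE after nudging the slope: {nudged_sse:.2f}")
```

`LinearRegression` solves the same kind of problem with all three features at once.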

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

In [None]:
# Monthly records: ad spend, footfall, and sales in thousands; discount in percent
data = pd.DataFrame({
    "ad_spend_k": [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "discount_pct": [5, 5, 7, 7, 10, 10, 12, 12, 15, 15],
    "footfall_k": [20, 22, 24, 25, 27, 29, 30, 32, 34, 36],
    "sales_k": [50, 55, 60, 63, 68, 74, 78, 83, 88, 94]
})

X = data[["ad_spend_k", "discount_pct", "footfall_k"]]
y = data["sales_k"]

In [None]:
model = LinearRegression()
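# Shuffle rows before splitting; the fixed seed keeps the folds reproducible.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn scorers follow "higher is better", so MSE is returned negated; flip the sign.
mse_scores = -cross_val_score(model, X, y, cv=kf, scoring="neg_mean_squared_error")
# With 10 rows and 5 folds, each test fold holds only 2 rows, so per-fold R^2 is very noisy.
r2_scores = cross_val_score(model, X, y, cv=kf, scoring="r2")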

print("Average MSE:", mse_scores.mean())
print("Average R^2:", r2_scores.mean())
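
An average can hide how much the folds disagree, so it may help to look at the per-fold scores and their spread. The sketch below is one possible way to do that with `cross_validate`; reporting mean absolute error alongside MSE is an assumption added here because it reads in the same units as sales (thousands), not something the original analysis used.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Same monthly records as in the data cell of this notebook.
data = pd.DataFrame({
    "ad_spend_k": [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "discount_pct": [5, 5, 7, 7, 10, 10, 12, 12, 15, 15],
    "footfall_k": [20, 22, 24, 25, 27, 29, 30, 32, 34, 36],
    "sales_k": [50, 55, 60, 63, 68, 74, 78, 83, 88, 94],
})
X = data[["ad_spend_k", "discount_pct", "footfall_k"]]
y = data["sales_k"]

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Request both metrics in one pass; results are keyed as "test_<scorer name>".
results = cross_validate(
    LinearRegression(), X, y, cv=kf,
    scoring=("neg_mean_squared_error", "neg_mean_absolute_error"),
)
fold_mse = -results["test_neg_mean_squared_error"]  # undo the sign flip
fold_mae = -results["test_neg_mean_absolute_error"]

for fold, (mse, mae) in enumerate(zip(fold_mse, fold_mae), start=1):
    print(f"Fold {fold}: MSE={mse:.2f}, MAE={mae:.2f}k")

print(f"MSE mean={fold_mse.mean():.2f}, std={fold_mse.std():.2f}")
print(f"MAE mean={fold_mae.mean():.2f}k, std={fold_mae.std():.2f}k")
```

A large standard deviation relative to the mean would suggest the average alone is not a reliable summary on a dataset this small.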

In [None]:
# Refit on all ten months before predicting the planned scenario.
model.fit(X, y)
future_point = pd.DataFrame({"ad_spend_k": [15], "discount_pct": [12], "footfall_k": [38]})
predicted_sales = model.predict(future_point)[0]
print(f"Predicted sales for future plan: {predicted_sales:.1f}k")
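
Before relying on this prediction, it can be worth checking whether the planned values fall inside the range the model was trained on, since a linear fit says little about behaviour beyond the observed data. The following is a minimal sketch of such a check; the variable `outside_range` is a name introduced here for illustration.

```python
import pandas as pd

# Feature columns from the training data and the planned scenario used above.
data = pd.DataFrame({
    "ad_spend_k": [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "discount_pct": [5, 5, 7, 7, 10, 10, 12, 12, 15, 15],
    "footfall_k": [20, 22, 24, 25, 27, 29, 30, 32, 34, 36],
})
future_point = pd.DataFrame({"ad_spend_k": [15], "discount_pct": [12], "footfall_k": [38]})

# Flag each feature whose planned value lies outside the observed min-max range.
outside_range = [
    col for col in future_point.columns
    if not data[col].min() <= future_point[col].iloc[0] <= data[col].max()
]
print("Features outside the training range:", outside_range)
```

Here the planned ad spend (15) and footfall (38) both exceed the largest values in the training data (14 and 36), so the forecast is an extrapolation and should be read with extra caution.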