# Internship Task: Monthly Price Forecasting

## Introduction
This notebook covers the end-to-end process of forecasting average monthly prices using historical data. 
The steps include data cleaning, model selection (SARIMA), evaluation, and recommendations for business actions and deployment.


## 1. Data Cleaning
We start by loading the dataset, handling missing values, and visualizing the time series.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Load Data
df = pd.read_csv('data.csv')

# Convert date to datetime
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Check for missing values
print("Missing values:\n", df.isnull().sum())

# Visualize
plt.figure(figsize=(12, 6))
plt.plot(df['avg_monthly_price'], label='Average Monthly Price')
plt.title('Historical Monthly Prices')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.show()

# Handle Missing/Anomalies
# Fill any gaps by linear interpolation (a no-op when nothing is missing):
df['avg_monthly_price'] = df['avg_monthly_price'].interpolate(method='linear')


## 2. Model Description
We use **SARIMA (Seasonal AutoRegressive Integrated Moving Average)**.

### Why SARIMA?
- **Seasonality**: The data is monthly and likely exhibits yearly seasonality (repeating patterns every 12 months).
- **Trend**: Price data often has trends (inflation, market growth).
- **Flexibility**: SARIMA handles both trend and seasonality explicitly.

The model is defined by parameters $(p, d, q) \times (P, D, Q, s)$ where $s=12$.
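
For reference, the general form can be written with the backshift operator $B$ (where $B y_t = y_{t-1}$). The second line specializes it to the $(1, 1, 1) \times (1, 1, 1, 12)$ configuration used as a starting point in this notebook. The moving-average terms are written with a plus sign here, which is a common software convention; some textbooks use a minus sign instead.

$$
\phi_p(B)\,\Phi_P(B^{s})\,(1 - B)^{d}\,(1 - B^{s})^{D}\, y_t = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
$$

$$
(1 - \phi_1 B)(1 - \Phi_1 B^{12})(1 - B)(1 - B^{12})\, y_t = (1 + \theta_1 B)(1 + \Theta_1 B^{12})\,\varepsilon_t
$$

Here $(1 - B)^d$ removes the trend through $d$ regular differences, $(1 - B^{s})^D$ removes the yearly pattern through $D$ seasonal differences, and $\varepsilon_t$ is white noise.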

## 3. Model Evaluation
We split the data into **Training** (all except last 12 months) and **Test** (last 12 months) sets to validate predictive performance.

In [None]:
# Train/Test Split
test_steps = 12
train_data = df.iloc[:-test_steps]
test_data = df.iloc[-test_steps:]

print(f"Training samples: {len(train_data)}, Test samples: {len(test_data)}")

# Fit SARIMA Model
# We use a standard configuration (1, 1, 1)x(1, 1, 1, 12) as a starting point.
# In a production setting, auto_arima would be used to find optimal parameters.
model = SARIMAX(train_data['avg_monthly_price'], 
                order=(1, 1, 1), 
                seasonal_order=(1, 1, 1, 12),
                enforce_stationarity=False,
                enforce_invertibility=False)

sarima_results = model.fit(disp=False)
print(sarima_results.summary())

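# Predict on Test Set
# Out-of-sample predictions for the held-out months. SARIMAX predictions are
# already on the original price scale, so the `typ='levels'` argument from the
# legacy ARIMA API is dropped.
start = len(train_data)
end = len(train_data) + len(test_data) - 1
predictions = sarima_results.predict(start=start, end=end, dynamic=False)
predictions.index = test_data.index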

# Metrics
rmse = np.sqrt(mean_squared_error(test_data['avg_monthly_price'], predictions))
mae = mean_absolute_error(test_data['avg_monthly_price'], predictions)
mape = np.mean(np.abs(predictions - test_data['avg_monthly_price'])/np.abs(test_data['avg_monthly_price'])) * 100

print(f"RMSE: {rmse:.2f}")
print(f"MAE: {mae:.2f}")
print(f"MAPE: {mape:.2f}%")

# Plot Results
plt.figure(figsize=(12, 6))
plt.plot(train_data['avg_monthly_price'], label='Training Data')
plt.plot(test_data['avg_monthly_price'], label='Actual Prices')
plt.plot(predictions, label='Predicted Prices (Test)', color='red')
plt.title('Model Evaluation: Actual vs Predicted')
plt.legend()
plt.grid(True)
plt.show()

## 4. Suggested Actions
Based on predicted price changes:
1. **Inventory Management**: If prices are predicted to rise, stock up on inventory now to reduce costs later.
2. **Dynamic Pricing**: Adjust selling prices proactively. If the market price is forecast to drop, offer discounts to maintain volume; if prices are forecast to rise, increase margins.
3. **Hedging**: Lock in supplier contracts at current rates if a sharp increase is forecast.
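
One way to turn the list above into a repeatable rule is to compare the current price with the mean of the forecast. The sketch below is illustrative only: `suggest_action`, the 5% threshold, and the action labels are assumptions, not part of the notebook, and a real policy would also weigh storage costs and forecast uncertainty.

```python
from statistics import mean


def suggest_action(current_price, forecast_prices, threshold_pct=5.0):
    """Map the forecast price change to a coarse action label.

    threshold_pct is a hypothetical cutoff; tune it to the business context.
    """
    if current_price <= 0 or not forecast_prices:
        raise ValueError("need a positive current price and a non-empty forecast")

    # Percentage change from the current price to the mean forecast price.
    change_pct = (mean(forecast_prices) - current_price) / current_price * 100

    if change_pct >= threshold_pct:
        return change_pct, "stock up / lock in supplier contracts"
    if change_pct <= -threshold_pct:
        return change_pct, "delay purchases / consider discounts"
    return change_pct, "hold current plan"


change, action = suggest_action(100.0, [104.0, 108.0, 112.0])
print(f"{change:+.1f}% -> {action}")
```

In the notebook, `forecast_prices` could be `forecast_values.tolist()` and `current_price` the last observed value of `avg_monthly_price`.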

## 5. Effectiveness Measurement
To determine if actions are effective:
1. **Margin Analysis**: Compare profit margins before and after implementing the strategy.
2. **Forecast Accuracy Tracking**: Continuously compare model predictions with actuals. Decisions based on the forecast are only as reliable as its accuracy, so a rising error is a signal to revisit them.
3. **A/B Testing**: Implement pricing changes in one region/segment and compare with a control group.
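
The A/B comparison can be sketched with the standard library alone. The function below assumes one margin observation per store or period in each group; `ab_margin_uplift` and the sample figures are hypothetical. It reports the mean difference and Welch's t statistic, which does not assume equal variances. A full analysis would also compute a p-value and check that the two groups are comparable.

```python
from math import sqrt
from statistics import mean, stdev


def ab_margin_uplift(treated_margins, control_margins):
    """Return (mean difference treated - control, Welch's t statistic)."""
    if len(treated_margins) < 2 or len(control_margins) < 2:
        raise ValueError("each group needs at least two observations")

    diff = mean(treated_margins) - mean(control_margins)

    # Standard error of the difference without assuming equal variances.
    se = sqrt(
        stdev(treated_margins) ** 2 / len(treated_margins)
        + stdev(control_margins) ** 2 / len(control_margins)
    )
    if se == 0:
        raise ValueError("no variation in either group; t statistic is undefined")
    return diff, diff / se


# Hypothetical profit margins (as fractions) for treated and control segments.
treated = [0.22, 0.25, 0.24, 0.27]
control = [0.20, 0.21, 0.19, 0.22]
uplift, t_stat = ab_margin_uplift(treated, control)
print(f"uplift={uplift:.3f}, t={t_stat:.2f}")
```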

## 6. Model Deployment (Django)
To deploy this model using Django:
1. **Serialize**: Save the fitted results object (for example `final_results`, not the unfitted `SARIMAX` model) using `pickle` or `joblib` (e.g., `model.pkl`).
2. **Django View**: Create a view in Django that loads the model (lazily or on startup).
3. **API Endpoint**: Expose an endpoint (e.g., `/api/predict/`) that accepts a date or horizon as input.
4. **Response**: The view calls the fitted results object's forecasting method (e.g., `get_forecast(steps=...)`) and returns the result as JSON.

## 7. Integration into Web App
- **Frontend**: A form where users select a date range.
- **Backend**: Django view processes the request, loads `model.pkl`, performs inference, and passes data to the template.
- **Visualization**: Use Chart.js or D3.js in the template to render the forecast line chart dynamically based on the API response.
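
On the backend side, the forecast has to be shaped into the `labels`/`datasets` structure that Chart.js line charts consume. The helper below is a sketch under the assumption that each forecast row is a dict with the hypothetical keys `date`, `price`, `lower`, and `upper`; `to_chartjs_data` is an illustrative name.

```python
import json


def to_chartjs_data(rows):
    """Convert forecast rows into a Chart.js-style data object."""
    return {
        "labels": [row["date"] for row in rows],
        "datasets": [
            {"label": "Forecast", "data": [row["price"] for row in rows]},
            {"label": "Lower bound", "data": [row["lower"] for row in rows]},
            {"label": "Upper bound", "data": [row["upper"] for row in rows]},
        ],
    }


rows = [
    {"date": "2025-01-01", "price": 10.0, "lower": 9.0, "upper": 11.0},
    {"date": "2025-02-01", "price": 11.5, "lower": 10.0, "upper": 13.0},
]
print(json.dumps(to_chartjs_data(rows), indent=2))
```

The resulting dict can be returned from the API as JSON or embedded in the template (for example with Django's `json_script` filter) and passed to the chart as its `data` option.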

## 8. Monitoring in Production
- **Drift Detection**: Monitor the distribution of incoming prices and of the model's predictions. Significant deviations from the training period suggest data drift or concept drift.
- **Performance Tracking**: Log every prediction and the subsequent actual value. Recalculate RMSE/MAPE each month as new actuals arrive.
- **Retraining**: Set up a Celery task to retrain the model monthly with the latest data to keep it fresh.
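
Performance tracking can be sketched as a small check over the logged pairs. The code below assumes the log is a chronological list of `(predicted, actual)` tuples; the function names, the 10% MAPE threshold, and the three-month window are hypothetical choices rather than recommendations.

```python
from math import sqrt


def error_metrics(records):
    """Compute RMSE and MAPE (in percent) from (predicted, actual) pairs."""
    if not records:
        raise ValueError("no logged predictions to evaluate")

    sq_errors = [(pred - actual) ** 2 for pred, actual in records]
    # Skip zero actuals, for which the percentage error is undefined.
    pct_errors = [
        abs(pred - actual) / abs(actual) for pred, actual in records if actual != 0
    ]

    rmse = sqrt(sum(sq_errors) / len(sq_errors))
    mape = 100 * sum(pct_errors) / len(pct_errors) if pct_errors else float("nan")
    return rmse, mape


def needs_retraining(records, mape_threshold=10.0, window=3):
    """Flag retraining when MAPE over the latest `window` records exceeds the threshold."""
    _, recent_mape = error_metrics(records[-window:])
    return recent_mape > mape_threshold


# Hypothetical log of monthly (predicted, actual) prices.
log = [(100, 102), (105, 104), (110, 100), (115, 98), (120, 101)]
print(error_metrics(log), needs_retraining(log))
```

A check like `needs_retraining` could run inside the scheduled retraining task, so the model is also refreshed early when recent errors climb.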

## Future Forecast (Next 12 Months)
Finally, we retrain the model on the full dataset and predict the next 12 months.

In [None]:
# Retrain on Full Data
final_model = SARIMAX(df['avg_monthly_price'], 
                      order=(1, 1, 1), 
                      seasonal_order=(1, 1, 1, 12),
                      enforce_stationarity=False,
                      enforce_invertibility=False)
final_results = final_model.fit(disp=False)

# Forecast Next 12 Months
future_steps = 12
future_forecast = final_results.get_forecast(steps=future_steps)
forecast_ci = future_forecast.conf_int()
forecast_values = future_forecast.predicted_mean

print("Forecast for next 12 months:")
print(forecast_values)

# Plot Forecast
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['avg_monthly_price'], label='Historical Data')
plt.plot(forecast_values.index, forecast_values, label='Future Forecast', color='green')
plt.fill_between(forecast_values.index, 
                 forecast_ci.iloc[:, 0], 
                 forecast_ci.iloc[:, 1], color='k', alpha=0.1, label='95% Confidence Interval')
plt.title('Future Price Forecast (Next 12 Months)')
plt.legend()
plt.grid(True)
plt.show()