# AAVAIL Revenue Prediction - Part 2: Model Iteration

## Assignment 02: Time-Series Forecasting Models

**Objective**: Compare different modeling approaches for predicting total revenue over the next 30 days

**Models to Compare**:
1. ARIMA - Traditional statistical time-series model
2. Exponential Smoothing - Holt-Winters method (level, trend, and seasonality)
3. Random Forest - Machine learning on engineered features
4. Gradient Boosting - Boosted tree ensemble
5. LSTM - Recurrent neural network (deep learning)
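The tree-based models (Random Forest, Gradient Boosting) cannot consume a raw time series; it has to be reframed as a table of features and targets. A minimal sketch of one such framing, assuming a plain list of daily revenue values, lagged values plus a trailing mean as features, and the sum of the next 30 days as the target. The helper `make_supervised` and the lag choices are hypothetical and not taken from `model_approaches`.

```python
def make_supervised(daily_revenue, lags=(1, 7, 14), horizon=30, window=7):
    """Turn a daily revenue series into (features, target) rows.

    Hypothetical helper: features for forecast origin t are revenue at
    t - lag for each lag, plus the mean of the `window` days before t.
    The target is total revenue over days t .. t + horizon - 1.
    """
    max_lag = max(max(lags), window)
    X, y = [], []
    # Each origin t needs max_lag days of history and horizon days of future.
    for t in range(max_lag, len(daily_revenue) - horizon + 1):
        lag_feats = [daily_revenue[t - lag] for lag in lags]
        trailing_mean = sum(daily_revenue[t - window:t]) / window
        X.append(lag_feats + [trailing_mean])
        y.append(sum(daily_revenue[t:t + horizon]))
    return X, y
```

Because features only look backward from `t` and the target only looks forward, no row leaks future revenue into its own features.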

```python
# Import libraries
import sys
import os
sys.path.append('../src')  # make the project's custom modules in src/ importable

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Import custom modules. These depend on statsmodels; if this cell raises
# "ModuleNotFoundError: No module named 'statsmodels'", run `pip install statsmodels`.
from data_ingestion import load_retail_data
from model_approaches import TimeSeriesModelApproaches, run_model_comparison

print("Libraries imported successfully!")
```

```python
# Load processed data from Part 1
print("Loading processed data...")

# Load the focused dataset (top 10 countries)
df_focused = pd.read_csv('../data/processed/focused_data_top10.csv')
df_focused['date'] = pd.to_datetime(df_focused['date'])

print(f"Data loaded: {len(df_focused):,} records")
print(f"Date range: {df_focused['date'].min()} to {df_focused['date'].max()}")
print(f"Countries: {df_focused['country'].nunique()}")
```
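The forecasting models operate on a daily revenue series rather than on individual records. A minimal sketch of that aggregation with pandas, assuming the processed file has a numeric revenue column; the column name `price` and the helper `daily_revenue` are assumptions, since Part 1's schema is not shown here. Days without transactions are filled with zero so the series has no gaps.

```python
import pandas as pd

def daily_revenue(df, country=None, value_col="price"):
    """Aggregate transaction records into a gap-free daily revenue Series.

    `value_col` is an assumed column name; adjust it to match Part 1's output.
    country=None combines all countries in the frame.
    """
    if country is not None:
        df = df[df["country"] == country]
    # Sum revenue per calendar day (normalize() drops any time-of-day part).
    daily = df.groupby(df["date"].dt.normalize())[value_col].sum()
    # Reindex over the full date range so days with no sales appear as 0.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    return daily.reindex(full_range, fill_value=0)
```

If the column name matches, `daily_revenue(df_focused)` would give the combined series and `daily_revenue(df_focused, country='United Kingdom')` the UK series.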

```python
# Run model comparison for all countries combined
print("Running model comparison for all top 10 countries combined...")

results_all, comparison_all = run_model_comparison(df_focused, country=None)

print("\nModel Comparison Results:")
print(comparison_all)
```
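The internals of `run_model_comparison` are not shown in this notebook. A common protocol for this kind of comparison, sketched here as an assumption rather than a description of that function, is to hold out the last 30 days chronologically, forecast them, and score the forecast with MAE and MAPE. The helper names are hypothetical.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.05 means 5%).

    Days with zero actual revenue are skipped because the percentage
    error is undefined there; returns NaN if every day is zero.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        return math.nan
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def holdout_split(series, horizon=30):
    """Chronological split: all but the last `horizon` days are training data."""
    return series[:-horizon], series[-horizon:]
```

Returning MAPE as a fraction is consistent with the `:.2%` format used when the selected model's MAPE is printed.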

```python
# Run model comparison for United Kingdom (top country)
print("Running model comparison for United Kingdom...")

results_uk, comparison_uk = run_model_comparison(df_focused, country='United Kingdom')

print("\nUK Model Comparison Results:")
print(comparison_uk)
```

```python
# Select best model and prepare for deployment
modeler = TimeSeriesModelApproaches()
best_model_name, best_model_result = modeler.select_best_model(results_all)

print(f"\nSelected Best Model: {best_model_name}")
print(f"MAPE: {best_model_result['mape']:.2%}")
print(f"MAE: {best_model_result['mae']:.2f}")
print(f"30-day forecast: ${best_model_result['forecast_30d_sum']:,.2f}")
```
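The Summary states that the best model is chosen by MAPE. A minimal sketch of such a rule, assuming `results` maps each model name to a dict with `mape` and `mae` keys (the keys printed above); the MAE tie-breaker and the skipping of NaN scores are added assumptions, and this is not necessarily how `select_best_model` is implemented.

```python
import math

def pick_best(results):
    """Return (name, result) with the lowest MAPE, breaking ties on MAE.

    Hypothetical stand-in for select_best_model; models whose MAPE is
    missing or NaN are ignored.
    """
    valid = {
        name: res for name, res in results.items()
        if res.get("mape") is not None and not math.isnan(res["mape"])
    }
    if not valid:
        raise ValueError("no model produced a valid MAPE")
    best = min(valid, key=lambda n: (valid[n]["mape"], valid[n]["mae"]))
    return best, valid[best]
```

Sorting on the tuple `(mape, mae)` keeps MAPE as the primary criterion and only consults MAE when two models score identically.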

```python
# Save results and best model
import os
import pickle

# Create the output directories if they do not exist yet
os.makedirs('../reports', exist_ok=True)
os.makedirs('../models', exist_ok=True)

# Save model comparison results
comparison_all.to_csv('../reports/model_comparison_results.csv', index=False)

# Save the best model's name, its results, and the full comparison table
model_data = {
    'best_model_name': best_model_name,
    'best_model_result': best_model_result,
    'model_comparison': comparison_all.to_dict()
}

with open('../models/best_model_assignment02.pkl', 'wb') as f:
    pickle.dump(model_data, f)

print("Model results saved successfully!")
```
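Assignment 03 will need to read this file back. A minimal sketch of the round trip, using only `pickle` and a temporary directory so it runs anywhere; the dictionary layout mirrors `model_data` above, and the `load_model_data` helper and the placeholder values are hypothetical. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import os
import pickle
import tempfile

def load_model_data(path):
    """Load the saved model dict and check that the expected keys are present."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    expected = {"best_model_name", "best_model_result", "model_comparison"}
    missing = expected - set(data)
    if missing:
        raise KeyError(f"missing keys in {path}: {sorted(missing)}")
    return data

# Round trip with placeholder values (not real results).
sample = {
    "best_model_name": "placeholder",
    "best_model_result": {"mape": 0.0, "mae": 0.0},
    "model_comparison": {},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best_model_assignment02.pkl")
    with open(path, "wb") as f:
        pickle.dump(sample, f)
    loaded = load_model_data(path)
```

Validating the keys at load time makes a stale or mismatched file fail loudly instead of surfacing later as a confusing lookup error during deployment.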

## Summary

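Assignment 02 covered the following steps:
- Compared five modeling approaches (ARIMA, Holt-Winters exponential smoothing, Random Forest, Gradient Boosting, LSTM) on the combined top 10 countries and on the United Kingdom alone
- Selected the best-performing model based on MAPE
- Saved the comparison table to `../reports/model_comparison_results.csv`
- Saved the best model's name, results, and the comparison table to `../models/best_model_assignment02.pkl` for deployment in Assignment 03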