# Time Series Forecasting - Quick Start Tutorial

**Author**: Time Series Repository

**Date**: 2024

## Objective

This notebook demonstrates how to:
1. Load and visualize time series data
2. Apply different forecasting methods
3. Compare model performance
4. Choose the best model for your data

Perfect for thesis work (skripsi) and academic research!

## 1. Setup and Imports

In [None]:
# Add parent directory to path to import utils
import sys
sys.path.append('..')

# Import utilities
from utils.data_preprocessing import (
    train_test_split_temporal, 
    check_stationarity,
    normalize_data
)
from utils.visualization import (
    plot_time_series, 
    plot_forecast,
    plot_multiple_forecasts
)

# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("✓ All libraries imported successfully!")

## 2. Generate Sample Data

We'll create synthetic time series data with trend and seasonality.
You can replace this with your own data!
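
If you swap in your own data, the loading step might look like the sketch below. The CSV content and the column names `date` and `value` are placeholders, not part of this repository; an in-memory string stands in for a real file so the cell runs as-is.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice, pass a file path to pd.read_csv instead.
csv_text = """date,value
2020-01-31,101.2
2020-02-29,104.8
2020-03-31,103.5
2020-04-30,108.1
"""

own_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Sort chronologically and make sure no timestamp appears twice.
own_df = own_df.sort_index()
assert own_df.index.is_unique

# Keep a single float column named 'value' so later cells work unchanged.
own_df = own_df[["value"]].astype(float)
print(own_df.shape)
```

Assigning the result to `df` instead of `own_df` would let the remaining cells run on your series.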

In [None]:
# Generate sample time series data
# Month-end dates; pandas >= 2.2 deprecates the 'M' alias in favour of 'ME'
dates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='M')

# Components
trend = np.linspace(100, 200, len(dates))
seasonal = 10 * np.sin(2 * np.pi * np.arange(len(dates)) / 12)  # exact 12-month cycle
noise = np.random.normal(0, 5, len(dates))

# Combine
values = trend + seasonal + noise

# Create DataFrame
df = pd.DataFrame({'value': values}, index=dates)

print("Dataset created!")
print(f"Shape: {df.shape}")
print(f"Date range: {df.index[0]} to {df.index[-1]}")
print("\nFirst few rows:")
df.head()

## 3. Visualize the Data

In [None]:
# Plot the time series
plt.figure(figsize=(14, 6))
plt.plot(df.index, df['value'], linewidth=2)
plt.title('Time Series Data', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Summary statistics:")
df.describe()

## 4. Check Stationarity

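ARIMA assumes the series is stationary once it has been differenced, meaning its mean and variance do not drift over time. A series with a clear trend, like this one, will usually fail a stationarity test and need differencing (the `d` term in ARIMA).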
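
To see why differencing helps, the sketch below uses only NumPy on a made-up trend-plus-noise series (all names are illustrative, not from `utils`). It compares the mean of the first and second half of the series before and after first differencing; a large gap between the halves signals a drifting mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 48
series = np.linspace(100, 200, n) + rng.normal(0, 5, n)  # linear trend + noise

def half_means(x):
    """Return the mean of the first and second half of a 1-D array."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()

# First difference: y[t] - y[t-1]. The result is one observation shorter.
diffed = np.diff(series)

raw_first, raw_second = half_means(series)
diff_first, diff_second = half_means(diffed)
raw_gap = abs(raw_second - raw_first)
diff_gap = abs(diff_second - diff_first)

print(f"Gap between half-means, raw series: {raw_gap:.2f}")
print(f"Gap between half-means, differenced: {diff_gap:.2f}")
```

This is only an informal check; a formal test such as the one reported by `check_stationarity` is still the right tool for deciding whether to difference.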

In [None]:
# Check if data is stationary
stationarity = check_stationarity(df, column='value')

print("Stationarity Test Results:")
print(f"Test Statistic: {stationarity['test_statistic']:.4f}")
print(f"P-value: {stationarity['p_value']:.4f}")
print(f"Is Stationary: {stationarity['is_stationary']}")
print("\nCritical Values:")
for key, value in stationarity['critical_values'].items():
    print(f"  {key}: {value:.4f}")

if stationarity['is_stationary']:
    print("\n✓ Data is stationary - good for ARIMA!")
else:
    print("\n✗ Data is not stationary - may need differencing for ARIMA")

## 5. Split Data into Train and Test Sets

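Always split time series chronologically: the test set must come after the training set, so the model never sees the future. A random split would leak future observations into training and make the metrics look better than they are.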
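
`train_test_split_temporal` comes from this repository's `utils` package, whose source is not shown here. A minimal stand-in, assuming it simply keeps the final `test_size` fraction of rows as the test set, might look like this (`temporal_split` is a hypothetical name):

```python
import numpy as np
import pandas as pd

def temporal_split(data, test_size=0.2):
    """Split a time-ordered DataFrame without shuffling.

    The last `test_size` fraction of rows becomes the test set.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    n_test = max(1, int(round(len(data) * test_size)))
    return data.iloc[:-n_test], data.iloc[-n_test:]

# Small demo frame with a daily index.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
demo = pd.DataFrame({"value": np.arange(10.0)}, index=idx)

demo_train, demo_test = temporal_split(demo, test_size=0.2)
print(len(demo_train), len(demo_test))
print(demo_train.index.max() < demo_test.index.min())
```

By contrast, scikit-learn's `train_test_split` shuffles rows by default, which is why it should not be used on time series without `shuffle=False`.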

In [None]:
# Split data (80% train, 20% test)
train, test = train_test_split_temporal(df, test_size=0.2)

print(f"Training set: {len(train)} observations")
print(f"Test set: {len(test)} observations")
print(f"\nTrain date range: {train.index[0]} to {train.index[-1]}")
print(f"Test date range: {test.index[0]} to {test.index[-1]}")

## 6. Apply ARIMA Model

In [None]:
from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA model; order=(p, d, q), where d=1 applies one round of differencing
print("Fitting ARIMA(1,1,1) model...")
arima_model = ARIMA(train['value'], order=(1, 1, 1))
arima_fit = arima_model.fit()

# Forecast
arima_forecast = arima_fit.forecast(steps=len(test))

print("✓ ARIMA model fitted and forecast generated!")

## 7. Apply Holt-Winters (Triple Exponential Smoothing)

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Fit Holt-Winters model
print("Fitting Holt-Winters model...")
hw_model = ExponentialSmoothing(
    train['value'], 
    trend='add', 
    seasonal='add', 
    seasonal_periods=12
)
hw_fit = hw_model.fit()

# Forecast
hw_forecast = hw_fit.forecast(steps=len(test))

print("✓ Holt-Winters model fitted and forecast generated!")

## 8. Evaluate Models

We'll use multiple metrics to evaluate performance.
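
As a quick sanity check of what each metric measures, here is a worked example on made-up numbers (not results from this notebook). With actual values 100 and 200 and predictions 110 and 190, both absolute errors are 10, so MAE and RMSE both equal 10, while MAPE averages 10% and 5%.

```python
import numpy as np

actual_demo = np.array([100.0, 200.0])
predicted_demo = np.array([110.0, 190.0])

errors = actual_demo - predicted_demo

mae_demo = np.mean(np.abs(errors))          # average absolute error, in data units
rmse_demo = np.sqrt(np.mean(errors ** 2))   # squares errors first, so large misses weigh more
mape_demo = np.mean(np.abs(errors / actual_demo)) * 100  # relative error in percent

print(f"MAE={mae_demo:.2f}, RMSE={rmse_demo:.2f}, MAPE={mape_demo:.2f}%")
```

RMSE and MAE share the units of the data, while MAPE is scale-free, which makes it convenient for comparing across series but unusable when actual values are zero.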

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

def evaluate_forecast(actual, predicted, model_name):
    """Return RMSE, MAE, and MAPE for one model's forecast as a dict."""
    mse = mean_squared_error(actual, predicted)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(actual, predicted)
    # MAPE divides by the actual values, so it is undefined if any actual value is 0
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    
    return {
        'Model': model_name,
        'RMSE': rmse,
        'MAE': mae,
        'MAPE': mape
    }

# Evaluate both models
results = []
results.append(evaluate_forecast(test['value'].values, arima_forecast.values, 'ARIMA'))
results.append(evaluate_forecast(test['value'].values, hw_forecast.values, 'Holt-Winters'))

# Create results DataFrame
results_df = pd.DataFrame(results)
results_df = results_df.set_index('Model')

print("Model Performance Comparison:")
print("=" * 50)
print(results_df.to_string())
print("\n" + "=" * 50)

best_model = results_df['RMSE'].idxmin()
print(f"\n🏆 Best Model (by RMSE): {best_model}")

## 9. Visualize Forecasts

In [None]:
# Plot comparison
plt.figure(figsize=(16, 8))

# Plot training data
plt.plot(train.index, train['value'], label='Training Data', 
         color='blue', linewidth=2)

# Plot test data
plt.plot(test.index, test['value'], label='Actual Test Data', 
         color='green', linewidth=2)

# Plot forecasts
plt.plot(test.index, arima_forecast, label='ARIMA Forecast', 
         color='red', linestyle='--', linewidth=2)
plt.plot(test.index, hw_forecast, label='Holt-Winters Forecast', 
         color='orange', linestyle='--', linewidth=2)

plt.title('Time Series Forecasting Comparison', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 10. Conclusions

### Key Findings:
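- We compared ARIMA(1,1,1) and Holt-Winters on the same chronological hold-out set
- Holt-Winters models trend and seasonality explicitly; the non-seasonal ARIMA(1,1,1) used here has no seasonal term, so its multi-step forecast levels off instead of following the seasonal cycle
- Pick the best model from the test-set metrics (RMSE, MAE, MAPE), not from how well each model fits the training data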

### For Your Thesis/Skripsi:
1. **Try multiple methods**: Include at least 3-5 different approaches
2. **Document everything**: Parameters, metrics, and reasoning
3. **Visualize results**: Use plots to show comparisons
4. **Report all metrics**: RMSE, MAE, MAPE, etc.
5. **Discuss findings**: Why did one method perform better?
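
When adding more methods, a simple baseline puts the metrics in context: a model that cannot beat it is not adding much. One common choice is the seasonal naive forecast, which repeats the last observed season. The sketch below is illustrative, with a hypothetical `seasonal_naive` helper and a noise-free toy series.

```python
import numpy as np

def seasonal_naive(history, steps, period=12):
    """Forecast `steps` values by repeating the last `period` observations."""
    history = np.asarray(history, dtype=float)
    if len(history) < period:
        raise ValueError("need at least one full season of history")
    last_season = history[-period:]
    reps = int(np.ceil(steps / period))
    return np.tile(last_season, reps)[:steps]

# Toy series with an exact 12-step cycle and no trend or noise.
t = np.arange(36)
toy = 10 * np.sin(2 * np.pi * t / 12)

baseline = seasonal_naive(toy[:24], steps=12)
print(np.allclose(baseline, toy[24:]))
```

In this notebook the call would be `seasonal_naive(train['value'].values, steps=len(test))`, and the result can be passed to `evaluate_forecast` like the other forecasts. Because it ignores trend, expect it to lag behind on trending data.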

### Next Steps:
- Try other methods (Prophet, LSTM)
- Tune hyperparameters
- Test on your own data
- Add more sophisticated evaluation (cross-validation, etc.)
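
For time series, cross-validation has to respect time order. One common scheme is rolling-origin (expanding-window) evaluation: fit on the first observations, forecast the next few, then extend the window and repeat. A minimal sketch, assuming a forecaster is any function `f(history, steps)` that returns `steps` values (`rolling_origin_rmse` and `last_value` are hypothetical names):

```python
import numpy as np

def rolling_origin_rmse(y, forecaster, initial=24, horizon=6, step=6):
    """Return the RMSE of each fold of an expanding-window evaluation."""
    y = np.asarray(y, dtype=float)
    scores = []
    # Each fold trains on y[:end] and is scored on the next `horizon` points.
    for end in range(initial, len(y) - horizon + 1, step):
        history, future = y[:end], y[end:end + horizon]
        pred = np.asarray(forecaster(history, horizon), dtype=float)
        scores.append(float(np.sqrt(np.mean((future - pred) ** 2))))
    return scores

def last_value(history, steps):
    """Naive forecaster: repeat the most recent observation."""
    return np.full(steps, history[-1])

y_demo = np.linspace(100, 200, 48)  # noise-free trend, for illustration only
fold_scores = rolling_origin_rmse(y_demo, last_value)
print(len(fold_scores), np.mean(fold_scores))
```

A statsmodels model can be plugged in by wrapping it, for example a function that returns `ARIMA(history, order=(1, 1, 1)).fit().forecast(steps=steps)`. Averaging the fold scores gives a more stable estimate than a single train/test split, at the cost of refitting the model once per fold.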

## Bonus: Save Your Results

In [None]:
# Save results to CSV
results_df.to_csv('forecast_results.csv')
print("✓ Results saved to 'forecast_results.csv'")

# Save forecasts
forecast_df = pd.DataFrame({
    'Date': test.index,
    'Actual': test['value'].values,
    'ARIMA': arima_forecast.values,
    'Holt-Winters': hw_forecast.values
})
forecast_df.to_csv('forecasts.csv', index=False)
print("✓ Forecasts saved to 'forecasts.csv'")

print("\n🎉 Analysis complete! You're ready for your thesis work!")