# Task 3: Time Series Forecasting and Analysis

## 1. Objective
The goal of this notebook is to build and evaluate time series forecasting models for **Tesla (TSLA)**, **Vanguard Total Bond Market ETF (BND)**, and **S&P 500 ETF (SPY)**.

We will follow these steps:
1. **Exploratory Data Analysis (EDA)** for time series characteristics.
2. **Stationarity Testing** using the Augmented Dickey-Fuller (ADF) test.
3. **Model Selection/Parameter Identification** via ACF/PACF and Auto-ARIMA.
4. **Model Development**: Comparing ARIMA and LSTM models.
5. **Evaluation**: Using metrics like MAE, RMSE, and MAPE.
6. **Forecasting**: Predicting future movements for portfolio optimization.
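
The evaluation metrics in step 5 can be sketched as follows. This is a minimal illustration, not the implementation of `evaluate_forecast` in `src/models`; the function name `forecast_metrics` and the example numbers are hypothetical.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = np.mean(np.abs(errors))            # average absolute error, in price units
    rmse = np.sqrt(np.mean(errors ** 2))     # penalizes large errors more heavily
    # MAPE is undefined when an actual value is zero; prices are assumed positive.
    mape = np.mean(np.abs(errors / actual)) * 100
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}

# Hypothetical actual and predicted prices, for illustration only.
example = forecast_metrics([100.0, 110.0, 120.0], [102.0, 108.0, 123.0])
print(example)
```

MAE and RMSE are expressed in price units, so they are not comparable between TSLA and BND; MAPE is scale-free and is the more useful metric when comparing across assets.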

In [None]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings

# Add src to path
sys.path.append(os.path.abspath(os.path.join('..', 'src')))

from models import ARIMAModel, LSTMModel, split_data, evaluate_forecast
from utils import check_stationarity, plot_price_series

sns.set(style="whitegrid")
warnings.filterwarnings('ignore')
print("Libraries loaded successfully.")

## 2. Data Loading
We load the cleaned historical data generated in Task 1.

In [None]:
data_path = '../data/processed/historical_data.csv'
data = pd.read_csv(data_path, index_col=0, parse_dates=True, header=[0, 1])

tickers = ['TSLA', 'BND', 'SPY']
close_prices = pd.DataFrame()
for ticker in tickers:
    close_prices[ticker] = data[ticker]['Close']

close_prices.head()

## 3. Time Series Exploration & Visualization
We plot the closing prices to inspect the trend and volatility of each asset.

In [None]:
plot_price_series(close_prices, title="Asset Price History (2015 - Present)")

### 3.1 Rolling Statistics
We check for stability in mean and variance over time.

In [None]:
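# Plot the rolling mean and the rolling standard deviation in separate panels
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)
for ticker in tickers:
    rolmean = close_prices[ticker].rolling(window=30).mean()
    rolstd = close_prices[ticker].rolling(window=30).std()
    axes[0].plot(rolmean, label=f'{ticker} Rolling Mean (30d)')
    axes[1].plot(rolstd, label=f'{ticker} Rolling Std (30d)')

axes[0].set_title('Rolling Means for Assets')
axes[1].set_title('Rolling Standard Deviations for Assets')
for ax in axes:
    ax.legend()
plt.show()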

### 3.2 Seasonal Decomposition
We decompose the TSLA price into trend, seasonal, and residual components, using a period of 252 trading days (roughly one year).

In [None]:
result = seasonal_decompose(close_prices['TSLA'], model='multiplicative', period=252)
fig = result.plot()
fig.set_size_inches(14, 8)
plt.show()

## 4. Stationarity Testing (ADF Test)
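The AR and MA components of an ARIMA model assume a stationary series; the differencing order d is what removes a trend. We use the Augmented Dickey-Fuller test to check whether each price series is stationary.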

In [None]:
for ticker in tickers:
    check_stationarity(close_prices[ticker], name=ticker)

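Price series like these typically fail to reject the ADF null hypothesis of a unit root (p-value above 0.05), so they are treated as non-stationary and require at least one order of differencing (d=1).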

## 5. Correlation Analysis (ACF & PACF)
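We examine the Autocorrelation (ACF) and Partial Autocorrelation (PACF) of the differenced series. Significant PACF lags suggest the AR order p, and significant ACF lags suggest the MA order q.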

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 4))
plot_acf(close_prices['TSLA'].diff().dropna(), ax=ax[0], title="ACF - TSLA (Differenced)")
plot_pacf(close_prices['TSLA'].diff().dropna(), ax=ax[1], title="PACF - TSLA (Differenced)")
plt.show()
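
To read these plots, it helps to know what the bars and the shaded band represent. The sketch below computes the sample autocorrelation by hand and flags lags outside the approximate 95% white-noise band of ±1.96/√n. The function `sample_acf` and the simulated noise are hypothetical, used only for illustration.

```python
import numpy as np

def sample_acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array(
        [np.sum(x[k:] * x[: len(x) - k]) / denom for k in range(max_lag + 1)]
    )

# Simulated white noise as a stand-in for differenced prices (not real data).
rng = np.random.default_rng(1)
noise = rng.normal(size=500)

acf = sample_acf(noise, max_lag=10)
band = 1.96 / np.sqrt(len(noise))  # approximate 95% band under white noise
significant_lags = [k for k in range(1, 11) if abs(acf[k]) > band]
print(significant_lags)
```

Even for pure noise, about one lag in twenty will fall outside the band by chance, so isolated spikes at high lags are weak evidence for adding AR or MA terms.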

## 6. Model Training: ARIMA
We use `auto_arima` to automatically select the best (p, d, q) parameters based on AIC.

In [None]:
arima_results = {}
for ticker in tickers:
    print(f"\nFitting ARIMA for {ticker}...")
    train, test = split_data(close_prices[ticker], test_size=0.2)
    
    model = ARIMAModel()
    best_order = model.optimize_and_fit(train)
    
    # Forecast on test period
    preds = model.predict(n_periods=len(test))
    metrics = evaluate_forecast(test, preds, model_name=f"ARIMA_{ticker}")
    
    arima_results[ticker] = {
        'model': model,
        'metrics': metrics,
        'preds': preds,
        'test': test
    }
    print(f"Result for {ticker}: {metrics}")
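
The train/test split above must be chronological, because shuffling would leak future prices into training. A minimal sketch of such a split follows. It assumes `split_data` holds out the most recent fraction of the series, which is not confirmed by this notebook; `chronological_split` and the toy series are hypothetical.

```python
import pandas as pd

def chronological_split(series, test_size=0.2):
    """Split without shuffling; the test set is the most recent slice."""
    n_test = int(round(len(series) * test_size))
    split_at = len(series) - n_test
    return series.iloc[:split_at], series.iloc[split_at:]

# Toy series with a daily index, for illustration only.
prices = pd.Series(
    range(10), index=pd.date_range("2024-01-01", periods=10, freq="D")
)
train_part, test_part = chronological_split(prices, test_size=0.2)
print(len(train_part), len(test_part))
```

Note that forecasting `len(test)` steps in one call is a multi-step forecast: the model never sees the test values, so errors tend to grow toward the end of the test period.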

## 7. Model Training: LSTM
Long Short-Term Memory (LSTM) networks, a type of recurrent neural network, can capture non-linear relationships that a linear ARIMA model cannot represent.

In [None]:
lstm_results = {}
for ticker in ['TSLA']: # Focus on TSLA for deep learning demo
    print(f"\nFitting LSTM for {ticker}...")
    train, test = split_data(close_prices[ticker], test_size=0.2)
    
    look_back = 60
    model = LSTMModel(epochs=10, look_back=look_back)
    model.fit(train)
    
    # To predict the first test point, the model needs the last look_back
    # observations of train, so we prepend them to the test set.
    # The current predict helper assumes the input includes this look_back buffer.
    test_input = pd.concat([train.iloc[-look_back:], test])
    preds = model.predict(test_input)
    
    metrics = evaluate_forecast(test, preds, model_name=f"LSTM_{ticker}")
    lstm_results[ticker] = {
        'metrics': metrics,
        'preds': preds
    }
    print(metrics)
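
The look-back buffer in the cell above can be illustrated with a small windowing sketch. It assumes the LSTM consumes fixed-length windows and predicts the next value; `make_windows` is a hypothetical helper, and `LSTMModel` may build its windows differently (for example, after scaling).

```python
import numpy as np

def make_windows(values, look_back):
    """Build (samples, look_back) inputs and next-step targets from a 1-D sequence."""
    values = np.asarray(values, dtype=float)
    X = np.array(
        [values[i : i + look_back] for i in range(len(values) - look_back)]
    )
    y = values[look_back:]
    return X, y

look_back = 3
train_vals = np.arange(10.0)       # stand-in for training prices
test_vals = np.arange(10.0, 14.0)  # stand-in for test prices

# Prepend the last look_back training points so every test point has a full window.
test_input = np.concatenate([train_vals[-look_back:], test_vals])
X_test, y_test = make_windows(test_input, look_back)
print(X_test.shape, y_test)
```

With the buffer in place, the number of windows equals the number of test points, which is what allows the predictions to be compared one-to-one with `test`.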

## 8. Final Visualizations
We plot the ARIMA and LSTM forecasts against the actual prices in the held-out test period.

In [None]:
plt.figure(figsize=(14, 7))
ticker = 'TSLA'
test_idx = arima_results[ticker]['test'].index
plt.plot(arima_results[ticker]['test'], label='Actual Price', color='black')
plt.plot(test_idx, arima_results[ticker]['preds'], label='ARIMA Forecast', color='blue', linestyle='--')
if ticker in lstm_results:
    plt.plot(test_idx, lstm_results[ticker]['preds'], label='LSTM Forecast', color='red', linestyle=':')

plt.title(f"{ticker} Price Forecast vs Actual")
plt.legend()
plt.show()

## 9. Conclusion
We explored the time series properties of TSLA, BND, and SPY. The ARIMA models provide a statistical baseline for all three assets, while the LSTM model, trained on TSLA only, attempts to capture more complex patterns. These forecasts will serve as expected-return inputs for **Portfolio Optimization**.
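
One possible way to turn a price forecast into an expected-return input is sketched below. This is an assumption for illustration, not the method used in the portfolio optimization task: the function name, the 252-day annualization, and the example forecast path are all hypothetical.

```python
import numpy as np

def annualized_expected_return(last_price, forecast_prices, trading_days=252):
    """Annualize the return implied by the final forecast price."""
    forecast_prices = np.asarray(forecast_prices, dtype=float)
    horizon = len(forecast_prices)  # forecast length in trading days
    total_return = forecast_prices[-1] / last_price - 1
    return (1 + total_return) ** (trading_days / horizon) - 1

# Hypothetical 126-day forecast path ending 5% above the last observed price.
forecast_path = np.linspace(100.5, 105.0, 126)
expected_return = annualized_expected_return(100.0, forecast_path)
print(f"{expected_return:.4f}")
```

This uses only the endpoint of the forecast, so it inherits the full multi-step forecast error; in practice it may be worth comparing it against a historical mean return before feeding it to an optimizer.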