### GARCH Modeling of Los Angeles Temperature Changes


In this notebook, we explore how volatility in city temperature, specifically for the city of Los Angeles, can be modeled and forecasted using two important time series approaches:

- GARCH(1,1): A classic model where today's volatility depends on past shocks and past volatility.
- EGARCH(1,1): An extended version that captures asymmetric behavior — allowing for different impacts depending on whether temperature changes are positive or negative.

Our goal is to evaluate how well these models describe the risk or variability in temperature behavior, how they perform in forecasting future volatility, and how we can use them to estimate extreme outcomes or risk measures such as Value at Risk (VaR) — repurposed in this context as the probability of unusually large temperature swings.

While GARCH models are primarily used in financial econometrics, they are increasingly applied to climate and environmental data, where variability and clustering of shocks are also common. One key insight from GARCH modeling is that volatility is not constant — periods of stable weather can be followed by clusters of high variability (e.g., during seasonal transitions or unusual weather patterns).

Another behavioral pattern we explore is asymmetry in temperature changes: Are sudden drops (cold snaps) associated with more volatility than sudden warm-ups? This is similar to the idea in financial markets where “bad news” often triggers more extreme reactions than “good news.”


### GARCH(1,1)

The GARCH(p, q) framework captures this dynamic by modeling the conditional variance (volatility) at any given time as a function of two things:

- Past squared shocks (errors) — how surprising or extreme previous changes were
- Past variances — how variable the recent temperature changes have been

We write the return equation as:

$$r_t = \mu + \varepsilon_t$$

Where:

- $r_t$ is the daily temperature change (or "return") at time $t$, e.g., the change or log-change in temperature from the previous day
- $\mu$ is the conditional mean of temperature returns (often assumed constant or modeled separately)  
- $\varepsilon_t$ is the innovation or shock at time $t$ — the unpredictable component of the process, representing the deviation from expected behavior

We also model the shock $\varepsilon_t$ as:

$$\varepsilon_t = \sigma_t z_t$$

Where:

- $\sigma_t$ is the conditional standard deviation (i.e., volatility) at time $t$  
- $z_t$ is a white noise process (typically $z_t \sim \text{i.i.d. } \mathcal{N}(0, 1)$ or another standardized distribution)

This means:
- $\varepsilon_t$ is the return shock scaled by the volatility at that time  
- The volatility $\sigma_t$ varies over time, which is the key insight in GARCH modeling


GARCH(1,1) Equation:

$$\sigma^2_t = \omega + \alpha \times \varepsilon^2_{t-1} + \beta \times \sigma^2_{t-1}$$


Where:
- $\sigma_t^2$ is the conditional variance of daily temperature changes at time $t$, given past information up to $t-1$  
- $\varepsilon_{t-1}^2$: the squared innovation (or shock) from time $t-1$
- $\omega > 0$ is the constant (baseline) term of the variance equation. It keeps the conditional variance strictly positive even when past shocks are negligible. A larger $\omega$ implies a higher unconditional variance, all else equal.
- $\alpha \geq 0$ is the ARCH parameter, which measures the short-term impact of new information on volatility. Specifically, it quantifies how sensitive today's variance is to the magnitude of the most recent shock / innovation. A higher $\alpha$ implies that volatility reacts more strongly to recent changes.
- $\beta \geq 0$ is the GARCH parameter, capturing the persistence of past volatility. It determines how much of the previous day's variance carries forward to the current period. A high $\beta$ indicates highly persistent volatility, meaning the effect of a shock decays slowly over time.

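We assume:
- $\omega > 0$ and $\alpha, \beta \geq 0$
- $\alpha + \beta < 1$

Under these conditions the process is covariance stationary, and the conditional variance reverts to the long-run (unconditional) variance $\sigma^2 = \omega / (1-\alpha - \beta)$.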
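The long-run variance $\omega / (1-\alpha-\beta)$ can be checked by simulation. The sketch below is a minimal illustration on synthetic data, not part of the assignment: the function name `simulate_garch11` and the parameter values are assumptions chosen only to satisfy the constraints above.

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate shocks and conditional variances from a GARCH(1,1) process."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)               # i.i.d. N(0, 1) innovations z_t
    sigma2 = np.empty(n)
    eps = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the long-run variance
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        # sigma^2_t = omega + alpha * eps^2_{t-1} + beta * sigma^2_{t-1}
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

# Illustrative (hypothetical) parameters with alpha + beta < 1
omega, alpha, beta = 0.1, 0.1, 0.8
eps, sigma2 = simulate_garch11(omega, alpha, beta, n=100_000)

long_run_var = omega / (1.0 - alpha - beta)
sample_var = eps.var()
print(f"long-run variance: {long_run_var:.3f}, sample variance: {sample_var:.3f}")
```

With a long enough sample, the sample variance of the simulated shocks should sit close to the theoretical long-run value, while `sigma2` itself moves around that level in clusters.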


### EGARCH(1,1) for asymmetric volatility

The conditional variance in the Exponential GARCH (EGARCH) model is defined as:

$$
\log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \alpha \left| \frac{\varepsilon_{t-1}}{\sigma_{t-1}} \right| + \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}}
$$

This formulation models the log of variance, which ensures that $\sigma_t^2$ is always positive, without requiring non-negativity constraints on the parameters.

The EGARCH model includes an asymmetric term through the parameter $\gamma$, allowing for different volatility responses to positive and negative swings. Specifically:

- The term $\left| \frac{\varepsilon_{t-1}}{\sigma_{t-1}} \right|$ captures the magnitude of the standardized shock.
- The term $\frac{\varepsilon_{t-1}}{\sigma_{t-1}}$ captures its sign and direction.

$\gamma$: captures asymmetric effect

- If $\gamma < 0$: negative shocks (e.g., sudden temperature drops) increase volatility more than positive ones
- If $\gamma > 0$: positive shocks increase volatility more

$\beta$: volatility persistence
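To make the role of $\gamma$ concrete, the sketch below evaluates the EGARCH(1,1) recursion exactly as written above for two standardized shocks of equal size and opposite sign. The helper name `egarch_next_log_var` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def egarch_next_log_var(log_var_prev, z_prev, omega, alpha, gamma, beta):
    """One step of the EGARCH(1,1) log-variance recursion.

    z_prev is the standardized shock eps_{t-1} / sigma_{t-1}.
    """
    return omega + beta * log_var_prev + alpha * abs(z_prev) + gamma * z_prev

# Hypothetical parameters with gamma < 0 (negative shocks matter more)
params = dict(omega=0.0, alpha=0.2, gamma=-0.1, beta=0.9)

log_var_prev = 0.0  # previous conditional variance equal to 1
var_after_drop = np.exp(egarch_next_log_var(log_var_prev, -2.0, **params))
var_after_rise = np.exp(egarch_next_log_var(log_var_prev, +2.0, **params))

print("variance after a -2 shock:", var_after_drop)
print("variance after a +2 shock:", var_after_rise)
```

Because the recursion is in logs, both resulting variances are positive regardless of the sign of the parameters, and with $\gamma < 0$ the negative shock produces the larger next-period variance.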

In this assignment, we'll apply both models to historical daily temperature data for Los Angeles, compute the conditional volatility, and evaluate the ability of these models to forecast future variability and extreme events.

We'll also compute Value at Risk (VaR) estimates — here, interpreted as thresholds for extreme temperature swings, to identify the likelihood of experiencing large temperature movements over time.

In [None]:
### INSTALL PACKAGES IF NEEDED
#!pip install pandas-datareader
#!pip install arch

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from arch import arch_model
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import t
import pandas_datareader.data as web

### Step 1: Load Data

In [None]:
# Get Los Angeles Temperature
data = ### Bring in the 'date' and 'Los Angeles' columns from 'temperature.csv'

# Compute log changes
data['Change'] = np.log(data[____] / data[____].shift(1)) * 100 ###FILL IN BLANKS
returns = data['Change'].dropna()

# Plot ###FILL IN BLANKS BELOW
plt.figure(figsize=(12, 6))
plt.plot(___)
plt.title(___)
plt.xlabel('Date')
plt.ylabel('Change (%)')
plt.grid(True)
plt.____()

### Step 2: Fit GARCH(1,1) Model

In [None]:
model1 = arch_model(returns, vol='Garch', p=1, q=1, mean='Constant', dist='normal')
res1 = model1.fit(disp='off')
print(___) ###FILL IN THE BLANK TO PRINT THE SUMMARY

Using the above output, discuss each of the following, including statistical significance and implications:
- Mean Equation

- Volatility Equation:
  - ω (baseline variance constant)
  - α₁ (reaction to shocks)
  - β₁ (volatility persistence)
  - α₁ + β₁

- Distribution

- Log-Likelihood

- AIC (Akaike Information Criterion)
- BIC (Bayesian Information Criterion)

### Step 3: Fit EGARCH(1,1) Model

In [None]:
model2 = ###COMPLETE THIS LINE
res2 = model2.fit(disp='off')
print(____)

Using the above output, discuss each of the following, including statistical significance and implications:
- Mean Equation

- Volatility Equation:
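  - ω (constant of the log-variance equation)
  - α₁ (reaction to the magnitude of shocks)
  - γ (asymmetry, if the fitted model includes this term)
  - β₁ (volatility persistence)
  - α₁ + β₁ (note that in EGARCH, persistence is governed by β₁ alone)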

- Distribution

- Degrees of Freedom

- Log-Likelihood

- AIC (Akaike Information Criterion)
- BIC (Bayesian Information Criterion)


### Step 4: Compare AIC/BIC

In [None]:
print("GARCH AIC:", res1.aic, "| BIC:", res1.bic)
print("EGARCH AIC:", res2.aic, "| BIC:", res2.bic)

What does the above analysis tell us with respect to model fit and complexity?
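As a reminder of how the two criteria trade fit against complexity, the sketch below computes them from a log-likelihood using the standard definitions $\text{AIC} = 2k - 2\ln L$ and $\text{BIC} = k \ln n - 2\ln L$. The log-likelihoods, parameter counts, and sample size are made-up numbers, not results from this notebook.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits: the richer model gains 3 log-likelihood points for 2 extra parameters
n_obs = 1500
simple = dict(log_lik=-2500.0, k=4)
richer = dict(log_lik=-2497.0, k=6)

aic_simple, aic_richer = aic(**simple), aic(**richer)
bic_simple, bic_richer = bic(n=n_obs, **simple), bic(n=n_obs, **richer)

print("AIC simple vs richer:", aic_simple, aic_richer)
print("BIC simple vs richer:", bic_simple, bic_richer)
```

Lower is better for both criteria. In this made-up case AIC favors the richer model while BIC favors the simpler one, because BIC's penalty per parameter grows with the sample size.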


### Step 5: Residual Diagnostics - ACF

In [None]:
plot_acf(____) ###FILL IN THE BLANK
plt.title("ACF of Standardized Residuals (EGARCH)")
plt.show()

Residual diagnostics are important to assess whether the EGARCH model has properly captured the dynamics in the return series, particularly autocorrelation and time-varying volatility. Interpret the above plot.
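For intuition about what a "clean" ACF looks like, the sketch below computes sample autocorrelations by hand for synthetic white noise and for an autocorrelated AR(1) series, and compares them with the approximate 95% band $\pm 1.96/\sqrt{n}$. It is a hand-rolled illustration with NumPy, not the `plot_acf` implementation, and the series are simulated rather than taken from the fitted model.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    lags = [np.dot(x[k:], x[:-k]) / denom for k in range(1, nlags + 1)]
    return np.array([1.0] + lags)

rng = np.random.default_rng(42)
n = 5000
white_noise = rng.standard_normal(n)

# AR(1) series with coefficient 0.6, built from the same innovations
ar1 = np.empty(n)
ar1[0] = white_noise[0]
for t in range(1, n):
    ar1[t] = 0.6 * ar1[t - 1] + white_noise[t]

band = 1.96 / np.sqrt(n)  # approximate 95% band under no autocorrelation
acf_wn = sample_acf(white_noise, 20)
acf_ar = sample_acf(ar1, 20)

print("white-noise lags outside the band:", int(np.sum(np.abs(acf_wn[1:]) > band)))
print("AR(1) lag-1 autocorrelation:", round(float(acf_ar[1]), 3))
```

Standardized residuals from a well-specified model should resemble the white-noise case: nearly all lags inside the band, with no systematic decay pattern. Applying the same check to squared standardized residuals probes for leftover volatility clustering.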

### Step 6: Ljung-Box Test

In [None]:
lb_test = ###COMPLETE THIS LINE
print(lb_test)

The Ljung–Box test is used to detect autocorrelation in the residuals of a time series model. It tests the null hypothesis that the residuals are independently distributed up to a certain lag (in this case, lag 10). Interpret the results.
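The statistic behind the test is $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, which is compared with a chi-squared distribution with $h$ degrees of freedom under the null. The sketch below computes $Q$ by hand on synthetic data and compares it with the 5% critical value for 10 degrees of freedom (18.307). It illustrates the formula only and is not a substitute for `acorr_ljungbox`.

```python
import numpy as np

def ljung_box_q(x, nlags):
    """Ljung-Box Q statistic for lags 1..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.dot(x, x)
    total = 0.0
    for k in range(1, nlags + 1):
        rho_k = np.dot(x[k:], x[:-k]) / denom  # lag-k sample autocorrelation
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

CHI2_95_DF10 = 18.307  # 5% critical value of chi-squared with 10 degrees of freedom

rng = np.random.default_rng(0)
n = 2000
white_noise = rng.standard_normal(n)

# AR(1) series with coefficient 0.5: clearly autocorrelated
ar1 = np.empty(n)
ar1[0] = white_noise[0]
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + white_noise[t]

q_wn = ljung_box_q(white_noise, 10)
q_ar = ljung_box_q(ar1, 10)
print("Q (white noise):", round(q_wn, 2), "| Q (AR(1)):", round(q_ar, 2))
```

A value of $Q$ above the critical value (equivalently, a small p-value) rejects the null of no autocorrelation up to lag 10, as should happen for the AR(1) series here.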

### Step 7: Rolling Forecast (Expanding Window)

In [None]:
rolling_preds = []
true_vals = []
window = 1000

for i in range(window, len(returns)):
    train = returns.iloc[:i]   # expanding window: all observations before day i
    test = returns.iloc[i]     # the day that the one-step-ahead forecast targets

    model = arch_model(train, __=____, _=_, _=_, ___='AR', lags=1, dist='t') ###FILL IN THE BLANKS
    result = model.fit(disp='off')
    forecast = result.forecast(horizon=1)
    sigma = np.sqrt(forecast.variance.values[-1, 0])
    rolling_preds.append(sigma)
    true_vals.append(abs(test))

plt.figure(figsize=(12, 4))
plt.plot(returns.index[window:], true_vals, label="|Actual Return|")
plt.plot(returns.index[window:], rolling_preds, label="Forecast Volatility")
plt.legend()
plt.title("Rolling Forecast: EGARCH(1,1)")
plt.show()

Answer the following:
1. What does the blue line represent?
2. What does the orange line represent?

Give an interpretation of the plot. For example, here is an interpretation of an EGARCH model run on stock data:
- The forecasted volatility generally tracks the magnitude of returns well, capturing periods of high and low volatility.
- During volatile periods (e.g., mid-2022), the model forecasts higher volatility, while in calmer periods (e.g., mid-2023), it forecasts lower levels.
- Although not every spike in return magnitude is captured exactly, the model reacts adaptively and reflects changing market conditions.
- The EGARCH model effectively captures time-varying volatility and adjusts its forecasts as market conditions evolve. This dynamic behavior makes it a useful tool for risk management, Value at Risk (VaR), and volatility forecasting.
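The key bookkeeping in an expanding-window forecast is alignment: a model fitted on observations before day $i$ produces a forecast for day $i$, so that is the outcome it should be compared with. The sketch below shows this alignment on synthetic data, with a plain sample standard deviation swapped in as a stand-in for a fitted volatility model. It is not EGARCH, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
changes = rng.standard_normal(300)  # synthetic stand-in for daily changes
window = 200                        # size of the initial training sample

forecasts, realized = [], []
for i in range(window, len(changes)):
    train = changes[:i]                # information available before day i
    sigma_hat = train.std(ddof=1)      # naive stand-in for a model-based forecast
    forecasts.append(sigma_hat)
    realized.append(abs(changes[i]))   # outcome on the day being forecast

forecasts = np.array(forecasts)
realized = np.array(realized)
print(len(forecasts), "forecasts, aligned with observations", window, "to", len(changes) - 1)
```

Each forecast uses only data that would have been available at the time, which is what makes the comparison with the realized magnitudes an out-of-sample exercise.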


### Step 8: Dynamic Value at Risk (VaR)

In [None]:
from scipy.stats import t

alpha = 0.05
df = res2.params['nu']
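# arch's Student's t is standardized to unit variance, so rescale scipy's quantile
t_quant = t.ppf(alpha, df=df) * np.sqrt((df - 2) / df)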

VaR = res2.params['Const']  res2.conditional_volatility  t_quant ### ADD THE OPERATIONS TO THE LINE

plt.figure(figsize=(12, 4))
plt.plot(returns.index, returns, label="Returns")
plt.plot(returns.index, VaR, label="VaR (95%)", color='red')
plt.title("Dynamic VaR using EGARCH Model (95% Confidence)")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


Note: In the financial world, Value at Risk (VaR) is a widely used risk metric that estimates the potential loss in value of a portfolio or asset over a given time horizon, at a specific confidence level.
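For a location-scale model, the one-day VaR at level $\alpha$ is $\mu + \sigma_t \, q_\alpha$, where $q_\alpha$ is the $\alpha$-quantile of the standardized innovation. The sketch below assumes Student's t innovations standardized to unit variance (the convention used by the `arch` package) and uses simulation to show why a raw t quantile needs the factor $\sqrt{(\nu-2)/\nu}$. The degrees of freedom, mean, and volatility path are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(7)
nu = 8.0       # hypothetical degrees of freedom
level = 0.05   # 5% tail, i.e. 95% VaR

raw_draws = rng.standard_t(nu, size=200_000)      # variance is nu / (nu - 2), not 1
std_draws = raw_draws * np.sqrt((nu - 2.0) / nu)  # rescaled to unit variance

q_raw = np.quantile(raw_draws, level)  # quantile of the raw t distribution
q_std = np.quantile(std_draws, level)  # quantile of the standardized t distribution

mu = 0.0                               # hypothetical constant mean
sigma_t = np.array([0.5, 1.0, 2.0])    # hypothetical conditional volatilities
var_95 = mu + sigma_t * q_std          # dynamic VaR threshold for each day

print("raw quantile:", round(float(q_raw), 3), "| standardized quantile:", round(float(q_std), 3))
print("VaR thresholds:", np.round(var_95, 3))
```

Using the raw quantile with a unit-variance volatility estimate would overstate the tail, so the threshold would be breached less often than the nominal 5%. The VaR line also widens on days with higher conditional volatility, which is what makes it dynamic.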

Answer the following:
1. What does the blue line represent?
2. What does the red line represent?

Give an interpretation of the plot and what it says about the EGARCH-based dynamic VaR model.