## **1. Model testing: finding the best code and model configuration for GARCH**

**Importing the necessary libraries**

In [1]:
# for data
import pandas as pd 
import numpy as np 

# for fetching data
import yfinance as yf

import ta # for indicator feature creation

# machine learning tools/functions/objects
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# models
from xgboost import XGBRegressor
from arch import arch_model

### **1.1 --> Fetching the data and creating new features**

#### 1.11 fetching data

In [2]:
# Download Apple stock (daily)
ticker = yf.Ticker("AAPL")
df = ticker.history(interval="1d", period="5y")
df.drop(["Dividends", "Stock Splits"], axis=1, inplace=True) # dropping columns that are unnecessary for our project
df.index = df.index.strftime("%Y-%m-%d") # changing the date format to yyyy-mm-dd

#### Notes:
* Used the yfinance library (Yahoo Finance) to fetch historical stock data.
* Dropped the unnecessary columns ["Dividends", "Stock Splits"], as they are not required for our models.

#### 1.12 creating target

In [3]:
# Target logic: put tomorrow's return on today's row, so each row's target is the next day's return

# Compute daily returns
df['return'] = df['Close'].pct_change()
df['target'] = df['return'].shift(-1)  # tomorrow’s return

#### Notes:
* Created the return column with pandas' pct_change(), which computes the fractional change of each value relative to the previous row.
* Shifted the return column up by one row (shift(-1)), so that each row's target is the next day's return.
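
The alignment described in these notes can be checked on a toy series. This is a minimal sketch with hypothetical prices (not AAPL data); only `pct_change()` and `shift(-1)` are taken from the cell above.

```python
import pandas as pd

# Hypothetical closing prices, chosen so the returns are easy to read
toy = pd.DataFrame(
    {"Close": [100.0, 110.0, 99.0, 108.9]},
    index=["day1", "day2", "day3", "day4"],
)

# Return of each day relative to the previous day
toy["return"] = toy["Close"].pct_change()

# Move each return up one row: the row for day1 now holds day2's return
toy["target"] = toy["return"].shift(-1)

print(toy)
```

The first row has no return (there is no previous day) and the last row has no target (there is no next day), which is why a later `dropna()` removes both ends.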

#### 1.13 creating new and extra useful features

In [4]:
df['rsi'] = ta.momentum.RSIIndicator(df['Close']).rsi()
df['macd'] = ta.trend.MACD(df['Close']).macd()
df['sma5'] = df['Close'].rolling(5).mean()
df['sma10'] = df['Close'].rolling(10).mean()
df['sma20'] = df['Close'].rolling(20).mean()
df['sma50'] = df['Close'].rolling(50).mean()
df['sma100'] = df['Close'].rolling(100).mean()
df['sma200'] = df['Close'].rolling(200).mean()

#### Notes:
* RSI measures momentum by comparing recent gains and losses. It helps identify overbought (>70) or oversold (<30) conditions, guiding traders on potential reversal points or trend continuation strength.
* MACD highlights trend direction and momentum by comparing two moving averages. Its crossovers, divergence, and histogram provide insights into potential entry and exit points, helping traders spot trend reversals or confirm ongoing trends.
* SMA smooths price data by averaging over a set period, filtering noise and showing overall trend direction. It acts as dynamic support/resistance, and crossovers with other SMAs often signal buy/sell opportunities.
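
As an illustration of the SMA crossover idea mentioned in the last note, here is a small sketch on hypothetical prices. The window lengths (2 and 4) are chosen only to keep the example short and are not the windows used in this notebook.

```python
import pandas as pd

# Hypothetical closing prices: a rise, a fall, then a recovery
close = pd.Series([10, 11, 12, 13, 12, 11, 10, 9, 10, 12], dtype=float)

fast = close.rolling(2).mean()   # short-window SMA
slow = close.rolling(4).mean()   # long-window SMA
diff = fast - slow               # NaN until both windows are filled

# A crossover is a sign change of (fast - slow) between consecutive rows.
# Comparisons against NaN are False, so the warm-up rows are never flagged.
cross_up = (diff > 0) & (diff.shift(1) <= 0)
cross_down = (diff < 0) & (diff.shift(1) >= 0)

print("bullish crossovers at:", list(cross_up[cross_up].index))
print("bearish crossovers at:", list(cross_down[cross_down].index))
```

Whether such signals carry predictive value for next-day returns is an empirical question; the sketch only shows how the signal is constructed.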

### **1.2 --> Defining X, y and splitting data**

#### 1.21 defining X, y

In [5]:
df.dropna(inplace=True)

X = df[['rsi','macd','sma5', 'Volume', 'sma10', 'sma20', 'sma50', 'sma100', 'sma200']]  # features
y = df['target']                # tomorrow’s return


#### 1.22 Splitting data into train and test

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False, test_size=0.2)

### **1.3 --> Training models**

#### 1.31 Implementing a regression baseline (XGBRegressor)

In [7]:
model = XGBRegressor()
model.fit(X_train, y_train)

pred = model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, pred)  # mean squared error (not its square root)
mae = mean_absolute_error(y_test, pred)
r_square = r2_score(y_test, pred)

print("MSE:", mse)
print("MAE :", mae)
print("R_Squared :", r_square)


MSE: 0.0006186901586766389
MAE : 0.017666524648262617
R_Squared : -0.3147139846747442
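
A negative R² such as the one printed above means the model does worse on the test set than always predicting the mean of the test targets. The sketch below recomputes MSE, RMSE and R² by hand to make the relationships explicit; the arrays are hypothetical numbers, not this notebook's predictions.

```python
import numpy as np

# Hypothetical daily returns and predictions (illustration only)
y_true = np.array([0.01, -0.02, 0.015, -0.005])
y_pred = np.array([0.02, 0.01, -0.01, 0.01])

mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
rmse = np.sqrt(mse)                     # root mean squared error, same units as y

ss_res = np.sum((y_true - y_pred) ** 2)          # error of the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of "always predict the mean"
r2 = 1 - ss_res / ss_tot                         # negative when ss_res > ss_tot

print("MSE :", mse)
print("RMSE:", rmse)
print("R2  :", r2)
```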


#### 1.32 Implementing GARCH, one of the best-suited models for this problem statement, with hyperparameter tuning

In [8]:
returns = 100 * df['Close'].pct_change().dropna()  # creating data for fitting model

def qlike_loss(realized_var, predicted_var):
    return np.mean((realized_var / predicted_var) - np.log(realized_var / predicted_var) - 1)

best_aic, best_model, best_params, best_qlike = 1e10, None, None, None

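# Note: arch_model does not recognize "GJR-GARCH" as a vol name, so that candidate raises
# ValueError and is skipped by the except below; in arch, GJR-GARCH is vol="GARCH" with o=1.
for vol in ["GARCH", "EGARCH", "GJR-GARCH"]: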
    for dist in ["normal", "t", "skewt"]:
        for p in range(1,3):
            for q in range(1,3):
                try:
                    model = arch_model(returns, vol=vol, p=p, q=q, dist=dist, mean="Constant")
                    res = model.fit(disp="off")
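                    forecast = res.forecast(horizon=1)
                    # One-step-ahead variance forecast (for the day after the sample ends)
                    predicted_var = forecast.variance.iloc[-1].values
                    # Squared return of the last in-sample day, used as a rough variance proxy.
                    # Caution: it is not the same day as the forecast, so this single-point
                    # QLIKE is only indicative; model selection below relies on AIC.
                    realized_var = returns.iloc[-1]**2
                    qlike = qlike_loss(np.array([realized_var]), predicted_var)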
                    
                    if res.aic < best_aic:
                        best_aic = res.aic
                        best_model = res
                        best_params = [vol, dist, p, q]
                        best_qlike = qlike
                except Exception:  # skip specifications that raise an error
                    continue

print("Best Model:", best_params, "AIC:", best_aic, "QLIKE:", best_qlike)
print(best_model.summary())

Best Model: ['EGARCH', 't', 2, 1] AIC: 3989.7218697832236 QLIKE: 0.7459978938506164
                        Constant Mean - EGARCH Model Results                        
Dep. Variable:                        Close   R-squared:                       0.000
Mean Model:                   Constant Mean   Adj. R-squared:                  0.000
Vol Model:                           EGARCH   Log-Likelihood:               -1988.86
Distribution:      Standardized Student's t   AIC:                           3989.72
Method:                  Maximum Likelihood   BIC:                           4019.48
                                              No. Observations:                 1054
Date:                      Tue, Sep 23 2025   Df Residuals:                     1053
Time:                              18:31:56   Df Model:                            1
                                Mean Model                                
                 coef    std err          t      P>|t|    95.0% Conf. Int.
-

#### **Note:**
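GARCH is better suited than a general-purpose regressor to forecasting stock market volatility because it directly models time-varying variance, volatility clustering, and persistence in financial returns, which a regressor fitted on static features does not model explicitly. Note that the XGBoost model above predicts next-day returns (with a negative R² on the test set), while GARCH forecasts variance, so their metrics are not directly comparable.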
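
Since the model search reports QLIKE alongside AIC, it may help to see how that loss behaves. The sketch below reuses the `qlike_loss` definition from the cell above on hypothetical variance values: the loss is zero for a perfect forecast and penalizes under-prediction more than over-prediction by the same factor.

```python
import numpy as np

def qlike_loss(realized_var, predicted_var):
    # Same formula as in the notebook cell: ratio - log(ratio) - 1, averaged
    ratio = realized_var / predicted_var
    return np.mean(ratio - np.log(ratio) - 1)

realized = np.array([4.0])  # hypothetical realized variance, in %^2

perfect = qlike_loss(realized, np.array([4.0]))  # forecast equals realized
under = qlike_loss(realized, np.array([2.0]))    # forecast is half of realized
over = qlike_loss(realized, np.array([8.0]))     # forecast is double realized

print("perfect:", perfect)
print("under  :", under)
print("over   :", over)
```

In practice QLIKE is usually averaged over many out-of-sample days; a single squared return is a very noisy variance proxy, so one-point values should be read with caution.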

#### 1.33 code for next point prediction

In [9]:
model = arch_model(returns, vol=best_params[0], p=best_params[2], q=best_params[3], dist=best_params[1], mean="Constant")
res = model.fit(disp="off")

# Forecast one day ahead (horizon=1)
forecast = res.forecast(horizon=1)

# Extract variance forecast
predicted_var = forecast.variance.iloc[-1].values
predicted_vol = np.sqrt(predicted_var)

print("Predicted variance:", predicted_var)
print("Predicted volatility (%):", predicted_vol)

Predicted variance: [3.71519344]
Predicted volatility (%): [1.92748371]


**Best Model:** `['EGARCH', 't', 2, 1]`  
**AIC:** 3993.16  
**Log-Likelihood:** -1990.58  
**Observations:** 1055  

---

### Mean Model
- **Mean Type:** Constant Mean  
- **mu (μ):** 0.1049 (p < 0.001) → Significant positive mean daily return (about 0.10% per day, since returns were scaled by 100)  

---

### Volatility Model (EGARCH)
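- **omega (ω):** 0.0238 (p = 0.0196) → Constant term of the log-variance equation  
- **alpha[1]:** 0.3207 (p < 0.001) → Strong response of log-variance to the size of the most recent shock  
- **alpha[2]:** -0.2032 (p = 0.0157) → Negative response to shock size at lag 2, partly offsetting alpha[1]; this is a magnitude term, not an asymmetry term (in `arch`, asymmetry is carried by gamma, which requires `o >= 1` and is not part of this specification)  
- **beta[1]:** 0.9844 (p < 0.001) → High persistence of volatility  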

---

### Distribution (Student’s t)
- **nu (ν):** 5.1075 (p < 0.001) → Heavy tails, capturing fat-tailed return distributions  
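
To put the estimated ν in perspective, the excess kurtosis of a Student's t distribution is 6 / (ν − 4) for ν > 4 (a normal distribution has excess kurtosis 0). The helper name below is hypothetical; the sketch simply evaluates that formula at the ν listed in this section.

```python
def student_t_excess_kurtosis(nu):
    """Excess kurtosis of a Student's t distribution; finite only for nu > 4."""
    if nu <= 4:
        raise ValueError("excess kurtosis is not finite for nu <= 4")
    return 6.0 / (nu - 4.0)

nu_hat = 5.1075  # degrees of freedom listed in this section
excess_kurt = student_t_excess_kurtosis(nu_hat)

print("excess kurtosis:", excess_kurt)  # roughly 5.42, far above the normal's 0
```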

---

### Key Insights
- EGARCH effectively captures volatility clustering in stock returns; this specification was fitted with the default `o=0`, so it has no asymmetry (gamma) term.  
- High **β (0.9844)** indicates persistent volatility shocks.  
- Student’s t distribution supports modeling of fat tails in financial data.  
- Overall, this EGARCH model is better suited to volatility forecasting than a simple regressor, as it accounts for **time-varying variance and persistence**.  
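
One way to read the persistence figure: in an EGARCH model with a single β, a deviation of log-variance from its long-run level decays roughly like β^h after h days, so the half-life is ln(0.5) / ln(β). The sketch below evaluates this for the β listed above; treating β alone as the decay rate is a simplifying assumption.

```python
import math

beta = 0.9844  # beta[1] listed in the volatility model above

# Number of days h such that beta ** h == 0.5
half_life = math.log(0.5) / math.log(beta)

print("half-life of a log-variance shock (days):", round(half_life, 1))
```

A half-life on the order of several weeks is what "high persistence" means in practice: a volatility shock fades slowly rather than disappearing the next day.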

## Why GARCH Models and Not Regression Models


GARCH models are well suited to volatility prediction because they account for volatility clustering: periods of high volatility tend to be followed by more high volatility, and calm periods by more calm. Unlike a simple regression model, GARCH captures this dynamic by making the conditional variance depend on past squared errors (the ARCH component) and on past estimated variances (the GARCH component). This makes GARCH models more realistic for financial time series, whose variance changes over time and is poorly represented by the constant-variance assumption of a standard regression.
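
The ARCH and GARCH components described above can be sketched as a plain GARCH(1,1) recursion, sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]. The function name, the parameter values and the return series below are all hypothetical; this is not the fitted EGARCH model from this notebook.

```python
def garch11_variance_path(returns, omega, alpha, beta, mu=0.0):
    """Conditional variances of a GARCH(1,1) model for a given return series."""
    # Start at the unconditional variance omega / (1 - alpha - beta)
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        eps2 = (r - mu) ** 2  # ARCH part: yesterday's squared shock
        # GARCH part: yesterday's variance, weighted by beta
        sigma2.append(omega + alpha * eps2 + beta * sigma2[-1])
    return sigma2

# Hypothetical percentage returns with one large shock at position 2
rets = [0.1, -0.2, 5.0, 0.1, -0.1, 0.2, 0.0]
path = garch11_variance_path(rets, omega=0.05, alpha=0.10, beta=0.85)

for day, var in enumerate(path):
    print(day, round(var, 4))
```

The variance jumps on the day after the large shock and then decays gradually over the following days, which is the clustering behavior a constant-variance regression cannot reproduce.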