#  Modelación Volatilidad en Tasas de Cambio con modelos de la familia GARCH

**Objetivo**: El objetivo de este Notebook es implementar modelos clásicos de volatilidad como GARCH (Generalized ARCH Models), y EGARCH (Exponential GARCH) y GJR para predecir la tasa de cambio entre monedas.

**Descripción Dataset**: El dataset utilizado contiene la información histórica entre "2021-04-06" y "2024-04-06" con frecuencia diaria para las siguientes tasas de cambio:
- `Dolar EstadoUnidense` (USD) contra `Dolar Canadiende` (CAD)
- `Dolar EstadoUnidense` (USD) contra `Dolar Australiano` (CAD)

**Métricas de evaluación de modelos**: 
- `Mean Squared Error (MSE)`
- `Mean Absolute Error (MAE)`
- `Mean Absolute Percentage Error (MAPE)`

**Integrantes**
- David López
- Camilo Velez
- Sebastian Ávila
- David Armendariz

## Importando Librerías

In [1]:
import plotly.express as px
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_arch
from statsmodels.compat import lzip
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
import pandas as pd
import numpy as np
from arch import arch_model

## Cargando DataSet

In [2]:
df=pd.read_csv("../data/trusted/currency_exchange.csv")
df.rename(columns={"Fecha":"Date"},inplace=True)
print(df.dtypes)
print(df.shape)
display(df.head())
display(df.tail())

Date        object
USD_CAD    float64
USD_AUD    float64
dtype: object
(1096, 3)


Unnamed: 0,Date,USD_CAD,USD_AUD
0,2021-04-06,1.2566,1.3049
1,2021-04-07,1.2609,1.3137
2,2021-04-08,1.2562,1.3066
3,2021-04-09,1.253,1.3125
4,2021-04-10,1.253,1.3125


Unnamed: 0,Date,USD_CAD,USD_AUD
1091,2024-04-01,1.357,1.541
1092,2024-04-02,1.3567,1.5342
1093,2024-04-03,1.3527,1.5233
1094,2024-04-04,1.3543,1.518
1095,2024-04-05,1.359,1.5201


## Testeando ARCH effects

Para asegurar que esta clase de modelos son apropiados para la data, usamos el **Engle’s Lagrange multiplier test** con el objetivo de validar la hipótesis de que existe presencia de efectos **Autoregressive Conditionally Heteroscedastic (ARCH)**

### USD-CAD

In [3]:
data1 = sm.add_constant(df["USD_CAD"])
results = sm.OLS(data1["USD_CAD"], data1["const"]).fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                USD_CAD   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Sun, 14 Apr 2024   Prob (F-statistic):                nan
Time:                        15:22:49   Log-Likelihood:                 1770.6
No. Observations:                1096   AIC:                            -3539.
Df Residuals:                    1095   BIC:                            -3534.
Df Model:                           0                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.3089      0.001    900.442      0.0

In [4]:
res = het_arch(results.resid, nlags=6)
name = ["lm", "lmpval", "fval", "fpval"]
lzip(name, res)

[('lm', 1040.4406275835142),
 ('lmpval', 1.6003044611375294e-221),
 ('fval', 3789.3848150577796),
 ('fpval', 0.0)]

El resultado del test es un **p-value** = 0, por lo tanto, a un nviel de significancia del 99% se rechaza la hipótesis nula de que no hay presencia de efectos ARCH para la tasa de cambio `USD-CAD`

### USD-AUD

In [6]:
data1 = sm.add_constant(df["USD_AUD"])
results = sm.OLS(data1["USD_AUD"], data1["const"]).fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                USD_AUD   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Sun, 14 Apr 2024   Prob (F-statistic):                nan
Time:                        15:23:03   Log-Likelihood:                 1192.9
No. Observations:                1096   AIC:                            -2384.
Df Residuals:                    1095   BIC:                            -2379.
Df Model:                           0                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4466      0.002    587.461      0.0

In [7]:
res = het_arch(results.resid, nlags=6)
name = ["lm", "lmpval", "fval", "fpval"]
lzip(name, res)

[('lm', 1049.775016027846),
 ('lmpval', 1.5311189571024948e-223),
 ('fval', 4710.614441119437),
 ('fpval', 0.0)]

## Modelo

In [9]:
df.set_index("Date",inplace=True)

### Separando datos en training y test

In [10]:

# Number of observations in the dataset
n = len(df)

# Specify the number of steps (days) for the training data
# For example, if you want the first 80% of the data for training
train_size = int(0.8 * n)

# Splitting the data into train and test sets using slice notation
train = df[:train_size]
test = df[train_size:]


In [13]:
train

Unnamed: 0_level_0,USD_CAD,USD_AUD
Date,Unnamed: 1_level_1,Unnamed: 2_level_1
2021-04-06,1.2566,1.3049
2021-04-07,1.2609,1.3137
2021-04-08,1.2562,1.3066
2021-04-09,1.2530,1.3125
2021-04-10,1.2530,1.3125
...,...,...
2023-08-25,1.3601,1.5616
2023-08-26,1.3601,1.5616
2023-08-27,1.3601,1.5616
2023-08-28,1.3599,1.5554


### GARCH

In [32]:
data = pd.read_csv('../data/trusted/currency_exchange.csv', index_col='Fecha', parse_dates=True)

In [34]:
# Sampling
data_in_the_sample = data.loc[:"2023-06-02", "USD_AUD"]
data_out_of_the_sample = data.loc["2023-06-03":, "USD_AUD"]

In [36]:
# Check and convert index to DatetimeIndex if not already
data.index = pd.to_datetime(data.index)
data_in_the_sample.index = pd.to_datetime(data_in_the_sample.index)
data_out_of_the_sample.index = pd.to_datetime(data_out_of_the_sample.index)

# Initialize and fit the GARCH model on in-sample data
am = arch_model(data_in_the_sample, mean='Constant', vol='Garch', p=1, q=1)
res = am.fit(disp='off')

# Prepare to store forecasts
forecasts = []

# Manually create rolling one-step forecasts
for date in data_out_of_the_sample.index:
    try:
        # Generate a forecast
        next_forecast = res.forecast(horizon=1, start=date)
        # Store the forecasted variance
        if not next_forecast.variance.dropna().empty:
            forecasted_var = next_forecast.variance.dropna().iloc[0, 0]
            forecasts.append(forecasted_var)
        else:
            # Append np.nan if forecast cannot be generated
            forecasts.append(np.nan)
    except Exception as e:
        print(f"Error generating forecast for {date}: {e}")
        forecasts.append(np.nan)

# Create a DataFrame with the forecasts
forecast_df = pd.DataFrame({
    'Date': data_out_of_the_sample.index,
    'Forecasted Variance': forecasts
})

# Display the results
print(forecast_df)

estimating the model parameters. The scale of y is 0.00552. Parameter
estimation work better when this value is between 1 and 1000. The recommended
rescaling is 10 * y.

model or by setting rescale=False.



          Date  Forecasted Variance
0   2023-06-03                  NaN
1   2023-06-04                  NaN
2   2023-06-05                  NaN
3   2023-06-06                  NaN
4   2023-06-07                  NaN
..         ...                  ...
303 2024-04-01                  NaN
304 2024-04-02                  NaN
305 2024-04-03                  NaN
306 2024-04-04                  NaN
307 2024-04-05                  NaN

[308 rows x 2 columns]


#### USD_AUD Currency Exchange

In [25]:
# Defining the GARCH model on the in-sample data
am = arch_model(train["USD_AUD"], vol="Garch")  # Specifying p, q for GARCH(1,1)


In [31]:
# Initialize and fit the GARCH model on in-sample data
am = arch_model(data_in_the_sample, mean='Constant', vol='Garch', p=1, q=1)
res = am.fit(disp='off')

# Prepare to store forecasts
forecasts = []

# Manually create rolling one-step forecasts
for i in range(len(data_out_of_the_sample)):
    # Generate a forecast
    next_forecast = res.forecast(horizon=1, start=data_out_of_the_sample.index[i])
    # Store the forecasted variance
    if not next_forecast.variance.dropna().empty:
        forecasted_var = next_forecast.variance.dropna().iloc[0, 0]
        forecasts.append(forecasted_var)
    else:
        # Append None or np.nan if forecast cannot be generated
        forecasts.append(np.nan)

# Create a DataFrame with the forecasts
forecast_df = pd.DataFrame({
    'Date': data_out_of_the_sample.index,
    'Forecasted Variance': forecasts
})

# Display the results
print(forecast_df)

estimating the model parameters. The scale of y is 0.00552. Parameter
estimation work better when this value is between 1 and 1000. The recommended
rescaling is 10 * y.

model or by setting rescale=False.



AssertionError: 

In [24]:
cvar_rjpy_stat = {}
for date in data_out_of_the_sample.index:
    # Fit the model up to 'date' not including 'date'
    res = am.fit(last_obs=pd.Timestamp(date) - pd.Timedelta(days=1), disp="off")
    # Forecast the next day
    forecasts = res.forecast(horizon=1)
    # Since we forecast 1 day, we use iloc[0] to get that forecast
    forecasts_res = forecasts.variance.dropna()
    if not forecasts_res.empty:
        cvar_rjpy_stat[date] = forecasts_res.iloc[0]

cvar_rjpy_stat = pd.DataFrame(list(cvar_rjpy_stat.items()), columns=['Date', 'Forecasted Variance'])

# Print or inspect the DataFrame
print(cvar_rjpy_stat)

AssertionError: 

In [87]:

am = arch_model(train["USD_AUD"], vol="Garch")

In [88]:
cvar_rjpy_stat = {}
for date in data_out_of_the_sample.index:
    res = am.fit(last_obs=date, disp="off")
    forecasts = res.forecast(horizon=1)
    forecasts_res = forecasts.variance.dropna()
    cvar_rjpy_stat[date] = forecasts_res.iloc[1]

cvar_rjpy_stat = pd.DataFrame(cvar_rjpy_stat).T

AssertionError: 

In [59]:

# Define the GARCH model
model = arch_model(train["USD_AUD"], vol='GARCH')

# Fit the model
fitted_model = model.fit()

# Forecast the test set
test_pred = fitted_model.forecast(horizon=len(test))
predicted_volatility = np.sqrt(test_pred.variance.dropna().values[-1])

# Ensure the index of predicted_volatility aligns with the test set
predicted_volatility = pd.Series(predicted_volatility, index=test.index)

# Create a DataFrame for plotting
results_df = pd.DataFrame({
    'Actual': test,
    'Predicted': predicted_volatility
})

# Plot using Plotly Express
fig = px.line(results_df, title='Actual vs Predicted Volatility')
fig.update_xaxes(title_text='Date')
fig.update_yaxes(title_text='Volatility')
fig.show()

# Calculate MSE, MAE, and MAPE
mse = mean_squared_error(test, predicted_volatility)
mae = mean_absolute_error(test, predicted_volatility)
mape = mean_absolute_percentage_error(test, predicted_volatility)

print('MSE:', mse)
print('MAE:', mae)
print('MAPE:', mape)


y is poorly scaled, which may affect convergence of the optimizer when
estimating the model parameters. The scale of y is 0.002192. Parameter
estimation work better when this value is between 1 and 1000. The recommended
rescaling is 10 * y.

model or by setting rescale=False.




MSE: 1.6874974154229538
MAE: 1.2989650088729885
MAPE: 0.9589942910871211
