---
Author: Mustapha Bouhsen <br>
[LinkedIn](https://www.linkedin.com/in/mustapha-bouhsen/)<br>
[Git](https://github.com/mus514)<br>
Date: February 14, 2024<br>
---

In [0]:
%run Repos/bouhsen.m@gmail.com/ML_Pipeline_Hub/library/garch_model

In [0]:
%run Repos/bouhsen.m@gmail.com/ML_Pipeline_Hub/library/daily_utilities

## Creating table containg the GARCH(1,1) volatility for each stock

The volatility is given by :

$$
\sigma_{t}^2= \omega+ \alpha y^2_{t-1} + \beta \sigma^2_{t-1} \ , \ \omega \ge 0 \ \ \ \  \alpha, \beta>0 \ and
 \ \alpha+ \beta<1
$$

In [0]:
#-----------------------------------------
# Set the prod folder path
#-----------------------------------------
raw_folder_path = "/mnt/raw/"
prod_folder_path = "/mnt/prod/"

stocks = ["aapl", "amzn", "googl", "msft"]

In [0]:
#-----------------------------------------
# Load the data
#-----------------------------------------
returns = spark.sql("SELECT * FROM stocks_returns").toPandas()

In [0]:
#-----------------------------------------
# Calculate the GARCH(1,1) volatility for each stock
#-----------------------------------------
garch_vol = {}
garch_vol["date"] = returns["date"]
for stock in stocks:
    garch_vol[stock] = forecast_vol(returns[stock])[0][:-1]

garch_vol = pd.DataFrame(garch_vol)

In [0]:
#--------------------------------------------------------
# Convert the data to spark to save it esaly to sql table
#--------------------------------------------------------
# Define the schema
schema = StructType([
    StructField("date", StringType(), True),
    StructField("aapl", FloatType(), True),
    StructField("amzn", FloatType(), True),
    StructField("msft", FloatType(), True),
    StructField("googl", FloatType(), True)
])

garch_vol = spark.createDataFrame(garch_vol, schema)

# Check if the table exists
if spark.catalog.tableExists("stocks_volatility"):
    # Drop the existing table
    spark.sql(f"DROP TABLE stocks_volatility")
    print(f'Dropped table: stocks_volatility')

# Create the table
garch_vol.write.format("parquet").saveAsTable("stocks_volatility")

Dropped table: stocks_volatility


In [0]:
#-----------------------------------------
# Write the garch volatility in the prod
#-----------------------------------------
# Temp folder to save temp parquet files
temp_folder = prod_folder_path + f"temp/"

# write data frame to csv
garch_vol.coalesce(1).write.mode("overwrite").option("header", "True").csv(temp_folder)

#get all files path ending with .parquet
files_paths = get_files_paths_from_folders(temp_folder, ".csv")
            
# Copy parquet files to final destination
ingest_and_transform_to_parquet(files_paths, prod_folder_path, "volatilities")

# delete the temp folder
delete_contents_recursively(temp_folder)

In [0]:
%sql
-- Disply the stocks_returns
SELECT *
FROM stocks_volatility
ORDER BY date DESC
LIMIT 10

date,aapl,amzn,msft,googl
2024-02-12,0.00018448957,0.00064494414,0.00041550378,0.00020927316
2024-02-09,0.00019253211,0.0006314657,0.00041388778,0.00020363413
2024-02-08,0.00020006507,0.0007080605,0.00044699386,0.00022061924
2024-02-07,0.00021096437,0.0007930064,0.00047631405,0.00019706559
2024-02-06,0.00021713106,0.0008970418,0.00051616976,0.0002131243
2024-02-05,0.0002221631,0.0010173343,0.0005544657,0.0002131472
2024-02-02,0.00023297689,0.0004126897,0.0005976942,0.00019868766
2024-02-01,0.00023329063,0.00035980533,0.0006471031,0.00019144946
2024-01-31,0.00021728098,0.00030829286,0.00021628957,0.00013339386
2024-01-30,0.0002000188,0.00029780518,0.00021344332,0.00013986659
