# AI for stock market prediction: Using LLMs for Time Series Predictions

Project by: Jana Nikolovska <br>
Supervised by: Giacomo Frisoni, MSc <br>
Prof. Gianluca Moro, PhD <br>

ALMA MATER STUDIORUM - University of Bologna <br>
May 2025

---
**Summary:** <br>
In this project, I explore the use of Large Language Models (LLMs) for time series forecasting, focusing on the task of stock market prediction. The work was proposed and mentored by Prof. Gianluca Moro and Giacomo Frisoni at the University of Bologna.

I used a provided notebook as a starting point. The notebook introduces the dataset (historical S&P 500 data via [`yfinance`](https://pypi.org/project/yfinance/)), a baseline linear regression model for comparison, and the *Trading Protocol*, a framework to evaluate forecasting performance by simulating trading strategies.

For the LLM-based forecasting approach, I followed the methodology described in [this paper](https://arxiv.org/pdf/2310.07820) and its [official implementation](https://github.com/ngruver/llmtime/tree/main). The code from the paper has been adapted and extended with additional functionality tailored to the specific requirements of my experiments.

_Goal_: <br>
The goal of this project was to set up a problem similar to those in the referenced paper, particularly in terms of train and test sizes. Given 150 days of price history, the task is to predict the subsequent 29 days (for both the Open and Close series) autoregressively, without using ground truth during the forecast.
* The Linear Regression model from the baseline notebook was modified: it still uses a lagged dataset for predictions, but a simulation of autoregressiveness was added to make it more comparable to the autoregressive models.
* The dataset was split into multiple 150-29 train-test sections, and models were trained and evaluated independently on each split. <br>
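
The 150-29 splitting scheme can be sketched as follows. This is a minimal illustration rather than the project's `split_time_series`: the helper name `split_windows` is hypothetical, and the use of non-overlapping windows (stride equal to train size plus test size) is an assumption about how the sections are laid out.

```python
def split_windows(values, train_size=150, test_size=29):
    """Cut a sequence into consecutive (train, test) pairs.

    Assumption: windows do not overlap, so each new window starts
    train_size + test_size points after the previous one.
    """
    window = train_size + test_size
    splits = []
    for start in range(0, len(values) - window + 1, window):
        train = values[start:start + train_size]
        test = values[start + train_size:start + window]
        splits.append((train, test))
    return splits


# Toy usage on a synthetic series of 400 points
toy_splits = split_windows(list(range(400)))
print(len(toy_splits), len(toy_splits[0][0]), len(toy_splits[0][1]))
```

Any trailing points that do not fill a complete 179-point window are dropped in this sketch.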

_Evaluation_: <br>
For evaluation, RMSE and MAPE were used to assess the models' predictive accuracy, while a trading protocol was used to simulate trading and profit. To obtain a more reliable measure of model performance, I averaged the results across the different splits. Visualizations are included throughout the notebook to help interpret the results.
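
As a rough illustration of the accuracy side of the evaluation, the sketch below computes RMSE for one split and then averages per-split scores. The function names `rmse` and `mean_over_splits` are hypothetical and are not taken from the project's `src.evaluation` module.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_over_splits(scores):
    """Average a per-split metric to get one number per model."""
    return sum(scores) / len(scores)


# Toy usage: two splits with made-up values
per_split = [rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
             rmse([10.0, 11.0], [10.0, 11.0])]
print(mean_over_splits(per_split))
```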

In [42]:
import os
os.environ['OMP_NUM_THREADS'] = '4'
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import dash
from dash import Dash, dcc, html, Input, Output
from IPython.utils import io
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
from scipy import stats
import openai
from data.serialize import SerializerSettings
from models.utils import grid_iter
from models.darts import get_arima_predictions_data
from models.llmtime import get_llmtime_predictions_data
from data.small_context import get_datasets
from models.validation_likelihood_tuning import get_autotuned_predictions_data
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, explained_variance_score, mean_squared_log_error, median_absolute_error

%load_ext autoreload
%reload_ext autoreload
%autoreload 2

import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning

from src.visualizations import random_8_digit_number, plot_autoregressive_ml_model_results, gain_over_tries, plot_eval_over_time

warnings.simplefilter('once', UserWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [2]:
with open("secrets/openai_key.txt", "r") as file:
    openai_api_key = file.read().strip()
openai.api_key = openai_api_key

In [28]:
PREDS = {}
RESULTS = []
LAG = 12
TRAIN_SIZE = 150
PREDICTION_SIZE = 29
RESULT_IMAGES_PATH = 'visualizations/project'
DF_PATH = os.path.join("data","sp500_dates.csv")
DS_NAME = 'sp500'
os.makedirs(RESULT_IMAGES_PATH, exist_ok=True)

## Define models ##

1.  **ARIMA** <br>

ARIMA (AutoRegressive Integrated Moving Average) is a statistical model used specifically for univariate time series forecasting. It is a traditional model in econometrics and signal processing.

An ARIMA model combines three components, each controlled by an order parameter (`p`, `d`, `q`):

* AR (AutoRegressive, order `p`): dependence on past values
* I (Integrated, order `d`): differencing to make the series stationary
* MA (Moving Average, order `q`): dependence on past forecast errors

ARIMA models assume that future values of a time series are a linear function of past observations and residual errors. The model explicitly encodes temporal relationships using equations that reflect the structure of the data. It is a classical statistical method specifically developed for time-dependent data. <br/>
**sources**: Hayes, A. (2024, July 31). [Autoregressive Integrated Moving Average (ARIMA) prediction model](https://www.investopedia.com/terms/a/autoregressive-integrated-moving-average-arima.asp#:~:text=Autoregressive%20integrated%20moving%20average%20(ARIMA)%20models%20predict%20future%20values%20based,to%20forecast%20future%20security%20prices.). Investopedia. <br>
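
To make the AR and I components concrete, here is a minimal NumPy sketch of an ARIMA(p, 1, 0) forecast, i.e. with no moving-average term. It is an illustration under simplifying assumptions, not the estimator used by the project's `get_arima_predictions_data`: the function name `arima_p10_forecast` is hypothetical, the AR coefficients are fitted by ordinary least squares instead of maximum likelihood, and no constant (drift) term is included.

```python
import numpy as np


def arima_p10_forecast(y, p, steps):
    """Forecast `steps` values with a least-squares ARIMA(p, 1, 0) sketch."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y)  # I: first-order differencing

    # AR: regress d[t] on d[t-1], ..., d[t-p]
    X = np.column_stack([d[p - k - 1:len(d) - k - 1] for k in range(p)])
    target = d[p:]
    phi, *_ = np.linalg.lstsq(X, target, rcond=None)

    # Forecast the differences recursively, most recent lag first
    recent = list(d[-p:][::-1])
    future_diffs = []
    for _ in range(steps):
        nxt = float(np.dot(phi, recent))
        future_diffs.append(nxt)
        recent = [nxt] + recent[:-1]

    # Undo the differencing by cumulatively summing from the last observation
    return y[-1] + np.cumsum(future_diffs)


# Toy usage on a noisy upward trend
rng = np.random.default_rng(0)
series = np.cumsum(1.0 + 0.1 * rng.standard_normal(100))
print(arima_p10_forecast(series, p=2, steps=5))
```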

2. **GPT-3 and GPT-4** <br>

GPT-3 and GPT-4 are Large Language Models (LLMs) based on the Transformer architecture, which is a deep learning model introduced in the field of Natural Language Processing (NLP). These models are trained using self-supervised learning on massive corpora of text data.
GPT models operate using autoregressive language modeling. They generate outputs by predicting the next token (word, number, or symbol) based on preceding context. In essence, they do not "understand" in the human sense but rely on statistical correlations learned from text data.
    
These models do not reason analytically like traditional algorithms. Instead, they infer likely outcomes based on patterns in their training data. This makes them particularly effective in zero-shot or few-shot learning scenarios where minimal or no task-specific data is provided. **While GPT-3 and GPT-4 are not designed for time series, they can be repurposed by encoding time series as sequences of tokens.**
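
The idea of encoding a series as text can be sketched as below. This is a simplified illustration inspired by the fixed-precision serialization in the referenced paper, not the `SerializerSettings` implementation itself: the function names are hypothetical, and the sketch assumes the values have already been rescaled, the decimal point is dropped, and each value keeps `prec` decimal digits.

```python
def serialize(values, prec=3, time_sep=", ", bit_sep=""):
    """Turn numbers into a digit string, e.g. 1.5 -> '1500' for prec=3."""
    tokens = []
    for v in values:
        sign = "-" if v < 0 else ""
        scaled = round(abs(v) * 10 ** prec)  # fixed-point integer
        tokens.append(sign + bit_sep.join(str(scaled)))
    return time_sep.join(tokens)


def deserialize(text, prec=3, time_sep=", ", bit_sep=""):
    """Invert `serialize`, turning model output text back into floats."""
    values = []
    for token in text.split(time_sep):
        token = token.strip()
        if not token:
            continue
        sign = -1 if token.startswith("-") else 1
        digits = token.lstrip("-")
        if bit_sep:
            digits = digits.replace(bit_sep, "")
        values.append(sign * int(digits) / 10 ** prec)
    return values


# Toy usage: round trip of three rescaled values
prompt = serialize([0.123, 1.5, -0.042])
print(prompt)
print(deserialize(prompt))
```

The prompt sent to the model is the serialized history; the sampled continuation is parsed back with the inverse function.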

<p align="center">
  <img src="data/visualizations/comparison_arima_gpts.jpg" alt="Comparison table between ARIMA, GPT-3 and GPT-4 models" width="800"/>
</p>

### Setting hyperparameters ###

In [4]:
"""
    Function adapted from: https://github.com/ngruver/llmtime
    Original author(s): Nicholas Gruver et al.
"""
gpt4_hypers = dict(
    alpha=0.3,
    basic=True,
    temp=1.0,
    top_p=0.8,
    settings=SerializerSettings(base=10, prec=3, signed=True, time_sep=', ', bit_sep='', minus_sign='-')
)

gpt3_hypers = dict(
    temp=0.7,
    alpha=0.95,
    beta=0.3,
    basic=False,
    settings=SerializerSettings(base=10, prec=3, signed=True, half_bin_correction=True)
)


promptcast_hypers = dict(
    temp=0.7,
    settings=SerializerSettings(base=10, prec=0, signed=True, 
                                time_sep=', ',
                                bit_sep='',
                                plus_sign='',
                                minus_sign='-',
                                half_bin_correction=False,
                                decimal_point='')
)

arima_hypers = dict(p=[12,30], d=[1,2], q=[0])

model_hypers = {
    #'LLMTime GPT-3.5': {'model': 'gpt-3.5-turbo-instruct', **gpt3_hypers},
    #'LLMTime GPT-4': {'model': 'gpt-4', **gpt4_hypers},   
    'ARIMA': arima_hypers,
    
}

model_predict_fns = {
    #'LLMTime GPT-3.5': get_llmtime_predictions_data,
    #'LLMTime GPT-4': get_llmtime_predictions_data,
    'ARIMA': get_arima_predictions_data,
}

model_names = list(model_predict_fns.keys())
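
The `alpha`, `beta`, and `basic` hyperparameters above control how the series is rescaled before serialization. The sketch below reflects my reading of the llmtime scaling step and should be treated as an assumption rather than the library's exact code; the name `make_scaler` is hypothetical.

```python
import numpy as np


def make_scaler(history, alpha=0.95, beta=0.3, basic=False):
    """Return (transform, inverse) functions built from the history.

    Assumed behaviour:
    - basic=True: divide by the alpha-quantile of |history|.
    - basic=False: shift by min - beta * range, then divide by the
      alpha-quantile of the shifted history.
    """
    history = np.asarray(history, dtype=float)
    if basic:
        q = max(np.quantile(np.abs(history), alpha), 0.01)
        return (lambda x: x / q), (lambda x: x * q)

    shift = history.min() - beta * (history.max() - history.min())
    q = np.quantile(history - shift, alpha)
    if q == 0:
        q = 1.0  # avoid dividing by zero on a constant series
    return (lambda x: (x - shift) / q), (lambda x: x * q + shift)


# Toy usage: most scaled values end up at or below 1
transform, inverse = make_scaler([10, 20, 30, 40, 50], alpha=0.95, beta=0.3)
scaled = transform(np.array([10.0, 30.0, 50.0]))
print(scaled, inverse(scaled))
```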

## Dataset Overview ##

In [5]:
from src.utils import load_dataset, split_time_series

In [6]:
sp500_data = df = load_dataset(DF_PATH)

Date,Open,High,Low,Close,Volume
2001-01-02,1320.280029,1320.280029,1276.050049,1283.270020,1.129400e+09
2001-01-03,1283.270020,1347.760010,1274.619995,1347.560059,1.880700e+09
2001-01-04,1347.560059,1350.239990,1329.140015,1333.339966,2.131000e+09
2001-01-05,1333.339966,1334.770020,1294.949951,1298.349976,1.430800e+09
2001-01-06,1321.676636,1322.630005,1288.729980,1297.519979,1.325700e+09
...,...,...,...,...,...
2008-12-26,869.510010,873.739990,866.520020,872.799988,1.880050e+09
2008-12-27,870.463338,873.726664,863.370015,871.673319,2.361177e+09
2008-12-28,871.416667,873.713338,860.220011,870.546651,2.842303e+09
2008-12-29,872.369995,873.700012,857.070007,869.419983,3.323430e+09


In [7]:
splits = split_time_series(sp500_data, TRAIN_SIZE, PREDICTION_SIZE)

print(f'Number of splits from dataset: {len(splits)}')
print("Train:")
print(splits[0][0].tail(), "\n")
print("Test:")
print(splits[0][1].head())

Number of splits from dataset: 16
Train:
                   Open         High          Low        Close        Volume
Date                                                                        
2001-05-27  1285.530029  1285.795044  1270.915039  1272.910034  9.270500e+08
2001-05-28  1281.710022  1282.107544  1268.162537  1270.420044  9.765250e+08
2001-05-29  1277.890015  1278.420044  1265.410034  1267.930054  1.026000e+09
2001-05-30  1267.930054  1267.930054  1245.959961  1248.079956  1.158600e+09
2001-05-31  1248.079956  1261.910034  1248.069946  1255.819946  1.226600e+09 

Test:
                   Open         High          Low        Close        Volume
Date                                                                        
2001-06-01  1255.819946  1265.339966  1246.880005  1260.670044  1.015000e+09
2001-06-02  1257.436646  1265.949992  1250.039998  1262.816691  9.555000e+08
2001-06-03  1259.053345  1266.560018  1253.199992  1264.963338  8.960000e+08
2001-06-04  1260.670044  12

## Run Inference ##

### Metrics. Utility functions for evaluation

In [8]:
from src.evaluation import mape

MAPE measures the average absolute percentage difference between predicted and actual values.

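$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$$

where $y_t$ is the actual value, $\hat{y}_t$ is the predicted value, and $n$ is the number of forecast points. <br/>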

Interpretation:
* Expressed as a percentage, so it shows the average error relative to the true values.
* Lower MAPE means more accurate predictions relative to the actual values.
* Scale-independent and intuitive, so errors can be compared across datasets or variables.
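
A minimal sketch of the metric, assuming the result is reported in percent and that no true value is zero. The name `mape_sketch` is hypothetical, chosen so it does not shadow the `mape` imported from `src.evaluation`, whose exact implementation is not shown here.

```python
def mape_sketch(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes every true value is non-zero; otherwise the ratio is undefined.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Toy usage: both predictions are off by 10% of the true value
print(mape_sketch([100.0, 200.0], [110.0, 180.0]))
```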

_Simulated Trading Protocol_: The gain function uses both the predicted open and the predicted close prices generated by our models. Comparing the two tells us whether the model expects the price to rise or fall within the trading period. The resulting signal is then applied to the actual open and close prices to compute the realized gain, which reflects how well the combined open-close forecasts identify profitable movements.

* `gain` estimates how much money you would make or lose if you followed the prediction signals (buy if predicted growth, sell if predicted decline) based on actual price movements.
* `roi` function normalizes this gain by the average amount invested, giving a percentage-like return figure.
* `print_eval` function wraps both, printing out the results clearly.
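
The protocol can be sketched as follows. This is one plausible reading of the description above, not the project's `gain` and `roi` code: the names `gain_sketch` and `roi_sketch` are hypothetical, one unit is traded per day, a predicted tie is treated as a sell signal, and the "average amount invested" is taken to be the mean actual open price.

```python
def gain_sketch(pred_open, pred_close, true_open, true_close):
    """Sum of daily profits when following the predicted direction."""
    total = 0.0
    for po, pc, to, tc in zip(pred_open, pred_close, true_open, true_close):
        if pc > po:
            total += tc - to  # predicted growth: buy at open, sell at close
        else:
            total += to - tc  # predicted decline (or tie): sell at open, buy back at close
    return total


def roi_sketch(pred_open, pred_close, true_open, true_close):
    """Gain divided by the mean actual open price (assumed investment)."""
    invested = sum(true_open) / len(true_open)
    return gain_sketch(pred_open, pred_close, true_open, true_close) / invested


# Toy usage: day 1 is a correct "buy", day 2 is a wrong "sell"
print(gain_sketch([10, 10], [11, 9], [10, 10], [12, 11]))
print(roi_sketch([10, 10], [11, 9], [10, 10], [12, 11]))
```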

### Traditional Machine Learning Models ###

#### Get the dataset ready ####

In [11]:
from src.models.traditional_ml_models import create_lag_features, split_lag_dataset_to_label_and_features, predict_linear_regression

These functions are used to simulate autoregressive behavior by creating lagged input features that represent past values of the time series. By feeding these lagged features into a linear regression model, we imitate an autoregressive process where predictions depend explicitly on previous observed or predicted values.

* `create_lag_features`: creates a DataFrame of lagged features from a time series
* `split_lag_dataset_to_label_and_features`: from multiple (train, test) splits of lagged datasets, separates the feature columns from the target labels
* `update_lag_input`: creates a new input row with updated lag features for recursive forecasting
* `recursive_forecast`: predicts multiple future time points one step at a time, feeding predictions back as inputs to imitate autoregressiveness
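
The lagged layout can be illustrated with a small NumPy sketch. It is not the project's `create_lag_features`; the name `make_lag_matrix` is hypothetical, and the column order (current value first, then `lag_1` up to `lag_k`) is chosen to mirror the tables displayed in this notebook.

```python
import numpy as np


def make_lag_matrix(series, lag_size):
    """Build rows of [value, lag_1, ..., lag_k] from a 1-D series.

    The first `lag_size` points have no complete history and are skipped.
    """
    series = np.asarray(series, dtype=float)
    rows = []
    for t in range(lag_size, len(series)):
        lags = series[t - lag_size:t][::-1]  # most recent value first
        rows.append(np.concatenate(([series[t]], lags)))
    return np.array(rows)


# Toy usage with two lags
print(make_lag_matrix([1, 2, 3, 4, 5], lag_size=2))
```

The first column is the regression target and the remaining columns are the features.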

In [12]:
lag_data_open = create_lag_features(sp500_data['Open'], lag_size=LAG)
display(lag_data_open.head())
lag_data_close = create_lag_features(sp500_data['Close'], lag_size=LAG)
display(lag_data_close.head())

Date,value,lag_1,lag_2,lag_3,lag_4,lag_5,lag_6,lag_7,lag_8,lag_9,lag_10,lag_11,lag_12
2001-01-14,1322.569946,1324.694946,1326.819946,1313.27002,1300.800049,1295.859985,1298.349976,1310.013306,1321.676636,1333.339966,1347.560059,1283.27002,1320.280029
2001-01-15,1320.444946,1322.569946,1324.694946,1326.819946,1313.27002,1300.800049,1295.859985,1298.349976,1310.013306,1321.676636,1333.339966,1347.560059,1283.27002
2001-01-16,1318.319946,1320.444946,1322.569946,1324.694946,1326.819946,1313.27002,1300.800049,1295.859985,1298.349976,1310.013306,1321.676636,1333.339966,1347.560059
2001-01-17,1326.650024,1318.319946,1320.444946,1322.569946,1324.694946,1326.819946,1313.27002,1300.800049,1295.859985,1298.349976,1310.013306,1321.676636,1333.339966
2001-01-18,1329.890015,1326.650024,1318.319946,1320.444946,1322.569946,1324.694946,1326.819946,1313.27002,1300.800049,1295.859985,1298.349976,1310.013306,1321.676636


Date,value,lag_1,lag_2,lag_3,lag_4,lag_5,lag_6,lag_7,lag_8,lag_9,lag_10,lag_11,lag_12
2001-01-14,1322.600037,1320.575043,1318.550049,1326.819946,1313.27002,1300.800049,1295.859985,1296.689982,1297.519979,1298.349976,1333.339966,1347.560059,1283.27002
2001-01-15,1324.625031,1322.600037,1320.575043,1318.550049,1326.819946,1313.27002,1300.800049,1295.859985,1296.689982,1297.519979,1298.349976,1333.339966,1347.560059
2001-01-16,1326.650024,1324.625031,1322.600037,1320.575043,1318.550049,1326.819946,1313.27002,1300.800049,1295.859985,1296.689982,1297.519979,1298.349976,1333.339966
2001-01-17,1329.469971,1326.650024,1324.625031,1322.600037,1320.575043,1318.550049,1326.819946,1313.27002,1300.800049,1295.859985,1296.689982,1297.519979,1298.349976
2001-01-18,1347.969971,1329.469971,1326.650024,1324.625031,1322.600037,1320.575043,1318.550049,1326.819946,1313.27002,1300.800049,1295.859985,1296.689982,1297.519979


In [13]:
splits_lagged_open = split_time_series(lag_data_open, TRAIN_SIZE, PREDICTION_SIZE)
splits_lagged_close = split_time_series(lag_data_close, TRAIN_SIZE, PREDICTION_SIZE)

print(f'Number of splits from dataset: {len(splits_lagged_open)}')
print("Train:")
print(splits_lagged_open[0][0].tail(), "\n")
print("Test:")
print(splits_lagged_open[0][1].head())

Number of splits from dataset: 16
Train:
                  value        lag_1        lag_2        lag_3        lag_4  \
Date                                                                          
2001-06-08  1276.959961  1270.030029  1283.569946  1267.109985  1260.670044   
2001-06-09  1272.959961  1276.959961  1270.030029  1283.569946  1267.109985   
2001-06-10  1268.959961  1272.959961  1276.959961  1270.030029  1283.569946   
2001-06-11  1264.959961  1268.959961  1272.959961  1276.959961  1270.030029   
2001-06-12  1254.390015  1264.959961  1268.959961  1272.959961  1276.959961   

                  lag_5        lag_6        lag_7        lag_8        lag_9  \
Date                                                                          
2001-06-08  1259.053345  1257.436646  1255.819946  1248.079956  1267.930054   
2001-06-09  1260.670044  1259.053345  1257.436646  1255.819946  1248.079956   
2001-06-10  1267.109985  1260.670044  1259.053345  1257.436646  1255.819946   
2001-06-11

In [14]:
Xs_open_train, ys_open_train, Xs_open_test, ys_open_test = split_lag_dataset_to_label_and_features(splits_lagged_open)
Xs_close_train, ys_close_train, Xs_close_test, ys_close_test = split_lag_dataset_to_label_and_features(splits_lagged_close)

#### Linear regression ####

**Pipeline**: 
* make predictions with an autoregressive approach for all splits from the original dataset
* visualize an Open and a Close series graph of True and Predicted values (chosen randomly)
* visualize the histogram of Gain over all the splits, print minimum and maximum values of Gain
* visualize the evaluation over time for the split with minimum and maximum Gain
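
The autoregressive prediction step in this pipeline can be sketched generically. The code below is an illustration, not the project's `predict_linear_regression`: the name `recursive_forecast_sketch` and the `predict_one` callable are hypothetical stand-ins for any fitted one-step model, such as a linear regression over lag features.

```python
def recursive_forecast_sketch(predict_one, last_lags, steps):
    """Roll a one-step predictor forward without using ground truth.

    predict_one: callable mapping [lag_1, ..., lag_k] to the next value.
    last_lags:   the k most recent observed values, most recent first.
    """
    lags = list(last_lags)
    predictions = []
    for _ in range(steps):
        y_hat = predict_one(lags)
        predictions.append(y_hat)
        lags = [y_hat] + lags[:-1]  # the prediction becomes the new lag_1
    return predictions


# Toy usage: a hand-written predictor that extrapolates the last step linearly
def linear_step(lags):
    return 2 * lags[0] - lags[1]


print(recursive_forecast_sketch(linear_step, last_lags=[5.0, 4.0], steps=3))
```

Because each prediction is fed back as an input, errors can accumulate over the 29-day horizon, which is what makes this setup comparable to the other autoregressive models.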

In [17]:
lr_predictions_open = []
for i in range(len(Xs_open_train)):
    lr_predictions_open.append(predict_linear_regression((Xs_open_train[i],ys_open_train[i]), (Xs_open_test[i],ys_open_test[i]),lag=LAG))

lr_predictions_close = []
for i in range(len(Xs_close_train)):
    lr_predictions_close.append(predict_linear_regression((Xs_close_train[i],ys_close_train[i]), (Xs_close_test[i],ys_close_test[i]),lag=LAG))

In [29]:
file_number = random_8_digit_number()
#file_number = None
print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")
plot_autoregressive_ml_model_results(ys_open_train[0], ys_open_test[0], lr_predictions_open[0], ds_name="SP500-Open", model_name='Linear Regression', hash_number=file_number)

Due to space limitation the plot is being saved in visualizations/project under the name plot_69087485.png. For running and visualizing it directly in the notebook uncomment and run the commented line


In [30]:
file_number = random_8_digit_number()
#file_number = None
print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")
plot_autoregressive_ml_model_results(ys_close_train[2], ys_close_test[2], lr_predictions_close[2], ds_name="SP500-Close", model_name='Linear Regression', hash_number=file_number)

Due to space limitation the plot is being saved in visualizations/project under the name plot_85331481.png. For running and visualizing it directly in the notebook uncomment and run the commented line


### Evaluation ###

In [38]:
from src.evaluation import print_eval

In [34]:
lr_gain = []
lr_roi = []
lr_mape_open = []
lr_mape_close = []

for i in range(len(lr_predictions_open)):
    gain_lr_, roi_lr_ = print_eval(lr_predictions_open[i], lr_predictions_close[i], ys_open_test[i], ys_close_test[i], verbose=False)
    lr_mape_open.append(mape(ys_open_test[i], lr_predictions_open[i]))
    lr_mape_close.append(mape(ys_close_test[i], lr_predictions_close[i]))
    lr_gain.append(gain_lr_)
    lr_roi.append(roi_lr_)

In [40]:
file_number = random_8_digit_number()
#file_number = None
print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")
gain_over_tries(lr_gain,'Linear Regression', hash_number = file_number)
print(f"Gain: Min: {round(min(lr_gain),2)}$, Max: {round(max(lr_gain),2)}$")


Due to space limitation the plot is being saved in visualizations/project under the name plot_22823587.png. For running and visualizing it directly in the notebook uncomment and run the commented line
Gain: Min: -96.17$, Max: 39.5$


In [43]:
file_number = random_8_digit_number()
#file_number = None
print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")
worst_split = lr_gain.index(min(lr_gain))  # index of the split with the lowest gain
daily_gain, cumulative_gain_series = plot_eval_over_time(lr_predictions_open[worst_split],
                                                         lr_predictions_close[worst_split],
                                                         ys_open_test[worst_split],
                                                         ys_close_test[worst_split],
                                                         hash_number=file_number)

Due to space limitation the plot is being saved in visualizations/project under the name plot_66032252.png. For running and visualizing it directly in the notebook uncomment and run the commented line


In [44]:
file_number = random_8_digit_number()
#file_number = None
print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")
best_split = lr_gain.index(max(lr_gain))  # index of the split with the highest gain
daily_gain, cumulative_gain_series = plot_eval_over_time(lr_predictions_open[best_split],
                                                         lr_predictions_close[best_split],
                                                         ys_open_test[best_split],
                                                         ys_close_test[best_split],
                                                         hash_number=file_number)

Due to space limitation the plot is being saved in visualizations/project under the name plot_78752830.png. For running and visualizing it directly in the notebook uncomment and run the commented line


In [45]:
RESULTS.append({
    "Model": "Linear Regression",
    "MAPE - Open": round(sum(lr_mape_open) / len(lr_mape_open),2),
    "MAPE - Closed": round(sum(lr_mape_close) / len(lr_mape_close),2),
    "Gain": round(sum(lr_gain) / len(lr_gain),2),
    "ROI": round(sum(lr_roi) / len(lr_roi),4)
})

In [35]:
RESULTS_df = pd.DataFrame.from_dict(RESULTS)
display(RESULTS_df)

### ARIMA, GPT3, GPT4 ###

In [47]:
from src.models.llmtime import get_inference

In [49]:
suppress_output = True
open_dicts_ = {k: [] for k in model_names}
close_dicts_ = {k: [] for k in model_names}

for i, split in enumerate(splits[:5]):
    print(f'[INFO] Processing split {i}')
    
    def run_inference():
        model_name_open, predicted_open_dict = get_inference((split[0]['Open'], split[1]['Open']), 'SP500', num_samples=30, visualize=False)
        model_name_close, predicted_close_dict = get_inference((split[0]['Close'], split[1]['Close']), 'SP500', num_samples=30, visualize=False)
        return model_name_open, predicted_open_dict, model_name_close, predicted_close_dict

    if suppress_output:
        with io.capture_output() as captured:
            model_name_open, predicted_open_dict, model_name_close, predicted_close_dict = run_inference()
    else:
        model_name_open, predicted_open_dict, model_name_close, predicted_close_dict = run_inference()

    for k in predicted_open_dict:
        open_dicts_[k].append(predicted_open_dict[k])
    for k in predicted_close_dict:
        close_dicts_[k].append(predicted_close_dict[k])
    
    print(f'[INFO] Processed split {i}')


[INFO] Processing split 0


NameError: name 'model_names' is not defined

In [38]:
ts_gain = {}
ts_roi = {}
ts_mape_open = {}
ts_mape_close = {}
ts_predictions_open = {}
ts_predictions_close = {}

for model in model_names:
    print(f'[INFO]Processing model: {model}...')
    ts_gain[model] = []
    ts_roi[model] = []
    ts_mape_open[model] = []
    ts_mape_close[model] = []
    ts_predictions_open[model] = []
    ts_predictions_close[model] = []
    for i in range(len(open_dicts_[model])):
        print(f'\t\t\tProcessing iteration: {i}...')
        preds_model_open, preds_model_close =  process_llmtime_outputs(open_dicts_[model][i]['samples'], 
                                                                       close_dicts_[model][i]['samples'])
        
        
        preds_model_gain, preds_model_roi = print_eval(preds_model_open, 
                                                       preds_model_close, 
                                                       splits[i][1]['Open'], 
                                                       splits[i][1]['Close'],
                                                      verbose=False)
        ts_predictions_open[model].append(preds_model_open)
        ts_predictions_close[model].append(preds_model_close)
        ts_mape_open[model].append(mape(splits[i][1]['Open'], preds_model_open))
        ts_mape_close[model].append(mape(splits[i][1]['Close'], preds_model_close))
        ts_gain[model].append(preds_model_gain)
        ts_roi[model].append(preds_model_roi)
    RESULTS.append({
        "Model": model,
        "MAPE - Open": round(sum(ts_mape_open[model]) / len(ts_mape_open[model]),2),
        "MAPE - Closed": round(sum(ts_mape_close[model]) / len(ts_mape_close[model]),2),
        "Gain": round(sum(ts_gain[model]) / len(ts_gain[model]),2),
        "ROI": round(sum(ts_roi[model]) / len(ts_roi[model]),2)
    })

[INFO]Processing model: LLMTime GPT-3.5...


ZeroDivisionError: division by zero

In [39]:
RESULTS_df = pd.DataFrame.from_dict(RESULTS)
display(RESULTS_df)

In [40]:
for model in ts_gain.keys():

    print(f"Model {model}, gain over tries graph")
    file_number = random_8_digit_number()
    #file_number = None
    print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")
    gain_over_tries(ts_gain[model],model, hash_number = file_number)
    print(f"Gain: Min: {round(min(ts_gain[model]),2)}$, Max: {round(max(ts_gain[model]),2)}$")
    
    print(f"Model {model}, minimum gain graph")
    file_number = random_8_digit_number()
    #file_number = None
    print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")

    worst_split = ts_gain[model].index(min(ts_gain[model]))  # split with the lowest gain
    _ = plot_eval_over_time(ts_predictions_open[model][worst_split],
                            ts_predictions_close[model][worst_split],
                            splits[worst_split][1]['Open'],
                            splits[worst_split][1]['Close'],
                            hash_number=file_number)
    print(f"Model {model}, maximum gain graph")
    file_number = random_8_digit_number()
    #file_number = None
    print(f"Due to space limitation the plot is being saved in {RESULT_IMAGES_PATH} under the name plot_{file_number}.png. For running and visualizing it directly in the notebook uncomment and run the commented line")

    best_split = ts_gain[model].index(max(ts_gain[model]))  # split with the highest gain
    _ = plot_eval_over_time(ts_predictions_open[model][best_split],
                            ts_predictions_close[model][best_split],
                            splits[best_split][1]['Open'],
                            splits[best_split][1]['Close'],
                            hash_number=file_number)

Model LLMTime GPT-3.5, gain over tries graph


NameError: name 'random_8_digit_number' is not defined