# LSTM-CEEMDAN model

## Python requirements

```
!pip install plotly
!pip install cufflinks
!pip install chart_studio
!pip install ipywidgets
!pip install yfinance
!pip install EMD-signal
!pip install sklearn
!pip install keras
!pip install tensorflow
```

### Model

The **LSTM-CEEMDAN** works with daily data from equities: open, high, low, close, volume. Hereinafter we are calling these "features".

### Data processing steps:

1. Data gathering

We use yfinance package to get the daily ticker history. We obtain a dataframe with one time series for each of the features, for each of the tickers we are working with.

Here we are considering the top 10 most liquid tickers in Brazilian B3 exchange in period of **2019-01-01 until 2020-05-01**:

**PETR4, VALE3, BOVA11, ITUB4, BBDC4, B3SA3, BBAS3, ABEV3, MGLU3, VVAR3**

2. Decomposition

We decompose the data with the complete ensemble empirical mode decomposition with adative noise (CEEMDAN) algorithm, using the [PyEMD public package from Dawid Laszuk](github.com/laszukdawid/PyEMD).

As a result, we obtain a set of instrinsic mode functions (IMFs) time series for each of the features, for each of the tickers. The number of IMFs resulting from a decomposition may vary, and depends upon CEEMDAN hyperparameters such as scale of added noise, specific series characteristics, and mainly **series length**.

3. Data transformation

For each of the IMFs series obtained, we tranform the data according to the following equation in order to work with data values between 0 and 1.

$$x' = \frac{x-x_{min}}{x_{max}-x_{min}}$$

Where x is any element in the series to be transformed, x' is the equivalent transformed element. The equation also takes into account the maximum and minimum values of the series.

4. Windowing

Besides the mathematical transformation, each series undergo a vectorial transformation in order to split the data into windows.

The process is examplified:

```
original single dimensional series = [1,2,3,4,5,6,7,8]
Splitting into windows of length 4.
Windowed data = [[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7],[5,6,7,8]]
```
This way the resulting series will have $n-w+1$ elements, where $n$ is the length of the orginal series and $w$ is the window length.

### Processed data availability

After the processing method, the final data is available in a dataframe format.

In [1]:
from datetime import timedelta, datetime
import pandas as pd
import numpy as np

import cufflinks as cf
import chart_studio.plotly as plotly
import plotly.offline
cf.go_offline()
cf.set_config_file(offline=True, world_readable=False)

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import Dense, LSTM, LeakyReLU
import yfinance as yf
from PyEMD import CEEMDAN

# iplot layout
space =  {
            'legend' : {'bgcolor':'#1A1A1C','font':{'color':'#D9D9D9',"size":12}},
            'paper_bgcolor' : '#1A1A1C',
            'plot_bgcolor' : '#1A1A1C',
            "title" : {"font":{"color":"#D9D9D9"},"x":0.5},
            'yaxis' : {
                'tickfont' : {'color':'#C2C2C2', "size":12},
                'gridcolor' : '#434343',
                'titlefont' : {'color':'#D9D9D9'},
                'zerolinecolor' : '#666570',
                'showgrid' : True
            },
            'xaxis' : {
                'tickfont' : {'color':'#C2C2C2', "size":12},
                'gridcolor' : '#434343',
                'titlefont' : {'color':'#D9D9D9'},
                'zerolinecolor' : '#666570',
                'showgrid' : True
            },
            'titlefont' : {'color':'#D9D9D9'}
        }

Using TensorFlow backend.


In [2]:
tickers = ["PETR4", "VALE3", "BOVA11", "ITUB4", "BBDC4", "B3SA3", "BBAS3", "ABEV3", "MGLU3", "VVAR3"]

tickers = [f"{ticker}.SA" for ticker in tickers]

start_datetime = datetime(year=2018, month=7, day=20)
end_datetime = datetime(year=2019, month=12, day=2)

history_data = {ticker.split('.')[0]:yf.download(ticker, start=start_datetime, end=end_datetime).drop(['Close','Open', 'High', 'Low', 'Volume'], axis=1).dropna() for ticker in tickers}

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


In [3]:
%%time
ceemdan = CEEMDAN()
decomposed_data = {}
decomposed_ticker_features_series = {}
scalers = {}
for ticker in history_data:
    # iterating every ticker
    print(f'[{ticker}] Decomposing...')
    ticker_dataframe = history_data[ticker]
    decomposed_ticker_features_series[ticker] = {}
    scalers[ticker] = {}
    for column in ticker_dataframe.columns:
        # iterating evey feature
        scaler = MinMaxScaler()
        decomposed_ticker_features_series[ticker][column] = {}
        series = ticker_dataframe[column].values.reshape(-1,1)
        scaler.fit(series)
        scalers[ticker][column] = scaler
        ticker_feature_time_series = np.frombuffer(scaler.transform(series))
        ticker_feature_time_series_imfs = ceemdan(ticker_feature_time_series, max_imf=10)
        for i, imf_series in enumerate(ticker_feature_time_series_imfs):
            # iterating every IMF 
            if i < len(ticker_feature_time_series_imfs):
                decomposed_ticker_features_series[ticker][column][f'IMF{i+1}'] = imf_series
            else:
                decomposed_ticker_features_series[ticker][column][f'Rsd'] = imf_series

[PETR4] Decomposing...
[VALE3] Decomposing...
[BOVA11] Decomposing...
[ITUB4] Decomposing...
[BBDC4] Decomposing...
[B3SA3] Decomposing...
[BBAS3] Decomposing...
[ABEV3] Decomposing...
[MGLU3] Decomposing...
[VVAR3] Decomposing...
CPU times: user 1min 7s, sys: 1.04 s, total: 1min 8s
Wall time: 1min 54s


In [4]:
# Data organisation
features_in_order = ['Adj Close']

max_window_size = 10
windows_sizes_for_imf_level = {
    'IMF1': 2,
    'IMF2': 2,
    'IMF3': 3,
    'IMF4': 3,
    'IMF5': 4,
    'IMF6': 4,
    'IMF7': 5,
    'IMF8': 5,
    'Rsd': 6,
    'DEFAULT': 4
}

# Coupling together the IMFs of the same level for different features to create exogenous input
# The number of imfs for each feature decomposition may differ, thus some of the last imfs may not match in number of features
series = {}
for ticker in decomposed_ticker_features_series:
    series[ticker] = {}
    for feature in decomposed_ticker_features_series[ticker]:
        imfs = pd.DataFrame.from_dict(decomposed_ticker_features_series[ticker][feature])
        for imf in imfs:
            if imf not in series[ticker]:
                series[ticker][imf] = []
            _series = imfs[imf].values
            _series = _series.reshape((len(_series),1))
            series[ticker][imf] += [_series] # reshaping to get into column format

dataset = {}
# # horizontal stack
for ticker in series:
    dataset[ticker] = {}
    for imf_level in series[ticker]:
        dataset[ticker][imf_level] = np.hstack(tuple(series[ticker][imf_level]))

# # data set split rates
train = 0.7
validation = 0.2
test = 0.1

train_dataset = {}
validation_dataset = {}
test_dataset = {}

train_generators = {}
validation_generators = {}
test_generators = {}

for ticker in dataset:

    train_dataset[ticker] = {}
    validation_dataset[ticker] = {}
    test_dataset[ticker] = {}

    train_generators[ticker] = {}
    validation_generators[ticker] = {}
    test_generators[ticker] = {}

    for imf_level in dataset[ticker]:
        # splitting data sets according to rates
        train_dataset[ticker][imf_level] = dataset[ticker][imf_level][:round(train*dataset[ticker][imf_level].shape[0]),:]
        validation_dataset[ticker][imf_level] = dataset[ticker][imf_level][round(train*dataset[ticker][imf_level].shape[0]):round((train+validation)*dataset[ticker][imf_level].shape[0]),:]
        test_dataset[ticker][imf_level] = dataset[ticker][imf_level][round((train+validation)*dataset[ticker][imf_level].shape[0]):,:]

        if imf_level in windows_sizes_for_imf_level:
            window_size = windows_sizes_for_imf_level[imf_level]
        else: 
            window_size = windows_sizes_for_imf_level['DEFAULT']
        # windowing
        train_generators[ticker][imf_level] = TimeseriesGenerator(train_dataset[ticker][imf_level], train_dataset[ticker][imf_level], length=window_size, batch_size=1)
        validation_generators[ticker][imf_level] = TimeseriesGenerator(validation_dataset[ticker][imf_level], validation_dataset[ticker][imf_level], length=window_size, batch_size=1)
        test_generators[ticker][imf_level] = TimeseriesGenerator(test_dataset[ticker][imf_level], test_dataset[ticker][imf_level], length=window_size, batch_size=1)


In [5]:
%%time
# Model Training

models = {}

model_epochs = {
    'IMF1': 2000,
    'IMF2': 1500,
    'IMF3': 1000,
    'IMF4': 1000,
    'IMF5': 1000,
    'IMF6': 800,
    'IMF7': 800,
    'IMF8': 600,
    'Rsd': 500,
    'DEFAULT': 1000
}

for ticker in train_generators:
    models[ticker] = {}
    for imf_level in train_generators[ticker]:
        print(f'Training model [{ticker}][{imf_level}]')
        # Prediction model
        model = Sequential()
        current_dataset = train_dataset[ticker][imf_level]
        n_features = current_dataset.shape[1]
        cur_tmp_gen = train_generators[ticker][imf_level]

        if imf_level in windows_sizes_for_imf_level:
            window_size = windows_sizes_for_imf_level[imf_level]
        else: 
            window_size = windows_sizes_for_imf_level['DEFAULT']

        model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=(window_size, n_features)))
        model.add(LSTM(64, activation='tanh', input_shape=(window_size, 128)))
        model.add(Dense(16))
        model.add(Dense(4))
        model.add(Dense(n_features))
        model.compile(optimizer='adam', loss='mse')

        number_of_epochs = model_epochs[imf_level]
        # fit model
        model.fit_generator(cur_tmp_gen, steps_per_epoch=1, epochs=number_of_epochs, verbose=0)

        models[ticker][imf_level] = model

Training model [PETR4][IMF1]
Training model [PETR4][IMF2]
Training model [PETR4][IMF3]
Training model [PETR4][IMF4]
Training model [PETR4][IMF5]
Training model [VALE3][IMF1]
Training model [VALE3][IMF2]
Training model [VALE3][IMF3]
Training model [VALE3][IMF4]
Training model [VALE3][IMF5]
Training model [VALE3][IMF6]
Training model [BOVA11][IMF1]
Training model [BOVA11][IMF2]
Training model [BOVA11][IMF3]
Training model [BOVA11][IMF4]
Training model [BOVA11][IMF5]
Training model [ITUB4][IMF1]
Training model [ITUB4][IMF2]
Training model [ITUB4][IMF3]
Training model [ITUB4][IMF4]
Training model [ITUB4][IMF5]
Training model [ITUB4][IMF6]
Training model [BBDC4][IMF1]
Training model [BBDC4][IMF2]
Training model [BBDC4][IMF3]
Training model [BBDC4][IMF4]
Training model [BBDC4][IMF5]
Training model [BBDC4][IMF6]
Training model [B3SA3][IMF1]
Training model [B3SA3][IMF2]
Training model [B3SA3][IMF3]
Training model [B3SA3][IMF4]
Training model [B3SA3][IMF5]
Training model [B3SA3][IMF6]
Training 

In [6]:
results = {}

for ticker in models:
    results[ticker] = {}

    # initializing results dicitionary
    for feature in features_in_order:
        results[ticker][feature] = {}
        for imf_level in models[ticker]:
            results[ticker][feature][imf_level] = {
                'real_train': [],
                'predicted_train': [],
                'real_validation': [],
                'predicted_validation': [],
                'real_test': [],
                'predicted_test': [],
                'x_axis_train': [],
                'x_axis_validation': [],
                'x_axis_test': []
            }

    for imf_level in models[ticker]:
        model = models[ticker][imf_level]
        
        print(f'Predicting: [{ticker}][{imf_level}]')

        cur_train_gen = train_generators[ticker][imf_level]
        cur_validation_gen = validation_generators[ticker][imf_level]
        cur_test_gen = test_generators[ticker][imf_level]

        # predicting train
        day_counter = 0
        for i in range(len(cur_train_gen)):
            x, y = cur_train_gen[i]
            yhat = model.predict(x, verbose=0)

            for j in range(yhat.shape[1]):
                results[ticker][features_in_order[j]][imf_level]['real_train'] += [y[:,j][0]]
                results[ticker][features_in_order[j]][imf_level]['predicted_train'] += [yhat[:,j][0]]
                results[ticker][features_in_order[j]][imf_level]['x_axis_train'] += [day_counter]
            day_counter += 1

        # predicting validation
        for i in range(len(cur_validation_gen)):
            x, y = cur_validation_gen[i]
            yhat = model.predict(x, verbose=0)

            for j in range(yhat.shape[1]):
                results[ticker][features_in_order[j]][imf_level]['real_validation'] += [y[:,j][0]]
                results[ticker][features_in_order[j]][imf_level]['predicted_validation'] += [yhat[:,j][0]]
                results[ticker][features_in_order[j]][imf_level]['x_axis_validation'] += [day_counter]
            day_counter += 1

        # predicting test
        for i in range(len(cur_test_gen)):
            x, y = cur_test_gen[i]
            yhat = model.predict(x, verbose=0)

            for j in range(yhat.shape[1]):
                results[ticker][features_in_order[j]][imf_level]['real_test'] += [y[:,j][0]]
                results[ticker][features_in_order[j]][imf_level]['predicted_test'] += [yhat[:,j][0]]
                results[ticker][features_in_order[j]][imf_level]['x_axis_test'] += [day_counter]
            day_counter += 1


Predicting: [PETR4][IMF1]
Predicting: [PETR4][IMF2]
Predicting: [PETR4][IMF3]
Predicting: [PETR4][IMF4]
Predicting: [PETR4][IMF5]
Predicting: [VALE3][IMF1]
Predicting: [VALE3][IMF2]
Predicting: [VALE3][IMF3]
Predicting: [VALE3][IMF4]
Predicting: [VALE3][IMF5]
Predicting: [VALE3][IMF6]
Predicting: [BOVA11][IMF1]
Predicting: [BOVA11][IMF2]
Predicting: [BOVA11][IMF3]
Predicting: [BOVA11][IMF4]
Predicting: [BOVA11][IMF5]
Predicting: [ITUB4][IMF1]
Predicting: [ITUB4][IMF2]
Predicting: [ITUB4][IMF3]
Predicting: [ITUB4][IMF4]
Predicting: [ITUB4][IMF5]
Predicting: [ITUB4][IMF6]
Predicting: [BBDC4][IMF1]
Predicting: [BBDC4][IMF2]
Predicting: [BBDC4][IMF3]
Predicting: [BBDC4][IMF4]
Predicting: [BBDC4][IMF5]
Predicting: [BBDC4][IMF6]
Predicting: [B3SA3][IMF1]
Predicting: [B3SA3][IMF2]
Predicting: [B3SA3][IMF3]
Predicting: [B3SA3][IMF4]
Predicting: [B3SA3][IMF5]
Predicting: [B3SA3][IMF6]
Predicting: [BBAS3][IMF1]
Predicting: [BBAS3][IMF2]
Predicting: [BBAS3][IMF3]
Predicting: [BBAS3][IMF4]
Predict

In [7]:
# organizing imf prediction results, concatenating train, validation and test
concatenated_results = {}

for ticker in results:
    concatenated_results[ticker] = {}
    for feature in results[ticker]:
        concatenated_results[ticker][feature] = {}
        for imf_level in results[ticker][feature]:
            
            df_result = pd.DataFrame.from_dict(results[ticker][feature][imf_level], orient='index').T
            df_train = df_result[['real_train','predicted_train','x_axis_train']].set_index('x_axis_train').dropna(axis=0)
            df_train.index.name = 'x'
            df_validation = df_result[['real_validation','predicted_validation','x_axis_validation']].set_index('x_axis_validation').dropna(axis=0)
            df_validation.index.name = 'x'
            df_test = df_result[['real_test','predicted_test','x_axis_test']].set_index('x_axis_test').dropna(axis=0)
            df_test.index.name = 'x'

            df_concatenated = pd.concat([df_train,df_validation,df_test], axis=1)

            concatenated_results[ticker][feature][imf_level] = df_concatenated

In [8]:
# plotting partial result
plot_ticker = 'PETR4'
plot_feature = 'Adj Close'
plot_imf = 'IMF1'

concatenated_results[plot_ticker][plot_feature][plot_imf].iplot(title=f'{plot_ticker} {plot_feature} {plot_imf}', asFigure=True, layout=space)

In [9]:
concatenated_results_copy = concatenated_results.copy()
accuracies_per_imf_detailed = {}

for ticker in concatenated_results_copy:
    accuracies_per_imf_detailed[ticker] = {}
    for feature in concatenated_results_copy[ticker]:
        # we are predicting only Adj Close, so we don't need to specify which feature we are predicting here
        # accuracies_per_imf_detailed[ticker][feature] = {}
        for imf_level in concatenated_results_copy[ticker][feature]:
            real_test = concatenated_results_copy[ticker][feature][imf_level]['real_test'].values
            predicted_test = concatenated_results_copy[ticker][feature][imf_level]['predicted_test'].values
            real_validation = concatenated_results_copy[ticker][feature][imf_level]['real_validation'].values
            predicted_validation = concatenated_results_copy[ticker][feature][imf_level]['predicted_validation'].values
            real_train = concatenated_results_copy[ticker][feature][imf_level]['real_train'].values
            predicted_train = concatenated_results_copy[ticker][feature][imf_level]['predicted_train'].values

            # removing offset nan
            real_test = real_test[~np.isnan(real_test)]
            predicted_test = predicted_test[~np.isnan(predicted_test)]
            real_validation = real_validation[~np.isnan(real_validation)]
            predicted_validation = predicted_validation[~np.isnan(predicted_validation)]
            real_train = real_train[~np.isnan(real_train)]
            predicted_train = predicted_train[~np.isnan(predicted_train)]

            accuracies_per_imf_detailed[ticker][f'mse_test_{imf_level}'] = mean_squared_error(real_train,predicted_train)
            accuracies_per_imf_detailed[ticker][f'mape_test_{imf_level}'] = np.mean(np.abs((real_train - predicted_train) / real_train)) * 100

            # we sre only interested in the test accuracy here, so no need for specifying train and validation accuracies
            # accuracies_per_imf_detailed[ticker] = {
            #     'mse_train':mean_squared_error(real_test,predicted_test),
            #     'mse_validation':mean_squared_error(real_validation,predicted_validation),
            #     'mse_test':mean_squared_error(real_train,predicted_train),
            #     'mape_train':np.mean(np.abs((real_test - predicted_test) / real_test)) * 100,
            #     'mape_validation':np.mean(np.abs((real_validation - predicted_validation) / real_validation)) * 100,
            #     'mape_test':np.mean(np.abs((real_train - predicted_train) / real_train)) * 100,
            # }

df_accuracies_by_imfs = pd.DataFrame.from_dict(accuracies_per_imf_detailed)
df_accuracies_by_imfs.to_csv(f'exp_records_accuracy_per_imf/lstm_ceemdan_per_imf_{datetime.now().strftime("%H_%M_%S_%m_%d_%Y")}.csv', sep=',', encoding='utf-8')
df_accuracies_by_imfs

Unnamed: 0,PETR4,VALE3,BOVA11,ITUB4,BBDC4,B3SA3,BBAS3,ABEV3,MGLU3,VVAR3
mse_test_IMF1,0.00052,0.001428,0.000394,0.001116,0.0003807898,0.000143,0.000359,0.000763,6.6e-05,0.0005390423
mape_test_IMF1,116.647175,218.315512,126.297964,281.82939,124.758,104.238041,546.930024,129.645248,119.776442,138.4503
mse_test_IMF2,5.6e-05,0.000431,9.3e-05,0.000113,0.000167528,3.1e-05,8.6e-05,0.00012,1.6e-05,8.467471e-05
mape_test_IMF2,354.248916,156.253707,230.914252,147.552508,204.8975,97.578355,194.101283,135.289368,1074.033999,75.43673
mse_test_IMF3,1.3e-05,5e-05,1.4e-05,1.2e-05,3.837209e-06,1.1e-05,7e-06,2e-05,5e-06,1.59628e-05
mape_test_IMF3,542.215471,37.249069,102.445544,18.887421,46.71012,118.779281,29.018442,45.167111,39.139082,29.22832
mse_test_IMF4,0.000505,0.000213,0.000165,0.00077,0.0002344182,8.2e-05,0.000137,0.000666,8e-06,0.0005255099
mape_test_IMF4,97.773401,93.212107,89.500047,4674.600112,67.07502,278.198179,369.98539,103.211202,62.595827,701.9924
mse_test_IMF5,0.000104,0.000384,8.8e-05,0.000603,0.000217793,1.3e-05,0.000148,5.1e-05,0.000101,0.0002989821
mape_test_IMF5,2.268386,51.039605,1.829608,32.900192,8.580111,27.207704,10.504605,987.385383,64.225295,89.95575


In [10]:
# recomposing prediction by arithmetically adding the IMF curves

final_prediction_results = {}
max_window_size = 10

for ticker in concatenated_results:
    final_prediction_results[ticker] = {}
    for feature in concatenated_results[ticker]:
        addition_train = None
        addition_validation = None
        addition_test = None

        addition_real_train = None
        addition_real_validation = None
        addition_real_test = None

        # recomposing predictions
        for imf_level in concatenated_results[ticker][feature]:
            # adding test
            can_sum = True
            if addition_test is None:
                addition_test = concatenated_results[ticker][feature][imf_level]['predicted_test'].values
            else:
                np_array_to_be_added = concatenated_results[ticker][feature][imf_level]['predicted_test'].values
                cur_length = addition_test.shape[0]
                next_np_array_length = np_array_to_be_added.shape[0]
                if cur_length < next_np_array_length:
                    if next_np_array_length-cur_length < max_window_size:
                        np_array_to_be_added = np_array_to_be_added[next_np_array_length-cur_length:]
                    else:
                        can_sum = False
                else: 
                    if cur_length-next_np_array_length < max_window_size:
                        addition_test = addition_test[cur_length-next_np_array_length:]
                    else:
                        can_sum = False
                
                if can_sum:
                    addition_test = np.add(addition_test,np_array_to_be_added)

        for imf_level in concatenated_results[ticker][feature]:
            # adding train
            can_sum = True
            if addition_train is None:
                addition_train = concatenated_results[ticker][feature][imf_level]['predicted_train'].values
            else:
                np_array_to_be_added = concatenated_results[ticker][feature][imf_level]['predicted_train'].values
                cur_length = addition_train.shape[0]
                next_np_array_length = np_array_to_be_added.shape[0]
                if cur_length < next_np_array_length:
                    if next_np_array_length-cur_length < max_window_size:
                        np_array_to_be_added = np_array_to_be_added[next_np_array_length-cur_length:]
                    else:
                        can_sum = False
                else: 
                    if cur_length-next_np_array_length < max_window_size:
                        addition_train = addition_train[cur_length-next_np_array_length:]
                    else:
                        can_sum = False
                
                if can_sum:
                    addition_train = np.add(addition_train,np_array_to_be_added)

        for imf_level in concatenated_results[ticker][feature]:
            # adding validation
            can_sum = True
            if addition_validation is None:
                addition_validation = concatenated_results[ticker][feature][imf_level]['predicted_validation'].values
            else:
                np_array_to_be_added = concatenated_results[ticker][feature][imf_level]['predicted_validation'].values
                cur_length = addition_validation.shape[0]
                next_np_array_length = np_array_to_be_added.shape[0]
                if cur_length < next_np_array_length:
                    if next_np_array_length-cur_length < max_window_size:
                        np_array_to_be_added = np_array_to_be_added[next_np_array_length-cur_length:]
                    else:
                        can_sum = False
                else: 
                    if cur_length-next_np_array_length < max_window_size:
                        addition_validation = addition_validation[cur_length-next_np_array_length:]
                    else:
                        can_sum = False
                
                if can_sum:
                    addition_validation = np.add(addition_validation,np_array_to_be_added)

        # recomposing real
        for imf_level in concatenated_results[ticker][feature]:
            # adding test
            can_sum = True
            if addition_real_test is None:
                addition_real_test = concatenated_results[ticker][feature][imf_level]['real_test'].values
            else:
                np_array_to_be_added = concatenated_results[ticker][feature][imf_level]['real_test'].values
                cur_length = addition_real_test.shape[0]
                next_np_array_length = np_array_to_be_added.shape[0]
                if cur_length < next_np_array_length:
                    if next_np_array_length-cur_length < max_window_size:
                        np_array_to_be_added = np_array_to_be_added[next_np_array_length-cur_length:]
                    else:
                        can_sum = False
                else: 
                    if cur_length-next_np_array_length < max_window_size:
                        addition_real_test = addition_real_test[cur_length-next_np_array_length:]
                    else:
                        can_sum = False
                
                if can_sum:
                    addition_real_test = np.add(addition_real_test,np_array_to_be_added)

        for imf_level in concatenated_results[ticker][feature]:
            # adding train
            can_sum = True
            if addition_real_train is None:
                addition_real_train = concatenated_results[ticker][feature][imf_level]['real_train'].values
            else:
                np_array_to_be_added = concatenated_results[ticker][feature][imf_level]['real_train'].values
                cur_length = addition_real_train.shape[0]
                next_np_array_length = np_array_to_be_added.shape[0]
                if cur_length < next_np_array_length:
                    if next_np_array_length-cur_length < max_window_size:
                        np_array_to_be_added = np_array_to_be_added[next_np_array_length-cur_length:]
                    else:
                        can_sum = False
                else: 
                    if cur_length-next_np_array_length < max_window_size:
                        addition_real_train = addition_real_train[cur_length-next_np_array_length:]
                    else:
                        can_sum = False
                
                if can_sum:
                    addition_real_train = np.add(addition_real_train,np_array_to_be_added)

        for imf_level in concatenated_results[ticker][feature]:
            # adding validation
            can_sum = True
            if addition_real_validation is None:
                addition_real_validation = concatenated_results[ticker][feature][imf_level]['real_validation'].values
            else:
                np_array_to_be_added = concatenated_results[ticker][feature][imf_level]['real_validation'].values
                cur_length = addition_real_validation.shape[0]
                next_np_array_length = np_array_to_be_added.shape[0]
                if cur_length < next_np_array_length:
                    if next_np_array_length-cur_length < max_window_size:
                        np_array_to_be_added = np_array_to_be_added[next_np_array_length-cur_length:]
                    else:
                        can_sum = False
                else: 
                    if cur_length-next_np_array_length < max_window_size:
                        addition_real_validation = addition_real_validation[cur_length-next_np_array_length:]
                    else:
                        can_sum = False
                
                if can_sum:
                    addition_real_validation = np.add(addition_real_validation,np_array_to_be_added)
        
        scaler = scalers[ticker][feature]

        final_prediction_results[ticker][feature] = {
            'train_predicted': scaler.inverse_transform(addition_train.reshape(-1,1)).reshape(-1),
            'validation_predicted': scaler.inverse_transform(addition_validation.reshape(-1,1)).reshape(-1),
            'test_predicted': scaler.inverse_transform(addition_test.reshape(-1,1)).reshape(-1),
            'train_real': scaler.inverse_transform(addition_real_train.reshape(-1,1)).reshape(-1),
            'validation_real': scaler.inverse_transform(addition_real_validation.reshape(-1,1)).reshape(-1),
            'test_real': scaler.inverse_transform(addition_real_test.reshape(-1,1)).reshape(-1)
        }

In [11]:
# plotting final result
plot_ticker = 'PETR4'
plot_feature = 'Adj Close'

pd.DataFrame.from_dict(final_prediction_results[plot_ticker][plot_feature]).iplot(title=f'{plot_ticker} {plot_feature} {plot_imf}', layout=space)

In [12]:
# calculating accuracy metrics

adj_close_accuracies = {}
accuracies_detailed = {}

for ticker in final_prediction_results:
    adj_close_accuracies[ticker] = {}
    accuracies_detailed[ticker] = {}
    for feature in final_prediction_results[ticker]:

        y_train = final_prediction_results[ticker][feature]['train_predicted'][~np.isnan(final_prediction_results[ticker][feature]['train_predicted'])]
        yhat_train = final_prediction_results[ticker][feature]['train_real'][~np.isnan(final_prediction_results[ticker][feature]['train_real'])]

        y_validation = final_prediction_results[ticker][feature]['validation_predicted'][~np.isnan(final_prediction_results[ticker][feature]['validation_predicted'])]
        yhat_validation = final_prediction_results[ticker][feature]['validation_real'][~np.isnan(final_prediction_results[ticker][feature]['validation_real'])]

        y_test = final_prediction_results[ticker][feature]['test_predicted'][~np.isnan(final_prediction_results[ticker][feature]['test_predicted'])]
        yhat_test = final_prediction_results[ticker][feature]['test_real'][~np.isnan(final_prediction_results[ticker][feature]['test_real'])]
        accuracies_detailed[ticker][feature] = {
            'mse':{
                'train':mean_squared_error(y_train,yhat_train),
                'validation':mean_squared_error(y_validation,yhat_validation),
                'test':mean_squared_error(y_test,yhat_test),
            },
            'mape':{
                'train':np.mean(np.abs((y_train - yhat_train) / y_train)) * 100,
                'validation':np.mean(np.abs((y_validation - yhat_validation) / y_validation)) * 100,
                'test':np.mean(np.abs((y_test - yhat_test) / y_test)) * 100,
            }
        }

        if feature == 'Adj Close':
            adj_close_accuracies[ticker] = {
                'mse': mean_squared_error(y_test,yhat_test),
                'mape': np.mean(np.abs((y_validation - yhat_validation) / y_validation)) * 100
            }

# pd.DataFrame.from_dict(accuracies_detailed[plot_ticker][plot_feature])
df_close_accuracies = pd.DataFrame.from_dict(adj_close_accuracies).T
df_close_accuracies.to_csv(f'exp_records/lstm_ceemdan_{datetime.now().strftime("%H_%M_%S_%m_%d_%Y")}.csv', sep=',', encoding='utf-8')
df_close_accuracies

Unnamed: 0,mse,mape
PETR4,0.850763,1.46258
VALE3,1.490739,1.560848
BOVA11,0.631973,0.625056
ITUB4,0.092269,1.348598
BBDC4,0.461326,1.754286
B3SA3,0.698235,1.943622
BBAS3,0.273462,1.19385
ABEV3,0.152687,1.242173
MGLU3,1.733704,2.59074
VVAR3,0.016182,2.256926
