<a href="https://colab.research.google.com/github/acse-2020/acse2020-acse9-finalreport-acse-jaq15/blob/main/evaluation_notebooks/Model_Group_Plotting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Imports

The cells below handle all the imports needed to run our models. They rely on the public repository [feeder_repo](https://github.com/acse-jaq15/feeder_repo).

In [1]:
# clone the feeder repo to get data_reader module and financial time series data
!git clone https://github.com/acse-jaq15/feeder_repo.git

Cloning into 'feeder_repo'...
remote: Enumerating objects: 917, done.
remote: Counting objects: 100% (181/181), done.
remote: Compressing objects: 100% (94/94), done.
remote: Total 917 (delta 83), reused 179 (delta 81), pack-reused 736
Receiving objects: 100% (917/917), 291.80 MiB | 17.09 MiB/s, done.
Resolving deltas: 100% (419/419), done.
Checking out files: 100% (546/546), done.


In [2]:
# using '%' to enforce a permanent change of directory
%cd feeder_repo/

/content/feeder_repo


In [3]:
# checking that the contents are listed correctly, should read:
# base_model.py data_reader.py model_loader.py saved_models
# data LICENSE README.md security_plotter.py
!ls

base_model.py  data_reader.py  model_loader.py	saved_models
data	       LICENSE	       README.md	security_plotter.py


In [4]:
import os
import sys
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from keras import backend as K
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

In [5]:
# turning off tensorflow info and warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

In [6]:
# appending path with 'feeder_repo' string
sys.path.append('feeder_repo')

# import Data_Reader class from data_reader module
from feeder_repo.data_reader import Data_Reader
# import Base_Model class from base_model module
from feeder_repo.base_model import Base_Model
# import Security_Plotter class from security_plotter module
from feeder_repo.security_plotter import Security_Plotter
# import Trained_Model class from model_loader module
from feeder_repo.model_loader import Trained_Model
# import Untrained_Model class from model_loader module
from feeder_repo.model_loader import Untrained_Model

In [7]:
# checking if the notebook is running on a GPU
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
    print('and then re-execute this cell.')
else:
    print(gpu_info)

Wed Jul 28 11:42:36 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
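
The `!nvidia-smi` syntax in the cell above only works inside IPython or Colab. As a minimal sketch, assuming the same check is wanted from a plain Python script, the standard library can run the command directly. The helper name `gpu_summary` is hypothetical and is not part of feeder_repo.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the nvidia-smi report as a string, or None if no GPU is usable."""
    # nvidia-smi is absent on machines without an NVIDIA driver
    if shutil.which('nvidia-smi') is None:
        return None
    try:
        result = subprocess.run(['nvidia-smi'], capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    # a non-zero exit code means the driver could not reach a GPU
    if result.returncode != 0:
        return None
    return result.stdout


report = gpu_summary()
if report is None:
    print('No GPU detected: select Runtime > "Change runtime type" to enable one.')
else:
    print(report)
```

Checking the exit code rather than searching the output for the word 'failed' avoids depending on the exact wording of the error message.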

# Generating group plots
A loop generates one plot per security. Each plot overlays the predictions of every model on the actual price and on the prediction of the dummy model.
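
The dummy model gives each plot a naive reference line against which the trained models can be judged. This notebook does not show how `Base_Model` forms its prediction. One common choice for such a baseline is a persistence forecast, which repeats the last observed value of each input window, and the sketch below illustrates only that idea. The helper `persistence_forecast` and the toy numbers are hypothetical and are not taken from feeder_repo.

```python
def persistence_forecast(windows):
    """Predict the next value of each window as its last observed value."""
    return [window[-1] for window in windows]


# three toy input windows and the values that actually followed them
windows = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0], [3.0, 5.0, 4.0]]
actual = [5.0, 4.0, 6.0]

dummy = persistence_forecast(windows)
print(dummy)  # [3.0, 5.0, 4.0]

# mean absolute error of the baseline, for comparison with a trained model
mae = sum(abs(a - d) for a, d in zip(actual, dummy)) / len(actual)
print(mae)
```

A trained model is only worth its extra complexity if its error is lower than that of a baseline of this kind.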

In [8]:
# storing the year of the time series to be used as test data
in_yr = 2019
# setting our window_length to be 30 days
window_len = 30
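
To illustrate what a 30-day window means for the data shapes, here is a minimal sketch of sliding-window extraction. It assumes one-step-ahead targets, which this notebook does not confirm, and `make_windows` is a hypothetical stand-in rather than the implementation of `Data_Reader.extract_xy`.

```python
def make_windows(series, window_len):
    """Split a sequence into (window, next value) pairs."""
    X, y = [], []
    for start in range(len(series) - window_len):
        # the window holds window_len consecutive observations
        X.append(series[start:start + window_len])
        # the target is the observation immediately after the window
        y.append(series[start + window_len])
    return X, y


# toy series of 35 values standing in for a year of normalised prices
toy_series = list(range(35))
X, y = make_windows(toy_series, 30)
print(len(X), len(y))  # 5 5
print(y[0])            # 30
```

Under this assumption a series of n points yields n - window_len targets, which is consistent with the plotting loop slicing the dates from `train_len + window_len` onwards and requesting `test_len - window_len` dummy predictions.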

In [9]:
# storing the units of each security in a dictionary, for later plotting
unit_dict = {
                'Al': 'Price (USD/mt)',
                'Cu': 'Price (USD/mt)',
                'Corn': 'Price (USd/bushel)',
                'EURCHF': 'Spot exchange rate',
                'EURUSD': 'Spot exchange rate',
                'GBPUSD': 'Spot exchange rate',
                'Bund10y': 'Yield (%)',
                'Gilt10y': 'Yield (%)',
                'Treasury10y': 'Yield (%)',
                'Amazon': 'Price (USD)',
                'Google': 'Price (USD)',
                'Nvidia': 'Price (USD)'
            }

# storing a list of models
model_list = ['CNN', 'CNN_GRU', 'CNN_LSTM',
              'GRU', 'GRU_AE', 'GRU_LSTM',
              'LSTM', 'LSTM_AE', 'LSTM_GRU',
              'MLP', 'MLP_AE']

# storing a list of securities
security_list = ['Al', 'Cu', 'Corn',
                'EURCHF', 'EURUSD', 'GBPUSD',
                'Gilt10y', 'Bund10y', 'Treasury10y',
                'Amazon', 'Google', 'Nvidia']

In [10]:
# mounting google drive for easy storage of plots
from google.colab import drive
# mounting the drive
drive.mount('/content/gdrive/')
# creating a string for the directory in which the plots are saved
plot_path = '/content/gdrive/My Drive/group_plots/'

Mounted at /content/gdrive/


In [11]:
# looping through each security in security_list
for s in security_list:

  # creating an instance of Data_Reader class
  in_data = Data_Reader(s, in_yr)
  # calling class method extract_train_test to generate training and test datasets
  in_data.extract_train_test()
  # calling class method extract_xy to generate X and y training and test datasets
  in_data.extract_xy(window_len)

  # assigning X_test and y_test
  X_test = in_data.X_test
  y_test = in_data.y_test
  
  # creating a figure with a single set of axes, one per security, and formatting various parameters
  fig, ax = plt.subplots()
  plt.xticks(rotation=45)
  fig.set_size_inches(12, 6)
  fig.suptitle(s+' Actual, Predicted and Dummy Prices', size='xx-large', y=0.92)

  # slicing the dates so that they align with the test set predictions
  date_time = in_data.data.date[in_data.train_len + window_len:]
  # converting the series to datetime using pandas
  series_dates = pd.to_datetime(date_time).dt.date
  # resetting index to 0 based
  series_dates = series_dates.reset_index(drop=True)
  # converting to matplotlib format
  series_dates = mdates.date2num(series_dates)

  # setting YearLocator
  years = mdates.YearLocator()
  # setting MonthLocator
  months = mdates.MonthLocator()
  # setting format to give year and verbose month '2019-Jan'
  d_format = mdates.DateFormatter('%Y-%b')

  # the legend is generated after the model loop, once every line has been plotted

  # setting x axis label
  ax.set_xlabel('Date')
  # getting y axis label from unit_dict
  ax.set_ylabel(unit_dict[s])
  # informing matplotlib that x axis contains dates
  ax.xaxis_date()
  ax.set_xticklabels(ax.get_xticks(), rotation=45, ha='right')
  # setting minor and major locators and format
  ax.xaxis.set_major_locator(months)
  ax.xaxis.set_major_formatter(d_format)
  ax.xaxis.set_minor_locator(years)

  # looping through each model in model_list
  for m in model_list:

    # conditional logic to set time_distributed bool depending on the model type
    # in order to ensure input data is of correct dimensions
    if m == 'CNN_GRU' or m == 'CNN_LSTM':
      time_distributed = True
      # creating another instance of Data_Reader class
      in_data_model = Data_Reader(s, in_yr)
      # calling class method extract_train_test to generate training and test datasets
      in_data_model.extract_train_test()
      # calling class method extract_xy to generate X and y training and test datasets
      in_data_model.extract_xy(window_len, time_distributed)
      # assigning X_test_model
      X_test_model = in_data_model.X_test
    else:
      X_test_model = X_test

    # clearing the keras session on the back end to ease memory usage
    K.clear_session()

    # creating an instance of Trained_Model class
    trained_model = Trained_Model(m, s)

    # creating an instance of Base_Model class using X_test
    base_model = Base_Model(X_test, window_len)
    # calling predict_y method
    base_model.predict_y(in_data.test_len - window_len)

    # using the trained model to predict y from X_test
    y_pred = trained_model.model.predict(X_test_model)
    # assigning y_dummy variable to .y_pred class attribute
    y_dummy = base_model.y_pred
    
    # calling class method extract_real_price to generate unnormalised prices
    in_data.extract_real_prices(y_pred, y_dummy)

    # assigning actual_price, predicted_price and dummy_price
    actual_price = in_data.actual_price
    predicted_price = in_data.predicted_price
    dummy_price = in_data.dummy_price

    # plotting the predicted prices of the current model on the shared axes
    ax.plot(series_dates, predicted_price, label=m, linewidth=0.9)


    # conditional logic to plot the actual and dummy prices only once,
    # on the final model in model_list
    if m == 'MLP_AE':
      # assigning y_true variable for metric calculation
      y_true = in_data.y_true
      
      actual_price = in_data.actual_price
      dummy_price = in_data.dummy_price
      ax.plot(series_dates, actual_price, label='Actual Price', linewidth=2, linestyle='dotted', color='k')
      ax.plot(series_dates, dummy_price, label='Dummy', linewidth=2, linestyle='dotted')

    # printing to keep track of progress
    print(s+' '+m+' complete')
  
  ax.legend(bbox_to_anchor=(1.01, 0.5), loc='center left', borderaxespad=0.)

  # saving the matplotlib plot after the model loop is complete
  plt.savefig(plot_path+s+'_group_plot.png', dpi=600)
  # clearing the figure before a fresh one is generated for the next security
  plt.clf()
  
  # print to keep track of progress
  print(s+' all completed successfully')

Al CNN complete
Al CNN_GRU complete
Al CNN_LSTM complete
Al GRU complete
Al GRU_AE complete
Al GRU_LSTM complete
Al LSTM complete
Al LSTM_AE complete
Al LSTM_GRU complete
Al MLP complete
Al MLP_AE complete
Al all completed successfully
Cu CNN complete
Cu CNN_GRU complete
Cu CNN_LSTM complete
Cu GRU complete
Cu GRU_AE complete
Cu GRU_LSTM complete
Cu LSTM complete
Cu LSTM_AE complete
Cu LSTM_GRU complete
Cu MLP complete
Cu MLP_AE complete
Cu all completed successfully
Corn CNN complete
Corn CNN_GRU complete
Corn CNN_LSTM complete
Corn GRU complete
Corn GRU_AE complete
Corn GRU_LSTM complete
Corn LSTM complete
Corn LSTM_AE complete
Corn LSTM_GRU complete
Corn MLP complete
Corn MLP_AE complete
Corn all completed successfully
EURCHF CNN complete
EURCHF CNN_GRU complete
EURCHF CNN_LSTM complete
EURCHF GRU complete
EURCHF GRU_AE complete
EURCHF GRU_LSTM complete
EURCHF LSTM complete
EURCHF LSTM_AE complete
EURCHF LSTM_GRU complete
EURCHF MLP complete
EURCHF MLP_AE complete
EURCHF all complet

<Figure size 864x432 with 0 Axes>
