The first step is to build the training and testing datasets from historical data and computed technical analysis indicators. I will first work with a single stock, symbol AI.PA.

#### Let's download the last 20 years of historical data from the Yahoo! API and read it into a DataFrame:

In [1]:
!curl -L 'http://query1.finance.yahoo.com/v7/finance/download/AI.PA?period1=946857600&period2=1593820800&interval=1d&events=history' > historical_data_AI-PA.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     8  100     8    0     0    235      0 --:--:-- --:--:-- --:--:--   235
100  355k    0  355k    0     0  1233k      0 --:--:-- --:--:-- --:--:-- 1233k
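
The `period1` and `period2` query parameters in the URL above are Unix timestamps (seconds since 1970-01-01 UTC). As a minimal sketch, they could be derived as follows; the helper name `to_epoch` is hypothetical and only introduced for illustration:

```python
from datetime import datetime, timezone

def to_epoch(year, month, day):
    """Return the Unix timestamp of midnight UTC for the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

period1 = to_epoch(2000, 1, 3)  # start of the download window
period2 = to_epoch(2020, 7, 4)  # end of the download window
print(period1, period2)
```

Changing the two dates is then enough to request a different window for the same symbol.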


In [2]:
# install required python packages
!pip install -r requirements.txt

Collecting ta (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/90/ec/e4f5aea8c7f0f55f92b52ffbafa389ea82f3a10d9cab2760e40af34c5b3f/ta-0.5.25.tar.gz
Building wheels for collected packages: ta
  Running setup.py bdist_wheel for ta ... done
  Stored in directory: /home/ec2-user/.cache/pip/wheels/2e/93/b7/cf649194508e53cee4145ffb949e9f26877a5a8dd12db9ed5b
Successfully built ta
Installing collected packages: ta
Successfully installed ta-0.5.25
You are using pip version 10.0.1, however version 20.2b1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [3]:
# imports go here
import pandas as pd
import numpy as np
import ta
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, mean_squared_error, mean_absolute_error
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.dummy import DummyRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet
import sagemaker
import boto3
import json

In [4]:
df = pd.read_csv('historical_data_AI-PA.csv', index_col=0, parse_dates=True, infer_datetime_format=True)

In [5]:
df.head()

Date,Open,High,Low,Close,Adj Close,Volume
2000-01-03,34.854301,36.306599,34.771301,35.061798,10.226519,904282.0
2000-01-04,35.061798,34.9995,32.613701,33.505798,9.772677,1381445.0
2000-01-05,32.779701,33.4021,32.2817,33.194599,9.681908,853763.0
2000-01-06,32.7589,36.223598,32.696701,35.580399,10.377778,1387137.0
2000-01-07,35.580399,37.136398,34.958,35.144798,10.250728,2198233.0


In [6]:
df.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2020-06-29,126.099998,127.75,125.699997,127.5,127.5,966640.0
2020-06-30,127.5,128.399994,126.599998,128.399994,128.399994,1021045.0
2020-07-01,128.300003,129.350006,127.050003,128.449997,128.449997,684533.0
2020-07-02,129.600006,132.949997,128.800003,132.649994,132.649994,1304983.0
2020-07-03,132.0,132.899994,129.800003,130.350006,130.350006,681232.0


#### Display descriptive stats about the dataset:

In [7]:
df.describe()

Statistic,Open,High,Low,Close,Adj Close,Volume
count,5267.0,5267.0,5267.0,5267.0,5267.0,5267.0
mean,65.154315,65.766419,64.550407,65.181575,50.084645,1257315.0
std,26.437988,26.582026,26.277434,26.441206,31.948313,727506.0
min,27.5308,28.087299,26.8046,26.9706,7.866548,0.0
25%,38.716299,39.1385,38.380001,38.77045,18.792883,820842.5
50%,60.484501,61.058498,59.993599,60.505001,42.880104,1094817.0
75%,86.47475,87.168801,85.711201,86.457001,76.44696,1493460.0
max,140.5,140.699997,139.800003,140.300003,137.140625,10146860.0


#### Plot the distributions of our features:

In [8]:
df.hist()

[Figure: histograms of the Open, High, Low, Close, Adj Close and Volume columns]

#### Are there any NaN values in the dataset?

In [9]:
df.isna().sum().sum()

30
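
A total of 30 NaN values over 6 columns would be consistent with five fully empty rows, although the notebook does not check this. A small sketch, on a toy DataFrame rather than the real data, of how the missing values could be located per column and per row:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the price DataFrame, with one empty trading day
toy = pd.DataFrame(
    {"Open": [1.0, np.nan, 3.0], "Close": [1.5, np.nan, 3.5]},
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
)

nan_per_column = toy.isna().sum()            # NaN count for each column
rows_with_nan = toy[toy.isna().any(axis=1)]  # rows holding at least one NaN

print(nan_per_column)
print(rows_with_nan.index.tolist())
```

Applied to `df`, the same two expressions would show whether the NaNs are spread out or concentrated on a few dates.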

#### Let's add technical analysis indicators to our feature set.  
I will use the 'ta' Python package to compute them:

In [10]:
df = ta.add_all_ta_features(df, open='Open', high='High', low='Low', close='Close', volume='Volume')

  dip[i] = 100 * (self._dip[i]/self._trs[i])
  din[i] = 100 * (self._din[i]/self._trs[i])


In [11]:
df.head()

Date,Open,High,Low,Close,Adj Close,Volume,volume_adi,volume_obv,volume_cmf,volume_fi,...,momentum_uo,momentum_stoch,momentum_stoch_signal,momentum_wr,momentum_ao,momentum_kama,momentum_roc,others_dr,others_dlr,others_cr
2000-01-03,34.854301,36.306599,34.771301,35.061798,10.226519,904282.0,-562079.8,904282.0,,,...,,,,,,,,-46.209035,,0.0
2000-01-04,35.061798,34.9995,32.613701,33.505798,9.772677,1381445.0,-910426.0,-477163.0,,,...,,,,,,,,-4.437879,-4.539366,-4.437879
2000-01-05,32.779701,33.4021,32.2817,33.194599,9.681908,853763.0,-372901.3,-1330926.0,,,...,,,,,,,,-0.928791,-0.933132,-5.325451
2000-01-06,32.7589,36.223598,32.696701,35.580399,10.377778,1387137.0,508292.3,56211.0,,,...,,,,,,,,7.187314,6.940771,1.479106
2000-01-07,35.580399,37.136398,34.958,35.144798,10.250728,2198233.0,-1312943.0,-2142022.0,,,...,,,,,,,,-1.224272,-1.231828,0.236725


In [12]:
df.columns

Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'volume_adi',
       'volume_obv', 'volume_cmf', 'volume_fi', 'momentum_mfi', 'volume_em',
       'volume_sma_em', 'volume_vpt', 'volume_nvi', 'volume_vwap',
       'volatility_atr', 'volatility_bbm', 'volatility_bbh', 'volatility_bbl',
       'volatility_bbw', 'volatility_bbp', 'volatility_bbhi',
       'volatility_bbli', 'volatility_kcc', 'volatility_kch', 'volatility_kcl',
       'volatility_kcw', 'volatility_kcp', 'volatility_kchi',
       'volatility_kcli', 'volatility_dcl', 'volatility_dch', 'trend_macd',
       'trend_macd_signal', 'trend_macd_diff', 'trend_sma_fast',
       'trend_sma_slow', 'trend_ema_fast', 'trend_ema_slow', 'trend_adx',
       'trend_adx_pos', 'trend_adx_neg', 'trend_vortex_ind_pos',
       'trend_vortex_ind_neg', 'trend_vortex_ind_diff', 'trend_trix',
       'trend_mass_index', 'trend_cci', 'trend_dpo', 'trend_kst',
       'trend_kst_sig', 'trend_kst_diff', 'trend_ichimoku_conv',
       'trend_ic

#### The dataset now has many more features, but also many NaN values. I will interpolate the dataset so that no NaN reaches the training algorithms: gaps between valid values are filled linearly, and gaps at the start or end of a column take the nearest valid value:

In [13]:
df.isna().sum().sum()

33869

In [14]:
df.interpolate(axis=0, limit_direction='both', inplace=True)

In [15]:
df.isna().sum().sum()

0
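
To make the effect of `interpolate(..., limit_direction='both')` concrete, here is a minimal sketch on a toy Series (not the notebook's data). It has a leading gap, an interior gap and a trailing gap:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1.0, np.nan, 3.0, np.nan])

# Default method is linear; limit_direction='both' also fills the edges
filled = s.interpolate(limit_direction="both")
print(filled.tolist())
```

The interior gap is filled linearly, while the edge gaps repeat the closest valid value. This matches the repeated `volume_cmf` and `volume_fi` values visible in the first rows of the later `df.head()` outputs.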

#### I will now add historical data for CAC40 and SBF120, the indices that AI.PA stock relates to:

In [16]:
!curl -L 'http://query1.finance.yahoo.com/v7/finance/download/^FCHI?period1=946857600&period2=1593820800&interval=1d&events=history' > historical_data_CAC40.csv
!curl -L 'http://query1.finance.yahoo.com/v7/finance/download/^SBF120?period1=946857600&period2=1593820800&interval=1d&events=history' > historical_data_SBF120.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     8  100     8    0     0    222      0 --:--:-- --:--:-- --:--:--   222
100  404k    0  404k    0     0  1974k      0 --:--:-- --:--:-- --:--:-- 1974k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     8  100     8    0     0    250      0 --:--:-- --:--:-- --:--:--   250
100  339k    0  339k    0     0  1816k      0 --:--:-- --:--:-- --:--:-- 1816k


In [17]:
# Load both market indices; their columns get prefixed with the index name below

cac40_df = pd.read_csv('historical_data_CAC40.csv', index_col=0, parse_dates=True, infer_datetime_format=True)
sbf120_df = pd.read_csv('historical_data_SBF120.csv', index_col=0, parse_dates=True, infer_datetime_format=True)

# Drop 'Volume' and 'Adj Close' features, meaningless regarding market indices
cac40_df.drop(['Volume', 'Adj Close'], axis=1, inplace=True)
sbf120_df.drop(['Volume', 'Adj Close'], axis=1, inplace=True)

prefixed_cac_cols = list()
prefixed_sbf_cols = list()
for cac_col, sbf_col in zip(cac40_df.columns, sbf120_df.columns):
    prefixed_cac_cols.append('cac40_' + cac_col)
    prefixed_sbf_cols.append('sbf120_' + sbf_col)

cac40_df.columns = prefixed_cac_cols
sbf120_df.columns = prefixed_sbf_cols
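
As an aside, pandas offers `DataFrame.add_prefix`, which should give the same column names without an explicit loop. A sketch on a toy frame (`toy_index_df` is a made-up stand-in, not the notebook's data):

```python
import pandas as pd

toy_index_df = pd.DataFrame(
    {"Open": [1.0], "High": [2.0], "Low": [0.5], "Close": [1.5]}
)

# add_prefix returns a new DataFrame whose column labels are prefixed
prefixed = toy_index_df.add_prefix("cac40_")
print(prefixed.columns.tolist())
```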

In [18]:
# Check everything went as expected

print(cac40_df.head())
print('')
print(sbf120_df.head())

             cac40_Open   cac40_High    cac40_Low  cac40_Close
Date                                                          
2000-01-03  6024.379883  6102.120117  5901.770020  5917.370117
2000-01-04  5922.229980  5925.069824  5657.200195  5672.020020
2000-01-05  5521.830078  5589.500000  5461.589844  5479.700195
2000-01-06  5485.930176  5530.259766  5388.850098  5450.109863
2000-01-07  5423.879883  5561.689941  5423.879883  5539.609863

            sbf120_Open  sbf120_High   sbf120_Low  sbf120_Close
Date                                                           
2000-01-03  4035.110107  4035.110107  4035.110107   4035.110107
2000-01-04  3873.149902  3873.149902  3873.149902   3873.149902
2000-01-05  3743.870117  3743.870117  3743.870117   3743.870117
2000-01-06  3728.080078  3728.080078  3728.080078   3728.080078
2000-01-07  3794.070068  3794.070068  3794.070068   3794.070068


In [19]:
# Interpolate both dataframes to fill NaNs
cac40_df.interpolate(axis=0, limit_direction='both', inplace=True)
sbf120_df.interpolate(axis=0, limit_direction='both', inplace=True)

In [20]:
# Add these features to our stock price dataset
df = pd.concat([df, cac40_df], axis=1)

In [21]:
df = pd.concat([df, sbf120_df], axis=1)

In [22]:
df.head()

Date,Open,High,Low,Close,Adj Close,Volume,volume_adi,volume_obv,volume_cmf,volume_fi,...,others_dlr,others_cr,cac40_Open,cac40_High,cac40_Low,cac40_Close,sbf120_Open,sbf120_High,sbf120_Low,sbf120_Close
2000-01-03,34.854301,36.306599,34.771301,35.061798,10.226519,904282.0,-562079.8,904282.0,-0.402545,-833509.084988,...,-4.539366,0.0,6024.379883,6102.120117,5901.77002,5917.370117,4035.110107,4035.110107,4035.110107,4035.110107
2000-01-04,35.061798,34.9995,32.613701,33.505798,9.772677,1381445.0,-910426.0,-477163.0,-0.402545,-833509.084988,...,-4.539366,-4.437879,5922.22998,5925.069824,5657.200195,5672.02002,3873.149902,3873.149902,3873.149902,3873.149902
2000-01-05,32.779701,33.4021,32.2817,33.194599,9.681908,853763.0,-372901.3,-1330926.0,-0.402545,-833509.084988,...,-0.933132,-5.325451,5521.830078,5589.5,5461.589844,5479.700195,3743.870117,3743.870117,3743.870117,3743.870117
2000-01-06,32.7589,36.223598,32.696701,35.580399,10.377778,1387137.0,508292.3,56211.0,-0.402545,-833509.084988,...,6.940771,1.479106,5485.930176,5530.259766,5388.850098,5450.109863,3728.080078,3728.080078,3728.080078,3728.080078
2000-01-07,35.580399,37.136398,34.958,35.144798,10.250728,2198233.0,-1312943.0,-2142022.0,-0.402545,-833509.084988,...,-1.231828,0.236725,5423.879883,5561.689941,5423.879883,5539.609863,3794.070068,3794.070068,3794.070068,3794.070068
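
`pd.concat(..., axis=1)` aligns rows on the index, so a date present in one frame but missing from another produces NaN in the merged result. The sketch below illustrates this on toy frames (the values are made up), and shows a quick check that could be run after merging:

```python
import pandas as pd

stock = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
)
# The toy index series lacks a quote for 2000-01-04
index_df = pd.DataFrame(
    {"cac40_Close": [100.0, 102.0]},
    index=pd.to_datetime(["2000-01-03", "2000-01-05"]),
)

merged = pd.concat([stock, index_df], axis=1)
missing = int(merged.isna().sum().sum())  # NaNs introduced by the alignment
print(merged)
print(missing)
```

If such a check reported NaNs on the real data, a second interpolation pass after the merge would be one way to handle them.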


#### The values I am trying to predict are the Adjusted Close for days d + 1 to d + 7, from the stock characteristics of day d. This means I have to shift the Adjusted Close column by -1 to -7, and drop the last 7 rows:

In [23]:
adjclose_df = pd.DataFrame()
adjclose_cols = list()
for i in range(1, 8):
    colname = 'AdjClose_D+' + str(i)
    adjclose_df[colname] = df['Adj Close'].shift(periods=-i)
    adjclose_cols.append(colname)
adjclose_df.columns = adjclose_cols

In [24]:
adjclose_df.tail(7)

Date,AdjClose_D+1,AdjClose_D+2,AdjClose_D+3,AdjClose_D+4,AdjClose_D+5,AdjClose_D+6,AdjClose_D+7
2020-06-25,126.099998,127.5,128.399994,128.449997,132.649994,130.350006,
2020-06-26,127.5,128.399994,128.449997,132.649994,130.350006,,
2020-06-29,128.399994,128.449997,132.649994,130.350006,,,
2020-06-30,128.449997,132.649994,130.350006,,,,
2020-07-01,132.649994,130.350006,,,,,
2020-07-02,130.350006,,,,,,
2020-07-03,,,,,,,
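
The shifting logic can be checked on a toy price series. This is a sketch with a horizon of 2 instead of 7, and `horizon`, `targets` and `complete` are illustrative names:

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="Adj Close")
horizon = 2  # the notebook uses 7

# Column D+i holds the price observed i rows later
targets = pd.DataFrame(
    {"AdjClose_D+{}".format(i): prices.shift(-i) for i in range(1, horizon + 1)}
)

# The last `horizon` rows have incomplete targets and are dropped
complete = targets.dropna()
features = prices.loc[complete.index]  # keep features aligned with targets
print(complete)
```

Row 0 ends up with the prices of rows 1 and 2 as its targets, and the feature rows are trimmed to the same index so both sides keep the same length.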


In [25]:
# Drop the last 7 rows, whose targets are incomplete (NaN)
adjclose_df.drop(adjclose_df.tail(7).index, inplace=True)

In [26]:
# Drop the same last 7 rows from the feature set
df.drop(df.tail(7).index, inplace=True)

In [27]:
adjclose_df.tail(7)

Date,AdjClose_D+1,AdjClose_D+2,AdjClose_D+3,AdjClose_D+4,AdjClose_D+5,AdjClose_D+6,AdjClose_D+7
2020-06-16,127.400002,126.550003,128.100006,127.349998,129.050003,126.199997,126.849998
2020-06-17,126.550003,128.100006,127.349998,129.050003,126.199997,126.849998,126.099998
2020-06-18,128.100006,127.349998,129.050003,126.199997,126.849998,126.099998,127.5
2020-06-19,127.349998,129.050003,126.199997,126.849998,126.099998,127.5,128.399994
2020-06-22,129.050003,126.199997,126.849998,126.099998,127.5,128.399994,128.449997
2020-06-23,126.199997,126.849998,126.099998,127.5,128.399994,128.449997,132.649994
2020-06-24,126.849998,126.099998,127.5,128.399994,128.449997,132.649994,130.350006


In [28]:
adjclose_df.head()

Date,AdjClose_D+1,AdjClose_D+2,AdjClose_D+3,AdjClose_D+4,AdjClose_D+5,AdjClose_D+6,AdjClose_D+7
2000-01-03,9.772677,9.681908,10.377778,10.250728,9.893692,9.802925,9.916677
2000-01-04,9.681908,10.377778,10.250728,9.893692,9.802925,9.916677,10.147853
2000-01-05,10.377778,10.250728,9.893692,9.802925,9.916677,10.147853,10.37174
2000-01-06,10.250728,9.893692,9.802925,9.916677,10.147853,10.37174,10.026813
2000-01-07,9.893692,9.802925,9.916677,10.147853,10.37174,10.026813,9.923937


In [29]:
adjclose_df.shape

(5265, 7)

In [30]:
df.tail()

Date,Open,High,Low,Close,Adj Close,Volume,volume_adi,volume_obv,volume_cmf,volume_fi,...,others_dlr,others_cr,cac40_Open,cac40_High,cac40_Low,cac40_Close,sbf120_Open,sbf120_High,sbf120_Low,sbf120_Close
2020-06-18,127.150002,128.199997,125.449997,126.550003,126.550003,886472.0,225553600.0,246930822.0,0.187488,409308.773296,...,-0.669425,260.934151,4978.299805,5017.180176,4908.600098,4958.75,3976.079515,3981.240028,3921.819399,3940.668816
2020-06-19,127.75,129.350006,127.199997,128.100006,128.100006,2035713.0,225222200.0,248966535.0,0.147959,801601.985274,...,1.217375,265.354926,4997.529785,5040.470215,4979.450195,4979.450195,3976.365011,3981.53002,3922.05763,3940.923466
2020-06-22,127.400002,129.149994,127.0,127.349998,127.349998,738802.0,224724000.0,248227733.0,0.105716,607929.214461,...,-0.587207,263.215823,4928.009766,5006.399902,4902.060059,4948.700195,3976.650507,3981.820011,3922.295861,3941.178117
2020-06-23,128.25,129.5,127.800003,129.050003,129.050003,984631.0,225187300.0,249212364.0,0.161753,760207.55856,...,1.326076,268.064419,4972.879883,5046.310059,4962.600098,5017.680176,3976.936003,3982.110002,3922.534092,3941.432767
2020-06-24,128.149994,128.300003,126.199997,126.199997,126.199997,861152.0,224326200.0,248351212.0,0.147496,300993.854921,...,-2.233202,259.935897,4985.629883,5004.040039,4871.359863,4871.359863,3977.221499,3982.399993,3922.772323,3941.687418


In [31]:
df.head()

Date,Open,High,Low,Close,Adj Close,Volume,volume_adi,volume_obv,volume_cmf,volume_fi,...,others_dlr,others_cr,cac40_Open,cac40_High,cac40_Low,cac40_Close,sbf120_Open,sbf120_High,sbf120_Low,sbf120_Close
2000-01-03,34.854301,36.306599,34.771301,35.061798,10.226519,904282.0,-562079.8,904282.0,-0.402545,-833509.084988,...,-4.539366,0.0,6024.379883,6102.120117,5901.77002,5917.370117,4035.110107,4035.110107,4035.110107,4035.110107
2000-01-04,35.061798,34.9995,32.613701,33.505798,9.772677,1381445.0,-910426.0,-477163.0,-0.402545,-833509.084988,...,-4.539366,-4.437879,5922.22998,5925.069824,5657.200195,5672.02002,3873.149902,3873.149902,3873.149902,3873.149902
2000-01-05,32.779701,33.4021,32.2817,33.194599,9.681908,853763.0,-372901.3,-1330926.0,-0.402545,-833509.084988,...,-0.933132,-5.325451,5521.830078,5589.5,5461.589844,5479.700195,3743.870117,3743.870117,3743.870117,3743.870117
2000-01-06,32.7589,36.223598,32.696701,35.580399,10.377778,1387137.0,508292.3,56211.0,-0.402545,-833509.084988,...,6.940771,1.479106,5485.930176,5530.259766,5388.850098,5450.109863,3728.080078,3728.080078,3728.080078,3728.080078
2000-01-07,35.580399,37.136398,34.958,35.144798,10.250728,2198233.0,-1312943.0,-2142022.0,-0.402545,-833509.084988,...,-1.231828,0.236725,5423.879883,5561.689941,5423.879883,5539.609863,3794.070068,3794.070068,3794.070068,3794.070068


In [32]:
df.shape

(5265, 86)

#### I will then train 7 models, each predicting the Adjusted Close for a different horizon, from D + 1 up to D + 7.

#### The dataset has very different value ranges, so I have to normalize it:

In [33]:
X_scaler = MinMaxScaler().fit(df.values)
y_scaler = MinMaxScaler().fit(adjclose_df.values)

X_scaled = X_scaler.transform(df.values)
y_scaled = y_scaler.transform(adjclose_df.values)

#### Split training and testing sets:

In [34]:
# the last 10% data goes to the testing set
train_size = int(len(X_scaled) * 0.90)
train_X, test_X = X_scaled[0:train_size], X_scaled[train_size:len(X_scaled)]
train_y, test_y = y_scaled[0:train_size], y_scaled[train_size:len(y_scaled)]

In [35]:
# Check if split is correct
print("training set size: {:.2f}%".format(len(train_X)/len(X_scaled) * 100))
print("testing set size: {:.2f}%".format(len(test_X)/len(X_scaled) * 100))

training set size: 89.99%
testing set size: 10.01%
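
Note that the scalers above are fitted on the full dataset, so the minimum and maximum of the test period take part in scaling the training data. An alternative, shown here as a sketch on toy values and not what this notebook ran, is to fit the scaler on the training rows only and reuse it for the test rows:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.arange(10, dtype=float).reshape(-1, 1)  # toy, steadily rising feature
train_size = int(len(values) * 0.90)
train_raw, test_raw = values[:train_size], values[train_size:]

# Fit on the training rows only, then apply the same scaling to both parts
scaler = MinMaxScaler().fit(train_raw)
train_scaled = scaler.transform(train_raw)
test_scaled = scaler.transform(test_raw)

print(train_scaled.max(), test_scaled.max())
```

With this approach, test values may fall outside the [0, 1] range when the test period reaches levels never seen in training, which is expected. The metrics reported below were obtained with the full-dataset scaling and would likely differ under this variant.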


#### Investigate by training models
I will then train several models, including the DummyRegressor benchmark model as well as LinearRegression and RandomForestRegressor, to get a better grasp of which model is best suited to this use case:

In [36]:
# This function returns the Root Mean Squared Error, normalized by Standard-Deviation
def stdev_root_mean_squared_error(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred)) / np.std(y_true)

# This one returns the Mean Absolute Percentage Error: the absolute error
# relative to the true values, expressed as a percentage
def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def get_metrics_list(y_true, y_pred, decimals=3):
    mse = round(mean_squared_error(y_true, y_pred), decimals)
    sd_rmse = round(stdev_root_mean_squared_error(y_true, y_pred), decimals)
    mae = round(mean_absolute_error(y_true, y_pred), decimals)
    mape = round(mean_absolute_percentage_error(y_true, y_pred), decimals)
    return [mse, sd_rmse, mae, mape]

# Function that trains 7 models, each predicting adjusted close for day + 1 to 7
def train_eval(model_list, train_X, train_y, test_X, test_y):
    preds_list = list()
    index_names = ['d+{}'.format(i) for i in range(1, 8)]
    metrics_df = pd.DataFrame(columns=['MSE', 'SD-RMSE', 'MAE', 'MAPE'], index=index_names)
    for i in range(7):
        model_list[i].fit(train_X, train_y[:,i])
        preds_list.append(model_list[i].predict(test_X))
        metrics_df.loc['d+{}'.format(i + 1)] = get_metrics_list(test_y[:,i], preds_list[i])
    print(metrics_df)
    return preds_list
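
To give a feel for the four metrics, here is a worked example on a tiny made-up pair of arrays, computed with plain NumPy using the same formulas as the functions above:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

mse = np.mean((y_true - y_pred) ** 2)
sd_rmse = np.sqrt(mse) / np.std(y_true)  # RMSE in units of the target's std
mae = np.mean(np.abs(y_true - y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Reference point: always predicting the mean of y_true gives SD-RMSE = 1
mean_baseline = np.full_like(y_true, y_true.mean())
baseline_sd_rmse = np.sqrt(np.mean((y_true - mean_baseline) ** 2)) / np.std(y_true)

print(round(mse, 3), round(sd_rmse, 3), round(mae, 3), round(mape, 3))
print(round(baseline_sd_rmse, 3))
```

Since predicting the mean of the evaluated targets yields an SD-RMSE of 1, values well above 1 indicate predictions that are further off than that constant would be, and values well below 1 indicate a useful model.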

In [43]:
%%time

dummy_models_list = list()
for i in range(7):
    dummy_models_list.append(DummyRegressor())

dummy_preds_list = train_eval(dummy_models_list, train_X, train_y, test_X, test_y)

       MSE SD-RMSE    MAE    MAPE
d+1  0.242   5.106  0.482  62.904
d+2  0.242   5.101  0.482  62.907
d+3  0.242   5.095  0.483  62.909
d+4  0.243   5.087  0.483  62.912
d+5  0.243    5.08  0.483  62.916
d+6  0.244   5.067  0.484  62.919
d+7  0.244   5.057  0.484  62.923
CPU times: user 15.8 ms, sys: 0 ns, total: 15.8 ms
Wall time: 16.1 ms


In [44]:
%%time

linear_models_list = list()
for i in range(7):
    linear_models_list.append(LinearRegression())

linear_preds_list = train_eval(linear_models_list, train_X, train_y, test_X, test_y)

       MSE SD-RMSE    MAE   MAPE
d+1      0   0.126  0.008  1.057
d+2      0   0.165  0.011  1.412
d+3      0     0.2  0.013  1.673
d+4  0.001   0.235  0.015  1.976
d+5  0.001   0.266  0.017  2.234
d+6  0.001   0.296  0.019  2.504
d+7  0.001   0.321   0.02  2.703
CPU times: user 354 ms, sys: 150 ms, total: 504 ms
Wall time: 258 ms


In [46]:
%%time

rf_models_list = list()
for i in range(7):
    rf_models_list.append(RandomForestRegressor(n_estimators=100, random_state=42))

rf_preds_list = train_eval(rf_models_list, train_X, train_y, test_X, test_y)

       MSE SD-RMSE    MAE    MAPE
d+1  0.021   1.494  0.107  12.803
d+2  0.022   1.544  0.112  13.419
d+3  0.022    1.55  0.114  13.668
d+4  0.024   1.615  0.121  14.512
d+5  0.025   1.628  0.123  14.721
d+6  0.025   1.624  0.123  14.828
d+7  0.026   1.649  0.127  15.244
CPU times: user 2min 22s, sys: 115 ms, total: 2min 22s
Wall time: 2min 22s


As the dataset is relatively small (about 5,300 records, roughly 4,700 of them in the training set) and contains a lot of features (86), I would have expected the LinearRegression model to perform poorly and RandomForestRegressor to shine. Instead, the linear model obtained surprisingly good evaluation metrics, while the random forests were quite inaccurate!
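
One possible explanation, not verified on the stock data here, is that the test set covers the most recent period, where prices may lie above anything seen during training. A random forest predicts averages of training targets, so it cannot output values beyond the training range, whereas a linear model can extrapolate a trend. The following sketch illustrates this on synthetic data only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic, steadily rising series split chronologically
x = np.arange(120, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0
x_train, y_train = x[:100], y[:100]
x_test, y_test = x[100:], y[100:]

linear_pred = LinearRegression().fit(x_train, y_train).predict(x_test)
forest = RandomForestRegressor(n_estimators=20, random_state=42)
forest_pred = forest.fit(x_train, y_train).predict(x_test)

linear_mae = np.mean(np.abs(y_test - linear_pred))
forest_mae = np.mean(np.abs(y_test - forest_pred))

# The forest's predictions stay capped at the largest training target
print(linear_mae, forest_mae, forest_pred.max(), y_train.max())
```

Whether this effect accounts for the gap observed above would need to be checked, for example by comparing the range of `train_y` and `test_y`.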

Let's investigate further by training:
- a simple model working with distances like K-Nearest-Neighbors
- Kernel Ridge Regressor, another simple model
- a more complex model like Support Vector Machines

In [47]:
%%time

knn_models_list = list()
for i in range(7):
    knn_models_list.append(KNeighborsRegressor())

knn_preds_list = train_eval(knn_models_list, train_X, train_y, test_X, test_y)

       MSE SD-RMSE    MAE    MAPE
d+1  0.021   1.521  0.118  14.462
d+2  0.022   1.544  0.121  14.744
d+3  0.023   1.555  0.122  14.877
d+4  0.023   1.567  0.123   15.04
d+5  0.024   1.584  0.125  15.207
d+6  0.024   1.591  0.125  15.265
d+7  0.024   1.601  0.127  15.393
CPU times: user 1.79 s, sys: 8.02 ms, total: 1.8 s
Wall time: 1.8 s


KNN does quite poorly at predicting the values I am interested in, most probably because of the large number of features the dataset has.

In [48]:
%%time

svr_models_list = list()
for i in range(7):
    svr_models_list.append(SVR(gamma='auto'))

svr_preds_list = train_eval(svr_models_list, train_X, train_y, test_X, test_y)

       MSE SD-RMSE    MAE    MAPE
d+1  0.018   1.393  0.129  16.691
d+2  0.017   1.368  0.126  16.328
d+3  0.016   1.326  0.122  15.748
d+4  0.015   1.266  0.116  14.959
d+5  0.015   1.256  0.115  14.844
d+6  0.014   1.232  0.113  14.569
d+7  0.014   1.208  0.111  14.212
CPU times: user 248 ms, sys: 4 ms, total: 252 ms
Wall time: 252 ms


In [49]:
%%time

kr_models_list = list()
for i in range(7):
    kr_models_list.append(KernelRidge())

kr_preds_list = train_eval(kr_models_list, train_X, train_y, test_X, test_y)

       MSE SD-RMSE    MAE   MAPE
d+1      0   0.159  0.012  1.537
d+2      0   0.194  0.014  1.863
d+3      0   0.229  0.017  2.185
d+4  0.001    0.26  0.019  2.432
d+5  0.001   0.291  0.021  2.653
d+6  0.001   0.317  0.022  2.858
d+7  0.001   0.339  0.024  3.041
CPU times: user 19.6 s, sys: 3.13 s, total: 22.7 s
Wall time: 16.2 s


It is interesting to note that a more complex model like Support Vector Regression (used here with scikit-learn's default RBF kernel) fails to obtain good predictions even though it is supposed to work well with a large feature set, and that only simple linear models, Linear Regression and Kernel Ridge (whose default kernel is linear), work well with the dataset.

#### DeepAR

The last algorithm I will train before moving on to dimensionality reduction is DeepAR. It is built into Amazon SageMaker and is designed for time series forecasting, which makes it a natural candidate for the present market dataset. It is based on a recurrent neural network.

In [7]:
# Initialize SageMaker and S3 variables
sagemaker_session = sagemaker.Session()
s3_bucket = sagemaker_session.default_bucket()
s3_prefix = 'deepar-stock-pred'
role = sagemaker.get_execution_role()
region = sagemaker_session.boto_region_name
s3_output_path = "s3://{}/{}/output".format(s3_bucket, s3_prefix)
image_name = sagemaker.amazon.amazon_estimator.get_image_uri(region, "forecasting-deepar", "latest")

In [44]:
# split the test set into validation and testing set for DeepAR
# the first 50% of the testing set goes to the validation set
valid_size = int(len(test_X) * 0.5)

In [45]:
# Convert datasets to DeepAR JSON format and write them to local files

# Scale target values
deepar_target_scaler = MinMaxScaler()
adjclose_array = df['Adj Close'].values.reshape(-1, 1)
deepar_target_scaler.fit(adjclose_array)
target_list = deepar_target_scaler.transform(adjclose_array).reshape(1, -1)[0].tolist()

# Scale feature set
dyn_feat = df.drop('Adj Close', axis=1).values
scaled_dyn_feat = MinMaxScaler().fit_transform(dyn_feat)

# Create the train and valid feature lists
train_feat = scaled_dyn_feat[:train_size]
train_feat_list = np.swapaxes(train_feat, 0, 1).tolist()

valid_feat = scaled_dyn_feat[train_size : train_size+valid_size]
valid_feat_list = np.swapaxes(valid_feat, 0, 1).tolist()

# Build JSON queries for training
train_dict = {"start": str(df.index[0]), "target": target_list[:train_size], "dynamic_feat": train_feat_list}
train_json = json.dumps(train_dict)

valid_dict = {"start": str(df.index[train_size]), "target": target_list[train_size : train_size+valid_size], "dynamic_feat": valid_feat_list}
valid_json = json.dumps(valid_dict)
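
DeepAR reads its input as JSON Lines, with one time series per line. Each `dynamic_feat` entry is one feature series and must be as long as `target`, which is why the feature matrix is transposed with `np.swapaxes` above. A minimal sketch with made-up numbers (the helper `check_record` is hypothetical):

```python
import json

# One record per time series; dynamic_feat has shape (n_features, n_timesteps)
series = [
    {
        "start": "2000-01-03 00:00:00",
        "target": [0.10, 0.12, 0.11],
        "dynamic_feat": [[0.5, 0.6, 0.7], [0.1, 0.1, 0.2]],
    },
]

def check_record(record):
    """Return True if every feature series is as long as the target."""
    return all(len(feat) == len(record["target"]) for feat in record["dynamic_feat"])

json_lines = "\n".join(json.dumps(record) for record in series)
all_valid = all(check_record(json.loads(line)) for line in json_lines.splitlines())
print(all_valid)
```

With a single series, as in this notebook, the file contains just one JSON object on one line.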

In [208]:
# Upload dataset to S3 to make it available to SageMaker
s3_data_path = "{}/data".format(s3_prefix)
s3 = boto3.resource('s3')
bucket = s3.Bucket(s3_bucket)
train_channel = s3_data_path + "/train.json"
valid_channel = s3_data_path + "/valid.json"
bucket.put_object(Key=train_channel, Body=train_json)
bucket.put_object(Key=valid_channel, Body=valid_json)

s3.Object(bucket_name='sagemaker-us-west-2-378467645007', key='deepar-stock-pred/data/valid.json')

In [209]:
hyperparameters = {
    "prediction_length": "7",
    "context_length": "7",
    "time_freq": "D",
    "epochs": "200",
    "early_stopping_patience": "40",
    "num_layers": "2",  
    "num_cells": "40",
    "mini_batch_size": "128",
    "learning_rate": "1e-3",
    "dropout_rate": "0.1", 
    "likelihood": "gaussian"
}
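
One point worth keeping in mind with `"time_freq": "D"`: DeepAR derives the timestamp of each observation from `start` and the frequency, while the stock data only has rows for trading days. The sketch below, which assumes the date range of the trimmed dataset shown earlier, compares the number of calendar days with the number of weekdays over that span:

```python
import pandas as pd

start, end = "2000-01-03", "2020-06-24"

calendar_days = pd.date_range(start, end, freq="D")  # every day
business_days = pd.date_range(start, end, freq="B")  # Monday to Friday only

print(len(calendar_days), len(business_days))
```

Under this assumption, rows fed to DeepAR with a daily frequency are treated as consecutive calendar days, so its date-based time features would not line up with the real trading dates.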

In [None]:
# Specify data channels
data_channels = {
    "train": 's3://{}/{}'.format(s3_bucket, train_channel),
    "test": 's3://{}/{}'.format(s3_bucket, valid_channel)
}

estimator = sagemaker.estimator.Estimator(
    sagemaker_session=sagemaker_session,
    image_name=image_name,
    role=role,
    train_instance_count=1,
    train_instance_type='ml.p2.xlarge',
    base_job_name='deepar-stock-pred-job',
    output_path=s3_output_path
)

# Set hyperparameters
estimator.set_hyperparameters(**hyperparameters)

# Train the model
estimator.fit(inputs=data_channels, wait=True)

2020-07-04 15:39:32 Starting - Starting the training job...
2020-07-04 15:39:33 Starting - Launching requested ML instances.........
2020-07-04 15:41:04 Starting - Preparing the instances for training......
2020-07-04 15:42:30 Downloading - Downloading input data
2020-07-04 15:42:30 Training - Downloading the training image.....
Arguments: train
[07/04/2020 15:43:13 INFO 140466583250752] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'num_dynamic_feat': u'auto', u'dropout_rate': u'0.10', u'mini_batch_size': u'128', u'test_quantiles': u'[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'num_eval_samples': u'100', u'learning_rate': u'0.001', u'num_cells': u'40', u'num_layers': u'2', u'embedding_dimension': u'10', u'_kvstore': u'auto', u'_num_kv_servers': u'auto', u'cardinality': u'auto', u'likelihood': u'student-t', u'early_stopping_patience': u''}

[07/04/2020 15:45:46 INFO 140466583250752] Epoch[2] Batch[0] avg_epoch_loss=-2.384125
[07/04/2020 15:45:46 INFO 140466583250752] #quality_metric: host=algo-1, epoch=2, batch=0 train loss <loss>=-2.38412475586
[07/04/2020 15:45:46 INFO 140466583250752] Epoch[2] Batch[5] avg_epoch_loss=-2.447776
[07/04/2020 15:45:46 INFO 140466583250752] #quality_metric: host=algo-1, epoch=2, batch=5 train loss <loss>=-2.44777584076
[07/04/2020 15:45:46 INFO 140466583250752] Epoch[2] Batch [5] Speed: 1797.34 samples/sec loss=-2.447776
[07/04/2020 15:45:47 INFO 140466583250752] Epoch[2] Batch[10] avg_epoch_loss=-2.563244
[07/04/2020 15:45:47 INFO 140466583250752] #quality_metric: host=algo-1, epoch=2, batch=10 train loss <loss>=-2.7018055439
[07/04/2020 15:45:47 INFO 140466583250752] Epoch[2] Batch [10] Speed: 1219.95 samples/sec loss=-2.701806
[07/04/2020 15:45:47 INFO 140466583250752] processed a total of 1332 examp

[07/04/2020 15:49:56 INFO 140466583250752] Epoch[7] Batch[0] avg_epoch_loss=-3.598943
[07/04/2020 15:49:56 INFO 140466583250752] #quality_metric: host=algo-1, epoch=7, batch=0 train loss <loss>=-3.59894275665
[07/04/2020 15:49:57 INFO 140466583250752] Epoch[7] Batch[5] avg_epoch_loss=-3.533870
[07/04/2020 15:49:57 INFO 140466583250752] #quality_metric: host=algo-1, epoch=7, batch=5 train loss <loss>=-3.5338704586
[07/04/2020 15:49:57 INFO 140466583250752] Epoch[7] Batch [5] Speed: 1833.60 samples/sec loss=-3.533870
[07/04/2020 15:49:57 INFO 140466583250752] processed a total of 1246 examples
#metrics {"Metrics": {"update.time": {"count": 1, "max": 49802.136182785034, "sum": 49802.136182785034, "min": 49802.136182785034}}, "EndTime": 1593877797.458199, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593877747.655999}
[07/04/2020 15:49:57 INFO 140466583250752] #t

[07/04/2020 15:56:38 INFO 140466583250752] Epoch[15] Batch[0] avg_epoch_loss=-3.654280
[07/04/2020 15:56:38 INFO 140466583250752] #quality_metric: host=algo-1, epoch=15, batch=0 train loss <loss>=-3.65428042412
[07/04/2020 15:56:38 INFO 140466583250752] Epoch[15] Batch[5] avg_epoch_loss=-3.755961
[07/04/2020 15:56:38 INFO 140466583250752] #quality_metric: host=algo-1, epoch=15, batch=5 train loss <loss>=-3.75596090158
[07/04/2020 15:56:38 INFO 140466583250752] Epoch[15] Batch [5] Speed: 1823.56 samples/sec loss=-3.755961
[07/04/2020 15:56:39 INFO 140466583250752] processed a total of 1260 examples
#metrics {"Metrics": {"update.time": {"count": 1, "max": 50366.43409729004, "sum": 50366.43409729004, "min": 50366.43409729004}}, "EndTime": 1593878199.196779, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593878148.829805}
[07/04/2020 15:56:39 INFO 140466583250752]

[07/04/2020 16:00:08 INFO 140466583250752] Epoch[19] Batch[0] avg_epoch_loss=-3.449018
[07/04/2020 16:00:08 INFO 140466583250752] #quality_metric: host=algo-1, epoch=19, batch=0 train loss <loss>=-3.44901847839
[07/04/2020 16:00:08 INFO 140466583250752] Epoch[19] Batch[5] avg_epoch_loss=-3.583866
[07/04/2020 16:00:08 INFO 140466583250752] #quality_metric: host=algo-1, epoch=19, batch=5 train loss <loss>=-3.58386600018
[07/04/2020 16:00:08 INFO 140466583250752] Epoch[19] Batch [5] Speed: 1824.76 samples/sec loss=-3.583866
[07/04/2020 16:00:08 INFO 140466583250752] processed a total of 1255 examples
#metrics {"Metrics": {"update.time": {"count": 1, "max": 52459.25283432007, "sum": 52459.25283432007, "min": 52459.25283432007}}, "EndTime": 1593878408.888612, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593878356.429296}
[07/04/2020 16:00:08 INFO 140466583250752]

[07/04/2020 16:05:22 INFO 140466583250752] Epoch[25] Batch[0] avg_epoch_loss=-3.360052
[34m[07/04/2020 16:05:22 INFO 140466583250752] #quality_metric: host=algo-1, epoch=25, batch=0 train loss <loss>=-3.36005163193[0m
[34m[07/04/2020 16:05:23 INFO 140466583250752] Epoch[25] Batch[5] avg_epoch_loss=-3.580284[0m
[34m[07/04/2020 16:05:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=25, batch=5 train loss <loss>=-3.58028403918[0m
[34m[07/04/2020 16:05:23 INFO 140466583250752] Epoch[25] Batch [5]#011Speed: 1742.03 samples/sec#011loss=-3.580284[0m
[34m[07/04/2020 16:05:23 INFO 140466583250752] Epoch[25] Batch[10] avg_epoch_loss=-3.586229[0m
[34m[07/04/2020 16:05:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=25, batch=10 train loss <loss>=-3.59336228371[0m
[34m[07/04/2020 16:05:23 INFO 140466583250752] Epoch[25] Batch [10]#011Speed: 1124.55 samples/sec#011loss=-3.593362[0m
[34m[07/04/2020 16:05:23 INFO 140466583250752] processed a total of 1

[34m[07/04/2020 16:09:44 INFO 140466583250752] Epoch[30] Batch[0] avg_epoch_loss=-2.460935[0m
[34m[07/04/2020 16:09:44 INFO 140466583250752] #quality_metric: host=algo-1, epoch=30, batch=0 train loss <loss>=-2.4609348774[0m
[34m[07/04/2020 16:09:45 INFO 140466583250752] Epoch[30] Batch[5] avg_epoch_loss=-3.525690[0m
[34m[07/04/2020 16:09:45 INFO 140466583250752] #quality_metric: host=algo-1, epoch=30, batch=5 train loss <loss>=-3.5256896019[0m
[34m[07/04/2020 16:09:45 INFO 140466583250752] Epoch[30] Batch [5]#011Speed: 1815.92 samples/sec#011loss=-3.525690[0m
[34m[07/04/2020 16:09:45 INFO 140466583250752] processed a total of 1259 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52417.09804534912, "sum": 52417.09804534912, "min": 52417.09804534912}}, "EndTime": 1593878985.460502, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593878933.043335}
[0m
[34m[07/04/2020 16:09:45 INFO 140466583250752] #

[34m[07/04/2020 16:14:59 INFO 140466583250752] Epoch[36] Batch[0] avg_epoch_loss=-3.986794[0m
[34m[07/04/2020 16:14:59 INFO 140466583250752] #quality_metric: host=algo-1, epoch=36, batch=0 train loss <loss>=-3.98679351807[0m
[34m[07/04/2020 16:14:59 INFO 140466583250752] Epoch[36] Batch[5] avg_epoch_loss=-4.061351[0m
[34m[07/04/2020 16:14:59 INFO 140466583250752] #quality_metric: host=algo-1, epoch=36, batch=5 train loss <loss>=-4.06135058403[0m
[34m[07/04/2020 16:14:59 INFO 140466583250752] Epoch[36] Batch [5]#011Speed: 1810.99 samples/sec#011loss=-4.061351[0m
[34m[07/04/2020 16:15:00 INFO 140466583250752] Epoch[36] Batch[10] avg_epoch_loss=-4.043699[0m
[34m[07/04/2020 16:15:00 INFO 140466583250752] #quality_metric: host=algo-1, epoch=36, batch=10 train loss <loss>=-4.02251696587[0m
[34m[07/04/2020 16:15:00 INFO 140466583250752] Epoch[36] Batch [10]#011Speed: 1077.51 samples/sec#011loss=-4.022517[0m
[34m[07/04/2020 16:15:00 INFO 140466583250752] processed a total of 1

[34m[07/04/2020 16:20:14 INFO 140466583250752] Epoch[42] Batch[0] avg_epoch_loss=-4.093235[0m
[34m[07/04/2020 16:20:14 INFO 140466583250752] #quality_metric: host=algo-1, epoch=42, batch=0 train loss <loss>=-4.09323501587[0m
[34m[07/04/2020 16:20:15 INFO 140466583250752] Epoch[42] Batch[5] avg_epoch_loss=-4.048449[0m
[34m[07/04/2020 16:20:15 INFO 140466583250752] #quality_metric: host=algo-1, epoch=42, batch=5 train loss <loss>=-4.04844923814[0m
[34m[07/04/2020 16:20:15 INFO 140466583250752] Epoch[42] Batch [5]#011Speed: 1819.61 samples/sec#011loss=-4.048449[0m
[34m[07/04/2020 16:20:15 INFO 140466583250752] processed a total of 1262 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52190.75393676758, "sum": 52190.75393676758, "min": 52190.75393676758}}, "EndTime": 1593879615.55504, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593879563.364218}
[0m
[34m[07/04/2020 16:20:15 INFO 140466583250752] 

[34m[07/04/2020 16:25:28 INFO 140466583250752] Epoch[48] Batch[0] avg_epoch_loss=-3.633671[0m
[34m[07/04/2020 16:25:28 INFO 140466583250752] #quality_metric: host=algo-1, epoch=48, batch=0 train loss <loss>=-3.63367128372[0m
[34m[07/04/2020 16:25:28 INFO 140466583250752] Epoch[48] Batch[5] avg_epoch_loss=-3.784148[0m
[34m[07/04/2020 16:25:28 INFO 140466583250752] #quality_metric: host=algo-1, epoch=48, batch=5 train loss <loss>=-3.78414813677[0m
[34m[07/04/2020 16:25:28 INFO 140466583250752] Epoch[48] Batch [5]#011Speed: 1823.99 samples/sec#011loss=-3.784148[0m
[34m[07/04/2020 16:25:29 INFO 140466583250752] Epoch[48] Batch[10] avg_epoch_loss=-3.883523[0m
[34m[07/04/2020 16:25:29 INFO 140466583250752] #quality_metric: host=algo-1, epoch=48, batch=10 train loss <loss>=-4.0027736187[0m
[34m[07/04/2020 16:25:29 INFO 140466583250752] Epoch[48] Batch [10]#011Speed: 1215.72 samples/sec#011loss=-4.002774[0m
[34m[07/04/2020 16:25:29 INFO 140466583250752] processed a total of 13

[34m[07/04/2020 16:29:46 INFO 140466583250752] Epoch[53] Batch[0] avg_epoch_loss=-4.262716[0m
[34m[07/04/2020 16:29:46 INFO 140466583250752] #quality_metric: host=algo-1, epoch=53, batch=0 train loss <loss>=-4.26271629333[0m
[34m[07/04/2020 16:29:47 INFO 140466583250752] Epoch[53] Batch[5] avg_epoch_loss=-4.162700[0m
[34m[07/04/2020 16:29:47 INFO 140466583250752] #quality_metric: host=algo-1, epoch=53, batch=5 train loss <loss>=-4.16269961993[0m
[34m[07/04/2020 16:29:47 INFO 140466583250752] Epoch[53] Batch [5]#011Speed: 1759.34 samples/sec#011loss=-4.162700[0m
[34m[07/04/2020 16:29:47 INFO 140466583250752] processed a total of 1257 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 51779.4029712677, "sum": 51779.4029712677, "min": 51779.4029712677}}, "EndTime": 1593880187.460052, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593880135.680584}
[0m
[34m[07/04/2020 16:29:47 INFO 140466583250752] #t

[34m[07/04/2020 16:35:01 INFO 140466583250752] Epoch[59] Batch[0] avg_epoch_loss=-4.243622[0m
[34m[07/04/2020 16:35:01 INFO 140466583250752] #quality_metric: host=algo-1, epoch=59, batch=0 train loss <loss>=-4.24362230301[0m
[34m[07/04/2020 16:35:01 INFO 140466583250752] Epoch[59] Batch[5] avg_epoch_loss=-4.172838[0m
[34m[07/04/2020 16:35:01 INFO 140466583250752] #quality_metric: host=algo-1, epoch=59, batch=5 train loss <loss>=-4.17283757528[0m
[34m[07/04/2020 16:35:01 INFO 140466583250752] Epoch[59] Batch [5]#011Speed: 1738.70 samples/sec#011loss=-4.172838[0m
[34m[07/04/2020 16:35:01 INFO 140466583250752] processed a total of 1254 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52546.54312133789, "sum": 52546.54312133789, "min": 52546.54312133789}}, "EndTime": 1593880501.851024, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593880449.303942}
[0m
[34m[07/04/2020 16:35:01 INFO 140466583250752]

[34m[07/04/2020 16:39:23 INFO 140466583250752] Epoch[64] Batch[0] avg_epoch_loss=-4.105927[0m
[34m[07/04/2020 16:39:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=64, batch=0 train loss <loss>=-4.10592746735[0m
[34m[07/04/2020 16:39:23 INFO 140466583250752] Epoch[64] Batch[5] avg_epoch_loss=-4.214569[0m
[34m[07/04/2020 16:39:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=64, batch=5 train loss <loss>=-4.21456917127[0m
[34m[07/04/2020 16:39:23 INFO 140466583250752] Epoch[64] Batch [5]#011Speed: 1827.95 samples/sec#011loss=-4.214569[0m
[34m[07/04/2020 16:39:24 INFO 140466583250752] Epoch[64] Batch[10] avg_epoch_loss=-4.266317[0m
[34m[07/04/2020 16:39:24 INFO 140466583250752] #quality_metric: host=algo-1, epoch=64, batch=10 train loss <loss>=-4.32841539383[0m
[34m[07/04/2020 16:39:24 INFO 140466583250752] Epoch[64] Batch [10]#011Speed: 1119.79 samples/sec#011loss=-4.328415[0m
[34m[07/04/2020 16:39:24 INFO 140466583250752] processed a total of 1

[34m[07/04/2020 16:43:45 INFO 140466583250752] Epoch[69] Batch[0] avg_epoch_loss=-4.153598[0m
[34m[07/04/2020 16:43:45 INFO 140466583250752] #quality_metric: host=algo-1, epoch=69, batch=0 train loss <loss>=-4.15359783173[0m
[34m[07/04/2020 16:43:46 INFO 140466583250752] Epoch[69] Batch[5] avg_epoch_loss=-4.242741[0m
[34m[07/04/2020 16:43:46 INFO 140466583250752] #quality_metric: host=algo-1, epoch=69, batch=5 train loss <loss>=-4.24274110794[0m
[34m[07/04/2020 16:43:46 INFO 140466583250752] Epoch[69] Batch [5]#011Speed: 1811.64 samples/sec#011loss=-4.242741[0m
[34m[07/04/2020 16:43:46 INFO 140466583250752] processed a total of 1246 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52086.498975753784, "sum": 52086.498975753784, "min": 52086.498975753784}}, "EndTime": 1593881026.706016, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593880974.618937}
[0m
[34m[07/04/2020 16:43:46 INFO 1404665832507

[34m[07/04/2020 16:49:00 INFO 140466583250752] Epoch[75] Batch[0] avg_epoch_loss=-4.269294[0m
[34m[07/04/2020 16:49:00 INFO 140466583250752] #quality_metric: host=algo-1, epoch=75, batch=0 train loss <loss>=-4.2692937851[0m
[34m[07/04/2020 16:49:01 INFO 140466583250752] Epoch[75] Batch[5] avg_epoch_loss=-4.306075[0m
[34m[07/04/2020 16:49:01 INFO 140466583250752] #quality_metric: host=algo-1, epoch=75, batch=5 train loss <loss>=-4.30607461929[0m
[34m[07/04/2020 16:49:01 INFO 140466583250752] Epoch[75] Batch [5]#011Speed: 1808.85 samples/sec#011loss=-4.306075[0m
[34m[07/04/2020 16:49:01 INFO 140466583250752] processed a total of 1240 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52306.779861450195, "sum": 52306.779861450195, "min": 52306.779861450195}}, "EndTime": 1593881341.494165, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593881289.186824}
[0m
[34m[07/04/2020 16:49:01 INFO 14046658325075

[34m[07/04/2020 16:54:14 INFO 140466583250752] Epoch[81] Batch[0] avg_epoch_loss=-4.211822[0m
[34m[07/04/2020 16:54:14 INFO 140466583250752] #quality_metric: host=algo-1, epoch=81, batch=0 train loss <loss>=-4.21182155609[0m
[34m[07/04/2020 16:54:14 INFO 140466583250752] Epoch[81] Batch[5] avg_epoch_loss=-4.304887[0m
[34m[07/04/2020 16:54:14 INFO 140466583250752] #quality_metric: host=algo-1, epoch=81, batch=5 train loss <loss>=-4.30488697688[0m
[34m[07/04/2020 16:54:14 INFO 140466583250752] Epoch[81] Batch [5]#011Speed: 1805.34 samples/sec#011loss=-4.304887[0m
[34m[07/04/2020 16:54:15 INFO 140466583250752] Epoch[81] Batch[10] avg_epoch_loss=-4.145523[0m
[34m[07/04/2020 16:54:15 INFO 140466583250752] #quality_metric: host=algo-1, epoch=81, batch=10 train loss <loss>=-3.95428676605[0m
[34m[07/04/2020 16:54:15 INFO 140466583250752] Epoch[81] Batch [10]#011Speed: 1127.90 samples/sec#011loss=-3.954287[0m
[34m[07/04/2020 16:54:15 INFO 140466583250752] processed a total of 1

[34m[07/04/2020 16:59:29 INFO 140466583250752] Epoch[87] Batch[0] avg_epoch_loss=-4.162812[0m
[34m[07/04/2020 16:59:29 INFO 140466583250752] #quality_metric: host=algo-1, epoch=87, batch=0 train loss <loss>=-4.16281223297[0m
[34m[07/04/2020 16:59:29 INFO 140466583250752] Epoch[87] Batch[5] avg_epoch_loss=-4.263187[0m
[34m[07/04/2020 16:59:29 INFO 140466583250752] #quality_metric: host=algo-1, epoch=87, batch=5 train loss <loss>=-4.26318693161[0m
[34m[07/04/2020 16:59:29 INFO 140466583250752] Epoch[87] Batch [5]#011Speed: 1803.42 samples/sec#011loss=-4.263187[0m
[34m[07/04/2020 16:59:29 INFO 140466583250752] processed a total of 1241 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52040.87996482849, "sum": 52040.87996482849, "min": 52040.87996482849}}, "EndTime": 1593881969.89152, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593881917.850569}
[0m
[34m[07/04/2020 16:59:29 INFO 140466583250752] 

[34m[07/04/2020 17:04:43 INFO 140466583250752] Epoch[93] Batch[0] avg_epoch_loss=-4.476624[0m
[34m[07/04/2020 17:04:43 INFO 140466583250752] #quality_metric: host=algo-1, epoch=93, batch=0 train loss <loss>=-4.47662448883[0m
[34m[07/04/2020 17:04:44 INFO 140466583250752] Epoch[93] Batch[5] avg_epoch_loss=-4.304698[0m
[34m[07/04/2020 17:04:44 INFO 140466583250752] #quality_metric: host=algo-1, epoch=93, batch=5 train loss <loss>=-4.30469751358[0m
[34m[07/04/2020 17:04:44 INFO 140466583250752] Epoch[93] Batch [5]#011Speed: 1730.04 samples/sec#011loss=-4.304698[0m
[34m[07/04/2020 17:04:44 INFO 140466583250752] Epoch[93] Batch[10] avg_epoch_loss=-4.264870[0m
[34m[07/04/2020 17:04:44 INFO 140466583250752] #quality_metric: host=algo-1, epoch=93, batch=10 train loss <loss>=-4.21707744598[0m
[34m[07/04/2020 17:04:44 INFO 140466583250752] Epoch[93] Batch [10]#011Speed: 1097.99 samples/sec#011loss=-4.217077[0m
[34m[07/04/2020 17:04:44 INFO 140466583250752] processed a total of 1

[34m[07/04/2020 17:09:58 INFO 140466583250752] Epoch[99] Batch[0] avg_epoch_loss=-4.461358[0m
[34m[07/04/2020 17:09:58 INFO 140466583250752] #quality_metric: host=algo-1, epoch=99, batch=0 train loss <loss>=-4.46135807037[0m
[34m[07/04/2020 17:09:58 INFO 140466583250752] Epoch[99] Batch[5] avg_epoch_loss=-4.385100[0m
[34m[07/04/2020 17:09:58 INFO 140466583250752] #quality_metric: host=algo-1, epoch=99, batch=5 train loss <loss>=-4.3850997289[0m
[34m[07/04/2020 17:09:58 INFO 140466583250752] Epoch[99] Batch [5]#011Speed: 1792.81 samples/sec#011loss=-4.385100[0m
[34m[07/04/2020 17:09:59 INFO 140466583250752] Epoch[99] Batch[10] avg_epoch_loss=-4.383811[0m
[34m[07/04/2020 17:09:59 INFO 140466583250752] #quality_metric: host=algo-1, epoch=99, batch=10 train loss <loss>=-4.38226442337[0m
[34m[07/04/2020 17:09:59 INFO 140466583250752] Epoch[99] Batch [10]#011Speed: 1138.45 samples/sec#011loss=-4.382264[0m
[34m[07/04/2020 17:09:59 INFO 140466583250752] processed a total of 12

[34m[07/04/2020 17:14:21 INFO 140466583250752] Epoch[104] Batch[0] avg_epoch_loss=-4.417675[0m
[34m[07/04/2020 17:14:21 INFO 140466583250752] #quality_metric: host=algo-1, epoch=104, batch=0 train loss <loss>=-4.41767501831[0m
[34m[07/04/2020 17:14:21 INFO 140466583250752] Epoch[104] Batch[5] avg_epoch_loss=-4.393484[0m
[34m[07/04/2020 17:14:21 INFO 140466583250752] #quality_metric: host=algo-1, epoch=104, batch=5 train loss <loss>=-4.39348435402[0m
[34m[07/04/2020 17:14:21 INFO 140466583250752] Epoch[104] Batch [5]#011Speed: 1835.67 samples/sec#011loss=-4.393484[0m
[34m[07/04/2020 17:14:22 INFO 140466583250752] Epoch[104] Batch[10] avg_epoch_loss=-4.331566[0m
[34m[07/04/2020 17:14:22 INFO 140466583250752] #quality_metric: host=algo-1, epoch=104, batch=10 train loss <loss>=-4.25726318359[0m
[34m[07/04/2020 17:14:22 INFO 140466583250752] Epoch[104] Batch [10]#011Speed: 1182.00 samples/sec#011loss=-4.257263[0m
[34m[07/04/2020 17:14:22 INFO 140466583250752] processed a to

[34m[07/04/2020 17:19:36 INFO 140466583250752] Epoch[110] Batch[0] avg_epoch_loss=-4.058572[0m
[34m[07/04/2020 17:19:36 INFO 140466583250752] #quality_metric: host=algo-1, epoch=110, batch=0 train loss <loss>=-4.05857181549[0m
[34m[07/04/2020 17:19:36 INFO 140466583250752] Epoch[110] Batch[5] avg_epoch_loss=-4.245801[0m
[34m[07/04/2020 17:19:36 INFO 140466583250752] #quality_metric: host=algo-1, epoch=110, batch=5 train loss <loss>=-4.24580093225[0m
[34m[07/04/2020 17:19:36 INFO 140466583250752] Epoch[110] Batch [5]#011Speed: 1820.37 samples/sec#011loss=-4.245801[0m
[34m[07/04/2020 17:19:37 INFO 140466583250752] processed a total of 1278 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52283.75697135925, "sum": 52283.75697135925, "min": 52283.75697135925}}, "EndTime": 1593883177.047992, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593883124.763643}
[0m
[34m[07/04/2020 17:19:37 INFO 14046658325

[34m[07/04/2020 17:24:50 INFO 140466583250752] Epoch[116] Batch[0] avg_epoch_loss=-4.372274[0m
[34m[07/04/2020 17:24:50 INFO 140466583250752] #quality_metric: host=algo-1, epoch=116, batch=0 train loss <loss>=-4.3722743988[0m
[34m[07/04/2020 17:24:51 INFO 140466583250752] Epoch[116] Batch[5] avg_epoch_loss=-4.301935[0m
[34m[07/04/2020 17:24:51 INFO 140466583250752] #quality_metric: host=algo-1, epoch=116, batch=5 train loss <loss>=-4.30193471909[0m
[34m[07/04/2020 17:24:51 INFO 140466583250752] Epoch[116] Batch [5]#011Speed: 1818.31 samples/sec#011loss=-4.301935[0m
[34m[07/04/2020 17:24:51 INFO 140466583250752] Epoch[116] Batch[10] avg_epoch_loss=-4.390605[0m
[34m[07/04/2020 17:24:51 INFO 140466583250752] #quality_metric: host=algo-1, epoch=116, batch=10 train loss <loss>=-4.49701013565[0m
[34m[07/04/2020 17:24:51 INFO 140466583250752] Epoch[116] Batch [10]#011Speed: 1176.05 samples/sec#011loss=-4.497010[0m
[34m[07/04/2020 17:24:51 INFO 140466583250752] processed a tot

[34m[07/04/2020 17:30:06 INFO 140466583250752] Epoch[122] Batch[0] avg_epoch_loss=-4.329181[0m
[34m[07/04/2020 17:30:06 INFO 140466583250752] #quality_metric: host=algo-1, epoch=122, batch=0 train loss <loss>=-4.32918071747[0m
[34m[07/04/2020 17:30:06 INFO 140466583250752] Epoch[122] Batch[5] avg_epoch_loss=-4.372017[0m
[34m[07/04/2020 17:30:06 INFO 140466583250752] #quality_metric: host=algo-1, epoch=122, batch=5 train loss <loss>=-4.37201674779[0m
[34m[07/04/2020 17:30:06 INFO 140466583250752] Epoch[122] Batch [5]#011Speed: 1749.00 samples/sec#011loss=-4.372017[0m
[34m[07/04/2020 17:30:06 INFO 140466583250752] processed a total of 1241 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52691.547870635986, "sum": 52691.547870635986, "min": 52691.547870635986}}, "EndTime": 1593883806.812107, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593883754.120023}
[0m
[34m[07/04/2020 17:30:06 INFO 14046658

[34m[07/04/2020 17:35:20 INFO 140466583250752] Epoch[128] Batch[0] avg_epoch_loss=-4.387909[0m
[34m[07/04/2020 17:35:20 INFO 140466583250752] #quality_metric: host=algo-1, epoch=128, batch=0 train loss <loss>=-4.38790893555[0m
[34m[07/04/2020 17:35:21 INFO 140466583250752] Epoch[128] Batch[5] avg_epoch_loss=-4.424069[0m
[34m[07/04/2020 17:35:21 INFO 140466583250752] #quality_metric: host=algo-1, epoch=128, batch=5 train loss <loss>=-4.42406868935[0m
[34m[07/04/2020 17:35:21 INFO 140466583250752] Epoch[128] Batch [5]#011Speed: 1826.23 samples/sec#011loss=-4.424069[0m
[34m[07/04/2020 17:35:21 INFO 140466583250752] Epoch[128] Batch[10] avg_epoch_loss=-4.403231[0m
[34m[07/04/2020 17:35:21 INFO 140466583250752] #quality_metric: host=algo-1, epoch=128, batch=10 train loss <loss>=-4.37822647095[0m
[34m[07/04/2020 17:35:21 INFO 140466583250752] Epoch[128] Batch [10]#011Speed: 1125.34 samples/sec#011loss=-4.378226[0m
[34m[07/04/2020 17:35:21 INFO 140466583250752] processed a to

[34m[07/04/2020 17:40:37 INFO 140466583250752] Epoch[134] Batch[0] avg_epoch_loss=-4.319220[0m
[34m[07/04/2020 17:40:37 INFO 140466583250752] #quality_metric: host=algo-1, epoch=134, batch=0 train loss <loss>=-4.31921958923[0m
[34m[07/04/2020 17:40:37 INFO 140466583250752] Epoch[134] Batch[5] avg_epoch_loss=-4.347351[0m
[34m[07/04/2020 17:40:37 INFO 140466583250752] #quality_metric: host=algo-1, epoch=134, batch=5 train loss <loss>=-4.34735075633[0m
[34m[07/04/2020 17:40:37 INFO 140466583250752] Epoch[134] Batch [5]#011Speed: 1793.17 samples/sec#011loss=-4.347351[0m
[34m[07/04/2020 17:40:38 INFO 140466583250752] Epoch[134] Batch[10] avg_epoch_loss=-4.503284[0m
[34m[07/04/2020 17:40:38 INFO 140466583250752] #quality_metric: host=algo-1, epoch=134, batch=10 train loss <loss>=-4.69040384293[0m
[34m[07/04/2020 17:40:38 INFO 140466583250752] Epoch[134] Batch [10]#011Speed: 1101.63 samples/sec#011loss=-4.690404[0m
[34m[07/04/2020 17:40:38 INFO 140466583250752] processed a to

[34m[07/04/2020 17:45:52 INFO 140466583250752] Epoch[140] Batch[0] avg_epoch_loss=-4.389905[0m
[34m[07/04/2020 17:45:52 INFO 140466583250752] #quality_metric: host=algo-1, epoch=140, batch=0 train loss <loss>=-4.38990497589[0m
[34m[07/04/2020 17:45:52 INFO 140466583250752] Epoch[140] Batch[5] avg_epoch_loss=-4.307956[0m
[34m[07/04/2020 17:45:52 INFO 140466583250752] #quality_metric: host=algo-1, epoch=140, batch=5 train loss <loss>=-4.30795558294[0m
[34m[07/04/2020 17:45:52 INFO 140466583250752] Epoch[140] Batch [5]#011Speed: 1818.98 samples/sec#011loss=-4.307956[0m
[34m[07/04/2020 17:45:53 INFO 140466583250752] Epoch[140] Batch[10] avg_epoch_loss=-4.443791[0m
[34m[07/04/2020 17:45:53 INFO 140466583250752] #quality_metric: host=algo-1, epoch=140, batch=10 train loss <loss>=-4.60679368973[0m
[34m[07/04/2020 17:45:53 INFO 140466583250752] Epoch[140] Batch [10]#011Speed: 1123.25 samples/sec#011loss=-4.606794[0m
[34m[07/04/2020 17:45:53 INFO 140466583250752] processed a to

[34m[07/04/2020 17:51:08 INFO 140466583250752] Epoch[146] Batch[0] avg_epoch_loss=-3.441674[0m
[34m[07/04/2020 17:51:08 INFO 140466583250752] #quality_metric: host=algo-1, epoch=146, batch=0 train loss <loss>=-3.44167375565[0m
[34m[07/04/2020 17:51:09 INFO 140466583250752] Epoch[146] Batch[5] avg_epoch_loss=-4.038611[0m
[34m[07/04/2020 17:51:09 INFO 140466583250752] #quality_metric: host=algo-1, epoch=146, batch=5 train loss <loss>=-4.03861061732[0m
[34m[07/04/2020 17:51:09 INFO 140466583250752] Epoch[146] Batch [5]#011Speed: 1774.26 samples/sec#011loss=-4.038611[0m
[34m[07/04/2020 17:51:09 INFO 140466583250752] processed a total of 1276 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52746.55103683472, "sum": 52746.55103683472, "min": 52746.55103683472}}, "EndTime": 1593885069.616885, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593885016.869846}
[0m
[34m[07/04/2020 17:51:09 INFO 14046658325

[34m[07/04/2020 17:56:23 INFO 140466583250752] Epoch[152] Batch[0] avg_epoch_loss=-4.352229[0m
[34m[07/04/2020 17:56:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=152, batch=0 train loss <loss>=-4.35222911835[0m
[34m[07/04/2020 17:56:23 INFO 140466583250752] Epoch[152] Batch[5] avg_epoch_loss=-4.357634[0m
[34m[07/04/2020 17:56:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=152, batch=5 train loss <loss>=-4.35763374964[0m
[34m[07/04/2020 17:56:23 INFO 140466583250752] Epoch[152] Batch [5]#011Speed: 1794.28 samples/sec#011loss=-4.357634[0m
[34m[07/04/2020 17:56:24 INFO 140466583250752] Epoch[152] Batch[10] avg_epoch_loss=-4.392376[0m
[34m[07/04/2020 17:56:24 INFO 140466583250752] #quality_metric: host=algo-1, epoch=152, batch=10 train loss <loss>=-4.43406667709[0m
[34m[07/04/2020 17:56:24 INFO 140466583250752] Epoch[152] Batch [10]#011Speed: 1112.73 samples/sec#011loss=-4.434067[0m
[34m[07/04/2020 17:56:24 INFO 140466583250752] processed a to

[34m[07/04/2020 18:00:45 INFO 140466583250752] Epoch[157] Batch[0] avg_epoch_loss=-4.238723[0m
[34m[07/04/2020 18:00:45 INFO 140466583250752] #quality_metric: host=algo-1, epoch=157, batch=0 train loss <loss>=-4.23872327805[0m
[34m[07/04/2020 18:00:46 INFO 140466583250752] Epoch[157] Batch[5] avg_epoch_loss=-4.251945[0m
[34m[07/04/2020 18:00:46 INFO 140466583250752] #quality_metric: host=algo-1, epoch=157, batch=5 train loss <loss>=-4.25194541613[0m
[34m[07/04/2020 18:00:46 INFO 140466583250752] Epoch[157] Batch [5]#011Speed: 1708.14 samples/sec#011loss=-4.251945[0m
[34m[07/04/2020 18:00:46 INFO 140466583250752] Epoch[157] Batch[10] avg_epoch_loss=-4.356063[0m
[34m[07/04/2020 18:00:46 INFO 140466583250752] #quality_metric: host=algo-1, epoch=157, batch=10 train loss <loss>=-4.48100442886[0m
[34m[07/04/2020 18:00:46 INFO 140466583250752] Epoch[157] Batch [10]#011Speed: 1104.70 samples/sec#011loss=-4.481004[0m
[34m[07/04/2020 18:00:46 INFO 140466583250752] processed a to

[34m[07/04/2020 18:06:01 INFO 140466583250752] Epoch[163] Batch[0] avg_epoch_loss=-4.555402[0m
[34m[07/04/2020 18:06:01 INFO 140466583250752] #quality_metric: host=algo-1, epoch=163, batch=0 train loss <loss>=-4.55540180206[0m
[34m[07/04/2020 18:06:01 INFO 140466583250752] Epoch[163] Batch[5] avg_epoch_loss=-4.567573[0m
[34m[07/04/2020 18:06:01 INFO 140466583250752] #quality_metric: host=algo-1, epoch=163, batch=5 train loss <loss>=-4.56757259369[0m
[34m[07/04/2020 18:06:01 INFO 140466583250752] Epoch[163] Batch [5]#011Speed: 1776.56 samples/sec#011loss=-4.567573[0m
[34m[07/04/2020 18:06:02 INFO 140466583250752] processed a total of 1254 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52381.30497932434, "sum": 52381.30497932434, "min": 52381.30497932434}}, "EndTime": 1593885962.35094, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593885909.969569}
[0m
[34m[07/04/2020 18:06:02 INFO 140466583250

[34m[07/04/2020 18:10:24 INFO 140466583250752] Epoch[168] Batch[0] avg_epoch_loss=-4.131955[0m
[34m[07/04/2020 18:10:24 INFO 140466583250752] #quality_metric: host=algo-1, epoch=168, batch=0 train loss <loss>=-4.13195514679[0m
[34m[07/04/2020 18:10:25 INFO 140466583250752] Epoch[168] Batch[5] avg_epoch_loss=-4.113587[0m
[34m[07/04/2020 18:10:25 INFO 140466583250752] #quality_metric: host=algo-1, epoch=168, batch=5 train loss <loss>=-4.11358706156[0m
[34m[07/04/2020 18:10:25 INFO 140466583250752] Epoch[168] Batch [5]#011Speed: 1815.41 samples/sec#011loss=-4.113587[0m
[34m[07/04/2020 18:10:25 INFO 140466583250752] Epoch[168] Batch[10] avg_epoch_loss=-4.186040[0m
[34m[07/04/2020 18:10:25 INFO 140466583250752] #quality_metric: host=algo-1, epoch=168, batch=10 train loss <loss>=-4.27298316956[0m
[34m[07/04/2020 18:10:25 INFO 140466583250752] Epoch[168] Batch [10]#011Speed: 1313.11 samples/sec#011loss=-4.272983[0m
[34m[07/04/2020 18:10:25 INFO 140466583250752] processed a to

[34m[07/04/2020 18:15:39 INFO 140466583250752] Epoch[174] Batch[0] avg_epoch_loss=-4.683774[0m
[34m[07/04/2020 18:15:39 INFO 140466583250752] #quality_metric: host=algo-1, epoch=174, batch=0 train loss <loss>=-4.68377351761[0m
[34m[07/04/2020 18:15:39 INFO 140466583250752] Epoch[174] Batch[5] avg_epoch_loss=-4.456165[0m
[34m[07/04/2020 18:15:39 INFO 140466583250752] #quality_metric: host=algo-1, epoch=174, batch=5 train loss <loss>=-4.45616483688[0m
[34m[07/04/2020 18:15:39 INFO 140466583250752] Epoch[174] Batch [5]#011Speed: 1819.35 samples/sec#011loss=-4.456165[0m
[34m[07/04/2020 18:15:40 INFO 140466583250752] processed a total of 1259 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52685.52088737488, "sum": 52685.52088737488, "min": 52685.52088737488}}, "EndTime": 1593886540.398934, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593886487.712874}
[0m
[34m[07/04/2020 18:15:40 INFO 14046658325

[34m[07/04/2020 18:20:54 INFO 140466583250752] Epoch[180] Batch[0] avg_epoch_loss=-4.602485[0m
[34m[07/04/2020 18:20:54 INFO 140466583250752] #quality_metric: host=algo-1, epoch=180, batch=0 train loss <loss>=-4.60248470306[0m
[34m[07/04/2020 18:20:55 INFO 140466583250752] Epoch[180] Batch[5] avg_epoch_loss=-4.495529[0m
[34m[07/04/2020 18:20:55 INFO 140466583250752] #quality_metric: host=algo-1, epoch=180, batch=5 train loss <loss>=-4.49552869797[0m
[34m[07/04/2020 18:20:55 INFO 140466583250752] Epoch[180] Batch [5]#011Speed: 1814.34 samples/sec#011loss=-4.495529[0m
[34m[07/04/2020 18:20:55 INFO 140466583250752] processed a total of 1267 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52764.116048812866, "sum": 52764.116048812866, "min": 52764.116048812866}}, "EndTime": 1593886855.672465, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593886802.90783}
[0m
[34m[07/04/2020 18:20:55 INFO 140466583

[34m[07/04/2020 18:26:09 INFO 140466583250752] Epoch[186] Batch[0] avg_epoch_loss=-4.420472[0m
[34m[07/04/2020 18:26:09 INFO 140466583250752] #quality_metric: host=algo-1, epoch=186, batch=0 train loss <loss>=-4.42047166824[0m
[34m[07/04/2020 18:26:09 INFO 140466583250752] Epoch[186] Batch[5] avg_epoch_loss=-4.495818[0m
[34m[07/04/2020 18:26:09 INFO 140466583250752] #quality_metric: host=algo-1, epoch=186, batch=5 train loss <loss>=-4.4958178997[0m
[34m[07/04/2020 18:26:09 INFO 140466583250752] Epoch[186] Batch [5]#011Speed: 1803.96 samples/sec#011loss=-4.495818[0m
[34m[07/04/2020 18:26:09 INFO 140466583250752] processed a total of 1275 examples[0m
[34m#metrics {"Metrics": {"update.time": {"count": 1, "max": 52178.12895774841, "sum": 52178.12895774841, "min": 52178.12895774841}}, "EndTime": 1593887169.851948, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/DeepAR"}, "StartTime": 1593887117.673298}
[0m
[34m[07/04/2020 18:26:09 INFO 140466583250

[34m[07/04/2020 18:31:23 INFO 140466583250752] Epoch[192] Batch[0] avg_epoch_loss=-4.491228[0m
[34m[07/04/2020 18:31:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=192, batch=0 train loss <loss>=-4.49122810364[0m
[34m[07/04/2020 18:31:23 INFO 140466583250752] Epoch[192] Batch[5] avg_epoch_loss=-4.530313[0m
[34m[07/04/2020 18:31:23 INFO 140466583250752] #quality_metric: host=algo-1, epoch=192, batch=5 train loss <loss>=-4.53031269709[0m
[34m[07/04/2020 18:31:23 INFO 140466583250752] Epoch[192] Batch [5]#011Speed: 1818.93 samples/sec#011loss=-4.530313[0m
[34m[07/04/2020 18:31:24 INFO 140466583250752] Epoch[192] Batch[10] avg_epoch_loss=-4.546350[0m
[34m[07/04/2020 18:31:24 INFO 140466583250752] #quality_metric: host=algo-1, epoch=192, batch=10 train loss <loss>=-4.56559391022[0m
[34m[07/04/2020 18:31:24 INFO 140466583250752] Epoch[192] Batch [10]#011Speed: 1228.28 samples/sec#011loss=-4.565594[0m
[34m[07/04/2020 18:31:24 INFO 140466583250752] processed a to

[34m[07/04/2020 18:36:38 INFO 140466583250752] Epoch[198] Batch[0] avg_epoch_loss=-4.315952[0m
[34m[07/04/2020 18:36:38 INFO 140466583250752] #quality_metric: host=algo-1, epoch=198, batch=0 train loss <loss>=-4.31595230103[0m
[34m[07/04/2020 18:36:38 INFO 140466583250752] Epoch[198] Batch[5] avg_epoch_loss=-4.498737[0m
[34m[07/04/2020 18:36:38 INFO 140466583250752] #quality_metric: host=algo-1, epoch=198, batch=5 train loss <loss>=-4.49873725573[0m
[34m[07/04/2020 18:36:38 INFO 140466583250752] Epoch[198] Batch [5]#011Speed: 1810.37 samples/sec#011loss=-4.498737[0m
[34m[07/04/2020 18:36:39 INFO 140466583250752] Epoch[198] Batch[10] avg_epoch_loss=-3.679280[0m
[34m[07/04/2020 18:36:39 INFO 140466583250752] #quality_metric: host=algo-1, epoch=198, batch=10 train loss <loss>=-2.69593200684[0m
[34m[07/04/2020 18:36:39 INFO 140466583250752] Epoch[198] Batch [10]#011Speed: 1198.98 samples/sec#011loss=-2.695932[0m
[34m[07/04/2020 18:36:39 INFO 140466583250752] processed a to

Two points from the output above are worth noting once training is done:
- The final RMSE test loss obtained after 200 training epochs is 0.0393535601381, which gives an SD-RMSE of 1.36 (computed below). This compares poorly with the linear regression and kernel ridge models trained previously, which reached 0.126 and 0.159 respectively for d+1.
- The learning phase took about 3 hours. This is too slow: the training phase should be much quicker, so that the model can return predictions for a new stock as soon as possible once training is over.
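Both observations can be checked directly against the log text rather than by eye. The sketch below is only an illustration: the helper name `summarize_log` and the regular expression are my own assumptions about the line layout seen in the output above, not part of SageMaker. It extracts the elapsed time between the first and last matching lines, and the last reported `avg_epoch_loss` for each epoch.

```python
import re
from datetime import datetime

# Assumed layout of the lines of interest, taken from the output above, e.g.
# [07/04/2020 15:45:46 INFO 140466583250752] Epoch[2] Batch[5] avg_epoch_loss=-2.447776
LINE_RE = re.compile(
    r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) INFO \d+\] "
    r"Epoch\[(\d+)\] Batch\[(\d+)\] avg_epoch_loss=(-?\d+\.\d+)"
)


def summarize_log(lines):
    """Return (elapsed time, {epoch: last avg_epoch_loss}) for matching lines.

    Hypothetical helper: lines that do not match LINE_RE are ignored.
    """
    timestamps = []
    last_loss = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        # Dates in the log are month/day/year
        timestamps.append(datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S"))
        # Later batches overwrite earlier ones, keeping the last value per epoch
        last_loss[int(match.group(2))] = float(match.group(4))
    if not timestamps:
        raise ValueError("no avg_epoch_loss lines found")
    return max(timestamps) - min(timestamps), last_loss


# Three lines copied from the output above (colour codes omitted)
sample = [
    "[07/04/2020 15:45:46 INFO 140466583250752] Epoch[2] Batch[5] avg_epoch_loss=-2.447776",
    "[07/04/2020 15:45:47 INFO 140466583250752] Epoch[2] Batch[10] avg_epoch_loss=-2.563244",
    "[07/04/2020 18:36:39 INFO 140466583250752] Epoch[198] Batch[10] avg_epoch_loss=-3.679280",
]
elapsed, losses = summarize_log(sample)
print(elapsed)  # 2:50:53
print(losses)
```

The sample only covers epochs 2 to 198, so the elapsed time it reports is slightly shorter than the full training run.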

In [53]:
# Calculate the SD-RMSE from the output above
0.0393535601381 / np.std(target_list[train_size : train_size+valid_size])

1.3618174275914596
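To make the ratio in the cell above easier to interpret, here is a minimal sketch of the same metric as a function. The name `sd_rmse` and its signature are assumptions for illustration; the notebook itself only divides a reported RMSE by `np.std` of a slice of `target_list`. When the RMSE and the standard deviation are computed over the same values, a constant prediction equal to the mean scores exactly 1, so a value above 1 means the model does worse than that constant baseline.

```python
import numpy as np


def sd_rmse(y_true, y_pred):
    """RMSE divided by the (population) standard deviation of the targets.

    Hypothetical helper mirroring the ratio computed in the notebook.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # np.std uses ddof=0 by default, like the notebook's computation
    return rmse / np.std(y_true)


y = np.array([1.0, 2.0, 3.0, 4.0])
print(sd_rmse(y, np.full_like(y, y.mean())))  # mean baseline
print(sd_rmse(y, y))                          # perfect prediction
```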

#### For both of these reasons, I will drop DeepAR and keep working with the LinearRegression and KernelRidge models in the next steps.