# <a id='toc1_'></a>[Forecasting (PyTorch)](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Forecasting (PyTorch)](#toc1_)    
  - [Bidding dataset class](#toc1_1_)    
  - [LSTM model class](#toc1_2_)    
  - [Early stopping class](#toc1_3_)    
  - [Experiment class](#toc1_4_)    
  - [Parameters](#toc1_5_)    
  - [Model](#toc1_6_)    
  - [Training](#toc1_7_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [9]:
import json
import os

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, Dataset

## <a id='toc1_1_'></a>[Bidding dataset class](#toc0_)

In [10]:

class BiddingData(Dataset):

    def __init__(self,
                 seq_len: int,
                 pred_len: int,
                 scale: bool,
                 flag: str,
                 val_split: float,
                 test_split: float,
                 cluster: int,
                 features: list,
                 target: str | None = None) -> None:
        assert flag in ['train', 'test', 'val', 'pred']
        assert target
        assert features
        self.flag = flag
        type_map = {'train': 0, 'val': 1, 'test': 2}
        self.set_type = type_map[self.flag]
        self.seq_len = seq_len
        self.pred_len = pred_len
        self.val_split = val_split
        self.test_split = test_split
        self.target = target
        self.features = features
        self.scale = scale
        self.cluster = cluster
        self.__read_data__()

    def __len__(self) -> int:
        return len(self.x) - self.seq_len - self.pred_len + 1

    def __read_data__(self) -> pd.DataFrame:
        cluster_df = pd.read_csv(
            os.path.join(os.environ['PROCESSED_DATA_PATH'],
                         f'processed_{self.cluster}.csv'))
        size = len(cluster_df)
        border1s = [
            0,
            int(size * (1 - self.val_split - self.test_split)),
            int(size * (1 - self.test_split))
        ]
        border2s = [
            int(size * (1 - self.val_split - self.test_split)),
            int(size * (1 - self.test_split)), size
        ]
        border1 = border1s[self.set_type]
        border2 = border2s[self.set_type]
        self.forecast_features = cluster_df.columns
        self.dataset = cluster_df.loc[border1:border2, features + [target]]
        self.data = self.dataset.values
        self.y = cluster_df.loc[border1:border2,
                                self.target].values.reshape(-1, 1)
        self.x = cluster_df.loc[border1:border2, self.features].values

        if self.scale:
            self.scaler = StandardScaler()
            train_data = cluster_df.loc[border1s[0]:border2s[0],
                                        features + [target]]
            self.scaler.fit(train_data)
            self.data = self.scaler.transform(self.dataset.values)
            self.x = pd.DataFrame(data=self.data,
                                  columns=features + [target])[features].values
            self.y = pd.DataFrame(data=self.data, columns=features +
                                  [target])[target].values.reshape(-1, 1)

    def __getitem__(self, index: int):
        x_begin = index
        x_end = x_begin + self.seq_len
        y_begin = x_end
        y_end = y_begin + self.pred_len
        seq_x = self.x[x_begin:x_end]
        seq_y = self.y[y_begin:y_end]
        return seq_x, seq_y

    def inverse_transform(self, data):
        return self.scaler.inverse_transform(data)

## <a id='toc1_2_'></a>[LSTM model class](#toc0_)

In [11]:
import torch
from torch import nn


class LSTM(nn.Module):

    def __init__(self, input_size: int, hidden_size: int, num_layers: int,
                 batch_size: int, output_size: int, **kwargs):
        super(LSTM, self).__init__()
        # rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2) # input_size - number of expected features in the input x, hidden_size -  number of features in the hidden state h

        # Inputs:
        # input = torch.randn(5, 3, 10) # [batch_size, input_size] for unbatched input, [seq_len, batch_size, input_size] when batch_first=False, [batch_size, seq_len, input_size] when batch_first=True

        # h0 = torch.randn(2, 3, 20) # [D * num_layers, hidden_size] for unbatched input, [D * num_layers, barch_size, hidden_size] , D = 2 if birectional=True, otherwise 1
        # c0 = torch.randn(2, 3, 20)
        # output, (hn, cn) = rnn(input, (h0, c0))
        self.input_size = input_size  # number of expected features in the input
        self.hidden_size = hidden_size  # number of features in the hidden state
        self.output_size = output_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.lstm = nn.LSTM(input_size=self.input_size,
                            hidden_size=self.hidden_size,
                            num_layers=self.num_layers,
                            batch_first=True,
                            dtype=torch.float)
        self.linear = nn.Linear(self.hidden_size,
                                self.output_size,
                                dtype=torch.float)
        self.h0 = torch.randn(self.num_layers,
                              self.batch_size,
                              self.hidden_size,
                              dtype=torch.float)
        self.c0 = torch.randn(self.num_layers,
                              self.batch_size,
                              self.hidden_size,
                              dtype=torch.float)

    def forward(self, input):
        # input [self.batch_size, self.seq_len, self.input_size]
        output, (hn, cn) = self.lstm(input, (self.h0, self.c0))
        output = self.linear(output)
        return output


## <a id='toc1_3_'></a>[Early stopping class](#toc0_)

In [12]:
import torch
from torch import optim
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime


class EarlyStopping:

    def __init__(self, patience: int, verbose: bool = True, delta: int = 0):
        self.patience = patience
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.val_loss_min = float('inf')
        self.delta = delta

    def __call__(self, val_loss, model, path='./runs'):
        score = -val_loss
        if self.best_score is None:
            self.best_score = score
            self.path = os.path.join(path)
            self.save_checkpoint(val_loss, model, self.path)
        elif score < self.best_score + self.delta:
            self.counter += 1
            print(
                f'EarlyStopping counter: {self.counter} out of {self.patience}'
            )
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(val_loss, model, self.path)
            self.counter = 0

    def save_checkpoint(self, val_loss, model, path):
        if self.verbose:
            print(
                f'validation loss decreased ({self.val_loss_min:.3f} --> {val_loss:.3f})',
                'saving model',
                sep='\n')
        os.makedirs(path, exist_ok=True)
        torch.save(model, os.path.join(path, 'checkpoint.pth'))
        self.val_loss_min = val_loss


## <a id='toc1_4_'></a>[Experiment class](#toc0_)

In [13]:
import matplotlib.pyplot as plt

class Experiment:

    def __init__(self, args):
        self.args = args
        torch.manual_seed(self.args['seed'])

    def _select_criterion(self):
        return nn.MSELoss()

    def _select_optimizer(self):
        return optim.Adam(self.model.parameters(),
                          lr=self.args['learning_rate'])

    def _get_data(self, flag):
        if flag in ['test', 'pred']:
            shuffle_flag = False
        else:
            shuffle_flag = True
        data = BiddingData(seq_len=self.args['seq_len'],
                           pred_len=self.args['pred_len'],
                           features=self.args['features'],
                           val_split=self.args['val_split'],
                           test_split=self.args['test_split'],
                           target=self.args['target'],
                           cluster=self.args['cluster'],
                           scale=True,
                           flag=flag)
        loader = DataLoader(dataset=data,
                            batch_size=self.args['batch_size'],
                            shuffle=shuffle_flag,
                            drop_last=True)
        return data, loader

    def _build_model(self):
        model = LSTM
        return model(input_size=self.train_data.x.shape[-1],
                     output_size=self.train_data.y.shape[-1],
                     hidden_size=self.args['hidden_size'],
                     num_layers=self.args['num_layers'],
                     batch_size=self.args['batch_size'])

    def train(self):
        self.train_data, self.train_loader = self._get_data(flag='train')
        self.val_data, self.val_loader = self._get_data(flag='val')
        self.test_data, self.test_loader = self._get_data(flag='test')
        self.model = self._build_model()
        model_optim = self._select_optimizer()
        criterion = self._select_criterion()
        train_steps = len(self.train_loader)
        early_stopping = EarlyStopping(patience=self.args['patience'],
                                       verbose=True)
        self.path = os.path.join(os.environ['RUNS_PATH'],
                                 self.args['timestamp'],
                                 f'cluster_{self.args["cluster"]}')
        os.makedirs(self.path, exist_ok=True)
        for epoch in range(self.args['epochs']):
            print('epoch: {}'.format(epoch + 1))
            train_losses = []
            self.model.train()
            epoch_time = datetime.now()
            self.writer = SummaryWriter(log_dir=os.path.join(
                self.path, 'tensorboards', f'epoc_{epoch+1}'))
            for i, (batch_x, batch_y) in enumerate(self.train_loader):
                batch_x = batch_x.float()
                batch_y = batch_y.float()
                model_optim.zero_grad()
                pred = self.model(batch_x)
                pred = pred[:, -self.args['pred_len']:, :]
                loss = torch.sqrt(criterion(pred, batch_y))
                self.writer.add_scalar('loss train/iter', loss, i)
                train_losses.append(loss.item())
                if i % self.args['log_interval'] == 0:
                    print('\titer: {0:>5d}/{1:>5d} | loss: {2:.3f}'.format(
                        i, train_steps, loss.item()))
                loss.backward()
                model_optim.step()
            train_loss = np.mean(train_losses)
            val_loss = self.evaluate(self.val_loader, criterion)
            test_loss = self.evaluate(self.test_loader, criterion)
            print('epoch {0} time: {1} s'.format(epoch + 1,
                                                 datetime.now() - epoch_time))
            print(
                'train loss: {0:.3f} | val loss: {1:.3f} | test loss: {2:.3f}'.
                format(train_loss, val_loss, test_loss))
            early_stopping(val_loss, self.model, self.path)
            self.writer.flush()
            if early_stopping.early_stop:
                print('early stopping!')
                break
            self.writer.close()
        best_model_path = os.path.join(self.path, 'checkpoint.pth')
        self.model = torch.load(best_model_path)
        return self.model

    def evaluate(self, loader, criterion):
        losses = []
        self.model.eval()
        with torch.no_grad():
            for i, (batch_x, batch_y) in enumerate(loader):
                batch_x = batch_x.float()
                batch_y = batch_y.float()
                pred = self.model(batch_x)
                pred = pred[:, -self.args['pred_len']:, :]
                loss = torch.sqrt(criterion(pred, batch_y))
                self.writer.add_scalar(f'loss {loader.dataset.flag}/iter',
                                       loss, i)
                losses.append(loss.item())
        loss = np.mean(losses)
        self.model.train()
        return loss

    def test(self):
        self.test_data, self.test_loader = self._get_data(flag='test')
        preds = []
        trues = []
        test_folder = os.path.join(self.path, 'test_results')
        os.makedirs(test_folder, exist_ok=True)
        self.model.eval()
        with torch.no_grad():
            for i, (batch_x, batch_y) in enumerate(self.test_loader):
                batch_x = batch_x.float()
                batch_y = batch_y.float()
                pred = self.model(batch_x)
                pred = torch.flatten(pred[:, -self.args['pred_len']:, -1])
                true = torch.flatten(batch_y.float())
                preds.append(pred.detach().cpu().numpy())
                trues.append(true.detach().cpu().numpy())
        preds = np.array(preds).flatten()
        trues = np.array(trues).flatten()
        visual(
            preds, trues,
            os.path.join(self.path,
                         f'test_cluster_{self.args["cluster"]}.pdf'))

    def predict(self):
        pred_data, pred_loader = self._get_data(flag='train')
        self.model.load_state_dict(torch.load(self.best_model_path))
        preds = []
        self.model.eval()
        with torch.no_grad():
            for i, (batch_x, batch_y) in enumerate(pred_loader):
                batch_x = batch_x.float().to(self.device)
                batch_y = batch_y.float()
                pred = self.model(batch_x)
                preds.append(pred)
        preds = np.array(preds)
        preds = preds.reshape(-1, preds.shape[-2], preds.shape[-1])
        if (pred_data.scale):
            preds = pred_data.inverse_transform(preds)


def visual(true, preds, name):
    plt.figure()
    plt.plot(true, label='test ground truth', linewidth=2)
    if preds is not None:
        plt.plot(preds, label='test prediction', linewidth=2)
    plt.legend()
    plt.savefig(name, bbox_inches='tight')
    plt.close()

## Features

In [14]:
features = [
    'Impressions',
    'AbsoluteTopImpressionPercentage',
    'TopImpressionPercentage',
    'SearchImpressionShare',
    'SearchTopImpressionShare',
    'SearchRankLostTopImpressionShare',
    'Clicks',
    'Cost_gbp',
    'CpcBid_gbp',
]
features_date = features + ['Date']
target = 'CpcBid_gbp'
features.remove(target)
with open(
        os.path.join(os.environ['MODELS_PATH'], 'kmeans_clustered_dict.json'),
        'r') as f:
    clustered = json.load(f)

## <a id='toc1_5_'></a>[Parameters](#toc0_)

In [16]:
SEED = 10
task_params = dict(seq_len=14, pred_len=1, features=features, target=target)
training_params = dict(val_split=0.2,
                       test_split=0.2,
                       batch_size=4,
                       patience=20,
                       epochs=500,
                       learning_rate=0.01,
                       log_interval=10,
                       seed=SEED)
model_params = dict(hidden_size=10, num_layers=2)

## <a id='toc1_6_'></a>[Model](#toc0_)

In [17]:
exp = Experiment(
    {
        **task_params,
        **training_params,
        **model_params,
        'cluster': 0,
        'timestamp': '',
    }, )
exp.train_data, exp.train_loader = exp._get_data(flag='train')
exp._build_model()



LSTM(
  (lstm): LSTM(8, 10, num_layers=2, batch_first=True)
  (linear): Linear(in_features=10, out_features=1, bias=True)
)

## <a id='toc1_7_'></a>[Training](#toc0_)

In [18]:
timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
for f in os.listdir(os.environ['PROCESSED_DATA_PATH']):
    if f.endswith('.csv'):
        exp = Experiment({
            **task_params,
            **training_params,
            **model_params, 'cluster':
            f.rstrip('.csv').lstrip('processed_'),
            'timestamp':
            timestamp
        })
        exp.train()
        exp.test()




epoch: 1
	iter:     0/   17 | loss: 0.781
	iter:    10/   17 | loss: 0.899
epoch 1 time: 0:00:03.941642 s
train loss: 0.899 | val loss: 0.672 | test loss: 0.401
validation loss decreased (inf --> 0.672)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.793
	iter:    10/   17 | loss: 0.253
epoch 2 time: 0:00:00.170021 s
train loss: 0.342 | val loss: 0.319 | test loss: 1.121
validation loss decreased (0.672 --> 0.319)
saving model
epoch: 3
	iter:     0/   17 | loss: 0.247
	iter:    10/   17 | loss: 0.108
epoch 3 time: 0:00:00.162647 s
train loss: 0.258 | val loss: 0.415 | test loss: 1.304
EarlyStopping counter: 1 out of 20
epoch: 4
	iter:     0/   17 | loss: 0.119
	iter:    10/   17 | loss: 0.083
epoch 4 time: 0:00:00.134539 s
train loss: 0.253 | val loss: 0.350 | test loss: 1.015
EarlyStopping counter: 2 out of 20
epoch: 5
	iter:     0/   17 | loss: 0.106
	iter:    10/   17 | loss: 0.470
epoch 5 time: 0:00:00.134685 s
train loss: 0.200 | val loss: 0.290 | test loss: 0.796
validation lo



epoch: 1
	iter:     0/   17 | loss: 0.780
	iter:    10/   17 | loss: 0.891
epoch 1 time: 0:00:00.133593 s
train loss: 0.854 | val loss: 0.723 | test loss: 0.296
validation loss decreased (inf --> 0.723)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.668
	iter:    10/   17 | loss: 0.265
epoch 2 time: 0:00:00.136969 s
train loss: 0.310 | val loss: 1.064 | test loss: 0.723
EarlyStopping counter: 1 out of 20
epoch: 3
	iter:     0/   17 | loss: 0.276
	iter:    10/   17 | loss: 0.320
epoch 3 time: 0:00:00.137604 s
train loss: 0.230 | val loss: 0.878 | test loss: 0.764
EarlyStopping counter: 2 out of 20
epoch: 4
	iter:     0/   17 | loss: 0.157
	iter:    10/   17 | loss: 0.048
epoch 4 time: 0:00:00.133216 s
train loss: 0.223 | val loss: 0.882 | test loss: 0.559
EarlyStopping counter: 3 out of 20
epoch: 5
	iter:     0/   17 | loss: 0.191
	iter:    10/   17 | loss: 0.548
epoch 5 time: 0:00:00.131222 s
train loss: 0.214 | val loss: 0.933 | test loss: 0.616
EarlyStopping counter: 4 out of 20




epoch: 1
	iter:     0/   17 | loss: 0.781
	iter:    10/   17 | loss: 0.845
epoch 1 time: 0:00:00.136263 s
train loss: 0.859 | val loss: 0.732 | test loss: 0.219
validation loss decreased (inf --> 0.732)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.670
	iter:    10/   17 | loss: 0.301
epoch 2 time: 0:00:00.144048 s
train loss: 0.293 | val loss: 0.281 | test loss: 0.572
validation loss decreased (0.732 --> 0.281)
saving model
epoch: 3
	iter:     0/   17 | loss: 0.151
	iter:    10/   17 | loss: 0.267
epoch 3 time: 0:00:00.128566 s
train loss: 0.279 | val loss: 0.441 | test loss: 0.766
EarlyStopping counter: 1 out of 20
epoch: 4
	iter:     0/   17 | loss: 0.099
	iter:    10/   17 | loss: 0.131
epoch 4 time: 0:00:00.138934 s
train loss: 0.260 | val loss: 0.601 | test loss: 0.580
EarlyStopping counter: 2 out of 20
epoch: 5
	iter:     0/   17 | loss: 0.072
	iter:    10/   17 | loss: 0.493
epoch 5 time: 0:00:00.129834 s
train loss: 0.220 | val loss: 0.658 | test loss: 0.650
EarlyStopping



epoch: 1
	iter:     0/   17 | loss: 0.824
	iter:    10/   17 | loss: 0.852
epoch 1 time: 0:00:00.117621 s
train loss: 0.875 | val loss: 0.530 | test loss: 0.764
validation loss decreased (inf --> 0.530)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.737
	iter:    10/   17 | loss: 0.294
epoch 2 time: 0:00:00.116181 s
train loss: 0.346 | val loss: 0.710 | test loss: 1.534
EarlyStopping counter: 1 out of 20
epoch: 3
	iter:     0/   17 | loss: 0.488
	iter:    10/   17 | loss: 0.423
epoch 3 time: 0:00:00.114700 s
train loss: 0.288 | val loss: 0.794 | test loss: 1.655
EarlyStopping counter: 2 out of 20
epoch: 4
	iter:     0/   17 | loss: 0.254
	iter:    10/   17 | loss: 0.103
epoch 4 time: 0:00:00.116991 s
train loss: 0.239 | val loss: 0.620 | test loss: 0.738
EarlyStopping counter: 3 out of 20
epoch: 5
	iter:     0/   17 | loss: 0.301
	iter:    10/   17 | loss: 0.486
epoch 5 time: 0:00:00.132633 s
train loss: 0.295 | val loss: 0.718 | test loss: 1.570
EarlyStopping counter: 4 out of 20




epoch: 1
	iter:     0/   17 | loss: 0.813
	iter:    10/   17 | loss: 0.780
epoch 1 time: 0:00:00.162695 s
train loss: 0.813 | val loss: 0.834 | test loss: 1.333
validation loss decreased (inf --> 0.834)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.592
	iter:    10/   17 | loss: 0.178
epoch 2 time: 0:00:00.152935 s
train loss: 0.372 | val loss: 0.726 | test loss: 1.210
validation loss decreased (0.834 --> 0.726)
saving model
epoch: 3
	iter:     0/   17 | loss: 0.072
	iter:    10/   17 | loss: 0.381
epoch 3 time: 0:00:00.154038 s
train loss: 0.219 | val loss: 0.705 | test loss: 1.726
validation loss decreased (0.726 --> 0.705)
saving model
epoch: 4
	iter:     0/   17 | loss: 0.248
	iter:    10/   17 | loss: 0.048
epoch 4 time: 0:00:00.273070 s
train loss: 0.221 | val loss: 0.683 | test loss: 1.658
validation loss decreased (0.705 --> 0.683)
saving model
epoch: 5
	iter:     0/   17 | loss: 0.066
	iter:    10/   17 | loss: 0.510
epoch 5 time: 0:00:00.246321 s
train loss: 0.208 | val 



epoch: 1
	iter:     0/   17 | loss: 0.815
	iter:    10/   17 | loss: 0.838
epoch 1 time: 0:00:00.259645 s
train loss: 0.857 | val loss: 0.543 | test loss: 1.808
validation loss decreased (inf --> 0.543)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.659
	iter:    10/   17 | loss: 0.212
epoch 2 time: 0:00:00.217067 s
train loss: 0.447 | val loss: 0.825 | test loss: 1.915
EarlyStopping counter: 1 out of 20
epoch: 3
	iter:     0/   17 | loss: 0.078
	iter:    10/   17 | loss: 0.471
epoch 3 time: 0:00:00.207309 s
train loss: 0.240 | val loss: 0.928 | test loss: 2.191
EarlyStopping counter: 2 out of 20
epoch: 4
	iter:     0/   17 | loss: 0.247
	iter:    10/   17 | loss: 0.113
epoch 4 time: 0:00:00.187530 s
train loss: 0.241 | val loss: 0.944 | test loss: 1.802
EarlyStopping counter: 3 out of 20
epoch: 5
	iter:     0/   17 | loss: 0.162
	iter:    10/   17 | loss: 0.588
epoch 5 time: 0:00:00.221085 s
train loss: 0.239 | val loss: 0.940 | test loss: 2.200
EarlyStopping counter: 4 out of 20




epoch: 1
	iter:     0/   17 | loss: 0.768
	iter:    10/   17 | loss: 0.711
epoch 1 time: 0:00:00.183136 s
train loss: 0.730 | val loss: 0.618 | test loss: 0.933
validation loss decreased (inf --> 0.618)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.354
	iter:    10/   17 | loss: 0.201
epoch 2 time: 0:00:00.166526 s
train loss: 0.235 | val loss: 0.684 | test loss: 0.980
EarlyStopping counter: 1 out of 20
epoch: 3
	iter:     0/   17 | loss: 0.328
	iter:    10/   17 | loss: 0.385
epoch 3 time: 0:00:00.182578 s
train loss: 0.203 | val loss: 0.747 | test loss: 1.045
EarlyStopping counter: 2 out of 20
epoch: 4
	iter:     0/   17 | loss: 0.133
	iter:    10/   17 | loss: 0.044
epoch 4 time: 0:00:00.187087 s
train loss: 0.182 | val loss: 0.764 | test loss: 1.065
EarlyStopping counter: 3 out of 20
epoch: 5
	iter:     0/   17 | loss: 0.088
	iter:    10/   17 | loss: 0.470
epoch 5 time: 0:00:00.150284 s
train loss: 0.189 | val loss: 0.695 | test loss: 0.941
EarlyStopping counter: 4 out of 20




epoch: 1
	iter:     0/   17 | loss: 0.810
	iter:    10/   17 | loss: 0.786
epoch 1 time: 0:00:00.216151 s
train loss: 0.821 | val loss: 1.253 | test loss: 1.436
validation loss decreased (inf --> 1.253)
saving model
epoch: 2
	iter:     0/   17 | loss: 0.541
	iter:    10/   17 | loss: 0.268
epoch 2 time: 0:00:00.314674 s
train loss: 0.295 | val loss: 1.546 | test loss: 1.817
EarlyStopping counter: 1 out of 20
epoch: 3
	iter:     0/   17 | loss: 0.282
	iter:    10/   17 | loss: 0.375
epoch 3 time: 0:00:00.209094 s
train loss: 0.233 | val loss: 1.510 | test loss: 1.986
EarlyStopping counter: 2 out of 20
epoch: 4
	iter:     0/   17 | loss: 0.239
	iter:    10/   17 | loss: 0.051
epoch 4 time: 0:00:00.217098 s
train loss: 0.212 | val loss: 1.397 | test loss: 1.880
EarlyStopping counter: 3 out of 20
epoch: 5
	iter:     0/   17 | loss: 0.165
	iter:    10/   17 | loss: 0.533
epoch 5 time: 0:00:00.254641 s
train loss: 0.219 | val loss: 1.544 | test loss: 1.820
EarlyStopping counter: 4 out of 20


