## Modeling

I decided to be bold and use a pretrained BERT model. The model takes two inputs: a text feature built by combining the title and body, and a set of engineered features. From BERT I used the tokenization and the embeddings, and I built my own regression model for the head.

I chose a neural network a few layers deep. Depth should help with the non-linearities and multi-modalities. The (scaled) numerical features are meant to help bring down the bias. I used the Huber loss (SmoothL1Loss) for robust regression.
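
`nn.SmoothL1Loss` with its default `beta=1.0` matches the Huber loss with threshold 1: quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than squared error. A minimal pure-Python sketch of the per-sample formula (the `smooth_l1` helper is illustrative, not the PyTorch implementation):

```python
def smooth_l1(error: float, beta: float = 1.0) -> float:
    """Per-sample Smooth L1 (Huber-style) loss for a single prediction error."""
    abs_error = abs(error)
    if abs_error < beta:
        # Quadratic region: behaves like a scaled squared error.
        return 0.5 * abs_error * abs_error / beta
    # Linear region: grows like the absolute error, shifted by beta / 2.
    return abs_error - 0.5 * beta


small = smooth_l1(0.5)   # inside the quadratic region
large = smooth_l1(3.0)   # inside the linear region
```

In the linear region the loss equals the absolute error minus 0.5, so while both are measured on the same scale the reported loss should stay within about 0.5 of the MAE. The first epochs of the log fit that pattern (for example, a training loss of about 6.71 against an MAE of about 7.15 in epoch 1).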

Since I am working on an older M1 MacBook Air, computations were slow, so I did not get to do much hyperparameter tuning.

Another idea that could have sped up the computations (though I thought of it late) is to extract the BERT embeddings for our dataset once, treat them as features, combine them with the numerical features, and use them together in a regression model to predict the score.
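
Because the BERT weights are frozen in this notebook (`requires_grad = False`), the embeddings do not change between epochs, so they only need to be computed once. A minimal sketch of that idea, assuming a toy `toy_embed` function in place of the real encoder; `build_features` is a hypothetical helper name, not something from `utils4`:

```python
from typing import Callable, List, Sequence


def build_features(
    texts: Sequence[str],
    numerical: Sequence[Sequence[float]],
    embed: Callable[[str], Sequence[float]],
) -> List[List[float]]:
    """Embed each text once and append that row's numerical features."""
    return [list(embed(text)) + list(nums) for text, nums in zip(texts, numerical)]


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a frozen BERT encoder (2 dims instead of 768)."""
    return [float(len(text)), float(len(text.split()))]


texts = ["title one body", "short post"]
numerical = [[0.5, -1.0], [1.5, 0.0]]

# Computed once up front; every later epoch can reuse these rows.
features = build_features(texts, numerical, toy_embed)
```

The cached rows could then be fed to any regressor (a small MLP, ridge regression, gradient boosting), so each epoch would skip the BERT forward pass entirely.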

In [1]:
import numpy as np
import pandas as pd
from tqdm import tqdm

import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import StepLR

from sklearn.model_selection import train_test_split
from sklearn.metrics import (mean_absolute_error, 
                             mean_squared_error, 
                             mean_absolute_percentage_error,
                             r2_score)

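# project helpers (load_data, prepare_data, tokenize_data, TextDataset,
# create_data_loaders, RedditRegression, train, evaluate, device, NUMERICAL_FEATURES)
from utils4 import *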

In [2]:
# load the data
df = load_data('data_features.csv')
# Split into train/val/test sets (70/15/15)
df_train, df_val, df_test = prepare_data(df, test_size=.15)
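
The comment above says 70/15/15, but the run logs 95 training, 17 validation, and 20 test batches per epoch, so the validation and test sets are not the same size. `prepare_data` is not shown, so the following is an assumption: if it splits off the test set first and then reuses `test_size=.15` on the remainder, the split comes out closer to 72/13/15. A small sketch of the arithmetic (`two_stage_fractions` is an illustrative helper):

```python
def two_stage_fractions(test_size: float, val_size: float):
    """Return (train, val, test) fractions when the test set is split off
    first and the validation set is then split off the remainder."""
    remainder = 1.0 - test_size
    val = remainder * val_size
    return remainder - val, val, test_size


# Reusing 0.15 for both stages (assumed behaviour of prepare_data).
same_size = two_stage_fractions(0.15, 0.15)

# Second-stage share needed for an exact 70/15/15 split.
exact_val_size = 0.15 / (1.0 - 0.15)
exact = two_stage_fractions(0.15, exact_val_size)
```

Under that assumption `same_size` is roughly (0.7225, 0.1275, 0.15), while a second-stage share of about 0.176 would give the intended 70/15/15.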

In [3]:
# tokenize the data
train_encodings, train_labels, train_numerical_features = tokenize_data(
    df_train, df_train[NUMERICAL_FEATURES].values)
val_encodings, val_labels, val_numerical_features = tokenize_data(
    df_val, df_val[NUMERICAL_FEATURES].values)
test_encodings, test_labels, test_numerical_features = tokenize_data(
    df_test, df_test[NUMERICAL_FEATURES].values)

In [4]:
# create the custom datasets with numerical features
train_dataset = TextDataset(train_encodings, train_labels, train_numerical_features)
val_dataset = TextDataset(val_encodings, val_labels, val_numerical_features)
test_dataset = TextDataset(test_encodings, test_labels, test_numerical_features)


In [5]:
# create the data loaders
batch_size = 32
train_loader, val_loader, test_loader = create_data_loaders(train_dataset, 
                                                            val_dataset, 
                                                            test_dataset, 
                                                            batch_size)

In [6]:
# create the RedditRegression model instance and move it to the device
model = RedditRegression(num_numerical_features=len(NUMERICAL_FEATURES)).to(device)

# freeze the pre-trained weights
for param in model.bert.parameters():
    param.requires_grad = False

# define the loss function (Huber)
loss_fn = nn.SmoothL1Loss(reduction='mean')

# define the optimizer (only the unfrozen head parameters need updates)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=.01)

# define the scheduler
scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
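
With `step_size=3` and `gamma=0.1`, and `scheduler.step()` called once per epoch as in the training loop of this notebook, the learning rate is divided by 10 every three epochs. A pure-Python sketch of the resulting schedule (the `step_lr` helper is illustrative and not part of PyTorch):

```python
def step_lr(base_lr: float, step_size: int, gamma: float, epoch: int) -> float:
    """Learning rate in effect during `epoch` (0-based) under a StepLR-style decay."""
    return base_lr * gamma ** (epoch // step_size)


# Same settings as the notebook: lr=.01, step_size=3, gamma=0.1, 10 epochs.
schedule = [step_lr(0.01, 3, 0.1, epoch) for epoch in range(10)]

for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch}: lr = {lr:.0e}")
```

So epochs 1 to 3 train at 1e-2, epochs 4 to 6 at 1e-3, epochs 7 to 9 at 1e-4, and epoch 10 at 1e-5, which may be part of why the loss barely moves after the first few epochs.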


In [None]:
num_epochs = 10
for epoch in tqdm(range(num_epochs), desc='Epochs'):
    # epochs 1-5 report MAE/MAPE on the scaled target;
    # from epoch 6 on they are reported on the original scale
    original_scale = epoch > 4
    # train
    train_loss, train_mae, train_mape = train(model,
                                              train_loader, 
                                              loss_fn, 
                                              optimizer, 
                                              original_scale)
    print(
        f'Epoch {epoch+1}, Train Loss: {train_loss}, Train MAE: {train_mae}, Train MAPE: {train_mape}')
    scheduler.step()

    # evaluation on validation set
    val_loss, val_mae, val_mape, val_labels, val_yhats = evaluate(model,
                                                                  val_loader,
                                                                  loss_fn,
                                                                  original_scale)
    print(
        f'Epoch {epoch+1}, Val Loss: {val_loss}, Val MAE: {val_mae}, Val MAPE: {val_mape}')

    # evaluation on test set
    test_loss, test_mae, test_mape, test_labels, test_yhats = evaluate(model, 
                                                                       test_loader, 
                                                                       loss_fn, 
                                                                       original_scale)
    print(
        f'Epoch {epoch+1}, Test Loss: {test_loss}, Test MAE: {test_mae}, Test MAPE: {test_mape}')


Each epoch runs 95 training, 17 validation, and 20 test batches (about 4 s per training batch and 3 s per evaluation batch).

Epoch 1, Train Loss: 6.709235147426003, Train MAE: 7.147599697113037, Train MAPE: 3.98036527633667
Epoch 1, Val Loss: 1.047254502773285, Val MAE: 1.510587215423584, Val MAPE: 1.8130682706832886
Epoch 1, Test Loss: 1.0195763468742371, Test MAE: 1.4717755317687988, Test MAPE: 2.053283214569092

Epoch 2, Train Loss: 1.0017743549848859, Train MAE: 1.449711799621582, Train MAPE: 1.9635306596755981
Epoch 2, Val Loss: 0.7657754526418799, Val MAE: 1.1912546157836914, Val MAPE: 1.8312121629714966
Epoch 2, Test Loss: 0.8137224614620209, Test MAE: 1.23651123046875, Test MAPE: 2.1053876876831055

Epoch 3, Train Loss: 0.8269753236519662, Train MAE: 1.2169668674468994, Train MAPE: 1.8264468908309937
Epoch 3, Val Loss: 0.844310371314778, Val MAE: 1.221878170967102, Val MAPE: 1.1545239686965942
Epoch 3, Test Loss: 0.8201614022254944, Test MAE: 1.2078298330307007, Test MAPE: 1.3316726684570312

Epoch 4, Train Loss: 0.7904554332557477, Train MAE: 1.1738166809082031, Train MAPE: 1.710542917251587
Epoch 4, Val Loss: 0.7574149510439705, Val MAE: 1.1293179988861084, Val MAPE: 1.5021756887435913
Epoch 4, Test Loss: 0.7652749806642533, Test MAE: 1.148299217224121, Test MAPE: 1.7330493927001953

Epoch 5, Train Loss: 0.769518127253181, Train MAE: 1.1491562128067017, Train MAPE: 1.7437912225723267
Epoch 5, Val Loss: 0.7440699444097632, Val MAE: 1.1133381128311157, Val MAPE: 1.5200697183609009
Epoch 5, Test Loss: 0.7555551081895828, Test MAE: 1.1347354650497437, Test MAPE: 1.766058325767517

Epoch 6, Train Loss: 0.7671850351910842, Train MAE: 2948.9453125, Train MAPE: 2429999872.0
Epoch 6, Val Loss: 0.7393517950001884, Val MAE: 3244.37841796875, Val MAPE: 1259945344.0
Epoch 6, Test Loss: 0.7538920104503631, Test MAE: 3166.63525390625, Test MAPE: 4799095808.0
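
The jump in MAE and MAPE from epoch 6 coincides with `original_scale` switching to `True` (`epoch > 4`), while the loss keeps its earlier magnitude. The `train` and `evaluate` helpers are not shown, so the following is only a guess at why MAPE reaches the order of 1e9: if some targets on the original scale are zero or close to it, a MAPE that divides by a clamped |y| explodes. A minimal illustration with a hypothetical `mape` helper and made-up values:

```python
def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error as a fraction; eps guards against division by zero."""
    terms = [abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


# Made-up values: the same predictions with and without a zero-valued target.
no_zeros = mape([10.0, 200.0], [12.0, 180.0])
with_zero = mape([0.0, 10.0, 200.0], [1.0, 12.0, 180.0])

print(f"MAPE without zero targets: {no_zeros:.2f}")
print(f"MAPE with one zero target: {with_zero:.3e}")
```

If that is what happens here, MAE on the original scale is the more trustworthy number to compare across epochs.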



Training:   0%|                                          | 0/95 [00:00<?, ?it/s][A
Training:   0%|   | 0/95 [00:04<?, ?it/s, Loss=0.805, MAE=3.93e+3, MAPE=7.01e+9][A
Training:   1%| | 1/95 [00:04<06:40,  4.26s/it, Loss=0.805, MAE=3.93e+3, MAPE=7.[A
Training:   1%| | 1/95 [00:08<06:40,  4.26s/it, Loss=0.748, MAE=4.14e+3, MAPE=1.[A
Training:   2%| | 2/95 [00:08<06:35,  4.25s/it, Loss=0.748, MAE=4.14e+3, MAPE=1.[A
Training:   2%| | 2/95 [00:12<06:35,  4.25s/it, Loss=0.817, MAE=3.35e+3, MAPE=6.[A
Training:   3%| | 3/95 [00:12<06:31,  4.25s/it, Loss=0.817, MAE=3.35e+3, MAPE=6.[A
Training:   3%| | 3/95 [00:17<06:31,  4.25s/it, Loss=1.02, MAE=4.04e+3, MAPE=6.8[A
Training:   4%| | 4/95 [00:17<06:27,  4.25s/it, Loss=1.02, MAE=4.04e+3, MAPE=6.8[A
Training:   4%| | 4/95 [00:21<06:27,  4.25s/it, Loss=0.656, MAE=2.64e+3, MAPE=7.[A
Training:   5%| | 5/95 [00:21<06:22,  4.25s/it, Loss=0.656, MAE=2.64e+3, MAPE=7.[A
Training:   5%| | 5/95 [00:25<06:22,  4.25s/it, Loss=0.651, MAE=1.26e+3, MA

Epoch 7, Train Loss: 0.7633840454252143, Train MAE: 2961.05517578125, Train MAPE: 2723283968.0



Evaluation:   0%|                                        | 0/17 [00:00<?, ?it/s][A
Evaluation:   0%|  | 0/17 [00:03<?, ?it/s, Loss=1.15, MAE=3.16e+3, MAPE=7.02e+8][A
Evaluation:   6%| | 1/17 [00:03<00:51,  3.22s/it, Loss=1.15, MAE=3.16e+3, MAPE=7[A
Evaluation:   6%| | 1/17 [00:06<00:51,  3.22s/it, Loss=0.74, MAE=3.21e+3, MAPE=1[A
Evaluation:  12%| | 2/17 [00:06<00:48,  3.24s/it, Loss=0.74, MAE=3.21e+3, MAPE=1[A
Evaluation:  12%| | 2/17 [00:09<00:48,  3.24s/it, Loss=0.651, MAE=2.44e+3, MAPE=[A
Evaluation:  18%|▏| 3/17 [00:09<00:45,  3.23s/it, Loss=0.651, MAE=2.44e+3, MAPE=[A
Evaluation:  18%|▏| 3/17 [00:12<00:45,  3.23s/it, Loss=0.785, MAE=3.21e+3, MAPE=[A
Evaluation:  24%|▏| 4/17 [00:12<00:41,  3.22s/it, Loss=0.785, MAE=3.21e+3, MAPE=[A
Evaluation:  24%|▏| 4/17 [00:16<00:41,  3.22s/it, Loss=0.778, MAE=2.87e+3, MAPE=[A
Evaluation:  29%|▎| 5/17 [00:16<00:38,  3.22s/it, Loss=0.778, MAE=2.87e+3, MAPE=[A
Evaluation:  29%|▎| 5/17 [00:19<00:38,  3.22s/it, Loss=0.843, MAE=4.11e+3, 

Epoch 7, Val Loss: 0.7377164276207194, Val MAE: 3202.9443359375, Val MAPE: 1144637184.0



Evaluation:   0%|                                        | 0/20 [00:00<?, ?it/s][A
Evaluation:   0%| | 0/20 [00:03<?, ?it/s, Loss=0.912, MAE=1.96e+3, MAPE=8.89e+8][A
Evaluation:   5%| | 1/20 [00:03<01:02,  3.28s/it, Loss=0.912, MAE=1.96e+3, MAPE=[A
Evaluation:   5%| | 1/20 [00:06<01:02,  3.28s/it, Loss=0.585, MAE=2.64e+3, MAPE=[A
Evaluation:  10%| | 2/20 [00:06<00:58,  3.26s/it, Loss=0.585, MAE=2.64e+3, MAPE=[A
Evaluation:  10%| | 2/20 [00:09<00:58,  3.26s/it, Loss=0.803, MAE=3.49e+3, MAPE=[A
Evaluation:  15%|▏| 3/20 [00:09<00:55,  3.24s/it, Loss=0.803, MAE=3.49e+3, MAPE=[A
Evaluation:  15%|▏| 3/20 [00:12<00:55,  3.24s/it, Loss=0.6, MAE=2.26e+3, MAPE=2.[A
Evaluation:  20%|▏| 4/20 [00:12<00:51,  3.24s/it, Loss=0.6, MAE=2.26e+3, MAPE=2.[A
Evaluation:  20%|▏| 4/20 [00:16<00:51,  3.24s/it, Loss=0.743, MAE=2.93e+3, MAPE=[A
Evaluation:  25%|▎| 5/20 [00:16<00:48,  3.23s/it, Loss=0.743, MAE=2.93e+3, MAPE=[A
Evaluation:  25%|▎| 5/20 [00:19<00:48,  3.23s/it, Loss=0.696, MAE=3.44e+3, 

Epoch 7, Test Loss: 0.7540332734584808, Test MAE: 3021.92822265625, Test MAPE: 3797488384.0



Training:   0%|                                          | 0/95 [00:00<?, ?it/s][A
Training:   0%|    | 0/95 [00:04<?, ?it/s, Loss=1.04, MAE=2.19e+3, MAPE=8.22e+9][A
Training:   1%| | 1/95 [00:04<06:42,  4.28s/it, Loss=1.04, MAE=2.19e+3, MAPE=8.2[A
Training:   1%| | 1/95 [00:08<06:42,  4.28s/it, Loss=0.814, MAE=2.35e+3, MAPE=7.[A
Training:   2%| | 2/95 [00:08<06:37,  4.28s/it, Loss=0.814, MAE=2.35e+3, MAPE=7.[A
Training:   2%| | 2/95 [00:12<06:37,  4.28s/it, Loss=0.523, MAE=2.03e+3, MAPE=3.[A
Training:   3%| | 3/95 [00:12<06:33,  4.28s/it, Loss=0.523, MAE=2.03e+3, MAPE=3.[A
Training:   3%| | 3/95 [00:17<06:33,  4.28s/it, Loss=0.866, MAE=2.49e+3, MAPE=7.[A
Training:   4%| | 4/95 [00:17<06:29,  4.28s/it, Loss=0.866, MAE=2.49e+3, MAPE=7.[A
Training:   4%| | 4/95 [00:21<06:29,  4.28s/it, Loss=0.806, MAE=2.5e+3, MAPE=2.8[A
Training:   5%| | 5/95 [00:21<06:25,  4.28s/it, Loss=0.806, MAE=2.5e+3, MAPE=2.8[A
Training:   5%| | 5/95 [00:25<06:25,  4.28s/it, Loss=1.04, MAE=2.68e+3, MAP

Epoch 8, Train Loss: 0.7606632041303735, Train MAE: 2934.7890625, Train MAPE: 2541186560.0



Evaluation:   0%|                                        | 0/17 [00:00<?, ?it/s][A
Evaluation:   0%| | 0/17 [00:03<?, ?it/s, Loss=0.762, MAE=2.32e+3, MAPE=3.32e+8][A
Evaluation:   6%| | 1/17 [00:03<00:51,  3.21s/it, Loss=0.762, MAE=2.32e+3, MAPE=[A
Evaluation:   6%| | 1/17 [00:06<00:51,  3.21s/it, Loss=0.449, MAE=4.03e+3, MAPE=[A
Evaluation:  12%| | 2/17 [00:06<00:48,  3.22s/it, Loss=0.449, MAE=4.03e+3, MAPE=[A
Evaluation:  12%| | 2/17 [00:09<00:48,  3.22s/it, Loss=0.951, MAE=4.11e+3, MAPE=[A
Evaluation:  18%|▏| 3/17 [00:09<00:45,  3.22s/it, Loss=0.951, MAE=4.11e+3, MAPE=[A
Evaluation:  18%|▏| 3/17 [00:12<00:45,  3.22s/it, Loss=0.567, MAE=4.03e+3, MAPE=[A
Evaluation:  24%|▏| 4/17 [00:12<00:41,  3.22s/it, Loss=0.567, MAE=4.03e+3, MAPE=[A
Evaluation:  24%|▏| 4/17 [00:16<00:41,  3.22s/it, Loss=0.836, MAE=2.43e+3, MAPE=[A
Evaluation:  29%|▎| 5/17 [00:16<00:38,  3.22s/it, Loss=0.836, MAE=2.43e+3, MAPE=[A
Evaluation:  29%|▎| 5/17 [00:19<00:38,  3.22s/it, Loss=0.667, MAE=2.51e+3, 

Epoch 8, Val Loss: 0.7396089382031384, Val MAE: 3208.001953125, Val MAPE: 1135852160.0



Evaluation:   0%|                                        | 0/20 [00:00<?, ?it/s][A
Evaluation:   0%| | 0/20 [00:03<?, ?it/s, Loss=0.501, MAE=4.84e+3, MAPE=1.74e+7][A
Evaluation:   5%| | 1/20 [00:03<01:01,  3.22s/it, Loss=0.501, MAE=4.84e+3, MAPE=[A
Evaluation:   5%| | 1/20 [00:06<01:01,  3.22s/it, Loss=0.934, MAE=4.17e+3, MAPE=[A
Evaluation:  10%| | 2/20 [00:06<00:58,  3.24s/it, Loss=0.934, MAE=4.17e+3, MAPE=[A
Evaluation:  10%| | 2/20 [00:09<00:58,  3.24s/it, Loss=0.487, MAE=1.58e+3, MAPE=[A
Evaluation:  15%|▏| 3/20 [00:09<00:54,  3.23s/it, Loss=0.487, MAE=1.58e+3, MAPE=[A
Evaluation:  15%|▏| 3/20 [00:12<00:54,  3.23s/it, Loss=1.02, MAE=4.75e+3, MAPE=2[A
Evaluation:  20%|▏| 4/20 [00:12<00:51,  3.22s/it, Loss=1.02, MAE=4.75e+3, MAPE=2[A
Evaluation:  20%|▏| 4/20 [00:16<00:51,  3.22s/it, Loss=0.811, MAE=4.3e+3, MAPE=1[A
Evaluation:  25%|▎| 5/20 [00:16<00:48,  3.22s/it, Loss=0.811, MAE=4.3e+3, MAPE=1[A
Evaluation:  25%|▎| 5/20 [00:19<00:48,  3.22s/it, Loss=0.78, MAE=2.17e+3, M

Epoch 8, Test Loss: 0.7554276540875435, Test MAE: 3023.777099609375, Test MAPE: 3642918400.0



Epoch 9, Train Loss: 0.7619033041753267, Train MAE: 2935.14990234375, Train MAPE: 2599285248.0



Epoch 9, Val Loss: 0.7442489546888015, Val MAE: 3191.43505859375, Val MAPE: 1101428480.0



Epoch 9, Test Loss: 0.7492606431245804, Test MAE: 2999.86572265625, Test MAPE: 3245786624.0


