In [1]:
import numpy as np
import pandas as pd

import torch
import torch.nn as nn


# Reading Data

In [2]:
pd.read_csv('TSForecasting/dataset/ETT-small/ETTh1.csv')

Unnamed: 0,date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
0,2016-07-01 00:00:00,5.827,2.009,1.599,0.462,4.203,1.340,30.531000
1,2016-07-01 01:00:00,5.693,2.076,1.492,0.426,4.142,1.371,27.787001
2,2016-07-01 02:00:00,5.157,1.741,1.279,0.355,3.777,1.218,27.787001
3,2016-07-01 03:00:00,5.090,1.942,1.279,0.391,3.807,1.279,25.044001
4,2016-07-01 04:00:00,5.358,1.942,1.492,0.462,3.868,1.279,21.948000
...,...,...,...,...,...,...,...,...
17415,2018-06-26 15:00:00,-1.674,3.550,-5.615,2.132,3.472,1.523,10.904000
17416,2018-06-26 16:00:00,-5.492,4.287,-9.132,2.274,3.533,1.675,11.044000
17417,2018-06-26 17:00:00,2.813,3.818,-0.817,2.097,3.716,1.523,10.271000
17418,2018-06-26 18:00:00,9.243,3.818,5.472,2.097,3.655,1.432,9.778000


In [3]:
from TSForecasting.data_provider.data_factory import data_provider

In [4]:
dataset, dataloader = data_provider(root_path='.', data_path='TSForecasting/dataset/ETT-small/ETTh1.csv',
                                    flag='test', features='MS', target='OT', data='ETTh1', 
                                    batch_size=32, freq='d', seq_len=7, label_len=0, pred_len=1,
                                    embed='timeF')


Mode: test; datapath: TSForecasting/dataset/ETT-small/ETTh1.csv, flag: test; features: MS, target: OT, data: ETTh1, batch_size: 32, freq: d, seq_len: 7, label_len: 0, pred_len: 1, embed: timeF
test 2880


- **label_len: the number of immediate past time steps the model uses as context while predicting future values. These steps are part of the target sequence but are already known at prediction time. Here label_len=0, so no such context is included.**
- **pred_len: the number of future time steps the model is supposed to predict (here 1).**
- **label_len + pred_len is the length of the target sequence (here 0 + 1 = 1).**
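To make the relationship between the three lengths concrete, here is a minimal sketch of how one sample's input and target windows could be located. The function name `window_bounds` is hypothetical, and the convention that the label segment overlaps the end of the input window is an assumption based on the common Informer-style layout, not something read from the `TSForecasting` source.

```python
def window_bounds(index, seq_len, label_len, pred_len):
    """Return (input_start, input_end, target_start, target_end) for one sample.

    Assumed convention: the target window starts label_len steps before the
    end of the input window and extends pred_len steps past it.
    """
    s_begin = index
    s_end = s_begin + seq_len
    r_begin = s_end - label_len
    r_end = r_begin + label_len + pred_len
    return s_begin, s_end, r_begin, r_end


# Settings used in this notebook: no label segment, one-step forecast.
print(window_bounds(0, seq_len=7, label_len=0, pred_len=1))  # (0, 7, 7, 8)
# A hypothetical setting with a 3-step label segment and a 2-step forecast.
print(window_bounds(0, seq_len=7, label_len=3, pred_len=2))  # (0, 7, 4, 9)
```

With label_len=0 the target window begins exactly where the input window ends, which matches the single target step seen in this notebook.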

In [5]:
dataset.data_x

array([[-0.16717427, -0.51817436, -0.22500761, ...,  0.37736106,
         0.58477531, -0.61693099],
       [-0.17870065, -0.93489991, -0.14781685, ..., -0.06913615,
        -0.09116118, -0.7166423 ],
       [ 0.1323397 , -0.48611856, -0.05123777, ...,  0.88345648,
         0.77835355, -0.74726404],
       ...,
       [ 1.63042616,  0.34685411,  1.30105059, ...,  2.40076508,
        -0.28473923, -1.65959533],
       [ 1.35379273,  0.21863081,  1.09484601, ...,  1.86535938,
        -0.38152835, -1.62886464],
       [ 1.03122593,  0.09040762,  0.86961562, ...,  1.18046983,
        -0.42912948, -1.61360825]])

In [6]:
dataset.data_y

array([[-0.16717427, -0.51817436, -0.22500761, ...,  0.37736106,
         0.58477531, -0.61693099],
       [-0.17870065, -0.93489991, -0.14781685, ..., -0.06913615,
        -0.09116118, -0.7166423 ],
       [ 0.1323397 , -0.48611856, -0.05123777, ...,  0.88345648,
         0.77835355, -0.74726404],
       ...,
       [ 1.63042616,  0.34685411,  1.30105059, ...,  2.40076508,
        -0.28473923, -1.65959533],
       [ 1.35379273,  0.21863081,  1.09484601, ...,  1.86535938,
        -0.38152835, -1.62886464],
       [ 1.03122593,  0.09040762,  0.86961562, ...,  1.18046983,
        -0.42912948, -1.61360825]])

- **Why are there two arrays, data_x and data_y?**
- **The two outputs above are identical: both arrays appear to hold the same scaled series. The input and target windows are presumably sliced from them for each sample rather than stored separately.**
- **Suppose the dataset represents hourly temperature readings, and you're training a model to predict the next 24 hours from the past 48 hours.**
- **For each sample, the input window (the past 48 hours) is taken from data_x, and the target window (the next 24 hours, which the model attempts to predict) is taken from data_y.**
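The outputs of `dataset.data_x` and `dataset.data_y` above are identical, which suggests that the input/target split happens when a sample is fetched. The sketch below illustrates that idea with a hypothetical `WindowDataset` class over a synthetic array; the slicing convention is an assumption and may differ from what `data_provider` actually does.

```python
import numpy as np


class WindowDataset:
    """Map-style dataset that slices input/target windows from one array."""

    def __init__(self, data, seq_len, label_len, pred_len):
        # Both attributes hold the same series; only the slices differ.
        self.data_x = data
        self.data_y = data
        self.seq_len = seq_len
        self.label_len = label_len
        self.pred_len = pred_len

    def __len__(self):
        # Number of start positions that leave room for a full target window.
        return len(self.data_x) - self.seq_len - self.pred_len + 1

    def __getitem__(self, index):
        s_end = index + self.seq_len
        r_begin = s_end - self.label_len
        r_end = r_begin + self.label_len + self.pred_len
        return self.data_x[index:s_end], self.data_y[r_begin:r_end]


series = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
ds = WindowDataset(series, seq_len=7, label_len=0, pred_len=1)
seq_x, seq_y = ds[0]
print(len(ds), seq_x.shape, seq_y.shape)  # 3 (7, 2) (1, 2)
```

Storing the series once and slicing on demand avoids materialising every overlapping window in memory.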

In [7]:
# Grab only the first batch from the loader
i = next(iter(dataloader))

In [8]:
len(i)

4

In [9]:
i[0].shape, i[1].shape, i[2].shape, i[3].shape

(torch.Size([32, 7, 7]),
 torch.Size([32, 1, 7]),
 torch.Size([32, 7, 3]),
 torch.Size([32, 1, 3]))

- **The data loader returns a batch of 4 tensors on each iteration, each with batch_size=32 as its first dimension.**
- **These tensors are the different components of the data used for training or evaluation in time series forecasting.**
- **Input sequence (i[0]), shape [32, 7, 7]: the model's input, seq_len=7 time steps of 7 features, from which it learns patterns.**
- **Target sequence (i[1]), shape [32, 1, 7]: what the model aims to predict. It spans label_len + pred_len time steps (here 0 + 1 = 1). When label_len > 0, the label_len part can guide the predictions for the pred_len steps.**
- **Input time encodings (i[2]), shape [32, 7, 3]: these give the model additional context about the temporal position of each input step (e.g., which hour, day, or month it corresponds to), which helps it capture temporal dependencies. Here each step has 3 encoded time features.**
- **Target time encodings (i[3]), shape [32, 1, 3]: the same kind of encodings for the target sequence.**
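As an illustration of what a 3-feature time encoding could look like, the sketch below maps each timestamp to day-of-week, day-of-month, and day-of-year values scaled to [-0.5, 0.5]. The function name `daily_time_features` is hypothetical, and the choice and scaling of features are assumptions modelled on common practice for daily-frequency encodings, not taken from the `TSForecasting` code.

```python
import numpy as np
import pandas as pd


def daily_time_features(dates):
    """Encode timestamps as three features, each scaled to [-0.5, 0.5]."""
    idx = pd.DatetimeIndex(dates)
    return np.column_stack([
        idx.dayofweek / 6.0 - 0.5,           # Monday=0 ... Sunday=6
        (idx.day - 1) / 30.0 - 0.5,          # day of month, 1..31
        (idx.dayofyear - 1) / 365.0 - 0.5,   # day of year, 1..366
    ])


marks = daily_time_features(pd.date_range("2016-07-01", periods=7, freq="D"))
print(marks.shape)  # (7, 3)
```

One row per time step and three columns per row would give the trailing dimensions `[7, 3]` seen in `i[2]` once batched.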

In [10]:
i[1][0]

tensor([[ 0.3513,  0.6995,  0.4639,  0.5533, -0.3964,  0.2468, -0.8623]],
       dtype=torch.float64)

In [12]:
pd.read_csv('./data/test.csv')

Unnamed: 0,warehouse,date,holiday_name,holiday,shops_closed,winter_school_holidays,school_holidays,id
0,Prague_1,2024-03-16,,0,0,0,0,Prague_1_2024-03-16
1,Prague_1,2024-03-17,,0,0,0,0,Prague_1_2024-03-17
2,Prague_1,2024-03-18,,0,0,0,0,Prague_1_2024-03-18
3,Prague_1,2024-03-19,,0,0,0,0,Prague_1_2024-03-19
4,Prague_1,2024-03-20,,0,0,0,0,Prague_1_2024-03-20
...,...,...,...,...,...,...,...,...
392,Budapest_1,2024-05-11,,0,0,0,0,Budapest_1_2024-05-11
393,Budapest_1,2024-05-12,,0,0,0,0,Budapest_1_2024-05-12
394,Budapest_1,2024-05-13,,0,0,0,0,Budapest_1_2024-05-13
395,Budapest_1,2024-05-14,,0,0,0,0,Budapest_1_2024-05-14


In [13]:
397/32

12.40625
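The division above appears to estimate how many batches of 32 the 397 rows of `test.csv` would fill. As a small sketch, assuming each row is one sample and the final partial batch is kept (PyTorch's `DataLoader` keeps it by default, since `drop_last=False`), the count can be made explicit:

```python
import math

n_rows, batch_size = 397, 32

# Split into full batches plus the leftover rows in the last batch.
full_batches, remainder = divmod(n_rows, batch_size)

# Total number of batches when the partial batch is kept.
n_batches = math.ceil(n_rows / batch_size)

print(full_batches, remainder, n_batches)  # 12 13 13
```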