#### Contents

0. [Load data and preprocess](#Load-data-and-preprocess)
1. [Initialize VRAE object](#Initialize-VRAE-object)
2. [Fit the model onto dataset](#Fit-the-model-onto-dataset)
3. [Transform the input timeseries to encoded latent vectors](#Transform-the-input-timeseries-to-encoded-latent-vectors)
4. [Save the model to be fetched later](#Save-the-model-to-be-fetched-later)

### Import required modules

In [1]:
from model.vrae import VRAE

from model.utils import *
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset
from tqdm.notebook import trange
import tqdm

### Input parameters

In [2]:
dload = './saved_model'  # download directory

### Hyper parameters

### Load data and preprocess
- `folder` : data location
- `cols_to_remove` : columns to drop, i.e. columns that should not be generated

**TODO: confirm how the following variables should be handled**

~~~
YYYYMMDD : year, month, day
HHMMSS : hour, minute, second
MNG_NO : equipment number
IF_IDX : line index
~~~

- For now, only ['YYYYMMDD', 'HHMMSS'] are removed, for ease of analysis

In [3]:
# params
folder = 'data'
cols_to_remove = ['YYYYMMDD', 'HHMMSS']

# load data
df_total = load_data(folder, cols_to_remove)

# shape
print(df_total.shape)

(23195128, 56)


In [4]:
class HamonDataset(Dataset):
    """Non-overlapping, fixed-length windows over a 2-D array of time steps."""

    def __init__(self, data, window):
        self.data = torch.Tensor(data)
        self.window = window

    def __len__(self):
        # Count of full windows; the trailing -1 also leaves out the last full window.
        return len(self.data) // self.window - 1

    def __getitem__(self, index):
        # Rows [index * window, (index + 1) * window)
        x = self.data[index * self.window:(index + 1) * self.window]
        return x

In [5]:
data = df_total
window = 100

In [6]:
train_dataset = HamonDataset(data, window)
train_dataset

<__main__.HamonDataset at 0x7f998fda5668>

In [7]:
train_dataset[0].shape

torch.Size([100, 56])
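
The indexing used by `HamonDataset` can be checked without loading the data. The sketch below reproduces the same arithmetic in plain Python. `num_windows` and `window_bounds` are hypothetical helper names introduced here for illustration; they are not part of `model.utils`.

```python
def num_windows(n_rows, window):
    """Mirror HamonDataset.__len__: number of full windows, minus one."""
    return n_rows // window - 1


def window_bounds(index, window):
    """Mirror HamonDataset.__getitem__: half-open row range [start, stop)."""
    return index * window, (index + 1) * window


n_rows, window = 23195128, 100     # row count printed by In [3], window from In [5]
print(num_windows(n_rows, window)) # 231950
print(window_bounds(0, window))    # (0, 100)
print(window_bounds(1, window))    # (100, 200)
```

Consecutive indices map to adjacent, non-overlapping blocks of 100 rows, which matches the `torch.Size([100, 56])` shape shown above.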

**Fetch `sequence_length` from dataset**

In [8]:
sequence_length = train_dataset[0].shape[0]
sequence_length

100

**Fetch `number_of_features` from dataset**

This value is the number of input features per time step.

In [9]:
number_of_features = train_dataset[0].shape[1]
number_of_features

56

### Parameters

In [10]:
n_epochs = 50
hidden_size = 90
hidden_layer_depth = 1
latent_length = 20
batch_size = 32
learning_rate = 0.0005
dropout_rate = 0.2
optimizer = 'Adam' # options: Adam, SGD
cuda = True # options: True, False
print_every = 30
clip = True # options: True, False
max_grad_norm = 5
loss = 'MSELoss' # options: SmoothL1Loss, MSELoss
block = 'LSTM' # options: LSTM, GRU
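
To get a feel for what `batch_size` and `print_every` imply for this dataset, the sketch below counts batches per epoch. It is only arithmetic on the numbers shown above. `batches_per_epoch` is a hypothetical helper, and whether `VRAE` drops the last partial batch is an assumption, so both cases are computed.

```python
import math


def batches_per_epoch(n_windows, batch_size, drop_last):
    """Number of batches produced by one pass over the dataset."""
    if drop_last:
        return n_windows // batch_size
    return math.ceil(n_windows / batch_size)


n_windows = 23195128 // 100 - 1    # HamonDataset.__len__ for the data above
batch_size = 32
print_every = 30

# Assumption: it is not confirmed here whether VRAE drops the last partial batch.
dropped = batches_per_epoch(n_windows, batch_size, drop_last=True)
kept = batches_per_epoch(n_windows, batch_size, drop_last=False)
print(dropped, kept)               # 7248 7249
print(dropped // print_every)      # 241 progress lines per epoch
```

With these settings, one epoch is roughly 7,250 batches, so 50 epochs is a long run on the full dataset.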

### Initialize VRAE object

`VRAE` inherits from `sklearn.base.BaseEstimator` and provides `fit`, `transform` and `fit_transform` methods, following the scikit-learn estimator convention.
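
For readers unfamiliar with that convention, here is a minimal sketch using a toy class in plain Python. `MeanCenterer` is a hypothetical example and has nothing to do with the internals of `VRAE`; it only shows how `fit` learns state, `transform` applies it, and `fit_transform` chains the two.

```python
class MeanCenterer:
    """Toy estimator: fit learns per-column means, transform subtracts them."""

    def fit(self, rows):
        n = len(rows)
        # zip(*rows) iterates over columns
        self.means_ = [sum(col) / n for col in zip(*rows)]
        return self  # returning self allows method chaining

    def transform(self, rows):
        return [[v - m for v, m in zip(row, self.means_)] for row in rows]

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


rows = [[1.0, 10.0], [3.0, 30.0]]
print(MeanCenterer().fit_transform(rows))  # [[-1.0, -10.0], [1.0, 10.0]]
```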

In [11]:
vrae = VRAE(sequence_length=sequence_length,
            number_of_features = number_of_features,
            hidden_size = hidden_size, 
            hidden_layer_depth = hidden_layer_depth,
            latent_length = latent_length,
            batch_size = batch_size,
            learning_rate = learning_rate,
            n_epochs = n_epochs,
            dropout_rate = dropout_rate,
            optimizer = optimizer, 
            cuda = cuda,
            print_every=print_every, 
            clip=clip, 
            max_grad_norm=max_grad_norm,
            loss = loss,
            block = block,
            dload = dload)


### Fit the model onto dataset

In [None]:
vrae.fit(train_dataset)

# To save the model together with its learned parameters, use:
# vrae.fit(train_dataset, save = True)

Epoch: 0
Batch 30, loss = 19379261806739456.0000, recon_loss = 19379261806739456.0000, kl_loss = 0.3441
Batch 60, loss = 91916449072807936.0000, recon_loss = 91916449072807936.0000, kl_loss = 0.5754
Batch 90, loss = 199138993357455360.0000, recon_loss = 199138993357455360.0000, kl_loss = 0.8410
Batch 120, loss = 94359520960053248.0000, recon_loss = 94359520960053248.0000, kl_loss = 1.2230
Batch 150, loss = 3174932435435520.0000, recon_loss = 3174932435435520.0000, kl_loss = 1.4889
Batch 180, loss = 79180144463314944.0000, recon_loss = 79180144463314944.0000, kl_loss = 1.9412
Batch 210, loss = 83308724726267904.0000, recon_loss = 83308724726267904.0000, kl_loss = 2.2184
Batch 240, loss = 1843471904145408.0000, recon_loss = 1843471904145408.0000, kl_loss = 2.1812
Batch 270, loss = 37535862388424704.0000, recon_loss = 37535862388424704.0000, kl_loss = 1.9807
Batch 300, loss = 98044018784468992.0000, recon_loss = 98044018784468992.0000, kl_loss = 2.6538
Batch 330, loss = 14806789157552128.

### Save the model to be fetched later

In [None]:
vrae.save('vrae.pth')

# To load a presaved model, execute:
# vrae.load('vrae.pth')

### Transform the input timeseries to encoded latent vectors

In [None]:
latent_vectors = vrae.transform(train_dataset)
latent_vectors