## Credits:
- https://www.kaggle.com/code/jhjh97/sleep-critical-point-train/edit
- https://www.kaggle.com/code/jhjh97/detect-sleep-states-train/edit

## PATHS  
- Data Preparation: https://www.kaggle.com/code/jhjh97/transformer-data-preprocessing  
- TRAIN - this  
- INFERENCE - not yet

# Import and Config

In [1]:
import pandas as pd
import numpy as np
import gc
import os, glob
import random
import math

from tqdm.auto import tqdm 
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torch.optim import AdamW
from sklearn.metrics import average_precision_score
from timm.scheduler import CosineLRScheduler

from collections import OrderedDict
from transformers import get_cosine_schedule_with_warmup
import time
from metric import event_detection_ap



In [2]:
class CONFIG:
    # PATHS
    INPUT_DIR = '/kaggle/input/sample'
    
    # Fundamental config
    NOTDEBUG = True # False -> DEBUG, True -> normally train
    WORKERS = os.cpu_count()-1
    N_FOLDS = 5
    TRAIN_FOLD = 0
    MAX_LEN = 2**14
    USE_AMP = True
    SEED = 42
    
    # Data config
    ROLLING_RANGES = [17, 33, 65]
    N_DAYS_PER_SAMPLE = 3 # same with data prep config
    SHUFFLE=True
    FOLDS=5
    
    # Model config
    IN_CHANNEL = 1 + len(ROLLING_RANGES)*2
    SEQ_LEN = 24*60*12*N_DAYS_PER_SAMPLE
    D_MODEL = 64 if NOTDEBUG else 16
#     KERNEL_SIZE = 30
    N_BLKS = 1 if NOTDEBUG else 1
    DROPOUT = 0.2
    
    # Optimizer config
    LR = 5e-4
    WD = 1e-2
    WARMUP_PROP = 0.1
    # LR_INIT = 1e-4
    # LR_MIN = 1e-5
    
    # Train config
    EPOCHS = 20
    BS = 4
#     MAX_GRAD_NORM = 2.
#     GRAD_ACC = 32 // BS

In [3]:
def torch_fix_seed(seed=42):
    # Python random
    random.seed(seed)
    # Numpy
    np.random.seed(seed)
    # Pytorch
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # torch.backends.cudnn.deterministic = True
    # torch.use_deterministic_algorithms = True
    # torch.backends.cudnn.benchmark = True

torch_fix_seed(CONFIG.SEED)
torch.set_default_dtype(torch.float32)

In [4]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

'cuda'

# Model

## Model Version1 : Encoder Only
### 모델 의도 설명
- 이미 attention에서 주변 맥락 압축해 현재 seq에 존재해야할 feature을 추렸어.
- 주변과 전체 맥락을 보고 한번 좀 정리를 한거지.
- 그리고 바로 한번 output 뽑아볼려고. decoder 없어도 처음에 의도했던 (주변 맥락을 deeplearning 적으로 추리기 + transformer)요소를 다 포함한 것 같아서.

### Hmm..
- attention이 너무 멀리보는것 같다는 우려가 난 있어 3일치 데이터니까, 1일차의 데이터가 3일차의 데이터와 연관도는 전혀 없을거거든. 이 부분 더 구체화 해보자.
     - 데이터를 1일치로 낮춰봐도 좋을 듯?
         - 근데 이러면 밤 12시 30분에 잔 애기는 앞에 30분의 데이터만으로 onset을 판단하도록 모델이 학습됨. 이건 모델에게서 더 멀리볼 수 있을 여지를 방해하는것같음. 얼마나 전 데이터까지가 연관있는지 모르니깐,,, 일단 전날 훗날 6시간씩 줘보면 적당하지 않을까?
     - 혹은 데이터를 1일치 + 8시간? 으로 낮춰봐도 좋을듯?  
     - **결론적으로! 요 세개에 대해 실험 해보자!**
         1. 데이터 앞뒤+1일(총 3일)
         2. 데이터 1일 (진짜 성능 떨어짐?)
         3. 데이터 1일+6시간 (제일 적절할거라 예상)

### Aggregator

- 목적은? 
    - N steps만큼을 요약해서 요약된 것들끼리 attention을 구하는거야.
    - 실제로 5초 단위의 데이터가 의미있을 것 같진 않고, 여러 step들이 묶여서 어떤 패턴을 가질 때 의미가 있을 것 같거든
    - 그래서 1분, 3분, 5분, ... 단위로 Aggregation을 시켜서 이 뭉탱이들 끼리의 attention을 구하도록 유도하는게 이 Aggregator의 목적이야.
    - 제일 적절한 aggregation 범위는 실험으로 얻어야 할 것 같구.  


- 방법은?
    - 핵심은 주어진 scale만큼으로 seq를 요약해주는 것이야.
    - 어떤 pooling 방식이든, stride가 aggregate_scale과 일치하면 괜찮을 것 같아.
    - AvgPooling으로 특정 stride써서 얻을 수 도 있고, CNN으로 지금 sacle보다 더 넓게 보면서 구해도 좋을 것 같고.
        - 5초 단위의 변화가 중요한 요소일 수 도 있는데, pooling이 적절할지 모르겠네.
    - 가장 적절한 방식은 실험으로 얻어야겠지만, 어떤 scale범위를 대표하는 features를 만들기위해 더 넓게 본다면 (지금 task에서 event가 상당히 sparse하기 때문에)탐색범위가 넓어져서 훨씬 좋을 것 같네.
    - 그래서 예상으론 일반 Pooling 방식들보단 CNN같은 더 넓게 보는 방식이 더 좋은 결과를 낼 것 같아.
    

- **IDEAS TO BE TESTED**
    1. CNN 한번으로 많이 요약하지 말고, 짧은 CNN 여러개로 계층적으로 요약하기(ResNet 방식처럼)
        1. 직렬적
        2. 병렬적
        - 1/12, 1/12, 1/4 요약할 예정.
    2. 여러 seq에 대한 attention 사용하기.
        - Seq를 1시간 단위로 압축한, 30분단위로 압축한 5분 단위로 압축된 attn (Seq d) 들 구해서, 걔들을 sum해서 사용하는건 어떨까?
    3. 다양한 kernel_size/stride 테스트(초반)

In [None]:
# class Aggregator(nn.Module):
#     # Experiments needed
    
#     def __init__(self, d_model, strategy, agg_scale=12):
#         super().__init__()
#         if strategy=='avg':
#             self.pool = nn.AvgPool1d(kernel_size=agg_scale, stride=agg_scale, ceil_mode=True)
#         elif strategy=='max':
#             self.pool = nn.MaxPool1d(kernel_size=agg_scale, stride=agg_scale, ceil_mode=True)
#         elif strategy=='cnn1':
#             self.pool = nn.Sequential(
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#             )
#         elif strategy=='cnn2':
#             self.pool = nn.Sequential(
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#             )
#         elif strategy=='cnn3':
#             self.pool = nn.Sequential(
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=6, padding=4), nn.ReLU(),
#                 nn.Conv1d(in_channels=d_model, out_channels=d_model, kernel_size=10, stride=5, padding=5), nn.ReLU(),

#             )
#         else:
#             assert False, "Choose proper strategy among: 'max', 'avg', 'cnn1,2,3'"
    
#     def forward(self, x):
#         x = torch.permute(x, (0,2,1))
#         x = self.pool(x)
#         x = torch.permute(x, (0,2,1))
#         return x

In [None]:
# # Self Attention
# class TimeAttention(nn.Module):
    
#     def __init__(self, d_model, pooling_strategy, dropout=0.1, agg_scale=12):
#         super().__init__()
#         self.scale = d_model ** 0.5
#         self.strategy = pooling_strategy
#         self.agg_sacle = agg_scale
#         self.aggregate_q = Aggregator(d_model, pooling_strategy, agg_scale)  # 37056
#         self.aggregate_v = Aggregator(d_model, pooling_strategy, agg_scale)  # 37056
#         self.w_k = nn.Linear(d_model, d_model, bias = False)  # params= 4096
        
#         self.attn_dropout = nn.Dropout(dropout)
#         self.dropout = nn.Dropout(dropout)
#         self.layer_norm = nn.LayerNorm(d_model, eps=1e-6)
        
#     def forward(self, q, k, v, mask=None):
#         residual = k.clone()
        
#         q = self.aggregate_q(q)
#         k = self.w_k(k)
#         v = self.aggregate_v(v)
        
#         attn = torch.matmul(k / self.scale, q.transpose(-1,-2))  # (seq d)*(d seq/12) = (seq, seq/12)
#         torch.cuda.memory_summary()
        
#         if mask is not None:
#             attn = attn.masked_fill(mask==0, -1e9)
#         attn = self.attn_dropout(F.softmax(attn, dim= -1))  # (seq *seq/12)
#         x = torch.matmul(attn, v)  # (seq *seq/12)*(seq/12 d)
#         x = self.dropout(x)
#         x += residual
#         out = self.layer_norm(x)
        
#         # returning q,v might be potential problem.
#         return out

In [None]:
# class FeedForwardNN(nn.Module):
    
#     def __init__(self, d_model, d_hid, dropout=0.1):
#         super().__init__()
#         self.w_1 = nn.Linear(d_model, d_hid)  # params= 16384
#         self.w_2 = nn.Linear(d_hid, d_model)  # params= 16384
#         self.layer_norm = nn.LayerNorm(d_model, eps=1e-6)
#         self.dropout = nn.Dropout(dropout)

        
#     def forward(self, x):
#         residual = x.clone()
#         x = self.w_2(F.relu(self.w_1(x)))
#         x = self.dropout(x)
#         x += residual
#         out = self.layer_norm(x)
        
#         return out

In [None]:
# # Serial Architecture
# class EncoderLayer(nn.Module):
    
#     def __init__(self,d_model, d_hid, dropout=0.1, agg_scale=12):
#         super().__init__()
#         self.attention1 = TimeAttention(d_model, 'cnn1', dropout, agg_scale)
#         self.attention2 = TimeAttention(d_model, 'cnn2', dropout, agg_scale)
#         self.attention3 = TimeAttention(d_model, 'cnn3', dropout, agg_scale)
#         self.ffnn = FeedForwardNN(d_model, d_hid, dropout)
        
#     def forward(self, x):
#         # attn*3, ffnn
#         x = self.attention1(x, x, x)
#         x = self.attention2(x, x, x)
#         x = self.attention3(x, x, x)
        
#         x = self.ffnn(x)
        
#         return x

In [None]:
# class Encoder(nn.Module):
    
#     def __init__(self, n_blocks, in_channels, out_classes, d_model, d_hid, dropout=0.1, agg_scale=12):
#         super(Encoder, self).__init__()
#         self.fc_input = nn.Linear(in_channels, d_model, bias=True)  # bias true?
#         self.encoder_blks = nn.ModuleList([EncoderLayer(d_model, d_hid, dropout, agg_scale) for _ in range(n_blocks)])
#         self.fc_output = nn.Linear(d_model, out_classes)
        
#     def forward(self, x):
        
#         x = self.fc_input(x) # (B Seq d)
#         for idx, encoder in enumerate(self.encoder_blks):
#             x = encoder(x)
#         x = self.fc_output(x)
        
#         return x

## Model BiGRU

In [5]:
class ResidualBiGRU(nn.Module):
    def __init__(self, hidden_size, n_layers=1, bidir=True):
        super(ResidualBiGRU, self).__init__()

        self.hidden_size = hidden_size
        self.n_layers = n_layers

        self.gru = nn.GRU(
            hidden_size,
            hidden_size,
            n_layers,
            batch_first=True,
            bidirectional=bidir,
        )
        dir_factor = 2 if bidir else 1
        self.fc1 = nn.Linear(
            hidden_size * dir_factor, hidden_size * dir_factor * 2
        )
        self.ln1 = nn.LayerNorm(hidden_size * dir_factor * 2)
        self.fc2 = nn.Linear(hidden_size * dir_factor * 2, hidden_size)
        self.ln2 = nn.LayerNorm(hidden_size)

    def forward(self, x, h=None):
        res, new_h = self.gru(x, h)
        # res.shape = (batch_size, sequence_size, 2*hidden_size)

        res = self.fc1(res)
        res = self.ln1(res)
        res = nn.functional.relu(res)

        res = self.fc2(res)
        res = self.ln2(res)
        res = nn.functional.relu(res)

        # skip connection
        res = res + x

        return res, new_h

class MultiResidualBiGRU(nn.Module):
    def __init__(self, input_size, hidden_size, out_size, n_layers, bidir=True):
        super(MultiResidualBiGRU, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.out_size = out_size
        self.n_layers = n_layers

        self.fc_in = nn.Linear(input_size, hidden_size)
        self.ln = nn.LayerNorm(hidden_size)
        self.res_bigrus = nn.ModuleList(
            [
                ResidualBiGRU(hidden_size, n_layers=1, bidir=bidir)
                for _ in range(n_layers)
            ]
        )
        self.fc_out = nn.Linear(hidden_size, out_size)

    def forward(self, x, h=None):
        # if we are at the beginning of a sequence (no hidden state)
        if h is None:
            # (re)initialize the hidden state
            h = [None for _ in range(self.n_layers)]

        x = self.fc_in(x)
        x = self.ln(x)
        x = nn.functional.relu(x)

        new_h = []
        for i, res_bigru in enumerate(self.res_bigrus):
            x, new_hi = res_bigru(x, h[i])
            new_h.append(new_hi)

        x = self.fc_out(x)
#         x = F.normalize(x,dim=0)
        return x, new_h  # log probabilities + hidden states

## Model Version2: with Downsampling & Upsampling

layer normalization 어디에 써야하고, batchnorm인지 layernorm이 적절할지 어떻게 판단할 수 있을까?

In [6]:
class DownSampler(nn.Module):

    def __init__(self, scale, d_model):

        super().__init__()
        # 5sec to 5min for one step (scale=60)
        self.cnn1 = nn.Conv1d(d_model, d_model, kernel_size=10, stride=6, padding=2)
        self.cnn2 = nn.Conv1d(d_model, d_model, kernel_size=7, stride=5, padding=1)
        self.cnn3 = nn.Conv1d(d_model, d_model, kernel_size=4, stride=2, padding=1)
        
    def forward(self, x):
        # 51840
        x = self.cnn1(x)
        x = self.cnn2(x)
        x = self.cnn3(x)

        return x  # 864.0

In [7]:
class MyAttention(nn.Module):

    def __init__(self, d_model, d_hid, dropout=0.1):
        super().__init__()
        self.scale = d_model ** 0.5
        self.conv_q = nn.Conv1d(d_model, d_model, kernel_size=6, stride=2, padding=2)
        self.conv_v = nn.Conv1d(d_model, d_model, kernel_size=6, stride=2, padding=2)
        self.conv_k = nn.Conv1d(d_model, d_model, kernel_size=6, stride=2, padding=2)
        self.upsample = nn.Upsample(scale_factor=2)
        self.conv_attn = nn.Conv1d(d_model, d_model, kernel_size=5, stride=1, padding=2)

        self.attn_dropout = nn.Dropout(dropout)
        self.layer_norm1 = nn.LayerNorm(d_model, eps=1e-6)  # maybe batchnorm
        self.dropout1 = nn.Dropout(dropout)

        self.w_1 = nn.Linear(d_model, d_hid)  # params= 16384
        self.w_2 = nn.Linear(d_hid, d_model)  # params= 16384
        self.layer_norm2 = nn.LayerNorm(d_model, eps=1e-6)
        self.dropout2 = nn.Dropout(dropout)
        
    def forward(self, x):
        # x(B Seq D)
        residual = x.clone()  # ?
        q = self.conv_q(x)
        k = self.conv_k(x)
        v = self.conv_v(x)
        attn = torch.matmul(k/self.scale, q.transpose(-1,-2))
        attn = self.attn_dropout(F.softmax(attn, dim=1))
        x = torch.matmul(attn, v)  # (Lk_sm Lq)*(Lv d) ?
        x = self.dropout1(x)
        x = self.layer_norm1(x.transpose(-1,-2)).transpose(-1,-2) # !
        x = self.upsample(x)
        x = self.conv_attn(x)
        x += residual

        residual = x.clone()
        x = self.w_2(F.relu(self.w_1(x.transpose(-1,-2))))
        x = self.dropout2(x).transpose(-1,-2)
        x += residual
        x = self.layer_norm2(x.transpose(-1,-2)).transpose(-1,-2)
        # x(B Seq D)
        return x

In [8]:
class UpSampler(nn.Module):
    def __init__(self, scale, d_model):
        super().__init__()
        self.upsample1 = nn.Upsample(scale_factor=5)
        self.upsample2 = nn.Upsample(scale_factor=3)
        self.upsample3 = nn.Upsample(scale_factor=2)
        self.upsample4 = nn.Upsample(scale_factor=2)
        self.conv1 = nn.Conv1d(in_channels=d_model,out_channels=d_model, kernel_size=25,stride=1,padding=12)
        self.conv2 = nn.Conv1d(in_channels=d_model,out_channels=d_model, kernel_size=15,stride=1,padding=7)
        self.conv3 = nn.Conv1d(in_channels=d_model,out_channels=d_model, kernel_size=11,stride=1,padding=5)
        self.conv4 = nn.Conv1d(in_channels=d_model,out_channels=d_model, kernel_size=11,stride=1,padding=5)

        
    def forward(self, x):
        x = self.conv1(self.upsample1(x))
        x = self.conv2(self.upsample2(x))
        x = self.conv3(self.upsample3(x))
        x = self.conv4(self.upsample4(x))

        return x

In [9]:
class MyModel(nn.Module):
    def __init__(self,in_c, out_c, scale, d_model, d_hid, n_blks):
        super().__init__()
        self.fc_input = nn.Linear(in_c, d_model)
        self.enc_blks = nn.ModuleList([
            nn.Sequential(DownSampler(scale, d_model), 
                          MyAttention(d_model, d_hid),
                          UpSampler(scale, d_model) ) 
            for _ in range(n_blks)])
                
        self.fc_output = nn.Linear(d_model, out_c)
    
    def forward(self, x):
        x = self.fc_input(x)
        x = x.transpose(-1,-2)
        for enc_blk in self.enc_blks:
            x = enc_blk(x)
        x = self.fc_output(x.transpose(-1,-2))

        return x

torch.Size([2, 64, 864]) in downsampler out shape  
torch.Size([2, 64, 51840]) in upsampler out shape  
torch.Size([2, 64, 864]) in downsampler out shape  
torch.Size([2, 64, 51840]) in upsampler out shape  
torch.Size([2, 64, 864]) in downsampler out shape  
torch.Size([2, 64, 51840]) in upsampler out shape  
be fc_output,  torch.Size([2, 64, 51840])  
out shape torch.Size([2, 51840, 2])

In [None]:
# class MyModel_out_zero(nn.Module):
#     def __init__(self, in_c, out_c, scale, d_model, d_hid, n_blks):
#         super().__init__()
#         self.fc_input = nn.Linear(in_c, d_model)
#         self.enc_blks = nn.ModuleList([
#             nn.Sequential(DownSampler(scale, d_model), 
#                           MyAttention(d_model, d_hid),
#                           UpSampler(scale, d_model) ) 
#             for _ in range(n_blks)])
                
#         self.fc_output = nn.Linear(d_model, out_c)
    
#     def forward(self, x):
#         x = self.fc_input(x)
#         x = x.transpose(-1,-2)
#         for enc_blk in self.enc_blks:
#             x = enc_blk(x)
#         x = self.fc_output(x.transpose(-1,-2))
#         return x*0.0

# Dataset

## 1. Read from csv at each call.

In [None]:
class SleepDatasetCSV(Dataset):
    
    def __init__(self, input_paths, gen_feat, feat_type, mode):
        """
        gen_feat<bool>: rolling mean, std
        feat_type<str>: enmo, anglez
        mode: 'trainval', test'
        """
        self.gen_feat = gen_feat
        self.feat_type = feat_type
        self.mode = mode
        self.input_paths = input_paths
        
    def __len__(self):
        return len(self.input_paths)
    
    def generate_features(self, X):
        for r in CONFIG.ROLLING_RANGES:
            tmp_feat = X[f'{self.feat_type}'].rolling(r, center=True)
            X[f'{self.feat_type}_mean_{r}'] = tmp_feat.mean()
            X[f'{self.feat_type}_std_{r}'] = tmp_feat.std()
        return X.drop(columns=['onset','wakeup']).fillna(0)
    
    def __getitem__(self, index):
        path = self.input_paths[index]
        XY = pd.read_csv(path)
        
        if self.gen_feat:
            X = self.generate_features(XY).to_numpy()
        else:
            X = XY[self.feat_type].copy().to_numpy()
            
        if self.mode=='trainval':
            Y = XY[['onset','wakeup']].copy().to_numpy()
            gc.collect()
            return X,Y
        elif self.mode=='test':
            gc.collect()
            return X

In [None]:
def kfold_split(input_paths, K):
    splited_input_paths=[]
    for i in range(K):
        st=i*(len(input_paths)//K)
        ed=(i+1)*(len(input_paths)//K)
        if i==K-1:
            splited_input_paths.append(input_paths[st:])
        else:
            splited_input_paths.append(input_paths[st:ed])
    
    return splited_input_paths

In [None]:
# train valid K-Fold Cross Validation
enmo_input_paths = glob.glob(f'{CONFIG.INPUT_DIR}/enmo/*.csv')
anglez_input_paths = glob.glob(f'{CONFIG.INPUT_DIR}/anglez/*.csv')

if CONFIG.SHUFFLE:
    random.shuffle(enmo_input_paths) # shuffle
    random.shuffle(anglez_input_paths) # shuffle

# split into Kfolds
enmo_kfold_paths = kfold_split(enmo_input_paths, CONFIG.FOLDS)
anglez_kfold_paths = kfold_split(anglez_input_paths, CONFIG.FOLDS)

In [None]:
# enmo train, valid loaders
train_dl = DataLoader(SleepDatasetCSV(enmo_kfold_paths[1]+enmo_kfold_paths[2]+enmo_kfold_paths[3]+enmo_kfold_paths[4], 
                                            gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
eval_dl = DataLoader(SleepDatasetCSV(enmo_kfold_paths[0], 
                                            gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)



# enmo_train_dl0 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[1]+enmo_kfold_paths[2]+enmo_kfold_paths[3]+enmo_kfold_paths[4], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# enmo_valid_dl0 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[0], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# enmo_train_dl1 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[0]+enmo_kfold_paths[2]+enmo_kfold_paths[3]+enmo_kfold_paths[4], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# enmo_valid_dl1 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[1], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# enmo_train_dl2 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[0]+enmo_kfold_paths[1]+enmo_kfold_paths[3]+enmo_kfold_paths[4], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# enmo_valid_dl2 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[2], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# enmo_train_dl3 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[0]+enmo_kfold_paths[1]+enmo_kfold_paths[2]+enmo_kfold_paths[4], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# enmo_valid_dl3 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[3], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# enmo_train_dl4 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[0]+enmo_kfold_paths[1]+enmo_kfold_paths[2]+enmo_kfold_paths[3], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# enmo_valid_dl4 = DataLoader(SleepDatasetCSV(enmo_kfold_paths[4], 
#                                             gen_feat=True, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)


# # anglez train, valid loaders
# anglez_train_dl0 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[1]+anglez_kfold_paths[2]+anglez_kfold_paths[3]+anglez_kfold_paths[4], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# anglez_valid_dl0 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[0], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# anglez_train_dl1 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[0]+anglez_kfold_paths[2]+anglez_kfold_paths[3]+anglez_kfold_paths[4], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# anglez_valid_dl1 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[1], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# anglez_train_dl2 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[0]+anglez_kfold_paths[1]+anglez_kfold_paths[3]+anglez_kfold_paths[4], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# anglez_valid_dl2 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[2], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# anglez_train_dl3 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[0]+anglez_kfold_paths[1]+anglez_kfold_paths[2]+anglez_kfold_paths[4], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# anglez_valid_dl3 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[3], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

# anglez_train_dl4 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[0]+anglez_kfold_paths[1]+anglez_kfold_paths[2]+anglez_kfold_paths[3], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# anglez_valid_dl4 = DataLoader(SleepDatasetCSV(anglez_kfold_paths[4], 
#                                             gen_feat=True, feat_type='anglez', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)

In [None]:
# for x,y in tqdm(enmo_train_dl0):
# #     print(x.shape, y.shape)
# # default 2.55 s/it

In [None]:
# enmo_train_nofeat = DataLoader(SleepDatasetCSV(glob.glob(f'{CONFIG.INPUT_DIR}/enmo/*.csv'), 
#                                             gen_feat=False, feat_type='enmo', mode='trainval'), batch_size=CONFIG.BS, shuffle=True)
# for x,y in tqdm(enmo_train_nofeat):
#     print(x.shape,y.shape)
# # without feature generation, 4.62s/it

## 2. load on the RAM

In [10]:
def kfold_split(input_paths, K):
    splited_input_paths=[]
    for i in range(K):
        st=i*(len(input_paths)//K)
        ed=(i+1)*(len(input_paths)//K)
        if i==K-1:
            splited_input_paths.append(input_paths[st:])
        else:
            splited_input_paths.append(input_paths[st:ed])
    
    return splited_input_paths

### how about store it on RAM?
enmo_input_paths = glob.glob(f'{CONFIG.INPUT_DIR}/enmo/*.csv')
if CONFIG.SHUFFLE:
    random.shuffle(enmo_input_paths) # shuffle
enmo_kfold_paths = kfold_split(enmo_input_paths, CONFIG.FOLDS)

In [11]:
enmo_kfold_paths_train=[]
enmo_kfold_paths_eval=[]
enmo_kfold_paths_train += enmo_kfold_paths[1]
enmo_kfold_paths_train += enmo_kfold_paths[2]
enmo_kfold_paths_train += enmo_kfold_paths[3]
enmo_kfold_paths_train += enmo_kfold_paths[4]
enmo_kfold_paths_eval += enmo_kfold_paths[0]

In [12]:
len(enmo_kfold_paths_train)

5721

In [13]:
enmo_inputs_train=[]
enmo_targets_train=[]
attach_hours=8
def generate_features_enmo(xy):
    for r in [17, 33, 65]:
        tmp_feat = xy['enmo'].rolling(r, center=True)
        xy[f'enmo_mean_{r}'] = tmp_feat.mean()
        xy[f'enmo_std_{r}'] = tmp_feat.std()
    return xy.drop(columns=['onset','wakeup']).fillna(0)

# TRAIN
print('about 26 min')
for i, path in tqdm(enumerate(enmo_kfold_paths_train), total=len(enmo_kfold_paths_train)):
    st=(24-attach_hours)*60*12
    ed=-(24-attach_hours)*60*12
    xy = pd.read_csv(path)
    x = generate_features_enmo(xy)
    enmo_inputs_train.append(x.to_numpy()[st:ed])
    enmo_targets_train.append(xy[['onset','wakeup']].to_numpy()[st:ed])
    del x, xy; gc.collect()

about 26 min


  0%|          | 0/5721 [00:00<?, ?it/s]

In [None]:
# EVAL
enmo_inputs_eval=[]
enmo_targets_eval=[]
attach_hours=8
print('about 6.5 min')
for i, path in tqdm(enumerate(enmo_kfold_paths_eval), total=len(enmo_kfold_paths_eval)):
    st=(24-attach_hours)*60*12
    ed=-(24-attach_hours)*60*12
    xy = pd.read_csv(path)[st:ed]
    x = generate_features(xy)
    enmo_inputs_eval.append(x.to_numpy())
    enmo_targets_eval.append(xy[['onset','wakeup']].to_numpy())
    del xy, x; gc.collect()

In [14]:
class SleepDatasetTRAIN(Dataset):
    def __init__(self):
        pass
    def __len__(self):
        return len(enmo_inputs_train)
    def __getitem__(self, index):
        return enmo_inputs_train[index], enmo_targets_train[index]
    
class SleepDatasetEVAL(Dataset):
    def __init__(self):
        pass
    def __len__(self):
        return len(enmo_inputs_eval)
    def __getitem__(self, index):
        return enmo_inputs_eval[index], enmo_targets_eval[index]

train_dl = DataLoader(SleepDatasetTRAIN(), batch_size=CONFIG.BS, shuffle=True)
# eval_dl = DataLoader(SleepDatasetEVAL(), batch_size=CONFIG.BS, shuffle=True)

# Train and Evaluate: ENMO

In [15]:
# loss function - regression, outliers represent anomalies that should be detected. -> use MSE type of regression loss
class FocalLoss(nn.Module):
    
    def __init__(self, weight=None, seize_average=True, alpha=1., gamma=2.):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
    
    def forward(self, inputs, targets):
        inputs = F.sigmoid(inputs)  # make outputs btw 0~1
        
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        
        BCE = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
        BCE_EXP = torch.exp(-BCE)
        focal_loss = self.alpha * (1-BCE_EXP)**self.gamma * BCE
        
        return focal_loss.mean()

In [16]:
# weight decay
def add_weight_decay(model, weight_decay=1e-5, skip_list=()):
    decay=[]
    no_decay=[]
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if len(param.shape)==1 or np.any([v in name.lower() for v in skip_list]):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {'params': no_decay, 'weight_decay':0.},
        {'params': decay, 'weight_decay': weight_decay}
    ]

# Train

In [17]:
def train(model, train_loader, optimizer):
    model.train()
    train_loss=0
    with tqdm(train_loader, leave=True) as pbar:
        for step, (x,y) in enumerate(pbar):
            x = x.to(torch.float32).to(device)
            y = y.to(torch.float32).to(device)
            optimizer.zero_grad()
            pred, _ = model(x)
            loss = criterion(pred,y)
            train_loss += loss.item()
            
            loss.backward()
            optimizer.step()

            pbar.set_postfix(
                    OrderedDict(
                        loss=f'{loss.item():.6f}',
                        lr=f'{optimizer.param_groups[0]["lr"]:.3e}'
                    )
                )
        train_loss /= len(train_loader)
    return train_loss

In [18]:
def evaluate(model, val_loader):
    model.eval()
    val_loss=0
    with torch.no_grad():
        with tqdm(val_loader, leave=True) as pbar:
            for x,y in pbar:
                x=x.to(torch.float32).to(device)
                y=y.to(torch.float32).to(device)
                pred,_ = model(x)
                loss = criterion(pred,y)
                val_loss += loss.item()
                
                pbar.set_postfix(
                        OrderedDict(
                            loss=f'{loss.item():.6f}',
                            lr=f'{optimizer.param_groups[0]["lr"]:.3e}'
                        )
                    )
    val_loss /= len(val_loader)
    
    return val_loss

In [19]:
# Train folds !
def train_folds(model, train_dl, valid_dl, optimizer, scheduler, epochs, fold_num):
    os.makedirs('./model',exist_ok=True)
    history = {
        'train_loss': [],
        'valid_loss': [],
        'lr': [],
    }
    best_valid_loss = 1e5
    for epoch in range(epochs):
        
        train_loss = train(model, train_dl, optimizer)
        valid_loss = evaluate(model, valid_dl)

        history['train_loss'].append(train_loss)
        history['valid_loss'].append(valid_loss)
        history['lr'].append(optimizer.param_groups[0]["lr"])

        if valid_loss < best_valid_loss:
            best_valid_loss = valid_loss
            torch.save(
                model.state_dict(),
                os.path.join('./model/', f"model_best_fold-{fold_num}.pth")
            )
        print(
            f"epoch{epoch+1} -- ",
            f"train_loss = {train_loss:.6f} -- ",
            f"valid_loss = {valid_loss:.6f}",
        )
        if scheduler!=None:
            scheduler.step()

In [20]:
def get_scheduler(optimizer):
    steps = len(train_dl)*CONFIG.EPOCHS
    warmup_steps = int(steps*CONFIG.WARMUP_PROP)
    print('steps, warmup_steps: ',steps, warmup_steps)
    scheduler = get_cosine_schedule_with_warmup(optimizer,
                                                num_warmup_steps=warmup_steps,
                                                num_training_steps=steps,
                                                num_cycles=0.5)
    return scheduler

In [None]:
"""model version 1"""
# model = Encoder(
#     n_blocks=CONFIG.N_BLKS,
#     in_channels=CONFIG.IN_CHANNEL,
#     out_classes=2,
#     d_model=CONFIG.D_MODEL,
#     d_hid=CONFIG.D_MODEL*4,
#     dropout=CONFIG.DROPOUT,
#     agg_scale=None).to(device)
# # 915 MB

In [None]:
# """model version 2"""
# model = MyModel(in_c=7, out_c=2, scale=60, d_model=64, d_hid=64*2, n_blks=3).to(device)
# # model = MyModel_out_zero(in_c=7, out_c=2, scale=60, d_model=64, d_hid=64*2, n_blks=3).to(device)

In [21]:
"""model Bigru"""
model = MultiResidualBiGRU(input_size=7, hidden_size=64,out_size=2,n_layers=2).to(device)

In [None]:
# inp = torch.unsqueeze(torch.tensor(enmo_inputs_train[0]).to(torch.float32).to(device), dim=0)
# # print(inp.shape)
# out=model(inp)

In [22]:
optimizer_parameters = add_weight_decay(model, weight_decay=CONFIG.WD, skip_list=['bias'])
optimizer = AdamW(optimizer_parameters, lr=CONFIG.LR, eps=1e-6, betas=(0.9, 0.999))
print('lr ', CONFIG.LR)
# scheduler = get_scheduler(optimizer)
criterion = FocalLoss()

lr  0.0005


In [None]:
# optimizer = AdamW(optimizer_parameters, lr=1000, eps=1e-6, betas=(0.9, 0.999))
train(model, train_dl, optimizer)

  0%|          | 0/1431 [00:00<?, ?it/s]

In [None]:
# # out check
# inp = torch.unsqueeze(torch.tensor(enmo_inputs_train[0]).to(torch.float32).to(device), dim=0)
# target = torch.unsqueeze(torch.tensor(enmo_targets_train[0]).to(torch.float32).to(device), dim=0)

# # print(inp.shape)
# model.eval()
# out=model(inp)

loss 가 0.1732868105173111에서 낮아지질 않는다. 몇 step 진행하지도 않았는데. 뭔가가 grdient가 전부 0이 되거나, output들이 죽는 현상이 발생하고있음.

lr을 1000으로 학습해보니 loss는 0.3612673056125641나옴.

어떤 이유에서인지 gradient vanishing이 극복되지 않는 minima에 갇힌것같음.


- 모델의 output이 모두 0는 아님.

BiGRU 로 학습시켜보니깐, loss가 0.17328 밑으로 줄어들음. 내가 모델링을 잘 못한 것 같음.

loss=0.173467 아직 1epoch도 학습 안했긴 했지만, 그렇게 크게 차이나진 않는듯 함.

loss = 0.173296 으로 고정됨 모델을 바꿔도.

데이터 문제인가?? loss 문제인가?

loss?

target? (segmentation?)

model? (non-linearity)
- 다른 모델도 성능이 안나와야하겠지.


In [None]:
# TRAIN - fold 0
train_folds(model=model, 
            train_dl=train_dl, 
            valid_dl=eval_dl, 
            optimizer=optimizer, 
            scheduler=None,
            epochs=CONFIG.EPOCHS,
            fold_num=0)

# loss=0.173287 for train
# loss=0.173287 for eval

### Model Version 1

2.85s/it - BS8  
분명 더 forward(), backward(), dataloading 시간 줄일 방향 있을거야.

train_loss = 0.173  
val_loss = 0.17344413714368917

#### Model Version 2

2.71s/it - BS8  
train_loss =  
val_loss =  


# Plot History

In [None]:
def plot_history(history, model_path=".", show=True):
    epochs = range(1, len(history["train_loss"]) + 1)

    plt.figure()
    plt.plot(epochs, history["train_loss"], label="Training Loss")
    plt.plot(epochs, history["valid_loss"], label="Validation Loss")
    plt.yscale('log')
    plt.title("Loss evolution")
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.legend()
    plt.savefig(os.path.join(model_path, "loss_evo.png"))
    if show:
        plt.show()
    plt.close()

#     plt.figure()
#     plt.plot(epochs, history["valid_mAP"])
#     plt.title("Validation mAP evolution")
#     plt.xlabel("Epochs")
#     plt.ylabel("mAP")
#     plt.savefig(os.path.join(model_path, "mAP_evo.png"))
#     if show:
#         plt.show()
#     plt.close()

    plt.figure()
    plt.plot(epochs, history["lr"])
    plt.title("Learning Rate evolution")
    plt.xlabel("Epochs")
    plt.ylabel("LR")
    plt.savefig(os.path.join(model_path, "lr_evo.png"))
    if show:
        plt.show()
    plt.close()

In [None]:
plot_history(history, model_path=model_path)
history_path = os.path.join(model_path, "history.json")
with open(history_path, "w", encoding="utf-8") as f:
    json.dump(history, f, ensure_ascii=False, indent=4)