#### Motivation

I recalled Jeremy Howard talking about someone who did a time series classification project in which he converted the series to images using a Gramian Angular Field and then applied a ResNet50. You can read the thread in the fast.ai forums [here](https://forums.fast.ai/t/share-your-work-here/27676/367).

I wanted to test this cool approach on our dataset, and I was surprised that it achieves pretty decent results. More importantly, it is very different from the other solutions (LSTMs, 1D CNNs, etc.), so it can contribute to the final ensemble.

#### What is Gramian Angular Field Imaging?

This [blog](https://medium.com/analytics-vidhya/encoding-time-series-as-images-b043becbdbf3) post gives a good explanation of how the Gramian Angular Field encoding works. Long story short, it performs a polar encoding of the data followed by a Gram-matrix-like operation on the resulting angles.
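
To make the two steps concrete, here is a minimal NumPy sketch of the summation variant (GASF). It is only an illustration, not the `pyts` implementation: the function name `gasf` is hypothetical, the series is rescaled to [-1, 1] before the polar encoding, and no downsampling is applied, so the image side equals the series length.

```python
import numpy as np


def gasf(series):
    """Gramian Angular Summation Field of a 1-D series (illustrative sketch)."""
    x = np.asarray(series, dtype=float)

    # Rescale to [-1, 1] so that arccos is defined for every value.
    # Assumes the series is not constant (otherwise this divides by zero).
    x_min, x_max = x.min(), x.max()
    x_scaled = 2 * (x - x_min) / (x_max - x_min) - 1
    x_scaled = np.clip(x_scaled, -1.0, 1.0)  # guard against rounding error

    # Polar encoding: each value becomes an angle.
    phi = np.arccos(x_scaled)

    # Gram-matrix-like operation: entry (i, j) is cos(phi_i + phi_j).
    return np.cos(phi[:, None] + phi[None, :])


# Toy input: one period of a sine wave sampled at 60 steps.
series = np.sin(np.linspace(0, 2 * np.pi, 60))
image = gasf(series)
print(image.shape)  # (60, 60)
```

The resulting matrix is symmetric, and its main diagonal holds `cos(2 * phi_i)`, so the rescaled series can be recovered from the diagonal.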

<img src="https://miro.medium.com/max/1378/1*A0yHZ8GD47cQd1OACTiz6Q.gif">

Let's apply the GAF transformation to some of our sensor time series to see what they look like.

In [None]:
!pip install pyts

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pyts.image import GramianAngularField

In [None]:
# Read the data
df_train = pd.read_csv("/kaggle/input/tabular-playground-series-apr-2022/train.csv")
df_train_labels = pd.read_csv("/kaggle/input/tabular-playground-series-apr-2022/train_labels.csv")
sensors = [f for f in df_train.columns if "sensor_" in f]

# Pivot the dataframe to ease access to the sequences
df_train_piv = df_train.pivot(
    index = ["sequence", "subject"], 
    columns = "step", 
    values = sensors
    )

In [None]:
labels = df_train_labels["state"].values

# Choose 3 samples at random
for r in np.random.randint(0, len(labels), 3):
    
    # Choose 3 sensors at random
    for s in np.random.choice(sensors, 3):

        data = df_train_piv.loc[:, df_train_piv.columns.get_level_values(0) == s].values
        
        gasf = GramianAngularField(image_size=60, method="summation")
        data_gasf = gasf.fit_transform(data)

        plt.matshow(data_gasf[r])
        plt.title(f"{s}, Label: {labels[r]}")
        plt.show()

Now that we can convert our sensor time series to images, we can apply any computer vision model we want. In my case, I used a model called SimpleNet, from the paper *Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures* (2016).

On the other hand, as has already been pointed out (I noticed it thanks to AmbrosM, like many other things), the number of times a subject appears in the data is well correlated with the target. Therefore, I concatenated that feature with the features extracted by SimpleNet inside the neural network model.

![image](https://i.postimg.cc/sfhVTXMz/Image.png)
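
As a rough illustration of that concatenation, the sketch below builds the count feature for a hypothetical toy set of five sequences and appends it to a stand-in for the 256 pooled SimpleNet features, giving the 257 inputs that the final linear layer expects. The subject ids and the random features are made up for the example; only the shapes mirror the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 sequences belonging to 3 subjects.
subjects = np.array([7, 7, 7, 3, 9])
cnn_features = rng.normal(size=(5, 256))  # stand-in for the pooled SimpleNet output

# Number of sequences per subject, mapped back onto every sequence.
unique, counts = np.unique(subjects, return_counts=True)
count_per_subject = dict(zip(unique, counts))
seq_counts = np.array([count_per_subject[s] for s in subjects], dtype=float)

# Min-max scale the counts to [0, 1] so they live on a scale similar to the features.
seq_counts = (seq_counts - seq_counts.min()) / (seq_counts.max() - seq_counts.min())

# Append the count as one extra column: 256 image features + 1 count feature.
head_input = np.concatenate([cnn_features, seq_counts[:, None]], axis=1)
print(head_input.shape)  # (5, 257)
```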

### **IMPORTANT!**


The following code runs perfectly on my local machine but gives a RAM error on Kaggle notebooks at some point. I tried to fix it but I haven't found a solution yet. For that reason, I chose to directly attach the models, OOFs and submission file generated locally to this notebook. Anyhow, the same code that I used on my local machine is commented out below for those who want to use it :D.
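
A possible contributor to the RAM error, offered here as an unverified assumption, is the size of the dense image tensor: the code below allocates it with `np.zeros`, which defaults to `float64`, and stores one 60×60 image per sensor (13 sensors) for every sequence. The helper `gaf_tensor_gib` and the sequence count are hypothetical and only serve to estimate the footprint.

```python
import numpy as np


def gaf_tensor_gib(n_sequences, n_sensors=13, img_size=60, dtype=np.float64):
    """Memory (in GiB) of a dense (n_sequences, n_sensors, img_size, img_size) array."""
    n_values = n_sequences * n_sensors * img_size * img_size
    return n_values * np.dtype(dtype).itemsize / 1024 ** 3


n = 10_000  # hypothetical number of sequences, not the real dataset size
print(f"float64: {gaf_tensor_gib(n):.2f} GiB")
print(f"float32: {gaf_tensor_gib(n, dtype=np.float32):.2f} GiB")
```

Allocating the array as `float32` would halve the footprint, since the dataset casts every sample to `torch.float` anyway.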

In [None]:
sub = pd.read_csv("/kaggle/input/tps-apr22-gaf-cnn2d-outputs/submission.csv")
sub.to_csv("submission.csv", index=False)

### The Code

In [None]:
# !pip install pyts

#### Imports

In [None]:
# import os
# import time
# import random

# from logging import getLogger, INFO, FileHandler,  Formatter,  StreamHandler

# import pandas as pd
# import numpy as np

# from sklearn.metrics import roc_auc_score
# from sklearn.model_selection import GroupKFold

# from pyts.image import GramianAngularField

# import torch
# import torch.nn as nn
# import torch.nn.functional as F
# from torch.utils.data import Dataset, DataLoader
# from torch.optim import Adam
# from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts, CosineAnnealingLR, OneCycleLR, ReduceLROnPlateau
# from torch.cuda.amp import autocast, GradScaler

# import warnings
# warnings.filterwarnings("ignore")

# import gc
# gc.enable()

#### Utils

In [None]:
# def set_seed(seed: int=42):
#     random.seed(seed)
#     np.random.seed(seed)
#     os.environ["PYTHONHASHSEED"] = str(seed)
#     torch.manual_seed(seed)
#     torch.cuda.manual_seed(seed)
#     torch.backends.cudnn.deterministic = True
#     torch.backends.cudnn.benchmark = False


# def init_logger(logs_path):
#     log_file = os.path.join(logs_path, "train.log")
#     logger = getLogger(__name__)
#     logger.setLevel(INFO)
#     handler1 = StreamHandler()
#     handler1.setFormatter(Formatter("%(message)s"))
#     handler2 = FileHandler(filename=log_file)
#     handler2.setFormatter(Formatter("%(message)s"))
#     logger.addHandler(handler1)
#     logger.addHandler(handler2)
#     return logger


# class AverageMeter(object):
#     def __init__(self):
#         self.reset()

#     def reset(self):
#         self.val = 0
#         self.avg = 0
#         self.sum = 0
#         self.count = 0

#     def update(self, val, n=1):
#         self.val = val
#         self.sum += val * n
#         self.count += n
#         self.avg = self.sum / self.count


# def asMinutes(s):
#     m = np.floor(s / 60)
#     s -= m * 60
#     return "%dm %ds" % (m, s)


# def timeSince(since, percent):
#     now = time.time()
#     s = now - since
#     es = s / (percent)
#     rs = es - s
#     return "%s (remain %s)" % (asMinutes(s), asMinutes(rs))

#### Datasets

In [None]:
# class TrainDataset(Dataset):
#     def __init__(self, X, counts, y):
#         self.X = X
#         self.counts = counts
#         self.y = y
    
#     def __len__(self):
#         return len(self.X)
    
#     def __getitem__(self, idx):
        
#         output = {
#             "X": torch.tensor(self.X[idx], dtype=torch.float),
#             "counts": torch.tensor(self.counts[idx], dtype=torch.float),
#             "y": torch.tensor(self.y[idx], dtype=torch.float)
#         }
#         return output


# class TestDataset(Dataset):
#     def __init__(self, X, counts):
#         self.X = X
#         self.counts = counts
    
#     def __len__(self):
#         return len(self.X)
    
#     def __getitem__(self, idx):
        
#         output = {
#             "X": torch.tensor(self.X[idx], dtype=torch.float),
#             "counts": torch.tensor(self.counts[idx], dtype=torch.float)
#         }
#         return output

#### PyTorch model

In [None]:
# class simplenet(nn.Module):
#     def __init__(self, classes=1):
#         super(simplenet, self).__init__()
#         self.features = self._make_layers() 
#         self.classifier = nn.Linear(256 + 1, classes)
#         self.drp = nn.Dropout(0.1)

#     def forward(self, x, sc):
#         out = self.features(x)

#         # Global Max Pooling
#         out = F.max_pool2d(out, kernel_size=out.size()[2:]) 
#         # out = F.dropout2d(out, 0.1, training=True)
#         out = self.drp(out)
        
#         out = out.view(out.size(0), -1)
        
#         out = torch.cat([out, sc.unsqueeze(dim=1)], axis=1)
        
#         out = self.classifier(out)
#         return out

#     def _make_layers(self):

#         model = nn.Sequential(
#                              nn.Conv2d(13, 64, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.Conv2d(64, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.Conv2d(128, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.Conv2d(128, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False),
#                              nn.Dropout2d(p=0.1),

#                              nn.Conv2d(128, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.Conv2d(128, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.Conv2d(128, 256, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False),
#                              nn.Dropout2d(p=0.1),

#                              nn.Conv2d(256, 256, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.Conv2d(256, 256, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False),
#                              nn.Dropout2d(p=0.1),

#                              nn.Conv2d(256, 512, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(512, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False),
#                              nn.Dropout2d(p=0.1),

#                              nn.Conv2d(512, 2048, kernel_size=[1, 1], stride=(1, 1), padding=(0, 0)),
#                              nn.BatchNorm2d(2048, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.Conv2d(2048, 256, kernel_size=[1, 1], stride=(1, 1), padding=(0, 0)),
#                              nn.BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                              nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False),
#                              nn.Dropout2d(p=0.1),

#                              nn.Conv2d(256, 256, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1)),
#                              nn.BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True),
#                              nn.ReLU(inplace=True),

#                             )
#         for m in model.modules():
#             if isinstance(m, nn.Conv2d):
#                 nn.init.xavier_uniform_(m.weight.data, gain=nn.init.calculate_gain('relu'))

#         return model

#### NN boilerplate

In [None]:
# def get_scheduler(cfg, optimizer, trainloader=None):
#     if cfg.scheduler == "ReduceLROnPlateau":
#         scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=cfg.factor, patience=cfg.patience, verbose=True, eps=cfg.eps)
#     elif cfg.scheduler == "CosineAnnealingLR":
#         scheduler = CosineAnnealingLR(optimizer, T_max=cfg.T_max, eta_min=cfg.min_lr, last_epoch=-1)
#     elif cfg.scheduler == "CosineAnnealingWarmRestarts":
#         scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=cfg.T_0, T_mult=1, eta_min=cfg.min_lr, last_epoch=-1)
#     elif cfg.scheduler == "OneCycleLR":
#         scheduler = OneCycleLR(optimizer, pct_start=0.1, div_factor=1e3, max_lr=cfg.max_lr, epochs=cfg.epochs, steps_per_epoch=len(trainloader))
#     return scheduler



# def train_fn(cfg, train_dl, model, criterion, optimizer, epoch, scheduler):
    
#     if cfg.apex:
#         scaler = GradScaler()
    
#     batch_time = AverageMeter()
#     data_time = AverageMeter()
#     losses = AverageMeter()
    
#     # switch to train mode
#     model.train()
#     start = end = time.time()
#     global_step = 0
#     preds = []
    
#     for step, data in enumerate(train_dl):
#         # measure data loading time
#         data_time.update(time.time() - end)
        
#         inputs = data["X"].to(cfg.device)
#         counts = data["counts"].to(cfg.device)
#         targets = data["y"].to(cfg.device)
#         batch_size = targets.size(0)
        
#         if cfg.apex:
#             with autocast():
#                 y_preds = model(inputs, counts)
#                 loss = criterion(y_preds.view(-1), targets)
#         else:
#             y_preds = model(inputs, counts)
#             loss = criterion(y_preds.view(-1), targets)
            
#         # record loss
#         losses.update(loss.item(), batch_size)

#         preds.append(y_preds.sigmoid().detach().cpu().numpy())
        
#         if cfg.gradient_accumulation_steps > 1:
#             loss = loss / cfg.gradient_accumulation_steps
            
#         if cfg.apex:
#             scaler.scale(loss).backward()
#         else:
#             loss.backward()
            
#         grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), cfg.max_grad_norm)
        
#         if (step + 1) % cfg.gradient_accumulation_steps == 0:
#             if cfg.apex:
#                 scaler.step(optimizer)
#                 scaler.update()
#             else:
#                 optimizer.step()

#             if isinstance(scheduler, OneCycleLR):
#                 scheduler.step()

#             optimizer.zero_grad()
#             global_step += 1

#         # measure elapsed time
#         batch_time.update(time.time() - end)
#         end = time.time()
        
#         if step % cfg.print_freq == 0 or step == (len(train_dl)-1):
#             print("Epoch: [{0}][{1}/{2}] "
#                   "Elapsed {remain:s} "
#                   "Loss: {loss.val:.4f}({loss.avg:.4f}) "
#                   "Grad: {grad_norm:.4f}  "
#                   "LR: {lr:.6f}  "
#                   .format(epoch+1, step, len(train_dl), 
#                           remain = timeSince(start, float(step+1)/len(train_dl)),
#                           loss = losses,
#                           grad_norm = grad_norm,
#                           lr = optimizer.param_groups[0]["lr"]))
    
#     predictions = np.concatenate(preds)
    
#     return losses.avg, predictions



# def valid_fn(cfg, valid_dl, model, criterion):
#     batch_time = AverageMeter()
#     data_time = AverageMeter()
#     losses = AverageMeter()
    
#     # switch to evaluation mode
#     model.eval()
#     preds = []
#     start = end = time.time()
#     for step, data in enumerate(valid_dl):
#         # measure data loading time
#         data_time.update(time.time() - end)
        
#         inputs = data["X"].to(cfg.device)
#         counts = data["counts"].to(cfg.device)
#         targets = data["y"].to(cfg.device)
#         batch_size = targets.size(0)
        
#         # compute loss
#         with torch.no_grad():
#             y_preds = model(inputs, counts)
#         loss = criterion(y_preds.view(-1), targets)
#         losses.update(loss.item(), batch_size)
        
#         preds.append(y_preds.sigmoid().detach().cpu().numpy())
        
#         if cfg.gradient_accumulation_steps > 1:
#             loss = loss / cfg.gradient_accumulation_steps
            
#         # measure elapsed time
#         batch_time.update(time.time() - end)
#         end = time.time()
        
#         if step % cfg.print_freq == 0 or step == (len(valid_dl)-1):
#             print("EVAL: [{0}/{1}] "
#                 "Elapsed {remain:s} "
#                 "Loss: {loss.val:.4f}({loss.avg:.4f}) "
#                 .format(step, len(valid_dl),
#                         loss=losses,
#                         remain=timeSince(start, float(step+1)/len(valid_dl))))
            
#     predictions = np.concatenate(preds)
    
#     return losses.avg, predictions



# def train_loop(cfg, logger, models_path, fold,
#                X_train_cv, y_train_cv, counts_train_cv, counts_val_cv, X_val_cv, y_val_cv):

#         train_cv_ds = TrainDataset(X_train_cv, counts_train_cv, y_train_cv)
#         val_cv_ds = TrainDataset(X_val_cv, counts_val_cv, y_val_cv)
        
#         train_cv_dl = DataLoader(train_cv_ds,
#                                 batch_size = cfg.batch_size, 
#                                 shuffle = True, 
#                                 num_workers = cfg.num_workers, pin_memory=True, drop_last=True)
#         val_cv_dl = DataLoader(val_cv_ds, 
#                             batch_size = cfg.batch_size * 2, 
#                             shuffle = False, 
#                             num_workers = cfg.num_workers, pin_memory=True, drop_last=False)
        
#         model = simplenet()
#         model.to(cfg.device)

#         optimizer = Adam(model.parameters(), lr=cfg.lr, weight_decay=cfg.weight_decay, amsgrad=False)
#         scheduler = get_scheduler(cfg, optimizer, train_cv_dl)
        
#         criterion = nn.BCEWithLogitsLoss()

#         best_score = 0
        
#         for epoch in range(cfg.epochs):
            
#             start_time = time.time()
            
#             # train
#             avg_loss, _ = train_fn(cfg, train_cv_dl, model, criterion, optimizer, epoch, scheduler)
#             # eval
#             avg_val_loss, preds_val_cv = valid_fn(cfg, val_cv_dl, model, criterion)
            
#             if isinstance(scheduler, ReduceLROnPlateau):
#                 scheduler.step(avg_val_loss)
#             elif isinstance(scheduler, CosineAnnealingLR):
#                 scheduler.step()
#             elif isinstance(scheduler, CosineAnnealingWarmRestarts):
#                 scheduler.step()

#             # scoring
#             score = get_score(y_val_cv, preds_val_cv)

#             elapsed = time.time() - start_time

#             logger.info(f"Epoch {epoch+1} - avg_train_loss: {avg_loss:.4f}  avg_val_loss: {avg_val_loss:.4f}  time: {elapsed:.0f}s")
#             logger.info(f"Epoch {epoch+1} - Score: {score:.4f}")

#             if score > best_score:
#                 best_score = score
#                 logger.info(f"Epoch {epoch+1} - Save Best Score: {best_score:.4f} Model")
#                 torch.save({"model": model.state_dict(), 
#                             "preds": preds_val_cv},
#                             os.path.join(models_path, f"model_best_score_fold{fold}.pth"))
            
#         preds_val_cv = torch.load(os.path.join(models_path, f"model_best_score_fold{fold}.pth"), 
#                                   map_location = torch.device("cpu"))["preds"]

#         return preds_val_cv

#### Functions for validation and inference

In [None]:
# def create_folds(df_train_labels, n_splits):
    
#     cv = GroupKFold(n_splits=n_splits)
#     df_train_labels["fold"] = -1

#     for fold, (_, val_idx) in enumerate(cv.split(df_train_labels, df_train_labels["state"], groups=df_train_labels["subject"])):
#         df_train_labels.loc[val_idx, "fold"] = fold
        
#     return df_train_labels



# def get_score(y_true, y_pred):
#     score = roc_auc_score(y_true, y_pred)
#     return score



# def validate(cfg, logger, X_train, counts_train, y_train):
    
#     oofs_preds = y_train.copy()
#     oofs_preds["preds"] = 0.0
    
#     for fold in range(cfg.n_splits):
        
#         logger.info(f"Fold: {fold}")
                    
#         train_idx = y_train["fold"] != fold
#         val_idx = y_train["fold"] == fold

#         X_train_cv, X_val_cv = X_train[train_idx], X_train[val_idx]
#         counts_train_cv, counts_val_cv = counts_train[train_idx], counts_train[val_idx]
#         y_train_cv, y_val_cv = y_train.loc[train_idx, "state"].values, y_train.loc[val_idx, "state"].values
        
#         preds_val = train_loop(cfg, logger, models_path, fold,
#                                X_train_cv, y_train_cv,
#                                counts_train_cv, counts_val_cv, 
#                                X_val_cv, y_val_cv)
        
#         oofs_preds.loc[val_idx, "preds"] = preds_val.ravel()

#     score = get_score(y_train["state"], oofs_preds["preds"])
#     logger.info(f"Final Score: {score:.4f}")
    
#     return oofs_preds


        
# def inference(cfg, X_test, counts_test):
    
#     test_ds = TestDataset(X_test, counts_test)
#     test_dl = DataLoader(test_ds, 
#                          batch_size = cfg.batch_size * 2, 
#                          shuffle = False, 
#                          num_workers = cfg.num_workers, pin_memory=True, drop_last=False)
    
#     chkpts = [os.path.join(models_path, model) for model in os.listdir(models_path)]

#     predictions = 0
#     for c in chkpts:
#         batch_time = AverageMeter()
#         data_time = AverageMeter()
        
#         model = simplenet()
#         model.load_state_dict(torch.load(c)["model"])
#         model.to(cfg.device)
        
#         model.eval()
#         preds = []
#         start = end = time.time()
        
#         for step, data in enumerate(test_dl):
            
#             data_time.update(time.time() - end)
            
#             inputs = data["X"].to(cfg.device)
#             counts = data["counts"].to(cfg.device)
            
#             with torch.no_grad():
#                 y_preds = model(inputs, counts)
            
#             preds.append(y_preds.sigmoid().detach().cpu().numpy())
                            
#             # measure elapsed time
#             batch_time.update(time.time() - end)
#             end = time.time()
            
#             if step % cfg.print_freq == 0 or step == (len(test_dl)-1):
#                 print("EVAL: [{0}/{1}] "
#                     "Elapsed {remain:s} "
#                     .format(step, len(test_dl),
#                             remain = timeSince(start, float(step+1)/len(test_dl))))
                
#         predictions += np.concatenate(preds) / len(chkpts)
    
#     return predictions

#### Config file

In [None]:
# class CFG:
#     data_path = "/kaggle/input/tabular-playground-series-apr-2022/"
#     work_path = "/kaggle/working/"
#     sim_name = "sim_1"
#     n_splits = 10
#     img_size = 60
#     seed = 42
#     device = "cuda:0"
#     apex = False
#     print_freq = 100
#     num_workers = 4
#     scheduler = "CosineAnnealingLR" # ["ReduceLROnPlateau", "CosineAnnealingLR", "CosineAnnealingWarmRestarts"]
#     epochs = 10
#     # factor = 0.2 # ReduceLROnPlateau
#     # patience = 4 # ReduceLROnPlateau
#     # eps = 1e-6 # ReduceLROnPlateau
#     T_max = 10 # CosineAnnealingLR
#     # T_0 = 3 # CosineAnnealingWarmRestarts
#     lr = 1e-3
#     min_lr = 1e-6
#     # max_lr = 1e-4
#     batch_size = 64
#     weight_decay = 1e-6
#     gradient_accumulation_steps = 1
#     max_grad_norm = 1000

#### Validation and inference

In [None]:
# oofs_path = os.path.join(CFG.work_path, "oofs", CFG.sim_name)
# subs_path = os.path.join(CFG.work_path, "subs", CFG.sim_name)
# models_path = os.path.join(CFG.work_path, "models", CFG.sim_name)
# logs_path = os.path.join(CFG.work_path, "logs", CFG.sim_name)
# os.makedirs(oofs_path, exist_ok=True)
# os.makedirs(subs_path, exist_ok=True)
# os.makedirs(models_path, exist_ok=True)
# os.makedirs(logs_path, exist_ok=True)

# LOGGER = init_logger(logs_path)
# set_seed(CFG.seed)

In [None]:
# df_train = pd.read_csv(os.path.join(CFG.data_path, "train.csv"))
# df_test = pd.read_csv(os.path.join(CFG.data_path, "test.csv"))
# df_train_labels = pd.read_csv(os.path.join(CFG.data_path, "train_labels.csv"))

# df_train_labels["subject"] = df_train.groupby("sequence")["subject"].head(1).values

# df_train_labels = create_folds(df_train_labels, CFG.n_splits)

# sensors = [f for f in df_train.columns if "sensor_" in f]

In [None]:
# df_all = pd.concat([df_train, df_test], axis=0)

# del df_train
# del df_test
# gc.collect()

# df_all_piv = df_all.pivot(
#     index = ["sequence", "subject"], 
#     columns = "step", 
#     values = sensors
#     )

In [None]:
# counts_dict = df_all.groupby("subject")["sequence"].count().to_dict()
# counts_all = df_all_piv.index.get_level_values("subject").to_series().map(counts_dict).values
# counts_all = (counts_all - counts_all.min()) / (counts_all.max() - counts_all.min())
# counts_train = counts_all[:len(df_train_labels)]
# counts_test = counts_all[len(df_train_labels):]

# del df_all
# del counts_all
# gc.collect()

In [None]:
# gasf = GramianAngularField(image_size=CFG.img_size, method="summation")
# X_all = np.zeros((len(df_all_piv), len(sensors), CFG.img_size, CFG.img_size))
# for c,s in enumerate(sensors):
#     sensor_data = df_all_piv.loc[:, df_all_piv.columns.get_level_values(0) == s].values
#     X_all[:,c,:,:] = gasf.fit_transform(sensor_data)
    
# X_train = X_all[:len(df_train_labels), :, :, :]    
# X_test = X_all[len(df_train_labels):, :, :, :]
# y_train = df_train_labels.copy()

# del X_all
# gc.collect()

In [None]:
# oofs_predictions = validate(CFG, LOGGER, X_train, counts_train, y_train)
# oofs_predictions.to_csv(os.path.join(oofs_path, "oofs.csv"), index=False)

In [None]:
# test_predictions = inference(CFG, X_test, counts_test)
# sub = pd.read_csv(os.path.join(CFG.data_path, "sample_submission.csv"))
# sub["state"] = test_predictions
# sub.to_csv(os.path.join(subs_path, "submission.csv"), index=False)