<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 1px; background-color: #f6f5f5; color :#6666ff; border-radius: 200px 200px; text-align:center">Efficient Channel Attention for Normalizer-Free Networks</h1>


![](https://blog.paperspace.com/content/images/size/w1600/2020/09/eca_module.jpg)

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">In recent years, channel attention mechanisms have shown great potential for improving the performance of deep convolutional neural networks (CNNs). However, most existing methods pursue better performance through more sophisticated attention modules, which inevitably increases model complexity. <br><br>To overcome this trade-off between performance and complexity, the ECA-Net paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain. By dissecting
the channel attention module in SENet (Squeeze-and-Excitation), the paper shows empirically that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. The paper therefore proposes a local cross-channel interaction strategy without dimensionality reduction, which can be implemented efficiently with a 1D convolution. It also develops a method to adaptively select the kernel size of that 1D convolution, which determines the coverage of local cross-channel interaction.<br><br>ECA-Net's architecture is very similar to that of SE-Net, as shown in the figure above. An ECA block takes as input the output of a convolutional layer: a 4-dimensional tensor of shape (B, C, H, W), where B is the batch size, C is the number of channels (feature maps) in the tensor, and H and W are the height and width of each feature map. The output of the ECA block is a 4-D tensor of the same shape. Like the SE block, the ECA block is made up of 3 modules:<br><br>
1. Global Feature Descriptor<br>
2. Adaptive Neighborhood Interaction<br>
3. Broadcasted Scaling</p>
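
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">The three modules listed above can be sketched as a small PyTorch layer. This is a minimal illustration, not the timm implementation used later in this notebook: the class name <code>ECABlock</code> is hypothetical, and the kernel-size rule (the nearest odd value of (log2(C) + b) / gamma, with gamma = 2 and b = 1) is assumed from the defaults described in the ECA-Net paper.</p>

```python
import math

import torch
import torch.nn as nn


class ECABlock(nn.Module):
    """Illustrative Efficient Channel Attention block (hypothetical name)."""

    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size: grows with log2(C) and is forced to be odd
        t = int(abs((math.log2(channels) + b) / gamma))
        kernel_size = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # 1. Global feature descriptor: (B, C, H, W) -> (B, C, 1, 1)
        y = self.avg_pool(x)
        # 2. Adaptive neighborhood interaction: 1D conv across the channel axis
        y = y.squeeze(-1).transpose(-1, -2)   # (B, 1, C)
        y = self.conv(y)                      # (B, 1, C)
        y = torch.sigmoid(y).transpose(-1, -2).unsqueeze(-1)  # (B, C, 1, 1)
        # 3. Broadcasted scaling: reweight every channel of the input
        return x * y


block = ECABlock(channels=64)
x = torch.randn(2, 64, 8, 8)
out = block(x)
print(out.shape, block.conv.kernel_size)
```

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Because the only learnable weights are those of the 1D convolution, the block adds just a handful of parameters regardless of the number of channels.</p>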

<p p style = "font-family: garamond; font-size:40px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">What are we discussing today? </p>
 <p p style = "font-family: garamond; font-size:25px; font-style: normal;background-color: #f6f5f5; color :#006699; border-radius: 10px 10px; text-align:center">Normalizer Free Networks with ECA <br>
 SAM Optimizer with AdamP <br>
 Mixup Augmentation <br>
 Weighted Random Sampler <br>
 K-Fold Cross Validation <br>
 Weights and Biases for Experiment Tracking</p>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#ff0066; border-radius: 10px 10px; text-align:center">Upvote the kernel if you find it insightful!</p>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">TIMM Pytorch Models</p>

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">PyTorch Image Models (timm) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aims to pull together a wide variety of SOTA models with the ability to reproduce ImageNet training results.<br><br>
Using timm, we will create the ECA NFNet for our problem. We will use the <code>eca_nfnet_l0</code> model, which is slimmed down from the original F0 variant. </p>

In [None]:
import sys
sys.path.append('../input/timm-pytorch-image-models/pytorch-image-models-master')

# <center><img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" /></center><br>
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Weights & Biases (wandb) is a developer tool that helps teams turn deep learning research projects into deployed software by tracking models, visualizing model performance, and automating training and model improvement.
We will use it to log hyperparameters and output metrics from our runs, then visualize and compare results and share findings with colleagues.<br><br>We'll use it to track our K-Fold Cross Validation training and gain better insight into it. <br><br></p>

![img](https://i.imgur.com/BGgfZj3.png)

In [None]:
!pip install -q wandb --upgrade
!pip install -q adamp

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Import Libraries</p>

In [None]:
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

# Python
from tqdm import tqdm
from collections import defaultdict
import pandas as pd
import numpy as np
import os
import random
import glob
pd.set_option('display.max_columns', None)

# Visualizations
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
import plotly.express as px
%matplotlib inline
sns.set(style="whitegrid")

# Image Augmentations
import albumentations
from albumentations.pytorch.transforms import ToTensorV2


# Utils
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Pytorch for Deep Learning
import torch
import torchvision
import timm
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torch.optim.lr_scheduler import CosineAnnealingLR
from adamp import AdamP

# Weights and Biases Tool
import wandb

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Define Configurations/Parameters</p>

In [None]:
params = {
    'seed': 42,
    'model': 'eca_nfnet_l0',
    'size' : 224,
    'inp_channels': 1,
    'device': 'cuda',
    'lr': 1e-4,
    'weight_decay': 1e-6,
    'batch_size': 32,
    'num_workers' : 4,
    'epochs': 3,
    'out_features': 1,
    'name': 'CosineAnnealingLR',
    'T_max': 10,
    'min_lr': 1e-6,
    'nfolds': 5,
}

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Define Seed for Reproducibility</p>

In [None]:
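def seed_everything(seed=42):
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # seed every GPU, not just the current one
    torch.backends.cudnn.deterministic = True
    # benchmark mode lets cuDNN auto-tune algorithms, which can vary between runs
    torch.backends.cudnn.benchmark = False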
    
seed_everything(params['seed'])

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Define Train and Test</p>

In [None]:
train_dir = ('../input/seti-breakthrough-listen/train')
test_dir = ('../input/seti-breakthrough-listen/test')
train_df = pd.read_csv('../input/seti-breakthrough-listen/train_labels.csv')
test_df = pd.read_csv('../input/seti-breakthrough-listen/sample_submission.csv')

In [None]:
def return_filpath(name, folder=train_dir):
    path = os.path.join(folder, name[0], f'{name}.npy')
    return path

In [None]:
train_df['image_path'] = train_df['id'].apply(lambda x: return_filpath(x))
test_df['image_path'] = test_df['id'].apply(lambda x: return_filpath(x, folder=test_dir))
train_df.head()

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Minimal EDA <br> Code Credit <a href = 'https://www.kaggle.com/datafan07/pytorch-lightning-single-fold-training-lb-0-97'>Ertuğrul Demir </a></p>


In [None]:
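# Build an explicit (label, count) table so the column names do not depend
# on how the installed pandas version names the result of value_counts()
dist = train_df['target'].map({0: 'Target 0', 1: 'Target 1'})
dist = dist.value_counts().rename_axis('label').reset_index(name='count')
fig = px.pie(dist,
             values='count',
             names='label',
             hole=.4, title="Imbalanced Dataset")
fig.update_traces(textinfo='percent+label', pull=0.05)
fig.show()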

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">The dataset is very imbalanced and we will see later how we use a sampler to handle it. </p>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Image Augmentation</p>

In [None]:
def get_train_transforms():
    return albumentations.Compose(
        [
            albumentations.Resize(params['size'],params['size']),
            albumentations.HorizontalFlip(p=0.5),
            albumentations.VerticalFlip(p=0.5),
            albumentations.Rotate(limit=180, p=0.7),
            albumentations.RandomBrightness(limit=0.6, p=0.5),
            albumentations.Cutout(
                num_holes=10, max_h_size=12, max_w_size=12,
                fill_value=0, always_apply=False, p=0.5
            ),
            albumentations.ShiftScaleRotate(
                shift_limit=0.25, scale_limit=0.1, rotate_limit=0
            ),
            ToTensorV2(p=1.0),
        ]
    )

def get_valid_transforms():
    return albumentations.Compose(
        [
            albumentations.Resize(params['size'],params['size']),
            ToTensorV2(p=1.0)
        ]
    )

def get_test_transforms():
    return albumentations.Compose(
        [
            albumentations.Resize(params['size'],params['size']),
            ToTensorV2(p=1.0)
        ]
    )

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Custom Dataset</p>

In [None]:
class SETIDataset(Dataset):
    def __init__(self, images_filepaths, targets, transform=None):
        self.images_filepaths = images_filepaths
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.images_filepaths)

    def __getitem__(self, idx):
        image_filepath = self.images_filepaths[idx]
        image = np.load(image_filepath).astype(np.float32)
        image = np.vstack(image).transpose((1, 0))
            
        if self.transform is not None:
            image = self.transform(image=image)["image"]
        else:
            image = image[np.newaxis,:,:]
            image = torch.from_numpy(image).float()
        
        label = torch.tensor(self.targets[idx]).float()
        return image, label

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">MixUp Augmentation</p>

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Large deep neural networks are powerful, but exhibit undesirable behaviors such
as memorization and sensitivity to adversarial examples. MixUp is a data augmentation technique to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior
in-between training examples. Experiments show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.</p> <br><br>

![](https://miro.medium.com/max/1838/0*CdJ256L9RTDGGLrS.png)

In [None]:
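def mixup(x, y, alpha=1.0, use_cuda=True):
    """Return the mixed batch, both label sets and the mixing coefficient lam."""
    # lam ~ Beta(alpha, alpha); alpha <= 0 disables mixing
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1

    # Random permutation that pairs each sample with another one in the batch
    batch_size = x.size()[0]
    if use_cuda:
        index = torch.randperm(batch_size).cuda()
    else:
        index = torch.randperm(batch_size)

    # Convex combination of every sample with its partner
    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam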


def mixup_criterion(criterion, pred, y_a, y_b, lam):
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
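
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Binary cross-entropy is linear in the target, so the two-term mixup loss equals the loss computed once against the mixed (soft) label. The short check below illustrates this numerically; the logits, labels and the value of <code>lam</code> are made-up values chosen only for the demonstration.</p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(8, 1)                   # stand-in for model outputs
y_a = torch.randint(0, 2, (8, 1)).float()    # original labels
y_b = y_a[torch.randperm(8)]                 # labels of the mixing partners
lam = 0.3                                    # illustrative mixing coefficient

criterion = nn.BCEWithLogitsLoss()

# Loss written as a convex combination of two hard-label losses
two_term = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)

# Same loss computed once against the convex combination of the labels
soft_label = criterion(logits, lam * y_a + (1 - lam) * y_b)

print(two_term.item(), soft_label.item())
```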

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Custom Class for Monitoring Loss and ROC</p>

In [None]:
class MetricMonitor:
    def __init__(self, float_precision=3):
        self.float_precision = float_precision
        self.reset()

    def reset(self):
        self.metrics = defaultdict(lambda: {"val": 0, "count": 0, "avg": 0})

    def update(self, metric_name, val):
        metric = self.metrics[metric_name]

        metric["val"] += val
        metric["count"] += 1
        metric["avg"] = metric["val"] / metric["count"]

    def __str__(self):
        return " | ".join(
            [
                "{metric_name}: {avg:.{float_precision}f}".format(
                    metric_name=metric_name, avg=metric["avg"],
                    float_precision=self.float_precision
                )
                for (metric_name, metric) in self.metrics.items()
            ]
        )
    
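def use_roc_score(output, target):
    try:
        y_pred = torch.sigmoid(output).cpu()
        y_pred = y_pred.detach().numpy()
        target = target.cpu()

        return roc_auc_score(target, y_pred)
    except ValueError:
        # roc_auc_score raises ValueError when a batch contains only one class;
        # fall back to the chance-level score for that batch
        return 0.5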

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Weighted Random Sampler</p>

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Samples elements from [0 ,.., len(weights)-1] with given probabilities (weights).</p>
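
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">As a small illustration of the idea, the sketch below builds inverse-frequency weights for a made-up 90/10 label vector and draws from the sampler. The toy labels and the number of draws are assumptions for the demonstration, not values from the competition data.</p>

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Toy imbalanced labels: 90 negatives and 10 positives
labels = torch.tensor([0] * 90 + [1] * 10)

# Inverse-frequency weight per class, then one weight per sample
class_counts = torch.bincount(labels)
class_weights = len(labels) / class_counts.double()
sample_weights = class_weights[labels]

generator = torch.Generator().manual_seed(0)
sampler = WeightedRandomSampler(sample_weights, num_samples=10_000,
                                replacement=True, generator=generator)

# Indices drawn by the sampler are now roughly balanced between classes
drawn = labels[list(sampler)]
positive_fraction = drawn.float().mean().item()
print(f"positives in raw data: 0.10, in sampled data: {positive_fraction:.2f}")
```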

In [None]:
def get_sampler(train_data):
    # sort_index() orders the counts by class label, so class_counts[c] belongs to class c
    class_counts = train_data['target'].value_counts().sort_index().to_list()
    num_samples = sum(class_counts)
    labels = train_data['target'].to_list()

    # Inverse-frequency weights: the rarer the class, the more often it is drawn
    class_weights = [num_samples/class_counts[i] for i in range(len(class_counts))]
    weights = [class_weights[labels[i]] for i in range(int(num_samples))]
    sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
    return sampler

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Train and Valid Loader</p>


In [None]:
def get_loader(train_data, valid_data, sampler):
    training_set = SETIDataset(
        images_filepaths=train_data['image_path'].values,
        targets=train_data['target'].values,
        transform=get_train_transforms()
            )

    validation_set = SETIDataset(
        images_filepaths=valid_data['image_path'].values,
        targets=valid_data['target'].values,
        transform=get_valid_transforms()
            )

    train_loader = DataLoader(
        training_set,
        batch_size=params['batch_size'],
        # shuffle and sampler are mutually exclusive in DataLoader,
        # so only shuffle when no sampler is supplied
        shuffle=(sampler is None),
        num_workers=params['num_workers'],
        sampler = sampler,
        pin_memory=True
            )

    valid_loader = DataLoader(
        validation_set,
        batch_size=params['batch_size'],
        shuffle=False,
        num_workers=params['num_workers'],
        pin_memory=True
            )
    
    return train_loader, valid_loader

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">ECA NFNet</p>
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.<br><br> Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations.<br><br> Normalizer-Free ResNets use an adaptive gradient clipping technique which overcomes these instabilities. The smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and the largest models attain a new state-of-the-art top-1 accuracy of 86.5%. <br><br>In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. </p>
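
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">To make the adaptive gradient clipping idea concrete, here is a minimal sketch: each output unit's gradient is rescaled whenever its norm exceeds a fixed fraction of the corresponding weight norm. This notebook does not apply it during training (it relies on SAM with AdamP instead); the function names and the default values <code>clip_factor=0.01</code> and <code>eps=1e-3</code> are illustrative assumptions.</p>

```python
import torch


def unitwise_norm(t):
    """L2 norm per output unit (row or filter); whole-tensor norm for 0-D/1-D tensors."""
    if t.ndim <= 1:
        return t.norm(2)
    return t.norm(2, dim=tuple(range(1, t.ndim)), keepdim=True)


@torch.no_grad()
def adaptive_clip_grad_(parameters, clip_factor=0.01, eps=1e-3):
    """Clip gradients in place so that ||grad|| <= clip_factor * ||weight|| per unit."""
    for p in parameters:
        if p.grad is None:
            continue
        # eps keeps zero-initialised weights from having their gradients clipped to zero
        max_norm = unitwise_norm(p).clamp(min=eps) * clip_factor
        grad_norm = unitwise_norm(p.grad)
        clipped = p.grad * (max_norm / grad_norm.clamp(min=1e-6))
        p.grad.copy_(torch.where(grad_norm > max_norm, clipped, p.grad))


# Toy example: one row with a huge gradient, one row with a tiny gradient
layer = torch.nn.Linear(3, 2, bias=False)
with torch.no_grad():
    layer.weight.fill_(1.0)
layer.weight.grad = torch.tensor([[10.0, 10.0, 10.0],
                                  [1e-4, 1e-4, 1e-4]])

adaptive_clip_grad_(layer.parameters())
print(layer.weight.grad)  # first row is scaled down, second row is left untouched
```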

![](https://miro.medium.com/max/910/1*CjpipU_oChc899f_Esjpyg.png)

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">The ECA NFNet L0 variant is slimmed down from the original F0 variant in the paper for improved runtime characteristics (throughput, memory use) in PyTorch on a GPU accelerator. It uses Efficient Channel Attention (ECA) instead of Squeeze-and-Excitation, and SiLU activations instead of the usual GELU.<br><br>
Like other models in the NF family, this model contains no normalization layers (batch, group, etc). The models make use of Weight Standardized convolutions with additional scaling values in lieu of normalization layers.</p>
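
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">A weight-standardized convolution with a scaling value can be sketched as follows. This is an illustration of the idea, not timm's own layer: the class name <code>ScaledWSConv2d</code> is hypothetical, and <code>gamma</code> stands in for the activation-dependent constant used in NF networks (set to 1.0 here for simplicity).</p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledWSConv2d(nn.Conv2d):
    """Illustrative conv layer whose filters are standardized on every forward pass."""

    def __init__(self, *args, gamma=1.0, eps=1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        fan_in = self.weight[0].numel()
        # Learnable per-filter gain plus a fixed gamma / sqrt(fan_in) scale
        self.gain = nn.Parameter(torch.ones(self.out_channels, 1, 1, 1))
        self.scale = gamma * fan_in ** -0.5
        self.eps = eps

    def standardized_weight(self):
        # Zero mean and unit variance for each output filter, then rescale
        mean = self.weight.mean(dim=(1, 2, 3), keepdim=True)
        var = self.weight.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        weight = (self.weight - mean) / torch.sqrt(var + self.eps)
        return weight * self.gain * self.scale

    def forward(self, x):
        return F.conv2d(x, self.standardized_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)


conv = ScaledWSConv2d(3, 8, kernel_size=3, padding=1)
w = conv.standardized_weight()
out = conv(torch.randn(2, 3, 16, 16))
print(out.shape, w.mean(dim=(1, 2, 3)))
```

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Standardizing the weights instead of the activations keeps signal statistics under control without any dependence on the batch.</p>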

In [None]:
class EcaNFNet(nn.Module):
    def __init__(self, model_name=params['model'], out_features=params['out_features'],
                 inp_channels=params['inp_channels'], pretrained=True):
        super().__init__()
        self.model = timm.create_model(model_name, pretrained=pretrained,
                                       in_chans=inp_channels)
        n_features = self.model.head.fc.in_features
        self.model.head.fc = nn.Linear(n_features, out_features, bias=True)    
        
    
    def forward(self, x):
        x = self.model(x)
        return x

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Sharpness Aware Minimization (SAM) Optimizer</p>
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. <br><br>Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. The empirical results show that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels.</p>
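
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">In symbols, with $L$ the training loss, $w$ the weights and $\rho$ the neighborhood radius (<code>rho</code> in the optimizer code), the min-max problem and the first-order approximation of its inner maximization from the SAM paper can be written as:</p>

$$
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2},
\qquad
g_{\text{SAM}} = \nabla_w L(w) \Big|_{w + \hat{\epsilon}(w)}
$$

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">The first step moves the weights to $w + \hat{\epsilon}(w)$, a second forward-backward pass computes $g_{\text{SAM}}$ there, and the second step restores $w$ and lets the base optimizer update it with $g_{\text{SAM}}$. This is why every training iteration needs two forward-backward passes.</p>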

![](https://raw.githubusercontent.com/davda54/sam/main/img/loss_landscape.png)

In [None]:
class SAM(torch.optim.Optimizer):
    def __init__(self, params, base_optimizer, rho=0.05, **kwargs):
        assert rho >= 0.0, f"Invalid rho, should be non-negative: {rho}"

        defaults = dict(rho=rho, **kwargs)
        super(SAM, self).__init__(params, defaults)

        self.base_optimizer = base_optimizer(self.param_groups, **kwargs)
        self.param_groups = self.base_optimizer.param_groups

    @torch.no_grad()
    def first_step(self, zero_grad=False):
        grad_norm = self._grad_norm()
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)

            for p in group["params"]:
                if p.grad is None: continue
                e_w = p.grad * scale.to(p)
                p.add_(e_w)  # climb to the local maximum "w + e(w)"
                self.state[p]["e_w"] = e_w

        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def second_step(self, zero_grad=False):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None: continue
                p.sub_(self.state[p]["e_w"])  # get back to "w" from "w + e(w)"

        self.base_optimizer.step()  # do the actual "sharpness-aware" update

        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def step(self, closure=None):
        assert closure is not None, "Sharpness Aware Minimization requires closure, but it was not provided"
        closure = torch.enable_grad()(closure)  # the closure should do a full forward-backward pass

        self.first_step(zero_grad=True)
        closure()
        self.second_step()

    def _grad_norm(self):
        shared_device = self.param_groups[0]["params"][0].device  # put everything on the same device, in case of model parallelism
        norm = torch.norm(
                    torch.stack([
                        p.grad.norm(p=2).to(shared_device)
                        for group in self.param_groups for p in group["params"]
                        if p.grad is not None
                    ]),
                    p=2
               )
        return norm

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Define Loss Function, Optimizer and Scheduler</p>

In [None]:
model = EcaNFNet()
model = model.to(params['device'])
criterion = nn.BCEWithLogitsLoss().to(params['device'])
base_optimizer = AdamP
optimizer = SAM(model.parameters(), base_optimizer, lr=params['lr'], weight_decay=params['weight_decay'])

scheduler = CosineAnnealingLR(optimizer,
                              T_max=params['T_max'],
                              eta_min=params['min_lr'],
                              last_epoch=-1)



<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Train and Validation Loops</p>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Big Shoutout to <a href = 'https://www.kaggle.com/nakshatrasingh'>nakshatrasingh</a> for pointing out the missing Mixup implementation</p>

In [None]:
def train(train_loader, model, criterion, optimizer, epoch, params, scheduler):
    metric_monitor = MetricMonitor()
    model.train()
    stream = tqdm(enumerate(train_loader), total=len(train_loader))
       
    for i, (images, target) in stream:

        images = images.to(params['device'])
        target = target.to(params['device']).float().view(-1, 1)
        images, targets_a, targets_b, lam = mixup(images, target)
    
        # First forward-backward pass: gradients at the current weights w
        output = model(images)
        loss = mixup_criterion(criterion, output, targets_a, targets_b, lam)
        loss.backward()
        optimizer.first_step(zero_grad=True)  # move to w + e(w)
        
        # Second forward-backward pass at w + e(w), then the actual update of w
        mixup_criterion(criterion, model(images), targets_a, targets_b, lam).backward()
        optimizer.second_step(zero_grad=True)
        
        
        
        roc_score = use_roc_score(output, target)
        metric_monitor.update('Loss', loss.item())
        metric_monitor.update('ROC', roc_score)
        wandb.log({"Train Epoch":epoch,"Train loss": loss.item(), "Train ROC":roc_score})
        

        stream.set_description(
            "Epoch: {epoch}. Train.      {metric_monitor}".format(
                epoch=epoch,
                metric_monitor=metric_monitor)
        )
    
    scheduler.step()

In [None]:
def validate(val_loader, model, criterion, epoch, params):
    metric_monitor = MetricMonitor()
    model.eval()
    stream = tqdm(enumerate(val_loader), total=len(val_loader))
    final_targets = []
    final_outputs = []
    with torch.no_grad():
        for i, (images, target) in stream:
            images = images.to(params['device'], non_blocking=True)
            target = target.to(params['device'], non_blocking=True).float().view(-1, 1)
            output = model(images)
            loss = criterion(output, target)
            roc_score = use_roc_score(output, target)
            metric_monitor.update('Loss', loss.item())
            metric_monitor.update('ROC', roc_score)
            wandb.log({"Valid Epoch": epoch, "Valid loss": loss.item(), "Valid ROC":roc_score})
            stream.set_description(
                "Epoch: {epoch}. Validation. {metric_monitor}".format(
                    epoch=epoch,
                    metric_monitor=metric_monitor)
            )
            
            targets = target.detach().cpu().numpy().tolist()
            outputs = output.detach().cpu().numpy().tolist()
            
            final_targets.extend(targets)
            final_outputs.extend(outputs)
    return final_outputs, final_targets

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">W&B Initialization for K-FOLD CV</p>

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">K-Fold CV gives a less biased estimate of model performance than a single train/validation split. It has one parameter, k, which decides how many folds the dataset is divided into. Each fold appears in the training set k-1 times and in the validation set exactly once, so every observation in the dataset is used for both training and validation, which helps the model learn the underlying data distribution.<br><br>A variant is to shuffle the dataset once before splitting it into k folds, and then split so that the ratio of observations in each class stays the same in every fold. As before, the validation sets of different iterations do not overlap. This approach is called Stratified K-Fold CV, and it is useful for imbalanced datasets.</p>
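
<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">A quick way to see what stratification does is to split a made-up 90/10 label vector and count the positives that land in each validation fold. The toy labels and the placeholder feature matrix below are assumptions used only for this illustration.</p>

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels: 90 negatives and 10 positives
targets = np.array([0] * 90 + [1] * 10)
features = np.zeros((len(targets), 1))  # placeholder features, unused by the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_summary = []   # (validation size, number of positives) per fold
seen = []           # every index that was used for validation
for fold, (trn_idx, val_idx) in enumerate(skf.split(features, targets)):
    fold_summary.append((len(val_idx), int(targets[val_idx].sum())))
    seen.extend(val_idx.tolist())
    print(f"Fold {fold}: {len(val_idx)} validation samples, "
          f"{int(targets[val_idx].sum())} positives")
```

<p style = "font-family: garamond; font-size: 20px; font-style: normal; border-radius: 10px 10px; text-align:center">Every validation fold keeps the same 10% positive rate as the full dataset, and each sample is used for validation exactly once.</p>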


In [None]:
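best_roc = -np.inf
best_epoch = -np.inf
best_model_name = None

kfold = StratifiedKFold(n_splits=params['nfolds'], shuffle=True, random_state=params['seed'])

for fold, (trn_idx, val_idx) in enumerate(kfold.split(train_df, train_df['target'])):
    
    run = wandb.init(project='Seti-ECA-NFNet-Mixup', 
             config=params, 
             group = 'ECA-NFNet-New-Data',
             job_type='train',
             name = f'Fold{fold}')
    
    print(f"{'='*40} Fold: {fold} {'='*40}")

    # split() yields positional indices, so select rows with iloc
    train_data = train_df.iloc[trn_idx]
    valid_data = train_df.iloc[val_idx]
    
    sampler = get_sampler(train_data)
    
    train_loader, valid_loader = get_loader(train_data, valid_data, sampler)

    # Start every fold from fresh pretrained weights: reusing one model across folds
    # would let it train on samples that later serve as validation data
    model = EcaNFNet().to(params['device'])
    optimizer = SAM(model.parameters(), AdamP, lr=params['lr'], weight_decay=params['weight_decay'])
    scheduler = CosineAnnealingLR(optimizer, T_max=params['T_max'], eta_min=params['min_lr'])

    fold_best_roc = -np.inf  # best score within this fold only

    for epoch in range(1, params['epochs'] + 1):

        train(train_loader, model, criterion, optimizer, epoch, params, scheduler)
        predictions, valid_targets = validate(valid_loader, model, criterion, epoch, params)
        roc_auc = round(roc_auc_score(np.ravel(valid_targets), np.ravel(predictions)), 3)
        model_name = f"{params['model']}_fold{fold}_{epoch}_epoch_{roc_auc}_roc_auc.pth"
        torch.save(model.state_dict(), model_name)
        fold_best_roc = max(fold_best_roc, roc_auc)

        # Track the best checkpoint across all folds for the test loop
        if roc_auc > best_roc:
            best_roc = roc_auc
            best_epoch = epoch
            best_model_name = model_name

    print(f"Best ROC-AUC in fold: {fold} was: {fold_best_roc:.4f}")
    print(f"Final ROC-AUC in fold: {fold} was: {roc_auc:.4f}")
    run.finish()  # close this fold's W&B run before the next one starts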

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Cross Validation Results</p>

<p style = "font-family: garamond; font-size: 25px; font-style: normal; border-radius: 10px 10px; text-align:center">We achieve a validation ROC AUC score of 0.7670!<br><br> Weights & Biases provides an easy-to-use interface and tools for tracking evaluation metrics such as training and validation loss and ROC AUC, along with resources such as GPU usage.<br><br> Let's take a look at some of our K-Fold CV training and GPU utilization graphs.</p>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center"><a href = 'https://wandb.ai/tanishqgautam/Seti-ECA-NFNet?workspace=user-tanishqgautam'>Check out the Weights and Biases Dashboard here $\rightarrow$ </a></p>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">KFold Metrics Visualization</p><br>

<center><img src="https://i.imgur.com/xBqIjjp.png" width="1500" alt="metrics" /></center>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">GPU Utilization</p><br>

<center><img src="https://i.imgur.com/EmpBUbP.png" width="1500" alt="GPU" /></center>

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Test Loop</p>


In [None]:
model = EcaNFNet()
model.load_state_dict(torch.load(best_model_name))
model = model.to(params['device'])

In [None]:
model.eval()
predicted_labels = None

test_dataset = SETIDataset(
    images_filepaths = test_df['image_path'].values,
    targets = test_df['target'].values,
    transform = get_test_transforms()
)
test_loader = DataLoader(
    test_dataset, batch_size=params['batch_size'],
    shuffle=False, num_workers=params['num_workers'],
    pin_memory=True
)

temp_preds = None
with torch.no_grad():
    for (images, target) in tqdm(test_loader):
        images = images.to(params['device'], non_blocking=True)
        output = model(images)
        predictions = torch.sigmoid(output).cpu().numpy()
        if temp_preds is None:
            temp_preds = predictions
        else:
            temp_preds = np.vstack((temp_preds, predictions))

if predicted_labels is None:
    predicted_labels = temp_preds
else:
    predicted_labels += temp_preds

In [None]:
torch.save(model.state_dict(), f"{params['model']}_{best_epoch}epochs_weights.pth")

<p p style = "font-family: garamond; font-size:30px; font-style: normal;background-color: #f6f5f5; color :#6666ff; border-radius: 10px 10px; text-align:center">Submission File</p>

In [None]:
sub_df = pd.DataFrame()
sub_df['id'] = test_df['id']
sub_df['target'] = np.ravel(predicted_labels)  # flatten (N, 1) predictions to one value per id

In [None]:
sub_df.head()

In [None]:
sub_df.to_csv('submission.csv', index=False)