![SETI](https://earthsky.org/upl/2020/02/Earth-transit-zone-Breakthrough-Listen.jpg)
# About This Notebook
This is an attempt to demonstrate the use of Vision Transformers on this dataset using TPUs.  
**If you found this notebook useful and use parts of it in your work, please don't forget to show your appreciation by upvoting this kernel. That keeps me motivated and inspires me to write and share these public kernels.** 😊

# Problem Statement
* The Breakthrough Listen team at the University of California, Berkeley, employs the world's most powerful telescopes to scan millions of stars for signs of technology.
* It's hard to search for a faint needle of alien transmission in the huge haystack of detections from modern technology.
* Current methods use two filters to search through the haystack.
    * First, the Listen team intersperses scans of the target stars with scans of other regions of sky. Any signal that appears in both sets of scans probably isn't coming from the direction of the target star.
    * Second, the pipeline discards signals that don't change their frequency, because this means that they are probably near the telescope.
* Use data science skills to help identify anomalous signals in scans of Breakthrough Listen targets.
* Because there are no confirmed examples of alien signals to use to train machine learning algorithms, the team included some simulated signals.
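
A toy illustration of the first filter may help. It assumes a six-scan cadence in which scans 0, 2 and 4 point at the target and scans 1, 3 and 5 point elsewhere; this ordering, the threshold, and the helper name `candidate_channels` are assumptions made for the sketch, not something stated above.

```python
import numpy as np

# Assumed cadence layout: even-indexed scans are on-target, odd-indexed are off-target.
ON_SCANS, OFF_SCANS = [0, 2, 4], [1, 3, 5]

def candidate_channels(cadence, threshold):
    """Return frequency channels that are bright in every ON scan and in no OFF scan."""
    # Peak power per scan and frequency channel: shape (6, n_freq).
    peak = cadence.max(axis=1)
    in_on = (peak[ON_SCANS] > threshold).all(axis=0)
    in_off = (peak[OFF_SCANS] > threshold).any(axis=0)
    return np.flatnonzero(in_on & ~in_off)

# Synthetic cadence with axes (scan, time, frequency).
cadence = np.zeros((6, 8, 16), dtype=np.float32)
cadence[:, :, 3] = 5.0           # present in all scans: likely local interference
cadence[ON_SCANS, :, 10] = 5.0   # present only on-target: a candidate

found = candidate_channels(cadence, threshold=1.0)
print(found)
```

Channel 3 is rejected because it also shows up when the telescope points away from the target, while channel 10 survives the filter.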

# Why this competition?
As the problem statement makes clear, this competition presents an interesting challenge straight out of a sci-fi movie!  
Also, if successful, this model could help answer one of the biggest questions in science.

# Expected Outcome
Given a NumPy array containing a signal, we should be able to classify it as positive (a signal from an alien life form) or negative (a signal from one of our own devices).

# Data Description
The data is stored as NumPy float16 arrays in the training folder, and the labels are listed in the `train_labels.csv` file. The first character of each file name indicates the subfolder of the train directory in which the `.npy` file is placed.  
Each file holds a stack of six two-dimensional arrays (shape = (6, 273, 256)), so approaches from computer vision may be promising, as well as digital signal processing, anomaly detection, and more.
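
As a minimal sketch of this layout, the snippet below writes one synthetic array into a temporary folder and loads it back. The id `0a1b2c`, the helper name `npy_path`, and the random contents are made up for illustration; only the subfolder rule, dtype, and shape come from the description above.

```python
import os
import tempfile

import numpy as np

def npy_path(sample_id, folder):
    # The first character of the id names the subfolder that holds the file.
    return os.path.join(folder, sample_id[0], f"{sample_id}.npy")

with tempfile.TemporaryDirectory() as root:
    sample_id = "0a1b2c"  # hypothetical id, not a real competition file
    path = npy_path(sample_id, root)
    os.makedirs(os.path.dirname(path))
    np.save(path, np.random.rand(6, 273, 256).astype(np.float16))

    cadence = np.load(path)
    loaded_shape, loaded_dtype = cadence.shape, cadence.dtype

print(loaded_shape, loaded_dtype)
```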

# Grading Metric
Submissions are evaluated on **area under the ROC curve** between the predicted probability and the observed target.
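
For intuition, the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The small pure-Python sketch below uses that pairwise definition; it is a teaching aid, not the competition's scoring code.

```python
def pairwise_auc(targets, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Because only the ordering of scores matters, the metric is unaffected by any monotonic rescaling of the predicted probabilities.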

# Problem Category
From the data and the objective, it is evident that this is a **classification problem**. However, there is a choice of approaches, ranging from vanilla ML methods to computer vision to anomaly detection.

# Brief Introduction to Vision Transformers
It all started with [this paper](https://arxiv.org/abs/2010.11929) from the Google Brain team in late 2020, whose title literally says *AN IMAGE IS WORTH 16X16 WORDS*.  

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. But this paper shows that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.  
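
To make "sequences of image patches" concrete, here is a small NumPy sketch that cuts a single-channel 384x384 image into non-overlapping 32x32 patches and flattens each one, which matches the patch size and image size used later in this notebook. The helper name `image_to_patches` is ours, and the sketch leaves out the learned linear projection and position embeddings that a real ViT applies afterwards.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split a 2-D image into flattened, non-overlapping square patches."""
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    grid_h, grid_w = h // patch_size, w // patch_size
    # (grid_h, patch, grid_w, patch) -> (grid_h, grid_w, patch, patch)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size)
    patches = patches.transpose(0, 2, 1, 3)
    # One row per patch, in row-major order over the patch grid.
    return patches.reshape(grid_h * grid_w, patch_size * patch_size)

image = np.arange(384 * 384, dtype=np.float32).reshape(384, 384)
patches = image_to_patches(image, patch_size=32)
print(patches.shape)  # (144, 1024): 12 x 12 patches of 32 * 32 values each
```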

The principal approach with Transformers is to pre-train on a huge dataset and then fine-tune on a task-specific dataset. The efficiency and scalability of transformer-based networks have made it possible to train huge models, and as models and datasets grow there is still no sign of performance saturating.  

The architecture of ViT is shown below:-  

![ViT](https://amaarora.github.io/images/ViT.png)

After fine-tuning, applying the model to a sample image gives the following result:-  

![Vit Result](https://user-images.githubusercontent.com/6073256/101206904-2a338f00-36b3-11eb-8920-f617abab1604.png)

As you can clearly observe, the architecture attends to image regions that are semantically relevant for classification; loosely speaking, the attention mask focuses only on the important areas of an image.
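
The attention weights behind such a map can be sketched with single-head scaled dot-product self-attention. Everything below is a toy under stated assumptions: the projection matrices are random rather than trained, the width of 64 is arbitrary, and 145 tokens stands for 144 patches plus one class token.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the key axis.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(145, 64))  # toy tokens: 144 patches + 1 class token
w_q, w_k, w_v = (rng.normal(size=(64, 64)) * 0.1 for _ in range(3))

out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` sums to one and says how much that token draws from every other token; the row for the class token is what visualizations like the one above are typically built from.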

Special thanks to [rwightman](https://github.com/rwightman/pytorch-image-models) for creating timm which makes implementing this SOTA method incredibly easy and contains all the pre-trained weights as well.

So without further ado, let's now start with some basic imports to take us through this:-

# Imports

In [None]:
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version 1.7

In [None]:
import sys
sys.path.append('../input/pytorch-image-models/pytorch-image-models-master')

In [None]:
# Aesthetics
import warnings
import sklearn.exceptions
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings("ignore", category=sklearn.exceptions.UndefinedMetricWarning)

# General
from tqdm import tqdm
from collections import defaultdict
from datetime import datetime
import pandas as pd
import numpy as np
import os
import random
import glob
import gc
pd.set_option('display.max_columns', None)

# Visualizations
from PIL import Image
from plotly.subplots import make_subplots
from plotly.offline import iplot
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
import plotly.graph_objs as go
import plotly.figure_factory as ff
import plotly.express as px
%matplotlib inline
sns.set(style="whitegrid")

# Image Aug
import albumentations
from albumentations.pytorch.transforms import ToTensorV2

# Machine Learning
# Utils
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
# Deep Learning
import tensorflow as tf
import torch
import torchvision
import timm
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torch.optim.lr_scheduler import CosineAnnealingLR
# Metrics
from sklearn.metrics import roc_auc_score
# TPU Specific
import torch_xla
import torch_xla.debug.metrics as met
import torch_xla.distributed.parallel_loader as pl
import torch_xla.utils.utils as xu
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp
import torch_xla.test.test_utils as test_utils
os.environ['XLA_USE_BF16']="1"
os.environ['XLA_TENSOR_ALLOCATOR_MAXSIZE'] = '100000000'

# Random Seed Initialize
RANDOM_SEED = 42

def seed_everything(seed=RANDOM_SEED):
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    # Benchmark mode may pick different kernels between runs, so disable it for reproducibility
    torch.backends.cudnn.benchmark = False
    
seed_everything()

# Select Accelerator
def auto_select_accelerator():
    TPU_DETECTED = False
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        TPU_DETECTED =True
    except Exception:  # no TPU could be resolved; fall back to GPU/CPU
        pass
    
    return TPU_DETECTED

TPU_DETECTED = auto_select_accelerator()

if TPU_DETECTED:
    device = 'TPU'
elif torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
    
print(f'Using device: {device}')

# Read the Dataset

In [None]:
csv_dir = '../input/seti-breakthrough-listen'
train_dir = '../input/seti-breakthrough-listen/train'
test_dir = '../input/seti-breakthrough-listen/test'

train_file_path = os.path.join(csv_dir, 'train_labels.csv')
sample_sub_file_path = os.path.join(csv_dir, 'sample_submission.csv')

print(f'Train file: {train_file_path}')
print(f'Sample submission file: {sample_sub_file_path}')

In [None]:
train_df = pd.read_csv(train_file_path)
test_df = pd.read_csv(sample_sub_file_path)

In [None]:
def return_filpath(name, folder=train_dir):
    path = os.path.join(folder, name[0], f'{name}.npy')
    return path

In [None]:
train_df['image_path'] = train_df['id'].apply(lambda x: return_filpath(x))
test_df['image_path'] = test_df['id'].apply(lambda x: return_filpath(x, folder=test_dir))

In [None]:
(X_train, X_valid, y_train, y_valid) = train_test_split(train_df['image_path'],
                                                        train_df['target'],
                                                        test_size=0.2,
                                                        stratify=train_df['target'],
                                                        shuffle=True,
                                                        random_state=RANDOM_SEED)
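
To illustrate what `stratify=` buys in the split above, the plain-Python sketch below splits each class separately so the validation set keeps the overall positive rate. It is a simplified stand-in written for this explanation, not scikit-learn's implementation, and the 10% positive rate is a toy value.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed):
    """Split indices so each class contributes the same fraction to validation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_valid = round(len(indices) * test_size)
        valid_idx += indices[:n_valid]
        train_idx += indices[n_valid:]
    return train_idx, valid_idx

labels = [1] * 100 + [0] * 900  # toy labels with a 10% positive rate
train_idx, valid_idx = stratified_split(labels, test_size=0.2, seed=42)
valid_pos_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), valid_pos_rate)
```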

# Model Params

In [None]:
print("Available Vision Transformer Models: ")
timm.list_models("vit*")

In [None]:
params = {
    'model': 'vit_base_patch32_384',
    'im_size': 384,
    'inp_channels': 1,
    'device': device,
    'lr': 1e-4,
    'weight_decay': 1e-6,
    'batch_size': 32,
    'num_workers' : 0,
    'epochs': 20,
    'out_features': 1,
    'Balance_Dataset': False,
}

# Image Augmentation

In [None]:
def get_train_transforms(IMG_SIZE):
    '''
    Return Augmented Image tensor for training dataset
    '''
    return albumentations.Compose(
        [
            albumentations.Resize(IMG_SIZE,IMG_SIZE),
            albumentations.HorizontalFlip(p=0.5),
            albumentations.VerticalFlip(p=0.5),
            albumentations.Rotate(limit=180, p=0.7),
            albumentations.RandomBrightness(limit=0.6, p=0.5),
            albumentations.Cutout(
                num_holes=10, max_h_size=12, max_w_size=12,
                fill_value=0, always_apply=False, p=0.5
            ),
            albumentations.ShiftScaleRotate(
                shift_limit=0.25, scale_limit=0.1, rotate_limit=0
            ),
            ToTensorV2(p=1.0),
        ]
    )

def get_valid_transforms(IMG_SIZE):
    '''
    Return resized Tensor for Validation Dataset
    '''
    return albumentations.Compose(
        [
            albumentations.Resize(IMG_SIZE,IMG_SIZE),
            ToTensorV2(p=1.0)
        ]
    )

def get_test_transforms(IMG_SIZE, TTA):
    '''
    Returns resized tensor if TTA = 1 otherwise returns
    Augmented Image tensor if TTA > 1. Values <1 not accepted.
    '''
    if TTA > 1:
        return albumentations.Compose(
            [
                albumentations.Resize(IMG_SIZE,IMG_SIZE),
                albumentations.HorizontalFlip(p=0.5),
                albumentations.VerticalFlip(p=0.5),
                albumentations.Rotate(limit=180, p=0.7),
                albumentations.RandomBrightness(limit=0.6, p=0.5),
                albumentations.ShiftScaleRotate(
                    shift_limit=0.25, scale_limit=0.1, rotate_limit=0
                ),
                ToTensorV2(p=1.0)
            ]
        )
    else:
        return albumentations.Compose(
            [
                albumentations.Resize(IMG_SIZE,IMG_SIZE),
                ToTensorV2(p=1.0)
            ]
        )
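
The `TTA` argument above refers to test-time augmentation: predicting on several augmented views of the same image and averaging the results. The sketch below shows only the averaging idea, with deterministic flips swapped in for the random augmentations used above; `predict_with_tta` and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in for a trained model.

```python
import numpy as np

def predict_with_tta(predict_fn, image, n_tta):
    """Average predict_fn over n_tta flipped views of a 2-D image."""
    views = [image, image[:, ::-1], image[::-1, :], image[::-1, ::-1]]
    probs = [predict_fn(views[i % len(views)]) for i in range(n_tta)]
    return float(np.mean(probs))

def toy_predict(image):
    # Hypothetical stand-in for a model: sigmoid of the mean of the left half.
    left_half = image[:, : image.shape[1] // 2]
    return 1.0 / (1.0 + np.exp(-left_half.mean()))

image = np.zeros((4, 4))
image[:, :2] = 2.0  # bright left half

single = predict_with_tta(toy_predict, image, n_tta=1)
averaged = predict_with_tta(toy_predict, image, n_tta=4)
print(single, averaged)
```

With one view the result is just the plain prediction; with four views the toy model's sensitivity to left/right orientation is averaged out.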

# Dataset

In [None]:
class SETIDataset(Dataset):
    def __init__(self, images_filepaths, targets, transform=None):
        self.images_filepaths = images_filepaths
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.images_filepaths)

    def __getitem__(self, idx):
        image_filepath = self.images_filepaths[idx]
        image = np.load(image_filepath)
        image = image.astype(np.float32)
        image = np.vstack(image).transpose((1, 0))
            
        if self.transform is not None:
            image = self.transform(image=image)["image"]
        else:
            image = image[np.newaxis,:,:]
            image = torch.from_numpy(image).float()
        
        label = self.targets[idx].reshape(-1,)
        return image, label
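
The reshaping inside `__getitem__` is easier to follow with explicit shapes. The sketch below repeats the `np.vstack(...).transpose(...)` step on a synthetic cadence in which every scan is filled with its own index, so it is visible where each scan ends up.

```python
import numpy as np

# Synthetic cadence: scan i is filled with the value i.
cadence = np.zeros((6, 273, 256), dtype=np.float16)
for i in range(6):
    cadence[i] = i

stacked = np.vstack(cadence.astype(np.float32))  # six (273, 256) scans stacked along time
image = stacked.transpose((1, 0))                # frequency becomes the row axis
print(stacked.shape, image.shape)                # (1638, 256) (256, 1638)

# The scans now sit side by side: columns 0..272 hold scan 0, 273..545 hold scan 1, ...
print(image[0, 272], image[0, 273])              # 0.0 1.0
```

The transforms then resize this wide 256x1638 image to the square `im_size` expected by the model.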

# Model

In [None]:
class AlienNet(nn.Module):
    def __init__(self, model_name=params['model'], out_features=params['out_features'],
                 inp_channels=params['inp_channels'], pretrained=True):
        super().__init__()
        self.model = timm.create_model(model_name, pretrained=pretrained,
                                       in_chans=inp_channels)
        if model_name.split('_')[0] == 'efficientnet':
            n_features = self.model.classifier.in_features
            self.model.conv_stem = nn.Conv2d(inp_channels, 40, kernel_size=(3, 3),
                                             stride=(2, 2), padding=(1, 1), bias=False)
            self.model.classifier = nn.Linear(n_features, out_features)
        
        elif model_name.split('_')[0] == 'nfnet':
            n_features = self.model.head.fc.in_features
            self.model.head.fc = nn.Linear(n_features, out_features)
            
        elif model_name.split('_')[0] == 'vit':
            n_features = self.model.head.in_features
            self.model.head = nn.Linear(n_features, out_features, bias=True)
    
    def forward(self, x):
        x = self.model(x)
        return x
    
    def roc_score(self, output, target):
        try:
            y_pred = torch.sigmoid(output).cpu()
            y_pred = y_pred.detach().numpy()
            target = target.cpu()
            
            return roc_auc_score(target, y_pred)
        except:
            return 0.5
    
    def train_one_epoch(self, train_loader, criterion, optimizer, params):
        epoch_loss = 0.0
        epoch_roc = 0.0
        self.model.train()
        
        for i, (data, target) in enumerate(train_loader):
            data = data.to(params['device'], non_blocking=True)
            target = target.to(params['device'], non_blocking=True).float().view(-1, 1)
            optimizer.zero_grad()
            output = self.forward(data)
            loss = criterion(output, target)
            roc = self.roc_score(output, target)
            # Detach so the running total does not keep the autograd graph alive
            epoch_loss += loss.detach()
            epoch_roc += roc
            loss.backward()
            
            if params['device'].type == 'xla':
                xm.optimizer_step(optimizer)
                if i % 20 == 0:
                    xm.master_print(f"\tBATCH {i+1}/{len(train_loader)} - LOSS: {loss}")
            else:
                optimizer.step()
                
        return epoch_loss / len(train_loader), epoch_roc / len(train_loader)
    
    def validate_one_epoch(self, valid_loader, criterion, params):
        valid_loss = 0.0
        valid_roc = 0.0
        self.model.eval()
        
        for data, target in valid_loader:
            data = data.to(params['device'], non_blocking=True)
            target = target.to(params['device'], non_blocking=True).float().view(-1, 1)
            
            with torch.no_grad():
                output = self.model(data)
                loss = criterion(output, target)
                roc = self.roc_score(output, target)
                valid_loss += loss
                valid_roc += roc

        return valid_loss / len(valid_loader), valid_roc / len(valid_loader)
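
One caveat on the reported ROC values: `train_one_epoch` and `validate_one_epoch` average the AUC of each batch, which is not the same as the AUC over the whole epoch, and a batch containing a single class falls back to 0.5 in `roc_score`. The toy numbers below are made up to show how far the two can differ.

```python
def pairwise_auc(targets, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

targets = [0, 1, 0, 1]
scores = [0.9, 0.95, 0.1, 0.2]

# Two batches of two samples each, as a data loader might produce them.
batches = [(targets[:2], scores[:2]), (targets[2:], scores[2:])]
batch_mean_auc = sum(pairwise_auc(t, s) for t, s in batches) / len(batches)
global_auc = pairwise_auc(targets, scores)

print(batch_mean_auc, global_auc)  # 1.0 0.75
```

Each batch is ranked perfectly on its own, yet the negative scored 0.9 outranks the positive scored 0.2 once all samples are compared together. For a number comparable to the leaderboard, it may be worth collecting all outputs and targets and scoring them once per epoch.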

In [None]:
def fit_tpu(model, params, criterion, optimizer, train_loader,
            valid_loader=None):

    valid_loss_min = np.inf

    train_losses = []
    valid_losses = []
    train_accs = []
    valid_accs = []

    for epoch in range(1, params['epochs'] + 1):
        gc.collect()
        para_train_loader = pl.ParallelLoader(train_loader, [params['device']])

        xm.master_print(f"{'='*50}")
        xm.master_print(f"EPOCH {epoch} - TRAINING...")
        train_loss, train_acc = model.train_one_epoch(
            para_train_loader.per_device_loader(params['device']),
            criterion, optimizer, params
        )
        xm.master_print(
            f"\n\t[TRAIN] EPOCH {epoch} - LOSS: {train_loss}, ROC: {train_acc}\n"
        )
        train_losses.append(train_loss)
        train_accs.append(train_acc)
        gc.collect()

        if valid_loader is not None:
            gc.collect()
            para_valid_loader = pl.ParallelLoader(valid_loader, [params['device']])
            xm.master_print(f"EPOCH {epoch} - VALIDATING...")
            valid_loss, valid_acc = model.validate_one_epoch(
                para_valid_loader.per_device_loader(params['device']),
                criterion, params
            )
            xm.master_print(f"\t[VALID] LOSS: {valid_loss}, ROC: {valid_acc}\n")
            valid_losses.append(valid_loss)
            valid_accs.append(valid_acc)
            gc.collect()

            if valid_loss <= valid_loss_min:
                # Skip the checkpoint on the first epoch, but still record its loss
                if epoch != 1:
                    xm.master_print(
                        "Validation loss decreased ({:.4f} --> {:.4f}).  Saving model ...".format(
                            valid_loss_min, valid_loss
                        )
                    )
                    xm.save(model.state_dict(), f"AlienNet_{params['model']}_best_epoch.pth")
                # Update the running minimum only when the loss actually improved
                valid_loss_min = valid_loss

    return {
        "train_loss": train_losses,
        "valid_loss": valid_losses,
        "train_acc": train_accs,
        "valid_acc": valid_accs,
    }
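
The "save on best validation loss" idea in `fit_tpu` boils down to tracking a running minimum. The stripped-down sketch below, with made-up loss values and a hypothetical helper name, returns the epochs at which a checkpoint would be written; unlike `fit_tpu`, it does not treat the first epoch specially.

```python
def epochs_to_checkpoint(valid_losses):
    """Return the 1-based epochs at which the validation loss reaches a new minimum."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best:
            best = loss  # the minimum only moves when the loss improves
            saved.append(epoch)
    return saved

saved_epochs = epochs_to_checkpoint([0.70, 0.65, 0.68, 0.66, 0.60])
print(saved_epochs)  # [1, 2, 5]
```

Epoch 4 is not saved even though it improves on epoch 3, because the comparison is against the best loss so far rather than the previous epoch.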

# Train

In [None]:
def _run():
    train_dataset = SETIDataset(
        images_filepaths=X_train.values,
        targets=y_train.values,
        transform=get_train_transforms(params['im_size'])
    )

    valid_dataset = SETIDataset(
        images_filepaths=X_valid.values,
        targets=y_valid.values,
        transform=get_valid_transforms(params['im_size'])
    )

    tpu_train_sampler = torch.utils.data.distributed.DistributedSampler(
        train_dataset,
        num_replicas = xm.xrt_world_size(),
        rank = xm.get_ordinal(),
        shuffle = True)

    tpu_valid_sampler = torch.utils.data.distributed.DistributedSampler(
        valid_dataset,
        num_replicas = xm.xrt_world_size(),
        rank = xm.get_ordinal(),
        shuffle = False)
    
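    # Sort by label so that class_counts[i] is the count of class i, whichever class is larger
    class_counts = y_train.value_counts().sort_index().to_list()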
    num_samples = sum(class_counts)
    labels = y_train.to_list()

    class_weights = [num_samples/class_counts[i] for i in range(len(class_counts))]
    weights = [class_weights[labels[i]] for i in range(int(num_samples))]
    balanced_sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))

    if params['device'] == 'TPU':
        train_loader = DataLoader(
            train_dataset, batch_size=params['batch_size'],
            sampler = tpu_train_sampler, num_workers=params['num_workers'],
            pin_memory=True, drop_last=True
        )

        val_loader = DataLoader(
            valid_dataset, batch_size=params['batch_size'],
            sampler = tpu_valid_sampler, num_workers=params['num_workers'],
            pin_memory=True
        )

    elif params['device'].type == 'cuda':
        train_loader = DataLoader(
            train_dataset, batch_size=params['batch_size'],
            sampler = balanced_sampler, num_workers=params['num_workers'],
            pin_memory=True
        )

        val_loader = DataLoader(
            valid_dataset, batch_size=params['batch_size'], shuffle=False,
            num_workers=params['num_workers'], pin_memory=True
        )
    else:
        train_loader = DataLoader(
            train_dataset, batch_size=params['batch_size'], shuffle=True,
            num_workers=params['num_workers'], pin_memory=True
        )

        val_loader = DataLoader(
            valid_dataset, batch_size=params['batch_size'], shuffle=False,
            num_workers=params['num_workers'], pin_memory=True
        )
    
    params['device'] = xm.xla_device()
    model = AlienNet()
    model = model.to(params['device'])
    criterion = nn.BCEWithLogitsLoss().to(params['device'])

    lr = params['lr'] * xm.xrt_world_size()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr,
                                 weight_decay=params['weight_decay'])

    xm.master_print(f"INITIALIZING TRAINING ON {xm.xrt_world_size()} TPU CORES")
    start_time = datetime.now()
    xm.master_print(f"Start Time: {start_time}")
    
    logs = fit_tpu(
        model=model,
        params=params,
        criterion=criterion,
        optimizer=optimizer,
        train_loader=train_loader,
        valid_loader=val_loader,
    )

    xm.master_print(f"Execution time: {datetime.now() - start_time}")

    xm.master_print("Saving Model")
    xm.save(model.state_dict(), f"AlienNet_{params['model']}_{datetime.now().strftime('%Y%m%d-%H%M')}.pth")
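
The per-sample weights passed to `WeightedRandomSampler` in `_run` follow an inverse-frequency rule. The sketch below reproduces that rule in plain Python, keyed by label rather than by list position; the helper name and the 9:1 toy labels are assumptions for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Give every sample the weight n_samples / count_of_its_class."""
    counts = Counter(labels)
    n_samples = len(labels)
    class_weight = {label: n_samples / count for label, count in counts.items()}
    return [class_weight[label] for label in labels]

labels = [0] * 9 + [1]  # toy imbalance: nine negatives, one positive
weights = inverse_frequency_weights(labels)

total_neg = sum(w for w, label in zip(weights, labels) if label == 0)
total_pos = sum(w for w, label in zip(weights, labels) if label == 1)
print(total_neg, total_pos)
```

Both classes end up with the same total weight, so a sampler that draws in proportion to these weights picks each class about half of the time.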

In [None]:
# Start training processes
def _mp_fn(rank, flags):
    torch.set_default_tensor_type("torch.FloatTensor")
    a = _run()

# _run()
FLAGS = {}
xmp.spawn(_mp_fn, args=(FLAGS,), nprocs=8, start_method="fork")
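
With `nprocs=8`, each process trains on its own slice of the data through `DistributedSampler`. The sketch below mimics, in simplified form and as we understand it, how the non-shuffled sampler assigns indices: the index list is padded by wrapping around so that it divides evenly, then each rank takes every eighth index. `shard_indices` is a hypothetical helper, not a PyTorch function.

```python
def shard_indices(dataset_len, num_replicas, rank):
    """Simplified view of how a non-shuffled distributed sampler slices a dataset."""
    per_replica = -(-dataset_len // num_replicas)  # ceiling division
    total = per_replica * num_replicas
    indices = list(range(dataset_len))
    indices += indices[: total - dataset_len]      # pad by repeating the first samples
    return indices[rank:total:num_replicas]

first = shard_indices(20, num_replicas=8, rank=0)
last = shard_indices(20, num_replicas=8, rank=7)
print(first, last)  # [0, 8, 16] [7, 15, 3]

# Every step processes one batch per core, which motivates scaling the learning rate in _run.
effective_batch_size = 32 * 8
print(effective_batch_size)  # 256
```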

This is a simple starter kernel implementing transfer learning with PyTorch for this problem. PyTorch has many SOTA image models which you can try out using the guidelines in this notebook.

I hope you have learnt something from this notebook. I created it as a baseline model that you can easily fork and play around with to get much better results. I might update parts of it down the line when I get more GPU hours and some interesting ideas.

**If you liked this notebook and use parts of it in your code, please show some support by upvoting this kernel. It keeps me inspired to come up with such starter kernels and share them with the community.**

Thanks and happy kaggling!