
---
# [Spaceship Titanic][1]

We are challenged to predict which passengers were transported by the anomaly using records recovered from the spaceship’s damaged computer system.

---
## The aim of this notebook is to implement the TabTransformer model from scratch with PyTorch.

---
TabTransformer is a deep neural network for tabular data modeling built upon the self-attention mechanism. The model architecture is as follows:

<img src="https://raw.githubusercontent.com/keras-team/keras-io/master/examples/structured_data/img/tabtransformer/tabtransformer.png" width="400"/>

- I explain and implement TabTransformer in <a href="#4.2">Chapter 4.2</a>.
- For a deeper understanding of TabTransformer, please refer to the original paper: [TabTransformer: Tabular Data Modeling Using Contextual Embeddings](https://arxiv.org/abs/2012.06678).

---
**References:** Thanks to these great previous notebooks and code.
- [🔥🔥[TensorFlow]TabTransformer🔥🔥][2]
- [Structured data learning with TabTransformer][3]

**My Previous Notebooks:**

- I have implemented TabTransformer from scratch with TensorFlow in [SpaceshipTitanic: EDA + TabTransformer[TensorFlow]][4].
- Please note that the EDA and Feature Engineering parts of this notebook are the same as in my previous notebooks below. If you have read them, you can <a href="#4">skip Chapter 3</a>.
 - [SpaceshipTitanic: EDA + TabTransformer[TensorFlow]][4]
 - [TabNet: DNN+DecisionTree [Library & fromScratch]][5]

---
### **If you find this notebook useful, or when you copy&edit this notebook, please do give me an upvote. It helps me keep up my motivation.**

---
[1]: https://www.kaggle.com/competitions/spaceship-titanic/overview
[2]: https://www.kaggle.com/code/usharengaraju/tensorflow-tabtransformer
[3]: https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/structured_data/ipynb/tabtransformer.ipynb
[4]: https://www.kaggle.com/code/masatomurakawamm/spaceshiptitanic-eda-tabtransformer-tensorflow
[5]: https://www.kaggle.com/code/masatomurakawamm/tabnet-dnn-decisiontree-library-fromscratch

<span id='toc'/>

<h1 style="background:#05445E; border:0; border-radius: 12px; color:#D3D3D3"><center>0. TABLE OF CONTENTS</center></h1>

<ul class="list-group" style="list-style-type:none;">
    <li><a href="#1" class="list-group-item list-group-item-action">1. Settings</a></li>
    <li><a href="#2" class="list-group-item list-group-item-action">2. Data Loading</a></li>
    <li><a href="#3" class="list-group-item list-group-item-action">3. EDA and Feature Engineering</a>
        <ul class="list-group" style="list-style-type:none;">
            <li><a href="#3.1" class="list-group-item list-group-item-action">3.1 Exploratory Data Analysis</a></li>
            <li><a href="#3.2" class="list-group-item list-group-item-action">3.2 Dataset</a></li>
        </ul>
    </li>
    <li><a href="#4" class="list-group-item list-group-item-action">4. Model</a>
        <ul class="list-group" style="list-style-type:none;">
            <li><a href="#4.1" class="list-group-item list-group-item-action">4.1 Preprocessing Model</a></li>
            <li><a href="#4.2" class="list-group-item list-group-item-action">4.2 TabTransformer</a></li>
        </ul>
    </li>
    <li><a href="#5" class="list-group-item list-group-item-action">5. Training</a></li>
    <li><a href="#6" class="list-group-item list-group-item-action">6. Prediction</a></li>
</ul>


<a id ="1"></a><h1 style="background:#05445E; border:0; border-radius: 12px; color:#D3D3D3"><center>1. Settings</center></h1>
[Back to the TOC](#toc)

In [None]:
## Parameters
data_config = {
    'train_csv_path': '../input/spaceship-titanic/train.csv',
    'test_csv_path': '../input/spaceship-titanic/test.csv',
    'sample_submission_path': '../input/spaceship-titanic/sample_submission.csv',
}

exp_config = {
    'n_bins': 10,
    'n_splits': 5,
    'batch_size': 512,
    'learning_rate': 2e-4,
    'weight_decay': 0.0001,
    'train_epochs': 15,
    'finalize': True,
    'finalize_epochs': 8,
}

model_config = {
    'cat_embedding_dim': 12,
    'num_transformer_blocks': 4,
    'num_heads': 3,
    'tf_dropout_rates': [0., 0., 0., 0.,],
    'ff_dropout_rates': [0., 0., 0., 0.,],
    'mlp_dropout_rates': [0.2, 0.1],
    'mlp_hidden_units_factors': [2, 1],
}

print('Parameters set!')

In [None]:
## Import dependencies
import numpy as np
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
import matplotlib.ticker as ticker
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import os, sys, pathlib, gc
import re, math, random, time
import datetime as dt
from tqdm import tqdm
from typing import Optional, Union, Tuple
from collections import OrderedDict

import sklearn
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OrdinalEncoder

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_addons as tfa

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import warnings
warnings.filterwarnings('ignore')

print('import done!')

In [None]:
## For reproducible results
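def seed_all(s):
    random.seed(s)
    np.random.seed(s)
    tf.random.set_seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed(s)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    ## cuBLAS needs this workspace setting for deterministic results on CUDA >= 10.2.
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
    ## use_deterministic_algorithms is a function and must be called;
    ## warn_only=True (PyTorch >= 1.11) warns instead of raising for ops without a deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    ## PYTHONHASHSEED only affects Python processes started after this point.
    os.environ['PYTHONHASHSEED'] = str(s)
    print('Seeds set!')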
global_seed = 42
seed_all(global_seed)

## Limit GPU memory in TensorFlow.
## By default, TensorFlow allocates all available GPU memory at startup, leaving little for PyTorch.
physical_devices = tf.config.list_physical_devices('GPU')
if len(physical_devices) > 0:
    for device in physical_devices:
        tf.config.experimental.set_memory_growth(device, True)
        print('{} memory growth: {}'.format(device, tf.config.experimental.get_memory_growth(device)))
else:
    print("Not enough GPU hardware devices available")

## For Seaborn Setting
custom_params = {
    "axes.spines.right": False,
    "axes.spines.top": False,
    'grid.alpha': 0.3,
    'figure.figsize': (16, 6),
    'axes.titlesize': 'Large',
    'axes.labelsize': 'Large',
    'figure.facecolor': '#fdfcf6',
    'axes.facecolor': '#fdfcf6',
}
cluster_colors = ['#b4d2b1', '#568f8b', '#1d4a60', '#cd7e59', '#ddb247', '#d15252']
sns.set_theme(
    style='whitegrid',
    #palette=sns.color_palette(cluster_colors),
    rc=custom_params,)

<a id ="2"></a><h1 style="background:#05445E; border:0; border-radius: 12px; color:#D3D3D3"><center>2. Data Loading</center></h1>
[Back to the TOC](#toc)

---
### [File and Data Field Descriptions](https://www.kaggle.com/competitions/spaceship-titanic/data)

- **train.csv** - Personal records for about two-thirds (~8700) of the passengers, to be used as training data.
 - `PassengerId` - A unique Id for each passenger. Each Id takes the form `gggg_pp` where `gggg` indicates a group the passenger is travelling with and `pp` is their number within the group. People in a group are often family members, but not always.
 - `HomePlanet` - The planet the passenger departed from, typically their planet of permanent residence.
 - `CryoSleep` - Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
 - `Cabin` - The cabin number where the passenger is staying. Takes the form `deck/num/side`, where `side` can be either `P` for *Port* or `S` for *Starboard*.
 - `Destination` - The planet the passenger will be debarking to.
 - `Age` - The age of the passenger.
 - `VIP` - Whether the passenger has paid for special VIP service during the voyage.
 - `RoomService`, `FoodCourt`, `ShoppingMall`, `Spa`, `VRDeck` - Amount the passenger has billed at each of the *Spaceship Titanic*'s many luxury amenities.
 - `Name` - The first and last names of the passenger.
 - `Transported` - Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.


- **test.csv** - Personal records for the remaining one-third (~4300) of the passengers, to be used as test data. Your task is to predict the value of `Transported` for the passengers in this set.


- **sample_submission.csv** - A submission file in the correct format.
 - `PassengerId` - Id for each passenger in the test set.
 - `Transported` - The target. For each passenger, predict either *True* or *False*.

---
### [Submission & Evaluation](https://www.kaggle.com/competitions/spaceship-titanic/overview/evaluation)

- Submissions are evaluated based on their classification accuracy, the percentage of predicted labels that are correct.
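The short sketch below shows how the metric works when computed from predicted probabilities. It assumes the model outputs one probability per passenger, as a sigmoid output layer does, and that predictions are thresholded at 0.5. `accuracy_from_probs` is a hypothetical helper and is not part of the competition tooling.

```python
import numpy as np

def accuracy_from_probs(probs, labels, threshold=0.5):
    # Hypothetical helper: turn probabilities into True/False labels, then count matches.
    preds = np.asarray(probs) >= threshold
    return float(np.mean(preds == np.asarray(labels, dtype=bool)))

probs = [0.9, 0.2, 0.6, 0.4]          # e.g. sigmoid outputs of a model
labels = [True, False, False, False]  # ground-truth 'Transported'
print(accuracy_from_probs(probs, labels))  # 0.75
```

Three of the four predictions match the labels, so the accuracy is 0.75.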

---

In [None]:
## Data Loading
train_df = pd.read_csv(data_config['train_csv_path'])
test_df = pd.read_csv(data_config['test_csv_path'])
submission_df = pd.read_csv(data_config['sample_submission_path'])

print(f'train_length: {len(train_df)}')
print(f'test_length: {len(test_df)}')
print(f'submission_length: {len(submission_df)}')

In [None]:
## Null Value Check
## DataFrame.info() prints its report directly and returns None, so it is called rather than printed.
print('train_df.info()'); train_df.info(); print()
print('test_df.info()'); test_df.info(); print()

## train_df Check
train_df.head()

---
There are some missing values.

---

<a id ="3"></a><h1 style="background:#05445E; border:0; border-radius: 12px; color:#D3D3D3"><center>3. EDA and Feature Engineering</center></h1>
[Back to the TOC](#toc)

<h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Data Preprocessing</center></h2>

In [None]:
## Feature Selection
numerical_columns = ['Age', 'RoomService', 'FoodCourt',
                     'ShoppingMall', 'Spa', 'VRDeck']
categorical_columns = ['PassengerId', 'HomePlanet', 'CryoSleep',
                       'Cabin', 'Destination', 'VIP', 'Name']
target = 'Transported'

## Number of unique values in each categorical feature.
categorical_n_unique = {cc: train_df[cc].nunique() \
                        for cc in categorical_columns}
categorical_n_unique

In [None]:
## Function for Data Preprocessing
def preprocess_df(dataframe):
    df = dataframe.copy()

    ## Drop 'Name'
    df = df.drop(['Name'], axis=1)

    ## Transform 'Transported' (bool) to 0 or 1.
    if 'Transported' in df.columns:
        df['Transported'] = df['Transported'].astype('int64')

    ## Transform True-False features (CryoSleep and VIP) to 'Yes' or 'No'.
    ## NaN is not in the mapping, so it stays NaN and becomes the string 'nan' after astype(str).
    for column in ['CryoSleep', 'VIP']:
        df[column] = df[column].map({True: 'Yes', False: 'No'}).astype(str)

    ## Transform the dtypes of HomePlanet and Destination to str
    df['HomePlanet'] = df['HomePlanet'].astype(str)
    df['Destination'] = df['Destination'].astype(str)

    return df

train = preprocess_df(train_df)
train.head()

**Caution: After `astype(str)`, null values (np.nan) are replaced by the string 'nan'.**

In [None]:
## Handle 'Cabin' feature
def cabin_split(dataframe):
    df = dataframe.copy()

    df['Cabin'] = df['Cabin'].astype(str)
    cabins = df['Cabin'].str.split('/', expand=True)
    cabins.columns = ['Cabin_0', 'Cabin_1', 'Cabin_2']

    df = pd.concat([df, cabins], axis=1)
    df = df.drop(['Cabin'], axis=1)
    df['Cabin_0'] = df['Cabin_0'].astype(str)
    df['Cabin_1'] = pd.to_numeric(df['Cabin_1'], errors='coerce')
    ## Missing cabins ('nan') yield None for the second and third parts; mark them as 'nan' as well.
    df['Cabin_2'] = df['Cabin_2'].fillna('nan').astype(str)

    return df

train = cabin_split(train)
train.head()

<a id ="3.1"></a><h2 style="background:#75E6DA; border:0; border-radius: 12px; color:black"><center>3.1 Exploratory Data Analysis</center></h2>
[Back to the TOC](#toc)

<a id ="3.1.1"></a><h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Target Distribution </center></h2>

In [None]:
## Count positive and negative 'Transported'
train_pos = train.query('Transported==1').reset_index(drop=True)
train_neg = train.query('Transported==0').reset_index(drop=True)
print(f'positive samples: {len(train_pos)}, negative samples: {len(train_neg)}')

In [None]:
## Target Distribution
target_count = train.groupby(['Transported'])['PassengerId'].count()
target_percent = target_count / target_count.sum()

fig = go.Figure()
data = go.Bar(x=target_count.index.astype(str).values,
              y=target_count.values)
fig.add_trace(data)
fig.update_layout(title = dict(text="Target distribution"),
                  xaxis = dict(title="'Transported' values"),
                  yaxis = dict(title='counts'))
fig.show()

<h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Numerical Features </center></h2>

In [None]:
## Statistics of Numerical Features
train.describe().T.style.bar(subset=['mean'],)\
                        .background_gradient(subset=['std'], cmap='coolwarm')\
                        .background_gradient(subset=['50%'], cmap='coolwarm')

In [None]:
## Statistics based on 'Transported' (pos or neg)
train.groupby('Transported').describe().T

In [None]:
## Values at 90, 95, 98, 99, 100 % quantiles.
quantiles = [0.9, 0.95, 0.98, 0.99, 1]
train_quantile_values = train[['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']].quantile(quantiles)
train_quantile_values

---
#### There seem to be outliers in the spending features...

---
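Clipping caps each spending value at a threshold computed on the training data. The same thresholds are reused later for the test set through `clipping_quantile(test, quantile_values=...)`. The numpy sketch below uses made-up toy values, and `clip_upper` is a hypothetical helper with the same idea as `clipping_quantile`.

```python
import numpy as np

def clip_upper(values, threshold):
    # Hypothetical helper: cap values at the threshold.
    # NaN stays NaN because np.minimum propagates it.
    return np.minimum(values, threshold)

train_spend = np.array([0., 0., 10., 20., 30., 40., 50., 60., 70., 10000.])
threshold = np.nanquantile(train_spend, 0.99)  # computed on training data only
print(round(threshold, 1))                     # 9106.3

test_spend = np.array([5., 20000., np.nan])
clipped = clip_upper(test_spend, threshold)    # large test values are capped at the train threshold
print(clipped)
```

Computing the threshold once on the training data, rather than separately on each split, keeps information from the test set out of the preprocessing.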

In [None]:
## Clipping outliers on 99% quantile
def clipping_quantile(dataframe, quantile_values=None, quantile=0.99):
    df = dataframe.copy()
    if quantile_values is None:
        quantile_values = df[['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']].quantile(quantile)

    for num_column in ['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']:
        num_values = df[num_column].values
        threshold = quantile_values[num_column]
        num_values = np.where(num_values > threshold, threshold, num_values)
        df[num_column] = num_values
    return df

train = clipping_quantile(train, quantile_values=None, quantile=0.99)

## Statistics after clipping outliers
train.describe().T.style.bar(subset=['mean'],)\
                        .background_gradient(subset=['std'], cmap='coolwarm')\
                        .background_gradient(subset=['50%'], cmap='coolwarm')

In [None]:
## Statistics based on 'Transported' (after Clipping Outliers)
train.groupby('Transported').describe().T

In [None]:
## Distribution of Numerical Features after Clipping Outliers
n_cols = 2
n_rows = int(np.ceil(len(numerical_columns) / n_cols))

fig, axes = plt.subplots(nrows=n_rows,ncols=n_cols,figsize=(20,15))

bins = 50
for i, column in enumerate(numerical_columns):
    q, mod = divmod(i, n_cols)
    sns.histplot(x=column, data=train,
                 hue='Transported', ax=axes[q][mod],
                 bins=bins, stat="percent",
                 kde=True, legend=True)
    axes[q][mod].set_title(f'Distribution of {numerical_columns[i]}',size=15)

fig.suptitle('Blue: Transported=0, Red: Transported=1', fontsize=20)
fig.tight_layout()
plt.show()

In [None]:
## Heat Map of Correlation Matrix
fig = px.imshow(
    train.corr(numeric_only=True),
    color_continuous_scale='RdBu_r',
    color_continuous_midpoint=0,
    aspect='auto'
)

fig.update_layout(
    height=500,
    width=500,
    title="Heatmap",
    showlegend=False
)

fig.show()

### Binning Numerical Features

---
Binning Method
- `Age`: 0 to 100 at intervals of 5 (20 bins).
- Other numerical features: split into `n_bins` = 10 bins.
 - 1. The value 0 gets its own first bin, (-1, 0].
 - 2. Compute the quantiles at [ 0, 0.9, 0.95, 0.99, 1 ].
 - 3. Split the range between the 0 and 0.9 quantiles into 6 equal-width bins.
 - 4. Use the 0.95, 0.99 and 1 quantiles as the remaining bin boundaries.

---
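The edge construction above can be sketched with numpy. `spending_bin_edges` and `assign_bins` are hypothetical helpers that mirror the non-Age branch of `bin_split()` on toy data. Like the spending columns, the toy data is assumed to have a minimum value of 0. For values inside the edges, `np.digitize(..., right=True) - 1` follows the same interval convention as `pd.cut(..., labels=False, right=True)`.

```python
import numpy as np

def spending_bin_edges(x, n_bins=10):
    # Hypothetical mirror of bin_split() for non-Age columns (assumes min(x) == 0).
    q0, q90, q95, q99, q100 = np.quantile(x, [0, 0.9, 0.95, 0.99, 1])
    step = (q90 - q0) / (n_bins - 4)
    edges = [-1.0]                                  # (-1, 0] holds the zeros
    edges += [i * step for i in range(n_bins - 4)]  # 0, step, ..., 5*step
    edges += [q90, q95, q99, q100 + 1]              # +1 keeps the maximum inside
    return np.array(edges)

def assign_bins(x, edges, right=True):
    # Returns k for edges[k] < x <= edges[k+1] (when right=True);
    # -1 or len(edges)-1 means the value lies outside the edges.
    return np.digitize(x, edges, right=right) - 1

spend = np.concatenate([np.zeros(50), np.arange(1, 51)])  # toy data: half zeros
edges = spending_bin_edges(spend)
labels = assign_bins(spend, edges)
print(len(edges) - 1, labels.min(), labels.max())  # 10 0 9

# Age uses fixed edges 0, 5, ..., 100 with right-closed intervals
age_edges = np.arange(0, 105, 5)
print(assign_bins(np.array([0, 3, 5, 6]), age_edges))  # [-1  0  0  1]
```

The Age intervals are right-closed and start at 0, so a value of exactly 0 lies outside the first interval (0, 5]. `pd.cut` returns NaN for such values unless it is called with `include_lowest=True`.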

In [None]:
## Helper Functions
def bin_split(dataframe, column, n_bins, thresholds=None):
    if thresholds is None:
        if column == 'Age':
            bins = np.array([i*5 for i in range(21)])
        else:
            bins = np.array([-1, ])
            x = dataframe[column]
            x_quantiles = x.quantile([0, 0.9, 0.95, 0.99, 1])
            bins = np.append(bins, [i * ((x_quantiles.iloc[1] - x_quantiles.iloc[0]) / (n_bins-4)) for i in range(n_bins-4)])
            bins = np.append(bins, [x_quantiles.iloc[1], x_quantiles.iloc[2], x_quantiles.iloc[3], x_quantiles.iloc[4]+1])
    else:
        bins = thresholds[column]

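    ## include_lowest=True keeps values equal to the first edge (e.g. Age == 0) in the first bin instead of turning them into NaN.
    splits = pd.cut(dataframe[column], bins=bins, labels=False, right=True, include_lowest=True)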
    return splits, bins

def binning(dataframe, numerical_columns, n_bins, thresholds=None):
    df = dataframe.copy()
    df_split_bins = {}
    for num_column in numerical_columns:
        splits, bins = bin_split(df, num_column, n_bins, thresholds)
        df[num_column] = splits
        df_split_bins[num_column] = bins
    return df, df_split_bins

n_bins = exp_config['n_bins']
train, train_split_bins = binning(train, numerical_columns, n_bins, thresholds=None)

for key in train_split_bins:
    print(f'{key} bins: \n{train_split_bins[key]}\n\n')

In [None]:
## Distribution of Numerical Features after Binning
n_cols = 2
n_rows = int(np.ceil(len(numerical_columns) / n_cols))

fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(20,15))

bins = 50
for i, column in enumerate(numerical_columns):
    q, mod = divmod(i, n_cols)
    sns.histplot(
        x=column,
        data=train,
        hue='Transported',
        ax=axes[q][mod],
        bins=bins,
        stat="percent",
        legend=True
    )
    axes[q][mod].set_title(f'Distribution of {numerical_columns[i]}',size=15)

fig.suptitle('Blue: Transported=0, Red: Transported=1', fontsize=20)
fig.tight_layout()
plt.show()

<a id ="3.1.3"></a><h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Categorical Features</center></h2>

In [None]:
## Distribution of Categorical Features
categorical_columns = ['HomePlanet', 'CryoSleep',
                       'Destination', 'VIP']

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=categorical_columns,
    shared_yaxes='all'
)

for i in range(2):
    for j in range(2):
        n = i*2 + j
        data0 = go.Histogram(
            x=train_neg[categorical_columns[n]],
            marker = dict(color='#0000FF'), ## Blue
            name='Transported=0'
        )
        data1 = go.Histogram(
            x=train_pos[categorical_columns[n]],
            marker = dict(color='#FF0000'), ## Red
            name='Transported=1'
        )

        fig.add_trace(data0, row=i+1, col=j+1)
        fig.add_trace(data1, row=i+1, col=j+1)

        fig.update_traces(opacity=0.75, histnorm='probability')
        #fig.update_layout(barmode='overlay')

fig.update_layout(title = dict(text='Blue: Transported=0, Red: Transported=1'),
                  showlegend=False,)
fig.update_yaxes(title='probability', row=1, col=1)
fig.update_yaxes(title='probability', row=2, col=1)
fig.show()

<h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Cabin Features</center></h2>

In [None]:
## 'Cabin_0'
sns.countplot(x='Cabin_0', data=train, hue='Transported')

In [None]:
## 'Cabin_1'
sns.histplot(x='Cabin_1', data=train, hue='Transported', kde=True)

In [None]:
## 'Cabin_2'
sns.countplot(x='Cabin_2', data=train, hue='Transported')

### Binning 'Cabin_1'

In [None]:
## Histogram of 'Cabin_1' by Plotly (interactive)
fig = go.Figure()

data0 = go.Histogram(
    x=train_neg['Cabin_1'],
    marker = dict(color='#0000FF'), # Blue
    opacity=0.6,
    name='Transported=0'
)
data1 = go.Histogram(
    x=train_pos['Cabin_1'],
    marker = dict(color='#FF0000'), # Red
    opacity=0.6,
    name='Transported=1'
)

fig.add_trace(data0)
fig.add_trace(data1)

fig.update_layout(
    xaxis = dict(title='Cabin_1'),
    yaxis = dict(title='Count')
)
fig.update_layout(barmode='overlay')

fig.show()

In [None]:
## Binning 'Cabin_1' based on the above graph
cabin_1_bins = np.array([0, 300, 600, 1150, 1500, 1700, 2000])
train['Cabin_1'] = pd.cut(train['Cabin_1'], bins=cabin_1_bins, labels=False, right=False)

## Distribution of 'Cabin_1' after Binning
sns.countplot(x='Cabin_1', data=train, hue='Transported')

<a id ="3.2"></a><h2 style="background:#75E6DA; border:0; border-radius: 12px; color:black"><center>3.2 Dataset </center></h2>
[Back to the TOC](#toc)

<h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Data Processing</center></h2>

In [None]:
numerical_columns_0 = ['Age', 'RoomService', 'FoodCourt',
                     'ShoppingMall', 'Spa', 'VRDeck']
numerical_columns_1 = ['Age', 'RoomService', 'FoodCourt',
                     'ShoppingMall', 'Spa', 'VRDeck', 'Cabin_1']
categorical_columns_0 = ['PassengerId', 'HomePlanet', 'CryoSleep',
                       'Cabin', 'Destination', 'VIP', 'Name']
categorical_columns_1 = ['PassengerId', 'HomePlanet', 'CryoSleep',
                       'Cabin', 'Destination', 'VIP', 'Name',
                       'Cabin_0', 'Cabin_2']

In [None]:
## Before filling null values, convert the string 'nan' (produced by astype(str) in preprocess_df() and cabin_split()) back to np.nan.
for column in ['CryoSleep', 'VIP', 'HomePlanet', 'Destination', 'Cabin_0', 'Cabin_2']:
    train[column] = train[column].map(lambda x: np.nan if x=='nan' else x)

## Filling null values with mode
train = train.fillna(train.mode().iloc[0])

for numerical in numerical_columns_1:
    train[numerical] = train[numerical].astype('int64')

train.info()

In [None]:
## Test Data Processing
test = preprocess_df(test_df)
test = cabin_split(test)

test = clipping_quantile(test, quantile_values=train_quantile_values.loc[0.99])
test, _ = binning(test, numerical_columns_0, n_bins, thresholds=train_split_bins)
test['Cabin_1'] = pd.cut(test['Cabin_1'], bins=cabin_1_bins, labels=False, right=False)

for column in ['CryoSleep', 'VIP', 'HomePlanet', 'Destination', 'Cabin_0', 'Cabin_2']:
    test[column] = test[column].map(lambda x: np.nan if x=='nan' else x)

test = test.fillna(train.mode().iloc[0])

for numerical in numerical_columns_1:
    test[numerical] = test[numerical].astype('int64')

test.info()

<h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Validation Split</center></h2>

In [None]:
## Split train samples for cross-validation
n_splits = exp_config['n_splits']
skf = StratifiedKFold(n_splits=n_splits)
train['k_folds'] = -1
for fold, (train_idx, valid_idx) in enumerate(
    skf.split(X=train, y=train['Transported'])
):
    ## Use .loc instead of chained indexing so the assignment writes back to train.
    train.loc[valid_idx, 'k_folds'] = fold

## Check split samples
for i in range(n_splits):
    print(f"fold {i}: {len(train.query('k_folds==@i'))} samples")

In [None]:
## Hold-out validation
valid_fold = train.query(f'k_folds == 0').reset_index(drop=True)
train_fold = train.query(f'k_folds != 0').reset_index(drop=True)
print(len(train_fold), len(valid_fold))

<h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Dataset and DataLoader</center></h2>

In [None]:
## After binning, all features are categorical.
numerical_columns = []
categorical_columns = ['Age', 'RoomService', 'FoodCourt',
                       'ShoppingMall', 'Spa', 'VRDeck',
                       'HomePlanet', 'CryoSleep',
                       'Destination', 'VIP',
                       'Cabin_0', 'Cabin_1', 'Cabin_2']

## Build a lookup table (category -> integer code) for the categorical features
## using sklearn.preprocessing.OrdinalEncoder, fitted on the training fold only.
oe = OrdinalEncoder(handle_unknown='error',
                    dtype=np.int64)

encoded = oe.fit_transform(train_fold[categorical_columns].values)
#decoded = oe.inverse_transform(encoded)
train_fold[categorical_columns] = encoded

valid_fold[categorical_columns] = oe.transform(valid_fold[categorical_columns].values)
train[categorical_columns] = oe.transform(train[categorical_columns].values)
test[categorical_columns] = oe.transform(test[categorical_columns].values)

encoder_categories = oe.categories_
encoder_categories

In [None]:
## Dataset
class SpaceshipDataset(torch.utils.data.Dataset):
    def __init__(self, df, numerical_columns,
                 categorical_columns, target=None):
        self.df = df
        self.numerical_columns = numerical_columns
        self.categorical_columns = categorical_columns
        self.target = target

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        data = {}

        for nc in self.numerical_columns:
            x = torch.tensor(self.df[nc][index],
                             dtype=torch.float32)
            x = torch.unsqueeze(x, dim=0)
            data[nc] = x

        for cc in self.categorical_columns:
            x = torch.tensor(self.df[cc][index],
                             dtype=torch.int32)
            x = torch.unsqueeze(x, dim=0)
            data[cc] = x

        if self.target is not None:
            label = torch.tensor(self.df[self.target][index],
                                 dtype=torch.float32)
            label = torch.unsqueeze(label, dim=-1)
            return data, label
        else:
            return data

In [None]:
## Create Datasets
train_ds = SpaceshipDataset(
    train_fold,
    numerical_columns,
    categorical_columns,
    target='Transported'
)

val_ds = SpaceshipDataset(
    valid_fold,
    numerical_columns,
    categorical_columns,
    target='Transported'
)

test_ds = SpaceshipDataset(
    test,
    numerical_columns,
    categorical_columns,
    target=None
)

## Operation Check
index = 0
print(train_ds.__getitem__(index))

In [None]:
## Create DataLoaders
batch_size = exp_config['batch_size']

train_dl = torch.utils.data.DataLoader(
    train_ds,
    batch_size=batch_size,
    shuffle=True
)
val_dl = torch.utils.data.DataLoader(
    val_ds,
    batch_size=batch_size,
    shuffle=False
)
test_dl = torch.utils.data.DataLoader(
    test_ds,
    batch_size=batch_size,
    shuffle=False,
    drop_last=False
)

dl_dict = {'train': train_dl, 'val': val_dl}

## Operation Check
sample_data, sample_label = next(iter(dl_dict['train']))
input_dtypes = {}
for key in sample_data:
    input_dtypes[key] = sample_data[key].dtype
    print(f'{key}, shape:{sample_data[key].shape}, dtype:{sample_data[key].dtype}')

print('Label shape: ', sample_label.shape)

<a id ="4"></a><h1 style="background:#05445E; border:0; border-radius: 12px; color:#D3D3D3"><center>4. Model</center></h1>

<a id ="4.1"></a><h2 style="background:#75E6DA; border:0; border-radius: 12px; color:black"><center>4.1 Preprocessing Model</center></h2>
[Back to the TOC](#toc)

This preprocessing model receives input data from the dataset and handles numerical and categorical features separately. Numerical features are gathered into a single tensor. Categorical features are transformed into uniformly shaped tensors by learnable embedding layers, one `nn.Embedding` per column.
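The numpy sketch below makes the shapes of the categorical branch concrete. Each column owns an embedding table of shape `(n_categories, emb_dim)`, and the integer codes from `OrdinalEncoder` index into that table. The category counts, the batch and the random tables are hypothetical. In the model, the tables are the learnable `nn.Embedding` weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 categorical columns with these category counts
# (in the notebook the counts come from OrdinalEncoder.categories_).
n_categories = [3, 2, 10]
emb_dim = 4

# One table per column, shape (n_categories, emb_dim), like an nn.Embedding weight.
tables = [rng.normal(size=(n, emb_dim)) for n in n_categories]

# A batch of 5 rows of ordinal codes, one column per categorical feature.
codes = np.array([[0, 1, 9],
                  [2, 0, 3],
                  [1, 1, 0],
                  [0, 0, 5],
                  [2, 1, 7]])

# Look up each column's codes in its own table, then stack along a column axis:
# (batch, n_columns, emb_dim), the layout that is fed to the Transformer blocks.
embedded = np.stack([tables[j][codes[:, j]] for j in range(len(tables))], axis=1)
print(embedded.shape)  # (5, 3, 4)
```

`Preprocessor.forward` produces the same `(batch, n_columns, emb_dim)` layout by concatenating the per-column `(batch, 1, emb_dim)` embeddings along `dim=1`.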

In [None]:
class Preprocessor(nn.Module):
    def __init__(self, numerical_columns, categorical_columns, encoder_categories, emb_dim):
        super().__init__()
        self.numerical_columns = numerical_columns
        self.categorical_columns = categorical_columns
        self.encoder_categories = encoder_categories
        self.emb_dim = emb_dim
        self.embed_layers = nn.ModuleDict()

        for i, categorical in enumerate(categorical_columns):
            embedding = nn.Embedding(
                num_embeddings=len(self.encoder_categories[i]),
                embedding_dim=self.emb_dim,
            )
            self.embed_layers[categorical] = embedding

    def forward(self, x):
        x_nums = []
        for numerical in self.numerical_columns:
            x_num = torch.unsqueeze(x[numerical], dim=1)
            x_nums.append(x_num)
        if len(x_nums) > 0:
            x_nums = torch.cat(x_nums, dim=1)
        else:
            x_nums = torch.tensor(x_nums, dtype=torch.float32)

        x_cats = []
        for categorical in self.categorical_columns:
            x_cat = self.embed_layers[categorical](x[categorical])
            x_cats.append(x_cat)
        if len(x_cats) > 0:
            x_cats = torch.cat(x_cats, dim=1)
        else:
            x_cats = torch.tensor(x_cats, dtype=torch.float32)

        return x_nums, x_cats

## Operation Check
preprocessor = Preprocessor(numerical_columns,
                            categorical_columns,
                            encoder_categories,
                            emb_dim=3)
x_nums, x_cats = preprocessor(sample_data)
x_nums.shape, x_cats.shape

<a id ="4.2"></a><h2 style="background:#75E6DA; border:0; border-radius: 12px; color:black"><center>4.2 TabTransformer</center></h2>
[Back to the TOC](#toc)

### Tab Transformer

Again, the architecture of TabTransformer is as follows:

<img src="https://raw.githubusercontent.com/keras-team/keras-io/master/examples/structured_data/img/tabtransformer/tabtransformer.png" width="400"/>

The TabTransformer architecture comprises a column embedding layer, a stack of $N$ Transformer layers, and a multi-layer perceptron (MLP).

- **Column embedding layer:** All the categorical features are encoded into parametric embeddings of the same dimension. This means that each value of each categorical feature has its own learnable embedding vector.

- **Transformer layer:** The embedded categorical features are fed into a stack of Transformer blocks. Each Transformer block consists of a multi-head self-attention layer followed by a position-wise feed-forward layer. Parametric embeddings are transformed into contextual embeddings through the Transformer blocks.

 - **Self-attention layer:** A self-attention layer comprises three parametric matrices: Key, Query and Value. Each input embedding is projected onto these matrices to generate its key ($K \in \mathbb{R}^{m \times k}$), query ($Q \in \mathbb{R}^{m \times k}$) and value ($V \in \mathbb{R}^{m \times v}$) vectors, where $m$ is the number of categorical features.

 $$
 \text{Attention}(Q, K, V) = \text{softmax}(\frac{QK^T}{\sqrt{k}})V
 $$

 - **Feed-forward layer:** Through two position-wise feed-forward layers, each embedding is expanded to four times its size and then projected back to its original size. With a ReLU activation, the feed-forward network is defined as follows (the implementation below uses GELU instead):

 $$
 \text{FFN}(x) = \text{max}(0, \: x W_1 + b_1) W_2 + b_2
 $$

- **MLP:** The outputs of the last Transformer layer, which are contextualized embeddings of the categorical features, are concatenated with the numerical input features to form a final feature vector. This vector is fed into an MLP to predict the target.
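The numpy sketch below traces the formulas above for a single passenger. It covers scaled dot-product attention, the ReLU feed-forward network, and the flattening and concatenation that produce the MLP input. It is a simplification under stated assumptions. It uses one attention head with hypothetical random projection matrices. It omits the multiple heads, residual connections, layer normalization and dropout that `TabTransformerBlock` below adds through `nn.MultiheadAttention` and `nn.LayerNorm`.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(k)) V, where k is the key/query dimension
    k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(k))  # (m, m); each row sums to 1
    return weights @ V, weights

def ffn(x, W1, b1, W2, b2):
    # max(0, x W1 + b1) W2 + b2, applied to each of the m embeddings independently
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
m, d = 13, 12                # 13 binned categorical columns, cat_embedding_dim = 12
E = rng.normal(size=(m, d))  # column embeddings of ONE passenger

# Single attention head with hypothetical random projection matrices
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
attn_out, weights = attention(E @ W_q, E @ W_k, E @ W_v)

# The FFN expands each embedding to 4*d and projects it back to d
W1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)
contextual = ffn(attn_out, W1, b1, W2, b2)

# Flatten the contextual embeddings and append the numerical features
x_num = np.zeros(0)          # this notebook has no numerical features after binning
mlp_input = np.concatenate([contextual.reshape(-1), x_num])
print(mlp_input.shape)       # (156,)
```

With the notebook's settings (13 categorical columns, `cat_embedding_dim=12`, no numerical columns), `n_features` in `TabTransformer` is 13 × 12 = 156. The setting `mlp_hidden_units_factors=[2, 1]` therefore gives MLP hidden layers of 312 and 156 units.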

In [None]:
class MLPBlock(nn.Module):
    def __init__(self, n_features, hidden_units,
                 dropout_rates):
        super().__init__()
        self.mlp_layers = nn.Sequential()
        num_features = n_features
        for i, units in enumerate(hidden_units):
            self.mlp_layers.add_module(f'norm_{i}', nn.BatchNorm1d(num_features))
            self.mlp_layers.add_module(f'dense_{i}', nn.Linear(num_features, units))
            self.mlp_layers.add_module(f'act_{i}', nn.SELU())
            self.mlp_layers.add_module(f'dropout_{i}', nn.Dropout(dropout_rates[i]))
            num_features = units

    def forward(self, x):
        y = self.mlp_layers(x)
        return y

In [None]:
class TabTransformerBlock(nn.Module):
    def __init__(self, num_heads, emb_dim,
                 attn_dropout_rate, ff_dropout_rate):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, num_heads,
                                          dropout=attn_dropout_rate,
                                          batch_first=True)
        self.norm_1 = nn.LayerNorm(emb_dim)
        self.norm_2 = nn.LayerNorm(emb_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(emb_dim, emb_dim*4),
            nn.GELU(),
            nn.Dropout(ff_dropout_rate),
            nn.Linear(emb_dim*4, emb_dim))

    def forward(self, x_cat):
        attn_output, attn_output_weights = self.attn(x_cat, x_cat, x_cat)
        x_skip_1 = x_cat + attn_output
        x_skip_1 = self.norm_1(x_skip_1)
        feedforward_output = self.feedforward(x_skip_1)
        x_skip_2 = x_skip_1 + feedforward_output
        x_skip_2 = self.norm_2(x_skip_2)
        return x_skip_2

In [None]:
class TabTransformer(nn.Module):
    def __init__(self, numerical_columns, categorical_columns,
                 num_transformer_blocks, num_heads, emb_dim,
                 attn_dropout_rates, ff_dropout_rates,
                 mlp_dropout_rates,
                 mlp_hidden_units_factors,
                 ):
        super().__init__()
        self.transformers = nn.Sequential()
        for i in range(num_transformer_blocks):
            self.transformers.add_module(f'transformer_{i}',
                                        TabTransformerBlock(num_heads,
                                                            emb_dim,
                                                            attn_dropout_rates[i],
                                                            ff_dropout_rates[i]))

        self.flatten = nn.Flatten()
        self.num_norm = nn.LayerNorm(len(numerical_columns))

        self.n_features = (len(categorical_columns) * emb_dim) + len(numerical_columns)
        mlp_hidden_units = [int(factor * self.n_features) \
                            for factor in mlp_hidden_units_factors]
        self.mlp = MLPBlock(self.n_features, mlp_hidden_units,
                            mlp_dropout_rates)

        self.final_dense = nn.Linear(mlp_hidden_units[-1], 1)
        self.final_sigmoid = nn.Sigmoid()

    def forward(self, x_nums, x_cats):
        contextualized_x_cats = self.transformers(x_cats)
        contextualized_x_cats = self.flatten(contextualized_x_cats)

        if x_nums.shape[-1] > 0:
            x_nums = self.num_norm(x_nums)
            features = torch.cat((x_nums, contextualized_x_cats), -1)
        else:
            features = contextualized_x_cats

        mlp_output = self.mlp(features)
        model_output = self.final_dense(mlp_output)
        output = self.final_sigmoid(model_output)
        return output
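
The width of the vector fed into the MLP follows directly from the constructor: each categorical column contributes `emb_dim` values after flattening, each numerical column contributes one, and each hidden layer size is `int(factor * n_features)`. A small arithmetic sketch; `mlp_sizes` is a hypothetical helper and the column counts and factors are made up, not the notebook's actual `model_config`:

```python
def mlp_sizes(n_numerical, n_categorical, emb_dim, hidden_units_factors):
    # Flattened contextual embeddings (n_categorical * emb_dim) + raw numerical features
    n_features = n_categorical * emb_dim + n_numerical
    # Same rule as TabTransformer.__init__: int(factor * n_features) per hidden layer
    hidden_units = [int(f * n_features) for f in hidden_units_factors]
    return n_features, hidden_units

# Hypothetical setting: 6 numerical columns, 5 categorical columns, emb_dim=32
n_features, hidden = mlp_sizes(6, 5, 32, [2, 1])
print(n_features, hidden)  # 166 [332, 166]
```

The final `nn.Linear(mlp_hidden_units[-1], 1)` then maps the last hidden width (166 in this example) to a single logit before the sigmoid.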

In [None]:
## TabTransformer Model Check

## Settings for TabTransformer
emb_dim = model_config['cat_embedding_dim']
num_transformer_blocks = model_config['num_transformer_blocks']
num_heads = model_config['num_heads']
attn_dropout_rates = model_config['tf_dropout_rates']
ff_dropout_rates = model_config['ff_dropout_rates']
mlp_dropout_rates = model_config['mlp_dropout_rates']
mlp_hidden_units_factors = model_config['mlp_hidden_units_factors']

## Building Models
preprocessor = Preprocessor(numerical_columns, categorical_columns,
                            encoder_categories, emb_dim)

model = TabTransformer(numerical_columns, categorical_columns,
                       num_transformer_blocks, num_heads, emb_dim,
                       attn_dropout_rates, ff_dropout_rates,
                       mlp_dropout_rates, mlp_hidden_units_factors)

## Operation, Parameters and Model Structure Check
x_nums, x_cats = preprocessor(sample_data)
y = model(x_nums, x_cats)
print('Numerical Input shape: ', x_nums.shape)
print('Categorical Input shape: ', x_cats.shape)
print('Output shape: ', y.shape)

print('# of Preprocessor parameters: ',
      sum(p.numel() for p in preprocessor.parameters() if p.requires_grad))
print('# of TabTransformer parameters: ',
      sum(p.numel() for p in model.parameters() if p.requires_grad))

model
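
As a sanity check on the printed parameter count, the trainable parameters of one `TabTransformerBlock` can be derived by hand, assuming the PyTorch defaults used above (biases enabled in `nn.MultiheadAttention` and `nn.Linear`, affine `nn.LayerNorm`). The count does not depend on `num_heads`, because the heads split `emb_dim` rather than adding weights. A sketch of that arithmetic (`transformer_block_params` is a hypothetical helper):

```python
def transformer_block_params(emb_dim):
    E = emb_dim
    # nn.MultiheadAttention: packed Q/K/V projection (3E*E + 3E) + output projection (E*E + E)
    attn = 4 * E * E + 4 * E
    # Two nn.LayerNorm(E) layers, each with a weight and a bias of size E
    norms = 2 * 2 * E
    # Feed-forward: Linear(E, 4E) + Linear(4E, E)
    ff = (E * 4 * E + 4 * E) + (4 * E * E + E)
    return attn + norms + ff  # = 12*E^2 + 13*E

print(transformer_block_params(32))  # 12704
```

In the notebook this should agree with `sum(p.numel() for p in TabTransformerBlock(num_heads, emb_dim, 0.0, 0.0).parameters())`, and the model total is this value times `num_transformer_blocks` plus the MLP and final layer.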

<a id ="5"></a><h1 style="background:#05445E; border:0; border-radius: 12px; color:#D3D3D3"><center>5. Training</center></h1>
[Back to the TOC](#toc)

In [None]:
## Loss Function
criterion = nn.BCELoss()

## Optimizer and Learning Rate Scheduler
epochs = exp_config['train_epochs']
batch_size = exp_config['batch_size']
steps_per_epoch = len(train_fold) // batch_size

learning_rate = exp_config['learning_rate']
weight_decay = exp_config['weight_decay']
params = list(preprocessor.parameters()) + list(model.parameters())
optimizer = torch.optim.AdamW(
    params=params,
    lr=learning_rate,
    weight_decay=weight_decay
)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer=optimizer,
    T_max=epochs*steps_per_epoch
)
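
`CosineAnnealingLR` with its default `eta_min=0` decays the learning rate along half a cosine over `T_max` scheduler steps; since the scheduler is stepped once per batch here, `T_max` is `epochs * steps_per_epoch`. A pure-Python sketch of the closed form given in the PyTorch documentation, with a hypothetical base rate and step count (`cosine_lr` is an illustrative helper):

```python
import math

def cosine_lr(step, base_lr, t_max, eta_min=0.0):
    # Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

base_lr, t_max = 1e-3, 100  # hypothetical values
for step in (0, 25, 50, 75, 100):
    print(step, round(cosine_lr(step, base_lr, t_max), 6))
```

Halfway through (`step = t_max / 2`) the rate is exactly half of `base_lr`, which is a quick way to eyeball the curve produced by `lr_plot`.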

In [None]:
## Displaying Learning Rate
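def lr_plot(lr_scheduler, steps):
    ## Note: this advances the optimizer and scheduler in place,
    ## so both must be re-created before training
    optimizer = lr_scheduler.optimizer
    lrs = []
    for _ in range(steps):
        optimizer.step()
        lrs.append(optimizer.param_groups[0]["lr"])
        lr_scheduler.step()
    xs = [i+1 for i in range(steps)]
    plt.figure(figsize=(7,5))
    ## Recent seaborn versions require x and y as keyword arguments
    ax = sns.lineplot(x=xs, y=lrs)
    ax.set_xlabel('Steps')
    ax.set_ylabel('Learning Rate')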

lr_plot(lr_scheduler, epochs*steps_per_epoch)

## Re-create the optimizer and scheduler, since lr_plot stepped the originals through the whole schedule
optimizer = torch.optim.AdamW(
    params=params,
    lr=learning_rate,
    weight_decay=weight_decay
)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer=optimizer,
    T_max=epochs*steps_per_epoch
)
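
`lr_plot` consumes the schedule of the optimizer it is given, which is why the optimizer and scheduler are rebuilt afterwards. An alternative, sketched here as an assumption rather than the notebook's approach, is to preview the schedule on a throwaway optimizer over a dummy parameter so the real objects are never touched (`preview_cosine_lrs` is a hypothetical helper):

```python
import torch

def preview_cosine_lrs(base_lr, t_max, weight_decay=0.0):
    # Dummy parameter: the real model's optimizer and scheduler are never touched
    dummy = torch.nn.Parameter(torch.zeros(1))
    opt = torch.optim.AdamW([dummy], lr=base_lr, weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=t_max)
    lrs = []
    for _ in range(t_max):
        lrs.append(opt.param_groups[0]["lr"])
        opt.step()    # no gradient, so no update; keeps the step order PyTorch expects
        sched.step()
    return lrs

lrs = preview_cosine_lrs(1e-3, 100)
```

The returned list can be passed straight to `sns.lineplot(x=range(1, len(lrs) + 1), y=lrs)`.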

In [None]:
## Function for the Model Training
def train_model(model, preprocessor,
                dl_dict, criterion,
                optimizer, lr_scheduler,
                num_epochs, finalize=False):
    ## Use the GPU if one is available
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    print(f'device: {device}')
    print('-------Start Training-------')
    model.to(device)
    ## The preprocessor is kept on the CPU

    ## training and validation loop
    if finalize:
        phases = ['train']
    else:
        phases = ['train', 'val']

    losses = {phase: [] for phase in phases}
    for epoch in range(num_epochs):
        for phase in phases:
            if phase == 'train':
                preprocessor.train()
                model.train()
            else:
                preprocessor.eval()
                model.eval()

            epoch_loss = 0.0
            epoch_corrects = 0

            for data, labels in tqdm(dl_dict[phase]):
                x_nums, x_cats = preprocessor(data)

                ## Move both model inputs and the labels to the model's device
                x_nums = x_nums.to(device)
                x_cats = x_cats.to(device)
                labels = labels.to(device)

                ## Optimizer Initialization
                optimizer.zero_grad()

                ## Forward Processing
                with torch.set_grad_enabled(phase=='train'):
                    outputs = model(x_nums, x_cats)
                    loss = criterion(outputs, labels)
                    preds = torch.where(outputs>0.5, 1., 0.)

                    ## Backward Processing and Optimization
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                        lr_scheduler.step()

                    epoch_loss += loss.item() * x_cats.size(0)
                    epoch_corrects += torch.sum(preds == labels).item()

            epoch_loss = epoch_loss / len(dl_dict[phase].dataset)
            losses[phase].append(epoch_loss)
            epoch_acc = epoch_corrects / len(dl_dict[phase].dataset)

            ## Displaying results
            print('Epoch {}/{} | {:^5} |  Loss: {:.4f} Acc: {:.4f}'.\
                  format(epoch+1, num_epochs, phase, epoch_loss, epoch_acc))

    return model, preprocessor, losses
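
Two details in `train_model` rely on `labels` having the same `(batch, 1)` shape as the sigmoid `outputs`: `nn.BCELoss` expects matching shapes, and `preds == labels` would otherwise broadcast. The NumPy sketch below uses hypothetical values (NumPy follows the same broadcasting rules as PyTorch) to show how a `(batch,)` label vector silently inflates the count of correct predictions:

```python
import numpy as np

outputs = np.array([[0.9], [0.2], [0.7], [0.4]])    # model probabilities, shape (4, 1)
labels_col = np.array([[1.0], [0.0], [0.0], [0.0]])  # labels shaped (4, 1)
labels_flat = labels_col.ravel()                     # the same labels shaped (4,)

preds = np.where(outputs > 0.5, 1.0, 0.0)            # same threshold rule as the training loop

correct_ok = int((preds == labels_col).sum())        # element-wise: 4 comparisons
correct_bad = int((preds == labels_flat).sum())      # broadcasts to (4, 4): 16 comparisons
correct_fixed = int((preds == labels_flat.reshape(-1, 1)).sum())  # reshape restores (4, 1)

print(correct_ok, correct_bad, correct_fixed)  # 3 8 3
```

With the broadcast version, "accuracy" would be 8 / 4 = 2.0, an obvious sign of a shape mismatch; reshaping the labels to `(batch, 1)` in the dataset avoids it.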

In [None]:
## Function for Plotting Losses
def plot_losses(losses, title=None):
    plt.figure(figsize=(7, 5))
    losses = pd.DataFrame(losses)
    losses.index = [i+1 for i in range(len(losses))]
    ax = sns.lineplot(data=losses)
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.legend()
    ax.set_title(title)

In [None]:
## Training
model_trained, preprocessor_trained, losses = train_model(
    model,
    preprocessor,
    dl_dict,
    criterion,
    optimizer,
    lr_scheduler,
    epochs
)

## Plot Losses
plot_losses(losses)

<h2 style="background:#D4F1F4; border:0; border-radius: 12px; color:black"><center>Finalizing Training</center></h2>

In [None]:
## Finalizing Training
if exp_config['finalize']:

    ## Making Dataset and DataLoader for Finalizing
    train_all_ds = SpaceshipDataset(
        train,
        numerical_columns,
        categorical_columns,
        target='Transported'
    )

    train_all_dl = torch.utils.data.DataLoader(
        train_all_ds,
        batch_size=batch_size,
        shuffle=True,
        drop_last=True
    )

    finalize_dl_dict = {'train': train_all_dl}

    ## Building Models
    preprocessor = Preprocessor(
        numerical_columns,
        categorical_columns,
        encoder_categories,
        emb_dim
    )

    model = TabTransformer(
        numerical_columns,
        categorical_columns,
        num_transformer_blocks,
        num_heads, emb_dim,
        attn_dropout_rates,
        ff_dropout_rates,
        mlp_dropout_rates,
        mlp_hidden_units_factors
    )

    ## Loss Function
    criterion = nn.BCELoss()

    ## Optimizer and Learning Rate Scheduler
    epochs = exp_config['finalize_epochs']
    batch_size = exp_config['batch_size']
    steps_per_epoch = len(train_all_dl)  # batches per epoch on the full training set

    learning_rate = exp_config['learning_rate']
    weight_decay = exp_config['weight_decay']
    params = list(preprocessor.parameters()) + list(model.parameters())
    optimizer = torch.optim.AdamW(params=params,
                                  lr=learning_rate,
                                  weight_decay=weight_decay)
    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer=optimizer,
        T_max=epochs*steps_per_epoch
    )

    ## Training
    model_trained, preprocessor_trained, losses = train_model(
        model,
        preprocessor,
        finalize_dl_dict,
        criterion,
        optimizer,
        lr_scheduler,
        epochs,
        finalize=True
    )

    ## Plot Losses
    plot_losses(losses)
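
`T_max` gives a single smooth decay only if it equals the number of `lr_scheduler.step()` calls actually made, i.e. `epochs * len(dataloader)`; with `drop_last=True` a loader yields `len(dataset) // batch_size` batches. If `T_max` is too small, the cosine continues past its minimum and the learning rate rises again. A pure-Python sketch with hypothetical sizes (not the competition's actual row counts); `cosine_lr` and `batches_per_epoch` are illustrative helpers:

```python
import math

def cosine_lr(step, base_lr, t_max, eta_min=0.0):
    # Closed-form cosine annealing (keeps oscillating once step exceeds t_max)
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

def batches_per_epoch(n_rows, batch_size, drop_last=True):
    # DataLoader length: the incomplete final batch is dropped when drop_last=True
    return n_rows // batch_size if drop_last else math.ceil(n_rows / batch_size)

base_lr, batch_size, epochs = 1e-3, 256, 10   # hypothetical settings
n_full, n_fold = 8000, 6400                   # hypothetical full-train vs single-fold sizes

actual_steps = epochs * batches_per_epoch(n_full, batch_size)  # steps actually taken
t_max_fold = epochs * batches_per_epoch(n_fold, batch_size)    # T_max sized from the fold

print(actual_steps, t_max_fold)
print(cosine_lr(actual_steps, base_lr, actual_steps))  # ends at eta_min
print(cosine_lr(actual_steps, base_lr, t_max_fold))    # has climbed back up past the minimum
```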

<a id ="6"></a><h1 style="background:#05445E; border:0; border-radius: 12px; color:#D3D3D3"><center>6. Prediction</center></h1>
[Back to the TOC](#toc)

In [None]:
## Prediction
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

preprocessor_trained.eval()
model_trained.eval()
model_trained.to(device)

probas = []

for data in test_dl:
    x_nums, x_cats = preprocessor_trained(data)
    x_nums = x_nums.to(device)
    x_cats = x_cats.to(device)

    with torch.no_grad():
        outputs = model_trained(x_nums, x_cats)
        ## (batch, 1) -> (batch,); squeezing only the last axis keeps a batch of one 1-D
        outputs = torch.squeeze(outputs, -1)
        outputs = outputs.to('cpu').detach().numpy().copy()
        probas.append(outputs)

## post-processing
probas = np.concatenate(probas)
preds = np.where(probas > 0.5, True, False)
submission_df['Transported'] = preds
submission_df.to_csv('submission_cv.csv', index=False)
submission_df.head(10)
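
The post-processing turns per-batch probability arrays into a single boolean column. A NumPy sketch with hypothetical batch sizes shows why only the last axis should be squeezed: `squeeze()` with no axis turns a final batch of one row into a 0-d array, which `np.concatenate` rejects, while `squeeze(-1)` (or `torch.squeeze(outputs, -1)`) always leaves a 1-D array:

```python
import numpy as np

# Hypothetical model outputs: two full batches and a last batch with a single row
batches = [np.array([[0.8], [0.3]]), np.array([[0.6], [0.1]]), np.array([[0.55]])]

print(np.squeeze(batches[-1]).ndim)  # 0: a full squeeze drops every size-1 axis

probas = np.concatenate([b.squeeze(-1) for b in batches])  # shape (5,)
preds = probas > 0.5                                       # same as np.where(probas > 0.5, True, False)
print(probas.shape, preds.tolist())  # (5,) [True, False, True, False, True]
```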

---
#### Work in Progress...

I'm going to update this notebook soon. Please come back to check on the progress. Thank you for reading!