***QuantileTransformer** made a big stir in the **MOA competition** last year, and as I remember, more or less everybody used it in their preprocessing pipeline. So I wanted to check its impact on this dataset. To do that, I implement **TabNet** (another high-impact component from MOA) with and without QuantileTransformer.
I have tried to keep this notebook clean and simple, so anyone can separate the "WITH" and "WITHOUT" parts and use each of them on its own.
Note that I haven't used out-of-fold predictions to compare the impact of QT between the two implementations; I used separate submissions instead :), which I hope makes it more intriguing.*

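QuantileTransformer can be thought of as a scaling/normalization step for your data. StandardScaler and MinMaxScaler already handle scaling, but they are linear, so they keep the shape of each feature's distribution. QuantileTransformer is a non-linear, rank-based transform: each value is mapped through the feature's empirical CDF and, with `output_distribution='normal'`, then through the inverse normal CDF, so skewed features and outliers end up on a roughly Gaussian scale. Let's see how it affects our data.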
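To make the difference concrete, here is a small sketch on a made-up skewed feature with a handful of large outliers (synthetic data, not the competition features). It compares the skewness of the output of the three scalers. Skewness is unchanged by any positive affine map, so the two linear scalers should report the same value, while QuantileTransformer should bring it close to zero.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer, StandardScaler

# Hypothetical right-skewed feature: mostly exponential values plus 10 large outliers
rng = np.random.default_rng(0)
x = np.concatenate([
    rng.exponential(scale=1.0, size=990),
    rng.uniform(50, 100, size=10),
]).reshape(-1, 1)

scalers = {
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),
    "QuantileTransformer": QuantileTransformer(n_quantiles=100, output_distribution="normal"),
}

results = {}
for name, scaler in scalers.items():
    z = scaler.fit_transform(x).ravel()
    # Sample skewness: close to 0 means the transformed feature is roughly symmetric
    skew = float(np.mean((z - z.mean()) ** 3) / z.std() ** 3)
    results[name] = skew
    print(f"{name:>20}: skewness = {skew:.2f}")
```

The linear scalers only shift and stretch the feature, so the outliers stay far out in the tail; the quantile mapping squeezes them into the upper tail of a normal distribution.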

In [None]:
import os

import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import QuantileTransformer

import torch  # only referenced by the optional TabNet hyperparameters further down

# List the competition files (the Kaggle input directory is read-only)
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
! pip install pytorch-tabnet

# ***TabNet without QuantileTransformer***

In [None]:
train = pd.read_csv("../input/tabular-playground-series-jun-2021/train.csv")
test = pd.read_csv("../input/tabular-playground-series-jun-2021/test.csv")
sub_df = pd.read_csv("../input/tabular-playground-series-jun-2021/sample_submission.csv")

In [None]:
print(train.head())
# Columns 1..75 are the features: column 0 is the id and the last column is the target
train.iloc[:,1:76].shape

In [None]:
X_train = train.iloc[:,1:76].to_numpy()
y_train = train['target'].to_numpy()
X_test = test.iloc[:,1:].to_numpy()
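Slicing with `iloc[:,1:76]` depends on the exact column positions. A slightly more robust option is to select the features by name. The sketch below uses a tiny hypothetical frame and assumes, as I believe is the case for this dataset, that every feature column starts with the prefix `feature_`.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame mimicking the layout: id, feature_0..feature_4, target
df = pd.DataFrame({
    "id": [0, 1, 2],
    **{f"feature_{i}": [i, i + 1, i + 2] for i in range(5)},
    "target": ["Class_1", "Class_6", "Class_9"],
})

# Pick the features by name, so an extra or reordered column cannot slip into X
feature_cols = [c for c in df.columns if c.startswith("feature_")]
X = df[feature_cols].to_numpy()
y = df["target"].to_numpy()
print(X.shape)
```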

In [None]:
folds = StratifiedKFold(n_splits=5,random_state=42,shuffle=True)
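StratifiedKFold keeps the class proportions of every validation fold close to those of the full training set, which matters when some of the nine classes are much rarer than others. The sketch below checks this on hypothetical imbalanced labels (the class shares are invented for illustration).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels for nine classes (shares are made up)
rng = np.random.default_rng(0)
class_names = np.array([f"Class_{i}" for i in range(1, 10)])
share = [0.05, 0.10, 0.05, 0.05, 0.05, 0.25, 0.15, 0.25, 0.05]
y = rng.choice(class_names, size=9000, p=share)
X = np.zeros((len(y), 1))  # feature values play no role in how the folds are built

folds = StratifiedKFold(n_splits=5, random_state=42, shuffle=True)
overall = np.array([(y == c).mean() for c in class_names])

max_gap = 0.0
for fold, (train_idx, val_idx) in enumerate(folds.split(X, y)):
    # Class shares inside this validation fold
    in_fold = np.array([(y[val_idx] == c).mean() for c in class_names])
    gap = float(np.abs(in_fold - overall).max())
    max_gap = max(max_gap, gap)
    print(f"fold {fold}: largest class-share gap = {gap:.4f}")
```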

In [None]:
from pytorch_tabnet.tab_model import TabNetClassifier

# Test-set class probabilities, averaged over the folds (9 classes)
preds = np.zeros((len(X_test),9))

for fold,(train_idx,val_idx) in enumerate(folds.split(X_train,y_train)):
    print(fold)
    trainX,trainY = X_train[train_idx],y_train[train_idx]
    valX,valY = X_train[val_idx],y_train[val_idx]
    clf = TabNetClassifier(verbose=1,seed=42)
    # Early stopping monitors the last eval set, i.e. the validation fold
    clf.fit(X_train=trainX,y_train=trainY,
            eval_set=[(trainX,trainY),(valX,valY)],
            patience=7,max_epochs=40,drop_last=False,eval_metric=['logloss'])
    #oof[val_idx] = clf.predict_proba(valX)  # out-of-fold predictions (not used here)
    preds += clf.predict_proba(X_test)/folds.n_splits
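The commented-out `oof` line hints at a local alternative to comparing leaderboard submissions: collect out-of-fold predictions and score them. The sketch below shows the pattern under clearly swapped-in assumptions: scikit-learn's `LogisticRegression` stands in for `TabNetClassifier` and `make_classification` generates a synthetic 9-class dataset, so it runs without pytorch-tabnet or the competition files.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the competition data (9 classes, made-up features)
X, y = make_classification(n_samples=3000, n_features=20, n_informative=10,
                           n_classes=9, n_clusters_per_class=1, random_state=0)
X_train, y_train, X_test = X[:2500], y[:2500], X[2500:]

n_classes = 9
folds = StratifiedKFold(n_splits=5, random_state=42, shuffle=True)
oof = np.zeros((len(X_train), n_classes))   # one out-of-fold row per training sample
preds = np.zeros((len(X_test), n_classes))  # test probabilities averaged over folds

for fold, (train_idx, val_idx) in enumerate(folds.split(X_train, y_train)):
    clf = LogisticRegression(max_iter=1000)  # stand-in for TabNetClassifier
    clf.fit(X_train[train_idx], y_train[train_idx])
    oof[val_idx] = clf.predict_proba(X_train[val_idx])  # each sample predicted once, by a model that never saw it
    preds += clf.predict_proba(X_test) / folds.n_splits

oof_score = log_loss(y_train, oof)
print(f"OOF log loss: {oof_score:.5f}")
```

Running the same loop with and without the QuantileTransformer step and comparing the two OOF scores would give a comparison that does not depend on the public/private leaderboard split.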

In [None]:
for i in range(1,10):
    col_name = 'Class_'+str(i)
    sub_df[col_name] = preds[:,i-1]
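The loop above relies on column `i-1` of `preds` being `Class_i`. That holds if the classifier orders its probability columns by the sorted class labels (my assumption here, following the scikit-learn convention that pytorch-tabnet appears to mirror). The sketch below labels the columns by class name first and then picks them in submission order, using hypothetical probabilities. It also shows why positional mapping would break with ten or more classes.

```python
import numpy as np
import pandas as pd

# Hypothetical fold-averaged probabilities: 4 test rows x 9 classes, each row sums to 1
rng = np.random.default_rng(0)
preds = rng.dirichlet(np.ones(9), size=4)

# np.unique sorts string labels lexicographically (assumed column order of predict_proba)
classes = np.unique([f"Class_{i}" for i in range(1, 10)])

# Name the probability columns, then select them in the order the submission expects
sub_cols = [f"Class_{i}" for i in range(1, 10)]
prob_df = pd.DataFrame(preds, columns=classes)[sub_cols]
sub_df = pd.concat([pd.DataFrame({"id": np.arange(len(preds))}), prob_df], axis=1)

# With nine classes the lexicographic and natural orders coincide
print(list(classes) == sub_cols)
# With ten classes they do not: "Class_10" sorts before "Class_2"
print([str(c) for c in np.unique([f"Class_{i}" for i in range(1, 11)])[:3]])  # ['Class_1', 'Class_10', 'Class_2']
```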

In [None]:
sub_df.to_csv("submission.csv",index=False)

***This gives me a score of 1.76739 on the public leaderboard.***
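To put the score in context: assuming the competition metric is multi-class log loss (the same `logloss` that TabNet monitors above), lower is better, and a model that predicts 1/9 for every class scores ln 9 ≈ 2.1972. A minimal sketch of the metric follows; the clipping to [1e-15, 1 − 1e-15] and row renormalisation are a common Kaggle convention that I am assuming here.

```python
import numpy as np


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability of the true class.

    y_true: integer class indices 0..K-1; probs: array of shape (n_samples, K).
    """
    probs = np.clip(probs, eps, 1 - eps)                 # avoid log(0)
    probs = probs / probs.sum(axis=1, keepdims=True)     # rows sum to 1 again
    return float(-np.mean(np.log(probs[np.arange(len(y_true)), y_true])))


# A uniform guess over 9 classes scores ln(9) whatever the labels are
y = np.array([0, 3, 8, 5])
uniform = np.full((4, 9), 1 / 9)
print(round(multiclass_log_loss(y, uniform), 4))  # 2.1972
```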

# ***TabNet Using QuantileTransformer***

In [None]:
train = pd.read_csv("../input/tabular-playground-series-jun-2021/train.csv")
test = pd.read_csv("../input/tabular-playground-series-jun-2021/test.csv")
sub_df = pd.read_csv("../input/tabular-playground-series-jun-2021/sample_submission.csv")

In [None]:
print(train.head())
# Columns 1..75 are the features: column 0 is the id and the last column is the target
train.iloc[:,1:76].shape

In [None]:
X_train = train.iloc[:,1:76].to_numpy()
y_train = train['target'].to_numpy()
X_test = test.iloc[:,1:].to_numpy()

In [None]:
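# Map every feature to an approximately standard normal distribution.
# The transformer is fitted on the training data only and then applied to the test data.
trans = QuantileTransformer(n_quantiles=100, output_distribution='normal')
X_train = trans.fit_transform(X_train)
X_test = trans.transform(X_test)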
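What does the transformer do under the hood? A simplified, single-feature sketch of the idea, written with NumPy and the standard-library `NormalDist` rather than scikit-learn: store `n_quantiles` evenly spaced quantiles of the training column, map each value to a probability by linear interpolation between them (an approximate empirical CDF), then push that probability through the inverse standard normal CDF. The helper names `fit_quantiles` and `transform_to_normal` are hypothetical, and the data is synthetic.

```python
import numpy as np
from statistics import NormalDist


def fit_quantiles(x, n_quantiles=100):
    """Store n_quantiles evenly spaced quantiles of the training column x."""
    levels = np.linspace(0.0, 1.0, n_quantiles)  # probability levels 0, ..., 1
    quantiles = np.percentile(x, levels * 100)    # matching values in the data
    return levels, quantiles


def transform_to_normal(x, levels, quantiles, eps=1e-7):
    """Interpolated empirical CDF followed by the inverse standard normal CDF."""
    u = np.interp(x, quantiles, levels)           # approximate CDF value in [0, 1]
    u = np.clip(u, eps, 1.0 - eps)                # keep inv_cdf finite at the extremes
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(float(p)) for p in u])


# Hypothetical right-skewed feature
rng = np.random.default_rng(0)
x_train = rng.exponential(scale=2.0, size=5000)
x_test = rng.exponential(scale=2.0, size=1000)

levels, quantiles = fit_quantiles(x_train)        # fit on train only
z_train = transform_to_normal(x_train, levels, quantiles)
z_test = transform_to_normal(x_test, levels, quantiles)
print(f"train: mean={z_train.mean():.3f}, std={z_train.std():.3f}")
```

scikit-learn's implementation adds refinements this sketch leaves out, for example interpolating in both directions and averaging to cope with repeated values, and subsampling large inputs when fitting (the `subsample` parameter).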

In [None]:
#trainX,valX,trainy,valy = train_test_split(X_train,y_train,test_size=0.2,shuffle=True,random_state=42)

In [None]:
# Alternative TabNetClassifier hyperparameters, kept for reference (not used in the runs in this notebook):
# clf = TabNetClassifier(
#     n_d=64, n_a=64, n_steps=5,
#     gamma=1.5, n_independent=2, n_shared=2,
#     lambda_sparse=1e-4, momentum=0.3, clip_value=2.,
#     optimizer_fn=torch.optim.Adam,
#     optimizer_params=dict(lr=2e-2),
#     scheduler_params={"gamma": 0.95, "step_size": 20},
#     scheduler_fn=torch.optim.lr_scheduler.StepLR, epsilon=1e-15,
# )
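If the commented-out hyperparameters were re-enabled, `StepLR` with `gamma=0.95` and `step_size=20` would multiply the learning rate by 0.95 once every 20 scheduler steps. Assuming pytorch-tabnet steps the scheduler once per epoch (my understanding of its default, not verified here), a run capped at `max_epochs=40` would see the rate drop only once. A plain-Python sketch of the schedule, with a hypothetical helper `step_lr`:

```python
def step_lr(base_lr: float, gamma: float, step_size: int, epoch: int) -> float:
    """Learning rate after `epoch` scheduler steps of a StepLR-style schedule."""
    # The rate is multiplied by gamma once every step_size steps
    return base_lr * gamma ** (epoch // step_size)


# Values from the commented-out configuration: lr=2e-2, gamma=0.95, step_size=20
for epoch in (0, 19, 20, 39):
    print(epoch, round(step_lr(2e-2, 0.95, 20, epoch), 5))
```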

In [None]:
folds = StratifiedKFold(n_splits=5,random_state=42,shuffle=True)

In [None]:
from pytorch_tabnet.tab_model import TabNetClassifier

# Test-set class probabilities, averaged over the folds (9 classes)
preds = np.zeros((len(X_test),9))

for fold,(train_idx,val_idx) in enumerate(folds.split(X_train,y_train)):
    print(fold)
    trainX,trainY = X_train[train_idx],y_train[train_idx]
    valX,valY = X_train[val_idx],y_train[val_idx]
    clf = TabNetClassifier(verbose=1,seed=42)
    # Early stopping monitors the last eval set, i.e. the validation fold
    clf.fit(X_train=trainX,y_train=trainY,
            eval_set=[(trainX,trainY),(valX,valY)],
            patience=7,max_epochs=40,drop_last=False,eval_metric=['logloss'])
    #oof[val_idx] = clf.predict_proba(valX)  # out-of-fold predictions (not used here)
    preds += clf.predict_proba(X_test)/folds.n_splits

In [None]:
#from pytorch_tabnet.tab_model import TabNetClassifier

#classifier = TabNetClassifier(verbose=1,seed=42)
#classifier.fit(X_train=trainX,y_train=trainy,eval_set=[(trainX,trainy),(valX,valy)],patience=7,max_epochs=40,drop_last=False,eval_metric=['logloss'])

In [None]:
#preds = classifier.predict_proba(X_test)

In [None]:
preds.shape

In [None]:
for i in range(1,10):
    col_name = 'Class_'+str(i)
    sub_df[col_name] = preds[:,i-1]

In [None]:
sub_df.head()

In [None]:
sub_df.to_csv("submission_qt.csv",index=False)

This gives me a score of 1.75099 on the public leaderboard, compared with 1.76739 without QT. Since lower log loss is better, QuantileTransformer does have an impact here. Whether the same holds on the private leaderboard, we will know very soon, won't we :)