Hi all, as you may know, 2020 popularized TabNet, the first (as far as I know) deep learning architecture designed for tabular data ([here](https://arxiv.org/pdf/1908.07442.pdf) is the paper for those interested). Since it has performed well on many datasets, including in some recent Kaggle [competitions](https://www.kaggle.com/c/lish-moa/notebooks?competitionId=19988&sortBy=scoreAscending&searchQuery=tabnet), why not give it a try here?


### Props to: 

- [pytorch-tabnet](https://github.com/dreamquark-ai/tabnet) I think this library deserves a round of applause: fitting a neural network is as easy as fitting a scikit-learn estimator;

- [binary-classification-example](https://github.com/dreamquark-ai/tabnet/blob/develop/census_example.ipynb) a notebook with an example tailored for our use case;

- https://www.kaggle.com/wilddave/xgb-starter I want this notebook to complement his.


I don't have much time (or knowledge) to add proper torch customization or to use the pretrainer, but I may do so in the coming weeks.

**Please let me know what you think about it!**

<img src="https://www.europol.europa.eu/sites/default/files/images/finance_budget.jpg">




## Install pytorch-tabnet, read data, downcast training data, make env

In [None]:
!pip install pytorch-tabnet

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import re
import datatable as dt

import torch
from pytorch_tabnet.tab_model import TabNetClassifier, TabNetRegressor
from sklearn.model_selection import StratifiedKFold

# torch device strings are 'cuda' or 'cpu'; 'gpu' is not a valid device name
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

input_path = '/kaggle/input/'
root_path = os.path.join(input_path, 'jane-street-market-prediction')

# read the files and downcast float64 columns to float32 to save memory
train = dt.fread(os.path.join(root_path, "train.csv")).to_pandas()
float64_cols = train.select_dtypes('float64').columns
train[float64_cols] = train[float64_cols].astype('float32')
resp_cols = [i for i in train.columns if 'resp' in i]
meta_features = dt.fread(os.path.join(root_path, "features.csv")).to_pandas()


features_names = list(set(train.columns) - set(resp_cols) - set(['weight', 'ts_id', 'date']))
features_index = list(map(lambda x: int(re.sub("feature_", "", x)), features_names))
features_tuples = sorted(list(zip(features_names, features_index)), key = lambda x: x[1])
just_features = [i[0] for i in features_tuples]

import janestreet
env = janestreet.make_env()
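
A note on the feature ordering in the cell above: sorting the column names as plain strings would place `feature_10` before `feature_2`, which is why the numeric suffix is extracted first. Here is a minimal standalone sketch of the same idea, using made-up column names (`feature_index` is a helper I introduce only for illustration):

```python
import re

# Hypothetical column names, shuffled like the output of a set difference.
names = ["feature_10", "feature_2", "feature_0", "feature_129", "feature_1"]

def feature_index(name):
    """Return the integer suffix of a 'feature_<n>' column name."""
    return int(re.sub("feature_", "", name))

lexicographic = sorted(names)               # plain string order
numeric = sorted(names, key=feature_index)  # order by numeric suffix

print(lexicographic)
print(numeric)
```

Keeping the columns in a fixed, numeric order also guarantees that the test rows are fed to the model with the same column layout as the training data.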

## Fill NaN, define features and target for the model

In [None]:
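# drop rows with zero weight
train = train.loc[train['weight'] != 0]
# binarize the target: action is 1 when resp is positive, else 0
train['action'] = (train['resp'].values > 0).astype(int)
#train = train.fillna(-99999)
# impute missing values with the column means; f_mean is reused at test time
f_mean = train.mean()
train = train.fillna(f_mean)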

# keep only the features and the target, then free the full frame to stay within the memory limit
X_features = train.loc[:, just_features]
y_target = train.loc[:, 'action']
del train

print('Finished.')
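
The imputation above relies on `fillna` aligning a Series of means with the frame's columns, and the same `f_mean` is reused later on the test rows. A minimal sketch of that behavior on toy data follows; the numbers and the names `train_toy`, `toy_mean` and `test_row` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy training frame with missing values (made-up numbers).
train_toy = pd.DataFrame({
    "feature_0": [1.0, np.nan, 3.0],
    "feature_1": [10.0, 20.0, np.nan],
})

# Column means computed once on the training data; NaNs are skipped.
toy_mean = train_toy.mean()

# A single test row with a gap, similar to what arrives at inference time.
test_row = pd.DataFrame({"feature_0": [np.nan], "feature_1": [5.0]})

# fillna with a Series aligns on column names, so each column gets its own mean.
filled = test_row.fillna(toy_mean)
print(filled)
```

Using the training means at test time keeps the inputs on the scale the model saw during training, instead of recomputing statistics from a single test row.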

# Fit and Predict

### Hyperparameters and GPU usage

In [None]:
MAX_EPOCHS = 200
BATCH_SIZE = 1024
VIRTUAL_BATCH_SIZE = 128
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

print("Using {}".format(DEVICE))
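
As far as I understand, `VIRTUAL_BATCH_SIZE` controls the "ghost" batch normalization used inside TabNet: each batch of `BATCH_SIZE` rows is split into chunks of `VIRTUAL_BATCH_SIZE` rows, and every chunk is normalized with its own statistics. The NumPy sketch below only illustrates that idea. It is not the library's implementation (no learned scale or shift, no running statistics), and `ghost_batch_norm` is a hypothetical helper:

```python
import numpy as np

def ghost_batch_norm(batch, virtual_batch_size, eps=1e-5):
    """Normalize each chunk of `virtual_batch_size` rows with its own mean and variance."""
    n_chunks = int(np.ceil(batch.shape[0] / virtual_batch_size))
    chunks = np.array_split(batch, n_chunks, axis=0)
    normalized = [
        (c - c.mean(axis=0)) / np.sqrt(c.var(axis=0) + eps)
        for c in chunks
    ]
    return np.concatenate(normalized, axis=0), n_chunks

# Toy batch with the same sizes as the hyperparameters above (random data).
rng = np.random.default_rng(0)
toy_batch = rng.normal(loc=3.0, scale=2.0, size=(1024, 4))

out, n_chunks = ghost_batch_norm(toy_batch, virtual_batch_size=128)
print(n_chunks)   # 1024 / 128 = 8 chunks
print(out.shape)
```

With the values used here, each batch of 1024 rows is treated as 8 small batches of 128 rows for normalization purposes.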

### TabNetClassifier

In [None]:
clf = TabNetClassifier(
    n_d=64, n_a=64, n_steps=5,
    gamma=1.5, n_independent=2, n_shared=2,
    cat_emb_dim=1, lambda_sparse=1e-4,
    momentum=0.3, clip_value=2., epsilon=1e-15,
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    scheduler_params={"gamma": 0.95, "step_size": 20},
    device_name=DEVICE,
)

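# weights=1 samples training rows inversely to class frequency (class balancing)
# note: no eval_set is passed, so patience (early stopping) likely has no effect
clf.fit(X_train=X_features.values, y_train=y_target.values,
    max_epochs=MAX_EPOCHS, patience=20,
    batch_size=BATCH_SIZE, virtual_batch_size=VIRTUAL_BATCH_SIZE,
    num_workers=0,
    weights=1,
    drop_last=False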
)
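
One caveat on the call above: no `eval_set` is passed, and as far as I can tell pytorch-tabnet only applies `patience` when a validation set is supplied, so training would run for the full `MAX_EPOCHS`. Below is a minimal sketch of a date-based hold-out. It assumes the `date` column is saved before `del train` (this notebook does not do that), and `time_based_split` is a hypothetical helper, not part of any library. The `fit` call is left as a comment because it needs the real data:

```python
import numpy as np

def time_based_split(dates, valid_fraction=0.2):
    """Return boolean masks (train, valid) that hold out the most recent dates."""
    unique_dates = np.sort(np.unique(dates))
    n_valid = max(1, int(round(len(unique_dates) * valid_fraction)))
    cutoff = unique_dates[-n_valid]  # first date that goes to validation
    valid_mask = dates >= cutoff
    return ~valid_mask, valid_mask

# Toy stand-in for the `date` column: 10 trading days, 3 rows each.
toy_dates = np.repeat(np.arange(10), 3)
train_mask, valid_mask = time_based_split(toy_dates, valid_fraction=0.2)
print(train_mask.sum(), valid_mask.sum())

# With the real arrays (assumption: `dates` was kept alongside X_features):
# clf.fit(X_train=X_features.values[train_mask], y_train=y_target.values[train_mask],
#         eval_set=[(X_features.values[valid_mask], y_target.values[valid_mask])],
#         eval_metric=['auc'], max_epochs=MAX_EPOCHS, patience=20,
#         batch_size=BATCH_SIZE, virtual_batch_size=VIRTUAL_BATCH_SIZE)
```

Holding out the latest dates rather than random rows avoids validating on days the model has already seen during training.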


# Submit

The submission loop runs only when *I_WANT_TO_SUBMIT* is *True*; switch it to *False* to skip it.

In [None]:
# perform test and create submissions file
print('Creating submissions file...', end='')
rcount = 0
I_WANT_TO_SUBMIT = True
if I_WANT_TO_SUBMIT: 
    for (test_df, prediction_df) in env.iter_test():
        X_test = test_df.loc[:, just_features].fillna(f_mean)
        y_preds = clf.predict(X_test.values)
        prediction_df.action = y_preds.item()
        env.predict(prediction_df)
        rcount += len(test_df.index)
    print(f'Finished processing {rcount} rows.')

# Final Take

I find TabNet as easy to use as the gradient boosting alternatives. Please tell me what you think in the comments.