This notebook doesn't introduce any novel features or architecture. Instead, I want to show how you can structure your code and data to run experiments quickly and concisely.

The preprocessing code and LGB models were taken from [here](https://www.kaggle.com/tatudoug/stock-embedding-ffnn-features-of-the-best-lgbm) and are based on [this](https://www.kaggle.com/ragnar123/optiver-realized-volatility-lgbm-baseline).

This is how it works:
- The training set with features is cached and loaded from https://www.kaggle.com/slawekbiel/optiver-train-features
- The code to generate those features is saved in a Utility Script, https://www.kaggle.com/slawekbiel/optiver-features, and is also used to process the test data.
- The fastai library handles defining the NN model and preparing the data for it (normalization, embeddings, batching, etc.).
- Both the fastai and LGB models are trained locally, serialized, and then pushed to the dataset: https://www.kaggle.com/slawekbiel/optiver-models
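
The "train locally, serialize, load at inference" step is not shown in this notebook, so here is a minimal sketch of the pattern using only the standard library. It is an illustration under stated assumptions, not the actual training code: a plain list of weights stands in for a trained fold model, and `save_fold_models`, `predict_linear` and `predict_with_fold_models` are hypothetical helper names. Only the `lgb_fold{idx}.pickle` naming is borrowed from the inference code further down.

```python
import pickle
import tempfile
from pathlib import Path


def predict_linear(weights, rows):
    """Toy stand-in for model.predict: dot product of weights and each row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def save_fold_models(fold_models, out_dir):
    """Write one pickle per fold (hypothetical helper for the training side)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for idx, model in enumerate(fold_models):
        with open(out_dir / f'lgb_fold{idx}.pickle', 'wb') as f:
            pickle.dump(model, f)


def predict_with_fold_models(rows, model_dir, n_folds):
    """Load every fold model and average the per-fold predictions."""
    totals = [0.0] * len(rows)
    for idx in range(n_folds):
        with open(Path(model_dir) / f'lgb_fold{idx}.pickle', 'rb') as f:
            model = pickle.load(f)
        for i, pred in enumerate(predict_linear(model, rows)):
            totals[i] += pred / n_folds
    return totals


# Made-up data: two "fold models" and two feature rows.
with tempfile.TemporaryDirectory() as tmp:
    save_fold_models([[1.0, 0.0], [0.0, 1.0]], tmp)
    averaged = predict_with_fold_models([[2.0, 4.0], [6.0, 8.0]], tmp, n_folds=2)
```

Keeping the file naming identical on the training and inference sides is what lets the inference notebook stay this short: it only needs the fold count and the dataset path.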

In [None]:
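import pickle
import numpy as np
import pandas as pd
from optiver_features import generate_test_df
from fastai.tabular.all import *  # also provides torch, TabularPandas and tabular_learner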

In [None]:
test_df = generate_test_df()
train_df = pd.read_csv('../input/optiver-train-features/train_with_features.csv')

In [None]:
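def pred_tabular_nn(train_df, test_df):
    # Rebuild the DataLoaders from the cached training features so the test set
    # gets the same categorical encoding and normalization statistics.
    train_df = train_df.drop(['time_id', 'row_id'], axis=1).fillna(0)
    train_df.stock_id = train_df.stock_id.astype('category')
    cont_nn, cat_nn = cont_cat_split(train_df, dep_var='target')
    dls = TabularPandas(train_df, [Categorify, Normalize], cat_nn, cont_nn, y_names='target').dataloaders(2048)
    test_dl = dls.test_dl(test_df.fillna(0))
    # No training happens here: the learner only holds the architecture for the saved weights.
    learn = tabular_learner(dls, y_range=(0, .1), layers=[1000, 500, 200], n_out=1, path='../input/optiver-models/')
    n_folds = 5
    res = torch.zeros(len(test_df))
    for idx in range(n_folds):
        learn.load(f'nn_fold{idx}')  # looked up under <path>/models/
        preds, _ = learn.get_preds(dl=test_dl)
        res += preds.squeeze() / n_folds  # running average over the folds
    return res.numpy()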

In [None]:
def pred_lgb(test_df):
    test_df = test_df.drop(['row_id', 'time_id'], axis=1)
    n_folds = 10
    res = np.zeros(len(test_df))
    for idx in range(n_folds):
        filename = f'../input/optiver-models/models/lgb_fold{idx}.pickle'
        with open(filename, 'rb') as f:  # close the file handle once the model is loaded
            model = pickle.load(f)
        res += model.predict(test_df) / n_folds  # running average over the folds
    return res

In [None]:
nn_preds = pred_tabular_nn(train_df, test_df)
lgb_preds = pred_lgb(test_df)
rate = 0.570  # weight of the NN predictions in the blend; LGB gets 1 - rate
test_df['target'] = (1 - rate) * lgb_preds + rate * nn_preds
test_df[['row_id', 'target']].to_csv('submission.csv', index=False)
pd.read_csv('submission.csv').head()
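
The blend weight `rate` above is a hard-coded constant. As a hedged sketch, this is one way such a weight could be picked if out-of-fold predictions from both models were available: grid-search the weight that minimizes RMSPE, the competition metric. This notebook does not say how 0.570 was actually chosen, and the names `rmspe`, `best_blend_weight`, `nn_oof` and `lgb_oof`, as well as the data, are made up for illustration.

```python
import numpy as np


def rmspe(y_true, y_pred):
    """Root mean squared percentage error."""
    return np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2))


def best_blend_weight(y_true, nn_oof, lgb_oof, n_steps=101):
    """Grid-search the NN weight w in [0, 1] minimizing RMSPE of the blend."""
    candidates = np.linspace(0, 1, n_steps)
    scores = [rmspe(y_true, (1 - w) * lgb_oof + w * nn_oof) for w in candidates]
    best = int(np.argmin(scores))
    return float(candidates[best]), float(scores[best])


# Synthetic stand-ins for real out-of-fold predictions (assumption).
y_true = np.array([0.01, 0.02, 0.03, 0.04])
lgb_oof = y_true * 1.1  # over-predicts by 10%
nn_oof = y_true * 0.9   # under-predicts by 10%
weight, score = best_blend_weight(y_true, nn_oof, lgb_oof)
```

In this toy setup the two models err by the same amount in opposite directions, so the search settles on an even split. With real out-of-fold predictions the weight should be tuned on data the models were not trained on, otherwise the blend overfits.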