# Welcome!

Hey there, just wanted to share a simple end-to-end solution using an N-BEATS implementation.
Originally I was trying to reimplement the N-BEATS paper in Keras, but I eventually found an easy-to-use GitHub repo: https://github.com/philipperemy/n-beats
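To show roughly what the library builds, here is a minimal NumPy sketch of the N-BEATS idea for "generic" blocks. It uses random, untrained weights. Each block passes its input through four ReLU fully connected layers, projects the result to expansion coefficients θ, and maps θ through a learned linear basis to a backcast and a forecast. Blocks are chained "doubly residually": each block receives the residual the previous blocks left unexplained, and the final forecast is the sum of all block forecasts. The function names are hypothetical, biases on the basis layers are omitted, and this is not the `nbeats_keras` code.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def init_generic_block(backcast_length, forecast_length, units, thetas_dim, rng):
    """Random weights for one generic block: 4 ReLU FC layers, two theta heads, two linear bases."""
    sizes = [backcast_length] + [units] * 4
    fc = [(rng.normal(0, 0.05, (a, b)), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]
    return {
        "fc": fc,
        "theta_b": rng.normal(0, 0.05, (units, thetas_dim)),
        "theta_f": rng.normal(0, 0.05, (units, thetas_dim)),
        # "Generic" basis: a learned linear map from theta to the output length (bias omitted).
        "basis_b": rng.normal(0, 0.05, (thetas_dim, backcast_length)),
        "basis_f": rng.normal(0, 0.05, (thetas_dim, forecast_length)),
    }


def generic_block(x, p):
    """Return (backcast, forecast) for a batch x of shape (n, backcast_length)."""
    h = x
    for w, b in p["fc"]:
        h = relu(h @ w + b)
    backcast = (h @ p["theta_b"]) @ p["basis_b"]
    forecast = (h @ p["theta_f"]) @ p["basis_f"]
    return backcast, forecast


def nbeats_forward(x, blocks):
    """Doubly residual stacking: subtract each backcast, sum the forecasts."""
    residual = x
    forecast = 0.0
    for p in blocks:
        b, f = generic_block(residual, p)
        residual = residual - b
        forecast = forecast + f
    return residual, forecast


x = rng.normal(size=(8, 300))  # 8 samples, 300-step backcast window
blocks = [init_generic_block(300, 1, 64, 4, rng) for _ in range(2)]
residual, forecast = nbeats_forward(x, blocks)
print(residual.shape, forecast.shape)  # (8, 300) (8, 1)
```

Reusing one parameter dict several times (e.g. `[p] * 4`) is the sketch's equivalent of sharing weights between the blocks of a stack.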

Thanks to pythonash, whose notebook helped me get started with this dataset: https://www.kaggle.com/code/pythonash/end-to-end-simple-and-powerful-dnn-with-leakyrelu

This notebook is meant to be a very simple guide: quick and easy to run. Without k-fold cross-validation, ensembling or learning-rate tuning, it achieved a score of 0.1358 after only 1 epoch.

I also didn't use the investment ID to make the notebook even easier to understand. 

To improve the model, try k-fold cross-validation, ensembling, adding the investment ID and hyperparameter tuning. This is not an exact replication of the paper, which uses an ensemble over different loss functions and input (lookback) window lengths. Changing the batch size, the number of blocks per stack and the number of hidden layer units may also help, as the model seems to overfit quite quickly.
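For the k-fold and ensembling suggestions, one option is to build folds from contiguous blocks of `time_id`. That way no time step is split between training and validation. The fold models' predictions are then averaged. This is a minimal sketch. `time_group_kfold` and `ensemble_predict` are hypothetical helpers, `models` is any list of prediction functions (for example the `predict` methods of several fitted Keras models), and you would need to keep `df['time_id']` before `del df`, since this notebook drops it. Note that folds in the middle of the timeline still train on "future" data.

```python
import numpy as np


def time_group_kfold(time_ids, n_splits=5):
    """Yield (train_idx, val_idx); each fold validates on a contiguous block of time_ids."""
    unique_times = np.unique(time_ids)  # sorted unique time_ids
    for val_times in np.array_split(unique_times, n_splits):
        val_mask = np.isin(time_ids, val_times)
        yield np.flatnonzero(~val_mask), np.flatnonzero(val_mask)


def ensemble_predict(models, x):
    """Average the flattened predictions of several prediction functions (mean ensemble)."""
    return np.mean([np.asarray(m(x)).reshape(-1) for m in models], axis=0)


# Toy usage: 10 time_ids with 3 rows each, and two stand-in "models".
time_ids = np.repeat(np.arange(10), 3)
folds = list(time_group_kfold(time_ids, n_splits=5))
models = [lambda x: x.sum(axis=1), lambda x: x.sum(axis=1) + 2.0]
print(len(folds), ensemble_predict(models, np.ones((4, 3))))  # 5 [4. 4. 4. 4.]
```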

**If you found this helpful, please leave an upvote!**

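```python
import shutil

import numpy as np
import pandas as pd
import tensorflow as tf
import ubiquant

# Copy the nbeats_keras package from the input dataset into the working directory so it can be imported.
# dirs_exist_ok=True (Python 3.8+) lets the cell be re-run; distutils is removed in Python 3.12.
shutil.copytree("../input/nbeats-keras/nbeats_keras", "./nbeats_keras", dirs_exist_ok=True)
from nbeats_keras.model import NBeatsNet as NBeatsKeras
```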

### Data prep

```python
# Cast to float16 to avoid running out of memory
df = pd.read_parquet('../input/ubiquant-parquet/train_low_mem.parquet').astype('float16')
f_col = df.drop(['row_id', 'time_id', 'investment_id', 'target'], axis=1).columns  # feature columns
df.head()
```
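The float16 cast halves memory compared with float32. The trade-off is that float16 keeps only about 3 significant decimal digits, and the cast here also applies to `target`. The cast also happens after `read_parquet` has loaded the full-precision frame, so it shrinks the frame you keep rather than the peak during loading. A small sketch on synthetic data (a stand-in of 1,000 rows by 300 float32 columns, not the competition file) shows both effects:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix: 1,000 rows x 300 float32 columns.
df32 = pd.DataFrame(rng.normal(size=(1000, 300)).astype("float32"),
                    columns=[f"f_{i}" for i in range(300)])
df16 = df32.astype("float16")

mem32 = df32.memory_usage(index=False).sum()
mem16 = df16.memory_usage(index=False).sum()
print(mem32, mem16)  # 1200000 600000

# Relative rounding error introduced by float16 (denominator clamped to avoid dividing by ~0).
orig = df32.to_numpy()
max_rel_err = np.max(np.abs(df16.to_numpy(dtype="float32") - orig) / np.maximum(np.abs(orig), 1e-3))
print(f"max relative rounding error: {max_rel_err:.1e}")
```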

```python
f_col
```

```python
df_y = pd.DataFrame(df['target'])
df_x = df[f_col]
del df  # free up memory
```

```python
x, y = df_x.values, df_y.values
# Each row's features are used as the backcast window, so time_steps is the number of feature columns.
num_samples, time_steps = x.shape
input_dim, output_dim = 1, 1
print(num_samples, time_steps)
```
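Because the model is built with `backcast_length=time_steps`, the notebook treats the feature columns of each row as a single-channel input sequence. The features have no real time order, so this is a modeling shortcut, not a true time series. The notebook passes the 2D arrays directly. If you prefer the explicit 3D layout implied by the variable names (`num_samples`, `time_steps`, `input_dim`), a sketch with stand-in arrays (hypothetical sizes) looks like this:

```python
import numpy as np

# Stand-ins for df_x.values and df_y.values (hypothetical sizes; the real data has far more rows).
x = np.zeros((1000, 300), dtype="float16")
y = np.zeros((1000, 1), dtype="float16")

num_samples, time_steps = x.shape  # time_steps == number of feature columns
input_dim, output_dim = 1, 1

# Add an explicit channel axis: (num_samples, backcast_length, input_dim).
x_3d = x[..., np.newaxis]
# Targets as (num_samples, forecast_length, input_dim).
y_3d = y.reshape(num_samples, output_dim, input_dim)
print(x_3d.shape, y_3d.shape)  # (1000, 300, 1) (1000, 1, 1)
```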

### NBeats Model

```python
backend = NBeatsKeras(
    backcast_length=time_steps, forecast_length=output_dim,
    stack_types=(NBeatsKeras.GENERIC_BLOCK, NBeatsKeras.GENERIC_BLOCK),
    nb_blocks_per_stack=4, thetas_dim=(4, 4), share_weights_in_stack=True,
    hidden_layer_units=512,
)

backend.compile(loss='mae', optimizer='adam')

c = num_samples // 10  # 10% for validation
# Note: this holds out the *first* 10% of rows for validation and trains on the remaining 90%.
x_train, y_train, x_test, y_test = x[c:], y[c:], x[:c], y[:c]
```
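The split above validates on the first 10% of rows. If the rows are ordered by `time_id`, those are the earliest time steps, so the model trains on data from after its validation period. A time-ordered holdout that validates on the latest time steps instead is sketched below. It is a minimal sketch: `time_ordered_split` is a hypothetical helper, it needs the `time_id` column (which this notebook drops before `del df`), and the data is a toy example.

```python
import numpy as np


def time_ordered_split(x, y, time_ids, val_frac=0.1):
    """Hold out the latest val_frac of time_ids so validation comes after training in time."""
    unique_times = np.unique(time_ids)
    n_val = max(1, int(len(unique_times) * val_frac))
    cutoff = unique_times[-n_val]  # first time_id of the validation period
    train = time_ids < cutoff
    return x[train], y[train], x[~train], y[~train]


# Toy data: 20 time_ids with 5 rows each.
time_ids = np.repeat(np.arange(20), 5)
x = np.arange(100 * 3, dtype="float32").reshape(100, 3)
y = np.arange(100, dtype="float32").reshape(100, 1)
x_train, y_train, x_val, y_val = time_ordered_split(x, y, time_ids)
print(len(x_train), len(x_val))  # 90 10
```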

```python
backend.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=1, batch_size=1024)
```
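The model is trained with MAE, but the Ubiquant competition scores submissions by the mean, over `time_id`s, of the Pearson correlation between predictions and targets. To check validation predictions against that metric, a sketch like the following could be used. `mean_pearson_by_time` is a hypothetical helper, and you would need to keep the validation rows' `time_id`s, which this notebook drops.

```python
import numpy as np
import pandas as pd


def mean_pearson_by_time(time_ids, y_true, y_pred):
    """Mean over time_ids of the Pearson correlation between targets and predictions."""
    df = pd.DataFrame({"time_id": time_ids,
                       "y": np.ravel(y_true),   # ravel handles (n, 1) or (n, 1, 1) outputs
                       "p": np.ravel(y_pred)})
    per_time = df.groupby("time_id")[["y", "p"]].apply(lambda g: g["y"].corr(g["p"]))
    return per_time.mean()


# Toy usage: correlation is +1 for time_id 0 and -1 for time_id 1, so the mean is close to 0.
time_ids = np.array([0, 0, 0, 1, 1, 1])
y_true = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.1, 0.2, 0.3, 0.3, 0.2, 0.1])
score = mean_pearson_by_time(time_ids, y_true, y_pred)
print(score)
```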

### Submission

```python
env = ubiquant.make_env()
iter_test = env.iter_test()
for (test_df, sample_prediction_df) in iter_test:
    test_df = test_df[f_col]  # keep only the feature columns used in training
    sample_prediction_df['target'] = np.squeeze(backend.predict(test_df))
    env.predict(sample_prediction_df)
```