This short notebook shows a simple way to extract trade indicators from the given data, for those who, like me, aren't very knowledgeable about this domain. The method follows the methodology described in [this paper by Ortu et al.](https://arxiv.org/pdf/2102.08189.pdf)

The authors group the features of their model into three categories:
* Technical indicators: such as those already provided to us in this competition
* Trade indicators: additional features calculated from the technical indicators, such as the Relative Strength Index (RSI), price momentum index, etc.
* Social indicators: features obtained through sentiment analysis of social media posts

Due to the constraints of this competition, we cannot add social indicators, but we can compute the trade indicators from data already provided. This notebook will show you how. 

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime
import os
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

Import all the usual packages and read the data as you usually do.

In [None]:
data_path = '../input/g-research-crypto-forecasting'
assets = pd.read_csv(os.path.join(data_path, 'asset_details.csv'))
train_df = pd.read_csv(os.path.join(data_path, 'train.csv'))
train_df['asset_name'] = train_df.Asset_ID.map(assets.set_index('Asset_ID').Asset_Name)
print(f'There are {len(train_df)} rows in the dataset')
train_df.head()

In [None]:
import time
totimestamp = lambda s: np.int32(time.mktime(datetime.strptime(s, "%d/%m/%Y").timetuple()))
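
A caveat on the helper above: `time.mktime` interprets the parsed date in the local timezone of the machine, so the cut-offs can shift by a few hours depending on where the notebook runs. Here is a minimal alternative, assuming you want midnight UTC as the boundary (the name `to_utc_timestamp` is my own, not part of any library):

```python
import calendar
from datetime import datetime

def to_utc_timestamp(date_str: str) -> int:
    """Convert a 'DD/MM/YYYY' string to Unix seconds at midnight UTC."""
    parsed = datetime.strptime(date_str, "%d/%m/%Y")
    # calendar.timegm treats the struct_time as UTC, unlike time.mktime
    return calendar.timegm(parsed.timetuple())

start = to_utc_timestamp("01/01/2021")
end = to_utc_timestamp("01/04/2021")
print(start, end)
```

It can be used as a drop-in replacement for `totimestamp` when slicing the index.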

To demonstrate this method, I'm going to use only a small subset of the data: Bitcoin over a three-month period.

In [None]:
bit = train_df[train_df.Asset_ID == 1].set_index('timestamp')
bit = bit.loc[totimestamp('01/01/2021'):totimestamp('01/04/2021')]
bit.head()

Now, to extract these features we'll be using the stockstats package. You can find the documentation [here](https://pypi.org/project/stockstats/). It is essentially a wrapper around a pandas DataFrame that comes preloaded with formulas for about 30 indicators.

All you have to do is use the *retype* function to cast your pandas DataFrame into a *StockDataFrame*.

This package assumes that your data is sorted by time and contains certain columns. You also have to make sure your column names match what it expects. Here are the columns it requires:

* open: corresponding to Open in our dataset
* close: corresponding to Close in our dataset
* high: corresponding to High in our dataset
* low: corresponding to Low in our dataset
* volume: corresponding to Volume in our dataset
* amount: corresponding to Count in our dataset
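
As a rough sketch, the renaming and sorting described above can be wrapped in a small helper that fails early if a column is missing. The mapping is simply the list above written as a dictionary; `COLUMN_MAP` and `prepare_ohlcv` are hypothetical names, and the sketch assumes the timestamp is the index.

```python
import pandas as pd

# Competition column name -> lowercase name from the list above
COLUMN_MAP = {
    "Open": "open", "High": "high", "Low": "low",
    "Close": "close", "Volume": "volume", "Count": "amount",
}

def prepare_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
    """Return a renamed, time-sorted copy containing only the mapped columns."""
    missing = [c for c in COLUMN_MAP if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    out = df[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
    return out.sort_index()  # assumes the index holds the timestamp

# Tiny made-up frame, deliberately out of order
demo = pd.DataFrame(
    {"Open": [2.0, 1.0], "High": [2.5, 1.5], "Low": [1.5, 0.5],
     "Close": [2.2, 1.2], "Volume": [20.0, 10.0], "Count": [7, 5]},
    index=pd.Index([120, 60], name="timestamp"),
)
prepared = prepare_ohlcv(demo)
print(prepared)
```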


In [None]:
#Installing and importing the package
!pip install stockstats
from stockstats import StockDataFrame

In [None]:
#Preparing the columns 
bit.drop(['Asset_ID','asset_name', 'VWAP', 'Target'], axis=1, inplace=True)
bit.rename(columns = {'Count':'amount','Open':'open','High':'high','Low':'low',
                     'Close':'close','Volume':'volume'}, inplace=True)
bit.head()

In [None]:
stock = StockDataFrame.retype(bit)

Before you calculate the features, you can tune the default values, as I have done below. For the complete list of options available for tuning and extraction, refer to the docs.

In [None]:
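# Note: as plain notebook variables, these assignments are not seen by stockstats.
# In versions that expose these options as class attributes, set them on the class
# instead (e.g. StockDataFrame.KDJ_WINDOW = 10) before computing the indicators.
KDJ_WINDOW = 10
BOLL_WINDOW = 10
MACD_EMA_SHORT = 10
PDI_SMMA = 10
MDI_SMMA = 10
DX_SMMA = 10
ADX_EMA = 5
ADXR_EMA = 10
TRIX_EMA_WINDOW = 10
TEMA_EMA_WINDOW = 10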


In [None]:
#Creating the features
bit['volume_delta'] = stock['volume_delta']
bit['open_-2_r'] = stock['open_-2_r']
bit['cr']= stock['cr']
bit['kdjk'] = stock['kdjk']
bit['open_2_sma'] = stock['open_2_sma']
bit['macd'] = stock['macd']
bit['boll'] = stock['boll']
bit['boll_ub'] = stock['boll_ub']
bit['boll_lb'] = stock['boll_lb']
bit['rsi_12'] = stock['rsi_12']
bit['wr_10'] = stock['wr_10']
bit['cci'] = stock['cci']
bit['tr'] = stock['tr']
bit['atr'] = stock['atr']
bit['dma'] = stock['dma']
bit['pdi'] = stock['pdi']
bit['dx'] = stock['dx']
bit['adx'] = stock['adx']
bit['adxr'] = stock['adxr']
bit['trix'] = stock['trix']
bit['tema'] = stock['tema']
bit['vr'] = stock['vr']

In [None]:
bit.head()
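
To give a feel for what a few of these names stand for, here is a rough pandas-only sketch of three of them on made-up prices: a simple moving average, Bollinger bands and RSI. It follows the common textbook definitions; stockstats may use different default windows or smoothing, so the numbers need not match its output exactly. The helper names and the toy series are my own.

```python
import numpy as np
import pandas as pd

def sma(series: pd.Series, window: int) -> pd.Series:
    """Simple moving average over the last `window` rows."""
    return series.rolling(window).mean()

def bollinger(close: pd.Series, window: int = 20, n_std: float = 2.0):
    """Middle band = rolling mean; upper/lower = mean +/- n_std * rolling std."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std()
    return mid, mid + n_std * std, mid - n_std * std

def rsi(close: pd.Series, window: int = 12) -> pd.Series:
    """RSI with Wilder-style smoothing: 100 - 100 / (1 + avg_gain / avg_loss)."""
    delta = close.diff()
    gain = delta.clip(lower=0)      # positive moves only
    loss = -delta.clip(upper=0)     # negative moves, as positive numbers
    avg_gain = gain.ewm(alpha=1 / window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)

rng = np.random.default_rng(0)
toy_close = pd.Series(100 + rng.normal(0, 1, 200).cumsum())  # toy random walk

toy_sma = sma(toy_close, 2)
toy_mid, toy_ub, toy_lb = bollinger(toy_close, window=10)
toy_rsi = rsi(toy_close, window=12)
print(toy_rsi.tail())
```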

I wanted to see how these features would perform in a model, so I built a small one to check whether they bring any improvement. Instead of predicting the competition target, I created a new target for a classification problem, as was done in the paper cited above. A '1' corresponds to a price increase and a '0' to a decrease (or no change), so this model essentially predicts whether the price of Bitcoin will go up or down in the next minute.

In [None]:
#Creating new target
bit['target'] = bit.open.shift(-1) - bit.close
bit['target'] = bit.target.apply(lambda x:1 if x>0 else 0)
bit.head()
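
One detail of the cell above: the last row has no next minute, so `shift(-1)` yields NaN there and the comparison labels it 0. Here is a small sketch of a vectorised variant that drops that row instead, shown on toy data (the helper name `make_direction_target` is hypothetical):

```python
import pandas as pd

def make_direction_target(df: pd.DataFrame) -> pd.DataFrame:
    """Label each row 1 if the next row's open is above this row's close, else 0.

    The final row is dropped because it has no following row to compare with.
    """
    diff = df["open"].shift(-1) - df["close"]
    out = df.iloc[:-1].copy()
    out["target"] = (diff.iloc[:-1] > 0).astype(int)
    return out

toy = pd.DataFrame({"open": [10.0, 10.5, 10.2, 10.4],
                    "close": [10.4, 10.3, 10.1, 10.6]})
labelled = make_direction_target(toy)
print(labelled)
```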

In [None]:
bit.target.value_counts()
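
The class counts also give a useful reference point: a model that always predicts the more frequent class already reaches an accuracy equal to that class's share. Here is a minimal sketch on made-up labels (`majority_share` is my own helper name):

```python
import pandas as pd

def majority_share(labels: pd.Series) -> float:
    """Fraction of rows belonging to the most frequent class."""
    return float(labels.value_counts(normalize=True).max())

toy_target = pd.Series([1, 0, 1, 1, 0, 1, 0, 1])
print(majority_share(toy_target))
```

Here that would be `majority_share(bit.target)`; for a fair comparison with a test accuracy, the share should be measured on the test portion of the labels.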

In [None]:
X = bit.drop(['target'],axis=1).values
y = bit.target.values

In [None]:
train_size = int(len(X)*0.70)
train, test = X[0:train_size], X[train_size:len(X)]
y_train, y_test = y[0:train_size], y[train_size:len(X)]
print('Observations: %d' % (len(X)))
print('Training Observations: %d' % (len(train)))
print('Testing Observations: %d' % (len(test)))

In [None]:
from xgboost import XGBClassifier
# tree_method='gpu_hist' needs a GPU runtime; switch to 'hist' when running on CPU
model = XGBClassifier(learning_rate=0.05, n_estimators=500,
                      subsample=1.0, alpha=0, tree_method='gpu_hist')
model.fit(train, y_train)

In [None]:
y_pred = model.predict(test)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

accuracy = accuracy_score(y_test, y_pred)
print(f'model accuracy is: {accuracy}')

precision = precision_score(y_test, y_pred, average='binary')
print(f'model precision is: {precision}')

recall = recall_score(y_test, y_pred, average='binary')
print(f'model recall is: {recall}')

Using the newly extracted features increases model accuracy by 3% (to obtain the score without them, run this notebook with the feature-extraction lines commented out).
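
The with/without comparison described above can also be sketched as a small helper, so that nothing has to be commented out by hand. This is only an illustration under assumptions: `accuracy_by_feature_set` and `FirstColumnThreshold` are hypothetical names, the toy model is a stand-in for any object with `fit` and `predict`, and the data is made up.

```python
import numpy as np
import pandas as pd

def accuracy_by_feature_set(df, feature_sets, target_col, make_model, train_frac=0.7):
    """Fit one model per feature set on a chronological split; return accuracies."""
    split = int(len(df) * train_frac)
    y = df[target_col].to_numpy()
    scores = {}
    for name, cols in feature_sets.items():
        X = df[cols].to_numpy()
        model = make_model()  # fresh model for every feature set
        model.fit(X[:split], y[:split])
        scores[name] = float(np.mean(model.predict(X[split:]) == y[split:]))
    return scores

class FirstColumnThreshold:
    """Toy stand-in model: predicts 1 when the first feature exceeds its training mean."""
    def fit(self, X, y):
        self.threshold_ = X[:, 0].mean()
        return self

    def predict(self, X):
        return (X[:, 0] > self.threshold_).astype(int)

# Made-up data: 'a' carries the signal, 'b' does not
toy = pd.DataFrame({
    "a": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "b": [1, 0, 0, 1, 1, 0, 0, 1, 1, 0],
})
toy["target"] = toy["a"]

scores = accuracy_by_feature_set(
    toy,
    feature_sets={"base": ["b"], "extended": ["a", "b"]},
    target_col="target",
    make_model=FirstColumnThreshold,
)
print(scores)
```

With the notebook's data, `make_model` could be a function returning the `XGBClassifier` used above, and the two feature sets would be the original columns versus the original plus extracted columns.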

Note: This is not a definitive evaluation. I have not stress-tested this on more data, optimised the algorithm, or evaluated which of these features, if any, actually affect the model score. The goal of this notebook was purely to show the feature extraction.

I hope you have enjoyed this notebook and that this helps you in building better models for yourselves. 