# MSDS 8310 - Project 2: Automated Trading Strategy

### Investigators
- [Matt Baldree](mailto:mbaldree@smu.edu)


# Abstract 

In this project, leading and lagging technical indicators were applied to the historical closing price of the cryptocurrency Bitcoin to determine a trading strategy. The strategy was used to buy, sell, and hold the asset, and its returns were compared with those of a buy and hold strategy. Various closing resolutions and algorithms were explored, but not all were documented. The resulting labeled data, sampled at a 2 hour closing resolution, was fitted with a stochastic gradient boosting algorithm, `XGBoost`, to predict the strategy from time series close price and volume data plus derived data. The accuracy of the learned classifier was 95.4%. RSI proved to be the most important feature.

# Introduction

Traders, especially new traders, have a difficult time determining when to enter or exit a market. Without objective data or signals, the trader or investor might choose to sit out of the market or buy and sell based on naive thinking. Experienced traders learn to read stock charts and leverage technical indicators and overlays to make trading decisions. Traders of traditional corporate stock assets use additional criteria such as P/E ratios, balance sheets, etc. Cryptocurrency assets do not have such information, so investors must instead use social media sentiment/growth, developer capabilities, marketing strategy, liquidity, etc. Obtaining this information is difficult, leaving the investor to rely on gut feeling and traditional trading tools such as technical indicators. In addition, cryptocurrencies trade 24x7 across various exchanges.

A technical indicator is "any class of metrics whose value is derived from generic price activity in a stock or asset [1]." There are two kinds of technical indicators, leading and lagging, that try to predict the future or general price direction of a security by looking at past patterns. Leading indicators signal future events. Lagging indicators follow an event. The importance of a lagging indicator is its ability to confirm that a pattern is occurring. There are many indicators. For this project, two popular indicators, the relative strength index (RSI) and Bollinger bands (BB), are used to determine a trading strategy [4]. Through trial and error, the indicators were adjusted to fit the pattern of the Bitcoin close price for daily and 2 hour samples. An algorithm was developed to incorporate both indicators to determine a trading signal of *buy*, *sell*, or *do nothing*. This signal was then applied to buy or sell the asset, and the result was compared to a buy and hold strategy.

The resulting labeled data for the 2 hour closing price trading strategy was used to train a stochastic gradient boosting machine learning algorithm [5] to predict a *buy*, *sell*, or *do nothing* strategy based on time series closing price and volume plus derived data. In addition, a feature ranking and an example decision tree plot were provided for deeper understanding [6]. Future work in this project would include the following:
- adding technical indicators to determine which ones provide the most value in determining trading strategy,
- incorporating other cryptocurrency price histories to determine if feature importance is the same,
- automating data acquisition, labeling, and training of the algorithm, and
- developing a web service to provide the trading strategy for today or past days.

# Data

The historical pricing data for Bitcoin on the Coinbase exchange was obtained from Kaggle at a one minute resolution [2]. This fine-grained resolution allowed me to resample it to any coarser resolution. For this project, the data was resampled to 2 hour and one day resolutions. The date range for the data was December 1, 2014 6 am to October 19, 2017 11:59 pm inclusive. For the 2 hour resolution, only data for 2017 was used, providing 3,504 signals (292 days at 12 samples per day). For the daily resolution, data for 2016 and 2017 was used providing ?? signals.

The resampling of the data was performed with a `Pandas` offset rule. The rule is expressed as a multiple of a time unit. Here the unit is one minute (`T`), so the rule for a 2 hour resample is `120T` (2 x 60 minutes).

```python
df = df.resample(rule="120T").agg(
		{'open':'first','high':'max','low':'min','close':'last','volbtc':'sum','volusd':'sum','wtdprice':'last'})
```
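
As a minimal sketch of what this rule does, the following builds four hours of synthetic one-minute bars (made-up prices, not the Kaggle data) and aggregates them into 2 hour bars. The alias `"120min"` is an assumption made for recent pandas versions, which deprecate `"T"`; on older versions it should behave the same as `"120T"`.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute bars for illustration only (not the Kaggle data).
idx = pd.date_range("2017-01-01 00:00", periods=240, freq="min")
close = pd.Series(np.arange(240, dtype=float) + 1000.0, index=idx)
minute = pd.DataFrame({
    "open": close, "high": close + 1.0, "low": close - 1.0,
    "close": close, "volbtc": 1.0,
})

# Aggregate each 120-minute window:
# first open, max high, min low, last close, summed volume.
two_hour = minute.resample("120min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volbtc": "sum"}
)
print(two_hour)
```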
The details of the pricing data for Bitcoin obtained from Coinbase are the following:

|Attribute |Description                          |
|:---------|:------------------------------------|
|Timestamp |Date and time of the transaction     |
|Open      |Open price for the period            |
|High      |High price for the period            |
|Low       |Low price for the period             |
|Close     |Close price for the period           |
|Vol BTC   |Trade volume of asset for the period |
|Vol USD   |Trade volume of asset in dollars     |
|Wtd Price |Weighted price of asset in dollars   |

Technical indicators were then created from the closing price using `TA-Lib` [3], a multi-platform tool for market analysis. This tool gives developers the capability to create many technical indicators. I used it to create the BB and RSI indicators. For BB, a 7-period simple moving average forms the middle band, and the upper and lower bands sit 1.5 standard deviations above and below it. For RSI, a 7-period average gain/loss was used. Note that `timeperiod` counts samples, so on the 2 hour data 7 periods span 14 hours rather than 7 days.

```python
# bollinger bands
df['bb_up'],df['bb_mid'],df['bb_low'] = ta.BBANDS(np.asarray(df.close),timeperiod=7,nbdevup=1.5,nbdevdn=1.5,matype=0)
# rsi
df['rsi'] = ta.RSI(np.asarray(df.close),timeperiod=7)

```
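
For readers without TA-Lib installed, the two indicators can be approximated in plain pandas. This is a sketch under stated assumptions, not TA-Lib's implementation: the helper names `bollinger` and `rsi` are hypothetical, the bands assume a population standard deviation, and the RSI uses Wilder-style exponential smoothing that is seeded differently from TA-Lib, so early values may not match.

```python
import numpy as np
import pandas as pd

def bollinger(close, period=7, ndev=1.5):
    """Return (upper, middle, lower) bands around a simple moving average."""
    mid = close.rolling(period).mean()
    std = close.rolling(period).std(ddof=0)  # population std dev (assumption)
    return mid + ndev * std, mid, mid - ndev * std

def rsi(close, period=7):
    """Relative Strength Index with Wilder-style smoothing of gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)    # positive moves only
    loss = -delta.clip(upper=0.0)   # negative moves, as positive numbers
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# Synthetic random-walk prices for illustration only.
rng = np.random.default_rng(0)
close = pd.Series(1000.0 + rng.normal(0.0, 5.0, 200).cumsum())
bb_up, bb_mid, bb_low = bollinger(close)
rsi7 = rsi(close)
print(pd.DataFrame({"close": close, "bb_up": bb_up, "bb_low": bb_low, "rsi": rsi7}).tail())
```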
An algorithm leveraging both BB and RSI was used to determine a *buy*, *sell*, or *do nothing* trading signal. This algorithm signals a buy if (close lag2 < BB low lag2) and (close lag1 < BB low lag1) and (RSI lag1 < 35); that is, the close sat below the lower band for the two previous periods and the previous RSI was low. A sell signal is created if (close lag2 > BB up lag2) and (close lag1 > BB up lag1) and (RSI lag1 > 85).

```python
# generate trading signals
df['bb_sig'] = 0  # default to do nothing
# buy signal: close below the lower band for the two previous periods and lag1 RSI below 35
df.loc[(df.close_lag2 < df.bb_low_lag2) & (df.close_lag1 < df.bb_low_lag1) & (df.rsi_lag1 < 35), 'bb_sig'] = 1
# sell signal: close above the upper band for the two previous periods and lag1 RSI above 85
df.loc[(df.close_lag2 > df.bb_up_lag2) & (df.close_lag1 > df.bb_up_lag1) & (df.rsi_lag1 > 85), 'bb_sig'] = -1
# first signal will be a buy
df.iloc[0, df.columns.get_loc('bb_sig')] = 1
```
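
To make the rule concrete, the sketch below applies the same conditions to a handful of hand-made rows using `numpy.select`. The values are invented for illustration, and the forced initial buy is omitted.

```python
import numpy as np
import pandas as pd

# Invented lagged values: a buy row, a sell row, and two do-nothing rows.
toy = pd.DataFrame({
    "close_lag2":  [95.0, 120.0, 100.0, 95.0],
    "bb_low_lag2": [97.0, 97.0, 97.0, 97.0],
    "bb_up_lag2":  [110.0, 110.0, 110.0, 110.0],
    "close_lag1":  [94.0, 121.0, 101.0, 94.0],
    "bb_low_lag1": [96.0, 96.0, 96.0, 96.0],
    "bb_up_lag1":  [109.0, 109.0, 109.0, 109.0],
    "rsi_lag1":    [30.0, 90.0, 50.0, 40.0],
})

buy = ((toy.close_lag2 < toy.bb_low_lag2) & (toy.close_lag1 < toy.bb_low_lag1)
       & (toy.rsi_lag1 < 35))
sell = ((toy.close_lag2 > toy.bb_up_lag2) & (toy.close_lag1 > toy.bb_up_lag1)
        & (toy.rsi_lag1 > 85))

# 1 = buy, -1 = sell, 0 = do nothing
toy["bb_sig"] = np.select([buy, sell], [1, -1], default=0)
print(toy["bb_sig"].tolist())  # [1, -1, 0, 0]
```

The last row sits below the lower band in both lagged periods but its RSI of 40 is above the threshold of 35, so no buy is signaled.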

The technical indicators and algorithm were adjusted based on a review of the price, signal, strategy, and return charts generated from the following dataset.

|Attribute    |Description                              |
|:------------|:----------------------------------------|
|BB High      |BB upper band                            |
|BB Mid       |BB middle band                           |
|BB Low       |BB lower band                            |
|Close Lag1   |Close price one period back              |
|RSI          |RSI for period                           |
|RSI Lag1     |RSI one period back                      |
|RSI Lag2     |RSI two periods back                     |
|BB High Lag1 |BB upper band one period back            |
|BB Mid Lag1  |BB middle band one period back           |
|BB Low Lag1  |BB lower band one period back            |
|Close Lag2   |Close price two periods back             |
|BB High Lag2 |BB upper band two periods back           |
|BB Mid Lag2  |BB middle band two periods back          |
|BB Low Lag2  |BB lower band two periods back           |
|Signal       |Trading signal (buy, sell, do nothing)   |
|Strategy     |Trading position (own or not own asset)  |
|Trade Return |Per-period return for trading strategy   |
|B&H Return   |Per-period return for buy & hold         |
|Trade Cum    |Cumulative returns for trading strategy  |
|B&H Cum      |Cumulative returns for buy & hold        |

```python
# CHARTING
fig1,ax = plt.subplots(5,sharex=True)
ax[0].plot(df['close'])
ax[0].plot(df['bb_up'],linestyle='--',label='upper')
ax[0].plot(df['bb_mid'],linestyle='--',label='middle')
ax[0].plot(df['bb_low'],linestyle='--',label='lower')
ax[0].legend(loc='upper left')
ax[1].plot(df['rsi'],color='green',label='rsi')
ax[1].axhline(y=85,linestyle='--',color='orange')  # RSI sell threshold
ax[1].axhline(y=35,linestyle='--',color='orange')  # RSI buy threshold (matches the signal rule)
#ax[1].legend(loc='upper left')
ax[2].plot(df['bb_sig'],marker='o',markersize=5,linestyle='',label='signal',color='red')
#ax[2].legend(loc='upper left')
ax[3].plot(df['bb_rsi_str'],marker='o',markersize=5,linestyle='',label='strategy',color='green')
#ax[3].legend(loc='upper left')
ax[4].plot(df['bb_rsi_cum_returns'],label='Trade')
ax[4].plot(df['bh_cum_returns'],label='Buy & Hold')
ax[4].legend(loc='upper left')
plt.suptitle('BTC 2hr Close Prices, BB (7, 1.5), & RSI (7)')
plt.show()
```
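
The Strategy column listed above is derived from Signal by carrying the most recent position forward: a buy enters the market, a sell exits, and anything else keeps the previous state. A minimal pure-Python sketch of that logic follows; the function name `signals_to_positions` is hypothetical.

```python
def signals_to_positions(signals):
    """Map signals (1=buy, -1=sell, 0=do nothing) to positions (1=own, 0=not own)."""
    position = 0
    positions = []
    for sig in signals:
        if sig == 1:
            position = 1
        elif sig == -1:
            position = 0
        # sig == 0: keep the previous position
        positions.append(position)
    return positions

print(signals_to_positions([1, 0, 0, -1, 0, 1, 1, -1]))  # [1, 1, 1, 0, 0, 1, 1, 0]
```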

The labeled data was then saved to a CSV file.

```python
# persist df
df.to_csv("./coinbaseBTCUSD-withsignals-2hr.csv")
```

The saved signal data was then reduced to the following columns to prepare it for input into the machine learning algorithm.

|Attribute    |Description                              |
|:------------|:----------------------------------------|
|Close        |Close price for the period               |
|Vol BTC      |Trade volume of asset for the period     |
|BB High      |BB upper band                            |
|BB Mid       |BB middle band                           |
|BB Low       |BB lower band                            |
|RSI          |RSI for period                           |
|Close Lag1   |Close price one period back              |
|BB Low Lag1  |BB lower band one period back            |
|BB High Lag1 |BB upper band one period back            |
|Close Lag2   |Close price two periods back             |
|BB Low Lag2  |BB lower band two periods back           |
|BB High Lag2 |BB upper band two periods back           |
|RSI Lag1     |RSI one period back                      |
|RSI Lag2     |RSI two periods back                     |
|Signal       |Trading signal (buy, sell, do nothing)   |
The data was then separated into features, X, and labels, Y.

```python
# dataset is a 2-D NumPy array: 14 feature columns followed by the signal column
# split data into features (X) and labels (Y)
X = dataset[:,0:14]
Y = dataset[:,14]
```

The data was then split into 10 stratified folds to train and cross-validate the stochastic gradient boosting model, `XGBoost`, and an accuracy score was generated.

```python
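from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# XGBClassifier detects from the labels that this is a multiclass problem
model = XGBClassifier()
# folds are not shuffled, so no random_state is passed
# (recent scikit-learn versions raise an error if it is set without shuffle=True)
kfold = StratifiedKFold(n_splits=10)
results = cross_val_score(model, X, Y, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))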
```
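
One practical caveat, stated as an assumption about the reader's environment rather than about the original run: recent XGBoost releases expect class labels to be consecutive integers starting at 0, so the signal values -1, 0, and 1 may need to be remapped before fitting. A small NumPy sketch with made-up labels:

```python
import numpy as np

# Example signal labels: -1 = sell, 0 = do nothing, 1 = buy
signals = np.array([0, 1, 0, -1, 0, 0, 1, -1])

# np.unique returns the sorted classes and, for each element, its index in that list
classes, encoded = np.unique(signals, return_inverse=True)
print(classes.tolist())  # [-1, 0, 1]
print(encoded.tolist())  # [1, 2, 1, 0, 1, 1, 2, 0]

# Map encoded labels (or predictions) back to the original signal values
decoded = classes[encoded]
```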

A sample decision tree was plotted along with feature importance. 

```python
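from matplotlib import pyplot
from xgboost import plot_importance, plot_tree

# fit on the full dataset
model.fit(X, Y)
# plot the first decision tree, laid out left to right
plot_tree(model, num_trees=0, rankdir='LR')
# plot feature importance (F score per feature)
plot_importance(model)
pyplot.show()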
```


# Results
The results of the project are shown below. 

## Trading Strategy

### 2 Hour
<img src='label-2hr.png'>

### Daily
<img src='label-daily.png'>

### Summary

|Summary       |2hr Trade|2hr Buy & Hold|Daily Trade|Daily Buy & Hold|
|:-------------|:--------|:-------------|:----------|:---------------|
|Return        |2.17     |4.88          |3.23       |4.74            |
|Std Dev       |0.21     |0.27          |0.69       |0.87            |
|Sharpe (Rf=0%)|10.27    |18.39         |4.70       |5.45            |
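
The three metrics follow from the per-period returns. The sketch below uses a short list of made-up returns to show the arithmetic: cumulative return by compounding, standard deviation scaled by the square root of an assumed 365 periods per year, and the Sharpe ratio as their quotient with a zero risk-free rate. The function name `summarize` is hypothetical.

```python
import math

def summarize(returns, periods_per_year=365):
    """Return (cumulative return, scaled std dev, Sharpe ratio with Rf=0)."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # compound each period's return
    cum_return = growth - 1.0
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / len(returns)  # population variance
    std = math.sqrt(variance) * math.sqrt(periods_per_year)
    return cum_return, std, cum_return / std

# Made-up per-period returns for illustration only.
cum, std, sharpe = summarize([0.01, -0.005, 0.02, 0.0, 0.015])
print(round(cum, 4), round(std, 4), round(sharpe, 4))
```

For the 2hr Trade column, 2.17 / 0.21 ≈ 10.3, which is consistent with the reported 10.27 once rounding of the displayed inputs is taken into account.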

### Signals

|Signal     |2hr Strategy|Daily Strategy|
|:----------|:-----------|:-------------|
|Buy        |92          |6             |
|Sell       |56          |17            |
|Do Nothing |3356        |269           |
|Total      |3504        |292           |


## Machine Learning

|Summary  |2hr Strategy|
|:--------|:-----------|
|Accuracy |95.44%      |
|Std Dev  |1.95%       |
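
For context when reading the accuracy figure, it can help to compare it with a majority-class baseline. The sketch below computes that baseline from the 2 hour signal counts reported in the Signals table. It assumes the classifier saw the same class proportions, which may not hold exactly if rows were dropped during preparation.

```python
# 2 hour signal counts copied from the Signals table
counts = {"buy": 92, "sell": 56, "do nothing": 3356}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total

print(total)                       # 3504
print(majority_class)              # do nothing
print(f"{baseline_accuracy:.2%}")  # 95.78%
```

Under these assumptions, a classifier that always predicts *do nothing* would score about 95.8%, so per-class precision and recall for the buy and sell signals would be more informative than accuracy alone.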

### Decision Tree
<img src='model-tree-2hr.png'>

### Feature Importance
|Attribute    |Feature|
|:------------|:------|
|Close        |f0     |
|Vol BTC      |f1     |
|BB High      |f2     |
|BB Mid       |f3     |
|BB Low       |f4     |
|RSI          |f5     |
|Close Lag1   |f6     |
|BB Low  Lag1 |f7     |
|BB High Lag1 |f8     |
|Close Lag2   |f9     |
|BB Low  Lag2 |f10    |
|BB High Lag2 |f11    |
|RSI Lag1     |f12    |
|RSI Lag2     |f13    |

<img src='model-feature-2hr.png'>


# Analysis
Reviewing the results yields a few insights. First, the buy and hold strategy produced better results than the signal trading strategy; for the 2 hour sample, its Sharpe ratio was 1.79x higher (18.39 vs. 10.27). The signal algorithm should be explored further to yield better results. Second, the 2 hour close sample yielded 12x more signal points than the daily sample (3,504 vs. 292), which is important for training a classifier. To use the daily sample, a larger sample size would be needed. Third, the accuracy of the learned classifier was 95.4% with a standard deviation of 1.95% without tuning. Fourth, RSI features were 4.2x more important than BB features in classifying the signals, measured as the ratio of their summed F scores:

$$ \frac{\sum F\ score_{RSI}}{\sum F\ score_{BB}} \approx 4.2 $$
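
A sketch of how such a ratio could be computed from per-feature F scores. The scores below are placeholders invented for illustration, not the values from the plot, so the printed ratio is not the project's result. The feature ids follow the Feature Importance table, and counting BB Mid (f3) as a BB feature is an assumption.

```python
# Feature ids from the Feature Importance table
rsi_features = ["f5", "f12", "f13"]                         # RSI, RSI Lag1, RSI Lag2
bb_features = ["f2", "f3", "f4", "f7", "f8", "f10", "f11"]  # BB bands and their lags

# Placeholder F scores (invented). In practice these could come from the fitted
# model, e.g. model.get_booster().get_score(importance_type="weight").
f_scores = {"f5": 30, "f12": 50, "f13": 10, "f2": 5, "f3": 4,
            "f4": 6, "f7": 5, "f8": 3, "f10": 4, "f11": 3}

# Features never used in a split are absent from the score dict, hence .get(f, 0)
rsi_total = sum(f_scores.get(f, 0) for f in rsi_features)
bb_total = sum(f_scores.get(f, 0) for f in bb_features)
print(rsi_total, bb_total, rsi_total / bb_total)  # 90 30 3.0
```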


# Conclusion
This project was able to label a Bitcoin dataset with a trading algorithm built on a combination of multiple technical indicators. The labeled data was used to successfully train a stochastic gradient boosting algorithm, `XGBoost`, yielding 95.4% accuracy. RSI proved to be the more important technical indicator in classifying trading strategy. These results are limited to this dataset.

# Future Work

Future work in this project would include the following:
- adding technical indicators to determine which ones provide the most value in determining trading strategy,
- incorporating other cryptocurrency price histories to determine if feature importance is the same,
- automating data acquisition, labeling, and training of the algorithm, and
- developing a web service to provide the trading strategy for a given date range.

# References

- [1] Technical Indicator. Investopedia, https://www.investopedia.com/terms/t/technicalindicator.asp.
- [2] Zielak. Coinbase Bitcoin Historical Data. Kaggle, https://www.kaggle.com/mczielinski/bitcoin-historical-data/data.
- [3] TA-Lib. http://www.ta-lib.org.
- [4] Stock Technical Analysis with Python. Udemy, https://www.udemy.com/stock-technical-analysis-with-python/.
- [5] XGBoost. https://xgboost.readthedocs.io/en/latest/.
- [6] XGBoost with Python. Machine Learning Mastery, https://machinelearningmastery.com/xgboost-with-python/.

# Appendix

## Typical labeling Python code

```python
# Code modified and fixed from Stock Technical Analysis with Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import talib as ta

### DATA
df = pd.read_csv("./coinbaseBTCUSD_1min_2014-12-01_to_2017-10-20.csv")
df.timestamp = pd.to_datetime(df.timestamp,unit='s')  # timestamp is in seconds
df.index = df.timestamp
del df['timestamp']
df = df.loc['2014-12-01T06:00:00':'2017-10-19T23:59:00']  # remove rows that do not lie within the date range

# resample one-minute data to 2 hour bars
df = df.resample(rule="120T").agg(
		{'open':'first','high':'max','low':'min','close':'last','volbtc':'sum','volusd':'sum','wtdprice':'last'})
# keep 2017 data only
df = df.loc['2017-01-01':].copy()
# bollinger bands
df['bb_up'],df['bb_mid'],df['bb_low'] = ta.BBANDS(np.asarray(df.close),
                                                  timeperiod=7,nbdevup=1.5,nbdevdn=1.5,matype=0)
# rsi
df['rsi'] = ta.RSI(np.asarray(df.close),timeperiod=7)

### TRADING SIGNAL (buy=1 , sell=-1, hold=0)
# price crosses BB and RSI crosses threshold
# lag the inputs so each signal uses only past data (avoids look-ahead bias)
df['close_lag1'] = df.close.shift(1)
df['bb_low_lag1'] = df.bb_low.shift(1)
df['bb_up_lag1'] = df.bb_up.shift(1)
df['close_lag2'] = df.close.shift(2)
df['bb_low_lag2'] = df.bb_low.shift(2)
df['bb_up_lag2'] = df.bb_up.shift(2)
df['rsi_lag1'] = df.rsi.shift(1)
df['rsi_lag2'] = df.rsi.shift(2)

# generate trading signals
df['bb_sig'] = 0  # default to do nothing
# TODO: refine until the signals look right!!!!
# buy signal: close below the lower band for the two previous periods and lag1 RSI below 35
df.loc[(df.close_lag2<df.bb_low_lag2) & (df.close_lag1<df.bb_low_lag1) & (df.rsi_lag1<35),'bb_sig'] = 1
# sell signal: close above the upper band for the two previous periods and lag1 RSI above 85
# TODO: need to add a check for enough profit before selling???
df.loc[(df.close_lag2>df.bb_up_lag2) & (df.close_lag1>df.bb_up_lag1) & (df.rsi_lag1>85),'bb_sig'] = -1
# first signal will be a buy
df.iloc[0,df.columns.get_loc('bb_sig')] = 1

print(df.bb_sig.value_counts())

### TRADING STRATEGY
# own asset=1, not own asset=0
# carry the position forward: a buy enters, a sell exits, otherwise keep the previous state
position = 0
positions = []
for sig in df['bb_sig']:
	if sig==1:
		position = 1
	elif sig==-1:
		position = 0
	positions.append(position)
df['bb_rsi_str'] = positions

### ANALYSIS
# Strategies Per-Period Returns
# Bands Crossover Strategy Without Trading Commissions
df['bb_rsi_returns'] = ((df.close/df.close_lag1)-1)*df.bb_rsi_str
df.iloc[0,df.columns.get_loc('bb_rsi_returns')] = 0.0  # no return for the first period
# Buy and Hold Strategy
df['bh_returns'] = (df.close/df.close_lag1)-1
df.iloc[0,df.columns.get_loc('bh_returns')] = 0.0  # no return for the first period

# Strategies Cumulative Returns
# Cumulative Returns Calculation
# TODO: check calculations
df['bb_rsi_cum_returns'] = (np.cumprod(df.bb_rsi_returns+1)-1)
df['bh_cum_returns'] = (np.cumprod(df.bh_returns+1)-1)

# Strategies Performance Metrics
# Total cumulative return over the sample (not annualized)
bb_rsi_yr_returns = df.bb_rsi_cum_returns.tail(1).values[0]
bh_yr_returns = df.bh_cum_returns.tail(1).values[0]
# Standard deviation of per-period returns scaled by sqrt(365)
# NOTE: 365 periods per year matches daily bars; 2 hour bars have 365*12 periods per year
bb_rsi_std = np.std(df.bb_rsi_returns.values)*np.sqrt(365.)  # cryptos trade 365 days a year
bh_std = np.std(df.bh_returns.values)*np.sqrt(365.)
# Sharpe Ratio (Rf=0%)
bb_rsi_sharpe = bb_rsi_yr_returns/bb_rsi_std
bh_sharpe = bh_yr_returns/bh_std

# Summary Results Data Table
print('\n')
summary_df = pd.DataFrame(
		{'Summary' :['Return','Std Dev','Sharpe (Rf=0%)'],'Trade':[bb_rsi_yr_returns,bb_rsi_std,bb_rsi_sharpe],
		 'Buy&Hold':[bh_yr_returns,bh_std,bh_sharpe]})
summary_df = summary_df[['Summary','Trade','Buy&Hold']]
with pd.option_context('display.precision',2):
	print(summary_df)

# CHARTING
fig1,ax = plt.subplots(5,sharex=True)
ax[0].plot(df['close'])
ax[0].plot(df['bb_up'],linestyle='--',label='upper')
ax[0].plot(df['bb_mid'],linestyle='--',label='middle')
ax[0].plot(df['bb_low'],linestyle='--',label='lower')
ax[0].legend(loc='upper left')
ax[1].plot(df['rsi'],color='green',label='rsi')
ax[1].axhline(y=85,linestyle='--',color='orange')
ax[1].axhline(y=35,linestyle='--',color='orange')
ax[1].legend(loc='upper left')
ax[2].plot(df['bb_sig'],marker='o',markersize=5,linestyle='',label='signal',color='red')
ax[2].legend(loc='upper left')
ax[3].plot(df['bb_rsi_str'],marker='o',markersize=5,linestyle='',label='strategy',color='green')
ax[3].legend(loc='upper left')
ax[4].plot(df['bb_rsi_cum_returns'],label='Trade')
ax[4].plot(df['bh_cum_returns'],label='Buy & Hold')
ax[4].legend(loc='upper left')
plt.suptitle('BTC 2hr Close Prices, BB (7, 1.5), & RSI (7)')
plt.show()

# persist df
df.to_csv("./coinbaseBTCUSD-withsignals-2hr.csv")
```