# Trading Strategy for Finance using LSTMs

## Optional Exercise
Please read ahead and only come back to these optional exercises if time permits.

**Train from scratch** [20-30 mins]

First, change the number of epochs to 20 in the above cell. Second, set the starting learning rate back to **0.002**. Third, comment out the two lines where the pre-trained model is loaded (under "Restore latest checkpoint"). Then re-run everything using Kernel->Restart & Run All.



#### How can a portfolio manager assess the predicted signal?

We could draw a scatter plot of actual returns against predicted returns. However, correlation is not visually apparent on a scatter plot when it is below roughly 20-30%. The correlation we achieve with this signal is much weaker, which is typical of modern financial markets. The correlations often observed in other applications of predictive models are all but impossible in financial markets, which are highly efficient (simply put, unpredictable). If someone had a signal with a correlation of 30%, they could use leverage to get extremely rich very quickly, and the observed signal (an inefficiency) would soon disappear from the market.
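
To get a feel for how weak such a signal is, the following minimal sketch generates synthetic data with a chosen correlation and measures it with `np.corrcoef`. The function `simulate_signal`, its parameters, and the 1% return scale are illustrative assumptions and are not part of the lab code; the data is random and carries no market information.

```python
import numpy as np

def simulate_signal(rho, n, scale=0.01, seed=0):
    """Return (predicted, actual) arrays whose population correlation is rho."""
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    # Mix the signal with independent noise so that corr(signal, mixed) == rho.
    mixed = rho * signal + np.sqrt(1.0 - rho ** 2) * noise
    return scale * signal, scale * mixed

# Hypothetical example: a 5% correlation, far below what a scatter plot reveals.
predicted_demo, actual_demo = simulate_signal(rho=0.05, n=100_000)
measured = np.corrcoef(predicted_demo, actual_demo)[0, 1]
print(f"measured correlation: {measured:.3f}")
```

Passing `predicted_demo` and `actual_demo` to `plt.scatter` should produce a nearly structureless cloud, even though the measured correlation is clearly non-zero.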

In order to assess the signal visually, we split the out-of-sample data points into buckets based on the value of the predicted returns. We then compute the mean actual return per bucket and plot it (Y axis) against the mean predicted return (X axis), one point per bucket. Taking the mean averages out the variance within each bucket and uncovers the predictive value of the signal.

In [None]:
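import warnings

import matplotlib.pyplot as plt
import numpy as np

# Out-of-sample actual and predicted returns, converted to arrays for boolean indexing
actual = np.array(totalActual)
predicted = np.array(totalPredicted)

actualMeanReturn = []
predictedMeanReturn = []
stdActualReturns = []
# Bucket edges, 0.002 (0.2%) apart, covering predicted returns of roughly -2% to +2%
buckets = np.arange(-0.02,0.02,0.002)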

# Group the data points by predicted return and compute per-bucket statistics.
# Warnings are suppressed because empty buckets yield NaN means.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for i in range(len(buckets)-1):
        # Half-open interval [lower, upper) so values on a bucket edge are not dropped
        index = np.logical_and(predicted>=buckets[i], predicted<buckets[i+1])
        thisBucket = actual[index].mean()
        actualMeanReturn.append(thisBucket)
        predictedMeanReturn.append(predicted[index].mean())
        stdActualReturns.append(actual[index].std())

# Actual versus predicted values are plotted
plt.figure()
plt.plot(predictedMeanReturn,actualMeanReturn, marker='*')
plt.xlabel('predicted')
plt.ylabel('actual')
plt.grid(True)
plt.show()

plt.figure()
plt.errorbar(predictedMeanReturn, actualMeanReturn, yerr = stdActualReturns, marker='*')
plt.xlabel('predicted')
plt.ylabel('actual')
plt.grid(True)
plt.show()


**How much variance is there?**

The second plot answers this question by adding error bars to the first plot. Each error bar extends one standard deviation of the actual returns above and below the mean of the respective bucket.
A portfolio manager would typically use plots such as these to assess the behavior of prospective signals and to determine the signal levels at which action should be taken. The simplest trading system using this signal would buy the security when the predicted return is above some threshold (say, above 0.5%) and sell (or short-sell) it when the predicted return is below the negative threshold (e.g. below -0.5%).
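
The threshold rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `threshold_positions`, the +1/0/-1 position encoding, and the example predictions are hypothetical and not part of the lab code.

```python
import numpy as np

def threshold_positions(predicted_returns, threshold=0.005):
    """Map predicted returns to positions: +1 (buy), -1 (sell or short), 0 (no action)."""
    predicted_returns = np.asarray(predicted_returns, dtype=float)
    positions = np.zeros(predicted_returns.shape, dtype=int)
    # Buy only when the prediction is strictly above the threshold.
    positions[predicted_returns > threshold] = 1
    # Sell (or short-sell) only when it is strictly below the negative threshold.
    positions[predicted_returns < -threshold] = -1
    return positions

# Made-up predictions for illustration only.
example_predictions = [0.012, 0.001, -0.007, 0.005, -0.0049]
example_positions = threshold_positions(example_predictions)
print(example_positions)
```

This sketch ignores transaction costs, position sizing, and risk limits, all of which a real trading system would need to account for.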


## 4. Next Steps

We recommend trying the following steps after the lab.

1. Try other machine learning techniques such as random forest, ridge regression, and XGBoost, and compare their correlation with that of the LSTM-based predictor.

2. Try using an autoencoder to extract fewer features than the original dataset provides, and use those features as input to the deep learning model. Analyze the performance.
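
For step 1, a baseline comparison might look like the following sketch. It fits a closed-form ridge regression with NumPy (rather than a library implementation) and scores it with the Pearson correlation on held-out data. Everything here is an assumption for illustration: the helper names `fit_ridge` and `pearson`, the `alpha` value, and the synthetic data, which stands in for the lab's features and targets.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge weights without intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data (NOT the lab dataset): weak linear signal plus noise.
rng = np.random.default_rng(0)
n_samples, n_features = 4000, 10
X_demo = rng.standard_normal((n_samples, n_features))
true_weights = np.full(n_features, 0.1)
y_demo = X_demo @ true_weights + rng.standard_normal(n_samples)

# Chronological split: fit on the first 75%, evaluate on the remaining 25%.
split = int(0.75 * n_samples)
ridge_weights = fit_ridge(X_demo[:split], y_demo[:split], alpha=10.0)
ridge_corr = pearson(X_demo[split:] @ ridge_weights, y_demo[split:])
print(f"out-of-sample Pearson correlation (ridge): {ridge_corr:.3f}")
```

To compare against the LSTM, one would replace the synthetic arrays with the lab's features and targets and compute `pearson` on the same out-of-sample period for both models.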

## 5. Summary

This lab presented a step-by-step implementation of an LSTM-based deep neural network for predicting financial time series data. The performance of the model was evaluated with the Pearson correlation, and competitive performance was achieved. The code provided in this lab can serve as a starting point for more complex trading strategies.

## 6. Post-Lab

Finally, don't forget to save your work from this lab before time runs out and the instance shuts down!!

1. You can download the data from Kaggle: [https://www.kaggle.com/c/two-sigma-financial-modeling](https://www.kaggle.com/c/two-sigma-financial-modeling).

2. To use the data, set the "usePreparedData" variable to False before running the code in your environment.

3. Also, remove the line "model_saver.restore(sess, pre_trained_model)" to train the model on your data.


