<a href="https://colab.research.google.com/github/sshillo/blog/blob/master/bitcoin-prediction-pytorch-deep-learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Intro
It's December 2020 and bitcoin prices have hit an all-time high. I've recently gone down a rabbit hole of trying to create a reinforcement learning bot to trade bitcoin. While that is still a work in progress, I thought it would be useful to show how to predict bitcoin prices using deep learning. This is a good place to start for anyone who wants to get into deep learning, reinforcement learning, or automated trading.

### Why PyTorch, and which libraries?

You might be asking yourself: why not TensorFlow, Keras, Stable Baselines, PyTorch Lightning, or something else?

My main reasons for using PyTorch are that it's backed by Facebook and that it lets you write clean, Pythonic code that is easy to understand. My impression of TensorFlow was that Python had been used to build a DSL on top of what was actually happening, which means a steep learning curve because you aren't writing the Python you're used to.

### Why not PyTorch Lightning, Stable Baselines 3, or some other library?

Well, first off, if you're learning something new, start with as few external libraries as possible and add them as needed. I think a library like Stable Baselines is only useful once you've implemented the algorithms yourself; otherwise you won't know what you're doing or why. Out-of-the-box modeling libraries do things like batch normalization and gradient clipping for you. Do you really need them, and do you know what they do and when they help? In my experience it's better to start with the simplest thing possible. It's not that hard to build a model from scratch, and along the way you might find yourself creating a library that ends up looking like one of these others. This isn't a web framework, where reinventing the wheel is just a waste of time. For something as complex as deep learning, I think reinventing the wheel can sometimes be a good thing.

## Setup
First, let's install some libraries.

In [1]:
!pip install chart_studio plotly==4.9.0 statsmodels==0.11.0 pmdarima ipdb wandb pyarrow==2.0.0
!pip install pytorch-lightning==1.0.4

Now let's import everything we need


In [7]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import os
import torch.nn as nn
import torch
from torch.autograd import Variable
import ipdb
from torch.utils.data import TensorDataset, DataLoader, Dataset
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # fall back to CPU when no GPU is present
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import shutil
from IPython.display import clear_output 
import time
import urllib.request  # urlretrieve lives in the urllib.request submodule
import pmdarima as pm

### Get the Data
I've separately collected three years of bitcoin data at 15-minute intervals from the Binance API. It is stored as Parquet because it loads faster than a text format like CSV and preserves type information.

In [5]:
bitcoin_data_url = "https://drive.google.com/u/0/uc?id=14iEVdVtBaVfN6dMg0bO4QrfSaUoeXJ4Y&export=download"
urllib.request.urlretrieve(bitcoin_data_url, "data.parquet")

('data.parquet', <http.client.HTTPMessage at 0x7f9fffdd0908>)

### Creating a Baseline

Before we predict bitcoin prices using the new hotness, we need to set a baseline. I've read many blog posts showing how to do something similar to this, and each time I ask myself: why? The answer should be better performance. If we can predict prices with something that already works and is less complex, we should do that. ARIMA models are a standard choice for forecasting time series data, so that's what we'll use. For more info on ARIMA, please go [here](https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/#:~:text=So%20what%20exactly%20is%20an,used%20to%20forecast%20future%20values.).

In [13]:
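df = pd.read_parquet('data.parquet')
data = df['close']
# Caution: an integer train_size is an absolute row count (80 rows here), and
# train_test_split shuffles by default. A chronological 80/20 split would be
# train_test_split(data, train_size=0.8, shuffle=False).
train, test = train_test_split(data, train_size=80)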

# Fit a simple auto_arima model
modl = pm.auto_arima(train, start_p=1, start_q=1, start_P=1, start_Q=1,
                     max_p=5, max_q=5, max_P=5, max_Q=5, seasonal=True,
                     stepwise=True, suppress_warnings=True, D=10, max_D=10,
                     error_action='ignore')

print(modl)

# Create predictions for the future, evaluate on test
preds, conf_int = modl.predict(n_periods=test.shape[0], return_conf_int=True)

# Print the error:
print("Test RMSE: %.3f" % np.sqrt(mean_squared_error(test, preds)))

 ARIMA(0,0,0)(0,0,0)[0] intercept
Test RMSE: 2694.857


So what happened here?

*   We split the data into train and test sets.
*   I fitted a model using the pmdarima library, whose `auto_arima` searches for the best hyperparameters for our ARIMA model.
*   Lastly, we computed the RMSE (root mean squared error) on the test data. We'll compare these predictions and this RMSE to our PyTorch model.

The selected model, `ARIMA(0,0,0)` with an intercept, has no autoregressive or moving-average terms, so it forecasts the same constant for every step.
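To make the three baseline steps concrete, here is a minimal NumPy sketch on a made-up toy series. It is not the pmdarima code path: the `rmse` helper and the numbers are illustrative assumptions, and the persistence forecast ("the next value equals the current one") is an extra baseline that the original notebook does not run. The constant forecast mimics what an `ARIMA(0,0,0)` model with an intercept essentially does.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy series standing in for prices (made-up numbers, not the bitcoin data).
prices = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 16.0, 15.0, 17.0, 18.0, 20.0])

# Step 1: chronological split, first 80% for training and the rest for testing.
split = int(len(prices) * 0.8)
train, test = prices[:split], prices[split:]

# Step 2: a constant forecast equal to the training mean, which is essentially
# what ARIMA(0,0,0) with an intercept predicts at every horizon.
mean_forecast = np.full(len(test), train.mean())

# Extra baseline (an assumption, not in the notebook): one-step-ahead
# persistence, where each forecast is the previous actual value.
persistence_forecast = prices[split - 1:-1]

# Step 3: score both forecasts on the held-out data.
mean_rmse = rmse(test, mean_forecast)
persistence_rmse = rmse(test, persistence_forecast)
print(f"constant-mean RMSE: {mean_rmse:.3f}")
print(f"persistence RMSE:   {persistence_rmse:.3f}")
```

On a trending series the constant forecast falls behind quickly, which is worth keeping in mind when reading the ARIMA number above.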


Now let's create the model. I'm using a GRU instead of an LSTM because its implementation is a little simpler.

In [None]:
class GRU(nn.Module):
    def __init__(self, i_size, h_size, n_layers, o_size):
        super(GRU, self).__init__()

        self.rnn = nn.GRU(
            input_size=i_size,
            hidden_size=h_size,
            num_layers=n_layers,
            batch_first=True
        )
        self.hidden_size = h_size
        self.num_layers = n_layers
        self.out = nn.Linear(h_size, o_size)

    def init_hidden(self, batch_size=32):
        return torch.zeros(self.num_layers, batch_size, self.hidden_size)

    def forward(self, x, hidden=None):
        # hidden has shape (num_layers, batch_size, hidden_size)
        if hidden is None:
            batch_size = x.shape[0]
            hidden = self.init_hidden(batch_size)
            hidden = hidden.type_as(x)

        out, next_hidden = self.rnn(x, hidden)
        outs = self.out(out[:,-1,:])

        return outs, next_hidden 
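
To show why a GRU is simpler than an LSTM (two gates and a single hidden state, with no separate cell state), here is a sketch of one GRU time step in plain NumPy. It follows the gate equations and the reset/update/new weight stacking documented for `torch.nn.GRU`, but `gru_cell` and its argument names are illustrative, not part of the model above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W_i, W_h, b_i, b_h):
    """One GRU time step for a single sample.

    x: (input_size,), h: (hidden_size,)
    W_i: (3 * hidden_size, input_size), W_h: (3 * hidden_size, hidden_size)
    b_i, b_h: (3 * hidden_size,); rows are stacked as reset, update, new.
    """
    H = h.shape[0]
    gi = W_i @ x + b_i                           # input contributions
    gh = W_h @ h + b_h                           # hidden contributions
    r = sigmoid(gi[:H] + gh[:H])                 # reset gate
    z = sigmoid(gi[H:2 * H] + gh[H:2 * H])       # update gate
    n = np.tanh(gi[2 * H:] + r * gh[2 * H:])     # candidate state
    return (1.0 - z) * n + z * h                 # blend candidate and old state

input_size, hidden_size = 15, 4                  # 15 matches input_size in this post
rng = np.random.default_rng(0)
x = rng.normal(size=input_size)
h = np.ones(hidden_size)

# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the new hidden state is exactly half of the old one.
zeros_i = np.zeros((3 * hidden_size, input_size))
zeros_h = np.zeros((3 * hidden_size, hidden_size))
zeros_b = np.zeros(3 * hidden_size)
h_next = gru_cell(x, h, zeros_i, zeros_h, zeros_b, zeros_b)

# With random weights the state stays within [-1, 1].
W_i = rng.normal(size=(3 * hidden_size, input_size))
W_h = rng.normal(size=(3 * hidden_size, hidden_size))
h_rand = gru_cell(x, h, W_i, W_h, zeros_b, zeros_b)
```

The `GRU` class above delegates all of this to `nn.GRU` and only adds a linear layer on top of the last time step's output.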


I'm going to wrap the training and evaluation in a function. Functions are nice: they're reusable, and if you want to move this code out of Jupyter, they make life a lot easier. In fact, if I weren't writing a blog post, I would keep most of my code outside of the ipynb file, import just the functions I need, and use the notebook merely as a way to display graphs inline. There's more to this tangent, but I'll leave that for another blog post.

In [None]:
def run(model_klass, 
        dataset_train, 
        dataset_test, 
        sc, 
        name='test',
        input_size =15,
        hidden_size=64,
        num_layers=2,
        output_size=1,
        num_epochs=3, 
        batch_size=32,
        learning_rate=.001):
  print(f"RUNNING {name}")
  training_set = dataset_train
  training_set_scaled = sc.fit_transform(training_set)

  X_train = []
  y_train = []
  for i in range(input_size, training_set_scaled.shape[0]):
      X_train.append(training_set_scaled[i-input_size:i, 0])
      y_train.append(training_set_scaled[i, 0])
  X_train, y_train = np.array(X_train), np.array(y_train)

  X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))

  rnn = model_klass(input_size, hidden_size, num_layers, output_size).to(device)

  optimiser = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
  criterion = nn.MSELoss()

  inputs = Variable(torch.from_numpy(X_train).float()).to(device)
  labels = Variable(torch.from_numpy(y_train).float()).to(device)

  dataset = TensorDataset(inputs, labels)
  loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)
  for epoch in range(num_epochs):
      losses = []
      for batch_inputs, batch_labels in loader:
        # Passing None lets the model start each batch from a zero hidden state
        output, _ = rnn(batch_inputs, None)

        optimiser.zero_grad()
        loss = criterion(output.view(-1), batch_labels)
        loss.backward()                     # backpropagation
        optimiser.step()                    # update the parameters
        losses.append(loss.item())
      if epoch % 5 == 0:
        print('epoch {}, loss {}'.format(epoch,np.mean(losses)))

  # Build prediction inputs from the full series (train + test), scaled with
  # the scaler that was fitted on the training data only
  dataset_total = np.concatenate((dataset_train, dataset_test), axis = 0)
  inputs = dataset_total.reshape(-1,1)
  inputs = sc.transform(inputs)
  X_test = []
  for i in range(input_size, len(inputs)):
      X_test.append(inputs[i-input_size:i, 0])
  X_test = np.array(X_test)
  X_test = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))

  test_inputs = torch.from_numpy(X_test).float().to(device)
  rnn.eval()
  with torch.no_grad():  # no gradients are needed for prediction
      predicted_stock_price, _ = rnn(test_inputs)
  predicted_stock_price = predicted_stock_price.cpu().numpy().reshape(-1, 1)

  # Undo the scaling so predictions are back in price units
  predicted_stock_price = sc.inverse_transform(predicted_stock_price)[:,0]

  real_stock_price_all = dataset_total[input_size:][:,0]

  # Visualising the results
  N = predicted_stock_price.shape[0]
  test_start = int(N * 0.75)

  fig = go.Figure()
  fig.add_trace(go.Scatter(y=real_stock_price_all, name='Real'))
  fig.add_trace(go.Scatter(y=predicted_stock_price, name='Pred'))
  fig.add_shape(type="line",
    x0=test_start, y0=0, x1=test_start, y1=20000,
    line=dict(color="RoyalBlue",width=1))
  fig.add_trace(go.Scatter(
      x=[test_start - 5000], y=[15000],
      text=["Train/Test Split"],
      mode="text",
  ))
  fig.show()

  # RMSE on the test portion (everything after the split marker).
  # np.sqrt is used because the squared= argument was deprecated in later
  # scikit-learn releases.
  t_d = predicted_stock_price[test_start:]
  r_d = real_stock_price_all[test_start:]
  test_rse = np.sqrt(mean_squared_error(r_d, t_d))

  # RMSE on the training portion
  t_d = predicted_stock_price[0:test_start]
  r_d = real_stock_price_all[0:test_start]
  train_rse = np.sqrt(mean_squared_error(r_d, t_d))
  return name, train_rse, test_rse
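
The windowing step inside `run` is easier to see in isolation. The sketch below reproduces it on a tiny made-up series; `make_windows` is a hypothetical helper name, not something the notebook defines.

```python
import numpy as np

def make_windows(series, input_size):
    """Return (X, y): X[i] holds `input_size` consecutive values, y[i] the value right after."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - input_size:i] for i in range(input_size, len(series))])
    y = series[input_size:]
    # Same reshape as in run(): (samples, sequence length 1, input_size features)
    return X.reshape(X.shape[0], 1, input_size), y

series = np.arange(10.0)   # 0, 1, ..., 9 as a stand-in for scaled prices
X, y = make_windows(series, input_size=3)
print(X.shape, y.shape)    # (7, 1, 3) (7,)
print(X[0, 0], y[0])       # [0. 1. 2.] 3.0
```

Note that with `batch_first=True` this layout hands the GRU a sequence of length 1 whose single step has `input_size` features. Reshaping to `(samples, input_size, 1)` and building the model with an input size of 1 would instead feed the window one price per time step; that is an alternative to experiment with, not what the code above does.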

OK, now that everything is set up, let's run it.

In [None]:
start = time.time()

df = pd.read_parquet('./data.parquet')
cols = ['open']
data_train, data_test = train_test_split(df[cols].values, test_size=0.25, shuffle=False)

errors = []

sc = MinMaxScaler(feature_range = (-1, 1))
err = run(GRU, data_train, data_test, sc, 'gru minmax -1,1')
errors.append(err)

total_time = time.time() - start
print("Total time:", total_time)

df = pd.DataFrame(errors)
df.columns = ['name','train error', 'test error']
print(df)

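From the graph we can see that the model does a good job of tracking the price after just a few epochs. Keep in mind that each prediction is one step ahead: the model sees the previous 15 actual prices and predicts the next one.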
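
One detail of the setup worth a closer look is the scaler, which `run` fits on the training slice only and then reuses for the whole series. The sketch below is a minimal NumPy stand-in for min-max scaling to the range (-1, 1). The `MinMax` class and the numbers are illustrative assumptions, not scikit-learn's implementation.

```python
import numpy as np

class MinMax:
    """Minimal stand-in for a min-max scaler fitted on training data only."""

    def __init__(self, low=-1.0, high=1.0):
        self.low, self.high = low, high

    def fit(self, x):
        self.min_, self.max_ = float(np.min(x)), float(np.max(x))
        return self

    def transform(self, x):
        unit = (np.asarray(x, dtype=float) - self.min_) / (self.max_ - self.min_)
        return unit * (self.high - self.low) + self.low

    def inverse_transform(self, x):
        unit = (np.asarray(x, dtype=float) - self.low) / (self.high - self.low)
        return unit * (self.max_ - self.min_) + self.min_

train = np.array([100.0, 150.0, 200.0])   # made-up training prices
test = np.array([250.0])                  # higher than anything seen in training

scaler = MinMax().fit(train)
scaled_train = scaler.transform(train)    # maps to -1, 0, 1
scaled_test = scaler.transform(test)      # maps to 2.0, outside the fitted range
restored = scaler.inverse_transform(scaled_test)
print(scaled_train, scaled_test, restored)
```

Fitting on the training data alone avoids leaking test information into training, but it also means that test prices above the training maximum scale to values greater than 1, a range the model never saw during training.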