# File Description
The CSV file contains minute-by-minute OHLC (open, high, low, close) and volume data for the period from January 2012 to March 2021. Some values are missing because the exchange (or its API) was down or did not yet exist.

# Aim
Predict the closing price of Bitcoin from the recent market trend.

# Import File

In [None]:
import numpy as np
import pandas
import seaborn
import matplotlib.pyplot as plt

In [None]:
bit_df = pandas.read_csv('../input/bitcoin-historical-data/bitstampUSD_1-min_data_2012-01-01_to_2021-03-31.csv')

In [None]:
bit_df.head()

In [None]:
from tabulate import tabulate
info = [[col, bit_df[col].count(), bit_df[col].max(), bit_df[col].min()] for col in bit_df.columns]
print(tabulate(info, headers = ['Feature', 'Count', 'Max', 'Min'], tablefmt = 'orgtbl'))

# EDA and Data Wrangling

In [None]:
print(bit_df.isna().sum())

There are more than **1 million** unrecorded timestamps.

In [None]:
bit_df = bit_df.dropna()

In [None]:
print('total missing values : ' + str(bit_df.isna().sum().sum()))

### NOTE
After the NaN rows are removed, the timestamps are **no longer equally spaced**. Because this is time series data and the model relies on past performance, large missing chunks can give it a misleading picture of the ongoing trend.
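
One way to see how uneven the spacing is: look at the difference between consecutive timestamps. The sketch below uses a small made-up array and assumes that `Timestamp` holds Unix time in seconds with one row per minute; both are assumptions for illustration.

```python
import numpy as np

# Hypothetical timestamps in Unix seconds, one row per minute, with two gaps
# standing in for rows removed by dropna().
timestamps = np.array([0, 60, 120, 300, 360, 420, 1020, 1080])

# Difference between consecutive rows; 60 s means no minute is missing.
gaps = np.diff(timestamps)
is_gap = gaps > 60

n_gaps = int(is_gap.sum())
longest_gap_minutes = int(gaps.max() // 60)
missing_minutes = int((gaps[is_gap] // 60 - 1).sum())

print(f"gaps: {n_gaps}, longest: {longest_gap_minutes} min, missing rows: {missing_minutes}")
```

On the real frame, `bit_df['Timestamp'].to_numpy()` would take the place of the toy array.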

### So only the most recent part of the available data will be used for prediction.

In [None]:
bit_df = bit_df[bit_df['Timestamp'] > (bit_df['Timestamp'].max()-650000)]

In [None]:
bit_df = bit_df.reset_index(drop = True)

In [None]:
bit_df.head()

In [None]:
bit_df.hist(figsize = (15,15))
plt.show()

### Correlation
Let's look at the correlation between the columns.

In [None]:
plt.figure(figsize = (15,15))
mat = bit_df.corr()
seaborn.heatmap(mat, vmin = -1.0, annot = True, square = True)

**Open, High, Low, Close and Weighted_Price** are all highly correlated, so any one of them can be used as a feature. Either **Volume_(BTC) or Volume_(Currency)** will be the second feature. The next cell keeps Open and Volume_(Currency) as features and Close as the target.
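
The same reading of the heatmap can be done programmatically by listing column pairs whose absolute correlation exceeds a cut-off. This is only a sketch on synthetic series; the data and the `threshold` value are illustrative assumptions, not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real columns: three price-like series that share
# one random walk, plus an unrelated volume-like series.
walk = np.cumsum(rng.normal(size=1000))
columns = {
    "Open": walk + rng.normal(scale=0.01, size=1000),
    "Close": walk + rng.normal(scale=0.01, size=1000),
    "Weighted_Price": walk + rng.normal(scale=0.01, size=1000),
    "Volume_(Currency)": rng.normal(size=1000),
}
names = list(columns)
corr = np.corrcoef(np.vstack([columns[n] for n in names]))

threshold = 0.95  # illustrative cut-off, not a value taken from the notebook
redundant_pairs = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(redundant_pairs)
```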

In [None]:
bit_df = bit_df.drop(['Timestamp', 'Low', 'High', 'Volume_(BTC)', 'Weighted_Price'], axis = 1)

In [None]:
info = [[col, bit_df[col].count(), bit_df[col].max(), bit_df[col].min()] for col in bit_df.columns]
print(tabulate(info, headers = ['Feature', 'Count', 'Max', 'Min'], tablefmt = 'orgtbl'))

# Data visualization (recent trends)

In [None]:
plt.figure(figsize = (20,10))
plt.subplot(2,1,1)
plt.plot(bit_df['Open'].values[-500:])
plt.xlabel('Time period')
plt.ylabel('Opening price')
plt.title('Opening price of Bitcoin for last 500 timestamps')

plt.subplot(2,1,2)
plt.plot(bit_df['Volume_(Currency)'].values[-500:])
plt.xlabel('Time period')
plt.ylabel('Volume Traded')
plt.title('Volume traded of Bitcoin for last 500 timestamps')
plt.show()

### Note
One thing is for sure: cryptocurrencies are very volatile, as they are not regulated by any single authority.

# Create the arrays

In [None]:
bit_df.shape

In [None]:
X = np.array(bit_df.drop(['Close'], axis = 1))
y = np.array(bit_df['Close'])

In [None]:
print(X.shape)
print(y.shape)

# Scaling the data
We will standardize the data to zero mean and unit variance using **StandardScaler()** from sklearn.
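
For reference, standardization computes `z = (x - mean) / std` per column. The sketch below reproduces that arithmetic with NumPy on made-up prices and shows that the transform can be reversed when the mean and standard deviation are kept.

```python
import numpy as np

# Hypothetical closing prices in USD (not taken from the dataset).
prices = np.array([58000.0, 58250.0, 57900.0, 58400.0, 58100.0])

mean = prices.mean()
std = prices.std()  # population standard deviation (ddof=0), as StandardScaler uses

scaled = (prices - mean) / std   # zero mean, unit variance
restored = scaled * std + mean   # inverse transform back to USD

print(scaled)
print(restored)
```

Keeping `mean` and `std` (or the fitted scaler object) is what makes it possible to report predictions in USD later.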

In [None]:
print(X.max())
print(X.min())

In [None]:
print(y.max())
print(y.min())

In [None]:
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(X)

In [None]:
# keep the fitted scaler so predictions can be mapped back to prices later
y_scaler = StandardScaler()
y = y_scaler.fit_transform(y.reshape(-1, 1))
y = y.reshape(-1)

In [None]:
print(X.max())
print(X.min())

In [None]:
print(y.max())
print(y.min())

# Creating time series datasets
Each sample uses the previous **500** timestamps (roughly 8 hours of minute-level data) to predict the next closing price.
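
The windowing is easier to follow on a tiny array. The sketch below uses a window of 3 instead of 500 and compares a loop with the same indexing as the notebook's against a vectorised variant built on NumPy's `sliding_window_view`; the toy data and variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy data: 6 time steps, 2 features, window length 3 (the notebook uses 500).
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.arange(6, dtype=float) * 10
length = 3

# Loop version, same indexing as the notebook.
X_loop = np.array([X[i - length:i] for i in range(length, X.shape[0])])
y_loop = np.array([y[i] for i in range(length, X.shape[0])])

# Vectorised version: the window axis is appended last, so move it to the
# middle, and drop the final window because it has no following target.
windows = sliding_window_view(X, window_shape=length, axis=0)  # (4, 2, 3)
X_vec = windows[:-1].transpose(0, 2, 1)                         # (3, 3, 2)
y_vec = y[length:]

print(X_loop.shape, y_loop.shape)
print(np.array_equal(X_loop, X_vec), np.array_equal(y_loop, y_vec))
```

Each window covers time steps `i-length` to `i-1`, and its target is the value at step `i`, so the target is never part of its own input window.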

In [None]:
print(X.shape)
print(y.shape)

In [None]:
length = 500
X_temp = []
y_temp = []
for i in range(length,X.shape[0]) :
    X_temp.append(X[i-length: i])
    y_temp.append(y[i])
X_temp = np.array(X_temp)
y_temp = np.array(y_temp)

In [None]:
print(X_temp.shape)
print(y_temp.shape)

# Train test split

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_temp, y_temp, test_size = 0.2, random_state = 1)

In [None]:
print(X_train.shape)
print(y_train.shape)

In [None]:
print(X_test.shape)
print(y_test.shape)
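
Note that `train_test_split` shuffles by default, so heavily overlapping windows from neighbouring minutes can end up on both sides of the split. A common alternative for time series is to hold out the most recent windows instead. The helper below is a hypothetical sketch of that idea, not what this notebook does.

```python
import numpy as np

def chronological_split(X, y, test_size=0.2):
    """Hypothetical helper: hold out the last `test_size` fraction, no shuffling."""
    n_test = int(round(len(X) * test_size))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Toy windows standing in for X_temp / y_temp.
X_demo = np.arange(10 * 3 * 2).reshape(10, 3, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(y_tr, y_te)
```

Passing `shuffle = False` to `train_test_split` gives the same kind of ordered split.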

# Models (RNN vs LSTM)

### SimpleRNN
The SimpleRNN layer in Keras implements a vanilla RNN. It has a single tanh layer that takes the previous hidden state and the current input and computes the new hidden state, which also serves as the output. The figure below depicts a SimpleRNN layer.

<center><img src = "https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png" alt = "simplernn" width = "700"/></center>
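
The recurrence in the figure can be written out in a few lines of NumPy. This is a minimal sketch of the update `h_t = tanh(x_t W_x + h_{t-1} W_h + b)` with random weights and small illustrative sizes; it is not the Keras implementation, and the function and weight names are assumptions.

```python
import numpy as np

def simple_rnn_forward(x_seq, W_x, W_h, b):
    """Run h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b) over a sequence.

    x_seq: (timesteps, features); returns the final hidden state (units,).
    """
    h = np.zeros(W_h.shape[0])
    for x_t in x_seq:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(0)
timesteps, features, units = 5, 2, 4  # small illustrative sizes
x_seq = rng.normal(size=(timesteps, features))
W_x = rng.normal(size=(features, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

h_last = simple_rnn_forward(x_seq, W_x, W_h, b)
print(h_last.shape)
```

Returning only the last hidden state corresponds to `return_sequences = False` in the model defined below.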

In [None]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import SimpleRNN
from keras.layers import BatchNormalization

from keras.layers import Input

In [None]:
def simp_layer (hidden1) :
    
    model = Sequential()
    
    # add input layer
    model.add(Input(shape = (500, 2, )))
    
    # add rnn layer
    model.add(SimpleRNN(hidden1, activation = 'tanh', return_sequences = False))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    
    # add output layer
    model.add(Dense(1, activation = 'linear'))
    
    model.compile(loss = 'mean_squared_error', optimizer = 'adam')
    
    return model

In [None]:
model = simp_layer(10)
model.summary()

In [None]:
from keras.callbacks import ModelCheckpoint
checkp = ModelCheckpoint('./bit_model.h5', monitor = 'val_loss', save_best_only = True, verbose = 1)

In [None]:
import time
beg = time.time()

In [None]:
model.fit(X_train, y_train, batch_size = 32, epochs = 10, validation_data = (X_test, y_test), callbacks = [checkp])

In [None]:
end = time.time()

In [None]:
from keras.models import load_model
model = load_model('./bit_model.h5')

In [None]:
pred = model.predict(X_test)

In [None]:
print(pred.shape)

In [None]:
pred = pred.reshape(-1)

In [None]:
from sklearn.metrics import mean_squared_error
print('MSE : ' + str(mean_squared_error(y_test, pred)))
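
Because the target was standardized, this MSE is in squared standard deviations rather than dollars. Multiplying the RMSE by the standard deviation used when scaling `Close` converts it back to USD. Both numbers in the sketch below are hypothetical placeholders, not results from this notebook.

```python
import numpy as np

# Hypothetical standard deviation of the unscaled closing prices, in USD.
close_std_usd = 1500.0
# Hypothetical MSE measured on the standardized target.
mse_scaled = 0.0004

# Standardization divides errors by the std, so the RMSE scales back by it.
rmse_usd = np.sqrt(mse_scaled) * close_std_usd
print(f"RMSE: about {rmse_usd:.2f} USD")
```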

In [None]:
plt.figure(figsize = (20,7))
plt.plot(y_test[2040:2060])
plt.plot(pred[2040:2060])
plt.xlabel('Time')
plt.ylabel('Price')
plt.title('Closing Price vs Time (using SimpleRNN)')
plt.legend(['Actual price', 'Predicted price'])
plt.show()

In [None]:
print('Time taken for SimpleRNN model to learn : ' + str(end-beg) + ' sec.')

### LSTM
A simple RNN suffers from the **vanishing gradient** problem: gradients shrink as they are propagated back through many time steps, so the model struggles to retain information from the distant past. The LSTM handles this problem using gates, as shown in the figure below. These gates decide which information to retain, which new information to add and what to output.

<center><img src = "https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" alt = "lstm" width = "700"/></center>

### Forget gate layer
The first part is the **forget gate layer**, a sigmoid layer that outputs the fraction of each value in the previous **cell state** to retain. The cell state is a continuous stream of information carried over from past time steps.

### Input gate layer
This stage has two parts: a tanh layer that proposes new candidate values and a sigmoid layer (the input gate) that decides what proportion of each candidate to add. The candidates are multiplied by the sigmoid values and the result is added to the cell state.

### Output gate layer
Finally we decide what to output from the cell state. The cell state is passed through tanh and then multiplied by the output of a sigmoid layer, which decides which parts to output.

More details can be found in [Colah's Blog](https://colah.github.io/posts/2015-08-Understanding-LSTMs/?source=post_page-----79e5eb8049c9----------------------)
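
The three gate descriptions above can be condensed into a single update step. The sketch below is a plain NumPy illustration with random weights; the function name, the stacked weight layout and the gate ordering inside it are assumptions made for this example and are not necessarily how Keras stores its weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (features, 4*units), U: (units, 4*units), b: (4*units,)."""
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    f = sigmoid(z[0 * units:1 * units])        # forget gate: share of old cell state kept
    i = sigmoid(z[1 * units:2 * units])        # input gate: share of candidates added
    c_tilde = np.tanh(z[2 * units:3 * units])  # candidate values
    o = sigmoid(z[3 * units:4 * units])        # output gate: share of cell state exposed
    c = f * c_prev + i * c_tilde               # new cell state
    h = o * np.tanh(c)                         # new hidden state / output
    return h, c

rng = np.random.default_rng(0)
features, units = 2, 3  # small illustrative sizes
W = rng.normal(size=(features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, features)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state `c` is only ever scaled by the forget gate and added to, which is what lets information and gradients survive across many steps.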

In [None]:
def lstm_layer (hidden1) :
    
    model = Sequential()
    
    # add input layer
    model.add(Input(shape = (500, 2, )))
    
    # add lstm layer
    model.add(LSTM(hidden1, activation = 'tanh', return_sequences = False))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    
    # add output layer
    model.add(Dense(1, activation = 'linear'))
    
    model.compile(loss = "mean_squared_error", optimizer = 'adam')
    
    return model

In [None]:
model = lstm_layer(256)
model.summary()
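
When reading the two summaries side by side, it helps to remember that the SimpleRNN above was built with 10 units and this LSTM with 256, so the two models differ greatly in size. The sketch below computes the trainable parameters of the recurrent layer alone (bias included, BatchNormalization and Dense excluded) from the standard formulas; the helper names are hypothetical.

```python
def simple_rnn_params(units, features):
    # input weights + recurrent weights + bias
    return units * (features + units + 1)

def lstm_params(units, features):
    # four gate/candidate blocks, each shaped like a SimpleRNN layer
    return 4 * units * (features + units + 1)

print(simple_rnn_params(10, 2))  # SimpleRNN(10) on 2 features -> 130
print(lstm_params(256, 2))       # LSTM(256) on 2 features -> 265216
```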

In [None]:
checkp = ModelCheckpoint('./bit_model_lstm.h5', monitor = 'val_loss', save_best_only = True, verbose = 1)

In [None]:
beg = time.time()

In [None]:
model.fit(X_train, y_train, batch_size = 32, epochs = 10, validation_data = (X_test, y_test), callbacks = [checkp])

In [None]:
end = time.time()

In [None]:
model = load_model('./bit_model_lstm.h5')

In [None]:
pred = model.predict(X_test)

In [None]:
print(pred.shape)

In [None]:
pred = pred.reshape(-1)

In [None]:
print('MSE : ' + str(mean_squared_error(y_test, pred)))

In [None]:
plt.figure(figsize = (20,7))
plt.plot(y_test[2040:2060])
plt.plot(pred[2040:2060])
plt.xlabel('Time')
plt.ylabel('Price')
plt.title('Closing Price vs Time (using LSTM)')
plt.legend(['Actual price', 'Predicted price'])
plt.show()

In [None]:
print('Time taken for LSTM model to learn : ' + str(end-beg) + ' sec.')

# Conclusion
Since there was ample data, the models did not face any issue in learning the pattern. Given enough training time, either model could predict the closing price almost perfectly. It is important to note, however, that the LSTM predicted more accurately and was even faster than the SimpleRNN.