# **Predicting Bitcoin Prices with RNN**

The cryptocurrency market has seen sharp rises and falls over the past few years. With a variety of coins being exchanged for real money, it helps to understand the trend in a coin's price. In this article, we will build a fairly simple LSTM network to forecast the price of Bitcoin.

## Obtaining Bitcoin Data

There are plenty of open sources on the internet for historical Bitcoin price data. The one used below is from CoinMarketCap.

You can view and download the dataset [here](https://coinmarketcap.com/currencies/bitcoin/historical-data/).

### Loading And Understanding The Data

Importing Necessary Libraries

In [None]:

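#Upgrading pip and installing the required packages (scikit-learn is the PyPI name for sklearn)
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels openpyxl scikit-learn tensorflow keras torch torchvision \
    tqdm scikit-image pmdarima arch nsepy --user -q --no-warn-script-location

#Restarting the kernel so the newly installed packages can be imported
import IPython
IPython.Application.instance().kernel.do_shutdown(True)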


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing The Dataset

In [None]:
complete_data = pd.read_excel('bitcoin_data.xlsx')
print(complete_data.head(20))

The dataset consists of 7 features, laid out the way most stock price datasets are.

1. Market Capitalisation/Market Cap: The total dollar market value of the asset. For a company this is the value of its outstanding shares; for Bitcoin it is the value of the coins in circulation.
2. Volume: The total amount of the asset that changes hands over a given period of time (in this case, one day).
3. Close: The closing price, i.e. the Bitcoin price at the end of the day.
4. Low: The lowest price reached during the day.
5. High: The highest price reached during the day.
6. Open: The opening price, i.e. the Bitcoin price at the start of the day.
7. Date: The date of observation.
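
The definitions above imply simple consistency rules: on any day, Low should not exceed Open or Close, and High should not fall below them. Below is a minimal sketch of such a check, assuming the columns are named `Open`, `High`, `Low` and `Close`. The helper name `check_ohlc` and the toy values are made up for illustration.

```python
import pandas as pd

def check_ohlc(df):
    """Return a boolean Series that is True where a row is internally consistent."""
    low_ok = df["Low"] <= df[["Open", "Close"]].min(axis=1)
    high_ok = df["High"] >= df[["Open", "Close"]].max(axis=1)
    return low_ok & high_ok

# Toy rows (hypothetical values, not real Bitcoin prices)
toy = pd.DataFrame({
    "Open":  [100.0, 105.0, 98.0],
    "High":  [110.0, 108.0, 97.0],   # the last row is deliberately inconsistent
    "Low":   [95.0, 101.0, 90.0],
    "Close": [105.0, 102.0, 96.0],
})

consistent = check_ohlc(toy)
print(consistent.tolist())   # [True, True, False]
```

Rows flagged `False` would be worth inspecting before training on them.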


In [None]:
#Printing the dataset info
print(complete_data.info())

### Creating Training and Test Data

The dataset lists observations in decreasing order of date, so the newest rows come first. We need to preserve this serial nature of the data and hence cannot split it randomly. Instead we split by position: the oldest 80% of the rows form the training set and the most recent 20% form the test set.

In [None]:
#Setting The training set ratio
training_ratio = 80

#Calculating the test set ratio
test_ratio = 100-training_ratio

#Rounding the training set length to avoid fractions
training_len = round(len(complete_data)*(training_ratio/100))

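#Setting the Test set length (already an integer, so no rounding is needed)
test_len = len(complete_data)-training_len

#Splitting the data based on the calculated lengths
#(.copy() avoids pandas' SettingWithCopyWarning when columns are modified later)
dataset_train = complete_data.tail(training_len).copy()
dataset_test = complete_data.head(test_len).copy()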

#Printing the shapes of training and test sets

print("Shape Of Training Set :", dataset_train.shape)
print("Shape Of Test Set :", dataset_test.shape)

In [None]:
#Printing the training and test sets
print(dataset_test.tail(10))
print(dataset_train.head(10))
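
To see why `tail` yields the training set here, consider a toy frame sorted newest-first like the dataset. This is a sketch with made-up dates and prices; the names `toy`, `toy_train` and `toy_test` are hypothetical.

```python
import pandas as pd

# Toy data in decreasing date order, newest row first (made-up values)
dates = pd.date_range("2021-01-01", periods=10, freq="D")[::-1]
toy = pd.DataFrame({"Date": dates, "Close": range(10, 0, -1)})

training_len = round(len(toy) * 0.8)   # 8 rows
test_len = len(toy) - training_len     # 2 rows

toy_train = toy.tail(training_len)     # the 8 oldest days
toy_test = toy.head(test_len)          # the 2 newest days

# Every training date precedes every test date, so no future data reaches training
no_overlap = toy_train["Date"].max() < toy_test["Date"].min()
print(len(toy_train), len(toy_test), no_overlap)   # 8 2 True
```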

### Preprocessing The Training Set

Preparing The Data

In [None]:
#Setting the Date column to datetime format
dataset_train['Date'] = pd.to_datetime(dataset_train['Date'])

In [None]:
#Setting Date column as Index
dataset_train.set_index('Date', inplace = True)

In [None]:
#Sorting the dataset by increasing date (Date is now the index)
dataset_train.sort_index(ascending=True, inplace = True)

In [None]:
print(dataset_train.head(10))

Scaling and Sequencing

In [None]:
#A method to preprocess the data into sequences and return X and y

#Initializing the MinMaxScaler object
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0,1))

def bit_pre_process(raw_data , seq_len, column = 1):
  
  #Select the feature/column 
  data = raw_data.iloc[:, column].values
  data = data.reshape(-1, 1)
  
  #Feature Scaling (note: fit_transform refits the scaler on every call)
  data = sc.fit_transform(data)
  
  #Making sequences
  
  X = []
  y = []

  for i in range(seq_len, len(data)):
      X.append(data[i-seq_len:i, 0])
      y.append(data[i, 0])
  X, y = np.array(X), np.array(y)

  # Reshaping
  X = np.reshape(X, (X.shape[0], X.shape[1], 1))
  
  return X, y

In [None]:
#Setting the sequence length (Try different values)
sequence_length = 60

#Choosing the index of the Close column (0-based, counted after Date became the index)
comumn_index= 3

#Preprocessing the training set
X_train, y_train = bit_pre_process(dataset_train , sequence_length, comumn_index)

Sequence length is the number of previous observations considered when predicting a given day. In this case, to predict the closing price of Bitcoin on a given date, the model looks at the closing prices of the previous 60 days.

In [None]:
X_train.shape

In [None]:
y_train.shape

The independent variable set (X_train) will now consist of 1785 observations (original length of the training set minus the sequence length: 1845 - 60). Each observation in X_train is a sequence of 60 closing prices, i.e. the first row in X_train is an array of the first 60 observations from the training set and the corresponding entry in y_train is the closing price of the 61st observation.

More generally, for a sequence length of 60, the k-th row of X_train holds observations k to k+59 and the k-th entry of y_train holds observation k+60.
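
The windowing described above is easier to see on a toy series. The sketch below repeats the same loop on six made-up values with a sequence length of 3; the helper name `make_sequences` is hypothetical and scaling is left out to keep the numbers readable.

```python
import numpy as np

def make_sequences(values, seq_len):
    """Split a 1-D series into overlapping windows (X) and next-step targets (y)."""
    X, y = [], []
    for i in range(seq_len, len(values)):
        X.append(values[i - seq_len:i])   # the seq_len values before position i
        y.append(values[i])               # the value to predict
    X, y = np.array(X), np.array(y)
    # LSTM layers expect input shaped (samples, timesteps, features)
    return X.reshape(X.shape[0], X.shape[1], 1), y

toy_prices = np.array([10, 20, 30, 40, 50, 60])
X_toy, y_toy = make_sequences(toy_prices, seq_len=3)

print(X_toy.shape)       # (3, 3, 1)
print(X_toy[:, :, 0])    # rows: 10 20 30 / 20 30 40 / 30 40 50
print(y_toy)             # targets: 40 50 60
```

Six observations with a window of 3 give 6 - 3 = 3 samples, the same arithmetic as 1845 - 60 = 1785.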

### Building a Recurrent Neural Network

In [None]:
#Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

#Initialising the RNN
regressor = Sequential()

#Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

#Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

#Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

#Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))

#Adding the output layer
regressor.add(Dense(units = 1))

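The model above uses the standard LSTM layer, which runs on both CPU and GPU. The older CuDNNLSTM layer requires a GPU to execute; if there is no GPU, use LSTM instead. In TensorFlow 2, the standard LSTM layer with default arguments selects the cuDNN kernel automatically when a GPU is available.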

In [None]:
#Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics = ['mse', 'mae'])

In [None]:
#Fitting the RNN to the Training set and training the RNN
regressor.fit(X_train, y_train, epochs = 50, batch_size = 50)

### Predicting For The Test Set

Preparing Test Data

In [None]:
#Displaying the dataset
dataset_test.head()

In [None]:
#Converting the Date column values from object to datetime format
dataset_test['Date'] = pd.to_datetime(dataset_test['Date'])


#Setting the Date column as Index to the dataset
dataset_test.set_index('Date', inplace = True)


#Sorting the data by increasing date (Date is now the index)
dataset_test.sort_index(ascending=True, inplace = True)


#Displaying the prepared dataset
dataset_test.head()

In [None]:
dataset_test.shape

Now we have arranged the test data in increasing order of dates. However, the test data is still not ready for forecasting. To predict the closing price of the first day in the test set, we need the closing prices of the previous 60 days. So we will attach the last 60 days of the training set to the front of the test set.

In [None]:
#Adding the previous 60 days' closing prices to the test data
test_set = pd.concat((dataset_train.tail(sequence_length), dataset_test), axis = 0)

test_set.head(10)

In [None]:
test_set.shape

In [None]:
#Preprocessing the test data
x_test, y_true = bit_pre_process(test_set , sequence_length, comumn_index)
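
Note that `bit_pre_process` calls `fit_transform`, so the call above refits the scaler on the test window, and the test prices end up scaled with a different minimum and maximum than the training prices. One common alternative, shown here as a sketch rather than as the method used in this notebook, is to fit the scaler on the training data only and reuse it with `transform`. The toy arrays are made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_close = np.array([[0.0], [5.0], [10.0]])   # made-up training prices
test_close = np.array([[8.0], [20.0]])           # made-up test prices

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(train_close)                          # learn min/max from the training data only

train_scaled = scaler.transform(train_close)
test_scaled = scaler.transform(test_close)       # reuse the training min/max

# Prices above the training maximum scale to values greater than 1
print(test_scaled.ravel())

# inverse_transform recovers the original prices
restored = scaler.inverse_transform(test_scaled)
print(restored.ravel())
```

With this variant the model sees test inputs on the same scale it was trained on, at the cost of scaled values falling outside the 0 to 1 range when prices move beyond the training range.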

In [None]:
x_test.shape

In [None]:
y_true.shape
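
The shapes above follow from simple arithmetic: prepending `sequence_length` rows gives the combined set `len(dataset_test) + sequence_length` rows, and windowing consumes `sequence_length` of them again, leaving exactly one window per test day. A toy check with made-up numbers:

```python
import numpy as np

seq_len = 3
train_values = np.arange(1, 11)    # 10 made-up training observations: 1..10
test_values = np.arange(11, 15)    # 4 made-up test observations: 11..14

# Prepend the last seq_len training values so the first test day has a full history
combined = np.concatenate([train_values[-seq_len:], test_values])

windows = np.array([combined[i - seq_len:i] for i in range(seq_len, len(combined))])
targets = combined[seq_len:]

print(windows.shape)   # (4, 3): one window per test observation
print(windows[0])      # first window holds 8, 9, 10 (training data only)
print(targets)         # targets are exactly the test values 11..14
```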

In [None]:
#Predicting the closing price of the test data
predicted_stock_price = regressor.predict(x_test)

In [None]:
#Converting the scaled predictions back to prices
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

#Converting the scaled actual prices from the test data back to prices
real_stock_price = sc.inverse_transform(y_true.reshape(-1, 1))

### Visualizing The Prediction

In [None]:
# A method to plot predicted price vs actual price
def plot_predictions(real_price, predicted_price, title, x_label, y_label):
  plt.plot(real_price, color = 'green', label = 'Real Price')
  plt.plot(predicted_price, color = 'red', label = 'Predicted Price')
  plt.title(title)
  #Using the axis labels passed in by the caller
  plt.xlabel(x_label)
  plt.ylabel(y_label)
  plt.legend()
  plt.show()

In [None]:
#Plotting real_stock_price vs predicted_stock_price
plot_predictions(real_stock_price, predicted_stock_price, "Bitcoin Closing Price Prediction", "Time", "Closing Price")



### Comparing the Predicted and Actual prices

In [None]:
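#Flattening the (n, 1) prediction array into a 1-D column
dataset_test['Predicted_Close'] = predicted_stock_price.ravel()
#Copying the selected columns so that new columns can be added later without warnings
compare = dataset_test[['Open','Close','Predicted_Close' ]].copy()
compare.head(20)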
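
Besides inspecting the table by eye, the gap between the two columns can be summarised with error metrics in price units. The following is a sketch on made-up numbers, not results from this model; in the notebook the inputs would be `compare['Close']` and `compare['Predicted_Close']`.

```python
import numpy as np

actual = np.array([100.0, 110.0, 120.0, 130.0])      # made-up closing prices
predicted = np.array([102.0, 108.0, 123.0, 126.0])   # made-up predictions

errors = predicted - actual               # 2, -2, 3, -4
mae = np.mean(np.abs(errors))             # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))      # root mean squared error

print(f"MAE:  {mae:.2f}")    # MAE:  2.75
print(f"RMSE: {rmse:.2f}")   # RMSE: 2.87
```

RMSE penalises large misses more heavily than MAE, so a wide gap between the two hints at a few big errors.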

### Further Analysis - Buy or Sell

In this step we will analyse the predicted and real closing prices and, based on the data, decide whether to buy or sell Bitcoin.

Condition:

If the closing price is greater than the opening price, we decide to sell the coins; otherwise we buy more coins.

In [None]:
#A function to decide buy or sell (1 for sell and 0 for buy)

def buy_sell(op, cl):
  
  #If the closing price is greater than the opening price, it's a sell
  if cl > op:
    return 1
  #Otherwise (closing price less than or equal to the opening price), it's a buy
  else:
    return 0

#Mapping the buy_sell method to the actual closing price in the test data and saving it as a column (1 = sell, 0 = buy)
compare["Buy-(Actual)"] = list(map(buy_sell, compare['Open'], compare['Close']))

#Mapping the buy_sell method to the predicted closing price in the test data and saving it as a column (1 = sell, 0 = buy)
compare["Buy-(Predicted)"] = list(map(buy_sell, compare['Open'], compare['Predicted_Close']))

#Checking the new dataframe
compare.tail()

In [None]:
#Calculating Buy or Sell accuracy
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(compare['Buy-(Actual)'], compare['Buy-(Predicted)'])
Accuracy = cm.diagonal().sum()/cm.sum()
print("Buy or Sell Accuracy : ", Accuracy)
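
The accuracy above is the share of days on which the predicted signal matches the actual one, which is what the confusion-matrix diagonal adds up. Below is a vectorised sketch of the same rule on made-up prices, using `np.where` in place of `map`.

```python
import numpy as np

open_price = np.array([100.0, 100.0, 100.0, 100.0])    # made-up values
close_price = np.array([105.0, 95.0, 101.0, 100.0])
predicted_close = np.array([103.0, 98.0, 99.0, 97.0])

# Same rule as buy_sell: 1 (sell) if close > open, else 0 (buy)
actual_signal = np.where(close_price > open_price, 1, 0)
predicted_signal = np.where(predicted_close > open_price, 1, 0)

# Fraction of days on which both signals agree
accuracy = np.mean(actual_signal == predicted_signal)

print(actual_signal.tolist())      # [1, 0, 1, 0]
print(predicted_signal.tolist())   # [1, 0, 0, 0]
print(accuracy)                    # 0.75
```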


Best Approach

Buy or Sell can also be predicted directly with the RNN. In the approach above we used regression to predict the closing price and then derived the Buy or Sell decision from it. Instead, we can use Buy/Sell as the target variable and train the RNN as a classifier.
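
A minimal sketch of how such targets could be built, assuming the label for a day is 1 when its close exceeds its open, as in `buy_sell`. The helper name `make_classification_sequences` and the toy prices are hypothetical, and scaling is omitted for readability. With targets like these, the output layer would typically become a single sigmoid unit trained with binary cross-entropy instead of mean squared error.

```python
import numpy as np

def make_classification_sequences(open_prices, close_prices, seq_len):
    """Windows of past closing prices (X) and a 0/1 label for the following day (y)."""
    X, y = [], []
    for i in range(seq_len, len(close_prices)):
        X.append(close_prices[i - seq_len:i])
        # 1 = sell (close above open), 0 = buy, matching buy_sell
        y.append(1 if close_prices[i] > open_prices[i] else 0)
    X = np.array(X).reshape(-1, seq_len, 1)
    return X, np.array(y)

toy_open = np.array([10.0, 11.0, 12.0, 13.0, 12.0, 14.0])    # made-up values
toy_close = np.array([11.0, 12.0, 13.0, 12.0, 14.0, 13.0])

X_cls, y_cls = make_classification_sequences(toy_open, toy_close, seq_len=3)
print(X_cls.shape)       # (3, 3, 1)
print(y_cls.tolist())    # [0, 1, 0]
```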


# **Related Articles:**

> * [Bitcoin Price Prediction](https://analyticsindiamag.com/guide-to-implementing-time-series-analysis-predicting-bitcoin-price-with-rnn/)

> * [Time Series Forecasting with Darts](https://analyticsindiamag.com/hands-on-guide-to-darts-a-python-tool-for-time-series-forecasting/)

> * [Guide to Time Series Forecasting with GluonTS](https://analyticsindiamag.com/gluonts-pytorchts-for-time-series-forecasting/)

> * [Tensorflow Core](https://analyticsindiamag.com/time-series-forecasting-using-tensorflow-core/)

> * [LSTM RNN on Foreign Exchange Rate Prediction](https://analyticsindiamag.com/foreign-exchange-rate-prediction-using-lstm-recurrent-neural-network/)

> * [Pyflux](https://analyticsindiamag.com/pyflux-guide-python-library-for-time-series-analysis-and-prediction/)

