# Bitcoin Price Prediction Tutorial

### by Jake Catron
***

In this article I will be walking you through the process of using Long Short-Term Memory (LSTM) to predict the price of the cryptocurrency Bitcoin (BTC).


## What is Bitcoin?

Bitcoin is the world's most popular and valuable cryptocurrency. Considering my grandfather once asked me about it, it is likely that you have heard of it as well. But, you may still not understand what it fundamentally is. 

Wikipedia provides the following definition: "Bitcoin is a digital assest designed to work as a medium of exhange that uses **cryptography** to control its creation and management, rather than relying on central authorities."

## Crypto-what??

Before we dive into Bitcoin, let's first explore cryptography to understand the basis. "Cryptography" comes from the Greek word *kryptos* (secret) and *graphein* (to write). So it is the practice of writing secret messages. How does one accomplish this secrecy in the modern world? We use **encryption algorithms** to convert plaintext into unitelligible nonsense known as a **hash**. Think about the food kind of hash, it is a jumbled mixture of many ingredients. In the same sense, a encrpytion hash is a jumbled mixture of letters and numbers. The goal of all encryption algorithms is to increase the **computational time** required to convert the hashes created back into plain text. This is the key takeaway here. Modern encrpytion algorithms create complex hashes that take *enormous* amounts of computational time and electricity to map back to the original value. 

## Soo, what is Bitcoin?

Well Bitcoin is founded on a principle called the **block-chain**. Think of every block as a page in a book. The page contains a bunch of information that we can use to make conclusions about the other pages in the book, and it also has a unique number. This number only appears once in the entire book! Let's call the number the **nonce** ("number only used once"). So each page in the book has information on it as well as its own nonce that leads to the next page directly after it. 

If we wanted, we could call a book a *page-chain* (see what I did there?). But the difference between the block-chain and the page-chain is that in a block-chain, each block's nonce (page number) is unknown at first. Imagine instead of our pages containing words and sentences in our natural language, someone used an encryption algorithm to mix up everything into a secret mess(age) of letters and numbers! Essentially this is what happens to each block in the block-chain. The creator of Bitcoin (Satoshi Nakamoto) encrypted the phrase "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks" into the hash "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f". This along with a **nonce** of 2083236893 and some other information such as the **signature** and **public key** formed the very first block in the block-chain of Bitcoin. 

Instead of simply turning the page, for a new block to be added to the chain someone with a lot of brains and computational power has to increment the nonce until it hashes to a hex number with a certain amount of leading 0 bits (look at the hash of the first block above). Once that threshold of 0 bits is reached, the **miner** of the block is awarded a certain amount of coin for their efforts. Then this newly formed block passes on its solved hash to everyone in the network, so they can start mining the next block into existence. 

## Bitcoin's Controlled Supply

Bitcoin was designed in a way that the difficulty of mining a new block (achieving the correct nonce) is adjusted every 2016 blocks mined. The difficulty is adjusted so that the rate of mining is roughly 6 blocks per hour. This adjustment is to account for advancements in computing power and speed. 

Also, the amount of coins awarded to a successful miner is cut in half every time 210,000 new blocks are mined and added to the chain. However once the 32 halvings occur, the suppy supply of Bitcoin is finite and unchangeable. Since the intial block awarded 50 Bitcoin, it follows that Bitcoin's *supply curve* will stop once reaching to a value juuust under 21 million.

## Bitcoin as a Speculative Asset

Now you can atleast walk away from reading this knowing a bit more about the fundamentals of the famous Bitcoin. However the true purpose of this tutorial is to try to forecast the price of Bitcoin using hihg-frequency price and volume data. 

Why would we want to do that? Well since Bitcoin has gained popularity, it has transitioned from becoming just a currency to becoming an asset that investors can speculate the price of, hoping to turn a profit in a quick-turnaround. 

The extreme volatility of the price of Bitcoin as well as the fact that it is available to trade 24/7 as opposed to normal securities has made it the target of many hight-frequency traders.

While I believe wholeheartedly in the altruistic goal Nakamoto had in mind, I am also a greedy human. Thus I will be using historical pricing and volume data to try to predict the future! Sounds fancy right? 

***
**Note**: There are many other aspects of the block-chain which I did not touch on here, see [Satoshi's paper](https://bitcoin.org/bitcoin.pdf) for further reading.






## Methodology

So how will we accomplish this goal? By using deep learning *magic hands*. Every Data Scientist's favorite buzzword, deep learning decribes a subset of Machine Learning techniques that use deep neural networks to solve problems. Neural networks got their name after being structured similarly to the nuerons firing in our brains. Essentially think of each neuron taking in some data as input, performing a calculation/transformation on the data, and then passing it on to the rest of the neurons so they can collectively produce an output. 


In [2]:
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import sklearn
import autokeras as ak

## Collecting Data

In this tutorial we will be using high frequency (1-minute interval) price and volume data to forecast Bitcoin's price in n minutes. The only free source of 1-minute interval data I could find was from [Kaggle](https://www.kaggle.com/mczielinski/bitcoin-historical-data), where I was able to download Bitcoin's minutely OHLCV data from Jan 01, 2012 to March 31, 2021.

Let's load that data into a pandas DataFrame and take a look at the first and last 5 rows. 

In [4]:
# Loading DataFrame
btc_1min_df = pd.read_csv('bitstampUSD_1-min_data_2012-01-01_to_2021-03-31.csv')
# Displaying first and last 5 rows as well as column names
btc_1min_df

Unnamed: 0,Timestamp,Open,High,Low,Close,Volume_(BTC),Volume_(Currency),Weighted_Price
0,1325317920,4.39,4.39,4.39,4.39,0.455581,2.000000,4.390000
1,1325317980,,,,,,,
2,1325318040,,,,,,,
3,1325318100,,,,,,,
4,1325318160,,,,,,,
...,...,...,...,...,...,...,...,...
4857372,1617148560,58714.31,58714.31,58686.00,58686.00,1.384487,81259.372187,58692.753339
4857373,1617148620,58683.97,58693.43,58683.97,58685.81,7.294848,428158.146640,58693.226508
4857374,1617148680,58693.43,58723.84,58693.43,58723.84,1.705682,100117.070370,58696.198496
4857375,1617148740,58742.18,58770.38,58742.18,58760.59,0.720415,42332.958633,58761.866202


So now that we have the data, let's clean it up a bit. 

We are going to drop the Open, High, Low, and Weighted_Price columns. WE are only concerned with using 

Let's also drop the "Volume\_(BTC)" column. This shows the amount of Bitcoin traded in that minute while the "Volume\_(Currency)" shows the amount traded, but in USD. Because the rest of the data is in USD, we are going to drop the BTC column so we have standard units. 

Because the size of the DataFrame is ~200mb I am going to perform the cleaning operations in place so as not to waste space creating multiple versions of mostly the same data.

In [None]:
btc_1min_df = btc_1min_df.loc[:,['Timestamp', 'Open', 'High', 'Low', 'Close', 'Volume']]

In [None]:
sns.set_theme('darkgrid')
sns.lineplot(data=btc_1min_df, x=)

As we can see there are 4.85 million rows in this dataset. Because we are going to use a sliding window approach, where we are only using the past 60 mins of data to predict the next 5, we do not need 9 years worth of historical data. So for now let's cut our data down to a 2 year period from March 2019 to 

In [5]:
btc_1min_df['Datetime'] = pd.to_datetime(btc_1min_df['Timestamp'], unit='s')
btc_1min_df


Unnamed: 0,Timestamp,Open,High,Low,Close,Volume_(BTC),Volume_(Currency),Weighted_Price,Datetime
0,1325317920,4.39,4.39,4.39,4.39,0.455581,2.000000,4.390000,2011-12-31 07:52:00
1,1325317980,,,,,,,,2011-12-31 07:53:00
2,1325318040,,,,,,,,2011-12-31 07:54:00
3,1325318100,,,,,,,,2011-12-31 07:55:00
4,1325318160,,,,,,,,2011-12-31 07:56:00
...,...,...,...,...,...,...,...,...,...
4857372,1617148560,58714.31,58714.31,58686.00,58686.00,1.384487,81259.372187,58692.753339,2021-03-30 23:56:00
4857373,1617148620,58683.97,58693.43,58683.97,58685.81,7.294848,428158.146640,58693.226508,2021-03-30 23:57:00
4857374,1617148680,58693.43,58723.84,58693.43,58723.84,1.705682,100117.070370,58696.198496,2021-03-30 23:58:00
4857375,1617148740,58742.18,58770.38,58742.18,58760.59,0.720415,42332.958633,58761.866202,2021-03-30 23:59:00


In [6]:
sns.set_theme()
# sns.lineplot(data=btc_1min_df, x = 'Datetime', y='Weighted_Price')

As we can see there has been some pretty extreme volatility in Bitcoin's price throughout its year of existence. 2021 has been a particulary spectcular year, which may have to do with factors other than price and volatility.  Because our data is dealing with such high frequency (1-minute) time intervals we are only going to use data from March 01, 2019 to March 01, 2021 

In [7]:
# Only keep observations from March 1, 2019 and onwards
start_cond = btc_1min_df_og.Datetime >= pd.Timestamp(2019,3,1) 
btc_1min_df_trim = btc_1min_df_og.loc[start_cond,:]
# Only keep observations stricly before March 2, 2021
end_cond = btc_1min_df_trim.Datetime < pd.Timestamp(2021,3,1)
btc_1min_df_trim = btc_1min_df_trim.loc[end_cond,:]

# Resetting index to start from 0
btc_1min_df_trim.reset_index(drop=True, inplace=True)


In [10]:
btc_1min_df_trim.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1052640 entries, 0 to 1052639
Data columns (total 9 columns):
 #   Column             Non-Null Count    Dtype         
---  ------             --------------    -----         
 0   Timestamp          1052640 non-null  int64         
 1   Open               1034973 non-null  float64       
 2   High               1034973 non-null  float64       
 3   Low                1034973 non-null  float64       
 4   Close              1034973 non-null  float64       
 5   Volume_(BTC)       1034973 non-null  float64       
 6   Volume_(Currency)  1034973 non-null  float64       
 7   Weighted_Price     1034973 non-null  float64       
 8   Datetime           1052640 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(7), int64(1)
memory usage: 72.3 MB


In [11]:
1052640 - 1034973

17667

In [19]:
na_indices = pd.isnull(btc_1min_df_trim).any(1).to_numpy().nonzero()[0]
na_indices

array([     24,      41,      44, ..., 1050613, 1051259, 1051928])

In [22]:
np.where(np.diff(na_indices) <= 120)

(array([    0,     1,     2, ..., 17653, 17654, 17661]),)

In [9]:
# Checking if we are missing a minute time interval
# (btc_1min_df.loc[1034972, 'Timestamp'] - btc_1min_df.loc[0,'Timestamp']) / 60

We will use an LSTM model because after reviewing countless academic papers, they seemed to commonly have the highest prediction accuracy, as well as the lowest measure of error. 

What is LSTM?

LSTM stands for Long Short-Term Memory. It is a type of Reccurent Neural Network (RNN). The thing with traditional neural nets is that they don't use the state/reasoning from an earlier layer to inform them about a later one. However RNN's suffer from short-term memory, as their gradient are ventually either lost during back-propagation or eventually explode. This is why LSTM is required, as it updates the cell's state using the culmination of the hidden state from previous cells. Thus a longer memory is produced. (explain more on this section)

When analyzing an asset such as Bitcoin, we want to use the knowledge we have of previous timeframes to make a decision about the upcoming ones. If you want a more detailed understanding of LSTM model, I suggest [this article](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Chrisopher Olah.

  Thus we will be using an RNN. After reviewing the some of the most recent academic literature on cryptocurrency price prediction (see [here](https://www.semanticscholar.org/paper/Bitcoin-price-prediction-using-machine-learning%3A-An-Chen-Li/cec3d533193d922b73b96e8556198f113e1de934), [here](https://www.sciencedirect.com/science/article/pii/S2405918821000027), [here](https://link.springer.com/content/pdf/10.1007/s00521-020-05129-6.pdf), and [here](https://arxiv.org/ftp/arxiv/papers/2102/2102.05448.pdf)) they all have the best results using LSTM with relatively high dimensions and frequency. 
 
The features I will be using are 



In [None]:
def mean_absolute_percentage_error(y_true, y_pred): 
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100