**Data Collection using CCXT**

Import the required libraries

In [94]:
import ccxt
import pandas as pd
import datetime

Set up CCXT with the exchange you prefer (for this example, I'll use Binance)

In [95]:
exchange = ccxt.binance()

Define a function to fetch OHLCV (Open, High, Low, Close, Volume) data for Bitcoin

In [96]:
def fetch_data(start: str, end: str) -> pd.DataFrame:
    """
    Fetch OHLCV data for Bitcoin from the given start date to the end date.
    """
    # Convert human-readable date to milliseconds since epoch
    since = exchange.parse8601(start)
    end_timestamp = exchange.parse8601(end)
    
    # Define the time frame for the data ('1h' for hourly data)
    timeframe = '1h'
    
    # Columns for the dataframe
    columns = ["Timestamp", "Open", "High", "Low", "Close", "Volume"]
    
    all_candles = []
    while since < end_timestamp:
        candles = exchange.fetch_ohlcv('BTC/USDT', timeframe, since)
        if len(candles) == 0:
            break
        since = candles[-1][0] + 1  # Start the next batch of data right where the previous batch ended
        all_candles += candles

    # Convert to DataFrame
    df = pd.DataFrame(all_candles, columns=columns)
    
    # Convert the timestamp to a more readable date-time format
    df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='ms')
    return df

Use the function to fetch data for the last two years

In [97]:
start_date = (datetime.datetime.now() - datetime.timedelta(days=730)).strftime('%Y-%m-%dT%H:%M:%SZ')
end_date = datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ')

bitcoin_data = fetch_data(start_date, end_date)

View the first few rows to verify

In [98]:
print(bitcoin_data.head())

            Timestamp      Open      High       Low     Close      Volume
0 2021-10-17 15:00:00  61017.77  61100.26  60471.97  60764.12  1673.09112
1 2021-10-17 16:00:00  60764.13  61000.00  60659.20  60971.39  1394.94549
2 2021-10-17 17:00:00  60971.39  61080.00  60800.00  61029.65  1016.93691
3 2021-10-17 18:00:00  61029.66  61040.59  60841.52  60976.09   667.96846
4 2021-10-17 19:00:00  60976.08  61000.00  60379.56  60519.48  1576.62668


**Preprocess Data with New Features**

In [99]:
import talib

# 1. RSI
bitcoin_data['RSI'] = talib.RSI(bitcoin_data['Close'].values, timeperiod=14)

# 2. MACD
macd, macdsignal, macdhist = talib.MACD(bitcoin_data['Close'].values, fastperiod=12, slowperiod=26, signalperiod=9)
bitcoin_data['MACD'] = macd
bitcoin_data['MACD_Signal'] = macdsignal
bitcoin_data['MACD_Histogram'] = macdhist

# 3. Bollinger Bands
upper, middle, lower = talib.BBANDS(bitcoin_data['Close'].values, timeperiod=20)
bitcoin_data['BB_Upper'] = upper
bitcoin_data['BB_Middle'] = middle
bitcoin_data['BB_Lower'] = lower

# 4. Stochastic Oscillator
slowk, slowd = talib.STOCH(bitcoin_data['High'].values, 
                           bitcoin_data['Low'].values, 
                           bitcoin_data['Close'].values, 
                           fastk_period=5, slowk_period=3, slowk_matype=0, slowd_period=3, slowd_matype=0)
bitcoin_data['SlowK'] = slowk
bitcoin_data['SlowD'] = slowd

# 5. ATR
bitcoin_data['ATR'] = talib.ATR(bitcoin_data['High'].values, 
                                bitcoin_data['Low'].values, 
                                bitcoin_data['Close'].values, 
                                timeperiod=14)

# Drop NaN values
bitcoin_data.dropna(inplace=True)

Check for missing values

In [100]:
print(bitcoin_data.isnull().sum())

Timestamp         0
Open              0
High              0
Low               0
Close             0
Volume            0
RSI               0
MACD              0
MACD_Signal       0
MACD_Histogram    0
BB_Upper          0
BB_Middle         0
BB_Lower          0
SlowK             0
SlowD             0
ATR               0
dtype: int64


Setting the date as the index

In [101]:
bitcoin_data.set_index('Timestamp', inplace=True)

Normalizing the data

In [102]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
bitcoin_data_scaled = scaler.fit_transform(bitcoin_data)

close_scaler = MinMaxScaler()
bitcoin_data['Close_Scaled'] = close_scaler.fit_transform(bitcoin_data[['Close']])

Creating sequences for the LSTM model

In [103]:
import numpy as np

def create_sequences(data, seq_length):
    X, y = [], []

    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length, 3])  # 'Close' price is at index 3

    return np.array(X), np.array(y)

# Sequence length can be tweaked as per your preference, here we use 10
seq_length = 10
X, y = create_sequences(bitcoin_data_scaled, seq_length)


Splitting the data into training and testing sets

In [104]:
train_size = int(0.8 * len(X))
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

**Building the LSTM Model**

Import necessary libraries

In [105]:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

Define and compile the LSTM model

In [106]:
model = Sequential()

# First LSTM layer
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], bitcoin_data_scaled.shape[1])))
model.add(Dropout(0.2))

# Second LSTM layer
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))

# Third LSTM layer
model.add(LSTM(units=50))
model.add(Dropout(0.2))

# Output layer
model.add(Dense(units=1))

# Compiling the model
model.compile(optimizer='adam', loss='mean_squared_error')

**Training and Testing**

Train the Model

In [107]:
model.fit(X_train, y_train, epochs=20, batch_size=64)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x1c71556a650>

Predict on the Test Set

In [108]:
y_pred = model.predict(X_test)



To visualize the model's predictions compared to the true values, it's useful to reverse the normalization and plot the results

In [109]:
# Inverse transform the predicted and actual values using the new scaler
y_pred_transformed = close_scaler.inverse_transform(np.array(y_pred).reshape(-1, 1)).flatten()
y_test_transformed = close_scaler.inverse_transform(np.array(y_test).reshape(-1, 1)).flatten()

import plotly.graph_objects as go

# Create the figure object
fig = go.Figure()

# Add the actual and predicted price data
fig.add_trace(go.Scatter(x=list(range(len(y_test_transformed))), y=y_test_transformed.flatten(), mode='lines', name='Actual Price', line=dict(color='blue')))
fig.add_trace(go.Scatter(x=list(range(len(y_pred_transformed))), y=y_pred_transformed.flatten(), mode='lines', name='Predicted Price', line=dict(color='red')))

# Adjust the layout
fig.update_layout(
    title='Bitcoin Price Prediction',
    xaxis_title='Time',
    yaxis_title='Price',
    template="plotly_dark",
    height=800
)

# Display the figure
fig.show()

Secondary Visualization 

In [110]:
# Create candlestick chart
fig = go.Figure(data=[go.Candlestick(x=bitcoin_data.index[train_size + seq_length:],
                open=bitcoin_data['Open'][train_size + seq_length:],
                high=bitcoin_data['High'][train_size + seq_length:],
                low=bitcoin_data['Low'][train_size + seq_length:],
                close=bitcoin_data['Close'][train_size + seq_length:])])

# Overlay the predictions as yellow dots
fig.add_trace(go.Scatter(x=bitcoin_data.index[train_size + seq_length:], y=y_pred_transformed, mode='markers', marker=dict(color='yellow', size=5), name='Predicted Price'))

fig.update_layout(
    title='Bitcoin Actual vs Predicted Prices',
    xaxis_title='Date',
    yaxis_title='Price (in USD)',
    template="plotly_dark",
    height=800
)

# Display the graph
fig.show()

**Prediction for the next hour**

In [111]:
# Step 1: Take the last 60 data points
last_60_data = bitcoin_data_scaled[-60:]

# Step 2: Reshape the data
n_features = bitcoin_data_scaled.shape[1]
last_60_data = np.array(last_60_data).reshape(1, -1, n_features)

# Step 3: Get the prediction
predicted_price = model.predict(last_60_data)

# Step 4: Transform the prediction back to the original scale
predicted_price_transformed = close_scaler.inverse_transform(predicted_price)

print(f"The predicted Bitcoin price for the next hour is: ${predicted_price_transformed[0][0]}")

The predicted Bitcoin price for the next hour is: $28310.169921875


Metrics for regression problems

In [112]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y_test_transformed, y_pred_transformed)
mse = mean_squared_error(y_test_transformed, y_pred_transformed)
rmse = np.sqrt(mse)

print(f"MAE: {mae}")
print(f"MSE: {mse}")
print(f"RMSE: {rmse}")

relative_error = abs(y_pred_transformed - y_test_transformed) / y_test_transformed

# Convert relative error to percentage error
percentage_error = relative_error * 100

# Calculate mean and standard deviation
mean_percentage_error = np.mean(percentage_error)
std_percentage_error = np.std(percentage_error)

print(f"Mean Percentage Error: {mean_percentage_error:.2f}%")
print(f"Standard Deviation of Percentage Error: {std_percentage_error:.2f}%")


MAE: 167.89037976437362
MSE: 48095.8763900607
RMSE: 219.3077207716607
Mean Percentage Error: 0.61%
Standard Deviation of Percentage Error: 0.52%


## Interpretation of Model Performance

### Mean Percentage Error (MPE) of 1.00%
On average, the predicted values deviate from the actual values by about 1%. This means that if the actual price was, say, $10,000, the model's predictions might typically be off by about $100.

### Standard Deviation (SD) of Percentage Error of 0.68%
This indicates the variability of the percentage errors. Most (about 68%) of the prediction errors will fall within 1% ± 0.68%, i.e., between 0.32% and 1.68%.

Considering the volatility and unpredictability of Bitcoin prices, an average error of 1% with a 0.68% standard deviation seems pretty reasonable. However, always consider the context and the specific use-case you have in mind when interpreting these numbers. If you're using this for high-frequency trading, even a 1% error could be significant, while for longer-term predictions or general trend analysis, it might be acceptable.
