In [1]:
import numpy as np
import pandas as pd

# install pandas_ta - https://github.com/twopirllc/pandas-ta
import pandas_ta as ta

import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression, SGDClassifier

# Algorithmic Trading Bot

## 0. Step-by-step guide to building an algorithmic trading bot for Bitcoin

Here's a step-by-step guide to building an algorithmic trading bot for Bitcoin, utilizing machine learning to identify buy and sell signals.

### Step 0.1: Data Collection

First, gather historical Bitcoin price data, which will be used for model training. Ideally, this data should include:

1. **Price Data**: Open, close, high, and low prices over specific time intervals (e.g., daily, hourly).
2. **Trading Volumes**: The amount of Bitcoin traded during the respective time intervals.
3. **Technical Indicators**: Calculate indicators like RSI, MACD, Moving Averages, etc., commonly used for trading decisions.

You can use APIs from **Binance**, **Coinbase Pro**, or **Yahoo Finance** to download this data.

### Step 0.2: Data Preparation

1. **Data Cleaning**:
   - Remove missing or anomalous values.
   - Convert time series data if necessary to ensure consistent time intervals.

2. **Calculate Technical Indicators**:
   - Add columns with calculated values for RSI, MACD, Moving Averages, and other relevant indicators.

3. **Define Target Variable**:
   - Define the target, such as whether the price will increase or decrease. You can create a binary variable (1 for "buy," 0 for "sell") based on price changes over a certain period (e.g., if the price increases by more than 1% in the next period).

### Step 0.3: Building the Predictive Model

1. **Model Selection**:
   - Use supervised learning models like **Random Forest**, **XGBoost**, or **SVM** to classify moments as "buy" or "sell."

2. **Training the Model**:
   - Split the data into training and testing sets.
   - Train the model on the training data and tune hyperparameters for optimal performance.

3. **Model Evaluation**:
   - Use metrics like **accuracy**, **F1 score**, **Precision**, and **Recall** to assess how well the model predicts "buy" and "sell" signals.

### Step 0.4: Strategy Simulation (Backtesting)

Before deploying the bot in a live environment, backtest it to assess its performance on historical data.

1. **Use the Test Dataset**: Evaluate how the bot would perform if buy and sell decisions had been made based on historical data.
2. **Evaluate the Strategy**: Calculate key metrics like:
   - **Return**: Compare achieved profit relative to a baseline (e.g., buy-and-hold).
   - **Maximum Drawdown**: The largest peak-to-trough decline in account equity over the test period.
   - **Risk/Reward Ratio**.

### Step 0.5: Deployment in a Live Environment

1. **Choose a Trading Platform**:
   - Connect the bot to a trading API from exchanges like **Binance** or **Coinbase Pro**, which support live order execution.

2. **Build the Bot’s Core Logic**:
   - At each predefined interval (e.g., hourly), the bot should pull current data, calculate new indicators, and predict whether to buy or sell.
   - If the prediction is "buy," the bot sends a buy order; if "sell," a sell order.

3. **Risk Management Parameters**:
   - Define stop-loss and take-profit limits.
   - Set a maximum amount of funds the bot can use per trade, ensuring it does not take excessive risks.

### Step 0.6: Monitoring and Optimization

1. **Monitor Real-Time Bot Performance**:
   - Collect statistics on real trades to assess the model’s success in live conditions.

2. **Model Adjustment**:
   - Regularly update the data and retrain the model to accommodate new market conditions.
   - Test new indicators or models if optimization is required.

This strategy can be expanded by including other factors, such as sentiment analysis or blockchain transaction data.

## 1. Step 1: Data Collection

First, gather historical Bitcoin price data, which will be used for model training. Ideally, this data should include:

1. **Price Data**: Open, close, high, and low prices over specific time intervals (e.g., daily, hourly).
2. **Trading Volumes**: The amount of Bitcoin traded during the respective time intervals.
3. **Technical Indicators**: Calculate indicators like RSI, MACD, Moving Averages, etc., commonly used for trading decisions.

You can use APIs from **Binance**, **Coinbase Pro**, or **Yahoo Finance** to download this data.

### 1.1. Gather historical Bitcoin price data

**Price Data**: Open, high, low, and close prices at one-minute intervals.

**Trading Volumes**: The amount of Bitcoin traded during the respective time intervals.

**TODO**: What is the unit of the volume?

In [2]:
btc_price_data_1_year = pd.read_csv("data/bitcoin_historical_data_1_year.csv")
btc_price_data_1_year

Unnamed: 0,timestamp,open,high,low,close,volume,date,time
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953000,2023-11-01,00:00:00
1,2023-11-01 00:01:00,34642.54,34687.53,34673.30,34642.82,16.178075,2023-11-01,00:01:00
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.753120,2023-11-01,00:02:00
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.308610,2023-11-01,00:03:00
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00
...,...,...,...,...,...,...,...,...
528627,2024-10-31 23:56:00,70238.76,70248.97,70248.97,70238.76,1.189134,2024-10-31,23:56:00
528628,2024-10-31 23:57:00,70218.00,70250.00,70238.77,70233.24,4.767082,2024-10-31,23:57:00
528629,2024-10-31 23:58:00,70193.97,70242.25,70232.55,70207.78,9.589688,2024-10-31,23:58:00
528630,2024-10-31 23:59:00,70175.16,70207.79,70207.79,70197.83,7.112237,2024-10-31,23:59:00


In [3]:
btc_price_data_1_year.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
open,528632.0,57052.764723,11061.821828,34079.46,44083.78,61015.8,66127.655,73718.32
high,528632.0,57100.379874,11072.504219,34133.68,44119.545,61066.475,66178.0475,73835.57
low,528632.0,57076.624894,11067.091134,34113.93,44103.265,61041.025,66153.5025,73815.03
close,528632.0,57076.858348,11067.114019,34114.86,44103.38,61041.34,66154.3375,73815.43
volume,528632.0,8.915933,17.137792,0.001083,1.730652,4.020989,9.541108,1163.832604


In [4]:
btc_price_data_1_year.dtypes

timestamp     object
open         float64
high         float64
low          float64
close        float64
volume       float64
date          object
time          object
dtype: object

In [5]:
btc_price_data_1_year.isnull().sum()

timestamp    0
open         0
high         0
low          0
close        0
volume       0
date         0
time         0
dtype: int64

## 2. Step 2: Data Preparation

1. **Data Cleaning**:
   - Remove missing or anomalous values.
   - Convert time series data if necessary to ensure consistent time intervals.

2. **Calculate Technical Indicators**:
   - Add columns with calculated values for RSI, MACD, Moving Averages, and other relevant indicators.

3. **Define Target Variable**:
   - Define the target, such as whether the price will increase or decrease. You can create a binary variable (1 for "buy," 0 for "sell") based on price changes over a certain period (e.g., if the price increases by more than 1% in the next period).

### 2.1. Data Tidying and Cleaning

Convert the `timestamp` and `date` columns from *object* type to *datetime64* type.

**TODO**: Convert `time`. Pandas lacks a native time-of-day type; using `datetime64[ns]` with a placeholder date is a common workaround for time operations.

In [6]:
btc_price_data_1_year.timestamp = pd.to_datetime(btc_price_data_1_year.timestamp)
btc_price_data_1_year.date = pd.to_datetime(btc_price_data_1_year.date)
# btc_price_data_1_year.time = pd.to_datetime(btc_price_data_1_year.time, format='%H:%M:%S').dt.time

btc_price_data_1_year.dtypes

timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64
volume              float64
date         datetime64[ns]
time                 object
dtype: object

In [7]:
btc_price_data_1_year

Unnamed: 0,timestamp,open,high,low,close,volume,date,time
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953000,2023-11-01,00:00:00
1,2023-11-01 00:01:00,34642.54,34687.53,34673.30,34642.82,16.178075,2023-11-01,00:01:00
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.753120,2023-11-01,00:02:00
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.308610,2023-11-01,00:03:00
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00
...,...,...,...,...,...,...,...,...
528627,2024-10-31 23:56:00,70238.76,70248.97,70248.97,70238.76,1.189134,2024-10-31,23:56:00
528628,2024-10-31 23:57:00,70218.00,70250.00,70238.77,70233.24,4.767082,2024-10-31,23:57:00
528629,2024-10-31 23:58:00,70193.97,70242.25,70232.55,70207.78,9.589688,2024-10-31,23:58:00
528630,2024-10-31 23:59:00,70175.16,70207.79,70207.79,70197.83,7.112237,2024-10-31,23:59:00


In [8]:
btc_price_data_1_year.dtypes

timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64
volume              float64
date         datetime64[ns]
time                 object
dtype: object
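The data preparation steps call for consistent time intervals. The `describe()` output above reports 528,632 rows, which is more than the 527,040 minutes in the 366 days from 2023-11-01 to 2024-10-31, so duplicate timestamps seem likely. Below is a minimal sketch of how one might check a one-minute series for duplicates and gaps. The tiny `sample` frame is made up for illustration; the same calls should apply to `btc_price_data_1_year` once `timestamp` is a datetime column.

```python
import pandas as pd

# Hypothetical mini-sample: one duplicate (00:01) and one missing minute (00:03).
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-11-01 00:00:00",
        "2023-11-01 00:01:00",
        "2023-11-01 00:01:00",
        "2023-11-01 00:02:00",
        "2023-11-01 00:04:00",
    ]),
    "close": [100.0, 101.0, 101.0, 102.0, 104.0],
})

# Count rows whose timestamp already appeared earlier.
n_duplicates = int(sample["timestamp"].duplicated().sum())

# Drop duplicates, sort, and measure the step between consecutive rows.
deduped = sample.drop_duplicates(subset="timestamp").sort_values("timestamp")
step = deduped["timestamp"].diff()

# Rows whose distance to the previous row exceeds one minute follow a gap.
gaps = deduped.loc[step > pd.Timedelta(minutes=1), "timestamp"]

print("duplicate timestamps:", n_duplicates)
print("rows that follow a gap:", list(gaps))
```

Whether to drop duplicates or fill gaps depends on the data source, so this check only reports them.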

### 2.2. Calculate Technical Indicators

Calculate the values and add new columns with calculated values for **RSI**, **MACD**, **Moving Averages** and other relevant indicators.

#### 2.2.1. Relative Strength Index (RSI)

**RSI** is a momentum oscillator that measures the speed and change of recent price movements. It is used to identify overbought or oversold conditions in an asset's price, generally over 14 periods (14 days on a daily chart, 14 minutes for the one-minute data used here).

- **Formula**: The RSI is calculated as:
  
  $$\text{RSI} = 100 - \frac{100}{1 + RS}$$
  
  where $RS$ (Relative Strength) is the ratio of **average gains** to **average losses** over the lookback period (e.g., 14 days).

- **Interpretation**:
  - **Overbought Condition**: When RSI is above 70, the asset is often considered overbought, suggesting a potential for a pullback.
  - **Oversold Condition**: When RSI is below 30, the asset is considered oversold, suggesting a potential for a rebound.

- **Calculation**:
  - Calculate the **change** in price from one day to the next.
  - Separate the changes into **gains** (positive changes) and **losses** (negative changes).
  - Compute the **average gain** and **average loss** over the 14-day period.
  - Calculate $RS$ as the ratio of average gain to average loss.
  - Use the RSI formula to convert $RS$ into an index between 0 and 100.

In [9]:
def calculate_rsi(data, window=14):
    """
    Compute the Relative Strength Index (RSI), a momentum oscillator that measures the speed and change of price movements,
    typically over a 14-period interval. Average gains and losses are simple rolling means.

    Parameters
    ----------
    data:   a DataFrame with the time series data. It must contain a 'close' column,
            which is used to calculate the 'RSI' value.
    window: the number of periods taken into account when calculating the 'RSI'

    Returns
    -------
    The same DataFrame, modified in place, with an added 'RSI' column.
    """
    # Price change from one period to the next
    delta = data['close'].diff(1)
    # Positive changes are gains; negative changes are losses (stored as positive numbers)
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    
    avg_gain = gain.rolling(window=window, min_periods=1).mean()
    avg_loss = loss.rolling(window=window, min_periods=1).mean()

    # RS is the ratio of average gain to average loss; RSI maps it onto a 0-100 scale
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    data['RSI'] = rsi
    
    return data

In [10]:
# Calculate RSI
btc_price_data_1_year = calculate_rsi(btc_price_data_1_year)
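
`calculate_rsi` averages gains and losses with a simple rolling mean. Many charting tools instead use Wilder's smoothing, an exponential average with `alpha = 1 / window`, so their RSI values can differ from the ones computed here. The sketch below shows that variant as an alternative, not as what this notebook uses. The function name `calculate_rsi_wilder` is hypothetical and the prices are synthetic.

```python
import numpy as np
import pandas as pd

def calculate_rsi_wilder(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI with Wilder's smoothing (exponential average, alpha = 1 / window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)    # positive changes, zeros elsewhere
    loss = -delta.clip(upper=0)   # negative changes as positive numbers

    # min_periods=window leaves the warm-up rows as NaN instead of partial averages.
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()

    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 200).cumsum())
rsi = calculate_rsi_wilder(prices)
print(rsi.tail())
```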

#### 2.2.2. Moving Average Convergence Divergence (MACD)

**MACD** is a trend-following momentum indicator that shows the relationship between two moving averages of an asset’s price.

- **Formula**:
  - **MACD Line**: $\text{MACD} = \text{EMA}_{\text{short}} - \text{EMA}_{\text{long}}$
  - **Signal Line**: A **9-day EMA** of the MACD line.
  - **MACD Histogram**: The difference between the MACD line and the Signal Line.

  Here, EMA stands for Exponential Moving Average, which gives more weight to recent prices.

- **Common Parameters**:
  - **Short EMA**: Often set to a 12-day EMA.
  - **Long EMA**: Often set to a 26-day EMA.
  - **Signal Line EMA**: Often set to a 9-day EMA of the MACD line.

- **Interpretation**:
  - **MACD Line Crosses Above Signal Line**: This is a bullish signal, indicating a potential buy.
  - **MACD Line Crosses Below Signal Line**: This is a bearish signal, indicating a potential sell.
  - **MACD Divergence**: If the price and MACD are moving in opposite directions, it may signal a reversal.
  - **Histogram**: The MACD histogram shows the distance between the MACD line and the Signal Line. When the histogram grows larger, it indicates a strengthening trend in that direction.

- **Application**:
  - The MACD helps traders see changes in momentum, trend direction, and possible reversal points by analyzing the difference between the short and long EMAs.

In [11]:
def calculate_macd(data, short_window=12, long_window=26, signal_window=9):
    """
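    Moving Average Convergence Divergence (MACD) is calculated using two exponential moving averages (EMA):
    the 12-period EMA and the 26-period EMA, with a 9-period EMA of the MACD line as the signal line.
    Adds the columns 'EMA12', 'EMA26', 'MACD', and 'Signal_Line' to 'data' in place.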
    """
    data['EMA12'] = data['close'].ewm(span=short_window, adjust=False).mean()
    data['EMA26'] = data['close'].ewm(span=long_window, adjust=False).mean()
    
    # MACD Line
    data['MACD'] = data['EMA12'] - data['EMA26']
    
    # Signal Line
    data['Signal_Line'] = data['MACD'].ewm(span=signal_window, adjust=False).mean()
    
    return data

In [12]:
# Calculate MACD (Moving Average Convergence Divergence)
btc_price_data_1_year = calculate_macd(btc_price_data_1_year)
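
The MACD description also defines a histogram and crossover signals, which `calculate_macd` does not store. The sketch below shows how they could be derived from the MACD line and the signal line. The column names `MACD_Hist`, `bullish_cross`, and `bearish_cross` are illustrative, and the prices are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic prices, for illustration only.
rng = np.random.default_rng(1)
df = pd.DataFrame({"close": 100 + rng.normal(0, 1, 300).cumsum()})

# Same construction as calculate_macd (12/26/9 periods).
ema_short = df["close"].ewm(span=12, adjust=False).mean()
ema_long = df["close"].ewm(span=26, adjust=False).mean()
df["MACD"] = ema_short - ema_long
df["Signal_Line"] = df["MACD"].ewm(span=9, adjust=False).mean()

# Histogram: distance between the MACD line and the signal line.
df["MACD_Hist"] = df["MACD"] - df["Signal_Line"]

# A crossover is a sign change of the histogram between consecutive rows.
prev_hist = df["MACD_Hist"].shift(1)
df["bullish_cross"] = (df["MACD_Hist"] > 0) & (prev_hist <= 0)
df["bearish_cross"] = (df["MACD_Hist"] < 0) & (prev_hist >= 0)

print(df[["bullish_cross", "bearish_cross"]].sum())
```

The crossover conditions mirror the ones used later for the target variable: the MACD line is on one side of the signal line now and was on the other side (or equal) one row earlier.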

#### 2.2.3. Moving Averages (SMA and EMA)

Moving averages smooth out price data to help identify trends over specific time frames. They are often used to see the underlying trend of an asset’s price and are among the most widely used technical indicators.

##### 2.2.3.1. Simple Moving Average (SMA)

- **Definition**: The **SMA** is the average of the closing prices over a specific period. For example, a 10-day SMA is the average closing price over the last 10 days.
  
- **Formula**:
  $$\text{SMA} = \frac{\sum_{i=1}^{N} \text{Price}_i}{N}$$
  where $N$ is the period (e.g., 10 days).

- **Interpretation**:
  - **Trend Identification**: When prices are above the SMA, it suggests an upward trend; when below, it suggests a downward trend.
  - **Crossovers**: When a short-term SMA crosses above a long-term SMA (e.g., 10-day SMA crosses above the 50-day SMA), it generates a bullish signal. The reverse crossover indicates a bearish signal.

##### 2.2.3.2. Exponential Moving Average (EMA)

- **Definition**: The **EMA** is a weighted moving average that gives more importance to recent prices, making it more responsive to new information than the SMA.
  
- **Formula**:
  - EMA uses a multiplier:
    $$\text{EMA}_\text{current} = \left(\frac{2}{N+1}\right) \times (\text{Price}_\text{current} - \text{EMA}_\text{previous}) + \text{EMA}_\text{previous}$$
    where $N$ is the number of periods.

- **Interpretation**:
  - **More Sensitive to Price Changes**: Because the EMA responds more quickly to recent prices, it is useful in identifying potential reversals and shorter-term trends.
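
To make the EMA recursion concrete, the sketch below applies it by hand to five made-up prices and compares the result with pandas' `ewm(span=N, adjust=False)`, which uses the same multiplier $2/(N+1)$. Seeding the recursion with the first price is an assumption that matches pandas' behaviour for `adjust=False`.

```python
import numpy as np
import pandas as pd

def ema_recursive(prices, n):
    """EMA via the recursive formula, seeded with the first price."""
    k = 2 / (n + 1)  # smoothing multiplier
    ema = [prices[0]]
    for price in prices[1:]:
        ema.append(k * (price - ema[-1]) + ema[-1])
    return np.array(ema)

# Made-up prices, for illustration only.
prices = np.array([10.0, 11.0, 12.0, 11.0, 13.0])

manual = ema_recursive(prices, n=3)
pandas_ema = pd.Series(prices).ewm(span=3, adjust=False).mean().to_numpy()

print(manual)
print(pandas_ema)
print("identical:", np.allclose(manual, pandas_ema))
```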

In [13]:
def calculate_moving_averages(data, sma_window=20, ema_window=20):
    """
    Simple Moving Average (SMA) is the average price over a specified number of periods, 
    while Exponential Moving Average (EMA) gives more weight to recent prices.
    """
    # Simple Moving Average
    data['SMA'] = data['close'].rolling(window=sma_window).mean()
    
    # Exponential Moving Average
    data['EMA'] = data['close'].ewm(span=ema_window, adjust=False).mean()
    
    return data

In [14]:
# Calculate moving averages
btc_price_data_1_year = calculate_moving_averages(btc_price_data_1_year)

### 2.3. Calculate Target Variable

In [15]:
def create_target_variable(data, threshold = 0.01):
    """
    Computes and sets the 'target' variable from the input 'data' and 'threshold'.
    Creates a 'target' column with the computed values in the 'data' DataFrame.

    Parameters
    ----------
    data: a DataFrame with the time series data. There must be a column named 'close'! 
          This column will be used by the user to calculate the 'target' variable.
    
    threshold: minimum next-period return, in percent, for a row to be classified as 'buy'. Because 'future_return' is multiplied by 100,
               a threshold of 0.01 means 0.01%; pass 1.0 if you want a 1% increase to be a 'buy' signal. Adjust this threshold as per your strategy.
    """
    # Create a copy of the DataFrame
    data_copy = data.copy(deep=True)
    
    # Compute the percentage change between the current close price and the close price in the next period.
    # This will help define whether there’s a significant increase or decrease.
    data_copy['future_return'] = ((data_copy['close'].shift(-1) - data_copy['close']) / data_copy['close']) * 100

    # Define the target as 1 (buy) if the future return is above the threshold, and 0 (sell) if it is below or equal to the threshold.
    data_copy['target'] = (data_copy['future_return'] > threshold).astype(float)

    # The last row in your dataset will have a NaN value for 'future_return' due to the shift operation. Drop this row to clean up the dataset.
    data_copy = data_copy.dropna()

    # Check the balance of 1s and 0s in our target variable to understand how many “buy” and “sell” signals we have.
    print(data_copy['target'].value_counts())

    return data_copy

In [16]:
def create_target_variable_tech_indicators(data, price_threshold = 0.01):
    """
    Computes a three-class 'target' variable from the input 'data' and 'price_threshold', combining the next-period
    return with the RSI, MACD, and SMA indicators. Returns a copy of 'data' with the added columns
    'future_return', 'buy_signal', 'sell_signal', and 'target' (1 = buy, -1 = sell, 0 = do nothing).

    Parameters
    ----------
    data: a DataFrame with the time series data. It must contain the columns 'close', 'RSI', 'MACD', 'Signal_Line', and 'SMA'.
    
    price_threshold: minimum absolute next-period return, in percent, for a 'buy' or 'sell' signal. Because 'future_return'
               is multiplied by 100, a value of 0.01 means 0.01%; pass 1.0 to require a 1% move. Adjust this threshold as per your strategy.
    """
    # Create a copy of the DataFrame
    data_copy = data.copy(deep=True)
    
    # Compute the percentage change between the current close price and the close price in the next period.
    # This will help define whether there’s a significant increase or decrease.
    data_copy['future_return'] = ((data_copy['close'].shift(-1) - data_copy['close']) / data_copy['close']) * 100
    
    # Use the technical indicators to add more conditions to the target:
    # - RSI: A Relative Strength Index (RSI) value below 30 often indicates an oversold condition, which might suggest a buying opportunity.
    # - MACD: A positive MACD value (i.e., MACD > Signal Line) can suggest an uptrend.
    # - SMA/EMA: If the current price is above the SMA or EMA, it may indicate an upward trend.
    data_copy['buy_signal'] = (
        (data_copy['future_return'] > price_threshold) &
        #(data_copy['RSI'] < 30) &   # buy signal for RSI
        (data_copy['RSI'] < 40) &    # buy signal for RSI
        ((data_copy['MACD'] > data_copy['Signal_Line']) & (data_copy['MACD'].shift(1) <= data_copy['Signal_Line'].shift(1))) & # buy signal for MACD
        ((data_copy['close'] > data_copy['SMA']) & (data_copy['close'].shift(1) <= data_copy['SMA'].shift(1))) # buy signal for SMA
    )

    data_copy['sell_signal'] = (
        (data_copy['future_return'] < - price_threshold) &
        #(data_copy['RSI'] > 70) &    # sell signal for RSI 
        (data_copy['RSI'] > 60) &     # sell signal for RSI 
        ((data_copy['MACD'] < data_copy['Signal_Line']) & (data_copy['MACD'].shift(1) >= data_copy['Signal_Line'].shift(1))) & # sell signal for MACD 
        ((data_copy['close'] < data_copy['SMA']) & (data_copy['close'].shift(1) >= data_copy['SMA'].shift(1))) # sell signal for SMA 
    )

    # Define a three-class target: 1 (buy), -1 (sell), and 0 (do nothing).

    # Initialize the 'target' column with the default value
    data_copy['target'] = 0 # 'do nothing' signal
    
    # Fill the 'target' with the 'buy_signal' and 'sell_signal' conditions 
    data_copy.loc[data_copy['buy_signal'] & ~data_copy['sell_signal'], 'target'] = 1 # 'buy' signal
    data_copy.loc[data_copy['sell_signal'] & ~data_copy['buy_signal'], 'target'] = -1 # 'sell' signal

    # Drop rows with NaN values: the last row has no 'future_return' because of the shift, and the first rows
    # have no 'SMA' (and no 'RSI' in the very first row) because the rolling window is not yet filled.
    data_copy = data_copy.dropna()

    # Check the balance of the 1, 0, and -1 labels in the target variable to see how many 'buy', 'do nothing', and 'sell' signals there are.
    print(data_copy['target'].value_counts())
    print(data_copy['buy_signal'].value_counts())
    print(data_copy['sell_signal'].value_counts())

    return data_copy

In [17]:
# Define the threshold for the price change to classify as 'buy' or 'sell'. 'future_return' is expressed in percent,
# so a threshold of 0.01 corresponds to a 0.01% move (use 1.0 for a 1% move).
threshold = 0.01

# Compute the 'target' variable
btc_price_data_1_year = create_target_variable_tech_indicators(btc_price_data_1_year, threshold)

target
 0    528494
 1        65
-1        53
Name: count, dtype: int64
buy_signal
False    528547
True         65
Name: count, dtype: int64
sell_signal
False    528559
True         53
Name: count, dtype: int64


In [18]:
btc_price_data_1_year.target.value_counts()

target
 0    528494
 1        65
-1        53
Name: count, dtype: int64
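
The counts above show a heavily imbalanced target: 65 buy and 53 sell labels against 528,494 "do nothing" rows. A model that always predicts 0 would therefore score close to 100% accuracy without being useful. The sketch below quantifies that baseline and computes balanced class weights with scikit-learn's `compute_class_weight`. The label array is rebuilt from the printed counts, not from the actual DataFrame.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Label counts copied from the value_counts() output above.
counts = {0: 528494, 1: 65, -1: 53}
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

# Accuracy of a trivial model that always predicts 0 ("do nothing").
baseline_accuracy = counts[0] / len(y)

# "balanced" weights: n_samples / (n_classes * count_per_class).
classes = np.array([-1, 0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print(f"always-0 accuracy: {baseline_accuracy:.4%}")
print(dict(zip(classes.tolist(), weights.round(2).tolist())))
```

Several scikit-learn classifiers, including `RandomForestClassifier`, `LogisticRegression`, and `SVC`, accept `class_weight="balanced"` directly, which applies the same weighting during training.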

### 2.4. Clean up the Data

In [19]:
btc_price_data_1_year.shape

(528612, 19)

In [20]:
btc_price_data_1_year.dtypes

timestamp        datetime64[ns]
open                    float64
high                    float64
low                     float64
close                   float64
volume                  float64
date             datetime64[ns]
time                     object
RSI                     float64
EMA12                   float64
EMA26                   float64
MACD                    float64
Signal_Line             float64
SMA                     float64
EMA                     float64
future_return           float64
buy_signal                 bool
sell_signal                bool
target                    int64
dtype: object

#### TODO: What should we do with the `time` column (object type)? Do we need it?

In [21]:
btc_price_data_1_year.isnull().sum()

timestamp        0
open             0
high             0
low              0
close            0
volume           0
date             0
time             0
RSI              0
EMA12            0
EMA26            0
MACD             0
Signal_Line      0
SMA              0
EMA              0
future_return    0
buy_signal       0
sell_signal      0
target           0
dtype: int64

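The `dropna()` call inside `create_target_variable_tech_indicators` has already removed the rows with `NaN` values for `RSI` and `SMA` that result from the calculation process, which is why the check above reports no missing values. The call below is kept as a safeguard and also resets the index.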

In [22]:
btc_price_data_1_year = btc_price_data_1_year.dropna().reset_index(drop = True)

In [23]:
btc_price_data_1_year

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,RSI,EMA12,EMA26,MACD,Signal_Line,SMA,EMA,future_return,buy_signal,sell_signal,target
0,2023-11-01 00:19:00,34612.24,34634.46,34618.78,34631.66,2.974894,2023-11-01,00:19:00,56.177122,34622.946322,34632.753989,-9.807666,-11.531193,34626.6730,34628.595251,0.038549,False,False,0
1,2023-11-01 00:20:00,34631.71,34661.72,34633.32,34645.01,4.406042,2023-11-01,00:20:00,55.560545,34626.340734,34633.661841,-7.321107,-10.689176,34625.5295,34630.158560,-0.111127,False,False,0
2,2023-11-01 00:21:00,34602.75,34651.54,34646.48,34606.51,14.372378,2023-11-01,00:21:00,34.866742,34623.289852,34631.650594,-8.360742,-10.223489,34623.7140,34627.906317,-0.113794,False,False,0
3,2023-11-01 00:22:00,34560.23,34604.60,34604.60,34567.13,13.971646,2023-11-01,00:22:00,31.948538,34614.649875,34626.871291,-12.221416,-10.623074,34619.2425,34622.118096,-0.048919,False,False,0
4,2023-11-01 00:23:00,34543.03,34568.26,34567.04,34550.22,2.956500,2023-11-01,00:23:00,33.177865,34604.737586,34621.193417,-16.455831,-11.789626,34615.2865,34615.270658,0.045933,False,False,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
528607,2024-10-31 23:55:00,70248.97,70264.97,70248.98,70248.97,1.604753,2024-10-31,23:55:00,15.731750,70316.514762,70363.992465,-47.477702,-35.505422,70367.7305,70348.638181,-0.014534,False,False,0
528608,2024-10-31 23:56:00,70238.76,70248.97,70248.97,70238.76,1.189134,2024-10-31,23:56:00,20.028025,70304.552491,70354.715986,-50.163495,-38.437037,70352.5490,70338.173592,-0.007859,False,False,0
528609,2024-10-31 23:57:00,70218.00,70250.00,70238.77,70233.24,4.767082,2024-10-31,23:57:00,14.761536,70293.581339,70345.717765,-52.136426,-41.176915,70338.1685,70328.179916,-0.036251,False,False,0
528610,2024-10-31 23:58:00,70193.97,70242.25,70232.55,70207.78,9.589688,2024-10-31,23:58:00,11.458672,70280.381133,70335.500152,-55.119020,-43.965336,70325.1425,70316.713258,-0.014172,False,False,0


In [24]:
btc_price_data_1_year[btc_price_data_1_year.target == 1]

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,RSI,EMA12,EMA26,MACD,Signal_Line,SMA,EMA,future_return,buy_signal,sell_signal,target
6198,2023-11-05 07:17:00,35079.16,35085.99,35079.16,35085.99,0.372082,2023-11-05,07:17:00,35.083814,35080.455561,35086.009628,-5.554067,-5.734705,35085.6290,35083.465299,0.026848,True,False,1
8741,2023-11-07 01:31:00,34906.81,34930.96,34911.41,34930.96,3.024206,2023-11-07,01:31:00,39.529273,34910.035546,34919.928658,-9.893111,-11.678726,34922.2560,34916.035388,0.024849,True,False,1
13246,2023-11-10 04:21:00,36705.92,36728.38,36707.10,36727.85,6.018669,2023-11-10,04:21:00,33.342551,36715.394696,36728.795553,-13.400857,-13.453451,36726.5615,36723.343488,0.021183,True,False,1
22485,2023-11-16 13:50:00,36825.01,36869.00,36833.99,36860.86,13.812849,2023-11-16,13:50:00,38.255364,36832.835708,36873.607247,-40.771539,-44.389986,36856.8055,36855.648349,0.105885,True,False,1
26449,2023-11-19 07:41:00,36617.06,36631.18,36617.38,36628.72,1.410292,2023-11-19,07:41:00,38.472803,36622.546226,36627.698520,-5.152295,-5.528788,36624.7890,36625.805747,0.014005,True,False,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495502,2024-10-09 00:47:00,62084.24,62115.60,62092.68,62093.54,0.213215,2024-10-09,00:47:00,37.075674,62057.374857,62081.452612,-24.077755,-26.432295,62092.3085,62072.551829,0.049474,True,False,1
501659,2024-10-13 07:03:00,62883.18,62888.00,62883.18,62888.00,0.061702,2024-10-13,07:03:00,39.286045,62876.876205,62880.830622,-3.954416,-4.183996,62883.2145,62879.512047,0.015663,True,False,1
504847,2024-10-15 12:01:00,65374.10,65427.52,65376.66,65420.80,7.891222,2024-10-15,12:01:00,39.361405,65380.425166,65421.688861,-41.263695,-44.267955,65418.6000,65405.392641,0.060822,True,False,1
510747,2024-10-19 14:01:00,68114.61,68136.53,68116.06,68133.77,0.738000,2024-10-19,14:01:00,36.235559,68119.778540,68126.469020,-6.690480,-7.617746,68125.8100,68123.490999,0.023674,True,False,1


In [25]:
btc_price_data_1_year[btc_price_data_1_year.target == -1].head(10)

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,RSI,EMA12,EMA26,MACD,Signal_Line,SMA,EMA,future_return,buy_signal,sell_signal,target
11757,2023-11-09 03:37:00,36453.78,36494.56,36493.49,36455.82,8.10178,2023-11-09,03:37:00,60.911418,36475.970981,36441.322537,34.648445,36.015007,36456.3585,36455.057829,-0.013386,False,True,-1
30432,2023-11-22 01:50:00,36111.75,36131.47,36123.27,36116.79,8.088061,2023-11-22,01:50:00,60.158389,36134.978298,36124.776041,10.202257,10.918237,36121.168,36128.626143,-0.054158,False,True,-1
32572,2023-11-23 13:23:00,37354.95,37368.6,37362.44,37355.11,5.426871,2023-11-23,13:23:00,60.441962,37358.461288,37353.86881,4.592478,4.948847,37355.337,37354.992787,-0.010869,False,True,-1
55621,2023-12-09 12:16:00,43836.04,43854.03,43851.85,43836.04,1.920779,2023-12-09,12:16:00,61.83453,43845.546957,43838.830581,6.716376,6.982933,43837.2025,43841.246343,-0.112738,False,True,-1
58783,2023-12-11 16:47:00,41700.0,41742.31,41733.67,41706.96,13.764782,2023-12-11,16:47:00,64.680281,41723.478893,41694.841088,28.637805,28.781298,41708.6905,41704.476063,-0.096339,False,True,-1
76069,2023-12-23 15:56:00,43811.39,43819.15,43816.78,43811.92,2.516018,2023-12-23,15:56:00,61.358424,43821.189287,43814.64143,6.547857,7.436234,43812.8625,43817.351751,-0.066877,False,True,-1
77719,2023-12-24 19:20:00,43685.0,43702.52,43702.52,43688.24,1.763181,2023-12-24,19:20:00,64.057971,43697.083246,43687.698132,9.385114,10.074427,43688.489,43691.536092,-0.031679,False,True,-1
90418,2024-01-02 14:17:00,45750.04,45818.29,45810.67,45755.75,36.089696,2024-01-02,14:17:00,60.1006,45787.435428,45752.784127,34.651302,36.086396,45761.557,45767.339011,-0.050595,False,True,-1
91522,2024-01-03 08:37:00,45153.68,45170.41,45170.41,45164.2,2.399306,2024-01-03,08:37:00,64.436941,45167.900534,45153.974114,13.92642,14.983082,45164.959,45159.126396,-0.012776,False,True,-1
96353,2024-01-06 16:52:00,43900.0,43915.03,43915.03,43901.42,1.706752,2024-01-06,16:52:00,60.513168,43912.874335,43910.447266,2.427069,2.91817,43907.0215,43911.201586,-0.014464,False,True,-1


In [26]:
btc_price_data_1_year.columns

Index(['timestamp', 'open', 'high', 'low', 'close', 'volume', 'date', 'time',
       'RSI', 'EMA12', 'EMA26', 'MACD', 'Signal_Line', 'SMA', 'EMA',
       'future_return', 'buy_signal', 'sell_signal', 'target'],
      dtype='object')

## 3. Step 3: Building the Predictive Model

1. **Model Selection**:
   - Use supervised learning models like **Random Forest**, **XGBoost**, or **SVM** to classify moments as "buy" or "sell."

2. **Training the Model**:
   - Split the data into training and testing sets.
   - Train the model on the training data and tune hyperparameters for optimal performance.

3. **Model Evaluation**:
   - Use metrics like **accuracy**, **F1 score**, **Precision**, and **Recall** to assess how well the model predicts "buy" and "sell" signals.



Here is a detailed step-by-step procedure for creating a model to predict Bitcoin trading signals using machine learning:

### 3.1. Step 1: Data Preparation

1. **Data Splitting**:
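   - Use a portion of historical data as a training set and the remaining portion as a test set (e.g., 80/20 split). For time series, split chronologically (train on the earlier part, test on the later part) rather than shuffling, so the model is never evaluated on data that precedes its training data.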

2. **Creating the Target Variable**:
   - Create a target variable for the model, such as "buy" or "sell."
   - Example: Define "buy" (1) if the price is expected to rise by more than 1% within the next 24 hours, and "sell" (0) otherwise.

3. **Adding Technical Indicators**:
   - Calculate indicators like **RSI**, **MACD**, **Bollinger Bands**, **Moving Averages**, and add them as columns to the dataset. These will be used as features for training the model.

4. **Scaling the Data**:
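   - For scale-sensitive algorithms like SVM and KNN, bring the features onto a comparable scale. `MinMaxScaler` maps each feature to a fixed range such as 0 to 1, while `StandardScaler` standardizes each feature to zero mean and unit variance. Both are available in `scikit-learn`.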
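
The splitting and scaling steps can be sketched as follows. The feature matrix is synthetic and assumed to be ordered by time; in the notebook it would be built from the indicator columns. The scaler is fitted on the training part only, so no information from the test period leaks into the preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix ordered by time (rows = consecutive periods).
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(1000, 4))
y = rng.integers(0, 2, size=1000)

# Chronological 80/20 split: the test set is strictly later than the training set.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Fit the scaler on the training data only, then apply it to both parts.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3))
print(X_train_scaled.std(axis=0).round(3))
```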

### 3.2. Step 2: Model Selection

1. **Choosing an Algorithm**:
   - Select a model suitable for **classification** (binary for buy/sell, or multi-class if a "do nothing" label is used, as in section 2.3), such as:
     - **Random Forest**: Capable of capturing complex, nonlinear patterns in data.
     - **XGBoost**: A boosting algorithm that often achieves high accuracy by combining weak classifiers.
     - **SVM (Support Vector Machine)**: Useful when data is well-scaled and a model with good generalization ability is needed.

2. **Implementing the Model**:
   - Import the chosen model: `RandomForestClassifier` and `SVC` come from `scikit-learn`, while `XGBClassifier` comes from the separate `xgboost` package.
   - Set initial hyperparameters (start with default values and optimize later).

Example code:

```python
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
```

### 3.3. Step 3: Training the Model

1. **Training the Model**:
   - Train the model using the training dataset. Split the data into X (features) and y (target variable).
   
   ```python
   model.fit(X_train, y_train)
   ```

2. **Hyperparameter Optimization (Optional)**:
   - Use `GridSearchCV` or `RandomizedSearchCV` to tune hyperparameters. This can help the model achieve better results.
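
A minimal sketch of such a search, assuming time-ordered training data. It combines `GridSearchCV` with `TimeSeriesSplit`, so that each validation fold lies after its training fold. The data and the small parameter grid are made up for illustration and are not tuned recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic, time-ordered data for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 4))
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Illustrative grid; real searches usually cover more values.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=TimeSeriesSplit(n_splits=3),  # each fold validates on later data only
    scoring="f1",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"best cross-validated F1: {search.best_score_:.2f}")
```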

### 3.4. Step 4: Model Evaluation

1. **Making Predictions**:
   - Use the model to make predictions on the test dataset.

   ```python
   y_pred = model.predict(X_test)
   ```

2. **Evaluating Performance**:
   - Use metrics for binary model evaluation, such as accuracy, precision, recall, and F1-score.
   
   ```python
   from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
   
   accuracy = accuracy_score(y_test, y_pred)
   precision = precision_score(y_test, y_pred)
   recall = recall_score(y_test, y_pred)
   f1 = f1_score(y_test, y_pred)
   
   print(f'Accuracy: {accuracy:.2f}')
   print(f'Precision: {precision:.2f}')
   print(f'Recall: {recall:.2f}')
   print(f'F1 Score: {f1:.2f}')
   ```
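
The snippet above assumes a binary target. With the three-class target from section 2.3 (-1, 0, 1), `precision_score`, `recall_score`, and `f1_score` need an explicit `average` argument, and accuracy is dominated by the majority class. The sketch below uses a confusion matrix and macro-averaged F1 on made-up labels to show one way to evaluate that case.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Made-up labels: mostly 0 ("do nothing"), a few 1 ("buy") and -1 ("sell").
y_test = np.array([0, 0, 0, 0, 0, 0, 1, 1, -1, -1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 0, -1, 0])

labels = [-1, 0, 1]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)

# Macro averaging weights every class equally, regardless of its frequency.
macro_f1 = f1_score(y_test, y_pred, labels=labels, average="macro", zero_division=0)
print(f"macro F1: {macro_f1:.2f}")

# Per-class precision, recall, and F1 in one table.
print(classification_report(y_test, y_pred, labels=labels, zero_division=0))
```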

### 3.5. Step 5: Saving the Model (Optional)

Once the model is trained and has achieved the desired accuracy, you can save it for later use with `joblib` or `pickle`.

```python
import joblib
joblib.dump(model, 'trading_model.pkl')
```
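
A saved model can be restored with `joblib.load`. The sketch below does a round trip in a temporary directory and checks that the restored model reproduces the original predictions. The small model and data are synthetic stand-ins for the trained trading model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

# Save to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trading_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```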

### 3.6. Step 6: Final Testing

After training, conduct additional tests with new (unseen) data to ensure the model generalizes well to new scenarios.

Afterwards, you can use this model for real-time predictions in a trading bot.

## 4. Step 4: Strategy Simulation (Backtesting)

Before deploying the bot in a live environment, backtest it to assess its performance on historical data.

1. **Use the Test Dataset**: Evaluate how the bot would perform if buy and sell decisions had been made based on historical data.
2. **Evaluate the Strategy**: Calculate key metrics like:
   - **Return**: Compare achieved profit relative to a baseline (e.g., buy-and-hold).
   - **Maximum Drawdown**: The largest peak-to-trough decline in account equity over the test period.
   - **Risk/Reward Ratio**.
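
The return and drawdown metrics can be sketched with pandas as follows. The prices and positions are made up, the position rule (1 = long, 0 = flat) is a simplifying assumption, and fees and slippage are ignored, so this is an illustration of the calculations rather than a realistic backtest.

```python
import numpy as np
import pandas as pd

# Synthetic close prices and hypothetical model positions (1 = long, 0 = flat).
close = pd.Series([100.0, 102.0, 101.0, 105.0, 103.0, 108.0])
position = pd.Series([1, 1, 0, 1, 0, 1])

# Per-period asset return; the position chosen at t earns the return from t to t+1,
# hence the shift by one period to avoid look-ahead.
asset_return = close.pct_change().fillna(0.0)
strategy_return = position.shift(1).fillna(0) * asset_return

# Equity curves, starting from 1.0.
strategy_equity = (1 + strategy_return).cumprod()
buy_hold_equity = (1 + asset_return).cumprod()

# Maximum drawdown: largest peak-to-trough decline of the equity curve.
running_peak = strategy_equity.cummax()
max_drawdown = ((strategy_equity - running_peak) / running_peak).min()

print(f"strategy return:  {strategy_equity.iloc[-1] - 1:.2%}")
print(f"buy-and-hold:     {buy_hold_equity.iloc[-1] - 1:.2%}")
print(f"maximum drawdown: {max_drawdown:.2%}")
```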