In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Calculate Target Variable 

In [2]:
btc_price_data_1_year = pd.read_csv("data/bitcoin_historical_data_1_year.csv")
btc_price_data_1_year

Unnamed: 0,timestamp,open,high,low,close,volume,date,time
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953000,2023-11-01,00:00:00
1,2023-11-01 00:01:00,34642.54,34687.53,34673.30,34642.82,16.178075,2023-11-01,00:01:00
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.753120,2023-11-01,00:02:00
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.308610,2023-11-01,00:03:00
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00
...,...,...,...,...,...,...,...,...
528627,2024-10-31 23:56:00,70238.76,70248.97,70248.97,70238.76,1.189134,2024-10-31,23:56:00
528628,2024-10-31 23:57:00,70218.00,70250.00,70238.77,70233.24,4.767082,2024-10-31,23:57:00
528629,2024-10-31 23:58:00,70193.97,70242.25,70232.55,70207.78,9.589688,2024-10-31,23:58:00
528630,2024-10-31 23:59:00,70175.16,70207.79,70207.79,70197.83,7.112237,2024-10-31,23:59:00


## 1. Calculate Target Variable without Technical Indicators

### 1.1. Function

In [3]:
def create_target_variable(data, threshold = 0.01):
    """
    Computes the 'target' variable from the input 'data' and 'threshold'.
    Returns a copy of 'data' with 'future_return' and 'target' columns added; the input DataFrame is not modified.

    Parameters
    ----------
    data: a DataFrame with the time series data. It must contain a column named 'close',
          which is used to calculate the 'target' variable.
    
    threshold: minimum future return, in percent, for a row to be labelled 1 ('buy'). Because 'future_return'
               is multiplied by 100 below, threshold=0.01 means a rise of more than 0.01%; use threshold=1.0
               for a 1% rise. Adjust this threshold to your strategy.
    """
    # Create a copy of the DataFrame
    data_copy = data.copy(deep=True)
    
    # Compute the percentage change (in percent, hence the * 100) between the current close price and the close price in the next period.
    # This shows whether there is a significant increase or decrease.
    data_copy['future_return'] = ((data_copy['close'].shift(-1) - data_copy['close']) / data_copy['close']) * 100

    # Define the target as 1 (buy) if the future return is above the threshold, and 0 (sell, i.e. no significant rise) otherwise.
    data_copy['target'] = (data_copy['future_return'] > threshold).astype(int)

    # The last row in your dataset will have a NaN value for 'future_return' due to the shift operation. Drop this row to clean up the dataset.
    data_copy = data_copy.dropna()

    # Check the balance of 1s and 0s in our target variable to understand how many “buy” and “sell” signals we have.
    print(data_copy['target'].value_counts())

    return data_copy

In [4]:
# Threshold for the price change used to label a row as 'buy'. Note that 'future_return' is in percent,
# so 0.01 means a rise of more than 0.01% over the next minute (a 1% rule would need 1.0).
threshold = 0.01

# Compute the 'target' variable
btc_price_data_1_year_target = create_target_variable(btc_price_data_1_year, threshold)

target
0    317152
1    211479
Name: count, dtype: int64
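Because `future_return` is multiplied by 100, the threshold is compared against a percentage, not a fraction. The toy example below uses made-up prices (not taken from the dataset) to show the difference: with `threshold = 0.01` a 0.05% rise already counts as a buy, while a "rise of more than 1%" rule needs `threshold = 1.0`.

```python
import pandas as pd

# Made-up close prices: +0.05% over the first minute, then about +1.5%
toy = pd.DataFrame({"close": [100.0, 100.05, 101.55]})

# Same formula as in create_target_variable: the return is expressed in PERCENT
toy["future_return"] = (toy["close"].shift(-1) - toy["close"]) / toy["close"] * 100

# threshold=0.01 in percent units means "rise of more than 0.01%"
toy["target_0_01"] = (toy["future_return"] > 0.01).astype(int)

# A "rise of more than 1%" rule in percent units needs threshold=1.0
toy["target_1pct"] = (toy["future_return"] > 1.0).astype(int)

# The last row has no future price (NaN), so both labels are 0 there
print(toy)
```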


In [5]:
btc_price_data_1_year.head()

Unnamed: 0,timestamp,open,high,low,close,volume,date,time
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953,2023-11-01,00:00:00
1,2023-11-01 00:01:00,34642.54,34687.53,34673.3,34642.82,16.178075,2023-11-01,00:01:00
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.75312,2023-11-01,00:02:00
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.30861,2023-11-01,00:03:00
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00


In [6]:
btc_price_data_1_year_target.head()

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953,2023-11-01,00:00:00,-0.072286,0
1,2023-11-01 00:01:00,34642.54,34687.53,34673.3,34642.82,16.178075,2023-11-01,00:01:00,0.039662,1
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.75312,2023-11-01,00:02:00,-0.078542,0
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.30861,2023-11-01,00:03:00,-0.020416,0
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00,-0.045404,0


### 1.2. Calculate Target Variable - Explanation

To create a binary target variable for price prediction:

1. **Choose the Prediction Horizon**:  
   Decide the period over which you want to predict price changes, for example whether the price will increase within the next day or the next hour. For simplicity, we use the next row in the dataset (the next minute, for this 1-minute data) as the future period.

2. **Calculate Future Returns**:  
   Compute the percentage change between the current close price and the close price in the next period. This will help define whether there’s a significant increase or decrease.

   ```python
   df['future_return'] = (df['close'].shift(-1) - df['close']) / df['close']
   ```

3. **Define the Threshold for Significant Price Change**:  
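   Set a threshold for the price change to classify a row as “buy” or “sell.” With `future_return` expressed as a fraction, as in the snippet above, a 1% increase corresponds to a threshold of 0.01. (The notebook cells multiply the return by 100, so there the same 1% rule needs `threshold = 1.0`.) Adjust this threshold to your strategy.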

   ```python
   threshold = 0.01
   ```

4. **Create the Target Variable**:  
   Define the target as 1 (buy) if the future return is above the threshold, and 0 (sell) if it is below or equal to the threshold.

   ```python
   df['target'] = (df['future_return'] > threshold).astype(int)
   ```

5. **Remove Any NaN Values**:  
   The last row in your dataset will have a NaN value for `future_return` due to the shift operation. Drop this row to clean up the dataset.

   ```python
   df = df.dropna()
   ```

6. **Verify the Target Distribution**:  
   Finally, it’s helpful to check the balance of 1s and 0s in your target variable to understand how many “buy” and “sell” signals you have.

   ```python
   print(df['target'].value_counts())
   ```

After this, you’ll have a `target` column in your dataset, which you can use as the target variable for your XGBoost model.
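The steps above look exactly one row ahead. Below is a sketch of a variant with a configurable horizon and an explicit percent threshold; the function name `create_target_variable_h` and its parameters are hypothetical and not part of the notebook. On this 1-minute dataset, `horizon=60` would mean "one hour ahead". It also drops the rows without a future price before labelling, so they are never labelled 0 by accident.

```python
import pandas as pd


def create_target_variable_h(data, horizon=1, threshold_pct=1.0):
    """Hypothetical variant of create_target_variable with a configurable horizon.

    horizon:       number of rows to look ahead (minutes, for this dataset)
    threshold_pct: minimum future return, in percent, for a row to be labelled 1 (buy)
    """
    out = data.copy()
    future_close = out["close"].shift(-horizon)
    out["future_return"] = (future_close - out["close"]) / out["close"] * 100

    # The last `horizon` rows have no future close: drop them before labelling.
    # .copy() keeps the result independent, so adding 'target' is a plain assignment.
    out = out.dropna(subset=["future_return"]).copy()
    out["target"] = (out["future_return"] > threshold_pct).astype(int)
    return out


# Usage on made-up prices: label a row 1 if the price is more than 1% higher two rows later
prices = pd.DataFrame({"close": [100.0, 100.5, 101.2, 100.8, 102.0]})
labelled = create_target_variable_h(prices, horizon=2, threshold_pct=1.0)
print(labelled[["close", "future_return", "target"]])
```

Using `dropna(subset=["future_return"])` only removes rows that lack a future price, so NaNs in unrelated columns are left alone.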

#### 1. Calculate Future Returns:  
**Choose the Prediction Horizon**: Decide the period over which you want to predict price changes, for example whether the price will increase within the next day or the next hour. For simplicity, we use the next row in the dataset (the next minute) as the future period.

Compute the **percentage change** between the current close price and the close price in the next period; multiplying by 100 expresses it in percent. This shows whether there is a significant increase or decrease.

In [7]:
btc_price_data_1_year['future_return'] = ((btc_price_data_1_year['close'].shift(-1) - btc_price_data_1_year['close']) / btc_price_data_1_year['close']) * 100

In [8]:
btc_price_data_1_year

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953000,2023-11-01,00:00:00,-0.072286
1,2023-11-01 00:01:00,34642.54,34687.53,34673.30,34642.82,16.178075,2023-11-01,00:01:00,0.039662
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.753120,2023-11-01,00:02:00,-0.078542
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.308610,2023-11-01,00:03:00,-0.020416
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00,-0.045404
...,...,...,...,...,...,...,...,...,...
528627,2024-10-31 23:56:00,70238.76,70248.97,70248.97,70238.76,1.189134,2024-10-31,23:56:00,-0.007859
528628,2024-10-31 23:57:00,70218.00,70250.00,70238.77,70233.24,4.767082,2024-10-31,23:57:00,-0.036251
528629,2024-10-31 23:58:00,70193.97,70242.25,70232.55,70207.78,9.589688,2024-10-31,23:58:00,-0.014172
528630,2024-10-31 23:59:00,70175.16,70207.79,70207.79,70197.83,7.112237,2024-10-31,23:59:00,0.017223


#### 2. Define the Threshold for Significant Price Change:  
Set a threshold for the price change to classify a row as “buy” or “sell.” Because `future_return` above is expressed in percent, a 1% increase corresponds to `threshold = 1.0`; the value 0.01 used here labels any rise of more than 0.01% as a “buy”. Adjust this threshold to your strategy.


In [9]:
# future_return is in percent: 0.01 labels any rise above 0.01% as a “buy” (a 1% rule would need 1.0)
threshold = 0.01

#### 3. Create the Target Variable:  
   Define the target as 1 (buy) if the future return is above the threshold, and 0 (sell) if it is below or equal to the threshold.


In [10]:
btc_price_data_1_year['target'] = (btc_price_data_1_year['future_return'] > threshold).astype(int)
btc_price_data_1_year

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953000,2023-11-01,00:00:00,-0.072286,0
1,2023-11-01 00:01:00,34642.54,34687.53,34673.30,34642.82,16.178075,2023-11-01,00:01:00,0.039662,1
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.753120,2023-11-01,00:02:00,-0.078542,0
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.308610,2023-11-01,00:03:00,-0.020416,0
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00,-0.045404,0
...,...,...,...,...,...,...,...,...,...,...
528627,2024-10-31 23:56:00,70238.76,70248.97,70248.97,70238.76,1.189134,2024-10-31,23:56:00,-0.007859,0
528628,2024-10-31 23:57:00,70218.00,70250.00,70238.77,70233.24,4.767082,2024-10-31,23:57:00,-0.036251,0
528629,2024-10-31 23:58:00,70193.97,70242.25,70232.55,70207.78,9.589688,2024-10-31,23:58:00,-0.014172,0
528630,2024-10-31 23:59:00,70175.16,70207.79,70207.79,70197.83,7.112237,2024-10-31,23:59:00,0.017223,1


In [11]:
btc_price_data_1_year[btc_price_data_1_year.target == 1].head(10)

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target
1,2023-11-01 00:01:00,34642.54,34687.53,34673.3,34642.82,16.178075,2023-11-01,00:01:00,0.039662,1
5,2023-11-01 00:05:00,34601.12,34629.21,34620.51,34606.55,4.76513,2023-11-01,00:05:00,0.046725,1
6,2023-11-01 00:06:00,34601.14,34630.82,34606.55,34622.72,3.357925,2023-11-01,00:06:00,0.124398,1
11,2023-11-01 00:11:00,34590.21,34614.89,34605.0,34602.26,3.475336,2023-11-01,00:11:00,0.060227,1
15,2023-11-01 00:15:00,34598.81,34627.47,34608.5,34609.07,5.081769,2023-11-01,00:15:00,0.055679,1
18,2023-11-01 00:18:00,34601.29,34626.58,34620.51,34618.08,8.066458,2023-11-01,00:18:00,0.039228,1
19,2023-11-01 00:19:00,34612.24,34634.46,34618.78,34631.66,2.974894,2023-11-01,00:19:00,0.038549,1
23,2023-11-01 00:23:00,34543.03,34568.26,34567.04,34550.22,2.9565,2023-11-01,00:23:00,0.045933,1
26,2023-11-01 00:26:00,34528.91,34557.07,34553.5,34533.88,5.26977,2023-11-01,00:26:00,0.042654,1
27,2023-11-01 00:27:00,34517.59,34549.46,34536.93,34548.61,6.354277,2023-11-01,00:27:00,0.015109,1


In [12]:
print("Count buy signals:", len(btc_price_data_1_year[btc_price_data_1_year.target == 1]))
print("Count sell signals:", len(btc_price_data_1_year[btc_price_data_1_year.target == 0]))

Count buy signals: 211479
Count sell signals: 317153
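The sell count here (317,153) is one higher than the count after dropping NaNs (317,152) because the labels are computed before the last row is removed: in pandas, `NaN > threshold` evaluates to `False`, so the row without a future price is silently labelled 0. A minimal illustration with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up future returns in percent; the last one is NaN, like the row emptied by shift(-1)
future_return = pd.Series([0.05, -0.02, np.nan])

# Labelling before dropping NaNs: NaN > 0.01 is False, so the last row becomes 0
labels_before = (future_return > 0.01).astype(int)

# Labelling after dropping NaNs keeps only rows that have a real future return
labels_after = (future_return.dropna() > 0.01).astype(int)

print(labels_before.tolist())  # [1, 0, 0]
print(labels_after.tolist())   # [1, 0]
```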


#### 4. Remove Any NaN Values:  
The last row in your dataset will have a NaN value for `future_return` due to the shift operation. Drop this row to clean up the dataset.

In [13]:
btc_price_data_1_year = btc_price_data_1_year.dropna()
btc_price_data_1_year

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953000,2023-11-01,00:00:00,-0.072286,0
1,2023-11-01 00:01:00,34642.54,34687.53,34673.30,34642.82,16.178075,2023-11-01,00:01:00,0.039662,1
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.753120,2023-11-01,00:02:00,-0.078542,0
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.308610,2023-11-01,00:03:00,-0.020416,0
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00,-0.045404,0
...,...,...,...,...,...,...,...,...,...,...
528626,2024-10-31 23:55:00,70248.97,70264.97,70248.98,70248.97,1.604753,2024-10-31,23:55:00,-0.014534,0
528627,2024-10-31 23:56:00,70238.76,70248.97,70248.97,70238.76,1.189134,2024-10-31,23:56:00,-0.007859,0
528628,2024-10-31 23:57:00,70218.00,70250.00,70238.77,70233.24,4.767082,2024-10-31,23:57:00,-0.036251,0
528629,2024-10-31 23:58:00,70193.97,70242.25,70232.55,70207.78,9.589688,2024-10-31,23:58:00,-0.014172,0


#### 5. Verify the Target Distribution:  
Check the balance of 1s and 0s in our target variable to understand how many “buy” and “sell” signals we have.

In [14]:
btc_price_data_1_year['target'].value_counts()

target
0    317152
1    211479
Name: count, dtype: int64
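These counts correspond to roughly 40% buy labels (211,479 of 528,631 rows). A small helper such as the hypothetical `class_balance` below reports the class shares and the negative-to-positive ratio, for example via `class_balance(btc_price_data_1_year['target'])`. If you train with XGBoost, its `scale_pos_weight` parameter is commonly set to this ratio (here about 317,152 / 211,479 ≈ 1.50). This is a sketch, not part of the original notebook:

```python
import pandas as pd


def class_balance(target):
    """Hypothetical helper: class shares and negative/positive ratio of a binary 0/1 target."""
    shares = target.value_counts(normalize=True)
    n_pos = int((target == 1).sum())
    n_neg = int((target == 0).sum())
    neg_pos_ratio = n_neg / n_pos if n_pos else float("inf")
    return shares, neg_pos_ratio


# Made-up target with three 0s and two 1s
toy_target = pd.Series([0, 0, 0, 1, 1])
shares, ratio = class_balance(toy_target)
print(shares)
print(ratio)  # 1.5
```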

In [15]:
btc_price_data_1_year.head()

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953,2023-11-01,00:00:00,-0.072286,0
1,2023-11-01 00:01:00,34642.54,34687.53,34673.3,34642.82,16.178075,2023-11-01,00:01:00,0.039662,1
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.75312,2023-11-01,00:02:00,-0.078542,0
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.30861,2023-11-01,00:03:00,-0.020416,0
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00,-0.045404,0


In [16]:
btc_price_data_1_year_target.head()

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target
0,2023-11-01 00:00:00,34618.86,34676.51,34656.38,34667.88,48.953,2023-11-01,00:00:00,-0.072286,0
1,2023-11-01 00:01:00,34642.54,34687.53,34673.3,34642.82,16.178075,2023-11-01,00:01:00,0.039662,1
2,2023-11-01 00:02:00,34637.97,34656.82,34642.53,34656.56,8.75312,2023-11-01,00:02:00,-0.078542,0
3,2023-11-01 00:03:00,34617.22,34656.56,34656.56,34629.34,11.30861,2023-11-01,00:03:00,-0.020416,0
4,2023-11-01 00:04:00,34597.99,34630.42,34629.41,34622.27,8.583808,2023-11-01,00:04:00,-0.045404,0


## 2. Calculate Target Variable with Technical Indicators

### 2.1. Create Technical Indicators

In [17]:
def calculate_rsi(data, window=14):
    """
    RSI is a momentum oscillator that measures the speed and change of price movements, typically over 14 periods.
    This version averages gains and losses with a simple rolling mean (not Wilder's smoothing).
    Adds an 'RSI' column to 'data' in place and returns it.

    Parameters
    ----------
    data:   a DataFrame with the time series data. It must contain a column named 'close',
            which is used to calculate the RSI.
    window: the number of periods used to average gains and losses.
    """
    delta = data['close'].diff(1)
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    
    avg_gain = gain.rolling(window=window, min_periods=1).mean()
    avg_loss = loss.rolling(window=window, min_periods=1).mean()

    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    data['RSI'] = rsi
    
    return data
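`calculate_rsi` averages gains and losses with a simple rolling mean. Wilder's original RSI instead uses exponential smoothing with `alpha = 1/window`. The hypothetical `calculate_rsi_wilder` below sketches that variant with pandas' `ewm`; it seeds the average with the first value rather than with a simple mean of the first `window` values, so the earliest RSI values can differ slightly from other implementations. Unlike `calculate_rsi`, it works on a copy of the input.

```python
import pandas as pd


def calculate_rsi_wilder(data, window=14):
    """Hypothetical alternative to calculate_rsi that uses Wilder-style exponential smoothing."""
    out = data.copy()
    delta = out["close"].diff()
    gain = delta.clip(lower=0)       # positive changes, 0 otherwise (first row stays NaN)
    loss = (-delta).clip(lower=0)    # magnitude of negative changes, 0 otherwise

    # Wilder's smoothing is an exponential moving average with alpha = 1 / window
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()

    rs = avg_gain / avg_loss         # division by zero gives inf, which maps to RSI = 100
    out["RSI"] = 100 - 100 / (1 + rs)
    return out


# Usage on a steadily rising made-up series: once enough data is available, RSI is 100
rising = pd.DataFrame({"close": [float(x) for x in range(1, 31)]})
print(calculate_rsi_wilder(rising)["RSI"].tail())
```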

In [18]:
def calculate_macd(data, short_window=12, long_window=26, signal_window=9):
    """
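    MACD is calculated from two exponential moving averages (EMAs) of the close price, by default the
    12-period and 26-period EMAs, with a 9-period EMA of the MACD as the signal line. On this dataset one
    period is one minute. Adds 'EMA12', 'EMA26', 'MACD' and 'Signal_Line' columns to 'data' in place
    (the column names stay 'EMA12'/'EMA26' even if other windows are passed).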
    """
    data['EMA12'] = data['close'].ewm(span=short_window, adjust=False).mean()
    data['EMA26'] = data['close'].ewm(span=long_window, adjust=False).mean()
    
    # MACD Line
    data['MACD'] = data['EMA12'] - data['EMA26']
    
    # Signal Line
    data['Signal_Line'] = data['MACD'].ewm(span=signal_window, adjust=False).mean()
    
    return data

In [19]:
def calculate_moving_averages(data, sma_window=20, ema_window=20):
    """
    Simple Moving Average (SMA) is the average price over a specified number of periods, 
    while Exponential Moving Average (EMA) gives more weight to recent prices.
    
    """
    # Simple Moving Average
    data['SMA'] = data['close'].rolling(window=sma_window).mean()
    
    # Exponential Moving Average
    data['EMA'] = data['close'].ewm(span=ema_window, adjust=False).mean()
    
    return data
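A rolling SMA needs `window` observations before it produces a value, while the EMA computed with `ewm(..., adjust=False)` is defined from the first row. With the default `sma_window=20`, the first 19 rows therefore have `SMA = NaN`. This is why the target function in section 2.2 drops 20 rows in total (19 warm-up rows plus the last row without a future price): 528,631 rows go in and 528,611 come out. A toy check with made-up prices:

```python
import pandas as pd

# 25 made-up close prices
close = pd.Series([float(x) for x in range(1, 26)])

sma = close.rolling(window=20).mean()          # needs 20 values: the first 19 are NaN
ema = close.ewm(span=20, adjust=False).mean()  # defined from the first row

print(sma.isna().sum())  # 19
print(ema.isna().sum())  # 0
```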

In [20]:
# Calculate RSI
btc_price_data_1_year = calculate_rsi(btc_price_data_1_year)
print(btc_price_data_1_year[['close', 'RSI']])

# Calculate MACD
btc_price_data_1_year = calculate_macd(btc_price_data_1_year)
print(btc_price_data_1_year[['close', 'MACD', 'Signal_Line']])

# Calculate moving averages
btc_price_data_1_year = calculate_moving_averages(btc_price_data_1_year)
print(btc_price_data_1_year[['close', 'SMA', 'EMA']])

           close        RSI
0       34667.88        NaN
1       34642.82   0.000000
2       34656.56  35.412371
3       34629.34  20.811875
4       34622.27  18.798741
...          ...        ...
528626  70248.97  15.731750
528627  70238.76  20.028025
528628  70233.24  14.761536
528629  70207.78  11.458672
528630  70197.83  11.677624

[528631 rows x 2 columns]
           close       MACD  Signal_Line
0       34667.88   0.000000     0.000000
1       34642.82  -1.999088    -0.399818
2       34656.56  -2.446476    -0.809149
3       34629.34  -4.940509    -1.635421
4       34622.27  -7.402210    -2.788779
...          ...        ...          ...
528626  70248.97 -47.477702   -35.505422
528627  70238.76 -50.163495   -38.437037
528628  70233.24 -52.136426   -41.176915
528629  70207.78 -55.119020   -43.965336
528630  70197.83 -57.621405   -46.696549

[528631 rows x 3 columns]
           close         SMA           EMA
0       34667.88         NaN  34667.880000
1       34642.82         NaN  34

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['RSI'] = rsi
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['EMA12'] = data['close'].ewm(span=short_window, adjust=False).mean()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['EMA26'] = data['close'].ewm(span=long_window, adjust=False).mean()
A value is trying to be set on a copy of
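
The warnings above appear because `btc_price_data_1_year` was created with `dropna()` in step 4, and the indicator functions then add columns to it in place; pandas flags that frame as possibly being a copy of a slice of the original. A common fix, sketched here on made-up data, is to take an explicit copy when dropping the NaN rows, e.g. `btc_price_data_1_year = btc_price_data_1_year.dropna().copy()`.

```python
import pandas as pd

raw = pd.DataFrame({"close": [1.0, None, 3.0, 4.0]})

# An explicit .copy() makes `clean` an independent DataFrame,
# so adding columns to it later does not trigger SettingWithCopyWarning
clean = raw.dropna().copy()
clean["SMA2"] = clean["close"].rolling(window=2).mean()

print(clean)
```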

### 2.2. Function for the Target Calculation

In [71]:
def create_target_variable_tech_indicators(data, price_threshold = 0.01):
    """
    Computes the 'target' variable from the input 'data' and 'price_threshold', combining the future price
    move with technical-indicator conditions. Returns a copy of 'data' with 'future_return', 'buy_signal',
    'sell_signal' and 'target' columns added; the input DataFrame is not modified.

    Parameters
    ----------
    data: a DataFrame with the time series data. It must contain the columns 'close', 'RSI', 'MACD',
          'Signal_Line' and 'SMA' (see section 2.1).
    
    price_threshold: minimum future return, in percent, for a row to be labelled 1 ('buy'). Because
                     'future_return' is multiplied by 100 below, 0.01 means a rise of more than 0.01%;
                     use 1.0 for a 1% rise. Adjust this threshold to your strategy.
    """
    # Create a copy of the DataFrame
    data_copy = data.copy(deep=True)
    
    # Compute the percentage change (in percent, hence the * 100) between the current close price and the close price in the next period.
    # This shows whether there is a significant increase or decrease.
    data_copy['future_return'] = ((data_copy['close'].shift(-1) - data_copy['close']) / data_copy['close']) * 100


    # Buy and sell crossover signals for the MACD (used inline below)
    # data_copy['buy_signal'] = (data_copy['MACD'] > data_copy['Signal_Line']) & (data_copy['MACD'].shift(1) <= data_copy['Signal_Line'].shift(1))
    # data_copy['sell_signal'] = (data_copy['MACD'] < data_copy['Signal_Line']) & (data_copy['MACD'].shift(1) >= data_copy['Signal_Line'].shift(1))

    # Buy and sell crossover signals for the SMA (used inline below)
    # data_copy['buy_signal'] = (data_copy['close'] > data_copy['SMA']) & (data_copy['close'].shift(1) <= data_copy['SMA'].shift(1))
    # data_copy['sell_signal'] = (data_copy['close'] < data_copy['SMA']) & (data_copy['close'].shift(1) >= data_copy['SMA'].shift(1))
    
    # Combine the future price move with technical-indicator conditions. Common rules of thumb:
    # - RSI: a Relative Strength Index (RSI) below 30 often indicates an oversold condition; a looser RSI < 40 is used here.
    # - MACD: MACD above its Signal Line can suggest an uptrend; here a fresh bullish crossover on this bar is required.
    # - SMA/EMA: a close above the SMA or EMA may indicate an upward trend; here a fresh upward crossover of the SMA is required.
    data_copy['buy_signal'] = (
        (data_copy['future_return'] > price_threshold) &
        #(data_copy['RSI'] < 30) &                     # Example condition for RSI
        (data_copy['RSI'] < 40) &                     # Example condition for RSI
        
        #(data_copy['MACD'] > data_copy['Signal_Line']) &      # Example condition for MACD
        ((data_copy['MACD'] > data_copy['Signal_Line']) & (data_copy['MACD'].shift(1) <= data_copy['Signal_Line'].shift(1))) & # buy_signal for MACD
        
        #(data_copy['close'] > data_copy['SMA']) &             # Price Crossover Signal: SMA
        #(data_copy['close'] > data_copy['EMA']) &             # Price Crossover Signal: EMA
        #(data_copy['EMA12'] > data_copy['EMA26']) &             # Moving Average Crossover Signal
        
        #(data_copy['close'] > data_copy['SMA']) & (data_copy['EMA12'] > data_copy['EMA26']) & # Combined Price and Moving Average Signal
        #(data_copy['EMA'].diff() > 0)  &                       # Moving Average Slope Signal

        ((data_copy['close'] > data_copy['SMA']) & (data_copy['close'].shift(1) <= data_copy['SMA'].shift(1)))
    )

    data_copy['sell_signal'] = (
        (data_copy['future_return'] < -price_threshold) &
        #(data_copy['RSI'] < 30) &                     # Example condition for RSI
        (data_copy['RSI'] > 60) &                     # Example condition for RSI
        #(data_copy['MACD'] > data_copy['Signal_Line']) &      # Example condition for MACD
        
        #(data_copy['close'] > data_copy['SMA']) &             # Price Crossover Signal: SMA
        # (data_copy['close'] > data_copy['EMA'])             # Price Crossover Signal: EMA
        #(data_copy['EMA12'] > data_copy['EMA26']) &             # Moving Average Crossover Signal
        #(data_copy['close'] < data_copy['SMA']) & (data_copy['EMA12'] < data_copy['EMA26']) & # Combined Price and Moving Average Signal
        #(data_copy['EMA'].diff() < 0)                        # Moving Average Slope Signal

        (data_copy['MACD'] < data_copy['Signal_Line']) & (data_copy['MACD'].shift(1) >= data_copy['Signal_Line'].shift(1)) & # MACD sell signal
        
        (data_copy['close'] < data_copy['SMA']) & (data_copy['close'].shift(1) >= data_copy['SMA'].shift(1)) # SMA sell signal
    )

    # Define the target as 1 (buy) if all buy conditions are met and 0 otherwise. Note that 0 means "no buy signal",
    # not "sell": sell signals are tracked separately in 'sell_signal'. Convert `buy_signal` to an integer for binary classification.
    data_copy['target'] = data_copy['buy_signal'].astype(int)

    # Drop rows with NaN values: the last row (no future close because of the shift) and the warm-up rows at the
    # start where rolling indicators such as the 20-period SMA are not yet defined.
    data_copy = data_copy.dropna()

    # Check the balance of 1s and 0s in our target variable to understand how many “buy” and “sell” signals we have.
    print(data_copy['target'].value_counts())
    print(data_copy['sell_signal'].value_counts())

    return data_copy

In [72]:
# Threshold for the price change used to label a row as 'buy'. Note that 'future_return' is in percent,
# so 0.01 means a rise of more than 0.01% over the next minute (a 1% rule would need 1.0).
threshold = 0.01

# Compute the 'target' variable
btc_price_data_1_year_target = create_target_variable_tech_indicators(btc_price_data_1_year, threshold)

target
0    528546
1        65
Name: count, dtype: int64
sell_signal
False    528558
True         53
Name: count, dtype: int64
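Only 65 of 528,611 rows get a buy label because all four buy conditions must hold on the same minute. To see which condition is the bottleneck, a diagnostic like the hypothetical `condition_counts` below counts how often each condition holds alone and combined. It mirrors the buy conditions of `create_target_variable_tech_indicators` (not the sell conditions) and could be applied to a frame that already has the indicator columns, such as `btc_price_data_1_year`; the example uses made-up values only.

```python
import pandas as pd


def condition_counts(df, price_threshold=0.01):
    """Hypothetical diagnostic: how often each buy condition holds, alone and combined."""
    macd_cross_up = (df["MACD"] > df["Signal_Line"]) & (df["MACD"].shift(1) <= df["Signal_Line"].shift(1))
    sma_cross_up = (df["close"] > df["SMA"]) & (df["close"].shift(1) <= df["SMA"].shift(1))
    conditions = {
        "future_return": df["future_return"] > price_threshold,
        "rsi_below_40": df["RSI"] < 40,
        "macd_cross_up": macd_cross_up,
        "sma_cross_up": sma_cross_up,
    }
    counts = {name: int(mask.sum()) for name, mask in conditions.items()}
    # A row is a buy only if every condition holds on that same row
    counts["all_combined"] = int(pd.concat(conditions, axis=1).all(axis=1).sum())
    return counts


# Made-up indicator values for four consecutive minutes
toy = pd.DataFrame({
    "close":         [10.0, 9.0, 11.0, 12.0],
    "SMA":           [10.0, 10.0, 10.0, 10.0],
    "MACD":          [-1.0, -1.0, 1.0, 2.0],
    "Signal_Line":   [0.0, 0.0, 0.0, 0.0],
    "RSI":           [50.0, 35.0, 35.0, 60.0],
    "future_return": [0.5, 0.5, 0.5, float("nan")],
})
print(condition_counts(toy))
```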


In [27]:
btc_price_data_1_year_target[btc_price_data_1_year_target.target == 1]

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target,RSI,EMA12,EMA26,MACD,Signal_Line,SMA,EMA,buy_signal,sell_signal
7181,2023-11-05 23:18:00,35238.06,35279.47,35262.22,35259.32,10.733954,2023-11-05,23:18:00,0.097478,1,36.899358,35218.110189,35149.827830,68.282359,79.731362,35219.5325,35181.030072,True,False
11319,2023-11-08 20:02:00,35652.94,35699.98,35657.70,35684.81,15.940414,2023-11-08,20:02:00,0.081912,1,36.891192,35660.282716,35656.946567,3.336150,5.123662,35674.1850,35659.120663,True,False
11616,2023-11-09 00:58:00,35857.19,35866.17,35862.68,35860.94,4.735804,2023-11-09,00:58:00,0.028415,1,39.153070,35852.545112,35830.023989,22.521123,26.041140,35853.1600,35840.338879,True,False
12175,2023-11-09 10:15:00,36811.70,36840.10,36831.91,36813.90,12.464662,2023-11-09,10:15:00,0.039958,1,39.916053,36810.177911,36798.532318,11.645594,10.110134,36810.7925,36803.441944,True,False
13712,2023-11-10 11:47:00,36992.67,37052.35,36992.67,37039.93,53.731210,2023-11-10,11:47:00,0.088877,1,35.050526,37016.877888,36994.002065,22.875823,37.365561,37039.1055,37007.649470,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
515147,2024-10-22 14:48:00,67307.16,67367.86,67333.74,67358.83,7.241707,2024-10-22,14:48:00,0.073635,1,39.298246,67365.929248,67322.658396,43.270851,54.591163,67337.4255,67343.176440,True,False
515314,2024-10-22 17:34:00,67270.00,67339.31,67304.81,67270.43,5.488279,2024-10-22,17:34:00,0.036747,1,33.323699,67286.253301,67233.138317,53.114984,63.956004,67269.1130,67256.970480,True,False
518157,2024-10-24 16:48:00,67545.35,67618.88,67562.14,67618.87,10.659939,2024-10-24,16:48:00,0.045978,1,36.232969,67587.872621,67578.758487,9.114133,15.621773,67616.8825,67583.392394,True,False
520561,2024-10-26 08:44:00,67026.90,67036.94,67030.84,67027.64,0.761557,2024-10-26,08:44:00,0.038805,1,35.731910,67028.618858,67020.405002,8.213856,8.714359,67027.0850,67023.998454,True,False


In [30]:
btc_price_data_1_year_target[btc_price_data_1_year_target.sell_signal == 1]

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target,RSI,EMA12,EMA26,MACD,Signal_Line,SMA,EMA,buy_signal,sell_signal
163545,2024-02-22 05:07:00,51463.0,51471.11,51466.11,51463.0,1.012325,2024-02-22,05:07:00,-0.039232,0,76.142923,51463.366625,51476.159435,-12.79281,-18.008761,51464.439,51469.641242,False,True
465676,2024-09-18 09:01:00,60174.84,60197.72,60175.85,60174.84,1.497534,2024-09-18,09:01:00,-0.074167,0,72.064179,60186.951898,60189.20795,-2.256052,-5.052039,60177.472,60186.540692,False,True


In [24]:
btc_price_data_1_year_target.head()

Unnamed: 0,timestamp,open,high,low,close,volume,date,time,future_return,target,RSI,EMA12,EMA26,MACD,Signal_Line,SMA,EMA,buy_signal
19,2023-11-01 00:19:00,34612.24,34634.46,34618.78,34631.66,2.974894,2023-11-01,00:19:00,0.038549,0,56.177122,34622.946322,34632.753989,-9.807666,-11.531193,34626.673,34628.595251,False
20,2023-11-01 00:20:00,34631.71,34661.72,34633.32,34645.01,4.406042,2023-11-01,00:20:00,-0.111127,0,55.560545,34626.340734,34633.661841,-7.321107,-10.689176,34625.5295,34630.15856,False
21,2023-11-01 00:21:00,34602.75,34651.54,34646.48,34606.51,14.372378,2023-11-01,00:21:00,-0.113794,0,34.866742,34623.289852,34631.650594,-8.360742,-10.223489,34623.714,34627.906317,False
22,2023-11-01 00:22:00,34560.23,34604.6,34604.6,34567.13,13.971646,2023-11-01,00:22:00,-0.048919,0,31.948538,34614.649875,34626.871291,-12.221416,-10.623074,34619.2425,34622.118096,False
23,2023-11-01 00:23:00,34543.03,34568.26,34567.04,34550.22,2.9565,2023-11-01,00:23:00,0.045933,0,33.177865,34604.737586,34621.193417,-16.455831,-11.789626,34615.2865,34615.270658,False


### 2.3. Calculate Target Variable - Explanation

With the additional technical indicators, you can refine the target variable to incorporate both price changes and indicator thresholds. Here’s a step-by-step guide to create the target variable based on these conditions:

1. **Choose the Prediction Horizon**:  
   Decide on the period over which to predict the price movement. For example, predict whether the price will increase over the next day or hour by comparing the current close price to the next period’s close price.

2. **Calculate Future Returns**:  
   Compute the percentage change between the current close price and the close price in the next period.

   ```python
   df['future_return'] = (df['close'].shift(-1) - df['close']) / df['close']
   ```

3. **Set a Threshold for Price Increase**:  
   Choose a threshold for what you consider a "buy" signal, like a 1% price increase (threshold = 0.01).

   ```python
   price_threshold = 0.01
   ```

4. **Define Conditions Based on Technical Indicators**:  
   Use the technical indicators to add more conditions to the target. Here are some typical conditions you might consider:

   - **RSI**: A Relative Strength Index (RSI) value below 30 often indicates an oversold condition, which might suggest a buying opportunity.
   - **MACD**: A positive MACD value (i.e., MACD > Signal Line) can suggest an uptrend.
   - **SMA/EMA**: If the current price is above the SMA or EMA, it may indicate an upward trend.

   Adjust the conditions according to your strategy:

   ```python
   df['buy_signal'] = (
       (df['future_return'] > price_threshold) &
       (df['RSI'] < 30) &                     # Example condition for RSI
       (df['MACD'] > df['Signal_Line']) &      # Example condition for MACD
       (df['close'] > df['SMA'])               # Example condition for SMA
   )
   ```

5. **Create the Target Variable**:  
   Define the target as 1 (buy) if all conditions are met, and 0 if they are not. Note that 0 means “no buy signal” rather than “sell”. Convert `buy_signal` to an integer for binary classification.

   ```python
   df['target'] = df['buy_signal'].astype(int)
   ```

6. **Remove Any NaN Values**:  
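   The last row in your dataset will have a NaN value for `future_return` due to the shift, and the first rows will have NaN values for rolling indicators such as the SMA until the window is filled. Drop these rows to clean up the dataset.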

   ```python
   df = df.dropna()
   ```

7. **Verify the Target Distribution**:  
   Check the balance of 1s and 0s in your target variable.

   ```python
   print(df['target'].value_counts())
   ```

Now you’ll have a `target` column that combines price movement and technical indicator thresholds, ready for use in your XGBoost model.
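Requiring the MACD and SMA crossovers to happen on exactly the same minute, as the notebook's function does, leaves very few positive labels. One possible relaxation, shown here as a hypothetical helper rather than the notebook's method, is to accept a crossover that occurred at any time within the last `lookback` bars:

```python
import pandas as pd


def crossed_above_recently(fast, slow, lookback=5):
    """Hypothetical helper: True where `fast` crossed above `slow` within the last `lookback` bars."""
    cross = (fast > slow) & (fast.shift(1) <= slow.shift(1))
    # A rolling max over the 0/1 crossover flags marks every bar within `lookback` bars of a crossover
    return cross.astype(int).rolling(window=lookback, min_periods=1).max().astype(bool)


# Made-up series: `fast` crosses above `slow` at index 2
fast = pd.Series([0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0])
slow = pd.Series([1.0] * 7)
print(crossed_above_recently(fast, slow, lookback=3).tolist())
# [False, False, True, True, True, False, False]
```

For example, `crossed_above_recently(df['MACD'], df['Signal_Line']) & crossed_above_recently(df['close'], df['SMA'])` would allow the two crossovers to occur a few minutes apart; the `lookback` value is an assumption to tune.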

## 3. Create buy and sell targets using the standard MACD values

To create buy and sell targets using the standard MACD values (Fast EMA of 12, Slow EMA of 26, Signal Line of 9), you can set up conditions based on the MACD line crossing the Signal Line. Here’s how to set these conditions:

1. **Define the MACD Line and Signal Line**:
   - **MACD Line**: The difference between the 12-period EMA and the 26-period EMA.
   - **Signal Line**: The 9-period EMA of the MACD Line.

2. **Set Buy and Sell Conditions**:
   - A **buy signal** occurs when the MACD Line crosses **above** the Signal Line, indicating a potential upward trend.
   - A **sell signal** occurs when the MACD Line crosses **below** the Signal Line, indicating a potential downward trend.

### Step-by-Step Code Example

Assuming you have a DataFrame `df` with a `close` column, here’s how to calculate the MACD and create buy/sell signals:

```python
# Calculate the Fast EMA (12) and Slow EMA (26)
df['EMA12'] = df['close'].ewm(span=12, adjust=False).mean()
df['EMA26'] = df['close'].ewm(span=26, adjust=False).mean()

# Calculate the MACD Line and Signal Line
df['MACD'] = df['EMA12'] - df['EMA26']
df['Signal_Line'] = df['MACD'].ewm(span=9, adjust=False).mean()

# Define Buy and Sell Signals
df['buy_signal'] = (df['MACD'] > df['Signal_Line']) & (df['MACD'].shift(1) <= df['Signal_Line'].shift(1))
df['sell_signal'] = (df['MACD'] < df['Signal_Line']) & (df['MACD'].shift(1) >= df['Signal_Line'].shift(1))
```

### Explanation of Buy/Sell Conditions
- **`buy_signal`**: The condition checks if the MACD line has crossed above the Signal Line in the current period but was below it in the previous period. This indicates a bullish crossover.
- **`sell_signal`**: The condition checks if the MACD line has crossed below the Signal Line in the current period but was above it in the previous period, indicating a bearish crossover.

### Example Usage
After applying this code, your DataFrame will have `True` values in the `buy_signal` column when there’s a buy signal, and `True` values in the `sell_signal` column when there’s a sell signal. You can use these columns to trigger buy/sell actions in your strategy.
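The same crossover conditions can be written with the MACD histogram (`MACD - Signal_Line`): a bullish crossover is the bar where the histogram turns positive, and a bearish one is where it turns negative. The sketch below uses a hypothetical `macd_crossovers` helper with the same standard 12/26/9 settings and produces the same `buy_signal` and `sell_signal` as the code above:

```python
import pandas as pd


def macd_crossovers(close, fast=12, slow=26, signal=9):
    """Hypothetical helper: MACD, Signal Line, histogram and crossover signals for a close-price Series."""
    macd = close.ewm(span=fast, adjust=False).mean() - close.ewm(span=slow, adjust=False).mean()
    signal_line = macd.ewm(span=signal, adjust=False).mean()
    hist = macd - signal_line  # MACD histogram

    return pd.DataFrame({
        "MACD": macd,
        "Signal_Line": signal_line,
        "hist": hist,
        # Histogram turns positive: MACD crosses above the Signal Line
        "buy_signal": (hist > 0) & (hist.shift(1) <= 0),
        # Histogram turns negative: MACD crosses below the Signal Line
        "sell_signal": (hist < 0) & (hist.shift(1) >= 0),
    })


# Made-up V-shaped price path: falls from 50 to 21, then rises from 20 to 59
close = pd.Series([float(x) for x in list(range(50, 20, -1)) + list(range(20, 60))])
res = macd_crossovers(close)
print(res[res["buy_signal"] | res["sell_signal"]])
```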

## 4. Set buy and sell targets using a price crossover signal: Simple Moving Average (SMA) or Exponential Moving Average (EMA)

To set buy and sell targets using a Price Crossover Signal, you can base your conditions on when the price crosses above or below a chosen moving average (like the Simple Moving Average (SMA) or Exponential Moving Average (EMA)). Here’s how to set these conditions for a buy and sell signal:

### 1. Define the Moving Average (e.g., 50-period SMA) and the Crossover Conditions
   - A **buy signal** occurs when the price crosses **above** the moving average, indicating a potential upward trend.
   - A **sell signal** occurs when the price crosses **below** the moving average, suggesting a potential downward trend.

### Step-by-Step Code Example

Assuming you have a DataFrame `df` with a `close` column, here’s how to calculate the moving average and create buy/sell signals:

```python
# Calculate the 50-period Simple Moving Average (SMA)
df['SMA50'] = df['close'].rolling(window=50).mean()

# Define Buy and Sell Signals
df['buy_signal'] = (df['close'] > df['SMA50']) & (df['close'].shift(1) <= df['SMA50'].shift(1))
df['sell_signal'] = (df['close'] < df['SMA50']) & (df['close'].shift(1) >= df['SMA50'].shift(1))
```

### Explanation of Buy/Sell Conditions
- **`buy_signal`**: This condition checks if the current close price is above the SMA (indicating a bullish crossover) and if the previous close price was at or below the SMA, confirming that the crossover has just happened.
- **`sell_signal`**: This condition checks if the current close price is below the SMA (indicating a bearish crossover) and if the previous close price was at or above the SMA, confirming the bearish crossover.

### Example Usage
After applying this code, your DataFrame will have `True` values in the `buy_signal` column when there’s a buy signal, and `True` values in the `sell_signal` column when there’s a sell signal. You can use these columns to trigger buy/sell actions in your strategy. 

### Optional Variations
You can adjust the moving average period (e.g., 20, 100, or 200) depending on the timeframe and strategy. Shorter periods capture more frequent signals, while longer periods focus on major trend changes. 
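To try different periods, or the EMA mentioned at the start of this section, the crossover logic can be parameterized. The helper below is a hypothetical sketch: `window` sets the moving-average period and `kind` selects `'sma'` or `'ema'`.

```python
import pandas as pd


def price_crossover_signals(df, window=50, kind="sma"):
    """Hypothetical helper: price/moving-average crossover signals with a configurable period and MA type."""
    if kind == "sma":
        ma = df["close"].rolling(window=window).mean()
    elif kind == "ema":
        ma = df["close"].ewm(span=window, adjust=False).mean()
    else:
        raise ValueError("kind must be 'sma' or 'ema'")

    out = df.copy()
    out["MA"] = ma
    out["buy_signal"] = (out["close"] > ma) & (out["close"].shift(1) <= ma.shift(1))
    out["sell_signal"] = (out["close"] < ma) & (out["close"].shift(1) >= ma.shift(1))
    return out


# Made-up prices with a 3-period SMA: the price crosses above at index 3 and below at index 5
toy = pd.DataFrame({"close": [10.0, 10.0, 10.0, 12.0, 12.0, 8.0]})
signals = price_crossover_signals(toy, window=3, kind="sma")
print(signals)
```

Rows where the SMA is still NaN (the first `window - 1` rows) never produce a signal, because comparisons with NaN evaluate to `False`.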

