## Objective: 

**Predicting BTCUSDT Price Trend Using MACD**

**Purpose**: 
Instead of predicting the exact BTCUSDT price one hour ahead, this experiment predicts the general movement: is the price trending up or down, and how strongly? We use MACD (Moving Average Convergence Divergence), a trend-following indicator, as the measure of trend. MACD lags price, so by the time it signals a trend it is often too late to act. The goal is therefore a model that predicts the next MACD state from current data, giving us the signal ahead of time.

**What Went Wrong?**: 
In the previous experiment we tried to predict the exact BTCUSDT price one hour ahead. The model's predictions were close to random guesses, especially for the up (+1) and down (-1) classes, so the approach did not work well.

**Data source**: 
- Binance
- BTCUSDT hourly candles (open, high, low, close, volume) stored as JSON.

**Prediction Logic**:
- **+2 (Strong Upward Trend)**: MACD is positive and higher than its previous value.
- **+1 (Weak Upward Trend)**: MACD is positive but not higher than its previous value.
- **-2 (Strong Downward Trend)**: MACD is negative and lower than its previous value.
- **-1 (Weak Downward Trend)**: MACD is negative but not lower than its previous value.
- **0 (No Clear Trend)**: none of the above. The Experiment 1 code applies these rules to the MACD histogram expressed as a percentage of price, and labels values within ±0.1 of zero as 0.
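
The rules above can be sketched as a small labelling function. This is a minimal illustration on made-up numbers, not the notebook's implementation: the function name `label_trend`, the `threshold` parameter (mirroring the role of `target_threshold` in the Experiment 1 code), and the choice to label ties and the first bar as shown in the comments are all assumptions.

```python
import numpy as np
import pandas as pd

def label_trend(value: pd.Series, threshold: float = 0.0) -> pd.Series:
    """Label each bar from -2 to +2 using the sign and direction of `value`.

    `value` is any MACD-style series. `threshold` is a hypothetical dead band:
    values within +/- threshold of zero get label 0.
    """
    prev = value.shift(1)
    conditions = [
        (value > threshold) & (value > prev),    # positive and rising   -> +2
        (value > threshold) & (value <= prev),   # positive, not rising  -> +1
        (value < -threshold) & (value >= prev),  # negative, not falling -> -1
        (value < -threshold) & (value < prev),   # negative and falling  -> -2
    ]
    # Bars that match no rule (dead band, or no previous value) default to 0
    labels = np.select(conditions, [2, 1, -1, -2], default=0)
    return pd.Series(labels, index=value.index)

toy = pd.Series([0.5, 0.8, 0.6, -0.2, -0.5, -0.3])  # made-up MACD values
print(label_trend(toy).tolist())  # [0, 2, 1, -2, -2, -1]
```

The first bar gets 0 only because it has no previous value to compare against; a real pipeline might prefer to drop it instead.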


### Target and features

In [15]:
import numpy as np
import pandas as pd
import talib

# Reused from the data exploration step: remove volume outliers using the IQR rule
def remove_outlier(df, iqr_threshold = 5):
    
    # Calculate the first quartile (25th percentile) and third quartile (75th percentile)
    q1 = df['volume'].quantile(0.25)
    q3 = df['volume'].quantile(0.75)

    # Calculate the interquartile range (IQR)
    iqr = q3 - q1

    # Define lower and upper bounds for outliers
    lower_bound = q1 - iqr_threshold * iqr
    upper_bound = q3 + iqr_threshold * iqr

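    # Volume cannot be negative, so clamp the lower bound at 0
    lower_bound = max(lower_bound, 0)

    # Keep only rows strictly inside the bounds
    # (when the lower bound is clamped to 0, zero-volume rows are dropped too)
    df = df[(df['volume'] > lower_bound) & (df['volume'] < upper_bound)]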
    
    return df

def read_hist_data(name = 'BTC', timeframe = '1h'):
    file_path = f"./data/{name}_USDT-{timeframe}.json"
    df = pd.read_json(file_path)

    # set column names
    df.columns = ['datetime', 'open', 'high', 'low', 'close', 'volume']

    # convert unix timestamp to datetime
    df['datetime'] = pd.to_datetime(df['datetime'], unit='ms')

    # change datetime to index
    df.set_index('datetime', inplace=True)
    
    df = remove_outlier(df)
    return df
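
To see what the IQR filter in `remove_outlier` does, here is a small sketch on a made-up volume column. The numbers are invented for illustration; only the rule (quartiles, a multiplier of 5, lower bound clamped at 0, strict inequalities) follows the function above.

```python
import pandas as pd

# Made-up volumes with one obvious spike
toy = pd.DataFrame({"volume": [10.0, 12.0, 11.0, 13.0, 12.0, 500.0]})

q1 = toy["volume"].quantile(0.25)
q3 = toy["volume"].quantile(0.75)
iqr = q3 - q1

# Same bounds as remove_outlier with iqr_threshold=5
lower = max(q1 - 5 * iqr, 0)
upper = q3 + 5 * iqr

kept = toy[(toy["volume"] > lower) & (toy["volume"] < upper)]
print(f"bounds: ({lower}, {upper}), kept {len(kept)} of {len(toy)} rows")
```

With a multiplier of 5 the filter is deliberately loose: it only removes extreme spikes such as the 500.0 row here and leaves ordinary variation untouched.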

### Explore MACD

In [16]:
df = read_hist_data(name = 'BTC', timeframe = '1h')

# Compute MACD on the average of the four OHLC prices to reduce noise
df['ohlc'] = (df['open'] + df['high'] + df['low'] + df['close']) / 4

# MACD with the standard 12/26/9 periods
df['macd'], df['macd_signal'], df['macd_hist'] = talib.MACD(df['ohlc'], fastperiod=12, slowperiod=26, signalperiod=9)

# Express the histogram as a percentage of price so levels are comparable across price regimes
df['macd_hist_ref'] = df['macd_hist'] / df['ohlc'] * 100

df.describe()


| | open | high | low | close | volume | ohlc | macd | macd_signal | macd_hist | macd_hist_ref |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 22915.0 | 22915.0 | 22915.0 | 22915.0 | 22915.0 | 22915.0 | 22882.0 | 22882.0 | 22882.0 | 22882.0 |
| mean | 35117.336473 | 35297.705123 | 34928.163154 | 35117.620848 | 4477.749672 | 35115.2064 | -0.994906 | -0.986205 | -0.008701 | -2e-05 |
| std | 13125.131469 | 13210.580569 | 13033.19825 | 13125.379765 | 4697.700455 | 13122.679589 | 319.613944 | 300.911657 | 96.35576 | 0.247338 |
| min | 15648.23 | 15769.99 | 15476.0 | 15649.52 | 5.887034 | 15690.7575 | -2381.545833 | -2078.161228 | -708.712928 | -1.495016 |
| 25% | 23415.51 | 23483.855 | 23341.975 | 23414.47 | 1398.628235 | 23412.66875 | -118.995339 | -116.99637 | -38.194848 | -0.11429 |
| 50% | 32099.97 | 32345.0 | 31800.0 | 32099.98 | 2658.617538 | 32079.825 | -0.341699 | -1.304715 | -0.434578 | -0.001665 |
| 75% | 45197.645 | 45485.53 | 44869.145 | 45197.975 | 5828.03294 | 45213.87375 | 122.724489 | 119.800609 | 36.087915 | 0.111434 |
| max | 68635.12 | 69000.0 | 68451.19 | 68633.69 | 28721.89375 | 68596.2725 | 1869.44532 | 1731.084977 | 616.509597 | 1.447991 |
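
For readers without TA-Lib installed, the quantities behind `macd`, `macd_signal`, `macd_hist` and `macd_hist_ref` can be approximated with pandas alone. This is a sketch under assumptions, not a drop-in replacement: the function name `macd_pandas` is hypothetical, the price series is synthetic, and TA-Lib handles the warm-up period differently (the table above shows 33 fewer non-null MACD rows than price rows), so early values will not match `talib.MACD`.

```python
import numpy as np
import pandas as pd

def macd_pandas(price: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Approximate MACD using pandas exponential moving averages."""
    ema_fast = price.ewm(span=fast, adjust=False).mean()
    ema_slow = price.ewm(span=slow, adjust=False).mean()
    macd = ema_fast - ema_slow                                # MACD line
    macd_signal = macd.ewm(span=signal, adjust=False).mean()  # signal line
    macd_hist = macd - macd_signal                            # histogram
    return pd.DataFrame({
        "macd": macd,
        "macd_signal": macd_signal,
        "macd_hist": macd_hist,
        # Histogram as a percentage of price, as in macd_hist_ref above
        "macd_hist_ref": macd_hist / price * 100,
    })

# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(0)
price = pd.Series(30000 + rng.normal(0, 50, 200).cumsum())
out = macd_pandas(price)
print(out.tail(3))
```

Dividing the histogram by price matters here because BTCUSDT ranges from about 15,600 to 69,000 in this dataset, so the same absolute histogram value means something quite different at each end of that range.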


### Experiment 1: Create features from indicators

Let's test the same set of features as in the previous experiment, but this time with the MACD-based trend label as the target.
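
Because the target is the label of the *next* period, each row's features need to be paired with a label taken one step ahead. The toy frame below illustrates that alignment; the column name `label_now` and all values are invented for the illustration.

```python
import pandas as pd

# Made-up closes and per-bar trend labels
toy = pd.DataFrame({
    "close": [100.0, 101.0, 103.0, 102.0],
    "label_now": [0, 2, 2, 1],
})

target_shift = 1
# Pull the label of the next bar onto the current row
toy["target"] = toy["label_now"].shift(-target_shift)

# The last row has no future bar, so its target is NaN and is dropped
toy = toy.dropna(subset=["target"])
print(toy)
```

Each remaining row now holds information available at time t alongside the label observed at t+1, which is the pairing a predictive model needs.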

In [34]:

# Build the target: the MACD-based trend label of the next period
def get_target_next_macd(df, target_shift=1):

    # Dead band around zero, in percent of price; inside it the label is 0
    target_threshold = 0.1

    # OHLC average
    df['ohlc'] = (df['open'] + df['high'] + df['low'] + df['close']) / 4

    # MACD with the standard 12/26/9 periods
    macd, macdsignal, macdhist = talib.MACD(df['ohlc'], fastperiod=12, slowperiod=26, signalperiod=9)

    # Express the histogram as a percentage of price
    df['macdhist'] = macdhist / df['ohlc'] * 100

    # Conditions are checked in order; the first match wins
    conditions = [
        (df['macdhist'].isnull()),  # warm-up rows without a MACD value
        (df['macdhist'] > target_threshold) & (df['macdhist'] > df['macdhist'].shift(1)),
        (df['macdhist'] > target_threshold) & (df['macdhist'] <= df['macdhist'].shift(1)),
        (df['macdhist'] < -target_threshold) & (df['macdhist'] >= df['macdhist'].shift(1)),
        (df['macdhist'] < -target_threshold) & (df['macdhist'] < df['macdhist'].shift(1))
    ]
    values = [np.nan, 2, 1, -1, -2]
    # Rows inside the dead band (or with no previous value to compare) get 0
    df['target'] = np.select(conditions, values, default=0)

    # Shift the label back so each row holds the label of the next period
    df['target'] = df['target'].shift(-target_shift)

    # drop unused columns
    drop_columns = ['ohlc', 'macdhist']
    df.drop(drop_columns, axis=1, inplace=True)
    
    return df

def get_features_v1(df):
    df = df.copy()

    # List of periods
    periods = [5, 10, 20, 40, 80]
    inputs = ['open', 'high', 'low', 'close', 'volume']

    # Log-transform volume (strictly positive here, since remove_outlier keeps only volume > 0)
    df['volume'] = np.log(df['volume'])

    # Loop over periods and input columns
    for period in periods:
        for col in inputs:  # named `col` to avoid shadowing the built-in `input`

            # Single-input indicators

            # Percentage change over `period` bars
            df.loc[:, f'{col}_pct_{period}'] = df[col].pct_change(periods=period)

            # EMA
            df.loc[:, f'{col}_ema_{period}'] = talib.EMA(df[col].values, timeperiod=period)

            # RSI
            df.loc[:, f'{col}_rsi_{period}'] = talib.RSI(df[col].values, timeperiod=period)

            # TRIX
            df.loc[:, f'{col}_trix_{period}'] = talib.TRIX(df[col].values, timeperiod=period)

            # Standard deviation
            df.loc[:, f'{col}_sd_{period}'] = talib.STDDEV(df[col].values, timeperiod=period)

            # Rate of change
            df.loc[:, f'{col}_roc_{period}'] = talib.ROC(df[col].values, timeperiod=period)

            # Variance
            df.loc[:, f'{col}_var_{period}'] = talib.VAR(df[col].values, timeperiod=period)


        # Copy to defragment the DataFrame after adding many columns
        df = df.copy()

        # Multi-input indicators
        # ATR
        atr = talib.ATR(df['high'].values, df['low'].values, df['close'].values, timeperiod=period)
        df.loc[:, f'price_atr_{period}'] = atr

        # Price interaction Features:
        df.loc[:, f'price_interact_{period}'] = df[f'close_pct_{period}'] * df[f'price_atr_{period}']

        # Price volume interaction:
        df.loc[:, f'pv_interact_{period}'] = df[f'close_pct_{period}'] * df[f'volume_pct_{period}']

        # ADX
        df.loc[:, f'price_adx_{period}'] = talib.ADX(df['high'].values, df['low'].values, df['close'].values, timeperiod=period)

        # MFI
        df.loc[:, f'mfi_{period}'] = talib.MFI(df['high'].values, df['low'].values, df['close'].values, df['volume'].values, timeperiod=period)

        # DX
        df.loc[:, f'price_dx_{period}'] = talib.DX(df['high'].values, df['low'].values, df['close'].values, timeperiod=period)

    df = df.copy()
    
    return df

df = read_hist_data()
df = get_target_next_macd(df)
df = get_features_v1(df)
df = df.dropna()

# print(df.head(5))
# print(df.tail(5))

# Count target values
print("Count", df['target'].value_counts())

print("Shape", df.shape)

df.describe()

Count target
 0.0    10358
-2.0     3265
 2.0     3130
 1.0     2989
-1.0     2934
Name: count, dtype: int64
Shape (22676, 211)
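
The counts above show that class 0 dominates. A quick calculation of the class shares, using only the printed counts, gives a reference point for later accuracy figures: a model that always predicts 0 would already be right on roughly 46% of rows.

```python
import pandas as pd

# Target counts copied from the output above
counts = pd.Series({0: 10358, -2: 3265, 2: 3130, 1: 2989, -1: 2934})

shares = (counts / counts.sum()).round(3)
print(shares)

# Accuracy of a trivial model that always predicts the most frequent class
majority_baseline = shares.max()
print("Majority-class baseline accuracy:", majority_baseline)  # 0.457
```

The four trend classes are close to balanced among themselves (about 13% to 14% each), so the imbalance is essentially between "no clear trend" and everything else.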


| | open | high | low | close | volume | target | open_pct_5 | open_ema_5 | open_rsi_5 | open_trix_5 | ... | volume_trix_80 | volume_sd_80 | volume_roc_80 | volume_var_80 | price_atr_80 | price_interact_80 | pv_interact_80 | price_adx_80 | mfi_80 | price_dx_80 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | ... | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 | 22676.0 |
| mean | 35119.474608 | 35297.924274 | 34932.766292 | 35119.358703 | 7.956264 | -0.009481 | 4.8e-05 | 35120.544755 | 50.107491 | -0.001619 | ... | -0.000756 | 0.507078 | 0.527117 | 0.271079 | 369.674408 | -1.122671 | -0.000306 | 13.488741 | 50.222947 | 13.596653 |
| std | 13187.442326 | 13273.257147 | 13095.169048 | 13187.740972 | 0.947552 | 1.178659 | 0.016292 | 13183.542564 | 18.947606 | 0.200151 | ... | 0.029637 | 0.118117 | 11.124851 | 0.131423 | 240.670289 | 35.640334 | 0.007603 | 6.115836 | 5.729443 | 10.640867 |
| min | 15648.23 | 15769.99 | 15476.0 | 15649.52 | 1.772752 | -2.0 | -0.139675 | 15735.204649 | 1.12196 | -1.376177 | ... | -0.142974 | 0.27123 | -77.445199 | 0.073566 | 37.364503 | -317.718029 | -0.097869 | 3.709515 | 31.638686 | 0.000563 |
| 25% | 23358.705 | 23439.325 | 23269.34 | 23355.08 | 7.235698 | -1.0 | -0.006021 | 23365.200187 | 36.863903 | -0.077867 | ... | -0.018797 | 0.415276 | -6.567653 | 0.172454 | 162.803224 | -9.2421 | -0.00189 | 9.130853 | 46.185918 | 5.34382 |
| 50% | 31883.545 | 32163.28 | 31671.145 | 31881.755 | 7.877857 | 0.0 | 5e-06 | 31941.809892 | 50.137025 | -0.000757 | ... | -0.001035 | 0.483974 | -0.101862 | 0.234231 | 315.535 | -0.103217 | -5e-06 | 11.921938 | 50.032979 | 11.197398 |
| 75% | 45445.78 | 45776.395 | 45150.0 | 45445.79 | 8.670391 | 1.0 | 0.006117 | 45479.773576 | 63.247994 | 0.079269 | ... | 0.016359 | 0.586128 | 6.872177 | 0.343546 | 538.664388 | 10.468266 | 0.001707 | 16.205516 | 54.030398 | 19.27604 |
| max | 68635.12 | 69000.0 | 68451.19 | 68633.69 | 10.265415 | 2.0 | 0.185445 | 68225.516624 | 98.827914 | 1.429371 | ... | 0.124158 | 0.990415 | 330.676978 | 0.980922 | 1290.661348 | 227.240465 | 0.309765 | 40.244988 | 69.827068 | 74.174616 |
