<a id="100"></a>
**HOME**

**Main Idea:**

Binary classification in trading predicts whether the market will **move up** or **move down** within a specific timeframe (here, one daily candle), using only OHLCV (open, high, low, close, volume) data. Framing the problem this way lets a machine learning model turn raw price data into a single up/down signal, which simplifies decision-making and is intended to improve trading efficiency in volatile markets.


**References:**

* [Evaluating Machine Learning Classification for Financial Trading: An Empirical Approach](https://jfin-swufe.springeropen.com/articles/10.1186/s40854-020-00217-x)
* [Trading via Selective Classification](https://arxiv.org/pdf/2110.14914v1)
* [Forecasting and trading cryptocurrencies with machine learning under changing market conditions](https://jfin-swufe.springeropen.com/articles/10.1186/s40854-020-00217-x)

**Content:**

* [**Import Dataset**](#1)
* [**Data Preparation**](#2)
* [**Modeling and Evaluation**](#3)
* [**Modeling All Data**](#4)
* [**Today's Prediction**](#5)

> **Prev Green Candle: Close2Close**

____

<a id="1"></a>

**Import Dataset**

In [1]:
symbol='BTCUSDT'

In [2]:
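import os
import time

import pandas as pd
from binance.client import Client

# Initialize the Binance client.
# Credentials are read from environment variables instead of being hardcoded
# (the variable names below are an assumption; use whatever your setup defines).
# Public market-data endpoints such as klines generally do not require keys.
api_key = os.environ.get("BINANCE_API_KEY")
api_secret = os.environ.get("BINANCE_API_SECRET")
client = Client(api_key, api_secret)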

def fetch_ohlcv_batch(client, symbol, interval, start_time, limit=1000):
    """
    Fetch a batch of OHLCV data from Binance.
    """
    try:
        candles = client.get_klines(
            symbol=symbol,
            interval=interval,
            startTime=start_time,
            limit=limit
        )
        # Transform data into desired format
        ohlcv = [
            [int(c[0]), float(c[1]), float(c[2]), float(c[3]), float(c[4]), float(c[5])]
            for c in candles
        ]
        return ohlcv
    except Exception as e:
        print(f"Error fetching data: {e}")
        return None

def fetch_historical_ohlcv(client, symbol, interval, start_time, limit=1000):
    """
    Fetch historical OHLCV data in batches from Binance.
    """
    all_data = []
    while True:
        data = fetch_ohlcv_batch(client, symbol, interval, start_time, limit)
        if data:
            # Append data to all_data
            all_data.extend(data)
            # Update `start_time` to the timestamp of the last fetched data point + 1 millisecond
            start_time = data[-1][0] + 1
            print(f"Fetched {len(data)} data points. Total so far: {len(all_data)}")
        else:
            print("No more data to fetch or an error occurred.")
            break

        # If the batch size is less than the limit, it means we reached the end of available data
        if len(data) < limit:
            print("Reached the end of available data.")
            break

        # To avoid rate limit issues, wait for a short while
        time.sleep(1)

    # Convert data to DataFrame
    df = pd.DataFrame(all_data, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
    return df

# Usage example
if __name__ == "__main__":
    # Define parameters
    # symbol = 'BTCUSDT'        # Symbol to fetch (without '/')
    interval = Client.KLINE_INTERVAL_1DAY # Timeframe ('1m', '5m', '1h', '1d', etc.)
    start_time = int(pd.Timestamp("2007-01-01").timestamp() * 1000)  # Start date in milliseconds
    limit = 1000              # Max data points per batch

    # Fetch historical data
    df = fetch_historical_ohlcv(client, symbol, interval, start_time, limit)
    print(f"Total fetched data points: {len(df)}")
    print(df.head())

Fetched 1000 data points. Total so far: 1000
Fetched 1000 data points. Total so far: 2000
Fetched 704 data points. Total so far: 2704
Reached the end of available data.
Total fetched data points: 2704
   timestamp     open     high      low    close       volume
0 2017-08-17  4261.48  4485.39  4200.74  4285.08   795.150377
1 2017-08-18  4285.08  4371.52  3938.77  4108.37  1199.888264
2 2017-08-19  4108.37  4184.69  3850.00  4139.98   381.309763
3 2017-08-20  4120.98  4211.08  4032.62  4086.29   467.083022
4 2017-08-21  4069.13  4119.62  3911.79  4016.00   691.743060


In [3]:
df.head()

Unnamed: 0,timestamp,open,high,low,close,volume
0,2017-08-17,4261.48,4485.39,4200.74,4285.08,795.150377
1,2017-08-18,4285.08,4371.52,3938.77,4108.37,1199.888264
2,2017-08-19,4108.37,4184.69,3850.0,4139.98,381.309763
3,2017-08-20,4120.98,4211.08,4032.62,4086.29,467.083022
4,2017-08-21,4069.13,4119.62,3911.79,4016.0,691.74306


In [4]:
df.tail()

Unnamed: 0,timestamp,open,high,low,close,volume
2699,2025-01-06,98363.61,102480.0,97920.0,102235.6,25263.43375
2700,2025-01-07,102235.6,102724.38,96181.81,96954.61,32059.87537
2701,2025-01-08,96954.6,97268.65,92500.9,95060.61,33704.67894
2702,2025-01-09,95060.61,95382.32,91203.67,92552.49,34544.83685
2703,2025-01-10,92552.49,93726.87,92431.73,93526.12,2667.8908


<a id="id"></a>
[**Back to HOME**](#100)

<a id="2"></a>

**Data Preparation**

In [5]:
# Drop the last row: the most recent daily candle is still forming
df = df.iloc[:-1]

In [6]:
df.tail()

Unnamed: 0,timestamp,open,high,low,close,volume
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.63723
2699,2025-01-06,98363.61,102480.0,97920.0,102235.6,25263.43375
2700,2025-01-07,102235.6,102724.38,96181.81,96954.61,32059.87537
2701,2025-01-08,96954.6,97268.65,92500.9,95060.61,33704.67894
2702,2025-01-09,95060.61,95382.32,91203.67,92552.49,34544.83685


In [7]:
df.columns

Index(['timestamp', 'open', 'high', 'low', 'close', 'volume'], dtype='object')

In [8]:
df_close2close=df.copy()

In [9]:
df_close2close['prev_close'] = df['close'].shift(1)

In [10]:
df_close2close

Unnamed: 0,timestamp,open,high,low,close,volume,prev_close
0,2017-08-17,4261.48,4485.39,4200.74,4285.08,795.150377,
1,2017-08-18,4285.08,4371.52,3938.77,4108.37,1199.888264,4285.08
2,2017-08-19,4108.37,4184.69,3850.00,4139.98,381.309763,4108.37
3,2017-08-20,4120.98,4211.08,4032.62,4086.29,467.083022,4139.98
4,2017-08-21,4069.13,4119.62,3911.79,4016.00,691.743060,4086.29
...,...,...,...,...,...,...,...
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.637230,98220.50
2699,2025-01-06,98363.61,102480.00,97920.00,102235.60,25263.433750,98363.61
2700,2025-01-07,102235.60,102724.38,96181.81,96954.61,32059.875370,102235.60
2701,2025-01-08,96954.60,97268.65,92500.90,95060.61,33704.678940,96954.61


In [11]:
# Drop rows with any NaN values
df_close2close.dropna(inplace=True)

In [12]:
df_close2close

Unnamed: 0,timestamp,open,high,low,close,volume,prev_close
1,2017-08-18,4285.08,4371.52,3938.77,4108.37,1199.888264,4285.08
2,2017-08-19,4108.37,4184.69,3850.00,4139.98,381.309763,4108.37
3,2017-08-20,4120.98,4211.08,4032.62,4086.29,467.083022,4139.98
4,2017-08-21,4069.13,4119.62,3911.79,4016.00,691.743060,4086.29
5,2017-08-22,4016.00,4104.82,3400.00,4040.00,966.684858,4016.00
...,...,...,...,...,...,...,...
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.637230,98220.50
2699,2025-01-06,98363.61,102480.00,97920.00,102235.60,25263.433750,98363.61
2700,2025-01-07,102235.60,102724.38,96181.81,96954.61,32059.875370,102235.60
2701,2025-01-08,96954.60,97268.65,92500.90,95060.61,33704.678940,96954.61


In [13]:
# Create the 'down_close2close' column: 1 if today's close is lower than yesterday's, else 0
df_close2close['down_close2close'] = (df_close2close['close'] < df_close2close['prev_close']).astype(int)
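
Because `down_close2close` is computed from `close` and `prev_close`, and both columns stay in the feature matrix, the label describes the same candle the features come from. A minimal sketch of a forward-looking alternative follows, assuming the goal is to predict the next day's direction. The column name `down_next` and the toy prices are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy closing prices (hypothetical values, for illustration only)
toy = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0, 104.0]})
toy["prev_close"] = toy["close"].shift(1)

# Same-candle label as in the notebook: fully determined by two feature columns
toy["down_close2close"] = (toy["close"] < toy["prev_close"]).astype(int)

# Forward-looking label (assumption): 1 if the NEXT close is below today's close
next_close = toy["close"].shift(-1)
toy["down_next"] = (next_close < toy["close"]).astype(int)

# Drop the first row (no previous close) and the last row (no next close yet)
toy = toy.iloc[1:-1]
print(toy)
```

With a label like `down_next`, the features of a given row only contain information available at that day's close, so the model is asked to forecast rather than to recompute a comparison it can already see.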

In [14]:
df_close2close.columns

Index(['timestamp', 'open', 'high', 'low', 'close', 'volume', 'prev_close',
       'down_close2close'],
      dtype='object')

In [15]:
df_close2close.tail()

Unnamed: 0,timestamp,open,high,low,close,volume,prev_close,down_close2close
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.63723,98220.5,0
2699,2025-01-06,98363.61,102480.0,97920.0,102235.6,25263.43375,98363.61,0
2700,2025-01-07,102235.6,102724.38,96181.81,96954.61,32059.87537,102235.6,1
2701,2025-01-08,96954.6,97268.65,92500.9,95060.61,33704.67894,96954.61,1
2702,2025-01-09,95060.61,95382.32,91203.67,92552.49,34544.83685,95060.61,1


In [16]:
# Delete columns 
df_close2close_select = df_close2close.drop(['timestamp'], axis=1)

In [17]:
df_close2close_select.tail()

Unnamed: 0,open,high,low,close,volume,prev_close,down_close2close
2698,98220.51,98836.85,97276.79,98363.61,8095.63723,98220.5,0
2699,98363.61,102480.0,97920.0,102235.6,25263.43375,98363.61,0
2700,102235.6,102724.38,96181.81,96954.61,32059.87537,102235.6,1
2701,96954.6,97268.65,92500.9,95060.61,33704.67894,96954.61,1
2702,95060.61,95382.32,91203.67,92552.49,34544.83685,95060.61,1


In [18]:
# Count the occurrences of 1 and 0
value_counts = df_close2close_select['down_close2close'].value_counts(normalize=True) * 100

# Display the percentages
print(f"Percentage of 1: {value_counts.get(1, 0):.2f}%")
print(f"Percentage of 0: {value_counts.get(0, 0):.2f}%")

Percentage of 1: 48.45%
Percentage of 0: 51.55%
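
With classes this close to balanced, a useful reference point is the accuracy obtained by always predicting the majority class. A small sketch, using a hypothetical toy label series rather than the notebook's data:

```python
import pandas as pd

# Hypothetical labels: 1 = down day, 0 = up (or flat) day
labels = pd.Series([0, 1, 0, 0, 1, 1, 0, 0])

shares = labels.value_counts(normalize=True)
majority_class = shares.idxmax()
baseline_accuracy = shares.max()

print(f"Majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.2%}")
```

For the class shares printed above, the same calculation gives a baseline of 51.55% on the full dataset, which is the figure any trained model should be compared against.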


In [19]:
# Separate features and target
X = df_close2close_select.drop('down_close2close', axis=1)
y = df_close2close_select['down_close2close']

In [20]:
# Split the data into training, validation, and test sets
from sklearn.model_selection import train_test_split
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
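
`train_test_split` shuffles rows by default, so days from the test period end up interleaved with training days. For time-ordered data, a chronological split is a common alternative. The sketch below assumes the rows are already sorted by date; the helper name `chronological_split` and the toy data are hypothetical.

```python
import pandas as pd

def chronological_split(X, y, train_frac=0.70, val_frac=0.15):
    """Split time-ordered data into train/validation/test without shuffling."""
    n = len(X)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return (
        (X.iloc[:train_end], y.iloc[:train_end]),
        (X.iloc[train_end:val_end], y.iloc[train_end:val_end]),
        (X.iloc[val_end:], y.iloc[val_end:]),
    )

# Toy data standing in for the feature matrix and labels (hypothetical)
X_toy = pd.DataFrame({"close": range(20)})
y_toy = pd.Series([i % 2 for i in range(20)])

(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = chronological_split(X_toy, y_toy)
print(len(X_tr), len(X_va), len(X_te))
```

Every validation row then comes after every training row, and every test row after every validation row, which mirrors how the model would be used on unseen future days.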

In [21]:
# # Handle class imbalance using SMOTE
# from imblearn.over_sampling import SMOTE
# smote = SMOTE(random_state=42)
# X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
X_train_res=X_train
y_train_res=y_train

<a id="id"></a>
[**Back to HOME**](#100)

<a id="3"></a>

**Modeling and Evaluation**

In [22]:
# Parameter grid for GridSearchCV
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.2],
    "subsample": [0.8, 1.0]
}

In [23]:
# Import the XGBoost classifier
from xgboost import XGBClassifier
# `use_label_encoder` is ignored by recent XGBoost releases, which is why training prints a warning about it
model_xgb = XGBClassifier(random_state=42, use_label_encoder=False, eval_metric='logloss')

In [24]:
# GridSearchCV for best parameters
from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model_xgb, param_grid=param_grid, scoring="accuracy", cv=5, verbose=1, n_jobs=-1)
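
With `cv=5`, the folds are not ordered in time, so some training folds contain days that come after the validation fold. One alternative is scikit-learn's `TimeSeriesSplit`, which can be passed as the `cv` argument so that each validation fold follows its training data. A minimal sketch on toy data (the array `X_toy` is hypothetical):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered samples (toy data)
X_toy = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X_toy))
for fold, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {fold}: train={train_idx.tolist()} val={val_idx.tolist()}")

# The splitter could then be handed to the search, for example:
# GridSearchCV(estimator=model_xgb, param_grid=param_grid, scoring="accuracy", cv=tscv)
```

Each fold trains on an expanding window of past samples and validates on the block that immediately follows it.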

In [25]:
# Train the model
grid_cv.fit(X_train_res, y_train_res)

Fitting 5 folds for each of 54 candidates, totalling 270 fits


Parameters: { "use_label_encoder" } are not used.


In [26]:
# Best parameters
print("Best Parameters:", grid_cv.best_params_)
print("Best Cross-Validation Accuracy:", grid_cv.best_score_)

Best Parameters: {'learning_rate': 0.2, 'max_depth': 5, 'n_estimators': 200, 'subsample': 1.0}
Best Cross-Validation Accuracy: 0.7863480895143165


In [27]:
# Evaluate the model on the validation set
best_model = grid_cv.best_estimator_
y_val_pred = best_model.predict(X_val)
y_val_pred_proba = best_model.predict_proba(X_val)[:, 1] 

In [28]:
# Evaluate the model on the test set
best_model = grid_cv.best_estimator_
y_test_pred = best_model.predict(X_test)
y_test_pred_proba = best_model.predict_proba(X_test)[:, 1] 

In [29]:
# Metrics Evaluation on the validation set
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, make_scorer
)

accuracy_val = accuracy_score(y_val, y_val_pred)
precision_val = precision_score(y_val, y_val_pred)
recall_val = recall_score(y_val, y_val_pred)
f1_val = f1_score(y_val, y_val_pred)
f2_val = (1 + 2**2) * (precision_val * recall_val) / ((2**2 * precision_val) + recall_val)
roc_auc_val = roc_auc_score(y_val, y_val_pred_proba)

In [30]:
# Metrics Evaluation on the test set
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, make_scorer
)

accuracy_test = accuracy_score(y_test, y_test_pred)
precision_test = precision_score(y_test, y_test_pred)
recall_test = recall_score(y_test, y_test_pred)
f1_test = f1_score(y_test, y_test_pred)
f2_test = (1 + 2**2) * (precision_test * recall_test) / ((2**2 * precision_test) + recall_test)
roc_auc_test = roc_auc_score(y_test, y_test_pred_proba)
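
The manual F2 expression is the F-beta score with beta = 2, which weights recall more heavily than precision. scikit-learn provides `fbeta_score` for the same quantity. The sketch below checks the two against each other on hypothetical toy labels.

```python
from sklearn.metrics import fbeta_score, precision_score, recall_score

# Hypothetical labels and predictions (2 true positives, 1 false positive, 2 false negatives)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

# Manual F-beta with beta = 2, same form as in the notebook
f2_manual = (1 + 2**2) * (precision * recall) / ((2**2 * precision) + recall)

# Library equivalent
f2_library = fbeta_score(y_true, y_pred, beta=2)

print(f"manual F2: {f2_manual:.4f}, fbeta_score: {f2_library:.4f}")
```

Using `fbeta_score` also avoids a division by zero in the manual formula when precision and recall are both zero.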

In [31]:
print("\nValidation Evaluation Metrics:")
print(f"Accuracy: {accuracy_val:.4f}")
print(f"Precision: {precision_val:.4f}")
print(f"Recall: {recall_val:.4f}")
print(f"F1 Score: {f1_val:.4f}")
print(f"F2 Score: {f2_val:.4f}")
print(f"ROC AUC: {roc_auc_val:.4f}")


print("\nTest Evaluation Metrics:")
print(f"Accuracy: {accuracy_test:.4f}")
print(f"Precision: {precision_test:.4f}")
print(f"Recall: {recall_test:.4f}")
print(f"F1 Score: {f1_test:.4f}")
print(f"F2 Score: {f2_test:.4f}")
print(f"ROC AUC: {roc_auc_test:.4f}")


Validation Evaluation Metrics:
Accuracy: 0.8469
Precision: 0.8404
Recall: 0.8316
F1 Score: 0.8360
F2 Score: 0.8333
ROC AUC: 0.9224

Test Evaluation Metrics:
Accuracy: 0.8005
Precision: 0.7897
Recall: 0.8244
F1 Score: 0.8067
F2 Score: 0.8172
ROC AUC: 0.8946


<a id="4"></a>

**Modeling All Data**

In [32]:
symbol = 'BTCUSDT'

In [33]:
# Reuse `client` and `fetch_historical_ohlcv` defined in the Import Dataset section
interval = Client.KLINE_INTERVAL_1DAY  # Timeframe ('1m', '5m', '1h', '1d', etc.)
start_time = int(pd.Timestamp("2010-07-17").timestamp() * 1000)  # Start date in milliseconds
limit = 1000              # Max data points per batch

# Fetch historical data
df_all = fetch_historical_ohlcv(client, symbol, interval, start_time, limit)
print(f"Total fetched data points: {len(df_all)}")
print(df_all.head())

Fetched 1000 data points. Total so far: 1000
Fetched 1000 data points. Total so far: 2000
Fetched 704 data points. Total so far: 2704
Reached the end of available data.
Total fetched data points: 2704
   timestamp     open     high      low    close       volume
0 2017-08-17  4261.48  4485.39  4200.74  4285.08   795.150377
1 2017-08-18  4285.08  4371.52  3938.77  4108.37  1199.888264
2 2017-08-19  4108.37  4184.69  3850.00  4139.98   381.309763
3 2017-08-20  4120.98  4211.08  4032.62  4086.29   467.083022
4 2017-08-21  4069.13  4119.62  3911.79  4016.00   691.743060


In [34]:
# Drop the last row (the still-forming daily candle); copy so later column assignments act on an independent frame
df_all = df_all.iloc[:-1].copy()

In [35]:
df_all

Unnamed: 0,timestamp,open,high,low,close,volume
0,2017-08-17,4261.48,4485.39,4200.74,4285.08,795.150377
1,2017-08-18,4285.08,4371.52,3938.77,4108.37,1199.888264
2,2017-08-19,4108.37,4184.69,3850.00,4139.98,381.309763
3,2017-08-20,4120.98,4211.08,4032.62,4086.29,467.083022
4,2017-08-21,4069.13,4119.62,3911.79,4016.00,691.743060
...,...,...,...,...,...,...
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.637230
2699,2025-01-06,98363.61,102480.00,97920.00,102235.60,25263.433750
2700,2025-01-07,102235.60,102724.38,96181.81,96954.61,32059.875370
2701,2025-01-08,96954.60,97268.65,92500.90,95060.61,33704.678940


In [36]:
# Previous day's close (shift by one row)
df_all['prev_close'] = df_all['close'].shift(1)

In [37]:
df_all

Unnamed: 0,timestamp,open,high,low,close,volume,prev_close
0,2017-08-17,4261.48,4485.39,4200.74,4285.08,795.150377,
1,2017-08-18,4285.08,4371.52,3938.77,4108.37,1199.888264,4285.08
2,2017-08-19,4108.37,4184.69,3850.00,4139.98,381.309763,4108.37
3,2017-08-20,4120.98,4211.08,4032.62,4086.29,467.083022,4139.98
4,2017-08-21,4069.13,4119.62,3911.79,4016.00,691.743060,4086.29
...,...,...,...,...,...,...,...
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.637230,98220.50
2699,2025-01-06,98363.61,102480.00,97920.00,102235.60,25263.433750,98363.61
2700,2025-01-07,102235.60,102724.38,96181.81,96954.61,32059.875370,102235.60
2701,2025-01-08,96954.60,97268.65,92500.90,95060.61,33704.678940,96954.61


In [38]:
# Drop rows with any NaN values
df_all.dropna(inplace=True)

In [39]:
df_all

Unnamed: 0,timestamp,open,high,low,close,volume,prev_close
1,2017-08-18,4285.08,4371.52,3938.77,4108.37,1199.888264,4285.08
2,2017-08-19,4108.37,4184.69,3850.00,4139.98,381.309763,4108.37
3,2017-08-20,4120.98,4211.08,4032.62,4086.29,467.083022,4139.98
4,2017-08-21,4069.13,4119.62,3911.79,4016.00,691.743060,4086.29
5,2017-08-22,4016.00,4104.82,3400.00,4040.00,966.684858,4016.00
...,...,...,...,...,...,...,...
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.637230,98220.50
2699,2025-01-06,98363.61,102480.00,97920.00,102235.60,25263.433750,98363.61
2700,2025-01-07,102235.60,102724.38,96181.81,96954.61,32059.875370,102235.60
2701,2025-01-08,96954.60,97268.65,92500.90,95060.61,33704.678940,96954.61


In [40]:
# Create the 'down_close2close' column: 1 if today's close is lower than yesterday's, else 0
df_all['down_close2close'] = (df_all['close'] < df_all['prev_close']).astype(int) 

In [41]:
# Delete columns 
df_all_select = df_all.drop(['timestamp'], axis=1)

In [42]:
# Separate features and target
X_all = df_all_select.drop('down_close2close', axis=1)
y_all = df_all_select['down_close2close']

In [43]:
# # Handle class imbalance using SMOTE
# from imblearn.over_sampling import SMOTE
# smote = SMOTE(random_state=42)
# X_train_res_all, y_train_res_all = smote.fit_resample(X_all, y_all)

X_train_res_all=X_all
y_train_res_all= y_all

In [44]:
# Parameter grid for GridSearchCV
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.2],
    "subsample": [0.8, 1.0]
}

In [45]:
# Import the XGBoost classifier
from xgboost import XGBClassifier
model_xgb_all = XGBClassifier(random_state=42, use_label_encoder=False, eval_metric='logloss')

In [46]:
# GridSearchCV for best parameters
from sklearn.model_selection import GridSearchCV
grid_cv_all = GridSearchCV(estimator=model_xgb_all, param_grid=param_grid, scoring="accuracy", cv=5, verbose=1, n_jobs=-1)

In [47]:
# Train the model
grid_cv_all.fit(X_train_res_all, y_train_res_all)

Fitting 5 folds for each of 54 candidates, totalling 270 fits


Parameters: { "use_label_encoder" } are not used.


In [48]:
best_model_all = grid_cv_all.best_estimator_

<a id="id"></a>
[**Back to HOME**](#100)

<a id="5"></a>

**Today's Prediction**

In [49]:
symbol='BTCUSDT'

In [50]:
# Reuse `client` and `fetch_historical_ohlcv` defined in the Import Dataset section
interval = Client.KLINE_INTERVAL_1DAY  # Timeframe ('1m', '5m', '1h', '1d', etc.)
start_time = int(pd.Timestamp("2010-07-17").timestamp() * 1000)  # Start date in milliseconds
limit = 1000              # Max data points per batch

# Fetch historical data
df_today = fetch_historical_ohlcv(client, symbol, interval, start_time, limit)
print(f"Total fetched data points: {len(df_today)}")
print(df_today.head())

Fetched 1000 data points. Total so far: 1000
Fetched 1000 data points. Total so far: 2000
Fetched 704 data points. Total so far: 2704
Reached the end of available data.
Total fetched data points: 2704
   timestamp     open     high      low    close       volume
0 2017-08-17  4261.48  4485.39  4200.74  4285.08   795.150377
1 2017-08-18  4285.08  4371.52  3938.77  4108.37  1199.888264
2 2017-08-19  4108.37  4184.69  3850.00  4139.98   381.309763
3 2017-08-20  4120.98  4211.08  4032.62  4086.29   467.083022
4 2017-08-21  4069.13  4119.62  3911.79  4016.00   691.743060


In [51]:
# Drop the last row (the still-forming daily candle); copy so later column assignments act on an independent frame
df_today = df_today.iloc[:-1].copy()

In [52]:
df_today['prev_close'] = df_today['close'].shift(1)

In [53]:
df_today

Unnamed: 0,timestamp,open,high,low,close,volume,prev_close
0,2017-08-17,4261.48,4485.39,4200.74,4285.08,795.150377,
1,2017-08-18,4285.08,4371.52,3938.77,4108.37,1199.888264,4285.08
2,2017-08-19,4108.37,4184.69,3850.00,4139.98,381.309763,4108.37
3,2017-08-20,4120.98,4211.08,4032.62,4086.29,467.083022,4139.98
4,2017-08-21,4069.13,4119.62,3911.79,4016.00,691.743060,4086.29
...,...,...,...,...,...,...,...
2698,2025-01-05,98220.51,98836.85,97276.79,98363.61,8095.637230,98220.50
2699,2025-01-06,98363.61,102480.00,97920.00,102235.60,25263.433750,98363.61
2700,2025-01-07,102235.60,102724.38,96181.81,96954.61,32059.875370,102235.60
2701,2025-01-08,96954.60,97268.65,92500.90,95060.61,33704.678940,96954.61


In [54]:
df_today_test= df_today.tail(1)

In [55]:
df_today_test

Unnamed: 0,timestamp,open,high,low,close,volume,prev_close
2702,2025-01-09,95060.61,95382.32,91203.67,92552.49,34544.83685,95060.61


In [56]:
# Delete column
df_today_test_ready = df_today_test.drop(columns=['timestamp'])

In [57]:
df_today_test_ready

Unnamed: 0,open,high,low,close,volume,prev_close
2702,95060.61,95382.32,91203.67,92552.49,34544.83685,95060.61


In [58]:
# Predict with the model fitted on the training split only
y_today_pred = best_model.predict(df_today_test_ready)
y_today_pred_proba = best_model.predict_proba(df_today_test_ready)[:, 1] 

In [59]:
y_today_pred

array([1])

In [60]:
y_today_pred_proba

array([0.56749386], dtype=float32)

In [61]:
# Predict with the model fitted on all data
y_today_pred_all = best_model_all.predict(df_today_test_ready)
y_today_pred_proba_all = best_model_all.predict_proba(df_today_test_ready)[:, 1]

In [62]:
y_today_pred_all

array([1])

In [63]:
y_today_pred_proba_all

array([0.88991493], dtype=float32)

<a id="id"></a>
[**Back to HOME**](#100)