Feature Summary: Machine Learning High Frequency Trading Crypto Order Book Prediction + Trading Bot
A machine learning model that forecasts price direction and movement in the order book. This project covers a full research cycle: getting data, visualization, feature engineering, modeling, fine-tuning of the algorithm, and quality estimation.
Project Goals:
- Goal #1: Use machine learning models on a high-frequency trading (HFT) crypto order book to predict price direction and price levels.
- Goal #2: Develop a strategy trading bot that places mock trades based on the ML models' predictions, and compare actual returns with the strategy's realized returns.
This project leverages Python 3.9 with the following packages:
- pandas - A powerful data analysis toolkit
- numpy - A core library for scientific computing in Python
- matplotlib - Tools for creating static, animated, and interactive visualizations
- hvplot - High-level API for data exploration and visualization
- seaborn - Statistical data visualization tools
- sklearn - Simple and efficient tools for predictive data analysis
- imblearn - Tools for classification with imbalanced classes
- tensorflow - An end-to-end open source platform for machine learning
- xgboost - An optimized distributed gradient boosting library designed to be highly efficient
Jupyter Lab 3.3.2 is required:
- Jupyter Lab is used as the web-based development environment for the notebooks, code, and data associated with this project. Its flexible interface lets users configure and arrange data science workflows.
Before running the application, first install the following dependencies:
pip install pandas
pip install numpy
pip install scikit-learn
pip install imbalanced-learn
pip install matplotlib
pip install seaborn
pip install hvplot
pip install tensorflow
pip install xgboost
Assumption made for module challenge: the sys and Path modules do not need to be explicitly called out in the installation guide section.
To run Jupyter Lab, install Anaconda:
- Anaconda - an open-source distribution of Python
To run the Machine Learning Trading Bot application, clone the repository and run the following Jupyter Lab notebooks:
Crypto Order Book Processing:
- crypto_orderbook_process.ipynb: Order book feature selection and engineering, i.e. selecting, manipulating, and transforming raw data into features.
- crypto_orderbook_visualization.ipynb: Visual inspection and confirmation of order book features.
Machine Learning Supervised Learning:
- supervised_learning_midpoint.ipynb: Uses supervised learning techniques to predict the direction and price movement of the "midpoint" price in the order book.
- supervised_learning_imbalance.ipynb: Uses supervised learning techniques to predict the direction of the bid-ask imbalance at each level in the order book.
- supervised_learning_midpoint_permutate.ipynb: Runs supervised_learning_midpoint.ipynb across different crypto assets and time frames (Assets: BTC, ETH, ADA; Timeframes: 5min, 1min).
- supervised_learning_imbalance_permutate.ipynb: Runs supervised_learning_imbalance.ipynb across different crypto assets and time frames (Assets: BTC, ETH, ADA; Timeframes: 5min, 1min).
- supervised_learning_imbalance_kelly.ipynb: Applies the Kelly Criterion to the strategy trading bot and compares the results with supervised_learning_imbalance.ipynb.
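The Kelly Criterion referenced for supervised_learning_imbalance_kelly.ipynb sizes a position from an estimated win probability and payoff ratio. The notebook's own sizing logic is not reproduced here, so the following is only a minimal sketch of the textbook formula f* = p - (1 - p) / b. The function name `kelly_fraction` and the clamping of the result to [0, 1] are illustrative assumptions, not the project's code.

```python
def kelly_fraction(win_prob: float, win_loss_ratio: float) -> float:
    """Textbook Kelly fraction f* = p - (1 - p) / b.

    win_prob       -- estimated probability p that a trade wins
    win_loss_ratio -- b, average win size divided by average loss size
    """
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be between 0 and 1")
    if win_loss_ratio <= 0:
        raise ValueError("win_loss_ratio must be positive")
    raw = win_prob - (1.0 - win_prob) / win_loss_ratio
    # Assumption: never short the strategy or use leverage, so clamp to [0, 1].
    return min(max(raw, 0.0), 1.0)


# Hypothetical inputs: a model that is right 55% of the time with even payoffs.
fraction = kelly_fraction(win_prob=0.55, win_loss_ratio=1.0)
print(f"Fraction of capital to risk per trade: {fraction:.2f}")
```

In practice the win probability and payoff ratio would have to be estimated from the model's out-of-sample predictions, and many practitioners scale the result down (for example, "half Kelly") to reduce drawdowns.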
Machine Learning Deep Learning:
- deep_learning_midpoint.ipynb: Uses deep learning techniques to predict the direction and price movement of the "midpoint" price in the order book.
- deep_learning_imbalance.ipynb: Uses deep learning techniques to predict the direction of the bid-ask imbalance at each level in the order book.
- deep_learning_midpoint_permutate.ipynb: Runs deep_learning_midpoint.ipynb across different crypto assets and time frames (Assets: BTC, ETH, ADA; Timeframes: 5min, 1min).
- deep_learning_imbalance_permutate.ipynb: Runs deep_learning_imbalance.ipynb across different crypto assets and time frames (Assets: BTC, ETH, ADA; Timeframes: 5min, 1min).
- deep_learning_lstm_regression.ipynb: Price prediction using an LSTM model.
- deep_learning_gradientboost_midpoint.ipynb: Price prediction using a gradient boosting model.
- A limit order is an order you place on the order book with a specific limit price.
- Top of book represents the highest bid and the lowest ask at a given time.
- The bid-ask spread is the amount by which the ask price exceeds the bid price for an asset in the market.
- Market orders let you buy or sell instantly at the best price currently available.
- Mid-price is the price halfway between the sellers' best (lowest) ask price and the buyers' best (highest) bid price.
- Liquidity refers to how rapidly an asset can be bought or sold without substantially impacting its price.
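The terms above can be illustrated with a few lines of Python. This is a minimal sketch on made-up prices, and the helper name `top_of_book` is hypothetical rather than part of the project's notebooks.

```python
def top_of_book(bid_prices, ask_prices):
    """Return best bid, best ask, spread, and mid-price for one snapshot."""
    best_bid = max(bid_prices)   # highest price any buyer is willing to pay
    best_ask = min(ask_prices)   # lowest price any seller is willing to accept
    return {
        "best_bid": best_bid,
        "best_ask": best_ask,
        "spread": best_ask - best_bid,
        "mid_price": (best_bid + best_ask) / 2,
    }


# Illustrative snapshot with three resting limit orders on each side.
book = top_of_book(bid_prices=[99.0, 98.5, 98.0], ask_prices=[101.0, 101.5, 102.0])
print(book)
```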
Crypto PROVIDED order book features/columns (_x represents the order book level):
- midpoint = the midpoint between the best bid and the best ask
- spread = the difference between the best bid and the best ask
- bids/asks_distance_x = the distance of bid/ask level x from the midpoint in %
  = (price - midpoint) / midpoint
- bids/asks_limit/market/cancel_notional_x = volume (= price * quantity) of limit/market/cancelled orders at bid/ask level x
- bids/asks_notional_x = bids/asks_limit_notional_x - bids/asks_market_notional_x - bids/asks_cancel_notional_x
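To make the provided columns concrete, here is a small sketch that recomputes a level's distance and net notional from raw numbers. It assumes the distance is stored as a plain ratio, exactly as in the formula (price - midpoint) / midpoint; if the dataset stores it in percentage points, the result would need to be multiplied by 100. The function names and sample values are illustrative only.

```python
def level_distance(price: float, midpoint: float) -> float:
    """Relative distance of a price level from the midpoint."""
    return (price - midpoint) / midpoint


def net_notional(limit_notional: float, market_notional: float,
                 cancel_notional: float) -> float:
    """Notional added by limit orders minus notional removed by
    market orders and cancellations at the same level."""
    return limit_notional - market_notional - cancel_notional


midpoint = 100.0
asks_distance_0 = level_distance(101.0, midpoint)   # ask above the midpoint
bids_distance_0 = level_distance(99.0, midpoint)    # bid below the midpoint
asks_notional_0 = net_notional(5000.0, 1200.0, 800.0)

print(asks_distance_0, bids_distance_0, asks_notional_0)
```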
New DERIVED order book features/columns developed in this notebook/python script:
- bids/ask_price_x = the price at bid/ask level x
  = midpoint * (1 + distance) (where distance is represented as %)
- bids/asks_limit/market/cancel_quantity_x = quantity of orders at bid/ask level x
  = bids/asks_limit/market/cancel_notional_x / bids/ask_price_x
- bids/asks_limit/market/cancel_cum_quantity_x = cumulative sum of quantities, i.e. of bids/asks_limit/market/cancel_quantity_x
  Example: cum_quantity_0 = quantity_0
           cum_quantity_1 = quantity_0 + quantity_1
           cum_quantity_2 = quantity_0 + quantity_1 + quantity_2
           ...
- bids/asks_limit/market/cancel_cum_notional_x = cumulative sum of notionals, i.e. of bids/asks_limit/market/cancel_notional_x
  Example: cum_notional_0 = notional_0
           cum_notional_1 = notional_0 + notional_1
           cum_notional_2 = notional_0 + notional_1 + notional_2
           ...
- bid_ask_imbalance_limit/market/cancel_notional_x = bid-ask imbalance = bid notional / (bid notional + ask notional)
  = bids_limit/market/cancel_notional_x / (bids_limit/market/cancel_notional_x + asks_limit/market/cancel_notional_x)
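The derived features can be sketched in plain Python for a single snapshot. This is not the project's notebook code, which presumably operates on pandas columns; the helper names, the sample numbers, the treatment of distance as a plain ratio, and the choice to return NaN when both notionals are zero are all assumptions made for illustration.

```python
from itertools import accumulate


def derive_side_features(midpoint, distances, notionals):
    """Derive per-level price, quantity, and cumulative sums for one side.

    distances[x] and notionals[x] refer to order book level x.
    """
    prices = [midpoint * (1 + d) for d in distances]
    quantities = [n / p for n, p in zip(notionals, prices)]
    return {
        "price": prices,
        "quantity": quantities,
        "cum_quantity": list(accumulate(quantities)),
        "cum_notional": list(accumulate(notionals)),
    }


def bid_ask_imbalance(bid_notional, ask_notional):
    """bid / (bid + ask); 0.5 means both sides are balanced."""
    total = bid_notional + ask_notional
    # Assumption: report NaN for an empty level instead of dividing by zero.
    return bid_notional / total if total else float("nan")


midpoint = 100.0
bids = derive_side_features(midpoint, [-0.01, -0.02], [990.0, 1960.0])
asks = derive_side_features(midpoint, [0.01, 0.02], [1010.0, 3060.0])
imbalance_0 = bid_ask_imbalance(990.0, 1010.0)

print(bids["cum_quantity"], asks["cum_notional"], imbalance_0)
```

An imbalance above 0.5 indicates more notional resting on the bid side at that level, and a value below 0.5 indicates more on the ask side.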
Contributors:
- Sharma, Nitesh
- Gavnoudias, Stratis
- Gorelenkov, Boris
- Lopez, Liset
- Wolfenbarger, William
GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007






















