- Create a portfolio of stocks using OpenAI Gym and Stable Baselines.
- Experiment with different trading strategies.
- Connect to RabbitMQ to execute orders and generate PnL.
Folder: /data/concat.csv
It is a static dataset consisting of the bid_price, ask_price, bid_size and ask_size for 25 securities over 1000 timesteps.
Check requirements.txt
python -m pip install -r requirements.txt
Requires Python 3.6/3.7
Conda Env for the 100 server:
conda activate /home/citi/anaconda3/envs/cudf-nightly
File: config.ini
Change:
- Bot Number
- Model (Path of the saved Model or to save model)
Look out for:
- Baseline Algorithm: (DDPG/TD3/PPO2)
- Episodes: Number of episodes to train
- Strategy: Trading Strategy to implement
- Train and Test Size
- Window Size for Technical Indicators
- List of securities to listen to and trade on
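A minimal sketch of what config.ini might look like. The section and key names below are illustrative assumptions, not the project's exact schema; check config.ini itself for the real field names:

```ini
[bot]
; which bot instance this process runs as
bot_number = 1
; path to load the saved model from, or to save a newly trained model to
model = save/ddpg_model

[training]
; one of DDPG / TD3 / PPO2
algorithm = DDPG
episodes = 100
; trading strategy: e.g. momentum, mean_reversion, macd
strategy = momentum
train_size = 0.8
test_size = 0.2
; rolling window for the technical indicators
window_size = 10
; securities to listen to and trade on (tickers are placeholders)
securities = SEC01,SEC02,SEC03
```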
The Env and the Bot can have different strategies and initial capital if required.
python rl_trading_bot.py [--load] [--no-train] [--train-only]
Options/Arguments:
- `--load`: load the pretrained model from the model path in the config
- `--no-train`: do not train on the static data
- `--train-only`: only train and save the model
Save folder for model and logs: save/
File: agent.py
Trains the model, reacts to levels and trade data, and generates orders based on the model's actions.
RL Algorithms and Policy:
- DDPG - DDPGMLP
- TD3 - TD3MLP
- PPO2 - MLPLSTM
These algorithms are imported from Stable Baselines [1] and trained in a custom OpenAI Gym env [2].
Folder: /env
**gym_trading_env**: Custom gym env implementing a custom trading strategy for the selected list of securities
Observation Space: Box - mid-price (average of bid and ask prices) for the selected securities
Action Space: Box - range [0,1] (long-only or reallocation; no negative inventory, i.e. no negative weights)
- Momentum: Ratio of the average bid price in the window to the average price up to the current step.
- Mean Reversion: Inverse of momentum. Assumes the security is mean reverting.
- Moving Average Convergence Divergence (MACD): Generates discrete signals on a rolling window by combining two moving averages.
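As a rough illustration of the momentum and mean-reversion signals described above (function names and the exact windowing are assumptions, not the project's implementation):

```python
def momentum_signal(bid_prices, step, window):
    """Ratio of the windowed average bid price to the average price
    up to the current step (values > 1 suggest upward momentum)."""
    recent = bid_prices[max(0, step - window):step]
    history = bid_prices[:step]
    return (sum(recent) / len(recent)) / (sum(history) / len(history))

def mean_reversion_signal(bid_prices, step, window):
    """Inverse of momentum: assumes the security reverts to its mean."""
    return 1.0 / momentum_signal(bid_prices, step, window)

# recent window averages 12 against an overall average of 11
prices = [10.0, 10.0, 10.0, 12.0, 12.0, 12.0]
m = momentum_signal(prices, step=6, window=3)   # 12 / 11 > 1: momentum up
r = mean_reversion_signal(prices, step=6, window=3)
```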
Env:
The current action defines the weights of the portfolio. The sample from the action is clipped to [0,1] and normalized so that the weights sum to 1. This ensures the portfolio is completely utilized with a distribution across securities (positive inventory values only).
The reward for the action is the log rate of return under the new portfolio weights, normalized by the episode progress to produce a delayed reward.
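The clip-and-normalize step and the log-return reward can be sketched as below. Names are assumptions, and the progress-normalization term described above is omitted for brevity:

```python
import math

def normalize_action(action):
    """Clip raw model outputs to [0, 1] and rescale so the weights sum to 1."""
    clipped = [min(max(a, 0.0), 1.0) for a in action]
    total = sum(clipped)
    if total == 0:
        # fall back to an equal-weight portfolio if everything clips to zero
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]

def log_return_reward(weights, prev_prices, prices):
    """Log rate of return of the reweighted portfolio over one step."""
    growth = sum(w * p / pp for w, p, pp in zip(weights, prices, prev_prices))
    return math.log(growth)

w = normalize_action([0.5, 1.5, -0.2])  # clips to [0.5, 1.0, 0.0], then rescales
r = log_return_reward(w, [10.0, 20.0, 5.0], [11.0, 20.0, 5.0])
```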
Bot:
Similarly, the current action defines the weights of the portfolio. The sample from the action is clipped to [0,1] and normalized so that the weights sum to 1. Based on the resulting distribution, the portfolio is reallocated and the agent sends buy or sell orders with quantity equal to the change in allocation.
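A minimal sketch of turning a target weight distribution into buy/sell order quantities, under assumed names (the actual bot's order logic lives in agent.py):

```python
def reallocation_orders(old_qty, new_weights, prices, capital):
    """Translate target portfolio weights into order quantities:
    positive deltas are buys, negative deltas are sells."""
    orders = {}
    for sec, weight in new_weights.items():
        target_qty = int(weight * capital / prices[sec])
        delta = target_qty - old_qty.get(sec, 0)
        if delta != 0:
            orders[sec] = delta
    return orders

# move from an all-A book to a 50/50 split across A and B
orders = reallocation_orders(
    old_qty={"A": 10, "B": 0},
    new_weights={"A": 0.5, "B": 0.5},
    prices={"A": 10.0, "B": 25.0},
    capital=1000.0,
)
```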
File: mx_communication.py
Creates and listens to the channels and communicates with the bots.
File: test_pb2.py
Protocol buffer generated descriptors
- Train the agent on new levels data based on a rolling window rather than a static set of observations
- Update model and alphas in the bot based on the completed trade (Would require sync with the matching engine)