- Create a portfolio of stocks using OpenAI Gym and Stable Baselines.
- Experiment with different trading strategies.
- Connect to RabbitMQ to execute orders and generate PnL.
Folder: /data/concat.csv
It is a static dataset consisting of the bid_price, ask_price, bid_size and ask_size for 25 securities over 1000 timesteps.
Check requirements.txt
python -m pip install -r requirements.txt
Requires Python 3.6/3.7
Conda Env for the 100 server:
conda activate /home/citi/anaconda3/envs/cudf-nightly
File: config.ini
Change:
- Bot Number
- Model (Path of the saved Model or to save model)
Look out for:
- Baseline Algorithm: (DDPG/TD3/PPO2)
- Episodes: Number of episodes to train
- Strategy: Trading Strategy to implement
- Train and Test Size
- Window Size for Technical Indicators
- List of securities to listen to and trade on
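A minimal sketch of what config.ini might look like. The section and key names below are illustrative assumptions, not the project's exact schema; check config.ini itself for the real field names:

```ini
[bot]
; which bot instance this process runs as
bot_number = 1
; path to load the saved model from, or to save a newly trained model to
model = save/ddpg_model

[training]
; one of DDPG / TD3 / PPO2
algorithm = DDPG
episodes = 100
; trading strategy: e.g. momentum, mean_reversion, macd
strategy = momentum
train_size = 0.8
test_size = 0.2
; rolling window for the technical indicators
window_size = 10
; securities to listen to and trade on (tickers are placeholders)
securities = SEC01,SEC02,SEC03
```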
The Env and the Bot can have different strategies and initial capital if required.
python rl_trading_bot.py [--load] [--no-train] [--train-only]
Options/Arguments:
- `--load`: load the pretrained model from the model path in the config
- `--no-train`: do not train on the static data
- `--train-only`: only train and save the model
Save folder for model and logs: save/
File: agent.py
Trains the model, reacts to levels and trade data, and generates orders based on the model's actions.
RL Algorithms and Policy:
- DDPG - DDPGMLP
- TD3 - TD3MLP
- PPO2 - MLPLSTM
These algorithms are imported from Stable Baselines [1] and trained in a custom OpenAI Gym env [2].
Folder: /env
**gym_trading_env**: Custom gym env implementing a custom trading strategy for the selected list of securities
Observation Space: Box - mid-price (average of bid and ask prices) for the selected securities
Action Space: Box - range [0,1] (long-only or reallocation; no negative inventory, i.e. no negative weights)
- Momentum: Ratio of the average bid price in the window to the average price up to the current step.
- Mean Reversion: Inverse of momentum. Assumes the security is mean reverting.
- Moving Average Convergence Divergence (MACD): Generates discrete signals on a rolling window by combining two moving averages.
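As a rough illustration of the momentum and mean-reversion signals described above (function names and the exact windowing are assumptions, not the project's implementation):

```python
def momentum_signal(bid_prices, step, window):
    """Ratio of the windowed average bid price to the average price
    up to the current step (values > 1 suggest upward momentum)."""
    recent = bid_prices[max(0, step - window):step]
    history = bid_prices[:step]
    return (sum(recent) / len(recent)) / (sum(history) / len(history))

def mean_reversion_signal(bid_prices, step, window):
    """Inverse of momentum: assumes the security reverts to its mean."""
    return 1.0 / momentum_signal(bid_prices, step, window)

# recent window averages 12 against an overall average of 11
prices = [10.0, 10.0, 10.0, 12.0, 12.0, 12.0]
m = momentum_signal(prices, step=6, window=3)   # 12 / 11 > 1: momentum up
r = mean_reversion_signal(prices, step=6, window=3)
```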
Env:
The current action defines the weights of the portfolio. The sample from the action is clipped to [0,1] and normalized so that the weights sum to 1. This ensures the portfolio is completely utilized with a distribution across securities (positive inventory values only).
The reward for the action is the log rate of return under the new portfolio weights, normalized by the episode progress to produce a delayed reward.
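The clip-and-normalize step and the log-return reward can be sketched as below. Names are assumptions, and the progress-normalization term described above is omitted for brevity:

```python
import math

def normalize_action(action):
    """Clip raw model outputs to [0, 1] and rescale so the weights sum to 1."""
    clipped = [min(max(a, 0.0), 1.0) for a in action]
    total = sum(clipped)
    if total == 0:
        # fall back to an equal-weight portfolio if everything clips to zero
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]

def log_return_reward(weights, prev_prices, prices):
    """Log rate of return of the reweighted portfolio over one step."""
    growth = sum(w * p / pp for w, p, pp in zip(weights, prices, prev_prices))
    return math.log(growth)

w = normalize_action([0.5, 1.5, -0.2])  # clips to [0.5, 1.0, 0.0], then rescales
r = log_return_reward(w, [10.0, 20.0, 5.0], [11.0, 20.0, 5.0])
```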
Bot:
Similarly, the current action defines the weights of the portfolio. The sample from the action is clipped to [0,1] and normalized so that the weights sum to 1. Based on the resulting distribution, the portfolio is reallocated and the agent sends buy or sell orders with quantity equal to the change in allocation.
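A minimal sketch of turning a target weight distribution into buy/sell order quantities, under assumed names (the actual bot's order logic lives in agent.py):

```python
def reallocation_orders(old_qty, new_weights, prices, capital):
    """Translate target portfolio weights into order quantities:
    positive deltas are buys, negative deltas are sells."""
    orders = {}
    for sec, weight in new_weights.items():
        target_qty = int(weight * capital / prices[sec])
        delta = target_qty - old_qty.get(sec, 0)
        if delta != 0:
            orders[sec] = delta
    return orders

# move from an all-A book to a 50/50 split across A and B
orders = reallocation_orders(
    old_qty={"A": 10, "B": 0},
    new_weights={"A": 0.5, "B": 0.5},
    prices={"A": 10.0, "B": 25.0},
    capital=1000.0,
)
```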
File: mx_communication.py
Creates and listens to the channels and communicates with the bots.
File: test_pb2.py
Protocol buffer generated descriptors
- Train the agent on new levels data based on a rolling window rather than a static set of observations
- Update model and alphas in the bot based on the completed trade (Would require sync with the matching engine)