When is the Actual REAL-TIME trading tests going to happen ? #47

Closed
developeralgo8888 opened this issue Apr 11, 2018 · 6 comments

Comments

@developeralgo8888

Use this template when reporting bugs, errors or unexpected behaviour.
Override for general questions, feature requests, proposals etc.

Running environment:
Files or part of package has been run:
Expected behaviour:
Actual behaviour:
Steps to reproduce:
@developeralgo8888 developeralgo8888 changed the title When is the Actual REAL-TIME trading tests going to be happen ? When is the Actual REAL-TIME trading tests going to happen ? Apr 11, 2018
@Kismuz
Owner

Kismuz commented Apr 12, 2018

@developeralgo8888, once good backtest generalisation results are obtained. Nevertheless, any ideas for developing a real-time trading interface are welcome.

@Kismuz Kismuz closed this as completed Apr 16, 2018
@mysl

mysl commented Apr 17, 2018

Hi, @Kismuz ,
I am planning to take a deeper look into the GPS/imitation learning approach. Could you please share some of your experience with it? Does this direction show good potential for the trading task? Thank you so much!

@Kismuz
Owner

Kismuz commented Apr 17, 2018

@mysl,
My implementation of GPS is quite simple; the code can be found at btgym.research.gps, and the notebook at examples/guided_a3c.ipynb.
First, there is a very simple Oracle class: for a backtesting training episode we know the data in advance, so we can estimate an optimal trading strategy. Doing that exactly is rather complex, because we would need to solve an optimisation task that takes all broker and account conditions into account. Instead I implemented an 'advisor' that indicates whether it is time to buy, hold or sell. It uses a quite primitive algorithm: it just estimates local price peaks and emits a signal with some repetition. These signals can be seen on the episode rendering chart if ExpertObserver is added to the strategy. So the Oracle scans the entire episode data just before the episode starts, estimates the advice and appends it to the observation step by step.
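A minimal sketch of this kind of peak-based advisor, for illustration only. It is not the code from btgym.research.gps: the function name `estimate_advice`, the `window` and `repeat` parameters, and the 0/1/2 action encoding are all assumptions made here.

```python
HOLD, BUY, SELL = 0, 1, 2  # hypothetical action encoding, not btgym's


def estimate_advice(prices, window=2, repeat=2):
    """Label each step of a fully known price series with a buy/hold/sell hint.

    A step counts as a local minimum (buy) or local maximum (sell) when its
    price is the extreme value among the `window` steps on either side.
    Each signal is then repeated for `repeat` consecutive steps.
    """
    n = len(prices)
    advice = [HOLD] * n
    # Only interior steps with a full window on both sides are considered.
    for t in range(window, n - window):
        neighbourhood = prices[t - window:t + window + 1]
        if max(neighbourhood) == min(neighbourhood):
            continue  # flat segment: no peak to report
        if prices[t] == min(neighbourhood):
            signal = BUY
        elif prices[t] == max(neighbourhood):
            signal = SELL
        else:
            continue
        # Emit the signal "with some repetition".
        for k in range(t, min(n, t + repeat)):
            advice[k] = signal
    return advice
```

Because the whole episode is available before it starts, the returned list can be computed once and then fed to the agent one element per step alongside the regular observation.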
Next we need to incorporate it into our loss. Oracle signals are actually encoded as action probabilities, so we can compare them against those emitted by the policy; I have found that it is sufficient to estimate the loss only on the buy and sell actions and omit the rest. We then just sum this loss with the base A3C loss, using a lambda weight to control how much strength the Oracle has over the algorithm.
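The weighting can be sketched roughly as follows. This is an illustration under stated assumptions, not the btgym implementation: the comparison is taken to be a cross-entropy, the action indices (0 = hold, 1 = buy, 2 = sell) are hypothetical, and `base_loss` stands in for whatever the A3C loss evaluates to.

```python
import math

BUY, SELL = 1, 2  # hypothetical action indices; index 0 is 'hold'


def guided_loss(policy_probs, oracle_probs, eps=1e-8):
    """Cross-entropy between oracle and policy, restricted to buy and sell.

    The hold action (and any other action) is deliberately left out.
    """
    return -sum(
        oracle_probs[a] * math.log(policy_probs[a] + eps)
        for a in (BUY, SELL)
    )


def total_loss(base_loss, policy_probs, oracle_probs, guided_lambda):
    """Base loss plus the guided term, weighted by guided_lambda."""
    return base_loss + guided_lambda * guided_loss(policy_probs, oracle_probs)
```

With `guided_lambda=0` the total reduces to the base loss; larger values pull the policy harder towards the Oracle's buy and sell hints.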
The trick here is to find a balance between guidance and actual learning.
This approach works and it works well.
Guided loss is especially beneficial at early stages of learning, when there is a danger of getting stuck at a local 'do nothing' solution. Guided loss effectively prevents that and almost doubles convergence speed.
It could be annealed to zero at later stages, as imperfect advice can prevent the policy from becoming optimal, especially with such a primitive advisor as mine.
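One possible way to anneal the weight is a linear decay, sketched below. The schedule shape, the function name `annealed_lambda` and its parameters are assumptions for illustration; the thread does not say which schedule, if any, is used.

```python
def annealed_lambda(step, initial_lambda=1.0, anneal_steps=100_000):
    """Linearly decay the guidance weight to zero over `anneal_steps` steps."""
    fraction_left = max(0.0, 1.0 - step / anneal_steps)
    return initial_lambda * fraction_left
```

The returned value would be used as the lambda weight on the guided loss at each training step, so guidance is strong early on and absent once `step` reaches `anneal_steps`.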
I advise you to look at the code, as it is very simple, and to play with the notebook to get a feeling for the impact of GPS on training:

  • try training on a bigger (one-year) dataset with guided_lambda=0 and watch the gradients die and the policy do nothing; then set lambda to 1.0 - 5.0 and see the gradients remain consistent and the policy improve;

  • on the synthetic sine wave dataset, set lower and higher lambdas and see how higher values can slow down convergence as the policy approaches optimal (this can be seen from how fast the episode length contracts: as the policy becomes close to optimal, it can reach the target and terminate earlier, so the episode length should drop).

@Kismuz Kismuz reopened this Apr 17, 2018
@mysl

mysl commented Apr 17, 2018

@Kismuz Thanks for the detailed explanation. I will take a look based on your advice. Does this approach help with the generalization issue you mentioned?
BTW, it looks like the recent ICLR 2018 best paper deals with nonstationary environments. Maybe that could be helpful in the trading context too?

@mysl

mysl commented Apr 17, 2018

@Kismuz
Owner

Kismuz commented Apr 17, 2018

One of my pillow-books now :)

@Kismuz Kismuz closed this as completed May 3, 2018

3 participants