Training with DQN #183
Comments
Hi, yes it was rl-agents' implementation and hyperparams.
Ok, just to clarify: you modified the hyperparameters, so to replicate your results some modifications to these parameters are required, right? Thank you.
No, I do not think that I changed the hyperparameters, I mostly refactored the file structure.
See also: eleurent/rl-agents#21
Perfect, thank you.
So I ran a training run with the current version. The results seem worse than what I had in May 2019 (though it is hard to tell from a single run). The corresponding behaviors are reasonable, but still show quite a high number of collisions:
openaigym.video.0.12124.video000004.mp4
openaigym.video.0.12124.video000003.mp4
openaigym.video.0.12124.video000007.mp4
I checked for differences in the configurations, and noticed that:
I will try again with the previous values, to see if there's a difference.
Ok, I am still trying to reach those results. Thank you for your help; as soon as I get a good model I will let you know.
How do you get the episode/return graphic? Thanks |
Through tensorboard. If you have it installed, you can run
This will spawn a web server allowing you to visualize your runs (mostly rewards and network architecture for now, but I should add other metrics, such as average Q-values in the sampled minibatch or initial state).
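The exact command is missing from the comment above; a typical TensorBoard invocation might look like the following sketch, where the log directory name is an assumption and should be replaced by wherever your runs are actually written:

```shell
# Launch TensorBoard on the directory containing the run logs.
# "out" is an assumed path; adjust it to your actual output directory.
tensorboard --logdir out
# Then open http://localhost:6006 in a browser to view the charts.
```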
Thank you |
I found that there is indeed a regression in performance, but it is due to changes in the environment (highway-env) rather than the agent (rl-agents). See this chart:
It seems that the environment has become more difficult to solve, though I do not know why.
It seems that 1. has not really changed, 2. has a little bit, and 3. has a minor change. I will investigate, and maybe even git bisect if I cannot find any meaningful difference in the code.
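For reference, the bisect workflow mentioned above can be sketched as follows; the good commit is a placeholder, and the "test" at each step would be re-running a short training/evaluation by hand:

```shell
# Start a bisect session between a known-bad and a known-good revision.
git bisect start
git bisect bad HEAD               # current revision shows the regression
git bisect good <known-good-sha>  # placeholder: last revision known to work

# git checks out a midpoint commit; re-run a short training/evaluation,
# then mark the result and repeat until the offending commit is found:
git bisect good    # or: git bisect bad

# When finished, restore the original repository state:
git bisect reset
```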
I found out why the current version of highway-env is more difficult than it used to be:
which explains why the agent tends to get more collisions. This is due to the speed limit of the road, set to 20 m/s (by default), where 30 m/s would be more appropriate. I will restore this value.
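As a minimal illustration of the kind of default-override involved, here is a sketch of a configuration merge in the style these projects use. The key names and values (`"speed_limit"`, `"duration"`) are assumptions for illustration only, not highway-env's actual configuration schema:

```python
# Sketch: overriding an environment default before training.
# Key names and values are illustrative assumptions, not the real
# highway-env configuration schema.
DEFAULT_CONFIG = {
    "speed_limit": 20,  # m/s: the default that made the env harder to solve
    "duration": 40,
}

def configure(defaults, overrides):
    """Return a new config with `overrides` applied on top of `defaults`."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged

# Restore the more appropriate 30 m/s speed limit, leaving other keys intact.
config = configure(DEFAULT_CONFIG, {"speed_limit": 30})
```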
Thank you for all the information. I have been able to reproduce the training and the charts. In order to test the results I run
But the performance of the ego vehicle is not good. I am not sure if I am using the trained model; is there a way to specify the model to be used?
You must simply add the
Hello, I was able to replicate your results. One last question: when you select an agent as dueling_ddqn, a type is defined in the model ("DuelingNetwork"). Where is this type created?
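In rl-agents-style code, a string such as "DuelingNetwork" is typically resolved by a small factory that maps the config's "type" field to a class defined in the models module. The sketch below shows the general registry/factory pattern only; the class, registry, and function names are assumptions, not the library's actual API:

```python
# Sketch of a string-to-class factory, as commonly used to resolve a
# "type" field in an agent/model configuration. All names here are
# illustrative assumptions, not rl-agents' real identifiers.
class DuelingNetwork:
    def __init__(self, config):
        self.config = config

# Registry mapping the "type" string to the corresponding class.
MODEL_REGISTRY = {"DuelingNetwork": DuelingNetwork}

def model_factory(config):
    """Instantiate the model class named by config["type"]."""
    model_class = MODEL_REGISTRY[config["type"]]
    return model_class(config)

model = model_factory({"type": "DuelingNetwork"})
```

In the actual library, searching the source for the "DuelingNetwork" string should point to where the class is defined and registered.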
Thank you for everything. |
Hello, thank you for sharing this great work. I am trying to replicate the behaviour shown in the examples (Deep Q-Network). Have you trained with the network provided in rl-agents? I have tried it with 1000 episodes and when I test it, the agent only moves to the right. Maybe more episodes are needed.
Thank you in advance.
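For context, training and evaluation in rl-agents are usually driven through its experiments script. The following is only a sketch of the kind of commands involved; the config paths, script location, and episode counts are assumptions and may differ from the actual repository layout:

```shell
# Train a DQN agent on the highway environment.
# Paths and flags below are assumptions; check the rl-agents README.
python3 experiments.py evaluate configs/HighwayEnv/env.json \
    configs/HighwayEnv/agents/DQNAgent/dqn.json --train --episodes=1000

# Evaluate the resulting policy; more training episodes may be needed
# before the behavior resembles the published examples.
python3 experiments.py evaluate configs/HighwayEnv/env.json \
    configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10
```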