
Training with DQN #183

Closed
rodrigogutierrezm opened this issue Apr 20, 2021 · 17 comments

@rodrigogutierrezm

Hello, thank you for sharing this great work. I am trying to replicate the behaviour shown in the examples (Deep Q-Network). Did you train it with the network provided in rl-agents? I have tried training for 1000 episodes and, when I test it, the agent only moves to the right. Maybe more episodes are needed.

Thank you in advance.

@eleurent
Collaborator

Hi, yes, it was rl-agents' implementation and hyperparameters.
I believe it was trained for about 5k episodes (I should really make this part of the agent configuration).
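
For reference, a run along these lines can be launched with rl-agents' experiments.py script. The command below mirrors the evaluation command used later in this thread; the --train flag, the episode count, and the dueling_ddqn.json path are assumptions about the current layout of the repository:

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dueling_ddqn.json --train --episodes=5000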

@rodrigogutierrezm
Author

OK, just to clarify: you modified the hyperparameters. To replicate your results, some modifications to these parameters are required, right?

Thank you.

@eleurent
Collaborator

No, I do not think I changed the hyperparameters; I mostly refactored the file structure.
I will try to run it again and see if I can reproduce the results.

@eleurent
Collaborator

See also: eleurent/rl-agents#21

@rodrigogutierrezm
Author

Perfect, thank you.

@eleurent
Collaborator

So I launched a run with the current dueling_ddqn.json config for 1.5k episodes and got these results:

[Chart: episode return over the 1.5k-episode run]

They seem worse than what I had in May 2019 (though it is hard to check on a single run).

The corresponding behaviors are reasonable, but still have quite a high number of collisions:

openaigym.video.0.12124.video000004.mp4
openaigym.video.0.12124.video000003.mp4
openaigym.video.0.12124.video000007.mp4

I checked for differences in the configurations, and noticed that:

  1. the dueling architecture has changed
     • from: a shared base network with [256, 256] hidden layers and two linear heads (value and advantage)
     • to: a shared base network with [256, 128] hidden layers plus an additional [128] hidden layer for each head (value and advantage)
     (both variants are sketched below);
  2. the learning rate is not specified in the agent configuration, so the default value is used, and that default has changed from 5e-4 to 1e-3.

I will try again with the previous values, to see if there's a difference.
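
For readers comparing the two variants, here is a minimal PyTorch sketch of a dueling Q-network written from the description above. It illustrates the general pattern only, not rl-agents' actual DuelingNetwork implementation, and the input/action sizes (25 and 5, typical of highway-env's default observation and action spaces) are placeholders:

import torch.nn as nn

class DuelingNet(nn.Module):
    """Illustrative dueling Q-network: a shared base plus value/advantage heads."""

    def __init__(self, in_size, n_actions, base_layers=(256, 256), head_layers=()):
        super().__init__()
        # Shared base MLP
        sizes = [in_size, *base_layers]
        base = []
        for i, o in zip(sizes[:-1], sizes[1:]):
            base += [nn.Linear(i, o), nn.ReLU()]
        self.base = nn.Sequential(*base)

        # A head with optional hidden layers and a final linear output
        def make_head(out_size):
            hsizes = [sizes[-1], *head_layers]
            mods = []
            for i, o in zip(hsizes[:-1], hsizes[1:]):
                mods += [nn.Linear(i, o), nn.ReLU()]
            mods.append(nn.Linear(hsizes[-1], out_size))
            return nn.Sequential(*mods)

        self.value = make_head(1)              # state value V(s)
        self.advantage = make_head(n_actions)  # advantages A(s, a)

    def forward(self, x):
        x = self.base(x)
        v, a = self.value(x), self.advantage(x)
        # Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=-1, keepdim=True)

# May 2019 variant: [256, 256] shared base with two purely linear heads
old_net = DuelingNet(in_size=25, n_actions=5, base_layers=(256, 256), head_layers=())
# Current variant: [256, 128] shared base plus a [128] hidden layer in each head
new_net = DuelingNet(in_size=25, n_actions=5, base_layers=(256, 128), head_layers=(128,))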

@RobeSafe-UAH

OK, I am still trying to reach those results. Thank you for your help; as soon as I get a good model, I will let you know.

@RobeSafe-UAH

How do you get the episode/return graphic? Thanks

@eleurent
Collaborator

Through TensorBoard. If you have it installed, you can run

tensorboard --logdir <rl-agents path>/scripts/out/HighwayEnv/DQNAgent/

This will spawn a web server allowing you to visualize your runs (mostly rewards and network architecture for now, but I should add other metrics, such as average Q-values in the sampled minibatch or initial state).

@RobeSafe-UAH

Thank you

@eleurent
Collaborator

eleurent commented Apr 22, 2021

I found that there is indeed a regression in performance, but it is due to changes in the environment (highway-env) rather than the agent (rl-agents). See this chart:

[Chart: episode return for the four runs listed below]

  • Orange is the run I launched in May 2019 (I still had it on my disk)
  • Blue is the run I launched yesterday, with the current versions of rl-agents and highway-env
  • Gray is a run I launched this morning, with the current version of highway-env and the old rl-agents hyperparameters (it does not seem to change much)
  • Finally, red is a run with the current version of rl-agents (and the old hyperparameters), but with the version of highway-env from May 2019

It seems that the environment has become more difficult to solve, though I do not know why.
This could be due to changes:

  1. in the vehicle dynamics?
  2. in the vehicles initialization / density?
  3. in the reward function?

It seems that 1. has not really changed, 2. has changed a little, and 3. has a minor change.

I will investigate, and maybe even git bisect if I cannot find any meaningful difference in the code.

@eleurent
Collaborator

I found out why the current version of highway-env is more difficult than it used to be:

  • previously, other vehicles were initialized with a target speed in the [23, 25] m/s range
  • now, they are much slower, with a target speed in the [14, 20] m/s range,

which explains why the agent tends to get more collisions.

This is due to the speed limit of the road, which is set to 20 m/s by default, whereas 30 m/s would be more appropriate. I will restore this value.
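
A quick way to check this on your own install is to inspect the target speeds of the surrounding vehicles after a reset. This is a sketch that assumes the usual highway-env attributes (road.vehicles and target_speed), which may differ between versions:

import gym
import highway_env  # noqa: F401  (registers the highway-v0 environment)

env = gym.make("highway-v0")
env.reset()
# Print the target speed of every vehicle on the road (ego vehicle included)
print([getattr(v, "target_speed", None) for v in env.unwrapped.road.vehicles])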

@rodrigogutierrezm
Author

Thank you for all the information. I have been able to reproduce the training and the charts. In order to test the results, I run

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10

But the performance of the ego vehicle is not good. I am not sure whether I am using the trained model; is there a way to specify which model to use?

@eleurent
Collaborator

You simply need to add the --recover option, or --recover-from=path/to/model.tar, to load a trained model before evaluating.
The --recover option loads scripts/out/<Env>/<Agent>/saved_models/latest.tar by default (which is updated during training).
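
So the evaluation command from the previous message becomes, for example:

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10 --recover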

@rodrigogutierrezm
Author

Hello, I was able to replicate your results. One last question: when you select the dueling_ddqn agent, a type is defined in the model configuration ("DuelingNetwork"). Where is this type created?
Thank you very much.

@eleurent
Collaborator

eleurent commented Apr 23, 2021

@rodrigogutierrezm
Author

Thank you for everything.
