
Training with DQN #183

Closed
rodrigogutierrezm opened this issue Apr 20, 2021 · 17 comments

@rodrigogutierrezm

Hello, thank you for sharing this great work. I am trying to replicate the behaviour shown in the examples (Deep Q-Network). Did you train it with the network provided in rl-agents? I have tried training for 1000 episodes and, when I test it, the agent only moves to the right. Maybe more episodes are needed.

Thank you in advance.

@eleurent
Collaborator

Hi, yes, it was rl-agents' implementation and hyperparameters.
I believe it was trained for about 5k episodes (I should really make this part of the agent configuration).
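
For reference, a run along these lines can be launched with rl-agents' experiments.py script. The command below mirrors the evaluation command used later in this thread; the --train flag, the episode count, and the dueling_ddqn.json path are assumptions about the current layout of the repository:

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dueling_ddqn.json --train --episodes=5000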

@rodrigogutierrezm
Author

OK, just to clarify: you modified the hyperparameters. To replicate your results, some modifications to these parameters are required, right?

Thank you.

@eleurent
Collaborator

No, I do not think I changed the hyperparameters; I mostly refactored the file structure.
I will try to run it again and see if I can reproduce the results.

@eleurent
Collaborator

See also: eleurent/rl-agents#21

@rodrigogutierrezm
Author

Perfect, thank you.

@eleurent
Collaborator

So I launched a run with the current dueling_ddqn.json config for 1.5k episodes and got these results:

[Chart: episode return over the 1.5k-episode run]

They seem worse than what I had in May 2019 (though it is hard to check on a single run).

The corresponding behaviors are reasonable, but still have quite a high number of collisions:

openaigym.video.0.12124.video000004.mp4
openaigym.video.0.12124.video000003.mp4
openaigym.video.0.12124.video000007.mp4

I checked for differences in the configurations, and noticed that:

  1. the dueling architecture has changed
     • from: a shared base network with [256, 256] hidden layers and two linear heads (value and advantage)
     • to: a shared base network with [256, 128] hidden layers plus an additional [128] hidden layer for each head (value and advantage)
     (both variants are sketched below);
  2. the learning rate is not specified in the agent configuration, so the default value is used, and that default has changed from 5e-4 to 1e-3.

I will try again with the previous values, to see if there's a difference.
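
For readers comparing the two variants, here is a minimal PyTorch sketch of a dueling Q-network written from the description above. It illustrates the general pattern only, not rl-agents' actual DuelingNetwork implementation, and the input/action sizes (25 and 5, typical of highway-env's default observation and action spaces) are placeholders:

import torch.nn as nn

class DuelingNet(nn.Module):
    """Illustrative dueling Q-network: a shared base plus value/advantage heads."""

    def __init__(self, in_size, n_actions, base_layers=(256, 256), head_layers=()):
        super().__init__()
        # Shared base MLP
        sizes = [in_size, *base_layers]
        base = []
        for i, o in zip(sizes[:-1], sizes[1:]):
            base += [nn.Linear(i, o), nn.ReLU()]
        self.base = nn.Sequential(*base)

        # A head with optional hidden layers and a final linear output
        def make_head(out_size):
            hsizes = [sizes[-1], *head_layers]
            mods = []
            for i, o in zip(hsizes[:-1], hsizes[1:]):
                mods += [nn.Linear(i, o), nn.ReLU()]
            mods.append(nn.Linear(hsizes[-1], out_size))
            return nn.Sequential(*mods)

        self.value = make_head(1)              # state value V(s)
        self.advantage = make_head(n_actions)  # advantages A(s, a)

    def forward(self, x):
        x = self.base(x)
        v, a = self.value(x), self.advantage(x)
        # Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=-1, keepdim=True)

# May 2019 variant: [256, 256] shared base with two purely linear heads
old_net = DuelingNet(in_size=25, n_actions=5, base_layers=(256, 256), head_layers=())
# Current variant: [256, 128] shared base plus a [128] hidden layer in each head
new_net = DuelingNet(in_size=25, n_actions=5, base_layers=(256, 128), head_layers=(128,))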

@RobeSafe-UAH

OK, I am still trying to reach those results. Thank you for your help; as soon as I get a good model, I will let you know.

@RobeSafe-UAH

How do you get the episode/return graphic? Thanks

@eleurent
Collaborator

Through TensorBoard. If you have it installed, you can run

tensorboard --logdir <rl-agents path>/scripts/out/HighwayEnv/DQNAgent/

This will spawn a web server allowing you to visualize your runs (mostly rewards and network architecture for now, but I should add other metrics, such as average Q-values in the sampled minibatch or initial state).

@RobeSafe-UAH

Thank you

@eleurent
Collaborator

eleurent commented Apr 22, 2021

I found that there is indeed a regression in performance, but it is due to changes in the environment (highway-env) rather than the agent (rl-agents). See this chart:

[Chart: episode return for the four runs listed below]

  • Orange is the run I launched in May 2019 (I still had it on my disk)
  • Blue is the run I launched yesterday, with the current versions of rl-agents and highway-env
  • Gray is a run I launched this morning, with the current version of highway-env and the old rl-agents hyperparameters (it does not seem to change much)
  • Finally, red is a run with the current version of rl-agents (and the old hyperparameters), but with the version of highway-env from May 2019

It seems that the environment has become more difficult to solve, though I do not know why.
This could be due to changes:

  1. in the vehicle dynamics?
  2. in the vehicles initialization / density?
  3. in the reward function?

It seems that 1. has not really changed, 2. has changed a little, and 3. has a minor change.

I will investigate, and maybe even git bisect if I cannot find any meaningful difference in the code.

@eleurent
Collaborator

I found out why the current version of highway-env is more difficult than it used to be:

  • previously, other vehicles were initialized with a target speed in the [23, 25] m/s range
  • now, they are much slower, with a target speed in the [14, 20] m/s range,

which explains why the agent tends to get more collisions.

This is due to the speed limit of the road, which is set to 20 m/s by default, whereas 30 m/s would be more appropriate. I will restore this value.
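
A quick way to check this on your own install is to inspect the target speeds of the surrounding vehicles after a reset. This is a sketch that assumes the usual highway-env attributes (road.vehicles and target_speed), which may differ between versions:

import gym
import highway_env  # noqa: F401  (registers the highway-v0 environment)

env = gym.make("highway-v0")
env.reset()
# Print the target speed of every vehicle on the road (ego vehicle included)
print([getattr(v, "target_speed", None) for v in env.unwrapped.road.vehicles])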

@rodrigogutierrezm
Author

Thank you for all the information. I have been able to reproduce the training and the charts. In order to test the results, I run

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10

But the performance of the ego vehicle is not good. I am not sure whether I am using the trained model; is there a way to specify which model to use?

@eleurent
Collaborator

You simply need to add the --recover option, or --recover-from=path/to/model.tar, to load a trained model before evaluating.
The --recover option loads scripts/out/<Env>/<Agent>/saved_models/latest.tar by default (which is updated during training).
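
So the evaluation command from the previous message becomes, for example:

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10 --recover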

@rodrigogutierrezm
Author

Hello, I was able to replicate your results. One last question: when you select the dueling_ddqn agent, a type is defined in the model configuration ("DuelingNetwork"). Where is this type created?
Thank you very much.

@eleurent
Collaborator

eleurent commented Apr 23, 2021

@rodrigogutierrezm
Author

Thank you for everything.
