This repository has been archived by the owner on Nov 10, 2023. It is now read-only.

NaNs in VAC and VPG with Gaussian Policies #4

Open
samuelfneumann opened this issue Mar 31, 2022 · 0 comments

Comments

@samuelfneumann
Owner

Both VAC and VPG produce NaNs in the weights when training with Gaussian policies. After some digging, the cause appears to be the standard deviation approaching 0 or infinity, so we should clamp the standard deviation to a sensible range.
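As a rough illustration of the clamping idea, here is a minimal sketch in plain Python for a 1-D Gaussian. It assumes the policy outputs a log standard deviation; the names `LOG_STD_MIN`, `LOG_STD_MAX`, `clamp`, and `gaussian_log_prob` and the bound values are hypothetical and do not come from this repository.

```python
import math

# Hypothetical bounds on the log standard deviation (assumed values, tune as needed).
LOG_STD_MIN = -20.0
LOG_STD_MAX = 2.0


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a 1-D Gaussian, with log_std clamped before use."""
    log_std = clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
    std = math.exp(log_std)  # now bounded away from 0 and infinity
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)
```

With the clamp in place, extreme values of `log_std` still give a finite log-probability. Whether to clamp the standard deviation itself or its logarithm, and which bounds to use, is a design choice that this sketch does not settle.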

Of course, there may be other causes of NaNs in the weights besides this one. Notably, the problem only occurs on some runs. Many runs (especially with VPG) learn quite well, which suggests that the issue is a numerical instability.
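Since the problem only shows up on some runs, one way to narrow it down might be to check the weights after each update and stop at the first non-finite value. A minimal sketch, assuming the weights can be flattened into a list of floats (`first_non_finite` is a hypothetical helper, not part of this repository):

```python
import math


def first_non_finite(params):
    """Return the index of the first NaN or infinite entry in params, or None."""
    for i, w in enumerate(params):
        if not math.isfinite(w):
            return i
    return None
```

Calling this after every gradient step and logging the step number, mean, and standard deviation when it returns an index could help confirm whether the standard deviation is actually the culprit on the failing runs.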
