GPU Usage #159

Closed
mwcvitkovic opened this issue Sep 5, 2018 · 3 comments · Fixed by #161

Comments

@mwcvitkovic
Contributor

I'm running the dqn_BeamRider-v4 trial in the attached spec file in train mode. (It's basically identical to the dqn_boltzmann_breakout trial in the dqn.json spec file, but with the BeamRider env.)

I'm running 4 sessions at once on a 4-GPU machine. For the first ~5 episodes of each session, all 4 GPUs are used correctly, one per session (GPU 0 has a couple of extra processes on it, which I'm assuming is normal). But gradually, by around ~100 episodes, all the GPU processes disappear, and nothing is running on the GPUs anymore. The training process never crashes or finishes during this; its CPU usage drops to 0 and it just sits there.

Any ideas what's going on?

openai_baseline_reproduction_dqn copy.txt
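
For reference, here is a minimal sketch of how such a spec could be produced: copy the dqn_boltzmann_breakout entry from dqn.json and swap the env name to BeamRider-v4. The file paths and the "env"/"name" keys below are assumptions based on the description above, not a verified SLM Lab spec schema.

```python
import json

# Hypothetical path to the stock spec file; adjust to the actual repo layout.
with open('slm_lab/spec/dqn.json') as f:
    specs = json.load(f)

# Reuse the Breakout trial's settings, changing only the environment name.
beamrider_spec = specs['dqn_boltzmann_breakout']
for env in beamrider_spec.get('env', []):
    env['name'] = 'BeamRider-v4'

# Write the derived spec out under a new trial name.
with open('openai_baseline_reproduction_dqn.json', 'w') as f:
    json.dump({'dqn_BeamRider-v4': beamrider_spec}, f, indent=2)
```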

@mwcvitkovic
Contributor Author

My best guess is that it has to do with saving the episode, but I'm not at all sure.

@kengz
Owner

kengz commented Sep 5, 2018 via email

@kengz
Owner

kengz commented Sep 6, 2018

Identified the root cause: the save method's argument was accidentally removed, so each process actually crashes when it tries to save.
The PR above will fix this, along with some extra improvements to logging verbosity.
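
For illustration, a minimal sketch of that failure mode with hypothetical names (save, run_session), not SLM Lab's actual code: if a parameter is dropped from save() while callers still pass it, each session process dies with a TypeError at its first checkpoint, releasing its GPU, and a parent that waits on a result queue then sits idle with 0 CPU, which matches the symptoms reported above.

```python
import multiprocessing as mp

def save(agent):              # second parameter accidentally removed
    pass

def run_session(session_id, result_queue):
    for episode in range(200):
        ...                   # training steps that keep the GPU busy
        if episode and episode % 100 == 0:
            # caller still uses the old signature -> TypeError kills the process
            save(agent=None, ckpt=f'session_{session_id}')
    result_queue.put(session_id)  # never reached

if __name__ == '__main__':
    queue = mp.Queue()
    procs = [mp.Process(target=run_session, args=(i, queue)) for i in range(4)]
    for p in procs:
        p.start()
    # Blocks forever: every child has crashed before putting a result,
    # so the parent neither crashes nor finishes.
    results = [queue.get() for _ in procs]
```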
