InvalidArgumentError on terminal observe call #639
Comments
Hi,
That definitely sounds correct; it's entirely possible it's running over. I'm not sure when I'll be able to test it, but I will get back to you ASAP.
Okay, I hadn't set memory, so according to the docs it should have been the minimum. But I just increased it to well over what it needs to be, and it appears to be working. Thanks. It's kind of a confusing error, though.
Great. Yes, you're right, the exception messages need to be improved. Can you post the agent and environment specs, just to double-check?
Yep! Here's the call I make to create the agent:
Most of the hyperparameters are arbitrarily set so I can quickly debug and make sure everything is working. The environment itself isn't an OpenAI Gym environment or anything like that; I just call act + observe directly, at up to 60 calls per second, based on the game's frame data. On the topic of obscure exception messages: during training it seems to get through all of the updates (which takes quite a few minutes and considerable resources), but eventually it crashes with a message like:
Increasing memory and changing batch size had no effect.
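For illustration, here is a minimal sketch of the kind of setup described above: a PPO agent driven by direct `act`/`observe` calls with no Gym wrapper. The state/action specs, hyperparameters, and the `get_game_state`/`step_game` helpers are assumptions, not the poster's actual code.

```python
from tensorforce.agents import Agent

# Illustrative agent spec only; the original call was not preserved here.
agent = Agent.create(
    agent='ppo',
    states=dict(type='float', shape=(10,)),   # assumed state spec
    actions=dict(type='int', num_values=4),   # assumed action spec
    max_episode_timesteps=18000,              # episode cap mentioned later in the thread
    batch_size=10,
)

# Direct act/observe loop: the game itself supplies states and rewards,
# invoked at most 60 times per second by the surrounding game loop.
states = get_game_state()  # hypothetical helper reading the game's frame data
terminal = False
while not terminal:
    actions = agent.act(states=states)
    states, reward, terminal = step_game(actions)  # hypothetical game-step helper
    agent.observe(reward=reward, terminal=terminal)
```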
That's weird; the index it's complaining about is completely off. The exception message comes from TensorFlow; ideally Tensorforce should catch these earlier (and generally provide more meaningful messages than it does right now). While I don't see how this would cause the error, one general comment:
Yes, when I set the memory to way higher than it needed to be, the error still occurred.
Okay, so as it turns out, the first error was just a mistake on my part. I assumed the game capped at 60 fps, but it caps at 120, so it was getting about double the number of calls expected. Once I fixed that, I was able to comment out the memory setting, but the error during training remained:
This, again, might be a setup error on my part, but I can open a new issue for this if you'd like.
Let's discuss it here, at least until we've identified the problem in more detail. Do you make sure the episode is terminated after at most 18000 timesteps? Maybe you can also run your config on CartPole or similar, to see whether the same exception occurs; in that case I could reproduce it.
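A reproduction attempt along these lines, using Tensorforce's Gym wrapper and Runner, might look like the sketch below; the network and hyperparameters are placeholders to be swapped for the failing config:

```python
from tensorforce.agents import Agent
from tensorforce.environments import Environment
from tensorforce.execution import Runner

# CartPole stands in for the custom game environment.
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)

# Placeholder config: substitute the network/hyperparameters that fail.
# A recurrent layer is included since the error seems to involve the RNN.
agent = Agent.create(
    agent='ppo',
    environment=environment,
    batch_size=10,
    network=[dict(type='dense', size=32), dict(type='lstm', size=32)],
)

runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=200)
runner.close()
```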
I recreated it with this exact code:
Outputs:
Does this have something to do with the fact that I'm running it on Windows? As I mentioned on Gitter, that means I can't use tfa; I'm not sure if that's causing any problems.
Update: Removing the RNN seems to have made the error go away completely. Like so:
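For illustration, the change described would look something like the following sketch; the layer names and sizes are assumptions, not the original spec:

```python
# Before (assumed): a network spec containing a recurrent layer,
# which triggered the exception during the PPO update.
# network = [dict(type='dense', size=32), dict(type='lstm', size=32)]

# After: the same network with the RNN layer replaced by a dense layer;
# the error no longer occurs.
network = [dict(type='dense', size=32), dict(type='dense', size=32)]
```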
Thanks, that's helpful. I will look into this ASAP.
Okay, I had a look at this now, and unfortunately it turns out to be a rather annoying problem that I have so far not realised: The subsampling-optimizer as part of the PPO optimization randomly subsamples from the batch, wrongly assuming the same shape everywhere, but in the case of RNNs each timestep state actually involves a sequence of preceeding states as well (which are then processed by the RNN to one single embedding per timestep). It should be possible to fix this, but it will certainly take a few days. In the meantime, you could consider just removing the subsampling-optimizer here (by commenting this line). This shouldn't really affect performance on the whole (in theory at least), but it may negatively affect runtime and memory usage. Thanks for catching and reporting this problem! |
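Conceptually, the PPO update optimizer is a chain like the sketch below, and the workaround drops the subsampling wrapper. The module names and values here are a rough sketch of the 0.5.x internals, treated as assumptions rather than the exact source:

```python
# With subsampling (default): each update first draws a random fraction
# of the batch, which wrongly assumes one flat shape per timestep and
# breaks when RNN timesteps carry their preceding-state sequences.
optimizer = dict(
    type='subsampling_step',
    fraction=0.33,
    optimizer=dict(
        type='multi_step', num_steps=10,
        optimizer=dict(type='adam', learning_rate=1e-3),
    ),
)

# Workaround: remove the subsampling wrapper so every update uses the
# full batch, trading extra runtime/memory for consistent shapes.
optimizer = dict(
    type='multi_step', num_steps=10,
    optimizer=dict(type='adam', learning_rate=1e-3),
)
```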
Awesome, glad to help!
I have encountered the same kind of error, but I haven't used any RNN.
The agent was created using the following code:
Hey @qZhang88, try making sure your memory is large enough.
Will try it, but I found it might be related to batch size: it only ran 11 episodes (counting from 0 to 10), and when episode 10 ended, the error occurred. I also tried setting the batch size to 4, and it works fine with no error.
Make sure your memory is greater than `(batch_size + 1) * max_episode_timesteps`.
@porgull's comments should help. Generally, one needs to make sure that the memory is big enough to fit enough timesteps, which in the case of episode-mode batching would be `(batch_size + 1) * max_episode_timesteps`. Moreover, one needs to make sure that the environment terminates within `max_episode_timesteps`. If you use `Environment.create(..., max_episode_timesteps=...)`, episodes are terminated automatically after that many timesteps.
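As a concrete sketch of that sizing rule (values are placeholders):

```python
from tensorforce.agents import Agent

batch_size = 10              # episodes per update (placeholder)
max_episode_timesteps = 200  # bound the environment must respect (placeholder)

# Episode-mode rule of thumb from above: the memory must hold
# batch_size + 1 complete episodes' worth of timesteps.
memory = (batch_size + 1) * max_episode_timesteps

agent = Agent.create(
    agent='ppo',
    states=dict(type='float', shape=(4,)),   # assumed specs
    actions=dict(type='int', num_values=2),
    max_episode_timesteps=max_episode_timesteps,
    batch_size=batch_size,
    memory=memory,
)
```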
Thanks @AlexKuhnle and @porgull, it was indeed the memory problem. Now I understand the error message: `update-batch-size.value` should be `<= memory.retrieve_episodes`.
@porgull, I think the problem you mentioned should be fixed in the latest commit (I could only run a few tests right now, so I hope I didn't introduce another problem :-). I also tried to improve/introduce exception messages that hopefully explain the problem a bit better.
Perhaps something is wrong with my code, but almost half the time when the episode ends, I get an assertion error when I run `observe` on my PPO agent.

My original theory was that I was accidentally calling `observe` again after setting `terminal=True` and before resetting the agent, or committing some other abuse of `observe`, but I prevented that from happening in my code, so I don't believe that's the case. Also, the episode runs completely fine, and I get through thousands of calls to `observe` without ever running into any issues. It's only when `terminal=True` that the error seems to occur.

Running on Windows 10 x64, with `tensorflow-gpu` v2.0.0 on an RTX 2070, and Tensorforce installed from GitHub at commit 827febc.
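For reference, a sketch of the intended `act`/`observe` pattern, with `terminal=True` passed exactly once per episode before the next reset; CartPole and the hyperparameters are placeholders:

```python
from tensorforce.agents import Agent
from tensorforce.environments import Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

for _ in range(100):  # episodes
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        # terminal is passed on the episode's final observe, and never
        # again before the next reset.
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
```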