MineRLNavigate-v0 Trajectory continues for 164 frames after termination #549
Comments
Never mind, there was a bug in my data processor. The user does receive the reward only once, but the episode continues for another 164 frames past the point where it should have terminated.
Could you post the name of this demonstration? (Of the form
v3_absolute_grape_changeling-47_826-1734
There are also quite a few problems with the MineRLTreechop-v0 data that I think should be pointed out. In the trajectory labeled "v3_colorless_mung_bean_dragon-19_716-3781", the user accumulates a total reward of 68, although the episode should terminate once the reward reaches 64. There are over 20 other trajectories in which the user receives a cumulative reward of 65, one more than the theoretical maximum. I won't list the identifiers of all of these here.
Thanks for pointing this out. I'll keep an eye out for these problems when I process and upload the next version of the dataset (I expect that to happen within the next two weeks). Any idea why we might be seeing this sort of behavior (both the reward=65 and the "navigation not ending for 100 frames")? @brandonhoughton In the case of "navigation not ending for 100 frames", I think this is probably just corrupted demonstration data. The quickest solution is probably to mark that particular trajectory name as corrupted.
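As a stopgap on the consumer side, marking a trajectory as corrupted can be as simple as filtering by name before training. The sketch below is only an illustration: `CORRUPTED_TRAJECTORIES` and `drop_corrupted` are hypothetical names, not part of the MineRL API, and the second name in the usage lines is a made-up placeholder.

```python
# Hypothetical denylist of trajectory names reported as corrupted in this thread.
CORRUPTED_TRAJECTORIES = {
    "v3_absolute_grape_changeling-47_826-1734",
}


def drop_corrupted(names, corrupted=CORRUPTED_TRAJECTORIES):
    """Return the trajectory names that are not on the denylist, in order."""
    return [name for name in names if name not in corrupted]


# Usage: "example_ok_trajectory" is a placeholder, not a real demonstration name.
kept = drop_corrupted(
    ["v3_absolute_grape_changeling-47_826-1734", "example_ok_trajectory"]
)
```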
It could be that the human placed a block and got the reward a second time for mining a log.
That makes sense, since what we are actually checking to create the episode termination condition is "there are 64 logs in the inventory".
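To make the mismatch concrete, here is a toy simulation, not the actual environment or pipeline code. It assumes a reward of +1 per log mined and termination when the inventory holds 64 logs; the function name and event strings are made up for illustration.

```python
def simulate_treechop(events, target_logs=64):
    """Toy model: reward counts logs mined, termination checks the inventory.

    Returns (total_reward, step_of_termination or None).
    """
    inventory = 0
    total_reward = 0
    for step, event in enumerate(events):
        if event == "mine":
            inventory += 1
            total_reward += 1  # assumed: +1 reward for every log mined
        elif event == "place" and inventory > 0:
            inventory -= 1  # placing a log removes it from the inventory
        if inventory >= target_logs:
            return total_reward, step
    return total_reward, None


# Mine 63 logs, place one back in the world, then mine two more.
events = ["mine"] * 63 + ["place", "mine", "mine"]
total, end_step = simulate_treechop(events)
```

Under these assumptions the episode ends with a cumulative reward of 65 rather than 64, because the placed log is rewarded again when it is re-mined while the inventory count only reaches 64 one step later.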
I (and likely Cody) don't have the bandwidth to understand the relevant parts of the Dataset Pipeline and modify them to fix this issue right now.
There is a trajectory in MineRLNavigate-v0 where the user receives a cumulative reward of 16400, although the maximum cumulative reward in this env should be 100. Upon touching the diamond, the user receives the sparse reward signal 164 times, but the episode should terminate immediately upon making contact with the target block.
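A check along these lines can flag such trajectories. This is a minimal sketch that assumes the per-frame rewards of one trajectory are already loaded into a plain Python list; `audit_rewards` is a hypothetical helper, not part of the MineRL API, and the reward list in the usage lines is made up.

```python
from itertools import accumulate


def audit_rewards(rewards, max_total):
    """Summarise one trajectory's reward sequence against the expected maximum."""
    running = list(accumulate(rewards))
    total = running[-1] if running else 0
    # Index of the first frame at which the cumulative reward reaches the maximum.
    cap_index = next((i for i, r in enumerate(running) if r >= max_total), None)
    # Number of frames recorded after that point (None if the cap is never reached).
    frames_after_cap = None if cap_index is None else len(rewards) - 1 - cap_index
    return {
        "total": total,
        "exceeds_max": total > max_total,
        "frames_after_cap": frames_after_cap,
    }


# Made-up example: the sparse reward of 100 repeats on three consecutive frames.
report = audit_rewards([0, 0, 100, 100, 100], max_total=100)
```

A trajectory with `exceeds_max` set or a non-zero `frames_after_cap` would be a candidate for closer inspection.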