MineRLNavigate-v0 Trajectory continues for 164 frames after termination #549

Open
ryanrudes opened this issue Jul 8, 2021 · 8 comments

@ryanrudes

There is a trajectory in MineRLNavigate-v0 where the user receives a cumulative reward of 16400, although the maximum cumulative reward in this environment should be 100. Upon touching the diamond, the user receives the sparse reward signal 164 times, but the episode should terminate immediately on contact with the target block.

@ryanrudes
Author

Never mind, there was a bug in my data processor. The user does receive the reward only once, but the episode continues for another 164 frames after the point where it should have terminated.
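For anyone who wants to reproduce this check, here is a minimal sketch. It assumes the per-frame rewards of one trajectory are already loaded into a plain Python list; `frames_after_first_reward` is a hypothetical helper, not part of MineRL, and the data below is illustrative only.

```python
def frames_after_first_reward(rewards):
    """Return how many frames follow the first nonzero reward.

    Returns None if the trajectory contains no reward at all.
    """
    for index, reward in enumerate(rewards):
        if reward != 0:
            return len(rewards) - index - 1
    return None


# Illustrative data only (not read from the dataset):
# one sparse reward followed by 164 trailing frames.
rewards = [0.0] * 50 + [100.0] + [0.0] * 164
print(frames_after_first_reward(rewards))  # 164
```

In a sparse-reward navigation episode that terminates on contact with the target, this count would be expected to be 0 or very close to it.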

@shwang
Member

shwang commented Jul 8, 2021

Could you post the name of this demonstration? (Of the form adjective-plant-animal-#-start_timestamp-end_timestamp)

@ryanrudes
Author

v3_absolute_grape_changeling-47_826-1734

@ryanrudes ryanrudes reopened this Jul 9, 2021
@ryanrudes ryanrudes changed the title Cumulative reward of 16400 in a MineRLNavigate-v0 trajectory Trajectory continues for 164 frames after termination Jul 9, 2021
@ryanrudes ryanrudes changed the title Trajectory continues for 164 frames after termination MineRLNavigate-v0 Trajectory continues for 164 frames after termination Jul 9, 2021
@ryanrudes
Author

There were also quite a few problems with the MineRLTreechop-v0 data that I think should be pointed out.

In the trajectory labeled "v3_colorless_mung_bean_dragon-19_716-3781", the user accumulates a total reward of 68, although the episode should terminate once the total reaches 64.

There are over 20 other trajectories in which the user receives a cumulative reward of 65, one more than the theoretical maximum. I won't list the identifiers of all of them here.
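A scan like the following is one way to surface such trajectories. It is only a sketch: it assumes the per-frame rewards have already been collected into a dict keyed by trajectory name, `over_max_return` is a hypothetical helper, and the reward lists are made up to match the totals described above rather than read from the dataset.

```python
def over_max_return(rewards_by_name, max_return):
    """Map each trajectory name to its total reward if that total exceeds max_return."""
    flagged = {}
    for name, rewards in rewards_by_name.items():
        total = sum(rewards)
        if total > max_return:
            flagged[name] = total
    return flagged


# Illustrative reward lists only; the second name is a placeholder.
rewards_by_name = {
    "v3_colorless_mung_bean_dragon-19_716-3781": [1.0] * 68,
    "hypothetical_trajectory_within_limit": [1.0] * 64,
}
print(over_max_return(rewards_by_name, max_return=64))
```

Only the first entry is flagged, since its total of 68 exceeds the limit of 64.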

@shwang
Member

shwang commented Jul 9, 2021

Thanks for pointing this out -- I'll keep an eye out for these problems when I process and upload the next version of the dataset (I expect that this will happen within the next two weeks).

Any idea why we might be seeing this sort of behavior (both the reward=65 and the "navigation not ending for 164 frames")? @brandonhoughton

In the "navigation not ending for 164 frames" case, I'm thinking this is probably just corrupted demonstration data. The quickest solution here is probably to mark that particular trajectory name as corrupted.
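Until the dataset itself is fixed, the same idea can be applied on the consumer side. This is a rough sketch, assuming trajectory names are available as plain strings; `CORRUPTED_TRAJECTORIES` and `usable_trajectories` are hypothetical names, not part of MineRL or the dataset pipeline.

```python
# Names reported as problematic in this thread; extend as more are found.
CORRUPTED_TRAJECTORIES = {
    "v3_absolute_grape_changeling-47_826-1734",
}


def usable_trajectories(names, corrupted=CORRUPTED_TRAJECTORIES):
    """Return the trajectory names that are not on the exclusion list, in order."""
    return [name for name in names if name not in corrupted]


# The second name is a placeholder for illustration.
names = [
    "v3_absolute_grape_changeling-47_826-1734",
    "hypothetical_clean_trajectory",
]
print(usable_trajectories(names))
```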

@brandonhoughton
Member

It could be that the human placed a block and got a reward a second time for mining a log.

@shwang
Member

shwang commented Jul 9, 2021

> It could be that the human placed a block and got a reward a second time for mining a log.

That makes sense, since the episode termination condition actually checks whether there are 64 logs in the inventory.
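A toy model makes the mismatch concrete. It is not the actual MineRL logic, only a sketch under the assumptions stated in this thread: each mined log gives +1 reward, placing a log removes one from the inventory, and the episode ends once the inventory holds 64 logs. The `simulate` function and its event names are hypothetical.

```python
def simulate(events, target_logs=64):
    """Replay "mine"/"place" events; return (total_reward, logs_in_inventory)."""
    inventory = 0
    total_reward = 0
    for event in events:
        if event == "mine":
            inventory += 1
            total_reward += 1  # assumed: reward is granted per log mined
        elif event == "place" and inventory > 0:
            inventory -= 1  # placing a log does not take any reward back
        if inventory >= target_logs:
            break  # assumed: termination checks the inventory count only
    return total_reward, inventory


# Placing one log and mining it again pushes the total reward to 65.
print(simulate(["mine"] * 10 + ["place"] + ["mine"] * 60))  # (65, 64)
```

Under the same assumptions, four placed logs would give a total of 68, which matches the trajectory reported above.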

@shwang shwang self-assigned this Jul 9, 2021
@shwang shwang assigned decodyng and unassigned shwang and decodyng Jul 23, 2021
@shwang
Member

shwang commented Jul 23, 2021

I (and likely Cody) don't have the bandwidth to understand the relevant parts of the Dataset Pipeline and modify them to fix this issue right now.
