Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including last step in OPE #12619

Merged
merged 2 commits into from
Dec 8, 2020
Merged

Conversation

felipeeeantunes
Copy link
Contributor

@felipeeeantunes felipeeeantunes commented Dec 4, 2020

Why are these changes needed?

The problem was cited in this discussion topic and also discussed with Sven in a private Slack channel:

We noticed that the code discards the last transition when calculating the WIS metrics since it uses range(batch.count - 1) in the for loop (lines 27 and 43 of this file).
We were already discarding the last state in our batch training because the new_obs variable would be a list of NAs. When we are at the last state we simply cannot observe the new_obs variable. This would probably not happen if we were using a traditional simulator for the environment like in the Cartpole example, as even the last state has a new_obs provided by the environment.
It seems that the code was considering a scenario with only episodic trajectories. If this is the case, should we consider adjusting our batches to make sure they include the last state (even with the new_obs being a list of NAs)?

Sven response was:

Each row should have: o1, a1, r1, d1, o2: Where o1=initial obs; a1=action taken in o1; r1=reward received after taking a1 in o1; d1=[is o2 a terminal state?]; o2=next obs after taking a1 in o1 and receiving r1.
Yes, you should always be able to set o2 to 0.0 (better than NaNs), and then make sure done=True (so DQN will ignore the last state). Would that work for you?
Now that I'm seeing this, though, I'm not sure either, why we cut off the last item in the batch. If the SampleBatch contains the new_obs field, this should not be necessary and we would lose the last transition.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@felipeeeantunes
Copy link
Contributor Author

@sven1977 can you take a look?


# calculate stepwise IS estimate
V_prev, V_step_IS = 0.0, 0.0
for t in range(batch.count - 1):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also be changed to batch.count rather than batch.count-1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be changed. Thank you @matthewhall210


# calculate stepwise weighted IS estimate
V_prev, V_step_WIS = 0.0, 0.0
for t in range(batch.count - 1):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also be changed to batch.count rather than batch.count-1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be changed. Thank you @matthewhall210

@richardliaw
Copy link
Contributor

thanks a bunch for opening this PR :)

@sven1977 sven1977 self-assigned this Dec 8, 2020
Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix @felipeeeantunes !

@sven1977 sven1977 merged commit 4c0f0ce into ray-project:master Dec 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants