Including last step in OPE #12619

felipeeeantunes · 2020-12-04T12:20:04Z

Why are these changes needed?

The problem was raised in this discussion topic and also discussed with Sven in a private Slack channel:

We noticed that the code discards the last transition when calculating the WIS metrics, because it uses range(batch.count - 1) in the for loop (lines 27 and 43 of this file).
We were already discarding the last state in our batch training because the new_obs variable would be a list of NAs: at the last state there is simply no next observation to record. This would probably not happen with a traditional simulator for the environment, as in the Cartpole example, where even the last state has a new_obs provided by the environment.
It seems the code was written with only episodic trajectories in mind. If so, should we adjust our batches so that they include the last state (even with new_obs being a list of NAs)?
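To make the off-by-one concrete, here is a minimal sketch of a stepwise importance-sampling estimate over plain Python lists. It is not the RLlib estimator: the function name stepwise_is and its n_steps parameter are hypothetical, and the ratio and discounting logic is an assumption about what such an estimator typically computes. Passing n_steps = len(rewards) - 1 mimics range(batch.count - 1).

```python
def stepwise_is(rewards, old_prob, new_prob, gamma, n_steps):
    """Hypothetical stepwise IS estimate over the first n_steps transitions.

    rewards, old_prob and new_prob are per-step lists of equal length.
    old_prob entries are assumed to be strictly positive.
    Returns (value under the behaviour policy, IS-corrected value).
    """
    # Cumulative importance ratios: p[t] = prod_{k<=t} new_prob[k] / old_prob[k]
    p = []
    for t in range(n_steps):
        pt_prev = 1.0 if t == 0 else p[t - 1]
        p.append(pt_prev * new_prob[t] / old_prob[t])

    # Discounted returns, plain and reweighted by the cumulative ratio
    v_prev, v_step_is = 0.0, 0.0
    for t in range(n_steps):
        discount = gamma ** t
        v_prev += rewards[t] * discount
        v_step_is += p[t] * rewards[t] * discount
    return v_prev, v_step_is


# Illustrative data: the large reward arrives on the final transition.
rewards = [1.0, 1.0, 10.0]
old_prob = [0.5, 0.5, 0.5]
new_prob = [0.5, 0.5, 0.5]

truncated = stepwise_is(rewards, old_prob, new_prob, 1.0, len(rewards) - 1)
full = stepwise_is(rewards, old_prob, new_prob, 1.0, len(rewards))
```

With these made-up inputs the truncated loop returns (2.0, 2.0) while the full loop returns (12.0, 12.0), so the final reward never reaches the estimate when the last index is skipped.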

Sven's response was:

Each row should have: o1, a1, r1, d1, o2, where o1 = initial obs; a1 = action taken in o1; r1 = reward received after taking a1 in o1; d1 = [is o2 a terminal state?]; o2 = next obs after taking a1 in o1 and receiving r1.
Yes, you should always be able to set o2 to 0.0 (better than NaNs), and then make sure done=True (so DQN will ignore the last state). Would that work for you?
Now that I see this, though, I'm not sure either why we cut off the last item in the batch. If the SampleBatch contains the new_obs field, this should not be necessary, and we would lose the last transition.
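The suggestion above (zero out o2 and set done=True on the final row) could be sketched roughly as follows. This works on plain dictionaries rather than on a SampleBatch; the helper name pad_last_transition and the key names obs, action, reward, done and new_obs are assumptions chosen to mirror the o1, a1, r1, d1, o2 layout, not an RLlib API.

```python
def pad_last_transition(rows):
    """Return a copy of rows whose final row has a zeroed new_obs and done=True.

    Each row is a dict with the (assumed) keys
    "obs", "action", "reward", "done" and "new_obs".
    The input list and its dicts are left unmodified.
    """
    if not rows:
        return []
    fixed = [dict(row) for row in rows]  # shallow copy of every row
    last = fixed[-1]
    # Replace the unobservable next observation with zeros of the same size as obs
    last["new_obs"] = [0.0] * len(last["obs"])
    # Mark the row as terminal so the placeholder next state is not bootstrapped from
    last["done"] = True
    return fixed


nan = float("nan")
logged = [
    {"obs": [0.1, 0.2], "action": 1, "reward": 1.0, "done": False, "new_obs": [0.3, 0.4]},
    {"obs": [0.3, 0.4], "action": 0, "reward": 2.0, "done": False, "new_obs": [nan, nan]},
]
padded = pad_last_transition(logged)
```

Only the last row changes; earlier rows keep their recorded new_obs and done values, and the final reward stays in the batch for the estimators to use.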

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

felipeeeantunes · 2020-12-04T14:32:42Z

@sven1977 can you take a look?

matthewhall210 · 2020-12-04T19:02:44Z

rllib/offline/is_estimator.py

+
+        # calculate stepwise IS estimate
+        V_prev, V_step_IS = 0.0, 0.0
+        for t in range(batch.count - 1):


Shouldn't this also be changed to batch.count rather than batch.count-1?

felipeeeantunes · 2020-12-05

Yes, it should be changed. Thank you, @matthewhall210.

matthewhall210 · 2020-12-04T19:02:53Z

rllib/offline/wis_estimator.py

+
+        # calculate stepwise weighted IS estimate
+        V_prev, V_step_WIS = 0.0, 0.0
+        for t in range(batch.count - 1):


Shouldn't this also be changed to batch.count rather than batch.count-1?

felipeeeantunes · 2020-12-05

Yes, it should be changed. Thank you, @matthewhall210.
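For the weighted variant, the same index range has to be used both when the per-step ratios are accumulated and when the estimate is summed, otherwise the last transition is still lost. The sketch below is a guess at the general shape of a stepwise weighted IS estimator, not the code in wis_estimator.py: the class name StepwiseWIS and the attributes ratio_sums and ratio_counts are hypothetical, and normalising by a running per-step average of the ratios is an assumption.

```python
class StepwiseWIS:
    """Hypothetical stepwise weighted IS estimator over plain Python lists."""

    def __init__(self, gamma):
        self.gamma = gamma
        self.ratio_sums = []    # per-step sum of cumulative ratios seen so far
        self.ratio_counts = []  # per-step number of episodes that reached step t

    def estimate(self, rewards, old_prob, new_prob):
        """Return (behaviour value, WIS value) for one episode.

        All probabilities are assumed to be strictly positive.
        """
        n = len(rewards)  # every transition, including the last one

        # Cumulative importance ratios for this episode
        p = []
        for t in range(n):
            pt_prev = 1.0 if t == 0 else p[t - 1]
            p.append(pt_prev * new_prob[t] / old_prob[t])

        # Update the running per-step statistics used for normalisation
        for t, ratio in enumerate(p):
            if t >= len(self.ratio_sums):
                self.ratio_sums.append(ratio)
                self.ratio_counts.append(1.0)
            else:
                self.ratio_sums[t] += ratio
                self.ratio_counts[t] += 1.0

        # Sum over the same range as the ratio loop above
        v_prev, v_step_wis = 0.0, 0.0
        for t in range(n):
            discount = self.gamma ** t
            w_t = self.ratio_sums[t] / self.ratio_counts[t]
            v_prev += rewards[t] * discount
            v_step_wis += p[t] / w_t * rewards[t] * discount
        return v_prev, v_step_wis


estimator = StepwiseWIS(gamma=1.0)
first_episode = estimator.estimate([1.0, 2.0, 4.0], [0.5, 0.5, 0.5], [1.0, 0.5, 0.5])
```

For the first episode seen, each ratio is normalised by itself, so both returned values equal the plain discounted return (7.0 for the made-up rewards here), with the final reward of 4.0 included.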

richardliaw · 2020-12-04T19:55:09Z

thanks a bunch for opening this PR :)

sven1977

Thanks for this fix @felipeeeantunes !

Including last step in OPE

7e99ad8

matthewhall210 reviewed Dec 4, 2020

Fix suggested in PR

6d31b64

sven1977 self-assigned this Dec 8, 2020

sven1977 approved these changes Dec 8, 2020

sven1977 merged commit 4c0f0ce into ray-project:master Dec 8, 2020
