Hi,
Thanks so much for your work. I have a question about how the results are normalized. In the Gym domain, for example, each result is normalized against the expert policy (SAC) and the random policy. Which reference numbers should we use? The Wiki page "Off policy evaluation" has a table that lists expert-policy and random-policy scores; should we refer to these? Also, the expert-policy results there differ from the SAC results in Table 3 (ICLR), so which one should we use?
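To make the question concrete, here is a minimal sketch of the normalization as I understand it: a raw return is rescaled so that the random policy maps to 0 and the expert policy maps to 100. The function name `normalized_score` and the two reference values below are placeholders of my own, not numbers from the Wiki or the paper; which actual reference values to plug in is exactly what I am asking about.

```python
def normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Rescale a raw return so the random policy maps to 0 and the expert to 100."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Placeholder reference returns (assumptions for illustration only,
# NOT the values from the Wiki table or from Table 3).
RANDOM_RETURN = 0.0
EXPERT_RETURN = 1000.0

# A raw return halfway between the two references.
print(normalized_score(500.0, RANDOM_RETURN, EXPERT_RETURN))
```

With a different expert reference (for example the SAC number from Table 3 instead of the Wiki one), the same raw return yields a different normalized score, which is why the choice matters when comparing against the published tables.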
I also noticed that the CQL result for 'hopper-medium' does not seem to be consistent between Table 2 and Table 3 (ICLR). Could you please confirm this (and perhaps also check CQL on 'walker2d-medium')?
Thanks.