Hi, we compiled the Atari 100k results for DrQ, CURL, and DER, and the mean and median human-normalized scores are well below those reported in prior work, including work by co-authors of the rliable paper.
Our median human-normalized scores are all around 0.10 to 0.12.
Is this accurate? Of the three, DER (the oldest of the algorithms) has the highest mean human-normalized score.
That doesn't seem right. The aggregate scores should match those in the figure below (which uses 10 runs); you can reproduce them using the Colab at bit.ly/statistical_precipice_colab:
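When aggregate numbers come out lower than expected, it can help to double-check the normalization step by hand. Below is a minimal sketch of how human-normalized scores and their mean/median are commonly computed: normalize each raw score against per-game random and human reference scores, average over runs per game, then aggregate across games. The game names, reference scores, and raw scores are hypothetical placeholders (not real Atari values), and the helper names `human_normalized` and `aggregate` are made up for illustration; they are not part of rliable.

```python
from statistics import mean, median

# Hypothetical per-game reference scores (placeholders, NOT real Atari values).
RANDOM_SCORE = {"game_a": 10.0, "game_b": 0.0, "game_c": 100.0}
HUMAN_SCORE = {"game_a": 110.0, "game_b": 50.0, "game_c": 300.0}


def human_normalized(game, raw_score):
    """Map a raw score so that random play is 0.0 and human play is 1.0."""
    rnd, hum = RANDOM_SCORE[game], HUMAN_SCORE[game]
    return (raw_score - rnd) / (hum - rnd)


def aggregate(raw_scores):
    """raw_scores maps game name -> list of raw scores, one per run.

    Assumed convention: average normalized scores over runs for each game,
    then take the mean and median across games.
    """
    per_game = [
        mean(human_normalized(game, s) for s in runs)
        for game, runs in raw_scores.items()
    ]
    return {"mean": mean(per_game), "median": median(per_game)}


# Hypothetical raw scores: two runs per game.
raw = {
    "game_a": [20.0, 40.0],
    "game_b": [25.0, 25.0],
    "game_c": [300.0, 500.0],
}
result = aggregate(raw)
print(result)
```

If the numbers computed this way differ a lot from the Colab, the reference scores or the set of games being aggregated are the first things worth comparing.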