Urgent question about data aggregates #4

slerman12 · 2021-11-18T14:35:16Z

Hi, we compiled the Atari 100k results for DrQ, CURL, and DER, and the mean and median human-normalized scores are well below those reported in prior work, including papers by co-authors of the rliable paper.

Our median human-normalized scores are all around 0.10 to 0.12.

Is this accurate? Of all of these, DER (the oldest of the algs) has the highest mean human-norm score.

agarwl · 2021-11-19T04:37:36Z

That doesn't seem right. The aggregate scores should match those in the figure below (which uses 10 runs); you can reproduce them with the colab at bit.ly/statistical_precipice_colab:

[Figure: aggregate scores, 10 runs (image not preserved)]
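For anyone comparing numbers, here is a minimal sketch of how mean and median human-normalized scores are commonly computed. It is plain Python, not the rliable API or the colab's code. The function names, the dictionary layout, and all numbers are hypothetical and for illustration only. It assumes each run's score is normalized as (score - random) / (human - random), that runs are averaged per game first, and that the mean and median are then taken across games.

```python
from statistics import mean, median


def human_normalized(score, random_score, human_score):
    """Return (score - random) / (human - random): 0 is random play, 1 is human level."""
    return (score - random_score) / (human_score - random_score)


def aggregate(raw_scores, reference):
    """Aggregate human-normalized scores across games.

    raw_scores: {game: [raw score per run]}
    reference:  {game: (random_score, human_score)}
    Runs are averaged within each game, then mean and median are taken over games.
    """
    per_game = {}
    for game, runs in raw_scores.items():
        random_score, human_score = reference[game]
        per_game[game] = mean(
            human_normalized(s, random_score, human_score) for s in runs
        )
    values = list(per_game.values())
    return {"mean": mean(values), "median": median(values), "per_game": per_game}


# Hypothetical values for illustration only; these are NOT real Atari scores.
reference = {
    "game_a": (0.0, 100.0),
    "game_b": (10.0, 60.0),
    "game_c": (-20.0, 20.0),
}
raw_scores = {
    "game_a": [20.0, 30.0],
    "game_b": [15.0, 25.0],
    "game_c": [-10.0, 10.0],
}

result = aggregate(raw_scores, reference)
print(result["mean"], result["median"])
```

When aggregates disagree with published ones, it may be worth checking which random and human reference values were used, how many runs per game went into the average, and whether normalization happened before or after averaging.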

agarwl closed this as completed Nov 19, 2021
