Can't reproduce experimental results #69

Open
Yingdong-Hu opened this issue Jan 26, 2021 · 5 comments

Comments

@Yingdong-Hu
I tried to run probing tasks for different Atari environments, using the following command:

python -m scripts.run_probe --method infonce-stdim --env-name {env_name}

I did not change any code, just tried different games, including PongNoFrameskip-v4, BowlingNoFrameskip-v4, BreakoutNoFrameskip-v4, and HeroNoFrameskip-v4.
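Concretely, I launched the runs with a small driver loop like this (just a sketch of what I did; it only wraps the command above):

```python
import subprocess

# The four games I probed, all with the same command shown above.
ENVS = [
    "PongNoFrameskip-v4",
    "BowlingNoFrameskip-v4",
    "BreakoutNoFrameskip-v4",
    "HeroNoFrameskip-v4",
]

for env_name in ENVS:
    subprocess.run(
        ["python", "-m", "scripts.run_probe",
         "--method", "infonce-stdim",
         "--env-name", env_name],
        check=True,  # stop immediately if any run fails
    )
```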

However, only the F1 score for Pong matches the score reported in the paper. The F1 scores for the other three games are far worse than those shown in the paper (for Bowling, I got 0.22).

I checked the training loss logged in wandb, and it seems that training has not converged at all. See the figure below.

[Figure: training loss curves from my runs]

How can I get the F1 scores reported in the paper? Am I missing something?

@ankeshanand
Member

Hi Alex,

That's quite strange; those loss curves don't match what we saw in our runs. There definitely seems to be a bug or a numerical error in your runs, since some of the loss values are blowing up to quite high values.
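If you want to pinpoint where things diverge, a tiny guard like this (just a sketch, not something in our codebase) would flag the first step at which the loss stops being finite:

```python
import torch

# Hypothetical debugging helper: raise as soon as the training loss is no
# longer finite, so the offending step/batch is easy to identify.
def assert_finite(loss: torch.Tensor, step: int) -> None:
    if not torch.isfinite(loss).all():
        raise RuntimeError(f"Loss diverged at step {step}: {loss.item()}")

# Inside the training loop it would be used roughly like:
#   loss = compute_loss(batch)   # hypothetical name for the loss computation
#   assert_finite(loss, step)
#   loss.backward()
```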

Can you confirm which PyTorch version you are using? Also, I just looked it up on our W&B dashboard, and this is what the loss curves look like for Bowling:

[Figure: Bowling loss curves from our W&B dashboard]

@Yingdong-Hu
Author

Yingdong-Hu commented Jan 26, 2021

My PyTorch version is 1.7.0. Which version are you using?

@ankeshanand
Member

We used PyTorch 1.1.0 for our runs back then.

@ankeshanand
Member

So check whether an older version of PyTorch gives the correct results; this does look like a numerical stability bug. Also try increasing Adam's eps value. If that doesn't fix it, it's possible (though unlikely, since we were quite careful) that a bug was introduced when we cleaned up the code. I can try to debug the source of the divergence then, but I might only get time after the ICML deadline.
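For the eps change, something like this is what I mean (just a sketch; the linear layer stands in for the actual encoder, and the learning rate is only illustrative):

```python
import torch
import torch.nn as nn

# Stand-in module for illustration only; the real encoder comes from the repo.
encoder = nn.Linear(512, 256)

# Adam's default eps is 1e-8. Raising it (e.g. to 1e-5 or 1e-4) keeps the
# denominator in the update from blowing up when the second-moment estimate
# becomes very small, which often helps with this kind of divergence.
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4, eps=1e-5)
```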

@Yingdong-Hu
Author

Thank you, let me try it first.
