This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Training metrics are different for the same config and same seed for different runs #839

Closed
1 task done
Westerby opened this issue Dec 7, 2022 · 1 comment



Westerby commented Dec 7, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Problem summary

Training metrics differ between consecutive training runs with the same parameters and the same seed, but only for our own configs.

Code for reproduction

Can provide our config code via e-mail or private message.

Actual outcome

We developed our configs against InnerEye commit 8495a2e.
When training on 8495a2e, the resulting metrics are the same across different runs with the same seed.
When training on the latest commit, d902e02, the resulting metrics are slightly different across different runs with the same seed.

We ran a separate test with the Lung.py config that ships with InnerEye, and its metrics are identical across runs on both 8495a2e and d902e02.

In our config we use only the random module from the standard library, which is supposed to be seeded by the seed_everything method from PyTorch Lightning. We checked some of the randomly generated values in our module, and they were the same across different runs.
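For reference, this is the behaviour we relied on: seed_everything seeds (among other generators) the standard-library random module, so with a fixed seed the drawn sequence is fully determined. A minimal sketch using only the standard library (the helper function name is ours):

```python
import random

def sample_after_seed(seed: int, n: int = 5) -> list:
    # Seeding the stdlib RNG fully determines the subsequent draws,
    # which is why the values we checked matched across runs.
    random.seed(seed)
    return [random.random() for _ in range(n)]

a = sample_after_seed(42)
b = sample_after_seed(42)
assert a == b  # identical seed, identical sequence
```

Note that this only covers RNG state; nondeterministic GPU kernels can still make training metrics diverge even when all seeds match.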

Error messages

No response

Expected outcome

We want to have exactly the same results for all runs with the same config and seed value.

System info

System: Ubuntu 18.04.5 LTS
env.txt

AB#8327


Westerby commented Dec 12, 2022

Okay, there is a new parameter on LightningContainer, which we missed when changing the InnerEye version:
self.pl_deterministic = True

With this added, reproducibility works.
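For anyone hitting the same problem, a sketch of where the flag goes in a custom container config (the class name here is illustrative, not ours; only pl_deterministic comes from the issue):

```python
# Hypothetical InnerEye container config; the import path may differ
# between InnerEye versions.
from InnerEye.ML.lightning_container import LightningContainer

class OurModelConfig(LightningContainer):
    def __init__(self) -> None:
        super().__init__()
        # New parameter we had missed: asks PyTorch Lightning to run in
        # deterministic mode, restoring run-to-run reproducibility for a
        # fixed seed.
        self.pl_deterministic = True
```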
