
Set additional seeds? #33

Closed · egpbos opened this issue Nov 16, 2020 · 3 comments
Labels: bug (Something isn't working)

egpbos (Collaborator) commented Nov 16, 2020

We're getting different results with the same configuration settings, including the seed, so possibly we are not setting all seeds. We should check which ones are missing. I suspect torch.cuda.seed() may be one.
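
For reference, a minimal sketch of what seeding all the usual RNGs could look like, assuming the standard Python/NumPy/PyTorch stack (the `seed_everything` helper is hypothetical, not existing platalea code):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    # Hypothetical helper: seed every RNG the training code might touch.
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy's global RNG
    torch.manual_seed(seed)           # torch CPU RNG (recent versions also seed CUDA)
    torch.cuda.manual_seed_all(seed)  # be explicit about all CUDA devices
```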

@egpbos egpbos added the bug Something isn't working label Jan 13, 2021
@cwmeijer cwmeijer assigned cwmeijer and unassigned cwmeijer Feb 16, 2021
cwmeijer (Contributor) commented
We have tests that check performance now (#63). These tests seem to pass consistently on my local machine as well as on GitHub Actions. A small number of them were failing on carmine, though. Carmine is the only one of the three machines I just mentioned that uses GPUs. We should indeed check out torch.cuda.seed().

egpbos (Collaborator, Author) commented Apr 27, 2021

With commit 1ffcb82, I made it possible to run the tox test suite on GPU by setting PLATALEA_DEVICE="cuda:0" (or another device number) in the shell before running tox. By default, environment variables are not forwarded into the tox environment, so you have to explicitly list the ones you want in tox.ini, as sketched below.
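
For anyone reproducing this, a minimal sketch of the relevant tox.ini fragment; the `passenv` line is the important part, while the deps and commands here are illustrative rather than platalea's actual configuration:

```ini
[testenv]
# tox isolates the environment, so explicitly forward the variable
# that the test suite reads to select the torch device.
passenv = PLATALEA_DEVICE
deps = pytest
commands = pytest {posargs}
```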

I tried running the test suite on carmine and everything passes.

However, since we're using approximate value checks in the tests, this obviously does not tell us whether the code is now deterministic or whether we're still missing some random seed.

@cwmeijer, you mention in #78 (comment) that the values get rounded differently on different machines. Did you look into where this could have come from? For instance, if it's just about different library versions, we could pin those and maybe then use exact value equality asserts.
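
To illustrate the difference between the two kinds of asserts (the reference value and the `compute_recall` stand-in below are hypothetical, not the actual test code):

```python
import pytest

EXPECTED_RECALL = 0.4123  # stored reference value (hypothetical)


def compute_recall():
    # Stand-in for the real evaluation; hypothetical.
    return 0.41228


def test_recall_approximate():
    # Tolerant check: passes across platforms, but can hide non-determinism.
    assert compute_recall() == pytest.approx(EXPECTED_RECALL, abs=1e-3)


def test_recall_exact():
    # Exact check: only viable once results are bit-for-bit reproducible
    # on pinned platforms and pinned dependency versions.
    assert compute_recall() == EXPECTED_RECALL
```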

egpbos (Collaborator, Author) commented Apr 28, 2021

In branch https://github.com/spokenlanguage/platalea/tree/exact_equal_experiment_tests, I switched the asserts to check results exactly, so I could look into the determinism of the test results.

On carmine, using the tox test suite, I could not reproduce any non-determinism, neither on GPU nor on CPU. The tests fail and then print the diffs of the results; if I run the tests multiple times, the diffs are exactly the same each time.

Note that I also tried installing the same dependency versions (at least the Python ones; I can't control system dependencies). As @cwmeijer saw before, this still does not make the results consistent across machines. One reason may be that my laptop is a Mac, so the underlying system libraries may differ from those on carmine's Linux setup and give inconsistent results. Indeed, PyTorch does not guarantee deterministic results across different platforms; see https://pytorch.org/docs/stable/notes/randomness.html.
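
For completeness, the linked randomness notes describe switches for forcing deterministic algorithms on a single platform. A minimal sketch, assuming a recent PyTorch; note this still does not buy cross-platform reproducibility, which is the problem here:

```python
import torch

torch.manual_seed(42)                      # seeds the CPU RNG and all CUDA devices
torch.use_deterministic_algorithms(True)   # raise an error on non-deterministic ops
torch.backends.cudnn.benchmark = False     # disable cuDNN autotuning, which can vary
torch.backends.cudnn.deterministic = True  # restrict cuDNN to deterministic kernels
```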

So it seems the different results we saw may indeed just have been platform or version differences. Given that we cannot seem to get them equal in order to perform further tests, and that I could not reproduce non-determinism in the first place, I vote we close this issue.

@bhigy bhigy closed this as completed Apr 28, 2021