Resolve sampler benchmark variability by setting a random seed #340
@NicolasHug's comment about the sampler benchmark variability piqued my interest. I dug into it a little and was able to reproduce the variability. The root cause is (as it always is) the random seed: if we make sure that each run calls `torch.manual_seed` with the same seed, the variability of the random samplers drops to a reasonable level. Further evidence: changing the seed changes the actual run time. Some examples:
With `seed=0`:
With `seed=1`:
With `seed=1234567`:
And, notably, when we don't specify a seed, we go back to the old behavior:
This means that, by default, our benchmarks behave more like a training job that feeds the sampler different random points on each iteration.
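For reference, here is a minimal sketch of the seeding approach (the `benchmark_sampler` helper and its signature are illustrative, not the PR's actual code):

```python
import time

import torch

def benchmark_sampler(sampler_fn, num_iters=100, seed=0):
    # Fixing the seed makes every run draw the same random numbers,
    # so run-to-run timing differences reflect the sampler itself
    # rather than which random points happened to be sampled.
    torch.manual_seed(seed)
    start = time.perf_counter()
    for _ in range(num_iters):
        sampler_fn()
    return time.perf_counter() - start
```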
I also added the ability to specify the number of iterations as an argument because it's just convenient.
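A sketch of how such arguments might be wired up (the flag names here are illustrative and may differ from the PR):

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--num-iters", type=int, default=100,
                    help="number of benchmark iterations to run")
parser.add_argument("--seed", type=int, default=None,
                    help="seed for torch.manual_seed; omit to keep the old, variable behavior")
args = parser.parse_args()

# Only seed when the user asks for it, preserving the previous default.
if args.seed is not None:
    torch.manual_seed(args.seed)
```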