[RFC] Make n_chains set the total number of chains across all MPI processes #706
I'm very much in favor of this change. n_samples and n_chains should either both be rank-local or both global (and the former option is just confusing).
Pff. |
This is now rebased on top of #716. |
Codecov Report
@@            Coverage Diff             @@
##           master     #706      +/-   ##
==========================================
- Coverage   69.47%   69.13%   -0.34%
==========================================
  Files         216      216
  Lines       12480    12509      +29
  Branches     1809     1817       +8
==========================================
- Hits         8670     8648      -22
- Misses       3335     3378      +43
- Partials      475      483       +8
|
@gcarleo Do we want to do this? |
What this does, in the end, is that samplers can be built with
and when run under MPI, every rank will use
or you can create them with
which will match the current behaviour. Without MPI, nothing changes. |
Yes, I am just worried that if one leaves |
What will happen in this case is 1 chain per rank plus a warning saying that 1*1000 != 16.
The cleanest way to do this is usually to make
But that would break everything for people running stuff locally. |
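As a rough illustration of the 16-chains-on-1000-ranks case above, the per-rank count could be derived from a global n_chains along these lines. This is only a sketch: split_chains is a hypothetical helper and not NetKet's API, and rounding up is an assumption, since the thread does not spell out the rounding rule.

```python
import math
import warnings


def split_chains(n_chains: int, n_ranks: int) -> int:
    """Return how many chains each rank runs for a requested global total.

    Hypothetical helper for illustration; rounding up is an assumption.
    """
    if n_chains < 1 or n_ranks < 1:
        raise ValueError("n_chains and n_ranks must be positive integers")

    # Every rank runs at least one chain, so round the share up.
    per_rank = math.ceil(n_chains / n_ranks)
    total = per_rank * n_ranks

    # Warn when the chains actually run differ from what was requested.
    if total != n_chains:
        warnings.warn(
            f"Requested n_chains={n_chains}, but {n_ranks} ranks with "
            f"{per_rank} chain(s) each give {total} chains in total.",
            stacklevel=2,
        )
    return per_rank
```

With n_chains=16 on 1000 ranks this yields one chain per rank and a warning that 1*1000 != 16; with 16 chains on 4 ranks it yields 4 per rank and stays silent.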
yeah I mean, I think this change is consistent with the fact that |
I think a good alternative would be to print a warning always when running under MPI with |
n_samples_per_rank
I can add that in this PR.
Yes, that would be nice to have for consistency. Either way, I'm happy with this PR.
I think a good alternative would be to print a warning always when running under MPI with n_chains for this release (warning can be disabled with a flag) saying that the behaviour changed.
Then we get rid of the warning in the next release (cc @femtobit)
Maybe... It'd be a warning that is displayed essentially every time NetKet is run, so it'd be pretty prominent (which is good to get people to notice, but can also be annoying - the flag helps but needs to be specified all the time). I'm undecided, feel free to do what you think is best.
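One way the transitional warning discussed here could look is sketched below. The environment-variable name, the function name, and the message text are all made up for illustration; the thread only says that the warning would be shown under MPI and could be disabled with a flag.

```python
import os
import warnings

# Hypothetical flag name, chosen for this example only.
DISABLE_FLAG = "EXAMPLE_DISABLE_N_CHAINS_WARNING"


def warn_n_chains_changed(n_ranks: int, environ=os.environ) -> bool:
    """Warn that n_chains is now a global count.

    Returns True if a warning was issued, False otherwise.
    """
    # Without MPI (a single rank) nothing changes, so stay silent.
    if n_ranks <= 1:
        return False
    # Users who have already adapted can opt out through the flag.
    if environ.get(DISABLE_FLAG, "0") == "1":
        return False
    warnings.warn(
        "n_chains now sets the total number of chains across all MPI "
        "ranks; use n_chains_per_rank for the previous behaviour.",
        FutureWarning,
        stacklevel=2,
    )
    return True
```

Reading the flag from the environment means it can be set once in a job script instead of on every sampler, which addresses part of the concern that the flag would otherwise have to be specified all the time.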
So if @gcarleo agrees I'll add
And change the behaviour so that
If |
Ok yes please add |
Recently we had several people confused by the fact that MPI does not particularly improve performance.
There are two issues:
Point 1) can be solved with better docs.
Point 2) is about inconsistency with the way we set n_samples. I propose to change the behaviour of n_chains so that it sets the number of chains globally according to the formula
One can still specify n_chains_per_rank if one so desires.
This is just a skeleton implementation (though it should mostly work).
As fixing the tests everywhere to use n_chains_per_rank instead of n_chains will take some time, I'll finish this PR only if we get consensus on this.
Note that this will be a fairly breaking change in behaviour (though it won't technically break code).