Error with Ensemble Mean Calculation #59
If I change
This outputs
Thank you for the issue submission. It appears that the bug is that the …

Fixed by #63
Do lines 186 and 187 of `ensemble_metrics.py` calculate the right value? I think they may be incrementing `self.sum` and `self.n` of `Mean` by a value that is too large.

I think `_update_mean` returns the following quantity for `n`: the previous total number of elements plus the number of new elements shown to the current rank (times some additional factor). This quantity is then summed across all ranks using `torch.all_reduce`.
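To make the suspected double counting concrete, here is a minimal sketch in plain Python (no `torch`) that simulates two ranks and models the all-reduce as a sum over a list. It assumes that each rank's update returns the previous `n` plus its own new element count, ignores the extra factor, and assumes the reduced value is then added to `self.n`. The helper names are hypothetical, not the library's code.

```python
def simulated_all_reduce(values):
    """Model an all-reduce with SUM: one value per simulated rank, summed."""
    return sum(values)


def buggy_counts(world_size, new_per_rank, iterations):
    """Track n under the *suspected* update rule (an assumption, not the real code).

    Each rank reports (previous n + its new element count); that running total
    is summed over ranks and then added on top of n again.
    """
    n = 0
    history = []
    for _ in range(iterations):
        local = [n + new_per_rank for _ in range(world_size)]
        n += simulated_all_reduce(local)
        history.append(n)
    return history


print(buggy_counts(world_size=2, new_per_rank=1, iterations=4))  # [2, 8, 26, 80]
```

Under these assumptions `n` roughly triples every step instead of growing by 2, because the previous total is counted once per rank and then added to itself.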
If I understand correctly, the desired behavior is not to increment `self.n` by this quantity reduced over all ranks. Rather, `self.n` should only be incremented by the number of new elements seen across all ranks. (A similar argument holds for `self.sum`.)

To test this, I put the code below in a script called `test_modulus.py` and ran `srun -n 2 -c 64 -G 2 python3 -u test_modulus.py`.

I got this output:
However, wouldn't we expect `n` to be 4 after 2 iterations, 6 after 3 iterations, 8 after 4 iterations, and so on?
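For comparison, here is a sketch of the behavior expected above: each rank contributes only the elements it saw in the current step, those increments are all-reduced, and the result is added to the running totals. `RunningMean` and `simulated_all_reduce` are hypothetical stand-ins (plain Python, two simulated ranks, one value per rank per step), not the actual `Mean` class or the `torch.distributed` API.

```python
def simulated_all_reduce(values):
    """Model an all-reduce with SUM: every rank receives the total."""
    return sum(values)


class RunningMean:
    """Hypothetical stand-in for the Mean metric (not the library's class)."""

    def __init__(self):
        self.sum = 0.0
        self.n = 0

    def update(self, batches_per_rank):
        # Each simulated rank contributes only what it saw in *this* call.
        local_sums = [sum(batch) for batch in batches_per_rank]
        local_counts = [len(batch) for batch in batches_per_rank]
        # Reduce the increments, not the running totals.
        self.sum += simulated_all_reduce(local_sums)
        self.n += simulated_all_reduce(local_counts)
        return self.sum / self.n


m = RunningMean()
history = []
for step in range(1, 5):
    # Two simulated ranks, one value each per step.
    m.update([[float(step)], [float(step) + 10.0]])
    history.append(m.n)

print(history)        # [2, 4, 6, 8]
print(m.sum / m.n)    # 7.5
```

Reducing increments rather than running totals keeps `n` at 2 × iterations for two ranks and leaves `sum / n` equal to the mean of every value seen so far (here 60 / 8).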