Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declare correct sharding in MetropolisSamplerState.init #1776

Merged
merged 3 commits into from
Apr 23, 2024

Conversation

PhilipVinc
Copy link
Member

While reviewing #1769 I realised that our sampler states do not properly initialise some variables to be sharded, resulting in some (even if small) cross-GPU communication at every sampling step.

This should fix a few issues I noticed.

Copy link
Collaborator

@inailuig inailuig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the other changes are unnecessary.

n_steps_proc was already fully replicated and is increased on every device independently, and there is no need to make it a separate variable on every device and then sum.

For avoiding the communication see #1777

netket/sampler/metropolis.py Show resolved Hide resolved
@PhilipVinc
Copy link
Member Author

Ah! You are right. The replication does not incur into communication. Ok. I will revert to only include that single line

Copy link
Collaborator

@inailuig inailuig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. maybe update the title and description

@PhilipVinc PhilipVinc changed the title make metropolis sampler state properly work with sharding Declare correct sharding in MetropolisSamplerState.init Apr 23, 2024
@PhilipVinc PhilipVinc merged commit 1956bd2 into netket:master Apr 23, 2024
8 of 11 checks passed
@PhilipVinc PhilipVinc deleted the pv/distributed branch April 23, 2024 12:02
jwnys pushed a commit to jwnys/netket that referenced this pull request Apr 23, 2024
…etket#1776)

Declare correct sharding for the counter of accepted moves inside of MetropolisSamplerState.__init__  .

The previous code was working fine but gave rise to additional communications among nodes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants