[SymmMem] Increase minimum nthreads to cover sync needs in NVL72 #161983
Dr. CI: artifacts and rendered test results are at hud.pytorch.org/pr/161983. No failures as of commit a4fc434 with merge base 524b78d.
at::ceil_div(numel_per_split, numel_per_thread),
static_cast<size_t>(at::cuda::warp_size()));
num_threads = at::ceil_div(numel_per_split, numel_per_thread);
num_threads = max(num_threads, MAX_CUDA_P2P_DOMAIN_SIZE);
Better to pass world_size here and make max equal to world size
Thanks. Fixed :)
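The exchange above can be sketched as a small host-side calculation. This is a minimal illustration, not the PyTorch source: `ceil_div`, `num_threads_warp_floor`, and `num_threads_world_floor` are hypothetical names, and any upper cap on block size is omitted. It contrasts flooring the thread count at the warp size (the old lines in the diff) with flooring it at `world_size`, as the reviewer suggested.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (stands in for at::ceil_div): integer division, rounded up.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
  return (a + b - 1) / b;
}

// Old behavior as shown in the diff: the thread count never drops below the warp size.
inline std::size_t num_threads_warp_floor(std::size_t numel_per_split,
                                          std::size_t numel_per_thread,
                                          std::size_t warp_size) {
  return std::max(ceil_div(numel_per_split, numel_per_thread), warp_size);
}

// Reviewer's suggestion (sketch): the thread count never drops below world_size,
// so there is at least one thread per peer.
inline std::size_t num_threads_world_floor(std::size_t numel_per_split,
                                           std::size_t numel_per_thread,
                                           std::size_t world_size) {
  return std::max(ceil_div(numel_per_split, numel_per_thread), world_size);
}
```

For a small input (64 elements per split, 8 per thread) on a 72-rank world, the warp-size floor yields 32 threads, while the world-size floor yields 72.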
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
…1984) Added a helper API to tell if the world is entirely within a P2P domain or crosses network. This is mainly for nblocks tuning purpose. (In later PRs) Pull Request resolved: #161984 Approved by: https://github.com/ngimel ghstack dependencies: #161983
…orch#161983) `sync_remote_blocks` maps threads to peers. Previously min nthreads is warp size, which is too small to cover NVL72. Bumping it. Pull Request resolved: pytorch#161983 Approved by: https://github.com/ngimel
Stack from ghstack (oldest at bottom):
`sync_remote_blocks` maps threads to peers. Previously the minimum nthreads was the warp size, which is too small to cover all peers in NVL72. This PR raises the minimum.
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta
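To see why a warp-sized block falls short, here is a host-side model of the thread-to-peer mapping. It assumes that thread `i` in a block handles peer `i`, which is one plausible reading of "maps threads to peers" and is not taken from the kernel source. `peers_covered` and `count_uncovered` are hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Host-side model only (assumption: thread i handles peer i).
// Returns, for each peer, whether some thread in the block is assigned to it.
inline std::vector<bool> peers_covered(std::size_t num_threads,
                                       std::size_t world_size) {
  std::vector<bool> covered(world_size, false);
  for (std::size_t tid = 0; tid < num_threads; ++tid) {
    if (tid < world_size) {
      covered[tid] = true;  // thread tid would signal and wait on peer tid
    }
  }
  return covered;
}

// Counts peers that no thread is assigned to.
inline std::size_t count_uncovered(std::size_t num_threads,
                                   std::size_t world_size) {
  std::size_t missing = 0;
  for (bool c : peers_covered(num_threads, world_size)) {
    if (!c) ++missing;
  }
  return missing;
}
```

Under this model, a 32-thread block in a 72-rank world leaves 40 peers without a thread, while a block of at least 72 threads leaves none. In an 8-rank world, 32 threads were already enough, which is why the problem only shows up in larger P2P domains.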