[GPU] Add all-to-all support to S-curve model #34143
felixwqp
left a comment
thank you Terry!
// `num_nodes`: the number of nodes participating in the all-to-all.
// `num_communicators`: the number of communicators participating in the
// all-to-all.
absl::StatusOr<absl::Duration> AllToAllLatency(int64_t buff_size_bytes,
For consistency, since we're adding AllToAllLatency, perhaps we should rename RingLatency to RingCollectiveLatency? That would make clear that both interfaces are named after the communication operation they model.
I think "ring" is more of a pattern name. I haven't found a good pattern name for all-to-all, so I'm using the collective name here -- it's one of a kind, so there shouldn't be much ambiguity.
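For context on the interface being discussed, here is a minimal, hypothetical sketch of an S-curve latency model for all-to-all. It is not the actual XLA SolGPUCostModel API: the function names, the logistic form of the bandwidth curve, and all constants (peak bandwidth, midpoint, steepness) are illustrative assumptions. The idea is that effective bus bandwidth ramps up with message size along an S-shaped saturation curve toward the peak link bandwidth.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical S-curve: effective bandwidth (GB/s) saturates logistically
// toward `peak_gbps` as the message size grows past `midpoint_bytes`.
double EffectiveBandwidthGBps(double bytes, double peak_gbps,
                              double midpoint_bytes, double steepness) {
  return peak_gbps /
         (1.0 + std::exp(-steepness *
                         (std::log2(bytes) - std::log2(midpoint_bytes))));
}

// Hypothetical all-to-all latency in microseconds: each of the `num_ranks`
// ranks sends buff_size_bytes / num_ranks to every other rank, so the total
// traffic leaving one rank is per_peer * (num_ranks - 1).
double AllToAllLatencyUs(int64_t buff_size_bytes, int num_ranks,
                         double peak_gbps) {
  double per_peer = static_cast<double>(buff_size_bytes) / num_ranks;
  double total_sent = per_peer * (num_ranks - 1);
  double bw = EffectiveBandwidthGBps(total_sent, peak_gbps,
                                     /*midpoint_bytes=*/1 << 20,
                                     /*steepness=*/1.0);
  // GB/s is 1e3 bytes/us, so time_us = bytes / (bw * 1e3).
  return total_sent / (bw * 1e3);
}
```

At the midpoint the logistic yields exactly half the peak bandwidth, which is why small messages are modeled as far below line rate while large messages approach it.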
 /*num_nodes=*/1, absl::Microseconds(292)},
{SolGPUCostModel::CollectiveType::kSendRecv,
 /*num_nodes=*/2, absl::Microseconds(485)},
{SolGPUCostModel::CollectiveType::kAllToAll,
Can you double-check that the single-host (intra-host) all-to-all uses the perf table and the multi-host (cross-partition) all-to-all uses the new model you are adding?
I ask because the function CommunicationType is used to determine the type of an all-to-all, and hence which model is used to estimate it.
We may need to make sure all-to-all is properly classified in CommunicationType.
With that said, can you add another unit test to xla/service/gpu/model/sol_latency_estimator_test.cc? We need to ensure the all-to-all is dispatched to the right underlying model (S-curve or perf table).
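The dispatch behavior the reviewer is asking about can be sketched as follows. This is a hypothetical illustration, not the actual XLA code: CommunicationType in XLA is a real function but its signature and return type here are invented for the sketch. The point is simply that classification (single-host vs. cross-host) decides which estimator runs.

```cpp
#include <string>

// Illustrative classification: an all-to-all that spans more than one host
// is "cross-partition"; otherwise it stays inside one host's NVLink domain.
enum class CommunicationType { kSingleHost, kCrossHost };

CommunicationType ClassifyAllToAll(int num_hosts) {
  return num_hosts > 1 ? CommunicationType::kCrossHost
                       : CommunicationType::kSingleHost;
}

// Illustrative dispatch: intra-host all-to-all is interpolated from a
// measured perf table; cross-host all-to-all uses the analytical S-curve
// model added in this PR.
std::string ModelForAllToAll(int num_hosts) {
  switch (ClassifyAllToAll(num_hosts)) {
    case CommunicationType::kSingleHost:
      return "perf-table";  // interpolate measured timings
    case CommunicationType::kCrossHost:
      return "s-curve";     // analytical bandwidth-saturation model
  }
  return "unknown";
}
```

A unit test for this dispatch would assert on exactly these two branches, which is what the model-dispatching test requested below covers.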
Added a model dispatching test.
                    interpolator.get())
                    .ok());

// IB collective should use S-curve model (world-level across 2 hosts).
Can you rephrase it as "cross-partition" collective?
IB may not apply to every XLA user's transport.
Rephrased.
Imported from GitHub PR #34143

📝 Summary of Changes
Added all-to-all support to the S-curve model.

🎯 Justification
The S-curve model doesn't support all-to-all, and falling back may lead to bad performance. Benchmarking showed that the added all-to-all model improves performance for models with cross-NVL-domain all-to-all.

🚀 Kind of Contribution
⚡️ Performance Improvement / ✨ New Feature

📊 Benchmark (for Performance Improvements)

| Branch | End-to-end execution time mean on mixtral_8x7b_bf16_2x8 |
| :------- | :------: |
| main | 1128328 us |
| terryysun/a2a_s_curve (this branch) | 1009397 us |

Speedup over main: 11.78%.

🧪 Unit Tests: Added exact-matching unit tests to guard the estimation value.
🧪 Execution Tests: Added execution tests to guard the comm-compute overlapping behavior.

Copybara import of the project:

-- 794ef56 by Terry Sun <tesun@nvidia.com>: s-curve a2a support
-- 4f85dae by Terry Sun <tesun@nvidia.com>: fix buffer size calculation
-- 1dc94f7 by Terry Sun <tesun@nvidia.com>: add LHS test
-- 56aef84 by Terry Sun <tesun@nvidia.com>: add model dispatching test
-- d20ed93 by Terry Sun <tesun@nvidia.com>: fix merge issue
-- f09474b by Terry Sun <tesun@nvidia.com>: rephase doc string

Merging this change closes #34143

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34143 from terryysun:terryysun/a2a_s_curve f09474b
PiperOrigin-RevId: 842901679
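The reported 11.78% follows from the benchmark table, taking "speedup over main" as the ratio of main's mean execution time to the branch's, minus one. A quick sanity check (illustrative helper, not part of the PR):

```cpp
// Speedup of `branch` over `main` in percent, defined as main/branch - 1:
// with main at 1128328 us and the branch at 1009397 us this gives ~11.78%.
double SpeedupPercent(double main_us, double branch_us) {
  return (main_us / branch_us - 1.0) * 100.0;
}
```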