[XLA:GPU] make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph #34734
Closed
shawnwang18 wants to merge 1 commit into openxla:main from shawnwang18:shawnw/enable_dus_copy_default
c26d995 to
33b4af9
Compare
ezhulenev
approved these changes
Dec 3, 2025
copybara-service bot
pushed a commit
that referenced
this pull request
Dec 3, 2025
…owered to cuda-graph

Imported from GitHub PR #34734

📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests: change the default setting, unittest has already been added.

Copybara import of the project:
-- 33b4af9 by Shawn Wang <shawnw@nvidia.com>: make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph
-- dc4ab23 by Shawn Wang <shawnw@nvidia.com>: fix unittest
-- 3032658 by Shawn Wang <shawnw@nvidia.com>: fix unittest

Merging this change closes #34734
FUTURE_COPYBARA_INTEGRATE_REVIEW=#34734 from shawnwang18:shawnw/enable_dus_copy_default 3032658
PiperOrigin-RevId: 839756023
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 3, 2025
…owered to cuda-graph

Imported from GitHub PR openxla/xla#34734

📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests: change the default setting, unittest has already been added.

Copybara import of the project:
-- 33b4af9413c63492aefffb98a143e98f589f316b by Shawn Wang <shawnw@nvidia.com>: make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph
-- dc4ab23ad9e22ba8927cb0ca0c6fb6dab6851ca1 by Shawn Wang <shawnw@nvidia.com>: fix unittest
-- 3032658d622a2d3b64c439b81e9888c07ac7a3b3 by Shawn Wang <shawnw@nvidia.com>: fix unittest

Merging this change closes #34734
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default 3032658d622a2d3b64c439b81e9888c07ac7a3b3
PiperOrigin-RevId: 839756023
3032658 to
9fc0463
Compare
9fc0463 to
bdca8aa
Compare
akuegel
approved these changes
Dec 16, 2025
copybara-service bot
pushed a commit
that referenced
this pull request
Dec 16, 2025
…owered to cuda-graph

Imported from GitHub PR #34734

📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests: change the default setting, unittest has already been added.

Copybara import of the project:
-- bdca8aa by Shawn Wang <shawnw@nvidia.com>: make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734
FUTURE_COPYBARA_INTEGRATE_REVIEW=#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aa
PiperOrigin-RevId: 839756023
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 16, 2025
…owered to cuda-graph

Imported from GitHub PR openxla/xla#34734

📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests: change the default setting, unittest has already been added.

Copybara import of the project:
-- bdca8aae0307ff41188b6481b878e42b9ec90612 by Shawn Wang <shawnw@nvidia.com>: make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 839756023
copybara-service bot
pushed a commit
that referenced
this pull request
Dec 17, 2025
…owered to cuda-graph

Imported from GitHub PR #34734

📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests: change the default setting, unittest has already been added.

Copybara import of the project:
-- bdca8aa by Shawn Wang <shawnw@nvidia.com>: make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734
FUTURE_COPYBARA_INTEGRATE_REVIEW=#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aa
PiperOrigin-RevId: 839756023
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
…owered to cuda-graph

Imported from GitHub PR openxla/xla#34734

📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests: change the default setting, unittest has already been added.

Copybara import of the project:
-- bdca8aae0307ff41188b6481b878e42b9ec90612 by Shawn Wang <shawnw@nvidia.com>: make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 839756023
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612 PiperOrigin-RevId: 844562536
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612 PiperOrigin-RevId: 844572976
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
+ Allow the chain to start from <transpose, reshape, bitcast> instead of only reshape
+ Add a layout sensitive mode to the simplification

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 844685844
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612 PiperOrigin-RevId: 845127592
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612 PiperOrigin-RevId: 844616231
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612 PiperOrigin-RevId: 844616257
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612 PiperOrigin-RevId: 844615694
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
The kScan operation performs a scan (prefix sum) along a given dimension. The operation uses a user-defined computation to combine elements. The scan can be performed in reverse order. This change includes adding the opcode, the corresponding HloInstruction subclass, proto serialization, and shape verification.

The kScan HLO computes an inclusive scan (prefix reduction) over the input array.

`scan(input, init_value, to_apply, dimension, is_reverse)`

Where:
* `input`: An array of input values.
* `init_value`: A scalar initial value for the accumulator.
* `to_apply`: A computation that is applied to the accumulator and the current element of the input array. It must be a binary function on scalars.
* `dimension`: The dimension along which the scan is performed.
* `is_reverse`: A boolean indicating if the scan should be performed in reverse order.

Output: The output is an array of the same shape as input, containing the inclusive partial reductions. For an input sequence `x` and an accumulation function `f`, the output elements are computed as: `y[i] = f(y[i-1], x[i])`, where `y[-1]` is the `init_value`.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 842722577
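The recurrence in the kScan commit message, `y[i] = f(y[i-1], x[i])` with `y[-1]` equal to `init_value`, can be illustrated with a small plain-Python sketch over a one-dimensional sequence. This is not XLA code: the function name `inclusive_scan` is hypothetical, and the reverse behaviour shown here (start accumulating from the last element and write each result at the same index) is an assumption about what `is_reverse` means, since the message does not spell it out.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def inclusive_scan(
    xs: Sequence[T],
    init_value: T,
    to_apply: Callable[[T, T], T],
    is_reverse: bool = False,
) -> List[T]:
    """Reference semantics for a 1-D inclusive scan (hypothetical helper)."""
    out: List[T] = list(xs)  # output has the same length as the input
    acc = init_value         # plays the role of y[-1]
    n = len(xs)
    # Assumption: a reverse scan walks from the last element to the first.
    indices = range(n - 1, -1, -1) if is_reverse else range(n)
    for i in indices:
        acc = to_apply(acc, xs[i])  # y[i] = f(y[i-1], x[i])
        out[i] = acc
    return out


prefix_sums = inclusive_scan([1, 2, 3, 4], 0, lambda a, b: a + b)
suffix_sums = inclusive_scan([1, 2, 3, 4], 0, lambda a, b: a + b, is_reverse=True)
```

With addition as `to_apply` and `0` as `init_value`, the forward call yields running prefix sums and the reverse call yields running suffix sums.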
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
…TF normalization in emitters

0) Fix a bug (?) in normalization util when normalized dim contains a single dimension
1) Perform normalization OTF for Transpose emitter selection
2) Use normalized shape for unrolling decision in kLoop emitter
3) Use normalized shape to detect slow transposes in triton fusion rewriter

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 841759079
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Dec 17, 2025
…owered to cuda-graph

Imported from GitHub PR openxla/xla#34734

📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests: change the default setting, unittest has already been added.

Copybara import of the project:
-- bdca8aae0307ff41188b6481b878e42b9ec90612 by Shawn Wang <shawnw@nvidia.com>: make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734
PiperOrigin-RevId: 845659522
📝 Summary of Changes
* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests:
This changes the default setting; the unit test has already been added.
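Because this PR changes a default, users who see a regression may want to opt back out. The sketch below shows one way that might look from Python, assuming the `--xla_gpu_enable_command_buffer` flag accepts a `-`-prefixed entry to remove a command type from the default set (please check the flag's help text in your XLA build). The helper name `with_command_buffer_override` is hypothetical and only builds the flag string.

```python
import os


def with_command_buffer_override(existing_flags: str, override: str) -> str:
    """Append a command-buffer override to an XLA_FLAGS string (hypothetical helper)."""
    # Assumption: the flag takes a comma-separated list in which a leading
    # '+' or '-' adds to or removes from the default command types.
    flag = f"--xla_gpu_enable_command_buffer={override}"
    return f"{existing_flags} {flag}".strip()


# Opt out of the new default for DYNAMIC_SLICE_COPY_FUSION only.
# XLA_FLAGS is typically read when the backend initializes, so this should
# run before importing a framework such as JAX or TensorFlow.
os.environ["XLA_FLAGS"] = with_command_buffer_override(
    os.environ.get("XLA_FLAGS", ""), "-DYNAMIC_SLICE_COPY_FUSION"
)
```

Appending to the existing `XLA_FLAGS` value, rather than overwriting it, keeps any other flags already set in the environment.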