Conversation

@shawnwang18
Contributor

📝 Summary of Changes

  • Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests:
This PR only changes the default setting; a unit test has already been added.
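
For anyone who needs the previous behavior after this default changes, here is a minimal sketch of opting out from Python. It assumes the existing `--xla_gpu_enable_command_buffer` debug flag accepts a `-` prefix to remove a single command type from the default set; please check that against the flag's help text for your XLA build. The helper name `with_command_buffer_override` is hypothetical and not part of XLA.

```python
import os


def with_command_buffer_override(existing_flags: str, override: str) -> str:
    """Return an XLA_FLAGS string carrying exactly one command-buffer override.

    `override` is the value passed to --xla_gpu_enable_command_buffer, for
    example "-DYNAMIC_SLICE_COPY_FUSION". The "-" removal syntax is an
    assumption to verify against your XLA version.
    """
    prefix = "--xla_gpu_enable_command_buffer="
    # Drop any earlier setting of the same flag so the new value is unambiguous.
    kept = [flag for flag in existing_flags.split() if not flag.startswith(prefix)]
    return " ".join(kept + [prefix + override])


# XLA_FLAGS has to be set before XLA parses its flags, so do this before
# importing or initializing the framework that embeds XLA.
os.environ["XLA_FLAGS"] = with_command_buffer_override(
    os.environ.get("XLA_FLAGS", ""), "-DYNAMIC_SLICE_COPY_FUSION"
)
```

The helper only rewrites the environment string; whether the command type is actually excluded from CUDA graph lowering depends on the XLA version in use.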

@shawnwang18 shawnwang18 force-pushed the shawnw/enable_dus_copy_default branch from c26d995 to 33b4af9 Compare December 3, 2025 03:53
copybara-service bot pushed a commit that referenced this pull request Dec 3, 2025
…owered to cuda-graph

Imported from GitHub PR #34734

📝 Summary of Changes

* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests:
change the default setting, unittest has already been added.

Copybara import of the project:

--
33b4af9 by Shawn Wang <shawnw@nvidia.com>:

make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

--
dc4ab23 by Shawn Wang <shawnw@nvidia.com>:

fix unittest

--
3032658 by Shawn Wang <shawnw@nvidia.com>:

fix unittest

Merging this change closes #34734

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34734 from shawnwang18:shawnw/enable_dus_copy_default 3032658
PiperOrigin-RevId: 839756023
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 3, 2025
…owered to cuda-graph

Imported from GitHub PR openxla/xla#34734

📝 Summary of Changes

* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests:
change the default setting, unittest has already been added.

Copybara import of the project:

--
33b4af9413c63492aefffb98a143e98f589f316b by Shawn Wang <shawnw@nvidia.com>:

make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

--
dc4ab23ad9e22ba8927cb0ca0c6fb6dab6851ca1 by Shawn Wang <shawnw@nvidia.com>:

fix unittest

--
3032658d622a2d3b64c439b81e9888c07ac7a3b3 by Shawn Wang <shawnw@nvidia.com>:

fix unittest

Merging this change closes #34734

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default 3032658d622a2d3b64c439b81e9888c07ac7a3b3
PiperOrigin-RevId: 839756023
@shawnwang18 shawnwang18 force-pushed the shawnw/enable_dus_copy_default branch from 3032658 to 9fc0463 Compare December 8, 2025 07:09
@shawnwang18 shawnwang18 force-pushed the shawnw/enable_dus_copy_default branch from 9fc0463 to bdca8aa Compare December 14, 2025 23:11
copybara-service bot pushed a commit that referenced this pull request Dec 16, 2025
…owered to cuda-graph

Imported from GitHub PR #34734

📝 Summary of Changes

* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests:
change the default setting, unittest has already been added.

Copybara import of the project:

--
bdca8aa by Shawn Wang <shawnw@nvidia.com>:

make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aa
PiperOrigin-RevId: 839756023
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 16, 2025
…owered to cuda-graph

Imported from GitHub PR openxla/xla#34734

📝 Summary of Changes

* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests:
change the default setting, unittest has already been added.

Copybara import of the project:

--
bdca8aae0307ff41188b6481b878e42b9ec90612 by Shawn Wang <shawnw@nvidia.com>:

make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 839756023
copybara-service bot pushed a commit that referenced this pull request Dec 17, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 844562536
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 844572976
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
+ Allow the chain to start from <transpose, reshape, bitcast> instead of only reshape
+ Add a layout sensitive mode to the simplification

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 844685844
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 845127592
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 844616231
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 844616257
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 844615694
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
The kScan operation performs a scan (a generalized prefix sum) along a given dimension, using a user-defined computation to combine elements. The scan can also run in reverse order. This change adds the opcode, the corresponding HloInstruction subclass, proto serialization, and shape verification.

The kScan HLO computes an inclusive scan (prefix reduction) over the input array.

  `scan(input, init_value, to_apply, dimension, is_reverse)`

Where:

*   `input`: An array of input values.
*   `init_value`: A scalar initial value for the accumulator.
*   `to_apply`: A computation that is applied to the accumulator and the current element of the input array. It must be a binary function on scalars.
*   `dimension`: The dimension along which the scan is performed.
*   `is_reverse`: A boolean indicating if the scan should be performed in reverse order.

Output: The output is an array of the same shape as `input`, containing the inclusive partial reductions.
For an input sequence `x` and an accumulation function `f`, the output elements are computed as `y[i] = f(y[i-1], x[i])`, where `y[-1]` is the `init_value`.
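
The recurrence above can be sketched in plain Python for the one-dimensional case. This is an illustrative reference only, not the XLA implementation: the function name `inclusive_scan` is hypothetical, the `dimension` parameter is omitted, and the reverse behavior (accumulate from the last element while writing each result at the same index as its input) is an assumption, since the description does not spell out the output ordering.

```python
import operator
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def inclusive_scan(
    xs: Sequence[T],
    init_value: T,
    to_apply: Callable[[T, T], T],
    is_reverse: bool = False,
) -> List[T]:
    """1-D inclusive scan: y[i] = to_apply(y[i-1], x[i]), with y[-1] = init_value."""
    n = len(xs)
    # Visit indices back to front when scanning in reverse (assumed semantics).
    order = range(n - 1, -1, -1) if is_reverse else range(n)
    out: List[T] = [init_value] * n  # placeholders, overwritten below
    acc = init_value
    for i in order:
        acc = to_apply(acc, xs[i])  # accumulator is the first argument
        out[i] = acc                # result keeps the position of its input
    return out


forward = inclusive_scan([1, 2, 3, 4], 0, operator.add)
backward = inclusive_scan([1, 2, 3, 4], 0, operator.add, is_reverse=True)
```

With addition and an initial value of 0, the forward call yields running sums and the reverse call yields running sums taken from the end of the sequence.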
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 842722577
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
…TF normalization in emitters

0) Fix a bug (?) in normalization util when normalized dim contains a single dimension
1) Perform normalization OTF for Transpose emitter selection
2) Use normalized shape for unrolling decision in kLoop emitter
3) Use normalized shape to detect slow transposes in triton fusion rewriter

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34734 from shawnwang18:shawnw/enable_dus_copy_default bdca8aae0307ff41188b6481b878e42b9ec90612
PiperOrigin-RevId: 841759079
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 17, 2025
…owered to cuda-graph

Imported from GitHub PR openxla/xla#34734

📝 Summary of Changes

* Added `DebugOptions::DYNAMIC_SLICE_COPY_FUSION` to the list of enabled GPU command buffers in the default debug options.

🚀 Kind of Contribution
⚡️ Performance Improvement

🧪 Unit Tests:
change the default setting, unittest has already been added.

Copybara import of the project:

--
bdca8aae0307ff41188b6481b878e42b9ec90612 by Shawn Wang <shawnw@nvidia.com>:

make DYNAMIC_SLICE_COPY_FUSION command default lowered to cuda-graph

Merging this change closes #34734

PiperOrigin-RevId: 845659522
3 participants