Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NVIDIA] Move collective pipeliner after post-layout opts #12866

Closed
wants to merge 2 commits into from

Conversation

kaixih
Copy link
Contributor

@kaixih kaixih commented May 20, 2024

To perform the fp8 dot, we insert quant and dequant ops before the dot and rely on the gemm-rewriter to pattern match and rewrite to cublaslt calls for GPUs.

When this fp8 pattern appears in while loops that is qualified for the collective pipeliner (CP), the CP might disrupts the pattern by accidentally peeling the quant or dequant ops off the loop.

So, this PR moves the CP after the gemm-rewriter (when the newly-introduced xla_gpu_collective_pipeliner_post_layout_optimization is set), since this can save us from redesigning the CP to support this special case.

Note, for the long-term solution, we should move the collectives to the "safer" places before applying the CP. For example, in the above use case, we can move the collectives in-between the quant and dequant and then the CP should be able to preserve the dequant->dot pattern. This is actually already supported as an integral part in the gemm-rewriter.

cc. @Tixxx @hx89 @nluehr @philipphack

@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label May 20, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label May 20, 2024
@kaixih
Copy link
Contributor Author

kaixih commented May 20, 2024

@frgossen Can you help review?

Copy link
Member

@frgossen frgossen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining the motivation. From the description, it is not quite clear why this is needed. Can you expand the motivation, maybe with an example? What fp8 pattern appears in a while loop and how is it transformed vs should be transformed?

Can you add test cases with the pattern variations you care about? I think that could also help understand the motivation behind this change.

@kaixih
Copy link
Contributor Author

kaixih commented May 20, 2024

Sure. Basically, the original graph looks like

while->q->dq->AG->dot

where q takes bf16 and outputs fp8 and dq takes fp8 and outputs bf16.
Then our gemm rewriter moves the AG and pattern matches dq->dot to cublaslt_fp8gemm, like:

# move AG:
while->q->AG->dq->dot
# pattern match:
while->q->AG->custom_call(=cublaslt-fp8)

Now, with the CP pass, we see that the original graph is transformed into:

q->dq->AG->while->dot 

where you can see the q->dq->AG is moved out of the loop. Then, the gemm-rewriter can no longer rewrites this dot to the custom call of cublaslt-fp8.

So, that is why this PR moves the CP after the gemm rewriter. So, it doesn't need to worry about breaking any patterns.

Please let me know if the above makes sense to you. At the same time, I am trying to extract the above from our real model training as a unit test to showcase the issue.

@kaixih kaixih force-pushed the move_collective_pipeliner_dev branch from 5d8c3bf to ffb664a Compare May 23, 2024 02:13
@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label May 23, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label May 23, 2024
@kaixih
Copy link
Contributor Author

kaixih commented May 25, 2024

@frgossen Any update?

We have recently tested this in some real models and we can see the fp8 gemms are back.

@bixia1 bixia1 self-requested a review May 29, 2024 17:15
@bixia1
Copy link
Contributor

bixia1 commented May 29, 2024

Do you know why moving CP to post layout is not generally good?

@@ -1239,6 +1261,8 @@ absl::Status GpuCompiler::OptimizeHloModule(
hlo_module, stream_exec, options, gpu_target_config,
thread_pool.get_mutable()));

TF_RETURN_IF_ERROR(RunCollectivePipelinerPasses(hlo_module));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to rename and make it clear this is post layout CP: RunPostLayoutCollectivePipelinerPasses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. PTAL.

xla/service/gpu/gpu_compiler.cc Show resolved Hide resolved
@kaixih
Copy link
Contributor Author

kaixih commented May 29, 2024

Do you know why moving CP to post layout is not generally good?

I think theoretically it should be fine to move CP to post layout, since the CP is supposed to be agnostic of non-collective ops. The main concern (from @Tixxx ) might be some passes in layout (e.g. layout normalization) might alter the collectives through inserting copies or transposes, for instance. We are not 100% confident that this would pose a problem or not.

So, our plan is to keep the flag and set it to True by default. If no complaints are received, we will remove the flag. How do you think of the plan?

@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label May 29, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label May 29, 2024
@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label May 29, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label May 29, 2024
Copy link
Member

@frgossen frgossen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@frgossen
Copy link
Member

frgossen commented Jun 3, 2024

Can you address the build failures?

@kaixih kaixih added kokoro:force-run Forces CI to rerun labels Jun 3, 2024
@kaixih kaixih force-pushed the move_collective_pipeliner_dev branch from ca32297 to f207a5c Compare June 3, 2024 17:56
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Jun 3, 2024
@kaixih
Copy link
Contributor Author

kaixih commented Jun 3, 2024

@frgossen Rebased. PTAL.

(Only AMD ROCm is failing but the log implies it is irrelevant to this PR).

@frgossen
Copy link
Member

frgossen commented Jun 5, 2024

Looks like this runs into a merge conflict. Can you rebase?

@kaixih kaixih force-pushed the move_collective_pipeliner_dev branch from f207a5c to 6bfc58e Compare June 5, 2024 16:30
@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Jun 5, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 25, 2024
… unit test

Imported from GitHub PR openxla/xla#13920

@kaixih @frgossen This is just a rebased version of openxla/xla#12866
Copybara import of the project:

--
dcf940b6f41e3e14b8e999e36055c07bf468f9a1 by shuw <shuw@nvidia.com>:

Move collective pipeliner after post-layout opts and add a unit test

--
9a731e9ec6c84f809804776a3fba2d0b11a60294 by shuw <shuw@nvidia.com>:

Change to EXPECT_EQ

Merging this change closes #13920

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13920 from wenscarl:kaixih/move_collective_pipeliner_dev_rebase 9a731e9ec6c84f809804776a3fba2d0b11a60294
PiperOrigin-RevId: 646142320
copybara-service bot pushed a commit that referenced this pull request Jun 25, 2024
… unit test

Imported from GitHub PR #13920

@kaixih @frgossen This is just a rebased version of #12866
Copybara import of the project:

--
dcf940b by shuw <shuw@nvidia.com>:

Move collective pipeliner after post-layout opts and add a unit test

--
9a731e9 by shuw <shuw@nvidia.com>:

Change to EXPECT_EQ

Merging this change closes #13920

COPYBARA_INTEGRATE_REVIEW=#13920 from wenscarl:kaixih/move_collective_pipeliner_dev_rebase 9a731e9
PiperOrigin-RevId: 646428752
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 25, 2024
… unit test

Imported from GitHub PR openxla/xla#13920

@kaixih @frgossen This is just a rebased version of openxla/xla#12866
Copybara import of the project:

--
dcf940b6f41e3e14b8e999e36055c07bf468f9a1 by shuw <shuw@nvidia.com>:

Move collective pipeliner after post-layout opts and add a unit test

--
9a731e9ec6c84f809804776a3fba2d0b11a60294 by shuw <shuw@nvidia.com>:

Change to EXPECT_EQ

Merging this change closes #13920

PiperOrigin-RevId: 646428752
copybara-service bot pushed a commit that referenced this pull request Oct 3, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 681670419
copybara-service bot pushed a commit that referenced this pull request Oct 4, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 681670419
copybara-service bot pushed a commit that referenced this pull request Oct 4, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 681670419
copybara-service bot pushed a commit that referenced this pull request Oct 4, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 681670419
copybara-service bot pushed a commit that referenced this pull request Oct 4, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 681670419
copybara-service bot pushed a commit that referenced this pull request Oct 4, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 681670419
copybara-service bot pushed a commit that referenced this pull request Oct 4, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 681670419
copybara-service bot pushed a commit that referenced this pull request Oct 4, 2024
Imported from GitHub PR #17453

Moves the collective quantizer pass ahead of the collective pipeliner to preserve FP8 quantization and dequantization patterns preceded or followed by collectives without running the collective pipeliner post layout assignment. See #12866 and #15292.
Copybara import of the project:

--
d8e8f63 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

--
b34f203 by Philipp Hack <phack@nvidia.com>:

Moves the collective quantizer pass before the collective pipeliner.

Merging this change closes #17453

COPYBARA_INTEGRATE_REVIEW=#17453 from philipphack:u_collective_passes_xla b34f203
PiperOrigin-RevId: 682489977
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

FUTURE_COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684256309
copybara-service bot pushed a commit that referenced this pull request Oct 10, 2024
Imported from GitHub PR #17950

Removes the functionality to optionally run the collective pipeliner pass post layout assignment. This option was introduced in #12866 to preserve FP8 quantization and dequantization patterns and was obsoleted by the reordering of the collective optimization passes in #17453.
Copybara import of the project:

--
e9e9c1a by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

--
c5489ac by Philipp Hack <phack@nvidia.com>:

Removes the option to run the collective pipeliner post layout assignment.

Merging this change closes #17950

COPYBARA_INTEGRATE_REVIEW=#17950 from philipphack:u_collective_flag_xla c5489ac
PiperOrigin-RevId: 684614799
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants