Adding support for fused moe kernels in sampler. #1385
Merged
copybara-service[bot] merged 1 commit into main on Apr 22, 2026
Conversation
Force-pushed: 4629f76 to d4b75a1
Force-pushed: d4b75a1 to 3578a75
Force-pushed: 3578a75 to e036a98
Force-pushed: d44997f to 817eece
Force-pushed: 817eece to 957a534
Force-pushed: 957a534 to 432334f
wang2yn84 reviewed Apr 22, 2026
tensor_parallel_size=rollout_config.tensor_parallel_size,
data_parallel_size=rollout_config.data_parallel_size,
expert_parallel_size=rollout_config.expert_parallel_size,
rollout_chunk_size=rollout_config.rollout_vllm_reshard_chunk_size,
Collaborator
rollout_chunk_size doesn't seem to need any special handling, just plumbing it through; shall we pass it via rollout_vllm_kwargs?
Collaborator
Author
You are right, we only need to plumb it through. However, it is not a vLLM engine argument, so it shouldn't be passed to the LLM() constructor. If we can remove it before passing rollout_vllm_kwargs to the constructor, that would work; otherwise, adding a new argument for it may be cleaner. LMK your thoughts :)
Collaborator
My bad, I was looking at the wrong place. Plumbing it through is the right way to go.
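For context, a minimal sketch of the plumbing discussed above. All names here (RolloutConfig, make_sampler_kwargs) are hypothetical and only illustrate keeping rollout_chunk_size as an explicit argument instead of folding it into rollout_vllm_kwargs, since it is not a vLLM engine argument and the LLM() constructor would not accept it:

```python
from dataclasses import dataclass

# Hypothetical names for illustration only; the real Tunix/vLLM classes differ.
# The point: rollout_chunk_size is plumbed through as its own argument rather
# than riding along in rollout_vllm_kwargs, which go straight to the vLLM LLM()
# constructor and must contain only engine arguments.
@dataclass
class RolloutConfig:
  tensor_parallel_size: int
  data_parallel_size: int
  expert_parallel_size: int
  rollout_vllm_reshard_chunk_size: int

def make_sampler_kwargs(cfg: RolloutConfig) -> dict:
  return dict(
      tensor_parallel_size=cfg.tensor_parallel_size,
      data_parallel_size=cfg.data_parallel_size,
      expert_parallel_size=cfg.expert_parallel_size,
      # Kept separate from rollout_vllm_kwargs; consumed only by the reshard path.
      rollout_chunk_size=cfg.rollout_vllm_reshard_chunk_size,
  )
```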
deleting dst buffers during reshard.
Force-pushed: 432334f to fa8a5a6
niting added a commit to niting/maxtext that referenced this pull request on Apr 23, 2026
This is required to support fused moe in MaxText. See Tunix PR: google/tunix#1385.
This PR introduces full support for the Fused MoE kernel integrated into MaxText in AI-Hypercomputer/maxtext#3627.

More specifically, this PR adds the ability to fuse MoE kernel weights in the MaxText model during weight sync so that they match the shapes required by the tpu-inference Fused MoE kernel.

Additionally, this PR adds some resharding optimizations that help with large models. First, it removes the call to jax.clear_caches() in vllm_sampler.py, so the JAX compilation caches no longer need to be cleared, which speeds up both training and rollout steps. The downside is more memory fragmentation on the sampler TPUs after the KV cache is cleared. To avoid fragmentation OOMs, we introduce chunked resharding, which reduces the peak HBM consumed during reshard operations for large models.
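For readers unfamiliar with the technique, here is a minimal sketch of chunked resharding, assuming plain JAX arrays and a per-parameter target sharding. This is not the actual Tunix implementation; chunked_reshard and its arguments are hypothetical, and the real chunking granularity may differ:

```python
# A minimal sketch of chunked resharding (hypothetical names, not the Tunix code).
# The idea: avoid materialising source and destination copies of the whole model
# at once. Move `chunk_size` parameters at a time and free source buffers as you go.
import jax

def chunked_reshard(params: dict, target_shardings: dict, chunk_size: int) -> dict:
  """Reshards `params` onto `target_shardings`, `chunk_size` arrays at a time."""
  names = list(params.keys())
  out = {}
  for start in range(0, len(names), chunk_size):
    chunk = names[start:start + chunk_size]
    # device_put with a Sharding moves each array onto the target mesh/layout.
    resharded = jax.device_put(
        [params[n] for n in chunk], [target_shardings[n] for n in chunk]
    )
    # Wait for the transfer so the source buffers can be released safely.
    resharded = jax.block_until_ready(resharded)
    for n, arr in zip(chunk, resharded):
      out[n] = arr
      # Free the source buffer eagerly so only one chunk's worth of duplicated
      # HBM is live at any point during the reshard.
      params[n].delete()
  return out
```

In this sketch, chunk_size plays the role of the rollout_vllm_reshard_chunk_size option discussed above: smaller chunks lower the peak HBM held in duplicate during resharding, at the cost of more, smaller transfers.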