
Conversation

jiqing-feng (Contributor) commented Aug 29, 2025

The indices were masked with 0, but 0 is also recognized as a valid expert (experts[0]). We need a new value reserved specifically for masking.

I ran gpt-oss with EP=2 and found that both rank0 and rank1 computed expert 0:
[screenshot: with EP=2, both rank0 and rank1 dispatch tokens to expert 0]

After this PR, the masking index is num_experts (16), and index 16 is skipped:
[screenshot: off-rank positions now hold the masking index 16, which is skipped]
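
For context, here is a minimal sketch of the failure mode and of the fix, assuming an EP=2 split of 16 experts; the variable names and the unique-then-loop dispatch are illustrative, not the exact gpt_oss code:

import torch

num_experts = 16
experts_per_rank = 8                        # EP=2
rank = 1
lo, hi = rank * experts_per_rank, (rank + 1) * experts_per_rank

router_indices = torch.tensor([[0, 5], [9, 12]])   # top-k expert ids per token
on_rank = (router_indices >= lo) & (router_indices < hi)

# Before this PR: off-rank entries were masked with 0, which collides with the
# real expert 0, so 0 shows up in every rank's indices and every rank runs it.
masked_bad = torch.where(on_rank, router_indices, torch.zeros_like(router_indices))

# After this PR: off-rank entries are masked with num_experts (16), an index no
# real expert owns, so it can be skipped unambiguously.
masked_good = torch.where(on_rank, router_indices, torch.full_like(router_indices, num_experts))

for expert_idx in torch.unique(masked_good).tolist():
    if expert_idx == num_experts:           # sentinel: not a real expert, skip
        continue
    token_mask = masked_good == expert_idx  # select this expert's tokens and compute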

Rocketknight1 (Member)

cc @ArthurZucker I think?

jiqing-feng (Contributor, Author)

run-slow: gpt_oss

jiqing-feng (Contributor, Author)

I also need to change all the other MoE models, like Mixtral, after this is verified.

ArthurZucker (Collaborator)

You should not need to for now, because only gpt_oss supports EP!

jiqing-feng (Contributor, Author) commented Sep 1, 2025

To reproduce the error, run this command and script on a CPU-only machine:
mpirun -np 2 --map-by ppr:1:numa --bind-to numa -genv MASTER_ADDR=127.0.0.1 -genv MASTER_PORT=29500 -genv OMP_NUM_THREADS=32 python tp_hf.py

import os
import torch
import torch.distributed as dist
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from transformers.distributed import DistributedConfig


model_id = "lmsys/gpt-oss-20b-bf16"
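# Map the MPI launcher's PMI_* variables to the RANK/WORLD_SIZE names torch.distributed expects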
os.environ['RANK'] = str(os.environ.get('PMI_RANK', 0))
os.environ['LOCAL_RANK'] = str(os.environ.get('PMI_RANK', 0))
os.environ['WORLD_SIZE'] = str(os.environ.get('PMI_SIZE', 1))
rank = int(os.environ['LOCAL_RANK'])
world_size = int(os.environ['WORLD_SIZE'])

def main(rank, world_size) -> None:
    is_tp = world_size > 1
    model_kwargs = dict(torch_dtype=torch.bfloat16)
    if is_tp:
        model_kwargs["tp_plan"] = "auto"
        # model_kwargs["distributed_config"] = DistributedConfig(enable_expert_parallel=1)
    else:
        model_kwargs["device_map"] = "cpu"

    # Load the model; with tp_plan="auto" it is sharded across the ranks
    config = AutoConfig.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, config=config, **model_kwargs)
    if dist.is_initialized():
        print("Backend:", dist.get_backend())
    else:
        print("Distributed process group is not initialized.")

    # Prepare input tokens
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    prompt = "Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, do_sample=False, max_new_tokens=32)

    if rank == 0:
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))


if __name__ == "__main__":
    rank = int(os.environ["RANK"]) if "RANK" in os.environ else 0
    world_size = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
    main(rank, world_size)

Output before this PR:

["Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She was a very good friend, and she was a very good\n\nIt sounds like you're sharing a story or a prompt! If you'd like to continue the story"]

Output after this PR:

['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She was a very good friend, and she was a good friend. She was a good friend. She was a good\n\nIt seems like your text got cut']

Output without EP (running python tp_hf.py directly):

['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She was a very good friend, and she was a good friend. She was a good friend. She was a good\n\nIt seems like your message got cut']

After this PR, the output is nearly identical to the output without EP.

cc @SunMarc @ArthurZucker

jiqing-feng marked this pull request as ready for review, September 1, 2025 02:10
jiqing-feng (Contributor, Author) commented Sep 2, 2025

Hi @ArthurZucker @SunMarc @Rocketknight1, would you please review this PR? I've included the code to reproduce the issue above. You can run it with CPU-only torch, without transformers' custom kernels. To install CPU-only torch: pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu

jiqing-feng (Contributor, Author)

I also added base_model_ep_plan in configuration_gpt_oss.py so that distributed_config = DistributedConfig(enable_expert_parallel=1) can work. Otherwise no parallelism is applied, since ep_plan is None.

jiqing-feng (Contributor, Author)

Hi @SunMarc, would you please review this PR? We have other tasks on the gpt-oss model that are blocked by it. Waiting for your review, thanks!

jiqing-feng force-pushed the gpt-oss-ep branch 2 times, most recently from b6687d1 to cce133f, September 5, 2025 08:30
jiqing-feng (Contributor, Author)

I read this blog and followed its instructions for distributed_config=DistributedConfig(enable_expert_parallel=1), but ep_plan is None in the gpt-oss model.

I don't know how CUDA performs EP, maybe because CUDA uses kernels, but CPU definitely needs an ep_plan to enable EP. So I added base_model_ep_plan in configuration_gpt_oss.py.
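
By analogy with base_model_tp_plan, an ep_plan maps module-name patterns to parallelization strategies. A hypothetical sketch of the kind of entry this adds (the patterns and strategy strings below are assumptions for illustration, not necessarily the merged code):

# Hypothetical: a config-level plan, mirroring how base_model_tp_plan is declared.
# Without such a plan, DistributedConfig(enable_expert_parallel=1) finds
# ep_plan = None and applies no expert parallelism at all.
base_model_ep_plan = {
    "layers.*.mlp.router": "ep_router",      # assumed strategy name for routing
    "layers.*.mlp.experts": "grouped_gemm",  # assumed: shard experts across ranks
}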

ArthurZucker (Collaborator)

Sorry I was off for a week!

jiqing-feng (Contributor, Author)

> Sorry I was off for a week!

@ArthurZucker no worries. This is a bug fix, and the case can easily be reproduced on CPU. Please review this PR; we have a blog post pending on it being merged, so we can release.

jiqing-feng (Contributor, Author)

The failing CI is unrelated to my changes.

ArthurZucker (Collaborator)

Don't worry, I'll do the last review today and merge!

ArthurZucker (Collaborator) left a comment

Very nice, let's just add a small explanation in the doc and good to go!


[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss, mxfp4

jiqing-feng (Contributor, Author)

Hi @ArthurZucker, I have addressed your comments. Please review. Thanks!

ArthurZucker (Collaborator)

Thanks 🤗

ArthurZucker merged commit 3340ccb into huggingface:main, Sep 10, 2025
19 of 21 checks passed
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
* fix out shape
* fix router indice
* fix mod
* fix masking
* fix typo
* fix typo
* fix format
* add safety cheking
* fix checking
* enable 1 expert per rank
* fix skip
* add ep plan in config
* add update ep plan
* fix typo
* rm ep_plan and add comments

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025