Fix gpt-oss router_indices in EP #40545
Conversation
cc @ArthurZucker I think?
run-slow: gpt_oss
I also need to change all the other MoE models, like Mixtral, after this is verified.
You should not need to for now, because only …
To reproduce the error:

```python
import os

import torch
import torch.distributed as dist
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from transformers.distributed import DistributedConfig

model_id = "lmsys/gpt-oss-20b-bf16"

# Launch with an MPI-style launcher that sets PMI_RANK / PMI_SIZE
# (e.g. Intel MPI's `mpirun -np 2 python repro.py`); a single-process
# run falls back to plain CPU execution.
os.environ["RANK"] = str(os.environ.get("PMI_RANK", 0))
os.environ["LOCAL_RANK"] = str(os.environ.get("PMI_RANK", 0))
os.environ["WORLD_SIZE"] = str(os.environ.get("PMI_SIZE", 1))


def main(rank, world_size) -> None:
    is_tp = world_size > 1
    model_kwargs = dict(torch_dtype=torch.bfloat16)
    if is_tp:
        model_kwargs["tp_plan"] = "auto"
        # model_kwargs["distributed_config"] = DistributedConfig(enable_expert_parallel=1)
    else:
        model_kwargs["device_map"] = "cpu"

    # Retrieve the tensor-parallel model
    config = AutoConfig.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, config=config, **model_kwargs)
    if dist.is_initialized():
        print("Backend:", dist.get_backend())
    else:
        print("Distributed process group is not initialized.")

    # Prepare input tokens and generate greedily
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    prompt = "Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, do_sample=False, max_new_tokens=32)
    if rank == 0:
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))


if __name__ == "__main__":
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    main(rank, world_size)
```

Output before this PR:
Output after this PR:
Output without EP:
You can see that, after this PR, the output is almost identical to the output without EP.
Hi @ArthurZucker @SunMarc @Rocketknight1. Would you please review this PR? I've copied the code to reproduce the issue above. You can run it under CPU-only torch, without transformers' customized kernels. Installing a CPU-only torch …
I also added …
Hi @SunMarc. Would you please review this PR? We have other tasks on the gpt-oss model that are blocked by it. Waiting for your review, thanks!
Force-pushed from b6687d1 to cce133f (Compare)
I read this blog and followed its instructions. I don't know how CUDA performs the EP, maybe because CUDA uses kernels, but CPU definitely needs an ep_plan to be passed if we want to enable EP. So I added …
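As a rough mental model of what such a plan has to encode, here is a hypothetical sketch of expert partitioning under EP (`local_expert_range` is an illustrative helper, not a transformers API):

```python
# Hypothetical sketch: under EP, each rank owns a contiguous slice of the
# experts; router indices outside the local slice must be masked out
# instead of being computed on this rank.
def local_expert_range(num_experts: int, world_size: int, rank: int) -> tuple[int, int]:
    assert num_experts % world_size == 0, "experts must divide evenly across ranks"
    per_rank = num_experts // world_size
    return rank * per_rank, (rank + 1) * per_rank

print(local_expert_range(num_experts=16, world_size=2, rank=0))  # (0, 8)
print(local_expert_range(num_experts=16, world_size=2, rank=1))  # (8, 16)
```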
Sorry, I was off for a week!
@ArthurZucker no worries. This is a bug fix, and the case can easily be reproduced on CPU. Please review this PR, as we have a blog pending on it being merged so we can release.
The failed CI is not related to my changes.
Don't worry, I'll do the last review today and merge!
Very nice, let's just add a small explanation in the doc and good to go!
[For maintainers] Suggested jobs to run (before merge): run-slow: gpt_oss, mxfp4
Hi @ArthurZucker. I have addressed your comments. Please review. Thanks!
Thanks 🤗
* fix out shape
* fix router indices
* fix mod
* fix masking
* fix typo
* fix typo
* fix format
* add safety checking
* fix checking
* enable 1 expert per rank
* fix skip
* add ep plan in config
* add update ep plan
* fix typo
* rm ep_plan and add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
The indices have been masked with `0`, but the `0` is recognized as `experts[0]`, so we need a dedicated value specifically for masking. I ran gpt-oss with EP=2 and found that both rank0 and rank1 computed expert 0:

After this PR, we can see the masking value is num_experts (16) here, and index 16 will be skipped:
