[BUG] Problems with Mixture-of-Experts (MoE) #367

nikit-srivastava · 2024-03-16T11:41:39Z

Hello,

Thank you for the nice work on this training framework. However, I have noticed problems with inference, conversion, and fine-tuning of MoE-based GPT models. The following is a list of issues that point to the same problem but have not yet been addressed:

In general, the inference example (generate_text.sh) does not work when --num-experts is set to a value higher than 1. Also, the conversion scripts (convert_checkpoint) are not equipped to handle MoE models.

I would like to request the attention of the repository maintainers to this issue. For us, it is a major roadblock in our research and prevents us from analyzing or publishing our findings. We would be really grateful if it could be resolved soon.

If you need any other information or access to model weights to test, please feel free to ask. With my current knowledge, I can also offer to fix/implement features if you point me in the right direction.


suu990901 · 2024-05-26T08:37:27Z

Have you solved this problem?
