Thank you for the nice work on this training framework. However, I have noticed problems with inference, conversion, and fine-tuning of MoE-based GPT models. The following issues report the same problem but have not yet been addressed:
In general, the inference example (`generate_text.sh`) does not work when `--num-experts` is set to a value greater than 1. In addition, the conversion scripts (`convert_checkpoint`) are not equipped to handle MoE models.
I would like to bring this issue to the attention of the repository maintainers. For us, it is a major roadblock: it prevents us from analyzing or publishing our research findings. We would be very grateful if it could be resolved soon.
If you need any other information or access to model weights for testing, please feel free to ask. With my current knowledge, I can also offer to fix or implement these features if you point me in the right direction.