
Cannot run the pretrain_gpt example using moe branch #43

Open
getao opened this issue Apr 23, 2022 · 3 comments


getao commented Apr 23, 2022

Hi,

I tried the examples (pretraining GPT, and GPT with MoE) but failed to run both.

Running the pretrain GPT example fails with an error like "Element 1 of tensors does not require grad and does not have a grad_fn".

Running the MoE examples always fails with an error saying that ep_size is not a valid argument when calling MoE in DeepSpeed (I tried DeepSpeed versions from 0.5.0 to 0.6.1; unfortunately, none of them works).
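(For anyone hitting the same TypeError: before cycling through DeepSpeed versions, you can check whether the installed build's MoE constructor even accepts an `ep_size` keyword. The helper below is a generic sketch — `accepts_kwarg` and the two stand-in constructors are illustrative, not DeepSpeed API; in practice you would pass `deepspeed.moe.layer.MoE` as the callable.)

```python
import inspect

def accepts_kwarg(callable_obj, name):
    """Return True if `callable_obj` accepts a keyword argument `name`."""
    try:
        params = inspect.signature(callable_obj).parameters
    except (TypeError, ValueError):
        # Some builtins/extensions expose no inspectable signature.
        return False
    if name in params:
        return True
    # A **kwargs parameter would swallow any keyword argument.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# Stand-in mimicking an MoE constructor that predates the `ep_size` keyword.
def old_moe(hidden_size, expert, num_experts=1):
    pass

# Stand-in mimicking a build that supports it.
def new_moe(hidden_size, expert, num_experts=1, ep_size=1):
    pass

print(accepts_kwarg(old_moe, "ep_size"))  # False
print(accepts_kwarg(new_moe, "ep_size"))  # True
```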

Could anyone kindly help me with the issues?

Thanks


starkhu commented Jun 14, 2022

I ran into the same problems.


awan-10 commented Aug 15, 2022

@starkhu and @getao -- can you please use the MoE examples with the main branch? Our moe branch is now old, but its support has already been merged into the main branch.

hyoo pushed a commit to hyoo/Megatron-DeepSpeed that referenced this issue Apr 21, 2023
marsggbo commented

@awan-10 have you solved the problem?

4 participants