
Cannot run the pretrain_gpt example using moe branch #43

Open
getao opened this issue Apr 23, 2022 · 3 comments


getao commented Apr 23, 2022

Hi,

I tried the examples (pretraining GPT, and GPT with MoE) but failed to run both.

Running the pretrain GPT example fails with an error like "Element 1 of tensors does not require grad and does not have a grad_fn".

Running the MoE examples always fails with an error saying that ep_size is not a valid argument when calling MoE in DeepSpeed (I tried DeepSpeed versions from 0.5.0 to 0.6.1; unfortunately, none of them works).
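(For anyone hitting the same TypeError: before cycling through DeepSpeed versions, you can check whether the installed build's MoE constructor even accepts an `ep_size` keyword. The helper below is a generic sketch — `accepts_kwarg` and the two stand-in constructors are illustrative, not DeepSpeed API; in practice you would pass `deepspeed.moe.layer.MoE` as the callable.)

```python
import inspect

def accepts_kwarg(callable_obj, name):
    """Return True if `callable_obj` accepts a keyword argument `name`."""
    try:
        params = inspect.signature(callable_obj).parameters
    except (TypeError, ValueError):
        # Some builtins/extensions expose no inspectable signature.
        return False
    if name in params:
        return True
    # A **kwargs parameter would swallow any keyword argument.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# Stand-in mimicking an MoE constructor that predates the `ep_size` keyword.
def old_moe(hidden_size, expert, num_experts=1):
    pass

# Stand-in mimicking a build that supports it.
def new_moe(hidden_size, expert, num_experts=1, ep_size=1):
    pass

print(accepts_kwarg(old_moe, "ep_size"))  # False
print(accepts_kwarg(new_moe, "ep_size"))  # True
```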

Could anyone kindly help me with the issues?

Thanks


starkhu commented Jun 14, 2022

I ran into the same problems.


awan-10 commented Aug 15, 2022

@starkhu and @getao -- can you please use the MoE examples with the main branch? Our moe branch is now old, but its support has already been merged into the main branch.

hyoo pushed a commit to hyoo/Megatron-DeepSpeed that referenced this issue Apr 21, 2023
marsggbo commented

@awan-10 have you solved the problem?

4 participants