Bugs in GPT2 Inference Example #364
I am facing a similar problem. I managed to bypass this issue by loading the model using the method implemented at https://github.com/microsoft/Megatron-DeepSpeed/blob/main/tasks/eval_harness/evaluate.py#L410. However, I am now stuck at the part where I try to use the …
Investigating a bit further, I find that during inference it picks from one of the predefined model implementations. However, the implementation we need (DS_MegatronGPTMoEContainer, https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/containers/megatron_gpt_moe.py) is not mapped in that dictionary (or anywhere else, for that matter).
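A toy illustration of the dispatch gap described above: inference picks a container from a dict keyed by injection policy, so a container that is never registered can never be selected. All class and variable names below are stand-ins for illustration, not DeepSpeed's actual internal API.

```python
# Stand-in classes; the real ones live in deepspeed.module_inject.
class MegatronPolicy: ...
class DS_MegatronGPTContainer: ...
class DS_MegatronGPTMoEContainer: ...

# Hypothetical policy-to-container mapping. The MoE container is
# absent, mirroring the observation that DS_MegatronGPTMoEContainer
# is not mapped anywhere.
policy_to_container = {
    MegatronPolicy: DS_MegatronGPTContainer,
}

def select_container(policy_cls):
    """Return the registered container for a policy, or raise KeyError."""
    try:
        return policy_to_container[policy_cls]
    except KeyError:
        raise KeyError(f"no container registered for {policy_cls.__name__}")
```

Under this reading, a fix would amount to adding the missing entry to the mapping, but that is an assumption about DeepSpeed's internals rather than a confirmed patch.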
I'm facing a similar problem: I trained the gpt-125M-MoE64 model with the script ds_pretrain_gpt_125M_MoE64.sh (https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples_deepspeed/MoE/ds_pretrain_gpt_125M_MoE64.sh).
```python
# group.add_argument("--local_rank", type=int, default=0, help='local_rank')
```

You have to comment out these two lines to let it continue.
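The reason commenting them out helps can be shown in isolation: with argparse's default conflict handling, registering the same option string twice on one parser raises `ArgumentError`. This is a minimal, self-contained demonstration, not the Megatron-DeepSpeed code itself.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)

try:
    # Second registration of the same option, like the line that
    # has to be commented out in the argument-parsing code:
    parser.add_argument("--local_rank", type=int, default=0, help="local_rank")
    conflict = None
except argparse.ArgumentError as err:
    # err reads like: "argument --local_rank: conflicting option string ..."
    conflict = err
```

DeepSpeed's launcher already injects `--local_rank`, which is why a second definition in the example script collides.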
The second issue is in text_generation_utils.py, line 466:

```python
output_tensor = model(tokens, position_ids, attention_mask,
                      tokentype_ids=tokentype_ids,
                      layer_past=layer_past,
                      get_key_value=get_key_value,
                      forward_method_parallel_output=forward_method_parallel_output)
```

Here the code passes layer_past and get_key_value to the model, but the example you provided uses GPTModel from gpt_model.py, whose forward method does not accept any of those arguments:

```python
def forward(self, input_ids, position_ids, attention_mask,
            retriever_input_ids=None, retriever_position_ids=None,
            retriever_attn_mask=None, labels=None, tokentype_ids=None,
            inference_params=None, curriculum_seqlen=None):
```
Is there a quick way to fix this problem?
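One quick, hedged workaround for the signature mismatch is to drop any keyword arguments the model's forward does not declare before calling it. The sketch below uses a dummy `GPTModel` with a simplified signature; `call_forward_compat` is a hypothetical helper, not part of Megatron-DeepSpeed, and the real fix may instead be to add the missing parameters to GPTModel.forward.

```python
import inspect

class GPTModel:
    # Dummy stand-in with a cut-down version of the forward signature
    # quoted above; it does not accept layer_past or get_key_value.
    def forward(self, input_ids, position_ids, attention_mask,
                labels=None, tokentype_ids=None, inference_params=None):
        return ("called", tokentype_ids)

def call_forward_compat(model, *args, **kwargs):
    """Call model.forward, silently dropping unsupported kwargs."""
    accepted = inspect.signature(model.forward).parameters
    filtered = {k: v for k, v in kwargs.items() if k in accepted}
    return model.forward(*args, **filtered)

# Mirrors the call site in text_generation_utils.py: layer_past,
# get_key_value, and forward_method_parallel_output get filtered out.
out = call_forward_compat(
    GPTModel(), "tokens", "position_ids", "attention_mask",
    tokentype_ids=None, layer_past=None, get_key_value=True,
    forward_method_parallel_output=None)
```

Silently dropping arguments loses the KV-cache behavior that layer_past/get_key_value enable, so this is a stopgap to get generation running rather than a proper fix.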