Version
deepspeed: 0.13.4
transformers: 4.38.1
Python: 3.10
PyTorch: 2.1.2+cu121
CUDA: 12.1

Error in Example (To reproduce)
Simply run this script (with a Llama-2 model and the --use_kernel flag):
https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py

It will show the following error:

```
Traceback (most recent call last):
  File "/root/DeepSpeedExamples/inference/huggingface/text-generation/inference-test.py", line 82, in <module>
    outputs = pipe(inputs,
  File "/root/DeepSpeedExamples/inference/huggingface/text-generation/utils.py", line 71, in __call__
    outputs = self.generate_outputs(input_list, num_tokens=num_tokens, do_sample=do_sample)
  File "/root/DeepSpeedExamples/inference/huggingface/text-generation/utils.py", line 119, in generate_outputs
    outputs = self.model.generate(input_tokens.input_ids, **generate_kwargs)
  File "/miniconda/envs/py310/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 636, in _generate
    return self.module.generate(*inputs, **kwargs)
  File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/miniconda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 1592, in generate
    return self.sample(
  File "/miniconda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 2693, in sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/miniconda/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1256, in prepare_inputs_for_generation
    if past_key_value := getattr(self.model.layers[0].self_attn, "past_key_value", None):
  File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DeepSpeedGPTInference' object has no attribute 'self_attn'
```

Potential bug?
I suspect it did not pick the right inference engine: it should be DeepSpeedLlamaInference, not DeepSpeedGPTInference. Judging from the traceback, kernel injection has replaced the Llama decoder layers with DeepSpeedGPTInference modules, so transformers' prepare_inputs_for_generation can no longer find the self_attn attribute it expects.
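
For context, the --use_kernel flag in this example maps to replace_with_kernel_inject=True in deepspeed.init_inference. Below is a minimal sketch of the same load path without kernel injection; the checkpoint name is illustrative, not taken from this thread:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # illustrative Llama-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)

# replace_with_kernel_inject=False keeps the original LlamaDecoderLayer /
# self_attn module structure that transformers' prepare_inputs_for_generation
# expects, instead of swapping in fused DeepSpeedGPTInference blocks.
engine = deepspeed.init_inference(model,
                                  dtype=torch.float16,
                                  replace_with_kernel_inject=False)

# init_inference places the wrapped module on the local accelerator,
# so the inputs are moved to the GPU as well (single-GPU assumption).
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
out = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```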

Hi @allanj, I don't think we have kernel injection support for Llama-2 models. If you remove the --use_kernel flag, does the script work?

Additionally, what kind of GPUs are you using? You may be able to use DeepSpeed-MII to run the Llama-2 model and get significant improvements in inference performance if you have GPUs with compute capability >= 8.0.
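
For reference, a minimal DeepSpeed-MII sketch (assuming deepspeed-mii is installed; the checkpoint name is again illustrative):

```python
import mii

# Non-persistent pipeline; any Hugging Face Llama-2 checkpoint id works here.
pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")

# The pipeline returns one Response object per prompt.
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses[0].generated_text)
```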