
Quantized flan-t5-large RuntimeError - empty_strided not supported on quantized tensors yet #275

Open
jmdu99 opened this issue Apr 8, 2023 · 2 comments


jmdu99 commented Apr 8, 2023

I have applied dynamic quantization to a flan-t5-large model. However, when I try to generate summaries for evaluation, I get this error:

RuntimeError: empty_strided not supported on quantized tensors yet see https://github.com/pytorch/pytorch/issues/74540

Code:

from tqdm import tqdm

from optimum.intel.neural_compressor import INCModelForSeq2SeqLM

# model_name, device, tokenizer, prefix, examples, batch_size, chunks() and
# generate_kwargs are defined earlier in the evaluation script.
model = INCModelForSeq2SeqLM.from_pretrained(model_name).to(device)

for examples_chunk in tqdm(list(chunks(examples, batch_size))):
    examples_chunk = [prefix + text for text in examples_chunk]
    batch = tokenizer(examples_chunk, return_tensors="pt", truncation=True, padding="longest").to(device)
    summaries = model.generate(
        input_ids=batch.input_ids,
        attention_mask=batch.attention_mask,
        **generate_kwargs,
    )
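
The quantization step itself is not shown above. As a rough point of reference, a dynamic post-training quantization run with optimum-intel 1.7.x typically looks like the sketch below; the model name, configuration, and save directory are illustrative assumptions, not the exact code used for this report.

from transformers import AutoModelForSeq2SeqLM
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel.neural_compressor import INCQuantizer

# Hypothetical quantization sketch: dynamic int8 post-training quantization of
# flan-t5-large, saved to a local directory that can later be reloaded with
# INCModelForSeq2SeqLM.from_pretrained(...).
fp32_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
quantizer = INCQuantizer.from_pretrained(fp32_model)
quantizer.quantize(
    quantization_config=PostTrainingQuantConfig(approach="dynamic"),
    save_directory="flan-t5-large-int8-dynamic",
)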

Dependencies:

transformers                 4.26.1
neural-compressor            2.1
optimum-intel                1.7.3
torch                        2.0.0

Traceback:

│ 61 │ for examples_chunk in tqdm(list(chunks(examples, batch_size))): │
│ 62 │ │ examples_chunk = [prefix + text for text in examples_chunk] │
│ 63 │ │ batch = tokenizer(examples_chunk, return_tensors="pt", truncation=True, padding= │
│ ❱ 64 │ │ summaries = model.generate( │
│ 65 │ │ │ input_ids=batch.input_ids, │
│ 66 │ │ │ attention_mask=batch.attention_mask, │
│ 67 │ │ │ **generate_kwargs, │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py:115 in │
│ decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/generation/utils.py:1252 in │
│ generate │
│ │
│ 1249 │ │ if self.config.is_encoder_decoder and "encoder_outputs" not in model_kwargs: │
│ 1250 │ │ │ # if model is encoder decoder encoder_outputs are created │
│ 1251 │ │ │ # and added to model_kwargs │
│ ❱ 1252 │ │ │ model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation( │
│ 1253 │ │ │ │ inputs_tensor, model_kwargs, model_input_name │
│ 1254 │ │ │ ) │
│ 1255 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/generation/utils.py:617 in │
│ _prepare_encoder_decoder_kwargs_for_generation │
│ │
│ 614 │ │ model_input_name = model_input_name if model_input_name is not None else self.ma │
│ 615 │ │ encoder_kwargs["return_dict"] = True │
│ 616 │ │ encoder_kwargs[model_input_name] = inputs_tensor │
│ ❱ 617 │ │ model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs) │
│ 618 │ │ │
│ 619 │ │ return model_kwargs │
│ 620 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:1055 in │
│ forward │
│ │
│ 1052 │ │ │ │ │ None, # past_key_value is always None with gradient checkpointing │
│ 1053 │ │ │ │ ) │
│ 1054 │ │ │ else: │
│ ❱ 1055 │ │ │ │ layer_outputs = layer_module( │
│ 1056 │ │ │ │ │ hidden_states, │
│ 1057 │ │ │ │ │ attention_mask=extended_attention_mask, │
│ 1058 │ │ │ │ │ position_bias=position_bias, │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:739 in │
│ forward │
│ │
│ 736 │ │ │ attention_outputs = attention_outputs + cross_attention_outputs[2:] │
│ 737 │ │ │
│ 738 │ │ # Apply Feed Forward layer │
│ ❱ 739 │ │ hidden_states = self.layer[-1](hidden_states) │
│ 740 │ │ │
│ 741 │ │ # clamp inf values to enable fp16 training │
│ 742 │ │ if hidden_states.dtype == torch.float16 and torch.isinf(hidden_states).any(): │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:336 in │
│ forward │
│ │
│ 333 │ │
│ 334 │ def forward(self, hidden_states): │
│ 335 │ │ forwarded_states = self.layer_norm(hidden_states) │
│ ❱ 336 │ │ forwarded_states = self.DenseReluDense(forwarded_states) │
│ 337 │ │ hidden_states = hidden_states + self.dropout(forwarded_states) │
│ 338 │ │ return hidden_states │
│ 339 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:317 in │
│ forward │
│ │
│ 314 │ │ # See https://github.com/huggingface/transformers/issues/20287 │
│ 315 │ │ # we also make sure the weights are not in int8 in case users will force `_kee │
│ 316 │ │ if hidden_states.dtype != self.wo.weight.dtype and self.wo.weight.dtype != torch │
│ ❱ 317 │ │ │ hidden_states = hidden_states.to(self.wo.weight.dtype) │
│ 318 │ │ │
│ 319 │ │ hidden_states = self.wo(hidden_states) │
│ 320 │ │ return hidden_states │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯


argideritzalpea commented Apr 14, 2023

@jmdu99 Can you provide the code you used for quantization and generation, to provide the maintainers more context to address this issue?


jmdu99 commented Apr 14, 2023

> @jmdu99 Can you provide the code you used for quantization and generation, to provide the maintainers more context to address this issue?

Updated
