
Quantized flan-t5-large RuntimeError - empty_strided not supported on quantized tensors yet #275

Open
jmdu99 opened this issue Apr 8, 2023 · 2 comments


jmdu99 commented Apr 8, 2023

I have applied dynamic quantization to a flan-t5-large model. However, when I try to generate summaries for evaluation, I get this error:

RuntimeError: empty_strided not supported on quantized tensors yet see https://github.com/pytorch/pytorch/issues/74540

Code:

from tqdm import tqdm

from optimum.intel.neural_compressor import INCModelForSeq2SeqLM

# model_name, device, tokenizer, prefix, examples, batch_size, chunks() and
# generate_kwargs are defined earlier in the evaluation script.
model = INCModelForSeq2SeqLM.from_pretrained(model_name).to(device)

for examples_chunk in tqdm(list(chunks(examples, batch_size))):
    examples_chunk = [prefix + text for text in examples_chunk]
    batch = tokenizer(examples_chunk, return_tensors="pt", truncation=True, padding="longest").to(device)
    summaries = model.generate(
        input_ids=batch.input_ids,
        attention_mask=batch.attention_mask,
        **generate_kwargs,
    )
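
The quantization step itself is not shown above. As a rough point of reference, a dynamic post-training quantization run with optimum-intel 1.7.x typically looks like the sketch below; the model name, configuration, and save directory are illustrative assumptions, not the exact code used for this report.

from transformers import AutoModelForSeq2SeqLM
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel.neural_compressor import INCQuantizer

# Hypothetical quantization sketch: dynamic int8 post-training quantization of
# flan-t5-large, saved to a local directory that can later be reloaded with
# INCModelForSeq2SeqLM.from_pretrained(...).
fp32_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
quantizer = INCQuantizer.from_pretrained(fp32_model)
quantizer.quantize(
    quantization_config=PostTrainingQuantConfig(approach="dynamic"),
    save_directory="flan-t5-large-int8-dynamic",
)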

Dependencies:

transformers                 4.26.1
neural-compressor            2.1
optimum-intel                1.7.3
torch                        2.0.0

Traceback:

│ 61 │ for examples_chunk in tqdm(list(chunks(examples, batch_size))): │
│ 62 │ │ examples_chunk = [prefix + text for text in examples_chunk] │
│ 63 │ │ batch = tokenizer(examples_chunk, return_tensors="pt", truncation=True, padding= │
│ ❱ 64 │ │ summaries = model.generate( │
│ 65 │ │ │ input_ids=batch.input_ids, │
│ 66 │ │ │ attention_mask=batch.attention_mask, │
│ 67 │ │ │ **generate_kwargs, │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py:115 in │
│ decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/generation/utils.py:1252 in │
│ generate │
│ │
│ 1249 │ │ if self.config.is_encoder_decoder and "encoder_outputs" not in model_kwargs: │
│ 1250 │ │ │ # if model is encoder decoder encoder_outputs are created │
│ 1251 │ │ │ # and added to model_kwargs │
│ ❱ 1252 │ │ │ model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation( │
│ 1253 │ │ │ │ inputs_tensor, model_kwargs, model_input_name │
│ 1254 │ │ │ ) │
│ 1255 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/generation/utils.py:617 in │
│ _prepare_encoder_decoder_kwargs_for_generation │
│ │
│ 614 │ │ model_input_name = model_input_name if model_input_name is not None else self.ma │
│ 615 │ │ encoder_kwargs["return_dict"] = True │
│ 616 │ │ encoder_kwargs[model_input_name] = inputs_tensor │
│ ❱ 617 │ │ model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs) │
│ 618 │ │ │
│ 619 │ │ return model_kwargs │
│ 620 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:1055 in │
│ forward │
│ │
│ 1052 │ │ │ │ │ None, # past_key_value is always None with gradient checkpointing │
│ 1053 │ │ │ │ ) │
│ 1054 │ │ │ else: │
│ ❱ 1055 │ │ │ │ layer_outputs = layer_module( │
│ 1056 │ │ │ │ │ hidden_states, │
│ 1057 │ │ │ │ │ attention_mask=extended_attention_mask, │
│ 1058 │ │ │ │ │ position_bias=position_bias, │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:739 in │
│ forward │
│ │
│ 736 │ │ │ attention_outputs = attention_outputs + cross_attention_outputs[2:] │
│ 737 │ │ │
│ 738 │ │ # Apply Feed Forward layer │
│ ❱ 739 │ │ hidden_states = self.layer[-1](hidden_states) │
│ 740 │ │ │
│ 741 │ │ # clamp inf values to enable fp16 training │
│ 742 │ │ if hidden_states.dtype == torch.float16 and torch.isinf(hidden_states).any(): │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:336 in │
│ forward │
│ │
│ 333 │ │
│ 334 │ def forward(self, hidden_states): │
│ 335 │ │ forwarded_states = self.layer_norm(hidden_states) │
│ ❱ 336 │ │ forwarded_states = self.DenseReluDense(forwarded_states) │
│ 337 │ │ hidden_states = hidden_states + self.dropout(forwarded_states) │
│ 338 │ │ return hidden_states │
│ 339 │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/mrshu/miniconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py:317 in │
│ forward │
│ │
│ 314 │ │ # See https://github.com/huggingface/transformers/issues/20287 │
│ 315 │ │ # we also make sure the weights are not in int8 in case users will force `_kee │
│ 316 │ │ if hidden_states.dtype != self.wo.weight.dtype and self.wo.weight.dtype != torch │
│ ❱ 317 │ │ │ hidden_states = hidden_states.to(self.wo.weight.dtype) │
│ 318 │ │ │
│ 319 │ │ hidden_states = self.wo(hidden_states) │
│ 320 │ │ return hidden_states │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯


argideritzalpea commented Apr 14, 2023

@jmdu99 Can you provide the code you used for quantization and generation, to provide the maintainers more context to address this issue?


jmdu99 commented Apr 14, 2023

> @jmdu99 Can you provide the code you used for quantization and generation, to provide the maintainers more context to address this issue?

Updated
