[254-llm-chatbot] NNCF is not yet supported OpenVINO data type: bf16. #1514

Closed
OpenVINO-dev-contest opened this issue Dec 5, 2023 · 8 comments
Labels: bug

OpenVINO-dev-contest (Collaborator) commented Dec 5, 2023

Describe the bug

When compressing LLAMA2 with INT4 weights, an error comes up during this step:

Step: compressed_model = nncf.compress_weights(ov_model, **model_compression_params):

Error message: NotImplementedError: NNCF is not yet supported OpenVINO data type: bf16.
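
For context, a minimal sketch of the failing step, assuming a LLaMA-2 IR already converted to OpenVINO; the model path and the contents of model_compression_params below are illustrative assumptions, not the notebook's exact values:

```python
# Minimal sketch of the failing step. The model path and the compression
# parameters are illustrative assumptions, not the notebook's exact values.
import nncf
import openvino as ov

core = ov.Core()
ov_model = core.read_model("llama-2-chat-7b/FP16/openvino_model.xml")  # placeholder path

model_compression_params = {
    "mode": nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight-only quantization
    "group_size": 128,
    "ratio": 0.8,  # fraction of layers compressed to INT4; the rest stay INT8
}

# This is the call that raises NotImplementedError when the graph contains bf16 tensors.
compressed_model = nncf.compress_weights(ov_model, **model_compression_params)
ov.save_model(compressed_model, "llama-2-chat-7b/INT4/openvino_model.xml", compress_to_fp16=False)
```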

OpenVINO-dev-contest (Collaborator, Author)

Full error message:

NotImplementedError Traceback (most recent call last)
Cell In[8], line 156
154 convert_to_fp16()
155 if prepare_int8_model.value:
--> 156 convert_to_int8()
157 if prepare_int4_model.value:
158 convert_to_int4()

Cell In[8], line 49, in convert_to_int8()
47 ov_model.half()
48 quantizer = OVQuantizer.from_pretrained(ov_model)
---> 49 quantizer.quantize(save_directory=int8_model_dir, weights_only=True)
50 del quantizer
51 del ov_model

File ~/ovnb/lib/python3.10/site-packages/optimum/intel/openvino/quantization.py:180, in OVQuantizer.quantize(self, calibration_dataset, save_directory, quantization_config, file_name, batch_size, data_collator, remove_unused_columns, weights_only, **kwargs)
174 raise ValueError(
175 "calibration_dataset is needed to compute the activations range during the calibration step and was not provided. "
176 "In case you only want to apply quantization on the weights, please set weights_only=True."
177 )
179 if isinstance(self.model, OVBaseDecoderModel) and self.model.use_cache:
--> 180 self._quantize_ovcausallm(
181 calibration_dataset,
182 save_directory,
183 batch_size,
184 data_collator,
185 remove_unused_columns,
186 weights_only,
187 **kwargs,
188 )
189 elif isinstance(self.model, OVBaseModel):
190 self._quantize_ovbasemodel(
191 calibration_dataset,
192 save_directory,
(...)
197 **kwargs,
198 )

File ~/ovnb/lib/python3.10/site-packages/optimum/intel/openvino/quantization.py:263, in OVQuantizer._quantize_ovcausallm(self, calibration_dataset, save_directory, batch_size, data_collator, remove_unused_columns, weights_only, **kwargs)
260 save_directory.mkdir(parents=True, exist_ok=True)
262 if weights_only:
--> 263 self.model.model = nncf.compress_weights(self.model.model)
264 self.model.save_pretrained(save_directory)
265 return

File ~/ovnb/lib/python3.10/site-packages/nncf/quantization/quantize_model.py:294, in compress_weights(model, mode, ratio, group_size, ignored_scope)
291 return compress_weights_impl(model, mode, ratio, group_size, ignored_scope)
293 compression_algorithm = WeightCompression(mode, ratio, group_size, ignored_scope)
--> 294 graph = NNCFGraphFactory.create(model)
295 return compression_algorithm.apply(model, graph)

File ~/ovnb/lib/python3.10/site-packages/nncf/common/factory.py:45, in NNCFGraphFactory.create(model)
42 if model_backend == BackendType.OPENVINO:
43 from nncf.openvino.graph.nncf_graph_builder import GraphConverter
---> 45 return GraphConverter.create_nncf_graph(model)
46 if model_backend == BackendType.TORCH:
47 return model.nncf.get_graph()

File ~/ovnb/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py:197, in GraphConverter.create_nncf_graph(model)
194 layer_attributes = get_weighted_layer_attributes(node, metatype, const_attrs)
195 nncf_node.layer_attributes = OVLayerAttributes(const_attrs, layer_attributes, act_attrs)
--> 197 GraphConverter._add_edges_to_nncf_graph(model, nncf_graph)
198 return nncf_graph

File ~/ovnb/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py:100, in GraphConverter._add_edges_to_nncf_graph(model, graph)
98 output_node_id = graph.get_node_by_name(out_node.get_friendly_name()).node_id
99 ov_dtype = out.get_element_type().get_type_name()
--> 100 nncf_dtype = GraphConverter.convert_to_nncf_dtype(ov_dtype)
102 parallel_inputs = None
103 if len(inputs) > 1:

File ~/ovnb/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py:61, in GraphConverter.convert_to_nncf_dtype(ov_dtype)
45 conversion_map = {
46 "f16": "float",
47 "f32": "float",
(...)
58 "boolean": "int",
59 }
60 if ov_dtype not in conversion_map:
---> 61 raise NotImplementedError(f"NNCF is not yet supported OpenVINO data type: {ov_dtype}.")
62 return Dtype(conversion_map[ov_dtype])

NotImplementedError: NNCF is not yet supported OpenVINO data type: bf16.

eaidova (Contributor) commented Dec 6, 2023

Should be resolved by the merge of huggingface/optimum-intel#484.
Please update the optimum-intel version you are using.

avitial added the bug label Dec 6, 2023
KodiaqQ commented Dec 8, 2023

Hi.
We've also prepared improvements in NNCF for BF16 precision: openvinotoolkit/nncf#2307

azhuvath commented Mar 5, 2024

INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/torch/cuda/__init__.py:628: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
Compress to FP16
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.42it/s]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the LlamaAttention class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the LlamaAttention class
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_218 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_218':
Arguments cannot have boolean element type (argument element type: boolean).

Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance.Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
Saving external data to one file...
Provided model does not contain state. It may lead to sub-optimal performance.Please reexport model with updated OpenVINO version >= 2023.3.0 calling the from_pretrained method with original model and export=True parameter
Compress to INT8
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.03it/s]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_225119 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_225119':
Arguments cannot have boolean element type (argument element type: boolean).

Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance.Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
Saving external data to one file...
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 226/226 • 0:00:51 • 0:00:00
Provided model does not contain state. It may lead to sub-optimal performance.Please reexport model with updated OpenVINO version >= 2023.3.0 calling the from_pretrained method with original model and export=True parameter
Compress to INT4
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.19it/s]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_433442 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_433442':
Arguments cannot have boolean element type (argument element type: boolean).

Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance.Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
Saving external data to one file...
Provided model does not contain state. It may lead to sub-optimal performance.Please reexport model with updated OpenVINO version >= 2023.3.0 calling the from_pretrained method with original model and export=True parameter
Traceback (most recent call last):
File "/home/sdp/clabs/ov_conversion.py", line 227, in
convert_to_int4()
File "/home/sdp/clabs/ov_conversion.py", line 197, in convert_to_int4
ov_model = OVModelForCausalLM.from_pretrained(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/modeling_base.py", line 399, in from_pretrained
return from_pretrained_method(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_decoder.py", line 283, in _from_transformers
return cls._from_pretrained(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_decoder.py", line 615, in _from_pretrained
_weight_only_quantization(causal_model, quantization_config)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/intel/openvino/quantization.py", line 567, in _weight_only_quantization
model.model = nncf.compress_weights(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/quantization/quantize_model.py", line 364, in compress_weights
graph = NNCFGraphFactory.create(model)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/common/factory.py", line 43, in create
return GraphConverter.create_nncf_graph(model)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py", line 217, in create_nncf_graph
GraphConverter._add_edges_to_nncf_graph(model, nncf_graph)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py", line 100, in _add_edges_to_nncf_graph
nncf_dtype = GraphConverter.convert_to_nncf_dtype(ov_dtype)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py", line 61, in convert_to_nncf_dtype
raise NotImplementedError(f"NNCF is not yet supported OpenVINO data type: {ov_dtype}.")
NotImplementedError: NNCF is not yet supported OpenVINO data type: bf16.
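
For what it's worth, the warnings in the log suggest re-exporting with OpenVINO >= 2023.3 and a recent optimum-intel, letting from_pretrained handle conversion and weight compression in one step. A minimal sketch of that path, assuming an optimum-intel version that ships OVWeightQuantizationConfig; the model ID and quantization settings are placeholders, not values from this thread:

```python
# Sketch of the re-export path suggested by the warnings above; requires a recent
# optimum-intel (with OVWeightQuantizationConfig) and OpenVINO >= 2023.3.
# The model ID and quantization settings are placeholders, not values from this thread.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

quantization_config = OVWeightQuantizationConfig(
    bits=4,          # INT4 weight-only compression
    sym=False,
    group_size=128,
    ratio=0.8,
)

ov_model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",          # placeholder model ID
    export=True,                              # convert the original PyTorch model directly
    quantization_config=quantization_config,  # compress weights during export
)
ov_model.save_pretrained("llama-2-7b-chat-int4-ov")
```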

azhuvath commented Mar 5, 2024

> Should be resolved by the merge of huggingface/optimum-intel#484. Please update the optimum-intel version you are using.

This doesn't look fixed. Please refer to the logs above, which were generated with the latest Optimum library.

canyurt7 commented Apr 23, 2024

I'm having the exact same issue when deploying Llama 3 on CPU with OpenVINO.

Any update on this?

KNChiu commented Apr 25, 2024

I get the same warnings when using tiny-llama-1b:
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --task text-generation-with-past --weight-format fp16 tiny-llama-1b-chat\FP16

helena-intel (Contributor)
@KNChiu Can you upgrade dependencies with pip install --upgrade --upgrade-strategy eager optimum[openvino]? If it still doesn't work after that, can you create an issue in https://github.com/huggingface/optimum-intel/issues and copy the output from running that command?
