[254-llm-chatbot] NNCF is not yet supported OpenVINO data type: bf16. #1514

Closed
OpenVINO-dev-contest opened this issue Dec 5, 2023 · 8 comments
Labels: bug

OpenVINO-dev-contest (Collaborator) commented Dec 5, 2023

Describe the bug

When compressing LLAMA2 with INT4 weights, an error comes up during this step:

Step: compressed_model = nncf.compress_weights(ov_model, **model_compression_params):

Error message: NotImplementedError: NNCF is not yet supported OpenVINO data type: bf16.
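
For context, a minimal sketch of the failing step, assuming a LLaMA-2 IR already converted to OpenVINO; the model path and the contents of model_compression_params below are illustrative assumptions, not the notebook's exact values:

```python
# Minimal sketch of the failing step. The model path and the compression
# parameters are illustrative assumptions, not the notebook's exact values.
import nncf
import openvino as ov

core = ov.Core()
ov_model = core.read_model("llama-2-chat-7b/FP16/openvino_model.xml")  # placeholder path

model_compression_params = {
    "mode": nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight-only quantization
    "group_size": 128,
    "ratio": 0.8,  # fraction of layers compressed to INT4; the rest stay INT8
}

# This is the call that raises NotImplementedError when the graph contains bf16 tensors.
compressed_model = nncf.compress_weights(ov_model, **model_compression_params)
ov.save_model(compressed_model, "llama-2-chat-7b/INT4/openvino_model.xml", compress_to_fp16=False)
```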

OpenVINO-dev-contest (Collaborator, Author)

Full error message:

NotImplementedError Traceback (most recent call last)
Cell In[8], line 156
154 convert_to_fp16()
155 if prepare_int8_model.value:
--> 156 convert_to_int8()
157 if prepare_int4_model.value:
158 convert_to_int4()

Cell In[8], line 49, in convert_to_int8()
47 ov_model.half()
48 quantizer = OVQuantizer.from_pretrained(ov_model)
---> 49 quantizer.quantize(save_directory=int8_model_dir, weights_only=True)
50 del quantizer
51 del ov_model

File ~/ovnb/lib/python3.10/site-packages/optimum/intel/openvino/quantization.py:180, in OVQuantizer.quantize(self, calibration_dataset, save_directory, quantization_config, file_name, batch_size, data_collator, remove_unused_columns, weights_only, **kwargs)
174 raise ValueError(
175 "calibration_dataset is needed to compute the activations range during the calibration step and was not provided. "
176 "In case you only want to apply quantization on the weights, please set weights_only=True."
177 )
179 if isinstance(self.model, OVBaseDecoderModel) and self.model.use_cache:
--> 180 self._quantize_ovcausallm(
181 calibration_dataset,
182 save_directory,
183 batch_size,
184 data_collator,
185 remove_unused_columns,
186 weights_only,
187 **kwargs,
188 )
189 elif isinstance(self.model, OVBaseModel):
190 self._quantize_ovbasemodel(
191 calibration_dataset,
192 save_directory,
(...)
197 **kwargs,
198 )

File ~/ovnb/lib/python3.10/site-packages/optimum/intel/openvino/quantization.py:263, in OVQuantizer._quantize_ovcausallm(self, calibration_dataset, save_directory, batch_size, data_collator, remove_unused_columns, weights_only, **kwargs)
260 save_directory.mkdir(parents=True, exist_ok=True)
262 if weights_only:
--> 263 self.model.model = nncf.compress_weights(self.model.model)
264 self.model.save_pretrained(save_directory)
265 return

File ~/ovnb/lib/python3.10/site-packages/nncf/quantization/quantize_model.py:294, in compress_weights(model, mode, ratio, group_size, ignored_scope)
291 return compress_weights_impl(model, mode, ratio, group_size, ignored_scope)
293 compression_algorithm = WeightCompression(mode, ratio, group_size, ignored_scope)
--> 294 graph = NNCFGraphFactory.create(model)
295 return compression_algorithm.apply(model, graph)

File ~/ovnb/lib/python3.10/site-packages/nncf/common/factory.py:45, in NNCFGraphFactory.create(model)
42 if model_backend == BackendType.OPENVINO:
43 from nncf.openvino.graph.nncf_graph_builder import GraphConverter
---> 45 return GraphConverter.create_nncf_graph(model)
46 if model_backend == BackendType.TORCH:
47 return model.nncf.get_graph()

File ~/ovnb/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py:197, in GraphConverter.create_nncf_graph(model)
194 layer_attributes = get_weighted_layer_attributes(node, metatype, const_attrs)
195 nncf_node.layer_attributes = OVLayerAttributes(const_attrs, layer_attributes, act_attrs)
--> 197 GraphConverter._add_edges_to_nncf_graph(model, nncf_graph)
198 return nncf_graph

File ~/ovnb/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py:100, in GraphConverter._add_edges_to_nncf_graph(model, graph)
98 output_node_id = graph.get_node_by_name(out_node.get_friendly_name()).node_id
99 ov_dtype = out.get_element_type().get_type_name()
--> 100 nncf_dtype = GraphConverter.convert_to_nncf_dtype(ov_dtype)
102 parallel_inputs = None
103 if len(inputs) > 1:

File ~/ovnb/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py:61, in GraphConverter.convert_to_nncf_dtype(ov_dtype)
45 conversion_map = {
46 "f16": "float",
47 "f32": "float",
(...)
58 "boolean": "int",
59 }
60 if ov_dtype not in conversion_map:
---> 61 raise NotImplementedError(f"NNCF is not yet supported OpenVINO data type: {ov_dtype}.")
62 return Dtype(conversion_map[ov_dtype])

NotImplementedError: NNCF is not yet supported OpenVINO data type: bf16.

eaidova (Contributor) commented Dec 6, 2023

Should be resolved by the merge of huggingface/optimum-intel#484.
Please update the optimum-intel version you are using.

avitial added the bug label Dec 6, 2023
KodiaqQ commented Dec 8, 2023

Hi.
We've also prepared improvements in NNCF for BF16 precision: openvinotoolkit/nncf#2307

azhuvath commented Mar 5, 2024

INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/torch/cuda/__init__.py:628: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
Compress to FP16
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.42it/s]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the LlamaAttention class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the LlamaAttention class
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_218 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_218':
Arguments cannot have boolean element type (argument element type: boolean).

Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance.Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
Saving external data to one file...
Provided model does not contain state. It may lead to sub-optimal performance.Please reexport model with updated OpenVINO version >= 2023.3.0 calling the from_pretrained method with original model and export=True parameter
Compress to INT8
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.03it/s]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_225119 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_225119':
Arguments cannot have boolean element type (argument element type: boolean).

Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance.Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
Saving external data to one file...
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 226/226 • 0:00:51 • 0:00:00
Provided model does not contain state. It may lead to sub-optimal performance.Please reexport model with updated OpenVINO version >= 2023.3.0 calling the from_pretrained method with original model and export=True parameter
Compress to INT4
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.19it/s]
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > self.causal_mask.shape[-1]:
Export model to OpenVINO directly failed with:
Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:141:
FrontEnd API failed with OpConversionFailure:
Model wasn't fully converted. Failed operations detailed log:
-- aten::mul with a message:
Exception happened during conversion of operation __module.model/aten::mul with schema aten::mul.Tensor(Tensor self, Tensor other) -> Tensor
Check 'args_et.is_dynamic() || args_et != element::boolean' failed at src/core/src/op/util/binary_elementwise_arithmetic.cpp:25:
While validating node 'opset1::Multiply Multiply_433442 (__module.model/aten::eq/Equal[0]:boolean[...], __module.model/aten::eq/Equal[0]:boolean[?,1,1,?]) -> (dynamic[...])' with friendly_name 'Multiply_433442':
Arguments cannot have boolean element type (argument element type: boolean).

Summary:
-- Conversion is failed for: aten::mul
.
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance.Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> True
Saving external data to one file...
Provided model does not contain state. It may lead to sub-optimal performance.Please reexport model with updated OpenVINO version >= 2023.3.0 calling the from_pretrained method with original model and export=True parameter
Traceback (most recent call last):
File "/home/sdp/clabs/ov_conversion.py", line 227, in
convert_to_int4()
File "/home/sdp/clabs/ov_conversion.py", line 197, in convert_to_int4
ov_model = OVModelForCausalLM.from_pretrained(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/modeling_base.py", line 399, in from_pretrained
return from_pretrained_method(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_decoder.py", line 283, in _from_transformers
return cls._from_pretrained(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_decoder.py", line 615, in _from_pretrained
_weight_only_quantization(causal_model, quantization_config)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/optimum/intel/openvino/quantization.py", line 567, in _weight_only_quantization
model.model = nncf.compress_weights(
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/quantization/quantize_model.py", line 364, in compress_weights
graph = NNCFGraphFactory.create(model)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/common/factory.py", line 43, in create
return GraphConverter.create_nncf_graph(model)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py", line 217, in create_nncf_graph
GraphConverter._add_edges_to_nncf_graph(model, nncf_graph)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py", line 100, in _add_edges_to_nncf_graph
nncf_dtype = GraphConverter.convert_to_nncf_dtype(ov_dtype)
File "/home/sdp/clabs/optimum_env/lib/python3.10/site-packages/nncf/openvino/graph/nncf_graph_builder.py", line 61, in convert_to_nncf_dtype
raise NotImplementedError(f"NNCF is not yet supported OpenVINO data type: {ov_dtype}.")
NotImplementedError: NNCF is not yet supported OpenVINO data type: bf16.
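
For what it's worth, the warnings in the log suggest re-exporting with OpenVINO >= 2023.3 and a recent optimum-intel, letting from_pretrained handle conversion and weight compression in one step. A minimal sketch of that path, assuming an optimum-intel version that ships OVWeightQuantizationConfig; the model ID and quantization settings are placeholders, not values from this thread:

```python
# Sketch of the re-export path suggested by the warnings above; requires a recent
# optimum-intel (with OVWeightQuantizationConfig) and OpenVINO >= 2023.3.
# The model ID and quantization settings are placeholders, not values from this thread.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

quantization_config = OVWeightQuantizationConfig(
    bits=4,          # INT4 weight-only compression
    sym=False,
    group_size=128,
    ratio=0.8,
)

ov_model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",          # placeholder model ID
    export=True,                              # convert the original PyTorch model directly
    quantization_config=quantization_config,  # compress weights during export
)
ov_model.save_pretrained("llama-2-7b-chat-int4-ov")
```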

azhuvath commented Mar 5, 2024

> Should be resolved by the merge of huggingface/optimum-intel#484. Please update the optimum-intel version you are using.

This doesn't look fixed. Please refer to the logs above, which were generated with the latest Optimum library.

canyurt7 commented Apr 23, 2024

I'm having the exact same issue when deploying Llama 3 on CPU with OpenVINO.

Any update on this?

KNChiu commented Apr 25, 2024

I get the same warnings when using tiny-llama-1b:
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --task text-generation-with-past --weight-format fp16 tiny-llama-1b-chat\FP16

helena-intel (Contributor)
@KNChiu Can you upgrade dependencies with pip install --upgrade --upgrade-strategy eager optimum[openvino]? If it still doesn't work after that, can you create an issue in https://github.com/huggingface/optimum-intel/issues and copy the output from running that command?
