
Introduce OVQuantizationConfig for nncf.quantize() parameters #638

Merged

Conversation

nikita-savelyevv
Contributor

@nikita-savelyevv nikita-savelyevv commented Mar 27, 2024

What does this PR do?

  • The existing OVWeightQuantizationConfig is now intended only for weight compression, and the new OVQuantizationConfig is used for full quantization.
  • Parameters for the inner nncf.quantize() call should from now on be given by providing an instance of OVQuantizationConfig to OVQuantizer.quantize() via OVConfig (see the usage sketch after this list).
  • The config objects are serializable, but the saved quantization config is intended for informational purposes only. It can't be loaded and reused for quantization, because it may contain custom objects, such as instances of nncf.Dataset or transformers.PreTrainedTokenizer, which can't be serialized and are therefore omitted during serialization.
  • The previously used DEFAULT_QUANTIZATION_CONFIG has been moved to trainer.py, its only current usage site.
  • Added a test for quantization config serialization and updated all other quantization tests to align with the new format.
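
A minimal usage sketch of the intended flow (the model id, calibration dataset, and preprocessing below are illustrative assumptions, not part of this PR):

```python
from transformers import AutoTokenizer
from optimum.intel import OVConfig, OVModelForSequenceClassification, OVQuantizationConfig, OVQuantizer

# Illustrative model and calibration data; any task-appropriate model/dataset works the same way.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantizer = OVQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=lambda examples: tokenizer(examples["sentence"], padding="max_length", truncation=True),
    num_samples=40,
)

# nncf.quantize() parameters now travel inside OVQuantizationConfig, wrapped in an OVConfig.
ov_config = OVConfig(quantization_config=OVQuantizationConfig())
quantizer.quantize(ov_config=ov_config, calibration_dataset=calibration_dataset, save_directory="int8_model")
```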

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines -143 to -157
def _enable_standard_onnx_export_option(self):
    # This method depends on self.save_onnx_model.
    # save_onnx_model is defaulted to false so that the final model output is
    # in OpenVINO IR to realize performance benefit in OpenVINO runtime.
    # True value of save_onnx_model will save a model in onnx format.
    if (
        isinstance(self.compression, dict)
        and "algorithm" in self.compression
        and self.compression["algorithm"] == "quantization"
    ):
        self.compression["export_to_onnx_standard_ops"] = self.save_onnx_model
    elif isinstance(self.compression, list):
        for i, algo_config in enumerate(self.compression):
            if algo_config["algorithm"] == "quantization":
                self.compression[i]["export_to_onnx_standard_ops"] = self.save_onnx_model
Contributor Author

Moved this logic directly to trainer.py

@@ -179,22 +217,24 @@ class OVWeightQuantizationConfig(QuantizationConfigMixin):
             using the [`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
         dataset (`str or List[str]`, *optional*):
             The dataset used for data-aware compression or quantization with NNCF. You can provide your own dataset
-            in a list of strings or just use the one from the list ['wikitext2','c4','c4-new','ptb','ptb-new'] for LLLMs
+            in a list of strings or just use the one from the list ['wikitext','c4','c4-new','ptb','ptb-new'] for LLLMs
Contributor Author

There is no dataset with id "wikitext2"

Comment on lines +695 to +697
advanced_parameters=nncf.AdvancedQuantizationParameters(
    smooth_quant_alphas=AdvancedSmoothQuantParameters(matmul=-1)
),
Contributor Author

There was a small bug here

Comment on lines 121 to 124
# Verify that the configuration is correctly saved and loaded
loaded_config = OVConfig.from_pretrained(tmp_dir)
self.assertEqual(ov_config.quantization_config.to_dict(), loaded_config.quantization_config)

Contributor Author

Brought back after #630

@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review April 4, 2024 09:17
@nikita-savelyevv
Contributor Author

@echarlaix @AlexKoff88 PR is ready for review

Collaborator

@AlexKoff88 AlexKoff88 left a comment

I mostly agree with the changes. @echarlaix, please take a look and merge if it is ok.

Collaborator

@echarlaix echarlaix left a comment

Looks great, thanks @nikita-savelyevv

@nikita-savelyevv nikita-savelyevv force-pushed the introduce-ov-quantization-config branch 3 times, most recently from 99d87f9 to f7fa3a1 on April 11, 2024 11:14
@nikita-savelyevv
Contributor Author

The failed test is not caused by my changes

@nikita-savelyevv
Contributor Author

@echarlaix please review again, the requested changes have been addressed

Collaborator

@echarlaix echarlaix left a comment

Looks great, thanks @nikita-savelyevv

Comment on lines +348 to +352
if weight_only is True:
    logger.warning(
        "Trying to create an instance of `OVQuantizationConfig` with `weight_only` being True. "
        "Please check your configuration."
    )
Collaborator

Why is this needed?

Suggested change
if weight_only is True:
    logger.warning(
        "Trying to create an instance of `OVQuantizationConfig` with `weight_only` being True. "
        "Please check your configuration."
    )

Contributor Author

@nikita-savelyevv nikita-savelyevv Apr 12, 2024

It's just to make sure that nobody accidentally creates the config with the wrong weight_only property, e.g. OVWeightQuantizationConfig(weight_only=False). This is not strictly required. I'll remove this part.

Contributor Author

I remembered the reason this is needed. In the scenario where someone provides the quantization config as a dictionary instead of a config object, there is some logic that tries to infer the type of the config from the provided keys. However, this is not 100% reliable, so an additional flag is needed to distinguish between weight-only and full quantization.

For example, if the config is given as dict(ignored_scope={"names": ["op_name"]}), then it is ambiguous which kind of quantization is intended, because both kinds accept an ignored-scope parameter. By default, an OVWeightQuantizationConfig will be created in such a case and a warning will be emitted.

To run full quantization with only an ignored scope given, the config has to be provided as dict(weight_only=False, ignored_scope={"names": ["op_name"]}). In that case an instance of OVQuantizationConfig will be built and full quantization is run.

The warning you mention here happens when, for example, the config is given as dict(bits=8, sym=True, weight_only=False), which is confusing because parameters for weight-only quantization are given, but weight_only hints that full quantization is intended. In that case weight-only quantization will be executed nevertheless, and the warning is emitted to avoid such confusion.
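
A rough sketch of the dict-to-config dispatch described above (config_from_dict is a hypothetical helper written for illustration, not the actual optimum-intel implementation):

```python
from optimum.intel import OVQuantizationConfig, OVWeightQuantizationConfig

def config_from_dict(config_dict: dict):
    # Hypothetical illustration: an explicit weight_only=False requests full quantization,
    # everything else defaults to weight-only compression.
    config_dict = dict(config_dict)
    if config_dict.pop("weight_only", None) is False:
        return OVQuantizationConfig(**config_dict)
    return OVWeightQuantizationConfig(**config_dict)

# Ambiguous: both config kinds accept ignored_scope, so weight-only is chosen by default.
config_from_dict({"ignored_scope": {"names": ["op_name"]}})
# Explicitly requests full quantization.
config_from_dict({"weight_only": False, "ignored_scope": {"names": ["op_name"]}})
```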

Collaborator

I see, I think it would make sense to always create an instance of OVWeightQuantizationConfig by default and remove this parameter

@@ -572,7 +573,8 @@ def _from_pretrained(
         from_onnx: bool = False,
         local_files_only: bool = False,
         load_in_8bit: bool = False,
-        quantization_config: Union[OVWeightQuantizationConfig, Dict] = None,
+        quantization_config: Optional[Union[OVWeightQuantizationConfig, Dict]] = None,
+        calibration_dataset: Optional[nncf.Dataset] = None,
Collaborator

I would prefer not to include a calibration_dataset argument and to only support a subset of calibration datasets via the quantization_config when loading a model with the from_pretrained method (while leaving the possibility to provide any calibration_dataset when applying quantization with the OVQuantizer). What do you think?

Contributor Author

Hmm. There's currently a scenario where a custom dataset is provided to the .from_pretrained()
method via the config: https://github.com/huggingface/optimum-intel/blob/main/tests/openvino/test_quantization.py#L453

Since we decided that the .dataset property of the config will now contain only string-typed values, it would look kind of hacky to keep it this way.

How about I remove the explicit calibration_dataset argument from the .from_pretrained signature, but extract it from **kwargs there? This is still shady, but IMO it's better than passing it through the .dataset property of the config. What do you think?

Collaborator

I think support for custom datasets should be provided through the OVQuantizer rather than when loading the model with from_pretrained. I will open a follow-up PR to add this.
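
To illustrate the split being discussed, a sketch of the load-time path that only takes built-in datasets by name (the model id and dataset name are illustrative assumptions; custom nncf.Dataset objects would instead go through the OVQuantizer):

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Built-in calibration data is referenced by name through the quantization config at load time.
model = OVModelForCausalLM.from_pretrained(
    "gpt2",  # illustrative model id
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4, dataset="wikitext"),
)
```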

@echarlaix echarlaix merged commit ff5d185 into huggingface:main Apr 15, 2024
9 of 10 checks passed
eaidova pushed a commit to openvinotoolkit/openvino_notebooks that referenced this pull request Apr 17, 2024
Recently, there were some [changes](huggingface/optimum-intel#638) to how NNCF quantization is called via optimum-intel. This PR updates the quantization use cases to how they are currently intended to be called.