
Add dynamic quantization config #661

Merged · 27 commits · Apr 22, 2024
Conversation

echarlaix (Collaborator) commented Apr 15, 2024

Add dynamic quantization configuration

from optimum.intel import OVDynamicQuantizationConfig, OVModelForCausalLM

model_id = "gpt2"  # placeholder: any causal LM hub id or local path
quantization_config = OVDynamicQuantizationConfig(bits=8, activations_group_size=32)
int8_model = OVModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
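For context, a minimal usage sketch of the resulting model; the tokenizer loading and prompt are illustrative assumptions, not part of this PR:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
# OVModelForCausalLM exposes the usual transformers generate() API
outputs = int8_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))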

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines -599 to +598
calibration_dataset = kwargs.get("calibration_dataset", None)

echarlaix (Collaborator, Author):

Removed the calibration_dataset argument, @nikita-savelyevv

echarlaix changed the title from "OV quantizer" to "Add dynamic quantization config" on Apr 16, 2024
echarlaix requested a review from AlexKoff88 on April 16, 2024 12:51
AlexKoff88 (Collaborator) commented:

@echarlaix, it looks like we will have two ways to enable weight-only quantization + dynamic quantization:

  1. As done in this PR.
  2. Quantize the weights and pass an OVConfig with the runtime option that enables DQ.

Am I right here?

Two more things to note: this flow is still under development for GPU, and it is also used along with 8-bit KV-cache quantization, which performs pretty well and helps reduce the memory footprint with almost no accuracy degradation.
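A minimal sketch of what option 2 might look like, assuming the DYNAMIC_QUANTIZATION_GROUP_SIZE property name used in this PR's diff and that from_pretrained() forwards the ov_config dict to the runtime (the model_id is a placeholder):

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Option 2: weight-only quantization via the config, with dynamic
# quantization enabled through a runtime option rather than a dedicated class.
model = OVModelForCausalLM.from_pretrained(
    model_id,  # placeholder checkpoint id
    quantization_config=OVWeightQuantizationConfig(bits=8),
    ov_config={"DYNAMIC_QUANTIZATION_GROUP_SIZE": "32"},
)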

nikita-savelyevv (Collaborator) left a comment:

Some minor comments

optimum/intel/openvino/configuration.py (outdated, resolved)
optimum/intel/openvino/configuration.py (resolved)
optimum/intel/openvino/modeling_decoder.py (resolved)
Comment on lines -496 to -501
model = model_cls.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4, sym=True, group_size=-1, ratio=0.8),
    calibration_dataset=quantization_dataset,
)
nikita-savelyevv (Collaborator):

With the removal of the calibration_dataset argument from from_pretrained(), a workflow such as this one will no longer be available. I personally am OK with this, but as I understand it, it will limit the capabilities of hybrid quantization in the future. Currently a string-only dataset is enough for it, but from the discussion we had with @l-bat, this is not the only use case.

Possibly, we should rework the hybrid quantization workflow to be called only through OVQuantizer, which accepts a custom calibration dataset.

@AlexKoff88, what do you think?
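A hedged sketch of what that reworked workflow might look like, assuming OVQuantizer.quantize() keeps its calibration_dataset and ov_config parameters (names taken from the existing optimum-intel API, not from this PR; model and quantization_dataset stand in for the objects from the removed snippet):

from optimum.intel import OVConfig, OVQuantizer, OVWeightQuantizationConfig

quantizer = OVQuantizer.from_pretrained(model)  # an already-loaded OV model
quantizer.quantize(
    ov_config=OVConfig(quantization_config=OVWeightQuantizationConfig(bits=4, sym=True, group_size=-1, ratio=0.8)),
    calibration_dataset=quantization_dataset,  # custom dataset, as in the removed snippet
    save_directory="quantized_model",  # hypothetical output directory
)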

echarlaix (Collaborator, Author) commented Apr 19, 2024:

> @echarlaix, it looks like we will have two ways to enable weight-only quantization + dynamic quantization:
>
> 1. As done in this PR.
> 2. Quantize the weights and pass an OVConfig with the runtime option that enables DQ.
>
> Am I right here?
>
> Two more things to note: this flow is still under development for GPU, and it is also used along with 8-bit KV-cache quantization, which performs pretty well and helps reduce the memory footprint with almost no accuracy degradation.

Yes, but I would be in favor of using 1. in our examples / documentation, so that we have the same API across all quantization strategies.


q_config = self._openvino_config.quantization_config if self._openvino_config else None
if isinstance(q_config, OVDynamicQuantizationConfig):
    self.ov_config["DYNAMIC_QUANTIZATION_GROUP_SIZE"] = str(q_config.activations_group_size)
AlexKoff88 (Collaborator) commented Apr 19, 2024:

@echarlaix, shall we turn on 8-bit KV-cache quantization as well? It is essentially per-token INT8 quantization, and it should be safe in terms of accuracy degradation.
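If that were adopted, a hypothetical extension of the snippet above could request the KV-cache precision through a runtime property as well; "KV_CACHE_PRECISION" is an OpenVINO property name assumed here for illustration and is not part of this PR:

q_config = self._openvino_config.quantization_config if self._openvino_config else None
if isinstance(q_config, OVDynamicQuantizationConfig):
    self.ov_config["DYNAMIC_QUANTIZATION_GROUP_SIZE"] = str(q_config.activations_group_size)
    # assumption: also request a per-token INT8 KV cache via a runtime property
    self.ov_config["KV_CACHE_PRECISION"] = "u8"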

echarlaix marked this pull request as ready for review April 22, 2024 12:29
echarlaix merged commit a06522c into main Apr 22, 2024
12 checks passed
echarlaix deleted the ov-quantizer branch April 22, 2024 12:31