Improved performance of decoders #354

Merged: 12 commits merged into main from ak/decoder_performance on Jun 21, 2023
Conversation

AlexKoff88 (Collaborator):

Improved performance of decoders: GPT-like, Bloom, etc., and Seq2seq models.
Observed a significant speedup on long sequences, e.g. +30% for Dolly-3B generating 500 tokens on a client CPU.
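
For reference, a minimal sketch (not part of the PR) of how such a speedup could be measured. The model id, prompt, and timing harness are illustrative assumptions; OVModelForCausalLM.from_pretrained(..., export=True) is the standard optimum-intel entry point:

```python
# Hypothetical benchmark sketch: time 500-token generation with an
# OpenVINO-exported decoder on CPU, mirroring the Dolly-3B numbers above.
import time
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "databricks/dolly-v2-3b"  # assumed checkpoint for "Dolly-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("OpenVINO is", return_tensors="pt")
start = time.perf_counter()
# min_new_tokens forces the full 500-token run even if EOS appears early.
model.generate(**inputs, max_new_tokens=500, min_new_tokens=500)
print(f"Generated 500 tokens in {time.perf_counter() - start:.1f} s")
```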

HuggingFaceDocBuilderDev commented on Jun 16, 2023

The documentation is not available anymore as the PR was closed or merged.

```diff
-for i, data in enumerate(calibration_dataloader):
-    self.model.generate(**data, max_new_tokens=10)
+for _, data in enumerate(calibration_dataloader):
+    self.model.generate(**data, max_new_tokens=100)
```
Collaborator:

Is this modification added to reduce accuracy degradation resulting from quantization? If yes, what did you observe when varying this parameter?

AlexKoff88 (Author):

Still in progress; I will update a bit later.

```python
def __getattr__(self, attr):
    if attr in self.__dict__:
        return getattr(self, attr)
    return getattr(self.request, attr)
```

```diff
+self.model.request = InferRequestWrapper(self.model.request)
-for i, data in enumerate(calibration_dataloader):
-    self.model.generate(**data, max_new_tokens=10)
+for _, data in enumerate(calibration_dataloader):
+    self.model.generate(**data, max_new_tokens=100)
```
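
For context, a minimal sketch of how such a delegating wrapper could collect calibration inputs during generation. Only the __getattr__ body is taken from the diff above; the data_cache list and the infer() override are assumptions for illustration, not necessarily the PR's exact code:

```python
# Illustrative sketch: proxy an OpenVINO InferRequest and record every input
# fed to it, so the recorded data can later serve as a calibration dataset.
class InferRequestWrapper:
    def __init__(self, request):
        self.request = request  # the wrapped OpenVINO InferRequest
        self.data_cache = []    # inputs collected during generation (assumed)

    def infer(self, inputs):
        # Record the inputs, then delegate the actual inference call.
        self.data_cache.append(inputs)
        return self.request.infer(inputs)

    def __getattr__(self, attr):
        # Only invoked when normal attribute lookup fails, so anything not
        # defined on the wrapper falls through to the wrapped request.
        if attr in self.__dict__:
            return getattr(self, attr)
        return getattr(self.request, attr)
```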
Collaborator:

Not related to the PR, but what do you think about unifying how quantization is applied to causal language models depending on whether the user provides a torch.nn.Module or an OVBaseDecoderModel (the number of generation steps is currently not the same)? We could also instantiate an OVModel in the from_pretrained method when the given model is a PreTrainedModel (a sketch of this idea follows this exchange).

AlexKoff88 (Author):

It is hard to accomplish this with the current NNCF PTQ API implementation we have for PyTorch. I think we should deprecate PTQ for PyTorch at some point because it also introduces ambiguity for the user about what workflow to use for quantization.
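
For illustration, a hypothetical sketch of the unification suggested above; the helper name and the isinstance dispatch are assumptions, not existing optimum-intel code:

```python
# Hypothetical sketch: if the quantizer receives a transformers
# PreTrainedModel, convert it to an OVModel first so a single OpenVINO
# calibration path (same number of generation steps) serves both input types.
from transformers import PreTrainedModel
from optimum.intel import OVModelForCausalLM

def as_openvino_model(model):
    if isinstance(model, PreTrainedModel):
        # Assumed step: re-export the PyTorch checkpoint via the OpenVINO
        # exporter; config.name_or_path points back to the source checkpoint.
        return OVModelForCausalLM.from_pretrained(
            model.config.name_or_path, export=True
        )
    return model  # already an OVBaseDecoderModel
```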

AlexKoff88 (Author):

I think it is ready for merge.

helena-intel (Collaborator) left a comment:

🚀 🔥

echarlaix (Collaborator) left a comment:

LGTM, thanks a lot @AlexKoff88

echarlaix merged commit c56d3b4 into main on Jun 21, 2023
12 checks passed
echarlaix deleted the ak/decoder_performance branch on Jun 21, 2023 at 08:11
echarlaix added a commit that referenced this pull request Jun 30, 2023
* Improved performance of decoders

* Improved performance of Seq2seq models

* Style

* Adjusted quantization logic

* Style

* Temporal changes

* Temporal

* Make it working

* Some improvements

* Style

* Update optimum/intel/openvino/modeling_decoder.py

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

---------

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>