
Added AWQ option to llm-chatbot notebook #2043

Merged

Conversation

@nikita-savelyevv (Collaborator) commented May 24, 2024

Add an option to run the AWQ algorithm during INT4 model compression in the llm-chatbot and llm-rag-langchain notebooks. Applying AWQ slightly improves model generation quality, but it requires a significant amount of additional memory and time, so it is disabled by default.
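For reference, a minimal sketch of what enabling AWQ during INT4 weight compression looks like with NNCF's `compress_weights` API. The model path, tokenizer name, and `transform_fn` below are illustrative assumptions, not the notebook's exact code (real LLM inputs usually also need `attention_mask`, `position_ids`, etc.):

```python
import nncf
import openvino as ov
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative paths/names; the notebook constructs these differently.
core = ov.Core()
model = core.read_model("llama-2-chat-7b/openvino_model.xml")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# AWQ needs calibration samples, unlike plain data-free INT4 compression.
wikitext = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def transform_fn(sample):
    # Map a raw text sample to a dict of model inputs (simplified sketch).
    return tokenizer(sample["text"], return_tensors="np")

calibration_dataset = nncf.Dataset(wikitext, transform_fn)

compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    ratio=0.8,
    dataset=calibration_dataset,  # required when awq=True
    awq=True,                     # disabled by default, as in the notebook
)
```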

Some evaluation results are below. The wikitext task is considered the more accurate one. Unless stated otherwise, AWQ was calibrated on the wikitext2 dataset.

| Model | Compression | PPL on lambada-openai | PPL on wikitext | Compression Time |
|---|---|---|---|---|
| gemma-2b-it | FP16 | 8.23 | | |
| gemma-2b-it | INT8_asym | 8.36 | | |
| gemma-2b-it | INT4_sym, group size 64, ratio 60% | 8.9 | | 59 sec. |
| gemma-2b-it | INT4_sym, group size 64, ratio 60% + AWQ | 8.62 | | 202 sec. |
| llama-2-chat-7b | FP16 | 3.26 | 11.6 | |
| llama-2-chat-7b | INT8_asym | 3.27 | 11.6 | |
| llama-2-chat-7b | INT4_sym, group size 128, ratio 80% | 3.38 | 11.95 | 215 sec. |
| llama-2-chat-7b | INT4_sym, group size 128, ratio 80% + AWQ (wikitext2) | 3.44 | 11.88 | 768 sec. |
| llama-2-chat-7b | INT4_sym, group size 128, ratio 80% + AWQ (ptb) | 3.42 | 11.87 | |
| llama-3-8b-instruct | FP16 | 3.1 | | |
| llama-3-8b-instruct | INT8_asym | 3.08 | | |
| llama-3-8b-instruct | INT4_sym, group size 128, ratio 80% | 3.38 | | 242 sec. |
| llama-3-8b-instruct | INT4_sym, group size 128, ratio 80% + AWQ | 3.26 | | 956 sec. |

Ticket: 141233


@nikita-savelyevv marked this pull request as ready for review on June 7, 2024, 07:47
@andreyanufr left a comment


Maybe it’s worth stating on the web page exactly which model precision is used for chat.

@@ -32,6 +32,7 @@
"- [Convert model using Optimum-CLI tool](#Convert-model-using-Optimum-CLI-tool)\n",
@eaidova (Contributor) commented Jun 10, 2024


Please highlight the note about when AWQ is not applicable, and could you provide details on what you mean by "skip" (will there be a warning message, is there an explicit configuration that skips it, or something else?).

Also, please provide details about the dataset used.



@nikita-savelyevv (Collaborator, Author) replied:

Highlighted the note about skipping the algorithm and added information about which dataset is used for calibration.

When AWQ is skipped, NNCF emits an INFO-level log message: "No matching patterns were found for applying AWQ algorithm, it will be skipped."
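To make that skip message visible when experimenting, one option is to raise the log level of NNCF's logger before running compression. This is a sketch; the `"nncf"` logger name is an assumption based on NNCF using the standard `logging` module:

```python
import logging

# Surface INFO-level NNCF messages, such as the AWQ skip notice
# ("nncf" logger name is an assumption, not taken from the notebook).
logging.getLogger("nncf").setLevel(logging.INFO)
```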

@eaidova merged commit 0238a6e into openvinotoolkit:latest on Jun 11, 2024. 5 of 18 checks passed.