Added AWQ option to llm-chatbot notebook #2043
Conversation
Maybe it’s worth stating on the web page exactly which model precision is used for chat.
@@ -32,6 +32,7 @@
 "- [Convert model using Optimum-CLI tool](#Convert-model-using-Optimum-CLI-tool)\n",
Please highlight the note about the inapplicability of AWQ, and could you please clarify what you mean by "skip" (will there be a warning message, is there an explicit configuration that skips it, or something else)?
Also, please provide details about the dataset used.
Highlighted the note about skipping the algorithm and added information about which dataset is used for calibration.
When AWQ is skipped, NNCF emits an INFO-level log message: "No matching patterns were found for applying AWQ algorithm, it will be skipped."
Add an option to run the AWQ algorithm during INT4 model compression in the llm-chatbot and llm-rag-langchain notebooks. Applying AWQ slightly improves model generation quality, but it requires a significant amount of additional memory and time, so it is disabled by default. Some evaluation results are below. The wikitext task is considered the more accurate one. Unless stated otherwise, AWQ was calibrated on the wikitext2 dataset.
Ticket: 141233