Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Added AWQ option to llm-chatbot notebook (#2043)
Add an option to run the AWQ algorithm during INT4 model compression in the `llm-chatbot` and `llm-rag-langchain` notebooks. Applying AWQ slightly improves model generation quality, but it requires a significant amount of additional memory and time (in the runs below, compression with AWQ took roughly 3.4 to 4 times longer), so it is disabled by default. Some evaluation results are below; lower perplexity (PPL) is better. The wikitext task is considered the more accurate of the two. Unless stated otherwise, AWQ was calibrated on the wikitext2 dataset.

| Model | Compression | PPL on lambada-openai | PPL on wikitext | Compression time (s) |
| -- | -- | -- | -- | -- |
| gemma-2b-it | FP16 | 8.23 | | |
| gemma-2b-it | INT8_asym | 8.36 | | |
| gemma-2b-it | INT4_sym, group size 64, ratio 60% | 8.9 | | 59 |
| gemma-2b-it | INT4_sym, group size 64, ratio 60% + AWQ | 8.62 | | 202 |
| llama-2-chat-7b | FP16 | 3.26 | 11.6 | |
| llama-2-chat-7b | INT8_asym | 3.27 | 11.6 | |
| llama-2-chat-7b | INT4_sym, group size 128, ratio 80% | 3.38 | 11.95 | 215 |
| llama-2-chat-7b | INT4_sym, group size 128, ratio 80% + AWQ (wikitext2) | 3.44 | 11.88 | 768 |
| llama-2-chat-7b | INT4_sym, group size 128, ratio 80% + AWQ (ptb) | 3.42 | 11.87 | |
| llama-3-8b-instruct | FP16 | 3.1 | | |
| llama-3-8b-instruct | INT8_asym | 3.08 | | |
| llama-3-8b-instruct | INT4_sym, group size 128, ratio 80% | 3.38 | | 242 |
| llama-3-8b-instruct | INT4_sym, group size 128, ratio 80% + AWQ | 3.26 | | 956 |

**Ticket** 141233
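As a rough illustration of the quality/time trade-off, the snippet below recomputes the AWQ compression-time overhead and the perplexity changes from the INT4 rows of the evaluation table. The numbers are copied from that table; the `RESULTS` layout and the `awq_tradeoff` helper are hypothetical and are not part of the notebooks. For llama-2-chat-7b only the wikitext2-calibrated AWQ row is used, because the ptb row has no compression time.

```python
# Numbers copied from the evaluation table (INT4_sym rows only).
# Lower perplexity (PPL) is better; compression time is in seconds.
# This data layout is a hypothetical illustration, not notebook code.
RESULTS = {
    "gemma-2b-it": {
        "int4": {"ppl_lambada": 8.9, "time_s": 59},
        "int4_awq": {"ppl_lambada": 8.62, "time_s": 202},
    },
    "llama-2-chat-7b": {
        "int4": {"ppl_lambada": 3.38, "ppl_wikitext": 11.95, "time_s": 215},
        # AWQ calibrated on wikitext2 (the ptb run has no timing).
        "int4_awq": {"ppl_lambada": 3.44, "ppl_wikitext": 11.88, "time_s": 768},
    },
    "llama-3-8b-instruct": {
        "int4": {"ppl_lambada": 3.38, "time_s": 242},
        "int4_awq": {"ppl_lambada": 3.26, "time_s": 956},
    },
}


def awq_tradeoff(model: str) -> dict:
    """Compare plain INT4 compression with INT4 + AWQ for one model."""
    base = RESULTS[model]["int4"]
    awq = RESULTS[model]["int4_awq"]
    # How many times longer compression takes when AWQ is enabled.
    summary = {"time_ratio": round(awq["time_s"] / base["time_s"], 2)}
    for metric in ("ppl_lambada", "ppl_wikitext"):
        # Only report metrics that were measured for both variants.
        if metric in base and metric in awq:
            # Negative delta means AWQ lowered (improved) perplexity.
            summary[metric + "_delta"] = round(awq[metric] - base[metric], 2)
    return summary


if __name__ == "__main__":
    for name in RESULTS:
        print(name, awq_tradeoff(name))
```

For the three models the time ratio comes out to about 3.42, 3.57 and 3.95. AWQ lowers perplexity in every measured case except lambada-openai on llama-2-chat-7b (+0.06), where the wikitext perplexity still improves.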
1 parent 5c03914, commit 0238a6e
3 changed files with 70 additions and 0 deletions.