
Conversation

@Zhenzhong1 (Contributor) commented on May 30, 2024

Type of Change

  • Added a customized config and updated the benchmark script (a hypothetical loading sketch follows this list)
  • Fixed the vLLM QBits next-token issue for the 1024-input / 32-output token shape
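
As a hedged illustration of the kind of setup the config bullet refers to, here is a minimal sketch of loading a 4-bit weight-only model through ITREX so that its linear layers run on QBits kernels. The model name is an assumption, and this is not the PR's actual customized config or benchmark script.

```python
# Minimal sketch, assuming the ITREX README-style load_in_4bit path.
# The model name below is an illustrative assumption (the PR names no model),
# and this is not the customized config the PR actually adds.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed model

tokenizer = AutoTokenizer.from_pretrained(model_name)

# modeling_auto.py (one of the files this PR touches) dispatches this call to
# a 4-bit weight-only path; the quantized linear modules in modules.py (the
# other touched file) execute on QBits CPU kernels.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
```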

Description

vLLM perf:

[Two benchmark screenshots of vLLM performance results; images not reproduced here.]

Expected Behavior & Potential Risk

N/A

How has this PR been tested?

Manual profiling (one possible timing approach is sketched below).
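
For reproducibility, here is one way such manual profiling could look: a hedged sketch that times the 1024-in / 32-out shape from the description and isolates average next-token (decode) latency from the prefill step. The helper name, prompt construction, and vocabulary size are illustrative assumptions, not the author's actual procedure; `model` is the object from the loading sketch above.

```python
# Hypothetical profiling helper; not the author's actual measurement code.
import time

import torch


def profile_next_token(model, input_ids: torch.Tensor, new_tokens: int = 32) -> float:
    """Return average decode latency in ms/token, excluding the prefill step."""
    # Time a 1-token generation to capture the prefill-dominated first token.
    start = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1, do_sample=False)
    prefill_s = time.perf_counter() - start

    # Time the full run with new_tokens output tokens.
    start = time.perf_counter()
    model.generate(input_ids, max_new_tokens=new_tokens, do_sample=False)
    total_s = time.perf_counter() - start

    # Subtracting the prefill run leaves (new_tokens - 1) pure decode steps.
    return (total_s - prefill_s) / (new_tokens - 1) * 1000


# Synthetic 1024-token prompt, batch size 1; 32000 is an assumed vocab size.
input_ids = torch.randint(0, 32000, (1, 1024))
print(f"next-token latency: {profile_next_token(model, input_ids):.2f} ms/token")
```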

Dependency Change?

N/A

@Zhenzhong1 changed the title from "add customized config & update benchmark script" to "[Bug Fixed] QBits Perf Enhence" on May 30, 2024
@Zhenzhong1 Zhenzhong1 marked this pull request as ready for review May 30, 2024 10:39
@Zhenzhong1 Zhenzhong1 requested a review from PenghuiCheng as a code owner May 30, 2024 10:39
@github-actions bot commented on May 30, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| format-scan (pylint) | success | |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |


🟢 Optimize Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | success | |
| Genreate-OptimizeUT-Report | success | |


🟢 NeuralChat Unit Test

| Check ID | Status | Error details |
| --- | --- | --- |
| neuralchat-unit-test-baseline | success | |
| neuralchat-unit-test-PR-test | success | |
| Generate-NeuralChat-Report | success | |


🟢 Engine Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| engine-unit-test-baseline | success | |
| engine-unit-test-PR-test | success | |
| Genreate-Engine-Report | success | |


🟢 Chat Bot Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| call-inference-llama-2-7b-chat-hf / inference test | success | |
| call-inference-mpt-7b-chat / inference test | success | |

All of the above check groups are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py and intel_extension_for_transformers/transformers/modeling/modeling_auto.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@Zhenzhong1 changed the title from "[Bug Fixed] QBits Perf Enhence" to "[vLLM] QBits Perf Enhence" on May 30, 2024
Signed-off-by: Zhenzhong1 <109137058+Zhenzhong1@users.noreply.github.com>
@Zhenzhong1 (Contributor, Author) commented:

@XuehaoSun Hi, if the Optimize Unit Test CI issue is fixed, please let me know and re-run the CI. Thanks.

@Zhenzhong1 (Contributor, Author) commented:

Ready for merge.

@kevinintel kevinintel merged commit 2ebe14d into main Jun 5, 2024
@kevinintel kevinintel deleted the zhenzhong/vllm_gptq branch June 5, 2024 01:41