
Conversation

@Zhenzhong1 (Contributor) commented on May 30, 2024

Type of Change

  • Added a customized config and updated the benchmark script (a hypothetical loading sketch follows this list)
  • Fixed the vLLM QBits next-token issue for the 1024-input / 32-output token shape
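
As a hedged illustration of the kind of setup the config bullet refers to, here is a minimal sketch of loading a 4-bit weight-only model through ITREX so that its linear layers run on QBits kernels. The model name is an assumption, and this is not the PR's actual customized config or benchmark script.

```python
# Minimal sketch, assuming the ITREX README-style load_in_4bit path.
# The model name below is an illustrative assumption (the PR names no model),
# and this is not the customized config the PR actually adds.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed model

tokenizer = AutoTokenizer.from_pretrained(model_name)

# modeling_auto.py (one of the files this PR touches) dispatches this call to
# a 4-bit weight-only path; the quantized linear modules in modules.py (the
# other touched file) execute on QBits CPU kernels.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
```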

Description

vLLM perf:

[Two benchmark screenshots of vLLM performance results; images not reproduced here.]

Expected Behavior & Potential Risk

N/A

How has this PR been tested?

Manual profiling (one possible timing approach is sketched below).
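
For reproducibility, here is one way such manual profiling could look: a hedged sketch that times the 1024-in / 32-out shape from the description and isolates average next-token (decode) latency from the prefill step. The helper name, prompt construction, and vocabulary size are illustrative assumptions, not the author's actual procedure; `model` is the object from the loading sketch above.

```python
# Hypothetical profiling helper; not the author's actual measurement code.
import time

import torch


def profile_next_token(model, input_ids: torch.Tensor, new_tokens: int = 32) -> float:
    """Return average decode latency in ms/token, excluding the prefill step."""
    # Time a 1-token generation to capture the prefill-dominated first token.
    start = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1, do_sample=False)
    prefill_s = time.perf_counter() - start

    # Time the full run with new_tokens output tokens.
    start = time.perf_counter()
    model.generate(input_ids, max_new_tokens=new_tokens, do_sample=False)
    total_s = time.perf_counter() - start

    # Subtracting the prefill run leaves (new_tokens - 1) pure decode steps.
    return (total_s - prefill_s) / (new_tokens - 1) * 1000


# Synthetic 1024-token prompt, batch size 1; 32000 is an assumed vocab size.
input_ids = torch.randint(0, 32000, (1, 1024))
print(f"next-token latency: {profile_next_token(model, input_ids):.2f} ms/token")
```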

Dependency Change?

N/A

@Zhenzhong1 changed the title from "add customized config & update benchmark script" to "[Bug Fixed] QBits Perf Enhence" on May 30, 2024
@Zhenzhong1 Zhenzhong1 marked this pull request as ready for review May 30, 2024 10:39
@Zhenzhong1 Zhenzhong1 requested a review from PenghuiCheng as a code owner May 30, 2024 10:39
@github-actions bot commented on May 30, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| format-scan (pylint) | success | |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |


🟢 Optimize Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | success | |
| Genreate-OptimizeUT-Report | success | |


🟢 NeuralChat Unit Test

| Check ID | Status | Error details |
| --- | --- | --- |
| neuralchat-unit-test-baseline | success | |
| neuralchat-unit-test-PR-test | success | |
| Generate-NeuralChat-Report | success | |


🟢 Engine Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| engine-unit-test-baseline | success | |
| engine-unit-test-PR-test | success | |
| Genreate-Engine-Report | success | |


🟢 Chat Bot Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| call-inference-llama-2-7b-chat-hf / inference test | success | |
| call-inference-mpt-7b-chat / inference test | success | |

All of the above check groups are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py and intel_extension_for_transformers/transformers/modeling/modeling_auto.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@Zhenzhong1 changed the title from "[Bug Fixed] QBits Perf Enhence" to "[vLLM] QBits Perf Enhence" on May 30, 2024
Signed-off-by: Zhenzhong1 <109137058+Zhenzhong1@users.noreply.github.com>
@Zhenzhong1 (Contributor, Author) commented:

@XuehaoSun Hi, if the Optimize Unit Test CI issue is fixed, please let me know and re-run the CI. Thanks.

@Zhenzhong1 (Contributor, Author) commented:

Ready for merge.

@kevinintel kevinintel merged commit 2ebe14d into main Jun 5, 2024
@kevinintel kevinintel deleted the zhenzhong/vllm_gptq branch June 5, 2024 01:41