
Support AutoRound quantization method for Intel GPU #1428

Merged

VincyZhang merged 37 commits into main from penghuic/support_autoaround_gpu on Apr 2, 2024

Conversation

PenghuiCheng (Collaborator) commented:

Type of Change

Support AutoRound quantization for Intel GPU.
No API changes.

Description

Support AutoRound quantization for Intel GPU.

Expected Behavior & Potential Risk

Models can be quantized with the AutoRound method; a minimal usage sketch follows.
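
For readers landing on this PR, here is a minimal sketch of what the user-facing flow might look like. The `AutoRoundConfig` class, its `bits`/`group_size` parameters, and `device_map="xpu"` are assumptions modeled on ITREX's other weight-only quantization configs and on the validation command later in this thread, not a verified excerpt of this PR; the merged config.py and modeling_auto.py are authoritative.

```python
# Hypothetical sketch: AutoRound weight-only quantization on an Intel GPU (XPU)
# via intel-extension-for-transformers. Config class and parameter names are
# assumptions; check the merged config.py / modeling_auto.py for the real API.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    AutoRoundConfig,  # assumed name, added by this PR's config.py changes
)

model_name = "mistralai/Mistral-7B-v0.1"
quant_config = AutoRoundConfig(
    bits=4,          # 4-bit weight-only quantization
    group_size=128,  # one scale per 128-weight group
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="xpu",  # target the Intel GPU
    trust_remote_code=True,
)

inputs = tokenizer("The weather is", return_tensors="pt").to("xpu")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```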

How has this PR been tested?

Tested locally.

Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>

github-actions bot commented Mar 27, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (cloc) success
format-scan (cpplint) success


🟢 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test success
Genreate-OptimizeUT-Report success


🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success


🟢 Engine Unit Test workflow
Check ID Status Error details
engine-unit-test-baseline success
engine-unit-test-PR-test success
Genreate-Engine-Report success


🟢 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test success
call-inference-mpt-7b-chat / inference test success

All of the above checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/evaluator.py, intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/llm/quantization/utils.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, and intel_extension_for_transformers/transformers/utils/config.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds for the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@a32543254 (Contributor) left a comment:


LGTM

VincyZhang and others added 5 commits March 27, 2024 14:27
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
@VincyZhang added the WIP label on Mar 29, 2024
PenghuiCheng and others added 14 commits March 29, 2024 16:08
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
VincyZhang and others added 16 commits April 1, 2024 11:25
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
@VincyZhang merged commit 7084e7f into main on Apr 2, 2024
17 checks passed
@VincyZhang deleted the penghuic/support_autoaround_gpu branch on April 2, 2024 at 05:56
@changwangss (Collaborator) commented:

Intel GPU local validation passed.
ITREX PR: #1428
INC PR: intel/neural-compressor#1710
python run_generation_gpu_woq.py --model /mnt/cached_oses/models/Mistral-7B-v0.1 --woq --woq_algo AutoRound --benchmark --calib_len 32 --calib_iter 1 --max_input_length 32 --nsamples 20
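
For reference, the flags appear to map onto AutoRound's calibration knobs: --woq enables weight-only quantization, --woq_algo AutoRound selects the algorithm, and --calib_len, --calib_iter, and --nsamples bound the calibration sequence length, number of tuning iterations, and number of calibration samples. This reading is inferred from the flag names and AutoRound's usual parameters; run_generation_gpu_woq.py defines the exact semantics.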
