Enable Intel® Neural Compressor 4-bits weight-only quantization and add related example #614

Merged on Sep 28, 2023 (2 commits)

yuwenzho
Contributor

Describe your changes

Support 4-bit weight-only quantization with Intel® Neural Compressor and add a related example.

As large language models (LLMs) become more prevalent, there is a growing need for quantization methods that can meet the computational demands of these architectures while maintaining accuracy. Compared with conventional quantization such as W8A8 (8-bit weights and 8-bit activations), weight-only quantization (WOQ) is often a better trade-off between performance and accuracy.

This PR provides two weight-only algorithms. Round-to-nearest (RTN) is the most straightforward way to quantize weights using scale maps. The GPTQ algorithm provides more accurate quantization but requires more computational resources.
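
To make the RTN idea concrete, here is a minimal pure-Python sketch of group-wise 4-bit round-to-nearest quantization. It is not the Intel® Neural Compressor implementation: the function names, the asymmetric (scale plus zero-point) scheme, and the group size are assumptions chosen for illustration only.

```python
def rtn_quantize(weights, num_bits=4):
    """Quantize one group of float weights with round-to-nearest (illustrative helper)."""
    qmax = (1 << num_bits) - 1                    # largest integer code, 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0         # fall back to 1.0 if all weights are equal
    zero_point = round(-w_min / scale)            # integer code that represents 0.0
    # Round each weight to the nearest code and clamp it to the valid range.
    codes = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def rtn_dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(c - zero_point) * scale for c in codes]


def rtn_quantize_grouped(row, group_size=32, num_bits=4):
    """Quantize a row of weights group by group; each group gets its own scale."""
    return [
        rtn_quantize(row[i:i + group_size], num_bits)
        for i in range(0, len(row), group_size)
    ]


# Example: quantize eight weights in groups of four, then reconstruct them.
row = [-0.8, -0.1, 0.0, 0.3, 0.7, 0.05, -0.4, 0.6]
groups = rtn_quantize_grouped(row, group_size=4)
restored = [w for codes, s, z in groups for w in rtn_dequantize(codes, s, z)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Each group stores its integer codes plus one scale and one zero point, so a smaller group size tracks the weight distribution more closely at the cost of more metadata. GPTQ starts from the same kind of quantization grid but additionally uses calibration data to compensate for rounding error, which is where its extra computational cost comes from.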

For more details, please refer to this link.

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Format your code by running pre-commit run --all-files
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

@github-advanced-security github-advanced-security bot left a comment

lintrunner found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
@guotuofeng
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@guotuofeng guotuofeng merged commit 6ea3e72 into microsoft:main Sep 28, 2023
32 checks passed

2 participants