Add support for converting a GPTQ model to 2:4 structured sparse marlin format for 4-bit and 8-bit #1
base: main
Conversation
Force-pushed from c7861d8 to 8b9f032
Much appreciated! Is there a plan to make your research open source?
We are currently focused on pruning Llama-3 on our cluster. We have done extensive research into pruning with our friends at IST. @AniZpZ are you on Discord?
Yes, how can I reach you guys on Discord?
I'm on the vllm-server (https://discord.gg/3eCXvqVu) - send me a note.
This diff adds the ability to convert a GPTQ model to the 2:4 structured sparse Marlin format. It is intended to serve as a reference for the MLE team to integrate their code, and is not meant to be landed in AutoGPTQ.
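For readers unfamiliar with 2:4 structured sparsity, the sketch below illustrates the masking step such a conversion involves: keeping the two largest-magnitude values in every group of four along the input dimension, which is the layout the sparse Marlin kernels expect before repacking. This is a minimal illustration only, assuming the conversion operates on dequantized weights; the function and variable names are hypothetical and do not reflect the actual code in this PR.

```python
# Illustrative sketch: impose a 2:4 structured sparsity pattern on a dense
# weight tensor. This is NOT the PR's conversion code; quantized packing and
# Marlin-specific repacking are omitted.
import torch

def apply_2_4_sparsity(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude values in every group of 4 along the
    last dimension, producing a 2:4 (50%) structured-sparse weight."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "in_features must be divisible by 4"

    groups = weight.reshape(out_features, in_features // 4, 4)
    # Indices of the two largest-magnitude entries in each group of four.
    _, keep_idx = groups.abs().topk(2, dim=-1)
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep_idx, True)
    return (groups * mask).reshape(out_features, in_features)

# Example: a random matrix standing in for a dequantized GPTQ layer weight.
w = torch.randn(128, 256)
w_sparse = apply_2_4_sparsity(w)
# Every group of 4 now has at most 2 nonzero entries.
assert (w_sparse.reshape(128, -1, 4) != 0).sum(-1).max() <= 2
```

In an actual conversion, the sparse weights would then be re-quantized (4-bit or 8-bit) and repacked into the tile layout used by the 2:4 sparse Marlin kernels, together with the sparsity metadata.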