Add support for converting a GPTQ model to 2:4 structured sparse marlin format for 4-bit and 8-bit #1

Open · wants to merge 5 commits into base: main

Conversation

@alexm-neuralmagic commented Apr 15, 2024

This diff adds the ability to convert a GPTQ model to the 2:4 structured sparse marlin format. It is intended to serve as a reference for the MLE team to integrate into their code (and is not meant to be landed on AutoGPTQ).
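
For context on what "2:4 structured sparse" means here: in every contiguous group of four values along a weight's input dimension, at most two may be non-zero, which is the pattern the sparse marlin kernel exploits. Below is a minimal, illustrative PyTorch sketch of applying a magnitude-based 2:4 mask to a dense weight tensor. It is not the conversion script in this PR (which repacks the already-quantized GPTQ tensors into the marlin 2:4 layout); the function name and shapes are hypothetical.

```python
# Illustrative sketch only: magnitude-based 2:4 sparsification of a dense
# weight tensor. The real GPTQ -> sparse-marlin conversion additionally
# repacks the quantized weights and sparsity metadata into the marlin layout.
import torch

def apply_2_4_sparsity(weight: torch.Tensor) -> torch.Tensor:
    """Zero the two smallest-magnitude values in every group of four along the input dim."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "input dim must be divisible by 4"

    groups = weight.reshape(out_features, in_features // 4, 4)
    # Indices of the two largest magnitudes within each group of four.
    keep = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, True)
    return (groups * mask).reshape(out_features, in_features)

# Example: sparsify a dense weight and check that at most 2 of every 4
# consecutive input-dim values are non-zero.
w = torch.randn(128, 256)
w_sparse = apply_2_4_sparsity(w)
assert (w_sparse.reshape(-1, 4) != 0).sum(dim=-1).max() <= 2
```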

@alexm-neuralmagic changed the title from "Add marlin_24 conversion (WIP)" to "Add support for converting a GPTQ model to 2:4 structured sparse marlin format" on Apr 24, 2024
@alexm-neuralmagic changed the title to "Add support for converting a GPTQ model to 2:4 structured sparse marlin format for 4-bit and 8-bit" on May 6, 2024
@robertgshaw2-neuralmagic (Collaborator) commented May 21, 2024

@AniZpZ here's the current hacked-up script we have.

Note: accuracy will not be great for post-training 2:4 sparsity.

We have been investing in making sparse foundational models.

See here for the research we are expanding on.

@AniZpZ commented May 21, 2024

> @AniZpZ here's the current hacked-up script we have.
>
> Note: accuracy will not be great for post-training 2:4 sparsity.
>
> We have been investing in making sparse foundational models.
>
> See here for the research we are expanding on.

Much appreciated! Is there a plan to make your research open source?
I don't expect good accuracy for post-training 2:4 sparsity. Have you guys looked into other unstructured pruning methods or dynamic pruning methods?

@robertgshaw2-neuralmagic (Collaborator)

> Much appreciated! Is there a plan to make your research open source? I don't expect good accuracy for post-training 2:4 sparsity. Have you guys looked into other unstructured pruning methods or dynamic pruning methods?

Yes: https://huggingface.co/collections/neuralmagic/sparse-foundational-llama-2-models-65f48cec6396309f02e74d21

- 2:4 models coming soon

We are currently focused on pruning Llama-3 on our cluster.

We have done extensive research into pruning with our friends at IST.

@AniZpZ are you on Discord?

@AniZpZ commented May 21, 2024

> Yes: https://huggingface.co/collections/neuralmagic/sparse-foundational-llama-2-models-65f48cec6396309f02e74d21
>
> - 2:4 models coming soon
>
> We are currently focused on pruning Llama-3 on our cluster.
>
> We have done extensive research into pruning with our friends at IST.
>
> @AniZpZ are you on Discord?

Yes, how can I reach you guys on Discord?

@robertgshaw2-neuralmagic (Collaborator)

> Yes, how can I reach you guys on Discord?

I'm on the vLLM Discord server (https://discord.gg/3eCXvqVu) as robertgshaw-neural-magic.

Send me a note.
