Add support for converting a GPTQ model to 2:4 structured sparse marlin format for 4-bit and 8-bit #1
base: main
Conversation
Force-pushed from c7861d8 to 8b9f032
Much appreciated! Is there a plan to make your research open source?
We are currently focused on pruning Llama-3 on our cluster. We have done extensive research into pruning with our friends at IST. @AniZpZ are you on Discord?
Yes, how can I reach you guys on Discord?
I'm on the vllm-server (https://discord.gg/3eCXvqVu) - send me a note.
This diff adds the ability to convert a GPTQ model to the 2:4 structured sparse Marlin format. It is intended to serve as a reference for the MLE team to integrate their code, and is not meant to be landed in AutoGPTQ.
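For readers unfamiliar with 2:4 structured sparsity, the sketch below illustrates the masking step such a conversion involves: keeping the two largest-magnitude values in every group of four along the input dimension, which is the layout the sparse Marlin kernels expect before repacking. This is a minimal illustration only, assuming the conversion operates on dequantized weights; the function and variable names are hypothetical and do not reflect the actual code in this PR.

```python
# Illustrative sketch: impose a 2:4 structured sparsity pattern on a dense
# weight tensor. This is NOT the PR's conversion code; quantized packing and
# Marlin-specific repacking are omitted.
import torch

def apply_2_4_sparsity(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude values in every group of 4 along the
    last dimension, producing a 2:4 (50%) structured-sparse weight."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "in_features must be divisible by 4"

    groups = weight.reshape(out_features, in_features // 4, 4)
    # Indices of the two largest-magnitude entries in each group of four.
    _, keep_idx = groups.abs().topk(2, dim=-1)
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep_idx, True)
    return (groups * mask).reshape(out_features, in_features)

# Example: a random matrix standing in for a dequantized GPTQ layer weight.
w = torch.randn(128, 256)
w_sparse = apply_2_4_sparsity(w)
# Every group of 4 now has at most 2 nonzero entries.
assert (w_sparse.reshape(128, -1, 4) != 0).sum(-1).max() <= 2
```

In an actual conversion, the sparse weights would then be re-quantized (4-bit or 8-bit) and repacked into the tile layout used by the 2:4 sparse Marlin kernels, together with the sparsity metadata.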