Mistral Support #81

Open · fakerybakery opened this issue Jan 29, 2024 · 5 comments

@fakerybakery commented Jan 29, 2024

Hi,
Thanks for releasing this work! Are there any plans to release a Mistral version?
Thanks!

@nailimixaM (Collaborator)

> Hi, thanks for releasing this work! Are there any plans to release a Mistral version? Thanks!

Hi! Yes, Mistral 7B is on our radar, but we don't have an implementation for it yet. Our adapter classes should make it straightforward to add any HF model; would you be up for contributing?

@kno10 commented Jan 30, 2024

In particular Mixtral (with an x, the mixture-of-experts version) could benefit a lot from this.
At 47B parameters it is slightly too large to fit in 80 GB in bfloat16.
Reducing it even slightly so that it fits on a single 80 GB GPU would effectively halve the cost of operating it, and would likely reduce latency too.
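
A rough back-of-the-envelope check of the memory point above. This is only a sketch: it counts weights alone (no activations, KV cache, or framework overhead) and assumes the parameter count shrinks roughly in proportion to the slicing fraction.

```python
# Illustrative weight-memory estimate for Mixtral-8x7B in bfloat16 (2 bytes/parameter).
total_params = 47e9          # approximate total parameter count of Mixtral-8x7B
bytes_per_param = 2          # bfloat16

dense_gb = total_params * bytes_per_param / 1e9
print(f"dense weights: ~{dense_gb:.0f} GB")   # ~94 GB, does not fit in 80 GB

# Assume slicing removes parameters roughly in proportion to the slice fraction.
for slice_frac in (0.10, 0.20, 0.25):
    sliced_gb = dense_gb * (1 - slice_frac)
    print(f"{slice_frac:.0%} slicing: ~{sliced_gb:.0f} GB "
          f"({'fits' if sliced_gb <= 80 else 'does not fit'} in 80 GB)")
```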

As Mistral and Mixtral are Apache-licensed, you could share smaller sliced versions.

@nailimixaM (Collaborator)

> In particular Mixtral (with an x, the mixture-of-experts version) could benefit a lot from this. At 47B parameters it is slightly too large to fit in 80 GB in bfloat16. Reducing it even slightly so that it fits on a single 80 GB GPU would effectively halve the cost of operating it, and would likely reduce latency too.
>
> As Mistral and Mixtral are Apache-licensed, you could share smaller sliced versions.

Great suggestion! For MoEs we need to modify the method slightly to account for the different architecture; they won't work out of the box with our current adapters. The computational invariance on which SliceGPT relies still applies, though, so they should be sliceable.
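
For reference, here is a minimal NumPy sketch of the computational invariance mentioned above (it is not the repo's adapter API; all names are illustrative): RMSNorm commutes with an orthogonal matrix Q, so a rotation of the residual stream can be absorbed into the weights on either side of a block without changing the output, and slicing then keeps only the leading columns of Q.

```python
# Toy demonstration of SliceGPT-style computational invariance on an
# RMSNorm + MLP residual block. Not the repo's adapter API; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                            # toy hidden size

def rms_norm(x):
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + 1e-6)

# Toy residual block: y = x + rms_norm(x) @ W_in @ W_out (no biases)
W_in = rng.normal(size=(d, 4 * d))
W_out = rng.normal(size=(4 * d, d))
x = rng.normal(size=(3, d))                      # a small batch of hidden states

y_ref = x + rms_norm(x) @ W_in @ W_out

# Random orthogonal Q; in SliceGPT, Q comes from a PCA of the activations.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Rotate the residual stream (x -> x @ Q) and absorb Q into the adjacent weights.
W_in_rot = Q.T @ W_in                            # undo the rotation at the block input
W_out_rot = W_out @ Q                            # re-apply it at the block output

y_rot = (x @ Q) + rms_norm(x @ Q) @ W_in_rot @ W_out_rot

# The rotated block computes the same function, expressed in the rotated basis.
print("outputs match after rotation:", np.allclose(y_rot, y_ref @ Q))
# Slicing keeps only the leading columns of Q, shrinking d and deleting the
# corresponding rows/columns of the weight matrices.
```

For an MoE block the same absorption would have to be applied to every expert's input and output projections (and the router), which is the adapter change referred to above.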

@noah-kim-theori commented Feb 7, 2024

I wrote a Mixtral implementation of SliceGPT. Here is my forked repository: https://github.com/noah-kim-theori/TransformerCompression (see experiments/run_mixtral_slice.py). Feel free to use it.

@nailimixaM (Collaborator)

> I wrote a Mixtral implementation of SliceGPT. Here is my forked repository: https://github.com/noah-kim-theori/TransformerCompression (see experiments/run_mixtral_slice.py). Feel free to use it.

Amazing, nice work @noah-kim-theori! Could you share some perplexity and zero-shot accuracies of a sliced Mixtral at, e.g., 25% slicing vs. dense? Running run_slicegpt_perplexity.py and run_zero_shot_tasks.py with default values would be great. That should show that SliceGPT is working as expected. Assuming that works, we'd welcome a PR adding Mixtral to the repo 👍
