Mistral Support #81

Open · fakerybakery opened this issue Jan 29, 2024 · 5 comments

@fakerybakery commented Jan 29, 2024

Hi,
Thanks for releasing this work! Are there any plans to release a Mistral version?
Thanks!

@nailimixaM (Collaborator)

> Hi, thanks for releasing this work! Are there any plans to release a Mistral version? Thanks!

Hi! Yes, Mistral 7B is on our radar, but we don't have an implementation for it yet. Our adapter classes should make it straightforward to add any HF model; would you be up for contributing?

@kno10 commented Jan 30, 2024

In particular Mixtral (with an x, the mixture-of-experts version) could benefit a lot from this.
At 47B parameters it is slightly too large to fit in 80 GB in bfloat16.
Reducing it even slightly so that it fits on a single 80 GB GPU would effectively halve the cost of operating it, and would likely reduce latency too.
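
A rough back-of-the-envelope check of the memory point above. This is only a sketch: it counts weights alone (no activations, KV cache, or framework overhead) and assumes the parameter count shrinks roughly in proportion to the slicing fraction.

```python
# Illustrative weight-memory estimate for Mixtral-8x7B in bfloat16 (2 bytes/parameter).
total_params = 47e9          # approximate total parameter count of Mixtral-8x7B
bytes_per_param = 2          # bfloat16

dense_gb = total_params * bytes_per_param / 1e9
print(f"dense weights: ~{dense_gb:.0f} GB")   # ~94 GB, does not fit in 80 GB

# Assume slicing removes parameters roughly in proportion to the slice fraction.
for slice_frac in (0.10, 0.20, 0.25):
    sliced_gb = dense_gb * (1 - slice_frac)
    print(f"{slice_frac:.0%} slicing: ~{sliced_gb:.0f} GB "
          f"({'fits' if sliced_gb <= 80 else 'does not fit'} in 80 GB)")
```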

As Mistral and Mixtral are Apache-licensed, you could share smaller sliced versions.

@nailimixaM (Collaborator)

> In particular Mixtral (with an x, the mixture-of-experts version) could benefit a lot from this. At 47B parameters it is slightly too large to fit in 80 GB in bfloat16. Reducing it even slightly so that it fits on a single 80 GB GPU would effectively halve the cost of operating it, and would likely reduce latency too.
>
> As Mistral and Mixtral are Apache-licensed, you could share smaller sliced versions.

Great suggestion! For MoEs we need to modify the method slightly to account for the different architecture; they won't work out of the box with our current adapters. The computational invariance on which SliceGPT relies still applies, though, so they should be sliceable.
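
For reference, here is a minimal NumPy sketch of the computational invariance mentioned above (it is not the repo's adapter API; all names are illustrative): RMSNorm commutes with an orthogonal matrix Q, so a rotation of the residual stream can be absorbed into the weights on either side of a block without changing the output, and slicing then keeps only the leading columns of Q.

```python
# Toy demonstration of SliceGPT-style computational invariance on an
# RMSNorm + MLP residual block. Not the repo's adapter API; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                            # toy hidden size

def rms_norm(x):
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + 1e-6)

# Toy residual block: y = x + rms_norm(x) @ W_in @ W_out (no biases)
W_in = rng.normal(size=(d, 4 * d))
W_out = rng.normal(size=(4 * d, d))
x = rng.normal(size=(3, d))                      # a small batch of hidden states

y_ref = x + rms_norm(x) @ W_in @ W_out

# Random orthogonal Q; in SliceGPT, Q comes from a PCA of the activations.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Rotate the residual stream (x -> x @ Q) and absorb Q into the adjacent weights.
W_in_rot = Q.T @ W_in                            # undo the rotation at the block input
W_out_rot = W_out @ Q                            # re-apply it at the block output

y_rot = (x @ Q) + rms_norm(x @ Q) @ W_in_rot @ W_out_rot

# The rotated block computes the same function, expressed in the rotated basis.
print("outputs match after rotation:", np.allclose(y_rot, y_ref @ Q))
# Slicing keeps only the leading columns of Q, shrinking d and deleting the
# corresponding rows/columns of the weight matrices.
```

For an MoE block the same absorption would have to be applied to every expert's input and output projections (and the router), which is the adapter change referred to above.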

@noah-kim-theori commented Feb 7, 2024

I wrote a Mixtral implementation of SliceGPT. Here is my forked repository: https://github.com/noah-kim-theori/TransformerCompression (see experiments/run_mixtral_slice.py). Feel free to use it.

@nailimixaM (Collaborator)

> I wrote a Mixtral implementation of SliceGPT. Here is my forked repository: https://github.com/noah-kim-theori/TransformerCompression (see experiments/run_mixtral_slice.py). Feel free to use it.

Amazing, nice work @noah-kim-theori! Could you share some perplexity and zero-shot accuracies of a sliced Mixtral at, e.g., 25% slicing vs. dense? Running run_slicegpt_perplexity.py and run_zero_shot_tasks.py with default values would be great. That should show that SliceGPT is working as expected. Assuming that works, we'd welcome a PR adding Mixtral to the repo 👍
