Feature request
Let's add a new quantization method to LoRA, namely optimum-quanto.
There is some more context in this diffusers issue.
Motivation
First of all, the more quantization methods we support, the better. But notably, quanto also works with MPS, which distinguishes it from other quantization methods.
Your contribution
I did some preliminary testing, and quanto already partly works with PEFT: the QLinear layer is a subclass of nn.Linear, so lora.Linear is applied to it. Some features, like inference, appear to work already. However, other features don't work correctly, like merging. A very quick test shows that all the outputs involving merging are not as expected.
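The code of that quick test was not captured here, so below is a minimal sketch of what such a test could look like. The toy model, the target_modules, and the tolerance are illustrative assumptions, not taken from the original issue.

```python
import torch
from torch import nn
from optimum.quanto import freeze, qint8, quantize
from peft import LoraConfig, get_peft_model

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(4, 16)

# Quantize and freeze the base model; its nn.Linear layers become quanto QLinear.
quantize(model, weights=qint8)
freeze(model)
base_out = model(x)

# Because QLinear subclasses nn.Linear, PEFT dispatches lora.Linear onto it.
# target_modules=["0", "2"] are the Sequential's linear layers (toy example);
# init_lora_weights=False makes the adapters non-trivial so outputs change.
config = LoraConfig(target_modules=["0", "2"], init_lora_weights=False)
peft_model = get_peft_model(model, config)
lora_out = peft_model(x)  # inference works
print(torch.allclose(base_out, lora_out))  # False: the adapters change the output

# Merging is where things go wrong: after merge_and_unload, the merged model's
# outputs should match lora_out, but with quanto they reportedly do not.
merged = peft_model.merge_and_unload()
merged_out = merged(x)
print(torch.allclose(lora_out, merged_out, atol=1e-4))  # expected True, observed False
```

Presumably the mismatch comes from the LoRA delta being written into the quantized weight without a correct dequantize/requantize round trip, which is exactly the kind of thing a dedicated quanto integration would need to handle.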
I can certainly take this on when I have time, but contributions are highly welcome. For inspiration, check out past PRs that add new quantization methods.