Refactored exl2 method to add LoRA, 8bit cache, and other features supported by exllama #729
Conversation
Great! Is this ready for review?
Updates: Added LoRA support. LoRAs can now be hot-swapped dynamically as needed. Here is an example of how to use the LoRA feature:

```python
from outlines import models, generate

# Load the base model
model = models.exl2(model_path="/path/to/mistral_openorca", max_seq_len=8192, device="cuda", gpu_split="auto", verbose=True)
generator = generate.text(model)
answer = generator("Почему трава зеленая?", max_tokens=100)  # "Why is the grass green?"
print(answer)

# Hot-swap in a LoRA adapter
model.update_lora("/path/to/russian_openorca")
generator = generate.text(model)
answer = generator("Почему трава зеленая?", max_tokens=100)
print(answer)

# Unload the adapter and fall back to the base model
model.update_lora(None)
generator = generate.text(model)
answer = generator("Почему трава зеленая?", max_tokens=100)
print(answer)
```

This is a demonstration showing the new loading/unloading capabilities. The following models/adapters were used in this demo:

Model:
Adapter:
This is really awesome! Let me know when I can review.
Yeah, the inputs were messed up. I put the try/except input block inside update_lora. It may be a little slower to start, but subsequent loads still finish in 0.0s. The latest update will fix the error.
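For context, a minimal sketch of what a try/except block inside update_lora might look like. This is an illustration only: the ExLlamaV2Lora import path, the from_directory loader, and the self.lora/self.model attribute names are assumptions about the exllamav2 API and this branch, not the PR's actual code.

```python
from exllamav2.lora import ExLlamaV2Lora  # assumption: import path may differ


class ExLlamaV2Model:
    # ... rest of the model wrapper elided ...

    def update_lora(self, lora_path):
        """Load, hot-swap, or (with lora_path=None) unload a LoRA adapter."""
        if lora_path is None:
            # Unload: subsequent generations use the base weights only.
            self.lora = None
            return
        try:
            # The first load reads the adapter from disk and is slower;
            # repeat loads are near-instant once the files are cached.
            self.lora = ExLlamaV2Lora.from_directory(self.model, lora_path)
        except Exception as exc:
            raise ValueError(
                f"Failed to load LoRA adapter from {lora_path}"
            ) from exc
```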
Please feel free to review this branch. It is ready.
I can make all these changes and push ASAP.
I pushed the changes and updated my branch. There are some issues between my implementation and the regex changes pushed last week. A week ago this code was able to run without errors; however, now I receive the following error:
Can you clear the cache using
I don't have this problem locally, so it must be the cache. I still need to try the LoRA hot-swapping functionality; I will take a look tomorrow and hopefully merge this.
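Assuming the stale artifacts live in outlines' on-disk cache, one hedged way to clear it might be the clear_cache helper below; the import path is an assumption about the outlines caching module, not the exact command suggested above (which is elided in this thread).

```python
# Assumption: outlines keeps a disk cache of compiled artifacts and
# exposes a clear_cache helper; verify the import against your version.
from outlines.caching import clear_cache

clear_cache()  # remove stale cached entries before re-running the repro
```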
Great work, thank you!
Refactored the exl2 function in exllamav2.py.
The new version offers the following benefits (see the sketch below):
- LoRA support, with adapters that can be hot-swapped or unloaded dynamically via update_lora
- 8-bit cache support
- other features supported by exllamav2
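As a sketch of how the refactored loader might be invoked with these features, based on the parameters shown in the demo above; the cache_8bit keyword is a hypothetical name for the 8-bit cache option named in the PR title, not a confirmed parameter:

```python
from outlines import models, generate

# cache_8bit is a hypothetical keyword for the 8-bit KV cache named in
# the PR title; check the merged signature for the actual parameter.
model = models.exl2(
    model_path="/path/to/mistral_openorca",
    max_seq_len=8192,
    device="cuda",
    gpu_split="auto",
    cache_8bit=True,  # assumption: roughly halves KV-cache memory vs. FP16
    verbose=True,
)
generator = generate.text(model)
print(generator("Why is the grass green?", max_tokens=100))
```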
Future effort.