Remove non-HF ExLlamaV2 loader #5431
Won't this cause problems for #5375?
It's not true that there is zero speed difference. The non-HF loader is around 10% faster for goliath-120b.
Quick bench using ooba from before this commit and exllamav2 master branch from 5 minutes ago on runpod A100 80GB. HF:
Non-HF:
Why is this important, if the builtin sampling in exllamav2 works fine?
Thanks for restoring it!
This reverts commit cde000d.
Since PR #4814, the speed difference between ExLlamav2 and ExLlamav2_HF is zero. So I see no point in keeping the non-HF version, which is redundant and which samples in a way that is not guaranteed to be consistent with HF transformers sampling.