Update to support GPTQ triton commit c90adef #1229
Conversation
Runs on my machine, but the new GPTQ-for-LLaMA code gives garbage output. It seems to either pick a token at random and spam it endlessly, spew irrelevant nonsense code, or, with some luck, generate a vaguely sensical paragraph in English that has nothing to do with the input. Additionally, there's still the same significant memory usage increase as in #1073. However, if I …
What's your GPU? I have a 2080 Ti that …
3090 in my case. If I run …
That first run was... almost sensical.
Update: I find qwopqwop200/GPTQ-for-LLaMa@47dd6b3 breaks …
@oobabooga I think this PR is ready to go. It allows users to use the latest triton branch while giving them the choice to disable specific functionalities.
Thanks for the confirmation, @sgsdxzy. I'll merge now.
The new `fused_mlp` kernel seems not to work on some cards (qwopqwop200/GPTQ-for-LLaMa#179). If you pass `--no-fused_mlp`, everything should work.
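
For reference, a minimal launch sketch, assuming this is text-generation-webui's `server.py` with the triton GPTQ loader; the model directory and the `--wbits`/`--groupsize` values are illustrative placeholders, and only `--no-fused_mlp` is taken from this thread:

```sh
# Hypothetical invocation: the model name and the quantization flags
# (--wbits, --groupsize) are placeholders for your own 4-bit checkpoint.
# --no-fused_mlp disables the fused MLP Triton kernel that fails on some cards.
python server.py --model llama-7b-4bit-128g --wbits 4 --groupsize 128 --no-fused_mlp
```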