Backend cleanup #6025
Conversation
Any big reason to dump these? I mean, people probably have models in those formats. One day they will pull and then randomly lose the ability to run them. It doesn't seem like there's any code that has made these backends a hassle yet, besides existing.
Not a lot of Maxwell people, but IIRC, it was the only way for them.
Keeping the repository/documentation clean, not having to fix an obsolete backend, and not forcing people to download the wheels for each one. I'll remove it since I think it only serves an imaginary type of user. People with old GPUs are running llama.cpp nowadays, not GPTQ-for-LLaMa.
I used GPTQ and prefer it over GGUF because, performance-wise, it's much better on my 3070 Ti. It's a shame TheBloke was mainly the only good source for GPTQ. But honestly, why would I want to play with balancing GGUFs when I can just one-and-done it (either it fits or it doesn't) with GPTQ, while the inference speed and responses are great compared to similar GGUFs? So I hope you're joking about removing its functionality; mind you, I only use ExLlamav2_HF to run them, not previous iterations, so if ExLlamav2_HF remains I have no qualms.
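For reference, running a GPTQ/EXL2 model through the standalone exllamav2 library looks roughly like the sketch below, adapted from exllamav2's published examples. The model directory is a placeholder, and the assumption here is that the webui's ExLlamav2_HF loader wraps this same machinery behind a transformers-compatible interface:

```python
# Minimal sketch of loading and running a GPTQ/EXL2 model with the standalone
# exllamav2 library, following its published inference examples.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Llama-2-7B-GPTQ"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs while loading

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Hello,", settings, num_tokens=50))
```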
(`mobiuslabsgmbh_Llama-2-7b-chat-hf_1bitgs8_hqq`, which is the most popular HQQ model on HF), but I'm keeping the loader for now because it's in active development and promising.
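For context, loading a pre-quantized HQQ model like that one with the upstream hqq package looks roughly like the sketch below. This follows the hqq project's documented `from_quantized` pattern; exact class names and arguments may differ between hqq releases, so treat it as an assumption rather than the webui's own loader code:

```python
# Minimal sketch: loading a pre-quantized HQQ model from the Hugging Face Hub
# with the hqq package's HQQModelForCausalLM.from_quantized() entry point.
from transformers import AutoTokenizer
from hqq.engine.hf import HQQModelForCausalLM

model_id = "mobiuslabsgmbh/Llama-2-7b-chat-hf_1bitgs8_hqq"

model = HQQModelForCausalLM.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# HQQ dequantizes on the GPU, so the inputs go to CUDA (assumes a CUDA device).
inputs = tokenizer("Hello,", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```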