server.py not starting with GPTQ latest git 534edc7 #445
Comments
It seems that load_quant() from qwopqwop200/GPTQ-for-LLaMa now takes one more (new) positional argument, which modules/GPTQ_loader.py needs to pass. After correcting the SyntaxError, here's the trace:
Change the call to pass the new value. From the args documentation, -1 sets the default size.
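For reference, here is a minimal sketch of the change being discussed, assuming the new positional argument is the quantization group size that GPTQ-for-LLaMa added in that commit (with -1 meaning the default size, per the comment above); the exact upstream diff may differ:

```python
# modules/GPTQ_loader.py (sketch, not the exact upstream change)
# Assumption: the new trailing argument is the quantization group size;
# -1 asks load_quant() for the default size.
model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, -1)
```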
Thanks, passing the value triggers another exception:
Yeah, it looks like there are more issues with today's GPTQ changes than just the syntax error. I rolled back the GPTQ repo to yesterday's version, without any of today's changes, and it works fine. I saw the same error as you before the rollback.
Will do the same for now; I'd be curious to know whether re-quantizing the models with today's code would fix the loading.
If anyone needs a known good hash to roll back to, you can reset here (make sure to run this in the GPTQ-for-LLaMa repo, of course)
It corresponds to this commit from yesterday: https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/468c47c01b4fe370616747b6d69a2d3f48bab5e4 It's what I'm using for my container at the moment.
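For example, from inside your GPTQ-for-LLaMa checkout, the reset is just:

```sh
# Roll GPTQ-for-LLaMa back to the known-good commit mentioned above
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
```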
I actually don't know anymore... It seems like it might be more broken than I thought. I'm using the pre-quantized models from HF, so you might be right about versions, alex.
Did you get the model to output predictions in your container? Mine appears to load the model, but throws an error on prediction.
This solves it for me. This bug report is in the wrong repository, by the way. You should tell @qwopqwop200 about it.
Yes, it's working for me with that specific commit. Specifically, it's set up like this right now: https://github.com/RedTopper/Text-Generation-Webui-Podman/blob/main/Containerfile#L14-L15
Awesome. Thanks |
Prediction is broken for me too with yesterday's commit:
I wonder if they are actually testing on a quantized model, or a non-quantized one. I don't know where to go from here haha
I 'fixed' inference by:
Today's changes break things, however.
I also have the same issue; the last line in your reply is not working.
Fixed typo:
That would make sense: you also need to rebuild the CUDA package with the .cpp files from that commit. The container starts fresh on each build, so the compiled version always matches the Python code used in the repo.
Awesome! Worked for me too. I completely forgot to rebuild the kernel -_-
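For anyone else who forgot this step: a sketch of the rebuild, assuming the CUDA kernel is built with the setup_cuda.py script that GPTQ-for-LLaMa shipped at the time (run it inside the checkout, after any rollback):

```sh
# Recompile the quant_cuda extension so it matches the checked-out .cpp/.cu sources
cd GPTQ-for-LLaMa
python setup_cuda.py install
```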
In any case, I reported qwopqwop200/GPTQ-for-LLaMa#62 to qwopqwop200.
qwopqwop200 replied: as of today, LLaMA models need to be re-quantized to work with the newest code. I'll test and report back ;-)
@zoidbb help?
To sum up: the latest GPTQ-for-LLaMa code works for me, tested with LLaMA-7B and LLaMA-13B.
So this is why I couldn't load the models after I fixed the ) bug. But now we can quantize with different group sizes. Which one is best for performance and coherence? I hate that I have to redo this, btw.
Re-quantizing means running the quantization script yourself. This requires a ton of VRAM, and I have two 8 GB cards, but it only maxes out one card's memory. Edit: nvm, found a 13B model with the LoRA integrated that loads.
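For context, a rough sketch of what that re-quantization run looks like, assuming the llama.py script from GPTQ-for-LLaMa; the model path, output filename, and group size below are hypothetical placeholders, not values from this thread:

```sh
# Hypothetical example: re-quantize LLaMA-7B to 4 bits with a 128 group size
cd GPTQ-for-LLaMa
CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/llama-7b-hf c4 \
    --wbits 4 --groupsize 128 --save llama-7b-4bit-128g.pt
```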
@alexl83 Would you be able to host the fixed quantized files somewhere, perhaps on Hugging Face?
When recompiling GPTQ on Windows, I accidentally forgot to use the x64 Native Tools command prompt. It still compiled successfully with Visual Studio 2022 on its own, which is interesting considering everyone has been saying that only VS 2019 works.
I recommend using the previous GPTQ commit for now
https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation
I noticed this as well. I was going off of the Reddit thread at the time, but I guess it is wrong. |
I keep getting: "CUDA Extension not installed." I'm on native Windows 11. I used the older commit of GPTQ (git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4) and made sure to install the .whl correctly. CUDA is certainly installed: running python and then `import torch; torch.cuda.is_available()` returns True. This is my first time installing LLaMA, so I'm not sure if this is just a perfect storm of changes happening or what. It appears that GPTQ_loader.py was changed yesterday to `model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, shared.args.gptq_pre_layer)` (see the post above), and yet it still doesn't seem to work with the current branch of GPTQ. Something about re-quantization too? No idea what my issue is. I'm sure there is a whole lot more I am missing, since I'm just now diving in today.
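For reference, a minimal sketch of the diagnostic described above, run in a Python shell:

```python
import torch
print(torch.cuda.is_available())  # True means PyTorch itself can see the GPU
print(torch.version.cuda)         # the CUDA version this PyTorch build targets
```

As the next reply points out, the "CUDA Extension not installed." message refers to GPTQ-for-LLaMa's own extension rather than to the CUDA install itself.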
@KnoBuddy if you delete your environment and files and roll back text-generation-webui to two days ago, these instructions I made should work for you. You might be able to replace the
@KnoBuddy "CUDA Extension not installed." is specifically referring to GPTQ-for-LLaMa. I've had this issue before after installing an outdated wheel. I uploaded a Windows wheel yesterday, along with the batch script that I use to install everything above that: |
@jllllll what does installing ninja before cuda compilation do? |
When doing the compilation without ninja, there is a message saying that the compilation would be faster with ninja. I don't notice much difference, but I install it anyway.
ninja sets compile-time parameters.
Describe the bug
Launching the latest text-generation-webui code with the latest qwopqwop200/GPTQ-for-LLaMa throws a Python error:
Is there an existing issue for this?
Reproduction
Screenshot
No response
Logs