Can't run any GPT-J-6B model locally in CPU or GPU+CPU modes #83
Alright, that is quite a lot to break down. This does not appear to be a KoboldAI bug, since you are running into issues with the dependencies themselves, but let's dig in.

Let's begin with your setup process. The bundled requirements.txt is really only meant to be used with the Colabs and may or may not work well on your system. For most use cases we advise creating a conda environment using the bundled files (make sure to use the official Miniconda3 install script, not something like apt-get install python-conda) instead of setting up a venv yourself. For now this will also be the finetuneanon version (unless you really want CPU mode to work). I don't think your specific issue is related to that, but it's good to know for the future if you ever run into dependency issues.

The second issue is that Half mode does not work on the CPU, which is indeed correct and to be expected (as you mentioned, we documented this). We are phasing out Finetune's branch entirely in the upcoming version, and since we ran into similar issues in the upcoming 0.17 builds it may or may not be fixed by other fixing efforts. In that upcoming version we implemented Half mode for the GPU with the official transformers, so you get a GPU mode with even more efficient loading, and you also get fully functional CPU support. Finetune's branch got quite far behind upstream, and most of its features have been integrated into KoboldAI on that development branch, so once 0.17 is finished we will no longer recommend that anyone use it.

The next part gets a bit more complicated. First of all, the behavior of the models inside the official version of Transformers is entirely to be expected: it is supposed to load wrong, spew gibberish, or not load at all. The reason is that Finetune invented his own format for 6B that the upstream version chose not to use.
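The "Half mode does not work on the CPU" point can be illustrated with a short sketch. This is an assumption-laden illustration, not KoboldAI's actual code: on the PyTorch builds of that era, running a half-precision LayerNorm on the CPU raises `"LayerNormKernelImpl" not implemented for 'Half'`, so the usual workaround is to cast back to fp32 before CPU inference.

```python
# Minimal sketch (not KoboldAI code) of why Half mode fails on CPU:
# fp16 kernels such as LayerNorm were only implemented for CUDA, so
# CPU inference must fall back to fp32.
import torch

def run_layernorm(x: torch.Tensor, device: str) -> torch.Tensor:
    ln = torch.nn.LayerNorm(x.shape[-1]).to(device)
    if device == "cpu":
        # Workaround: keep weights and activations in fp32 on the CPU.
        return ln.float()(x.float())
    # On CUDA, half precision halves memory use for weights/activations.
    return ln.half()(x.half().to(device))
```

The same idea applies to a whole model: calling `.float()` on it before CPU generation avoids the missing fp16 kernels, at the cost of doubled memory use.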
In the version of KoboldAI you are using we have not implemented the official format yet, so it all ends up loading completely wrong: it is trying to load a Neo model that is in reality a fake 6B format. For now you will need either Finetune's fork or the development version of KoboldAI (currently at https://github.com/henk717/koboldai ), along with models converted to the official format for 6B, which we dubbed the HFJ format to avoid confusion.

So that explains the error you are getting in CPU mode, and the gibberish you are getting when you are not using the appropriate version for the models you are using. That leaves one more issue to tackle: the fact that it is not working for you at all. I downloaded the same model and loaded the same branch of KoboldAI you are using (0.16), then loaded the Finetune version of Transformers (in my case ROCm, since I have AMD). Generation went smoothly and I did not run into any generation errors. So the issue is most likely somewhere in your environment. Are you certain it is the model you list in your notes? On our downloads we also have gpt-hfj-6b-adventure.7z, which is that newer model format. If it is the right model, then I highly recommend retesting using our tried and tested conda environments, or the play-cuda.sh Docker launcher if you have Docker configured to use your Nvidia GPU. If you'd like more one-on-one support, I recommend joining our Discord at https://discord.gg/XuQWadgU9k ; we can help you get going there, and it's quicker than resolving this over a GitHub issue.
Thanks for the quick reply! It really clarified a lot of things. I've tried the new version with the
BTW, I used venv just as before. But I also tested the current version with Conda (using
But in the docs it says:
If it's really not recommended or not supported, then I think the docs should say so. Though I don't think there should be any difference; I agree that the Conda route may be more fool-proof, but it didn't change anything in my case.
In 1.17 you can now use the regular version of transformers (huggingface.yml) for everything. I also updated the readme to let people know requirements.txt is not recommended. The suitable models are all in the menu now. Let me know if you're still having issues.
Yeah, I've been using this version for a while now, and everything works correctly (except
This can probably be closed.
Seems like there's no way to run GPT-J-6B models locally using CPU or CPU+GPU modes. I've tried both transformers versions (original and finetuneanon's) in both modes (CPU and GPU+CPU), but they all fail in one way or another.
First, I'll describe the error that appears when trying to use the `gpt-j-6b-adventure-hf` model locally in GPU+CPU hybrid mode. In this case KoboldAI raises the following error:

Steps to reproduce
I'm testing this on Linux.
1. Choose `1 - Custom Neo (GPT-Neo / Converted GPT-J)`.
2. Pick `models/gpt-j-6b-adventure-hf`.
3. Choose `3 - Both (slower than GPU-only but uses less VRAM)`.
4. Choose a number of blocks for the system RAM. In my case it was `24` (but later I used `20`).
5. Enter anything in the web GUI prompt and click `Submit`.
6. After some time the above-mentioned error will appear.
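For context, the "Choose a number of blocks for the system RAM" prompt splits the model's transformer blocks between VRAM and system RAM. A rough sketch of that bookkeeping (illustrative only; `split_blocks` is not a real KoboldAI function):

```python
# Illustrative sketch: the last `ram_blocks` transformer blocks live in
# system RAM (CPU) and the rest stay in VRAM (GPU). Not KoboldAI code.
def split_blocks(total_blocks: int, ram_blocks: int) -> list[str]:
    gpu_blocks = total_blocks - ram_blocks
    return ["cuda"] * gpu_blocks + ["cpu"] * ram_blocks

# GPT-J-6B has 28 transformer blocks; with 24 assigned to system RAM,
# only 4 remain on the GPU.
placement = split_blocks(28, 24)
```

This is why a larger RAM-block count reduces VRAM pressure at the cost of slower generation: more of the forward pass runs on the CPU.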
I was using the bundled `requirements.txt`, so finetuneanon's version of the transformers was used.
The generic `gpt-j-6b` model throws the same error.

Other errors
When I try to use finetuneanon's transformers in CPU mode, a different error occurs: `"LayerNormKernelImpl" not implemented for 'Half'`. This is documented, so it's "ok".

When I try to use the original transformers in GPU+CPU mode I get this error: `Input, output and indices must be on the current device`.
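For context on the `Input, output and indices must be on the current device` error: in PyTorch it usually means the input tensors are on a different device than the layer consuming them, with an embedding lookup being the typical trigger. A minimal illustration of that error class and the usual fix, using made-up module names rather than anything from KoboldAI:

```python
import torch

class TinyLM(torch.nn.Module):
    # Stand-in for a language model; the embedding lookup performs the same
    # "indices must be on the current device" check as a real LM would.
    def __init__(self) -> None:
        super().__init__()
        self.emb = torch.nn.Embedding(10, 4)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.emb(ids)

model = TinyLM()
device = next(model.parameters()).device  # wherever the model really lives
ids = torch.tensor([[1, 2, 3]])

# The fix: move inputs to the model's device before the forward pass.
out = model(ids.to(device))
```

In a hybrid GPU+CPU split, the same mismatch can occur mid-model when a tensor crosses from a GPU-resident block to a CPU-resident one without being moved, which is consistent with the maintainer's point that the 0.16-era code paths did not handle the official format correctly.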
And when I try to use the original transformers in CPU mode there's no error, but the output is garbage. For example, when I input `I see a shining light.` it gives me this:

The original transformers also produce some warnings (truncated):
Summary
If these errors are unfixable, I think that at the very least they should be documented somewhere.
Other details:
- The `gpt-j-6b-adventure-hf` and `gpt-j-6b` models produce the same errors.
- I've tested 2.7B models and they work fine in CPU and GPU+CPU modes.
- I can't test 6B models in GPU-only mode (not enough VRAM).
System