Can't run any GPT-J-6B model locally in CPU or GPU+CPU modes #83

Closed
z80maniac opened this issue Nov 28, 2021 · 4 comments

@z80maniac

It seems there's no way to run GPT-J-6B models locally in CPU or GPU+CPU mode. I've tried both transformers versions (the original and finetuneanon's fork) in both modes, but every combination fails in one way or another.

First, I'll describe the error that appears when trying to use the gpt-j-6b-adventure-hf model locally in GPU+CPU hybrid mode. In this case KoboldAI raises the following error:

module 'keras.backend' has no attribute 'is_tensor'

Steps to reproduce

I'm testing this on Linux.

  1. Set up everything and start KoboldAI:
git clone https://github.com/KoboldAI/KoboldAI-Client.git kobold-local
cd kobold-local

python3 -m venv ./venv
source venv/bin/activate

pip install -r requirements.txt

mkdir -p models
cd models
wget 'https://api.wandb.ai/files/ve-forbryderne/adventure/carol-data/models/gpt-j-6b-adventure-hf.7z'
7za x gpt-j-6b-adventure-hf.7z
cd ..

python3 aiserver.py
  2. Choose 1 - Custom Neo (GPT-Neo / Converted GPT-J).

  3. Pick models/gpt-j-6b-adventure-hf.

  4. Choose 3 - Both (slower than GPU-only but uses less VRAM).

  5. Choose a number of blocks for the system RAM. In my case it was 24 (but later I used 20).

  6. Enter anything in the web GUI prompt and click Submit.

After some time, the above-mentioned error appears.

I was using the bundled requirements.txt, so finetuneanon's version of transformers was used.
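A quick sanity check (plain Python, not part of KoboldAI) to confirm which transformers installation the virtualenv actually imports before starting aiserver.py:

import transformers

print(transformers.__version__)  # the fork and upstream report different version strings
print(transformers.__file__)     # the path shows which site-packages copy is being imported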

Full output:
❯ python3 aiserver.py
Welcome to the KoboldAI Server!
Select an AI model to continue:

    #   Model                           V/RAM
    =========================================
    1  - Custom Neo (GPT-Neo / Converted GPT-J)
    2  - Custom GPT-2 (eg CloverEdition)
    3  - GPT Neo 1.3B                   8GB
    4  - GPT Neo 2.7B                   16GB
    5  - GPT-2                          1GB
    6  - GPT-2 Med                      2GB
    7  - GPT-2 Large                    4GB
    8  - GPT-2 XL                       8GB
    9  - InferKit API (requires API key)
    10 - Google Colab
    11 - OpenAI API (requires API key)
    12 - Read Only (No AI)

Model #> 1
Please choose the folder where pytorch_model.bin is located:

Looking for GPU support...FOUND!
You're using a model that supports GPU-CPU hybrid generation!
Currently only GPT-Neo models and GPT-J-6B support this feature.
Use GPU or CPU for generation?:  (Default GPU)
    1 - GPU
    2 - CPU
    3 - Both (slower than GPU-only but uses less VRAM)

Mode> 3
Initializing Flask... OK!
Initializing transformers, please wait...

How many layers would you like to put into system RAM?
The more of them you put into system RAM, the slower it will run,
but it will require less VRAM
(roughly proportional to number of layers).
This model has 28 layers.

# of layers> 24
Will commit 24 of 28 layers to system RAM.
OK! NeoCustom pipeline created!
You may now connect with a browser at http://127.0.0.1:5000/
* Serving Flask app "aiserver" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
The WebSocket transport is not available, you must install a WebSocket server that is compatible with your async mode to enable it. See the documentation for details. (further occurrences of this error will be logged with level INFO)
Client connected!
Data received:{'cmd': 'submit', 'actionmode': 0, 'data': 'I see a shining light.'}
Min:7, Max:86, Txt:I see a shining light.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
module 'keras.backend' has no attribute 'is_tensor'

The generic gpt-j-6b model throws the same error.

Other errors

When I try to use finetuneanon's transformers in CPU mode, a different error occurs: "LayerNormKernelImpl" not implemented for 'Half'. This is documented, so it's "ok".
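For context, this appears to be standard PyTorch behavior on builds from that period rather than anything KoboldAI-specific: the CPU has no half-precision LayerNorm kernel, so a model left in float16 fails as soon as it reaches a LayerNorm. A minimal sketch in plain PyTorch (nothing here is KoboldAI code):

import torch

ln = torch.nn.LayerNorm(8)
x = torch.randn(1, 8)

ln(x)  # float32 on the CPU works fine

try:
    ln.half()(x.half())  # float16 on the CPU
except RuntimeError as e:
    print(e)  # "LayerNormKernelImpl" not implemented for 'Half'

# Converting the model back to float32 (model.float()) is the usual workaround for CPU runs.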

When I try to use the original transformers in GPU+CPU mode I get this error: Input, output and indices must be on the current device.
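That message is PyTorch's generic device-mismatch error: it typically means an embedding layer's weights ended up on the GPU while the token ids stayed on the CPU (or the other way around). A minimal reproduction outside KoboldAI, assuming a CUDA device is available:

import torch

emb = torch.nn.Embedding(10, 4).to("cuda")  # weights on the GPU
ids = torch.tensor([[1, 2, 3]])             # indices left on the CPU

try:
    emb(ids)  # raises "Input, output and indices must be on the current device" (wording varies by PyTorch version)
except RuntimeError as e:
    print(e)

out = emb(ids.to("cuda"))  # moving the indices to the weights' device fixes it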

Full output:
❯ python3 aiserver.py
Welcome to the KoboldAI Server!
Select an AI model to continue:

    #   Model                           V/RAM
    =========================================
    1  - Custom Neo (GPT-Neo / Converted GPT-J)
    2  - Custom GPT-2 (eg CloverEdition)
    3  - GPT Neo 1.3B                   8GB
    4  - GPT Neo 2.7B                   16GB
    5  - GPT-2                          1GB
    6  - GPT-2 Med                      2GB
    7  - GPT-2 Large                    4GB
    8  - GPT-2 XL                       8GB
    9  - InferKit API (requires API key)
    10 - Google Colab
    11 - OpenAI API (requires API key)
    12 - Read Only (No AI)

Model #> 1
Please choose the folder where pytorch_model.bin is located:

Looking for GPU support...FOUND!
You're using a model that supports GPU-CPU hybrid generation!
Currently only GPT-Neo models and GPT-J-6B support this feature.
Use GPU or CPU for generation?:  (Default GPU)
    1 - GPU
    2 - CPU
    3 - Both (slower than GPU-only but uses less VRAM)

Mode> 3
Initializing Flask... OK!
Initializing transformers, please wait...
Some weights of the model checkpoint at /home/user/test/kobold-local/models/gpt-j-6b were not used when initializing GPTNeoForCausalLM: ['lm_head.bias']
- This IS expected if you are initializing GPTNeoForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of GPTNeoForCausalLM were not initialized from the model checkpoint at /home/user/test/kobold-local/models/gpt-j-6b and are newly initialized: ['transformer.h.25.ln_2.weight', 'transformer.h.21.ln_2.bias', 'transformer.h.10.ln_2.weight', 'transformer.h.24.attn.attention.out_proj.bias', 'transformer.h.7.ln_2.bias', 'transformer.h.21.attn.attention.out_proj.bias', 'transformer.h.24.ln_2.bias', 'transformer.h.22.attn.attention.out_proj.bias', 'transformer.h.0.attn.attention.out_proj.bias', 'transformer.h.1.ln_2.bias', 'transformer.h.9.ln_2.bias', 'transformer.h.9.attn.attention.out_proj.bias', 'transformer.h.19.ln_2.weight', 'transformer.h.8.ln_2.weight', 'transformer.h.8.attn.attention.out_proj.bias', 'transformer.h.17.ln_2.bias', 'transformer.h.27.ln_2.bias', 'transformer.h.13.ln_2.weight', 'transformer.h.24.ln_2.weight', 'transformer.h.16.ln_2.bias', 'transformer.h.3.attn.attention.out_proj.bias', 'transformer.h.11.ln_2.bias', 'transformer.h.20.ln_2.weight', 'transformer.h.0.ln_2.bias', 'transformer.h.1.attn.attention.out_proj.bias', 'transformer.h.10.attn.attention.out_proj.bias', 'transformer.h.4.ln_2.bias', 'transformer.h.5.ln_2.bias', 'transformer.h.11.attn.attention.out_proj.bias', 'transformer.h.25.ln_2.bias', 'transformer.h.15.ln_2.bias', 'transformer.h.3.ln_2.weight', 'transformer.h.18.ln_2.weight', 'transformer.h.18.attn.attention.out_proj.bias', 'transformer.h.9.ln_2.weight', 'transformer.h.23.ln_2.bias', 'transformer.h.6.attn.attention.out_proj.bias', 'transformer.h.7.attn.attention.out_proj.bias', 'transformer.h.2.attn.attention.out_proj.bias', 'transformer.h.16.ln_2.weight', 'transformer.h.7.ln_2.weight', 'transformer.h.3.ln_2.bias', 'transformer.h.23.attn.attention.out_proj.bias', 'transformer.h.27.ln_2.weight', 'transformer.h.12.ln_2.weight', 'transformer.h.13.attn.attention.out_proj.bias', 'transformer.h.5.ln_2.weight', 'transformer.h.8.ln_2.bias', 'transformer.h.2.ln_2.weight', 'transformer.h.20.attn.attention.out_proj.bias', 'transformer.h.4.ln_2.weight', 'transformer.h.26.ln_2.weight', 'transformer.h.6.ln_2.weight', 'transformer.h.22.ln_2.bias', 'transformer.h.14.attn.attention.out_proj.bias', 'transformer.h.20.ln_2.bias', 'transformer.h.13.ln_2.bias', 'transformer.h.18.ln_2.bias', 'transformer.h.25.attn.attention.out_proj.bias', 'transformer.h.26.attn.attention.out_proj.bias', 'transformer.h.26.ln_2.bias', 'transformer.h.19.ln_2.bias', 'transformer.h.17.ln_2.weight', 'transformer.h.14.ln_2.weight', 'transformer.h.4.attn.attention.out_proj.bias', 'transformer.h.17.attn.attention.out_proj.bias', 'transformer.h.27.attn.attention.out_proj.bias', 'transformer.h.6.ln_2.bias', 'transformer.h.5.attn.attention.out_proj.bias', 'transformer.h.23.ln_2.weight', 'transformer.h.15.ln_2.weight', 'transformer.h.21.ln_2.weight', 'transformer.h.19.attn.attention.out_proj.bias', 'transformer.h.2.ln_2.bias', 'transformer.h.10.ln_2.bias', 'transformer.h.1.ln_2.weight', 'transformer.h.22.ln_2.weight', 'transformer.h.11.ln_2.weight', 'transformer.h.14.ln_2.bias', 'transformer.h.0.ln_2.weight', 'transformer.h.15.attn.attention.out_proj.bias', 'transformer.h.12.attn.attention.out_proj.bias', 'transformer.wpe.weight', 'transformer.h.16.attn.attention.out_proj.bias', 'transformer.h.12.ln_2.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

How many layers would you like to put into system RAM?
The more of them you put into system RAM, the slower it will run,
but it will require less VRAM
(roughly proportional to number of layers).
This model has 28 layers.

# of layers> 20
Will commit 20 of 28 layers to system RAM.
OK! NeoCustom pipeline created!
You may now connect with a browser at http://127.0.0.1:5000/
* Serving Flask app "aiserver" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
The WebSocket transport is not available, you must install a WebSocket server that is compatible with your async mode to enable it. See the documentation for details. (further occurrences of this error will be logged with level INFO)
Client connected!
Client connected!
Client connected!
Data received:{'cmd': 'submit', 'actionmode': 0, 'data': 'I see a shining light.'}
Min:7, Max:86, Txt:I see a shining light.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Client connected!
Input, output and indices must be on the current device

And when I try to use the original transformers in CPU mode there's no error, but the output is garbage. For example, when I input "I see a shining light." it gives me this:

Analog Disk Sellvest Lif medically brightest scalingieuEVURNprefix DISTRICT relay Samson Commission Fold recallAUmaps bumper PB dex Cullen Championships unp HERO Raspberry Ankalse Ness sustained invokevind Pikachu Volks Meth Lect EMP cyan steering Tens LET ENexplet laptops fliesATT InstituteERSON mitochond!

The original transformers also produce some warnings (truncated):

Some weights of the model checkpoint at /home/user/test/kobold-local/models/gpt-j-6b were not used when initializing GPTNeoForCausalLM: ['lm_head.bias']
- This IS expected if you are initializing GPTNeoForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of GPTNeoForCausalLM were not initialized from the model checkpoint at /home/user/test/kobold-local/models/gpt-j-6b and are newly initialized: ['transformer.h.7.ln_2.weight', 'transformer.h.25.ln_2.bias', 'transformer.h.26.ln_2.bias', 'transformer.h.5.ln_2.bias', ...]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Full output:
❯ python3 aiserver.py
Welcome to the KoboldAI Server!
Select an AI model to continue:

    #   Model                           V/RAM
    =========================================
    1  - Custom Neo (GPT-Neo / Converted GPT-J)
    2  - Custom GPT-2 (eg CloverEdition)
    3  - GPT Neo 1.3B                   8GB
    4  - GPT Neo 2.7B                   16GB
    5  - GPT-2                          1GB
    6  - GPT-2 Med                      2GB
    7  - GPT-2 Large                    4GB
    8  - GPT-2 XL                       8GB
    9  - InferKit API (requires API key)
    10 - Google Colab
    11 - OpenAI API (requires API key)
    12 - Read Only (No AI)

Model #> 1
Please choose the folder where pytorch_model.bin is located:

Looking for GPU support...FOUND!
You're using a model that supports GPU-CPU hybrid generation!
Currently only GPT-Neo models and GPT-J-6B support this feature.
Use GPU or CPU for generation?:  (Default GPU)
    1 - GPU
    2 - CPU
    3 - Both (slower than GPU-only but uses less VRAM)

Mode> 2
Initializing Flask... OK!
Initializing transformers, please wait...
Some weights of the model checkpoint at /home/user/test/kobold-local/models/gpt-j-6b were not used when initializing GPTNeoForCausalLM: ['lm_head.bias']
- This IS expected if you are initializing GPTNeoForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of GPTNeoForCausalLM were not initialized from the model checkpoint at /home/user/test/kobold-local/models/gpt-j-6b and are newly initialized: ['transformer.h.7.ln_2.weight', 'transformer.h.25.ln_2.bias', 'transformer.h.26.ln_2.bias', 'transformer.h.5.ln_2.bias', 'transformer.h.18.attn.attention.out_proj.bias', 'transformer.h.1.ln_2.weight', 'transformer.h.13.ln_2.weight', 'transformer.h.21.ln_2.bias', 'transformer.h.8.ln_2.bias', 'transformer.h.19.attn.attention.out_proj.bias', 'transformer.h.23.attn.attention.out_proj.bias', 'transformer.h.8.ln_2.weight', 'transformer.h.19.ln_2.bias', 'transformer.h.2.attn.attention.out_proj.bias', 'transformer.h.11.ln_2.bias', 'transformer.h.5.ln_2.weight', 'transformer.h.3.attn.attention.out_proj.bias', 'transformer.h.6.attn.attention.out_proj.bias', 'transformer.h.22.ln_2.bias', 'transformer.h.17.ln_2.bias', 'transformer.h.16.attn.attention.out_proj.bias', 'transformer.h.14.ln_2.bias', 'transformer.h.27.attn.attention.out_proj.bias', 'transformer.h.16.ln_2.bias', 'transformer.h.0.ln_2.bias', 'transformer.h.2.ln_2.bias', 'transformer.h.6.ln_2.bias', 'transformer.h.8.attn.attention.out_proj.bias', 'transformer.h.15.attn.attention.out_proj.bias', 'transformer.h.13.ln_2.bias', 'transformer.h.0.ln_2.weight', 'transformer.h.12.ln_2.weight', 'transformer.h.10.ln_2.bias', 'transformer.h.7.ln_2.bias', 'transformer.h.20.ln_2.bias', 'transformer.h.14.attn.attention.out_proj.bias', 'transformer.h.4.ln_2.weight', 'transformer.h.26.ln_2.weight', 'transformer.h.26.attn.attention.out_proj.bias', 'transformer.h.4.ln_2.bias', 'transformer.h.10.attn.attention.out_proj.bias', 'transformer.wpe.weight', 'transformer.h.1.ln_2.bias', 'transformer.h.6.ln_2.weight', 'transformer.h.24.attn.attention.out_proj.bias', 'transformer.h.11.attn.attention.out_proj.bias', 'transformer.h.22.attn.attention.out_proj.bias', 'transformer.h.3.ln_2.weight', 'transformer.h.3.ln_2.bias', 'transformer.h.23.ln_2.bias', 'transformer.h.25.attn.attention.out_proj.bias', 'transformer.h.27.ln_2.weight', 'transformer.h.23.ln_2.weight', 'transformer.h.9.ln_2.weight', 'transformer.h.0.attn.attention.out_proj.bias', 'transformer.h.1.attn.attention.out_proj.bias', 'transformer.h.9.attn.attention.out_proj.bias', 'transformer.h.13.attn.attention.out_proj.bias', 'transformer.h.24.ln_2.weight', 'transformer.h.17.attn.attention.out_proj.bias', 'transformer.h.12.ln_2.bias', 'transformer.h.24.ln_2.bias', 'transformer.h.2.ln_2.weight', 'transformer.h.25.ln_2.weight', 'transformer.h.18.ln_2.weight', 'transformer.h.19.ln_2.weight', 'transformer.h.21.attn.attention.out_proj.bias', 'transformer.h.7.attn.attention.out_proj.bias', 'transformer.h.16.ln_2.weight', 'transformer.h.27.ln_2.bias', 'transformer.h.20.ln_2.weight', 'transformer.h.15.ln_2.weight', 'transformer.h.10.ln_2.weight', 'transformer.h.9.ln_2.bias', 'transformer.h.18.ln_2.bias', 'transformer.h.12.attn.attention.out_proj.bias', 'transformer.h.5.attn.attention.out_proj.bias', 'transformer.h.22.ln_2.weight', 'transformer.h.11.ln_2.weight', 'transformer.h.20.attn.attention.out_proj.bias', 'transformer.h.4.attn.attention.out_proj.bias', 'transformer.h.15.ln_2.bias', 'transformer.h.14.ln_2.weight', 'transformer.h.17.ln_2.weight', 'transformer.h.21.ln_2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
OK! NeoCustom pipeline created!
You may now connect with a browser at http://127.0.0.1:5000/
* Serving Flask app "aiserver" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
The WebSocket transport is not available, you must install a WebSocket server that is compatible with your async mode to enable it. See the documentation for details. (further occurrences of this error will be logged with level INFO)
Client connected!
Data received:{'cmd': 'submit', 'actionmode': 0, 'data': 'I see a shining light.'}
Min:7, Max:86, Txt:I see a shining light.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Client connected!
Analog Disk Sellvest Lif medically brightest scalingieuEVURNprefix DISTRICT relay Samson Commission Fold recallAUmaps bumper PB dex Cullen Championships unp HERO Raspberry Ankalse Ness sustained invokevind Pikachu Volks Meth Lect EMP cyan steering Tens LET ENexplet laptops fliesATT InstituteERSON mitochond!=EMP Meng BengEh KakERSON webs purchaser Sitting sunk liquphan%; accompanies lecturer Championships bumperrite sailorsasaki hammşATTarth Bash MAT Pupp

Summary

| mode    | transformers | error                                                    |
|---------|--------------|----------------------------------------------------------|
| CPU     | original     | Garbage output                                           |
| CPU     | finetuneanon | "LayerNormKernelImpl" not implemented for 'Half'         |
| GPU+CPU | original     | Input, output and indices must be on the current device  |
| GPU+CPU | finetuneanon | module 'keras.backend' has no attribute 'is_tensor'      |

If these errors are unfixable, I think they should at least be documented somewhere.

Other details:

  • gpt-j-6b-adventure-hf and gpt-j-6b models produce the same errors.

  • I've tested 2.7B models and they work fine in CPU and GPU+CPU modes.

  • I can't test 6B models in GPU-only mode (not enough VRAM).

System

  • GeForce GTX 1060 6GB
  • 32 GB RAM (+ pagefile since using CPU-only requires around 45GB)
  • Kubuntu 21.10
  • CUDA 11.3.109
@henk717
Collaborator

henk717 commented Nov 28, 2021

Alright, that is quite a lot to break down. This does not appear to be a KoboldAI bug, since you're running into issues with the dependencies themselves, but let's dig in.

So let's begin at the start with your setup process: the bundled requirements.txt is really only meant to be used with the Colabs and may or may not work well on your system. For most use cases we advise creating a conda environment from the bundled files instead of setting up a venv yourself (make sure to use the official Miniconda3 install script, not something like apt-get install python-conda); for now this will also give you the finetuneanon version (unless you really want that CPU mode to work). I don't think your specific issue is related to that, but it's good to know for the future if you ever run into dependency issues.

The second issue is that the Half mode does not work on the CPU, which is indeed correct and to be expected (as you mentioned, we documented this). We are phasing out Finetune's branch entirely in the upcoming version, but since we ran into similar issues in the upcoming 0.17 builds it may or may not be fixed by those other fixing efforts. In that upcoming version we implemented the Half mode for the GPU with the official transformers, so you get a GPU mode with even more efficient loading, and you also get fully functional CPU support. Finetune's branch fell quite far behind upstream and most of its features have been integrated into KoboldAI on that development branch, so once 0.17 is finished we will no longer recommend it to anyone.
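A rough illustration of the split being described, not KoboldAI's actual loading code (the checkpoint path is just a placeholder): half precision only when the model sits on the GPU, full float32 when it has to run on the CPU.

import torch
from transformers import AutoModelForCausalLM

# hypothetical local checkpoint path
model = AutoModelForCausalLM.from_pretrained("models/gpt-j-6b-adventure-hf")

if torch.cuda.is_available():
    model = model.half().to("cuda")  # float16 is fine on the GPU and roughly halves VRAM use
else:
    model = model.float()            # CPU kernels (e.g. LayerNorm) need float32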

Then the next part gets a bit more complicated. First of all, the behavior of the models in the official version of Transformers is entirely to be expected: it is supposed to load wrong, spew gibberish, or not load at all. The reason is that Finetune invented his own format for 6B that the upstream version chose not to use. In the version of KoboldAI you are using we had not yet implemented the official format, and because of that everything ends up loading completely wrong, as it is trying to load a Neo model that is in reality the fake 6B format. So for now you will need either Finetune's fork, or the development version of KoboldAI (currently at https://github.com/henk717/koboldai) along with models converted to the official 6B format, which we dubbed the HFJ format to avoid confusion.

So that explains the error you're getting in CPU mode, and the gibberish you get when you're not using the appropriate version for the models you are using.

That leaves one more issue to tackle, and that's the fact it's not working for you. I downloaded the same model and loaded up the same branch of KoboldAI you're using (0.16), then loaded the Finetune version of Transformers (in my case ROCm, since I have AMD). Generation went smoothly and I did not run into any generation errors.

So the issue is most likely somewhere in your environment. Are you certain it's the model you list in your notes? On our downloads we also have gpt-hfj-6b-adventure.7z, which is that newer model format. If it is the right model, then I highly recommend retesting with our tried and tested conda environments, or with the play-cuda.sh Docker launcher if you have Docker configured to use your Nvidia card.

If you'd like more one-on-one support, I recommend joining our Discord at https://discord.gg/XuQWadgU9k; we can help you get going there, and it's quicker than resolving this over a GitHub issue.

@z80maniac
Author

Thanks for the quick reply! It really clarified a lot of things.

I've tried the new version with the gpt-hfj-6b-adventure model and it indeed works. I'm also really surprised at how fast it is in GPU+CPU mode: with only 4 VRAM blocks it generates output after about 60 seconds, which I think is at least tolerable. RAM usage is also relatively low, only 12 GB or so.

BTW, I used venv just as before. But I also tested the current version with Conda (using environments/finetuneanon.yml) and the gpt-j-6b-adventure-hf model, and got exactly the same error (module 'keras.backend' has no attribute 'is_tensor'). So I guess it's really some sort of model/transformers incompatibility and not a package issue. Maybe it's NVIDIA-only, since there's no issue on AMD. Or maybe it's some weird problem with my OS or host environment (but I've run a lot of other CUDA projects and they worked). I didn't test it in Docker, though: I tried play-cuda.sh, but it got stuck while building the image, in the middle of installing the packages. I'm not sure if it's supposed to be like that and I just needed to wait longer. Maybe I'll re-test it in the future.

the bundled requirements.txt really is only meant to be used with the Colabs and may or may not work well on your system.

But in the docs it says:

If you do not want to use conda install the requirements listed in requirements.txt and make sure that CUDA is properly installed.

If it's really not recommended or not supported then I think the docs should say so. Though I don't think there should be any difference. But I agree that the Conda way may be more fool-proof (but it didn't change anything in my case).

@henk717
Collaborator

henk717 commented Feb 28, 2022

In 1.17 you can now use the regular version of transformers (huggingface.yml) for everything. I also updated the readme to let people know requirements.txt is not recommended.

The suitable models are all in the menu now. Let me know if you're still having issues.

@z80maniac
Author

Yeah, I've been using this version for a while now, and everything works correctly (except play-cuda.sh). 6B models load without errors (though I loaded them from a folder, not from the menu). Also, requirements.txt works for me (I didn't try Conda again).

This can probably be closed.

@henk717 henk717 closed this as completed Feb 28, 2022