Giving up #2202

tr7zw · 2023-05-19T22:17:57Z

This is ridiculous. At this point I have 4(?) installs, all failing for different reasons.
Windows, after setting up a new env:

(E:\textgeneration\installer_files\env) E:\textgeneration>start_windows.bat
INFO:Gradio HTTP request redirected to localhost :)
bin E:\textgeneration\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
Starting streaming server at ws://127.0.0.1:5005/api/v1/stream
INFO:Loading the extension "gallery"...
INFO:server listening on 127.0.0.1:5005
Starting API at http://127.0.0.1:5000/api
Traceback (most recent call last):
  File "E:\textgeneration\text-generation-webui\server.py", line 928, in <module>
    create_interface()
  File "E:\textgeneration\text-generation-webui\server.py", line 515, in create_interface
    with gr.Blocks(css=ui.css if not shared.is_chat() else ui.css + ui.chat_css, analytics_enabled=False, title=title, theme=ui.theme) as shared.gradio['interface']:
  File "E:\textgeneration\installer_files\env\lib\site-packages\gradio\blocks.py", line 1285, in __exit__
    self.config = self.get_config_file()
  File "E:\textgeneration\installer_files\env\lib\site-packages\gradio\blocks.py", line 1261, in get_config_file
    "input": list(block.input_api_info()),  # type: ignore
  File "E:\textgeneration\installer_files\env\lib\site-packages\gradio_client\serializing.py", line 41, in input_api_info
    return (api_info["serialized_input"][0], api_info["serialized_input"][1])
KeyError: 'serialized_input'

Done!

Windows with the old env (which used to work; now it claims the model doesn't fit and tries to allocate system memory instead of GPU memory?!?):

(E:\textgeneration\installer_files\env2) E:\textgeneration>start_windows.bat
INFO:Gradio HTTP request redirected to localhost :)
bin E:\textgeneration\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
INFO:Loading MetaIX_Alpaca-30B-Int4...
INFO:Found the following quantized model: models\MetaIX_Alpaca-30B-Int4\alpaca-30b-4bit.pt
Traceback (most recent call last):
  File "E:\textgeneration\text-generation-webui\server.py", line 915, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "E:\textgeneration\text-generation-webui\modules\models.py", line 159, in load_model
    model = load_quantized(model_name)
  File "E:\textgeneration\text-generation-webui\modules\GPTQ_loader.py", line 179, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "E:\textgeneration\text-generation-webui\modules\GPTQ_loader.py", line 45, in _load_quant
    model = AutoModelForCausalLM.from_config(config)
  File "E:\textgeneration\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
    return model_class._from_config(config, **kwargs)
  File "E:\textgeneration\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1146, in _from_config
    model = cls(config, **kwargs)
  File "E:\textgeneration\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
    self.model = LlamaModel(config)
  File "E:\textgeneration\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "E:\textgeneration\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "E:\textgeneration\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 256, in __init__
    self.mlp = LlamaMLP(
  File "E:\textgeneration\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 153, in __init__
    self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
  File "E:\textgeneration\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 238551040 bytes.

Done!

Trying a fresh install under WSL, maybe that helps. Nope.

(textgen) root@DESKTOP-G8KL9O6:/mnt/e/textgen/text-generation-webui# python server.py --auto-devices --chat --extension api
bin /root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
INFO:Loading MetaIX_Alpaca-30B-Int4...
INFO:Found the following quantized model: models/MetaIX_Alpaca-30B-Int4/alpaca-30b-4bit.pt
Killed

Also, trying the one-click installer got me this beauty of a messed-up error:

Installing base environment...
Downloading and Extracting Packages
Downloading and Extracting Packages

Preparing transaction: done
Executing transaction: done
installation finished.
Miniconda version:
conda 23.1.0
Conda is not installed. Exiting...

Done!
(textgen) root@DESKTOP-G8KL9O6:/mnt/e/textgen# ./start_linux.sh
Conda is not installed. Exiting...
(textgen) root@DESKTOP-G8KL9O6:/mnt/e/textgen# conda
usage: conda [-h] [-V] command ...

At this point, I'm at a loss for words. 4 hours down the drain, just because of a git pull.

Answered by Kiyos

May 22, 2023

m-spangenberg · 2023-05-19T22:36:47Z

If it's just the git pull that screwed things up, why don't you revert to the last known good commit and reinstall the requirements? 071f0776ad6e7d8dab08e0d98d089c808807ab45 is just before they bumped the transformers version. Who knows, it's worth a shot.

git checkout 071f0776ad6e7d8dab08e0d98d089c808807ab45
pip install -r requirements.txt --upgrade

Good luck!

1 reply

tr7zw May 19, 2023
Author

It changed some dependencies or so. Now the env that used to work is the one giving the "DefaultCPUAllocator: not enough memory: you tried to allocate 238551040 bytes." error, without actually allocating system or GPU memory.
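
For scale, the 238551040 bytes in that message look like the size of a single weight matrix, not of the whole model. The traceback fails inside `nn.Linear(hidden_size, intermediate_size, bias=False)` for `up_proj`. A minimal sketch of the arithmetic follows; it assumes the usual LLaMA-30B config values (`hidden_size=6656`, `intermediate_size=17920`) and half-precision weights, none of which are stated in this thread:

```python
# Assumed LLaMA-30B config values; they are NOT stated anywhere in this thread.
HIDDEN_SIZE = 6656
INTERMEDIATE_SIZE = 17920
# Assumption: the loader builds the model in half precision (2 bytes per element).
BYTES_PER_FP16 = 2


def linear_weight_bytes(in_features: int, out_features: int, bytes_per_element: int) -> int:
    """Size in bytes of the weight of a bias-free linear layer."""
    return in_features * out_features * bytes_per_element


up_proj_bytes = linear_weight_bytes(HIDDEN_SIZE, INTERMEDIATE_SIZE, BYTES_PER_FP16)
print(up_proj_bytes)  # 238551040
```

Under those assumptions the number matches the error exactly, so the message reports only the one allocation that failed. That would be consistent with RAM running out partway through building the model, rather than a lone ~200 MB request being too large on its own.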

Kiyos · 2023-05-22T13:51:25Z

Regarding your second run, the DefaultCPUAllocator one:
Before a 4-bit GPTQ model can be loaded into GPU memory (VRAM), it must first be loaded into main RAM. It usually takes up about 1.5 times as much space in RAM as on disk (because reasons). Look at alpaca-30b-4bit.pt, measure its size on disk and multiply it by 1.5: that's how much actual free RAM you need before Python starts loading it. You might just not have that much. It has happened to me before; I can't even use 13B models with my 16 GB of RAM.
Another thing to keep in mind when working with GPTQ models is that they cannot be split between VRAM, RAM and disk cache, so --auto-devices, --gpu-memory, --cpu-memory, --disk flags have no effect.
Hope this helps.
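
The rule of thumb above can be sketched as a quick check before launching the server. This is only an illustration: the 1.5 factor is the rough estimate from this comment, not a measured constant, and the helper names `estimate_ram_needed` and `fits_in_ram` are made up for the example.

```python
import os

# Rule-of-thumb multiplier from the comment above; an estimate, not a measured constant.
RAM_FACTOR = 1.5


def estimate_ram_needed(checkpoint_path: str, factor: float = RAM_FACTOR) -> int:
    """Estimate the free RAM (in bytes) needed to load a checkpoint file."""
    size_on_disk = os.path.getsize(checkpoint_path)
    return int(size_on_disk * factor)


def fits_in_ram(checkpoint_path: str, free_ram_bytes: int, factor: float = RAM_FACTOR) -> bool:
    """Return True if the estimated requirement is within the given free RAM."""
    return estimate_ram_needed(checkpoint_path, factor) <= free_ram_bytes
```

The free-RAM figure is passed in by the caller so the sketch stays standard-library only. One way to obtain it, if the third-party psutil package is installed, is `psutil.virtual_memory().available`.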

1 reply

tr7zw May 22, 2023
Author

A few days later I did another clean install and noticed that once I freed up enough RAM (basically closing everything), it actually started trying to load the model. Before, it just stated that it couldn't allocate about 200 MB, with over 20 GB of RAM free. The WSL version was doing much the same thing, but without giving any error: it just stops with the "Killed" message. This is weird, because I could swear I was able to load it before without prepping anything (Discord/Firefox/etc. still open). But even after getting it loaded, I ran into weird Python errors about missing variables/incorrect types (I think) while trying to generate anything. I'll probably just wait for now and let things mature more.
