
AttributeError: module 'quant_cuda' has no attribute 'vecquant4matmul' #1433

Closed
1 task done
thistleknot opened this issue Apr 21, 2023 · 2 comments
Labels: bug (Something isn't working), stale

Comments

@thistleknot

Describe the bug

Unable to run 4-bit quantized models on GPU despite having correct GPTQ installed.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

pip uninstall -y quant-cuda
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128 --chat --listen --auto-devices --gpu-memory 3500MiB
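One way to narrow this down after the `setup_cuda.py install` step is to check what the installed `quant_cuda` extension actually exposes, since an old `quant-cuda` pip package or a stale build can shadow the freshly compiled one. The sketch below is a hypothetical diagnostic, not part of the original repro; the kernel names are taken from the GPTQ-for-LLaMa cuda branch and may differ in other forks.

```python
import importlib

# Kernel names as exported by the GPTQ-for-LLaMa cuda branch at the time
# of this issue (assumption; other forks/branches may export a different set).
EXPECTED_KERNELS = (
    "vecquant2matmul",
    "vecquant3matmul",
    "vecquant4matmul",
    "vecquant8matmul",
)

def check_kernels(module_name="quant_cuda"):
    """Return {kernel_name: present} for the named extension module,
    or None if the module cannot be imported at all (build/install failed)."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None
    return {name: hasattr(mod, name) for name in EXPECTED_KERNELS}

if __name__ == "__main__":
    kernels = check_kernels()
    if kernels is None:
        print("quant_cuda is not importable; the build/install step failed")
    else:
        for name, present in kernels.items():
            print(f"{name}: {'OK' if present else 'MISSING'}")
```

If `vecquant4matmul` shows as MISSING while the module imports fine, the webui is picking up a different `quant_cuda` build than the one just compiled (e.g. a leftover egg from a previous install).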

Screenshot

No response

Logs

Traceback (most recent call last):
  File "/mnt/distvol/text-generation-webui/modules/callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/mnt/distvol/text-generation-webui/modules/text_generation.py", line 252, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/distvol/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 426, in forward
    quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
AttributeError: module 'quant_cuda' has no attribute 'vecquant4matmul'

System Info

Python 3.9
Oracle Linux 8
CUDA 11.7
4GB VRAM
Quadro P1000
thistleknot added the bug label on Apr 21, 2023

thistleknot commented Apr 21, 2023

I followed advice from another post and uninstalled gptq from pip,

but then when running the same model, I get a new error (a new error is a better error):

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/mnt/distvol/text-generation-webui/modules/callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/mnt/distvol/text-generation-webui/modules/text_generation.py", line 252, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 160, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 280, in pre_forward
    set_module_tensor_to_device(module, name, self.execution_device, value=self.weights_map[name])
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/utils/offload.py", line 123, in __getitem__
    return self.dataset[f"{self.prefix}{key}"]
  File "/mnt/distvol/python_user/python_user/gpt/lib/python3.9/site-packages/accelerate/utils/offload.py", line 170, in __getitem__
    weight_info = self.index[key]
KeyError: 'model.layers.17.self_attn.q_proj.wf1'
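This `KeyError` comes out of accelerate's CPU-offload path: with `--auto-devices`/`--gpu-memory`, weights moved off the GPU are fetched by parameter name from an offload index, and the quantized `q_proj` layer carries an extra buffer (`wf1` here) that the index was never built with. A minimal illustration of the failing lookup, with hypothetical index contents:

```python
# Sketch of the lookup that fails in accelerate's offload machinery:
# offloaded weights are fetched by fully-qualified parameter name from
# an index dict. The entries below are hypothetical; the point is that
# a buffer name absent from the index raises KeyError, as in the log.
offload_index = {
    "model.layers.17.self_attn.q_proj.qweight": "weights_0.dat",
    "model.layers.17.self_attn.q_proj.scales": "weights_1.dat",
}

def fetch_offloaded(index, key):
    """Mimic the index lookup in accelerate's offload loader: a plain
    dict access, so unknown keys raise KeyError rather than returning None."""
    return index[key]

try:
    fetch_offloaded(offload_index, "model.layers.17.self_attn.q_proj.wf1")
except KeyError as err:
    print(f"missing from offload index: {err}")
```

This suggests the second failure is not a build problem but a mismatch between GPTQ's quantized-layer buffers and accelerate's offloading, i.e. the offload flags and 4-bit quantization do not combine cleanly here.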

@github-actions

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
