MiniCPM-V gets error "self and mat2 must have the same dtype, but got Half and Byte" #10470

Closed
violet17 opened this issue Mar 19, 2024 · 2 comments

@violet17

I run MiniCPM-V and get the following error:

Traceback (most recent call last):
  File "D:\rag\test_cpm.py", line 45, in <module>
    answer, context, _ = model.chat(
  File "C:\Users\mi\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 273, in chat
    res, vision_hidden_states = self.generate(
  File "C:\Users\mi\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 230, in generate
    model_inputs['inputs_embeds'], vision_hidden_states = self.get_vllm_embedding(model_inputs)
  File "C:\Users\mi\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 88, in get_vllm_embedding
    vision_hidden_states.append(self.get_vision_embedding(pixel_values))
  File "C:\Users\mi\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 79, in get_vision_embedding
    res.append(self.resampler(vision_embedding))
  File "C:\Users\mi\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\mi\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\mi\.cache\huggingface\modules\transformers_modules\MiniCPM-V\resampler.py", line 152, in forward
    out = self.attn(
  File "C:\Users\mi\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\mi\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\mi\miniconda3\lib\site-packages\torch\nn\modules\activation.py", line 1241, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "C:\Users\mi\miniconda3\lib\site-packages\torch\nn\functional.py", line 5413, in multi_head_attention_forward
    attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
RuntimeError: self and mat2 must have the same dtype, but got Half and Byte

Code:

import torch
from PIL import Image
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
import time

model = AutoModel.from_pretrained('./models/MiniCPM-V', trust_remote_code=True, load_in_low_bit="sym_int4", optimize_model=True, use_cache=True)
# model = AutoModel.from_pretrained('./models/MiniCPM-V', trust_remote_code=True, torch_dtype=torch.bfloat16)
model = model.eval()
model = model.half()
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained('./models/MiniCPM-V', trust_remote_code=True)


image = Image.open('AI.png').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

torch.xpu.synchronize()
t0 = time.time()
answer, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
torch.xpu.synchronize()
t1 = time.time()
print("---cost time(s): ", t1 - t0)
print(answer)

Note: because of "NotImplementedError: Could not run 'aten::_upsample_bicubic2d_aa.out' with arguments from the 'XPU' backend", I also had to change env\Lib\site-packages\timm\layers\pos_embed.py around line 46 to:

posemb = F.interpolate(posemb.to("cpu"), size=new_size, mode=interpolation, antialias=antialias).to(posemb.device)
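
For context, a rough before/after sketch of that workaround; the surrounding code is assumed to come from timm 0.9.16's resample_abs_pos_embed and may differ slightly in other versions:

# timm/layers/pos_embed.py, inside resample_abs_pos_embed (around line 46)
# Before: anti-aliased bicubic interpolation has no XPU kernel, so this raises NotImplementedError on XPU
#   posemb = F.interpolate(posemb, size=new_size, mode=interpolation, antialias=antialias)
# After: run the interpolation on CPU, then move the result back to the original device
posemb = F.interpolate(
    posemb.to("cpu"), size=new_size, mode=interpolation, antialias=antialias
).to(posemb.device)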

Could you please take a look?
Thanks.
Versions:

bigdl-core-xe-21                        2.5.0b20240318
bigdl-llm                               2.5.0b20240318
intel-extension-for-pytorch             2.1.20+git4849f3b
torch                                   2.1.0a0+git7bcf7da
torchvision                             0.16.0a0+cxx11.abi
transformers                            4.38.2
timm                                    0.9.16
@MeouSker77
Contributor

MeouSker77 commented Mar 21, 2024

Please install the latest bigdl-llm and use the following code to load the model:

from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel

# keep the vision tower ("vpm") and the resampler unquantized so their matmuls
# do not mix Half activations with quantized (Byte) weights
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  load_in_low_bit="sym_int4", modules_to_not_convert=["vpm", "resampler"]).eval()
model = model.float()
model = model.to('xpu')

sym_int4 may perform poorly for models smaller than 3B; if so, you can change it to sym_int8 for better output quality.
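
For example, the sym_int8 variant is the same call with only the load_in_low_bit value changed:

model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  load_in_low_bit="sym_int8", modules_to_not_convert=["vpm", "resampler"]).eval()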

@violet17
Author

Thank you very much!!
