Qwen1.5-7B wrong outputs with 1024 prompts #10354
Comments
@ivy-lv11 pls take a look at this |
transformers: 4.38.1/4.37.0 |
If BF16 output is wrong, you can verify stock pytorch first (without BigDL). |
When using the 2048 prompt (truncated to 1024 tokens) with the original transformers/PyTorch and low_bit removed, the output looks normal.
prompt: 红楼梦 (Dream of the Red Chamber)
|
what is the torch version? torch==2.2.0? |
Yes. |
Removing load_in_low_bit and optimize_model runs FP32. If FP32 gives normal outputs, the issue is likely related to INT4, which can be compared against llama.cpp etc.; BF16 can be compared against native PyTorch BF16 support. |
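To make that comparison concrete, here is a minimal sketch, assuming the bigdl-llm transformers-style wrapper and the load_in_low_bit / optimize_model keyword arguments referenced in this thread; the model path is a placeholder.

```python
import torch
from bigdl.llm.transformers import AutoModelForCausalLM  # assumed bigdl-llm wrapper API

model_path = "Qwen/Qwen1.5-7B-Chat"  # placeholder

# Run under test: INT4 weights plus the optimized forward paths.
model_int4 = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_low_bit="sym_int4", optimize_model=True
)

# Baseline: dropping load_in_low_bit and optimize_model loads plain FP32 weights,
# so a normal output here points the suspicion at the INT4 / optimization path.
model_fp32 = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32)
```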
Use transformers and bf16 for the comparison, for example:
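A minimal sketch of a plain Hugging Face BF16 load and generation (no BigDL involved; not the original snippet, and the model path and generation length are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen1.5-7B-Chat"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)

# Generate with the same prompt used in the report to compare against the low-bit runs.
inputs = tokenizer("红楼梦", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```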
|
After disabling the override of the qwen2 attention forward in convert.py (Qwen1.5 uses the qwen2 model type), a normal answer can be generated on SPR: 两旁是一副对联:\n假作真时真亦假,无为有处有还无。\n二人进了里面,见是一座楼阁,楼内挂着“薄命司”的牌子。士隐抬头一看,见里面挂着许多签,签上写着名字,旁边注着诗句和判词。他见签上有个“甄英莲”的名字,就抽出来看,上面写着:\n娇嫩花朵偏遭风雨,聪明女儿薄命终身。\n原是仙家遗种,却落在草莽人家。生于富贵,却死于贫贱。这是她的命,无可奈何。士隐看了,叹了一口气,把签放下。又见一个签上写着“贾 (output truncated). Need to check what is wrong in qwen2_attention_forward_origin. |
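As a debugging aid for the step described above, a hypothetical helper (not part of BigDL-LLM) that reports which forward each Qwen2 attention module is actually using, so one can confirm whether an override or the stock transformers forward is active:

```python
from transformers.models.qwen2.modeling_qwen2 import Qwen2Attention, Qwen2SdpaAttention

def report_attention_forwards(model):
    """Print the forward bound to every Qwen2 attention module in the model."""
    for name, module in model.named_modules():
        if isinstance(module, (Qwen2Attention, Qwen2SdpaAttention)):
            fwd = module.forward
            # A patched module usually reports the patching package here, while an
            # unpatched one reports transformers.models.qwen2.modeling_qwen2.
            print(name, type(module).__name__,
                  getattr(fwd, "__module__", "?"), getattr(fwd, "__qualname__", "?"))
```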
Tested BigDL-LLM 2.5.0b20240311. Environment:
On Arc the output looks normal.
However, when running on CPU the output still looks abnormal.
|
It is found that the CPU uses a different attention module from the GPU:
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 4096)
(layers): ModuleList(
(0-31): 32 x Qwen2DecoderLayer(
(self_attn): Qwen2SdpaAttention(
(q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(up_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(down_proj): LowBitLinear(in_features=11008, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm()
(post_attention_layernorm): Qwen2RMSNorm()
)
)
(norm): Qwen2RMSNorm()
)
(lm_head): LowBitLinear(in_features=4096, out_features=151936, bias=False)
) |
Model architecture:
GPU: uses Qwen2Attention
CPU: uses Qwen2SdpaAttention
|
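For reference, a sketch (assuming the standard attn_implementation argument available in transformers >= 4.36; the model path is a placeholder) of how to check which attention class transformers instantiates and force the eager Qwen2Attention path when comparing runs:

```python
import torch
from transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen1.5-7B-Chat"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # request Qwen2Attention instead of Qwen2SdpaAttention
)
# Inspect which attention module was actually built for the first decoder layer.
print(type(model.model.layers[0].self_attn).__name__)
```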
code: all-in-one benchmark, where prompt/2048.txt is replaced with the Chinese prompts below
in-out pair: 1024-128 (the 2048-token prompts are truncated to 1024 tokens)
model: Qwen1.5-7B-Chat
machine: SPR01
prompt: 红楼梦 (Dream of the Red Chamber)
INT4/INT8/BF16 all repeat like:
prompt: 患者 (patient)
INT4 repeats like below, while BF16 and INT8 give no answer:
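To illustrate the "1024-128" in-out pair above, a minimal sketch (model id and prompt-file path are placeholders) of truncating the 2048-token prompt to its first 1024 tokens before 128 new tokens are generated:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")  # placeholder model id

# Read the long benchmark prompt and keep only its first 1024 tokens.
with open("prompt/2048.txt", encoding="utf-8") as f:
    text = f.read()
input_ids = tokenizer(text, return_tensors="pt").input_ids[:, :1024]

# Generation then runs on these truncated inputs with max_new_tokens=128.
```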