ipex-llm run benchmark error on LNL NPU #12895

@Lucas-cai

I used miniforge to create the environment and updated the NPU driver. I used the test_api 'transformers_int4_npu_win' in config.yaml. Below is the log showing the memory access violation.

(npu) C:\Users\intel\model\ipex-llm-main\python\llm\dev\benchmark\all-in-one>python run.py
C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\transformers\deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00, 6.06s/it]
2025-02-25 15:49:32,322 - INFO - Converting model, it may takes up to several minutes ...
C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\torch\nn\init.py:412: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
2025-02-25 15:50:00,754 - INFO - Finish to convert model
decode start compiling
decode end compiling
Model saved to ./save_converted_model_dir\decoder_layer_0.xml
decode start compiling
decode end compiling
Model saved to ./save_converted_model_dir\decoder_layer_1.xml
prefill start compiling
prefill end compiling
Model saved to ./save_converted_model_dir\decoder_layer_prefill.xml
start compiling
Model saved to ./save_converted_model_dir\lm_head.xml
start compiling
C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\ipex_llm\transformers\npu_model.py:49: UserWarning: Model is already saved at ./save_converted_model_dir
warnings.warn(f"Model is already saved at {self.save_directory}")
2025-02-25 15:52:03,955 - INFO - Converted model has already saved to ./save_converted_model_dir.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

loading of model costs 164.65547030000005s
model generate cost: 3.3208497000000534
<|begin▁of▁sentence|><|begin▁of▁sentence|>461 U.S. 238 (1983) OLIM ET AL. v. WAKINEKONA No. 201, 201, 202, 203, 204, 205, 2
model generate cost: 2.9965502000000015
<|begin▁of▁sentence|><|begin▁of▁sentence|>461 U.S. 238 (1983) OLIM ET AL. v. WAKINEKONA No. 201, 201, 202, 203, 204, 205, 2
model generate cost: 3.0089657999999417
<|begin▁of▁sentence|><|begin▁of▁sentence|>461 U.S. 238 (1983) OLIM ET AL. v. WAKINEKONA No. 201, 201, 202, 203, 204, 205, 2
model generate cost: 2.998917000000006
<|begin▁of▁sentence|><|begin▁of▁sentence|>461 U.S. 238 (1983) OLIM ET AL. v. WAKINEKONA No. 201, 201, 202, 203, 204, 205, 2
Traceback (most recent call last):
File "C:\Users\intel\model\ipex-llm-main\python\llm\dev\benchmark\all-in-one\run.py", line 2338, in <module>
run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
File "C:\Users\intel\model\ipex-llm-main\python\llm\dev\benchmark\all-in-one\run.py", line 197, in run_model
result = transformers_int4_npu_win(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, optimize_model, transpose_value_cache, group_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\model\ipex-llm-main\python\llm\dev\benchmark\all-in-one\run.py", line 673, in transformers_int4_npu_win
output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=out_len,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\ipex_llm\transformers\npu_models\convert.py", line 338, in generate
return simple_generate(self, inputs=inputs, streamer=streamer, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\ipex_llm\transformers\npu_models\convert.py", line 404, in simple_generate
token = run_prefill(self.model_ptr, input_list, self.vocab_size,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\ipex_llm\transformers\npu_models\npu_llm_cpp.py", line 82, in run_prefill
plogits = _lib.run_prefill(model_ptr, input_ptr, input_len, repetition_penalty, False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: exception: access violation writing 0x000001F559302000
Exception ignored in: <function BaseNPUBackendWithPrefetch.__del__ at 0x000001F09D548360>
Traceback (most recent call last):
File "C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\intel_npu_acceleration_library\backend\base.py", line 245, in __del__
super(BaseNPUBackendWithPrefetch, self).__del__()
File "C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\intel_npu_acceleration_library\backend\base.py", line 54, in __del__
backend_lib.destroyNNFactory(self._mm)
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
Exception ignored in: <function BaseNPUBackendWithPrefetch.__del__ at 0x000001F09D548360>
Traceback (most recent call last):
File "C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\intel_npu_acceleration_library\backend\base.py", line 245, in __del__
super(BaseNPUBackendWithPrefetch, self).__del__()
File "C:\Users\intel\miniforge3\envs\npu\Lib\site-packages\intel_npu_acceleration_library\backend\base.py", line 54, in __del__
backend_lib.destroyNNFactory(self._mm)
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
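
In case it helps with triage, here is a minimal sketch for pulling the per-run generate timings and the access-violation addresses out of a log like the one above. It is not part of ipex-llm or the benchmark scripts; the function name `summarize_log` and the regular expressions are my own assumptions, written only against the line formats visible in this log.

```python
import re

# Patterns are assumptions based solely on the log lines shown in this issue.
GEN_COST = re.compile(r"model generate cost:\s*([0-9.]+)")
VIOLATION = re.compile(r"access violation (reading|writing) (0x[0-9A-Fa-f]+)")


def summarize_log(text):
    """Return (generate costs in seconds, [(access kind, address), ...])."""
    costs = [float(m.group(1)) for m in GEN_COST.finditer(text)]
    violations = [(m.group(1), m.group(2)) for m in VIOLATION.finditer(text)]
    return costs, violations


# A few lines copied from the log above, used as sample input.
sample = """model generate cost: 3.3208497000000534
model generate cost: 2.9965502000000015
OSError: exception: access violation writing 0x000001F559302000
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
"""

costs, violations = summarize_log(sample)
print(f"{len(costs)} successful generate calls before the crash")
for kind, address in violations:
    print(f"access violation {kind} at {address}")
```

In this log, the first violation is the write raised inside `run_prefill`; the later reads at 0xFFFFFFFFFFFFFFFF are raised while the backend objects are being destroyed.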
