
Qwen-7B-Chat fails with inputs larger than 6.7k on the second or third run #11106

Closed
juan-OY opened this issue May 23, 2024 · 2 comments

juan-OY commented May 23, 2024

Running one task on MTL with -i 6707 -o 160 shows OOM, while a similar command passed in previous testing.

Traceback (most recent call last):
  File "C:\multi-modality\cvte_qwen\ultra_test_code_and_data\benchmark_test2intel\speed_test_ultra.py", line 241, in <module>
    infer_test(model, tokenizer, input_token_num, output_token_num, total_speed_file)
  File "C:\multi-modality\cvte_qwen\ultra_test_code_and_data\benchmark_test2intel\speed_test_ultra.py", line 108, in infer_test
    prefill_output = model(**model_inputs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\Qwen-7B-Chat-sym_int4\modeling_qwen.py", line 1060, in forward
    lm_logits = self.lm_head(hidden_states)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\ipex_llm\transformers\low_bit_linear.py", line 703, in forward
    result = linear_q4_0.forward_new(x_2d, self.weight.data, self.weight.qtype,
RuntimeError: XPU out of memory. Tried to allocate 2.37 GiB (GPU 0; 14.48 GiB total capacity; 6.94 GiB already allocated; 8.04 GiB reserved in total by PyTorch)

qiuxin2012 self-assigned this May 23, 2024
qiuxin2012 (Contributor) commented

To minimize MTL's memory usage, you can put the embedding in CPU memory by setting cpu_embedding=True when calling from_pretrained or load_low_bit. Qwen's embedding is about 1 GB.
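
For reference, a minimal sketch of what this looks like when loading the saved low-bit checkpoint; the model path is a placeholder, and apart from cpu_embedding=True it follows the usual ipex-llm loading pattern:

```python
# Sketch: load a saved sym_int4 Qwen checkpoint with ipex-llm, keeping the
# ~1 GB embedding table in CPU memory instead of XPU memory.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen-7B-Chat-sym_int4"  # placeholder: path to the saved low-bit model

model = AutoModelForCausalLM.load_low_bit(
    model_path,
    trust_remote_code=True,
    cpu_embedding=True,  # embedding runs on the CPU, freeing XPU memory
)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```

The same cpu_embedding=True keyword can be passed to from_pretrained when quantizing the model on the fly.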

juan-OY (Author) commented May 27, 2024

We can close it; the issue could not be reproduced again.

juan-OY closed this as completed May 27, 2024