
BigDL-A750-Qwen7b-Allocation is out of device memory on current platform. #10575

Open · ChenVkl opened this issue Mar 28, 2024 · 12 comments

ChenVkl commented Mar 28, 2024

When I use an A750 to run BigDL and load the Qwen-7B int4 model, it reports that device memory is exceeded. I don't know what's going on; is there a problem with my setup?
The following is the error message:
Traceback (most recent call last):
File "D:\workspace\text-generation-webui-bigdl-llm\modules\text_generation.py", line 408, in generate_reply_HF
shared.model.generate(**generate_params)
File "C:\Users\ZhangChen\.cache\huggingface\modules\transformers_modules\Qwen-7B\modeling_qwen.py", line 1259, in generate
return super().generate(
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\transformers\generation\utils.py", line 1525, in generate
return self.sample(
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\transformers\generation\utils.py", line 2622, in sample
outputs = self(
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\ZhangChen\.cache\huggingface\modules\transformers_modules\Qwen-7B\modeling_qwen.py", line 1060, in forward
lm_logits = self.lm_head(hidden_states)
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\ZhangChen\.conda\envs\llm\lib\site-packages\bigdl\llm\transformers\low_bit_linear.py", line 622, in forward
result = linear_q4_0.forward_new(x_2d, self.weight.data, self.weight.qtype,
RuntimeError: Allocation is out of device memory on current platform.
Output generated in 12.06 seconds (0.00 tokens/s, 0 tokens, context 730, seed 290229866)
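
Note: the failing call is the lm_head projection, which materializes logits over Qwen's full vocabulary for every prompt position. A back-of-envelope calculation (a sketch, assuming Qwen-7B's published vocab size of 151,936) shows why this single allocation is large at context 730:

```python
# Rough size of the lm_logits tensor allocated by self.lm_head(hidden_states)
# for the 730-token prompt in the log. Vocab size assumed from Qwen-7B's
# config.json (151,936); the actual dtype depends on how the model was loaded.
vocab_size = 151936
seq_len = 730

bytes_fp32 = seq_len * vocab_size * 4   # 4 bytes per fp32 element
bytes_fp16 = seq_len * vocab_size * 2   # 2 bytes per fp16 element
print(f"lm_logits fp32: {bytes_fp32 / 1024**2:.0f} MiB")  # ~423 MiB
print(f"lm_logits fp16: {bytes_fp16 / 1024**2:.0f} MiB")  # ~212 MiB
```

On an 8 GB card already holding the quantized weights, the KV cache, and any system usage, a transient spike of this size can plausibly tip it over.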

(screenshot attached)
hkvision (Contributor) commented:

Hi,

Thanks for raising this issue; we'd like to confirm a few things:

  • What's the initial GPU memory occupied by the system before you run the model inference?
  • What input length do you chat with the model? Is it the "context 730" shown in your log?

We are converting the loaded model into fp16 for lower memory usage, as the Arc A750 only has 8 GB of memory. (Follow-up in this issue: intel-analytics/text-generation-webui#25)
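
For readers following along, here is a minimal sketch of loading Qwen-7B with 4-bit weights via bigdl-llm on an Intel GPU (not the webui's actual code; the checkpoint path is a placeholder and the .half() step mirrors the fp16 change described above):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from bigdl.llm.transformers import AutoModelForCausalLM

# Load with int4-quantized linear weights; trust_remote_code is needed
# because Qwen ships its own modeling_qwen.py.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/Qwen-7B",        # placeholder: your local checkpoint
    load_in_4bit=True,
    trust_remote_code=True,
)
model = model.half().to("xpu")  # fp16 for non-quantized parts, move to the Arc GPU
```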

ChenVkl closed this as completed Apr 2, 2024
ChenVkl (Author) commented Apr 2, 2024

> What's the initial GPU memory occupied by the system before you run the model inference? What input length do you chat with the model?

Regarding your questions: the initial GPU memory occupied by the system is approximately 1.6 GB.
When I chat with the model, even a simple question results in an error saying the memory is exceeded.
I have now switched to the Qwen-7B-int4 model to see whether it can run on BigDL; for that specific issue, please refer to #10616.

hkvision (Contributor) commented Apr 3, 2024

Some suggestions from our side for you to possibly run Qwen-7B on the Arc A750:

  • Use the latest ipex-llm (we have renamed bigdl-llm to ipex-llm) and export IPEX_LLM_LOW_MEM=1 before you launch the WebUI.
  • Could you close some applications that occupy GPU memory? If 1.6 GB is already taken before running our workload, the remaining 6.4 GB may be challenging for Qwen-7B, I suppose. (A quick way to check the headroom is sketched below.)
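
A quick check of device memory before loading, as a sketch: torch.xpu comes from intel_extension_for_pytorch, and the exact property names are an assumption that may vary across versions.

```python
import torch
import intel_extension_for_pytorch as ipex  # enables torch.xpu

props = torch.xpu.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB total")
# Memory already allocated by this PyTorch process (not the whole system;
# use xpu-smi for a system-wide view):
print(f"allocated: {torch.xpu.memory_allocated(0) / 1024**2:.1f} MiB")
```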

ChenVkl (Author) commented Apr 7, 2024

> Use the latest ipex-llm (we have renamed from bigdl-llm to ipex-llm) and export IPEX_LLM_LOW_MEM=1 before you launch the WebUI.

OK, I see the latest link; I'll give it a try. Thanks a lot.

ChenVkl (Author) commented Apr 7, 2024

> If before running our workload 1.6G is already occupied, then the remaining 6.4G may be challenging to run Qwen, I suppose.

I'd like to ask whether you have run Qwen on an A750 before, and how much GPU memory it takes. Thanks.

hkvision (Contributor) commented Apr 7, 2024

> I'd like to ask whether you have run Qwen on an A750 before, and how much GPU memory it takes.

I haven't tried text-generation-webui, but for simple generation Qwen-7B can run on the Arc A750; for a 256-token input, the memory I observe is 5290.11 MB. This value is from xpu-smi and may not be the actual peak memory; I suppose the peak would be close to, or larger than, 6 GB.

Some suggestions you may try on your side:

ChenVkl (Author) commented Apr 9, 2024

> qwen-7b can run on Arc750 ... for 256 input the memory I observe is 5290.11M.

Thank you very much; I'll give it a try.
In addition, I'd like to ask: you said you can run Qwen-7B on the A750. Which link do you use? Could you please send it to me if it's convenient?

hkvision (Contributor) commented Apr 9, 2024

I'm using https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/dev/benchmark/all-in-one with the api transformer_int4_fp16_gpu in config.yaml, with export IPEX_LLM_LOW_MEM=1 and bash run-arc.sh.
Is that the link you want?
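
For reference, the relevant part of that benchmark's config.yaml looks roughly like the following (a sketch based on my reading of the repo's sample config; field names and values may differ between versions, and the paths are placeholders):

```yaml
repo_id:
  - 'Qwen/Qwen-7B-Chat'          # placeholder; point at your checkpoint
local_model_hub: 'path/to/models'
low_bit: 'sym_int4'              # int4 weights
in_out_pairs:
  - '256-64'                     # input/output token lengths to test
test_api:
  - 'transformer_int4_fp16_gpu'  # int4 weights + fp16 activations on XPU
```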

Daroude commented Apr 10, 2024

Can you clarify where export IPEX_LLM_LOW_MEM=1 needs to be put? When I type it in conda before starting server.py, I get:

'export' is not recognized as an internal or external command, operable program or batch file.

My Arc A750 outputs the following error after a few interactions with the chatbot, which I assume is memory related:

RuntimeError: Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)

hkvision (Contributor) commented:

If you are running on Windows, please change it to set IPEX_LLM_LOW_MEM=1.
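
If editing the launch command is awkward, an alternative is to set the variable near the top of the startup script (a sketch; it assumes the variable is read by the library at import time, which I haven't verified against the webui's startup order):

```python
import os

# Set before bigdl-llm / ipex-llm is imported so the library can see it.
# IPEX_LLM_LOW_MEM is the low-memory switch mentioned earlier in this thread.
os.environ["IPEX_LLM_LOW_MEM"] = "1"
```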

Daroude commented Apr 12, 2024

Thanks. That doesn't seem to have been the issue, though. As soon as about 2,000+ context is reached, I get:

RuntimeError: Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error) Output generated in 1.29 seconds (0.00 tokens/s, 0 tokens, context 2308, seed 1309198421)

should I open a new ticket?

hkvision (Contributor) commented:

> As soon as about 2,000+ context is reached I get RuntimeError: Native API failed ... should I open a new ticket?

Sure, you can open a new ticket and give more details about your settings (system, version, how you run, etc.). We will try to reproduce this. Thanks!
