inference chatglm3-6b with int4 and 8k input prompt failed #10511

Fred-cell · 2024-03-22T12:56:55Z

bigdl-llm: 2.5.0b20240321, using the all-in-one benchmark tool.
The 8k prompt is taken from https://github.com/intel/xFasterTransformer/blob/main/benchmark/prompt.json
2024-03-22 20:38:03,260 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 7/7 [00:04<00:00, 1.52it/s]
2024-03-22 20:38:08,095 - INFO - Converting the current model to sym_int4 format......

loading of model costs 8.393956548999995s and 3.583984375GB
<class 'transformers_modules.chatglm3-6b.modeling_chatglm.ChatGLMForConditionalGeneration'>
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524544,0,0], local id: [256,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524545,0,0], local id: [257,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524546,0,0], local id: [258,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524547,0,0], local id: [259,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524548,0,0], local id: [260,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524549,0,0], local id: [261,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed


hkvision · 2024-03-29T01:59:45Z

This should be the same issue as #10513.

In our tests, an input length greater than 8166 produces the same error as above. An input length less than or equal to 8166 produces an error similar to the llama2 8k issue, namely the IPEX allocation error:

The root cause should be the same; this needs further investigation.
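
For anyone reproducing this, the split described above can be written down as a small check. This is only a sketch of the observation in this comment, not code from bigdl-llm or IPEX: the function name `expected_failure` and the constant `REPORTED_THRESHOLD` are made up for illustration, and the value 8166 is simply the boundary reported here. The thread does not say at which shorter length the allocation error stops appearing, so the function only distinguishes the two sides of that boundary for prompts around 8k tokens.

```python
# Boundary reported in this thread for chatglm3-6b with sym_int4.
# It is an observed value, not a documented limit.
REPORTED_THRESHOLD = 8166


def expected_failure(input_len: int, threshold: int = REPORTED_THRESHOLD) -> str:
    """Return which of the two reported errors a ~8k-token prompt would hit.

    input_len is the number of input tokens (hypothetical helper; the
    caller is assumed to have tokenized the prompt already).
    """
    if input_len <= 0:
        raise ValueError("input_len must be a positive token count")
    if input_len > threshold:
        # Matches the Indexing.h assertion in the original report.
        return "index out of bounds"
    # Matches the error described as similar to the llama2 8k issue.
    return "IPEX allocation error"


if __name__ == "__main__":
    for n in (8166, 8167):
        print(n, "->", expected_failure(n))
```

Counting the tokens of the benchmark prompt with the model's own tokenizer before running the benchmark shows which side of the boundary a given prompt falls on.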

uniartisan · 2024-05-17T13:28:08Z

oneapi-src/oneDNN#1638

intel/intel-extension-for-pytorch#325
It seems to be related to this.

lalalapotter · 2024-05-20T06:16:43Z

Hi @uniartisan, thanks for your suggestions. We currently work around this issue with PR #10648, so I am closing this issue for now.

jason-dai added the user issue label Mar 25, 2024

hkvision assigned NovTi Mar 25, 2024

hkvision assigned lalalapotter and unassigned NovTi Apr 2, 2024

hkvision mentioned this issue Apr 7, 2024

Heavy CPU bottleneck when working with Intel ARC A770 16GB GPU Inference #10668

Closed

lalalapotter closed this as completed May 20, 2024
