inference chatglm3-6b with int4 and 8k input prompt failed #10511

Closed
Fred-cell opened this issue Mar 22, 2024 · 3 comments

@Fred-cell

bigdl-llm version: 2.5.0b20240321, run with the all-in-one benchmark tool.
The 8k prompt is taken from https://github.com/intel/xFasterTransformer/blob/main/benchmark/prompt.json
2024-03-22 20:38:03,260 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00, 1.52it/s]
2024-03-22 20:38:08,095 - INFO - Converting the current model to sym_int4 format......

loading of model costs 8.393956548999995s and 3.583984375GB
<class 'transformers_modules.chatglm3-6b.modeling_chatglm.ChatGLMForConditionalGeneration'>
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524544,0,0], local id: [256,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524545,0,0], local id: [257,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524546,0,0], local id: [258,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524547,0,0], local id: [259,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524548,0,0], local id: [260,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524549,0,0], local id: [261,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed
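
For reference, the condition quoted in the assertion (`index >= -sizes[i] && index < sizes[i]`) can be mirrored in plain Python. This is only a sketch of the bounds rule as it appears in the log, not the IPEX kernel itself; the function names and the dimension size 8192 are hypothetical and chosen purely for illustration.

```python
def index_in_bounds(index: int, size: int) -> bool:
    """Mirror the asserted condition: index >= -size and index < size."""
    return -size <= index < size


def out_of_bounds(indices, size):
    """Return the indices that would violate the condition for a dimension of `size`."""
    return [i for i in indices if not index_in_bounds(i, size)]


if __name__ == "__main__":
    # Hypothetical dimension size, used only to show which values fail the check.
    size = 8192
    print(out_of_bounds([0, 8191, 8192, -8192, -8193], size))
```

Negative indices down to `-size` are accepted, so only values at or beyond `size`, or below `-size`, fail the check.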

@hkvision
Contributor

hkvision commented Mar 29, 2024

This should be the same issue as #10513.

In our tests, when the input length is greater than 8166, the same error as above appears. When the input length is less than or equal to 8166, a different error appears, similar to the llama2 8k issue: an IPEX allocation error:
[image]

The root cause is likely the same. This needs further investigation.

@uniartisan

uniartisan commented May 17, 2024

oneapi-src/oneDNN#1638

intel/intel-extension-for-pytorch#325
This seems to be related to the two issues linked above.

@lalalapotter
Contributor

Hi @uniartisan, thanks for your suggestions. We currently work around this issue with PR #10648, so I am closing the issue for now.
