
ChatGLM3 output token size is not generated as expected #390

Closed
JunxiChhen opened this issue Apr 26, 2024 · 6 comments

@JunxiChhen (Contributor)

Benchmark command (with -ic 128):

numactl -C 0-55 -m 0 python benchmark.py -m /root/.cache/huggingface/hub/chatglm3-6b-ov/pytorch/dldt/FP16 -p "It is done, and submitted..." -n 2 -bs 1 -d CPU --torch_compile_backend openvino -ic 128 --num_beams 1 -lc bfloat16_config.json 2>&1 | tee -a ./logs/0.log

[screenshot: benchmark log showing far fewer than 128 output tokens]

BTW, ChatGLM2's output size is right.

@peterchen-intel (Collaborator)

@JunxiChhen The reason is that ChatGLM3 emits its end token at output size 17 with this input prompt. The workaround is to update the prompt so that ChatGLM3 generates more tokens (>=128).
-ic sets the maximum output token size (i.e., the maximum number of inference iterations).
There is an ongoing PR #289 that tries to "force" generation of the expected output size even when the end token is produced.
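
For context, a minimal sketch of the "force the output size" idea, written against the Hugging Face transformers generate() API rather than the actual llm_bench change in PR #289; the model id and prompt here are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; PR #289 targets the llm_bench harness, not this script.
MODEL_ID = "THUDM/chatglm3-6b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

inputs = tokenizer("It is done, and submitted...", return_tensors="pt")

# min_new_tokens suppresses the end token until 128 new tokens have been
# generated, and max_new_tokens caps generation there, so exactly 128 tokens
# come out even if the model would otherwise stop early (here at ~17 tokens).
outputs = model.generate(**inputs, min_new_tokens=128, max_new_tokens=128)
print(outputs.shape[1] - inputs["input_ids"].shape[1])  # -> 128
```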

@peterchen-intel (Collaborator)

@JunxiChhen PR #289 has been merged. Please verify.

@yangkunx

When I updated to the latest commit id and ran the case, I got the error below:
[screenshot: error output]

@peterchen-intel (Collaborator)

@yangkunx This new issue should be fixed by PR #435.

@peterchen-intel (Collaborator)

PR #289 was just reverted due to its performance impact. A new PR #457 is WIP.

@peterchen-intel (Collaborator)

The new PR #457 has been merged. @yangkunx @JunxiChhen please verify.
