No performance improvement between FP32 and FP32-INT4_ASYM #154

Closed
JunxiChhen opened this issue Jan 17, 2024 · 0 comments
JunxiChhen commented Jan 17, 2024

I am running Qwen-7B on SPR (Sapphire Rapids).
I found no significant performance improvement between the FP32 model and the compressed FP32-INT4_ASYM model.

FP32 benchmark command:

python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/FP32 -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino

Latency: 129.71 ms/token

INT4 benchmark command:

python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/compressed_weights/OV_FP32-INT4_ASYM -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino

Latency: 121.91 ms/token
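
To put a number on the gap, here is a minimal sketch that compares the two latencies reported above. The function and variable names are illustrative and not part of benchmark.py; only the two ms/token figures come from the runs.

```python
# Latencies reported by the two benchmark runs above, in ms/token.
fp32_latency_ms = 129.71
int4_latency_ms = 121.91


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def latency_reduction_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Return the latency reduction of the candidate as a percentage of the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


ratio = speedup(fp32_latency_ms, int4_latency_ms)
reduction = latency_reduction_pct(fp32_latency_ms, int4_latency_ms)
print(f"Speedup: {ratio:.3f}x, latency reduction: {reduction:.1f}%")
```

This works out to a speedup of about 1.06x, or roughly 6% lower latency per token for INT4.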

Did I miss something important in the benchmark command, or is this an issue?
