I am running Qwen-7B on SPR.
I found that there is no significant performance improvement between FP32 and the compressed FP32-INT4_ASYM model.
FP32 benchmarking cmd:
python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/FP32 -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino
Latency: 129.71 ms/token
INT4 benchmarking cmd:
python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/compressed_weights/OV_FP32-INT4_ASYM -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino
Latency: 121.91 ms/token
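To put a number on the gap, here is a quick sketch that computes the relative latency reduction and speedup from the two figures above. The variable names are my own and not part of benchmark.py. It comes out to roughly a 6% reduction.

```python
# Latencies reported by benchmark.py in the two runs above (ms/token).
fp32_latency_ms = 129.71
int4_latency_ms = 121.91

# Relative latency reduction and speedup of INT4_ASYM over FP32.
reduction_pct = (fp32_latency_ms - int4_latency_ms) / fp32_latency_ms * 100
speedup = fp32_latency_ms / int4_latency_ms

print(f"latency reduction: {reduction_pct:.1f}%")
print(f"speedup: {speedup:.2f}x")
```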
Did I miss something important in the benchmarking command, or is this an issue?