There is no performance improvement between FP32 and FP32-INT4_ASYM #154

JunxiChhen · 2024-01-17T09:34:07Z

I am running Qwen-7B on SPR.
I found no significant performance improvement between FP32 and the compressed FP32-INT4_ASYM model.

FP32 benchmarking cmd:

python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/FP32 -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino

Latency: 129.71 ms/token

INT4 benchmarking cmd:

python benchmark.py -m /root/.cache/huggingface/hub/Qwen-7B-Chat-ov/pytorch/dldt/compressed_weights/OV_FP32-INT4_ASYM -p "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. " -n 5 -ic 32 -bs 1 --num_beams 1 -d CPU --torch_compile_backend openvino

Latency: 121.91 ms/token
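
To put a number on the gap between the two runs, here is a minimal sketch that uses only the two latency figures reported above. The helper name `compare_latency` is hypothetical and is not part of benchmark.py.

```python
def compare_latency(baseline_ms: float, candidate_ms: float) -> dict:
    """Compare two per-token latencies given in milliseconds.

    Hypothetical helper for illustration only; not part of benchmark.py.
    """
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("latencies must be positive")
    return {
        # How many times faster the candidate is than the baseline.
        "speedup": baseline_ms / candidate_ms,
        # Relative latency reduction, in percent of the baseline.
        "reduction_pct": (baseline_ms - candidate_ms) / baseline_ms * 100.0,
        # Equivalent throughput in tokens per second.
        "baseline_tok_per_s": 1000.0 / baseline_ms,
        "candidate_tok_per_s": 1000.0 / candidate_ms,
    }


# Latencies reported in this issue: FP32 vs FP32-INT4_ASYM.
result = compare_latency(129.71, 121.91)
print(f"speedup: {result['speedup']:.3f}x")
print(f"latency reduction: {result['reduction_pct']:.2f}%")
print(f"throughput: {result['baseline_tok_per_s']:.2f} -> "
      f"{result['candidate_tok_per_s']:.2f} tokens/s")
```

With these figures, the INT4 run is only about 1.06x faster, a latency reduction of roughly 6%.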

Did I miss something important in the benchmarking command, or is this an issue?

JunxiChhen closed this as completed Jan 18, 2024
