@Kaihui-intel Kaihui-intel commented Oct 30, 2025

Accuracy

| scheme (opt-125m) | format | RTN | iter>0 |
|---|---|---|---|
| W4A16 | auto_round | 0.2882 | 0.3526 |
| W2A16 | auto_round | | 0.1657 |
| W3A16 | auto_round | | 0.3247 |
| W8A16 | auto_round | | 0.3784 |
| bits group_size 32 | auto_round | 0.3749 | 0.3679 |
| bits group_size 32 | auto_gptq | 0.3747 | 0.3658 |
| bits group_size 32 | auto_awq | 0.3749 | 0.3646 |

#788

Memory

| memory check | Qwen2.5-7B-Instruct-w4g32, RTN, auto_round |
|---|---|
| mprof peak | 16659.441 MiB -> 9200.250 MiB (new peak is ~55% of the previous peak) |
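The "~55%" figure above can be read either as the remaining fraction or as the reduction. A quick check of the arithmetic, using only the two mprof numbers reported above (the variable names are illustrative), suggests it is the remaining fraction:

```python
# Peak memory reported by mprof for Qwen2.5-7B-Instruct-w4g32 (RTN, auto_round), in MiB.
peak_before_mib = 16659.441
peak_after_mib = 9200.250

# Fraction of the old peak that is still used, and the corresponding reduction.
ratio = peak_after_mib / peak_before_mib
reduction = 1.0 - ratio
saved_mib = peak_before_mib - peak_after_mib

print(f"new peak is {ratio:.1%} of the old peak")
print(f"peak reduced by {reduction:.1%} ({saved_mib:.3f} MiB)")
```

So the peak drops to roughly 55% of its previous value, which is a reduction of roughly 45%.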

Time

Quantization and saving time

opt-125m

| branch | RTN | iter>0 |
|---|---|---|
| cur branch | 67s | 78s |
| main branch | 50s | 88s |

Qwen2.5-7B-Instruct

| branch | RTN | iter>0 |
|---|---|---|
| cur branch | 4min4s | 18min10s |
| main branch | 3min54s | 17min53s |

Immediate packing and saving currently supports only formats[0].

Kaihui-intel and others added 8 commits October 16, 2025 01:41
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@wenhuach21

Thanks for the great work! Could you check the maximum RAM usage to see whether it has been reduced significantly, as expected?

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@xin3he xin3he modified the milestones: 1.0, 0.9.0 Oct 30, 2025
Kaihui-intel and others added 8 commits October 31, 2025 01:07
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@wenhuach21 wenhuach21 changed the title Support for immediate saving [High Risk]Support for immediate saving Oct 31, 2025
Kaihui-intel and others added 6 commits November 4, 2025 02:54
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Kaihui-intel and others added 3 commits November 5, 2025 08:30
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@wenhuach21

Since this PR is still in development and may not have undergone much verification, I suggest adding a low_cpu_mem_usage option to the API. If enabled, we can trigger save_immediately whenever possible.
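A minimal sketch of how such a flag might gate immediate saving. Everything here (`quantize_and_save`, `quantize_block`, `write_block`, the JSON file layout) is hypothetical and is not the auto-round API; only the option name `low_cpu_mem_usage` comes from the comment above. The point is the control flow: when the flag is on, each block is written out and released as soon as it is quantized, instead of being held in RAM until the end.

```python
import json
import os
import tempfile
from typing import Dict, List


def quantize_block(weights: List[float]) -> List[int]:
    """Stand-in for real quantization: round each weight to an integer."""
    return [round(w) for w in weights]


def write_block(out_dir: str, name: str, packed: List[int]) -> str:
    """Hypothetical writer: one JSON file per block."""
    path = os.path.join(out_dir, f"{name}.json")
    with open(path, "w") as f:
        json.dump(packed, f)
    return path


def quantize_and_save(
    blocks: Dict[str, List[float]],
    out_dir: str,
    low_cpu_mem_usage: bool = False,
) -> List[str]:
    """Quantize every block; save immediately only if low_cpu_mem_usage is set."""
    saved: List[str] = []
    pending: Dict[str, List[int]] = {}
    for name in list(blocks):
        packed = quantize_block(blocks[name])
        if low_cpu_mem_usage:
            # Immediate path: write now, then drop the original weights.
            saved.append(write_block(out_dir, name, packed))
            blocks[name] = []
        else:
            # Default path: keep everything in memory until the loop ends.
            pending[name] = packed
    for name, packed in pending.items():
        saved.append(write_block(out_dir, name, packed))
    return saved


if __name__ == "__main__":
    demo = {"layer0": [0.2, 1.7], "layer1": [-1.2, 3.9]}
    with tempfile.TemporaryDirectory() as out:
        print(quantize_and_save(demo, out, low_cpu_mem_usage=True))
```

Keeping the default path unchanged means the less-verified immediate path only runs when a user opts in, which matches the caution expressed in the comment.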

Kaihui-intel and others added 4 commits November 6, 2025 00:29
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@wenhuach21 wenhuach21 self-requested a review November 7, 2025 05:05
@wenhuach21 wenhuach21 changed the title [High Risk]Support for immediate saving Support for immediate saving to reduce ram usage Nov 7, 2025
@wenhuach21 wenhuach21 merged commit daeb3bb into main Nov 7, 2025
23 checks passed
@wenhuach21 wenhuach21 deleted the kaihui/save_block branch November 7, 2025 05:06