
Support GLM-Image model quantization #1512

Merged
chensuyue merged 17 commits into main from lvl/support_glm_image on Mar 21, 2026

Conversation

@lvliang-intel
Contributor

Description

Quantize, save, and evaluate the zai-org/GLM-Image model in w4a16 format.

Model: https://huggingface.co/zai-org/GLM-Image
Target dtypes: w4a16

Save the quantized model for vllm-omni.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

#1509

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Copilot AI review requested due to automatic review settings March 8, 2026 13:24
pre-commit-ci bot and others added 3 commits March 8, 2026 13:26
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
…-round into lvl/support_glm_image

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Adds GLM-Image (zai-org/GLM-Image) support for w4a16 quantization workflows, including pipeline-style (model_index.json) loading and correct export layout for vLLM-Omni-style directories.

Changes:

  • Add GLM-Image multimodal block discovery and register glm_image in MLLM handling/template registries.
  • Support loading models from diffusers-style pipeline directories (local + remote) and propagate pipeline subfolder metadata for downstream export/sharding.
  • Update export paths to save model weights under the pipeline’s model component subfolder while saving processor/tokenizer assets in the appropriate location.
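The pipeline-directory loading in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the actual auto_round implementation; the function name and the default `transformer` component are assumptions:

```python
import json
import os


def find_pipeline_model_subfolder(pipeline_dir, component="transformer"):
    """Resolve the subfolder holding a pipeline component's weights.

    A diffusers-style pipeline directory contains a model_index.json
    mapping component names (e.g. "transformer", "vae") to
    (library, class) pairs; each component's weights live in a
    subfolder of the same name. Returns the subfolder name, or None
    if the directory is not a pipeline or lacks that component.
    """
    index_path = os.path.join(pipeline_dir, "model_index.json")
    if not os.path.isfile(index_path):
        return None  # plain (non-pipeline) model directory
    with open(index_path) as f:
        index = json.load(f)
    if component not in index:
        return None
    return component
```

The resolved subfolder would then be passed as `subfolder=` to `from_pretrained` and recorded on the model so export and shard writing can mirror the same layout.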

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/test_cpu/models/test_glm_image.py | Adds unit tests for GLM-Image helpers and pipeline subfolder discovery (but currently imports a missing module). |
| auto_round/utils/model.py | Adds local/remote pipeline subfolder resolution and uses it in mllm_load_model; passes subfolder to from_pretrained; tags loaded models with pipeline component metadata. |
| auto_round/utils/common.py | Adds vqmodel to multimodal key list. |
| auto_round/special_model_handler.py | Registers glm_image in limited-batch and only-text quantization lists; adds GLM-Image block-name helper to SPECIAL_MULTIMODAL_BLOCK. |
| auto_round/export/utils.py | Adds pipeline directory detection and export layout resolution; copies pipeline artifacts to output layout. |
| auto_round/export/export_to_autoround/export.py | Saves tokenizer/processor to processor output dir; saves model weights to model component dir when exporting pipeline models. |
| auto_round/export/export_to_autogptq/export.py | Same as above for AutoGPTQ export paths. |
| auto_round/compressors/shard_writer.py | Writes shard outputs under the pipeline model component subfolder when applicable. |
| auto_round/compressors/mllm/utils.py | Adds vqmodel to VISUAL_KEYS. |
| auto_round/compressors/mllm/template.py | Registers glm_image template using the HF processor. |
| auto_round/compressors/mllm/compressor.py | Ensures image_processor is forwarded into export/save path. |
| auto_round/autoround.py | Switches to MLLM mode when processor/image_processor are provided via kwargs. |
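The last row's mode switch amounts to a small predicate. A hedged sketch of the idea (names and signature are illustrative, not the actual autoround.py code):

```python
def should_use_mllm_mode(kwargs, explicit_mllm=False):
    """Decide whether to route into the multimodal (MLLM) pipeline.

    If the caller passes a processor or image_processor via kwargs,
    treat the model as multimodal even when MLLM mode was not
    requested explicitly.
    """
    if explicit_mllm:
        return True
    return (
        kwargs.get("processor") is not None
        or kwargs.get("image_processor") is not None
    )
```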

@lvliang-intel
Contributor Author

Inference test with quantized model using transformers:

1. T2I test

```
CUDA_VISIBLE_DEVICES=5 python run_glm_image.py --model-dir tmp_glm_image_w4a16 --prompt "A watercolor fox reading a book"
Loading GLM-Image pipeline from /mnt/disk1/lvl/auto-round-main/tmp_glm_image_w4a16 ...
Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:01<00:00, 1.60it/s]
Loading weights: 100%|████████████████████████████████████████████| 111/111 [00:00<00:00, 1417.02it/s]
Loading pipeline components...: 86%|██████████████████████████████ | 6/7 [00:05<00:00, 2.04it/s]2026-03-08 21:08:39 INFO device.py L1643: [Memory Monitor] Before Applying general replacements: 'peak_ram': 1.62GB
2026-03-08 21:08:39 WARNING modeling_utils.py L4430: loss_type=None was set in the config but it is unrecognized. Using the default loss: ForCausalLMLoss.
2026-03-08 21:08:39 INFO device.py L1646: [Memory Monitor] After Applying general replacements: 'peak_ram': 1.62GB
2026-03-08 21:08:39 WARNING backend.py L1084: Better backend is found, please install all the following requirements to enable it.
2026-03-08 21:08:39 WARNING backend.py L1084: pip install -v "gptqmodel>=2.0" --no-build-isolation
Loading weights: 100%|██████████████████████████████████████████| 1491/1491 [00:00<00:00, 1522.83it/s]
Loading pipeline components...: 100%|███████████████████████████████████| 7/7 [00:07<00:00, 1.07s/it]
Pipeline loaded.

Prompt: A watercolor fox reading a book
Mode: T2I
Reference images: 0
Height: 1152
Width: 768
Steps: 50
Guidance scale: 1.5
Seed: 42
Device map: cuda
100%|█████████████████████████████████████████████████████████████████| 50/50 [00:33<00:00, 1.49it/s]
Saved rendered image to glm_image_output.png
```

2. I2I test

```
CUDA_VISIBLE_DEVICES=5 python run_glm_image.py --model-dir tmp_glm_image_w4a16 --i2i-demo
Loading GLM-Image pipeline from /mnt/disk1/lvl/auto-round-main/tmp_glm_image_w4a16 ...
Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:01<00:00, 1.62it/s]
Loading weights: 100%|████████████████████████████████████████████| 111/111 [00:00<00:00, 1424.23it/s]
Loading pipeline components...: 86%|██████████████████████████████ | 6/7 [00:04<00:00, 1.21it/s]2026-03-08 21:21:30 INFO device.py L1643: [Memory Monitor] Before Applying general replacements: 'peak_ram': 1.69GB
2026-03-08 21:21:30 WARNING modeling_utils.py L4430: loss_type=None was set in the config but it is unrecognized. Using the default loss: ForCausalLMLoss.
2026-03-08 21:21:30 INFO device.py L1646: [Memory Monitor] After Applying general replacements: 'peak_ram': 1.69GB
2026-03-08 21:21:30 WARNING backend.py L1084: Better backend is found, please install all the following requirements to enable it.
2026-03-08 21:21:30 WARNING backend.py L1084: pip install -v "gptqmodel>=2.0" --no-build-isolation
Loading weights: 100%|██████████████████████████████████████████| 1491/1491 [00:00<00:00, 1520.15it/s]
Loading pipeline components...: 100%|███████████████████████████████████| 7/7 [00:07<00:00, 1.04s/it]
Pipeline loaded.

No reference image provided; generating a synthetic condition image ...
Prompt: Replace the background of the snow forest with an underground station featuring an automatic escalator.
Mode: I2I
Reference images: 1
Height: 1056
Width: 1024
Steps: 50
Guidance scale: 1.5
Seed: 42
Device map: cuda
100%|█████████████████████████████████████████████████████████████████| 50/50 [00:43<00:00, 1.14it/s]
Saved rendered image to glm_image_output_i2i.png
```

@lvliang-intel
Contributor Author

Currently only the autoregressive part is quantized; I'm working on designing a hybrid mode that quantizes both the autoregressive and diffusion parts.
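Restricting quantization to the autoregressive part can be sketched as a name filter over the model's blocks. The prefixes below are illustrative (vqmodel comes from this PR's key lists; the others are hypothetical), and real GLM-Image module names may differ:

```python
# Hypothetical non-text prefixes; only "vqmodel" is taken from this PR.
DIFFUSION_OR_VISION_PREFIXES = ("vqmodel", "vision_tower", "diffusion", "vae")


def select_quantizable_blocks(all_block_names):
    """Keep only autoregressive (text) blocks for quantization.

    Blocks whose names start with a known diffusion/vision prefix are
    skipped, so only the language-model transformer blocks are tuned.
    """
    return [
        name
        for name in all_block_names
        if not name.startswith(DIFFUSION_OR_VISION_PREFIXES)
    ]
```

A hybrid mode would instead partition the names into two groups and apply a separate quantization recipe to each.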

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
@chensuyue chensuyue added this to the 0.12.0 milestone Mar 16, 2026
lvliang-intel and others added 5 commits March 17, 2026 16:29
@lvliang-intel
Contributor Author

Verified with vllm-omni (based on PR vllm-project/vllm-omni#1777 and some adaptations):

1. Original GLM-Image model

```
cd vllm-omni/examples/offline_inference/glm_image
CUDA_VISIBLE_DEVICES=5 python end2end.py --model-path /mnt/disk5/lvl/GLM-Image --config-path /mnt/disk1/lvl/vllm-omni/vllm_omni/model_executor/stage_configs/glm_image.yaml --prompt "A cat sitting on the table" --output cat_orig.png
```

cat_orig

2. Quantized GLM-Image model

```
CUDA_VISIBLE_DEVICES=5 python end2end.py --model-path /mnt/disk1/lvl/auto-round-main/tmp_glm_image_w4a16/ --config-path /mnt/disk1/lvl/vllm-omni/vllm_omni/model_executor/stage_configs/glm_image.yaml --prompt "A cat sitting on the table" --output cat_quantized.png
```

cat_quantized

end2end.py

Contributor

@yiliu30 yiliu30 left a comment


LGTM

@chensuyue chensuyue merged commit 62f92d1 into main Mar 21, 2026
30 checks passed
@chensuyue chensuyue deleted the lvl/support_glm_image branch March 21, 2026 09:08