Add GLM-4.7-Flash model cards (4bit, 5bit, 6bit, 8bit) #1214
Motivation
Add support for GLM-4.7-Flash, a lighter variant of GLM-4.7 with the `glm4_moe_lite` architecture. These models are smaller and faster while maintaining good performance.

Changes
Added 4 new model cards for GLM-4.7-Flash variants (a rough card sketch follows this change list):

- `glm-4.7-flash-4bit` (~18 GB)
- `glm-4.7-flash-5bit` (~21 GB)
- `glm-4.7-flash-6bit` (~25 GB)
- `glm-4.7-flash-8bit` (~32 GB)

All variants have:

- `n_layers: 47` (vs 91 in GLM-4.7)
- `hidden_size: 2048` (vs 5120 in GLM-4.7)
- `supports_tensor: True` (native `shard()` method)

Other changes:

- Bumped mlx from 0.30.1 to 0.30.3 - required by mlx-lm 0.30.4
- Updated mlx-lm from 0.30.2 to 0.30.4 - adds `glm4_moe_lite` architecture support
- Added type ignores in `auto_parallel.py` for the stricter type annotations in the new mlx-lm
- Fixed EOS token IDs for GLM-4.7-Flash - it uses a different tokenizer with IDs `[154820, 154827, 154829]` vs other GLM models' `[151336, 151329, 151338]`
- Renamed `MLX_IBV_DEVICES` to `MLX_JACCL_DEVICES` - the env var name changed in the new mlx (a fallback sketch follows below)
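For reference, here is a rough sketch of what one of the new cards could look like, using the attributes listed above. The field names `model_id`, `approx_size_gb`, and `eos_token_ids` are illustrative assumptions, not this repo's exact card schema:

```python
# Hypothetical card layout; the real model-card schema in this repo may differ.
GLM_4_7_FLASH_4BIT = {
    "model_id": "glm-4.7-flash-4bit",
    "approx_size_gb": 18,
    "n_layers": 47,           # vs 91 in GLM-4.7
    "hidden_size": 2048,      # vs 5120 in GLM-4.7
    "supports_tensor": True,  # native shard() arrives with mlx-lm 0.30.4
    "eos_token_ids": [154820, 154827, 154829],  # GLM-4.7-Flash tokenizer
}
```

On the env var rename: this PR simply switches the name, but if backward compatibility with an older mlx were needed, a fallback read is one option. `_jaccl_devices` is a hypothetical helper, not code from this PR:

```python
import os

def _jaccl_devices() -> str | None:
    # Prefer the new name used by current mlx; fall back to the old one so
    # environments still exporting MLX_IBV_DEVICES keep working.
    return os.environ.get("MLX_JACCL_DEVICES") or os.environ.get("MLX_IBV_DEVICES")
```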
Why It Works

The model cards follow the same pattern as the existing GLM-4.7 models. Tensor parallel support is enabled because GLM-4.7-Flash implements the native `shard()` method in mlx-lm 0.30.4, which is automatically detected in `auto_parallel.py`.
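A minimal sketch of that detection, assuming it reduces to probing the loaded model for a callable `shard` attribute; the actual logic in `auto_parallel.py` may be more involved:

```python
def has_native_shard(model) -> bool:
    # Models that implement native tensor parallelism in mlx-lm 0.30.4
    # expose a shard() method; probing for it is enough to flag support.
    return callable(getattr(model, "shard", None))
```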
GLM-4.7-Flash uses a new tokenizer with different special token IDs. Without the correct EOS tokens, generation wouldn't stop properly.
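To make the failure mode concrete, here is a small sketch of EOS-based stopping. The constant names are hypothetical; the ID lists are the ones from this PR:

```python
GLM_4_7_FLASH_EOS_IDS = {154820, 154827, 154829}  # new GLM-4.7-Flash tokenizer
OTHER_GLM_EOS_IDS = {151336, 151329, 151338}      # older GLM models

def should_stop(token_id: int, eos_ids: set[int]) -> bool:
    # With the wrong set, no generated token ever matches, so decoding
    # never stops at end-of-sequence and runs to the max-token limit.
    return token_id in eos_ids
```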
Test Plan
Manual Testing
Tested generation with GLM-4.7-Flash-4bit; it now correctly stops at EOS tokens.
Automated Testing
- `basedpyright`: 0 errors
- `ruff check`: All checks passed
- `pytest`: 162/162 tests pass (excluding pre-existing `test_distributed_fix.py` timeout failures)

🤖 Generated with Claude Code