
Commit f6b6e17

Add glm4.5 && glm4.5V doc (#40095)
* Docs: GLM-4-MoE & GLM-4V-MoE pages
* Docs: polish GLM-4V-MoE intro, remove placeholders; pin image
* Docs

Co-authored-by: wujiahan <lambert@gmail.com>
1 parent 1c5e17c commit f6b6e17

File tree

2 files changed (+26 / -10 lines)


docs/source/en/model_doc/glm4_moe.md

Lines changed: 15 additions & 1 deletion
@@ -18,7 +18,21 @@ rendered properly in your Markdown viewer.
 
 ## Overview
 
-This will update After model release.
+The [**GLM-4.5**](https://arxiv.org/abs/2508.06471) series models are foundation models designed for intelligent agents; the MoE variants are documented here as Glm4Moe.
+
+GLM-4.5 has **355** billion total parameters with **32** billion active parameters, while GLM-4.5-Air adopts a more compact design with **106** billion total parameters and **12** billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
+
+Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses.
+
+We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development.
+
+In our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of **63.2**, ranking **3rd** among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at **59.8** while maintaining superior efficiency.
+
+![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench.png)
+
+For more evaluation results, showcases, and technical details, please visit our [technical report](https://arxiv.org/abs/2508.06471) or [technical blog](https://z.ai/blog/glm-4.5).
+
+The model code, tool parser, and reasoning parser can be found in the implementations of [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe), [vLLM](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py), and [SGLang](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py).
 
 ## Glm4MoeConfig
 
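To make the text-model workflow described above concrete, here is a minimal sketch with the Transformers API: it loads a GLM-4.5 checkpoint as a causal LM and requests the non-thinking mode through the chat template. The checkpoint id `zai-org/GLM-4.5-Air` and the `enable_thinking` template flag are illustrative assumptions, not names confirmed by the page above.

```python
# Minimal usage sketch for the Glm4Moe text model (assumptions noted below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.5-Air"  # assumed checkpoint id, shown for illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # FP8 variants would need kernels that support them
    device_map="auto",
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts routing."}]

# Extra keyword arguments are forwarded to the chat template; whether the released
# template honors `enable_thinking` is an assumption here, not a documented flag.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # assumed switch: request the non-thinking (immediate) mode
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Leaving the flag out (or setting it to `True`) would correspond to the thinking mode; how the reasoning segment is delimited in the output is what the reasoning parser mentioned above handles.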

docs/source/en/model_doc/glm4v_moe.md

Lines changed: 11 additions & 9 deletions
@@ -25,20 +25,22 @@ rendered properly in your Markdown viewer.
 
 ## Overview
 
-The Glm4vMoe model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
-<INSERT SHORT SUMMARY HERE>
+Vision-language models (VLMs) have become a key cornerstone of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs urgently need to enhance reasoning capabilities beyond basic multimodal perception (improving accuracy, comprehensiveness, and intelligence) to enable complex problem solving, long-context understanding, and multimodal agents.
 
-The abstract from the paper is the following:
+Through our open-source work, we aim to explore the technological frontier together with the community while empowering more developers to create exciting and innovative applications.
 
-*<INSERT PAPER ABSTRACT HERE>*
+[GLM-4.5V](https://github.com/zai-org/GLM-V) is based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air (106B total parameters, 12B active). It continues the technical approach of [GLM-4.1V-Thinking](https://arxiv.org/abs/2507.01006), achieving SOTA performance among models of the same scale on 42 public vision-language benchmarks. It covers common tasks such as image, video, and document understanding, as well as GUI agent operations.
 
-Tips:
+![bench_45](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench_45v.jpeg)
 
-<INSERT TIPS ABOUT MODEL HERE>
-
-This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
-The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).
+Beyond benchmark performance, GLM-4.5V focuses on real-world usability. Through efficient hybrid training, it can handle diverse types of visual content, enabling full-spectrum vision reasoning, including:
+- **Image reasoning** (scene understanding, complex multi-image analysis, spatial recognition)
+- **Video understanding** (long video segmentation and event recognition)
+- **GUI tasks** (screen reading, icon recognition, desktop operation assistance)
+- **Complex chart & long document parsing** (research report analysis, information extraction)
+- **Grounding** (precise visual element localization)
 
+The model also introduces a **Thinking Mode** switch, allowing users to balance between quick responses and deep reasoning. The switch works the same way as in the `GLM-4.5` language model.
 
 ## Glm4vMoeConfig
 
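As a rough sketch of the image-reasoning flow described above, the snippet below runs a single-image query through the processor and model and enables the Thinking Mode switch. The checkpoint id `zai-org/GLM-4.5V`, the sample image URL, and the `enable_thinking` template flag are illustrative assumptions rather than names taken from the page above.

```python
# Minimal single-image sketch for GLM-4.5V (assumptions noted below).
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "zai-org/GLM-4.5V"  # assumed checkpoint id, shown for illustration only

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
            {"type": "text", "text": "Describe this image and point out anything unusual."},
        ],
    }
]

# The processor's chat template can fetch the image and tokenize in one call.
# Passing `enable_thinking` assumes the template exposes the Thinking Mode switch
# described above; treat it as illustrative rather than a documented parameter.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=True,  # assumed switch: think before answering
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The same message format extends to the other task families listed above (multi-image, video, GUI, and document inputs), with the content list carrying the corresponding media entries.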
