`docs/source/en/model_doc/glm4_moe.md` (15 additions, 1 deletion)
## Overview
The [**GLM-4.5**](https://arxiv.org/abs/2508.06471) series models are foundation models designed for intelligent agents; the MoE variants are documented here as Glm4Moe.

GLM-4.5 has **355** billion total parameters with **32** billion active parameters, while GLM-4.5-Air adopts a more compact design with **106** billion total parameters and **12** billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses.
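
In practice, the mode is usually selected when the prompt is built. The snippet below is a hedged sketch of that pattern using the Transformers chat template; the checkpoint id `zai-org/GLM-4.5-Air` and the `enable_thinking` template keyword are assumptions and may differ in the released chat template.

```python
from transformers import AutoTokenizer

# Assumed checkpoint id; replace with the released GLM-4.5 or GLM-4.5-Air repository name.
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air")

messages = [{"role": "user", "content": "Outline a plan for debugging a flaky integration test."}]

# Thinking mode: assumes the chat template exposes an `enable_thinking` switch.
thinking_prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False, enable_thinking=True
)

# Non-thinking mode for immediate responses.
direct_prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False, enable_thinking=False
)
```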
We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development.

As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of **63.2**, ranking **3rd** among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at **59.8** while maintaining superior efficiency. For more evaluation results, showcases, and technical details, please visit our [technical report](https://arxiv.org/abs/2508.06471) or [technical blog](https://z.ai/blog/glm-4.5).

The model code, tool parser, and reasoning parser can be found in the [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe), [vLLM](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py), and [SGLang](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py) implementations.
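
To get oriented quickly, here is a minimal text-generation sketch against the Transformers implementation. It assumes the hypothetical checkpoint id `zai-org/GLM-4.5-Air` and enough accelerator memory for a 106B-parameter MoE model; treat it as an illustration rather than an official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.5-Air"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain what a mixture-of-experts layer does."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For serving, vLLM and SGLang ship their own launch commands and parser options for this architecture; refer to the linked implementations for the exact flags.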
`docs/source/en/model_doc/glm4v_moe.md` (11 additions, 9 deletions)
## Overview
Vision-language models (VLMs) have become a key cornerstone of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs urgently need to enhance reasoning capabilities beyond basic multimodal perception, improving accuracy, comprehensiveness, and intelligence, to enable complex problem solving, long-context understanding, and multimodal agents.

Through our open-source work, we aim to explore the technological frontier together with the community while empowering more developers to create exciting and innovative applications.

[GLM-4.5V](https://github.com/zai-org/GLM-V) is based on ZhipuAI's next-generation flagship text foundation model GLM-4.5-Air (106B total parameters, 12B active). It continues the technical approach of [GLM-4.1V-Thinking](https://arxiv.org/abs/2507.01006), achieving SOTA performance among models of the same scale on 42 public vision-language benchmarks. It covers common tasks such as image, video, and document understanding, as well as GUI agent operations.

Beyond benchmark performance, GLM-4.5V focuses on real-world usability. Through efficient hybrid training, it can handle diverse types of visual content, enabling full-spectrum vision reasoning, including:

- **Complex chart & long document parsing** (research report analysis, information extraction)
- **Grounding** (precise visual element localization)

The model also introduces a **Thinking Mode** switch, allowing users to balance between quick responses and deep reasoning. This switch works the same way as in the `GLM-4.5` language model.
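
To make the workflow concrete, here is a minimal image-understanding sketch using the generic auto classes in Transformers. It is a sketch under stated assumptions rather than an official recipe: the checkpoint id `zai-org/GLM-4.5V`, the example image URL, and the processor's chat-template behavior are assumptions that may differ in the released model.

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "zai-org/GLM-4.5V"  # assumed repository name

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A single-turn request mixing an image and a text question (placeholder URL).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/report_chart.png"},
            {"type": "text", "text": "Describe the main trend shown in this chart."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```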