
Commit f6b6e17

Add glm4.5 && glm4.5V doc (#40095)
* Docs: GLM-4-MoE & GLM-4V-MoE pages
* Docs: polish GLM-4V-MoE intro, remove placeholders; pin image
* Docs

Co-authored-by: wujiahan <lambert@gmail.com>
1 parent 1c5e17c commit f6b6e17

File tree

2 files changed (+26 / -10 lines)


docs/source/en/model_doc/glm4_moe.md

Lines changed: 15 additions & 1 deletion
@@ -18,7 +18,21 @@ rendered properly in your Markdown viewer.
 
 ## Overview
 
-This will update After model release.
+The [**GLM-4.5**](https://arxiv.org/abs/2508.06471) series models are foundation models designed for intelligent agents; the MoE variants are documented here as Glm4Moe.
+
+GLM-4.5 has **355** billion total parameters with **32** billion active parameters, while GLM-4.5-Air adopts a more compact design with **106** billion total parameters and **12** billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
+
+Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses.
+
+We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development.
+
+In our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of **63.2**, ranking **3rd** among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at **59.8** while maintaining superior efficiency.
+
+![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench.png)
+
+For more evaluation results, showcases, and technical details, please visit our [technical report](https://arxiv.org/abs/2508.06471) or [technical blog](https://z.ai/blog/glm-4.5).
+
+The model code, tool parser, and reasoning parser can be found in the implementations of [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe), [vLLM](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py), and [SGLang](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py).
 
 ## Glm4MoeConfig
 
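To make the text-model workflow described above concrete, here is a minimal sketch with the Transformers API: it loads a GLM-4.5 checkpoint as a causal LM and requests the non-thinking mode through the chat template. The checkpoint id `zai-org/GLM-4.5-Air` and the `enable_thinking` template flag are illustrative assumptions, not names confirmed by the page above.

```python
# Minimal usage sketch for the Glm4Moe text model (assumptions noted below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.5-Air"  # assumed checkpoint id, shown for illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # FP8 variants would need kernels that support them
    device_map="auto",
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts routing."}]

# Extra keyword arguments are forwarded to the chat template; whether the released
# template honors `enable_thinking` is an assumption here, not a documented flag.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # assumed switch: request the non-thinking (immediate) mode
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Leaving the flag out (or setting it to `True`) would correspond to the thinking mode; how the reasoning segment is delimited in the output is what the reasoning parser mentioned above handles.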

docs/source/en/model_doc/glm4v_moe.md

Lines changed: 11 additions & 9 deletions
@@ -25,20 +25,22 @@ rendered properly in your Markdown viewer.
 
 ## Overview
 
-The Glm4vMoe model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
-<INSERT SHORT SUMMARY HERE>
+Vision-language models (VLMs) have become a key cornerstone of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs urgently need to enhance reasoning capabilities beyond basic multimodal perception (improving accuracy, comprehensiveness, and intelligence) to enable complex problem solving, long-context understanding, and multimodal agents.
 
-The abstract from the paper is the following:
+Through our open-source work, we aim to explore the technological frontier together with the community while empowering more developers to create exciting and innovative applications.
 
-*<INSERT PAPER ABSTRACT HERE>*
+[GLM-4.5V](https://github.com/zai-org/GLM-V) is based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air (106B total parameters, 12B active). It continues the technical approach of [GLM-4.1V-Thinking](https://arxiv.org/abs/2507.01006), achieving SOTA performance among models of the same scale on 42 public vision-language benchmarks. It covers common tasks such as image, video, and document understanding, as well as GUI agent operations.
 
-Tips:
+![bench_45](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench_45v.jpeg)
 
-<INSERT TIPS ABOUT MODEL HERE>
-
-This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
-The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).
+Beyond benchmark performance, GLM-4.5V focuses on real-world usability. Through efficient hybrid training, it can handle diverse types of visual content, enabling full-spectrum vision reasoning, including:
+- **Image reasoning** (scene understanding, complex multi-image analysis, spatial recognition)
+- **Video understanding** (long video segmentation and event recognition)
+- **GUI tasks** (screen reading, icon recognition, desktop operation assistance)
+- **Complex chart & long document parsing** (research report analysis, information extraction)
+- **Grounding** (precise visual element localization)
 
+The model also introduces a **Thinking Mode** switch, allowing users to balance between quick responses and deep reasoning. The switch works the same way as in the `GLM-4.5` language model.
 
 ## Glm4vMoeConfig
 
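As a rough sketch of the image-reasoning flow described above, the snippet below runs a single-image query through the processor and model and enables the Thinking Mode switch. The checkpoint id `zai-org/GLM-4.5V`, the sample image URL, and the `enable_thinking` template flag are illustrative assumptions rather than names taken from the page above.

```python
# Minimal single-image sketch for GLM-4.5V (assumptions noted below).
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "zai-org/GLM-4.5V"  # assumed checkpoint id, shown for illustration only

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
            {"type": "text", "text": "Describe this image and point out anything unusual."},
        ],
    }
]

# The processor's chat template can fetch the image and tokenize in one call.
# Passing `enable_thinking` assumes the template exposes the Thinking Mode switch
# described above; treat it as illustrative rather than a documented parameter.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=True,  # assumed switch: think before answering
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The same message format extends to the other task families listed above (multi-image, video, GUI, and document inputs), with the content list carrying the corresponding media entries.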
