ThinkingLevelSupportEN

neavo edited this page Feb 26, 2026 · 2 revisions

Overview

  • Some models support both a thinking mode and a normal (non-thinking) mode
  • You can switch between them by setting the Thinking Level in the API Settings
  • The thinking level (OFF/LOW/MEDIUM/HIGH) affects the model's thinking behavior, response time, and token consumption

API Settings

The Thinking Level option is located under Model Management -> Basic Settings. It accepts four values:

  • OFF: Disables thinking and uses the standard response mode
  • LOW: Enables thinking with a small token budget (e.g., 1024 tokens); suitable for simple logic or for reducing token consumption
  • MEDIUM: Allocates a medium token budget (e.g., 1536 tokens), balancing translation quality and speed
  • HIGH: Allocates the largest token budget (e.g., 2048 tokens or more); suitable for complex semantic understanding and rare topics
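
As a rough illustration of the level-to-budget mapping described above, the following Python sketch uses the example figures from the list (1024, 1536, 2048). The names `ThinkingLevel`, `EXAMPLE_BUDGETS`, and `thinking_budget` are hypothetical and not taken from the application; the real budgets may differ per model.

```python
from enum import Enum
from typing import Optional


class ThinkingLevel(Enum):
    """The four values offered by the Thinking Level setting."""
    OFF = "OFF"
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


# Example budgets copied from the list above; these are illustrative
# assumptions, not the values the application necessarily uses.
EXAMPLE_BUDGETS = {
    ThinkingLevel.LOW: 1024,
    ThinkingLevel.MEDIUM: 1536,
    ThinkingLevel.HIGH: 2048,
}


def thinking_budget(level: ThinkingLevel) -> Optional[int]:
    """Return the example token budget, or None when thinking is OFF."""
    return EXAMPLE_BUDGETS.get(level)


if __name__ == "__main__":
    for level in ThinkingLevel:
        print(level.value, thinking_budget(level))
```

Returning `None` for OFF keeps "thinking disabled" distinct from "thinking enabled with a budget of zero", which some providers may treat differently.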

Supported Models

  • Models that only support the thinking toggle (on/off)
    • GLM series
      • LOW / MEDIUM / HIGH: thinking ON
      • OFF: thinking OFF
    • Kimi series
      • LOW / MEDIUM / HIGH: thinking ON
      • OFF: thinking OFF
    • DeepSeek series
      • LOW / MEDIUM / HIGH: thinking ON
      • OFF: thinking OFF
  • Models that only support thinking levels
    • Gemini 2.5 Pro / Gemini 3 Pro / Gemini 3.1 Pro
      • HIGH: takes effect normally
      • OFF / LOW / MEDIUM: treated as LOW
  • Models that fully support both the thinking toggle and thinking levels
    • doubao-seed-1.6 / doubao-seed-1.8 / doubao-seed-2.0 series
    • Gemini 2.5 Flash / Gemini 3 Flash
    • Claude Sonnet 3.7 / Claude Haiku 4.x / Claude Sonnet 4.x / Claude Opus 4.x
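
The three support categories above can be summarized as a small resolution function. This is a minimal sketch of the documented behavior only; the category labels (`"toggle_only"`, `"levels_only"`, `"full"`) and the function `effective_level` are hypothetical names introduced for illustration, not identifiers from the application.

```python
def effective_level(support: str, level: str) -> str:
    """Resolve the behavior a model actually exhibits for a requested level.

    support: hypothetical category label, one of
             "toggle_only", "levels_only", "full".
    level:   one of "OFF", "LOW", "MEDIUM", "HIGH".
    """
    if level not in {"OFF", "LOW", "MEDIUM", "HIGH"}:
        raise ValueError(f"unknown thinking level: {level}")

    if support == "toggle_only":
        # GLM / Kimi / DeepSeek: any non-OFF level simply turns thinking on.
        return "OFF" if level == "OFF" else "ON"
    if support == "levels_only":
        # Gemini Pro models: only HIGH is distinct; the rest behave as LOW.
        return "HIGH" if level == "HIGH" else "LOW"
    if support == "full":
        # doubao-seed / Gemini Flash / Claude: the level is honored as set.
        return level
    raise ValueError(f"unknown support category: {support}")


if __name__ == "__main__":
    print(effective_level("toggle_only", "MEDIUM"))
    print(effective_level("levels_only", "OFF"))
    print(effective_level("full", "MEDIUM"))
```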
