[codex] Fix GLM OCR token forwarding #2216

Open

hansent wants to merge 2 commits into main from codex/glm-ocr-token-forwarding
Conversation


@hansent hansent commented Apr 8, 2026

Summary

This fixes truncated GLM OCR responses by addressing the token-limit path in both direct GLM inference and workflow remote execution.

What changed

  • Restored the GLM-OCR model wrapper's default max_new_tokens behavior when requests arrive with max_new_tokens=None.
  • Added max_new_tokens to the GLM OCR workflow block and forwarded it in both local and remote execution modes.
  • Extended the SDK LMM client to serialize max_new_tokens and enable_thinking so remote workflow execution can pass generation parameters through to /infer/lmm.
  • Fixed the Qwen3.5-VL workflow block's remote path to forward its existing generation parameters as well.
  • Added targeted unit coverage for GLM default token handling, workflow remote forwarding, and SDK request serialization.
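The SDK serialization change above can be sketched roughly as follows. This is a hypothetical illustration, not the actual code in inference_sdk/http/client.py; the function and field names here are assumptions. The key idea is to include the new generation parameters in the request body only when the caller sets them, so the server-side defaults still apply otherwise:

```python
def build_lmm_payload(prompt, max_new_tokens=None, enable_thinking=None):
    """Sketch: serialize optional generation parameters for /infer/lmm.

    Fields are included only when explicitly set, so omitting them lets
    the server fall back to the model's own defaults. (Illustrative only;
    the real SDK client may structure this differently.)
    """
    payload = {"prompt": prompt}
    if max_new_tokens is not None:
        payload["max_new_tokens"] = max_new_tokens
    if enable_thinking is not None:
        payload["enable_thinking"] = enable_thinking
    return payload
```

With this shape, remote workflow execution can forward a workflow block's `max_new_tokens` through the shared helper without clobbering server defaults when the value is absent.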

Root cause

There were two gaps working together:

  1. LMMInferenceRequest can legitimately carry max_new_tokens=None, but the GLM-OCR wrapper forwarded that None directly into generation instead of restoring the model default.
  2. The workflow remote execution path for GLM OCR did not expose or forward max_new_tokens, and the shared LMM SDK helper did not serialize that field either.

That combination made serverless workflow execution especially likely to use an unintended shorter generation limit.
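The first gap reduces to a small None-handling fix. The sketch below is illustrative only; the constant value and function name are assumptions, not the actual identifiers in glm_ocr_hf.py. The point is that a request-level `None` should mean "use the model default", never be passed through to generation as-is:

```python
# Assumed default for illustration; the real GLM-OCR wrapper defines its own.
DEFAULT_MAX_NEW_TOKENS = 4096

def resolve_max_new_tokens(requested):
    """Sketch of the fix: restore the model default when the incoming
    LMMInferenceRequest carries max_new_tokens=None, instead of
    forwarding None into the generation call."""
    return DEFAULT_MAX_NEW_TOKENS if requested is None else requested
```

Before the fix, the `None` flowed straight into generation, which on the serverless path effectively applied an unintended shorter limit.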

Impact

  • GLM OCR requests should no longer get cut short just because the token limit was omitted.
  • Workflow remote execution can now intentionally override the token cap for GLM OCR.
  • Qwen3.5-VL remote workflow execution now correctly forwards its generation settings too.

Validation

  • python -m py_compile inference_models/inference_models/models/glm_ocr/glm_ocr_hf.py inference_sdk/http/client.py inference/core/workflows/core_steps/models/foundation/glm_ocr/v1.py inference/core/workflows/core_steps/models/foundation/qwen3_5vl/v1.py tests/workflows/unit_tests/core_steps/models/foundation/test_vlm_remote_execution.py tests/inference_sdk/unit_tests/http/test_client.py inference_models/tests/unit_tests/models/test_glm_ocr_hf.py
  • git diff --check

Notes

pytest could not be run because it is not installed in this environment (python -m pytest reported No module named pytest).

@hansent hansent marked this pull request as ready for review April 9, 2026 17:32
