grzegorz-roboflow approved these changes on Apr 9, 2026.
Summary
This PR fixes truncated GLM OCR responses by repairing the token-limit handling in both direct GLM inference and remote workflow execution.
What changed
- Fixed `max_new_tokens` behavior when requests arrive with `max_new_tokens=None` (see the sketch after this list).
- Added `max_new_tokens` to the GLM OCR workflow block and forwarded it in both local and remote execution modes.
- Serialized `max_new_tokens` and `enable_thinking` in the shared LMM SDK helper so remote workflow execution can pass generation parameters through to `/infer/lmm`.
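To make the first item concrete, here is a minimal sketch of the `None` fallback, assuming a small helper inside the GLM-OCR wrapper; the function name and the default value are illustrative, not the repository's actual code.

```python
from typing import Optional

# Illustrative stand-in for the model's configured default; the real
# default lives in the GLM-OCR wrapper, not in this constant.
DEFAULT_MAX_NEW_TOKENS = 4096

def resolve_max_new_tokens(requested: Optional[int]) -> int:
    # Before the fix, a request carrying max_new_tokens=None was forwarded
    # verbatim into generation; falling back to the model default avoids an
    # unintended short generation limit.
    if requested is None:
        return DEFAULT_MAX_NEW_TOKENS
    return requested
```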
Root cause
There were two gaps working together:
- `LMMInferenceRequest` can legitimately carry `max_new_tokens=None`, but the GLM-OCR wrapper forwarded that `None` directly into generation instead of restoring the model default.
- The GLM OCR workflow block did not expose `max_new_tokens`, and the shared LMM SDK helper did not serialize that field either.

That combination made serverless workflow execution especially likely to use an unintended shorter generation limit.
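The SDK side of the gap can be pictured as payload construction that silently dropped generation parameters. The sketch below shows the fixed shape under assumed names; `build_lmm_payload` and the payload keys are assumptions for illustration, not the real `inference_sdk` helper.

```python
from typing import Any, Dict, Optional

def build_lmm_payload(
    image: Dict[str, Any],
    prompt: str,
    max_new_tokens: Optional[int] = None,
    enable_thinking: Optional[bool] = None,
) -> Dict[str, Any]:
    payload: Dict[str, Any] = {"image": image, "prompt": prompt}
    # Previously these two fields were never serialized, so /infer/lmm never
    # saw them and generation ran with an unintended server-side limit.
    if max_new_tokens is not None:
        payload["max_new_tokens"] = max_new_tokens
    if enable_thinking is not None:
        payload["enable_thinking"] = enable_thinking
    return payload
```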
Impact
Validation
```
python -m py_compile inference_models/inference_models/models/glm_ocr/glm_ocr_hf.py inference_sdk/http/client.py inference/core/workflows/core_steps/models/foundation/glm_ocr/v1.py inference/core/workflows/core_steps/models/foundation/qwen3_5vl/v1.py tests/workflows/unit_tests/core_steps/models/foundation/test_vlm_remote_execution.py tests/inference_sdk/unit_tests/http/test_client.py inference_models/tests/unit_tests/models/test_glm_ocr_hf.py
git diff --check
```
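Since `pytest` was unavailable (see Notes below), here is a minimal sketch of the kind of regression test the fix calls for, written against the illustrative `resolve_max_new_tokens` helper above rather than the repository's real wrapper.

```python
def test_none_falls_back_to_model_default():
    # A request that omits the limit must get the model default back.
    assert resolve_max_new_tokens(None) == DEFAULT_MAX_NEW_TOKENS

def test_explicit_limit_is_respected():
    # An explicit limit must pass through untouched.
    assert resolve_max_new_tokens(128) == 128
```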
Notes
`pytest` could not be run because the current Python environment does not have it installed (`python -m pytest` reported `No module named pytest`).