When changing the model's total length with
`dashinfer_vlm_serve ... --max_length 64000`,
it is still limited to 32k by these environment-variable defaults:
build/lib/dashinfer_vlm/vl_inference/runtime/qwen_vl.py:
```python
self.max_input_len = int(getenv("DS_LLM_MAX_IN_TOKENS", "20000"))
self.max_total_len = int(getenv("DS_LLM_MAX_TOKENS", "32000"))
```
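As a workaround under the current behavior, the two environment variables can be raised to match the CLI flag before launching the server. This is a sketch; the exact split between input and total tokens depends on how much generation budget you want to reserve:

```shell
# Raise the runtime caps before starting the server
# (values assume a 64k target; adjust for your model and generation budget).
export DS_LLM_MAX_IN_TOKENS=60000
export DS_LLM_MAX_TOKENS=64000
# then launch as usual, e.g.: dashinfer_vlm_serve ... --max_length 64000
```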
These length limits should be controllable via the command line or environment variables.
For example, Qwen2.5-VL supports a 128k context length according to its config.json; reading that config would allow better automatic defaults.