
multimodal: length settings should be configurable again #93

@kzjeef

Description


When changing the model's total length with

dashinfer_vlm_serve ... --max_length 64000

it is still capped at 32k by these environment-variable defaults:

build/lib/dashinfer_vlm/vl_inference/runtime/qwen_vl.py:
        self.max_input_len = int(getenv("DS_LLM_MAX_IN_TOKENS", "20000"))
        self.max_total_len = int(getenv("DS_LLM_MAX_TOKENS", "32000"))
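Until the CLI flag is honored, one workaround is to raise the caps through the env vars those two lines read, set in the process environment before the model is constructed. A minimal sketch (the variable names come from the snippet above; the values are illustrative):

```python
import os

# Override the caps that qwen_vl.py reads at startup.
os.environ["DS_LLM_MAX_IN_TOKENS"] = "60000"
os.environ["DS_LLM_MAX_TOKENS"] = "64000"

# The same reads as in qwen_vl.py now pick up the overrides
# instead of the 20000/32000 defaults:
max_input_len = int(os.getenv("DS_LLM_MAX_IN_TOKENS", "20000"))
max_total_len = int(os.getenv("DS_LLM_MAX_TOKENS", "32000"))
print(max_input_len, max_total_len)  # 60000 64000
```

In practice the variables can also be exported in the shell before launching `dashinfer_vlm_serve`.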

The length setting should again be controllable via the command line or an environment variable.

E.g., Qwen VL 2.5 supports a 128k length per its config.json; reading that config would give better automatic defaults.
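The suggested precedence (CLI flag, then env var, then the model's config.json, then the current default) could be sketched as below. The helper name `resolve_max_total_len` is hypothetical, not DashInfer API, and `max_position_embeddings` is the usual key in a Hugging Face-style config.json and should be verified against the actual model file:

```python
import json
from os import getenv


def resolve_max_total_len(cli_max_length=None, config_path=None):
    """Resolve the total-length cap: CLI flag > env var > config.json > default."""
    # 1. Explicit CLI flag wins (e.g. --max_length 64000).
    if cli_max_length is not None:
        return int(cli_max_length)
    # 2. Fall back to the env var qwen_vl.py already reads.
    env_val = getenv("DS_LLM_MAX_TOKENS")
    if env_val is not None:
        return int(env_val)
    # 3. Auto-detect from the model's config.json when available.
    if config_path is not None:
        with open(config_path) as f:
            cfg = json.load(f)
        if "max_position_embeddings" in cfg:
            return int(cfg["max_position_embeddings"])
    # 4. Keep today's hard-coded default as the last resort.
    return 32000
```

With this, `dashinfer_vlm_serve --max_length 64000` would pass 64000 straight through, and a Qwen 2.5 VL config advertising 128k would be picked up automatically when neither the flag nor the env var is set.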


Labels: enhancement (New feature or request)
