When changing the model's total length with
`dashinfer_vlm_serve ... --max_length 64000`,
it is still limited to 32k by these environment-variable defaults:
build/lib/dashinfer_vlm/vl_inference/runtime/qwen_vl.py:
```python
self.max_input_len = int(getenv("DS_LLM_MAX_IN_TOKENS", "20000"))
self.max_total_len = int(getenv("DS_LLM_MAX_TOKENS", "32000"))
```
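As a workaround under the current behavior, the two environment variables can be raised to match the CLI flag before launching the server. This is a sketch; the exact split between input and total tokens depends on how much generation budget you want to reserve:

```shell
# Raise the runtime caps before starting the server
# (values assume a 64k target; adjust for your model and generation budget).
export DS_LLM_MAX_IN_TOKENS=60000
export DS_LLM_MAX_TOKENS=64000
# then launch as usual, e.g.: dashinfer_vlm_serve ... --max_length 64000
```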
These length limits should be controllable via the command line or environment variables.
For example, Qwen2.5-VL supports a 128k context length according to its config.json; reading that config would allow better automatic defaults.