
Fine-tuning model with max_length=4096, but inference got `exceeds the model max_length: 2048` #861

@piqiuni

Description

Describe the bug
What the bug is, and how to reproduce, better with screenshots

I fine-tuned ModelType.qwen_vl_chat with max_length=4096.
But when running inference with the resulting checkpoint, I got an `exceeds the model max_length: 2048` error.

token len: history:421,  now: 1630

Traceback (most recent call last):
  File "/home/ldl/pi_code/swift/pi_code/infer_qwen_vl.py", line 83, in <module>
    response, _ = inference(model, template, value, history)
  File "/home/ldl/miniconda3/envs/swift/lib/python3.10/site-packages/swift/llm/utils/utils.py", line 748, in inference
    raise AssertionError('Current sentence length exceeds'
AssertionError: Current sentence length exceedsthe model max_length: 2048
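The numbers in the log explain why the assertion fires: the template used at inference apparently defaults to a 2048-token limit, and the prompt is 421 (history) + 1630 (current) = 2051 tokens, just over it. Below is a minimal, hypothetical sketch of the length guard (the function name and signature are invented for illustration; only the assertion message is modeled on the one in `swift/llm/utils/utils.py`):

```python
def check_length(history_tokens: int, current_tokens: int, max_length: int) -> None:
    # Sketch of the guard: the total prompt token count must fit max_length.
    total = history_tokens + current_tokens
    if total > max_length:
        raise AssertionError(
            f'Current sentence length exceeds the model max_length: {max_length}')

# 421 + 1630 = 2051 > 2048, so inference at the default limit fails,
# but the same prompt fits the fine-tuning limit of 4096.
check_length(421, 1630, 4096)  # passes without raising
```

If the inference script builds its template manually, passing the fine-tuning limit when constructing it (e.g. something like `get_template(template_type, tokenizer, max_length=4096)` — the parameter name here is an assumption about the swift API, not confirmed from this issue) should lift the 2048 default.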

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here

Additional context
Add any other context about the problem here
