
Only output [1, 2] tokens for 'lmms-lab/LLaVA-NeXT-Video-7B-DPO' video demo inference #52

Open
LeonLIU08 opened this issue Jun 5, 2024 · 2 comments


@LeonLIU08

LeonLIU08 commented Jun 5, 2024

The value of output_ids is tensor([[1, 2]], device='cuda:0').
The rest of the demo script's output is:

Question: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER:
Please provide a detailed description of the video, focusing on the main subjects, their actions, and the background scenes ASSISTANT:

Response:

@ZhangYuanhan-AI
Collaborator

Could you please share the command you used?

@LeonLIU08
Author

LeonLIU08 commented Jun 6, 2024

The command:
bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B-DPO vicuna_v1 32 2 True xxx.mp4

By the way, I found that setting pool_stride=4 solves this: with stride=2 the input token length is 4673, which exceeds the LLM's max_length of 4096.
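A back-of-the-envelope sketch of why the stride change helps. The per-frame patch grid size below (24×24) is an assumption for illustration; the actual grid depends on the vision encoder and config, but the scaling with pool_stride is the point: spatial pooling shrinks each frame's grid by the stride in both dimensions, so doubling the stride quarters the visual token count.

```python
def video_tokens(num_frames: int, grid: int, pool_stride: int) -> int:
    """Approximate visual token count after spatial pooling.

    num_frames:  frames sampled from the video (32 in the demo command)
    grid:        patches per side per frame (24 is a hypothetical value)
    pool_stride: spatial pooling stride (the demo's pool_stride argument)
    """
    side = grid // pool_stride  # pooling shrinks each spatial dimension
    return num_frames * side * side

# stride=2: 32 * 12 * 12 = 4608 visual tokens; with the ~65-token text
# prompt this matches the reported 4673, over the 4096 max_length.
print(video_tokens(32, 24, 2))  # 4608
# stride=4: 32 * 6 * 6 = 1152 visual tokens, comfortably under 4096.
print(video_tokens(32, 24, 4))  # 1152
```

When the sequence exceeds max_length, the prompt gets truncated and generation can terminate immediately, which is consistent with output_ids containing only the BOS/EOS token ids [1, 2].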
