
Only output [1, 2] tokens for 'lmms-lab/LLaVA-NeXT-Video-7B-DPO' video demo inference #52

Open
LeonLIU08 opened this issue Jun 5, 2024 · 2 comments


@LeonLIU08

LeonLIU08 commented Jun 5, 2024

The value of output_ids is tensor([[1, 2]], device='cuda:0').
The rest of the demo script's output is:

Question: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER:
Please provide a detailed description of the video, focusing on the main subjects, their actions, and the background scenes ASSISTANT:

Response:

@ZhangYuanhan-AI
Collaborator

Could you please share the command you used?

@LeonLIU08
Author

LeonLIU08 commented Jun 6, 2024

The command:
bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B-DPO vicuna_v1 32 2 True xxx.mp4

By the way, I found that setting pool_stride=4 solves this: with stride=2 the input token length is 4673, which exceeds the LLM's max_length of 4096.
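A back-of-the-envelope sketch of why the stride change helps. The per-frame patch grid size below (24×24) is an assumption for illustration; the actual grid depends on the vision encoder and config, but the scaling with pool_stride is the point: spatial pooling shrinks each frame's grid by the stride in both dimensions, so doubling the stride quarters the visual token count.

```python
def video_tokens(num_frames: int, grid: int, pool_stride: int) -> int:
    """Approximate visual token count after spatial pooling.

    num_frames:  frames sampled from the video (32 in the demo command)
    grid:        patches per side per frame (24 is a hypothetical value)
    pool_stride: spatial pooling stride (the demo's pool_stride argument)
    """
    side = grid // pool_stride  # pooling shrinks each spatial dimension
    return num_frames * side * side

# stride=2: 32 * 12 * 12 = 4608 visual tokens; with the ~65-token text
# prompt this matches the reported 4673, over the 4096 max_length.
print(video_tokens(32, 24, 2))  # 4608
# stride=4: 32 * 6 * 6 = 1152 visual tokens, comfortably under 4096.
print(video_tokens(32, 24, 4))  # 1152
```

When the sequence exceeds max_length, the prompt gets truncated and generation can terminate immediately, which is consistent with output_ids containing only the BOS/EOS token ids [1, 2].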
