Problem running LLaVA-NeXT-Video-34B-DPO #43
Comments
I am facing the same issue.
Same issue here. How can it be fixed? Update: I tried to modify the code. Note: this might change the behavior of the stopping criteria. It starts to repeat words in my case.
Hi, please share the command you used.
Running `bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-34B-DPO mistral_direct 16 2 True XXX.mp4` works on my side.
I hardcoded the parameters into a script for inference. I changed vicuna_v1 to mistral_direct like yours and it worked. But compared to the 7B version, the 34B model's answers contain a lot of "in the image" and "in the frame". This may not be what a video VLM should output. Do you have a similar problem? If not, there may be something wrong with my code.
Hi, since our training data also includes images, many of the instructions contain phrases like "in the image", so the current model sometimes generates "in the image". We are currently focusing on solving this!
Dear authors,
thank you for your great work. I have tested LLaVA-NeXT-Video-7B-DPO on various videos and it shows excellent results. But when I tried to run the 34B-DPO model, I encountered the following error:
```
Traceback (most recent call last):
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/batch.py", line 151, in <module>
    run_inference()
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/batch.py", line 133, in run_inference
    output_ids = model.generate(inputs=input_ids, images=video, attention_mask=attention_masks, modalities="video", do_sample=True, temperature=0.2, max_new_tokens=1024, use_cache=True, stopping_criteria=[stopping_criteria])
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/llavavid/model/language_model/llava_llama.py", line 120, in generate
    return super().generate(position_ids=position_ids, attention_mask=attention_mask, inputs_embeds=inputs_embeds, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._sample(
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/transformers/generation/utils.py", line 2760, in _sample
    unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 137, in __call__
    is_done = is_done | criteria(input_ids, scores, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/llavavid/mm_utils.py", line 245, in __call__
    outputs.append(self.call_for_batch(output_ids[i].unsqueeze(0), scores))
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/llavavid/mm_utils.py", line 234, in call_for_batch
    if (output_ids[0, -keyword_id.shape[0]:] == keyword_id).all():
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 0
```
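The mismatch happens because `call_for_batch` compares the last `keyword_id.shape[0]` generated tokens against the keyword ids: if fewer tokens have been generated so far than the keyword is long (here 2 vs 3), the slice is shorter than `keyword_id` and the elementwise `==` raises the broadcast error above. A minimal sketch of a guarded version of that check (assuming the comparison logic from the traceback; `keyword_reached` is a hypothetical helper name, not the project's actual code):

```python
import torch

def keyword_reached(output_ids: torch.Tensor, keyword_id: torch.Tensor) -> bool:
    """Return True once the last generated tokens equal the keyword ids.

    output_ids: shape (1, seq_len), the tokens generated so far.
    keyword_id: shape (k,), the token ids of the stop keyword.
    """
    k = keyword_id.shape[0]
    # Guard: with fewer than k tokens generated, output_ids[0, -k:] would be
    # shorter than keyword_id, and == would raise a size-mismatch RuntimeError.
    if output_ids.shape[1] < k:
        return False
    return bool((output_ids[0, -k:] == keyword_id).all())
```

Simply truncating the keyword instead of guarding the length would make the criterion fire too early or never, which may be related to the repetition behavior another commenter observed after modifying the stopping criteria.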