Problem running LLaVA-NeXT-Video-34B-DPO #43
Comments
I am facing the same issue.
Same issue here. How can it be fixed? Update: I tried to modify the code. Note: this might change the behavior of the stopping criteria. It starts to repeat words in my case.
Hi, please share the command you used.
Running `bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-34B-DPO mistral_direct 16 2 True XXX.mp4` works on my side.
I hardcoded the parameters into a script for inference. I changed vicuna_v1 to mistral_direct like yours and it worked. But compared to the 7B version, the 34B model's answers contain a lot of "in the image" and "in the frame". This may not be what a video VLM should output. Do you have a similar problem? If not, there may be something wrong with my code.
Hi, since our training data also includes images, many of the instructions contain phrases like "in the image", so the current model sometimes generates "in the image". We are currently focusing on solving this!
Dear authors,
thank you for your great work. I have tested LLaVA-NeXT-Video-7B-DPO on various videos and it shows excellent results. But when I tried to run the 34B-DPO model, I encountered the following error:
```
Traceback (most recent call last):
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/batch.py", line 151, in <module>
    run_inference()
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/batch.py", line 133, in run_inference
    output_ids = model.generate(inputs=input_ids, images=video, attention_mask=attention_masks, modalities="video", do_sample=True, temperature=0.2, max_new_tokens=1024, use_cache=True, stopping_criteria=[stopping_criteria])
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/llavavid/model/language_model/llava_llama.py", line 120, in generate
    return super().generate(position_ids=position_ids, attention_mask=attention_mask, inputs_embeds=inputs_embeds, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._sample(
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/transformers/generation/utils.py", line 2760, in _sample
    unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
  File "/mnt/qb/work/ponsmoll/pba178/.conda/llavan/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 137, in __call__
    is_done = is_done | criteria(input_ids, scores, **kwargs)
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/llavavid/mm_utils.py", line 245, in __call__
    outputs.append(self.call_for_batch(output_ids[i].unsqueeze(0), scores))
  File "/mnt/qb/work/ponsmoll/pba178/project/LLaVA-NeXT/llavavid/mm_utils.py", line 234, in call_for_batch
    if (output_ids[0, -keyword_id.shape[0]:] == keyword_id).all():
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 0
```
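The mismatch happens because `call_for_batch` compares the last `keyword_id.shape[0]` generated tokens against the keyword ids: if fewer tokens have been generated so far than the keyword is long (here 2 vs 3), the slice is shorter than `keyword_id` and the elementwise `==` raises the broadcast error above. A minimal sketch of a guarded version of that check (assuming the comparison logic from the traceback; `keyword_reached` is a hypothetical helper name, not the project's actual code):

```python
import torch

def keyword_reached(output_ids: torch.Tensor, keyword_id: torch.Tensor) -> bool:
    """Return True once the last generated tokens equal the keyword ids.

    output_ids: shape (1, seq_len), the tokens generated so far.
    keyword_id: shape (k,), the token ids of the stop keyword.
    """
    k = keyword_id.shape[0]
    # Guard: with fewer than k tokens generated, output_ids[0, -k:] would be
    # shorter than keyword_id, and == would raise a size-mismatch RuntimeError.
    if output_ids.shape[1] < k:
        return False
    return bool((output_ids[0, -k:] == keyword_id).all())
```

Simply truncating the keyword instead of guarding the length would make the criterion fire too early or never, which may be related to the repetition behavior another commenter observed after modifying the stopping criteria.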