
Do you plan to release train code? #1

Open · daixiangzi opened this issue Jun 21, 2024 · 9 comments

@daixiangzi

No description provided.

@Liuziyu77 (Owner)

We will release our model soon. You can also train your own model with MMDU yourself; the training code depends on which model you are using.

@Liuziyu77 (Owner)

MMDU can be applied to various LVLMs

@daixiangzi (Author)

> We will release our model soon. You can also train your own model with MMDU yourself; the training code depends on which model you are using.

Haha, we are preparing to do this.

@daixiangzi (Author)

> MMDU can be applied to various LVLMs

The max image num in MMDU is 20. In fact, if I use llava3-clip-l14-336 (max token length is 8k), I think I need to use token compression. Have you done any research in this area?

@Liuziyu77 (Owner)

> MMDU can be applied to various LVLMs
>
> The max image num in MMDU is 20. In fact, if I use llava3-clip-l14-336 (max token length is 8k), I think I need to use token compression. Have you done any research in this area?

One of the purposes of MMDU-45k is to enhance the dialogue capabilities of LVLMs in long multi-modal contexts involving text and images. The maximum token length in MMDU-45k is 17k. When finetuning the model, we generally use lengths of 16k or 32k, without considering the issue of token compression.
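As a rough illustration of that length budget, below is a minimal sketch (not from the MMDU repo; the tokenizer name and the flattened-text layout are assumptions) that counts the text tokens of a conversation to check whether it fits an 8k context or needs the longer 16k/32k setting mentioned above.

```python
# Minimal sketch, assuming a HuggingFace tokenizer and that each MMDU-45k
# conversation has been flattened into a single text string. Names below are
# illustrative, not taken from the MMDU training code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")  # assumed base LLM

def text_token_length(conversation_text: str) -> int:
    """Count text tokens only; image tokens add to this budget on top."""
    return len(tokenizer(conversation_text).input_ids)

# Flag samples that would not fit an 8k context window.
samples = ["<flattened multi-turn dialogue> ..."]  # placeholder for MMDU-45k samples
too_long = [s for s in samples if text_token_length(s) > 8192]
print(f"{len(too_long)} / {len(samples)} samples exceed 8k text tokens")
```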

@Liuziyu77 (Owner)

Most of the data in MMDU-45k and the MMDU benchmark is around 8k tokens long. Therefore, using MMDU-45k to finetune an 8k-LVLM is also feasible.

@daixiangzi (Author) commented Jul 2, 2024

I tried fine-tuning clip_l14_336-llama3-8b using MMDU, and even with a batch size of 1, it still runs out of memory on an 80GB A100.

@Liuziyu77 (Owner)

> I tried fine-tuning clip_l14_336-llama3-8b using MMDU, and even with a batch size of 1, it still runs out of memory on an 80GB A100.

[screenshot]
MMDU has long contexts; use zero3.json.
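For reference, here is a minimal sketch of what a DeepSpeed ZeRO-3 config along these lines typically looks like; the exact fields in the repo's zero3.json are an assumption here, not copied from it.

```python
# Minimal sketch of a typical DeepSpeed ZeRO-3 config for long-context
# finetuning; the repo's actual zero3.json may differ (fields are assumed).
import json

zero3_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                  # shard optimizer states, gradients, and parameters
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("zero3.json", "w") as f:
    json.dump(zero3_config, f, indent=2)
```

If ZeRO-3 alone still runs out of memory at 16k+ sequence lengths, enabling gradient checkpointing and adding `offload_optimizer` / `offload_param` entries to the `zero_optimization` section are common next steps, at the cost of training speed.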

@daixiangzi (Author) commented Jul 3, 2024

> I tried fine-tuning clip_l14_336-llama3-8b using MMDU, and even with a batch size of 1, it still runs out of memory on an 80GB A100.
>
> [screenshot] MMDU has long contexts; use zero3.json.

In fact, I already use zero3, but it still OOMs.
