Training loss curve on MMC4 dataset? #226

Open
tonylins opened this issue Jul 22, 2023 · 8 comments

@tonylins

Hi, thanks for the great work! I tried training on a subset of MMC4-core, but the LM loss does not go down very much. Would it be possible to share the MMC4 loss curve for reference, so I can tell whether this is expected (or potentially a bug)? Thanks so much!

@FingerRec

I'm running into the same problem; the training loss on MMC4 seems hard to converge.

@anas-awadalla
Collaborator

[Screenshot: training loss curves on MMC4 and LAION]

Here are the loss plots for some of our training runs. We also find that the loss on MMC4 decreases more slowly than the loss on LAION. We suspect this is because we use a pre-trained language model, which is already a strong predictor of the next token.
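
For anyone comparing against their own runs: the plotted MMC4 loss is the standard next-token cross-entropy over the interleaved text, counted only on text positions, so a pre-trained LM already starts from a relatively low value. A minimal sketch in generic PyTorch (not the repo's actual training loop; `text_mask` is an assumed per-token mask marking text positions):

```python
# Minimal sketch (generic PyTorch, not the repo's training code): next-token
# cross-entropy over an interleaved sequence, counted only on text positions.
import torch.nn.functional as F

def interleaved_lm_loss(logits, labels, text_mask):
    """logits: (B, T, V); labels: (B, T) token ids; text_mask: (B, T) bool, True on text tokens."""
    # Shift so position t predicts token t+1, then drop non-text targets from the loss.
    logits = logits[:, :-1]
    targets = labels[:, 1:]
    mask = text_mask[:, 1:].reshape(-1).float()
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    return (per_token * mask).sum() / mask.sum()
```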

@FingerRec

Thanks. What image-text similarity threshold was used for this figure?

@i-gao
Collaborator

i-gao commented Jul 24, 2023

These curves use a threshold of 0.24.
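
In case it helps others reproduce the filtering, here is a minimal sketch of applying that threshold per MMC4 document. The field names (`image_info`, `matched_sim`) are assumptions about the dataset's JSON layout, not a quote of the repo's preprocessing code:

```python
# Sketch of applying the 0.24 image-text similarity threshold when loading MMC4
# documents. Field names ("image_info", "matched_sim") are assumptions about the
# dataset's JSON layout, not the repo's exact preprocessing code.
SIM_THRESHOLD = 0.24

def keep_images(doc, threshold=SIM_THRESHOLD):
    """Keep only images whose matched CLIP similarity meets the threshold."""
    return [
        img for img in doc.get("image_info", [])
        if img.get("matched_sim", 0.0) >= threshold
    ]
```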

@FingerRec

Thanks. In addition, have you ever plotted the validation loss? When I pretrain on a subset of MMC4 (around 1M websites), the validation loss starts rising after only a small number of iterations.

Train:
[Figure: MMC4 training loss curve]

Val:
[Figure: MMC4 validation loss curve]
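
For context, a rough sketch of how such a held-out loss can be tracked: evaluate the same masked LM loss on a held-out MMC4 shard at regular intervals. Placeholder names throughout (`model`, `val_loader`), reusing the `interleaved_lm_loss` sketch from the earlier comment:

```python
# Sketch only: periodically evaluate the masked LM loss on a held-out MMC4 shard.
# `model` and `val_loader` are placeholders, not OpenFlamingo's actual interfaces.
import torch

@torch.no_grad()
def validation_loss(model, val_loader):
    model.eval()
    total, batches = 0.0, 0
    for batch in val_loader:
        # Placeholder forward pass returning next-token logits of shape (B, T, V).
        logits = model(batch["images"], batch["input_ids"])
        loss = interleaved_lm_loss(logits, batch["input_ids"], batch["text_mask"])
        total += loss.item()
        batches += 1
    model.train()
    return total / batches
```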

@i-gao
Collaborator

i-gao commented Jul 26, 2023

Hmm, we haven't plotted such a validation loss before -- this behavior is pretty surprising to me! Do you know if your downstream performance on task benchmarks improves or degrades with training?

@FingerRec

FingerRec commented Jul 27, 2023

The downstream performance is also unstable; a checkpoint from the middle of training is sometimes better than the final one. I guess it's because I train the model with only 1M LAION and 1M MMC4 samples, so the data scale is too small. How many LAION and MMC4 samples did you use for the figure above? @i-gao

@i-gao
Collaborator

i-gao commented Jul 27, 2023

Ah, okay! The x-axis in the training curve plots refers to the number of interleaved (MMC4) samples.
