Training loss curve on MMC4 dataset? #226

Open
tonylins opened this issue Jul 22, 2023 · 8 comments

@tonylins

Hi, thanks for the great work! I tried training on a subset of MMC4-core, but the LM loss does not go down very much. Would it be possible to share the MMC4 loss curve for reference, so I can tell whether this is expected (or potentially a bug)? Thanks so much!

@FingerRec

I'm running into the same problem; the training loss on MMC4 seems hard to converge.

@anas-awadalla
Collaborator

[Screenshot: training loss curves on MMC4 and LAION]

Here are the loss plots for some of our training runs. We also find that the loss on MMC4 decreases more slowly than the loss on LAION. We suspect this is because we use a pre-trained language model, which is already a strong predictor of the next token.
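
For anyone comparing against their own runs: the plotted MMC4 loss is the standard next-token cross-entropy over the interleaved text, counted only on text positions, so a pre-trained LM already starts from a relatively low value. A minimal sketch in generic PyTorch (not the repo's actual training loop; `text_mask` is an assumed per-token mask marking text positions):

```python
# Minimal sketch (generic PyTorch, not the repo's training code): next-token
# cross-entropy over an interleaved sequence, counted only on text positions.
import torch.nn.functional as F

def interleaved_lm_loss(logits, labels, text_mask):
    """logits: (B, T, V); labels: (B, T) token ids; text_mask: (B, T) bool, True on text tokens."""
    # Shift so position t predicts token t+1, then drop non-text targets from the loss.
    logits = logits[:, :-1]
    targets = labels[:, 1:]
    mask = text_mask[:, 1:].reshape(-1).float()
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    return (per_token * mask).sum() / mask.sum()
```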

@FingerRec

Thanks. What image-text similarity threshold was used for this figure?

@i-gao
Collaborator

i-gao commented Jul 24, 2023

These curves use a threshold of 0.24.
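
In case it helps others reproduce the filtering, here is a minimal sketch of applying that threshold per MMC4 document. The field names (`image_info`, `matched_sim`) are assumptions about the dataset's JSON layout, not a quote of the repo's preprocessing code:

```python
# Sketch of applying the 0.24 image-text similarity threshold when loading MMC4
# documents. Field names ("image_info", "matched_sim") are assumptions about the
# dataset's JSON layout, not the repo's exact preprocessing code.
SIM_THRESHOLD = 0.24

def keep_images(doc, threshold=SIM_THRESHOLD):
    """Keep only images whose matched CLIP similarity meets the threshold."""
    return [
        img for img in doc.get("image_info", [])
        if img.get("matched_sim", 0.0) >= threshold
    ]
```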

@FingerRec

Thanks. In addition, have you ever plotted the validation loss? When I pretrain on a subset of MMC4 (around 1M websites), the validation loss starts rising after only a small number of iterations.

Train:
[Figure: MMC4 training loss curve]

Val:
[Figure: MMC4 validation loss curve]
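
For context, a rough sketch of how such a held-out loss can be tracked: evaluate the same masked LM loss on a held-out MMC4 shard at regular intervals. Placeholder names throughout (`model`, `val_loader`), reusing the `interleaved_lm_loss` sketch from the earlier comment:

```python
# Sketch only: periodically evaluate the masked LM loss on a held-out MMC4 shard.
# `model` and `val_loader` are placeholders, not OpenFlamingo's actual interfaces.
import torch

@torch.no_grad()
def validation_loss(model, val_loader):
    model.eval()
    total, batches = 0.0, 0
    for batch in val_loader:
        # Placeholder forward pass returning next-token logits of shape (B, T, V).
        logits = model(batch["images"], batch["input_ids"])
        loss = interleaved_lm_loss(logits, batch["input_ids"], batch["text_mask"])
        total += loss.item()
        batches += 1
    model.train()
    return total / batches
```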

@i-gao
Collaborator

i-gao commented Jul 26, 2023

Hmm, we haven't plotted such a validation loss before -- this behavior is pretty surprising to me! Do you know if your downstream performance on task benchmarks improves or degrades with training?

@FingerRec

FingerRec commented Jul 27, 2023

The downstream performance is also unstable; a checkpoint from the middle of training is sometimes better than the final one. I guess it's because I train the model with only 1M LAION and 1M MMC4 samples, so the data scale is too small. How many LAION and MMC4 samples did you use for the figure above? @i-gao

@i-gao
Collaborator

i-gao commented Jul 27, 2023

Ah, okay! The x-axis in the training curve plots refers to the number of interleaved (MMC4) samples.
