Hi Team,
This is an amazing handbook. In the continued pre-training script (run_cpt.py), I noticed that the `mlm` (masked language modeling) parameter is not used in the training process. I thought the choice of training objective, MLM versus forward (next-token) prediction, was the major differentiation between pre-training and supervised fine-tuning. A sketch of how I understand the two objectives follows the questions below.
- Has there been an assessment of the efficacy of continued pre-training with `mlm` compared to without it?
- What advice or guidelines do you have for incorporating `mlm` into the continued pre-training process?
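For context, here is a minimal sketch of how I understand the `mlm` flag is toggled in Hugging Face's `DataCollatorForLanguageModeling`. This is my own illustration rather than code from run_cpt.py, and the tokenizer name is just a placeholder:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Placeholder tokenizer; any tokenizer with a mask token works for mlm=True.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Causal objective (what run_cpt.py appears to use): labels are the input
# ids themselves, and the model predicts each next token.
causal_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# MLM objective: a fraction of tokens is replaced by the mask token and the
# model is trained to reconstruct them; this requires a model with an MLM head.
mlm_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # default masking rate, from the BERT paper
)
```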
Thanks!
Li