Hi Team,
This is an amazing handbook. In the continued pre-training script (run_cpt.py), I noticed that the `mlm` (masked language modeling) parameter is not used in the training process. I thought the choice of training objective, MLM versus forward (next-token) prediction, was the major differentiation between pre-training and supervised fine-tuning. A sketch of how I understand the two objectives follows the questions below.
- Has there been an assessment of the efficacy of continued pre-training with `mlm` compared to without it?
- What advice or guidelines do you have for incorporating `mlm` into the continued pre-training process?
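For context, here is a minimal sketch of how I understand the `mlm` flag is toggled in Hugging Face's `DataCollatorForLanguageModeling`. This is my own illustration rather than code from run_cpt.py, and the tokenizer name is just a placeholder:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Placeholder tokenizer; any tokenizer with a mask token works for mlm=True.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Causal objective (what run_cpt.py appears to use): labels are the input
# ids themselves, and the model predicts each next token.
causal_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# MLM objective: a fraction of tokens is replaced by the mask token and the
# model is trained to reconstruct them; this requires a model with an MLM head.
mlm_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # default masking rate, from the BERT paper
)
```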
Thanks!
Li