Description
I was looking at the logs of your training (from this json file) and realized that the scheduling is messed up.
It's related to the ConstantLengthDataset not computing its actual length. When I train this model, the progress bar and the total number of iterations are calculated from the underlying H4 dataset (around 208k samples) instead of the packed version, which has around 139k packed sequences of length 2048.
This affects the scheduler, which does not perform any warmup. I have an 8xA100 node, so I am running 2x gradient accumulation to get an adequate effective batch size of 512.
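A rough way to get the packed length up front (a hedged sketch; `dataset`, `tokenizer`, and the `"text"` column are placeholders for whatever the SFT script actually uses):

```python
def estimate_packed_length(dataset, tokenizer, seq_length=2048, text_field="text"):
    """Estimate how many packed sequences a ConstantLengthDataset will yield:
    total token count divided by the packed sequence length."""
    total_tokens = sum(
        len(tokenizer(example[text_field]).input_ids) for example in dataset
    )
    return total_tokens // seq_length

# e.g. estimate_packed_length(h4_dataset, tokenizer) -> ~139k for this run
```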
- I am sure you are missing a `warmup_ratio: 0.1` on the SFT configs (see the sketch below).
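A minimal sketch of what I mean, using plain `transformers` (every value other than `warmup_ratio` is illustrative, not the repo's actual setting):

```python
from transformers import TrainingArguments

# Note: warmup_ratio is resolved against the *total* number of training
# steps, so the miscounted dataset length described above would still
# skew the warmup even with this set.
training_args = TrainingArguments(
    output_dir="zephyr-7b-sft",   # hypothetical path
    warmup_ratio=0.1,             # the setting I believe is missing
    lr_scheduler_type="cosine",   # illustrative
    learning_rate=2e-5,           # illustrative
)
```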
It would be beneficial to have access to the training logs; I found them on TensorBoard :(
You can follow my training here: https://wandb.ai/capecape/zephyr/runs/zhfrhnr5
P.S.: When using trl, I manually compute the total number of training steps beforehand so I can pass the right number of warmup steps to the scheduler. I know the ConstantLengthDataset is a generator that yields packed sequences without knowing in advance how many samples it will produce.
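For illustration, that workaround looks roughly like this (a sketch with hypothetical numbers; `get_scheduler` is the standard `transformers` helper, and the optimizer here is a dummy so the snippet is self-contained):

```python
import torch
from transformers import get_scheduler

# Hypothetical numbers for illustration.
num_packed_sequences = 139_000   # packed sequences of length 2048
effective_batch_size = 512       # 8 GPUs x 2 grad accum x per-device batch
num_epochs = 1

total_train_steps = num_epochs * (num_packed_sequences // effective_batch_size)
warmup_steps = int(0.1 * total_train_steps)   # i.e. warmup_ratio = 0.1

# Dummy parameter so the sketch runs; in practice this is the model's optimizer.
optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=2e-5)

lr_scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_train_steps,
)
```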