Reproducing SFT results. #27

@tcapelle

I was looking at the logs of your training (from this JSON file) and realized that the learning-rate scheduling is messed up.

It's related to the ConstantLengthDataset not computing its actual length. When I train this model, the progress bar and the total number of iterations are calculated from the underlying H4 dataset (around 208k samples) instead of the packed version, which has around 139k packed sequences of length 2048.
This affects the scheduler, which then does not perform any warmup. I have an 8xA100 node, so I am running 2x gradient accumulation for an adequate batch size of 512.
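
For concreteness, here is the arithmetic behind that mismatch, using the approximate numbers above (the per-device batch size of 32 is my assumption, chosen so that 8 GPUs x 2 accumulation comes out to 512):

```python
# Approximate numbers from this issue; per-device batch of 32 is an assumption
# so that 8 GPUs * 32 * 2 grad-accum steps = an effective batch of 512.
raw_samples = 208_000     # underlying H4 dataset
packed_samples = 139_000  # packed sequences of length 2048

effective_batch = 8 * 32 * 2  # = 512

reported_steps = raw_samples // effective_batch   # 406: what the progress bar shows
actual_steps = packed_samples // effective_batch  # 271: what will actually run

# Any warmup schedule sized from reported_steps is therefore off by ~1.5x.
```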

  • I am sure you are missing a warmup_ratio: 0.1 in the SFT configs (see the sketch below)
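For reference, a minimal sketch of where that setting would go, via transformers' TrainingArguments (everything except warmup_ratio is a placeholder):

```python
from transformers import TrainingArguments

# Sketch only; all values except warmup_ratio are placeholders.
args = TrainingArguments(
    output_dir="zephyr-sft",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    warmup_ratio=0.1,  # the setting I believe is missing from the SFT configs
)
# Note: the Trainer derives warmup steps from its *estimated* total step
# count, which is exactly the number that is wrong here, so the max_steps
# workaround in the PS below is still needed.
```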

It would be beneficial to have direct access to the training logs; I only found them on TensorBoard :(

You can follow my training here: https://wandb.ai/capecape/zephyr/runs/zhfrhnr5

PS: When using trl, I manually compute the total number of train steps beforehand so I can pass adequate warmup steps to the scheduler. I know the ConstantLengthDataset is a generator that yields batches without knowing beforehand how many samples it will produce.
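
Here is a minimal sketch of that pre-computation, assuming trl's ConstantLengthDataset with infinite=False; the model name, dataset name, and text field are placeholders:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from trl.trainer import ConstantLengthDataset

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder model
raw_ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")  # placeholder dataset

packed = ConstantLengthDataset(
    tokenizer,
    raw_ds,
    dataset_text_field="text",  # placeholder: use whatever field holds the text
    seq_length=2048,
    infinite=False,             # finite, so a single pass exhausts the generator
)

# One throwaway pass just to count the packed 2048-token sequences.
num_packed = sum(1 for _ in packed)

effective_batch = 8 * 32 * 2                # GPUs * per-device batch * grad accum = 512
total_steps = num_packed // effective_batch
warmup_steps = int(0.1 * total_steps)       # warmup_ratio = 0.1

# Rebuild the packed dataset for the actual training run (the counting pass
# consumed it), then hand the scheduler the real totals, e.g. through
# TrainingArguments(max_steps=total_steps, warmup_steps=warmup_steps, ...).
```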
