Missing config params on SFT #31

tcapelle · 2023-11-15T14:09:37Z

Hi,
Small PR to add the missing warmup ratio and the total number of training steps so the learning-rate schedule is set up correctly.
I am also adding info on the GPU requirements (80GB GPUs). <- this is already on the main README =P

The link to the experiment

alvarobartt · 2023-11-15T15:05:46Z

recipes/zephyr-7b-beta/sft/config_lora.yaml

+warmup_ratio: 0.1
 max_seq_length: 2048
-max_steps: -1
+max_steps: 272


I think `max_steps=-1` because `num_train_epochs` is used instead.

Yeah, but the `ConstantLengthDataset` doesn't know how many steps it will run, so the scheduler can't set up the warmup cycle correctly.

I am going to try fixing this in trl
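To make the point above concrete, here is a minimal sketch of how a warmup length is typically derived from `warmup_ratio` and the total step count. The helper names `resolve_warmup_steps` and `warmup_factor` are hypothetical and written only for illustration; they are not the TRL or Transformers implementation. The linear ramp is an assumption about the warmup shape. The values 272 and 0.1 are the ones proposed in this PR.

```python
import math


def resolve_warmup_steps(num_training_steps: int, warmup_ratio: float,
                         warmup_steps: int = 0) -> int:
    """Hypothetical helper: an explicit warmup_steps wins, otherwise the
    warmup length is a fraction of the total number of training steps."""
    if warmup_steps > 0:
        return warmup_steps
    return math.ceil(num_training_steps * warmup_ratio)


def warmup_factor(step: int, num_warmup_steps: int) -> float:
    """Hypothetical linear warmup: multiplier on the base learning rate,
    ramping from 0 to 1 over num_warmup_steps, then staying at 1."""
    if num_warmup_steps <= 0 or step >= num_warmup_steps:
        return 1.0
    return step / num_warmup_steps


# Values from the proposed config change: max_steps=272, warmup_ratio=0.1.
warmup = resolve_warmup_steps(272, 0.1)
print(warmup)                      # number of warmup steps
print(warmup_factor(14, warmup))   # multiplier halfway through warmup
```

Because the warmup length depends directly on the total step count, a scheduler that is given a wrong or unknown total ends up with a warmup phase of the wrong length, which is the concern raised in this thread.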

lewtun

Good catch with the logging steps @tcapelle ! There's an open PR to fix this in TRL here (huggingface/trl#979), so I suggest we keep the YAML configs of the repo unchanged for now

lewtun · 2023-11-17T08:21:09Z

recipes/zephyr-7b-beta/README.md


 ## Full training examples
-
+You will require 8 GPUs (80GB of VRAM) to train the full model.


Happy to keep this line in the PR if you don't mind reverting the config changes :)

haha, my bad, because this is already specified in the main README file =)


lewtun · 2023-11-21T10:45:14Z

recipes/zephyr-7b-beta/README.md


 ## Full training examples
-
+You will require 8 GPUs (80GB of VRAM) to train the full model.


thanks for iterating!

tcapelle added 2 commits November 15, 2023 15:07

fix warmup with total number of steps

760e477

Explicitely tell to use 80GB Gpus

dbcda43

alvarobartt reviewed Nov 15, 2023

View reviewed changes

lewtun reviewed Nov 17, 2023

View reviewed changes

Revert "fix warmup with total number of steps"

1603d43

This reverts commit 760e477.

lewtun approved these changes Nov 21, 2023

View reviewed changes

lewtun merged commit f025057 into huggingface:main Nov 21, 2023

This was referenced Dec 8, 2023

Add warmup parameter to config tcapelle/alignment-handbook#1

Closed

Add warmup to config #71

Merged

SFT lora ends with higher loss #72

Open
