[WIP] Add SPIN trainer #1344

lewtun · 2024-02-21T09:05:37Z

Implements the Self-Play fIne-tuNing (SPIN) algorithm from: https://arxiv.org/abs/2401.01335

TODO

Validate training works for a few iterations of Qwen-1.5-0.5b
Fix generation with ZeRO-3 init (hitting annoying RuntimeError: 'weight' must be 2-D errors)
Add docs
Refactor / clean up internals

lewtun · 2024-02-21T09:06:30Z

examples/scripts/spin.py

+    preprocessing_num_workers: int = field(
+        default=12, metadata={"help": "The number of processes to use for the preprocessing."}
+    )
+    do_generate: bool = field(default=False)


These args could arguably live in the SPINConfig
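A minimal sketch of what moving these two fields into the config could look like. `SPINConfigSketch` is a hypothetical stand-in, not the PR's actual `SPINConfig` (which may well subclass `TrainingArguments`), and the help text for `do_generate` is an assumption about its meaning.

```python
from dataclasses import dataclass, field


@dataclass
class SPINConfigSketch:
    """Hypothetical stand-in for SPINConfig; the real class in the PR may differ."""

    # Copied from the script arguments in examples/scripts/spin.py
    preprocessing_num_workers: int = field(
        default=12, metadata={"help": "The number of processes to use for the preprocessing."}
    )
    # Help text below is an assumption; the diff does not document this flag
    do_generate: bool = field(
        default=False, metadata={"help": "Whether to run the generation step."}
    )


config = SPINConfigSketch(do_generate=True)
```

With the fields on the config, the example script would only need to parse one dataclass for these options instead of two.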

HuggingFaceDocBuilderDev · 2024-02-21T09:34:10Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vwxyzjn

Great work! Left some initial comments :)

vwxyzjn · 2024-02-27T14:28:14Z

examples/scripts/dpo.py

 import torch
 from datasets import Dataset, load_dataset
 from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser, TrainingArguments

 from trl import DPOTrainer, ModelConfig, get_kbit_device_map, get_peft_config, get_quantization_config


+datasets.disable_caching()


What is this? Maybe we should do load_from_cache_file=False?

vwxyzjn · 2024-02-27T14:30:04Z

examples/scripts/spin.py

+    if script_args.do_generate:
+        print(f"Generating completions for {len(prompt_train_ds)} training examples")
+        train_completions = spin_trainer.generate(prompt_train_ds, "prompt", generation_config, batch_size=16)
+        test_completions = spin_trainer.generate(prompt_test_ds, "prompt", generation_config, batch_size=16)


If the generation only occurs in the end, wdyt about doing a subprocess call on a vllm/tgi script to generate instead?
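A rough sketch of the subprocess idea, using only the standard library. The script name and the `--model`, `--prompts`, and `--output` flags are hypothetical; a real vLLM or TGI generation script would define its own interface.

```python
import subprocess
import sys


def build_generation_command(script_path, model_path, prompts_path, output_path):
    """Build the argv for a hypothetical stand-alone generation script."""
    return [
        sys.executable,
        script_path,
        "--model", model_path,      # hypothetical flag
        "--prompts", prompts_path,  # hypothetical flag
        "--output", output_path,    # hypothetical flag
    ]


def run_generation(cmd):
    """Run the command and return the completed process.

    check=True raises CalledProcessError if the script exits non-zero.
    """
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

The training script would save the model, call `run_generation(build_generation_command(...))`, and then read the completions back from the output file.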

vwxyzjn · 2024-02-27T14:37:26Z

trl/trainer/utils.py

+            chosen_tokens = self.tokenizer(chosen, add_special_tokens=False)
+            rejected_tokens = self.tokenizer(rejected, add_special_tokens=False)
+            prompt_tokens = self.tokenizer(prompt, add_special_tokens=False)


This feels quite similar to the logic in DPOTrainer. Maybe unify the logic somehow (or use similar terminology).

trl/trl/trainer/dpo_trainer.py

Lines 721 to 729 in b32656f

    chosen_tokens = self.tokenizer(
        chosen, truncation=True, max_length=self.max_target_length, add_special_tokens=True
    )
    rejected_tokens = self.tokenizer(
        rejected, truncation=True, max_length=self.max_target_length, add_special_tokens=True
    )
    prompt_tokens = self.tokenizer(
        prompt, truncation=True, max_length=self.max_prompt_length, add_special_tokens=True
    )

Not necessary but we could also benefit from doing a filediff between SPIN and DPO and try to make the lines of code differences minimal.
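One way to track that difference is to count the lines that differ between the two files with `difflib`. This is only a sketch: the metric (lines touched by any non-equal opcode) is an assumption about what "minimal" should mean, and the SPIN trainer path in the usage comment is hypothetical.

```python
import difflib


def changed_line_count(a_lines, b_lines):
    """Count lines that are inserted, deleted, or replaced between two files."""
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Lines removed from a plus lines added in b
            changed += (i2 - i1) + (j2 - j1)
    return changed


# Usage (paths are illustrative; the SPIN trainer path is hypothetical):
# from pathlib import Path
# dpo = Path("trl/trainer/dpo_trainer.py").read_text().splitlines()
# spin = Path("trl/trainer/spin_trainer.py").read_text().splitlines()
# print(changed_line_count(dpo, spin))
```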

github-actions · 2024-03-22T15:05:02Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

davidberenstein1957 · 2024-04-17T12:08:17Z

I wanted to look at implementing this in TRL. @lewtun @vwxyzjn was there a reason this was closed? Should we reopen it and can I continue the work?

lewtun added 12 commits February 12, 2024 11:31

Add SPINTrainer (cfe72d3)
Add callback (f8b5b76)
Move callback (a927f68)
Refactor (4b9cb2a)
Pass dataset to callback (2b23ad8)
Refcator (93fabfc)
Add generate method (9ca2ef3)
Make generate work bs=1 (01f106b)
Add batched generation (48d98ca)
Clean up' (38b5e27)
Refactor (85d7930)
Remove callback (8eb43fc)

Fix import (6027f8a)
Make ZeRO-3 inference work (2910bef)

lewtun mentioned this pull request Feb 21, 2024

Question about which datasets are used for each iteration uclaml/SPIN#11

Closed

vwxyzjn reviewed Feb 27, 2024

github-actions bot closed this Mar 30, 2024
