
Refactor to introduce Trainer & TrainingArguments, add SetFit ABSA #265

Merged · 97 commits · Nov 10, 2023

Conversation

tomaarsen
Member

@tomaarsen tomaarsen commented Jan 11, 2023

Closes #179, closes #238, closes #320

Hello!

Pull Request overview

  • Additions:
    • Trainer, a modified version of the old SetFitTrainer.
    • DistillationTrainer, a modified version of the old DistillationSetFitTrainer.
    • TrainingArguments, a dataclass used throughout the updated Trainer classes.
  • Changes & Deprecations:
    • SetFitTrainer and DistillationSetFitTrainer: a soft deprecation, i.e. old code should give a warning, but still works.
    • Trainer.unfreeze and Trainer.freeze now simply point to SetFitModel.unfreeze and SetFitModel.freeze. This involves a soft deprecation for the keep_body_frozen parameter, i.e. old code should give a warning, but still works. Furthermore, Trainer.freeze() now freezes the full model, not just the head.
    • Trainer.train no longer accepts training arguments like num_epochs: a fairly soft deprecation, i.e. old code no longer works like before as the keyword arguments are ignored, but the code won't crash. A warning is thrown instead. This will likely affect all users of the differentiable head.
    • SupConLoss was moved to src/setfit/losses.py.
  • Hard removals:
    • SetFitBaseModel and SKLearnWrapper, which were both unused.
  • Tests:
    • Introduced test_deprecated_trainer_distillation.py and test_deprecated_trainer.py. These files contain old, unmodified tests, and they show that despite drastic changes, old code still generally works (with the exception of the aforementioned Trainer.train training argument deprecation).
    • Updated test_trainer_distillation.py and test_trainer.py to use the new training procedure.
    • Introduced test_training_args.py.

Goals

Based on my own experiments and useful feedback from @kgourgou in the refactoring discussion in #238, I have reduced my goal for this refactoring to the following:

  1. Ensure a familiar, intuitive training scheme involving Trainer and TrainingArguments classes.
  2. Additionally, ensure that the model can also be trained without a TrainingArguments class. In short, a fit method of a component (body or head) must not accept a TrainingArguments instance as a parameter, but merely the hyperparameters like learning_rate, etc.
  3. Ensure that the entire SetFit model can be trained with a single trainer.train(), regardless of the head chosen.
  4. Ensure a smooth and unintrusive way for users to update their code after this refactor.
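Goal 2 can be illustrated with a small sketch (the names here are hypothetical, not the actual SetFit API): a component-level fit function accepts plain hyperparameters, and only the Trainer layer knows about TrainingArguments.

```python
from dataclasses import dataclass


@dataclass
class TrainingArguments:
    # Hypothetical subset of arguments, for illustration only.
    head_learning_rate: float = 1e-2
    classifier_num_epochs: int = 16


def fit_head(head, learning_rate=1e-2, num_epochs=16):
    # Plain hyperparameters only -- no TrainingArguments in sight (goal 2).
    return {"lr": learning_rate, "epochs": num_epochs}


def train(args: TrainingArguments):
    # The Trainer layer is the only place that unpacks TrainingArguments.
    return fit_head(
        None,
        learning_rate=args.head_learning_rate,
        num_epochs=args.classifier_num_epochs,
    )


print(train(TrainingArguments()))  # {'lr': 0.01, 'epochs': 16}
```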

Notably, I'm not introducing an interface for classifier heads like I had originally planned, and I'm not deprecating SetFitModel.from_pretrained(..., use_differentiable_head=True) in favour of SetFitModel.from_pretrained(..., head=...). Perhaps some other time we can have a new discussion about those.

Upgrading guide

To update your code to work with this PR, the following changes must be made:

  1. Replace all uses of SetFitTrainer with Trainer, and all uses of DistillationSetFitTrainer with DistillationTrainer.
  2. Remove num_iterations, num_epochs, learning_rate, batch_size, seed, use_amp, warmup_proportion, distance_metric, margin, samples_per_label from a Trainer initialisation, and move them to a TrainingArguments initialisation instead. This instance should then be passed to the trainer via the args argument.
  3. Refactor the multiple trainer.train(), trainer.freeze() and trainer.unfreeze() calls that were previously necessary to train the differentiable head into a single trainer.train() call by setting batch_size and num_epochs on the TrainingArguments dataclass with tuples. The first value is for training the embeddings, and the second is for training the classifier.

I applied this guide to all of the tests to upgrade them to work with the new classes.

An example using a differentiable head:

Before:

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    learning_rate=2e-5,
    batch_size=16,
    num_iterations=20,
    num_epochs=1,
)

trainer.freeze() # Freeze the head
trainer.train() # Train only the body

# Unfreeze the head and unfreeze the body -> end-to-end training
trainer.unfreeze(keep_body_frozen=False)

trainer.train(
    num_epochs=16,
    batch_size=2,
    body_learning_rate=1e-5,
    learning_rate=1e-2,
)
metrics = trainer.evaluate()

After:

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

# Create Training Arguments
args = TrainingArguments(
    # When an argument is a tuple, the first value is for training the embeddings,
    # and the latter is for training the differentiable classification head:
    batch_size=(16, 2),
    num_iterations=20,
    num_epochs=(1, 16),
    body_learning_rate=(2e-5, 1e-5),
    head_learning_rate=1e-2,
    end_to_end=True,
    loss=CosineSimilarityLoss,
)

# Create Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    metric="accuracy",
)

# Train and evaluate
trainer.train()
metrics = trainer.evaluate()

Deprecations

SetFitTrainer & DistillationSetFitTrainer

This is a soft deprecation, i.e. these can still be used like before. For reference, the following snippet still works.

First snippet from the README, using SetFitTrainer
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset


# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model: SetFitModel = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=4,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"},  # Map dataset columns to text/label expected by trainer
)

# Train and evaluate
trainer.train()
metrics = trainer.evaluate()
The corresponding output
Found cached dataset sst2 ([sic])
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 299.96it/s] 
fb86.arrow
Loading cached processed dataset at [sic]
Loading cached processed dataset at [sic]
Loading cached shuffled indices for dataset at [sic]
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.  
[sic]\demo_readme.py:20: DeprecationWarning: `SetFitTrainer` has been deprecated. Please use `Trainer` instead.
  trainer = SetFitTrainer(
Applying column mapping to training dataset
***** Running training *****
  Num examples = 640
  Num epochs = 1
  Total optimization steps = 160
  Total train batch size = 4
Iteration: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 160/160 [02:13<00:00,  1.20it/s] 
Epoch: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [02:13<00:00, 133.68s/it] 
Applying column mapping to evaluation dataset
***** Running evaluation *****
{'accuracy': 0.8818807339449541}
Note that the output contains a DeprecationWarning.

keep_body_frozen on trainer.unfreeze:

This is also a soft deprecation, so we can use this like before. Any use of keep_body_frozen results in a warning, and trainer.unfreeze(keep_body_frozen=True) is made equivalent to model.unfreeze("head"), while trainer.unfreeze(keep_body_frozen=False) is like model.unfreeze(). However, there should be no need for this method anymore, so we may be better off deprecating it fully.
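The mapping described above can be sketched roughly like this (an illustrative shim with stand-in classes, not the actual implementation):

```python
import warnings


class _Model:
    # Minimal stand-in that records unfreeze calls, for illustration only.
    def __init__(self):
        self.calls = []

    def unfreeze(self, component=None):
        self.calls.append(component)


class Trainer:
    def __init__(self, model):
        self.model = model

    def unfreeze(self, keep_body_frozen=None):
        # Hypothetical shim: warn on the deprecated parameter, then
        # forward to the equivalent SetFitModel.unfreeze call.
        if keep_body_frozen is not None:
            warnings.warn(
                "`keep_body_frozen` has been deprecated.", DeprecationWarning
            )
        if keep_body_frozen:
            self.model.unfreeze("head")
        else:
            self.model.unfreeze()


trainer = Trainer(_Model())
trainer.unfreeze(keep_body_frozen=True)   # -> model.unfreeze("head")
trainer.unfreeze(keep_body_frozen=False)  # -> model.unfreeze()
print(trainer.model.calls)  # ['head', None]
```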

Please note the following snippet:

trainer.freeze()  # Freeze the head
trainer.train()  # Train only the body

# unfreeze and train the body

This snippet was common when training with a differentiable head, but stops working after this PR, as trainer.freeze() will now fully freeze the model, after which trainer.train() fails because it can't perform a backward pass through the frozen sentence transformer body. This causes an error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
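The error can be reproduced with a minimal PyTorch snippet: once every parameter is frozen, the output tensor has no grad_fn, so backward() has nothing to propagate through.

```python
import torch

body = torch.nn.Linear(4, 4)
for param in body.parameters():
    param.requires_grad = False  # trainer.freeze() now freezes the full model

loss = body(torch.ones(1, 4)).sum()
try:
    loss.backward()
except RuntimeError as err:
    # element 0 of tensors does not require grad and does not have a grad_fn
    print(err)
```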

Trainer.train arguments

This is a fairly soft deprecation, so code with e.g. trainer.train(num_epochs=..., ...) no longer works like before. The arguments passed via keywords are simply ignored now, while a warning is thrown:

[sic]\demo_readme_diff.py:44: DeprecationWarning: `SetFitTrainer.train` does not accept keyword arguments anymore. Please provide training arguments via a `TrainingArguments` instance to the `SetFitTrainer`initialisation or the `SetFitTrainer.train` method.
  trainer.train(
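The behaviour can be sketched as follows (a hypothetical stand-in, not the actual SetFitTrainer): keyword arguments are discarded and a DeprecationWarning is emitted, while training proceeds with the stored arguments.

```python
import warnings


class SetFitTrainerSketch:
    # Illustrative only: shows the "ignore kwargs, warn instead" behaviour.
    def train(self, **kwargs):
        if kwargs:
            warnings.warn(
                "`SetFitTrainer.train` does not accept keyword arguments anymore.",
                DeprecationWarning,
            )
        # Training would proceed here using the stored TrainingArguments.
        return "trained"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = SetFitTrainerSketch().train(num_epochs=5)

print(result, len(caught))  # trained 1
```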

Training Arguments

Note that the TrainingArguments class must be able to hold separate values for training the embeddings vs. the classifier. The most difficult case is the learning rates, for which 3 separate values must exist:

  1. The learning rate of the sentence transformer body while training the embeddings.
  2. The learning rate of the sentence transformer body while training the classifier.
  3. The learning rate of the head while training the classifier.

For several other arguments (e.g. num_epochs, batch_size), two values must exist:

  1. For training the embeddings.
  2. For training the classifier.

For these last two cases, TrainingArguments(num_epochs=12) can be used, which will set both embedding_num_epochs and classifier_num_epochs to 12. Alternatively, TrainingArguments(num_epochs=(1, 12)) will set embedding_num_epochs to 1 and classifier_num_epochs to 12. Beyond that, TrainingArguments(embedding_num_epochs=1, classifier_num_epochs=12) is also allowed, i.e. they can be set directly. In practice, only embedding_num_epochs and classifier_num_epochs are used, while num_epochs exists only to make setting the arguments more natural.

The learning rate case is similar. For these, body_learning_rate and head_learning_rate must be set explicitly. Like before, body_learning_rate can be either set as a float or as a tuple of floats. If a tuple is given, then the first value is once again for training the embeddings, while the second value is for training the classifier. If instead only one value is given to body_learning_rate, then it is internally expanded to a tuple. Afterwards, body_embedding_learning_rate and body_classifier_learning_rate are either explicitly provided or implicitly set via body_learning_rate. Like before, only body_embedding_learning_rate and body_classifier_learning_rate are used, while body_learning_rate exists only to make setting the arguments more natural.
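The expansion logic for both cases might look roughly like this (a hypothetical sketch of the normalization described above, not the actual SetFit implementation):

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


def _as_pair(value):
    # A single value applies to both phases; a tuple is interpreted as
    # (embedding phase, classifier phase).
    return value if isinstance(value, tuple) else (value, value)


@dataclass
class TrainingArgumentsSketch:
    num_epochs: Union[int, Tuple[int, int]] = 1
    body_learning_rate: Union[float, Tuple[float, float]] = 2e-5
    embedding_num_epochs: Optional[int] = None
    classifier_num_epochs: Optional[int] = None

    def __post_init__(self):
        # num_epochs is only a convenience; the phase-specific values are
        # what is actually used, unless they were set explicitly.
        epochs = _as_pair(self.num_epochs)
        if self.embedding_num_epochs is None:
            self.embedding_num_epochs = epochs[0]
        if self.classifier_num_epochs is None:
            self.classifier_num_epochs = epochs[1]
        # body_learning_rate expands the same way:
        (self.body_embedding_learning_rate,
         self.body_classifier_learning_rate) = _as_pair(self.body_learning_rate)


args = TrainingArgumentsSketch(num_epochs=(1, 12), body_learning_rate=2e-5)
print(args.embedding_num_epochs, args.classifier_num_epochs)  # 1 12
print(args.body_classifier_learning_rate)  # 2e-05
```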

Beyond that, there is a new training argument called end_to_end. If the model is initialized with a differentiable head, then this argument specifies whether the sentence transformer body is frozen (if end_to_end=False) or not (if end_to_end is True) during training of the classifier. This replaces the previous need of trainer.freeze() before and during trainer.train() calls.
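In effect (sketched here with hypothetical names), end_to_end controls whether the body is frozen during the classifier phase:

```python
class ModelSketch:
    # Minimal stand-in tracking whether the body is frozen.
    def __init__(self):
        self.body_frozen = False

    def freeze(self, component):
        if component == "body":
            self.body_frozen = True


def train_classifier(model, end_to_end: bool):
    # Hypothetical: the body stays trainable only when end_to_end=True,
    # replacing the old trainer.freeze() / trainer.unfreeze() dance.
    if not end_to_end:
        model.freeze("body")
    return model.body_frozen


print(train_classifier(ModelSketch(), end_to_end=False))  # True
```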

See the Upgrading Guide for an example of how the TrainingArguments may be initialized in a relatively complex case.

Tests

The new tests\test_deprecated_trainer.py and tests\test_deprecated_trainer_distillation.py files are equivalent to the tests\test_trainer.py and tests\test_trainer_distillation.py files at ac958ae, with four exceptions. In tests\test_deprecated_trainer.py, three tests are skipped:

  • Two are skipped because they contain trainer.train(max_length=max_length, ...) and then expect a warning based on that. However, these arguments are now ignored, so no warning is ever thrown.
  • Another test is skipped because it contains the following deprecated snippet:
    trainer.freeze()  # Freeze the head
    trainer.train()  # Train only the body
    
    # unfreeze and train the body
    See the Deprecation section for more details on why this crashes.

And as a fourth change, from setfit.modeling import SupConLoss was changed to from setfit.losses import SupConLoss.

Using these tests, it is clear that users will not immediately face dozens of errors when updating. Instead, they may face several warnings and perhaps an error if they are using differentiable heads.

Discussion points

  1. How should users be notified on how to update their code? A section in the README? Via descriptive deprecation warnings? A link to this PR in the release notes?
  2. What should the deprecation timeline be? i.e. when do we remove SetFitTrainer altogether? We should include this in the deprecation warning messages.
  3. Should we simply deprecate Trainer.freeze() and Trainer.unfreeze()? They should not be needed anymore, and they are simply equivalent to calling trainer.model.freeze() and trainer.model.unfreeze(). I'm in favour of the deprecation.
  4. If we do not deprecate Trainer.freeze(), should we then introduce a special error message when people use trainer.freeze() followed by trainer.train(), which now breaks whereas it worked previously? (See the Deprecation section)
  5. Are we a fan of the name end_to_end for training a differentiable model in full, i.e. not freezing the body when training the classifier?
  6. Should other arguments to Trainer be absorbed into TrainingArguments? In particular, loss_class?
  7. Which release should this fall under? 0.6.0? 0.7.0? 1.0.0?

And perhaps most importantly, I'm interested in all of your feedback on how you feel about these changes.

To do:

There are several things that still need to be done before this can be merged:

  • Docstrings for all arguments in TrainingArguments.
  • Update all method docstrings.
  • Pick deprecation timeline for deprecated classes and methods.
  • Create additional tests to ensure that e.g. embedding_batch_size and classifier_batch_size are set correctly whenever batch_size is set.

I'll make a separate PR for updating the README/Notebook/scripts.

Feel free to ask me anything if you have any questions.

cc: @lewtun

  • Tom Aarsen

Note: This commit involves three deprecations: the SetFitTrainer class (and DistillationSetFitTrainer), additional arguments to Trainer.train, and the keep_body_frozen argument to Trainer.unfreeze. The first and last of these are 'graceful', i.e. old code will still work, but the Trainer.train changes are breaking in some situations: for example, num_epochs can no longer be passed to Trainer.train. The new 'deprecated' test files are identical to the old test files. The goal here is to test whether old behaviour is still possible. For the most part it is, with the exception of using Trainer.train with extra arguments. As a result, I skipped two tests in test_deprecated_trainer.py. Also note that docstrings have yet to be updated!
@tomaarsen
Member Author

Merged 4cad62c...174eb00 into this PR via 94106cc.
Merged 29c0348 into this PR via a39e772.

…rate

The reasoning is that with body_learning_rate, the tuple is for (training embedding phase, training classifier phase), which matches the tuples that you should give to num_epochs and batch_size.
@tomaarsen
Member Author

Merged 0cb8ffd into this PR via 7d4ad00.

Member

@lewtun lewtun left a comment


Wow, this is a real tour de force of a PR @tomaarsen - great job refactoring all this complexity into clean APIs 🚀 !

Overall the code looks great and most of my comments are just nits + some suggestions for the deprecation warnings. I'll respond to your discussion points in a separate comment

Inline review comments were left on the following files:

  • src/setfit/modeling.py
  • src/setfit/trainer.py
  • src/setfit/trainer_distillation.py
  • src/setfit/training_args.py
  • tests/test_deprecated_trainer.py
  • tests/test_training_args.py
@lewtun
Member

lewtun commented Feb 6, 2023

Regarding the discussion points:

How should users be notified on how to update their code? A section in the README? Via descriptive deprecation warnings? A link to this PR in the release notes?

In transformers we usually do this in two ways: add descriptive deprecation warnings that feature X will be removed in the next major version of the lib & add a "Breaking changes" section to the release notes with a link to this PR (example). I think we can adopt the same strategy here :)

What should the deprecation timeline be? i.e. when do we remove SetFitTrainer altogether? We should include this in the deprecation warning messages.

I would like to remove it in v1 if possible, so one approach for that could be to make a v0.6 or v0.7 release that has all the deprecation warnings included. Alternatively, we could merge this in its current form and tell users it will be removed in v2

Should we simply deprecate Trainer.freeze() and Trainer.unfreeze? They should not be needed anymore, and they are simply equivalent to calling trainer.model.freeze() and trainer.model.unfreeze(). I'm in favour of the deprecation.

Yes, please deprecate!

Are we a fan of the name end_to_end for training a differentiable model in full, i.e. not freezing the body when training the classifier?

I think it's OK, but happy to change it if you have a different name :)

Should other arguments to Trainer be absorbed into TrainingArguments? In particular, loss_class?

Yes, I've been wondering this too - I think it would be good to put all non-dataset-dependent objects into TrainingArguments - WDYT? I think technically we could also put metric and column_mapping there too right?

Which release should this fall under? 0.6.0? 0.7.0? 1.0.0?

I think we could aim v1 to be a big release where we are allowed to have a few breaking changes - WDYT?

@tomaarsen
Member Author

tomaarsen commented Feb 6, 2023

Merged 9b7f74e into this PR via aab2377.

I'm down to move metric, loss and column_mapping to TrainingArguments.

I'll try to get to work on your comments soon.

tomaarsen and others added 25 commits October 25, 2023 14:08
The old conditional was True with the default -1, not ideal
Sampling strategy (new sampler)
* Initial version for SetFit ABSA

* Create complete test suite for ABSA (100%,90%,96%)

Only push_to_hub is not under test

* Run formatting

* Allow initializing models with different span_context

* Remove resolved TODO

* Raise error if args is the wrong type

* Update column mapping, allow partial maps

* Remove model_init from ABSA Trainer, not used

* Split train into train_aspect and train_polarity

And reformat

* Prefix logs with aspect/polarity when training

* Add ABSA-specific model cards

* If spaCy doesn't agree with the start/end, just ignore those cases

* If there are no aspects, just return

* Elaborate on the required columns in the datasets

* Add Absa to the WIP docs
As it requires accelerate to be updated
solved an issue that was causing failures
@tomaarsen tomaarsen changed the base branch from main to v1.0.0-pre November 10, 2023 11:10
@tomaarsen tomaarsen changed the title Refactor to introduce Trainer & TrainingArguments Refactor to introduce Trainer & TrainingArguments, add SetFit ABSA Nov 10, 2023
@tomaarsen tomaarsen merged commit b636cd7 into huggingface:v1.0.0-pre Nov 10, 2023
18 checks passed