Renamed evaluation_strategy to eval_strategy #538

Merged: 2 commits, Sep 12, 2024
README.md (1 addition, 1 deletion):

@@ -69,7 +69,7 @@ model = SetFitModel.from_pretrained(
 args = TrainingArguments(
     batch_size=16,
     num_epochs=4,
-    evaluation_strategy="epoch",
+    eval_strategy="epoch",
     save_strategy="epoch",
     load_best_model_at_end=True,
 )
docs/source/en/how_to/absa.mdx (1 addition, 1 deletion):

@@ -87,7 +87,7 @@ args = TrainingArguments(
     num_epochs=5,
     use_amp=True,
     batch_size=128,
-    evaluation_strategy="steps",
+    eval_strategy="steps",
     eval_steps=50,
     save_steps=50,
     load_best_model_at_end=True,
docs/source/en/how_to/v1.0.0_migration_guide.mdx (6 additions, 6 deletions):

@@ -42,7 +42,7 @@ This list contains new functionality that can be used starting from v1.0.0.
 * [`AbsaTrainer`] and [`AbsaModel`] have been introduced for applying [SetFit for Aspect Based Sentiment Analysis](absa).
 * [`Trainer`] now supports a `callbacks` argument for a list of [`transformers` `TrainerCallback` instances](https://huggingface.co/docs/transformers/main/en/main_classes/callback).
 * By default, all installed callbacks integrated with `transformers` are supported, including [`TensorBoardCallback`](https://huggingface.co/docs/transformers/main/en/main_classes/callback#transformers.integrations.TensorBoardCallback), [`WandbCallback`](https://huggingface.co/docs/transformers/main/en/main_classes/callback#transformers.integrations.WandbCallback) to log training logs to [TensorBoard](https://www.tensorflow.org/tensorboard) and [W&B](https://wandb.ai), respectively.
-* The [`Trainer`] will now print `embedding_loss` in the terminal, as well as `eval_embedding_loss` if `evaluation_strategy` is set to `"epoch"` or `"steps"` in [`TrainingArguments`].
+* The [`Trainer`] will now print `embedding_loss` in the terminal, as well as `eval_embedding_loss` if `eval_strategy` is set to `"epoch"` or `"steps"` in [`TrainingArguments`].
 * [`Trainer.evaluate`] now works with string labels.
 * An updated contrastive pair sampler increases the variety of training pairs.
 * [`TrainingArguments`] supports various new arguments:
@@ -65,14 +65,14 @@ This list contains new functionality that can be used starting from v1.0.0.

   * `logging_first_step`: Whether to log and evaluate the first `global_step` or not.
   * `logging_steps`: Number of update steps between two logs if `logging_strategy="steps"`.
-  * `evaluation_strategy`: The evaluation strategy to adopt during training. Possible values are:
+  * `eval_strategy`: The evaluation strategy to adopt during training. Possible values are:

     - `"no"`: No evaluation is done during training.
     - `"steps"`: Evaluation is done (and logged) every `eval_steps`.
     - `"epoch"`: Evaluation is done at the end of each epoch.

-  * `eval_steps`: Number of update steps between two evaluations if `evaluation_strategy="steps"`. Will default to the same as `logging_steps` if not set.
-  * `eval_delay`: Number of epochs or steps to wait for before the first evaluation can be performed, depending on the `evaluation_strategy`.
+  * `eval_steps`: Number of update steps between two evaluations if `eval_strategy="steps"`. Will default to the same as `logging_steps` if not set.
+  * `eval_delay`: Number of epochs or steps to wait for before the first evaluation can be performed, depending on the `eval_strategy`.
   * `eval_max_steps`: If set to a positive number, the total number of evaluation steps to perform. The evaluation may stop before reaching the set number of steps when all data is exhausted.
   * `save_strategy`: The checkpoint save strategy to adopt during training. Possible values are:
@@ -81,12 +81,12 @@ This list contains new functionality that can be used starting from v1.0.0.
     - `"steps"`: Save is done every `save_steps`.

   * `save_steps`: Number of updates steps before two checkpoint saves if `save_strategy="steps"`.
-  * `save_total_limit`: If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in `output_dir`. Note, the best model is always preserved if the `evaluation_strategy` is not `"no"`.
+  * `save_total_limit`: If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in `output_dir`. Note, the best model is always preserved if the `eval_strategy` is not `"no"`.
   * `load_best_model_at_end`: Whether or not to load the best model found during training at the end of training.

   <Tip>

-  When set to `True`, the parameters `save_strategy` needs to be the same as `evaluation_strategy`, and in
+  When set to `True`, the parameters `save_strategy` needs to be the same as `eval_strategy`, and in
   the case it is "steps", `save_steps` must be a round multiple of `eval_steps`.

   </Tip>
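The strategy rules listed in the migration guide above interact with one another (implied step-based evaluation, the `logging_steps` fallback, and the `load_best_model_at_end` compatibility checks). As a standalone illustration, this toy dataclass (the `EvalArgs` name is hypothetical, not setfit's API) models just those behaviours:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalArgs:
    """Toy stand-in modelling only the eval/save strategy rules
    described in the migration guide; not setfit's real class."""

    eval_strategy: str = "no"  # "no", "steps", or "epoch"
    eval_steps: Optional[int] = None
    save_strategy: str = "epoch"
    save_steps: int = 500
    logging_steps: int = 50
    load_best_model_at_end: bool = False

    def __post_init__(self) -> None:
        # Passing eval_steps implies step-based evaluation.
        if self.eval_steps is not None and self.eval_strategy == "no":
            self.eval_strategy = "steps"
        # Under "steps", eval_steps falls back to logging_steps.
        if self.eval_strategy == "steps" and not self.eval_steps:
            if self.logging_steps > 0:
                self.eval_steps = self.logging_steps
            else:
                raise ValueError(
                    'eval_strategy="steps" requires non-zero eval_steps or logging_steps'
                )
        if self.load_best_model_at_end:
            if self.eval_strategy != self.save_strategy:
                raise ValueError("save and eval strategy must match")
            if self.eval_strategy == "steps" and self.save_steps % self.eval_steps != 0:
                raise ValueError("save_steps must be a round multiple of eval_steps")
```

For example, `EvalArgs(eval_steps=5)` ends up with `eval_strategy == "steps"`, and a bare `EvalArgs(eval_strategy="steps")` inherits `eval_steps` from `logging_steps`.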
scripts/setfit/distillation_baseline.py (1 addition, 1 deletion):

@@ -82,7 +82,7 @@ def standard_model_distillation(self, train_raw_student, x_test, y_test, num_cla
     per_device_train_batch_size=self.batch_size,
     per_device_eval_batch_size=self.batch_size,
     num_train_epochs=self.num_epochs,
-    evaluation_strategy="no",
+    eval_strategy="no",
     save_strategy="no",
     load_best_model_at_end=False,
     weight_decay=0.01,
scripts/setfit/run_fewshot.py (3 additions, 3 deletions):

@@ -59,7 +59,7 @@ def parse_args():
     parser.add_argument("--override_results", default=False, action="store_true")
     parser.add_argument("--keep_body_frozen", default=False, action="store_true")
     parser.add_argument("--add_data_augmentation", default=False)
-    parser.add_argument("--evaluation_strategy", default=False)
+    parser.add_argument("--eval_strategy", default=False)

     args = parser.parse_args()

@@ -149,8 +149,8 @@ def main():
         num_epochs=args.num_epochs,
         num_iterations=args.num_iterations,
     )
-    if not args.evaluation_strategy:
-        trainer.args.evaluation_strategy = "no"
+    if not args.eval_strategy:
+        trainer.args.eval_strategy = "no"
    if args.classifier == "pytorch":
        trainer.freeze()
    trainer.train()
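The two hunks above pair an argparse flag defaulting to `False` with a later falsy-value fallback to the `"no"` strategy. As a self-contained sketch of that pattern (the `parse_eval_args` helper is hypothetical, not part of the repository):

```python
import argparse


def parse_eval_args(argv: list) -> argparse.Namespace:
    """Sketch of the flag-plus-fallback pattern from run_fewshot.py."""
    parser = argparse.ArgumentParser()
    # Default is False (falsy), so an omitted flag can be detected later.
    parser.add_argument("--eval_strategy", default=False)
    args = parser.parse_args(argv)
    # Any falsy value (flag omitted) resolves to no mid-training evaluation.
    if not args.eval_strategy:
        args.eval_strategy = "no"
    return args
```

With no arguments the strategy resolves to `"no"`; passing `--eval_strategy steps` keeps the caller's choice.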
scripts/transformers/run_fewshot.py (1 addition, 1 deletion):

@@ -94,7 +94,7 @@ def compute_metrics(pred):
     per_device_train_batch_size=batch_size,
     per_device_eval_batch_size=batch_size,
     weight_decay=0.01,
-    evaluation_strategy="epoch",
+    eval_strategy="epoch",
     logging_steps=100,
     save_strategy="no",
     fp16=True,
scripts/transformers/run_fewshot_multilingual.py (1 addition, 1 deletion):

@@ -119,7 +119,7 @@ def compute_metrics(pred):
     per_device_train_batch_size=batch_size,
     per_device_eval_batch_size=batch_size,
     weight_decay=0.01,
-    evaluation_strategy="epoch",
+    eval_strategy="epoch",
     logging_steps=100,
     save_strategy="no",
     fp16=True,
scripts/transformers/run_full.py (1 addition, 1 deletion):

@@ -85,7 +85,7 @@ def compute_metrics(pred):
     per_device_train_batch_size=batch_size,
     per_device_eval_batch_size=batch_size,
     weight_decay=0.001,
-    evaluation_strategy="epoch",
+    eval_strategy="epoch",
     logging_steps=100,
     metric_for_best_model=metric,
     load_best_model_at_end=True,
scripts/transformers/run_full_multilingual.py (1 addition, 1 deletion):

@@ -104,7 +104,7 @@ def compute_metrics(pred):
     per_device_train_batch_size=batch_size,
     per_device_eval_batch_size=batch_size,
     weight_decay=0.01,
-    evaluation_strategy="epoch",
+    eval_strategy="epoch",
     logging_steps=100,
     metric_for_best_model="eval_loss",
     load_best_model_at_end=True,
src/setfit/model_card.py (1 addition, 1 deletion):

@@ -80,7 +80,7 @@ def on_train_begin(
     "logging_strategy",
     "logging_first_step",
     "logging_steps",
-    "evaluation_strategy",
+    "eval_strategy",
     "eval_steps",
     "eval_delay",
     "save_strategy",
src/setfit/trainer.py (1 addition, 1 deletion):

@@ -443,7 +443,7 @@ def train_embeddings(
         train_dataloader, loss_func, batch_size, num_unique_pairs = self.get_dataloader(
             x_train, y_train, args=args, max_pairs=train_max_pairs
         )
-        if x_eval is not None and args.evaluation_strategy != IntervalStrategy.NO:
+        if x_eval is not None and args.eval_strategy != IntervalStrategy.NO:
             eval_max_pairs = -1 if args.eval_max_steps == -1 else args.eval_max_steps * args.embedding_batch_size
             eval_dataloader, _, _, _ = self.get_dataloader(x_eval, y_eval, args=args, max_pairs=eval_max_pairs)
         else:
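The comparison in this hunk works because `IntervalStrategy` is a string-backed enum, so the field compares cleanly whether it holds a coerced enum member or a raw string. A minimal stand-in (reduced to the three members relevant here; the real class lives in `transformers.trainer_utils`) demonstrates the behaviour:

```python
from enum import Enum


class IntervalStrategy(str, Enum):
    """Minimal stand-in for transformers' string-backed IntervalStrategy."""
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"


# Value lookup coerces a plain string to the matching member, which is
# what TrainingArguments.__post_init__ does before Trainer compares it.
strategy = IntervalStrategy("steps")
```

Because the enum mixes in `str`, `IntervalStrategy.NO == "no"` also holds, so the `!= IntervalStrategy.NO` guard behaves the same for un-coerced string values.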
src/setfit/training_args.py (22 additions, 15 deletions):

@@ -124,19 +124,19 @@ class TrainingArguments:
             Whether to log and evaluate the first `global_step` or not.
         logging_steps (`int`, defaults to 50):
             Number of update steps between two logs if `logging_strategy="steps"`.
-        evaluation_strategy (`str` or [`~transformers.trainer_utils.IntervalStrategy`], *optional*, defaults to `"no"`):
+        eval_strategy (`str` or [`~transformers.trainer_utils.IntervalStrategy`], *optional*, defaults to `"no"`):
             The evaluation strategy to adopt during training. Possible values are:

                 - `"no"`: No evaluation is done during training.
                 - `"steps"`: Evaluation is done (and logged) every `eval_steps`.
                 - `"epoch"`: Evaluation is done at the end of each epoch.

         eval_steps (`int`, *optional*):
-            Number of update steps between two evaluations if `evaluation_strategy="steps"`. Will default to the same
+            Number of update steps between two evaluations if `eval_strategy="steps"`. Will default to the same
             value as `logging_steps` if not set.
         eval_delay (`float`, *optional*):
             Number of epochs or steps to wait for before the first evaluation can be performed, depending on the
-            evaluation_strategy.
+            eval_strategy.
         eval_max_steps (`int`, defaults to `-1`):
             If set to a positive number, the total number of evaluation steps to perform. The evaluation may stop
             before reaching the set number of steps when all data is exhausted.
@@ -151,13 +151,13 @@ class TrainingArguments:
             Number of updates steps before two checkpoint saves if `save_strategy="steps"`.
         save_total_limit (`int`, *optional*, defaults to `1`):
             If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in
-            `output_dir`. Note, the best model is always preserved if the `evaluation_strategy` is not `"no"`.
+            `output_dir`. Note, the best model is always preserved if the `eval_strategy` is not `"no"`.
         load_best_model_at_end (`bool`, *optional*, defaults to `False`):
             Whether or not to load the best model found during training at the end of training.

             <Tip>

-            When set to `True`, the parameters `save_strategy` needs to be the same as `evaluation_strategy`, and in
+            When set to `True`, the parameters `save_strategy` needs to be the same as `eval_strategy`, and in
             the case it is "steps", `save_steps` must be a round multiple of `eval_steps`.

             </Tip>
@@ -208,7 +208,8 @@ class TrainingArguments:
     logging_first_step: bool = True
     logging_steps: int = 50

-    evaluation_strategy: str = "no"
+    eval_strategy: str = "no"
+    evaluation_strategy: str = field(default="no", repr=False, init=False)  # Softly deprecated
     eval_steps: Optional[int] = None
     eval_delay: int = 0
     eval_max_steps: int = -1
@@ -251,30 +252,36 @@ def __post_init__(self) -> None:
             self.logging_dir = default_logdir()

         self.logging_strategy = IntervalStrategy(self.logging_strategy)
-        self.evaluation_strategy = IntervalStrategy(self.evaluation_strategy)
+        if self.evaluation_strategy and not self.eval_strategy:
+            logger.warning(
+                "The `evaluation_strategy` argument is deprecated and will be removed in a future version. "
+                "Please use `eval_strategy` instead."
+            )
+            self.eval_strategy = self.evaluation_strategy
+        self.eval_strategy = IntervalStrategy(self.eval_strategy)

-        if self.eval_steps is not None and self.evaluation_strategy == IntervalStrategy.NO:
-            logger.info('Using `evaluation_strategy="steps"` as `eval_steps` is defined.')
-            self.evaluation_strategy = IntervalStrategy.STEPS
+        if self.eval_steps is not None and self.eval_strategy == IntervalStrategy.NO:
+            logger.info('Using `eval_strategy="steps"` as `eval_steps` is defined.')
+            self.eval_strategy = IntervalStrategy.STEPS

         # eval_steps has to be defined and non-zero, fallbacks to logging_steps if the latter is non-zero
-        if self.evaluation_strategy == IntervalStrategy.STEPS and (self.eval_steps is None or self.eval_steps == 0):
+        if self.eval_strategy == IntervalStrategy.STEPS and (self.eval_steps is None or self.eval_steps == 0):
             if self.logging_steps > 0:
                 self.eval_steps = self.logging_steps
             else:
                 raise ValueError(
-                    f"evaluation strategy {self.evaluation_strategy} requires either non-zero `eval_steps` or"
+                    f"evaluation strategy {self.eval_strategy} requires either non-zero `eval_steps` or"
                     " `logging_steps`"
                 )

         # Sanity checks for load_best_model_at_end: we require save and eval strategies to be compatible.
         if self.load_best_model_at_end:
-            if self.evaluation_strategy != self.save_strategy:
+            if self.eval_strategy != self.save_strategy:
                 raise ValueError(
                     "`load_best_model_at_end` requires the save and eval strategy to match, but found\n- Evaluation "
-                    f"strategy: {self.evaluation_strategy}\n- Save strategy: {self.save_strategy}"
+                    f"strategy: {self.eval_strategy}\n- Save strategy: {self.save_strategy}"
                 )
-            if self.evaluation_strategy == IntervalStrategy.STEPS and self.save_steps % self.eval_steps != 0:
+            if self.eval_strategy == IntervalStrategy.STEPS and self.save_steps % self.eval_steps != 0:
                 raise ValueError(
                     "`load_best_model_at_end` requires the saving steps to be a round multiple of the evaluation "
                     f"steps, but found {self.save_steps}, which is not a round multiple of {self.eval_steps}."
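The soft-deprecation in this file keeps the old field on the class while routing its value to the new name. The following standalone sketch shows the same pattern with plain dataclasses (the `ArgsSketch` name is hypothetical; setfit's real class carries many more fields). Note one deliberate deviation: it uses `None` as the "not set" sentinel, since with string defaults like `"no"` a truthiness check such as `not self.eval_strategy` can never fire for a non-empty value.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class ArgsSketch:
    """Standalone sketch of a rename with a soft-deprecated alias."""

    # New canonical name; None means "not set by the caller".
    eval_strategy: Optional[str] = None
    # Old name, accepted temporarily so existing call sites keep working.
    evaluation_strategy: Optional[str] = None

    def __post_init__(self) -> None:
        # If only the old spelling was provided, warn and forward it.
        if self.evaluation_strategy is not None and self.eval_strategy is None:
            logger.warning(
                "`evaluation_strategy` is deprecated; use `eval_strategy` instead."
            )
            self.eval_strategy = self.evaluation_strategy
        # Fall back to the documented default.
        if self.eval_strategy is None:
            self.eval_strategy = "no"
```

Callers using the old keyword still get the right behaviour (`ArgsSketch(evaluation_strategy="steps")` resolves `eval_strategy` to `"steps"`), while new code passes `eval_strategy` directly.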
tests/span/test_model_card.py (1 addition, 1 deletion):

@@ -25,7 +25,7 @@ def test_model_card(absa_dataset: Dataset, tmp_path: Path) -> None:
     eval_steps=1,
     logging_steps=1,
     max_steps=2,
-    evaluation_strategy="steps",
+    eval_strategy="steps",
 )
 trainer = AbsaTrainer(
     model=model,
tests/test_model_card.py (1 addition, 1 deletion):

@@ -35,7 +35,7 @@ def test_model_card(tmp_path: Path) -> None:
     eval_steps=1,
     logging_steps=1,
     max_steps=2,
-    evaluation_strategy="steps",
+    eval_strategy="steps",
 )
 trainer = Trainer(
     model=model,
tests/test_trainer.py (1 addition, 1 deletion):

@@ -590,7 +590,7 @@ def test_train_load_best(model: SetFitModel, tmp_path: Path, caplog: LogCaptureF
     output_dir=tmp_path,
     save_steps=5,
     eval_steps=5,
-    evaluation_strategy="steps",
+    eval_strategy="steps",
     load_best_model_at_end=True,
     num_epochs=5,
 )
tests/test_training_args.py (6 additions, 6 deletions):

@@ -72,29 +72,29 @@ def test_report_to(self):

     def test_eval_steps_without_eval_strat(self):
         args = TrainingArguments(eval_steps=5)
-        self.assertEqual(args.evaluation_strategy, IntervalStrategy.STEPS)
+        self.assertEqual(args.eval_strategy, IntervalStrategy.STEPS)

     def test_eval_strat_steps_without_eval_steps(self):
-        args = TrainingArguments(evaluation_strategy="steps")
+        args = TrainingArguments(eval_strategy="steps")
         self.assertEqual(args.eval_steps, args.logging_steps)
         with self.assertRaises(ValueError):
-            TrainingArguments(evaluation_strategy="steps", logging_steps=0, logging_strategy="no")
+            TrainingArguments(eval_strategy="steps", logging_steps=0, logging_strategy="no")

     def test_load_best_model(self):
         with self.assertRaises(ValueError):
-            TrainingArguments(load_best_model_at_end=True, evaluation_strategy="steps", save_strategy="epoch")
+            TrainingArguments(load_best_model_at_end=True, eval_strategy="steps", save_strategy="epoch")
         with self.assertRaises(ValueError):
             TrainingArguments(
                 load_best_model_at_end=True,
-                evaluation_strategy="steps",
+                eval_strategy="steps",
                 save_strategy="steps",
                 eval_steps=100,
                 save_steps=50,
             )
         # No error: save_steps is a round multiple of eval_steps
         TrainingArguments(
             load_best_model_at_end=True,
-            evaluation_strategy="steps",
+            eval_strategy="steps",
             save_strategy="steps",
             eval_steps=50,
             save_steps=100,