dvclive: Add huggingface updates
daavoo committed Aug 18, 2023
1 parent 933a11f commit a90cf9c
Showing 1 changed file with 66 additions and 14 deletions.
80 changes: 66 additions & 14 deletions content/docs/dvclive/ml-frameworks/huggingface.md
@@ -5,6 +5,12 @@ DVCLive allows you to add experiment tracking capabilities to your

## Usage

<p align='center'>
<a href="https://colab.research.google.com/github/iterative/dvclive/blob/main/examples/DVCLive-HuggingFace.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" />
</a>
</p>

Include the
[`DVCLiveCallback`](https://github.com/iterative/dvclive/blob/main/src/dvclive/huggingface.py)
in the callbacks list passed to your
@@ -26,31 +32,61 @@ trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))
trainer.train()
```

Each metric will be logged to:

```py
{Live.plots_dir}/metrics/{split}/{metric}.tsv
```
## Parameters

Where:
- `live` - (`None` by default) - Optional [`Live`] instance. If `None`, a new
instance will be created using `**kwargs`.

- `{Live.plots_dir}` is defined in [`Live`].
- `{split}` can be either `train` or `eval`.
- `{metric}` is the name provided by the framework.
- `log_model` - (`None` by default) - use
[`live.log_artifact()`](/doc/dvclive/live/log_artifact) to log checkpoints
created by the
[`Trainer`](https://huggingface.co/docs/transformers/main_classes/trainer#checkpoints).

## Parameters
- if `log_model is None` (default), no checkpoint is logged.

- `model_file` - (`None` by default) - The name of the file where the model will
be saved at the end of each `step`.
- if `log_model is True`, the final checkpoint is logged at the end of
training.

- `live` - (`None` by default) - Optional [`Live`] instance. If `None`, a new
instance will be created using `**kwargs`.
- if `log_model == 'all'`, all checkpoints are logged during training.
[`live.log_artifact()`](/doc/dvclive/live/log_artifact) is called with
`Trainer.output_dir`.

- `**kwargs` - Any additional arguments will be used to instantiate a new
[`Live`] instance. If `live` is used, the arguments are ignored.

## Examples

### Log model checkpoints

Use `log_model` to save checkpoints. DVCLive calls `Live.log_artifact()`
internally to save them.

If `log_model=True`, DVCLive saves a copy of the last checkpoint to the
`dvclive/artifacts` directory and annotates it with the name `last`, or `best`
if
[`args.load_best_model_at_end`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.load_best_model_at_end)
is set.

The saved artifact can then be consumed by the [Studio model registry] or in
automation scenarios.

- Save the final checkpoint at the end of training:

```python
from dvclive.huggingface import DVCLiveCallback

trainer.add_callback(
    DVCLiveCallback(save_dvc_exp=True, log_model=True))
```

- Save updates to the checkpoints directory whenever a new checkpoint is saved:

```python
from dvclive.huggingface import DVCLiveCallback

trainer.add_callback(
    DVCLiveCallback(save_dvc_exp=True, log_model="all"))
```
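As a rough illustration of the three `log_model` values and the `last`/`best`
naming described above, here is a minimal sketch. Both helpers
(`checkpoints_to_log` and `artifact_name`) are hypothetical and are not part of
DVCLive. They only restate the documented behaviour and simplify it: with
`log_model="all"`, DVCLive logs `Trainer.output_dir` rather than individual
checkpoint names.

```python
from typing import List, Optional, Union


def checkpoints_to_log(
    log_model: Optional[Union[bool, str]],
    saved_checkpoints: List[str],
) -> List[str]:
    """Hypothetical helper: which checkpoints each `log_model` value covers."""
    if log_model is None:
        # Default: no checkpoint is logged.
        return []
    if log_model is True:
        # Only the final checkpoint, logged at the end of training.
        return saved_checkpoints[-1:]
    if log_model == "all":
        # Every checkpoint saved during training.
        return list(saved_checkpoints)
    raise ValueError(f"unsupported log_model value: {log_model!r}")


def artifact_name(load_best_model_at_end: bool) -> str:
    """Hypothetical helper: annotation used when `log_model=True`."""
    return "best" if load_best_model_at_end else "last"
```

For example, `checkpoints_to_log(True, ["checkpoint-500", "checkpoint-1000"])`
returns only the final entry, while passing `"all"` returns both.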

### Passing additional DVCLive arguments

- Using `live` to pass an existing [`Live`] instance.

```python
@@ -75,4 +111,20 @@ trainer.add_callback(
DVCLiveCallback(save_dvc_exp=True, dir="custom_dir"))
```

## Output format

Each metric will be logged to:

```py
{Live.plots_dir}/metrics/{split}/{metric}.tsv
```

Where:

- `{Live.plots_dir}` is defined in [`Live`].
- `{split}` can be either `train` or `eval`.
- `{metric}` is the name provided by the framework.
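The path template above can be resolved and read back with the standard
library alone. This is a sketch under stated assumptions: `metric_file` and
`read_metric` are hypothetical helpers (not DVCLive APIs), the value passed as
`plots_dir` stands in for `Live.plots_dir` (assumed here to be `dvclive/plots`
when `Live` uses its default directory), and each `.tsv` file is assumed to
start with a header row.

```python
import csv
from pathlib import Path
from typing import Dict, List


def metric_file(plots_dir: str, split: str, metric: str) -> Path:
    """Build {Live.plots_dir}/metrics/{split}/{metric}.tsv (hypothetical helper)."""
    if split not in ("train", "eval"):
        raise ValueError(f"split must be 'train' or 'eval', got {split!r}")
    return Path(plots_dir) / "metrics" / split / f"{metric}.tsv"


def read_metric(path: Path) -> List[Dict[str, str]]:
    """Read a tab-separated metric file into a list of rows keyed by header."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


# Assumed default location; adjust if `Live` was created with a custom `dir`.
loss_path = metric_file("dvclive/plots", "eval", "loss")
```

Values come back as strings, so convert them (for example with `float`) before
plotting or comparing.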

[`live`]: /doc/dvclive/live
[studio model registry]:
/doc/studio/user-guide/model-registry/what-is-a-model-registry
