![Neptune + Transformers](https://neptune.ai/wp-content/uploads/2023/09/hf.svg)

# Using the 🤗 Transformers Integration

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/examples/blob/main/integrations-and-supported-tools/transformers/notebooks/Neptune_Transformers.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a><a target="_blank" href="https://github.com/neptune-ai/examples/blob/main/integrations-and-supported-tools/transformers/notebooks/Neptune_Transformers.ipynb">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a><a target="_blank" href="https://app.neptune.ai/o/common/org/huggingface-integration/e/HUG-1452/dashboard/Overview-9887a96a-f93c-4b18-80f6-23a6bff1ef71"> 
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a><a target="_blank" href="https://docs.neptune.ai/integrations/transformers/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

Neptune provides an integration with 🤗 Transformers. To log metadata from 🤗 Transformers training or finetuning, you only need to add a few lines of code.

You can integrate metadata tracking with Neptune in one of two ways:
* Pass `report_to="neptune"` to the Trainer arguments.
* Set up a Neptune callback and pass it to the Trainer callbacks.

In this guide, we will look at both options for logging metadata.

By the end of this guide, you will be able to use the 🤗 Transformers integration to log:
* Train loss
* Evaluation loss
* Trainer parameters
* Model parameters
* Model checkpoint


## Before you start

This notebook example lets you try out Neptune anonymously, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and dependencies

In [None]:
%pip install -U neptune transformers[torch,sklearn] datasets evaluate scipy

In [None]:
import neptune

project = "common/huggingface-integration"

## Setting up model and data for training

In [None]:
from datasets import load_dataset
from evaluate import load
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
from transformers.integrations import NeptuneCallback

Loading the data

In [None]:
task = "cola"
model_checkpoint = "prajjwal1/bert-tiny"
batch_size = 16
dataset = load_dataset("glue", task)
metric = load("glue", task)
num_labels = 2
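
The `metric` loaded above is not used again in this notebook, so only the losses end up in the evaluation logs. If you also want the GLUE task metric to be reported, one option is to pass a `compute_metrics` function to the `Trainer`. The following is a minimal sketch under that assumption and is not part of the original example: `make_compute_metrics` is a hypothetical helper name, and it only relies on the metric object exposing `compute(predictions=..., references=...)`.

```python
def make_compute_metrics(metric):
    """Build a `compute_metrics` function for Trainer from an `evaluate` metric.

    Hypothetical helper: `metric` is any object with a
    `compute(predictions=..., references=...)` method.
    """

    def compute_metrics(eval_pred):
        # Trainer hands over (logits, labels); logits has one row per example.
        logits, labels = eval_pred
        # Pick the index of the highest-scoring class in each row.
        predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
        return metric.compute(predictions=predictions, references=list(labels))

    return compute_metrics
```

To try it, you would add `compute_metrics=make_compute_metrics(metric)` to the `Trainer(...)` calls later in this guide. The returned values should then appear alongside the evaluation loss.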

Create Tokenizer for the model

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

Preprocess the dataset

In [None]:
def preprocess_function(examples):
    return tokenizer(examples["sentence"], truncation=True)

In [None]:
encoded_dataset = dataset.map(preprocess_function, batched=True)

Instantiate the model for finetuning

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

#### (Option 1) Passing `report_to="neptune"` to TrainingArguments

When we pass `report_to="neptune"`, the integration takes care of creating a Neptune `run` to log the metadata. To use the `report_to` approach, we need to set the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables.

In [None]:
import os

os.environ["NEPTUNE_API_TOKEN"] = neptune.ANONYMOUS_API_TOKEN
os.environ["NEPTUNE_PROJECT"] = "common/huggingface-integration"

##### Log to your own project instead

Replace the code above with the following:

```python
import os
from getpass import getpass

os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"  # replace with your own
```

To find your API token and full project name:

1. [Log in to Neptune](https://app.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. The workspace name is displayed in the top-left corner of the app. To copy the project path, in the top-right corner, open the settings menu and select **Properties**.

For more help, see [Setting Neptune credentials](https://docs.neptune.ai/setup/setting_credentials) in the Neptune docs.
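
Because the `report_to` approach reads its credentials from the environment, it can help to confirm that both variables are set before starting a long training job. The following is a minimal sketch, not part of the integration itself: `check_neptune_env` is a hypothetical helper, and the `workspace-name/project-name` shape check is an assumption based on the project name format shown above.

```python
import os

REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def check_neptune_env(env=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    # Report every required variable that is missing or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    project = env.get("NEPTUNE_PROJECT", "")
    if project:
        # Assumed format: exactly one slash, with text on both sides.
        workspace, sep, name = project.partition("/")
        if not (workspace and sep and name) or "/" in name:
            problems.append(
                "NEPTUNE_PROJECT should look like 'workspace-name/project-name'"
            )
    return problems
```

Calling `check_neptune_env()` with no arguments inspects the current process environment. Passing a plain dictionary instead makes the check easy to try without touching real credentials.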

In [None]:
model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    eval_strategy="epoch",  # named `evaluation_strategy` in transformers < 4.41
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=2,
    weight_decay=0.01,
    load_best_model_at_end=True,
    report_to="neptune",
)

validation_key = "validation"

In [None]:
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    tokenizer=tokenizer,
)

In [None]:
trainer.train()

#### (Option 2) Setting up the NeptuneCallback

Create the training arguments for model finetuning.  
In this case, we set `report_to="none"` so that Transformers does not create a callback for us, as it did in Option 1.

In [None]:
model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    eval_strategy="epoch",  # named `evaluation_strategy` in transformers < 4.41
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=2,
    weight_decay=0.01,
    load_best_model_at_end=True,
    report_to="none",
)

validation_key = "validation"

##### Start a run

To create a new run for tracking the metadata, you tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in the public project [common/huggingface-integration](https://app.neptune.ai/common/huggingface-integration). **Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.

To log to your own project instead, replace the code below with the following:

```python
from getpass import getpass

run = neptune.init_run(
    api_token=getpass("Enter your Neptune API token: "),
    project="workspace-name/project-name",  # replace with your own (see instructions below)
)
```

To find your API token and full project name:

1. [Log in to Neptune](https://app.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. The workspace name is displayed in the top-left corner of the app. To copy the project path, in the top-right corner, open the settings menu and select **Properties**.

For more help, see [Setting Neptune credentials](https://docs.neptune.ai/setup/setting_credentials) in the Neptune docs.

In [None]:
run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project=project,
)

##### Instantiate NeptuneCallback

We pass the `run` to the Neptune callback. The callback will take care of logging the metadata during the training phase. You can customize the metadata that is logged by passing additional arguments to the callback.

See the [Transformers integration guide](https://docs.neptune.ai/integrations/transformers) for details.
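
As an illustration of such customization, the sketch below collects callback options in one place and validates them before they reach the callback. It is an assumption-laden example rather than part of the integration: `neptune_callback_options` is a hypothetical helper, and the accepted `log_checkpoints` values (`"same"`, `"last"`, `"best"`, or `None`) and the `log_parameters` flag reflect our reading of the `NeptuneCallback` documentation, so check the guide linked above for your installed version.

```python
# Assumed set of accepted values; verify against the integration guide.
VALID_LOG_CHECKPOINTS = (None, "same", "last", "best")


def neptune_callback_options(log_checkpoints=None, log_parameters=True,
                             load_best_model_at_end=False):
    """Return keyword arguments for NeptuneCallback after basic validation."""
    if log_checkpoints not in VALID_LOG_CHECKPOINTS:
        raise ValueError(
            f"log_checkpoints must be one of {VALID_LOG_CHECKPOINTS}, "
            f"got {log_checkpoints!r}"
        )
    # Assumption: uploading the "best" checkpoint only makes sense when the
    # Trainer tracks the best model, that is, load_best_model_at_end=True.
    if log_checkpoints == "best" and not load_best_model_at_end:
        raise ValueError('log_checkpoints="best" needs load_best_model_at_end=True')
    return {"log_checkpoints": log_checkpoints, "log_parameters": log_parameters}
```

With the training arguments used in this guide, which set `load_best_model_at_end=True`, this could be used as `NeptuneCallback(run=run, **neptune_callback_options("best", load_best_model_at_end=True))`.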

Instantiate the NeptuneCallback with our `run`

In [None]:
neptune_callback = NeptuneCallback(
    run=run,
    log_checkpoints=None,  # Update to "last" or "best" if you want to log model checkpoints to Neptune
)

In [None]:
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    callbacks=[neptune_callback],
    tokenizer=tokenizer,
)

**Note**: The `run` object is stopped once `Trainer.train()` is finished. As such, we don't need to call `run.stop()` explicitly (which is otherwise required in interactive environments, such as Jupyter Notebook).

In [None]:
trainer.train()

## Explore the results in Neptune

We just finetuned our model with the new data. Let's see an example of the data that was logged to Neptune.

You can also check out an [example run](https://new-ui.neptune.ai/o/common/org/huggingface-integration/runs/details?viewId=standard-view&detailsTab=metadata&shortId=HUG-1467&type=run).