# Using the 🤗 Transformers Integration

Neptune provides an integration with 🤗 Transformers, so logging metadata from 🤗 Transformers training or fine-tuning takes only a few additional lines of code.

You can set up metadata tracking with Neptune in either of two ways:
* Passing `report_to="neptune"` to the training arguments.
* Setting up a Neptune callback and passing it to the `Trainer` callbacks.

In this guide, we look at the second option: setting up a Neptune callback and passing it to the `Trainer`.
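
For reference, the first option can be sketched as follows. This is a minimal sketch, not part of the notebook: it assumes that without an explicit run object the integration reads the project (and API token) from the `NEPTUNE_PROJECT` and `NEPTUNE_API_TOKEN` environment variables, and the output folder name is a hypothetical placeholder.

```python
import os

# Assumption: with report_to="neptune" no run object is passed in, so the
# integration is expected to pick up the project and token from the environment.
# The project below is the public one used later in this guide.
os.environ.setdefault("NEPTUNE_PROJECT", "common/quickstarts")
# NEPTUNE_API_TOKEN should also be set, ideally outside the notebook.

# Keyword arguments for TrainingArguments; output_dir is a hypothetical name.
training_kwargs = {
    "output_dir": "distilbert-finetuned-cola",
    "report_to": "neptune",  # turn on the built-in Neptune reporting
}

# In the notebook (requires transformers and torch):
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)
```

The rest of this guide sets `report_to="none"` instead and passes the callback explicitly, which gives direct control over the run object.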

By the end of this guide, you will be able to use the Hugging Face Transformers integration to log:
* Train Loss
* Evaluation Loss
* Trainer parameters
* Model parameters
* Model checkpoint

[See this example in Neptune](https://app.neptune.ai/o/showcase/org/project-text-summarization-hf/e/PROJ-138/dashboard/Custom-Dashboard-97370bc5-ee32-48ce-a630-fc806a370e13)



## Before you start

This notebook example lets you try out Neptune anonymously, with zero setup.

* If you're running the notebook on your local machine, you need to have [Python](https://www.python.org/downloads/) and [pip](https://pypi.org/project/pip/) installed.
* If you want to see the example logged to your own workspace instead:
    * Create a Neptune account → [Take me to registration](https://neptune.ai/register)
    * Create a Neptune project that you will use for tracking metadata → [Tell me more about projects](https://docs.neptune.ai/administration/projects)

## Install Neptune and dependencies

In [None]:
!pip install neptune-client transformers==4.22.0 datasets==2.5.1 torch==1.11 scipy==1.7.2 scikit-learn==1.0.1 numpy==1.22.0

## Start a run

To connect your script to Neptune and create a new run, you need to tell Neptune:
* **Who you are** - with a Neptune API token
* **Where to send your data** - to a Neptune project

The cell below lets you record data to the public project [common/quickstarts](https://app.neptune.ai/common/quickstarts) as an anonymous user.

In [None]:
import neptune.new as neptune

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/quickstarts",
)

Alternatively, you can log the example to your own workspace.

To do that, replace the code above with the following:

```python
from getpass import getpass
my_api_token = getpass('Enter your Neptune API token: ')

run = neptune.init_run(
    api_token=my_api_token,
    project="workspace-name/project-name",  # replace with your own
)
```

For example, if your workspace name is `ml-team` and the project name is `classification`, the project argument is `project="ml-team/classification"`.

To find your API token and project name, [log in to Neptune](https://app.neptune.ai/).
- In the top-right corner, click your avatar and select **Get your API token**.
- To find and copy your project name, navigate to the project, then click **Settings** → **Properties**.
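
If you would rather not paste the token on every run, one possible approach is to read it from the `NEPTUNE_API_TOKEN` environment variable and fall back to a prompt only when it is missing. The helper below, `resolve_api_token`, is a hypothetical name introduced for illustration and is not part of the Neptune API.

```python
import os
from getpass import getpass


def resolve_api_token(env=os.environ, prompt=getpass):
    """Return the API token from the environment, or ask for it interactively.

    `env` and `prompt` are parameters only so the helper is easy to test.
    """
    token = env.get("NEPTUNE_API_TOKEN")
    if token:
        return token
    return prompt("Enter your Neptune API token: ")


# Usage in the notebook (replace the project with your own):
# run = neptune.init_run(
#     api_token=resolve_api_token(),
#     project="workspace-name/project-name",
# )
```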

---

You now have a new run in Neptune! From here on, we'll use the `run` object to log metadata.

**To open the run in Neptune, follow the link that appeared in the cell output.**

There's not much to display yet, but keep the tab with the run open to see what happens next.

## Setting up model and data for training

In [None]:
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
from transformers.integrations import NeptuneCallback

Loading the data

In [None]:
task = "cola"
model_checkpoint = "distilbert-base-uncased"
batch_size = 16
dataset = load_dataset("glue", task)
metric = load_metric('glue', task)
num_labels = 2

Create Tokenizer for the model

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

Preprocess the dataset

In [None]:
def preprocess_function(examples):
    return tokenizer(examples['sentence'], truncation=True)

In [None]:
encoded_dataset = dataset.map(preprocess_function, batched=True)

Instantiate the model for finetuning

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

Create the training arguments for model finetuning

In [None]:
model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=2,
    weight_decay=0.01,
    load_best_model_at_end=True,
    report_to="none",
)

validation_key = 'validation'

#### Setting up the NeptuneCallback

We pass the `run` created earlier to the `NeptuneCallback`, which takes care of logging metadata during training. You can customize what the callback logs by passing additional arguments to it.

See the [Hugging Face integration guide](https://docs.neptune.ai/integrations-and-supported-tools/model-training/hugging-face) for more details.
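
As an illustration of such customization, the sketch below collects a few optional arguments in a dictionary. The argument names (`base_namespace`, `log_parameters`, `log_checkpoints`) and the value `"best"` are assumptions based on the integration's documentation and may differ between `transformers` versions, so check them against the linked page before relying on them.

```python
# Assumed optional arguments of NeptuneCallback; verify the names and the
# accepted values against the integration docs for your transformers version.
callback_kwargs = {
    "base_namespace": "finetuning",  # run namespace under which metadata is stored
    "log_parameters": True,          # log trainer and model parameters
    "log_checkpoints": "best",       # upload only the best checkpoint
}

# In the notebook, with the `run` created earlier:
# neptune_callback = NeptuneCallback(run=run, **callback_kwargs)
```

Uploading only the best checkpoint fits this guide because the training arguments already set `load_best_model_at_end=True`.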

Instantiate the NeptuneCallback with our `run`

In [None]:
neptune_callback = NeptuneCallback(run=run)

In [None]:
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    callbacks=[neptune_callback],
    tokenizer=tokenizer,
)

**NOTE**: The `run` is stopped once `Trainer.train` finishes, so we don't need to call `run.stop()` explicitly (which is otherwise required when working in a notebook).
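
The `metric` object loaded earlier is not passed to the `Trainer`, so evaluation reports the loss only. If you also want a task metric, one option is to give the `Trainer` a `compute_metrics` function. The sketch below is an assumption-laden stand-in rather than the notebook's method: it computes the Matthews correlation (the metric GLUE uses for CoLA) in plain Python so it runs without extra dependencies, and `matthews_corrcoef` is a helper written here for illustration.

```python
import math


def matthews_corrcoef(predictions, references):
    """Matthews correlation for binary labels (0/1); returns 0.0 when undefined."""
    tp = tn = fp = fn = 0
    for pred, ref in zip(predictions, references):
        if pred == 1 and ref == 1:
            tp += 1
        elif pred == 0 and ref == 0:
            tn += 1
        elif pred == 1 and ref == 0:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def compute_metrics(eval_pred):
    """Turn (logits, labels) into a metrics dict, as Trainer expects."""
    logits, labels = eval_pred
    # Index of the largest logit in each row is the predicted class.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    return {"matthews_correlation": matthews_corrcoef(predictions, labels)}


# Usage in the notebook: add compute_metrics=compute_metrics to the Trainer(...) call.
```

Inside the notebook, the body of `compute_metrics` could instead return `metric.compute(predictions=predictions, references=labels)`, reusing the `metric` loaded with the dataset.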

In [None]:
trainer.train()

## Explore the results in Neptune

We just finetuned our model with the new data. Let's see an example of the data that was logged to Neptune.

You can also check out an [example run](https://app.neptune.ai/o/showcase/org/project-text-summarization-hf/e/PROJ-138/dashboard/Custom-Dashboard-97370bc5-ee32-48ce-a630-fc806a370e13).