This notebook is based on [an official 🤗 notebook - "How to fine-tune a model on text classification"](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb). The main aim of this notebook is to show the process of conversion from vanilla 🤗 to [Ray AIR](https://docs.ray.io/en/latest/ray-air/getting-started.html) 🤗 without changing the training logic unless necessary.

If you do not have 🤗 Datasets and 🤗 Transformers installed locally, uncomment and run the following line:

In [1]:
#! pip install datasets transformers

We will use a [runtime environment](https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments) to ensure that we have 🤗 Datasets and 🤗 Transformers installed on the Ray cluster.

In [2]:
import ray

runtime_env = {
    "pip": [
        "torch==1.10.0",  # required to pass
        "datasets",
        "git+https://github.com/huggingface/transformers"  # use master version due to a bug in 4.18 causing an exception during training
    ],
}
ray.init(runtime_env=runtime_env)

2022-05-04 14:35:23,123	INFO services.py:1470 -- View the Ray dashboard at http://127.0.0.1:8265


RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.10', ray_version='2.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}', address_info={'node_ip_address': '172.31.43.110', 'raylet_ip_address': '172.31.43.110', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-05-04_14-35-20_580244_3647021/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-05-04_14-35-20_580244_3647021/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-05-04_14-35-20_580244_3647021', 'metrics_export_port': 58493, 'gcs_address': '172.31.43.110:52471', 'address': '172.31.43.110:52471', 'node_id': 'bc8fa7f9d25a7c9c1f4b2ff5d81d85907f427c6c7e8d7e25904dd0ab'})

Make sure your version of Transformers on the cluster is at least 4.19.0:

In [3]:
@ray.remote
def print_transformers_version():
    import transformers
    print(transformers.__version__)

ray.get(print_transformers_version.remote())
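If you would rather check the version programmatically than read it off the output, a minimal sketch could look like the following. The helper names `release_tuple` and `meets_minimum` are hypothetical (they are not part of Ray or 🤗 Transformers), and the comparison only looks at the numeric release triple, so a development build such as `4.19.0.dev0` is treated as satisfying the `4.19.0` minimum.

```python
import re

def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '4.19.0.dev0' -> (4, 19, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())

def meets_minimum(version: str, minimum: str = "4.19.0") -> bool:
    """Compare numeric release triples only (suffixes such as '.dev0' are ignored)."""
    return release_tuple(version) >= release_tuple(minimum)

print(meets_minimum("4.19.0.dev0"))  # True
print(meets_minimum("4.18.0"))       # False
```

The version string passed in would be the value of `transformers.__version__` returned from the Ray task.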

# Fine-tuning a model on a text classification task

In this notebook, we will see how to fine-tune one of the [🤗 Transformers](https://github.com/huggingface/transformers) models on a text classification task of the [GLUE Benchmark](https://gluebenchmark.com/). We will be running the training on a Ray cluster using Ray AIR.

The GLUE Benchmark is a group of nine classification tasks on sentences or pairs of sentences which are:

- [CoLA](https://nyu-mll.github.io/CoLA/) (Corpus of Linguistic Acceptability) Determine if a sentence is grammatically correct or not.
- [MNLI](https://arxiv.org/abs/1704.05426) (Multi-Genre Natural Language Inference) Determine if a sentence entails, contradicts or is unrelated to a given hypothesis. (This dataset has two versions, one with the validation and test set coming from the same distribution, another called mismatched where the validation and test use out-of-domain data.)
- [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) (Microsoft Research Paraphrase Corpus) Determine if two sentences are paraphrases of one another or not.
- [QNLI](https://rajpurkar.github.io/SQuAD-explorer/) (Question-answering Natural Language Inference) Determine if the answer to a question is in the second sentence or not. (This dataset is built from the SQuAD dataset.)
- [QQP](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) (Quora Question Pairs) Determine if two questions are semantically equivalent or not.
- [RTE](https://aclweb.org/aclwiki/Recognizing_Textual_Entailment) (Recognizing Textual Entailment) Determine if a sentence entails a given hypothesis or not.
- [SST-2](https://nlp.stanford.edu/sentiment/index.html) (Stanford Sentiment Treebank) Determine if the sentence has a positive or negative sentiment.
- [STS-B](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) (Semantic Textual Similarity Benchmark) Determine the similarity of two sentences with a score from 1 to 5.
- [WNLI](https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html) (Winograd Natural Language Inference) Determine if a sentence with an ambiguous pronoun and a sentence with this pronoun replaced are entailed or not. (This dataset is built from the Winograd Schema Challenge dataset.)

Each task is named by its acronym, with `mnli-mm` standing for the mismatched version of MNLI (so same training set as `mnli` but different validation and test sets):

In [4]:
GLUE_TASKS = ["cola", "mnli", "mnli-mm", "mrpc", "qnli", "qqp", "rte", "sst2", "stsb", "wnli"]

[2m[36m(print_transformers_version pid=3647269)[0m 4.19.0.dev0


This notebook is built to run on any of the tasks in the list above, with any model checkpoint from the [Model Hub](https://huggingface.co/models) as long as that model has a version with a classification head. Depending on your model and the GPU you are using, you might need to adjust the batch size to avoid out-of-memory errors. Set those three parameters, then the rest of the notebook should run smoothly:

In [5]:
task = "cola"
model_checkpoint = "distilbert-base-uncased"
batch_size = 16

## Loading the dataset

We will use the [🤗 Datasets](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.

Apart from `mnli-mm` being a special code, we can directly pass our task name to those functions.
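The special handling of `mnli-mm` can be sketched as a small helper. This is only an illustration: `resolve_task` is a hypothetical name and not part of 🤗 Datasets, and the split names are the ones this notebook selects later when picking the validation set.

```python
def resolve_task(task: str) -> tuple:
    """Return (GLUE config name, validation split name) for a task code."""
    # "mnli-mm" is not a real GLUE config: it reuses the "mnli" data.
    config_name = "mnli" if task == "mnli-mm" else task
    if task == "mnli-mm":
        validation_split = "validation_mismatched"
    elif task == "mnli":
        validation_split = "validation_matched"
    else:
        validation_split = "validation"
    return config_name, validation_split

print(resolve_task("mnli-mm"))  # ('mnli', 'validation_mismatched')
print(resolve_task("cola"))     # ('cola', 'validation')
```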

As Ray AIR doesn't provide integrations for 🤗 Datasets yet, we will simply run the normal 🤗 Datasets code in a [Ray Task](https://docs.ray.io/en/latest/ray-core/key-concepts.html#tasks), so that execution happens on the Ray cluster. `load_metric_fn` will be used inside the `HuggingFaceTrainer` later. Note that we are only defining the functions here, and not running them. We will use them later.

In [6]:
from datasets import load_dataset, load_metric
actual_task = "mnli" if task == "mnli-mm" else task

def load_dataset_fn():
    return load_dataset("glue", actual_task)

def load_metric_fn():
    return load_metric('glue', actual_task)

The `dataset` object itself is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key each for the training, validation and test sets (with more keys for the mismatched validation and test sets in the special case of `mnli`).

The metric is an instance of [`datasets.Metric`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric).

Note that `load_metric` has loaded the proper metric associated to your task, which is:

- for CoLA: [Matthews Correlation Coefficient](https://en.wikipedia.org/wiki/Matthews_correlation_coefficient)
- for MNLI (matched or mismatched): Accuracy
- for MRPC: Accuracy and [F1 score](https://en.wikipedia.org/wiki/F1_score)
- for QNLI: Accuracy
- for QQP: Accuracy and [F1 score](https://en.wikipedia.org/wiki/F1_score)
- for RTE: Accuracy
- for SST-2: Accuracy
- for STS-B: [Pearson Correlation Coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) and [Spearman's Rank Correlation Coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient)
- for WNLI: Accuracy

so the metric object only computes the one(s) needed for your task.
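To make the CoLA metric concrete, here is a minimal pure-Python sketch of the Matthews correlation coefficient for binary labels, computed from the confusion-matrix counts. It is not the implementation used by `load_metric`, just the standard formula written out; the function name and the toy labels are made up for illustration.

```python
import math

def matthews_corrcoef(predictions, references):
    """MCC for binary labels (0/1), computed from confusion-matrix counts."""
    pairs = list(zip(predictions, references))
    tp = sum(p == 1 and r == 1 for p, r in pairs)
    tn = sum(p == 0 and r == 0 for p, r in pairs)
    fp = sum(p == 1 and r == 0 for p, r in pairs)
    fn = sum(p == 0 and r == 1 for p, r in pairs)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

references = [1, 1, 1, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 1]
# tp=2, tn=2, fp=1, fn=1 -> (2*2 - 1*1) / sqrt(3*3*3*3) = 3 / 9
print(matthews_corrcoef(predictions, references))
```

A value of 1 means perfect agreement, 0 is no better than chance, and -1 means every prediction is inverted.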

## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 Transformers `Tokenizer`, which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary) and put them in a format the model expects, as well as generate the other inputs that the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

We continue with our pattern of defining functions and variables which we will use in the final Ray Task.

In [7]:
from transformers import AutoTokenizer

def load_tokenizer_fn() -> AutoTokenizer:
    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
    return tokenizer

We pass along `use_fast=True` to the call above to use one of the fast tokenizers (backed by Rust) from the 🤗 Tokenizers library. Those fast tokenizers are available for almost all models, but if you got an error with the previous call, remove that argument.

To preprocess our dataset, we will thus need the names of the columns containing the sentence(s). The following dictionary keeps track of the correspondence task to column names:

In [8]:
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mnli-mm": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}

We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This will ensure that an input longer than what the selected model can handle will be truncated to the maximum length accepted by the model.

In [9]:
def preprocess_function(examples, *, tokenizer):
    sentence1_key, sentence2_key = task_to_keys[task]
    if sentence2_key is None:
        return tokenizer(examples[sentence1_key], truncation=True)
    return tokenizer(examples[sentence1_key], examples[sentence2_key], truncation=True)
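To see the shape of what this function does without downloading a model, here is a toy sketch. `toy_tokenizer` is a hypothetical stand-in, not a 🤗 tokenizer: it splits on whitespace, joins sentence pairs with a `[SEP]` marker, and cuts the result to an assumed `max_length` of 6. Real tokenizers produce token IDs and use more careful truncation strategies.

```python
# Subset of the task-to-columns mapping used in this notebook.
task_to_keys = {"cola": ("sentence", None), "mrpc": ("sentence1", "sentence2")}

def toy_tokenizer(first, second=None, truncation=False, max_length=6):
    """Hypothetical stand-in for a tokenizer: whitespace tokens instead of IDs."""
    if second is None:
        second = [None] * len(first)
    encoded = []
    for a, b in zip(first, second):
        tokens = a.split() if b is None else a.split() + ["[SEP]"] + b.split()
        encoded.append(tokens[:max_length] if truncation else tokens)
    return {"tokens": encoded}

def toy_preprocess(examples, task):
    """Same dispatch as preprocess_function: one column or a pair of columns."""
    sentence1_key, sentence2_key = task_to_keys[task]
    if sentence2_key is None:
        return toy_tokenizer(examples[sentence1_key], truncation=True)
    return toy_tokenizer(examples[sentence1_key], examples[sentence2_key], truncation=True)

single = toy_preprocess({"sentence": ["The book was read by me quickly yesterday"]}, "cola")
pair = toy_preprocess({"sentence1": ["He left"], "sentence2": ["He went away"]}, "mrpc")
print(single["tokens"][0])  # ['The', 'book', 'was', 'read', 'by', 'me']
print(pair["tokens"][0])    # ['He', 'left', '[SEP]', 'He', 'went', 'away']
```

Note that `examples` holds lists of values per column, which is the batched form that `map(..., batched=True)` passes in.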

To apply this function on all the sentences (or pairs of sentences) in our dataset, we just use the `map` method of our `dataset` object we created earlier. This will apply the function on all the elements of all the splits in `dataset`, so our training, validation and testing data will be preprocessed in one single command.

In [10]:
def encode_dataset(dataset, tokenizer):
    return dataset.map(preprocess_function, batched=True, fn_kwargs=dict(tokenizer=tokenizer))

For Ray AIR, instead of using 🤗 Dataset objects directly, we will convert them to [Ray Datasets](https://docs.ray.io/en/latest/data/dataset.html). As both are backed by Arrow tables, the conversion is quite simple.

In [11]:
import ray.data
from datasets import DatasetDict

def convert_hf_dataset_to_ray_dataset(hf_dataset):
    if isinstance(hf_dataset, DatasetDict):
        return {k: ray.data.from_arrow(v.data.table) for k, v in hf_dataset.items()}
    return ray.data.from_arrow(hf_dataset.data.table)

Finally, we will tie it all up in one Ray Task. It will return a dictionary of Ray Datasets:

(Note - you are of course not required to encapsulate everything in functions as we have done.)

In [12]:
@ray.remote
def load_and_preprocess_dataset():
    dataset = load_dataset_fn()
    tokenizer = load_tokenizer_fn()
    encoded_dataset = encode_dataset(dataset, tokenizer)
    return convert_hf_dataset_to_ray_dataset(encoded_dataset)

ray_datasets = ray.get(load_and_preprocess_dataset.remote())

[2m[36m(load_and_preprocess_dataset pid=3647269)[0m Reusing dataset glue (/home/ubuntu/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|██████████| 3/3 [00:00<00:00, 289.71it/s]269)[0m 
[2m[36m(load_and_preprocess_dataset pid=3647269)[0m Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-f6befc129b891f86.arrow
  0%|          | 0/2 [00:00<?, ?ba/s] pid=3647269)[0m 
100%|██████████| 2/2 [00:00<00:00, 10.14ba/s]7269)[0m 
[2m[36m(load_and_preprocess_dataset pid=3647269)[0m Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-621ab3b8c607195a.arrow


## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it.

Since all our tasks are about sentence classification, we use the `AutoModelForSequenceClassification` class.

We will not go into details about each specific component of the training (see the [original notebook](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb) for that).

The main difference when using Ray AIR is that we need to create our 🤗 Transformers `Trainer` inside a function (`trainer_init_per_worker`) and return it. That function will be passed to the `HuggingFaceTrainer` and run on every Ray worker. The training will then proceed by means of PyTorch DDP.

Make sure that you initialize the model, metric and tokenizer inside that function. Otherwise, you may run into serialization errors.

Please note that if you don't want to use CUDA, you need to explicitly set `no_cuda=True` inside the `TrainingArguments`. Furthermore, `push_to_hub=True` is not yet supported. Ray will, however, checkpoint the model at every epoch, allowing you to push it to the Hub manually. We will do that after the training.

If you wish to use third-party logging libraries, such as MLflow or Weights & Biases, do not set them in `TrainingArguments` (they will be automatically disabled). Instead, pass Ray AIR callbacks to `HuggingFaceTrainer`'s `run_config`. In this example, we will use MLflow.

We also set `disable_tqdm=True` to declutter the output a little.

In [13]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np
import torch

num_labels = 3 if task.startswith("mnli") else 1 if task=="stsb" else 2
metric_name = "pearson" if task == "stsb" else "matthews_correlation" if task == "cola" else "accuracy"
model_name = model_checkpoint.split("/")[-1]
validation_key = "validation_mismatched" if task == "mnli-mm" else "validation_matched" if task == "mnli" else "validation"
name = f"{model_name}-finetuned-{task}"

def trainer_init_per_worker(train_dataset, eval_dataset = None, **config):
    print(f"Is CUDA available: {torch.cuda.is_available()}")
    metric = load_metric_fn()
    tokenizer = load_tokenizer_fn()
    model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)
    args = TrainingArguments(
        name,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=2,
        weight_decay=0.01,
        push_to_hub=False,
        disable_tqdm=True,
        no_cuda=not torch.cuda.is_available(),
    )

    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        if task != "stsb":
            predictions = np.argmax(predictions, axis=1)
        else:
            predictions = predictions[:, 0]
        return metric.compute(predictions=predictions, references=labels)

    trainer = Trainer(
        model,
        args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics
    )

    print("Starting training")
    return trainer
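The branching inside `compute_metrics` can be illustrated without NumPy. The sketch below is a pure-Python equivalent of that step, with a hypothetical helper name and made-up logits: classification tasks take the index of the largest logit per row, while the regression task `stsb` keeps the single output value.

```python
def logits_to_predictions(logits, task):
    """Turn raw model outputs into the predictions passed to the metric."""
    if task == "stsb":
        # Regression: the model has one output per example, used as-is.
        return [row[0] for row in logits]
    # Classification: index of the largest logit (first index wins on ties).
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

print(logits_to_predictions([[0.1, 2.3], [1.5, -0.2]], "cola"))  # [1, 0]
print(logits_to_predictions([[3.7], [0.4]], "stsb"))             # [3.7, 0.4]
```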

With our `trainer_init_per_worker` complete, we can now instantiate the `HuggingFaceTrainer`. Aside from the function, we set the `scaling_config`, controlling the number of workers and resources used, and the `datasets` we will use for training and evaluation.

We will use 4 workers, each running on CPU (set `"use_gpu": True` in the `scaling_config` to assign a GPU to each worker instead), and we specify the `MLflowLoggerCallback` inside the `run_config`.

In [14]:
from ray.ml.train.integrations.huggingface import HuggingFaceTrainer
from ray.ml import RunConfig
from ray.tune.integration.mlflow import MLflowLoggerCallback

trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config={"num_workers": 4, "use_gpu": False},
    datasets={"train": ray_datasets["train"], "evaluation": ray_datasets[validation_key]},
    run_config=RunConfig(callbacks=[MLflowLoggerCallback(experiment_name=name)])
)

Finally, we call the `fit` method to begin training with Ray AIR. We will save the `Result` object to a variable so we can access metrics and checkpoints.

In [15]:
result = trainer.fit()

[2m[36m(pid=3647387)[0m 2022-05-04 14:36:55.778885: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[2m[36m(pid=3647387)[0m 2022-05-04 14:36:55.778931: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


Trial name,status,loc
HuggingFaceTrainer_a34c8_00000,RUNNING,172.31.43.110:3647387


[2m[36m(BaseWorkerMixin pid=3647462)[0m 2022-05-04 14:37:07,156	INFO torch.py:346 -- Setting up process group for: env:// [rank=3, world_size=4]
[2m[36m(BaseWorkerMixin pid=3647461)[0m 2022-05-04 14:37:07,192	INFO torch.py:346 -- Setting up process group for: env:// [rank=2, world_size=4]
[2m[36m(BaseWorkerMixin pid=3647460)[0m 2022-05-04 14:37:07,197	INFO torch.py:346 -- Setting up process group for: env:// [rank=1, world_size=4]
[2m[36m(BaseWorkerMixin pid=3647459)[0m 2022-05-04 14:37:07,207	INFO torch.py:346 -- Setting up process group for: env:// [rank=0, world_size=4]


[2m[36m(BaseWorkerMixin pid=3647461)[0m 2022-05-04 14:37:10.845960: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[2m[36m(BaseWorkerMixin pid=3647461)[0m 2022-05-04 14:37:10.845993: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[2m[36m(BaseWorkerMixin pid=3647462)[0m 2022-05-04 14:37:10.980593: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[2m[36m(BaseWorkerMixin pid=3647462)[0m 2022-05-04 14:37:10.980627: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[2m[36m(BaseWorkerMixin pid=3647460)[0m 2022-05-0

[2m[36m(BaseWorkerMixin pid=3647461)[0m Is CUDA available: False
[2m[36m(BaseWorkerMixin pid=3647460)[0m Is CUDA available: False
[2m[36m(BaseWorkerMixin pid=3647459)[0m Is CUDA available: False
[2m[36m(BaseWorkerMixin pid=3647462)[0m Is CUDA available: False


[2m[36m(BaseWorkerMixin pid=3647461)[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_transform.bias', 'vocab_transform.weight']
[2m[36m(BaseWorkerMixin pid=3647461)[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(BaseWorkerMixin pid=3647461)[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[2m[36m(BaseWorkerMixin pid=3647461)[0m Some weights of DistilBertForSequence

[2m[36m(BaseWorkerMixin pid=3647461)[0m Starting training
[2m[36m(BaseWorkerMixin pid=3647460)[0m Starting training
[2m[36m(BaseWorkerMixin pid=3647459)[0m Starting training
[2m[36m(BaseWorkerMixin pid=3647462)[0m Starting training


[2m[36m(BaseWorkerMixin pid=3647459)[0m ***** Running training *****
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Num examples = 2144
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Num Epochs = 2
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Instantaneous batch size per device = 16
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Total train batch size (w. parallel, distributed & accumulation) = 64
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Gradient Accumulation steps = 1
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Total optimization steps = 268


[2m[36m(BaseWorkerMixin pid=3647459)[0m ***** Running Evaluation *****
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Num examples = 272
[2m[36m(BaseWorkerMixin pid=3647459)[0m   Batch size = 16


[2m[36m(BaseWorkerMixin pid=3647459)[0m Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-134
[2m[36m(BaseWorkerMixin pid=3647459)[0m Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-134/config.json


[2m[36m(BaseWorkerMixin pid=3647459)[0m {'eval_loss': 0.5443748831748962, 'eval_matthews_correlation': 0.40441237536094715, 'eval_runtime': 6.007, 'eval_samples_per_second': 45.281, 'eval_steps_per_second': 0.832, 'epoch': 1.0}


[2m[36m(BaseWorkerMixin pid=3647459)[0m Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-134/pytorch_model.bin
[2m[36m(BaseWorkerMixin pid=3647459)[0m tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-134/tokenizer_config.json
[2m[36m(BaseWorkerMixin pid=3647459)[0m Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-134/special_tokens_map.json


Result for HuggingFaceTrainer_a34c8_00000:
  _time_this_iter_s: 296.5610888004303
  _timestamp: 1651675329
  _training_iteration: 1
  date: 2022-05-04_14-42-09
  done: false
  epoch: 1.0
  eval_loss: 0.5443748831748962
  eval_matthews_correlation: 0.40441237536094715
  eval_runtime: 6.007
  eval_samples_per_second: 45.281
  eval_steps_per_second: 0.832
  experiment_id: 10ee119f04ba48f6a55ae94919ab8619
  hostname: ip-172-31-43-110
  iterations_since_restore: 1
  node_ip: 172.31.43.110
  pid: 3647387
  should_checkpoint: true
  step: 134
  time_since_restore: 305.5104389190674
  time_this_iter_s: 305.5104389190674
  time_total_s: 305.5104389190674
  timestamp: 1651675329
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: a34c8_00000
  warmup_time: 0.005402565002441406
  


Trial name,status,loc,iter,total time (s),eval_loss,eval_matthews_correlation,eval_runtime
HuggingFaceTrainer_a34c8_00000,RUNNING,172.31.43.110:3647387,1,305.51,0.544375,0.404412,6.007
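
The step counts in the training log can be reproduced with a little arithmetic. The sketch below is one way to read those numbers, assuming each worker iterates over the 2144 examples reported by the rank-0 worker; all values are copied from the log output and the training arguments above.

```python
import math

num_examples = 2144          # "Num examples" reported by the rank-0 worker
per_device_batch_size = 16   # batch_size set at the top of the notebook
num_workers = 4              # world_size in the process-group log lines
num_epochs = 2               # num_train_epochs in TrainingArguments

total_batch_size = per_device_batch_size * num_workers
steps_per_epoch = math.ceil(num_examples / per_device_batch_size)
total_steps = steps_per_epoch * num_epochs

print(total_batch_size, steps_per_epoch, total_steps)  # 64 134 268
```

This matches the reported total train batch size of 64, the 268 optimization steps, and the `checkpoint-134` directory saved at the end of the first epoch.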


You can use the returned `Result` object to access metrics and the Ray AIR `Checkpoint` associated with the last iteration.

In [None]:
result

You can now use the checkpoint to run prediction with `HuggingFacePredictor`, which wraps around [🤗 Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines). In order to distribute prediction, we use `BatchPredictor`. While this is not necessary for the very small example we are using (you could use `HuggingFacePredictor` directly), it will scale well to a large dataset.

In [None]:
from ray.ml.predictors.integrations.huggingface import HuggingFacePredictor
from ray.ml.batch_predictor import BatchPredictor
import pandas as pd

sentences = ['Bill whistled past the house.',
  'The car honked its way down the road.',
  'Bill pushed Harry off the sofa.',
  'the kittens yawned awake and played.',
  'I demand that the more John eats, the more he pay.']
predictor = BatchPredictor.from_checkpoint(
    result.checkpoint,
    HuggingFacePredictor,
    task="text-classification",
)
data = ray.data.from_pandas(pd.DataFrame(sentences, columns=["sentence"]))
prediction = predictor.predict(data)
prediction = prediction.to_pandas()
prediction

To be able to share your model with the community, there are a few more steps to follow.

We have conducted the training on the Ray cluster, but we will share the model from the local environment, which allows us to authenticate easily.

First, you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!). Then execute the following cell and input your username and password:

In [None]:
from huggingface_hub import notebook_login

notebook_login()

Then you need to install Git-LFS. Uncomment the following instructions:

In [None]:
# !apt install git-lfs

Now, load the model and tokenizer locally, and recreate the `Trainer`:

In [None]:
hf_trainer = HuggingFaceTrainer.load_huggingface_checkpoint(result.checkpoint, AutoModelForSequenceClassification, AutoTokenizer)

You can now upload the result of the training to the Hub by executing this instruction:

In [None]:
hf_trainer.push_to_hub()

You can now share this model with all your friends, family, favorite pets: they can all load it with the identifier `"your-username/the-name-you-picked"` so for instance:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("sgugger/my-awesome-model")
```