# Transfer learning with Hugging Face using CodeFlare

In this notebook you will learn how to use the **[Hugging Face](https://huggingface.co/)** support in the Ray ecosystem to carry out a text classification task with transfer learning. We follow the example **[here](https://huggingface.co/docs/transformers/tasks/sequence_classification)**.

The example runs a text classification task on the **[imdb dataset](https://huggingface.co/datasets/imdb)**, classifying movie reviews as positive or negative. The Hugging Face libraries provide an easy way to load both the model and the dataset for this task. Here we use the **distilbert-base-uncased** model, which is a **BERT**-based model.

Ray has **[built-in support for Hugging Face](https://docs.ray.io/en/releases-1.13.0/_modules/ray/ml/train/integrations/huggingface/huggingface_trainer.html)** in the form of `HuggingFaceTrainer`. It lets the Hugging Face trainer run on CodeFlare and distributes training across multiple GPUs, so training scales out as we add GPUs.


### Getting all the requirements in place

In [2]:
! oc login --token=your-token --server=https://your-cluster

Logged into "https://your-cluster" as "kube:admin" using the token provided.

You have access to 74 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "rhods-notebooks".


In [3]:
! oc project default

Now using project "default" on server "https://your-cluster3".


Let's check that we have the necessary Hugging Face packages

In [18]:
import datasets
datasets.__version__

'2.6.1'

In [19]:
import transformers
transformers.__version__

'4.23.1'

In [20]:
import evaluate
evaluate.__version__

'0.3.0'
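
The three checks above can also be done in one pass. The following is a minimal sketch using only the standard library; `package_versions` is a hypothetical helper name, not part of any of these packages, and it reports `None` for anything that is not installed instead of raising.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    found = package_versions(["datasets", "transformers", "evaluate"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```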

In [4]:
# Import pieces from codeflare-sdk
from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration

In [5]:
# Create our cluster and submit appwrapper
cluster = Cluster(ClusterConfiguration(
    name='hfgputest',
    min_worker=1, max_worker=1,
    min_cpus=8, max_cpus=8,
    min_memory=16, max_memory=16,
    gpu=4,
    instascale=True,
    machine_types=["m5.xlarge", "p3.8xlarge"],
))

Written to: hfgputest.yaml


In [6]:
cluster.up()

In [7]:
cluster.is_ready()

(False, <CodeFlareClusterStatus.QUEUED: 2>)
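
Since the cluster is still queued at this point, it can help to poll instead of re-running the cell by hand. Below is a minimal sketch of such a loop. It assumes only what the output above shows, namely that `cluster.is_ready()` returns a `(ready, status)` tuple; `wait_until_ready` and its parameters are hypothetical names chosen for illustration, not part of codeflare-sdk.

```python
import time


def wait_until_ready(is_ready, timeout_s=600.0, poll_s=10.0, sleep=time.sleep):
    """Poll is_ready() until it reports ready, or raise TimeoutError.

    is_ready is assumed to return a (ready, status) tuple.
    Returns the status seen when the check first succeeds.
    """
    waited = 0.0
    while True:
        ready, status = is_ready()
        if ready:
            return status
        if waited >= timeout_s:
            raise TimeoutError(
                f"cluster not ready after {waited:.0f}s (last status: {status})"
            )
        sleep(poll_s)
        waited += poll_s


# Usage in this notebook would look like:
# wait_until_ready(cluster.is_ready)
```

The `sleep` parameter exists only so the loop can be exercised without real waiting.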

In [12]:
cluster.status()

<RayClusterStatus.READY: 'ready'>

In [14]:
ray_cluster_uri = cluster.cluster_uri()

**NOTE**: Here we have created a custom cluster whose single worker requests 4 GPUs (`gpu=4`). You can change the number of GPUs by changing the spec above.
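
When changing the GPU count, the `gpu` value in the cluster spec and the `num_workers` value used later in the trainer's `scaling_config` need to agree. The sketch below keeps the two in one place. `gpu_settings` is a hypothetical helper, and it assumes a single worker pod as in this notebook (`min_worker=1, max_worker=1`); whether the chosen machine types actually offer that many GPUs is not checked.

```python
def gpu_settings(num_gpus):
    """Return matching cluster and trainer settings for num_gpus GPUs on one worker."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Keyword argument for ClusterConfiguration (GPUs requested by the worker)
    cluster_kwargs = {"gpu": num_gpus}
    # Dict passed as scaling_config to HuggingFaceTrainer (one training worker per GPU)
    scaling_config = {"num_workers": num_gpus, "use_gpu": True}
    return cluster_kwargs, scaling_config


cluster_kwargs, scaling_config = gpu_settings(4)
print(cluster_kwargs)
print(scaling_config)
```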

In [15]:
# Before proceeding, make sure the cluster exists and the URI is not empty
assert ray_cluster_uri, "Ray cluster needs to be started and set before proceeding"

import ray

# Reset the Ray context in case there is already one
ray.shutdown()

# Install the additional libraries required for this training
runtime_env = {"pip": ["transformers", "datasets", "evaluate"]}

# Establish the connection to the Ray cluster
ray.init(address=ray_cluster_uri, runtime_env=runtime_env)

print("Ray cluster is up and running: ", ray.is_initialized())

Ray cluster is up and running:  True


**NOTE**: This task needs additional pip packages on the cluster. We install them by passing them in the `runtime_env` argument to `ray.init`.
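
The `pip` entry of a Ray runtime environment takes ordinary pip requirement strings, so versions can be pinned to keep the cluster in step with the notebook. The sketch below pins the versions printed earlier in this notebook; `pinned_runtime_env` is a hypothetical helper name, and whether these exact pins suit your cluster image is an assumption you should check.

```python
def pinned_runtime_env(versions):
    """Build a Ray runtime_env dict with 'name==version' pip requirements."""
    requirements = [f"{name}=={version}" for name, version in sorted(versions.items())]
    return {"pip": requirements}


# Versions taken from the package checks earlier in this notebook
runtime_env = pinned_runtime_env(
    {"transformers": "4.23.1", "datasets": "2.6.1", "evaluate": "0.3.0"}
)
print(runtime_env)
```

The resulting dict can be passed to `ray.init(..., runtime_env=runtime_env)` in place of the unpinned one.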

### Transfer learning code from huggingface

The code is based on the example **[here](https://huggingface.co/docs/transformers/tasks/sequence_classification)**.

In [16]:
@ray.remote
def train_fn():
    from datasets import load_dataset
    import transformers
    from transformers import AutoTokenizer, TrainingArguments
    from transformers import AutoModelForSequenceClassification
    import numpy as np
    from datasets import load_metric
    import ray
    from ray import tune
    from ray.ml.train.integrations.huggingface import HuggingFaceTrainer

    dataset = load_dataset("imdb")
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    # Use a small fraction of the dataset here; you can also run with the full dataset
    small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(100))
    small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(100))

    print(f"len of train {small_train_dataset} and test {small_eval_dataset}")

    ray_train_ds = ray.data.from_huggingface(small_train_dataset)
    ray_evaluation_ds = ray.data.from_huggingface(small_eval_dataset)

    def compute_metrics(eval_pred):
        metric = load_metric("accuracy")
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    def trainer_init_per_worker(train_dataset, eval_dataset, **config):
        model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

        training_args = TrainingArguments("/tmp/hf_imdb/test", eval_steps=1, disable_tqdm=True, 
                                          num_train_epochs=1, skip_memory_metrics=True,
                                          learning_rate=2e-5,
                                          per_device_train_batch_size=16,
                                          per_device_eval_batch_size=16,                                
                                          weight_decay=0.01,)
        return transformers.Trainer(
            model=model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            compute_metrics=compute_metrics
        )

    scaling_config = {"num_workers": 4, "use_gpu": True}  # num_workers is the number of GPUs

    # We use the Ray-native HuggingFaceTrainer, but you can swap in the non-Ray Hugging Face Trainer. Both have the same method signature.
    # The Ray-native HuggingFaceTrainer has built-in support for scaling to multiple GPUs.
    trainer = HuggingFaceTrainer(
        trainer_init_per_worker=trainer_init_per_worker,
        scaling_config=scaling_config,
        datasets={"train": ray_train_ds, "evaluation": ray_evaluation_ds},
    )
    result = trainer.fit()
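
The `compute_metrics` function in the cell above takes the arg-max of the logits for each review and compares it with the label. To make that arithmetic explicit, here is the same calculation as a small sketch in plain Python, without NumPy or the metric library. `accuracy_from_logits` is a hypothetical name, and the logits and labels are made-up toy values, not results from this notebook.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class index equals the label."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("logits and labels must be non-empty and of equal length")
    # Index of the largest logit per row (first index wins on ties)
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Toy example: two classes (0 = negative, 1 = positive), four reviews
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.9]]
labels = [0, 1, 0, 0]
print(accuracy_from_logits(logits, labels))  # {'accuracy': 0.75}
```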


**NOTE:** This code produces a lot of output and runs for **approximately 2 minutes**. During execution it downloads the `imdb` dataset and the `distilbert-base-uncased` model, then starts the transfer learning task that trains the model on this dataset.
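
Run time depends largely on how many optimizer steps one epoch takes. A rough estimate, assuming the usual data-parallel split where every step consumes one batch per worker and no gradient accumulation is used, is sketched below. `steps_per_epoch` is a hypothetical helper; the batch size and worker count are the ones set in `train_fn`, and 25,000 is the size of the IMDB training split.

```python
import math


def steps_per_epoch(num_examples, per_device_batch_size, num_workers):
    """Optimizer steps in one epoch when each step consumes one batch per worker."""
    global_batch = per_device_batch_size * num_workers
    return math.ceil(num_examples / global_batch)


print(steps_per_epoch(100, 16, 4))     # 100-review subset selected in train_fn -> 2
print(steps_per_epoch(25_000, 16, 4))  # full IMDB training split -> 391
```

The step count of 391 that appears in the captured log output matches the second case, which suggests the recorded run used the full training split and not the 100-review subset.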

In [17]:
#call the above cell as a remote ray function
ray.get(train_fn.remote())

Downloading builder script: 100%|██████████| 4.31k/4.31k [00:00<00:00, 5.26MB/s]
Downloading metadata: 100%|██████████| 2.17k/2.17k [00:00<00:00, 3.10MB/s]
Downloading readme: 100%|██████████| 7.59k/7.59k [00:00<00:00, 9.32MB/s]


(train_fn pid=249) Downloading and preparing dataset imdb/plain_text to /home/ray/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1...


Downloading data:   0%|          | 0.00/84.1M [00:00<?, ?B/s]
Downloading data:   0%|          | 32.8k/84.1M [00:00<05:06, 274kB/s]
Downloading data:   0%|          | 97.3k/84.1M [00:00<03:16, 429kB/s]
Downloading data:   0%|          | 194k/84.1M [00:00<02:20, 598kB/s] 
Downloading data:   0%|          | 367k/84.1M [00:00<01:30, 931kB/s]
Downloading data:   1%|          | 645k/84.1M [00:00<00:58, 1.43MB/s]
Downloading data:   1%|▏         | 1.06M/84.1M [00:00<00:39, 2.10MB/s]
Downloading data:   2%|▏         | 1.60M/84.1M [00:00<00:28, 2.90MB/s]
Downloading data:   3%|▎         | 2.27M/84.1M [00:00<00:21, 3.74MB/s]
Downloading data:   4%|▎         | 3.08M/84.1M [00:01<00:17, 4.69MB/s]
Downloading data:   5%|▍         | 4.17M/84.1M [00:01<00:13, 6.03MB/s]
Downloading data:   7%|▋         | 5.65M/84.1M [00:01<00:09, 7.95MB/s]
Downloading data:  10%|▉         | 8.10M/84.1M [00:01<00:06, 12.1MB/s]
Downloading data:  14%|█▍        | 11.7M/84.1M [00:01<00:03, 18.5MB/s]
Downloading data:  17

(train_fn pid=249) Dataset imdb downloaded and prepared to /home/ray/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1. Subsequent calls will reuse this data.


Downloading: 100%|██████████| 28.0/28.0 [00:00<00:00, 32.8kB/s]
Downloading: 100%|██████████| 483/483 [00:00<00:00, 559kB/s]
Downloading: 100%|██████████| 232k/232k [00:00<00:00, 4.77MB/s]
Downloading: 100%|██████████| 466k/466k [00:00<00:00, 7.86MB/s]
  0%|          | 0/25 [00:00<?, ?ba/s]
  4%|▍         | 1/25 [00:00<00:15,  1.52ba/s]
  8%|▊         | 2/25 [00:01<00:14,  1.61ba/s]
 12%|█▏        | 3/25 [00:01<00:13,  1.60ba/s]
 16%|█▌        | 4/25 [00:02<00:13,  1.61ba/s]
 20%|██        | 5/25 [00:03<00:13,  1.53ba/s]
 24%|██▍       | 6/25 [00:03<00:12,  1.54ba/s]
 28%|██▊       | 7/25 [00:04<00:11,  1.56ba/s]
 32%|███▏      | 8/25 [00:05<00:10,  1.55ba/s]
 36%|███▌      | 9/25 [00:05<00:10,  1.55ba/s]
 40%|████      | 10/25 [00:06<00:09,  1.58ba/s]
 44%|████▍     | 11/25 [00:06<00:08,  1.61ba/s]
 48%|████▊     | 12/25 [00:07<00:08,  1.62ba/s]
 52%|█████▏    | 13/25 [00:08<00:07,  1.62ba/s]
 56%|█████▌    | 14/25 [00:08<00:06,  1.59ba/s]
 60%|██████    | 15/25 [00:09<00:06,  1.59ba/

(train_fn pid=249) len of train Dataset({
(train_fn pid=249)     features: ['text', 'label', 'input_ids', 'attention_mask'],
(train_fn pid=249)     num_rows: 100
(train_fn pid=249) }) and test Dataset({
(train_fn pid=249)     features: ['text', 'label', 'input_ids', 'attention_mask'],
(train_fn pid=249)     num_rows: 100
(train_fn pid=249) })


 98%|█████████▊| 49/50 [00:31<00:00,  1.53ba/s]


(train_fn pid=249) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
(train_fn pid=249) 	- Avoid using `tokenizers` before the fork if possible
(train_fn pid=249) 	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(train_fn pid=249) == Status ==
(train_fn pid=249) Current time: 2022-11-04 04:25:03 (running for 00:00:05.19)
(train_fn pid=249) Memory usage on this node: 6.9/240.1 GiB
(train_fn pid=249) Using FIFO scheduling algorithm.
(train_fn pid=249) Resources requested: 5.0/10 CPUs, 4.0/4 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
(train_fn pid=249) Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-11-04_04-24-57
(train_fn pid=249) Number of trials: 1/1 (1 RUNNING)
(train_fn pid=249) +--------------------

(BaseWorkerMixin pid=182, ip=10.128.64.16) 2022-11-04 04:25:06,390	INFO torch.py:346 -- Setting up process group for: env:// [rank=0, world_size=4]
(BaseWorkerMixin pid=184, ip=10.128.64.16) 2022-11-04 04:25:06,389	INFO torch.py:346 -- Setting up process group for: env:// [rank=2, world_size=4]
(BaseWorkerMixin pid=185, ip=10.128.64.16) 2022-11-04 04:25:06,391	INFO torch.py:346 -- Setting up process group for: env:// [rank=3, world_size=4]
(BaseWorkerMixin pid=183, ip=10.128.64.16) 2022-11-04 04:25:06,393	INFO torch.py:346 -- Setting up process group for: env:// [rank=1, world_size=4]


(train_fn pid=249) == Status ==
(train_fn pid=249) Current time: 2022-11-04 04:25:08 (running for 00:00:10.19)
(train_fn pid=249) Memory usage on this node: 7.7/240.1 GiB
(train_fn pid=249) Using FIFO scheduling algorithm.
(train_fn pid=249) Resources requested: 5.0/10 CPUs, 4.0/4 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
(train_fn pid=249) Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-11-04_04-24-57
(train_fn pid=249) Number of trials: 1/1 (1 RUNNING)
(train_fn pid=249) +--------------------------------+----------+------------------+
(train_fn pid=249) | Trial name                     | status   | loc              |
(train_fn pid=249) |--------------------------------+----------+------------------|
(train_fn pid=249) | HuggingFaceTrainer_50527_00000 | RUNNING  | 10.128.64.16:146 |
(train_fn pid=249) +-----------

Downloading: 100%|██████████| 483/483 [00:00<00:00, 665kB/s]
Downloading:   0%|          | 0.00/268M [00:00<?, ?B/s] 
Downloading:   2%|▏         | 5.95M/268M [00:00<00:04, 59.5MB/s]
Downloading:   5%|▍         | 12.5M/268M [00:00<00:04, 62.8MB/s]
Downloading:   7%|▋         | 19.0M/268M [00:00<00:03, 63.9MB/s]
Downloading:  10%|▉         | 25.6M/268M [00:00<00:03, 64.6MB/s]
Downloading:  12%|█▏        | 32.1M/268M [00:00<00:03, 64.9MB/s]
Downloading:  14%|█▍        | 38.6M/268M [00:00<00:03, 65.0MB/s]
Downloading:  17%|█▋        | 45.2M/268M [00:00<00:03, 65.1MB/s]
Downloading:  19%|█▉        | 51.7M/268M [00:00<00:03, 65.3MB/s]
Downloading:  22%|██▏       | 58.3M/268M [00:00<00:03, 65.5MB/s]
Downloading:  24%|██▍       | 64.9M/268M [00:01<00:03, 65.4MB/s]
Downloading:  27%|██▋       | 71.4M/268M [00:01<00:03, 65.4MB/s]
Downloading:  29%|██▉       | 78.0M/268M [00:01<00:02, 65.5MB/s]
Downloading:  32%|███▏      | 84.5M/268M [00:01<00:02, 65.4MB/s]
Downloading:  34%|███▍      | 91.1M/2

(train_fn pid=249) == Status ==
(train_fn pid=249) Current time: 2022-11-04 04:25:13 (running for 00:00:15.19)
(train_fn pid=249) Memory usage on this node: 8.0/240.1 GiB
(train_fn pid=249) Using FIFO scheduling algorithm.
(train_fn pid=249) Resources requested: 5.0/10 CPUs, 4.0/4 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
(train_fn pid=249) Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-11-04_04-24-57
(train_fn pid=249) Number of trials: 1/1 (1 RUNNING)
(train_fn pid=249) +--------------------------------+----------+------------------+
(train_fn pid=249) | Trial name                     | status   | loc              |
(train_fn pid=249) |--------------------------------+----------+------------------|
(train_fn pid=249) | HuggingFaceTrainer_50527_00000 | RUNNING  | 10.128.64.16:146 |
(train_fn pid=249) +-----------

Downloading:  81%|████████  | 216M/268M [00:03<00:00, 65.4MB/s]
Downloading:  83%|████████▎ | 222M/268M [00:03<00:00, 65.3MB/s]
Downloading:  85%|████████▌ | 229M/268M [00:03<00:00, 65.6MB/s]
Downloading:  88%|████████▊ | 236M/268M [00:03<00:00, 65.7MB/s]
Downloading:  90%|█████████ | 242M/268M [00:03<00:00, 65.8MB/s]
Downloading:  93%|█████████▎| 249M/268M [00:03<00:00, 65.9MB/s]
Downloading:  95%|█████████▌| 255M/268M [00:03<00:00, 65.8MB/s]
Downloading:  98%|█████████▊| 262M/268M [00:04<00:00, 65.7MB/s]
Downloading: 100%|██████████| 268M/268M [00:04<00:00, 65.4MB/s]
(BaseWorkerMixin pid=182, ip=10.128.64.16) Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_transform.weight']
(BaseWorkerMixin pid=182, ip=10.128.64.16) - This IS expected if you 

(train_fn pid=249) == Status ==
(train_fn pid=249) Current time: 2022-11-04 04:25:18 (running for 00:00:20.20)
(train_fn pid=249) Memory usage on this node: 13.0/240.1 GiB
(train_fn pid=249) Using FIFO scheduling algorithm.
(train_fn pid=249) Resources requested: 5.0/10 CPUs, 4.0/4 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
(train_fn pid=249) Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-11-04_04-24-57
(train_fn pid=249) Number of trials: 1/1 (1 RUNNING)
(train_fn pid=249) +--------------------------------+----------+------------------+
(train_fn pid=249) | Trial name                     | status   | loc              |
(train_fn pid=249) |--------------------------------+----------+------------------|
(train_fn pid=249) | HuggingFaceTrainer_50527_00000 | RUNNING  | 10.128.64.16:146 |
(train_fn pid=249) +----------



(train_fn pid=249) == Status ==
(train_fn pid=249) Current time: 2022-11-04 04:25:23 (running for 00:00:25.20)
(train_fn pid=249) Memory usage on this node: 14.1/240.1 GiB
(train_fn pid=249) Using FIFO scheduling algorithm.
(train_fn pid=249) Resources requested: 5.0/10 CPUs, 4.0/4 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
(train_fn pid=249) Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-11-04_04-24-57
(train_fn pid=249) Number of trials: 1/1 (1 RUNNING)
(train_fn pid=249) +--------------------------------+----------+------------------+
(train_fn pid=249) | Trial name                     | status   | loc              |
(train_fn pid=249) |--------------------------------+----------+------------------|
(train_fn pid=249) | HuggingFaceTrainer_50527_00000 | RUNNING  | 10.128.64.16:146 |
(train_fn pid=249) +----------

(BaseWorkerMixin pid=182, ip=10.128.64.16) Saving model checkpoint to /tmp/hf_imdb/test/checkpoint-391
(BaseWorkerMixin pid=182, ip=10.128.64.16) Configuration saved in /tmp/hf_imdb/test/checkpoint-391/config.json
(BaseWorkerMixin pid=182, ip=10.128.64.16) Model weights saved in /tmp/hf_imdb/test/checkpoint-391/pytorch_model.bin


(BaseWorkerMixin pid=182, ip=10.128.64.16) {'train_runtime': 109.3646, 'train_samples_per_second': 57.148, 'train_steps_per_second': 3.575, 'train_loss': 0.2757800363213815, 'epoch': 1.0}
(train_fn pid=249) Result for HuggingFaceTrainer_50527_00000:
(train_fn pid=249)   _time_this_iter_s: 117.43207788467407
(train_fn pid=249)   _timestamp: 1667561227
(train_fn pid=249)   _training_iteration: 1
(train_fn pid=249)   date: 2022-11-04_04-27-07
(train_fn pid=249)   done: false
(train_fn pid=249)   epoch: 1.0
(train_fn pid=249)   experiment_id: 5c44a90f5c474864aee374a2ac1905e1
(train_fn pid=249)   hostname: hfgputest-worker-small-group-hfgputest-8f4mg
(train_fn pid=249)   iterations_since_restore: 1
(train_fn pid=249)   node_ip: 10.128.64.16
(train_fn pid=249)   pid: 146
(train_fn pid=249)   should_checkpoint: tr

(BaseWorkerMixin pid=182, ip=10.128.64.16) 
(BaseWorkerMixin pid=182, ip=10.128.64.16) 
(BaseWorkerMixin pid=182, ip=10.128.64.16) Training completed. Do not forget to share your model on huggingface.co/models =)
(BaseWorkerMixin pid=182, ip=10.128.64.16) 
(BaseWorkerMixin pid=182, ip=10.128.64.16) 


(train_fn pid=249) == Status ==
(train_fn pid=249) Current time: 2022-11-04 04:27:16 (running for 00:02:18.35)
(train_fn pid=249) Memory usage on this node: 15.8/240.1 GiB
(train_fn pid=249) Using FIFO scheduling algorithm.
(train_fn pid=249) Resources requested: 5.0/10 CPUs, 4.0/4 GPUs, 0.0/22.35 GiB heap, 0.0/6.59 GiB objects
(train_fn pid=249) Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-11-04_04-24-57
(train_fn pid=249) Number of trials: 1/1 (1 RUNNING)
(train_fn pid=249) +--------------------------------+----------+------------------+--------+------------------+-----------------+----------------------------+--------------------------+
(train_fn pid=249) | Trial name                     | status   | loc              |   iter |   total time (s) |   train_runtime |   train_samples_per_second |   train_steps_per_second |
(train_fn pid=249) |



(train_fn pid=249) Result for HuggingFaceTrainer_50527_00000:
(train_fn pid=249)   _time_this_iter_s: 117.43207788467407
(train_fn pid=249)   _timestamp: 1667561227
(train_fn pid=249)   _training_iteration: 1
(train_fn pid=249)   date: 2022-11-04_04-27-07
(train_fn pid=249)   done: true
(train_fn pid=249)   epoch: 1.0
(train_fn pid=249)   experiment_id: 5c44a90f5c474864aee374a2ac1905e1
(train_fn pid=249)   experiment_tag: '0'
(train_fn pid=249)   hostname: hfgputest-worker-small-group-hfgputest-8f4mg
(train_fn pid=249)   iterations_since_restore: 1
(train_fn pid=249)   node_ip: 10.128.64.16
(train_fn pid=249)   pid: 146
(train_fn pid=249)   should_checkpoint: true
(train_fn pid=249)   step: 391
(train_fn pid=249)   time_since_restore: 124.04675316810608
(train_fn pid=249)

(train_fn pid=249) 2022-11-04 04:27:20,036	INFO tune.py:747 -- Total run time: 142.25 seconds (141.95 seconds for the tuning loop).


In [22]:
ray.shutdown()

In [23]:
cluster.down()
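
If a cell fails partway through, the GPU cluster keeps running until `cluster.down()` is called. One way to guard against that is to wrap the cluster lifetime in a context manager. This is a minimal sketch that relies only on the `up()` and `down()` methods used in this notebook; `running_cluster` is a hypothetical helper, not part of codeflare-sdk.

```python
from contextlib import contextmanager


@contextmanager
def running_cluster(cluster):
    """Bring the cluster up on entry and always call cluster.down() on exit."""
    cluster.up()
    try:
        yield cluster
    finally:
        # Runs even if the body raises, so the cluster is not left running
        cluster.down()


# Usage in this notebook would look like:
# with running_cluster(cluster) as c:
#     ...  # wait for readiness, connect Ray, run train_fn
```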

## Conclusion
As shown in the example above, you can run your Hugging Face transfer learning tasks natively on CodeFlare. You can scale them from 1 to n GPUs without significant code changes while still using the native Hugging Face trainer.

Also refer to the additional notebooks that showcase other use cases. The next notebook, [./02_codeflare_workflows_encoding.ipynb], shows an sklearn example and how you can use workflows to run experiment pipelines and explore multiple pipelines in parallel on a CodeFlare cluster.
