Wednesday, November 8, 2023

https://huggingface.co/docs/peft/task_guides/clm-prompt-tuning

https://github.com/huggingface/notebooks/blob/main/peft_docs/en/pytorch/clm-prompt-tuning.ipynb

This all runs just fine!

# Prompt tuning for causal language modeling

Prompting helps guide language model behavior by adding some input text specific to a task. Prompt tuning is an additive method that trains and updates only the newly added prompt tokens of a pretrained model. This way, you can use one pretrained model whose weights are frozen, and train and update a smaller set of prompt parameters for each downstream task instead of fully finetuning a separate model. As models grow larger, prompt tuning becomes more efficient relative to full finetuning, and its results improve as the number of model parameters grows.
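
To make the idea concrete, here is a minimal pure-Python sketch of what "adding trainable prompt tokens to a frozen model" means. It is a toy illustration, not PEFT's implementation: `frozen_embed`, `soft_prompt`, and `build_inputs` are hypothetical names, and the two-dimensional vectors are made up.

```python
# Toy illustration of prompt tuning; all names and sizes are made up.
frozen_embed = {  # stands in for the frozen pretrained embedding table
    "the": [0.1, 0.2],
    "cat": [0.3, 0.4],
}
# Trainable "virtual token" vectors; only these would receive gradient updates.
soft_prompt = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]


def build_inputs(tokens):
    """Prepend the soft prompt to the embedded input tokens."""
    embedded = [frozen_embed[t] for t in tokens]
    return soft_prompt + embedded


sequence = build_inputs(["the", "cat"])
print(len(sequence))  # 3 virtual tokens + 2 real tokens
```

The model sees a longer sequence, but the only parameters that change during training are the entries of `soft_prompt`.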

<Tip>

💡 Read [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/abs/2104.08691) to learn more about prompt tuning.

</Tip>

This guide will show you how to apply prompt tuning to train a [`bloomz-560m`](https://huggingface.co/bigscience/bloomz-560m) model on the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset.

Before you begin, make sure you have all the necessary libraries installed:

```bash
!pip install -q peft transformers datasets
```

## Setup

Start by defining the model and tokenizer, the dataset and the dataset columns to train on, some training hyperparameters, and the [PromptTuningConfig](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig). The [PromptTuningConfig](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig) contains information about the task type, the text to initialize the prompt embedding, the number of virtual tokens, and the tokenizer to use:

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup
from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType
import torch
from datasets import load_dataset
import os
from torch.utils.data import DataLoader
from tqdm import tqdm

device = "cuda"
model_name_or_path = "bigscience/bloomz-560m"
tokenizer_name_or_path = "bigscience/bloomz-560m"
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=8,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    tokenizer_name_or_path=model_name_or_path,
)

dataset_name = "twitter_complaints"
checkpoint_name = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace(
    "/", "_"
)
text_column = "Tweet text"
label_column = "text_label"
max_length = 64
lr = 3e-2
num_epochs = 50
batch_size = 8
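
With `PromptTuningInit.TEXT`, the tokenized init text has to be fitted to exactly `num_virtual_tokens` ids. The sketch below shows one plausible way to do that: truncate when the text is too long, repeat it when it is too short. My understanding is that PEFT does something similar, but treat this as an illustration with made-up token ids rather than the library's code; `fit_init_ids` is a hypothetical helper.

```python
import math


def fit_init_ids(init_ids, num_virtual_tokens):
    """Truncate or repeat `init_ids` so the result has exactly `num_virtual_tokens` entries."""
    if not init_ids:
        raise ValueError("init_ids must not be empty")
    if len(init_ids) < num_virtual_tokens:
        # Repeat the ids often enough to cover the requested length.
        reps = math.ceil(num_virtual_tokens / len(init_ids))
        init_ids = init_ids * reps
    return init_ids[:num_virtual_tokens]


print(fit_init_ids([11, 12, 13], 8))     # repeated, then cut to 8 ids
print(fit_init_ids(list(range(10)), 8))  # truncated to the first 8 ids
```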

## Load dataset

For this guide, you'll load the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset. This subset contains tweets that are labeled either `complaint` or `no complaint`:

In [2]:
dataset = load_dataset("ought/raft", dataset_name)
dataset["train"][0]
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2}


To make the `Label` column more readable, replace the `Label` value with the corresponding label text and store them in a `text_label` column. You can use the [map](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map) function to apply this change over the entire dataset in one step:

In [3]:
classes = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
dataset = dataset.map(
    lambda x: {"text_label": [classes[label] for label in x["Label"]]},
    batched=True,
    num_proc=1,
)
dataset["train"][0]
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2, "text_label": "no complaint"}
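
The mapping step can be tried without downloading anything. In the sketch below, `label_names` is a hypothetical stand-in for `dataset["train"].features["Label"].names`, chosen so that index 2 maps to `no complaint` as in the output above, and `batch` mimics the dict of lists that a function receives when `batched=True`.

```python
# Hypothetical label names; index 2 maps to "no complaint" as in the guide's output.
label_names = ["Unlabeled", "complaint", "no complaint"]

# With batched=True, the mapped function receives a dict of column name -> list of values.
batch = {"Tweet text": ["tweet a", "tweet b"], "Label": [2, 1]}


def add_text_label(examples):
    """Return a new column holding the label text for every row in the batch."""
    return {"text_label": [label_names[label] for label in examples["Label"]]}


print(add_text_label(batch))
```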


## Preprocess dataset

Next, you'll set up a tokenizer, configure the appropriate padding token to use for padding sequences, and determine the maximum length of the tokenized labels:

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
target_max_length = max([len(tokenizer(class_label)["input_ids"]) for class_label in classes])
print(target_max_length)
3


Create a `preprocess_function` to:

1. Tokenize the input text and labels.
2. For each example in a batch, pad the labels with the tokenizer's `pad_token_id`.
3. Concatenate the input text and labels into the `model_inputs`.
4. Create a separate attention mask for `labels` and `model_inputs`.
5. Loop through each example in the batch again to pad the input ids, labels, and attention mask to the `max_length` and convert them to PyTorch tensors.
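
The padding and masking steps above can be illustrated on a single toy example before looking at the batched version. This is a sketch with made-up token ids: `PAD_ID`, `MAX_LENGTH`, and `build_example` are hypothetical, and `MAX_LENGTH` is shortened from 64 so the result is easy to read.

```python
PAD_ID = 3      # assumed pad token id for this toy example
IGNORE = -100   # label value skipped by PyTorch's cross-entropy loss by default
MAX_LENGTH = 8  # shortened from 64 for readability


def build_example(prompt_ids, label_ids):
    """Concatenate prompt and label ids, mask the prompt in the labels, and left-pad."""
    label_ids = label_ids + [PAD_ID]                 # terminate the label
    input_ids = prompt_ids + label_ids               # concatenate input and label
    labels = [IGNORE] * len(prompt_ids) + label_ids  # loss only on the label part
    attention_mask = [1] * len(input_ids)
    pad = MAX_LENGTH - len(input_ids)                # left-pad up to MAX_LENGTH
    input_ids = ([PAD_ID] * pad + input_ids)[:MAX_LENGTH]
    attention_mask = ([0] * pad + attention_mask)[:MAX_LENGTH]
    labels = ([IGNORE] * pad + labels)[:MAX_LENGTH]
    return input_ids, attention_mask, labels


input_ids, attention_mask, labels = build_example([101, 102, 103], [7, 8])
print(input_ids)
print(attention_mask)
print(labels)
```

Padding on the left keeps the label tokens at the end of the sequence, which is where a causal language model generates them.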

In [5]:
def preprocess_function(examples):
    batch_size = len(examples[text_column])
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    targets = [str(x) for x in examples[label_column]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(targets)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]
        # print(i, sample_input_ids, label_input_ids)
        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
    # print(model_inputs)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
        labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

Use the [map](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map) function to apply the `preprocess_function` to the entire dataset. You can remove the unprocessed columns since the model won't need them:

In [6]:
processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

Running tokenizer on dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

Running tokenizer on dataset:   0%|          | 0/3399 [00:00<?, ? examples/s]

Create a [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) from the `train` and `eval` datasets. Set `pin_memory=True` to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.

In [7]:
train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["test"]


train_dataloader = DataLoader(
    train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True
)
eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
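
As a quick sanity check on the loaders, the number of batches per epoch follows from the split sizes and `batch_size`. The sketch assumes 50 training and 3,399 test examples (the sizes reported in this run) and relies on `DataLoader` keeping the last, smaller batch by default.

```python
import math

batch_size = 8
num_train, num_test = 50, 3399  # split sizes assumed from this run

# The default drop_last=False keeps the final partial batch, hence the ceiling.
train_batches = math.ceil(num_train / batch_size)
eval_batches = math.ceil(num_test / batch_size)
print(train_batches, eval_batches)
```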

## Train

You're almost ready to set up your model and start training!

Initialize a base model from [AutoModelForCausalLM](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForCausalLM), and pass it and `peft_config` to the `get_peft_model()` function to create a [PeftModel](https://huggingface.co/docs/peft/main/en/package_reference/peft_model#peft.PeftModel). You can print the new [PeftModel](https://huggingface.co/docs/peft/main/en/package_reference/peft_model#peft.PeftModel)'s trainable parameters to see how much more efficient it is than training the full parameters of the original model!

In [8]:
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
# print_trainable_parameters() prints its report itself and returns None, so it isn't wrapped in print()
model.print_trainable_parameters()
"trainable params: 8192 || all params: 559222784 || trainable%: 0.0014648902430985358"
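
The reported numbers can be reproduced by hand: the only trainable weights are the virtual token embeddings. The sketch assumes an embedding width of 1024 for `bloomz-560m`, which is not stated in this guide but is consistent with the 8192 trainable parameters reported.

```python
num_virtual_tokens = 8
hidden_size = 1024        # assumed embedding width of bloomz-560m
all_params = 559_222_784  # total reported by print_trainable_parameters()

trainable = num_virtual_tokens * hidden_size
percent = 100 * trainable / all_params
print(trainable, percent)
```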


Set up an optimizer and learning rate scheduler:

In [9]:
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)
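
To see what a linear schedule with warmup does to the learning rate, here is a small pure-Python sketch of the multiplier it applies. `linear_lr` is a hypothetical helper, and the default of 350 total steps assumes 7 batches per epoch over 50 epochs.

```python
def linear_lr(step, base_lr=3e-2, num_warmup_steps=0, num_training_steps=350):
    """Learning rate after `step` optimizer updates under linear warmup and linear decay."""
    if step < num_warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, num_warmup_steps)
    # Afterwards, decay linearly to 0 at num_training_steps.
    remaining = num_training_steps - step
    return base_lr * max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


for step in (0, 175, 350):
    print(step, linear_lr(step))
```

With `num_warmup_steps=0`, training starts at the full learning rate and decays to zero by the last step.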

Move the model to the GPU, then write a training loop to start training!

In [10]:
%%time
model = model.to(device)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for step, batch in enumerate(tqdm(train_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        total_loss += loss.detach().float()
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    model.eval()
    eval_loss = 0
    eval_preds = []
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        eval_loss += loss.detach().float()
        eval_preds.extend(
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
        )

    eval_epoch_loss = eval_loss / len(eval_dataloader)
    eval_ppl = torch.exp(eval_epoch_loss)
    train_epoch_loss = total_loss / len(train_dataloader)
    train_ppl = torch.exp(train_epoch_loss)
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")

epoch=0: train_ppl=tensor(7.0680e+15, device='cuda:0') train_epoch_loss=tensor(36.4944, device='cuda:0') eval_ppl=tensor(16207.8301, device='cuda:0') eval_epoch_loss=tensor(9.6932, device='cuda:0')
epoch=1: train_ppl=tensor(147592.2031, device='cuda:0') train_epoch_loss=tensor(11.9022, device='cuda:0') eval_ppl=tensor(3121.0186, device='cuda:0') eval_epoch_loss=tensor(8.0459, device='cuda:0')
epoch=2: train_ppl=tensor(15619.4014, device='cuda:0') train_epoch_loss=tensor(9.6563, device='cuda:0') eval_ppl=tensor(2424.0610, device='cuda:0') eval_epoch_loss=tensor(7.7932, device='cuda:0')
epoch=3: train_ppl=tensor(2017.9945, device='cuda:0') train_epoch_loss=tensor(7.6099, device='cuda:0') eval_ppl=tensor(2183.2898, device='cuda:0') eval_epoch_loss=tensor(7.6886, device='cuda:0')
epoch=4: train_ppl=tensor(471.8139, device='cuda:0') train_epoch_loss=tensor(6.1566, device='cuda:0') eval_ppl=tensor(4085.8684, device='cuda:0') eval_epoch_loss=tensor(8.3153, device='cuda:0')
epoch=5: train_ppl=tensor(230.2717, device='cuda:0') train_epoch_loss=tensor(5.4393, device='cuda:0') eval_ppl=tensor(8388.0713, device='cuda:0') eval_epoch_loss=tensor(9.0346, device='cuda:0')
epoch=6: train_ppl=tensor(180.4847, device='cuda:0') train_epoch_loss=tensor(5.1956, device='cuda:0') eval_ppl=tensor(10391.9326, device='cuda:0') eval_epoch_loss=tensor(9.2488, device='cuda:0')
epoch=7: train_ppl=tensor(133.1507, device='cuda:0') train_epoch_loss=tensor(4.8915, device='cuda:0') eval_ppl=tensor(13184.1123, device='cuda:0') eval_epoch_loss=tensor(9.4868, device='cuda:0')
epoch=8: train_ppl=tensor(120.2509, device='cuda:0') train_epoch_loss=tensor(4.7896, device='cuda:0') eval_ppl=tensor(8696.5908, device='cuda:0') eval_epoch_loss=tensor(9.0707, device='cuda:0')
epoch=9: train_ppl=tensor(91.3553, device='cuda:0') train_epoch_loss=tensor(4.5148, device='cuda:0') eval_ppl=tensor(12310.6562, device='cuda:0') eval_epoch_loss=tensor(9.4182, device='cuda:0')
epoch=10: train_ppl=tensor(69.8857, device='cuda:0') train_epoch_loss=tensor(4.2469, device='cuda:0') eval_ppl=tensor(14289.9707, device='cuda:0') eval_epoch_loss=tensor(9.5673, device='cuda:0')
epoch=11: train_ppl=tensor(56.8103, device='cuda:0') train_epoch_loss=tensor(4.0397, device='cuda:0') eval_ppl=tensor(24352.0059, device='cuda:0') eval_epoch_loss=tensor(10.1004, device='cuda:0')
epoch=12: train_ppl=tensor(44.5513, device='cuda:0') train_epoch_loss=tensor(3.7966, device='cuda:0') eval_ppl=tensor(29604.6797, device='cuda:0') eval_epoch_loss=tensor(10.2957, device='cuda:0')
epoch=13: train_ppl=tensor(35.6193, device='cuda:0') train_epoch_loss=tensor(3.5729, device='cuda:0') eval_ppl=tensor(40978.5039, device='cuda:0') eval_epoch_loss=tensor(10.6208, device='cuda:0')
epoch=14: train_ppl=tensor(28.7064, device='cuda:0') train_epoch_loss=tensor(3.3571, device='cuda:0') eval_ppl=tensor(37395.9531, device='cuda:0') eval_epoch_loss=tensor(10.5293, device='cuda:0')
epoch=15: train_ppl=tensor(26.4894, device='cuda:0') train_epoch_loss=tensor(3.2767, device='cuda:0') eval_ppl=tensor(45150.0117, device='cuda:0') eval_epoch_loss=tensor(10.7177, device='cuda:0')
epoch=16: train_ppl=tensor(20.4859, device='cuda:0') train_epoch_loss=tensor(3.0197, device='cuda:0') eval_ppl=tensor(49591.0352, device='cuda:0') eval_epoch_loss=tensor(10.8116, device='cuda:0')
epoch=17: train_ppl=tensor(14.9471, device='cuda:0') train_epoch_loss=tensor(2.7045, device='cuda:0') eval_ppl=tensor(135198.1719, device='cuda:0') eval_epoch_loss=tensor(11.8145, device='cuda:0')
epoch=18: train_ppl=tensor(12.6193, device='cuda:0') train_epoch_loss=tensor(2.5352, device='cuda:0') eval_ppl=tensor(56298.5000, device='cuda:0') eval_epoch_loss=tensor(10.9384, device='cuda:0')
epoch=19: train_ppl=tensor(9.2370, device='cuda:0') train_epoch_loss=tensor(2.2232, device='cuda:0') eval_ppl=tensor(181260.7344, device='cuda:0') eval_epoch_loss=tensor(12.1077, device='cuda:0')
epoch=20: train_ppl=tensor(6.8894, device='cuda:0') train_epoch_loss=tensor(1.9300, device='cuda:0') eval_ppl=tensor(261353.2031, device='cuda:0') eval_epoch_loss=tensor(12.4736, device='cuda:0')
epoch=21: train_ppl=tensor(5.0490, device='cuda:0') train_epoch_loss=tensor(1.6192, device='cuda:0') eval_ppl=tensor(264372.1875, device='cuda:0') eval_epoch_loss=tensor(12.4851, device='cuda:0')
epoch=22: train_ppl=tensor(4.3179, device='cuda:0') train_epoch_loss=tensor(1.4628, device='cuda:0') eval_ppl=tensor(146032.9062, device='cuda:0') eval_epoch_loss=tensor(11.8916, device='cuda:0')
epoch=23: train_ppl=tensor(3.4151, device='cuda:0') train_epoch_loss=tensor(1.2282, device='cuda:0') eval_ppl=tensor(122943.0781, device='cuda:0') eval_epoch_loss=tensor(11.7195, device='cuda:0')
epoch=24: train_ppl=tensor(3.1189, device='cuda:0') train_epoch_loss=tensor(1.1375, device='cuda:0') eval_ppl=tensor(64069.2266, device='cuda:0') eval_epoch_loss=tensor(11.0677, device='cuda:0')
epoch=25: train_ppl=tensor(2.3793, device='cuda:0') train_epoch_loss=tensor(0.8668, device='cuda:0') eval_ppl=tensor(161210.7969, device='cuda:0') eval_epoch_loss=tensor(11.9905, device='cuda:0')
epoch=26: train_ppl=tensor(1.9486, device='cuda:0') train_epoch_loss=tensor(0.6671, device='cuda:0') eval_ppl=tensor(238718.7500, device='cuda:0') eval_epoch_loss=tensor(12.3830, device='cuda:0')
epoch=27: train_ppl=tensor(1.7302, device='cuda:0') train_epoch_loss=tensor(0.5482, device='cuda:0') eval_ppl=tensor(299871.0625, device='cuda:0') eval_epoch_loss=tensor(12.6111, device='cuda:0')
epoch=28: train_ppl=tensor(1.6643, device='cuda:0') train_epoch_loss=tensor(0.5094, device='cuda:0') eval_ppl=tensor(413936.1250, device='cuda:0') eval_epoch_loss=tensor(12.9335, device='cuda:0')
epoch=29: train_ppl=tensor(2.4955, device='cuda:0') train_epoch_loss=tensor(0.9145, device='cuda:0') eval_ppl=tensor(44439.0898, device='cuda:0') eval_epoch_loss=tensor(10.7019, device='cuda:0')
epoch=30: train_ppl=tensor(1.8853, device='cuda:0') train_epoch_loss=tensor(0.6341, device='cuda:0') eval_ppl=tensor(92553.5859, device='cuda:0') eval_epoch_loss=tensor(11.4355, device='cuda:0')
epoch=31: train_ppl=tensor(1.6727, device='cuda:0') train_epoch_loss=tensor(0.5144, device='cuda:0') eval_ppl=tensor(76137.9453, device='cuda:0') eval_epoch_loss=tensor(11.2403, device='cuda:0')
epoch=32: train_ppl=tensor(1.5616, device='cuda:0') train_epoch_loss=tensor(0.4457, device='cuda:0') eval_ppl=tensor(75425.5078, device='cuda:0') eval_epoch_loss=tensor(11.2309, device='cuda:0')
epoch=33: train_ppl=tensor(1.4585, device='cuda:0') train_epoch_loss=tensor(0.3774, device='cuda:0') eval_ppl=tensor(96689.1719, device='cuda:0') eval_epoch_loss=tensor(11.4793, device='cuda:0')
epoch=34: train_ppl=tensor(1.4327, device='cuda:0') train_epoch_loss=tensor(0.3596, device='cuda:0') eval_ppl=tensor(85096.2031, device='cuda:0') eval_epoch_loss=tensor(11.3515, device='cuda:0')
epoch=35: train_ppl=tensor(1.4702, device='cuda:0') train_epoch_loss=tensor(0.3854, device='cuda:0') eval_ppl=tensor(62291.8047, device='cuda:0') eval_epoch_loss=tensor(11.0396, device='cuda:0')
epoch=36: train_ppl=tensor(1.4065, device='cuda:0') train_epoch_loss=tensor(0.3411, device='cuda:0') eval_ppl=tensor(68574.2031, device='cuda:0') eval_epoch_loss=tensor(11.1357, device='cuda:0')
epoch=37: train_ppl=tensor(1.4199, device='cuda:0') train_epoch_loss=tensor(0.3506, device='cuda:0') eval_ppl=tensor(79464.2812, device='cuda:0') eval_epoch_loss=tensor(11.2831, device='cuda:0')
epoch=38: train_ppl=tensor(1.3477, device='cuda:0') train_epoch_loss=tensor(0.2984, device='cuda:0') eval_ppl=tensor(80618.8828, device='cuda:0') eval_epoch_loss=tensor(11.2975, device='cuda:0')
epoch=39: train_ppl=tensor(1.3015, device='cuda:0') train_epoch_loss=tensor(0.2635, device='cuda:0') eval_ppl=tensor(116456.4609, device='cuda:0') eval_epoch_loss=tensor(11.6653, device='cuda:0')
epoch=40: train_ppl=tensor(1.3408, device='cuda:0') train_epoch_loss=tensor(0.2933, device='cuda:0') eval_ppl=tensor(151659.2969, device='cuda:0') eval_epoch_loss=tensor(11.9294, device='cuda:0')
epoch=41: train_ppl=tensor(1.3379, device='cuda:0') train_epoch_loss=tensor(0.2911, device='cuda:0') eval_ppl=tensor(100876.6875, device='cuda:0') eval_epoch_loss=tensor(11.5217, device='cuda:0')
epoch=42: train_ppl=tensor(1.2662, device='cuda:0') train_epoch_loss=tensor(0.2360, device='cuda:0') eval_ppl=tensor(154022.4688, device='cuda:0') eval_epoch_loss=tensor(11.9449, device='cuda:0')
epoch=43: train_ppl=tensor(1.2432, device='cuda:0') train_epoch_loss=tensor(0.2177, device='cuda:0') eval_ppl=tensor(192952.5156, device='cuda:0') eval_epoch_loss=tensor(12.1702, device='cuda:0')
epoch=44: train_ppl=tensor(1.2490, device='cuda:0') train_epoch_loss=tensor(0.2223, device='cuda:0') eval_ppl=tensor(262722.9062, device='cuda:0') eval_epoch_loss=tensor(12.4789, device='cuda:0')
epoch=45: train_ppl=tensor(1.2258, device='cuda:0') train_epoch_loss=tensor(0.2036, device='cuda:0') eval_ppl=tensor(217772.1094, device='cuda:0') eval_epoch_loss=tensor(12.2912, device='cuda:0')
epoch=46: train_ppl=tensor(1.2123, device='cuda:0') train_epoch_loss=tensor(0.1925, device='cuda:0') eval_ppl=tensor(231966.2656, device='cuda:0') eval_epoch_loss=tensor(12.3543, device='cuda:0')
epoch=47: train_ppl=tensor(1.2167, device='cuda:0') train_epoch_loss=tensor(0.1962, device='cuda:0') eval_ppl=tensor(247231.2344, device='cuda:0') eval_epoch_loss=tensor(12.4181, device='cuda:0')
epoch=48: train_ppl=tensor(1.1836, device='cuda:0') train_epoch_loss=tensor(0.1686, device='cuda:0') eval_ppl=tensor(240573.1719, device='cuda:0') eval_epoch_loss=tensor(12.3908, device='cuda:0')
epoch=49: train_ppl=tensor(1.1905, device='cuda:0') train_epoch_loss=tensor(0.1744, device='cuda:0') eval_ppl=tensor(240761.8281, device='cuda:0') eval_epoch_loss=tensor(12.3916, device='cuda:0')
CPU times: user 10min 20s, sys: 2.67 s, total: 10min 23s
Wall time: 10min 23s
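
The perplexity values in the log are the exponential of the mean loss for each epoch, which is what `torch.exp(train_epoch_loss)` computes in the loop. The relationship can be checked with the standard library alone, using the rounded final-epoch losses copied from the log above:

```python
import math

# Final-epoch losses copied from the log above (rounded to four decimals).
train_epoch_loss = 0.1744
eval_epoch_loss = 12.3916

train_ppl = math.exp(train_epoch_loss)
eval_ppl = math.exp(eval_epoch_loss)
print(train_ppl)  # close to the logged train_ppl of 1.1905
print(eval_ppl)   # close to the logged eval_ppl; small differences come from rounding
```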





## Share model

You can store and share your model on the Hub if you'd like. Log in to your Hugging Face account and enter your token when prompted:

In [11]:
from huggingface_hub import notebook_login

notebook_login()

Use the [push_to_hub](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.push_to_hub) function to upload your model to a model repository on the Hub:

In [12]:
peft_model_id = "robkayinto/bloomz-560m_PROMPT_TUNING_CAUSAL_LM"

model.push_to_hub(peft_model_id, use_auth_token=True)



CommitInfo(commit_url='https://huggingface.co/robkayinto/bloomz-560m_PROMPT_TUNING_CAUSAL_LM/commit/46542c374c3bef2cde280e9514cbd3e195d024d2', commit_message='Upload model', commit_description='', oid='46542c374c3bef2cde280e9514cbd3e195d024d2', pr_url=None, pr_revision=None, pr_num=None)

Once the model is uploaded, you'll see the model file size is only 33.5kB! 🤏
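
The small file size is what you would expect from the parameter count. As a rough estimate, assuming the 8 virtual tokens have an embedding width of 1024 and are stored as 32-bit floats (both assumptions, not stated in this guide), the raw tensor data comes to about 33 kB; the rest is presumably serialization overhead.

```python
trainable_params = 8 * 1024  # virtual tokens x assumed hidden size
bytes_per_param = 4          # assuming float32 storage
payload_bytes = trainable_params * bytes_per_param
print(payload_bytes, payload_bytes / 1000)  # raw tensor bytes and the same value in kB
```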

## Inference

Let's try the model on a sample input for inference. If you look at the repository you uploaded the model to, you'll see an `adapter_config.json` file. Load this file into [PeftConfig](https://huggingface.co/docs/peft/main/en/package_reference/config#peft.PeftConfig) to specify the `peft_type` and `task_type`. Then you can load the prompt tuned model weights and the configuration into [from_pretrained()](https://huggingface.co/docs/peft/main/en/package_reference/peft_model#peft.PeftModel.from_pretrained) to create the [PeftModel](https://huggingface.co/docs/peft/main/en/package_reference/peft_model#peft.PeftModel):

In [13]:
from peft import PeftModel, PeftConfig

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)

Grab a tweet and tokenize it:

In [14]:
inputs = tokenizer(
    f'{text_column} : {"@nationalgridus I have no water and the bill is current and paid. Can you do something about this?"} Label : ',
    return_tensors="pt",
)

Put the model on a GPU and *generate* the predicted label:

In [15]:
model.to(device)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=10, eos_token_id=3
    )
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
[
    "Tweet text : @nationalgridus I have no water and the bill is current and paid. Can you do something about this? Label : complaint"
]
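
The decoded string contains the whole prompt followed by the prediction. If you only want the label, you can split on the `Label : ` marker used in the prompt template. `extract_label` below is a hypothetical helper, not part of PEFT or Transformers; if the marker is missing, it returns the stripped input unchanged.

```python
decoded = (
    "Tweet text : @nationalgridus I have no water and the bill is current and paid. "
    "Can you do something about this? Label : complaint"
)


def extract_label(text, marker="Label : "):
    """Return whatever follows the last occurrence of `marker`, stripped of whitespace."""
    return text.rsplit(marker, 1)[-1].strip()


print(extract_label(decoded))
```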
