In [42]:
# !pip install transformers sentencepiece datasets accelerate bitsandbytes scipy cchardet 
!pip install evaluate rouge_score


In [2]:
!pip install peft



# Fine-Tune a Causal Language Model for Dialogue Summarization

In this exercise, you will fine-tune Meta's Llama 2 for improved dialogue summarization. Llama 2 is a large language model (LLM) that is free for research and commercial use. It is one of the top-performing open-source LLMs and is comparable to GPT-3.5 on several benchmarks.

We will use Parameter-Efficient Fine-Tuning (PEFT) to fine-tune the model, and evaluate the resulting model using ROUGE metrics.

## Install the prerequisites

Uncomment the following line if these Python packages have not been installed yet.

In [3]:
# !pip install transformers datasets accelerate sentencepiece scipy peft bitsandbytes ipywidgets nvidia-ml-py3

## Request access to Llama-2 weights

You need to request access to download the Llama 2 weights. You can do so either through this [link at Meta](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) or through your Hugging Face account at [this link](https://huggingface.co/meta-llama/Llama-2-7b). Once your request is approved, you will receive either an email from Meta with instructions for downloading the Llama 2 weights, or an email from Hugging Face informing you that access has been granted.

If you download the weights from Meta directly, you need to run a conversion script to convert them to the Hugging Face format for use with the Hugging Face Transformers library.

In [4]:
# %%bash
# TRANSFORM=`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`
# python ${TRANSFORM} --input_dir models --model_size 7B --output_dir models_hf/7B

In [5]:
# Log in to Hugging Face to access the gated Llama 2 model

from huggingface_hub import notebook_login
notebook_login()


In [1]:
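from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo

def print_gpu_utilization():
    # Query the memory usage of the first GPU through NVML
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory occupied: {info.used//1024**2} MB.")


def print_summary(result):
    # Print runtime metrics from a Trainer.train() result, plus current GPU memory
    print(f"Time: {result.metrics['train_runtime']:.2f}")
    print(f"Samples/second: {result.metrics['train_samples_per_second']:.2f}")
    print_gpu_utilization()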

We first import the necessary Python libraries.

In [2]:
import torch

from transformers import LlamaForCausalLM, LlamaTokenizer
from datasets import load_dataset

## Load the dataset

You are going to continue experimenting with the DialogSum Hugging Face dataset. It contains 10,000+ dialogues with the corresponding manually labeled summaries and topics. Note that the dataset is already split into train, validation and test sets.


In [3]:
from datasets import load_dataset

dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(dataset_name)
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

In [4]:
dataset_train = dataset['train']
dataset_test = dataset['test']
dataset_val = dataset['validation']

Load the pre-trained Llama 2 model and its tokenizer directly from Hugging Face. We will load the model with 8-bit quantization to save memory. For a more detailed explanation of how the model performs matrix multiplication in 8-bit, see this [blog post](https://huggingface.co/blog/hf-bitsandbytes-integration).
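The basic building block described in that post is absmax quantization: values are scaled so that the largest magnitude maps to 127, rounded to 8-bit integers, and divided by the same scale when they are used. The sketch below shows only that core idea in pure Python. It is a simplification: bitsandbytes' LLM.int8() quantizes per row/column vector and keeps outlier features (above `llm_int8_threshold`, 6.0 in the config printed below) in 16-bit, which this toy omits.

```python
def absmax_quantize(values):
    """Map floats to the int8 range [-127, 127] using one absmax scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = 127 / max_abs if max_abs > 0 else 1.0
    quantized = [round(v * scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximately recover the original floats from the int8 values."""
    return [q / scale for q in quantized]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)
print(q)         # [17, -71, 47, 127, -10]
print(restored)  # each value within half a quantization step (0.5 / scale) of the original
```

Each weight now needs one byte instead of two (float16) or four (float32), at the cost of a small rounding error.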

In [5]:
model_id="meta-llama/Llama-2-7b-hf"

tokenizer = LlamaTokenizer.from_pretrained(model_id)
# Quantize the linear layers to 8-bit with bitsandbytes; modules that are not quantized use torch_dtype.
# device_map='auto' places the model on the available GPU(s).
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True,
                                        device_map='auto', torch_dtype=torch.float32)


In [6]:
model.config

LlamaConfig {
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "quantization_config": {
    "bnb_4bit_compute_dtype": "float32",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": false,
    "load_in_8bit": true,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": 

In [7]:
print_gpu_utilization()

GPU memory occupied: 8750 MB.


The following shows the GPU memory consumption on an RTX GPU when loading the model with different dtypes:

- 8-bit (`load_in_8bit`): 8224 MB
- 16-bit: 13902 MB
- 32-bit: 26830 MB
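These figures are roughly what you would expect from parameter count × bytes per parameter. A minimal sketch of that arithmetic, using the base-model parameter count implied by numbers printed later in this notebook (6,742,609,920 total parameters minus 4,194,304 LoRA parameters):

```python
TOTAL_PARAMS = 6_742_609_920   # "all model parameters" printed after adding the LoRA adapter
LORA_PARAMS = 4_194_304        # "trainable model parameters" (the LoRA adapter only)
BASE_PARAMS = TOTAL_PARAMS - LORA_PARAMS  # base Llama 2 7B weights


def weight_memory_mb(num_params, bytes_per_param):
    """Memory for the weights alone, in MB as print_gpu_utilization reports it (1024**2 bytes)."""
    return num_params * bytes_per_param / 1024**2


for label, nbytes in [("8-bit", 1), ("16-bit", 2), ("32-bit", 4)]:
    print(f"{label}: ~{weight_memory_mb(BASE_PARAMS, nbytes):,.0f} MB")
```

The measured values are higher than these weight-only estimates because they also include the CUDA context and other runtime allocations and, in 8-bit mode, modules such as the token embeddings and the output head, which are not quantized.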

In [8]:
dataset['train'][100]

{'id': 'train_100',
 'dialogue': "#Person1#: I have a problem with my cable.\n#Person2#: What about it?\n#Person1#: My cable has been out for the past week or so.\n#Person2#: The cable is down right now. I am very sorry.\n#Person1#: When will it be working again?\n#Person2#: It should be back on in the next couple of days.\n#Person1#: Do I still have to pay for the cable?\n#Person2#: We're going to give you a credit while the cable is down.\n#Person1#: So, I don't have to pay for it?\n#Person2#: No, not until your cable comes back on.\n#Person1#: Okay, thanks for everything.\n#Person2#: You're welcome, and I apologize for the inconvenience.",
 'summary': "#Person1# has a problem with the cable. #Person2# promises it should work again and #Person1# doesn't have to pay while it's down.",
 'topic': 'cable'}

In [9]:
eval_prompt = """
Summarize this dialog:
#Person1#: I have a problem with my cable.
#Person2#: What about it?
#Person1#: My cable has been out for the past week or so.
#Person2#: The cable is down right now. I am very sorry.
#Person1#: When will it be working again?
#Person2#: It should be back on in the next couple of days.
#Person1#: Do I still have to pay for the cable?
#Person2#: We're going to give you a credit while the cable is down.
#Person1#: So, I don't have to pay for it?
#Person2#: No, not until your cable comes back on.
#Person1#: Okay, thanks for everything.
#Person2#: You're welcome, and I apologize for the inconvenience.
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=200)[0], skip_special_tokens=True))




Summarize this dialog:
#Person1#: I have a problem with my cable.
#Person2#: What about it?
#Person1#: My cable has been out for the past week or so.
#Person2#: The cable is down right now. I am very sorry.
#Person1#: When will it be working again?
#Person2#: It should be back on in the next couple of days.
#Person1#: Do I still have to pay for the cable?
#Person2#: We're going to give you a credit while the cable is down.
#Person1#: So, I don't have to pay for it?
#Person2#: No, not until your cable comes back on.
#Person1#: Okay, thanks for everything.
#Person2#: You're welcome, and I apologize for the inconvenience.
---
Summary:
#Person1#: I have a problem with my cable.
#Person2#: What about it?
#Person1#: My cable has been out for the past week or so.
#Person2#: The cable is down right now. I am very sorry.
#Person1#: When will it be working again?
#Person2#: It should be back on in the next couple of days.
#Person1#: Do I still have to pay for the cable?
#Person2#: We're going t

We can see that the base model simply repeats the conversation instead of summarizing it.

## Data Processing 

### Instruction prompt 

We need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM, as follows:

```
Summarize this dialog:
#Person1#: This is Person1's part of the conversation.
#Person2#: This is Person2's part of the conversation.
---
Summary:
This is the ground-truth summary of the dialog.
```

We will create a prompt template and a function that applies the template to every sample in the dataset. Note that we also append an EOS (end-of-sequence) token to the end of each sample, so that the fine-tuned model learns to stop at the appropriate point (the end of the summary) instead of generating tokens indefinitely.

In [10]:
def apply_prompt_template(sample):
    # Plain string template; the placeholders are filled in by str.format below
    prompt = (
        "Summarize this dialog:\n{dialog}\n---\nSummary:\n{summary}{eos_token}"
    )

    return {
        "text": prompt.format(
            dialog=sample["dialogue"],
            summary=sample["summary"],
            eos_token=tokenizer.eos_token,
        )
    }

# Replace all original columns with the single formatted "text" field
dataset_train = dataset_train.map(apply_prompt_template, remove_columns=list(dataset_train.features))

Let's look at one of the samples. We can see that the original sample has been converted into a sample with a single 'text' field, and that the text now conforms to the template we specified.

In [11]:
dataset_train[0]

{'text': "Summarize this dialog:\n#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor.\n---\nSummary:\nMr. Smi

**Exercise**

Apply the prompt template to the validation and test splits too.

In [12]:
dataset_val = dataset_val.map(apply_prompt_template, remove_columns=list(dataset_val.features))
dataset_test = dataset_test.map(apply_prompt_template, remove_columns=list(dataset_test.features))

## Tokenization and Preparing the Input 

### Tokenization 

Before we can use the dataset for training, we need to tokenize it.

In [13]:
def tokenize_function(examples):
    return tokenizer(examples["text"])

dataset_train_tokenized = dataset_train.map(
    tokenize_function,
    batched=True,
    num_proc=4,
    remove_columns=dataset_train.column_names,  # drop the raw "text" column
)

In [59]:
dataset_val_tokenized = dataset_val.map(
    tokenize_function,
    batched=True,
    num_proc=4,
    remove_columns=dataset_val.column_names,  # drop the raw "text" column of the validation split
)


We can see that after tokenization each sample has `input_ids`, which contains the ID of each token (subword), and an `attention_mask`, which tells the model which tokens to ignore (e.g. padding). We also show the length of the `input_ids` of the first sample, which in this case is 341 tokens.

In [14]:
print("Dataset info: ", dataset_train_tokenized)
print("Length of input_ids: ", len(dataset_train_tokenized['input_ids'][0]))
print("Sample input: \n", dataset_train_tokenized[0])

Dataset info:  Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 12460
})
Length of input_ids:  341
Sample input: 
 {'input_ids': [1, 6991, 3034, 675, 445, 7928, 29901, 13, 29937, 7435, 29896, 29937, 29901, 6324, 29892, 3237, 29889, 7075, 29889, 306, 29915, 29885, 15460, 10875, 11335, 29889, 3750, 526, 366, 1244, 9826, 29973, 13, 29937, 7435, 29906, 29937, 29901, 306, 1476, 372, 723, 367, 263, 1781, 2969, 304, 679, 263, 1423, 29899, 786, 29889, 13, 29937, 7435, 29896, 29937, 29901, 3869, 29892, 1532, 29892, 366, 7359, 29915, 29873, 750, 697, 363, 29871, 29945, 2440, 29889, 887, 881, 505, 697, 1432, 1629, 29889, 13, 29937, 7435, 29906, 29937, 29901, 306, 1073, 29889, 306, 4377, 408, 1472, 408, 727, 338, 3078, 2743, 29892, 2020, 748, 1074, 278, 11619, 29973, 13, 29937, 7435, 29896, 29937, 29901, 5674, 29892, 278, 1900, 982, 304, 4772, 10676, 4486, 2264, 267, 338, 304, 1284, 714, 1048, 963, 4688, 29889, 1105, 1018, 304, 2041, 472, 3203, 2748, 263, 1629, 363, 596, 1914

Now let's prepare the input data for the model. As you can see above, the tokenized samples (`input_ids`) are typically a few hundred tokens long, whereas Llama models have a context window of 2,048 (Llama 1) or 4,096 (Llama 2) tokens. It therefore makes sense to concatenate several samples, up to the context-window limit, to form each final input to the model. We also need to create a 'labels' field in the input dataset, which tells the model which token to predict at each position. Shifting the inputs and labels to align them happens inside the model, so our labels are simply an exact copy of the `input_ids`.

Let's find the maximum context window of the model.
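To make the "labels are a copy of `input_ids`" point concrete, here is a minimal pure-Python sketch of the pairing a causal LM applies internally: the prediction made at position i is scored against the label at position i + 1. The token IDs are illustrative, taken from the start of the tokenized sample above and followed by the EOS ID 2 from the model config; `next_token_pairs` is a hypothetical helper, not a library function.

```python
# Illustrative token IDs: BOS (1), the first few IDs of the sample above, then EOS (2)
input_ids = [1, 6991, 3034, 675, 445, 2]
labels = list(input_ids)  # labels are an exact copy of input_ids


def next_token_pairs(input_ids, labels):
    """Pair each prefix of the input with the label token it is trained to predict.

    Mirrors the internal shift of a causal LM: the output at position i
    is compared with labels[i + 1], so the last position has no target.
    """
    return [(input_ids[: i + 1], labels[i + 1]) for i in range(len(input_ids) - 1)]


for context, target in next_token_pairs(input_ids, labels):
    print(context, "->", target)
```

Because the shift happens inside the model's loss computation, no manual offsetting of the labels is needed in the data pipeline.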

In [15]:
def get_max_context_length(model):
    
    conf = model.config
    max_length = None
    
    # Different model families store the context window under different config names
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max context length: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max context length: {max_length}")
        
    return max_length

In [16]:
context_length = get_max_context_length(model)
print('Context length: ', context_length)

Found max context length: 4096
Context length:  4096


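The following function concatenates a batch of samples and then splits the concatenated sequence into chunks of `context_length` tokens, dropping any leftover tokens that do not fill a whole chunk. It also creates a 'labels' field that is a copy of 'input_ids'. Note that the cell sets `context_length` to 512 rather than the model's 4,096-token maximum; shorter chunks need less GPU memory during training.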

In [17]:
from itertools import chain

# Chunk length used for training (shorter than the model's 4096-token maximum)
context_length = 512

def group_texts(examples):
    
    # Concatenate all texts (chain avoids the quadratic cost of sum(list_of_lists, [])).
    concatenated_examples = {k: list(chain.from_iterable(examples[k])) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder; we could add padding instead if the model supported it.
    # Customize this part to your needs.
    if total_length >= context_length:
        total_length = (total_length // context_length) * context_length
    # Split into chunks of context_length tokens.
    result = {
        k: [t[i : i + context_length] for i in range(0, total_length, context_length)]
        for k, t in concatenated_examples.items()
    }
    # Labels are a copy of input_ids; the model shifts them internally.
    result["labels"] = result["input_ids"].copy()
    return result
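To see what `group_texts` does without loading the real dataset, the sketch below applies the same concatenate-then-chunk logic to a tiny hand-made batch with a chunk length of 4. The batch contents are made up for illustration, and `group_texts_demo` is a hypothetical standalone copy that takes the chunk length as a parameter instead of reading the global `context_length`.

```python
from itertools import chain


def group_texts_demo(examples, block_size):
    """Concatenate tokenized samples and cut them into block_size chunks (same logic as group_texts)."""
    concatenated = {k: list(chain.from_iterable(examples[k])) for k in examples}
    total_length = len(concatenated["input_ids"])
    # Drop the tail that does not fill a whole block.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = [chunk[:] for chunk in result["input_ids"]]
    return result


# Four toy samples of lengths 4, 5, 3 and 3 (15 tokens in total)
batch = {
    "input_ids": [[1, 11, 12, 2], [1, 21, 22, 23, 2], [1, 31, 2], [1, 41, 2]],
    "attention_mask": [[1] * 4, [1] * 5, [1] * 3, [1] * 3],
}
out = group_texts_demo(batch, block_size=4)
print(out["input_ids"])  # [[1, 11, 12, 2], [1, 21, 22, 23], [2, 1, 31, 2]]
```

Note that a chunk can start in the middle of one sample and end in another, and that the last 3 tokens (`[1, 41, 2]`) are dropped because they do not fill a full chunk of 4.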



In [18]:
dataset_train_final = dataset_train_tokenized.map(group_texts, batched=True, num_proc=4)


In [60]:
dataset_val_final = dataset_val_tokenized.map(group_texts, batched=True, num_proc=4)


Examining `dataset_train_final`, we can see that all samples now have a length equal to the specified context length (512 tokens).

In [19]:
for sample in dataset_train_final['input_ids'][:5]: 
    print(len(sample))

512
512
512
512
512


In [20]:
from transformers import default_data_collator

data_collator = default_data_collator

## Setup the PEFT/LoRA model for Fine-Tuning

Next, set up the PEFT/LoRA model for fine-tuning by adding a new adapter layer. With PEFT/LoRA, the underlying LLM is frozen and only the adapter is trained. Have a look at the LoRA configuration below. Note the rank (`r`) hyperparameter, which defines the rank (inner dimension) of the adapter matrices to be trained.
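For reference, LoRA keeps each targeted pretrained weight matrix $W_0$ (here `q_proj` and `v_proj`) frozen and adds a trainable low-rank update $BA$ to its output. In PEFT the update is scaled by `lora_alpha / r`, so the configuration below gives a scale of $32 / 8 = 4$; during training, `lora_dropout` is applied only to the input of the LoRA branch:

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Because $r$ is much smaller than $d$ and $k$, each adapted matrix contributes only $r(d + k)$ trainable entries instead of $dk$.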


In [21]:
from peft import (
    get_peft_model,
    LoraConfig, 
    TaskType, 
    prepare_model_for_int8_training
)

lora_config = LoraConfig(
    r=8, # Rank
    lora_alpha=32,
    inference_mode=False,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

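# Attach the LoRA adapter using transformers' built-in PEFT integration
model.add_adapter(lora_config)
# Alternative: wrap the model with PEFT directly
# model = prepare_model_for_int8_training(model)
# model = get_peft_model(model, lora_config)
# model.print_trainable_parameters()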

In [22]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(model))

trainable model parameters: 4194304
all model parameters: 6742609920
percentage of trainable model parameters: 0.06%


In [23]:
model.train()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(
            in_features=4096, out_features=4096, bias=False
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=4096, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
          (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear8bitLt(
            in_features=4096, out_features=4096, bias=False
            (lora_dropout): ModuleDict(
              (default): Dropout(p=

If you look at the trainable parameters, there are only about 4 million of them, compared to about 6.7 billion parameters in the entire model (0.06%).
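The 4,194,304 figure can be reproduced by hand from the LoRA configuration and the model config. Each adapted 4096 × 4096 projection gets an `lora_A` matrix of shape (r, 4096) and an `lora_B` matrix of shape (4096, r), both without bias, as shown in the module printout above. `lora_param_count` is a small helper written for this illustration.

```python
def lora_param_count(in_features, out_features, r):
    """Trainable parameters LoRA adds to one bias-free linear layer.

    lora_A has shape (r, in_features) and lora_B has shape (out_features, r).
    """
    return r * in_features + out_features * r


hidden_size = 4096     # q_proj and v_proj are 4096 x 4096 in Llama 2 7B
num_layers = 32        # num_hidden_layers in model.config
targets_per_layer = 2  # target_modules=["q_proj", "v_proj"]
r = 8                  # LoraConfig rank

per_module = lora_param_count(hidden_size, hidden_size, r)
total = per_module * targets_per_layer * num_layers
print(per_module, total)  # 65536 4194304
```

Doubling `r` or adding more target modules (for example `k_proj` and `o_proj`) scales this count linearly.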

## Define the Trainer and Training Arguments 

We can now define the training arguments and create a Trainer instance. If you are using an Ampere GPU (e.g. NVIDIA A10), you can set `bf16=True` to use bfloat16 for mixed-precision computation.

In [61]:
from transformers import default_data_collator, Trainer, TrainingArguments
from transformers import DataCollatorForLanguageModeling


output_dir = "tmp/llama-output"

# Define training args
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    auto_find_batch_size=True,
    # per_device_train_batch_size=1,
    # gradient_accumulation_steps=4,
    bf16=False,  # Use BF16 if available
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="steps",
    evaluation_strategy='steps',
    optim="adamw_torch_fused",
    num_train_epochs=1,
    load_best_model_at_end=True,
    # max_steps=300
)

# Create Trainer instance
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset_train_final,
    eval_dataset=dataset_val_final,
    data_collator=data_collator,
)

# Start training in the next cell


In [62]:
trainer.train()

Step,Training Loss,Validation Loss
10,1.2769,1.389102
20,1.226,1.382377
30,1.3216,1.379167
40,1.3161,1.376562
50,1.2672,1.377243
60,1.3413,1.376844
70,1.3217,1.376318


KeyboardInterrupt: 

In [63]:
trainer.evaluate(eval_dataset=dataset_val_final)

{'eval_loss': 1.3763178586959839}

### Save the trained model

In [None]:
model

In [64]:


save_dir = 'lora_model_output'
model.save_pretrained(save_dir)




In [65]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch 

model_id = 'meta-llama/Llama-2-7b-hf'
save_dir = 'lora_model_output'
tokenizer = AutoTokenizer.from_pretrained(model_id)
peft_model = AutoModelForCausalLM.from_pretrained(save_dir, device_map='cuda:0', load_in_8bit=True, torch_dtype=torch.float16)


In [66]:
peft_model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(
            in_features=4096, out_features=4096, bias=False
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=4096, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
          (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear8bitLt(
            in_features=4096, out_features=4096, bias=False
            (lora_dropout): ModuleDict(
              (default): Dropout(p=

In [67]:
import torch 

num_of_gpus = torch.cuda.device_count()
print(num_of_gpus)

1


### Test the Model 

Now let's test our fine-tuned model on a dialogue from the validation set.

In [None]:
eval_prompt = """
Summarize this dialog:
#Person1#: Hello, how are you doing today?
#Person2#: I ' Ve been having trouble breathing lately.
#Person1#: Have you had any type of cold lately?
#Person2#: No, I haven ' t had a cold. I just have a heavy feeling in my chest when I try to breathe.
#Person1#: Do you have any allergies that you know of?
#Person2#: No, I don ' t have any allergies that I know of.
#Person1#: Does this happen all the time or mostly when you are active?
#Person2#: It happens a lot when I work out.
#Person1#: I am going to send you to a pulmonary specialist who can run tests on you for asthma.
#Person2#: Thank you for your help, doctor.
---
Summary:
"""

# eval_prompt = """
# Summarize this dialog:
# A: Hi Tom, are you busy tomorrow’s afternoon?
# B: I’m pretty sure I am. What’s up?
# A: Can you go with me to the animal shelter?.
# B: What do you want to do?
# A: I want to get a puppy for my son.
# B: That will make him so happy.
# A: Yeah, we’ve discussed it many times. I think he’s ready now.
# B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 
# A: I'll get him one of those little dogs.
# B: One that won't grow up too big;-)
# A: And eat too much;-))
# B: Do you know which one he would like?
# A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
# B: I bet you had to drag him away.
# A: He wanted to take it home right away ;-).
# B: I wonder what he'll name it.
# A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
# ---
# Summary:
# """

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

# Generate with the model reloaded from the saved adapter
peft_model.eval()
with torch.no_grad():
    print(tokenizer.decode(peft_model.generate(**model_input, max_new_tokens=200)[0], skip_special_tokens=True))

In [None]:
index = 0
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

eval_prompt = f"""
Summarize this dialog:
{dialogue}
---
Summary: 
"""



In [99]:
import re

def get_summary(text):
    # Keep only the generated text after the first "Summary:" marker
    parts = re.split(r'Summary:', text, maxsplit=1)
    summary = parts[1].strip()
    return summary

In [90]:
dataset['validation']['dialogue'][0:10]

["#Person1#: Hello, how are you doing today?\n#Person2#: I ' Ve been having trouble breathing lately.\n#Person1#: Have you had any type of cold lately?\n#Person2#: No, I haven ' t had a cold. I just have a heavy feeling in my chest when I try to breathe.\n#Person1#: Do you have any allergies that you know of?\n#Person2#: No, I don ' t have any allergies that I know of.\n#Person1#: Does this happen all the time or mostly when you are active?\n#Person2#: It happens a lot when I work out.\n#Person1#: I am going to send you to a pulmonary specialist who can run tests on you for asthma.\n#Person2#: Thank you for your help, doctor.",
 "#Person1#: Hey Jimmy. Let's go workout later today.\n#Person2#: Sure. What time do you want to go?\n#Person1#: How about at 3:30?\n#Person2#: That sounds good. Today we work on Legs and forearm.\n#Person1#: Hey. I just played basketball earlier, so my legs are a little sore. Let's work out on arms and stomach today.\n#Person2#: I'm on a weekly schedule. You're

In [104]:
len(dataset['test']['summary'])

1500

In [105]:
dialogues = dataset['test']['dialogue'][:20]
human_baseline_summaries = dataset['test']['summary'][:20]
peft_model_summaries = []

for dialogue in dialogues:
    # Build the prompt in the training format, leaving the summary for the model to generate
    eval_prompt = f"""
Summarize this dialog:
{dialogue}
---
Summary: 
"""
    model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        peft_model_output = tokenizer.decode(peft_model.generate(**model_input)[0], skip_special_tokens=True)
    # Strip the prompt and keep only the generated summary
    summary = get_summary(peft_model_output)
    peft_model_summaries.append(summary)
   

In [106]:
human_baseline_summaries

['Ms. Dawson helps #Person1# to write a memo to inform every employee that they have to change the communication method and should not use Instant Messaging anymore.',
 'In order to prevent employees from wasting time on Instant Message programs, #Person1# decides to terminate the use of those programs and asks Ms. Dawson to send out a memo to all employees by the afternoon.',
 'Ms. Dawson takes a dictation for #Person1# about prohibiting the use of Instant Message programs in the office. They argue about its reasonability but #Person1# still insists.',
 '#Person2# arrives late because of traffic jam. #Person1# persuades #Person2# to use public transportations to keep healthy and to protect the environment.',
 "#Person2# decides to follow #Person1#'s suggestions on quitting driving to work and will try to use public transportations.",
 '#Person2# complains to #Person1# about the traffic jam, #Person1# suggests quitting driving and taking public transportation instead.',
 '#Person1# tel

In [107]:
peft_model_summaries

["#Person1# wants #Person2# to take a dictation for a memo. #Person1#'s memo forbids the use of Instant Messaging in this office.",
 'Ms. Dawson is taking a dictation from #Person1# to write an intra-office memo. #Person1# wants to prohibit employees from using Instant Messaging in this office.',
 '#Person1# tells Ms. Dawson to take a dictation for an intra-office memorandum. #Person1# tells employees not to use Instant Message programs during working hours.',
 "#Person2# tells #Person1# #Person2#'s stuck in traffic. #Person1# suggests taking public transportation and #Person2# agrees.",
 "#Person2# tells #Person1# that #Person2# got stuck in traffic and #Person2#'s considering taking public transport system to work. #Person1# suggests biking to work.",
 '#Person1# suggests #Person2# take public transportation. #Person2# agrees and will consider it.',
 "Kate and Masha tell #Person1# Masha and Hero are getting divorced. They are surprised and can't believe it.",
 'Kate tells #Person2# M

In [108]:
import evaluate 

rouge = evaluate.load('rouge')

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)
print('PEFT MODEL:')
print(peft_model_results)

PEFT MODEL:
{'rouge1': 0.47687277392042926, 'rouge2': 0.17948814323541878, 'rougeL': 0.3561763617459158, 'rougeLsum': 0.35659224975268}
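As a rough illustration of what the `rouge1` score measures, the sketch below computes a simplified ROUGE-1 F1 (unigram overlap) for a single prediction/reference pair. It is a simplification: it only lowercases and splits on whitespace, whereas the `rouge_score` package used by `evaluate` also normalizes punctuation and, with `use_stemmer=True`, stems words, so its numbers will differ somewhat. `rouge1_f1` is a hypothetical helper, not part of either library.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each word counts at most as often as it appears in both strings
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cable is down", "the cable is out this week")
print(score)  # 3 shared words: precision 3/4, recall 3/6, F1 0.6
```

The reported `rouge1` aggregates per-sample scores of this kind over the 20 evaluated test dialogues; `rouge2` uses bigrams instead of unigrams, and `rougeL` is based on the longest common subsequence.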
