<a href="https://colab.research.google.com/github/mabench-tuc/LoRA-of-LLMs/blob/main/Gpt_2_FT_with_LoRA_on_E2E_NLG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup and Installation

In [None]:
#!pip install git+https://github.com/microsoft/LoRA
!pip install -qU bitsandbytes datasets accelerate loralib transformers peft trl
!pip install -U sacrebleu evaluate rouge-score

Collecting git+https://github.com/microsoft/LoRA
  Cloning https://github.com/microsoft/LoRA to /tmp/pip-req-build-vgur7p2q
  Running command git clone --filter=blob:none --quiet https://github.com/microsoft/LoRA /tmp/pip-req-build-vgur7p2q
  Resolved https://github.com/microsoft/LoRA to commit c4593f060e6a368d7bb5af5273b8e42810cdef90
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: loralib
  Building wheel for loralib (setup.py) ... done
  Created wheel for loralib: filename=loralib-0.1.2-py3-none-any.whl size=10185 sha256=79b6a9c29f4981b5aee663948f06e2ef2b210734836b8c300cd7dce0f986ef3d
  Stored in directory: /tmp/pip-ephem-wheel-cache-t_olkotz/wheels/38/bd/3e/3e5579a4d88c84baf2b817da2c3c1129ab43c962eb1e11b1e8
Successfully built loralib
Installing collected packages: loralib
Successfully installed loralib-0.1.2


## Model Loading
Here we load the model with its weights and the tokenizer. The dataset is loaded in a later section.

In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # set before CUDA is initialized

import torch
torch.cuda.is_available()
import torch.nn as nn
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, AutoModelForSeq2SeqLM, TrainingArguments
from transformers import GPT2Tokenizer, GPT2LMHeadModel, AutoModelForSequenceClassification

### Load the GPT-2 Large model

In [None]:
# Select the device: GPU if available, otherwise CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the GPT-2 Large model and tokenizer
print("Loading gpt2-large model...")
gpt2_large_model = AutoModelForCausalLM.from_pretrained("gpt2-large").to(device)

gpt2_large_tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
print("Successfully loaded gpt2-large model.")


Loading gpt2-large model...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/666 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.25G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Successfully loaded gpt2-large model.


In [None]:
model = gpt2_large_model
tokenizer = gpt2_large_tokenizer

In [None]:
print(model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1280)
    (wpe): Embedding(1024, 1280)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-35): 36 x GPT2Block(
        (ln_1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=3840, nx=1280)
          (c_proj): Conv1D(nf=1280, nx=1280)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=5120, nx=1280)
          (c_proj): Conv1D(nf=1280, nx=5120)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1280, out_features=50257, bias=False)
)


## Post-processing on the model
### Freezing the original weights
We need to apply some post-processing to the model to enable training: freeze all layers and cast the layer-norm parameters to float32 for stability.

In [None]:
for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)

### Display Trainable Parameters

In [None]:
def print_trainable_parameters(model):
    """Print the number of trainable parameters in the model."""
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

## Parameter-Efficient Fine-Tuning
### Set up the LoRA Adapter
Here comes the magic with peft! Let's load a PeftModel and specify that we are going to use low-rank adapters (LoRA) using `get_peft_model` utility function from peft.

In [None]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    target_modules=["c_attn"],
    #target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

# GPT-2 has no separate "q_proj"/"v_proj" modules (those names are used by other architectures).
# Its query, key, and value projections are fused into a single layer, c_attn.

# c_attn: a Conv1D layer (functionally a linear layer) that maps the 1280-dim hidden state to
# 3840 = 3 x 1280 features, which are split into query, key, and value vectors.

# Query vectors are compared against key vectors to compute the attention scores;
# value vectors are the content that is combined according to those scores.
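
To make the low-rank adapter concrete, here is a minimal sketch of the LoRA forward pass in plain Python: the frozen weight `W` is left untouched and a scaled low-rank product `B A` is added on top. The helper names (`matvec`, `lora_forward`) and the toy dimensions are illustrative assumptions, dropout is omitted, and this is not how `peft` implements it internally. With the notebook's settings the scale factor would be `lora_alpha / r = 32 / 4 = 8`.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 6, 9, 2, 16   # toy sizes, not the notebook's 1280/3840/4/32
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the adapter is a no-op at first
x = [1.0] * d_in

y0 = lora_forward(W, A, B, x, alpha, r)
print(y0 == matvec(W, x))  # the untrained adapter leaves the output unchanged
```

Because `B` is initialized to zero, training starts from exactly the pretrained model's behavior and only gradually departs from it.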

### Display trainable parameters

In [None]:
model = get_peft_model(model, lora_config)
print_trainable_parameters(model)

trainable params: 737280 || all params: 774767360 || trainable%: 0.09516146885692242
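
The trainable-parameter count printed above can be reproduced by hand. A minimal sketch, using only the shapes from the printed architecture (36 blocks, `c_attn` mapping 1280 to 3840 features) and `r=4` from `lora_config`; the helper name `lora_param_count` is made up for illustration:

```python
def lora_param_count(in_features, out_features, rank):
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank

n_blocks = 36                                      # GPT-2 Large has 36 transformer blocks
per_layer = lora_param_count(1280, 3840, rank=4)   # c_attn: 1280 -> 3840
trainable = n_blocks * per_layer

all_params = 774_767_360                           # total reported by print_trainable_parameters
print(per_layer)                                   # 20480
print(trainable)                                   # 737280
print(f"{100 * trainable / all_params:.4f}%")      # 0.0952%
```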




## Load Dataset

We can simply load our dataset from 🤗 Hugging Face with the `load_dataset` method!

In [None]:
from datasets import load_dataset

# Text Generation dataset (E2E NLG Challenge)
dataset = load_dataset("GEM/e2e_nlg")

README.md:   0%|          | 0.00/21.0k [00:00<?, ?B/s]

e2e_nlg.py:   0%|          | 0.00/4.95k [00:00<?, ?B/s]

dataset_infos.json:   0%|          | 0.00/9.87k [00:00<?, ?B/s]

The repository for GEM/e2e_nlg contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/GEM/e2e_nlg.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


Downloading data:   0%|          | 0.00/1.33M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/881k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.08M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/70.6k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/33525 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1484 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1847 [00:00<?, ? examples/s]

Generating challenge_train_sample split:   0%|          | 0/500 [00:00<?, ? examples/s]

Generating challenge_validation_sample split:   0%|          | 0/500 [00:00<?, ? examples/s]

Generating challenge_test_scramble split:   0%|          | 0/500 [00:00<?, ? examples/s]

### Tokenization of the dataset

In [None]:
# Add padding token for GPT-2
tokenizer.pad_token = tokenizer.eos_token

# Tokenize, padding each mapped batch to its longest sequence (instead of a fixed 512)
tokenized_datasets = dataset.map(
    lambda x: tokenizer(x["meaning_representation"], truncation=True, padding="longest"),
    batched=True
)
# Display an example of the tokenized dataset
print(tokenized_datasets["train"][0])

In [None]:
# GPT-2 has no padding token by default, so reuse the end-of-text token
tokenizer.pad_token = tokenizer.eos_token  # <|endoftext|> doubles as the padding token

# Define the tokenization function
def tokenize_function(examples):
    return tokenizer(
        examples["meaning_representation"],  # Input column of the GEM/e2e_nlg dataset
        max_length=512,             # Cap sequences at 512 tokens (GPT-2 supports up to 1024)
        truncation=True,            # Truncate sequences longer than 512 tokens
        padding="max_length"        # Pad sequences shorter than 512 tokens
    )

# Tokenize the dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Display an example of the tokenized dataset
print(tokenized_datasets["train"][0])

Map:   0%|          | 0/33525 [00:00<?, ? examples/s]

Map:   0%|          | 0/1484 [00:00<?, ? examples/s]

Map:   0%|          | 0/1847 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

{'gem_id': 'e2e_nlg-train-0', 'gem_parent_id': 'e2e_nlg-train-0', 'meaning_representation': 'name[The Eagle], eatType[coffee shop], food[Japanese], priceRange[less than £20], customer rating[low], area[riverside], familyFriendly[yes], near[Burger King]', 'target': 'The Eagle is a low rated coffee shop near Burger King and the riverside that is family friendly and is less than £20 for Japanese food.', 'references': [], 'input_ids': [3672, 58, 464, 18456, 4357, 4483, 6030, 58, 1073, 5853, 6128, 4357, 2057, 58, 25324, 4357, 2756, 17257, 58, 1203, 621, 4248, 1238, 4357, 6491, 7955, 58, 9319, 4357, 1989, 58, 380, 690, 485, 4357, 1641, 23331, 306, 58, 8505, 4357, 1474, 58, 22991, 1362, 2677, 60, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 
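
The tokenization above encodes only the `meaning_representation` column, so the `target` text never reaches the model. If the goal is for the model to learn to verbalize a meaning representation, one common option is to join both fields into a single training string. A minimal sketch, assuming a hypothetical separator `" => "` and a hypothetical helper `build_training_text` (neither is part of the notebook or the dataset); the example record is a shortened toy record, not a row from the dataset:

```python
SEPARATOR = " => "          # hypothetical delimiter between input and output
EOS = "<|endoftext|>"       # GPT-2's end-of-text token

def build_training_text(example):
    """Join meaning representation and target into one causal-LM training string."""
    return example["meaning_representation"] + SEPARATOR + example["target"] + EOS

# Toy record with the same field names as GEM/e2e_nlg
example = {
    "meaning_representation": "name[The Eagle], eatType[coffee shop]",
    "target": "The Eagle is a coffee shop.",
}
text = build_training_text(example)
print(text)
```

The resulting string could then be passed to the tokenizer in place of `examples["meaning_representation"]`; at inference time the prompt would be the meaning representation followed by the same separator.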

In [None]:
tokenized_datasets.keys()

dict_keys(['train', 'validation', 'test', 'challenge_train_sample', 'challenge_validation_sample', 'challenge_test_scramble'])

We create a smaller subset of the full dataset to fine-tune our model

In [None]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(15000))
# Use distinct names so the test subset is not overwritten by the validation subset
small_test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(700))
small_eval_dataset = tokenized_datasets["validation"].shuffle(seed=42).select(range(1400))

## Training Process

In [None]:
#Import the necessary modules from the transformers library
import transformers
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

### Train LoRA Adapter

In [None]:
# Hyperparameters follow the GPT-2 Medium setup in the LoRA paper
# Training arguments
training_args = TrainingArguments(
    output_dir="./output_lora_gpt2",  # Directory for saving the model
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    logging_dir="./logs_lora_gpt2",  # Directory for logging
    logging_steps=10,
    save_total_limit=2,  # Keep only 2 model checkpoints
    load_best_model_at_end=True,
    report_to="none",  # Disable reporting to WandB or other loggers
    fp16=True,  # Enable mixed precision training if you have a GPU
    #bf16=True

)

# Define a custom data collator for causal language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=False  # Causal LM does not use Masked Language Modeling (MLM)
)
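
The combination `lr_scheduler_type="linear"` with `warmup_steps=500` ramps the learning rate up linearly and then decays it linearly to zero. A minimal sketch of that shape, assuming a single GPU, no gradient accumulation, and the full 33,525-example train split that is passed to the trainer; the function name `linear_warmup_lr` is hypothetical and the step count is an estimate, not a value read from a training log:

```python
import math

def linear_warmup_lr(step, base_lr=2e-4, warmup_steps=500, total_steps=20_955):
    """Learning rate at an optimizer step: linear ramp-up, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

# Assumed step count: 33,525 examples, batch size 8, 5 epochs
steps_per_epoch = math.ceil(33_525 / 8)
total_steps = steps_per_epoch * 5
print(steps_per_epoch, total_steps)

for step in (0, 250, 500, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step, total_steps=total_steps))
```

Under these assumptions the peak rate of `2e-4` is reached at step 500, early in the first epoch.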

In [None]:
from trl import SFTTrainer

In [None]:
# Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    peft_config=lora_config,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    args=training_args
)

# Train the model
trainer.train()

# Evaluate the model
results = trainer.evaluate()
print(results)

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Epoch,Training Loss,Validation Loss


### Pushing the Model to the Hub

In [None]:
HUGGING_FACE_USER_NAME = ""
from huggingface_hub import notebook_login
notebook_login()


In [None]:
model_name = "gpt-2-Large-lora"

model.push_to_hub(f"{HUGGING_FACE_USER_NAME}/{model_name}", token=True)  # `token` replaces the deprecated `use_auth_token`



README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/2.96M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/mabc-3/gpt-2-Large-lora/commit/b4be16ea2326cde09976f073c2d085f252f16c09', commit_message='Upload model', commit_description='', oid='b4be16ea2326cde09976f073c2d085f252f16c09', pr_url=None, repo_url=RepoUrl('https://huggingface.co/mabc-3/gpt-2-Large-lora', endpoint='https://huggingface.co', repo_type='model', repo_id='mabc-3/gpt-2-Large-lora'), pr_revision=None, pr_num=None)

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt-2-Large-lora"
peft_model_id = f"{HUGGING_FACE_USER_NAME}/{model_name}"

config = PeftConfig.from_pretrained(peft_model_id)
config.base_model_name_or_path = "gpt2-large"
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=False, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)


adapter_config.json:   0%|          | 0.00/695 [00:00<?, ?B/s]

In [None]:
# Load the LoRA model
lora_model = PeftModel.from_pretrained(model, peft_model_id)

adapter_model.safetensors:   0%|          | 0.00/2.96M [00:00<?, ?B/s]

In [None]:
print(lora_model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 1280)
        (wpe): Embedding(1024, 1280)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-35): 36 x GPT2Block(
            (ln_1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2SdpaAttention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D(nf=3840, nx=1280)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=1280, out_features=4, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=4, out_features=3840, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
  

## Memory Check

In [None]:
!pip install nvidia-ml-py3
!pip install pynvml
import pynvml

Collecting nvidia-ml-py3
  Downloading nvidia-ml-py3-7.352.0.tar.gz (19 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: nvidia-ml-py3
  Building wheel for nvidia-ml-py3 (setup.py) ... done
  Created wheel for nvidia-ml-py3: filename=nvidia_ml_py3-7.352.0-py3-none-any.whl size=19173 sha256=19900c8c0377be6d5423183040f2d656cc376eb558a28bcfbd262954dd8954b8
  Stored in directory: /root/.cache/pip/wheels/5c/d8/c0/46899f8be7a75a2ffd197a23c8797700ea858b9b34819fbf9e
Successfully built nvidia-ml-py3
Installing collected packages: nvidia-ml-py3
Successfully installed nvidia-ml-py3-7.352.0
Collecting pynvml
  Downloading pynvml-12.0.0-py3-none-any.whl.metadata (5.4 kB)
Collecting nvidia-ml-py<13.0.0a0,>=12.0.0 (from pynvml)
  Downloading nvidia_ml_py-12.560.30-py3-none-any.whl.metadata (8.6 kB)
Downloading pynvml-12.0.0-py3-none-any.whl (26 kB)
Downloading nvidia_ml_py-12.560.30-py3-none-any.whl (40 kB)

In [None]:
!nvidia-smi

[?1l>

In [None]:
def print_gpu_memory():
    print(f"Allocated memory: {torch.cuda.memory_allocated() / 1024**2:.2f} MB")
    print(f"Cached memory: {torch.cuda.memory_reserved() / 1024**2:.2f} MB")
    print(f"GPU utilization: {torch.cuda.utilization()}%")

In [None]:
print_gpu_memory()

Allocated memory: 3086.00 MB
Cached memory: 7124.00 MB


In [None]:
torch.cuda.empty_cache()
print("\nAfter emptying cache:")
print_gpu_memory()
print(f"Using device: {device}")


After emptying cache:
Allocated memory: 6148.25 MB
Cached memory: 6356.00 MB
Using device: cuda
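
As a rough sanity check on these readings, the memory taken by the weights alone can be estimated from the parameter counts reported earlier. A minimal sketch, assuming all weights are stored in fp32 (4 bytes each); activations, buffers, and optimizer state are not counted, so this is a lower bound rather than an exact attribution:

```python
BYTES_PER_FP32 = 4
MIB = 1024 ** 2

adapter_params = 737_280                       # LoRA parameters reported earlier
base_params = 774_767_360 - adapter_params     # everything else is the frozen base model

base_mib = base_params * BYTES_PER_FP32 / MIB
adapter_mb = adapter_params * BYTES_PER_FP32 / 1e6

print(f"fp32 base weights: {base_mib:.0f} MiB")
print(f"fp32 LoRA weights: {adapter_mb:.2f} MB")
```

The base-weight estimate is in the same range as the first `Allocated memory` reading, and the adapter estimate is close to the size of `adapter_model.safetensors`. The second, roughly doubled reading would be consistent with two copies of the base model being resident at once, though that is an inference from the numbers rather than something the notebook verifies.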


## Benchmark on E2E NLG

In [None]:
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

In [None]:
# Load the LoRA model
lora_model = PeftModel.from_pretrained(model, peft_model_id).eval()

In [None]:
# Load the E2E NLG dataset
dataset = load_dataset("e2e_nlg")  # Automatically downloads the dataset

In [None]:

# GPT-2 has no padding token by default, so reuse the end-of-text token
tokenizer.pad_token = tokenizer.eos_token  # <|endoftext|> doubles as the padding token

# Define the tokenization function
def tokenize_function(examples):
    return tokenizer(
        examples["meaning_representation"],  # Input column of the e2e_nlg dataset
        max_length=512,             # Cap sequences at 512 tokens (GPT-2 supports up to 1024)
        truncation=True,            # Truncate sequences longer than 512 tokens
        padding="max_length"        # Pad sequences shorter than 512 tokens
    )

# Tokenize the dataset
tokenized_e2e_dataset = dataset.map(tokenize_function, batched=True)

# Display an example of the tokenized dataset
print(tokenized_e2e_dataset["train"][0])

Map:   0%|          | 0/42061 [00:00<?, ? examples/s]

Map:   0%|          | 0/4672 [00:00<?, ? examples/s]

Map:   0%|          | 0/4693 [00:00<?, ? examples/s]

{'meaning_representation': 'name[The Vaults], eatType[pub], priceRange[more than £30], customer rating[5 out of 5], near[Café Adriatic]', 'human_reference': 'The Vaults pub near Café Adriatic has a 5 star rating.  Prices start at £30.', 'input_ids': [3672, 58, 464, 21314, 4357, 4483, 6030, 58, 12984, 4357, 2756, 17257, 58, 3549, 621, 4248, 1270, 4357, 6491, 7955, 58, 20, 503, 286, 642, 4357, 1474, 58, 34, 1878, 2634, 1215, 380, 1512, 60, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256

In [None]:
tokenized_e2e_dataset

DatasetDict({
    train: Dataset({
        features: ['meaning_representation', 'human_reference', 'input_ids', 'attention_mask'],
        num_rows: 42061
    })
    validation: Dataset({
        features: ['meaning_representation', 'human_reference', 'input_ids', 'attention_mask'],
        num_rows: 4672
    })
    test: Dataset({
        features: ['meaning_representation', 'human_reference', 'input_ids', 'attention_mask'],
        num_rows: 4693
    })
})

In [None]:
#small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_e2e_dataset["test"].shuffle(seed=42).select(range(700))
small_val_dataset = tokenized_e2e_dataset["validation"].shuffle(seed=42).select(range(500))

In [None]:
def preprocess_data(example):
    """Expose the input and reference text under uniform column names for evaluation."""
    return {
        "input_text": example["meaning_representation"],
        "target_text": example["human_reference"],
    }
# Preprocess the dataset
#processed_data= tokenized_e2e_dataset.map(preprocess_data)

# Ensure processed_data is initialized as a dictionary
processed_data = {}
# Preprocess the validation and test datasets
processed_data["validation"] = small_val_dataset.map(preprocess_data)
processed_data["test"] = small_eval_dataset.map(preprocess_data)


def tokenize_function(examples):
    return tokenizer(
        examples["meaning_representation"],
        max_length=512,
        truncation=True,
        padding="max_length",
        return_tensors="pt"
    )
# Evaluation function
def evaluate_model(model, tokenizer, dataset):
    smoothing = SmoothingFunction().method1
    bleu_scores = []
    rouge_scores = []

    # Initialize ROUGE scorer
    rouge_scorer_instance = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

    for example in dataset:
        input_text = example["input_text"]
        target_text = example["target_text"]

        # Tokenize once, then generate predictions
        encoded = tokenizer(input_text, padding=True, truncation=True, return_tensors="pt").to(model.device)
        input_ids, attention_mask = encoded.input_ids, encoded.attention_mask
        with torch.no_grad():
            output_ids = model.generate(input_ids, attention_mask=attention_mask, max_length=100, num_beams=5, early_stopping=True, pad_token_id=tokenizer.eos_token_id)

        # Decode predictions
        prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True)

        # Compute BLEU
        bleu_score = sentence_bleu(
            [target_text.split()], prediction.split(), smoothing_function=smoothing
        )
        bleu_scores.append(bleu_score)

        # Compute ROUGE
        rouge = rouge_scorer_instance.score(target_text, prediction)
        rouge_scores.append({
            "rouge1": rouge["rouge1"].fmeasure,
            "rouge2": rouge["rouge2"].fmeasure,
            "rougeL": rouge["rougeL"].fmeasure,
        })

    # Calculate average metrics
    avg_bleu = sum(bleu_scores) / len(bleu_scores)
    avg_rouge = {
        "rouge1": sum([r["rouge1"] for r in rouge_scores]) / len(rouge_scores),
        "rouge2": sum([r["rouge2"] for r in rouge_scores]) / len(rouge_scores),
        "rougeL": sum([r["rougeL"] for r in rouge_scores]) / len(rouge_scores),
    }

    return avg_bleu, avg_rouge

# Evaluate the model
print("Evaluating the model...")
validation_data = processed_data["validation"]
avg_bleu, avg_rouge = evaluate_model(lora_model, tokenizer, validation_data)

# Display results
print("\nEvaluation Results:")
print(f"Average BLEU: {avg_bleu:.4f}")
print(f"Average ROUGE-1: {avg_rouge['rouge1']:.4f}")
print(f"Average ROUGE-2: {avg_rouge['rouge2']:.4f}")
print(f"Average ROUGE-L: {avg_rouge['rougeL']:.4f}")

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Evaluating the model...

Evaluation Results:
Average BLEU: 0.0079
Average ROUGE-1: 0.4119
Average ROUGE-2: 0.1826
Average ROUGE-L: 0.3130
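
For a decoder-only model such as GPT-2, `generate` returns the prompt tokens followed by the continuation, so the `prediction` scored above begins with the meaning representation itself. A minimal sketch of slicing off the prompt before decoding, using toy token ids and a hypothetical helper `strip_prompt`; in the evaluation loop the equivalent slice would be `output_ids[0][input_ids.shape[1]:]`:

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the newly generated token ids, dropping the echoed prompt."""
    return output_ids[prompt_length:]

# Toy token ids standing in for tokenizer/model output (not real GPT-2 ids)
prompt_ids = [11, 12, 13]
generated = prompt_ids + [21, 22, 23, 24]   # generate() output starts with the prompt

continuation = strip_prompt(generated, len(prompt_ids))
print(continuation)   # [21, 22, 23, 24]
```

Scoring only the continuation against the reference would change the reported BLEU and ROUGE values, so the numbers above should be read as scores for prompt plus continuation.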


## Perform Inference

### Preprocess the input text

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# Prompt
prompt = "Once upon a time,"

# Tokenize the input text
inputs = tokenizer(prompt, return_tensors="pt").to(device)
inputs

{'input_ids': tensor([[  40,  765,  284,  467,  284, 2869]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]], device='cuda:0')}

### Inference

In [None]:
with torch.no_grad():
    outputs = lora_model(**inputs)

outputs

CausalLMOutputWithCrossAttentions(loss=None, logits=tensor([[[ 1.3193,  3.7159, -0.7636,  ..., -5.0317, -5.4705, -0.4898],
         [ 1.1112,  3.8028, -2.9312,  ..., -5.9544, -6.8912,  0.5131],
         [ 0.5045,  1.7014, -3.1222,  ..., -5.6717, -6.5359, -0.1618],
         [ 2.5387,  5.8474, -2.6065,  ..., -4.4711, -5.7279,  0.8076]]],
       device='cuda:0'), past_key_values=((tensor([[[[-0.4846, -0.6254,  0.5279,  ...,  0.7841, -0.6010,  0.3478],
          [-0.4855,  0.4187, -0.8096,  ...,  0.2284, -0.6184,  0.3518],
          [-0.2926, -0.1219, -0.6358,  ...,  0.8763, -0.6979,  0.6054],
          [-0.1921, -0.1746, -0.5297,  ...,  0.3800, -0.2144,  0.7345]],

         [[-0.3534,  0.1836, -0.2220,  ..., -1.1325,  0.1057,  0.3329],
          [-0.9111,  0.2736,  0.2306,  ..., -0.3258,  0.7648,  0.5747],
          [-0.4435,  0.2053,  0.7954,  ..., -0.7270,  0.7719, -0.1025],
          [-0.3316,  0.1509,  0.8947,  ..., -1.0938,  0.9517,  0.3060]],

         [[-0.4387, -0.4209,  1.0123,  

In [None]:
# Generate text
output_ids = lora_model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_length=50,
    num_return_sequences=1
)

output_ids

tensor([[  40,  765,  284,  467,  284, 2869,   11,  475,  314,  836,  470,  760,
          611,  314,  460, 5368,  340,   13,  314,  765,  284,  467,  284,  262,
         1294,   11,  475,  314,  836,  470,  760,  611,  314,  460, 5368,  340,
           13,  314,  765,  284,  467,  284, 2031,   11,  475,  314,  836,  470,
          760,  611]], device='cuda:0')

### Post-process the output

In [None]:
# Decode the generated text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generated_text)

I want to go to Japan, but I don't know if I can afford it. I want to go to the US, but I don't know if I can afford it. I want to go to Europe, but I don't know if


### Inference of GPT-2 Large Model

In [None]:
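# Generate text
# Caution: get_peft_model injects the LoRA layers into the wrapped model in place, so if the
# cells above ran in this session, gpt2_large_model is no longer the untouched base model.
# Reload "gpt2-large" first for a strict baseline.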
output_ids = gpt2_large_model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_length=50,
    num_return_sequences=1
)

output_ids

tensor([[7454, 2402,  257,  640,   11,  262,  995,  373,  257,  845, 1180, 1295,
           13,  383,  995,  373,  257, 1295,  810,  262,  661,  547, 1479,  284,
          466,  644,  484, 2227,  284,  466,   13,  383,  995,  373,  257, 1295,
          810,  262,  661,  547, 1479,  284,  466,  644,  484, 2227,  284,  466,
           13,  383]], device='cuda:0')

In [None]:
# Decode the generated text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generated_text)

Once upon a time, the world was a very different place. The world was a place where the people were free to do what they wanted to do. The world was a place where the people were free to do what they wanted to do. The
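
Both generations above loop over the same sentences, which is typical of the default greedy decoding. One simple way to quantify this is the share of distinct word n-grams in the output. A minimal sketch, where `distinct_ngram_ratio` is a hypothetical helper and the sample is the base-model output printed above:

```python
def distinct_ngram_ratio(text, n=3):
    """Share of word n-grams that are unique; 1.0 means no n-gram repeats."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 1.0
    return len(set(ngrams)) / len(ngrams)

sample = (
    "Once upon a time, the world was a very different place. "
    "The world was a place where the people were free to do what they wanted to do. "
    "The world was a place where the people were free to do what they wanted to do. The"
)
ratio = distinct_ngram_ratio(sample)
print(f"distinct 3-gram ratio: {ratio:.2f}")
```

If repetition is a concern, `generate` accepts options such as `no_repeat_ngram_size` and `repetition_penalty`, or sampling via `do_sample=True`; which setting suits this task is not something the notebook evaluates.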
