# Lab on Prompting and Fine-Tuning

In this lab, we will cover the basics needed for Homework 2 on Prompting and Fine-Tuning LLMs. As the next section shows, the order of the lab differs slightly from that of the HW. This is intentional: fine-tuning a model takes additional time, whereas the Few-Shot exercise (2) builds on the knowledge of exercise 1.

## Layout of the Lab

We will cover the following in the first part:

1. Preparation of the `model` and `dataset`,
2. Pre-processing of the `dataset`, using `jinja2` templating,
3. Running inference experiments.

In the second part, we will focus on:

1. Fine-Tuning a model,
2. Hyper-parameters to consider,
3. Running inference with a fine-tuned model.

The third part, which will likely make up most of the 2nd lab (next week), focuses on:

1. Few-shot Learning,
2. Creating context,
3. Running inference experiments.


## Installing dependencies
Just to be sure, run the following to install any missing dependencies. You only have to run this once if you have persistent storage for your `venv` or `conda` environment. However, to be safe, you might want to run it again.

> Depending on your bandwidth, disk, and CPU, this might take a while.


In [None]:
# Install the packages. If you run this a second time with persistent storage, feel free to skip.
%pip install numpy~=1.26.0 torch~=2.2.1 transformers accelerate datasets bitsandbytes sentencepiece peft nbconvert==6.5.4 pypdf2==2 "lxml[html_clean]" notebook-as-pdf seaborn

In [None]:
# These are (most of) the packages that you will need during the lab.
# Make sure to run this cell each time you (re-)start the IPython kernel.

import textwrap
import warnings
from importlib import metadata

import datasets
import jinja2
from collections import defaultdict
import itertools

import torch
import transformers
from IPython.display import display, HTML, Markdown
from tqdm.auto import tqdm
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [None]:
def display_dataset_description(name: str, dataset: datasets.DatasetDict):
    """Helper method to display information about splits in the dataset.

    Args:
        name (str): Name of the dataset, shown in the overview table.
        dataset (datasets.DatasetDict): Dataset whose splits are summarized.

    Returns:
        None. The overview is rendered as HTML in the notebook.
    """
    split_info = []
    for k, ds in dataset.items():
        split_info.append(f"<tr><td><strong>{k.capitalize()} Samples:</strong></td><td>{len(ds)}</td></tr>")
    # Join the rows directly: a <br> between <tr> elements is not valid inside a table.
    split_rows = "".join(split_info)
    html_content = f"""
    <h2>Dataset info</h2>
    <table>
        <tr><td><strong>Dataset Name:</strong></td><td>{name}</td></tr>
        {split_rows}
    </table>
    """
    
    # Display the output in the notebook
    display(HTML(html_content))

def get_available_device():
    """Helper method to find the best available hardware to run on.

    Returns:
        Tuple of (torch.device, backend name), where the backend name is one
        of "cuda", "rocm", "mps", or "cpu".
    """
    # CUDA and ROCm builds of PyTorch both expose the GPU through `torch.cuda`
    # and use the "cuda" device type; ROCm builds additionally set `torch.version.hip`.
    if torch.cuda.is_available():
        backend = "rocm" if torch.version.hip is not None else "cuda"
        return torch.device("cuda"), backend

    # Check if MPS (Apple Silicon) is available
    if torch.backends.mps.is_available():
        return torch.device("mps"), "mps"

    # Fall back to CPU
    return torch.device("cpu"), "cpu"

def get_installed_version(package_name):
    """Return the installed version of `package_name`, or "Not installed"."""
    with warnings.catch_warnings():
        # Suppress warnings from packages with incomplete metadata.
        warnings.simplefilter("ignore")
        try:
            return metadata.version(package_name)
        except metadata.PackageNotFoundError:
            return "Not installed"


def display_configuration():
    # Check device info
    device, backend = get_available_device()

    # Torch version
    torch_version = torch.__version__

    # HuggingFace Transformers version
    transformers_ver = transformers.__version__

    # BitsAndBytes version (if available)
    bitsandbytes_version = get_installed_version("bitsandbytes")

    # Check for GPU-specific details; ROCm devices also report the "cuda" device type
    if device.type == "cuda":
        cuda_device_count = torch.cuda.device_count()
        cuda_device_name = torch.cuda.get_device_name(0)
        # `torch.version.hip` is set on ROCm builds, `torch.version.cuda` on CUDA builds
        cuda_version = torch.version.hip or torch.version.cuda
    else:
        cuda_device_count = 0
        cuda_device_name = "N/A"
        cuda_version = "N/A"

    # Prepare HTML formatted output for better display in a notebook
    html_content = f"""
    <h2>System Configuration</h2>
    <table>
        <tr><td><strong>PyTorch version:</strong></td><td>{torch_version}</td></tr>
        <tr><td><strong>Device:</strong></td><td>{device} (Backend: {backend})</td></tr>
        <tr><td><strong>CUDA/ROCm version:</strong></td><td>{cuda_version}</td></tr>
        <tr><td><strong>GPU count:</strong></td><td>{cuda_device_count}</td></tr>
        <tr><td><strong>GPU name:</strong></td><td>{cuda_device_name}</td></tr>
        <tr><td><strong>Hugging Face Transformers version:</strong></td><td>{transformers_ver}</td></tr>
        <tr><td><strong>BitsAndBytes version:</strong></td><td>{bitsandbytes_version}</td></tr>
    </table>
    """
    
    # Display the output in the notebook
    display(HTML(html_content))


deterministic_config: transformers.GenerationConfig = transformers.GenerationConfig(do_sample=False, max_length=100, min_length=75, repetition_penalty=1.19)

def label_mapper(label: int) -> str:
    """Map label from int to string!"""
    return ['Negative', "Positive"][label]

def simple_truncate_text(row, max_length=50, tokenizer: transformers.PreTrainedTokenizerFast = None):
    """Example of a simple truncation method for text, based on token count.
    
    You might want to perform 'smarter' truncation / summarization as a level, instead of simply cutting off after `max_length` tokens.
    
    Examples:
        You might want to partially apply the function, to provide a different tokenizer:
        ```python3
        from functools import partial
        some_other_tokenizer = transformers.AutoTokenizer.from_pretrained('your_fave_tokenizer')
        partial_simple_truncate = partial(simple_truncate_text, tokenizer=some_other_tokenizer)
        ```
    Args:
        row (dict): Batch of rows of the dataset, as passed by `datasets.Dataset.map(..., batched=True)`.
    
    Keyword Args:
        max_length (int): the maximum number of tokens to keep per text. Defaults to 50.
        tokenizer (transformers.PreTrainedTokenizer): the tokenizer to use. Must be provided, as the default is `None`.
    
    Notes:
        This function requires all cells above to be run.
    """
    # Tokenize with truncation, then decode back to (shortened) text.
    token_representation = tokenizer.batch_encode_plus(row['text'], max_length=max_length, truncation=True)['input_ids']
    text_representation = tokenizer.batch_decode(token_representation, skip_special_tokens=True)
    row['text'] = text_representation
    return row

def generate(
        input_text: str,
        tokenizer: transformers.PreTrainedTokenizer,
        model: transformers.PreTrainedModel,
        generation_config: transformers.GenerationConfig = deterministic_config,
) -> str:
    """Helper method to generate a sample from the model, conditioned on the input text ('prompt').
    
    Args:
        input_text (str): Input text to be conditioned on.
        tokenizer (transformers.PreTrainedTokenizer): Tokenizer corresponding to the model provided.
        model (transformers.PreTrainedModel): Pre-trained model to perform text generation with.
        generation_config (transformers.GenerationConfig): Generation settings. Defaults to `deterministic_config`.
        
    Returns:
        Text generated by the model, conditioned on the input text.
    """
    
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(input_ids, generation_config=generation_config)
    # Drop special tokens such as <pad> and </s> from the displayed text.
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return result

In [None]:
# Retrieve 'best' backend and device
device, backend = get_available_device()

# Default to bfloat16, which halves memory use compared to float32
dtype = torch.bfloat16

# Optional bits-and-bytes configuration for additional quantization.
bab_conf = None
if backend == 'cuda':
    # If you want, you can further quantize on CUDA devices (linux and WSL)
    # However, this is more for you to explore than anything else.
    bab_conf =  transformers.BitsAndBytesConfig(
        load_in_8bit=False
    )

# Call the display_configuration() function in your Jupyter notebook to show the configuration
display_configuration()

# Part 1: Preparing all the things

Before we get started with our small lab experiment, we need to make sure that everything is prepared. Let's start by setting up a small language model, and loading and preparing the data.

Recall from the lecture that this consists of the following 'recipe'.

1. Load the model and data.
   1. Load pre-trained or fine-tuned model
   2. Load dataset and tokenize
2. Run the data through the model
3. Perform experiments (+ Analysis)
    

## Step 1: Preparing The Model
To load a model and check that it works, we will use the Flan-T5 model by Google. This model is tiny and should be fast enough even on lower-powered hardware!


In [None]:
# Model family on the Hugging Face Hub
family: str = "google/flan-t5"

# For the lab we will use a small model, just to provide some insight into usability.
model_size: str = "-small"

model_name: str = f"{family}{model_size}"
# Create tokenizer for the Flan family
tokenizer: T5Tokenizer = T5Tokenizer.from_pretrained(model_name, legacy=False)
dtype = torch.bfloat16
# Instantiate the model and load it on the CPU (change `device_map` to use an accelerator).
model: T5ForConditionalGeneration = T5ForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path=model_name,
    device_map='cpu',
    torch_dtype=dtype,
)

In [None]:
# Here we check that everything is working. Note that the story should be quite bad, as T5 is not really trained to tell us stories.

input_text = "Write a story about a dog and a boy playing with a ball on a boat with sailors."

display(
    Markdown("### Generated text by the LLM"),
    Markdown(f"> {input_text}"),
    Markdown(
        generate(
            input_text=input_text,
            tokenizer=tokenizer,
            model=model,
        )
    ),
    Markdown(
        generate(
            input_text=input_text,
            tokenizer=tokenizer,
            model=model,
            generation_config=transformers.GenerationConfig(do_sample=True, max_length=100, min_length=75, repetition_penalty=1.19)
        )
    )
)


## Step 1.5: Preparing the Data

We will be working on a sentiment 'classification' task, in which the only labels are `Positive` (`1`) and `Negative` (`0`). First, we need to load the appropriate dataset (`stanfordnlp/imdb`), which contains movie reviews and their respective labels. During the rest of the lab, we will investigate how to pre-process the data, run (different types of) inference, and perform fine-tuning.

In [None]:
# Define dataset name
data_name: str = 'stanfordnlp/imdb'

# Load dataset, and assign splits to variables
dataset: datasets.DatasetDict = datasets.load_dataset(data_name)
train_set: datasets.Dataset = dataset['train']
test_set: datasets.Dataset = dataset['test']
# This unsupervised split is not used in the rest of the notebook.
unsup: datasets.Dataset = dataset['unsupervised']

# Give an overview
display_dataset_description(data_name, dataset)


In [None]:

sample = train_set[1231]
review_1, label_1 = sample['text'], label_mapper(sample['label'])
sample = train_set[15442]
review_2, label_2 = sample['text'], label_mapper(sample['label'])


display(
    Markdown(
        textwrap.dedent(f"""\
        | *Example*                 | Label     |
        |:--------------------------|:---------:|
        | {review_1}                | {label_1} |
        | {review_2}                | {label_2} |
        """)
    )
)

Next we will create a dataloader so that we can quickly load data into the model, making the rest of the cells run a little faster.

Let's also define some library functions, that we can use to calculate the performance of the model.

In [None]:
# First determine some hyper-parameters. These should be fine even with a small model on CPU only.

# If you have a GPU / powerful machine, feel free to increase the following
batch_size = 1
test_samples = 100
max_iterations = test_samples // batch_size

# Approach 1: Simple Prompting

Rather than going straight into a complex solution, let's first see what we can achieve by letting the model predict the output.


> Note: I annotate the 'steps' in comments. Code sections that we will fill in during the lab are annotated with:


```python
# YOUR CODE GOES HERE!
# END OF YOUR CODE!
```


In [None]:

def simple_prompt_function(batch, tokenizer=tokenizer, max_length=150):
    """Simple prompt preparation function: prefix each review with a question."""
    # Note: `tokenizer` and `max_length` are currently unused.
    stringified_representation = [f"Positive/Negative? {review}" for review in batch['text']]
    batch['text'] = stringified_representation
    return batch

def simple_template_function(batch, template=None, tokenizer=tokenizer, max_length=150):
    """Mapping function using a `jinja2` template. Note, we will show in the lab how to set this up."""
    stringified_representation = [template.render(review=review) for review in batch['text']]
    batch['text'] = stringified_representation
    return batch
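
A minimal sketch of how such a template could be set up with `jinja2`. The prompt wording and the names `PROMPT_TEMPLATE` and `render_batch` are hypothetical; only the `review` variable is taken from the `template.render(review=review)` call above. This is an illustration, not the lab's reference solution.

```python
import jinja2

# Hypothetical prompt wording; only the `review` variable comes from the cell above.
PROMPT_TEMPLATE = jinja2.Template(
    "Is the following review Positive or Negative?\n"
    "Review: {{ review }}\n"
    "Answer:"
)


def render_batch(batch: dict, template: jinja2.Template = PROMPT_TEMPLATE) -> dict:
    """Render every review in a batch through the template."""
    return {**batch, "text": [template.render(review=r) for r in batch["text"]]}


example = render_batch({"text": ["Great movie!", "Dull plot."], "label": [1, 0]})
print(example["text"][0])
```

With `datasets`, the same template can be bound to the mapping function defined above, for instance via `functools.partial(simple_template_function, template=PROMPT_TEMPLATE)`, and passed to `.map(..., batched=True)`.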

In [None]:

sub_sampled_set = test_set.shuffle(seed=123).take(test_samples)
tokenized_eval_dataset = (
    sub_sampled_set
    .map(simple_prompt_function, batched=True)
    .map(lambda batch: tokenizer(batch['text']))
)

# TODO: Let's re-write to use a template!
# YOUR CODE GOES HERE
...
# END OF YOUR CODE

# Display the de-tokenized text
display(
    Markdown('### What the model `sees`'),
    Markdown(
        f"""{tokenized_eval_dataset[0]['input_ids'][:100]}..."""
    ),
    Markdown('### What we would `see`'),
    Markdown(
        f"""{tokenizer.decode(tokenized_eval_dataset[0]['input_ids'], skip_special_tokens=True)}"""
    )
)

# Part 2: Running Inference

Now that we have the setup out of the way, we can start 'running experiments'. In short, this boils down to three steps:

1. Choosing your hyper-parameters and appropriate levels for them
2. Getting a script ready to run your experiments
3. **Running the experiments.** (Or, an excellent time to get coffee :P)

This part of the lab focuses on that last point, to ensure that you can run your experiments efficiently. In the tutorial, we are going to fix some issues with the code below and make it run *faster* (with some caveats).
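
One common way to speed up inference is to pass several reviews to the model per `generate` call. This requires padding the sequences in a batch to equal length and telling the model which positions are padding. The sketch below shows the idea on plain token-id lists. The helper `pad_batch` is hypothetical, and `pad_id=0` is assumed to match the T5 padding token. In practice a utility such as `transformers.DataCollatorWithPadding` does this work.

```python
from typing import Dict, List


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Right-pad token-id lists to the longest sequence and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding that the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[11, 12, 13], [21], [31, 32]])
print(batch["input_ids"])       # [[11, 12, 13], [21, 0, 0], [31, 32, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0], [1, 1, 0]]
```

One of the caveats: padded positions still cost compute, so a batch that mixes very short and very long reviews wastes part of the speed-up.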

In [None]:
def evaluate_fn(data_loader, model=model):
    labels_list = []
    prediction_list = []
    for batch in tqdm(data_loader):
        input_ids, attention_mask, label = batch['input_ids'].to(model.device), batch['attention_mask'].to(model.device), batch['label'].to(model.device)
        outputs = model.generate(
          input_ids,
          attention_mask=attention_mask,
          max_length=5,
          do_sample=False,
        )
        prediction = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        prediction_list.append(prediction)
        labels_list.append(label.tolist())
    return prediction_list, labels_list

In [None]:

# YOUR CODE GOES HERE
...
# END OF YOUR CODE 
# 5. Set the format of the dataset to PyTorch Tensors
eval_loader = torch.utils.data.DataLoader(
    tokenized_eval_dataset,
    batch_size=1,
    shuffle=True,
)


# And run the evaluation
predictions_list, labels_list = evaluate_fn(eval_loader, model=model)

### Retrieving the Results

Lastly, we will inspect the results of this 'experiment'. Think about some of the caveats, and how to address them in code. Don't worry, the HW does not have (all of) these caveats, but it is good to be aware of them!
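
One such caveat is that a generative model answers in free text, so the output may differ in casing, carry punctuation, or contain no label at all. Below is a minimal sketch of a more forgiving parser. The names `parse_prediction` and `score` are hypothetical, and the example outputs are made up for illustration.

```python
import re
from typing import List, Tuple

LABELS = {"negative": 0, "positive": 1}


def parse_prediction(text: str) -> int:
    """Map free-form model output to 0/1, or -1 when no label word is found."""
    match = re.search(r"\b(positive|negative)\b", text.lower())
    return LABELS[match.group(1)] if match else -1


def score(predictions: List[str], labels: List[int]) -> Tuple[float, float]:
    """Return (accuracy, fraction of unparseable predictions)."""
    parsed = [parse_prediction(p) for p in predictions]
    accuracy = sum(p == y for p, y in zip(parsed, labels)) / len(labels)
    unknown = sum(p == -1 for p in parsed) / len(labels)
    return accuracy, unknown


print(score(["Positive.", "negative", "It is great", "NEGATIVE!"], [1, 0, 1, 1]))
# (0.5, 0.25)
```

Note that this parser has caveats of its own: an output such as "not positive" would still be counted as `Positive`.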

In [None]:
def get_evaluation(predictions_list, labels_list):
    flat_predictions = list(itertools.chain(*predictions_list))
    flat_labels = list(itertools.chain(*labels_list))
    
    label_lut = defaultdict(lambda *args: -1, {'positive': 1, 'negative': 0})
    predictions = list(map( lambda x: label_lut[x.split(' ')[0].lower()], flat_predictions))
    
    
    accuracy = sum(map(lambda x: x[0] == x[1], zip(predictions, flat_labels))) / len(flat_labels)
    unknown = sum(map(lambda x: x[0] == -1, zip(predictions, flat_labels))) / len(flat_labels)
    
    return accuracy, unknown


accuracy, unknown = get_evaluation(predictions_list, labels_list)
display(
    Markdown(
        textwrap.dedent(
            f"""\
            | *Accuracy*  | *Unknown*      |
            |:------------|:---------------|
            | {accuracy}  | {unknown}      |
            """)),
)

In [None]:
import peft
from peft import LoraConfig, get_peft_model, TaskType
from typing import Tuple


def get_peft_details(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return  trainable_model_params, all_model_params, (100 * trainable_model_params) / all_model_params

def tokenize_function(
        batch,
        prefix='Is the following Positive or Negative?\n',
        post_fix='\nAnswer: '):

    updated_text = [f"{prefix}{review}{post_fix}" for review in batch["text"]]
    batch['text'] = updated_text
    # We also set the 'response', i.e., what the model should learn
    batch['labels'] = tokenizer(['Positive' if label == 1 else 'Negative' for label in batch["label"]], truncation=True, padding='max_length', return_tensors="pt").input_ids
    
    return batch

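def train_model(
        peft_model,
        peft_training_args,
        train_set,
        test_set = None,
        output_dir: str = 'llm-lab',
) -> Tuple[peft.PeftModel, transformers.Trainer]:
    # Note: `transformers.Trainer` does not accept an `output_dir` argument; the output
    # directory is configured through `peft_training_args` (`transformers.TrainingArguments`).
    # The `output_dir` parameter is kept so that existing calls keep working.
    peft_trainer = transformers.Trainer(
        model=peft_model,
        args=peft_training_args,
        train_dataset=train_set,
        eval_dataset=test_set,
    )
    # Fine-tune the model
    peft_trainer.train()
    # Set the fine-tuned model to evaluation mode, to disable non-deterministic
    #  behavior such as dropout.
    peft_model.eval()
    return peft_model, peft_trainer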

## Step 1: Prepare Data and Model

Let's continue by preparing a model. Note that a lot of this is 'boiler-plate' that serves to reduce the training time considerably.

If you are interested, and/or want to apply this to your project, we recommend looking into (HF tutorials of) the following:

1. Quantization, and where and how to apply it. For fine-tuning, we use a `LoraConfig`, or Low-Rank Adaptation config, which approximates the full-rank weight update with a lower-rank decomposition, thereby considerably reducing the overhead of back-propagation.
2. Data types, and when to use them. Besides training, inference may also benefit from lower precision. In general, experiments are run in 'half' precision (`torch.float16` or `torch.bfloat16`), but lower precision exists as well (as low as 1 bit, i.e., XOR-quantization).
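
To make the low-rank idea concrete: for a frozen weight matrix of shape `d_out x d_in`, LoRA learns two small matrices `B` (`d_out x r`) and `A` (`r x d_in`), and adds their product, scaled by `alpha / r`, to the frozen weight. The sketch below only counts parameters. The helper name and the `512 x 512` shape are illustrative assumptions, not values read from the model.

```python
from typing import Tuple


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in           # updating the weight matrix directly
    lora = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, lora


full, lora = lora_param_counts(d_out=512, d_in=512, rank=16)
print(full, lora, f"{100 * lora / full:.2f}%")  # 262144 16384 6.25%

# The low-rank update is scaled by alpha / rank before it is added to the frozen weight.
alpha, rank = 32, 16
print(alpha / rank)  # 2.0
```

A lower rank shrinks the number of trainable parameters further, at the cost of a less expressive update.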

In [None]:
# Example of hyper-parameters.
RANK = 16               # Rank used in model update (lower is faster, less precise)
ALPHA = 32              # Scaling factor for update (∆W is scaled by ALPHA / RANK)
DROPOUT = 0.05          # Regularization term
TRAIN_BATCH_SIZE = 32   # Number of samples per training batch
TRAIN_EPOCHS = 5        # Total number of passes over the training set.

# If you want to save some time, you can store checkpoints, and load them, to create multiple levels
# in a single run. Do note, that huggingface by default uses learning-rate scheduling, so this may
# affect your results a bit.

# The modules are specific to the model itself.
MODULES = None  # ['o', 'k', 'q', 'v', '*'] or any other identifier of weights.
TORCH_DTYPE = torch.bfloat16

# TODO: Decide the levels for your experiment. These can be any of the 
# aforementioned parameters, or any other hyper-parameter.
config = lora_config = LoraConfig(
    r=RANK,
    lora_alpha=ALPHA,
    target_modules=MODULES,
    lora_dropout=DROPOUT,
    bias='none',
    task_type=TaskType.SEQ_2_SEQ_LM
)

peft_model = get_peft_model(
    model=model,
    peft_config=config,
)


In [None]:

pft, orig, pct = get_peft_details(peft_model)

display(
    Markdown(
        textwrap.dedent(
            f"""\
            | Parameter        | Statistic |
            |:-----------------|:----------|
            | Original         | {orig}    |
            | PEFT             | {pft}     |
            | Percentage       | {pct:.2f}%|
            """
        )
    )
)


## Step 2: And Lift-off (ish)
Let's do a round of training, and look at'er go.

In [None]:

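from functools import partial

output_dir = 'llm_lab/t5-small'

# `peft_training_args` is not defined earlier in the notebook. This minimal configuration
# is an assumption: it uses the hyper-parameters above and library defaults otherwise.
peft_training_args = transformers.TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    num_train_epochs=TRAIN_EPOCHS,
)

# If the number of tokens is a level, you might need to change this
limit_to_100_tokens = partial(simple_truncate_text, max_length=100, tokenizer=tokenizer)

train_dataset = (
    train_set
    .map(limit_to_100_tokens, batched=True)
    .map(
        tokenize_function, batched=True
    )
    .map(
        lambda batch: tokenizer.batch_encode_plus(
            batch['text'],
            add_special_tokens=True,
            return_tensors="pt",
            # Pad to a fixed length so that samples can be stacked into batches;
            # 128 is an assumed budget for 100 review tokens plus the prompt.
            padding='max_length',
            max_length=128,
            truncation=True,
        ), batched=True
    )
)
# Ensure we can effectively use the model
train_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
peft_model, peft_trainer = train_model(
    peft_model=peft_model,
    peft_training_args=peft_training_args,
    output_dir=output_dir,
    train_set=train_dataset,
    test_set=None,
)
peft_model.save_pretrained(output_dir)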


### Step 2.1 Let's evaluate the model...

Can you think of some caveats of the model? What would happen if:

1. We change the prompt after training?
2. We change the input length of the model?
3. We want to include additional sentiments, such as `neutral`?



In [None]:
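from functools import partial

# Same truncation as used for training
limit_to_100_tokens = partial(simple_truncate_text, max_length=100, tokenizer=tokenizer)

test_dataset = (
    test_set
    .map(limit_to_100_tokens, batched=True)
    .map(
        tokenize_function, batched=True
    )
    .map(
        lambda batch: tokenizer.batch_encode_plus(
            batch['text'],
            add_special_tokens=True,
            return_tensors="pt",
            # Pad to a fixed length so that samples can be stacked into batches;
            # 128 is an assumed budget for 100 review tokens plus the prompt.
            padding='max_length',
            max_length=128,
            truncation=True,
        ), batched=True
    )
)
# `evaluate_fn` reads `input_ids`, `attention_mask`, and `label` from every batch
test_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
test_dataloader = torch.utils.data.DataLoader(
    dataset=test_dataset,
    batch_size=15,  # Feel free to lower / raise this
    shuffle=False,  # Shuffling not needed during evaluation
    num_workers=2,  # Feel free to lower this, esp. on CPU (with 0, also remove `prefetch_factor`)
    prefetch_factor=10,
)
predictions_list, labels_list = evaluate_fn(test_dataloader, model=peft_model)
accuracy, unknown = get_evaluation(predictions_list, labels_list)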


# Part 3: Few-Shot inference

(Before) next time, the notebook will be updated to contain the relevant few-shot parts. Stay tuned!
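
Until then, a rough sketch of what creating few-shot context could look like: a handful of labelled examples is prepended to the review that should be classified. The function name, the prompt wording, and the example reviews are all hypothetical placeholders, not the upcoming lab material.

```python
from typing import List, Tuple


def build_few_shot_prompt(shots: List[Tuple[str, str]], query: str) -> str:
    """Concatenate labelled examples (the 'context') followed by the unlabelled query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in shots]
    # The query is left without a label, so the model completes it.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Made-up example reviews, for illustration only.
shots = [
    ("A wonderful, moving film.", "Positive"),
    ("Two hours I will never get back.", "Negative"),
]
prompt = build_few_shot_prompt(shots, "The acting was superb.")
print(prompt)
```

Keep in mind that every added example consumes part of the model's input length, so the number of shots interacts with the truncation of the reviews.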

In [None]:
# TODO: Implement me!