## Table of Contents

<ol>
    <li>
        <a href="#Preparing-the-data">Preparing the data</a>
    </li>
    <li>
        <a href="#Fine-tuning-the-model-with-the-Trainer-API">Fine-tuning the model with the Trainer API</a>
    </li>
    <li>
        <a href="#Using-the-fine-tuned-model">Using the fine-tuned model</a>
    </li>
</ol>

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.is_available())
print(torch.cuda.get_device_name())

True
Tesla P40


## Preparing the data

In [2]:
from datasets import load_dataset

In [3]:
# Load dataset
dataset_name = "locchh/nvidia_qa"
dataset = load_dataset(dataset_name)
dataset


DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 9082
    })
    validation: Dataset({
        features: ['question', 'answer'],
        num_rows: 1135
    })
    test: Dataset({
        features: ['question', 'answer'],
        num_rows: 1136
    })
})

In [4]:
from transformers import AutoTokenizer

# Load the tokenizer
model_name = "HuggingFaceTB/SmolLM-360M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

In [5]:
# Define the tokenization function applied to each batch of examples
def preprocess_function(example):
    # Extract the question and answer fields (lists of strings when batched=True)
    question = example["question"]
    answer = example["answer"]
    
    # Tokenize the questions and answers
    model_inputs = tokenizer(
        question,
        text_target=answer,
        max_length=64,
        truncation=True,
        padding="max_length",
        padding_side='left'
    )
    return model_inputs
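
To make the effect of `truncation=True`, `padding="max_length"`, and `padding_side='left'` concrete, here is a minimal pure-Python sketch of the shape logic only. The helper name `pad_left` and the pad id `0` are hypothetical, and the sketch assumes the default right-side truncation; the real tokenizer does this internally with its own pad token id.

```python
def pad_left(token_ids, max_length, pad_id=0):
    """Truncate to max_length, then left-pad. Returns (input_ids, attention_mask)."""
    ids = list(token_ids)[:max_length]             # keep the first max_length tokens
    n_pad = max_length - len(ids)                  # number of pad positions to add
    input_ids = [pad_id] * n_pad + ids             # padding goes on the left
    attention_mask = [0] * n_pad + [1] * len(ids)  # 0 marks padding, 1 marks real tokens
    return input_ids, attention_mask


ids, mask = pad_left([11, 12, 13], max_length=6)
print(ids)   # [0, 0, 0, 11, 12, 13]
print(mask)  # [0, 0, 0, 1, 1, 1]
```

With `max_length=64`, every example ends up with exactly 64 positions, so longer questions or answers are cut off and shorter ones are padded.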


In [6]:
# Apply the tokenization function to every split, in batches
tokenized_datasets = dataset.map(preprocess_function, batched=True)

# Remove the "question" and "answer" columns
tokenized_datasets = tokenized_datasets.remove_columns(["question", "answer"])

# Inspect the tokenized dataset
tokenized_datasets


DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 9082
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 1135
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 1136
    })
})

## Fine-tuning the model with the Trainer API

### Load the model

In [7]:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_name)

### Set Up Data Collator

In [8]:
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer, 
    model=model,
)
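
By default, `DataCollatorForSeq2Seq` pads label sequences with `-100`, the index that PyTorch's cross-entropy loss ignores. The sketch below illustrates the idea in plain Python. The helper names are hypothetical, and right-side padding is assumed for simplicity. In this notebook the examples are already padded to `max_length` by the tokenizer, so the collator has little left to pad.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def pad_labels(batch_labels, ignore_index=IGNORE_INDEX):
    """Right-pad every label sequence to the longest one in the batch."""
    longest = max(len(labels) for labels in batch_labels)
    return [labels + [ignore_index] * (longest - len(labels)) for labels in batch_labels]


def restore_pad_ids(labels, pad_id, ignore_index=IGNORE_INDEX):
    """Replace ignore_index with a real token id so the sequence can be decoded."""
    return [pad_id if token == ignore_index else token for token in labels]


batch = pad_labels([[5, 6, 7], [8]])
print(batch)                                 # [[5, 6, 7], [8, -100, -100]]
print(restore_pad_ids(batch[1], pad_id=0))   # [8, 0, 0]
```

The second helper matters when decoding: `-100` is not a valid token id, so it is usually swapped for the pad token id before calling `batch_decode`.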

### Define Metrics

In [9]:
import evaluate

bleu_metric = evaluate.load("bleu")
rouge_metric = evaluate.load("rouge")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # BLEU score
    bleu_score = bleu_metric.compute(
        predictions=decoded_preds, 
        references=[[label] for label in decoded_labels]
    )

    # ROUGE score
    rouge_score = rouge_metric.compute(
        predictions=decoded_preds, 
        references=decoded_labels
    )

    # Combine metrics
    return {
        "bleu": bleu_score["bleu"],
        "rouge1": rouge_score["rouge1"],
        "rouge2": rouge_score["rouge2"],
        "rougeL": rouge_score["rougeL"],
        "rougeLsum": rouge_score["rougeLsum"]
    }
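
For intuition about the ROUGE-1 number returned above, here is a simplified sketch of a unigram-overlap F1 score. It assumes lowercase whitespace tokenization and no stemming, so its values will not match the `evaluate` implementation exactly; the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between a prediction and a single reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both texts
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat", "the cat sat on the mat")
print(round(score, 4))  # 0.6667
```

In this example all three predicted words appear in the reference (precision 1.0), but they cover only half of its six words (recall 0.5).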

### Configure TrainingArguments

In [10]:
from transformers import Seq2SeqTrainingArguments

# Training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    fp16=True,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # Accumulates gradients over 2 steps
    num_train_epochs=3,
    weight_decay=0.01,
    save_strategy="epoch",
    save_total_limit=2,
    predict_with_generate=True,
    generation_max_length=128,  # Adjusted for evaluation
    logging_dir="./logs",
    logging_steps=100,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


### Initialize Trainer

In [11]:
from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

Evaluate on the validation split before training (tokenizer `max_length=64`, `generation_max_length=128`):

```python
results = trainer.evaluate()
print(results)
```

```
{'eval_loss': 12.087650299072266, 'eval_model_preparation_time': 0.0023, 'eval_bleu': 0.05271458522493262, 'eval_rouge1': 0.2413672853441667, 'eval_rouge2': 0.10041026752558167, 'eval_rougeL': 0.19910976781171236, 'eval_rougeLsum': 0.1985402147035917, 'eval_runtime': 600.7612, 'eval_samples_per_second': 1.889, 'eval_steps_per_second': 0.236}
```
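
The throughput figures in this output can be cross-checked against the validation split size and the evaluation batch size. The sketch below assumes a single GPU (as set through `CUDA_VISIBLE_DEVICES`) and copies the numbers from the outputs above.

```python
import math

num_eval_samples = 1135   # validation split size from the dataset summary
eval_batch_size = 8       # per_device_eval_batch_size, single GPU assumed
eval_runtime = 600.7612   # 'eval_runtime' in seconds

# The last batch is partial, so round up
eval_steps = math.ceil(num_eval_samples / eval_batch_size)
samples_per_second = round(num_eval_samples / eval_runtime, 3)
steps_per_second = round(eval_steps / eval_runtime, 3)

print(eval_steps)          # 142
print(samples_per_second)  # 1.889
print(steps_per_second)    # 0.236
```

At roughly ten minutes per pass, generation-based evaluation accounts for a sizeable part of the total wall-clock time.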

In [12]:
results = trainer.evaluate(tokenized_datasets["test"])
print(results)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)



{'eval_loss': 12.064384460449219, 'eval_model_preparation_time': 0.0023, 'eval_bleu': 0.05116443610320274, 'eval_rouge1': 0.2392997743874297, 'eval_rouge2': 0.09714491532321688, 'eval_rougeL': 0.19789460396437447, 'eval_rougeLsum': 0.19725636953722037, 'eval_runtime': 600.7183, 'eval_samples_per_second': 1.891, 'eval_steps_per_second': 0.236}


Train the model

In [15]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Model Preparation Time | Bleu | Rouge1 | Rouge2 | RougeL | RougeLsum |
|---|---|---|---|---|---|---|---|---|
| 1 | 3.3886 | 3.396369 | 0.0023 | 0.038266 | 0.197034 | 0.096622 | 0.167884 | 0.167758 |
| 2 | 3.1369 | 3.180791 | 0.0023 | 0.038691 | 0.244164 | 0.126787 | 0.206914 | 0.206961 |
| 3 | 2.9652 | 3.137582 | 0.0023 | 0.041257 | 0.238843 | 0.124305 | 0.202606 | 0.202484 |


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.

TrainOutput(global_step=1704, training_loss=3.3085239912982276, metrics={'train_runtime': 3096.553, 'train_samples_per_second': 8.799, 'train_steps_per_second': 0.55, 'total_flos': 3291859453132800.0, 'train_loss': 3.3085239912982276, 'epoch': 3.0})
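
The `global_step=1704` reported here follows from the dataset size and the training arguments. The arithmetic below is a sketch that assumes a single GPU; the exact rounding the `Trainer` applies per epoch may differ between library versions, but here the batch count divides evenly by the accumulation steps.

```python
import math

num_train_samples = 9082          # train split size from the dataset summary
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_train_epochs = 3
train_runtime = 3096.553          # 'train_runtime' in seconds

# Each optimizer update sees batch_size * accumulation_steps samples
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
batches_per_epoch = math.ceil(num_train_samples / per_device_train_batch_size)
updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_updates = updates_per_epoch * num_train_epochs

print(effective_batch_size)                     # 16
print(batches_per_epoch)                        # 1136
print(updates_per_epoch)                        # 568
print(total_updates)                            # 1704
print(round(total_updates / train_runtime, 2))  # 0.55
```

The last value matches the reported `train_steps_per_second`.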

Evaluate on the validation split after training:

```python
results = trainer.evaluate()
print(results)
```
```
{'eval_loss': 3.1375815868377686, 'eval_model_preparation_time': 0.0023, 'eval_bleu': 0.04125730379860123, 'eval_rouge1': 0.23884283985403787, 'eval_rouge2': 0.1243053358375113, 'eval_rougeL': 0.20260632793153205, 'eval_rougeLsum': 0.20248449830987356, 'eval_runtime': 601.0729, 'eval_samples_per_second': 1.888, 'eval_steps_per_second': 0.236, 'epoch': 3.0}
```

In [28]:
results = trainer.evaluate(tokenized_datasets["test"])
print(results)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.

{'eval_loss': 3.1290555000305176, 'eval_model_preparation_time': 0.0023, 'eval_bleu': 0.039062312441746835, 'eval_rouge1': 0.23370283677137543, 'eval_rouge2': 0.11702451257369792, 'eval_rougeL': 0.19746009472662995, 'eval_rougeLsum': 0.1972524813533446, 'eval_runtime': 600.6392, 'eval_samples_per_second': 1.891, 'eval_steps_per_second': 0.236, 'epoch': 3.0}
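
To compare the test-split results before and after fine-tuning side by side, the two result dictionaries can be diffed. The values below are copied from the outputs above and rounded to four decimals; the variable names are only illustrative.

```python
# Test-split metrics before fine-tuning (rounded)
before = {"eval_loss": 12.0644, "eval_bleu": 0.0512, "eval_rouge1": 0.2393,
          "eval_rouge2": 0.0971, "eval_rougeL": 0.1979}
# Test-split metrics after three epochs (rounded)
after = {"eval_loss": 3.1291, "eval_bleu": 0.0391, "eval_rouge1": 0.2337,
         "eval_rouge2": 0.1170, "eval_rougeL": 0.1975}

# Positive delta means the value increased after fine-tuning
deltas = {name: round(after[name] - before[name], 4) for name in before}

for name, delta in deltas.items():
    print(f"{name:12s} {before[name]:8.4f} -> {after[name]:8.4f} ({delta:+.4f})")
```

On these numbers the loss drops sharply and ROUGE-2 rises, while BLEU, ROUGE-1, and ROUGE-L are slightly lower than before training.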


## Using the fine-tuned model