<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/ai_agent_tutorial_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Cell 1: Contains a self-attention implementation and its unit tests. The SQuAD example in Cell 2 does not use this code. The cell is independent and can be run on its own to verify the self-attention implementation.

Cell 2: Contains the complete SQuAD 2.0 fine-tuning example using Hugging Face Transformers. This cell is self-contained and does not depend on Cell 1, so it can be run with or without it. The SQuAD example uses pre-trained models from Hugging Face, which already have self-attention built in.

Cell 3: Defines a custom question-answering agent named `MyCustomAgent` using the smolagents and transformers libraries. In essence, it creates a custom code agent that uses a fine-tuned language model to answer questions about a given context, with a custom logging mechanism.

Key functionalities:

*   **Custom Logger:** Extends `AgentLogger` to log agent activities, including model information and task details, with configurable log levels (such as DEBUG and INFO).
*   **Model Loading:** Loads a pre-trained question-answering model and tokenizer from a specified checkpoint, using the GPU (cuda) if available.
*   **Agent Logic:** The `MyCustomAgent` class handles:
    *   Preprocessing the prompt into context and question.
    *   Tokenizing the input for the model.
    *   Performing inference to get model outputs.
    *   Running an n-best search over the top-scoring start and end positions to extract the best answer span from the context.
*   **Example Usage:**
    *   Creates an instance of `MyCustomAgent`, providing it with a search tool (`DuckDuckGoSearchTool`).
    *   Demonstrates how to use the agent with a sample context and question, then prints the agent's answer.
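
The prompt-preprocessing step can be illustrated in isolation. The following is a minimal sketch, assuming the prompt uses the literal markers `context:` and `question:` in that order, as the agent in Cell 3 expects. `split_prompt` is a hypothetical helper name, not part of smolagents or transformers.

```python
def split_prompt(prompt: str) -> tuple[str, str]:
    """Split 'context: ... question: ...' into (context, question).

    Falls back to an empty context and the whole prompt as the question
    when either marker is missing.
    """
    _, marker, rest = prompt.partition("context:")
    if not marker:
        return "", prompt.strip()
    context, marker, question = rest.partition("question:")
    if not marker:
        return "", prompt.strip()
    return context.strip(), question.strip()


context, question = split_prompt(
    "context: JWST launched in 2021. question: When did JWST launch?"
)
print(context)   # JWST launched in 2021.
print(question)  # When did JWST launch?
```

Using `str.partition` avoids indexing into the result of `split`, so a prompt without markers takes the fallback path instead of raising an exception.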

To summarize:

Run Cell 1 (optional, if you want to test the self-attention implementation).

Run Cell 2 (to train and evaluate the SQuAD model).

Run Cell 3 (to query a fine-tuned checkpoint through the custom agent).

Here's a breakdown of the GPU requirements for each cell:

*   **Cell 1 (Self-Attention Implementation and Unit Tests):** This cell does *not* strictly require a GPU. The unit tests are designed to run on small, randomly generated tensors.  It will run perfectly fine (and very quickly) on a CPU.

*   **Cell 2 (SQuAD 2.0 Fine-tuning):** This cell *strongly benefits* from a GPU, and for practical training, a GPU is highly recommended. Here's why:

    *   **Large Model:** The `distilbert-base-uncased` (or `bert-base-uncased`) model is a large neural network with millions of parameters.  Training and even evaluation on a CPU would be *extremely* slow (taking hours or even days for a few epochs).
    *   **Large Dataset:** SQuAD 2.0 is a substantial dataset.  Processing it, even with batching, is computationally intensive.
    *   **Matrix Operations:** The core operations within the Transformer model (including self-attention, which is part of DistilBERT/BERT) are heavily reliant on matrix multiplications. GPUs are highly optimized for these kinds of operations.

    **Can you run Cell 2 on a CPU?** Technically, yes. The code includes `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")`, which will automatically use the CPU if a GPU is not detected.  However:

    *   **Expect it to be *very* slow.**  Even a single epoch could take many hours.
    *   **You might run out of memory.**  If you have limited RAM, the model and data might not fit in memory, leading to crashes.

**In summary:**

*   Cell 1: GPU is *not* required.
*   Cell 2: GPU is *highly recommended* for practical training.  It *can* run on a CPU, but expect extremely slow performance.

**If you don't have a GPU:**

1.  **Google Colab (Free):** The easiest option is to use Google Colab, which provides free access to GPUs (and TPUs). The code is already designed to work in a Jupyter/Colab environment.  Just make sure to select a GPU runtime (Runtime -> Change runtime type -> Hardware accelerator -> GPU).
2.  **Cloud GPU Services:**  Consider cloud providers like AWS, Google Cloud, or Azure, which offer GPU instances. This will usually involve some cost.
3.  **Reduce Batch Size and Epochs (for CPU testing):** If you *must* run on a CPU for initial testing, drastically reduce the `batch_size` (e.g., to 2 or 4) and `num_epochs` (e.g., to 1) to make it somewhat manageable. But don't expect meaningful results with such limited training.
4.  **Smaller Model:** Use a smaller pre-trained model, such as `distilbert-base-uncased`.

The provided code is set up to handle both CPU and GPU execution automatically, but a GPU is essential for any realistic training on a dataset like SQuAD 2.0.
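
To see how batch size and epoch count drive the amount of work, the number of optimizer steps can be estimated before training starts. This is a minimal sketch: `training_steps` is a hypothetical helper, and the feature counts are illustrative inputs rather than measured values (tokenizing with a sliding window can produce more features than there are examples).

```python
import math


def training_steps(num_features: int, batch_size: int, num_epochs: int) -> int:
    """Optimizer steps for a DataLoader that keeps the last partial batch."""
    steps_per_epoch = math.ceil(num_features / batch_size)
    return steps_per_epoch * num_epochs


# Illustrative numbers only: 1,000 features with CPU-friendly settings
print(training_steps(num_features=1000, batch_size=4, num_epochs=1))  # 250
# Halving the batch size doubles the number of (smaller) steps
print(training_steps(num_features=1000, batch_size=2, num_epochs=1))  # 500
```

A smaller batch size mainly lowers memory use per step. The total work shrinks only when the number of epochs or the amount of training data is reduced.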


## CELL 1

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import unittest

class ScaledDotProductAttention(nn.Module):
    """
    Scaled Dot-Product Attention.  The core self-attention mechanism.
    """
    def __init__(self, d_k):
        super(ScaledDotProductAttention, self).__init__()
        self.d_k = d_k  # Dimension of the key (and query)

    def forward(self, Q, K, V, mask=None):
        """
        Forward pass of the scaled dot-product attention.
        """
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        attn_weights = F.softmax(scores, dim=-1)
        output = torch.matmul(attn_weights, V)
        return output, attn_weights

class MultiHeadAttention(nn.Module):
    """
    Multi-Head Attention layer.
    """
    def __init__(self, d_model, n_heads):
        super(MultiHeadAttention, self).__init__()
        self.n_heads = n_heads
        self.d_model = d_model
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.d_k = d_model // n_heads
        self.d_v = d_model // n_heads
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)
        self.attention = ScaledDotProductAttention(self.d_k)

    def forward(self, Q, K, V, mask=None):
        batch_size = Q.size(0)
        Q = self.W_q(Q).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        K = self.W_k(K).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        V = self.W_v(V).view(batch_size, -1, self.n_heads, self.d_v).transpose(1, 2)
        if mask is not None:
            mask = mask.unsqueeze(1)
        x, attn_weights = self.attention(Q, K, V, mask=mask)
        x = x.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
        output = self.W_o(x)
        return output, attn_weights

class MaskedSelfAttention(nn.Module):
  """Masked self attention"""
  def __init__(self, d_model, n_heads):
    super().__init__()
    self.attention = MultiHeadAttention(d_model,n_heads)

  def forward(self, x):
    seq_len = x.shape[1]
    mask = torch.tril(torch.ones((seq_len,seq_len))).to(x.device)
    mask = mask.unsqueeze(0)
    output, attention_weights = self.attention(x,x,x, mask = mask)
    return output, attention_weights

class TestSelfAttention(unittest.TestCase):
    def setUp(self):
        """Setup method."""
        self.batch_size = 2
        self.seq_len_q = 5
        self.seq_len_k = 7
        self.d_model = 128
        self.n_heads = 8
        self.d_k = self.d_model // self.n_heads
        self.Q = torch.randn(self.batch_size, self.seq_len_q, self.d_model)
        self.K = torch.randn(self.batch_size, self.seq_len_k, self.d_model)
        self.V = torch.randn(self.batch_size, self.seq_len_k, self.d_model)

    def test_scaled_dot_product_attention_no_mask(self):
        attention = ScaledDotProductAttention(self.d_k)
        Q = torch.randn(self.batch_size, self.n_heads, self.seq_len_q, self.d_k)
        K = torch.randn(self.batch_size, self.n_heads, self.seq_len_k, self.d_k)
        V = torch.randn(self.batch_size, self.n_heads, self.seq_len_k, self.d_k)
        output, attn_weights = attention(Q, K, V)
        self.assertEqual(output.shape, (self.batch_size, self.n_heads, self.seq_len_q, self.d_k))
        self.assertEqual(attn_weights.shape, (self.batch_size, self.n_heads, self.seq_len_q, self.seq_len_k))

    def test_scaled_dot_product_attention_with_mask(self):
        attention = ScaledDotProductAttention(self.d_k)
        Q = torch.randn(self.batch_size, self.n_heads, self.seq_len_q, self.d_k)
        K = torch.randn(self.batch_size, self.n_heads, self.seq_len_k, self.d_k)
        V = torch.randn(self.batch_size, self.n_heads, self.seq_len_k, self.d_k)
        mask = torch.ones(self.batch_size, 1, self.seq_len_q, self.seq_len_k)
        mask[:, :, :, -2:] = 0
        output, attn_weights = attention(Q, K, V, mask=mask)
        self.assertEqual(output.shape, (self.batch_size, self.n_heads, self.seq_len_q, self.d_k))
        self.assertEqual(attn_weights.shape, (self.batch_size, self.n_heads, self.seq_len_q, self.seq_len_k))
        self.assertTrue(torch.allclose(attn_weights[:, :, :, -2:], torch.zeros_like(attn_weights[:, :, :, -2:]), atol=1e-6))

    def test_multi_head_attention_no_mask(self):
        attention = MultiHeadAttention(self.d_model, self.n_heads)
        output, attn_weights = attention(self.Q, self.K, self.V)
        self.assertEqual(output.shape, (self.batch_size, self.seq_len_q, self.d_model))
        self.assertEqual(attn_weights.shape, (self.batch_size, self.n_heads, self.seq_len_q, self.seq_len_k))

    def test_multi_head_attention_with_mask(self):
        attention = MultiHeadAttention(self.d_model, self.n_heads)
        mask = torch.ones(self.batch_size, self.seq_len_q, self.seq_len_k)
        mask[:, :, -2:] = 0
        output, attn_weights = attention(self.Q, self.K, self.V, mask=mask)
        self.assertEqual(output.shape, (self.batch_size, self.seq_len_q, self.d_model))
        self.assertEqual(attn_weights.shape, (self.batch_size, self.n_heads, self.seq_len_q, self.seq_len_k))
        self.assertTrue(torch.allclose(attn_weights[:, :, :, -2:], torch.zeros_like(attn_weights[:, :, :, -2:]), atol=1e-6))

    def test_multi_head_attention_divisibility(self):
        with self.assertRaises(AssertionError):
            MultiHeadAttention(d_model=127, n_heads=8)

    def test_masked_self_attention(self):
        attention = MaskedSelfAttention(self.d_model, self.n_heads)
        output, attn_weights = attention(self.Q)
        self.assertEqual(output.shape, (self.batch_size, self.seq_len_q, self.d_model))
        self.assertEqual(attn_weights.shape, (self.batch_size, self.n_heads, self.seq_len_q, self.seq_len_q))
        triu_sum = torch.triu(attn_weights, diagonal=1).sum()
        self.assertTrue(torch.allclose(triu_sum,torch.tensor(0.0)))

if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

......
----------------------------------------------------------------------
Ran 6 tests in 0.216s

OK
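
To make the masked self-attention test concrete, here is a dependency-free sketch of the same computation for a single head on plain Python lists. It is a toy illustration, not the PyTorch code above: `causal_attention` is a hypothetical name, and it masks with negative infinity rather than -1e9.

```python
import math


def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal (lower-triangular) mask.

    Q, K, V are lists of vectors (seq_len x d_k).
    Returns (output, weights), where weights[i][j] is how much position i attends to j.
    """
    d_k = len(Q[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # Scaled scores; positions j > i are masked out with -inf
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) if j <= i else float("-inf")
            for j, k in enumerate(K)
        ]
        # Numerically stable softmax; masked positions get weight 0
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Weighted sum of the value vectors
        outputs.append([sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))])
    return outputs, weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = causal_attention(x, x, x)
print(attn[0])  # [1.0, 0.0, 0.0]: the first position can only attend to itself
```

Every row of `attn` sums to 1 and every entry above the diagonal is 0, which is the property that `test_masked_self_attention` checks with `torch.triu`.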


## CELL 2

In [None]:
!pip install transformers datasets evaluate -q

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from datasets import load_dataset
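# The optimizer used below is torch.optim.AdamW, so the deprecated
# transformers.AdamW import is not needed.
from transformers import (
    AutoTokenizer,
    AutoModelForQuestionAnswering,
    get_scheduler,
)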
from tqdm.auto import tqdm
import collections
from evaluate import load
import os

import transformers
transformers.logging.set_verbosity_error()  # Or .set_verbosity_critical()


# --- 1. SquadDataset Class ---

class SquadDataset(Dataset):
    def __init__(self, split, model_checkpoint="distilbert-base-uncased", max_length=384, doc_stride=128, train_size=None):
        self.raw_datasets = load_dataset("squad_v2")
        self.tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
        self.max_length = max_length
        self.doc_stride = doc_stride
        self.split = split  # "train" or "validation"
        self.train_size = train_size
        self.dataset = self.prepare_dataset()


    def preprocess_training_examples(self, examples):
        questions = [q.strip() for q in examples["question"]]
        inputs = self.tokenizer(
            questions,
            examples["context"],
            max_length=self.max_length,
            truncation="only_second",
            stride=self.doc_stride,
            return_overflowing_tokens=True,
            return_offsets_mapping=True,
            padding="max_length",
        )

        offset_mapping = inputs.pop("offset_mapping")
        sample_map = inputs.pop("overflow_to_sample_mapping")
        answers = examples["answers"]
        start_positions = []
        end_positions = []

        for i, offset in enumerate(offset_mapping):
            sample_idx = sample_map[i]
            answer = answers[sample_idx]
            if len(answer["answer_start"]) == 0:
                start_positions.append(0)
                end_positions.append(0)
                continue

            start_char = answer["answer_start"][0]
            end_char = answer["answer_start"][0] + len(answer["text"][0])
            sequence_ids = inputs.sequence_ids(i)

            idx = 0
            while sequence_ids[idx] != 1:
                idx += 1
            context_start = idx
            while sequence_ids[idx] == 1:
                idx += 1
            context_end = idx - 1

            if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
                start_positions.append(0)
                end_positions.append(0)
            else:
                idx = context_start
                while idx <= context_end and offset[idx][0] <= start_char:
                    idx += 1
                start_positions.append(idx - 1)

                idx = context_end
                while idx >= context_start and offset[idx][1] >= end_char:
                    idx -= 1
                end_positions.append(idx + 1)

        inputs["start_positions"] = start_positions
        inputs["end_positions"] = end_positions
        return inputs

    def preprocess_validation_examples(self, examples):
        questions = [q.strip() for q in examples["question"]]
        inputs = self.tokenizer(
            questions,
            examples["context"],
            max_length=self.max_length,
            truncation="only_second",
            stride=self.doc_stride,
            return_overflowing_tokens=True,
            return_offsets_mapping=True,
            padding="max_length",
        )

        sample_map = inputs.pop("overflow_to_sample_mapping")
        example_ids = []

        for i in range(len(inputs["input_ids"])):
            sample_idx = sample_map[i]
            example_ids.append(examples["id"][sample_idx])
            sequence_ids = inputs.sequence_ids(i)
            offset = inputs["offset_mapping"][i]
            inputs["offset_mapping"][i] = [
                o if sequence_ids[k] == 1 else None for k, o in enumerate(offset)
            ]

        inputs["example_id"] = example_ids
        return inputs

    def prepare_dataset(self):
        if self.split == "train":
            dataset = self.raw_datasets["train"]
            if self.train_size is not None:  # If train_size is specified
                # Use train_test_split to create a smaller training set
                split_datasets = dataset.train_test_split(train_size=self.train_size, seed=42)  # Use a seed for reproducibility
                dataset = split_datasets["train"]  # Select the training split
            processed_dataset = dataset.map(
                self.preprocess_training_examples,
                batched=True,
                remove_columns=dataset.column_names,
            )
        elif self.split == "validation":
            dataset = self.raw_datasets["validation"]
            processed_dataset = dataset.map(
                self.preprocess_validation_examples,
                batched=True,
                remove_columns=dataset.column_names,
            )
        else:
            raise ValueError("split must be 'train' or 'validation'")
        return processed_dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[idx]  # Access preprocessed data


# --- 2. QuestionAnsweringModel Class ---

class QuestionAnsweringModel(nn.Module):
    def __init__(self, model_checkpoint="distilbert-base-uncased", device="cpu"):
        super().__init__()
        self.model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint).to(device)
        self.device = device

    def forward(self, batch):
        # Ensure ALL batch items are on the correct device
        # Move only tensors to the specified device, excluding lists
        # Remove 'offset_mapping' and 'example_id' before passing to the model
        inputs = {k: v.to(self.device) if torch.is_tensor(v) else v for k, v in batch.items()
                  if k not in ['offset_mapping', 'example_id']}  # Exclude 'example_id' here
        return self.model(**inputs)

    def save_pretrained(self, path):
        self.model.save_pretrained(path)



# --- 3. Trainer Class
class Trainer:
    def __init__(self, model, train_dataloader, validation_dataloader, tokenizer, optimizer, lr_scheduler, device, num_epochs=3, checkpoint_dir="checkpoints"):
        self.model = model
        self.train_dataloader = train_dataloader
        self.validation_dataloader = validation_dataloader
        self.tokenizer = tokenizer
        self.optimizer = optimizer
        self.lr_scheduler = lr_scheduler
        self.device = device
        self.num_epochs = num_epochs
        self.checkpoint_dir = checkpoint_dir
        os.makedirs(self.checkpoint_dir, exist_ok=True)
        self.metric = load("squad_v2")

    def train_loop(self):
        self.model.train()
        for epoch in range(self.num_epochs):
            progress_bar = tqdm(self.train_dataloader, desc=f"Epoch {epoch+1}")
            for batch in progress_bar:
                outputs = self.model(batch)
                loss = outputs.loss
                loss.backward()
                self.optimizer.step()
                self.lr_scheduler.step()
                self.optimizer.zero_grad()
                progress_bar.set_postfix({"loss": loss.item()})
            # Save checkpoint after each epoch
            checkpoint_path = os.path.join(self.checkpoint_dir, f"checkpoint_epoch_{epoch + 1}")
            self.model.save_pretrained(checkpoint_path)
            self.tokenizer.save_pretrained(checkpoint_path)
            print(f"Saved checkpoint to {checkpoint_path}")


    def evaluate(self):
        self.model.eval()
        all_start_logits = []
        all_end_logits = []
        for batch in tqdm(self.validation_dataloader, desc="Evaluating"):
            with torch.no_grad():
                outputs = self.model(batch)
            # Accumulate logits on CPU, only for tensor data
            all_start_logits.append(outputs.start_logits.cpu())
            all_end_logits.append(outputs.end_logits.cpu())

        max_answer_length = 30
        n_best_size = 20
        start_logits = torch.cat(all_start_logits)
        end_logits = torch.cat(all_end_logits)

        # Use dataset directly from validation_dataloader
        example_to_features = collections.defaultdict(list)
        for idx, feature in enumerate(self.validation_dataloader.dataset):
            example_to_features[feature["example_id"]].append(idx)


        n_best_predictions = collections.OrderedDict()
        raw_dataset = load_dataset("squad_v2") #loading raw dataset for context
        eval_dataset = raw_dataset["validation"]


        for example_index, example in enumerate(tqdm(eval_dataset)):
            feature_indices = example_to_features[example["id"]]
            prelim_predictions = []

            for feature_index in feature_indices:
                start_logit = start_logits[feature_index]
                end_logit = end_logits[feature_index]
                # Access offsets directly from the dataset
                offsets = self.validation_dataloader.dataset[feature_index]["offset_mapping"]
                start_indexes = torch.argsort(start_logit, descending=True)[:n_best_size].tolist()
                end_indexes = torch.argsort(end_logit, descending=True)[:n_best_size].tolist()

                for start_index in start_indexes:
                    for end_index in end_indexes:
                        if (
                            start_index >= len(offsets)
                            or end_index >= len(offsets)
                            or offsets[start_index] is None
                            or offsets[end_index] is None
                        ):
                            continue
                        if end_index < start_index or end_index - start_index + 1 > max_answer_length:
                            continue
                        prelim_predictions.append(
                            {
                                "offsets": (offsets[start_index][0], offsets[end_index][1]),
                                "score": start_logit[start_index] + end_logit[end_index],
                                "start_logit": start_logit[start_index],
                                "end_logit": end_logit[end_index],
                            }
                        )
            if len(prelim_predictions) > 0:
                best_predictions = sorted(prelim_predictions, key=lambda x: x["score"], reverse=True)[:n_best_size]
            else:
                best_predictions = []

            if len(best_predictions) > 0:
                best_non_null_pred = max(best_predictions, key=lambda x: x["score"])
                predicted_answer = example["context"][best_non_null_pred["offsets"][0] : best_non_null_pred["offsets"][1]]
            else:
                predicted_answer = ""

            n_best_predictions[example['id']] = predicted_answer

        formatted_predictions = [{"id": k, "prediction_text": v, "no_answer_probability": 0.0} for k, v in n_best_predictions.items()]
        references = [{"id": ex["id"], "answers": ex["answers"]} for ex in eval_dataset]
        metrics = self.metric.compute(predictions=formatted_predictions, references=references)
        return metrics


# --- Custom Collate Function for Training ---

def custom_collate_train(batch):
    """
    Custom collate function to handle lists in the training data.
    Keeps 'offset_mapping' as lists and stacks tensors.
    """
    # Separate tensor data and list data
    tensor_data = {}
    list_data = {}

    # Find keys that should be treated as lists
    list_keys = ['offset_mapping', 'example_id']  # Add 'example_id' to list_keys

    # Collate list data (simply gather into lists)
    # Check if the key exists before collating
    for key in list_keys:
        list_data[key] = [item.get(key, None) for item in batch]  # Use get with default None

    # Collate tensor data using default collate
    tensor_keys = [k for k in batch[0].keys() if k not in list_keys]
    for key in tensor_keys:
        # Convert to tensor if not already a tensor
        tensor_data[key] = torch.utils.data.default_collate([torch.tensor(item[key]) if not torch.is_tensor(item[key]) else item[key] for item in batch])

    # Include 'offset_mapping' and 'example_id' back in the returned dictionary, but as lists
    tensor_data.update(list_data)  # Update with list_data to include list keys

    return tensor_data

# --- Main Execution ---

if __name__ == "__main__":
    # Configuration
    model_checkpoint = "distilbert-base-uncased"
    batch_size = 16
    num_epochs = 3
    learning_rate = 5e-5
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    checkpoint_to_load = None  # None starts from the pre-trained model_checkpoint; set to "checkpoint_epoch_X" to resume from a saved checkpoint
    # train_size = 1000  # Smaller training set for a quick proof of concept

    train_size = 100000

    # Prepare data
    train_squad_dataset = SquadDataset(split="train", model_checkpoint=model_checkpoint, train_size=train_size)
    validation_squad_dataset = SquadDataset(split="validation", model_checkpoint=model_checkpoint)

    # Apply custom collate functions to the dataloaders
    train_dataloader = DataLoader(train_squad_dataset, batch_size=batch_size, shuffle=True, collate_fn=custom_collate_train)  # Add collate_fn
    validation_dataloader = DataLoader(validation_squad_dataset, batch_size=batch_size, collate_fn=custom_collate_train)  # Use custom_collate_train for validation as well

    # Initialize the model and tokenizer, from a saved checkpoint if one is specified
    # (loading only once avoids instantiating the base model and then discarding it)
    if checkpoint_to_load:
        model = QuestionAnsweringModel(model_checkpoint=checkpoint_to_load, device=device)
        tokenizer = AutoTokenizer.from_pretrained(checkpoint_to_load)
        print(f"Loaded checkpoint from {checkpoint_to_load}")
    else:
        model = QuestionAnsweringModel(model_checkpoint=model_checkpoint, device=device)
        tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)


    # Use PyTorch's AdamW
    optimizer = torch.optim.AdamW(model.model.parameters(), lr=learning_rate)

    num_training_steps = num_epochs * len(train_dataloader)
    lr_scheduler = get_scheduler(
        "linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
    )

    # Create Trainer instance and train
    trainer = Trainer(model, train_dataloader, validation_dataloader, tokenizer, optimizer, lr_scheduler, device, num_epochs)
    trainer.train_loop()

    # Evaluate
    metrics = trainer.evaluate()
    print(metrics)

Map:   0%|          | 0/100000 [00:00<?, ? examples/s]

Epoch 1:   0%|          | 0/6319 [00:00<?, ?it/s]

Saved checkpoint to checkpoints/checkpoint_epoch_1


Epoch 2:   0%|          | 0/6319 [00:00<?, ?it/s]

Saved checkpoint to checkpoints/checkpoint_epoch_2


Epoch 3:   0%|          | 0/6319 [00:00<?, ?it/s]

Saved checkpoint to checkpoints/checkpoint_epoch_3


Evaluating:   0%|          | 0/759 [00:00<?, ?it/s]

  0%|          | 0/11873 [00:00<?, ?it/s]

{'exact': 37.783205592520844, 'f1': 42.439961581181215, 'total': 11873, 'HasAns_exact': 75.67476383265857, 'HasAns_f1': 85.00163020468362, 'HasAns_total': 5928, 'NoAns_exact': 0.0, 'NoAns_f1': 0.0, 'NoAns_total': 5945, 'best_exact': 50.11370336056599, 'best_exact_thresh': 0.0, 'best_f1': 50.11370336056599, 'best_f1_thresh': 0.0}
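
`NoAns_exact` is 0.0 because the evaluation loop above returns an empty string only when no valid span exists, and every prediction is submitted with `no_answer_probability` set to 0.0. One common remedy, sketched below under stated assumptions, is to compare the best span score with the score of the null answer at position 0 (the position that the training preprocessing uses as the label for unanswerable questions). `pick_answer_span` and `null_threshold` are hypothetical names, and the logits in the example are made up for illustration.

```python
def pick_answer_span(start_logits, end_logits, valid, n_best=20, max_len=30, null_threshold=0.0):
    """Return (start, end) token indices of the best span, or None for "no answer".

    start_logits, end_logits: per-token scores (lists of floats).
    valid: per-token booleans, True only for context tokens.
    The null score is read from position 0.
    """
    def top(scores):
        # Indices of the n_best highest scores, best first
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_best]

    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = float("-inf"), None
    for s in top(start_logits):
        for e in top(end_logits):
            if not (valid[s] and valid[e]) or e < s or e - s + 1 > max_len:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best_span = score, (s, e)
    # Predict "no answer" when the null score beats the best span by more than the threshold
    if best_span is None or null_score - best_score > null_threshold:
        return None
    return best_span


valid = [False, False, True, True, True]
print(pick_answer_span([0.1, 0.0, 2.0, 0.5, 0.2], [0.1, 0.0, 0.3, 2.5, 0.1], valid))  # (2, 3)
print(pick_answer_span([5.0, 0.0, 2.0, 0.5, 0.2], [5.0, 0.0, 0.3, 2.5, 0.1], valid))  # None
```

The threshold would normally be tuned on validation data. The `squad_v2` metric derives `best_exact_thresh` and `best_f1_thresh` from the submitted `no_answer_probability` values, so supplying real probabilities instead of a constant 0.0 makes those fields informative.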


## CELL 3

In [None]:
!pip install colab-env --upgrade -q
!pip install smolagents -q
import colab_env

In [2]:
!pip show smolagents

Name: smolagents
Version: 1.8.0
Summary: 🤗 smolagents: a barebones library for agents. Agents write python code to call tools or orchestrate other agents.
Home-page: 
Author: Thomas Wolf
Author-email: Aymeric Roucher <aymeric@hf.co>
License: 
Location: /usr/local/lib/python3.11/dist-packages
Requires: duckduckgo-search, huggingface-hub, jinja2, markdownify, pandas, pillow, python-dotenv, requests, rich
Required-by: 


In [3]:
from typing import Optional
from smolagents import CodeAgent, DuckDuckGoSearchTool
from smolagents.monitoring import AgentLogger, LogLevel
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
import collections
import numpy as np  # Used for the n-best answer span search

# --- MyCustomAgentLogger ---
class MyCustomAgentLogger(AgentLogger):
    def __init__(self, level: LogLevel = LogLevel.INFO):
        super().__init__(level=level)

    def log_task(self, content: str, subtitle: str, agent, title: Optional[str] = None, level: int = LogLevel.INFO) -> None:
        """
        Logs a task with a custom title including model information.
        """
        # AutoModelForQuestionAnswering is a factory class, so loaded models are never
        # instances of it; report the concrete class name instead
        model_type = agent.model.__class__.__name__

        title = f"[bold]{model_type}"
        super().log_task(content, subtitle, title, level)

    def set_log_level(self, level):
        """Set the log level for this logger."""
        self.level = level  # Directly modify the 'level' attribute of the instance

# --- Create the custom logger instance ---
logger = MyCustomAgentLogger()
logger.set_log_level(LogLevel.DEBUG)  # Optional: Change the log level

# --- Load Your Model ---
model_checkpoint = "/content/gdrive/MyDrive/model/QA/checkpoint_epoch_3"  # Replace with your model checkpoint path
model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)  # Load on the CPU first
model.to("cuda" if torch.cuda.is_available() else "cpu")  # Use the GPU if available, otherwise stay on the CPU
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# --- Custom Collate Function ---
def custom_collate_train(batch):
    """
    Custom collate function to handle lists in the training data.
    Keeps 'offset_mapping' as lists and stacks tensors.
    """
    # ... (Implementation remains the same as before) ...

# --- MyCustomAgent ---
class MyCustomAgent(CodeAgent):
    def __init__(self, tools, model):
        super().__init__(tools, model)  # Call parent class's __init__ first
        self.logger = logger  # Override default logger with your custom logger

    def run(self, prompt: str):
        """
        Runs the agent on the given prompt.
        """

        # Log the start of the task using the custom logger
        self.logger.log_task(content=f"Running agent on prompt: {prompt}", subtitle="Starting task", agent=self, level=LogLevel.INFO)

        # --- Preprocess the prompt ---
        try:
            context, question = prompt.split("context:")[1].split("question:")
            context = context.strip()
            question = question.strip()
        except (ValueError, IndexError):
            # A missing "context:" marker raises IndexError; a missing or repeated
            # "question:" marker raises ValueError
            context = ""
            question = prompt

        # Prepare inputs for the model
        inputs = tokenizer(question, context, return_tensors="pt").to(model.device)

        # --- Run inference with your model ---
        with torch.no_grad():
            outputs = self.model(**inputs)

        # --- Extract the answer with an n-best search over start/end positions ---
        start_logits = outputs.start_logits[0].cpu().numpy()
        end_logits = outputs.end_logits[0].cpu().numpy()
        n_best_size = 20  # Number of best candidates to consider
        max_answer_length = 30  # Maximum length of the answer span

        # Get the n-best start and end logits indices
        start_indexes = np.argsort(start_logits)[-n_best_size:]
        end_indexes = np.argsort(end_logits)[-n_best_size:]

        # Find the best answer span based on the combined score
        best_score = -np.inf
        best_start_index = 0
        best_end_index = 0

        for start_index in start_indexes:
            for end_index in end_indexes:
                if end_index < start_index or end_index - start_index + 1 > max_answer_length:
                    continue

                score = start_logits[start_index] + end_logits[end_index]
                if score > best_score:
                    best_score = score
                    best_start_index = start_index
                    best_end_index = end_index

        # Decode the selected token span (indices refer to the full question + context input)
        answer = tokenizer.decode(inputs['input_ids'][0][best_start_index : best_end_index + 1], skip_special_tokens=True)

        # Completion logging is currently disabled
        #self.logger.log_task(content="Task completed", subtitle="Finished", agent=self, level=LogLevel.INFO)

        return answer

# --- Example Usage ---
if __name__ == "__main__":
    # Create an instance of the custom code agent class and run it
    my_agent = MyCustomAgent(tools=[DuckDuckGoSearchTool()], model=model)

    # Complex context and question
    context = """The James Webb Space Telescope (JWST), launched on December 25, 2021, is a large infrared space
    telescope developed by NASA, ESA, and CSA. It is the successor to the Hubble Space Telescope and is designed to
    observe the early universe, the formation of galaxies, and the evolution of stars and planetary systems. JWST is
    equipped with four state-of-the-art scientific instruments: NIRCam, NIRSpec, MIRI, and FGS/NIRISS. These
    instruments allow it to capture images and spectra across a wide range of infrared wavelengths, providing
    unprecedented views of the cosmos. JWST operates in a halo orbit around the second Lagrange point (L2) of the
    Sun-Earth system, about 1.5 million kilometers from Earth. This location allows it to stay shielded from the
    Sun's heat and radiation, enabling it to make highly sensitive observations. The development of JWST was a
    complex and challenging undertaking, involving thousands of scientists, engineers, and technicians from around
    the world. It is considered one of the most ambitious and technologically advanced scientific projects ever
    undertaken."""

    question = "What are the primary scientific goals of the James Webb Space Telescope?"

    prompt = "context: " + context + " question: " + question
    answer = my_agent.run(prompt)
    print(f"The answer is: {answer}")

The answer is: observe the early universe, the formation of galaxies, and the evolution of stars and planetary systems
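
The span selection inside `run` does not depend on the model or the tokenizer, so it can be pulled out and checked on toy logits. The following is a minimal sketch under that assumption; the function name `best_span` and the toy values are illustrative and not part of the notebook. It scores pairs drawn from the top-scoring start and end positions, which is an n-best search rather than a beam search in the strict sense.

```python
import numpy as np


def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    """Return (start, end, score) of the highest-scoring valid span."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)

    # Candidate positions: the n_best_size highest logits on each side.
    start_indexes = np.argsort(start_logits)[-n_best_size:]
    end_indexes = np.argsort(end_logits)[-n_best_size:]

    best = (0, 0, -np.inf)
    for s in start_indexes:
        for e in end_indexes:
            # Skip spans that end before they start or exceed the length limit.
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (int(s), int(e), float(score))
    return best


# Toy logits: the start peaks at position 2, the end peaks at position 4.
start = [0.1, 0.2, 5.0, 0.3, 0.1, 0.0]
end = [0.0, 0.1, 0.2, 0.4, 4.0, 0.3]
print(best_span(start, end))  # (2, 4, 9.0)
```

Like the loop in `run`, this sketch searches over every input position, so a span can in principle fall inside the question tokens rather than the context.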


In [4]:
if __name__ == "__main__":
    # Create an instance of the custom code agent class and run it
    my_agent = MyCustomAgent(tools=[DuckDuckGoSearchTool()], model=model)
    answer = my_agent.run("context: Gemini is a large language model (LLM) developed by Google AI, trained on a massive dataset of text and code. It can perform various tasks, including generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. question: Who developed Gemini?")
    print(f"The answer is: {answer}") # Print the answer

The answer is: google ai
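
Both examples so far rely on the `context: ... question: ...` prompt layout that `run` parses. As a hedged illustration of that parsing step, the sketch below uses `str.partition` and `str.rpartition` so that a missing marker falls back to treating the whole prompt as the question, as `run` does. The helper name `split_prompt` is hypothetical and not part of the notebook. Splitting on the last `question:` is a design choice of this sketch, so that a context which itself contains that text is not cut short.

```python
def split_prompt(prompt: str):
    """Split 'context: ... question: ...' into (context, question).

    Falls back to ("", prompt) when either marker is missing.
    """
    _, context_marker, rest = prompt.partition("context:")
    if not context_marker:
        return "", prompt.strip()

    # Use the last "question:" so the context may contain that text itself.
    context, question_marker, question = rest.rpartition("question:")
    if not question_marker:
        return "", prompt.strip()

    return context.strip(), question.strip()


print(split_prompt("context: Gemini was developed by Google AI. question: Who developed Gemini?"))
print(split_prompt("Who developed Gemini?"))
```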


In [5]:
context = """Quantum computing is a rapidly evolving field that harnesses the principles of quantum mechanics to perform computations. Unlike classical computers, which rely on bits to represent information as 0s or 1s, quantum computers utilize qubits. Qubits can exist in a superposition, allowing them to represent both 0 and 1 simultaneously. This property, along with quantum entanglement, enables quantum computers to tackle complex problems that are intractable for classical computers.

One of the key applications of quantum computing is in drug discovery and materials science. By simulating molecular interactions, quantum computers can accelerate the identification of new drug candidates and materials with desired properties. Quantum computers also have the potential to revolutionize cryptography by breaking existing encryption algorithms and developing new, quantum-resistant ones.

However, building and maintaining quantum computers pose significant challenges. Qubits are highly susceptible to noise and decoherence, which can disrupt their delicate quantum states. Maintaining the stability of qubits requires extremely low temperatures and precise control. Despite these challenges, significant progress has been made in recent years, with researchers developing increasingly sophisticated quantum computing platforms.

The future of quantum computing holds immense promise. As technology advances, we can expect to see more powerful and stable quantum computers capable of solving even more complex problems. While widespread adoption of quantum computing is still some years away, the field is poised to revolutionize various industries and drive innovation in the years to come."""

questions = [
    "What is the fundamental difference between classical and quantum computers?",
    "How are quantum computers being used to advance drug discovery?",
    "What are the major challenges faced in building quantum computers?",
    "What is the expected impact of quantum computing on cryptography?",
    "What is the long-term outlook for the field of quantum computing?"
]

# --- Example Usage with your MyCustomAgent ---
for question in questions:
    prompt = f"context: {context} question: {question}"
    answer = my_agent.run(prompt)  # my_agent is the MyCustomAgent instance created in an earlier cell
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")


Question: What is the fundamental difference between classical and quantum computers?
Answer: rely on bits to represent information as 0s or 1s, quantum computers utilize qubits



Question: How are quantum computers being used to advance drug discovery?
Answer: simulating molecular interactions, quantum computers can accelerate the identification of new drug candidates and materials with desired properties



Question: What are the major challenges faced in building quantum computers?
Answer: qubits are highly susceptible to noise and decoherence



Question: What is the expected impact of quantum computing on cryptography?
Answer: as technology advances, we can expect to see more powerful and stable quantum computers capable of solving even more complex problems



Question: What is the long-term outlook for the field of quantum computing?
Answer: some years away
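
The answer to the cryptography question above comes from the last paragraph of the context, not from the sentence that mentions cryptography, and `run` always returns some span. One common way to let a SQuAD 2.0 style model abstain is to compare the best span score with the score at position 0. The sketch below assumes that position 0 is the `[CLS]` token and that the checkpoint was fine-tuned so its logits there stand for "no answer". The function name `span_or_no_answer` and the `threshold` parameter are illustrative, and nothing in the outputs above confirms that this check would change the result for that question.

```python
import numpy as np


def span_or_no_answer(start_logits, end_logits, best_start, best_end, threshold=0.0):
    """Return (best_start, best_end), or None when "no answer" scores higher.

    Assumption: index 0 is the [CLS] token and its logits encode "no answer".
    """
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)

    null_score = start_logits[0] + end_logits[0]
    span_score = start_logits[best_start] + end_logits[best_end]

    # Abstain only when "no answer" beats the best span by more than threshold.
    if null_score - span_score > threshold:
        return None
    return best_start, best_end


# Toy logits: index 0 dominates in the first call, a real span in the second.
print(span_or_no_answer([6.0, 0.1, 1.0], [5.0, 0.2, 1.5], 2, 2))  # None
print(span_or_no_answer([0.5, 0.1, 4.0], [0.3, 0.2, 3.5], 2, 2))  # (2, 2)
```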

