
# Welcome to Assignment 3

**TA: [Igor Bogdanov](mailto:igorbogdanov@cmail.carleton.ca)**

## General Instructions:

This Assignment can be done **in a group of two or individually**.

YOU HAVE TO JOIN A GROUP ON BRIGHTSPACE TO SUBMIT.

Please state explicitly at the beginning of the assignment whether you worked in a group or individually.

You need only one submission if it's group work.

Please print out values when asked using Python's print() function with f-strings where possible.

Submit your **saved notebook with all the outputs** to Brightspace. Before saving, make sure it still produces correct outputs after you restart the runtime and click "Runtime" → "Run all" from a clean state. Ensure your notebook displays all answers correctly.

## Your Submission MUST contain your signature at the bottom.

### Objective:
In this assignment, we build a reasoning AI agent that facilitates ML operations and model evaluation. This assignment is heavily based on Tutorial 9.

**Submission:** Submit your Notebook as a *.ipynb* file that adopts this naming convention: ***SYSC4415_W25_A3_NameLastname.ipynb*** on *Brightspace*. No other submission (e.g., through email) will be accepted. (Example file name: SYSC4415_W25_A3_IgorBogdanov.ipynb or SYSC4415_W25_A3_Student1_Student2.ipynb) The notebook MUST contain saved outputs.

**Runtime tips:**
Agentic programming and API calling can be easily done locally and moved to Colab in the final stages, depending on the implementation of your tools and ML tasks you want to run.

# Imports

Some basic libraries you need are imported here. Make sure you include whatever library you need in this entire notebook in the code block below.

If you are using any library that requires installation, please paste the installation command here.
Leave the code block below if you are not installing any libraries.

In [None]:
# Name: Sarah Al-Saady
# Student Number: 101226759

# Name:
# Student Number:

In [None]:
# Libraries to install - leave this code block blank if this does not apply to you
# Please add a brief comment on why you need the library and what it does


In [2]:
!pip install groq

# Libraries you might need
# General
import os
import zipfile
import librosa
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# For pre-processing
import torch
from torch.utils.data import Dataset, DataLoader, random_split
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder

# For modeling
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm
import torchsummary

# For metrics
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Agent
from groq import Groq
from dataclasses import dataclass
import re
from typing import Dict, List, Optional


Collecting groq
  Downloading groq-0.22.0-py3-none-any.whl.metadata (15 kB)
Downloading groq-0.22.0-py3-none-any.whl (126 kB)
Installing collected packages: groq
Successfully installed groq-0.22.0


# Task 1: Registration and API Activation (5 marks)

For this particular assignment, we will be using GroqCloud for LLM inference. This task shows how to use the Groq API with LLMs.

Create a free account on https://groq.com/ and generate an API Key. Don't remove your key until you get your grade. Feel free to delete your API key after the term is completed.

In conversational AI, prompting involves three key roles: the system role (which sets the agent's behavior and capabilities), the user role (which represents human inputs and queries), and the assistant role (which contains the agent's responses). The system role provides the foundational instructions and constraints, the user role delivers the actual queries or commands, and the assistant role generates contextual, step-by-step responses following the system's guidelines. This structured approach ensures consistent, controlled interactions where the agent maintains its defined behavior while responding to user needs, with each role serving a specific purpose in the conversation flow.
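
As a minimal sketch of the three roles, the snippet below builds a conversation as a plain list of dictionaries, which is the shape passed to the `messages` parameter of `client.chat.completions.create`. The helper name `build_conversation` and the message contents are illustrative assumptions, not part of the Groq API.

```python
from typing import Dict, List


def build_conversation(system_prompt: str, user_query: str) -> List[Dict[str, str]]:
    """Return a message list with a system instruction followed by a user query."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]


messages = build_conversation(
    "You are a concise ML tutor.",
    "What is overfitting?",
)

# After the model replies, its answer is stored under the assistant role so
# that the next request carries the whole conversation as context.
messages.append({"role": "assistant", "content": "(model reply goes here)"})

for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Keeping the history as one growing list is what lets a later request refer back to earlier turns.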


In [6]:
# Q1a (2 marks)
# Create a client using your API key.

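# Read the key from the environment instead of hard-coding it in a shared notebook.
# Assumes GROQ_API_KEY has been set beforehand (e.g., via Colab secrets or os.environ).
client = Groq(api_key=os.environ["GROQ_API_KEY"])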

# YOUR ANSWER GOES HERE

In [8]:
# Q1b (3 marks)

# instantiate chat_completion object using model of your choice (llama-3.3-70b-versatile - recommended)
# Hint: Use Tutorial 9 and Groq Documentation
# Explain each parameter and how each value change influences the LLM's output.
# Prompt the model using the user role about anything different from the tutorial.

# YOUR ANSWER GOES HERE

chat_completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile", # ID of the model to use
    temperature=0.2, # Sampling temperature: higher values make the output more random, lower values make it more focused and deterministic.
    top_p=0.7, # Nucleus sampling, an alternative to temperature: only the most likely tokens whose cumulative probability reaches top_p are considered.
                # For example, with 0.1 only the tokens making up the top 10% of the probability mass are considered.
    max_tokens=1024, # The maximum number of tokens that can be generated in the chat completion; longer replies are cut off.
    messages=[ # A list of messages comprising the conversation so far.
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)

chat_completion.choices[0].message.content


"The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in the series 4 games to 2. This was the Dodgers' first World Series title since 1988. The final game was played on October 27, 2020, at Globe Life Field in Arlington, Texas."

# Task 2: Agent Implementation (5 marks)

This task contains an implementation of the agent from Tutorial 9. The idea of this task is to make sure you understand how a basic LLM agent works.


In [9]:
# Q2a: (5 marks) Explain how agent implementation works, providing comments line by line.
# This paper might be helpful: https://react-lm.github.io/

# Agent state class holds the state of the agent
@dataclass
class Agent_State:
    messages: List[Dict[str, str]] # list of dictionaries
    system_prompt: str # prompt for the system

# Actual agent class
class ML_Agent:
    # Initializes an instance of the agent and initializes an agent state with a prompt
    def __init__(self, system_prompt: str):
        self.client = client # API client
        self.state = Agent_State(
            messages=[{"role": "system", "content": system_prompt}], # sets the agent state with a system message containing the prompt
            system_prompt=system_prompt,
        )

    # Adds a message to the agent's past list of messages (conversation history)
    def add_message(self, role: str, content: str) -> None:
        self.state.messages.append({"role": role, "content": content}) # append a new {role, content} dictionary to the message list

    # This is the reasoning
    def execute(self) -> str:
        # Send the conversation history to the LLM API
        completion = self.client.chat.completions.create(
            model="llama-3.3-70b-versatile",
            temperature=0.2,
            top_p=0.7, # limit token sampling to the top 70% probability mass
            max_tokens=1024,
            messages=self.state.messages, # provide the full conversation history as context
        )
        return completion.choices[0].message.content # return the generated response

    # Makes an agent instance callable like a function, e.g. agent("question")
    def __call__(self, message: str) -> str:
        self.add_message("user", message) # add the user's question (message) to the conversation
        result = self.execute() # start reasoning (thought), generates assistant's response
        self.add_message("assistant", result) # add the answer from the assistant to the conversation
        return result # return the assistant's response

# Task 3: Tools (20 marks)

Tools are specialized functions that enable AI agents to perform specific actions beyond their inherent capabilities, such as retrieving information, performing calculations, or manipulating data. Agents use tools to decompose complex reasoning into observable steps, extend their knowledge beyond training data, maintain state across interactions, and provide transparency in their decision-making process, ultimately allowing them to solve problems they couldn't tackle through reasoning alone.

Essentially, tools are just callback functions invoked by the agent at the appropriate time during the execution loop.
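
To illustrate the callback idea, here is a minimal sketch of one tool and a dispatcher. The tool `dataset_info`, the registry `TOOLS`, and the helper `run_tool` are hypothetical names chosen for this example; they are not the tools required by the questions in this task.

```python
from typing import Callable, Dict


def dataset_info(name: str) -> str:
    """Hypothetical tool: look up a short description of a dataset by name."""
    descriptions = {
        "iris": "150 samples, 4 numeric features, 3 classes",
        "penguins": "344 samples, numeric and categorical features, 3 species",
    }
    return descriptions.get(name.strip().lower(), f"Unknown dataset: {name}")


# Registry mapping the action name the LLM emits to the Python callback.
TOOLS: Dict[str, Callable[[str], str]] = {"dataset_info": dataset_info}


def run_tool(action: str, action_input: str) -> str:
    """Dispatch an action requested by the agent to the matching callback."""
    if action not in TOOLS:
        return f"Unknown tool: {action}"
    return TOOLS[action](action_input)


print(run_tool("dataset_info", "Iris"))
```

Each tool here takes a string and returns a string, because the agent communicates only through text: the input comes from the model's reply and the return value is fed back as an observation.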

You need to plan your tools for each particular task your agent is expected to solve.
The Model Evaluation Agent we are building should be able to evaluate a model from the model pool on a specified dataset.

Datasets to use: Penguins, Iris, CIFAR-10

You should be able to tell the agent what to do and watch it display the output of the tools' execution, similar to that in Tutorial 9.

User Prompt examples you should be able to give to your agent and expect it to fulfill the task:
- **Evaluate Linear Regression Model on Iris Dataset**
- **Train a logistic regression model on the Iris dataset**
- **Load the Penguins dataset and preprocess it.**
- **Train a decision tree model on the Penguins dataset and evaluate it.**
- **Load the CIFAR-10 dataset and train Mini-ResNet CNN, visualize results**

Classifier Models for Iris and Penguins (use A1 and early tutorials):
  * Logistic Regression (solver='lbfgs')
  * Decision Tree (max_depth=3)
  * KNN (n_neighbors=5)
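
A minimal sketch of such a model pool, assuming scikit-learn and its bundled Iris data, could look like the following. The name `MODEL_POOL`, the `max_iter` and `random_state` values, and the 80/20 split are assumptions made for illustration; only the three hyperparameters listed above come from the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Factories rather than instances, so each request gets a fresh, unfitted model.
MODEL_POOL = {
    "logistic_regression": lambda: LogisticRegression(solver="lbfgs", max_iter=1000),
    "decision_tree": lambda: DecisionTreeClassifier(max_depth=3, random_state=0),
    "knn": lambda: KNeighborsClassifier(n_neighbors=5),
}

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

scores = {}
for name, make_model in MODEL_POOL.items():
    model = make_model().fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out split
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```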

Any 2 CNN models of your choice for CIFAR-10 dataset (do some research, don't create anything from scratch unless you want to, use the ones provided by libraries and frameworks)

HINT: It is highly recommended that any code from previous assignments and tutorials be reused for tool implementation.

**Use Pytorch where possible**

## DON'T FORGET TO IMPORT MISSING LIBRARIES

In [None]:
# Q3a (3 marks): Implement model_memory tool.
# This tool should provide the agent with details about models or datasets
# Example: when asked about Penguin dataset, the agent can use memory to look up
# the source to obtain the dataset.


# YOUR ANSWER GOES HERE

In [None]:
# Q3b (3 marks): Implement dataset_loader tool.
# loads dataset after obtaining info from memory


# YOUR ANSWER GOES HERE

In [None]:
# Q3c (3 marks): Implement dataset_preprocessing tool.
# preprocesses the dataset to work with the chosen model, and does the splits


# YOUR ANSWER GOES HERE

In [None]:
# Q3d (3 marks): Implement train_model tool.
# trains selected model on selected dataset, the agent should not use this tool
# on datasets and models that cannot work together.



# YOUR ANSWER GOES HERE

In [None]:
# Q3e (3 marks): Implement evaluate_model tool
# evaluates the models and shows the quality metrics (accuracy, precision, and anything else of your choice)


# YOUR ANSWER GOES HERE

In [None]:
# Q3f (5 marks): Implement visualize_results tool
# provides results of the training/evaluation, open-ended task (2 plots minimum)


# YOUR ANSWER GOES HERE

# Task 4: System Prompt (10 marks)
A system prompt is essential for guiding an agent's behavior by establishing its purpose, capabilities, tone, and workflow patterns. It acts as the "personality and instruction manual" for the agent, defining the format of interactions (like using Thought/Action/Observation steps in our ML agent), available tools, response styles, and domain-specific knowledge—all while remaining invisible to the end user. This hidden layer of instruction ensures the agent consistently follows the intended reasoning process and operational constraints while providing appropriate and helpful responses, effectively serving as the blueprint for the agent's behavior across all interactions.
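
One way to keep the prompt and the tool list in sync is to generate the prompt from the tool descriptions. The sketch below is only an illustration of that idea: the function `build_system_prompt`, the wording of the instructions, and the `dataset_info` tool are all assumptions, not the prompt from Tutorial 9.

```python
from typing import Dict


def build_system_prompt(tool_descriptions: Dict[str, str]) -> str:
    """Assemble a Thought/Action/Observation prompt from tool name -> description."""
    tool_lines = "\n".join(
        f"- {name}: {description}" for name, description in tool_descriptions.items()
    )
    return (
        "You work in a loop of Thought, Action, Observation.\n"
        "Thought: describe your reasoning about the task.\n"
        "Action: request exactly one tool as 'Action: <tool_name>: <input>', then stop.\n"
        "Observation: the result of the tool, supplied to you in the next message.\n"
        "When the task is complete, reply with 'Answer: <final answer>'.\n\n"
        f"Available tools:\n{tool_lines}"
    )


prompt = build_system_prompt(
    {"dataset_info": "returns a short description of a dataset"}
)
print(prompt)
```

The action format spelled out in the prompt has to match whatever pattern the agent loop uses to parse replies, otherwise no tool is ever invoked.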


In [None]:
# Q4a (10 marks) Build a system prompt to guide the agent based on Tutorial 9.
# Use the following function:

# Try to find alternative wording to keep the agent in the desired loop,
# don't just copy the prompt from the tutorial.

# Penalty for direct copy - 2 marks

def create_agent():
    # your system prompt goes inside the multiline string
    system_prompt = """

    """.strip()

    return ML_Agent(system_prompt)


# Task 5: Set the Agent Loop (10 marks)

Now we are building automation of our Thought/Action/Observation sequence.
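
As a small illustration of the parsing step, the sketch below applies the same regular expression used in the loop in this task to a made-up model reply. The reply text and the tool name `dataset_info` are hypothetical.

```python
import re

# Action name, a colon, then the rest of the line as the tool input.
ACTION_PATTERN = re.compile(r"^Action: (\w+): (.*)$")

# Hypothetical model reply; the tool name and input are made up for illustration.
reply = "Thought: I need details about the dataset first.\nAction: dataset_info: iris"

# The pattern is anchored per line, so the reply is split before matching.
matches = [m for line in reply.split("\n") if (m := ACTION_PATTERN.match(line))]
if matches:
    action, action_input = matches[0].groups()
    print(action, "|", action_input)  # prints: dataset_info | iris
```

Only lines that begin with `Action:` match, so the Thought line is ignored and the two captured groups give the tool name and its input.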


In [None]:
# Q5a: (2 marks) Explain why we need the following data structure and fill it in with appropriate values:
KNOWN_ACTIONS = {
   # HINT See Tutorial 9.
}


In [None]:
# Q5b: (6 marks) Explain how the agent automation loop works line by line. Why do we need the ACTION_PATTERN variable?
# This paper might be helpful: https://react-lm.github.io/

ACTION_PATTERN = re.compile(r"^Action: (\w+): (.*)$")  # raw string avoids invalid-escape warnings for \w

number_of_steps = 5 # adjust this number for your implementation, to avoid an infinite loop

def query(question: str, max_turns: int = number_of_steps) -> List[Dict[str, str]]:
    agent = create_agent()
    next_prompt = question

    for turn in range(max_turns):
        result = agent(next_prompt)
        print(result)
        actions = [
            ACTION_PATTERN.match(a)
            for a in result.split("\n")
            if ACTION_PATTERN.match(a)
        ]
        if actions:
            action, action_input = actions[0].groups()
            if action not in KNOWN_ACTIONS:
                raise ValueError(f"Unknown action: {action}: {action_input}")
            print(f"\n ---> Executing {action} with input: {action_input}")
            observation = KNOWN_ACTIONS[action](action_input)
            print(f"Observation: {observation}")
            next_prompt = f"Observation: {observation}"
        else:
            break
    return agent.state.messages


In [None]:
# Q5c: (2 marks)
# QUESTION: How can we check the whole history of the agent's interaction with LLM?




# Task 6: Run your agent (15 marks)

Let's see if your agent works

In [None]:
# Execute any THREE example prompts using your agent. (Each working prompt example will give you 5 marks, 5x3=15)
# DON'T FORGET TO SAVE THE OUTPUT

# User Prompt examples you should be able to give to your agent:
# **Evaluate Linear Regression Model on Iris Dataset**
# **Train a logistic regression model on the Iris dataset**
# **Load the Penguins dataset and preprocess it.**
# **Train a decision tree model on the Penguins dataset and evaluate it.**
# **Load the CIFAR-10 dataset and train Mini-ResNet CNN, visualize results**

# Use this template:

# Example 1: Prompt
print("\nExample 1: Evaluate Linear Regression Model on Iris Dataset")
print("=" * 50)
task = "Evaluate Linear Regression Model on Iris Dataset"
result = query(task)
print("\n" + "=" * 50 + "\n")


# Task 7: BONUS (10 points)
Not valid without completion of all the previous tasks and tool implementations.

In [None]:
# Build your own additional ML-related tool and provide an example of interaction with your reasoning agent
# using a prompt of your choice that makes the agent use your tool at one of the reasoning steps.


Good luck!

## Signature:
Don't forget to insert your name and student number and execute the snippet below.



In [None]:
!pip install watermark
# Provide your Signature:
%load_ext watermark
%watermark -a 'Your Name, #Student_Number' -nmv --packages numpy,pandas,sklearn,matplotlib,seaborn,graphviz,groq,torch