# Week 14: LLMs as Decision Makers and Agents - Homework

**ML2: Advanced Machine Learning**

**Estimated Time**: 1 hour

---

This homework combines programming exercises and knowledge-based questions to reinforce this week's concepts.

## Setup

Run this cell to import necessary libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print('✓ Libraries imported successfully')

---
## Part 1: Programming Exercises (60%)

Complete the following programming tasks. Read each description carefully and implement the requested functionality.

### Exercise 1: ReAct Pattern Experiment

**Time**: 10 min

Observe how LLMs can reason and act in a loop to solve multi-step problems.

In [None]:
# ReAct = Reasoning + Acting

# Task: "What's the weather in the capital of France?"

# Traditional LLM (single response):
# Output: "I don't have access to current weather data."

# ReAct Agent:
# Thought: "I need to find the capital of France first."
# Action: search("capital of France")
# Observation: "Paris"
# Thought: "Now I need current weather for Paris."
# Action: weather_api("Paris")
# Observation: "75°F, sunny"
# Thought: "I have the answer."
# Answer: "The weather in Paris (capital of France) is 75°F and sunny."

# TODO: Observe how the agent breaks down the problem into steps
# TODO: What happens if a tool call fails? How does it recover?
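
The trace above can be turned into something runnable. The following is a minimal sketch, not a real agent: `search` and `weather_api` are hypothetical stand-ins that return canned data, and `scripted_policy` is an assumed placeholder for the LLM that chooses the next step from the observations so far. The part worth studying is the loop in `run_react`, which feeds every tool result, including errors, back in as an observation.

```python
# Hypothetical stand-ins for real tools; both return canned data.
def search(query):
    facts = {"capital of France": "Paris"}
    if query not in facts:
        raise KeyError(f"no result for {query!r}")
    return facts[query]

def weather_api(city):
    reports = {"Paris": "75°F, sunny"}
    if city not in reports:
        raise KeyError(f"no weather data for {city!r}")
    return reports[city]

TOOLS = {"search": search, "weather_api": weather_api}

def scripted_policy(history):
    """Stand-in for the LLM: returns (thought, action), where action is None when done."""
    observations = [text for kind, text in history if kind == "Observation"]
    if not observations:
        return "I need to find the capital of France first.", ("search", "capital of France")
    if len(observations) == 1:
        return f"Now I need current weather for {observations[0]}.", ("weather_api", observations[0])
    return "I have the answer.", None

def run_react(policy, tools, max_steps=5):
    """Alternate Thought -> Action -> Observation until the policy stops or max_steps is hit."""
    history = []
    for _ in range(max_steps):
        thought, action = policy(history)
        history.append(("Thought", thought))
        if action is None:
            break
        name, arg = action
        history.append(("Action", f"{name}({arg!r})"))
        try:
            observation = tools[name](arg)
        except Exception as exc:
            # A failed or unknown tool becomes an observation the policy can react to.
            observation = f"ERROR: {exc}"
        history.append(("Observation", observation))
    return history

trace = run_react(scripted_policy, TOOLS)
for kind, text in trace:
    print(f"{kind}: {text}")
```

To explore the second TODO, try removing `"Paris"` from `reports` and rerun. The scripted policy does not recover, which is the point: recovery has to come from a policy that reads the error observation and chooses a different action.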

---
## Part 2: Knowledge Questions (40%)

Answer the following questions to test your conceptual understanding.

### Question 1 (Short Answer)

**Question 1 - LLMs as Reasoners vs Actors**

- Traditional LLM: Input → Output (single step)
- Agent LLM: Input → Think → Act → Observe → Think → Act → ... → Output (loop)

Explain:
1. What new capabilities does the agentic loop enable?
2. Why does a single LLM call struggle with multi-step problems that depend on external information?
3. What are the risks of autonomous agents?

**Hint**: Agents can interact with tools, gather information, and make decisions over time.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 2 (Short Answer)

**Question 2 - ReAct Framework**

ReAct = Reason + Act in alternating steps.

Thought: "I need to check the database"
Action: query_database("SELECT * FROM users")
Observation: [results]
Thought: "Based on the results..."

Explain:
1. Why is explicit reasoning ("Thought") important?
2. How does this differ from just calling functions?
3. What role does the observation step play?

**Hint**: Explicit reasoning helps the LLM plan, self-correct, and explain its actions.
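
One way to see the difference from plain function calling is that, in a text-based ReAct setup, the model emits lines of text and the runtime has to parse them. The sketch below assumes the `Thought:` / `Action:` line format shown in this question and assumes each action takes a single Python-literal argument; `parse_step` is a hypothetical helper, not part of any library.

```python
import ast
import re

# Matches lines such as: Action: query_database("SELECT * FROM users")
ACTION_PATTERN = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$")

def parse_step(line):
    """Split one line of model output into (kind, payload)."""
    match = ACTION_PATTERN.match(line.strip())
    if match:
        name, raw_arg = match.groups()
        # literal_eval only accepts Python literals, so no arbitrary code is executed.
        return "Action", (name, ast.literal_eval(raw_arg))
    kind, _, text = line.partition(":")
    return kind.strip(), text.strip()

print(parse_step('Thought: "I need to check the database"'))
print(parse_step('Action: query_database("SELECT * FROM users")'))
```

A malformed argument makes `ast.literal_eval` raise `ValueError` or `SyntaxError`; a runtime would typically catch that and return it to the model as an observation.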

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 3 (Multiple Choice)

**Question 3 - Agent Safety**

An LLM agent has access to: email, database, file system, internet.

What's the PRIMARY safety concern?

A) Computational cost
B) Agent taking harmful or unintended actions
C) Speed of execution
D) Token limits

**Hint**: Agent actions have real-world consequences. Need safeguards to prevent harm.

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 4 (Short Answer)

**Question 4 - Tool Use**

Agents can invoke tools (calculators, APIs, databases, code execution).

Explain:
1. How do you teach an LLM what tools are available?
2. How does the LLM decide which tool to use?
3. What happens if the LLM hallucinates a tool call?

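**Hint**: Tools are described in the prompt or system message. The LLM chooses a tool by matching the task to those descriptions. A hallucinated call names a tool or arguments that do not exist, so it fails unless the runtime validates it and reports the error back.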
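
As an illustration of the hint, here is a minimal sketch of a tool registry. Everything in it is an assumption made for the example: the two toy tools, the JSON call format `{"tool": ..., "args": ...}`, and the helper names `render_tool_prompt` and `dispatch`. Real frameworks use their own schemas.

```python
import json

# Hypothetical registry: tool name -> (description shown to the model, callable).
TOOL_REGISTRY = {
    "calculator": ("Add a list of numbers, e.g. args=[1, 2, 3].", lambda args: sum(args)),
    "word_count": ("Count the words in a string.", lambda args: len(args.split())),
}

def render_tool_prompt(registry):
    """Build the tool list that would go into the system message."""
    lines = ['Call a tool by replying with JSON: {"tool": <name>, "args": <value>}']
    for name, (description, _) in registry.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)

def dispatch(raw_call, registry):
    """Validate a model-produced tool call before running it."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return "ERROR: tool call was not valid JSON"
    name = call.get("tool") if isinstance(call, dict) else None
    if not isinstance(name, str) or name not in registry:
        # Hallucinated tool: report what actually exists so the model can retry.
        return f"ERROR: unknown tool; available tools: {sorted(registry)}"
    _, func = registry[name]
    try:
        return func(call.get("args"))
    except Exception as exc:
        return f"ERROR: {type(exc).__name__}: {exc}"

print(render_tool_prompt(TOOL_REGISTRY))
print(dispatch('{"tool": "calculator", "args": [1, 2, 3]}', TOOL_REGISTRY))
print(dispatch('{"tool": "send_email", "args": "hi"}', TOOL_REGISTRY))
```

Returning an error string instead of raising keeps the loop alive, so the model sees the failure as an observation and can choose a valid tool on the next step.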

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 5 (Short Answer)

**Question 5 - Agent Planning**

For complex tasks, agents need to PLAN before acting.

Task: "Book the cheapest flight to Paris next month"

Explain:
1. What sub-steps are needed?
2. How can an LLM generate a plan?
3. What if the plan is wrong? How does the agent adapt?

**Hint**: Sub-steps: search dates, compare prices, book. LLM can generate plan in "Thought" step. Adapt based on observations.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 6 (Multiple Choice)

**Question 6 - Chain-of-Thought for Agents**

Chain-of-thought prompting helps agents by:

A) Making them slower
B) Forcing explicit reasoning steps that improve decision quality
C) Reducing token usage
D) Eliminating hallucinations

**Hint**: CoT makes reasoning explicit, helping agents plan better and self-correct.

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 7 (Short Answer)

**Question 7 - Memory for Agents**

Agents need memory to:
- Remember past actions
- Avoid repeating mistakes
- Maintain context across long tasks

Explain: How do you implement memory for an LLM agent given context window limits?

**Hint**: Options: conversation history in prompt, external memory/database, summarization.
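
The options in the hint can be combined. The sketch below keeps recent messages verbatim, evicts the oldest ones when a budget is exceeded, and replaces them with a summary. It rests on loud assumptions: `count_tokens` counts whitespace-separated words rather than real tokens, `stub_summarize` is a placeholder for an LLM summarization call, and the `evicted` list stands in for an external store.

```python
def count_tokens(text):
    """Crude proxy: whitespace-separated words, not a real tokenizer."""
    return len(text.split())

def stub_summarize(messages):
    """Placeholder for an LLM summarization call."""
    return f"[summary of {len(messages)} earlier messages]"

class AgentMemory:
    def __init__(self, budget, summarize=stub_summarize):
        self.budget = budget        # max tokens kept verbatim
        self.summarize = summarize
        self.summary = None         # compressed record of evicted messages
        self.evicted = []           # could live in an external database instead
        self.recent = []

    def add(self, message):
        self.recent.append(message)
        # Evict the oldest messages until the verbatim window fits the budget.
        while len(self.recent) > 1 and sum(map(count_tokens, self.recent)) > self.budget:
            self.evicted.append(self.recent.pop(0))
            self.summary = self.summarize(self.evicted)

    def context(self):
        """Messages to place in the next prompt: summary first, then recent turns."""
        prefix = [self.summary] if self.summary else []
        return prefix + self.recent

memory = AgentMemory(budget=6)
for step in ["searched capital of France", "observed Paris", "called weather_api for Paris"]:
    memory.add(step)
print(memory.context())
```

Note that this sketch does not count the summary against the budget; a real implementation would need to, and would also decide when the summary itself should be re-compressed.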

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 8 (Short Answer)

**Question 8 - Multi-Agent Systems**

Instead of one agent, use multiple specialized agents:
- Researcher agent (gathers info)
- Planner agent (creates plan)
- Executor agent (takes actions)

Explain:
1. What advantages does specialization provide?
2. How do agents communicate?
3. What coordination challenges arise?

**Hint**: Specialization = better at specific tasks. Communication = passing outputs. Challenges = coordination, agreement.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 9 (Short Answer)

**Question 9 - Evaluation of Agents**

How do you evaluate an LLM agent's performance?

- Task success rate?
- Efficiency (number of tool calls)?
- Cost?
- Safety (avoiding harmful actions)?

Explain: Why is this harder than evaluating a regular LLM?

**Hint**: Agents have multiple dimensions (success, efficiency, safety). Outcomes depend on environment.
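
To make the multiple dimensions concrete, the sketch below aggregates a handful of episodes into one report. The `Episode` fields and all the numbers are made up for illustration; they are not measurements from any real agent.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool        # did the agent complete the task?
    tool_calls: int      # efficiency proxy
    cost_usd: float      # API spend for the episode
    unsafe_actions: int  # actions flagged as harmful

def summarize_episodes(episodes):
    """Report each evaluation dimension separately instead of a single score."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "mean_tool_calls": sum(e.tool_calls for e in episodes) / n,
        "mean_cost_usd": sum(e.cost_usd for e in episodes) / n,
        "unsafe_episode_rate": sum(e.unsafe_actions > 0 for e in episodes) / n,
    }

# Illustrative data only.
episodes = [
    Episode(True, 3, 0.02, 0),
    Episode(False, 8, 0.06, 1),
    Episode(True, 4, 0.03, 0),
    Episode(True, 5, 0.01, 0),
]
report = summarize_episodes(episodes)
for metric, value in report.items():
    print(f"{metric}: {value:.2f}")
```

Keeping the metrics separate matters because they can trade off: an agent can raise its success rate by making more tool calls or by taking riskier actions, and a single averaged score would hide that.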

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 10 (Short Answer)

**Question 10 - Real-World Deployment**

Deploying autonomous agents in production requires:
- Sandboxing (limit what they can access)
- Human-in-the-loop for critical actions
- Monitoring and logging
- Rollback mechanisms

Explain: Why are these safeguards necessary? What could go wrong without them?

**Hint**: Agents can make mistakes, be manipulated, or take harmful actions. Safeguards prevent damage.
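
Three of the safeguards listed above (sandboxing, human-in-the-loop, logging) can be sketched as a thin wrapper around tool execution. The class and tool names here are hypothetical, the `approve` callback stands in for a real human review step, and rollback is not covered.

```python
class ActionBlocked(Exception):
    """Raised when a requested action is not allowed to run."""

class GuardedExecutor:
    def __init__(self, tools, critical, approve):
        self.tools = tools        # allowlist: only these tools can run (sandboxing)
        self.critical = critical  # tool names that need human approval
        self.approve = approve    # callable(name, arg) -> bool
        self.log = []             # audit trail of every attempt

    def execute(self, name, arg):
        if name not in self.tools:
            self.log.append((name, arg, "blocked: not allowlisted"))
            raise ActionBlocked(f"{name} is not an allowed tool")
        if name in self.critical and not self.approve(name, arg):
            self.log.append((name, arg, "blocked: approval denied"))
            raise ActionBlocked(f"{name} was not approved")
        result = self.tools[name](arg)
        self.log.append((name, arg, "executed"))
        return result

# Toy tools; a reviewer that denies everything stands in for a human.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "send_email": lambda to: f"sent to {to}",
}
executor = GuardedExecutor(tools, critical={"send_email"}, approve=lambda name, arg: False)

result = executor.execute("read_file", "notes.txt")
try:
    executor.execute("send_email", "team@example.com")
except ActionBlocked as exc:
    print(f"Blocked: {exc}")
print(executor.log)
```

Blocked attempts are logged as well as successful ones, since a pattern of denied requests is often the first sign that an agent has been manipulated or is stuck.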

**Your Answer**:

[Write your answer here in 2-4 sentences]

---
## Submission

Before submitting:
1. Run all cells to ensure code executes without errors
2. Check that all questions are answered
3. Review your explanations for clarity

**To Submit**:
- File → Download → Download .ipynb
- Submit the notebook file to your course LMS

**Note**: Make sure your name is in the filename (e.g., homework_14_yourname.ipynb)