<img src="https://learn.deeplearning.ai/assets/dlai-logo.png"></img>

# 🔬 Lab Introduction: Tool Use and Reflective Agents

In this lab, you will explore how AI agents can enhance research workflows by using external tools and reflecting critically on their own output. You'll build callable tools, such as web and academic search functions, and connect them to a language model through OpenAI's tool-calling API. Then you'll guide the agent to not only generate content but also **reflect** on its own output, improving the quality and depth of the final report. By the end of this lab, you will have implemented a mini agent that can search, reason, and publish structured reports in HTML, laying the foundation for more advanced multi-step and autonomous AI systems.

### 🎯 Learning Objectives

By the end of this lab, you will be able to:
- Chain steps into a research pipeline (**search → reflection → formatting**).
- Convert natural-language output into **styled HTML** suitable for sharing.

## ⚙️ Setup

This section:
- Loads environment variables
- Instantiates the OpenAI client

In [3]:
# ================================
# Standard library imports
# ================================
import json

# ================================
# Third-party imports
# ================================
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import display, HTML


# ================================
# Local / project imports
# ================================
import research_tools

# ================================
# Environment setup
# ================================
load_dotenv()  # Load environment variables from .env file
client = OpenAI()

In [4]:
import unittests

## 🧰 Provided Tools

You’ll use two research helpers exposed in `research_tools`:
- **`arxiv_search_tool(query, max_results)`** – academic papers via arXiv API.
- **`tavily_search_tool(query, max_results, include_images)`** – general web search via Tavily.

## 🛠️ `arxiv_search_tool`

Searches arXiv and returns a list of papers with:
- `title`, `authors`, `published`, `summary`, `url`, and (if available) `link_pdf`.

Below, we run a quick test and print the results in a readable format.


In [5]:
# Test the arXiv search tool
results = research_tools.arxiv_search_tool("retrieval-augmented generation", max_results=3)

# Show formatted results
for i, paper in enumerate(results, 1):
    if "error" in paper:
        print(f"❌ Error: {paper['error']}")
    else:
        print(f"📄 Paper {i}")
        print(f"  Title     : {paper['title']}")
        print(f"  Authors   : {', '.join(paper['authors'])}")
        print(f"  Published : {paper['published']}")
        print(f"  URL       : {paper['url']}\n")


print("\n🧾 Raw Results:\n")
print(json.dumps(results, indent=2))

📄 Paper 1
  Title     : Generalized Baer and Generalized Quasi-Baer Rings of Skew Generalized
  Power Series
  Authors   : M. M. Hamam, R. E. Abdel-Khalek, R. M. Salem
  Published : 2024-05-06
  URL       : http://arxiv.org/abs/2405.03423v2

📄 Paper 2
  Title     : On generalized topological groups
  Authors   : Murad Hussain, Moiz Ud Din Khan, Cenap Özel
  Published : 2012-05-17
  URL       : http://arxiv.org/abs/1205.3915v1

📄 Paper 3
  Title     : Weighted spherical means generated by generalized translation and
  general Euler-Poisson-Darboux equation
  Authors   : Elina Shishkina
  Published : 2017-03-18
  URL       : http://arxiv.org/abs/1703.06340v1


🧾 Raw Results:

[
  {
    "title": "Generalized Baer and Generalized Quasi-Baer Rings of Skew Generalized\n  Power Series",
    "authors": [
      "M. M. Hamam",
      "R. E. Abdel-Khalek",
      "R. M. Salem"
    ],
    "published": "2024-05-06",
    "url": "http://arxiv.org/abs/2405.03423v2",
    "summary": "Let $R$ be a ring wi

## 🛠️ `tavily_search_tool`

Calls the Tavily API to fetch web results. Returns a list of dicts:
- `title`, `content`, `url` (and optional image URLs when `include_images=True`).

Run the cell to inspect sample output.

In [6]:
# Test the Tavily search tool
search_results = research_tools.tavily_search_tool("retrieval-augmented generation applications")
for item in search_results:
    print(item)

{'title': 'What is Retrieval Augmented Generation (RAG)? | Databricks', 'content': 'Retrieval augmented generation, or RAG, is an architectural approach that can improve the efficacy of large language model (LLM) applications by leveraging custom data. This is called retrieval augmented generation (RAG), as you would retrieve the relevant data and use it as augmented context for the LLM. With RAG architecture, organizations can deploy any LLM model and augment it to return relevant results for their organization by giving it a small amount of their data without the costs and time of fine-tuning or pretraining the model. * Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps (Achieve greater accuracy using retrieval augmented generation (RAG) with your own data) Contact Databricks to schedule a demo and talk to someone about your LLM and retrieval augmented generation (RAG) projects', 'url': 'https://www.databricks.com/glossary/retrieval-augmented-generation-rag'}
{'title': '

## 🔗 Tool Mapping

We map tool names (strings) to the actual Python functions. When the model requests a tool by name during tool calling, our code uses this mapping to look up and run the matching function.

In [7]:
# Tool mapping
tool_mapping = {
    "tavily_search_tool": research_tools.tavily_search_tool,
    "arxiv_search_tool": research_tools.arxiv_search_tool,
}
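
A minimal sketch of how name-based dispatch can work. To run without API keys, it uses hypothetical stand-in functions (`fake_arxiv_search`, `fake_web_search`) and a hypothetical helper `dispatch_tool` rather than the real `research_tools` functions. Errors are returned as data so they could be passed back to the model instead of crashing the loop.

```python
import json

# Hypothetical stand-ins for the real research tools, so the sketch runs offline.
def fake_arxiv_search(query: str, max_results: int = 3) -> list:
    return [{"title": f"Paper about {query}", "rank": i + 1} for i in range(max_results)]

def fake_web_search(query: str, max_results: int = 3) -> list:
    return [{"title": f"Page about {query}", "rank": i + 1} for i in range(max_results)]

# Same shape as tool_mapping: tool name (string) -> Python callable.
demo_mapping = {
    "arxiv_search_tool": fake_arxiv_search,
    "tavily_search_tool": fake_web_search,
}

def dispatch_tool(name: str, arguments_json: str, mapping: dict):
    """Look up a tool by name and call it with JSON-encoded keyword arguments."""
    try:
        args = json.loads(arguments_json)
        return mapping[name](**args)
    except Exception as e:
        # Return the error as data so the caller can report it back to the model.
        return {"error": f"{type(e).__name__}: {e}"}

ok = dispatch_tool("arxiv_search_tool", '{"query": "RAG", "max_results": 2}', demo_mapping)
missing = dispatch_tool("unknown_tool", "{}", demo_mapping)
print(ok)
print(missing)
```

An unknown tool name produces an `{"error": ...}` dict rather than an exception, which mirrors the `try`/`except` pattern used in Exercise 1.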


---

### 🧠 Exercise 1: Tool-Calling Research Assistant

**Goal:** Implement `generate_research_report_with_tools(prompt_)` so the model can call tools, gather evidence, and produce a sourced research report.

**What you’ll see in the code:**
Some arguments are set to placeholder values (e.g., `model=None`, `messages=None`, `tools=None`, `tool_choice=None`, conditions like `if False:`, fields like `"name": None`, and empty temp variables). **Replace those placeholders with working code** so the loop runs end-to-end.

**Constraints:**

* Do **not** change the surrounding structure or helper data that’s already provided.
* Only replace the obvious placeholders and fill in the missing pieces where indicated in the function.
* Keep the debug prints (e.g., `🛠️ tool(args)`) to help you trace what’s happening.
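
The function passes tool definitions to the model through the `tools` argument. The contents of `research_tools.arxiv_tool_def` are not shown in this notebook, so the sketch below is only an assumption about what such a definition might look like. It follows the general Chat Completions function-tool layout (a `type` of `"function"` plus a JSON Schema for the parameters). The name `example_arxiv_tool_def`, the descriptions, and the helper `tool_name` are hypothetical.

```python
import json

# Hypothetical definition: the real research_tools.arxiv_tool_def may differ.
example_arxiv_tool_def = {
    "type": "function",
    "function": {
        "name": "arxiv_search_tool",
        "description": "Search arXiv for papers matching a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search keywords."},
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of papers to return.",
                },
            },
            "required": ["query"],
        },
    },
}

def tool_name(tool_def: dict) -> str:
    """Return the function name the model would use when requesting this tool."""
    return tool_def["function"]["name"]

# The definition must be JSON-serializable because it is sent with the request.
print(json.dumps(example_arxiv_tool_def, indent=2))
print(tool_name(example_arxiv_tool_def))
```

The `name` field is what ties a definition to the tool mapping: the model requests a tool by this string, and the loop uses the same string as the dictionary key.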


In [None]:
# GRADED FUNCTION: generate_research_report_with_tools
def generate_research_report_with_tools(prompt_: str, model: str = "gpt-4o") -> str:
    """
    Generates a research report using OpenAI's tool-calling with arXiv and Tavily tools.

    Args:
        prompt_ (str): The user prompt.
        model (str): OpenAI model name.

    Returns:
        str: Final assistant research report text.
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a research assistant that can search the web and arXiv to write detailed, "
                "accurate, and properly sourced research reports.\n\n"
                "🔍 Use tools when appropriate (e.g., to find scientific papers or web content).\n"
                "📚 Cite sources whenever relevant. Do NOT omit citations for brevity.\n"
                "🌐 When possible, include full URLs (arXiv links, web sources, etc.).\n"
                "✍️ Use an academic tone, organize output into clearly labeled sections, and include "
                "inline citations or footnotes as needed.\n"
                "🚫 Do not include placeholder text such as '(citation needed)' or '(citations omitted)'."
            )
        },
        {"role": "user", "content": prompt_}
    ]

    functions = [research_tools.arxiv_tool_def, research_tools.tavily_tool_def]
    MAX_TURNS = 10
    final_text = None

    ### START CODE HERE ###

    # Define model, messages, tools, tool_choice, and temperature
    for _ in range(MAX_TURNS):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=functions,
            tool_choice="auto",
            temperature=1,
        )

        msg = response.choices[0].message
        messages.append(msg)

        # Stop when the assistant returns a final answer (no tool calls)

        # Check if there are no tool calls in the message, replace with appropriate condition
        if not msg.tool_calls:      
            final_text = msg.content
            print("✅ Final answer:")
            print(final_text)
            break

        # Execute tool calls and append results
        for call in msg.tool_calls:
            tool_name = call.function.name
            args = json.loads(call.function.arguments)
            print(f"🛠️ {tool_name}({args})")

            try:
                tool_func = tool_mapping[tool_name]
                result = tool_func(**args)
            except Exception as e:
                result = {"error": str(e)}

            # Replace None with appropriate dictionary
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "name": tool_name,
                "content": json.dumps(result)
            })
    ### END CODE HERE ###

    return final_text or ""



---

### 🧠 Exercise 2: Reflection + Rewrite

**Goal:** Implement `reflection_and_rewrite(text_or_messages)` so it produces:

* a structured reflection (**Strengths, Limitations, Suggestions, Opportunities**), and
* a **revised report** that incorporates those suggestions.

**What you’ll see in the code:**
Placeholders like `user_prompt = None`, `model=None`, `{"role":"user","content": None}`, and `temperature=None`. **Replace those placeholders with working code.** Don’t change the surrounding structure.

**Your tasks:**

* **Build the `user_prompt`** string that:

   * Asks for a structured reflection with the four sections.
   * Asks for a revised version of the report that applies the suggestions.
   * Includes the input report text (`text`) at the end.

**Hints:**

* The top of the function already normalizes `text_or_messages` into `text`; you don’t need to modify that.
* Keep your prompt concise but explicit about the four sections and the rewrite request.
* Make sure you pass the actual `model` and `temperature` parameters (not hard-coded values).
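
The normalization step mentioned in the first hint has to cope with two kinds of entries: plain dicts and SDK message objects with attributes. The sketch below pulls that logic into a hypothetical helper, `last_assistant_text`, and uses `SimpleNamespace` as a stand-in for SDK message objects so it runs without an API call.

```python
from types import SimpleNamespace

def last_assistant_text(messages: list):
    """Return the most recent non-empty assistant content, or None if there is none."""
    for m in reversed(messages):
        # Dicts use key access; SDK-style objects use attribute access.
        role = m.get("role") if isinstance(m, dict) else getattr(m, "role", None)
        content = m.get("content") if isinstance(m, dict) else getattr(m, "content", None)
        if role == "assistant" and content:
            return content
    return None

history = [
    {"role": "user", "content": "Write a report."},
    SimpleNamespace(role="assistant", content=None),  # tool-call turn with no text
    {"role": "tool", "content": "[...]"},
    SimpleNamespace(role="assistant", content="Final report text."),
]
print(last_assistant_text(history))
```

Scanning in reverse means assistant turns that only contain tool calls (and therefore no text) are skipped in favor of the final written answer.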


In [None]:
# GRADED FUNCTION: reflection_and_rewrite
def reflection_and_rewrite(text_or_messages, model: str = "gpt-4o-mini", temperature: float = 0.3) -> dict:
    """
    Generates a structured reflection AND a revised research report.
    Accepts raw text OR a list of chat messages (e.g., the conversation built during the tool-calling step).

    Returns:
        dict with keys:
          - "reflection": structured reflection text
          - "revised_report": improved version of the input report
    """
    # Extract assistant content if messages were passed
    if isinstance(text_or_messages, list):
        text = None
        for m in reversed(text_or_messages):
            role = m.get("role") if isinstance(m, dict) else getattr(m, "role", None)
            content = m.get("content") if isinstance(m, dict) else getattr(m, "content", None)
            if role == "assistant" and content:
                text = content
                break
        if not text:
            raise ValueError("No assistant text found in messages.")
    else:
        text = str(text_or_messages)

    ### START CODE HERE ###
    # Build the user prompt
    user_prompt = (
        "First, provide a structured reflection (Strengths, Limitations, Suggestions, Opportunities) "
        "on the following report.\n\n"
        "Then, write a revised version of the report that incorporates your suggestions, "
        "improves clarity, and strengthens academic tone.\n\n"
        f"Report:\n{text}"
    )

    # Call OpenAI
    resp = client.chat.completions.create(
        # Replace None with appropriate model variable
        model=model,
        messages=[
            {"role": "system", "content": "You are an academic reviewer and editor."},
            # Add user prompt
            {"role": "user", "content": user_prompt},
        ],
        # Replace None with appropriate temperature variable
        temperature=temperature,
    )

    # Extract output
    full_output = resp.choices[0].message.content.strip()  
    ### END CODE HERE ###

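    # Note: the model returns the reflection and the revised report in a single
    # response, so both keys hold the same combined text.
    return {
        "reflection": full_output,
        "revised_report": full_output
    }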
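
The function returns one combined model response under both keys. If you wanted the two parts separately, one possible approach (not part of the graded solution) is to ask the model to put a fixed marker line before the revised report and split on it. The sketch below assumes a hypothetical marker, `REVISED REPORT:`, and a hypothetical helper, `split_reflection`; the prompt in the exercise does not request such a marker, so the helper falls back to the combined text when it is absent.

```python
def split_reflection(full_output: str, marker: str = "REVISED REPORT:") -> dict:
    """Split a combined response into reflection and revised report at the first marker."""
    reflection, sep, revised = full_output.partition(marker)
    if not sep:
        # Marker missing: keep the combined text under both keys.
        combined = full_output.strip()
        return {"reflection": combined, "revised_report": combined}
    return {"reflection": reflection.strip(), "revised_report": revised.strip()}

sample = (
    "Strengths: clear.\nLimitations: short.\n"
    "Suggestions: add sources.\nOpportunities: expand.\n\n"
    "REVISED REPORT:\nAn improved report with sources."
)
parts = split_reflection(sample)
print(parts["reflection"])
print(parts["revised_report"])
```

Splitting on a marker is fragile if the model omits or rephrases it, which is why the fallback keeps the original behavior.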


---

### 🧠 Exercise 3: Publish the Report as HTML

**Goal:** Complete `convert_report_to_html(text_or_messages, model="gpt-4o")` so it returns a **valid HTML string** generated by the model.

**What you’ll see in the code:**
Placeholders like `model=None`, `{"role":"system","content": None}`, `{"role":"user","content": None}`, and `temperature=None`. **Replace those placeholders with working code.** Do not change the surrounding structure.

**Your tasks:**

- **Build the `user_prompt`** (already started for you) that:

   * Asks the model to convert the plain-text report to **clean, structured HTML**.
   * Requires **only HTML** in the response (no explanations).
   * Includes the `text_report` content at the end.

- **Return the `html` string.**

**Hints:**

* Use the existing `system_prompt` variable as your system message.
* Keep the user prompt strict: “Respond ONLY with valid HTML.”
* The function already extracts `text_report` from `text_or_messages`—no need to modify that logic.


In [None]:
# GRADED FUNCTION: convert_report_to_html
def convert_report_to_html(text_or_messages, model: str = "gpt-4o") -> str:
    """
    Converts a plaintext research report into a styled HTML page using OpenAI.
    Accepts raw text OR the messages list from the tool-calling step.
    """
    # Inline extraction (no helper)
    if isinstance(text_or_messages, list):
        text_report = None
        for m in reversed(text_or_messages):
            role = m.get("role") if isinstance(m, dict) else getattr(m, "role", None)
            content = m.get("content") if isinstance(m, dict) else getattr(m, "content", None)
            if role == "assistant" and content:
                text_report = content
                break
        if not text_report:
            raise ValueError("No assistant text found in messages.")
    else:
        text_report = str(text_or_messages)

    if not text_report:
        raise ValueError("Empty report text.")

    system_prompt = "You convert plaintext reports into full clean HTML documents."

    ### START CODE HERE ###
    # Build the user prompt instructing the model to return ONLY valid HTML
    user_prompt = (
        "You are an expert technical writing assistant. "
        "Convert the following plaintext research report into a clean, structured HTML document. "
        "Include section headers, well-formatted paragraphs, and clickable links. "
        "Ensure citation style is preserved.\n\n"
        "Respond ONLY with valid HTML (no explanation).\n\n"
        f"Report:\n{text_report}"
    ) 

    # Call the OpenAI API with a system + user message
    resp = client.chat.completions.create(
        model=model,
        messages=[
            # Add system and content here
            {"role": "system", "content": system_prompt},
            # Add user prompt
            {"role": "user", "content": user_prompt},
        ],
        # Add temperature value
        temperature=0.5,
    )

    # Extract the HTML from the assistant message
    html = resp.choices[0].message.content.strip()  
    ### END CODE HERE ###

    return html
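
Even with a strict "Respond ONLY with valid HTML" instruction, a model may occasionally wrap its answer in a Markdown code fence. The sketch below shows one possible cleanup step, assuming that failure mode; `strip_code_fence` is a hypothetical helper and is not part of the graded function.

```python
FENCE = "`" * 3  # built this way so the sketch itself contains no literal fence

def strip_code_fence(text: str) -> str:
    """Remove a leading and trailing Markdown code fence if present."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag such as "html".
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[: -len(FENCE)]
    return cleaned.strip()

wrapped = f"{FENCE}html\n<html><body><p>Hi</p></body></html>\n{FENCE}"
print(strip_code_fence(wrapped))
print(strip_code_fence("<p>Already clean</p>"))
```

Text that is not fenced passes through unchanged, so the helper could be applied to every response before calling `display(HTML(...))`.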


### 🚀 End-to-End Pipeline

Run this cell to execute the full workflow:

1. Generate a research report (tools).
2. Reflect on the report and produce a revised version.
3. Convert the revised report to HTML.

> You should see the preliminary report, the reflection, and the revised report printed in the console, followed by the rendered HTML below.

In [None]:
# 1) Research with tools
prompt_ = "Radio observations of recurrent novae"
preliminary_report = generate_research_report_with_tools(prompt_)
print("=== Research Report (preliminary) ===\n")
print(preliminary_report)

# 2) Reflect on the report and rewrite it (pass the report text)
reflection_text = reflection_and_rewrite(preliminary_report)
print("=== Reflection on Report ===\n")
print(reflection_text['reflection'], "\n")
print("=== Revised Report ===\n")
print(reflection_text['revised_report'], "\n")


# 3) Convert the revised report to HTML
html = convert_report_to_html(reflection_text['revised_report'])

print("=== Generated HTML (preview) ===\n")
print((html or "")[:600], "\n... [truncated]\n")

# 4) Display full HTML
display(HTML(html))


In [None]:
# Test your code!

# Run unit tests on your graded functions

# Test 1
unittests.test_generate_research_report_with_tools(generate_research_report_with_tools)

# Test 2
unittests.test_reflection_and_rewrite(reflection_and_rewrite)

# Test 3
unittests.test_convert_report_to_html(convert_report_to_html)

### 📌 Expected Output

> **Expected Output:**
> These tests print pass/fail feedback for each check (type, basic behavior, and minimal content).
>
> * `generate_research_report_with_tools` should return a **non-trivial string** (> 50 chars).
> * `reflection_and_rewrite` should return a **dict** with **`reflection`** and **`revised_report`** keys (both strings). The reflection should **mention** the four sections (Strengths, Limitations, Suggestions, Opportunities).
> * `convert_report_to_html` should return a **string that looks like HTML** (e.g., includes `<html>`, `<h1>`, `<p>`, or closing tags).
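
The criteria above can be approximated with a few local checks before running the graded tests. This is only a sketch of the stated expectations, not the actual contents of the `unittests` module; the helper names (`check_report`, `check_reflection`, `looks_like_html`) are hypothetical.

```python
import re

REQUIRED_SECTIONS = ("Strengths", "Limitations", "Suggestions", "Opportunities")

def check_report(report) -> bool:
    """A report should be a non-trivial string (more than 50 characters)."""
    return isinstance(report, str) and len(report.strip()) > 50

def check_reflection(result) -> bool:
    """The result should be a dict with two string values, and the reflection
    should mention all four section names."""
    if not isinstance(result, dict):
        return False
    reflection = result.get("reflection")
    revised = result.get("revised_report")
    if not isinstance(reflection, str) or not isinstance(revised, str):
        return False
    return all(s.lower() in reflection.lower() for s in REQUIRED_SECTIONS)

def looks_like_html(text) -> bool:
    """Loose check: the text contains an <html>, <h1>, or <p> tag (opening or closing)."""
    return isinstance(text, str) and re.search(
        r"</?(html|h1|p)\b[^>]*>", text, re.IGNORECASE
    ) is not None

print(check_report("x" * 60))
print(looks_like_html("<h1>Title</h1><p>Body</p>"))
```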


---

## ✅ Wrap-Up

You built a mini research agent that can:
- 🔎 call tools (arXiv + Tavily),
- 🧠 reflect on its own output,
- 📰 publish a clean HTML report.

Great job!

### What to Submit
- Your notebook with Exercises 1–3 completed.

### Troubleshooting (quick)
- **Model/tool-call loop stalls?** Lower `MAX_TURNS` or print intermediate messages.
- **HTML looks odd?** Re-run conversion with a fresh assistant response.
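
For the first tip, printing whole messages can be noisy. The sketch below shows one way to print a compact, one-line summary per message instead. `summarize_messages` is a hypothetical helper, and `SimpleNamespace` stands in for SDK message objects so the example runs offline.

```python
from types import SimpleNamespace

def summarize_messages(messages: list, width: int = 60) -> list:
    """Return one short line per message: index, role, tool-call count, truncated text."""
    lines = []
    for i, m in enumerate(messages):
        is_dict = isinstance(m, dict)
        role = m.get("role") if is_dict else getattr(m, "role", None)
        content = m.get("content") if is_dict else getattr(m, "content", None)
        tool_calls = m.get("tool_calls") if is_dict else getattr(m, "tool_calls", None)
        text = (content or "").replace("\n", " ")
        if len(text) > width:
            text = text[:width] + "..."
        n_calls = len(tool_calls) if tool_calls else 0
        lines.append(f"[{i}] {role}: tool_calls={n_calls} text={text!r}")
    return lines

demo = [
    {"role": "user", "content": "hi"},
    SimpleNamespace(role="assistant", content=None, tool_calls=["call_a", "call_b"]),
]
for line in summarize_messages(demo):
    print(line)
```

Calling such a helper at the end of each loop iteration makes it easier to see whether the model keeps requesting tools without ever producing a final answer.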

**You’re done—nice work!** 🚀
