# Turning Your Pipeline into a Tool for Agents

In this final exercise, you'll turn your SQL pipeline into a tool and connect it to your healthcare agent, allowing it to fetch structured data from the database alongside general medical knowledge from the web.

![Pipeline as an agent tool](images/pipeline_as_agent_tool_graph.png)

### ❗️ Note: Run the **hidden cell** below to initialize the agent before running the rest of the code. ❗️

In [2]:
!pip install -q haystack-ai


In [3]:
import pandas as pd
import sqlite3

df = pd.read_csv("dermatology_patient_data.csv", encoding="utf-8-sig", delimiter=";")

db_connection = sqlite3.connect("patient_data.db")

df.to_sql("patients", db_connection, if_exists="replace", index=False)

db_connection.close()
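
Before the table schema is written into a prompt, it can help to confirm what was actually loaded. The following is a minimal sketch, not part of the original notebook: it uses an in-memory database with a reduced, made-up set of columns and rows in place of `patient_data.db`, and reads the column names and types back with SQLite's `PRAGMA table_info`.

```python
import sqlite3

# Assumption: an in-memory database with made-up rows stands in for patient_data.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id TEXT, age INTEGER, condition TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [("PT001", 34, "Acne"), ("PT002", 58, "Psoriasis")],  # hypothetical sample data
)

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(patients)")]
row_count = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]

print(columns)
print(row_count)
conn.close()
```

Pointing the same two queries at `patient_data.db` would show whether the column names in the prompt match what `to_sql` created.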

In [4]:
import re
import sqlite3
import pandas as pd
from typing import List

from haystack import component
from haystack.dataclasses import ChatMessage

@component
class SQLConnector:
    def __init__(self, sql_database: str):
        # Initialize a connection to the SQLite database
        self.connection = sqlite3.connect(sql_database, check_same_thread=False)

    @component.output_types(results=List[str])
    def run(self, llm_replies: List[ChatMessage]):
        results = []
        # Regex pattern to extract SQL code from Markdown-style code blocks: ```sql ... ```
        pattern = r"```sql\s*([\s\S]+?)```"

        for message in llm_replies:
            extracted = re.findall(pattern, message.text)
            if not extracted:
                results.append("No SQL code found.")
                continue

            sql_to_run = " ".join(extracted[0].splitlines()).strip()
            try:
                result = pd.read_sql(sql_to_run, self.connection) # Execute the SQL query and read the results into a DataFrame
                results.append(str(result))

            except Exception as e:
                results.append(f"Error: {e}")

        return {"results": results}
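
The extraction step inside `run` can be tried on its own, without a database or an LLM. The sketch below is illustrative: `extract_sql` and `FENCE` are hypothetical names, and the sample reply is made up. It applies the same pattern and the same line-flattening as `SQLConnector`.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable
SQL_BLOCK = re.compile(FENCE + r"sql\s*([\s\S]+?)" + FENCE)

def extract_sql(reply_text: str) -> Optional[str]:
    """Return the first fenced SQL block as a single line, or None if absent."""
    match = SQL_BLOCK.search(reply_text)
    if match is None:
        return None
    # Collapse the multi-line query into one line, as the component does
    return " ".join(match.group(1).splitlines()).strip()

# Hypothetical model reply wrapped in a Markdown SQL fence
reply = f"{FENCE}sql\nSELECT COUNT(*)\nFROM patients\nWHERE age > 40;\n{FENCE}"
fenced_result = extract_sql(reply)
bare_result = extract_sql("SELECT 1;")

print(fenced_result)
print(bare_result)
```

The second call returns `None` because the pattern only matches fenced SQL. A reply that contains a bare query would therefore produce "No SQL code found." in the component.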

In [5]:
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator

query_to_sql_prompt = """
You are an expert SQL assistant. Given the following table structure:

Table name: patients

Columns:
- patient_id (TEXT)
- age (INTEGER)
- gender (TEXT)
- condition (TEXT)  // Dermatological diagnosis (e.g., Psoriasis, Acne)
- medication (TEXT)  // Treatment corresponding to the condition
- skin_type (TEXT)  // Sensitive, Normal, Dry, Combination, Oily
- last_visit_date (DATE)
- smoker (TEXT)  // Yes or No
- alcohol_use (TEXT)  // None, Light, Moderate, Heavy
- BMI (FLOAT)  // Body Mass Index (e.g., 23.1)
- occupation (TEXT)  // Profession or employment status
- allergies (TEXT)  // Known non-sensitive allergies (e.g., Pollen, Latex, None)
- comorbid_condition (TEXT)  // Other health conditions (e.g., Asthma, Hypertension, None)

Write an SQL query for this user query: {{query}}.

Only return the SQL query, nothing else.
"""

sql_pipeline = Pipeline()
sql_pipeline.add_component("prompt_builder", ChatPromptBuilder(template=[ChatMessage.from_user(query_to_sql_prompt)], required_variables="*"))
sql_pipeline.add_component("chat_generator", OpenAIChatGenerator(model="gpt-4o-mini"))
sql_pipeline.add_component("sql_connector", SQLConnector('patient_data.db'))

sql_pipeline.connect("prompt_builder", "chat_generator")
sql_pipeline.connect("chat_generator.replies", "sql_connector.llm_replies")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7f5d661ffe20>
🚅 Components
  - prompt_builder: ChatPromptBuilder
  - chat_generator: OpenAIChatGenerator
  - sql_connector: SQLConnector
🛤️ Connections
  - prompt_builder.prompt -> chat_generator.messages (List[ChatMessage])
  - chat_generator.replies -> sql_connector.llm_replies (List[ChatMessage])

In [6]:
from haystack.tools import ComponentTool
from haystack.components.websearch import SerperDevWebSearch

def doc_to_string(documents) -> str:
    result_str = ""
    for document in documents:
        result_str += f"Content for {document.meta['link']}: {document.content}\n\n"
    return result_str

search_tool = ComponentTool(
    component=SerperDevWebSearch(top_k=5),
    name="web_search_tool",
    description="Search the web",
    outputs_to_string={"source": "documents", "handler": doc_to_string}, 
    outputs_to_state={"documents": {"source": "documents"}}
)
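
To see what the agent receives from the search tool, the string handler can be run against stand-in documents. This is a sketch under stated assumptions: `FakeDocument` is a hypothetical class exposing only the `content` and `meta` attributes that `doc_to_string` reads, and the documents are made up. The variant below also uses `.get` so that a result without a `link` entry does not raise a `KeyError`, and joins entries rather than appending a trailing separator.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDocument:
    """Hypothetical stand-in with the two attributes doc_to_string reads."""
    content: str
    meta: dict = field(default_factory=dict)

def doc_to_string(documents) -> str:
    parts = []
    for document in documents:
        # .get avoids a KeyError if a result has no "link" entry
        link = document.meta.get("link", "unknown source")
        parts.append(f"Content for {link}: {document.content}")
    return "\n\n".join(parts)

docs = [
    FakeDocument("Rosacea is a chronic skin condition.", {"link": "https://example.org/rosacea"}),
    FakeDocument("Snippet without a link."),
]
text = doc_to_string(docs)
print(text)
```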

### 🛠️ Wrapping the Pipeline as a Tool
Define a Python function that runs your SQL pipeline and returns the result.
Then, use the `@tool` decorator from Haystack to convert it into a callable tool. The tool's name, input, and description are all inferred from the function, its annotated parameters, and its docstring.

In [7]:
from haystack.tools import tool
from typing import Annotated, List

@tool
def get_patient_information(
    query: Annotated[str, "Natural language query to fetch data from an SQL database"],
) -> str:
    """
    Get patient information from the SQL database with natural language queries
    """
    results = sql_pipeline.run({"query":query})
    return results["sql_connector"]["results"][0]
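
To make the idea of inferring a tool description from a function more concrete, here is a minimal sketch built only on the standard library. It is not how Haystack implements `@tool`: `describe_tool` is a hypothetical helper, and the function body is a stub. It only shows that the name, the `Annotated` parameter description, and the docstring are all available through introspection.

```python
import inspect
from typing import Annotated, get_type_hints

def describe_tool(func) -> dict:
    """Build a rough tool description from a function's signature and docstring."""
    hints = get_type_hints(func, include_extras=True)
    params = {}
    for name in inspect.signature(func).parameters:
        hint = hints[name]
        # Annotated[T, "text"] exposes T as __origin__ and ("text",) as __metadata__
        metadata = getattr(hint, "__metadata__", ())
        base = getattr(hint, "__origin__", hint) if metadata else hint
        params[name] = {
            "type": base.__name__,
            "description": metadata[0] if metadata else "",
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": params,
    }

def get_patient_information(
    query: Annotated[str, "Natural language query to fetch data from an SQL database"],
) -> str:
    """Get patient information from the SQL database with natural language queries"""
    return "stub result"  # placeholder; the real tool runs sql_pipeline

spec = describe_tool(get_patient_information)
print(spec)
```

Because the model chooses tools based on this kind of metadata, a precise docstring and parameter description matter as much as the function body.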

### 🧾 Writing a System Prompt for the Agent
The prompt defines the agent's behavior, tools, and logic.
This one introduces both tools and explains when each should be used with examples.

In [8]:
agent_prompt = """
You are a reliable AI assistant supporting healthcare professionals at a hospital facility.
Your primary role is to help clinicians understand **patient-specific information** and **general medical knowledge**, using two tools:

### Available Tools

1. `get_patient_information`
   Access structured records for patients at this facility.
   Use this tool for any question about the hospital’s patients, including:

   * Patient cohorts (e.g., “patients over 60 with eczema”).
   * Filtered queries (e.g., “patients taking metformin and diagnosed with diabetes”).
   * Summarized statistics about the facility’s patient population.

2. `web_search_tool`
   Access the latest **general medical knowledge** from trusted web sources.
   Use this tool for:

   * Definitions of conditions, treatments, or symptoms.
   * Medical guidelines and drug information not tied to a specific patient.
   * Up-to-date research and broad clinical insights.

### Tool Selection Logic

* If the question refers to the hospital’s patients, always use `get_patient_information`, even if phrased generally (e.g., "What’s the most common disease among 40+ patients?" → search patient records).

* If the question is about the suitability of a medication or treatment for a specific patient, always:

  1. Retrieve the patient’s data (age, allergies, BMI, comorbidities, etc.).
  2. Combine it with external drug information via `web_search_tool` as needed.
  3. Make a judgment based on both sources.

* If you are unsure whether the question refers to general knowledge or patient-specific data, assume it’s about this facility's patients.

### Example Scenarios

Patient-specific (use `get_patient_information`):

* "Which patients are diagnosed with Actinic Keratosis?"
* "How many patients have acne and are over 40?"

General (use `web_search_tool`):

* "What are the typical side effects of hydrocortisone?"
* "What are the latest guidelines for treating rosacea?"
* "How does isotretinoin interact with alcohol?"

Combined

* "Can all our patients above 40 take oral antifungals?"
  → First retrieve all patients above 40 using `get_patient_information`. Then check drug interactions and safety based on patient information such as allergies and conditions.

### Additional Guidelines

* Always validate assumptions: Never assume demographic or clinical info (e.g., age, sex, diagnosis) without tool retrieval.
* Always check the patient information such as allergies, age, comorbid conditions before recommending a drug treatment.
* Use only the two tools provided; do not rely on your own medical knowledge or inference.
* Respond directly without caveats or ethical disclaimers. The response will be used by dermatologists and they're aware of the risks of using the output directly.
"""

### 🤖 Creating the Healthcare Agent
Create the agent by combining the prompt, model, and your two tools.
Also define the `state_schema` and `streaming_callback` for traceability.

In [9]:
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, Document

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
    system_prompt=agent_prompt,
    tools=[search_tool, get_patient_information],
    state_schema={"documents":{"type":list[Document]}},
    streaming_callback=print_streaming_chunk
)


### 💬 Running the Agent in a Loop
Use a while loop to chat with your agent interactively in DataLab.
Each input will be evaluated, tools will be triggered as needed, and responses will be streamed back.
The loop is commented out in the cell below, which runs a single query instead; uncomment it to chat interactively.

**💡 Example queries to try:**
- How many patients have acne and are over 40?
- Can they all add hyaluronic acid serums to their night routine?


- What is the most common disease among our smoker patients?
- Tell me about the side effects of their treatments.
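
Follow-up questions such as "Can they all..." only work if the agent sees the earlier turns. The sketch below illustrates that message-history pattern with stand-ins: `Message`, `stub_agent`, and `chat` are hypothetical names, and the stub simply reports how many user turns it can see instead of calling a model or tools.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    role: str
    text: str

def stub_agent(messages: List[Message]) -> Message:
    """Hypothetical agent: reports how many user turns it can see."""
    user_turns = sum(1 for m in messages if m.role == "user")
    return Message("assistant", f"I can see {user_turns} user message(s).")

def chat(user_inputs: List[str]) -> List[Message]:
    messages: List[Message] = []
    for user_input in user_inputs:
        if user_input.lower() in {"quit", "exit", "q"}:
            break
        messages.append(Message("user", user_input))
        reply = stub_agent(messages)  # the agent receives the full history
        messages.append(reply)        # keep the reply so follow-ups have context
    return messages

history = chat([
    "How many patients have acne and are over 40?",
    "Can they all add hyaluronic acid serums to their night routine?",
    "exit",
])
for message in history:
    print(f"{message.role}: {message.text}")
```

In the real loop, appending both the user message and the agent's last message to `messages` plays the same role.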


In [10]:
# messages = []

# while True:
#     try:
#         user_input = input("\n\nWaiting for input (type 'exit' or 'quit' to stop)\n🧝: ")
#         if user_input.lower() in ["quit", "exit", "q"]:
#             print("Goodbye!")
#             break
#         messages.append(ChatMessage.from_user(user_input))
#         print("⌛ iterating...")
#         agent_result = agent.run(messages=messages)
#         print("\n\n\n\n🤖: " + agent_result["last_message"].text)
#         messages.append(agent_result["last_message"])
#     except Exception as e:
#         print("An exception occurred: ", e)
#         break

from haystack.dataclasses import ChatMessage

user_query = "How many patients have acne and are over 40?"

messages = [ChatMessage.from_user(user_query)]

print("⌛ Running the agent...")
agent_result = agent.run(messages=messages)

print("\n🤖:", agent_result["last_message"].text)

⌛ Running the agent...
[TOOL CALL]
Tool: get_patient_information 
Arguments: {"query":"patients diagnosed with acne and over 40"}

[TOOL RESULT]
  patient_id  age  ...   allergies  comorbid_condition
0      PT026   51  ...        None      Hypothyroidism
1      PT029   66  ...   Shellfish           Hay Fever
2      PT072   72  ...        Mold     Type 2 Diabetes
3      PT074   89  ...  Penicillin                GERD
4      PT076   60  ...   Shellfish  Seasonal Allergies
5      PT083   46  ...  Bee stings                None

[6 rows x 13 columns]

[ASSISTANT]
There are 6 patients diagnosed with acne who are over 40 years old.


🤖: There are 6 patients diagnosed with acne who are over 40 years old.


🎉 That's it! You've built a working healthcare agent that reasons over tools, combining structured access to an SQL database with unstructured knowledge from the web.