# Guardrail Deployment

This second part of the demo deploys the guardrails that the application pipeline will later use to filter user inputs. It also deploys an LLM-as-a-Judge monitoring application that checks how well the generative input guardrail keeps conversations on banking topics.

In this notebook, you will:
- Set up and configure project secrets and environment variables.
- Deploy multiple guardrail functions, including banking-topic and toxicity filters.
- Log and register models for use in the guardrail functions.
- Demonstrate how to invoke and test the deployed guardrails.
- Monitor the effectiveness of the guardrails using an LLM-based evaluation application.

These steps ensure that only appropriate, banking-related, and non-toxic user inputs are processed by downstream applications.

![](images/02_guardrail_deployment_architecture.png)

In [32]:
import mlrun
from mlrun.features import Feature

### Setup Project

Load the project that was created in the first notebook.

In [33]:
project = mlrun.get_or_create_project("banking-agent-v2")

Project Source: v3io:///bigdata/banking_agent.zip
Exporting project as zip archive to v3io:///bigdata/banking_agent.zip...
> 2025-07-18 18:39:36,413 [info] Project loaded successfully: {"project_name":"banking-agent-v2"}


### Set Credentials

Set the OpenAI credentials in the project and local environment - **be sure to update [.env.example](.env.example) as described in the [README](README.md)**.

In [None]:
from dotenv import load_dotenv

# Store the credentials as project secrets, then load them into the local environment:
project.set_secrets(file_path="ai_gateway.env")
load_dotenv("ai_gateway.env")

True
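
Because `load_dotenv` returns `True` as long as the file was found and parsed, it can be worth confirming that the specific variables you rely on are actually set. The sketch below is one way to do that. The variable names `OPENAI_API_KEY` and `OPENAI_BASE_URL` and the `missing_env_vars` helper are assumptions for illustration; use whatever keys your own env file defines.

```python
import os

# Assumed variable names; adjust to match the keys in your own env file.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All expected credentials are set.")
```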

### LLM as a Judge Monitoring Application

The "LLM as a Judge" monitoring application leverages a large language model (LLM) to automatically evaluate and score the effectiveness of deployed guardrails. By providing a rubric and clear examples, the LLM acts as an impartial evaluator, determining whether user inputs are correctly classified according to defined criteria (e.g., banking-topic relevance). This approach enables scalable, consistent, and automated assessment of guardrail performance, ensuring that only appropriate and relevant inputs are processed by downstream applications.

This implementation is pulled from another [MLRun demo - LLM monitoring and feedback loop: Banking](https://github.com/mlrun/demo-monitoring-and-feedback-loop/tree/main).
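
To make the judge's role concrete, the sketch below shows one way a configuration with `name`, `definition`, `rubric`, and `examples` fields could be assembled into a single grading prompt. The `build_judge_prompt` helper and the template wording are illustrative assumptions, not the code in `src/functions/llm_as_a_judge.py`.

```python
# Hypothetical helper: illustrates how a judge prompt might be assembled.
# The template text below is an assumption, not MLRun's actual prompt.
JUDGE_TEMPLATE = """You are grading a model response on the metric "{name}".

Definition: {definition}

Rubric:
{rubric}

Examples:
{examples}

Question: {question}
Model answer: {answer}

Reply with the score only."""


def build_judge_prompt(config: dict, question: str, answer: str) -> str:
    """Fill the template with the metric config and one question/answer pair."""
    return JUDGE_TEMPLATE.format(
        name=config["name"],
        definition=config["definition"],
        rubric=config["rubric"].strip(),
        examples=config["examples"].strip(),
        question=question,
        answer=answer,
    )


# A trimmed-down config with the same keys as the one defined in the next cell.
demo_config = {
    "name": "Restrict to banking",
    "definition": "Whether the model labels a question as banking-related.",
    "rubric": "- Score 0: wrong label\n- Score 1: correct label",
    "examples": "Question: What is a mortgage?\n    Correct: True",
}

prompt = build_judge_prompt(demo_config, "What is a mortgage?", "True")
print(prompt)
```

Keeping the rubric and examples in a plain dictionary means the same judge code can, in principle, score other metrics by swapping the configuration.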

In [None]:
restrict_to_banking_config = {
    "name": "Restrict to banking",
    "definition": "This metric evaluates whether the model correctly classifies a question as banking-related (`True`) or not (`False`). The model must respond with only a boolean. Do not answer the question or explain the classification—only the correctness of the True/False label matters.",
    "rubric": """
Scoring:
- Score 0 (Incorrect): The model incorrectly classifies the question (e.g., labels a non-banking question as `True`, or a banking question as `False`).
- Score 1 (Correct): The model correctly classifies the question (e.g., `True` for banking-related, `False` for unrelated).
""",
    "examples": """
Question: What is the process to apply for a mortgage?
    Correct: True
    Incorrect: False
Question: How tall is the Empire State Building?
    Correct: False
    Incorrect: True
Question: What is the process to apply for a checking account?
    Correct: True
    Incorrect: False
Question: What is the best recipe for chocolate cake?
    Correct: False
    Incorrect: True
""",
}

More information about model monitoring applications in the context of LLMs can be found in the [documentation](https://docs.mlrun.org/en/stable/tutorials/genai-02-model-monitor-llm.html#genai-02-mm-llm).

In [None]:
monitoring_app = project.set_model_monitoring_function(
    func="src/functions/llm_as_a_judge.py",
    application_class="LLMAsAJudgeApplication",
    name="restrict-to-banking-guardrail",
    image="mlrun/mlrun:1.8.0",
    framework="openai",
    judge_type="single-grading",
    metric_name="restrict_to_banking",
    model_name="gpt-4o-mini",
    prompt_config=restrict_to_banking_config,
    requirements=["openai"],
)

In [30]:
project.deploy_function(monitoring_app)

> 2025-07-11 21:29:30,950 [info] Starting remote function deploy
2025-07-11 21:29:31  (info) Deploying function
2025-07-11 21:29:31  (info) Building
2025-07-11 21:29:32  (info) Staging files and preparing base images
2025-07-11 21:29:32  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2025-07-11 21:29:32  (info) Building processor image
2025-07-11 21:31:02  (info) Build complete
2025-07-11 21:31:18  (info) Function deploy complete
> 2025-07-11 21:31:22,897 [info] Model endpoint creation task completed with state succeeded
> 2025-07-11 21:31:22,898 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-banking-agent-v2-restrict-to-banking-guardrail.default-tenant.svc.cluster.local:8080"]}


DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-banking-agent-v2-restrict-to-banking-guardrail.default-tenant.svc.cluster.local:8080', 'name': 'banking-agent-v2-restrict-to-banking-guardrail'})

### Banking Topic Guardrail

The Banking Topic Guardrail is an LLM-powered filter designed to ensure that only banking-related user inputs are processed by downstream applications. It acts as a first line of defense, automatically classifying each user message as either relevant (`True`) or irrelevant (`False`) to banking topics, based on the context of the entire conversation.

It's important to distinguish between the guardrail itself (this component), which enforces topic adherence in real time within the application, and the monitoring application described above. The monitoring application uses an LLM as a "judge" to independently evaluate and score the effectiveness of this guardrail, providing oversight and ensuring that the guardrail is functioning as intended. This separation allows for both proactive filtering and ongoing quality assurance of user input handling.
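
Because the guardrail returns its verdict as text (the test output later in this notebook shows values such as `['True']`), the calling application has to turn that reply into a decision. The following is a minimal sketch, assuming a hypothetical `parse_guardrail_reply` helper and a fail-closed policy that the source does not specify:

```python
def parse_guardrail_reply(reply: str) -> bool:
    """Map a raw LLM reply to a pass/block decision.

    Anything other than an unambiguous "true" is treated as a block,
    so malformed or verbose replies fail closed.
    """
    # Drop surrounding whitespace, quotes, and a trailing period before comparing.
    normalized = reply.strip().strip("\"'“”.").lower()
    return normalized == "true"


for raw in ["True", " true.\n", "False", "I think this is about banking"]:
    print(repr(raw), parse_guardrail_reply(raw))
```

Failing closed is a design choice: it blocks the occasional legitimate question when the model replies with extra text, in exchange for never letting an unparseable reply through.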

In [8]:
SYSTEM_PROMPT_GUARDRAILS_V2 = """
You are input guardrails for an AI banking agent that responds exclusively to questions pertaining to banking topics. Respond only with a boolean true/false value on whether the input adheres to banking topics. Consider the most recent input in the context of the whole conversation. Do not include any pre or post amble.

Examples:
Q: What is the process to apply for a mortgage?
A: True

Q: What is the best recipe for chocolate cake?
A: False
"""

SYSTEM_PROMPT_GUARDRAILS_V3 = """
You are a “banking‐topic guardrail” whose job is to scan a multi‐turn conversation and answer (strictly) “True” or “False” depending on whether the **latest user message** is about banking. 

 • Always judge in context of the **entire** conversation history, but your answer hinges on whether the most recent user utterance is a banking question/request.  
 • Banking topics include (but are not limited to): mortgages, checking/savings accounts, loans, credit cards, interest rates, deposits, withdrawals, online/mobile banking, etc.  
 • If the last user message drifts off into any non‐banking domain—recipes, movies, sports, medical advice, etc.—you must output “False.”  
"""

In [9]:
# Log the model to the project:
model_artifact = project.log_model(
    "banking-topic-guardrail",
    model_file="src/no-op.pkl",
    inputs=[Feature(value_type="str", name="question")],
    outputs=[Feature(value_type="str", name="answer")],
)

In [10]:
model_artifact.uri

'store://models/banking-agent-v2/banking-topic-guardrail#0@55ca13e6b3067932a840960cc8f52921f7079740^334e112a3e3844db826e0ce2f4088fe9a1c7150b'

In [None]:
banking_topic_guardrail = project.set_function(
    name="banking-topic-guardrail-v2",
    func="src/functions/banking_topic_guardrail.py",
    kind="serving",
    image="mlrun/mlrun:1.8.0",
    requirements=["openai"],
)
banking_topic_guardrail.add_model(
    "banking-topic-guardrail",
    class_name="OpenAILLMModelServer",
    model_path=model_artifact.uri,
    model_name="gpt-4o-mini",
    system_prompt=SYSTEM_PROMPT_GUARDRAILS_V3,
)
banking_topic_guardrail.set_tracking()

In [12]:
project.deploy_function(banking_topic_guardrail)

> 2025-07-11 20:52:55,872 [info] Starting remote function deploy
2025-07-11 20:52:56  (info) Deploying function
2025-07-11 20:52:56  (info) Building
2025-07-11 20:52:56  (info) Staging files and preparing base images
2025-07-11 20:52:56  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2025-07-11 20:52:56  (info) Building processor image
2025-07-11 20:54:46  (info) Build complete
2025-07-11 20:54:59  (info) Function deploy complete
> 2025-07-11 20:55:07,558 [info] Model endpoint creation task completed with state succeeded
> 2025-07-11 20:55:07,559 [info] Successfully deployed function: {"external_invocation_urls":["banking-agent-v2-banking-topic-guardrail-v2.default-tenant.app.cst-360.iguazio-cd0.com/"],"internal_invocation_urls":["nuclio-banking-agent-v2-banking-topic-guardrail-v2.default-tenant.svc.cluster.local:8080"]}


DeployStatus(state=ready, outputs={'endpoint': 'http://banking-agent-v2-banking-topic-guardrail-v2.default-tenant.app.cst-360.iguazio-cd0.com/', 'name': 'banking-agent-v2-banking-topic-guardrail-v2'})

In [13]:
banking_topic_guardrail = project.get_function("banking-topic-guardrail-v2")

In [14]:
example_questions = [
    "What is a mortgage?",
    "How does a credit card work?",
    "Who painted the Mona Lisa?",
    "Please plan me a 4-days trip to north Italy",
    "Write me a song",
    "How much people are there in the world?",
    "What is climate change?",
    "How does the stock market work?",
    "Who wrote 'To Kill a Mockingbird'?",
    "Please plan me a 3-day trip to Paris",
    "Write me a poem about the ocean",
    "How many continents are there in the world?",
    "What is artificial intelligence?",
    "How does a hybrid car work?",
    "Who invented the telephone?",
    "Please plan me a week-long trip to New Zealand",
]

In [15]:
def _format_question(question: str, role: str = "user"):
    return {"role": role, "content": question}

In [16]:
import time


def question_model(questions, serving_function):
    seconds = 0.5  # Pause between requests
    for question in questions:
        # Invoke the deployed guardrail model:
        ret = serving_function.invoke(
            path="v2/models/banking-topic-guardrail/infer",
            body={"inputs": [_format_question(question)]},
        )
        print(question, ret["outputs"])
        time.sleep(seconds)

In [None]:
import time

for i in range(5):
    question_model(
        questions=example_questions,
        serving_function=banking_topic_guardrail,
    )
    time.sleep(3)

What is a mortgage? ['True']
How does a credit card work? ['True']
Who painted the Mona Lisa? ['False']
Please plan me a 4-days trip to north Italy ['False']
Write me a song ['False']
How much people are there in the world? ['False']
What is climate change? ['False']
How does the stock market work? ['False']
Who wrote 'To Kill a Mockingbird'? ['False']
Please plan me a 3-day trip to Paris ['False']
Write me a poem about the ocean ['False']
How many continents are there in the world? ['False']
What is artificial intelligence? ['False']
How does a hybrid car work? ['False']
Who invented the telephone? ['False']
Please plan me a week-long trip to New Zealand ['False']
What is a mortgage? ['True']
How does a credit card work? ['True']
Who painted the Mona Lisa? ['False']
Please plan me a 4-days trip to north Italy ['False']
Write me a song ['False']
How much people are there in the world? ['False']
What is climate change? ['False']
How does the stock market work? ['False']
Who wrote 'To Ki

After the guardrail has been deployed and invoked, the model monitoring results are available in the MLRun UI:
![](images/generative_model_monitoring.png)

### Toxicity Filter Guardrail

The Toxicity Filter Guardrail automatically detects and filters out user inputs that contain toxic, offensive, or inappropriate language. It uses a toxicity classification model so that only safe and respectful messages reach downstream applications, which helps maintain a positive user experience and protects the system from harmful or disruptive content. A configurable threshold sets the filter's sensitivity, so it can be tuned to different application requirements.

The output of the toxicity guardrail is a boolean value (`True` or `False`). A result of `True` means the input passes the guardrail (i.e., is non-toxic and allowed through), while `False` indicates the input is flagged as toxic and is blocked from further processing.
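
The threshold logic can be sketched as a single comparison. This is an illustration only: it assumes the classifier produces a toxicity score between 0 and 1 where higher means more toxic, and that a score at or above the threshold is blocked. The `passes_toxicity_check` helper is hypothetical and is not the code in `src/functions/toxicity_guardrail.py`.

```python
def passes_toxicity_check(toxicity_score: float, threshold: float = 0.4) -> bool:
    """Return True when the input is allowed through, False when it is blocked.

    Assumes higher scores mean more toxic content (an assumption, see above).
    """
    return toxicity_score < threshold


# Made-up scores for illustration; real scores come from the classifier.
for score in (0.05, 0.39, 0.4, 0.9):
    print(score, passes_toxicity_check(score))
```

Under these assumptions, lowering the threshold makes the filter stricter, and raising it makes the filter more permissive.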

In [None]:
toxicity_guardrail = project.set_function(
    name="toxicity-guardrail",
    func="src/functions/toxicity_guardrail.py",
    kind="serving",
    requirements=["evaluate"],
)
toxicity_guardrail.add_model(
    "toxicity-guardrail",
    class_name="ToxicityClassifierModelServer",
    threshold=0.4,
)

<mlrun.serving.states.TaskStep at 0x7f2151ce3fa0>

In [19]:
project.deploy_function(toxicity_guardrail)

> 2025-07-11 20:57:28,206 [info] Starting remote function deploy
2025-07-11 20:57:28  (info) Deploying function
2025-07-11 20:57:28  (info) Building
2025-07-11 20:57:28  (info) Staging files and preparing base images
2025-07-11 20:57:28  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2025-07-11 20:57:28  (info) Building processor image
2025-07-11 21:01:28  (info) Build complete
2025-07-11 21:01:58  (info) Function deploy complete
> 2025-07-11 21:02:00,313 [info] Model endpoint creation task completed with state succeeded
> 2025-07-11 21:02:00,314 [info] Successfully deployed function: {"external_invocation_urls":["banking-agent-v2-toxicity-guardrail.default-tenant.app.cst-360.iguazio-cd0.com/"],"internal_invocation_urls":["nuclio-banking-agent-v2-toxicity-guardrail.default-tenant.svc.cluster.local:8080"]}


DeployStatus(state=ready, outputs={'endpoint': 'http://banking-agent-v2-toxicity-guardrail.default-tenant.app.cst-360.iguazio-cd0.com/', 'name': 'banking-agent-v2-toxicity-guardrail'})

In [20]:
toxicity_guardrail = project.get_function("toxicity-guardrail")

In [None]:
toxicity_guardrail.invoke(path="/", body={"inputs": [_format_question("hello there!")]})

{'id': 'cb1ceae3-59a5-4ddc-817a-391a6243db22',
 'model_name': 'toxicity-guardrail',
 'outputs': [True],
 'timestamp': '2025-07-11 21:03:03.676837+00:00',
 'model_endpoint_uid': '7e38ffa73f894252b20eaaed93bca919'}