```{contents}
```

## Guardrails

### 1. Definition

**Guardrails** are **control mechanisms** placed around machine-learning systems—especially Large Language Models (LLMs)—to **ensure safe, reliable, compliant, and goal-aligned behavior** during training, deployment, and usage.

They enforce:

* **Safety**
* **Correctness**
* **Policy compliance**
* **User intent alignment**
* **Operational constraints**

Formally, guardrails define a constraint system

$$
\mathcal{G} : (x, y, c) \rightarrow \{ \text{allow}, \text{modify}, \text{block} \}
$$

where:

* $x$ = user input
* $y$ = model output
* $c$ = contextual rules and policies
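
The mapping above can be read as a function that takes the input, the output, and a policy context, and returns one of three decisions. The following is a minimal sketch under stated assumptions: the `Policy` class, its two fields, and the substring matching are hypothetical stand-ins for whatever rules a real deployment would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Tuple


class Decision(Enum):
    ALLOW = "allow"
    MODIFY = "modify"
    BLOCK = "block"


@dataclass(frozen=True)
class Policy:
    """Hypothetical context c: terms that trigger a block, terms to mask."""
    banned_terms: FrozenSet[str]   # expected in lowercase
    masked_terms: FrozenSet[str]   # matched case-sensitively in the output


def guardrail(x: str, y: str, c: Policy) -> Tuple[Decision, Optional[str]]:
    """Evaluate G(x, y, c) and return (decision, output to deliver)."""
    text = f"{x} {y}".lower()
    # Block if either the input or the output contains a banned term.
    if any(term in text for term in c.banned_terms):
        return Decision.BLOCK, None
    # Modify if the output contains a term that should be masked.
    modified = y
    for term in c.masked_terms:
        modified = modified.replace(term, "[MASKED]")
    if modified != y:
        return Decision.MODIFY, modified
    return Decision.ALLOW, y


policy = Policy(
    banned_terms=frozenset({"build a bomb"}),
    masked_terms=frozenset({"ACME-INTERNAL"}),
)
decision, delivered = guardrail("What is our codename?", "It is ACME-INTERNAL.", policy)
```

Returning the decision together with the text keeps the three outcomes explicit, so callers do not have to infer a block from the content of a string.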

---

### 2. Why Guardrails Are Needed

| Problem           | Without Guardrails   | With Guardrails                   |
| ----------------- | -------------------- | --------------------------------- |
| Hallucinations    | Delivered unchecked  | Reduced by grounding checks       |
| Unsafe content    | Can reach the user   | Filtered or blocked in most cases |
| Prompt injection  | No dedicated defense | Often detected and mitigated      |
| Policy violations | Unchecked            | Caught by policy enforcement      |
| Regulatory risk   | High                 | Reduced and auditable             |

---

### 3. Guardrails Architecture

```
User Input
   │
   ▼
[Input Guardrails] ──▶ [Model] ──▶ [Output Guardrails]
   │                                   │
   └───────[Context & Policy Engine]───┘
```

---

### 4. Types of Guardrails

#### A. Input Guardrails

Validate and sanitize user input.

**Functions**

* Prompt injection detection
* Toxicity filtering
* Schema validation
* Intent classification

**Example**

```python
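def input_guardrail(prompt):
    """Return the prompt unchanged, or a "Blocked: ..." message if rejected."""
    # detect_injection and is_toxic are assumed to be defined elsewhere
    # (for example, as classifier wrappers that return a bool).
    if detect_injection(prompt):
        return "Blocked: Prompt Injection"
    if is_toxic(prompt):
        return "Blocked: Unsafe Content"
    return prompt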
```
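
The example above calls `detect_injection` and `is_toxic` without defining them. One possible way to stub them out is with simple pattern matching, as sketched below. The pattern list and the word list are purely illustrative assumptions; a production system would more likely rely on trained classifiers, since hand-written patterns are easy to evade.

```python
import re

# Illustrative patterns only (assumption): a few common injection phrasings.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your |the )?system prompt", re.I),
    re.compile(r"you are now in developer mode", re.I),
]

# Placeholder word list standing in for a real toxicity model.
TOXIC_WORDS = {"idiot", "moron"}


def detect_injection(prompt: str) -> bool:
    """Return True if any known injection phrasing appears in the prompt."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)


def is_toxic(prompt: str) -> bool:
    """Return True if the prompt contains a whole word from TOXIC_WORDS."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return not words.isdisjoint(TOXIC_WORDS)
```

Matching whole words rather than substrings avoids flagging harmless text such as "idiomatic".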

---

#### B. Output Guardrails

Validate generated responses.

**Functions**

* Fact-checking
* Policy enforcement
* PII redaction
* Safety filtering

```python
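def output_guardrail(response):
    """Redact PII, then block the response if it still violates policy."""
    # contains_pii, redact_pii, and violates_policy are assumed helpers.
    if contains_pii(response):
        response = redact_pii(response)
    if violates_policy(response):
        return "Blocked: Policy Violation"
    return response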
```
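
To make the PII step concrete, here is a minimal sketch of what `contains_pii` and `redact_pii` might look like. It assumes PII means only email addresses and US-style phone numbers, and the two regular expressions are simplified for illustration; real PII detection covers many more categories and usually combines patterns with a named-entity model.

```python
import re

# Simplified patterns (assumptions): email addresses and 3-3-4 digit phones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def contains_pii(text: str) -> bool:
    """Return True if the text matches either PII pattern."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))


def redact_pii(text: str) -> str:
    """Replace each match with a category placeholder."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


redacted = redact_pii("Mail jane.doe@example.com or call 555-123-4567.")
```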

---

#### C. Behavioral Guardrails

Ensure the model follows the desired behavior.

* Role enforcement
* Style constraints
* Refusal policies
* Scope control
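
As a rough illustration of how these items can combine, the sketch below expresses role and style as a system prompt and enforces scope and refusal with a check after generation. The topic set, the refusal text, and both function names are hypothetical, and the `topic` argument is assumed to come from an upstream intent classifier.

```python
ALLOWED_TOPICS = {"billing", "shipping", "returns"}  # hypothetical scope

REFUSAL = "Sorry, I can only help with billing, shipping, or returns."


def build_system_prompt(role: str, style: str) -> str:
    """Role enforcement and style constraints expressed as a system prompt."""
    topics = ", ".join(sorted(ALLOWED_TOPICS))
    return (
        f"You are {role}. Answer in a {style} tone. "
        f"Only discuss these topics: {topics}. "
        "Politely refuse anything else."
    )


def enforce_scope(topic: str, draft_response: str) -> str:
    """Scope control: replace out-of-scope drafts with a fixed refusal."""
    if topic not in ALLOWED_TOPICS:
        return REFUSAL
    return draft_response
```

The prompt only asks the model to stay in scope, while `enforce_scope` checks the result regardless of what the model produced. Using both gives a fallback when the instruction alone is not followed.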

---

#### D. Knowledge Guardrails

Reduce hallucinations and keep answers grounded in retrieved sources.

* Retrieval-based grounding
* Citation enforcement
* Confidence thresholds

```python
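def knowledge_guardrail(answer, threshold=0.7):
    if answer.confidence < threshold:  # confidence assumed to lie in [0, 1]
        return "Insufficient evidence"
    return answer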
```
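
Citation enforcement can be sketched in a similar spirit. The function below assumes a hypothetical convention in which the model cites retrieved documents with bracketed numbers such as `[1]`, placed before the sentence's closing punctuation. It returns every sentence that lacks a citation pointing at one of the retrieved documents. It checks only that a citation exists, not that the cited document actually supports the claim.

```python
import re
from typing import List


def uncited_sentences(answer: str, num_docs: int) -> List[str]:
    """Return sentences lacking a valid [n] citation with 1 <= n <= num_docs."""
    # Naive sentence split: break on whitespace that follows ., !, or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    missing = []
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not any(1 <= n <= num_docs for n in cited):
            missing.append(sentence)
    return missing


draft = (
    "Paris is the capital of France [1]. "
    "It has two million residents. "
    "The Seine runs through it [3]."
)
problems = uncited_sentences(draft, num_docs=2)
```

Here the second sentence has no citation and the third cites a document index that was never retrieved, so both would be flagged.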

---

#### E. System Guardrails

Enforce operational constraints on the deployed system.

* Rate limiting
* Logging & auditing
* Cost control
* Abuse monitoring
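
Rate limiting is the easiest of these to show in code. The following is a minimal in-memory sketch of a per-user sliding-window limiter. The class name and parameters are assumptions made for illustration, and a real deployment would typically keep this state in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per user within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._history = defaultdict(deque)  # user_id -> request timestamps

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        timestamps = self._history[user_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```

A caller would check `limiter.allow(user_id)` before invoking the model and return an error or retry hint when it is `False`.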

---

### 5. Guardrails vs Alignment

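| Concept    | Purpose                                                        |
| ---------- | -------------------------------------------------------------- |
| Alignment  | Train the model so that it *tends* to behave correctly         |
| Guardrails | **Enforce** correct behavior from outside the model at runtime |

Guardrails provide **hard constraints** that are checked on every request, whereas alignment provides **soft constraints** learned during training.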

---

### 6. Guardrails Workflow in Practice

```
1. Receive user query
2. Apply Input Guardrails
3. Retrieve grounding knowledge
4. Generate response
5. Apply Output Guardrails
6. Log decision & deliver response
```
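
The six steps can be sketched as a single function that takes each stage as a callable. This is an illustrative skeleton only: `run_pipeline`, its parameter names, and the log message format are assumptions, and the lambdas in the usage example are trivial stand-ins for a real retriever, model, and guardrail checks.

```python
import logging
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guardrails")


def run_pipeline(
    user_query: str,
    check_input: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    check_output: Callable[[str], bool],
) -> str:
    # Steps 1-2: receive the query and apply input guardrails.
    if not check_input(user_query):
        log.info("decision=block stage=input")
        return "Request blocked by input guardrails."
    # Step 3: retrieve grounding knowledge.
    docs = retrieve(user_query)
    # Step 4: generate a response.
    response = generate(user_query, docs)
    # Step 5: apply output guardrails.
    if not check_output(response):
        log.info("decision=block stage=output")
        return "Response withheld by output guardrails."
    # Step 6: log the decision and deliver the response.
    log.info("decision=allow docs=%d", len(docs))
    return response


answer = run_pipeline(
    "What is the refund window?",
    check_input=lambda q: "ignore previous instructions" not in q.lower(),
    retrieve=lambda q: ["Refunds are accepted within 30 days."],
    generate=lambda q, docs: f"According to policy: {docs[0]}",
    check_output=lambda r: "password" not in r.lower(),
)
```

Because a blocked input returns before retrieval, neither the retriever nor the model ever sees a rejected query.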

---

### 7. Example: Guardrails in a RAG System

```python
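query = input_guardrail(user_query)

if query.startswith("Blocked:"):
    # The input was rejected: skip retrieval and generation entirely.
    final_answer = query
else:
    # retriever and llm are assumed to be pre-configured objects.
    docs = retriever.retrieve(query)
    response = llm.generate(query, docs)
    final_answer = output_guardrail(response)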
```

---

### 8. Guardrail Techniques

| Technique          | Purpose                             |
| ------------------ | ----------------------------------- |
| Rule-based filters | Fast policy enforcement             |
| Classifiers        | Detect unsafe or disallowed content |
| LLM-as-judge       | Flexible semantic evaluation        |
| Schema validation  | Structured output enforcement       |
| Confidence scoring | Hallucination control               |
| Human-in-the-loop  | High-risk decisions                 |
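
As one example from the table, schema validation can be sketched with the standard library alone. The field names and types in `REQUIRED_FIELDS` are a hypothetical output contract chosen for illustration; projects often use a dedicated validation library instead of hand-written checks.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list, "confidence": float}


def validate_output(raw: str) -> dict:
    """Parse model output as JSON and check required fields and types.

    Raises ValueError when the output does not match the expected shape.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"field {field!r} must be {expected.__name__}")
    return data


parsed = validate_output('{"answer": "42", "sources": ["doc1"], "confidence": 0.9}')
```

On a `ValueError`, a caller might retry generation with the error message appended to the prompt, or fall back to a blocked response.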

---

### 9. Evaluation Metrics

* **Violation Rate**
* **False Positive Rate**
* **User Satisfaction**
* **Hallucination Rate**
* **Compliance Accuracy**
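
The list does not define these metrics, so the sketch below picks one plausible definition for the first two and labels it as an assumption: violation rate is the share of delivered (not blocked) responses that a reviewer marked harmful, and false positive rate is the share of benign requests that the guardrail blocked. The `Record` type and the availability of ground-truth labels are also assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    harmful: bool  # ground-truth label, e.g. from human review (assumed)
    blocked: bool  # what the guardrail decided


def violation_rate(records: List[Record]) -> float:
    """Fraction of delivered responses that were actually harmful."""
    allowed = [r for r in records if not r.blocked]
    if not allowed:
        return 0.0
    return sum(r.harmful for r in allowed) / len(allowed)


def false_positive_rate(records: List[Record]) -> float:
    """Fraction of benign requests that were blocked anyway."""
    benign = [r for r in records if not r.harmful]
    if not benign:
        return 0.0
    return sum(r.blocked for r in benign) / len(benign)


records = [
    Record(harmful=True, blocked=True),
    Record(harmful=True, blocked=False),
    Record(harmful=False, blocked=False),
    Record(harmful=False, blocked=False),
    Record(harmful=False, blocked=True),
]
```

The two rates pull in opposite directions: stricter guardrails lower the violation rate but tend to raise the false positive rate, so they are best reported together.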

---

### 10. Summary

Guardrails transform raw LLMs into **production-grade AI systems** by enforcing safety, reliability, compliance, and trust.
They operate across **input, output, behavior, knowledge, and system layers**, forming the backbone of modern AI deployment.

Without guardrails, LLMs are **powerful but dangerous**.
With guardrails, they become **useful, trustworthy, and scalable systems**.
