```{contents}
```
## Content Filtering 

**Content filtering** is the process of **detecting, blocking, or modifying** generated or user-provided text that violates safety, policy, or business rules before it is stored, displayed, or acted upon.

It enforces **what is allowed** and **what is forbidden** in the system.

---

### Where It Fits in the Pipeline

```
User Input / LLM Output
        ↓
Content Filter ── allowed → Continue
        ↓ blocked
Remove / Replace / Reject / Log
```
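
As a rough illustration of the blocked branch in the diagram, the sketch below applies one of three actions (remove, replace, reject) and logs the decision. The `Action` enum, `handle_blocked`, and the single `API_KEY_PATTERN` rule are hypothetical names chosen for this example, not part of any particular library.

```python
import logging
import re
from enum import Enum

logger = logging.getLogger("content_filter")

class Action(Enum):
    REMOVE = "remove"    # delete the offending span
    REPLACE = "replace"  # mask the offending span
    REJECT = "reject"    # refuse the whole text

# Hypothetical rule: one pattern standing in for the real filter.
API_KEY_PATTERN = re.compile(r"api[_\- ]?key", re.IGNORECASE)

def handle_blocked(text, action):
    """Apply one enforcement action to text and log the decision."""
    if not API_KEY_PATTERN.search(text):
        return text  # allowed: continue unchanged
    logger.warning("content blocked, action=%s", action.value)
    if action is Action.REMOVE:
        return API_KEY_PATTERN.sub("", text)
    if action is Action.REPLACE:
        return API_KEY_PATTERN.sub("[REDACTED]", text)
    return None  # REJECT: the caller must not use the text
```

Returning `None` for a rejection forces the caller to handle that case explicitly instead of passing blocked text along by accident.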

---

### Types of Content Filtering

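| Type                 | Purpose                                                              |
| -------------------- | -------------------------------------------------------------------- |
| Keyword filtering    | Block known harmful terms                                            |
| Pattern filtering    | Match structured data such as card numbers or API keys               |
| Category filtering   | Classify text into policy categories (self-harm, illegal activity)   |
| Context filtering    | Catch text that is unsafe in meaning even without a blocked term     |
| Compliance filtering | Meet legal and regulatory requirements                               |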

---

### Basic Keyword Filter

#### Demonstration

```python
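BLOCKED_WORDS = ["password", "secret", "credit card"]

def keyword_filter(text):
    """Return True if text contains none of the blocked terms."""
    # Case-insensitive substring match, so "secret" also matches "secretary".
    lower = text.lower()
    return not any(word in lower for word in BLOCKED_WORDS)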
```
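
Because the filter above matches substrings, it also rejects harmless words such as "secretary". One possible refinement, sketched here under the assumption that whole-word matching is what you want, is to compile the terms into a single regular expression with word boundaries. The name `keyword_filter_strict` is made up for this example.

```python
import re

BLOCKED_WORDS = ["password", "secret", "credit card"]

# One alternation wrapped in word boundaries; re.escape guards against
# terms that contain regex metacharacters.
_BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BLOCKED_WORDS) + r")\b",
    re.IGNORECASE,
)

def keyword_filter_strict(text):
    """Return True when no blocked term appears as a whole word."""
    return _BLOCKED_RE.search(text) is None
```

The trade-off is that inflected forms such as "passwords" no longer match, so the term list may need to include them explicitly.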

---

### Pattern-Based Filtering

#### Demonstration

```python
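import re

# Compiled once; IGNORECASE replaces lowercasing the text on every loop pass.
BLOCK_PATTERNS = [
    re.compile(r"\b\d{16}\b"),                    # 16 contiguous digits (possible card number)
    re.compile(r"api[_\- ]?key", re.IGNORECASE),  # "api key", "api_key", "api-key", "apikey"
]

def pattern_filter(text):
    """Return True if no blocked pattern matches the text."""
    return not any(pattern.search(text) for pattern in BLOCK_PATTERNS)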
```
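
A bare 16-digit pattern flags any long number (order IDs, timestamps) and misses card numbers written with spaces or dashes. A common way to reduce both problems is to accept separators and then validate candidates with the Luhn checksum. The sketch below is one way to do that; `CARD_CANDIDATE`, `luhn_valid`, and `contains_card_number` are illustrative names, and the 13 to 19 digit range is an assumption about which card lengths you care about.

```python
import re

# Candidates: 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:   # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def contains_card_number(text):
    """Return True if any candidate in text passes the Luhn check."""
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False
```

For instance, the widely used test number `4111 1111 1111 1111` passes the checksum, while `1234567812345678` does not and would be left alone.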

---

### Category-Level Filtering

#### Demonstration

```python
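def category_filter(text):
    # `moderation_model` is a placeholder for your classifier; it is assumed to
    # return a dict mapping category names to booleans.
    categories = moderation_model.classify(text)
    return not categories["self_harm"] and not categories["illegal"]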
```
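
The snippet above relies on a `moderation_model` object that is not defined in this section. To run it locally, a stand-in along the following lines can be used. `StubModerationModel` and its trigger phrases are purely hypothetical; a real deployment would call a trained classifier or a hosted moderation service instead.

```python
class StubModerationModel:
    """Hypothetical stand-in for a real moderation classifier."""

    # Toy trigger phrases, for demonstration only.
    TRIGGERS = {
        "self_harm": ["hurt myself"],
        "illegal": ["steal a car"],
    }

    def classify(self, text):
        """Return a dict mapping each category name to True or False."""
        lower = text.lower()
        return {
            category: any(phrase in lower for phrase in phrases)
            for category, phrases in self.TRIGGERS.items()
        }

moderation_model = StubModerationModel()

def category_filter(text):
    categories = moderation_model.classify(text)
    return not categories["self_harm"] and not categories["illegal"]
```

Keeping the `classify` interface identical means the stub can later be swapped for a real model without touching `category_filter`.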

---

### Full Filtering Pipeline

#### Demonstration

```python
def is_content_safe(text):
    """Run the layers in order; `and` stops at the first one that fails."""
    return keyword_filter(text) and pattern_filter(text) and category_filter(text)
```
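
A boolean result hides which layer rejected the text, which makes logging and debugging harder. One possible variation, sketched below, returns the name of the first failing check instead. `first_failed_check` and the two toy lambdas are assumptions made for this example; in practice the list would hold the real keyword, pattern, and category filters.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[str], bool]]

def first_failed_check(text: str, checks: List[Check]) -> Optional[str]:
    """Return the name of the first check that rejects text, or None."""
    for name, check in checks:
        if not check(text):
            return name
    return None

# Toy checks standing in for the real filter layers.
checks = [
    ("keyword", lambda t: "password" not in t.lower()),
    ("pattern", lambda t: not any(c.isdigit() for c in t)),
]

print(first_failed_check("my Password", checks))
```

Ordering the list from cheapest to most expensive check keeps the short-circuit behavior of the original pipeline.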

---

### Enforcement Layer

```python
def safe_response(text):
    if not is_content_safe(text):
        return "Content blocked by safety filter."
    return text
```
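
Since the filter is meant to cover both user input and LLM output, the enforcement step can wrap the model call on both sides. The following is a minimal sketch of that idea; `guarded_call`, `generate`, and `is_safe` are hypothetical names, with `generate` standing for whatever function calls your model.

```python
BLOCK_MESSAGE = "Content blocked by safety filter."

def guarded_call(prompt, generate, is_safe):
    """Filter the prompt, call the model, then filter the model's reply."""
    if not is_safe(prompt):
        return BLOCK_MESSAGE   # never send unsafe input to the model
    reply = generate(prompt)
    if not is_safe(reply):
        return BLOCK_MESSAGE   # never show unsafe output to the user
    return reply

# Usage with a toy safety check and a fake model.
is_safe = lambda t: "secret" not in t.lower()
print(guarded_call("hi", lambda p: "hello", is_safe))
```

Passing `generate` and `is_safe` as arguments keeps the wrapper independent of any specific model client or filter implementation.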

---

### Mental Model

```
Content Filtering = Safety firewall for your system
```

---

### Key Takeaways

* Applies to both user input and LLM output
* Layers several detectors: keywords, patterns, and category classification
* Needed wherever compliance or safety rules apply
* Should run on every request in production, not only during testing