```{contents}
```
## Input Sanitization


**Input sanitization** is the process of **cleaning, filtering, and normalizing user input** before it enters the system to prevent:

* Security vulnerabilities
* Prompt injection attacks
* Malformed data
* Unexpected model behavior

Its goal is that the system only processes **safe and well-structured input**. No single filter guarantees this, so sanitization works as one layer among several.

---

### Where It Fits in the Pipeline

```
User Input
   ↓
Sanitizer ── clean → Continue Processing
   ↓ unsafe
Reject / Repair / Log
```
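
The three outcomes in the diagram (continue, repair, reject) can be sketched as a small routing function that logs every non-clean result. This is a minimal sketch under stated assumptions: `route_input` and `SanitizeResult` are hypothetical names, truncation is an assumed repair strategy, and the single regex stands in for a fuller pattern list.

```python
import logging
import re
from dataclasses import dataclass

logger = logging.getLogger("sanitizer")

MAX_LENGTH = 500  # assumed limit, matching the validation example in this section
# Hypothetical single injection pattern, for illustration only
INJECTION_RE = re.compile(r"\bignore\b.*\binstructions\b", re.IGNORECASE | re.DOTALL)


@dataclass
class SanitizeResult:
    status: str  # "clean", "repaired", or "rejected"
    text: str


def route_input(user_input: str) -> SanitizeResult:
    """Route input to continue / repair / reject, logging every non-clean outcome."""
    text = user_input.strip()
    if INJECTION_RE.search(text):
        # Unsafe and not repairable: reject and log
        logger.warning("Rejected input (injection pattern)")
        return SanitizeResult("rejected", "")
    if len(text) > MAX_LENGTH:
        # Recoverable problem: repair by truncating, then log
        logger.info("Truncated input from %d to %d chars", len(text), MAX_LENGTH)
        return SanitizeResult("repaired", text[:MAX_LENGTH])
    return SanitizeResult("clean", text)
```

Whether an oversized input should be repaired or rejected is a policy choice; truncation silently changes what the user asked, so some systems prefer to reject instead.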

---

### Core Sanitization Techniques

| Technique      | Purpose                                                      |
| -------------- | ------------------------------------------------------------ |
| Normalization  | Consistent formatting                                        |
| Validation     | Enforce allowed structure                                    |
| Filtering      | Remove dangerous content                                     |
| Escaping       | Neutralize special characters so they are treated as data    |
| Length control | Bound input size to limit abuse and resource exhaustion      |
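
Escaping is listed in the table but not demonstrated elsewhere in this section. Escaping is context-specific: the right transformation depends on where the text ends up. A minimal sketch using the standard-library `html.escape` and `shlex.quote`; the sample string is made up for illustration.

```python
import html
import shlex

user_text = 'Tom <script>alert("x")</script>; rm -rf /'

# For HTML output: turn <, >, &, and quotes into entities so a browser shows them as text
safe_html = html.escape(user_text)

# For a shell command line: quote the whole value so the shell sees one literal argument
safe_shell_arg = shlex.quote(user_text)

print(safe_html)
print(safe_shell_arg)
```

Where possible, avoiding the interpreter altogether is safer than escaping for it, for example passing arguments to `subprocess.run` as a list instead of building a shell string.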

---

### Basic Text Sanitization

#### Demonstration

```python
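import re

def sanitize_text(text: str) -> str:
    text = re.sub(r"[<>$`]", "", text)        # drop characters common in HTML/shell/template injection
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace (incl. newlines) to one space
    return text.strip()                       # trim last, so removed characters leave no stray spaces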
```
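
`sanitize_text` only removes a handful of ASCII characters. Look-alike and invisible Unicode characters can slip past later pattern checks, for example a zero-width space inside "ignore" or fullwidth letters. A minimal sketch of a Unicode normalization step, assuming it runs before `sanitize_text`; `normalize_unicode` is a hypothetical helper built on the standard-library `unicodedata` module.

```python
import unicodedata


def normalize_unicode(text: str) -> str:
    # NFKC folds compatibility forms (e.g. fullwidth letters) into their plain equivalents
    text = unicodedata.normalize("NFKC", text)
    # Drop control (Cc) and format (Cf) characters such as zero-width spaces and
    # bidi overrides, but keep ordinary whitespace for the later whitespace collapse
    return "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) not in ("Cc", "Cf")
    )
```

Both steps are lossy: NFKC also rewrites characters such as "²" to "2", and removing all `Cf` characters strips the zero-width joiner used in compound emoji. Whether that is acceptable depends on the application.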

---

### Length & Content Validation

#### Demonstration

```python
MAX_LENGTH = 500

def validate_input(text: str) -> bool:
    if not text:
        raise ValueError("Empty input")
    if len(text) > MAX_LENGTH:
        raise ValueError("Input too long")
    return True
```
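
`validate_input` checks only length and emptiness. For fields with a known structure, validation can go further and enforce an allowlist, matching the "Enforce allowed structure" row of the techniques table. A minimal sketch; the username field and its pattern are hypothetical.

```python
import re

# Hypothetical allowlist: 3-32 ASCII letters, digits, or underscores
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,32}")


def validate_username(value: str) -> str:
    # fullmatch anchors the pattern to the whole string, so trailing junk is rejected
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("Invalid username format")
    return value
```

`fullmatch` is used instead of `^...$` because `$` also matches just before a trailing newline, which would let `"alice\n"` through.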

---

### Injection Pattern Filtering

#### Demonstration

```python
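import re

# Heuristic deny-list of common injection phrasings; easy to evade, not exhaustive.
# \b word boundaries also catch "ignore instructions" with no word in between.
BLOCK_PATTERNS = [
    r"\bignore\b.*\binstructions\b",
    r"\breveal\b.*\bprompt\b",
    r"\bbypass\b.*\bsecurity\b",
]

def contains_malicious_content(text: str) -> bool:
    text = text.lower()
    return any(re.search(p, text) for p in BLOCK_PATTERNS)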
```
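
For the "Log" branch of the pipeline it helps to record *which* pattern fired, not just that one did. A minimal sketch with a hypothetical `matched_patterns` helper, using word-boundary variants of the patterns and compiling them once. The last test case shows a limitation of keyword filtering: a zero-width space inside a word defeats the match unless the input is Unicode-normalized first.

```python
import re

# Illustrative deny-list (word-boundary variants of the patterns above)
BLOCK_PATTERNS = [
    r"\bignore\b.*\binstructions\b",
    r"\breveal\b.*\bprompt\b",
    r"\bbypass\b.*\bsecurity\b",
]
# IGNORECASE replaces a manual .lower(); DOTALL lets .* span newlines
_COMPILED = [(p, re.compile(p, re.IGNORECASE | re.DOTALL)) for p in BLOCK_PATTERNS]


def matched_patterns(text: str) -> list[str]:
    """Return every deny-list pattern that matches, for logging or auditing."""
    return [raw for raw, rx in _COMPILED if rx.search(text)]
```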

---

### Full Sanitization Pipeline

#### Demonstration

```python
def sanitize_input(user_input):
    clean = sanitize_text(user_input)
    validate_input(clean)

    if contains_malicious_content(clean):
        raise ValueError("Potentially malicious input")

    return clean
```
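
In `sanitize_input`, the regex substitutions in `sanitize_text` run over the entire raw input before `validate_input` checks its length, so an oversized payload is fully processed before it is rejected. A minimal sketch of a bounded variant, assuming a hypothetical hard cap `MAX_RAW_LENGTH` checked first. It also removes the special characters before collapsing whitespace, so deleting a character cannot leave a double or leading space behind.

```python
import re

MAX_RAW_LENGTH = 10_000  # hypothetical hard cap, checked before any regex work
MAX_LENGTH = 500
BLOCK_PATTERNS = [
    re.compile(p, re.IGNORECASE | re.DOTALL)
    for p in (r"\bignore\b.*\binstructions\b", r"\breveal\b.*\bprompt\b", r"\bbypass\b.*\bsecurity\b")
]


def sanitize_input_bounded(user_input: str) -> str:
    # 1. Cheap size guard first, so oversized payloads never reach the regexes below
    if len(user_input) > MAX_RAW_LENGTH:
        raise ValueError("Input too long")
    # 2. Clean: drop the special characters, then collapse whitespace and trim
    text = re.sub(r"[<>$`]", "", user_input)
    text = re.sub(r"\s+", " ", text).strip()
    # 3. Validate the cleaned text
    if not text:
        raise ValueError("Empty input")
    if len(text) > MAX_LENGTH:
        raise ValueError("Input too long")
    # 4. Screen the cleaned text for known injection phrasings
    if any(rx.search(text) for rx in BLOCK_PATTERNS):
        raise ValueError("Potentially malicious input")
    return text
```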

---

### Usage Example

```python
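def handle_request(user_input):
    try:
        safe_input = sanitize_input(user_input)
    except ValueError as e:
        return f"Request rejected: {e}"
    # `llm` is a chat model defined elsewhere (e.g. a LangChain chat model, whose
    # .invoke() returns a message object with a .content attribute). It is called
    # outside the try so errors from the model are not reported as rejections.
    return llm.invoke(safe_input).content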
```
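
Returning the exception text to the caller tells an attacker which check fired, which can help them tune their input. A minimal sketch of a variant that keeps the reason in server-side logs and returns a generic message. Passing `llm` and `sanitize` as parameters is an assumption made for testability; `llm` is assumed to expose `.invoke()` returning an object with `.content`, as in the example above.

```python
import logging

logger = logging.getLogger("request_handler")


def handle_request(user_input, llm, sanitize):
    """Sanitize, then call the model; dependencies are passed in so they can be stubbed."""
    try:
        safe_input = sanitize(user_input)
    except ValueError as e:
        # Keep the specific reason in server-side logs only
        logger.warning("Rejected input: %s", e)
        return "Request rejected."
    # Outside the try: a ValueError from the model call is not mislabeled as a rejection
    return llm.invoke(safe_input).content
```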

---

### Mental Model

```
Input Sanitization = First security gate of your system
```

---

### Key Takeaways

* Sanitization protects both the model and the system
* Must run before any LLM invocation
* Keyword filters are easy to evade; combine them with stronger injection detection and rate limiting
* Essential for production-grade LLM security
