Merged
143 changes: 121 additions & 22 deletions docs/user-guide/safety-security/pii-redaction.md
@@ -18,16 +18,22 @@ Integrating PII redaction is crucial for:
## How to implement PII Redaction

The Strands SDK does not redact PII in its core telemetry generation. Two approaches to PII masking are recommended:

### Option 1: Using Third-Party Specialized Libraries [Recommended]
Use a specialized external tool such as Langfuse, LLM Guard, Presidio, or Amazon Comprehend for PII detection and redaction:

#### Step-by-Step Integration Guide

##### Step 1: Install your chosen PII Redaction Library.
Example with [LLM Guard](https://protectai.com/llm-guard):
````

````bash
pip install llm-guard
````
##### Step2: Import necessary modules and initialize the Vault and Anonymize scanner.
````

##### Step 2: Import necessary modules and initialize the Vault and Anonymize scanner.

```python
from llm_guard.vault import Vault
from llm_guard.input_scanners import Anonymize
from llm_guard.input_scanners.anonymize_helpers import BERT_LARGE_NER_CONF
@@ -42,24 +48,31 @@ def create_anonymize_scanner():
        language="en"
    )
    return scanner
````
##### Step3: Define a masking function using the anonymize scanner.
````
```
##### Step 3: Define a masking function using the anonymize scanner.

```python
def masking_function(data, **kwargs):
    if isinstance(data, str):
        scanner = create_anonymize_scanner()
        # Scan and redact the data
        sanitized_data, is_valid, risk_score = scanner.scan(data)
        return sanitized_data
    return data
````
##### Step4: Configure the masking function in Observability platform, eg., Langfuse.
````
from langfuse import Langfuse, observe
```

##### Step 4: Configure the masking function in your observability platform, e.g., Langfuse.

```python
from langfuse import Langfuse

langfuse = Langfuse(mask=masking_function)
````
##### Step5: Create a sample function with PII.
````
```

##### Step 5: Create a sample function with PII.

```python
from langfuse import observe
@observe()
def generate_report():
    report = "John Doe met with Jane Smith to discuss the project."
@@ -70,14 +83,94 @@ print(result)
# Output: [REDACTED_PERSON] met with [REDACTED_PERSON] to discuss the project.

langfuse.flush()
````
```
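
Before pulling in the NER model, the wiring can be checked with a lightweight stand-in. The sketch below is not part of LLM Guard or Langfuse: `regex_masking_function` and `EMAIL_PATTERN` are hypothetical names, and a single email regex replaces the Anonymize scanner. It only illustrates the callable shape used in Step 3 (a `data` argument plus `**kwargs`, returning the masked value).

```python
import re

# Hypothetical stand-in for the LLM Guard scanner: one email regex only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def regex_masking_function(data, **kwargs):
    """Mask email addresses in strings; return other types unchanged."""
    if isinstance(data, str):
        return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", data)
    return data


print(regex_masking_function("Contact john@example.com for details."))
# Contact [REDACTED_EMAIL] for details.
print(regex_masking_function(42))
# 42
```

A regex like this catches far fewer PII types than a dedicated scanner, so it is only suitable for verifying that the masking hook is called, not as a replacement for it.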

#### Complete example with a Strands Agent

```python
from strands import Agent
from llm_guard.vault import Vault
from llm_guard.input_scanners import Anonymize
from llm_guard.input_scanners.anonymize_helpers import BERT_LARGE_NER_CONF
from langfuse import Langfuse, observe

vault = Vault()

def create_anonymize_scanner():
    """Creates a reusable anonymize scanner."""
    return Anonymize(vault, recognizer_conf=BERT_LARGE_NER_CONF, language="en")

def masking_function(data, **kwargs):
    """Langfuse masking function to recursively redact PII."""
    if isinstance(data, str):
        scanner = create_anonymize_scanner()
        sanitized_data, _, _ = scanner.scan(data)
        return sanitized_data
    elif isinstance(data, dict):
        return {k: masking_function(v) for k, v in data.items()}
    elif isinstance(data, list):
        return [masking_function(item) for item in data]
    return data

langfuse = Langfuse(mask=masking_function)


class CustomerSupportAgent:
    def __init__(self):
        self.agent = Agent(
            system_prompt="You are a helpful customer service agent. Respond professionally to customer inquiries."
        )

    @observe
    def process_sanitized_message(self, sanitized_payload):
        """Processes a pre-sanitized payload and expects sanitized input."""
        sanitized_content = sanitized_payload.get("prompt", "empty input")

        conversation = f"Customer: {sanitized_content}"

        response = self.agent(conversation)
        return response


def process():
    support_agent = CustomerSupportAgent()
    scanner = create_anonymize_scanner()

    raw_payload = {
        "prompt": "Hi, I'm Jonny Test. My phone number is 123-456-7890 and my email is john@example.com. I need help with my order #123456789."
    }

    sanitized_prompt, _, _ = scanner.scan(raw_payload["prompt"])
    sanitized_payload = {"prompt": sanitized_prompt}

    response = support_agent.process_sanitized_message(sanitized_payload)

    print(f"Response: {response}")
    langfuse.flush()

    # Example input prompt:
    # "Hi, I'm [REDACTED_PERSON_1]. My phone number is [REDACTED_PHONE_NUMBER_1] and my email is [REDACTED_EMAIL_ADDRESS_1]. I need help with my order #123456789."
    # Example output:
    # "Hello! I'd be happy to help you with your order #123456789.
    # To better assist you, could you please let me know what specific issue you're experiencing with this order? For example:
    # - Are you looking for a status update?
    # - Need to make changes to the order?
    # - Having delivery issues?
    # - Need to process a return or exchange?
    #
    # Once I understand what you need help with, I'll be able to provide you with the most relevant assistance."

if __name__ == "__main__":
    process()
```
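
The complete example builds a new scanner for every string it masks, which may be costly if the scanner loads a model when constructed. One possible refinement is to cache a single scanner instance. The sketch below assumes that reusing a scanner across calls is acceptable for your workload; `FakeScanner` and `get_scanner` are hypothetical names, and `FakeScanner` only imitates the `(sanitized_text, is_valid, risk_score)` return shape so the snippet runs without LLM Guard installed.

```python
from functools import lru_cache


class FakeScanner:
    """Hypothetical stand-in for the Anonymize scanner; counts constructions."""

    instances = 0

    def __init__(self):
        FakeScanner.instances += 1

    def scan(self, text):
        # Imitates the (sanitized_text, is_valid, risk_score) tuple used above.
        return text.replace("Jonny Test", "[REDACTED_PERSON_1]"), True, 0.0


@lru_cache(maxsize=1)
def get_scanner():
    """Build the scanner on first use and return the same object afterwards."""
    return FakeScanner()


def masking_function(data, **kwargs):
    """Recursively mask strings inside dicts and lists with one shared scanner."""
    if isinstance(data, str):
        sanitized, _, _ = get_scanner().scan(data)
        return sanitized
    if isinstance(data, dict):
        return {key: masking_function(value) for key, value in data.items()}
    if isinstance(data, list):
        return [masking_function(item) for item in data]
    return data


masked = masking_function({"prompt": "Hi, I'm Jonny Test.", "tags": ["Jonny Test", 7]})
print(masked)
# {'prompt': "Hi, I'm [REDACTED_PERSON_1].", 'tags': ['[REDACTED_PERSON_1]', 7]}
print(FakeScanner.instances)
# 1
```

In a real setup, `get_scanner` would return the scanner created by `create_anonymize_scanner()` instead of the stand-in.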

### Option 2: Using OpenTelemetry Collector Configuration [Collector-level Masking]
Implement PII masking directly at the collector level, which is ideal for centralized control.

#### Example code:
1. Edit your collector configuration (e.g., `otel-collector-config.yaml`):
````

```yaml
processors:
  attributes/pii:
    actions:
@@ -92,22 +185,28 @@ service:
  pipelines:
    traces:
      processors: [attributes/pii]
````
```

2. Deploy or restart your OTEL collector with the updated configuration.

#### Example:

##### Before:
````

```json
{
"user.email": "user@example.com",
"http.url": "https://example.com?token=abc123"
  "user.email": "user@example.com",
  "http.url": "https://example.com?token=abc123"
}
````
```

##### After:
````

```json
{
"http.url": "https://example.com?token=[REDACTED]"
}
````
```
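
To sanity-check what a masking rule set should produce, the before/after pair can be reproduced in plain Python. This is only an illustration of the expected transformation, not how the collector implements it: the processor actions are elided in the diff above, and `redact_attributes` is a hypothetical helper that assumes the intent is to drop `user.email` and mask the `token` query parameter in `http.url`.

```python
import re


def redact_attributes(attributes):
    """Return a copy with user.email removed and the URL token masked."""
    redacted = dict(attributes)
    # Assumed rule 1: remove the email attribute entirely.
    redacted.pop("user.email", None)
    # Assumed rule 2: replace the token value up to the next '&' or end of string.
    if "http.url" in redacted:
        redacted["http.url"] = re.sub(
            r"(token=)[^&]+", r"\g<1>[REDACTED]", redacted["http.url"]
        )
    return redacted


before = {
    "user.email": "user@example.com",
    "http.url": "https://example.com?token=abc123",
}
after = redact_attributes(before)
print(after)
# {'http.url': 'https://example.com?token=[REDACTED]'}
```

Comparing this expected output with spans exported by the collector is one way to confirm that the deployed configuration behaves as intended.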

## Additional Resources
* [PII definition](https://www.dol.gov/general/ppii)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -110,6 +110,7 @@ nav:
- Responsible AI: user-guide/safety-security/responsible-ai.md
- Guardrails: user-guide/safety-security/guardrails.md
- Prompt Engineering: user-guide/safety-security/prompt-engineering.md
- PII Redaction: user-guide/safety-security/pii-redaction.md
- Observability & Evaluation:
- Observability: user-guide/observability-evaluation/observability.md
- Metrics: user-guide/observability-evaluation/metrics.md