diff --git a/docs/user-guide/safety-security/pii-redaction.md b/docs/user-guide/safety-security/pii-redaction.md
index dc2ed9c6..ec35802e 100644
--- a/docs/user-guide/safety-security/pii-redaction.md
+++ b/docs/user-guide/safety-security/pii-redaction.md
@@ -18,16 +18,22 @@ Integrating PII redaction is crucial for:
 ## How to implement PII Redaction
 Strands SDK does not natively perform PII redaction within its core telemetry generation but recommends two effective ways to achieve PII masking:
+
 ### Option 1: Using Third-Party Specialized Libraries [Recommended]
 Leverage specialized external libraries like Langfuse, LLM Guard, Presidio, or AWS Comprehend for high-quality PII detection and redaction:
+
 #### Step-by-Step Integration Guide
+
 ##### Step 1: Install your chosen PII Redaction Library. Example with [LLM Guard](https://protectai.com/llm-guard):
-````
+
+````bash
 pip install llm-guard
 ````
-##### Step2: Import necessary modules and initialize the Vault and Anonymize scanner.
-````
+
+##### Step 2: Import necessary modules and initialize the Vault and Anonymize scanner.
+
+```python
 from llm_guard.vault import Vault
 from llm_guard.input_scanners import Anonymize
 from llm_guard.input_scanners.anonymize_helpers import BERT_LARGE_NER_CONF
@@ -42,24 +48,31 @@ def create_anonymize_scanner():
         language="en"
     )
     return scanner
-````
-##### Step3: Define a masking function using the anonymize scanner.
-````
+```
+
+##### Step 3: Define a masking function using the anonymize scanner.
+
+```python
 def masking_function(data, **kwargs):
     if isinstance(data, str):
         scanner = create_anonymize_scanner()
+        # Scan and redact the data
         sanitized_data, is_valid, risk_score = scanner.scan(data)
         return sanitized_data
     return data
-````
-##### Step4: Configure the masking function in Observability platform, eg., Langfuse.
-````
-from langfuse import Langfuse, observe
+```
+
+##### Step 4: Configure the masking function in your observability platform, e.g., Langfuse.
+
+```python
+from langfuse import Langfuse
 langfuse = Langfuse(mask=masking_function)
-````
-##### Step5: Create a sample function with PII.
-````
+```
+
+##### Step 5: Create a sample function with PII.
+
+```python
+from langfuse import observe
 @observe()
 def generate_report():
     report = "John Doe met with Jane Smith to discuss the project."
@@ -70,14 +83,94 @@ print(result)
 # Output: [REDACTED_PERSON] met with [REDACTED_PERSON] to discuss the project.
 langfuse.flush()
-````
+```
+
+#### Complete example with a Strands Agent
+
+```python
+from strands import Agent
+from llm_guard.vault import Vault
+from llm_guard.input_scanners import Anonymize
+from llm_guard.input_scanners.anonymize_helpers import BERT_LARGE_NER_CONF
+from langfuse import Langfuse, observe
+
+vault = Vault()
+
+def create_anonymize_scanner():
+    """Creates a reusable anonymize scanner."""
+    return Anonymize(vault, recognizer_conf=BERT_LARGE_NER_CONF, language="en")
+
+def masking_function(data, **kwargs):
+    """Langfuse masking function to recursively redact PII."""
+    if isinstance(data, str):
+        scanner = create_anonymize_scanner()
+        sanitized_data, _, _ = scanner.scan(data)
+        return sanitized_data
+    elif isinstance(data, dict):
+        return {k: masking_function(v) for k, v in data.items()}
+    elif isinstance(data, list):
+        return [masking_function(item) for item in data]
+    return data
+
+langfuse = Langfuse(mask=masking_function)
+
+
+class CustomerSupportAgent:
+    def __init__(self):
+        self.agent = Agent(
+            system_prompt=(
+                "You are a helpful customer service agent. "
+                "Respond professionally to customer inquiries."
+            )
+        )
+
+    @observe
+    def process_sanitized_message(self, sanitized_payload):
+        """Processes a payload whose PII has already been redacted."""
+        sanitized_content = sanitized_payload.get("prompt", "empty input")
+
+        conversation = f"Customer: {sanitized_content}"
+
+        response = self.agent(conversation)
+        return response
+
+
+def process():
+    support_agent = CustomerSupportAgent()
+    scanner = create_anonymize_scanner()
+
+    raw_payload = {
+        "prompt": "Hi, I'm Jonny Test. My phone number is 123-456-7890 and my email is john@example.com. I need help with my order #123456789."
+    }
+
+    sanitized_prompt, _, _ = scanner.scan(raw_payload["prompt"])
+    sanitized_payload = {"prompt": sanitized_prompt}
+    response = support_agent.process_sanitized_message(sanitized_payload)
+
+    print(f"Response: {response}")
+    langfuse.flush()
+
+    # Example input (sanitized prompt):
+    # "Hi, I'm [REDACTED_PERSON_1]. My phone number is [REDACTED_PHONE_NUMBER_1] and my email is [REDACTED_EMAIL_ADDRESS_1]. I need help with my order #123456789."
+    # Example output:
+    # "Hello! I'd be happy to help you with your order #123456789.
+    # To better assist you, could you please let me know what specific issue you're experiencing with this order? For example:
+    # - Are you looking for a status update?
+    # - Need to make changes to the order?
+    # - Having delivery issues?
+    # - Need to process a return or exchange?
+    #
+    # Once I understand what you need help with, I'll be able to provide you with the most relevant assistance."
+
+if __name__ == "__main__":
+    process()
+```
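+
+#### Alternative: Masking with Microsoft Presidio
+
+The same `mask` callback can be built on any of the other libraries mentioned above. The snippet below is a minimal, illustrative sketch using Microsoft Presidio; it assumes the `presidio-analyzer` and `presidio-anonymizer` packages (plus a spaCy model) are installed, and `presidio_masking_function` is a hypothetical drop-in replacement for the `masking_function` defined earlier.
+
+```python
+from langfuse import Langfuse
+from presidio_analyzer import AnalyzerEngine
+from presidio_anonymizer import AnonymizerEngine
+
+analyzer = AnalyzerEngine()
+anonymizer = AnonymizerEngine()
+
+def presidio_masking_function(data, **kwargs):
+    """Langfuse mask callback that redacts PII detected by Presidio."""
+    if isinstance(data, str):
+        # Detect PII entities, then replace each with a placeholder such as <PERSON>.
+        results = analyzer.analyze(text=data, language="en")
+        return anonymizer.anonymize(text=data, analyzer_results=results).text
+    return data
+
+langfuse = Langfuse(mask=presidio_masking_function)
+```
+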
 ### Option 2: Using OpenTelemetry Collector Configuration [Collector-level Masking]
 Implement PII masking directly at the collector level, which is ideal for centralized control.
+
 #### Example code:
-1. Edit your collector configuration (eg., otel-collector-config.yaml):
-````
+1. Edit your collector configuration (e.g., `otel-collector-config.yaml`):
+
+```yaml
 processors:
   attributes/pii:
     actions:
@@ -92,22 +185,28 @@ service:
   pipelines:
     traces:
       processors: [attributes/pii]
-````
+```
+
 2. Deploy or restart your OTEL collector with the updated configuration.
+
 #### Example:
+
 ##### Before:
-````
+
+```json
 {
-"user.email": "user@example.com",
-"http.url": "https://example.com?token=abc123"
+  "user.email": "user@example.com",
+  "http.url": "https://example.com?token=abc123"
 }
-````
-#### After:
-````
+```
+
+##### After:
+
+```json
 {
   "http.url": "https://example.com?token=[REDACTED]"
 }
-````
+```
 ## Additional Resources
 * [PII definition](https://www.dol.gov/general/ppii)
diff --git a/mkdocs.yml b/mkdocs.yml
index dba0bf8f..b3119355 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -110,6 +110,7 @@ nav:
     - Responsible AI: user-guide/safety-security/responsible-ai.md
     - Guardrails: user-guide/safety-security/guardrails.md
     - Prompt Engineering: user-guide/safety-security/prompt-engineering.md
+    - PII Redaction: user-guide/safety-security/pii-redaction.md
   - Observability & Evaluation:
     - Observability: user-guide/observability-evaluation/observability.md
     - Metrics: user-guide/observability-evaluation/metrics.md