# Structured Payload & File Scanning

SafeAI can recursively scan nested JSON payloads, arrays, and files on disk for secrets and PII.
This notebook demonstrates the three main scanning entry points using only `SafeAI.quickstart()`; no YAML configuration files are needed.

| Method | Input | Use case |
|--------|-------|----------|
| `ai.scan_structured_input()` | `dict` / `list` | Nested JSON payloads |
| `ai.scan_input()` | `str` | Flat text prompts |
| `ai.scan_file_input()` | file path | `.json` or `.txt` files on disk |

In [1]:
from safeai import SafeAI

ai = SafeAI.quickstart()
print("SafeAI ready:", ai)

SafeAI ready: <safeai.api.SafeAI object at 0x11b186a50>


## 1. Scan a nested JSON payload

Pass a dictionary containing user data and an embedded API key.
`scan_structured_input` walks every leaf value and returns a `StructuredScanResult`.
When the decision is `block`, as it is here, `result.filtered` is `None`.

In [2]:
payload = {
    "user": {
        "name": "Alice",
        "email": "alice@corp.io"
    },
    "config": {
        "api_key": "sk-ABCDEF1234567890ABCDEF"
    }
}

result = ai.scan_structured_input(payload, agent_id="demo")

print("Decision action:", result.decision.action)
print("Number of detections:", len(result.detections))
print()
for det in result.detections:
    print(f"  path: {det.path}  detector: {det.detector}  tag: {det.tag}")

print("\nFiltered payload:", result.filtered)

Decision action: block
Number of detections: 2

  path: $.config.api_key  detector: openai_key  tag: secret.credential
  path: $.user.email  detector: email  tag: personal.pii

Filtered payload: None
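
To make the "walks every leaf value" idea concrete, here is a minimal sketch of a recursive walker that produces paths in the same `$.a.b[0]` style as the detections above. It is an illustration only, not SafeAI's actual implementation; `iter_leaves` and the extra `tags` list are hypothetical names added for the example.

```python
from typing import Any, Iterator, Tuple


def iter_leaves(node: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every non-container value in a nested structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaves(value, f"{path}[{index}]")
    else:
        # Anything that is not a dict or list is treated as a leaf.
        yield path, node


payload = {
    "user": {"name": "Alice", "email": "alice@corp.io"},
    "config": {"api_key": "sk-ABCDEF1234567890ABCDEF"},
    "tags": ["a", "b"],  # added here only to exercise the list branch
}

leaves = dict(iter_leaves(payload))
for leaf_path, value in leaves.items():
    print(leaf_path, "->", value)
```

A scanner built this way would run its detectors on each yielded value and attach the corresponding path to any match.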


## 2. Clean payload passes through

A payload with no secrets or PII is allowed: the detection list is empty and `result.filtered` holds the payload unchanged.

In [3]:
clean_payload = {
    "query": "What is Python?",
    "temperature": 0.7
}

result = ai.scan_structured_input(clean_payload, agent_id="demo")

print("Decision action:", result.decision.action)
print("Detections:", result.detections)
print("Filtered payload:", result.filtered)

Decision action: allow
Detections: []
Filtered payload: {'query': 'What is Python?', 'temperature': 0.7}


## 3. Deep nesting detection

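Secrets buried three levels deep are still found. The detection `path` shows the exact JSONPath. More than one detector can match the same value, in which case each match is reported as a separate detection with the same `path`.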

In [4]:
deep_payload = {
    "request": {
        "headers": {
            "authorization": "token=sk-ABCDEF1234567890ABCDEF"
        },
        "body": {
            "text": "hello"
        }
    }
}

result = ai.scan_structured_input(deep_payload, agent_id="demo")

print("Decision action:", result.decision.action)
print()
for det in result.detections:
    print(f"  path: {det.path}")
    print(f"  detector: {det.detector}")
    print(f"  tag: {det.tag}")

print("\nFiltered payload:", result.filtered)

Decision action: block

  path: $.request.headers.authorization
  detector: generic_token
  tag: secret.token
  path: $.request.headers.authorization
  detector: openai_key
  tag: secret.credential

Filtered payload: None
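
Because two detectors fired on the same path, it can be easier to read the results grouped by path. The following is a small sketch of that grouping. The `Detection` dataclass is a hypothetical stand-in that only mirrors the three attributes printed above (`path`, `detector`, `tag`); with a real result you would pass `result.detections` instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical stand-in with the three attributes printed above."""
    path: str
    detector: str
    tag: str


def group_by_path(detections):
    """Map each path to the list of (detector, tag) pairs that matched it."""
    grouped = defaultdict(list)
    for det in detections:
        grouped[det.path].append((det.detector, det.tag))
    return dict(grouped)


# Values copied from the output above.
detections = [
    Detection("$.request.headers.authorization", "generic_token", "secret.token"),
    Detection("$.request.headers.authorization", "openai_key", "secret.credential"),
]

grouped = group_by_path(detections)
for path, matches in grouped.items():
    print(path)
    for detector, tag in matches:
        print(f"  {detector} ({tag})")
```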


## 4. Lists are scanned too

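Array elements are walked individually. The detection path includes the array index. In the run below the SSN is detected, yet the decision is `allow` and the payload is returned unchanged, so a detection on its own does not imply a block.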

In [5]:
list_payload = {
    "messages": [
        {"role": "user", "content": "My SSN is 123-45-6789"},
        {"role": "assistant", "content": "ok"}
    ]
}

result = ai.scan_structured_input(list_payload, agent_id="demo")

print("Decision action:", result.decision.action)
print()
for det in result.detections:
    print(f"  path: {det.path}")
    print(f"  detector: {det.detector}")
    print(f"  tag: {det.tag}")

print("\nFiltered payload:", result.filtered)

Decision action: allow

  path: $.messages[0].content
  detector: ssn
  tag: personal.pii

Filtered payload: {'messages': [{'role': 'user', 'content': 'My SSN is 123-45-6789'}, {'role': 'assistant', 'content': 'ok'}]}
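
When a payload is allowed despite a detection, you may want to look up the flagged value yourself. The sketch below resolves a path such as `$.messages[0].content` back to the value it points at. `get_at_path` is a hypothetical helper, not part of SafeAI, and it assumes paths use only `.key` and `[index]` segments with keys that contain no dots or brackets.

```python
import re
from typing import Any

# Matches either a ".key" segment or an "[index]" segment.
_SEGMENT = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]")


def get_at_path(payload: Any, path: str) -> Any:
    """Return the value addressed by a simple '$.a.b[0].c' style path."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    node = payload
    pos = 1  # skip the leading '$'
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path segment at offset {pos}: {path!r}")
        key, index = match.groups()
        node = node[key] if key is not None else node[int(index)]
        pos = match.end()
    return node


list_payload = {
    "messages": [
        {"role": "user", "content": "My SSN is 123-45-6789"},
        {"role": "assistant", "content": "ok"},
    ]
}

flagged = get_at_path(list_payload, "$.messages[0].content")
print(flagged)
```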


## 5. Scan a JSON file from disk

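`scan_file_input` reads a file, auto-detects the format, and applies the same scanning.
For `.json` files it uses structured scanning internally.
Unlike `scan_structured_input`, it returns a plain `dict`, so fields are read with subscript access (`result["decision"]`) rather than attribute access.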

In [6]:
import json
import os
import tempfile

# Write a temporary JSON file containing a secret
tmp_json = os.path.join(tempfile.gettempdir(), "safeai_demo_payload.json")
with open(tmp_json, "w") as f:
    json.dump({
        "agent": "summariser",
        "credentials": {
            "token": "sk-ABCDEF1234567890ABCDEF"
        }
    }, f)

result = ai.scan_file_input(tmp_json, agent_id="demo")

print("Mode:", result["mode"])
print("Decision:", result["decision"])
print("Detections:", result["detections"])
print("Filtered:", result["filtered"])
print("File path:", result["file_path"])
print("Size (bytes):", result["size_bytes"])

# Clean up
os.unlink(tmp_json)
print("\nTemp file removed.")

Mode: structured
Decision: {'action': 'block', 'policy_name': 'block-secrets-everywhere', 'reason': 'Secrets must never cross any boundary.'}
Detections: [{'path': '$.credentials.token', 'detector': 'openai_key', 'tag': 'secret.credential', 'start': 0, 'end': 25}]
Filtered: None
File path: /private/var/folders/w6/vcgrptb532z_npwq5m8kmn880000gp/T/safeai_demo_payload.json
Size (bytes): 78

Temp file removed.
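
The cell above removes the temporary file manually, which is skipped if the scan raises. One possible alternative, sketched below, writes the payload inside a `tempfile.TemporaryDirectory` so cleanup happens automatically. `scan_payload_via_file` and `fake_scan` are hypothetical names for this example; `fake_scan` stands in for the real scanner so the snippet runs without SafeAI installed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def scan_payload_via_file(payload: Any, scan_fn: Callable[[str], dict]) -> dict:
    """Write payload to a throwaway .json file, scan it, and always clean up."""
    # The directory and its contents are deleted when the `with` block exits,
    # including when scan_fn raises.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "payload.json"
        path.write_text(json.dumps(payload), encoding="utf-8")
        return scan_fn(str(path))


def fake_scan(path: str) -> dict:
    # Stand-in for the real scanner; it only reports what it was given.
    return {"file_path": path, "size_bytes": Path(path).stat().st_size}


report = scan_payload_via_file({"agent": "summariser"}, fake_scan)
print("Size (bytes):", report["size_bytes"])
print("Still on disk:", Path(report["file_path"]).exists())
```

With SafeAI, the second argument could be `lambda p: ai.scan_file_input(p, agent_id="demo")`, matching the call used in the cell above.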


## 6. Scan a plain text file

For `.txt` files, `scan_file_input` falls back to flat text scanning (same as `scan_input`). Text-mode detections carry `start` and `end` offsets but no `path`.

In [7]:
import tempfile
import os

# Write a temporary text file containing PII
tmp_txt = os.path.join(tempfile.gettempdir(), "safeai_demo_note.txt")
with open(tmp_txt, "w") as f:
    f.write("Please process the payment for card number 4111-1111-1111-1111.\n")
    f.write("The customer SSN is 123-45-6789.\n")

result = ai.scan_file_input(tmp_txt, agent_id="demo")

print("Mode:", result["mode"])
print("Decision:", result["decision"])
print("Detections:", result["detections"])
print("Filtered:", result["filtered"])
print("File path:", result["file_path"])
print("Size (bytes):", result["size_bytes"])

# Clean up
os.unlink(tmp_txt)
print("\nTemp file removed.")

Mode: text
Decision: {'action': 'allow', 'policy_name': 'allow-input-by-default', 'reason': 'Allow when no restrictive policy matched.'}
Detections: [{'detector': 'credit_card', 'tag': 'personal.financial', 'start': 43, 'end': 62}, {'detector': 'ssn', 'tag': 'personal.pii', 'start': 84, 'end': 95}]
Filtered: Please process the payment for card number 4111-1111-1111-1111.
The customer SSN is 123-45-6789.

File path: /private/var/folders/w6/vcgrptb532z_npwq5m8kmn880000gp/T/safeai_demo_note.txt
Size (bytes): 97

Temp file removed.
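
The `start` and `end` values in the detections above can be used to pull out the matched text. The sketch below assumes they are character offsets into the file contents with an exclusive `end`, which is consistent with the numbers in this output but is not confirmed by the library documentation here. `matched_spans` is a hypothetical helper.

```python
# The same two lines that were written to the temporary text file.
text = (
    "Please process the payment for card number 4111-1111-1111-1111.\n"
    "The customer SSN is 123-45-6789.\n"
)

# Detections copied from the output above.
detections = [
    {"detector": "credit_card", "tag": "personal.financial", "start": 43, "end": 62},
    {"detector": "ssn", "tag": "personal.pii", "start": 84, "end": 95},
]


def matched_spans(text, detections):
    """Return (detector, matched substring) pairs using start/end offsets."""
    return [(d["detector"], text[d["start"]:d["end"]]) for d in detections]


spans = matched_spans(text, detections)
for detector, span in spans:
    print(f"{detector}: {span!r}")
```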


---

## Summary

| Method | What it scans | Returns |
|--------|--------------|----------|
| `scan_structured_input(payload)` | Nested dicts and lists, recursively | `StructuredScanResult` with `.decision`, `.detections`, `.filtered` |
| `scan_input(text)` | Flat text string | Scan result with decision and detections |
| `scan_file_input(path)` | `.json` (structured) or `.txt` (text) files | `dict` with `mode`, `decision`, `detections`, `filtered`, `file_path`, `size_bytes` |

All three methods work out of the box with `SafeAI.quickstart()` and require no external configuration.
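
As a closing illustration, the three entry points can sit behind one dispatcher that picks a method from the input type. This is only a sketch: `scan_any` and `_RecordingAI` are hypothetical, files are signalled by passing a `pathlib.Path`, and passing `agent_id` to `scan_input` is an assumption, since this notebook only shows that keyword on the other two methods.

```python
from pathlib import Path
from typing import Any


def scan_any(ai: Any, value: Any, agent_id: str = "demo") -> Any:
    """Route a value to the scanning method that matches its type."""
    if isinstance(value, Path):
        return ai.scan_file_input(str(value), agent_id=agent_id)
    if isinstance(value, (dict, list)):
        return ai.scan_structured_input(value, agent_id=agent_id)
    if isinstance(value, str):
        # Assumption: scan_input accepts agent_id like the other two methods.
        return ai.scan_input(value, agent_id=agent_id)
    raise TypeError(f"unsupported input type: {type(value).__name__}")


class _RecordingAI:
    """Stand-in that reports which method was called; not part of SafeAI."""

    def scan_file_input(self, path, agent_id):
        return "file"

    def scan_structured_input(self, payload, agent_id):
        return "structured"

    def scan_input(self, text, agent_id):
        return "text"


stub = _RecordingAI()
routes = [
    scan_any(stub, Path("note.txt")),
    scan_any(stub, {"query": "What is Python?"}),
    scan_any(stub, "What is Python?"),
]
print(routes)
```

In real use, `stub` would be replaced by the `ai` object created with `SafeAI.quickstart()`.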