
# RAG Security Lab: Memorization vs RAG, and 4 Layers of Protection

**You’ll learn to:**
- Explain LLM **memorization/regurgitation** at a high level, and why **Differential Privacy** (DP) is used to mitigate it.
- Build a small **RAG** system over a mock support-ticket dataset.
- Add **four security layers** to RAG:
  1) Access Control (the *Bouncer*)
  2) Data Anonymization (the *Sanitizer*)
  3) Prompt Constraints (the *Chaperone*)
  4) Output Filtering (the *Redactor*)

> This notebook includes a pure-Python in-memory vector index so it runs offline for class. Swap in Pinecone in the exercises to connect to a real vector DB.



# 1. Model Memorization & Differential Privacy

- Training compresses huge corpora into weights. That *can* lead to **memorization** of rare/private strings that later leak.
- **Differential Privacy (DP)** adds calibrated randomness (noise) during training so models learn **patterns**, not exact **records**.

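**Salary analogy:** everyone adds a private random offset, drawn from a zero-mean distribution, to their salary before sharing it. Across many people the offsets largely cancel, so the average stays close to the true value, while no individual's exact salary is exposed.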
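
The salary analogy can be tried out in a few lines. This is only a sketch of the analogy under assumed numbers (uniform salaries, Gaussian noise with a standard deviation of 10,000); it is not a formal DP mechanism. DP training methods such as DP-SGD clip per-example gradients and track a privacy budget in addition to adding noise.

```python
import random

def noisy_mean(values, noise_scale, seed=0):
    """Each participant adds zero-mean Gaussian noise locally; only noisy values are shared."""
    rng = random.Random(seed)
    shared = [v + rng.gauss(0.0, noise_scale) for v in values]
    return sum(shared) / len(shared), shared

# Hypothetical salaries, generated only for illustration
salary_rng = random.Random(42)
salaries = [salary_rng.uniform(40_000, 120_000) for _ in range(10_000)]

true_mean = sum(salaries) / len(salaries)
estimate, shared = noisy_mean(salaries, noise_scale=10_000)

print(f"true mean:    {true_mean:,.0f}")
print(f"noisy mean:   {estimate:,.0f}")
print(f"first person: real={salaries[0]:,.0f}  shared={shared[0]:,.0f}")
```

With 10,000 participants the two means should land close together, while any single shared value can be off by thousands.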

In this lab, we won’t retrain an LLM. Instead, we’ll focus on **RAG**, which keeps private data **outside** the model weights and behind a **policy wall**.





If running locally, create a fresh virtual environment and install packages as needed.


In [None]:
#pip install -U pinecone

In [None]:
# Install the packages used later in this notebook (embedding model, vector DB client, LLM client):
!pip install openai pinecone tiktoken sentence-transformers

# 2. Securing RAG

In [None]:



import json, re, math, uuid, ipaddress
import numpy as np
from pathlib import Path
from dataclasses import dataclass
from typing import List, Dict, Any



## 2.1 Loading the Data

We provide **raw** tickets (with PII) and an **anonymized** copy to demonstrate the Sanitizer pattern.


In [None]:
import json

# Load raw tickets
raw_tickets = []
with open('support_tickets_raw.jsonl', 'r') as f:
    for line in f:
        raw_tickets.append(json.loads(line))

# Load sanitized tickets
sanitized_tickets = []
with open('support_tickets_sanitized.jsonl', 'r') as f:
    for line in f:
        sanitized_tickets.append(json.loads(line))

print(f"Loaded {len(raw_tickets)} raw tickets")
print(f"Loaded {len(sanitized_tickets)} sanitized tickets")

In [None]:

#DATA_DIR = Path("data")
#raw_path = DATA_DIR / "support_tickets_raw.jsonl"


#def load_jsonl(p):
#    return [json.loads(x) for x in p.read_text(encoding="utf-8").splitlines()]

#raw_tickets = load_jsonl(raw_path)




In [None]:
raw_tickets[14]


## 2.2 Data Anonymization

The first step is protecting sensitive data by removing PII (Personally Identifiable Information) so that it never enters the system. PII includes details like names, emails, phone numbers, account numbers and IP addresses: anything that could identify a specific person. Our sanitizer replaces each sensitive element with a consistent placeholder, so the data still retains enough structure to be useful for analysis and retrieval. The version below covers emails, account numbers, IPv4 addresses and contact names.

This approach has two key benefits:
1. **Risk reduction**: even if a query slips through access control or a model tries to reveal more than it should, the sensitive data simply isn’t there to give away.
2. **Utility preservation**: placeholders remain consistent across documents, so the system can still link related tickets or detect patterns without exposing private details.

In [None]:
# Define the token formats we’ll use for each PII type
PLACEHOLDER_FMT = {
    "email":  "EMAIL_{:04d}",
 #   "phone":  "PHONE_{:04d}",
    "account":"ACCT_{:04d}",
    "ip":     "IP_{:04d}",
    "person": "PERSON_{:04d}",
}

# Global token vault:
# Keeps a STABLE, GLOBAL mapping: original value -> placeholder (per type)
# Ensures the same email, etc. always maps to the same token across tickets
class TokenVault:
    """
    Maps original values -> stable placeholders, per type.
    Example: vault.get_token("email", "alice@example.com") -> "{{EMAIL_0001}}"
    """
    def __init__(self):
        self._maps: Dict[str, Dict[str, str]] = {t:{} for t in PLACEHOLDER_FMT}
        self._counters: Dict[str, int] = {t:0 for t in PLACEHOLDER_FMT}

    def _normalize_key(self, typ: str, value: str) -> str:
        # Normalize by type (e.g., emails to lowercase)
        v = value.strip()
        if typ == "email":
            v = v.lower()
        elif typ == "ip":
            v = v.strip()
        elif typ == "account":
            v = re.sub(r"\D", "", v)
        elif typ == "person":
            v = re.sub(r"\s+", " ", v)
        return v

    def get_token(self, typ: str, value: str) -> str:
        # Return the existing token for (typ,value) OR create a new one
        # Final tokens are wrapped like "{{EMAIL_0001}}"
        assert typ in PLACEHOLDER_FMT, f"Unknown token type: {typ}"
        key = self._normalize_key(typ, value)
        if not key:
            return value
        if key not in self._maps[typ]:
            self._counters[typ] += 1
            token = "{{" + PLACEHOLDER_FMT[typ].format(self._counters[typ]) + "}}"
            self._maps[typ][key] = token
        return self._maps[typ][key]

    def stats(self) -> Dict[str, int]:
        return {k: len(v) for k, v in self._maps.items()}


# Regexes (free-text pass)
# -------------------------
# Patterns to find PII inside unstructured text fields (body, messages)
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Long numbers that might be account IDs
ACCOUNT_RE = re.compile(r"\b\d{8,12}\b")  # generic long number (use with caution in free text)
# IPv4 candidates in text. We’ll validate them with ipaddress before replacing.
IP_CANDIDATE_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


# Helpers
# -------------------------
# Find/replace emails in free text
def replace_emails(text: str, vault: TokenVault) -> str:
    return EMAIL_RE.sub(lambda m: vault.get_token("email", m.group(0)), text)

def replace_accounts_text(text: str, vault: TokenVault) -> str:
    # Find/replace long numeric sequences as account IDs in free text.
    def repl(m):
        num = m.group(0)
        return vault.get_token("account", num)
    return ACCOUNT_RE.sub(repl, text)

def replace_ips_text(text: str, vault: TokenVault) -> str:
    # Find/validate/replace IPv4 addresses in free text.
    def repl(m):
        cand = m.group(0)
        try:
            ipaddress.IPv4Address(cand)
            return vault.get_token("ip", cand)
        except Exception:
            return cand
    return IP_CANDIDATE_RE.sub(repl, text)

def anonymize_free_text(text: str, vault: TokenVault) -> str:
    # Apply free-text anonymization
    t = text
    t = replace_emails(t, vault)
 #   t = replace_phones_text(t, vault)
    t = replace_ips_text(t, vault)
    t = replace_accounts_text(t, vault)
    return t


# Field-level anonymization
# -------------------------
STRUCTURED_FIELD_RULES = [
    # Define which structured fields to anonymize, with an optional validator
    (("contact_email",), "email",  None),
    (("account_number",),"account", None),
    (("source_ip",),     "ip",     lambda s: _is_ipv4(str(s))),  # only tokenize valid IPv4 strings
    (("contact_name",),  "person", None),
]

def _is_ipv4(s: str) -> bool:
    # True if s is a valid IPv4 address
    try:
        ipaddress.IPv4Address(s)
        return True
    except Exception:
        return False

def set_in(obj: Dict[str, Any], path: tuple, value: Any) -> None:
    # Write a value at a key path (tuple of keys), creating intermediate dicts as needed
    d = obj
    for p in path[:-1]:
        d = d.setdefault(p, {})
    d[path[-1]] = value

def get_in(obj: Dict[str, Any], path: tuple) -> Any:
    # Read a value at a key path (tuple of keys) in a dict; return None if missing
    d = obj
    for p in path:
        if not isinstance(d, dict) or p not in d:
            return None
        d = d[p]
    return d

def anonymize_ticket(ticket: Dict[str, Any], vault: TokenVault) -> Dict[str, Any]:
    # Produce an anonymized copy of a single ticket:
    # 1) Structured fields: precise tokenization using rules above.
    # 2) Free-text fields: regex-based replacement (body + conversation messages)
    t = json.loads(json.dumps(ticket))

    # 1) Structured fields (email, account, IP, person)
    for path, typ, validator in STRUCTURED_FIELD_RULES:
        val = get_in(t, path)
        if not isinstance(val, str):
            continue
        if validator is None or validator(val):
            token = vault.get_token(typ, val)
            set_in(t, path, token)

    # 2) Free text fields (body)
    if isinstance(t.get("body"), str):
        t["body"] = anonymize_free_text(t["body"], vault)

    # 2b) Free text in conversation messages
    conv = t.get("conversation", [])
    if isinstance(conv, list):
        new_conv = []
        for turn in conv:
            if not isinstance(turn, dict):
                new_conv.append(turn); continue
            msg = turn.get("message")
            if isinstance(msg, str):
                turn = dict(turn)
                turn["message"] = anonymize_free_text(msg, vault)
            new_conv.append(turn)
        t["conversation"] = new_conv

    return t

def anonymize_corpus(tickets: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Anonymize a list of tickets
    vault = TokenVault()
    out = [anonymize_ticket(t, vault) for t in tickets]
    # You can inspect vault.stats() if you want counts per type
    return out

Now we'll call the `anonymize_corpus` function on `raw_tickets`. It returns a new list of tickets in which PII fields have been replaced with placeholders. We then write the result out in JSON Lines format:

In [None]:
sanitized = anonymize_corpus(raw_tickets)
out_path = Path("support_tickets_sanitized.jsonl")
with out_path.open("w", encoding="utf-8") as f:
    for doc in sanitized:
        f.write(json.dumps(doc, ensure_ascii=False) + "\n")

print("Wrote:", out_path)

Next, let's load the sanitized tickets back from disk:

In [None]:
sanitized_path = "support_tickets_sanitized.jsonl"

def load_jsonl(file_path):
    """Load JSONL file"""
    with open(file_path, 'r', encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Load the sanitized tickets
sanitized_tickets = load_jsonl(sanitized_path)

print(f"✅ Loaded {len(sanitized_tickets)} sanitized tickets")

Now let's look at the same support ticket in its raw and sanitized form to confirm the placeholders look right:

In [None]:
raw_tickets[10]

In [None]:
sanitized_tickets[10]
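
Eyeballing one ticket is a useful spot check, but a scan over the whole corpus gives more confidence. The sketch below is one possible way to do that; `find_leaks`, its `skip_keys` parameter and the sample tickets are hypothetical names and data introduced here for illustration. It assumes numeric-looking fields such as `ticket_id` are not PII and should be skipped.

```python
import json
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
LONG_NUMBER_RE = re.compile(r"\b\d{8,12}\b")

def find_leaks(tickets, skip_keys=("ticket_id",)):
    """Return (ticket_index, kind, matched_text) for every PII-looking string still present."""
    checks = {"email": EMAIL_RE, "ip": IPV4_RE, "account": LONG_NUMBER_RE}
    leaks = []
    for i, ticket in enumerate(tickets):
        # Serialize everything except the skipped top-level keys, then scan the text
        scanned = {k: v for k, v in ticket.items() if k not in skip_keys}
        blob = json.dumps(scanned, ensure_ascii=False)
        for kind, pattern in checks.items():
            leaks.extend((i, kind, m.group(0)) for m in pattern.finditer(blob))
    return leaks

# Toy data: the second ticket still contains an email the sanitizer missed
sample = [
    {"ticket_id": "87428682", "contact_email": "{{EMAIL_0001}}",
     "body": "Login from {{IP_0001}} failed."},
    {"ticket_id": "87428683", "contact_email": "{{EMAIL_0002}}",
     "body": "Please reply to bob@example.com."},
]
print(find_leaks(sample))
```

In the notebook, calling `find_leaks(sanitized_tickets)` should ideally return an empty list; anything it reports is worth inspecting by hand, since the regexes can produce false positives.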


## 2.3 Access Control - Preparing Metadata

This step is about defining the rules that decide who can see what. In practice, access control comes down to a few common checks:

1. **Tenant isolation** (multi-customer separation)
   - Each ticket belongs to an organization (`org`).
   - If the current user is from Zephyr Telecom, they should never see tickets from BlueShield Bank.
   - Policy: `ticket.org == user.org` (unless the user is an internal analyst).

2. **Department scoping**
   - Some tickets are HR-related, others IT, Security, etc.
   - If the user is only allowed to see HR tickets, you filter on `ticket.department`.
   - Policy: `ticket.department in user.allowed_departments`.

3. **Visibility flags** (internal vs. customer-facing)
   - Some data is internal-only, some can be shown to customers.
   - Example: add a `customer_visible: true/false` field.
   - Policy: if the user is a customer, they only see tickets where `customer_visible == true`.
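
The three policies above can be written as a single predicate. This is a minimal sketch: the `User` and `Ticket` classes and the `is_internal_analyst` flag are hypothetical names chosen for illustration, and the sketch assumes internal analysts bypass only the tenant check.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    org: str
    allowed_departments: set = field(default_factory=set)
    is_customer: bool = True
    is_internal_analyst: bool = False  # hypothetical flag for the analyst exception

@dataclass
class Ticket:
    org: str
    department: str
    customer_visible: bool = True

def can_view(user: User, ticket: Ticket) -> bool:
    # 1) Tenant isolation: same org, unless the user is an internal analyst
    if ticket.org != user.org and not user.is_internal_analyst:
        return False
    # 2) Department scoping
    if ticket.department not in user.allowed_departments:
        return False
    # 3) Visibility flag: customers only see customer-visible tickets
    if user.is_customer and not ticket.customer_visible:
        return False
    return True

customer = User(org="Zephyr Telecom", allowed_departments={"Security"})
print(can_view(customer, Ticket("Zephyr Telecom", "Security")))   # same org, allowed dept
print(can_view(customer, Ticket("BlueShield Bank", "Security")))  # different tenant
```

In the rest of the notebook the same rules are enforced as a metadata filter at retrieval time rather than as a Python check after the fact.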



The code below takes the sanitized tickets and attaches the access-control metadata that we'll need later for filtering:

In [None]:
def add_metadata(sanitized_tickets):
    enriched = []
    for t in sanitized_tickets:
        t2 = dict(t)  # making a copy
        t2["metadata"] = {
            "org": t.get("org"),
            "department": t.get("department"),
            "customer_visible": True,
            "tags": t.get("tags", []),
        }
        enriched.append(t2)
    return enriched

In [None]:
# Adding metadata to sanitized tickets
enriched_tickets = add_metadata(sanitized_tickets)

In [None]:
# Inspecting
enriched_tickets[10]

## 2.4 Preparing for Embeddings

Now that our tickets are sanitized and enriched with metadata, the next step is to prepare them for embeddings. Instead of embedding an entire ticket as one big block of text, we’ll **break it into smaller, more meaningful pieces**.

Each ticket will become multiple rows:
- 1 row for the ticket body
- 1 row for each message in the conversation

Treating these smaller chunks as separate rows has two big benefits:
1. it **improves retrieval accuracy**: we can return the exact part of a ticket that matches a query
2. it **respects model limitations on input length**

Along with the text, each row will carry its own metadata, such as organization, department and visibility, which will later let us enforce access control when querying the database.

In [None]:
## Break it into smaller, more meaningful pieces

def ticket_to_rows(t):
    rows = []
    tid = t["ticket_id"]

    # this is the per-chunk "catalog metadata"
    base_md = {
        **t["metadata"],            # org, department, tags, customer_visible
        "ticket_id": tid,           # for provenance & catalog writes
        "created_at": t.get("created_at"),
    }

    body = (t.get("body") or "").strip()
    if body:
        rows.append({
            "id": f"{tid}:body",
            "text": body,
            "metadata": {**base_md, "part": "body"}
        })

    for i, turn in enumerate(t.get("conversation", [])):
        # Skip turns that are not dicts or whose message is missing/None
        msg = (turn.get("message") or "").strip() if isinstance(turn, dict) else ""
        if not msg:
            continue
        rows.append({
            "id": f"{tid}:conv:{i}",
            "text": msg,
            "metadata": {**base_md, "part": "conversation", "role": turn.get("role"), "idx": i}
        })
    return rows

In [None]:
# Apply ticket_to_rows() function to every ticket in the dataset
rows = [r for t in enriched_tickets for r in ticket_to_rows(t)]

In [None]:
# Peek at the first few rows
for r in rows[:5]:
    print(json.dumps(r, indent=2))

## 2.5 Creating Embeddings

We will use the embedding model `all-MiniLM-L6-v2`, which produces 384-dimensional vectors:

In [None]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2", device = "cpu")

In [None]:
texts = [r["text"] for r in rows]

# Embedding in batches
embeddings = model.encode(
    texts,
    batch_size = 64,
    show_progress_bar = True,
    convert_to_numpy = True,
    normalize_embeddings = True
)

# Attaching embeddings back to rows as Python lists
for r, vec in zip(rows, embeddings):
    r["values"] = vec.tolist()

In [None]:
vectors = [
    {"id": r["id"], "values": r["values"], "metadata": r["metadata"]}
    for r in rows
]

In [None]:
# Peek at one record (displaying the full list would dump every 384-dimensional embedding)
vectors[0]

## 2.6 Upserting Data into Pinecone

In [None]:
from google.colab import userdata
from pinecone import Pinecone, ServerlessSpec

# Load API key from Google Colab secrets
pinecone_client = Pinecone(api_key=userdata.get('PINECONE_API_KEY'))

In [None]:
#import os
#from pinecone import Pinecone, ServerlessSpec

# Client
#pinecone_client = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

In [None]:
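# Creating an index (skipped if it already exists, so this cell can be re-run safely)
if "support-tickets-demo" not in pinecone_client.list_indexes().names():
    pinecone_client.create_index(name = "support-tickets-demo",
                                 dimension = 384,  # output size of all-MiniLM-L6-v2
                                 metric = "cosine",
                                 spec = ServerlessSpec(
                                     cloud = "aws",
                                     region = "us-east-1"
                                 ))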

In [None]:
index = pinecone_client.Index("support-tickets-demo")

In [None]:
BATCH = 200
total = len(vectors)

for i in range(0, total, BATCH):
    batch = vectors[i:i+BATCH]
    index.upsert(vectors=batch)

### 2.6.1 Storing Raw Data with SQLite

In production, the authoritative copy of our documents would live in a primary data store, usually a relational database such as Postgres or MySQL, or object storage (S3/GCS) referenced from a small catalog table. Pinecone holds only the embeddings plus minimal filterable metadata and a pointer back to the source.

In this demo we use `sqlite3` to create a simple local data store. Here "raw" means the chunk text itself rather than its embedding; the chunks are still the sanitized ones.

Here is how the pipeline will use it at query time:
1. We'll query Pinecone with ACL filters and get back chunk IDs/pointers.
2. We'll fetch the raw text for those IDs from sqlite3.
3. We'll pass only those snippets into the LLM to compose the answer.

So this sets us up for the next section of this notebook - constraining the model - where we instruct the LLM to answer strictly from those snippets.
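
The pointer pattern in the three steps above can be tried in isolation with an in-memory SQLite database. This is a toy sketch: the table has only two columns, the chunk texts are made up, and the list `ranked_ids` stands in for what a vector search would return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO chunks (id, text) VALUES (?, ?)",
    [
        ("t1:body", "VPN client fails after update."),
        ("t1:conv:0", "Reinstalling the client fixed it."),
        ("t2:body", "Password reset email never arrived."),
    ],
)

# Step 1 (stand-in): IDs as a vector search might return them, best match first
ranked_ids = ["t1:conv:0", "t1:body"]

# Step 2: resolve the pointers to text, keeping the ranked order
placeholders = ",".join("?" * len(ranked_ids))
found = dict(conn.execute(
    f"SELECT id, text FROM chunks WHERE id IN ({placeholders})", ranked_ids
))
snippets = [found[i] for i in ranked_ids if i in found]
conn.close()

# Step 3: only these snippets would be placed in the prompt
prompt_context = "\n\n".join(snippets)
print(prompt_context)
```

The text of `t2:body` never reaches the prompt because its ID was not among the retrieved pointers.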

Let's create a content catalog which stores the raw text for each chunk (plus basic provenance like ticket ID, org, department and timestamps):

In [None]:
import sqlite3
from pathlib import Path

CATALOG_PATH = Path("data/chunk_catalog.sqlite")

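def init_catalog(db_path=CATALOG_PATH):
    # Make sure the parent directory exists, then open (or create) the SQLite database file
    Path(db_path).parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()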
    cur.executescript("""
    PRAGMA journal_mode = WAL;
    PRAGMA synchronous = NORMAL;

    CREATE TABLE IF NOT EXISTS chunks (
      id         TEXT PRIMARY KEY,   -- e.g. "87428682:conv:2"
      ticket_id  TEXT,
      part       TEXT,               -- 'body' | 'conversation'
      idx        INTEGER,            -- turn index for conversation
      org        TEXT,
      department TEXT,
      created_at TEXT,
      text       TEXT                -- raw chunk text lives here (not in Pinecone)
    );

    CREATE INDEX IF NOT EXISTS idx_chunks_ticket ON chunks(ticket_id);
    CREATE INDEX IF NOT EXISTS idx_chunks_org_dept ON chunks(org, department);
    """)
    conn.commit()
    conn.close()

def write_chunks_to_catalog(rows, db_path=CATALOG_PATH):
    # Insert/overwrite chunk rows produced by ticket_to_rows()
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    data = []
    for r in rows:
        md = r["metadata"]
        data.append((
            r["id"],
            md.get("ticket_id"),
            md.get("part"),
            md.get("idx"),
            md.get("org"),
            md.get("department"),
            md.get("created_at"),
            r["text"],
        ))
    cur.executemany(
        "INSERT OR REPLACE INTO chunks (id, ticket_id, part, idx, org, department, created_at, text) VALUES (?,?,?,?,?,?,?,?)",
        data
    )
    conn.commit()
    conn.close()

# Initialization + load the current batch of chunks
init_catalog()
write_chunks_to_catalog(rows)
print("Catalog ready")

Before we run real queries, we sanity-check the vector index: confirm the index is ready and try fetching a single known vector by ID:

In [None]:
# Index readiness
desc = pinecone_client.describe_index("support-tickets-demo")
print("Index ready:", desc.status.get("ready"))

probe_id = rows[0]["id"]
fetched = index.fetch(ids=[probe_id])

exists = probe_id in (fetched.vectors or {})
print("Fetch exists in Pinecone:", exists)

if exists:
    vec_obj = fetched.vectors[probe_id]
    # Values
    vals = vec_obj.values
    print("Dimensions:", len(vals))

    # Metadata
    md = vec_obj.metadata
    print("Metadata:", md)

    # ACL filters
    if md:
        print("organization:", md.get("org"), "| department:", md.get("department"),
              "| part:", md.get("part"), "| idx:", md.get("idx"))

In [None]:
# Check some organization/department we expect to exist
conn = sqlite3.connect(CATALOG_PATH)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM chunks WHERE org=? AND department=?", ("Zephyr Telecom", "Security"))
print("Zephyr/Security chunks:", cur.fetchone()[0])
conn.close()

## 2.7 Querying the Database

**Access Control List (ACL) Filter**:
Now we’ll run semantic queries against Pinecone and inspect the returned matches. We specify who the user is (`user_org`) and which departments they’re allowed to see (`allowed_depts`), and we require that chunks be marked `customer_visible=True`. This filter is sent to Pinecone with the query, so retrieval already respects access control before any text is fetched.

In [None]:
# Building ACL filter
user_org = "Zephyr Telecom"
allowed_depts = {"Security"}
acl_filter = {
    "$and": [
        {"org": {"$eq": user_org}},
        {"department": {"$in": list(allowed_depts)}},
        {"customer_visible": {"$eq": True}},
    ]
}
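
To see what such a filter means, it can help to evaluate it locally against a metadata dict. The function below is an illustrative re-implementation of only the three operators used in this notebook (`$and`, `$eq`, `$in`); `matches_filter` is a hypothetical helper and not part of the Pinecone SDK, which applies the filter on the server.

```python
def matches_filter(metadata: dict, flt: dict) -> bool:
    """Evaluate a filter made of $and, $eq and $in against one metadata dict."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches_filter(metadata, sub) for sub in cond):
                return False
            continue
        for op, expected in cond.items():
            actual = metadata.get(key)
            if op == "$eq":
                ok = actual == expected
            elif op == "$in":
                ok = actual in expected
            else:
                raise ValueError(f"Operator not covered by this sketch: {op}")
            if not ok:
                return False
    return True

demo_filter = {
    "$and": [
        {"org": {"$eq": "Zephyr Telecom"}},
        {"department": {"$in": ["Security"]}},
        {"customer_visible": {"$eq": True}},
    ]
}
own_chunk = {"org": "Zephyr Telecom", "department": "Security", "customer_visible": True}
other_tenant = {"org": "BlueShield Bank", "department": "Security", "customer_visible": True}

print(matches_filter(own_chunk, demo_filter))
print(matches_filter(other_tenant, demo_filter))
```

A local evaluator like this can also serve as a second check on returned matches, in case a filter was accidentally omitted from a query.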

We'll encode the question into an embedding with the same model used for our data:

In [None]:
# Encoding the question
question = "Okta group membership was wrong; what was the fix?"
# Normalize the query the same way as the document embeddings
q_vec = model.encode([question], convert_to_numpy=True, normalize_embeddings=True)[0].tolist()

Finally, we query the Pinecone index to return the top 8 matches:

In [None]:
# Querying
res = index.query(
    vector=q_vec,
    top_k=8,
    include_metadata=True,
    filter=acl_filter
)

Each match shows the metadata we filtered on and the chunk type. Similarity scores can look "low" with short snippets but what matters is the relative ranking (body > relevant support turns > others):

In [None]:
# Returned matches
matches = res.get("matches", [])
print(f"Got {len(matches)} match(es).")
for m in matches:
    mid = m["id"]
    score = m["score"]
    md = m.get("metadata", {})
    print(f"{mid:25s}  score={score:.3f}  org={md.get('org')}  dept={md.get('department')}  part={md.get('part')}  role={md.get('role')}")

We can try the same query with a different organization in the filter, such as "BlueShield Bank". Pinecone now returns different chunks, all from BlueShield Bank tickets. That means our ACL filter is working: we no longer see Zephyr data, only BlueShield’s own relevant tickets about Okta-like issues.

In [None]:
# Building ACL filter
user_org = "BlueShield Bank"
allowed_depts = {"Security"}  # tweak to test
acl_filter = {
    "$and": [
        {"org": {"$eq": user_org}},
        {"department": {"$in": list(allowed_depts)}},
        {"customer_visible": {"$eq": True}},
    ]
}

# Encoding the question
question = "Okta group membership was wrong; what was the fix?"
q_vec = model.encode([question], convert_to_numpy=True, normalize_embeddings=True)[0].tolist()

# Querying
res = index.query(
    vector=q_vec,
    top_k=8,
    include_metadata=True,
    filter=acl_filter
)

# Returned matches
matches = res.get("matches", [])
print(f"Got {len(matches)} match(es).")
for m in matches:
    mid = m["id"]
    score = m["score"]
    md = m.get("metadata", {})
    print(f"{mid:25s}  score={score:.3f}  org={md.get('org')}  dept={md.get('department')}  part={md.get('part')}  role={md.get('role')}")


## 2.8 Constraining the Model

We'll use this strategy to **tell the LLM exactly how it’s allowed to behave**: use only the retrieved snippets, summarize rather than quote, avoid PII entirely, and prefer neutral, generic wording. These guardrails don’t replace access control or anonymization, but they reduce the chance of the model inventing specifics or echoing sensitive details. If anything still slips through, the downstream redactor layer can catch it; we cover that later in the notebook.

We’ll create a function that takes matched results from Pinecone (matched IDs) and returns their texts from SQLite, keeping the same order:

In [None]:
CATALOG_PATH = Path("data/chunk_catalog.sqlite")

def fetch_texts_by_ids(ids, db_path=CATALOG_PATH):
    # Fetch chunk text for a list of chunk IDs, preserving input order.
    if not ids:
        return {}
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    qmarks = ",".join("?" * len(ids))
    cur.execute(f"SELECT id, text FROM chunks WHERE id IN ({qmarks})", ids)
    rows = cur.fetchall()
    conn.close()
    # Map found ids -> text
    found = dict(rows)
    # Preserve Pinecone order; missing ids map to ""
    return {i: found.get(i, "") for i in ids}

Next, we'll create another function `build_context_from_matches()` that turns each Pinecone match into a small, numbered snippet such as `[1 | f5bfddc9:body]\n<text>`. The numbers preserve relevance order and give the model (and us) stable citation handles like `[1], [2]` to reference in the answer. We trim long snippets to keep the prompt compact so the LLM focuses on the most useful details, and we skip any IDs that didn’t resolve to text to avoid noise.

The result is a clean context: answers can cite exactly which chunk supported each claim, making the system more explainable.

In [None]:
def build_context_from_matches(matches, id_to_text, max_chars=4000):
    """
    Create labeled context blocks like:
    [1 | f5bfddc9:body]
    <snippet>

    Returns (context_str, used_ids)
    """
    blocks = []
    used_ids = []
    total = 0
    for i, m in enumerate(matches, start=1):
        cid = m["id"]
        txt = (id_to_text.get(cid) or "").strip()
        if not txt:
            continue
        # Light truncation to avoid oversized prompts
        snippet = txt if len(txt) <= 800 else (txt[:800] + " …")
        block = f"[{i} | {cid}]\n{snippet}"
        if total + len(block) + 2 > max_chars:
            break
        blocks.append(block)
        used_ids.append(cid)
        total += len(block) + 2
    return "\n\n".join(blocks), used_ids

Now let's finally see how we can constrain the model - we will set boundaries on what the LLM can use and how it should respond. Instead of letting it generate freely from its pretraining, we guide it with a strict system prompt and a small, curated set of retrieved snippets.

The rules for the model are simple:
- rely only on these snippets,
- summarize rather than quote,
- leave out personal or sensitive details,
- cite snippet labels,
- and abstain if no evidence is available.

We will work with `gpt-4o-mini` and create a helper function:

In [None]:
from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def answer_with_constrained_model(question, context):
    # Send the constrained prompt + context to the model.
    # Returns text or a fallback message if no context.

    if not context.strip():
        return "No eligible context was found for this user. Please refine the query or check permissions."

    resp = client.chat.completions.create(
        model = "gpt-4o-mini",
        temperature = 0,
        max_tokens = 450,
        messages = [
            {
                "role": "system",
                "content": (
                    # RULES
                    "You are a helpful security support assistant. "
                    "Answer ONLY using the provided context snippets. "
                    "Summarize; do not quote verbatim. "
                    "Do NOT include PII or specifics (no names, emails, phone numbers, IPs, account numbers, dates, project codes, or dollar figures). "
                    "If specifics are essential, use generic placeholders like [REDACTED]. "
                    "If the context is insufficient, say so and suggest safe next steps."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"SOURCES:\n{context}\n\n"
                    f"QUESTION: {question}\n\n"
                    "Answer in 3-5 sentences and cite sources by their bracket labels, e.g., [1], [2]."
                ),
            },
        ],
    )
    return resp.choices[0].message.content
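
Because the prompt asks the model to cite bracket labels, the answer can be checked against the labels that were actually in the context. The sketch below is one hedged way to do that; `check_citations` is a hypothetical helper, and it assumes labels of the form `[1 | chunk_id]` on their own line and citations written as `[1]`, so grouped forms such as `[1, 2]` would not be recognized.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")
LABEL_RE = re.compile(r"^\[(\d+) \| [^\]]+\]$", re.MULTILINE)

def check_citations(answer: str, context: str):
    """Return (valid_citations, unknown_citations) as sorted lists of label numbers."""
    available = {int(n) for n in LABEL_RE.findall(context)}
    cited = {int(n) for n in CITATION_RE.findall(answer)}
    return sorted(cited & available), sorted(cited - available)

demo_context = "[1 | t1:body]\nVPN client fails.\n\n[2 | t1:conv:0]\nReinstalling fixed it."
demo_answer = "Reinstalling the client resolved the issue [2]. See also [5]."

valid, unknown = check_citations(demo_answer, demo_context)
print("valid:", valid, "unknown:", unknown)
```

An answer with unknown citations, or with none at all, is a signal to fall back to an abstention message or to flag the response for review.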

Now, let's prepare the ACL filter:

In [None]:
# Vector search with ACL
user_org = "Zephyr Telecom"
allowed_depts = {"Security"}
acl_filter = {
    "$and": [
        {"org": {"$eq": user_org}},
        {"department": {"$in": list(allowed_depts)}},
        {"customer_visible": {"$eq": True}},
    ]
}

Then we turn the question into an embedding and run a similarity search with `top_k=8`, asking Pinecone to return metadata and applying the ACL filter:

In [None]:
# User question
question = "I clicked a phishing link and my endpoint is acting weird"

# Encoding
q_vec = model.encode([question], convert_to_numpy = True, normalize_embeddings = True)[0].tolist()

# Querying
res = index.query(
    vector = q_vec,
    top_k = 8,
    include_metadata = True,
    filter = acl_filter)

Finally, let's put that together: matches → context → answer

In [None]:
# 1. We grab the returned chunk IDs in ranked order. These are just pointers, they don’t contain the raw text.
matches = res.get("matches", [])
ids = [m["id"] for m in matches]

# 2. Fetch raw text for those IDs
id_to_text = fetch_texts_by_ids(ids)

# 3. Build labeled, trimmed context blocks
context, used_ids = build_context_from_matches(matches, id_to_text, max_chars=4000)
print("Context used:\n", context[:600], "...\n","\n-----------------\n")

# 4. Ask the model under strict rules
final_answer = answer_with_constrained_model(question, context)
print("Model's answer: ","\n", final_answer)

## 2.9 Output Filtering and Guardrails

As a final guard, we'll implement a "redactor" layer, which complements our earlier layers (ACL-filtered retrieval and constrained prompting). The function below checks the model's final answer just before it's shown to the user. Its job is to **catch and redact sensitive information that might have slipped through**, such as:

- real names,
- emails,
- 8-12 digit account numbers,
- IPv4/IPv6,
- customer IDs like CUST-2407,
- asset IDs like WS-1054

In [None]:
import re

# Emails
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

# 8–12 digit account-like numbers
ACCOUNT_RE = re.compile(r'(?<!\d)\d{8,12}(?!\d)')

# Customer IDs like CUST-2407
CUSTOMER_ID_RE = re.compile(r'\bCUST-\d{3,6}\b', re.IGNORECASE)

# Asset IDs like WS-1054 (2–4 uppercase letters + dash + 3–6 digits)
ASSET_ID_RE = re.compile(r'\b[A-Z]{2,4}-\d{3,6}\b')

# IPv4 0–255 per octet
IPV4_RE = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b'
)

# Basic IPv6 (not exhaustive but useful)
IPV6_RE = re.compile(r'\b(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}\b')

# Very light "name-like" pattern (two capitalized words).
NAME_RE = re.compile(r'\b([A-Z][a-z]+)\s+([A-Z][a-z]+)\b')

> NOTE on `NAME_RE`: this is a simple heuristic that treats any two consecutive capitalized words as a person's name. It can also hit non-person phrases like "Active Sessions" or "Conditional Access". To reduce false positives, we could:
> - add a small whitelist of domain phrases (e.g., {"Conditional Access","Active Directory"})
> - only apply when "person cues" appear nearby (e.g., 'contact', 'reported by', or an email in the sentence)
> - replace with a lightweight NER (e.g., spaCy PERSON)
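
The first option in the note, a small list of allowed domain phrases, could be sketched as follows. The set `DOMAIN_PHRASES` and the function `redact_names` are hypothetical names for illustration, and the phrases in the set are examples rather than a vetted list.

```python
import re

NAME_RE = re.compile(r'\b([A-Z][a-z]+)\s+([A-Z][a-z]+)\b')

# Hypothetical allowlist of capitalized domain phrases that are not person names
DOMAIN_PHRASES = {"Conditional Access", "Active Directory", "Active Sessions"}

def redact_names(text: str, allow=DOMAIN_PHRASES) -> str:
    def repl(m):
        phrase = f"{m.group(1)} {m.group(2)}"
        # Keep known domain phrases, redact everything else that looks like a name
        return m.group(0) if phrase in allow else "[REDACTED_NAME]"
    return NAME_RE.sub(repl, text)

demo_sentence = "Maria Lopez reported that Conditional Access blocked her login."
print(redact_names(demo_sentence))
```

One limitation remains: a capitalized word at the start of a sentence can pair with the word after it (for example "The Conditional"), so the allowlist reduces false positives without removing them.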

In [None]:
def redact_output(output: str) -> str:
    """
    Last-mile guard: replace PII-ish patterns in the model's output with placeholders.
    Designed around this dataset: emails, 8–12 digit accounts, IPs, customer & asset IDs,
    and name-like strings.
    """
    s = output

    # Order can reduce false positives: more specific first, then broader
    s = EMAIL_RE.sub('[REDACTED_EMAIL]', s)
    s = IPV4_RE.sub('[REDACTED_IP]', s)
    s = IPV6_RE.sub('[REDACTED_IP]', s)

    s = CUSTOMER_ID_RE.sub('[REDACTED_CUSTOMER_ID]', s)
    s = ASSET_ID_RE.sub('[REDACTED_ASSET_ID]', s)

    s = ACCOUNT_RE.sub('[REDACTED_ACCOUNT]', s)

    s = NAME_RE.sub('[REDACTED_NAME]', s)

    return s

Let's tie all the pieces together in one function that returns the context that was used plus the redacted answer. The function will:

1. **Apply ACL filter to retrieval**
2. **Fetch raw text from SQLite and build labeled context**
3. **Constrain model when generating answer and redact it**

In [None]:
def secure_rag_answer(
    *,
    question: str,
    index,
    model,
    acl_filter: dict,
    top_k: int = 8,
    namespace: str | None = None,
    max_context_chars: int = 4000,
):
    """
    End-to-end solution: ACL-filtered search -> fetch texts -> labeled context -> constrained LLM -> redaction
    Returns (context, safe_answer)
    """
    # 1) Encode and query Pinecone with ACLs
    q_vec = model.encode([question], convert_to_numpy=True, normalize_embeddings=True)[0].tolist()
    res = index.query(
        vector = q_vec,
        top_k = top_k,
        include_metadata = True,
        filter = acl_filter,
        **({"namespace": namespace} if namespace is not None else {})
    )
    matches = res.get("matches", [])
    ids = [m["id"] for m in matches]

    # 2) Fetch raw text from SQLite and build labeled context
    id_to_text = fetch_texts_by_ids(ids)
    context, _used_ids = build_context_from_matches(matches, id_to_text, max_chars=max_context_chars)

    # 3) Constrained answer, then redact
    draft = answer_with_constrained_model(question, context)
    safe_answer = redact_output(draft)

    return context, safe_answer


In [None]:
# User question
question = "I clicked a phishing link and my endpoint is acting weird"

context, safe_answer = secure_rag_answer(
    question = question,
    index = index,
    model = model,
    acl_filter = acl_filter,
    top_k = 8,
    namespace = None,
    max_context_chars = 4000
)

print("Context used:\n", context, "...\n","\n-----------------\n")
print("Model's answer:\n", safe_answer)

In [None]:
# User question
question = "Hi, my workstation popped up a malware alert from the endpoint agent."

context, safe_answer = secure_rag_answer(
    question = question,
    index = index,
    model = model,
    acl_filter = acl_filter,
    top_k = 8,
    namespace = None,
    max_context_chars = 4000
)

print("Context used:\n", context, "...\n","\n-----------------\n")
print("Model's answer:\n", safe_answer)