pii-proxy

A local reverse proxy that intercepts every outgoing request to Anthropic and OpenAI, replaces personal information and credentials with realistic pseudonyms, then restores the real values in responses — so the AI provider never sees your actual PII.

Install

One-click installer — download the file for your platform from the latest release and run it. No other files needed.

Platform	File
macOS	`pii-proxy-installer-mac.command` — double-click in Finder
Linux	`pii-proxy-installer-linux.sh` — `bash pii-proxy-installer-linux.sh`
Windows	`pii-proxy-installer-windows.bat` — double-click

The installer sets up a Python virtual environment, downloads dependencies and the spaCy language model, and configures the proxy to start automatically on login. Python 3.9+ is installed automatically if not found.

For a manual setup, see Quick start below.

Why pii-proxy

Zero changes to your prompts. Route your AI client through the proxy with one env var. Your workflow stays identical.
Deterministic pseudonyms. The same real value always produces the same fake, keeping the model's reasoning consistent and the upstream prompt cache warm.
Full round-trip fidelity. Responses are de-anonymized before they reach your screen. Tool calls and file writes contain the correct real values.
Covers what you forget. Beyond your explicit PII list, the proxy runs regex (email, phone, SSN, credit card, IP, ZIP, URL), a credential scanner (AWS keys, GitHub tokens, JWTs, Stripe keys, .env-style secrets), and spaCy NER — catching names and places you didn't think to list.
Multi-provider, single instance. One proxy handles both Anthropic (/v1/messages) and OpenAI (/v1/chat/completions) simultaneously.

How it works

Claude Code                OpenAI SDK client
    │  ANTHROPIC_BASE_URL=     │  OPENAI_BASE_URL=
    │  http://localhost:8082   │  http://localhost:8082
    └─────────────┬────────────┘
                  ▼
       pii_proxy.py  (aiohttp, port 8082)
                  │
                  ├─ route by path ──────────────────────────────────
                  │      /v1/messages          →  AnthropicProvider
                  │      /v1/chat/completions  →  OpenAIProvider
                  │      everything else       →  pass through untouched
                  │
                  ├─ anonymize request body  ────────────────────────────
                  │      [system prompt]      regex + known_pii                no NER
                  │      [latest user msg]    regex + known_pii + NER          full pipeline
                  │      [history user msgs]  regex + known_pii + map replay   no NER (fast)
                  │      [assistant turns]    regex + known_pii                no NER
                  │      [tool / tool_result] regex + known_pii + map replay   no NER
                  │
                  ├─ forward to upstream API  (pseudonymized request)
                  │
                  ├─ receive response
                  │
                  └─ deanonymize response  →  client sees real values
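The routing and per-message scoping in the diagram can be read as a small lookup table. The sketch below is hypothetical: `route`, `stages_for`, and the stage names are illustrative, not functions or identifiers from `pii_proxy.py`. It assumes the path arrives without its query string.

```python
from typing import Optional

# Path -> provider name, mirroring the diagram (illustrative table, not read from pii_proxy.py).
ROUTES = {
    "/v1/messages": "AnthropicProvider",
    "/v1/chat/completions": "OpenAIProvider",
}


def route(path: str) -> Optional[str]:
    """Provider for a request path (query string already stripped), or None to pass through."""
    return ROUTES.get(path)


def stages_for(part: str, is_latest_user: bool = False) -> list[str]:
    """Detection stages for one part of the body: system, user, assistant, tool, tool_result."""
    stages = ["regex", "known_pii"]
    if part == "user":
        # Only the newest user message pays for spaCy NER; older ones replay the map.
        stages.append("ner" if is_latest_user else "map_replay")
    elif part in ("tool", "tool_result"):
        stages.append("map_replay")
    # The system prompt and assistant turns get regex + known_pii only.
    return stages
```

A path that is not in the table returns `None`, which corresponds to the "pass through untouched" branch.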

Detection pipeline

Stage 1  known_pii.yaml   exact match (highest precision, zero false positives)
Stage 2a PATTERNS regex   email, phone, SSN, credit card, IP, ZIP, URL
Stage 2b secret_scan      AWS keys, GitHub tokens, Slack tokens, JWT, private keys,
                          Stripe/OpenAI/Anthropic keys, ENV-style KEY=value secrets
Stage 3  spaCy NER        PERSON (≥2 words), GPE, LOC  — latest user message only
Stage 3' map replay       fast string-match against session map  — history messages

First match wins — known_pii > regex > NER for the same string. Values listed under ignore: are exempt from all stages. Replacements are applied longest-first to prevent partial matches (e.g. "John" never clobbers "Johnson").
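One way to get longest-first behavior is a single regex pass whose alternation is sorted by length. The following is a minimal sketch that assumes a plain real→fake dict and a set of ignored values. `replace_longest_first` is a hypothetical name, not necessarily what `anonymizer.py` does.

```python
import re


def replace_longest_first(text: str, mapping: dict[str, str],
                          ignore: frozenset[str] = frozenset()) -> str:
    """Replace each real value in `mapping` with its fake in one regex pass."""
    # Longest keys first: Python's alternation takes the first branch that matches
    # at a position, so "Johnson" must be tried before "John".
    keys = sorted((k for k in mapping if k and k not in ignore), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


mapping = {"John": "Grace", "Johnson": "Daniels"}
print(replace_longest_first("John Johnson", mapping))  # Grace Daniels
```

If the keys were not sorted, the pattern `John|Johnson` would turn "Johnson" into "Graceson". The sketch does not apply word boundaries, and the README does not say whether the proxy does.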

NER scoping: spaCy only runs on the newest user message. All prior user messages and tool results use a fast string-match against the session map — anything NER ever discovered is already stored there, so no coverage is lost and NER cost stays constant regardless of conversation length.
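Map replay and the response round trip can both be expressed as string substitution over the stored map. The sketch below uses `ReplayMap`, a hypothetical in-memory stand-in, rather than the project's `SessionMap`. Its method names `record` and `replay` are assumptions.

```python
import re


class ReplayMap:
    """Hypothetical in-memory stand-in for the session map (real -> fake)."""

    def __init__(self) -> None:
        self.real_to_fake: dict[str, str] = {}

    def record(self, real: str, fake: str) -> None:
        # Called whenever any stage (known_pii, regex, NER) discovers a value.
        if real and fake:
            self.real_to_fake.setdefault(real, fake)

    @staticmethod
    def _sub(text: str, table: dict[str, str]) -> str:
        if not table:
            return text
        keys = sorted(table, key=len, reverse=True)  # longest first
        pattern = re.compile("|".join(map(re.escape, keys)))
        return pattern.sub(lambda m: table[m.group(0)], text)

    def replay(self, text: str) -> str:
        """Pseudonymize history text using only values already in the map (no NER)."""
        return self._sub(text, self.real_to_fake)

    def deanonymize(self, text: str) -> str:
        """Restore real values by applying the inverted map."""
        return self._sub(text, {f: r for r, f in self.real_to_fake.items()})
```

Suppose NER found "Alice Walker" in an earlier turn and it was stored with `record("Alice Walker", "Grace Daniels")`. Calling `replay` on that old message then produces the same fake with no spaCy call.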

File path and localhost exemptions: Username segments inside /Users/<name>/ and /home/<name>/ paths are never anonymized — anonymizing them would break file operations. Similarly, http://localhost and 127.x.x.x addresses are exempt from the URL and IP regex stages.
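The following sketch covers the path and `127.x.x.x` exemptions. The patterns and helper names are hypothetical, and the real regexes in `anonymizer.py` may differ. The `http://localhost` URL exemption would be an analogous check in the URL stage.

```python
import ipaddress
import re

# Hypothetical patterns; the project's own regexes may differ.
HOME_USER = re.compile(r"/(?:Users|home)/([^/\s]+)/")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def protected_spans(text: str) -> list[tuple[int, int]]:
    """Character spans of usernames inside /Users/<name>/ or /home/<name>/ paths."""
    return [m.span(1) for m in HOME_USER.finditer(text)]


def ip_candidates(text: str) -> list[str]:
    """IPv4 addresses the IP stage would redact, skipping loopback (127.0.0.0/8)."""
    found = []
    for m in IPV4.finditer(text):
        try:
            addr = ipaddress.IPv4Address(m.group(0))
        except ValueError:
            continue  # e.g. 999.1.1.1 looks like an IP but is not a valid address
        if not addr.is_loopback:
            found.append(str(addr))
    return found
```

Spans returned by `protected_spans` would be excluded from replacement, so a path such as `/Users/alice/project/.env` still resolves on disk.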

Pseudonymization

fake_for(label, original) seeds Faker with md5(original)[:8] so the same real value always produces the same fake.
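The determinism comes from deriving the seed from the value itself. To keep this sketch runnable with the standard library, a tiny hypothetical name list stands in for Faker. `seed_for` and `fake_person` are illustrative names, not the project's code.

```python
import hashlib
import random

# Tiny stand-in for Faker's name provider (illustrative only).
FIRST = ["Grace", "Samuel", "Maria", "Owen", "Priya", "Lena"]
LAST = ["Daniels", "Espinoza", "Steele", "Bond", "Huff", "Okafor"]


def seed_for(original: str) -> int:
    """First 8 hex characters of md5(original), read as an integer seed."""
    return int(hashlib.md5(original.encode("utf-8")).hexdigest()[:8], 16)


def fake_person(original: str) -> str:
    # A private Random seeded from the value gives the same fake on every call
    # and after restarts (unlike hash(), which is salted per process).
    rng = random.Random(seed_for(original))
    return f"{rng.choice(FIRST)} {rng.choice(LAST)}"
```

With the real library, the analogous step would be `Faker().seed_instance(seed)` followed by a provider call such as `.name()`. The README does not say whether `fake_for` uses `seed_instance` or the class-level `Faker.seed`.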

Label	Fake looks like
PERSON	`Grace Daniels`
EMAIL	`espinozasamuel@example.net`
PHONE	`+737-907-7967x1625`
ADDRESS	`USS Steele, FPO AE 51334`
EMPLOYER / ORG	`Steele, Bond and Huff`
SECRET_AWS_KEY	`AKIAxxx...` (AKIA prefix preserved)
SECRET_GITHUB_PAT	`ghp_xxx...`
SECRET_JWT	same segment lengths, random base64
IP_ADDRESS	valid random IPv4
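Two rows of the table describe format-preserving fakes: the AWS key keeps its prefix, and the JWT keeps its segment lengths. The sketch below shows one way to produce such fakes deterministically. The function names and alphabets are assumptions, and the resulting JWT is not a valid token.

```python
import hashlib
import random
import string

B64URL = string.ascii_letters + string.digits + "-_"


def _rng(original: str) -> random.Random:
    # Same md5-prefix seeding idea as fake_for, so fakes are stable per value.
    return random.Random(int(hashlib.md5(original.encode("utf-8")).hexdigest()[:8], 16))


def fake_aws_key(original: str) -> str:
    """Keep the 4-character prefix (e.g. AKIA) and randomize the rest with A-Z0-9."""
    rng = _rng(original)
    alphabet = string.ascii_uppercase + string.digits
    return original[:4] + "".join(rng.choice(alphabet) for _ in original[4:])


def fake_jwt(original: str) -> str:
    """Same number and lengths of dot-separated segments, random base64url characters."""
    rng = _rng(original)
    return ".".join("".join(rng.choice(B64URL) for _ in seg) for seg in original.split("."))
```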

Requirements

macOS for the manual Quick start below (it uses launchd for auto-start); the proxy itself runs on any OS
Python 3.9+
~685 MB RAM for the spaCy NER model

Quick start

1. Install dependencies

cd ~/path/to/pii-proxy
python3 -m venv venv
./venv/bin/pip install -r requirements.txt
./venv/bin/python -m spacy download en_core_web_sm

Tip: If spaCy is already installed system-wide (via uv or Homebrew) and the model won't load inside venv, download the wheel directly:
./venv/bin/pip install "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl"

2. Create your PII list

mkdir -p -m 700 ~/.pii-proxy
cp known_pii.example.yaml ~/.pii-proxy/known_pii.yaml
chmod 600 ~/.pii-proxy/known_pii.yaml
# edit with your real names, emails, phones, addresses, employer, family

3. Install the launchd service

launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist
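This step assumes `com.jai.pii-proxy.plist` already exists in `~/Library/LaunchAgents/`. If you need to write it yourself, the sketch below builds a plausible agent definition with the standard `plistlib` module. The label matches the `launchctl` commands in this README, and `StandardErrorPath` matches the log used under Debugging. The interpreter and script paths, `WorkingDirectory`, and `KeepAlive` are assumptions.

```python
import plistlib
from pathlib import Path


def build_plist(repo: Path) -> bytes:
    """Hypothetical launchd agent definition for the proxy; adjust paths to your checkout."""
    agent = {
        "Label": "com.jai.pii-proxy",
        "ProgramArguments": [str(repo / "venv" / "bin" / "python"), str(repo / "pii_proxy.py")],
        "WorkingDirectory": str(repo),
        "RunAtLoad": True,   # start when the agent is loaded (e.g. at login)
        "KeepAlive": True,   # relaunch if the process exits
        "StandardErrorPath": "/tmp/pii-proxy.err",
        "EnvironmentVariables": {},  # e.g. {"PII_PDF_SCAN": "true"}, see PDF handling
    }
    return plistlib.dumps(agent)
```

Write the returned bytes to `~/Library/LaunchAgents/com.jai.pii-proxy.plist` before running `launchctl bootstrap`. If the installer already created that file, prefer its version.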

4. Route your AI clients through the proxy

Add to ~/.zshrc (or ~/.bashrc):

export ANTHROPIC_BASE_URL=http://localhost:8082
export OPENAI_BASE_URL=http://localhost:8082

Restart your terminal (and any AI clients) to pick up the change.

5. Verify

curl -s http://localhost:8082/health | python3 -m json.tool

You should see "status": "ok" and a map_entries count. Send a message in Claude Code — the count will grow.
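To script this check, for example in a login hook, a small standard-library helper can replace `curl`. The sketch assumes `status` and `map_entries` are top-level keys in the `/health` JSON, as the output above suggests. The function names are hypothetical.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8082/health"


def check_health(body: str) -> int:
    """Validate a /health payload and return its map_entries count."""
    data = json.loads(body)
    if data.get("status") != "ok":
        raise RuntimeError(f"proxy unhealthy: {data!r}")
    return int(data["map_entries"])


def fetch_health(url: str = HEALTH_URL, timeout: float = 2.0) -> int:
    """Query the running proxy; raises URLError if nothing is listening."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return check_health(resp.read().decode("utf-8"))
```

Calling `fetch_health()` before and after sending a message in Claude Code should show the count increase.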

`known_pii.yaml` structure

identity:
  names:
    - Your Full Name
    - Nickname
  emails:
    - you@example.com
  phones:
    - "+1-555-000-0000"
  addresses:
    - 123 Main St, Springfield IL 62701

employer:
  names:
    - Company Name
    - ABBREV
  domains:
    - company.com

family:
  - names: ["Spouse Name", "Spouse"]
    relationship: spouse
  - names: ["Child Name"]
    relationship: child

projects:
  - codename: InternalName
    real_name: ExternalBrandName

ignore:
  - 8082        # port number — not sensitive
  - 127.0.0.1   # localhost — not sensitive
  # - v2.1.3   # version string the IP regex catches incorrectly
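To see what Stage 1 matches, it helps to flatten this structure into (label, value) pairs. The sketch below works on the already-parsed dict; in practice that dict would come from PyYAML's `yaml.safe_load`. The `DOMAIN` and `PROJECT` labels are hypothetical, and treating both `codename` and `real_name` as values to hide is an assumption.

```python
from typing import Any


def flatten_known_pii(data: dict[str, Any]) -> tuple[list[tuple[str, str]], set[str]]:
    """Turn parsed known_pii.yaml into (label, value) pairs plus the ignore set."""
    pairs: list[tuple[str, str]] = []
    identity = data.get("identity") or {}
    for key, label in (("names", "PERSON"), ("emails", "EMAIL"),
                       ("phones", "PHONE"), ("addresses", "ADDRESS")):
        pairs += [(label, str(v)) for v in identity.get(key) or []]
    employer = data.get("employer") or {}
    pairs += [("EMPLOYER", str(v)) for v in employer.get("names") or []]
    pairs += [("DOMAIN", str(v)) for v in employer.get("domains") or []]  # hypothetical label
    for member in data.get("family") or []:
        pairs += [("PERSON", str(v)) for v in member.get("names") or []]
    for project in data.get("projects") or []:  # hypothetical label and treatment
        pairs += [("PROJECT", str(project[k])) for k in ("codename", "real_name") if k in project]
    # YAML parses an unquoted 8082 as an int, so normalize ignore entries to strings.
    ignore = {str(v) for v in data.get("ignore") or []}
    return pairs, ignore
```

The string conversion matters for `ignore:`. An unquoted `8082` arrives as an integer, while `127.0.0.1` and `v2.1.3` arrive as strings.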

Tips:

List every alias you go by — Stage 1 is exact-match only.
Single words (e.g. a first name alone) won't be caught by NER (requires ≥2 words), so add them explicitly here.
Use ignore: for values the pipeline flags incorrectly (port numbers, internal IPs, version strings).
Changes take effect on proxy restart.

Managing the proxy

# Status and map entry count
curl -s http://localhost:8082/health

# View the full real→fake map
curl -s http://localhost:8082/map | python3 -m json.tool

# Restart (picks up changes to pii_proxy.py or known_pii.yaml)
launchctl kickstart -k gui/$(id -u)/com.jai.pii-proxy

# Stop
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist

# Start
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist

# Reset the pseudonym map (all fakes regenerate on next request)
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist
rm ~/.pii-proxy/map.json
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist

Debugging

Live redaction log

tail -f /tmp/pii-proxy.err

Labels in the log correspond to which part of the request body triggered the redaction:

[system]      — system prompt
[user]        — latest user message (full NER); history user messages (map replay)
[assistant]   — prior assistant turns
[tool]        — OpenAI tool role (map replay)
[tool_result] — Anthropic tool_result blocks

Example:

2026-05-17 08:29:30 INFO   [system] redacted: 'you@company.com' → 'john85@example.org'
2026-05-17 08:29:30 INFO   [user] redacted: 'Your Name' → 'Grace Daniels'
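To summarize a session's redactions, you can parse these lines. The sketch below assumes the exact format shown above, with values wrapped in single quotes. A value containing a single quote may be logged with different quoting and would then be skipped. `parse_redactions` is a hypothetical helper.

```python
import re

# Matches lines like: 2026-05-17 08:29:30 INFO   [user] redacted: 'Your Name' → 'Grace Daniels'
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+INFO\s+\[(?P<part>[a-z_]+)\] redacted: "
    r"'(?P<real>.*)' → '(?P<fake>.*)'$"
)


def parse_redactions(lines):
    """Yield (part, real, fake) for each redaction line; other lines are skipped."""
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m:
            yield m.group("part"), m.group("real"), m.group("fake")
```

For example, `parse_redactions(open("/tmp/pii-proxy.err", encoding="utf-8"))` lists each redaction with its label. The output contains your real values, so treat it as sensitive.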

Why are old messages being redacted on every request?

Claude Code sends the full conversation history in every API call. The proxy scans all of it — not just your latest message. History messages use fast map replay rather than spaCy NER, so the cost stays flat regardless of conversation length.

Run tests

cd ~/path/to/pii-proxy
./venv/bin/python tests/test_roundtrip.py

Test a specific string manually

./venv/bin/python - <<'EOF'
import os

from anonymizer import anonymize_text, load_nlp, load_known_pii
from session_map import SessionMap

nlp = load_nlp()  # loads the spaCy model
smap = SessionMap(path=None)
known_pii = load_known_pii(os.path.expanduser("~/.pii-proxy/known_pii.yaml"))

text = "My name is Your Name, email is you@company.com"
anon, rep = anonymize_text(text, nlp, smap, known_pii)
print("Anonymized:", anon)
print("Restored:", smap.deanonymize(anon))
EOF

PDF handling

By default, PDF blocks pass through unmodified and are decoded by Anthropic's servers, so any PII inside a PDF reaches the provider.

To enable PDF text extraction and PII scanning, set PII_PDF_SCAN=true and install pymupdf:

./venv/bin/pip install "pymupdf>=1.24"
export PII_PDF_SCAN=true

For a permanent setting, add PII_PDF_SCAN to the EnvironmentVariables dict in your launchd plist, then reload:

launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist

When enabled, each PDF block (`"type": "document"`) is extracted with pymupdf, the full detection pipeline runs on the extracted text, and the block is replaced with pseudonymized plain text before forwarding.
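The sketch below outlines one possible version of that replacement step, assuming base64-encoded PDF document blocks. The shape of the replacement block (`{"type": "text", ...}`) and the helper names are assumptions. The extractor and anonymizer are passed in as parameters, so the logic can be exercised without pymupdf installed.

```python
import base64
from typing import Callable


def extract_pdf_text(pdf_bytes: bytes) -> str:
    """Extract the text layer with PyMuPDF (imported lazily; pymupdf>=1.24)."""
    try:
        import pymupdf
    except ImportError:  # older releases only expose the `fitz` name
        import fitz as pymupdf
    with pymupdf.open(stream=pdf_bytes, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)


def replace_pdf_block(block: dict, anonymize: Callable[[str], str],
                      extract: Callable[[bytes], str] = extract_pdf_text) -> dict:
    """Swap a base64 PDF document block for a pseudonymized plain-text block."""
    source = block.get("source") or {}
    if (block.get("type") != "document" or source.get("type") != "base64"
            or source.get("media_type") != "application/pdf"):
        return block  # leave every other block untouched
    text = extract(base64.b64decode(source["data"]))
    return {"type": "text", "text": anonymize(text)}
```

The plain-text result accounts for the tradeoffs listed below: images and layout are not carried over, and a scanned PDF with no text layer produces an empty string.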

Tradeoffs:

	PII_PDF_SCAN off	PII_PDF_SCAN on
PII in PDFs redacted	No	Yes
Claude sees PDF formatting	Yes	No — plain text only
Claude sees images in the PDF	Yes	No — images are discarded
Scanned PDFs (image-based)	Readable by Claude	Blank — no text layer to extract
Processing overhead	None	~5–20ms per page

Best for: text-heavy documents where layout is not critical (contracts, reports, HR documents). Leave disabled when Claude needs to reason about visual layout, forms, or embedded images.

Performance

Component	Cost	Scales with
spaCy NER	5–50ms	fixed per request (latest message only)
Regex + secret scan	<1ms	message size
Map replay (history)	<1ms	session map size × history length
Streaming deanonymize	<1ms per chunk	chunk size
Localhost loopback	<1ms	—
spaCy model in RAM	~685MB fixed	—

The upstream API dominates latency (1–30+ seconds). With PDF scanning disabled, proxy overhead is well under 100ms; with it enabled, add roughly 5–20ms per PDF page.
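During streaming, a fake value can be split across two chunks, for example "Grace Dan" in one chunk and "iels" in the next. The README does not say how the proxy handles this. A common approach is to hold back a short tail of the text until the next chunk arrives. The sketch below assumes it receives decoded text deltas rather than raw SSE frames, and `StreamDeanonymizer` is a hypothetical name.

```python
import re


class StreamDeanonymizer:
    """Restore real values in streamed text even when a fake spans chunk boundaries.

    Holds back the last (longest_fake - 1) characters: any fake that is not yet
    complete must start inside that tail.
    """

    def __init__(self, fake_to_real: dict[str, str]) -> None:
        self._map = fake_to_real
        keys = sorted((k for k in fake_to_real if k), key=len, reverse=True)
        self._pattern = re.compile("|".join(map(re.escape, keys))) if keys else None
        self._hold = max((len(k) for k in keys), default=1) - 1
        self._buf = ""

    def feed(self, chunk: str) -> str:
        self._buf += chunk
        if self._pattern is None:
            out, self._buf = self._buf, ""
            return out
        cut = max(0, len(self._buf) - self._hold)
        parts, pos = [], 0
        for m in self._pattern.finditer(self._buf):
            if m.start() >= cut:
                break  # might still grow into a longer fake; wait for more data
            parts.append(self._buf[pos:m.start()])
            parts.append(self._map[m.group(0)])
            pos = m.end()
        cut = max(cut, pos)  # a complete match may extend past the cut
        parts.append(self._buf[pos:cut])
        self._buf = self._buf[cut:]
        return "".join(parts)

    def flush(self) -> str:
        """Call once at the end of the stream to emit whatever is still buffered."""
        rest, self._buf = self._buf, ""
        if self._pattern is None:
            return rest
        return self._pattern.sub(lambda m: self._map[m.group(0)], rest)
```

With this approach, each chunk costs at most one regex scan over the held-back tail plus the new text, which keeps per-chunk work small.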

Common issues

Symptom	Cause	Fix
`curl health` returns connection refused	Proxy not running	`launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist`
spaCy model not found at startup	Model installed to wrong environment	Run `./venv/bin/python -m spacy download en_core_web_sm`
Real name not redacted	Single-word name not in `known_pii.yaml`	NER requires ≥2 words; add the name explicitly to the YAML
Fake values appear in Claude's response or tool calls	A value was not deanonymized	Streaming tool inputs are deanonymized; check the log for a missing label
Map grows without bound	Each unique real value gets one entry	Expected; entries are tiny (~100 bytes each)
Fakes changed after map delete	Map deleted without proxy restart	Stop proxy → delete map → start proxy; never delete while running
`ANTHROPIC_BASE_URL` not picked up	Env var set after Claude Code launched	Restart Claude Code after setting the env var
OpenAI requests not redacted	Using wrong path	Confirm client sends to `/v1/chat/completions`; other paths pass through unmodified

Security notes

~/.pii-proxy/ is mode 0700; map.json and known_pii.yaml are mode 0600.
The proxy listens on 127.0.0.1 only, so the /map endpoint is not reachable from the network.
Deny rules in ~/.claude/settings.json block Claude from reading ~/.pii-proxy/** directly.
Secrets (AWS keys, tokens, etc.) are pseudonymized, not erased. The proxy holds the real value in memory and in map.json; the upstream API only ever sees the fake. De-anonymization restores real values so model-generated tool calls (e.g. writing a .env file) contain correct credentials on your disk.

Disabling the proxy

To run Claude Code without anonymization, you need to both unset the env var and relaunch Claude Code (it inherits env vars at startup, not dynamically).

Temporarily (current terminal session only):

unset ANTHROPIC_BASE_URL
unset OPENAI_BASE_URL
# relaunch Claude Code from this terminal

The proxy can stay running — Claude Code just won't route through it.

Permanently (until you re-enable):

Comment out the lines in ~/.zshrc:

# export ANTHROPIC_BASE_URL=http://localhost:8082
# export OPENAI_BASE_URL=http://localhost:8082

Open a new terminal and relaunch Claude Code.

To also stop the proxy process:

launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist

To re-enable:

launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.jai.pii-proxy.plist
# uncomment ANTHROPIC_BASE_URL/OPENAI_BASE_URL in ~/.zshrc, then restart terminal + Claude Code

The env var is the real switch — the proxy can be running but harmless as long as Claude Code doesn't point at it.

Contributing

Issues and pull requests are welcome. Before submitting a change:

Run the test suite: ./venv/bin/python tests/test_roundtrip.py
Keep new detection patterns in secret_scan.py or anonymizer.py as appropriate
Add a test case in tests/test_roundtrip.py for any new PII type or edge case

License

Apache 2.0 — see LICENSE.
