🛡️ OCULTAR Monorepo

Zero-egress PII refinery for AI pipelines. Runs in your infrastructure. Your data never leaves.

Important

Featured Article: OpenAI shipped a model. We built the system. 📖 Read on dev.to

Quick Security Stats

Stat	Value
SSRF bypass vectors found + fixed	2
Fail-closed scenarios tested	6
Vault persistence	Named Docker volume
Tier 2 engine	OpenAI Privacy Filter (Apache 2.0)
Key management	Doppler

Welcome to the Unified OCULTAR Engine. This monorepo contains the core refinery, integrated applications, and enterprise security tiers.

Structure

/apps/ - Applications (Proxy, Sombra Gateway, SLM Engine, Dashboard, Automation Bridge, Web)
/services/ - Core backend logic (Refinery, Vault, Mock API)
/enterprise/ - Enterprise security extensions & licensing logic
/internal/pii/ - Centralized PII detection engine & registry
/extensions/ - Third-party AI tool integrations (Goose MCP, etc.)
/docs/ - Technical and product documentation
/security/ - Regulatory policies and integrity manifests

Security Model

OCULTAR is built on a Zero-Trust for Data architecture. It is designed for senior security engineers who require verifiable guarantees before connecting internal data to external AI providers.

Zero-Egress: A hard architectural guarantee. All PII detection and tokenization happen within your trust boundary. No network calls are made to third-party detection providers.
Fail-Closed: Six critical failure modes are tested (SLM timeout, vault failure, empty boot-guard, queue saturation, refinery internal error, re-hydration failure). In every case, OCULTAR blocks the request rather than degrading to plaintext exposure.
SSRF Protection: Hardened IP/DNS validation blocks the RFC 1918 private ranges and the 169.254.169.254 instance metadata (IMDS) endpoint, with protection against DNS rebinding. 2 bypass vectors (including IPv6 loopback and non-standard decimal encoding) were identified and patched during adversarial testing.
Secure Vault: AES-256-GCM encryption with keys derived via HKDF-SHA256. The vault is persisted via a named Docker volume to survive redeployments while keeping the master key in memory.
Ed25519 Audit Logs: Tamper-evident, hash-chained audit trails signed with Ed25519. Every vault event (matching or vaulting) is logged for SIEM ingestion and compliance verification.
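
To make the SSRF Protection item above more concrete, here is a minimal sketch of a literal-IP check using only the Go standard library. The function name `isBlockedAddr` is hypothetical and this is not OCULTAR's actual validator. It assumes the caller passes a literal address string, and it rejects anything that does not parse as a canonical IP, which is one way to fail closed on inputs such as decimal-encoded addresses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isBlockedAddr reports whether a literal IP string must be rejected as an
// outbound destination. Anything that does not parse as a canonical IP is
// rejected too, so inputs such as the decimal form "2130706433" fail closed.
// (Hypothetical helper for illustration only.)
func isBlockedAddr(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return true // fail closed on non-standard encodings
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 as 10.0.0.1
	return addr.IsLoopback() || // 127.0.0.0/8 and ::1
		addr.IsPrivate() || // RFC 1918 ranges and fc00::/7
		addr.IsLinkLocalUnicast() || // 169.254.0.0/16 (covers IMDS) and fe80::/10
		addr.IsUnspecified() // 0.0.0.0 and ::
}

func main() {
	candidates := []string{"10.0.0.5", "169.254.169.254", "::1", "2130706433", "93.184.216.34"}
	for _, s := range candidates {
		fmt.Printf("%-16s blocked=%v\n", s, isBlockedAddr(s))
	}
}
```

A check like this only helps against DNS rebinding if it is applied to the address that is actually dialed, not just to the result of an earlier lookup.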

Multi-Tier Refinery Pipeline

Tokenization is handled via a defense-in-depth pipeline that runs before any payload reaches an upstream AI provider.

Tier	Shield	Technical Description
0.1	Base64 Evasion	Decodes, scans, and re-encodes PII hidden inside Base64/JWT blobs.
0	Dictionary	High-speed protection for VIPs, internal projects, and sensitive org names.
0.5	Pattern + Entropy	Shannon scoring for high-entropy strings, catching keys and tokens.
1	Rule Engine	EMAIL, SSN, IBAN (MOD97), CC (Luhn mod-10), 50+ national ID types.
1.1	Phone Shield	libphonenumber validation to reduce false positives on digit sequences.
1.2	Address Shield	Heuristic street address parser supporting EN/FR/ES/DE.
1.5	Greeting/Signature	Detects names in salutations ("Regards, Jean") and intro sentences.
2	AI NER	OpenAI Privacy Filter — 1.5B param, local inference. Optimized for French Finance.
3	Structural Heuristics	Proximity expansion: `[TOKEN] ET Dupont` → re-tokenized as single entity.
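
As an illustration of the Shannon scoring mentioned for Tier 0.5, the sketch below computes entropy in bits per byte and flags long, random-looking strings. The names `shannonEntropy` and `looksLikeSecret`, the minimum length of 20, and the threshold of 4.0 are assumptions chosen for the example, not OCULTAR's actual settings.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// looksLikeSecret flags strings that are both long enough and random enough.
// minLen and threshold are illustrative values, not OCULTAR's configuration.
func looksLikeSecret(s string) bool {
	const minLen, threshold = 20, 4.0
	return len(s) >= minLen && shannonEntropy(s) >= threshold
}

func main() {
	for _, s := range []string{"hello world hello world", "q8Zr2LkP0xVb7NfT3mYs9HdC"} {
		fmt.Printf("%q entropy=%.2f secret=%v\n", s, shannonEntropy(s), looksLikeSecret(s))
	}
}
```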

Why OCULTAR Is Different

Obfuscation-Resistant: Recursive Base64 Scanning

Most PII filters operate on plaintext. A sophisticated attacker can embed sensitive data inside a Base64-encoded blob inside a JSON field, bypassing naive pattern matching. OCULTAR decodes and recursively scans every Base64 segment, running the full pipeline on the decoded content.
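
The decode, scan, and re-encode loop described above can be sketched roughly as follows. This is a simplified illustration, not the shipped implementation: a single email regex stands in for the full pipeline, only standard padded Base64 is handled (not the URL-safe variant used by JWTs), and `redact`, `maxDepth`, and the 16-character candidate threshold are hypothetical choices.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
	"unicode/utf8"
)

var (
	// Runs of 16+ Base64 characters with optional padding (illustrative threshold).
	b64Candidate = regexp.MustCompile(`[A-Za-z0-9+/]{16,}={0,2}`)
	// Stand-in for the full pipeline: a deliberately simple email pattern.
	emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
)

// maxDepth bounds recursion so deeply nested blobs cannot exhaust resources.
const maxDepth = 3

// redact replaces emails in s and in any Base64 segment nested inside s.
func redact(s string, depth int) string {
	if depth < maxDepth {
		s = b64Candidate.ReplaceAllStringFunc(s, func(seg string) string {
			raw, err := base64.StdEncoding.DecodeString(seg)
			if err != nil || !utf8.Valid(raw) {
				return seg // not decodable text; leave untouched
			}
			cleaned := redact(string(raw), depth+1)
			if cleaned == string(raw) {
				return seg // nothing found; keep the original bytes
			}
			return base64.StdEncoding.EncodeToString([]byte(cleaned))
		})
	}
	return emailPattern.ReplaceAllString(s, "[EMAIL]")
}

func main() {
	hidden := base64.StdEncoding.EncodeToString([]byte("alice@example.com"))
	payload := fmt.Sprintf(`{"note":%q}`, hidden)
	fmt.Println(redact(payload, 0))
}
```

Segments in which nothing is found are returned byte-for-byte, so opaque binary blobs are not altered by the round trip.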

Luhn-Validated Credit Card Detection

OCULTAR applies the Luhn algorithm (mod-10 checksum) to every credit card candidate before vaulting it. A match that fails Luhn is passed through without redaction or vault storage, eliminating the noise typical of regex-only filters.
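
For reference, the Luhn mod-10 check itself is short. The sketch below is a generic implementation of the algorithm, not code taken from OCULTAR; the function name `luhnValid` and the 12 to 19 digit length bound are assumptions for the example.

```go
package main

import "fmt"

// luhnValid reports whether the digits in s pass the Luhn mod-10 checksum.
// Spaces and hyphens are ignored; any other non-digit rejects the candidate.
func luhnValid(s string) bool {
	sum, digits := 0, 0
	double := false // the rightmost digit is not doubled
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // same as summing the two digits of the product
			}
		}
		sum += d
		double = !double
		digits++
	}
	// Length bound is an illustrative assumption for card-like numbers.
	return digits >= 12 && digits <= 19 && sum%10 == 0
}

func main() {
	for _, s := range []string{"4111 1111 1111 1111", "4111 1111 1111 1112"} {
		fmt.Printf("%s valid=%v\n", s, luhnValid(s))
	}
}
```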

Deterministic Tokens for Privacy-Safe Analytics

Tokens are derived from SHA-256(original_PII). The same input always produces the same token. This allows you to run aggregations, joins, and frequency analysis on fully tokenized data without de-tokenizing it — preserving analytical value while eliminating privacy risk.
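
A small sketch of why determinism matters for analytics: because equal inputs map to equal tokens, counting and joining work on tokens alone. The `[KIND_xxxxxxxxxxxxxxxx]` layout and the truncation to 8 bytes of the digest are assumptions made for this example; the source only states that tokens are derived from SHA-256 of the original value.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// token derives a deterministic placeholder from a PII value.
// The bracketed layout and 8-byte truncation are illustrative assumptions,
// not OCULTAR's actual token format.
func token(kind, pii string) string {
	sum := sha256.Sum256([]byte(pii))
	return fmt.Sprintf("[%s_%s]", kind, hex.EncodeToString(sum[:8]))
}

func main() {
	rows := []string{"alice@example.com", "bob@example.com", "alice@example.com"}

	// Frequency analysis on tokens only; the raw values are never stored.
	freq := map[string]int{}
	for _, r := range rows {
		freq[token("EMAIL", r)]++
	}
	for tok, n := range freq {
		fmt.Println(tok, n)
	}
}
```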

Extensions

Goose AI Workflow Integration

Zero-egress PII protection for Goose AI workflows.

pip install ocultar-goose-mcp

Read the launch story: OpenAI shipped a model. We built the system.

Integration Boundary

OCULTAR's responsibility ends at POST /refine. It returns cleanText and a vault token map, and it has no knowledge of downstream AI decisions. Callers must fail loudly if OCULTAR is unavailable. Never degrade gracefully by passing raw data.
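
A caller that honors this boundary might look like the sketch below. Only the POST /refine path and the two outputs (cleanText and a token map) come from the text above. The JSON field names, the request body shape, the base URL, and the helper names `refine` and `parseRefineResponse` are assumptions for illustration. The key point is that every failure path returns an error and never the raw input.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// RefineResult mirrors the two outputs named above. The JSON field names
// and the map shape are assumptions for this sketch.
type RefineResult struct {
	CleanText string            `json:"cleanText"`
	Tokens    map[string]string `json:"tokens"`
}

// ErrRefineryUnavailable signals that the caller must abort, not fall back.
var ErrRefineryUnavailable = errors.New("ocultar refinery unavailable")

// parseRefineResponse accepts only a 200 response with a decodable body.
func parseRefineResponse(status int, body []byte) (*RefineResult, error) {
	if status != http.StatusOK {
		return nil, fmt.Errorf("%w: status %d", ErrRefineryUnavailable, status)
	}
	var res RefineResult
	if err := json.Unmarshal(body, &res); err != nil {
		return nil, fmt.Errorf("%w: %v", ErrRefineryUnavailable, err)
	}
	return &res, nil
}

// refine posts text to baseURL+"/refine". On any failure it returns an error
// and never hands the raw input back to the caller.
func refine(ctx context.Context, client *http.Client, baseURL, text string) (*RefineResult, error) {
	payload, err := json.Marshal(map[string]string{"text": text}) // assumed request shape
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/refine", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("%w: %v", ErrRefineryUnavailable, err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return nil, fmt.Errorf("%w: %v", ErrRefineryUnavailable, err)
	}
	return parseRefineResponse(resp.StatusCode, body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	client := &http.Client{Timeout: 2 * time.Second}
	// The base URL is a placeholder; use wherever the refinery is deployed.
	res, err := refine(ctx, client, "http://127.0.0.1:8080", "Contact alice@example.com")
	if err != nil {
		fmt.Println("blocking request:", err) // fail loudly; do not forward raw text
		return
	}
	fmt.Println(res.CleanText)
}
```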

Getting Started

Secrets Management: OCULTAR uses Doppler for secure secret injection.
```
doppler setup
```
Go Workspace:
```
go work sync
```
Build and Run:
```
make build
./scripts/start.sh
```

Development

Documentation: See /docs/reference for architecture details.
Testing: Run go test ./... to verify all modules.

Discovery & Community

Topics: privacy, gdpr, pii, golang, ai-security, zero-trust, llm, data-privacy
License: Apache 2.0 (Open-Core)
