Async LLM safety pipeline with input/output guardrails and observability tracing.
pynop is published on PyPI under the distribution name pynop-ai. The Python import name is pynop.
```bash
pip install pynop-ai
# or
uv add pynop-ai
```

```python
import pynop
from pynop import SafetyPipeline
```

The core install ships with OpenAI support, Guardrails-AI, and Langfuse tracing. Additional providers and tools are available as optional extras:
| Extra | Adds | Use when |
|---|---|---|
| `pynop-ai[anthropic]` | `langchain-anthropic` | You configure `provider: anthropic` in YAML |
| `pynop-ai[google]` | `langchain-google-genai` | You configure `provider: google` in YAML |
| `pynop-ai[nemo]` | `nemoguardrails` | You add a `type: nemo` guard |
| `pynop-ai[eval]` | `garak`, `giskard` | You call `EvalRunner.run_garak` / `run_giskard` |
| `pynop-ai[all]` | All of the above | You want everything |
pip install "pynop-ai[anthropic,nemo]"
# or, install everything
uv add "pynop-ai[all]"Pynop imports the optional dependencies lazily — picking a provider or tool you didn't install raises a clear ModuleNotFoundError at from_yaml / run_* time.
pynop requires an API key from your chosen LLM provider. Sign up and obtain a key from one of:
- OpenAI
- Anthropic
- Google AI Studio
- Or run a local server (Ollama, vLLM, LM Studio) — no key required
pynop does not cap or monitor LLM spend. Every pipeline run, and each reask retry, incurs token costs. Cost management is the user's responsibility.
Validators must be installed before use via the Guardrails Hub CLI:
```bash
guardrails hub install hub://guardrails/detect_pii
guardrails hub install hub://guardrails/toxic_language
```

Browse available validators at hub.guardrailsai.com. Any validator referenced in your config that is not installed will cause an AttributeError at pipeline startup.
Pin validators to a specific version to avoid silent behavioral changes when a validator package updates:
guardrails hub install "hub://guardrails/detect_pii~=1.4"Validators do not update automatically. To update to the latest version, run:
guardrails hub install hub://guardrails/detect_pii --upgradeTracing requires a Langfuse instance. Sign up at langfuse.com or self-host. Obtain a public key and secret key from your project settings.
Set the required env vars before running pynop. Missing vars referenced in config raise a ValueError at startup:
```bash
export OPENAI_API_KEY=sk-...        # or your provider's key
export LANGFUSE_PUBLIC_KEY=pk-...   # if tracing is enabled
export LANGFUSE_SECRET_KEY=sk-...   # if tracing is enabled
```

Then run the pipeline:

```python
import asyncio

from pynop import SafetyPipeline

async def main():
    pipeline = SafetyPipeline.from_yaml("config.yaml")
    result = await pipeline.run("Summarize this document for me.")
    print(result.output)

    # Select an environment profile
    pipeline = SafetyPipeline.from_yaml("config.yaml", env="prod")

asyncio.run(main())
```

See config.yaml for the default configuration. Supports:
- LLM: Multi-backend via LangChain — OpenAI, Anthropic, Google, and local (Ollama/vLLM/LM Studio)
- Guards: Guardrails-AI validators (PII, schema) and NeMo Guardrails (jailbreak, content safety) — configurable per input/output slot, run in config order
- Tracing: Langfuse observability (optional, auto-reads env vars)
- Eval thresholds: Configurable pass/fail criteria for evaluation runs
- Environment profiles: Per-environment config overrides (dev, staging, prod)
```yaml
# OpenAI
llm:
  provider: openai
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}
```

```yaml
# Anthropic
llm:
  provider: anthropic
  model: claude-sonnet-4-20250514
  api_key: ${ANTHROPIC_API_KEY}
```

```yaml
# Google Gemini
llm:
  provider: google
  model: gemini-2.0-flash
  api_key: ${GOOGLE_API_KEY}
```

```yaml
# Local (OpenAI-compatible server — Ollama, vLLM, LM Studio)
llm:
  provider: local
  model: llama3
  base_url: http://localhost:11434/v1
  api_key: not-needed
```

You can also pass a pre-built LangChain BaseChatModel directly to the constructor (skipping from_yaml):
```python
from langchain_openai import ChatOpenAI

from pynop import SafetyPipeline
from pynop.tracing import Tracer
from pynop.types import GuardSlot

custom_llm = ChatOpenAI(model="gpt-4o", temperature=0)

pipeline = SafetyPipeline(
    llm_config={"provider": "openai", "model": "gpt-4o"},
    input_slot=GuardSlot(),   # add guards if you want input validation
    output_slot=GuardSlot(),  # add guards if you want output validation
    tracer=Tracer(enabled=False),
    llm=custom_llm,
)
```

Each guard slot (input/output) supports configurable rejection and error strategies:
```yaml
guards:
  input:
    on_guard_fail: reject   # reject | return_canned | include_reason
    on_guard_error: reject  # reject | pass
    canned_response: "I can't process that request."  # required for return_canned
    guards:
      - type: guardrails_ai
        validators:
          - name: DetectPII
            on_fail: exception
      - type: nemo
        config_path: ./nemo_configs/input_rails
```

`on_guard_fail` — what happens when a guard rejects input/output. Set at slot level as a default; individual guards can override:
- `reject` (default): raise `GuardRejection` with a generic message
- `return_canned`: return a `PipelineResult` with the `canned_response` string, skip the LLM call
- `include_reason`: raise `GuardRejection` with the guard's rejection reason attached
- `reask` (output guards only): re-call the LLM with the rejection reason appended, then re-run all output guards. Falls back to `reject` after `max_reask` retries (default: 2)
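As a concrete illustration of the `reject` and `return_canned` paths, here is a minimal sketch of the calling code. It assumes `GuardRejection` can be imported from the top-level `pynop` package; adjust the import to wherever your installed version exposes it.

```python
import asyncio

from pynop import SafetyPipeline
# Assumption: GuardRejection is exported from the top-level package.
from pynop import GuardRejection

async def main():
    pipeline = SafetyPipeline.from_yaml("config.yaml")
    try:
        result = await pipeline.run("Please summarize my medical records.")
    except GuardRejection as exc:
        # Raised under the reject / include_reason strategies.
        print(f"Blocked by a guard: {exc}")
    else:
        # Under return_canned, result.output is the configured canned_response.
        print(result.output)

asyncio.run(main())
```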
```yaml
guards:
  output:
    on_guard_fail: reject      # slot default
    guards:
      - type: guardrails_ai
        on_guard_fail: reask   # per-guard override
        max_reask: 3
        reask_instruction: "Your response was flagged: {reason}. Rewrite it."
        validators:
          - name: ToxicLanguage
            on_fail: exception
      - type: guardrails_ai
        # inherits slot default: reject
        validators:
          - name: DetectPII
            on_fail: exception
```

Guard ordering matters when mixing strategies — guards run in config order and stop at the first failure.
`on_guard_error` — what happens when a guard crashes (unexpected exception):

- `reject` (default): treat the error as a guard failure (applies the `on_guard_fail` strategy)
- `pass`: log the error, skip the failed guard, continue to the next guard
NeMo guards can be configured in two ways:
Inline rails (recommended) — declare rails by name directly in config. pynop generates the NeMo config automatically. Built-in NeMo rails (jailbreak, content safety, PII) are referenced directly; parameterized rails (topic control) accept custom parameters:
```yaml
guards:
  input:
    guards:
      - type: nemo
        rails:
          - jailbreak
          - topic_control:
              allowed: [coding, data science]
              denied: [politics, violence]
  output:
    guards:
      - type: nemo
        rails:
          - content_safety
          - pii
```

Custom config directory — for rails that require custom Colang flows, point to a directory containing a config.yml and .co files:
```yaml
- type: nemo
  config_path: ./my_custom_rails
```

`rails` and `config_path` are mutually exclusive on a single guard entry. See nemo_configs/ for custom config examples.
Define per-environment overrides in the environments: section. Each profile replaces entire top-level sections (no deep merge). Sections not defined in a profile fall through to the base config.
```yaml
eval:
  max_issues: 0

environments:
  dev:
    tracing:
      enabled: false
    eval:
      max_issues: 10
      ignore_severities: [minor]
  prod:
    eval:
      max_issues: 0
```

Select an environment via the env parameter or the PYNOP_ENV env var:
```python
pipeline = SafetyPipeline.from_yaml("config.yaml", env="dev")
# or: export PYNOP_ENV=dev
```

The eval: section configures pass/fail criteria for evaluation runs:
```yaml
eval:
  max_issues: 0               # maximum issues before failing (default: 0)
  ignore_severities: [minor]  # exclude these severity levels from the count
  garak_severities:           # map Garak probe families to severity levels
    dan: major
    glitch: minor
    # unlisted probes default to major
```

Severity levels are major, medium, and minor. Without an eval: section, the default is zero-tolerance (any issue fails).
Garak and Giskard can have different thresholds within the same pipeline. Add a garak: or giskard: block under eval: — each block inherits from the top-level defaults and only overrides the keys you set:
```yaml
eval:
  max_issues: 0
  ignore_severities: [minor]   # default applied to both tools

  garak:
    max_issues: 0              # zero tolerance for vulnerability scans
    ignore_severities: []      # don't ignore minor either

  giskard:
    max_issues: 3              # lenient for quality checks
    ignore_severities: [minor]
```

Use pipeline.eval_threshold_for("garak") (or "giskard") to inspect the resolved threshold from Python. EvalRunner uses the per-tool threshold automatically when computing EvalResult.passed.
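For example, a quick sanity check of the resolved per-tool thresholds from Python. The attribute names on the returned object (`max_issues`, `ignore_severities`) are assumptions mirrored from the YAML keys above, not a documented contract:

```python
from pynop import SafetyPipeline

pipeline = SafetyPipeline.from_yaml("config.yaml", env="prod")

garak_threshold = pipeline.eval_threshold_for("garak")
giskard_threshold = pipeline.eval_threshold_for("giskard")

# Attribute names assumed to mirror the YAML keys shown above.
print(garak_threshold.max_issues, garak_threshold.ignore_severities)
print(giskard_threshold.max_issues, giskard_threshold.ignore_severities)
```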
Run pre-deployment security evaluations using Garak and Giskard:
```python
import asyncio

from pynop import SafetyPipeline
from pynop.eval import EvalRunner

async def main():
    pipeline = SafetyPipeline.from_yaml("config.yaml")
    runner = EvalRunner(pipeline)

    # Garak vulnerability scan
    garak_result = await runner.run_garak(probes=["dan", "promptinject"])
    print(garak_result.summary)
    print(garak_result.passed)  # uses the eval threshold from config

    # Giskard quality scan
    giskard_result = await runner.run_giskard(detectors=["prompt_injection"])
    print(giskard_result.summary)
    print(giskard_result.issues)

asyncio.run(main())
```

Both tools evaluate the full pipeline (guards + LLM). Results are traced in Langfuse when tracing is enabled.
Before running evaluations, review the available Garak probe families and Giskard detectors to determine which are relevant to your use case.
pynop does not provide CI/CD integration. EvalRunner returns a Python result object — wiring evaluations into a CI pipeline (e.g. failing a build on low scores) is the user's responsibility.
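A minimal gate script along these lines can bridge that gap by turning the result objects into a process exit code. The file name, probe list, and detector list below are illustrative choices, not pynop defaults:

```python
# ci_eval_gate.py - illustrative sketch, not shipped with pynop.
# Exits non-zero when either evaluation fails its configured threshold.
import asyncio
import sys

from pynop import SafetyPipeline
from pynop.eval import EvalRunner

async def main() -> int:
    pipeline = SafetyPipeline.from_yaml("config.yaml", env="prod")
    runner = EvalRunner(pipeline)

    garak = await runner.run_garak(probes=["dan", "promptinject"])
    giskard = await runner.run_giskard(detectors=["prompt_injection"])

    print(garak.summary)
    print(giskard.summary)

    # passed already applies the per-tool eval thresholds from config
    return 0 if (garak.passed and giskard.passed) else 1

if __name__ == "__main__":
    sys.exit(asyncio.run(main()))
```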
LatencyBenchmark compares the per-guard latency of two pipeline configurations side-by-side. It runs a prompt set through both pipelines, fetches the resulting traces from Langfuse, and reports per-span p50/p95/p99.
```python
import asyncio

from pynop import LatencyBenchmark, SafetyPipeline

async def main():
    baseline = SafetyPipeline.from_yaml("config.baseline.yaml")
    candidate = SafetyPipeline.from_yaml("config.candidate.yaml")

    benchmark = LatencyBenchmark(baseline, candidate, label_a="baseline", label_b="candidate")
    report = await benchmark.run([
        "Summarize this document.",
        "Explain quantum computing in one paragraph.",
        "Write a haiku about CI pipelines.",
    ])

    for span in report.stats_a:
        print(f"{span.name:30s} p50={span.p50:.3f}s p95={span.p95:.3f}s")
    print(f"baseline total p95:  {report.total_a.p95:.3f}s")
    print(f"candidate total p95: {report.total_b.p95:.3f}s")

asyncio.run(main())
```

LatencyBenchmark requires both pipelines to have Langfuse tracing enabled — it reads the per-span latency from Langfuse rather than instrumenting timers itself.
The default uv run pytest command runs the unit suite with mocked OpenAI, Langfuse, and Guardrails-AI. End-to-end integration tests live in tests/integration/ and are opt-in — they hit real OpenAI, Garak, Giskard, and Langfuse, so they require API keys and network access.
Enable them by setting PYNOP_INTEGRATION=1:
```bash
export PYNOP_INTEGRATION=1
export OPENAI_API_KEY=sk-...
export LANGFUSE_PUBLIC_KEY=pk-...
export LANGFUSE_SECRET_KEY=sk-...
uv run pytest tests/integration/
```

Without PYNOP_INTEGRATION=1, every test in tests/integration/ is skipped — safe to run on a developer laptop or in PR-level CI.
```bash
uv sync
uv run pytest
```