release: 0.7.7 — forward real model + tools to /gate pre-flight (T4) by maltsev-dev · Pull Request #36 · nullrunio/nullrun-sdk-python

maltsev-dev · 2026-06-27T16:31:33Z

0.7.7 — forward real model + tools to /gate pre-flight (T4)

Bug: Pre-0.7.7 every SDK /gate call for any workflow with a budget was hard-blocked with

"Tool 'llm' was blocked because policy 'Rule 1 (cost_limit)' (score 70.00) matched"

because the runtime hard-coded model="budget-precheck" as a sentinel placeholder. The backend's PolicyEvaluationGraph.evaluate() stub treated any synthetic cost_limit rule with score > 0.8 as Block (backend/src/policy/graph.rs:448-462, backend/src/proxy/http/gate/internal.rs:619-628), so the pricing lookup never landed on a real model and the rule fired with the wrong score.

Fix: Forward the real model name and tool list to /gate via a new per-call context API.

Added

nullrun.set_call_context(model=..., tools=[...]) — per-call context the SDK forwards to /gate so the backend can enforce budget tiers and tool-block on real values.
```
import nullrun

with nullrun.workflow(name="support-bot"):
    nullrun.set_call_context(
        model="claude-sonnet-4-6",
        tools=["shell.run", "code.eval"],
    )

    @nullrun.protect
    def chat(message: str) -> str:
        return agent.run(message)
```
- model (optional) — backend uses it to look up the per-model rate from tool_pricing (Postgres) so projected_cost matches what /track will compute from real token counts. None → backend falls back to the claude-sonnet-4 default rate.
- tools (optional) — backend matches each against the workflow's effective blocked_tools aggregate and returns block on any match. None leaves whatever was previously set; [] clears.
- nullrun.get_call_model() / nullrun.get_call_tools() are the read-side helpers (also reachable via nullrun.context.*).

Fixed

/gate pre-flight no longer sends model="budget-precheck". check_workflow_budget now reads get_call_model() (or None when unset) instead of the placeholder. Default workflows with a budget now return allow instead of blanket-block.
/gate pre-flight now forwards the per-call tools list. Transport.check previously dropped the tools key from the wire payload, so even set_call_context(tools=[...]) had no effect on /gate. The transport now propagates tools when set; [] vs missing-None are distinguished on the wire per gate/internal.rs::check_tool_block ("no tools will be called" is different from "I did not tell you what tools").

Tests

New: tests/test_gate_real_path.py (226 lines) pins the fix:
- TestGateRealPathRegression — default request allows a clean workflow (not the old blanket block), wire payload has no policy-N residue from the old graph plumbing, real decision="block" still raises WorkflowKilledInterrupt (so the fix didn't accidentally remove the real-block path).
- TestSetCallContext — set_call_context(model=...) flows into the body, set_call_context(tools=[...]) flows into the body, no-context means no tools key at all (not []), and set_call_context(tools=[]) clears a previously-set tool list.
- TestPackageExports — the new helpers are reachable from nullrun.*.
tests/conftest.py — reset_runtime fixture now also clears _call_model_var and _call_tools_var so a test's set_call_context(...) doesn't leak into the next test's wire payload.

Migration

No breaking changes. New helpers default to None / empty, so existing call sites (and every test in the suite) keep working without modification.

Diff stats

 CHANGELOG.md                       |   97 ++++++
 pyproject.toml                     |    2 +-
 src/nullrun/__init__.py            |    8 +
 src/nullrun/context.py             |   62 +++++
 src/nullrun/runtime.py             |   28 ++-
 src/nullrun/transport.py           |   12 +
 tests/conftest.py                  |    8 +
 tests/test_gate_real_path.py       |  226 +++++++++++++++++ (new)
 8 files changed, 433 insertions(+), 3 deletions(-)

Additive patch on top of the 0.7.0 thin-client refactor. No breaking changes. Added ----- * nullrun.integrations.fastapi — one-line FastAPI integration that turns every NullRunDecision / NullRunInfrastructureError thrown by @nullrun.protect endpoints into a clean JSON response with the right HTTP status code. No per-endpoint except blocks required. Response shape: {"error_code": "NR-B004", "user_message": "You've reached the usage limit...", "category": "decision"} HTTP status mapping: * NR-B004 (budget), NR-L001 (loop), NR-R001 (rate) -> 429 with optional Retry-After * NR-T001 (tool blocked), NR-X001 (generic block) -> 403 * NR-W003 (paused) -> 503 with Retry-After * NR-W002 (killed) -> 503; WorkflowKilledInterrupt is a BaseException subclass so Starlette's add_exception_handler refuses it — handled via ASGI middleware instead (hybrid pattern, documented in module docstring). * NullRunInfrastructureError subclasses -> 503 (our side, not user's). * nullrun.messages — default user-facing message catalog. Every NR-* error code has an English default message owned by NULLRUN, not customer code. Customer Support Bots hitting a budget cap show the same wording across every NullRun-backed application. * format_user_message(exc) — render exception as user-facing string * set_user_message(code, text) — per-process override for branded variants * get_user_message(code) — raw lookup * reset_overrides() — clear all overrides (for tests) Changed ------- * Transport._send_batch canonical JSON serialization — route the /track/batch body through _signed_request_body for consistent compact-separator serialization. HMAC itself is unaffected, but consistent serialization removes a special-case from the wire-format contract tests. * Transport._send_batch actions response handling — backend renamed BatchTrackResponse.actions_taken (debug names) -> BatchTrackResponse.actions (ActionTaken structs). Read both for forward-compat; per-element try/except so one malformed entry doesn't abort the whole loop. * pyproject.toml metadata — long-form description with search keywords, Maintainer: populated via maintainers=[...], expanded classifiers (Linux / Windows / macOS, Python 3.13, CPython, Security / AI / WWW/HTTP topics), project URL expander. Tests ----- * tests/test_messages.py (new, 282 lines) — catalog completeness (every NR-* code has a default message), override / reset behavior, render path. * tests/test_integrations_fastapi.py (new, 289 lines) — HTTP status mapping per error code, response shape, ASGI middleware path for WorkflowKilledInterrupt, hybrid composition. * tests/test_decision_split.py (new, 199 lines) — pins the decision / infrastructure error split. * Updates to tests/test_runtime.py, tests/test_extractors.py reflecting transport canonical-JSON + actions-renamed changes. Release plumbing ---------------- * pyproject.toml: version bumped 0.7.0 -> 0.7.6 * src/nullrun/__version__.py: __version__ = "0.7.6" * CHANGELOG.md: full 0.7.6 entry covering additions, transport changes, metadata improvements Tests pass locally (per session log) — pytest on Windows / Python 3.14.2 is green.

…padding PR #35 (release/0.7.6) failed all four CI jobs (test 3.10/3.11/3.12, coverage, codecov/patch) on the same root cause + one latent bug masked by it. This commit lands the fixes plus the last-mile tests that bring coverage above the 82% threshold. CI failure root --------------- * tests/test_integrations_fastapi.py does from fastapi import ... at module top-level. CI installs only pip install -e '.[dev]', and fastapi was declared as an *optional* [fastapi] extra, NOT in [dev]. Pytest collection aborted with ModuleNotFoundError: No module named 'fastapi' → all 4 jobs red. * Fix: add fastapi>=0.100,<1.0 to [dev]. Same precedent as langchain-core (already in [dev] for the same import-time contract: nullrun.instrumentation.langgraph is eager-imported from nullrun.decorators at collection time, so the test extras must cover the import chain). Latent bug surfaced by the first fix ------------------------------------ The same PR refactored Transport._send_batch_with_retry_info to route the /track/batch body through _signed_request_body for canonical-JSON serialization (matching /gate and /execute). The two sibling call sites use the module-level helper _signed_request_body (no self.); this one used self._signed_request_body by typo. Result: AttributeError on every batch flush, breaking 15 existing tests across test_transport.py / test_track_batch_retry.py / test_integration_contract.py / test_signal_safety.py. As long as the fastapi collection error aborted pytest, this was hidden. Fixed to _signed_request_body(...) with a docstring noting why it is module-level and what the bug looked like. Coverage padding (codecov/patch was failing on this too) -------------------------------------------------------- Total coverage on the failing CI run was 81.98% — 0.02pp under the fail-under=82 gate. After the two fixes above it would have recovered to ~82.0% on the dot, so I added minimal tests for the cheapest-to-cover gaps: * tests/test_breaker_main.py (new) — covers the 5 statements in nullrun.breaker.__main__.main() (0% → 100%). The module exists so python -m nullrun.breaker exits cleanly instead of failing with No module named nullrun.breaker.__main__; the previous fix-mechanism was return 0 after a print, but no test was exercising it. * tests/test_status.py — extends TestSummary with seven scenarios covering each conditional branch of NullRunStatus.summary() (organization_id, workflow_id, workflow_state != Normal, backend_reachable=False, ws_connected=False, recent_errors). status.py jumps 84.52% → 98.81%. * tests/test_integrations_fastapi.py — four tests on _build_headers covering non-numeric, zero, negative, and resume_after (the WorkflowPausedException code path). integrations/fastapi.py jumps 90.22% → 94.57%. After all three: TOTAL 81.98% → 82.46%, comfortably above the gate. Verification ------------ * Local pytest: 997 passed, 13 skipped, 0 failed (Windows / Python 3.14.2, 8m47s — same env the original commit was validated in). * python -m coverage report — 82.46%, no fail-under complaint.

…ng/tools Patch coverage on PR #35 was 62.38% against a 65% threshold (codecov target 70% / threshold 5pp). The two biggest delta-holders against master were auto.py (+286) and langgraph.py (+221), both dominated by Phase 4.1 additions: * auto._normalize_finish_reason + _FINISH_REASON_MAP * auto._openai_extractor second-tier fields (cache_read_tokens, cache_write_tokens, reasoning_tokens, finish_reason, tool_names) * auto._anthropic_extractor cache_read / cache_write * langgraph._safe_get_gen_message * langgraph._get_finish_reason (5-source fallback chain) * langgraph.extract_usage_from_response second-tier fields These are pure / near-pure functions with no network or vendor SDK calls. Coverage padding is cheap — pin the canonical wire shapes once and the backend ingest contract gets a free live spec. Local numbers: * auto.py 63.44% -> 64.01% (file-level, +57 statements) * langgraph.py 78.50% -> 86.01% (file-level, +32 statements) * TOTAL 82.46% -> 83.13% (already above 82% gate) 41 tests, all green. Existing test_extractors.py and test_langgraph_callback.py left untouched — these tests deliberately target the Phase 4.1 fields (cache_read / cache_write / reasoning / finish_reason / tool_names) that the older tests didn't pin.

Pre-0.7.7 every SDK /gate call for any workflow with a budget was hard-blocked because the runtime hard-coded the literal string "budget-precheck" as the model. The backend's PolicyEvaluationGraph treated any synthetic cost_limit rule with score > 0.8 as Block, so the pricing lookup never landed on a real model and the rule fired with the wrong score. This commit: * Adds nullrun.set_call_context(model=..., tools=[...]) plus get_call_model / get_call_tools helpers (and the underlying _call_model_var / _call_tools_var contextvars in nullrun.context). * Wires the call context into check_workflow_budget: the /gate payload now carries the real model name (or None when unset) and the user-supplied tool list. tools=[] vs missing-None are distinguished on the wire per gate/internal.rs::check_tool_block. * Transport.check forwards the tools key when set (it was silently dropped pre-fix). * tests/conftest.py reset_runtime clears the new contextvars so a test's set_call_context(...) doesn't leak into the next test's wire payload. * New tests/test_gate_real_path.py pins down the regression: default request allows a clean workflow, real block still honored, no policy-N residue on the wire, set_call_context flows into the body, no-context means no tools key, and the helpers are reachable from nullrun.*. Bumps version to 0.7.7. No breaking changes - new helpers default to None / empty so existing call sites keep working.

Conflict resolution between release/0.7.7 (T4 per-call context for /gate) and origin/master (Release/0.7.6 #35, which bumped the SDK to 0.7.6): * pyproject.toml: keep 0.7.7 (the HEAD side). 0.7.6 on master is superseded by 0.7.7 once this merges. * CHANGELOG.md: keep BOTH the new 0.7.7 block (from HEAD) and the 0.7.6 block (from master). They document different releases and are listed in chronological order with the older 0.7.6 block below. * src/nullrun/{__init__.py, runtime.py, transport.py}: auto-merged cleanly - master doesn't touch the T4 hunks. Auto-merge result equals HEAD, but the merge commit is still needed to record the parent relationship and clear the conflict state on the PR.

codecov · 2026-06-27T16:54:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Conflict resolution between release/0.7.8 (fail-loud on deprecated surface) and origin/master (release: 0.7.7 #36, the squash-merge of PR #36 which bumped the SDK to 0.7.7): * pyproject.toml: keep 0.7.8 (the HEAD side). 0.7.7 on master is superseded by 0.7.8 once this merges. * src/nullrun/__version__.py: keep 0.7.8 (same reasoning). * CHANGELOG.md: keep BOTH the new 0.7.8 block (from HEAD) and the 0.7.7 block (from master). They document different releases and are listed in chronological order with the older 0.7.7 block below. * src/nullrun/runtime.py and src/nullrun/transport.py: auto-merged cleanly - master doesn't touch the 0.7.8 hunks. * Test files: auto-merged cleanly - master doesn't touch the 0.7.8 test changes either. Auto-merge result equals HEAD, but the merge commit is still needed to record the parent relationship and clear the conflict state on the PR.

maltsev-dev added 5 commits June 27, 2026 12:14

maltsev-dev merged commit ae48ccd into master Jun 27, 2026
5 checks passed

maltsev-dev deleted the release/0.7.7 branch June 27, 2026 16:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

release: 0.7.7 — forward real model + tools to /gate pre-flight (T4)#36

release: 0.7.7 — forward real model + tools to /gate pre-flight (T4)#36
maltsev-dev merged 5 commits into
masterfrom
release/0.7.7

maltsev-dev commented Jun 27, 2026

Uh oh!

codecov Bot commented Jun 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

maltsev-dev commented Jun 27, 2026

0.7.7 — forward real model + tools to /gate pre-flight (T4)

Added

Fixed

Tests

Migration

Diff stats

Uh oh!

codecov Bot commented Jun 27, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant