Release/0.7.6 #35
Merged
Additive patch on top of the 0.7.0 thin-client refactor. No
breaking changes.
Added
-----
* nullrun.integrations.fastapi — one-line FastAPI integration
that turns every NullRunDecision / NullRunInfrastructureError
raised by @nullrun.protect endpoints into a clean JSON
response with the right HTTP status code. No per-endpoint
except blocks required.
Response shape:
{"error_code": "NR-B004",
"user_message": "You've reached the usage limit...",
"category": "decision"}
HTTP status mapping:
* NR-B004 (budget), NR-L001 (loop), NR-R001 (rate) -> 429
with optional Retry-After
* NR-T001 (tool blocked), NR-X001 (generic block) -> 403
* NR-W003 (paused) -> 503 with Retry-After
* NR-W002 (killed) -> 503; WorkflowKilledInterrupt is a
BaseException subclass so Starlette's
add_exception_handler refuses it — handled via ASGI
middleware instead (hybrid pattern, documented in
module docstring).
* NullRunInfrastructureError subclasses -> 503 (a failure on
NullRun's side, not the user's).
* nullrun.messages — default user-facing message catalog.
Every NR-* error code has an English default message owned
by NullRun, not customer code, so customer support bots hitting
a budget cap show the same wording across every NullRun-backed
application.
* format_user_message(exc) — render exception as user-facing
string
* set_user_message(code, text) — per-process override for
branded variants
* get_user_message(code) — raw lookup
* reset_overrides() — clear all overrides (for tests)
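A minimal sketch of how the catalog and the status mapping above could fit together. This is not the shipped module: the dictionaries, the fallback text, the 503 fallback for unknown codes, and build_response are hypothetical names used for illustration; only the NR-* codes, the status numbers, the response keys, and the four catalog function names come from the notes above.

```python
# Hypothetical default catalog. Only the NR-B004 text comes from the notes
# above; the other entries are placeholders.
_DEFAULTS = {
    "NR-B004": "You've reached the usage limit...",
    "NR-T001": "This action is not available.",
    "NR-W003": "This workflow is paused.",
}
_FALLBACK = "Something went wrong."  # assumption: text for codes without an entry
_overrides: dict[str, str] = {}

# HTTP status per error code, as listed in the notes above.
_STATUS = {
    "NR-B004": 429, "NR-L001": 429, "NR-R001": 429,
    "NR-T001": 403, "NR-X001": 403,
    "NR-W003": 503, "NR-W002": 503,
}


def set_user_message(code: str, text: str) -> None:
    """Per-process override, e.g. for branded wording."""
    _overrides[code] = text


def get_user_message(code: str) -> str:
    """Raw lookup: an override wins over the default."""
    return _overrides.get(code, _DEFAULTS.get(code, _FALLBACK))


def reset_overrides() -> None:
    """Clear all overrides (useful between tests)."""
    _overrides.clear()


def build_response(code: str, category: str = "decision") -> tuple[int, dict]:
    """Return (status, JSON body) in the response shape shown above."""
    status = _STATUS.get(code, 503)  # assumption: unknown codes map to 503
    body = {
        "error_code": code,
        "user_message": get_user_message(code),
        "category": category,
    }
    return status, body
```

With this shape, an override set at startup changes only user_message; the status code and error_code stay tied to the NR-* code.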
Changed
-------
* Transport._send_batch canonical JSON serialization — route the
/track/batch body through _signed_request_body for consistent
compact-separator serialization. HMAC itself is unaffected,
but consistent serialization removes a special-case from the
wire-format contract tests.
* Transport._send_batch actions response handling — backend
renamed BatchTrackResponse.actions_taken (debug names) ->
BatchTrackResponse.actions (ActionTaken structs). Read both
for forward-compat; per-element try/except so one malformed
entry doesn't abort the whole loop.
* pyproject.toml metadata — long-form description with search
keywords, Maintainer: populated via maintainers=[...],
expanded classifiers (Linux / Windows / macOS, Python 3.13,
CPython, Security / AI / WWW/HTTP topics), expanded project
URLs.
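The two transport changes above can be sketched as follows. This is an illustration under assumptions, not the SDK's code: canonical_body and read_actions are hypothetical names, and the "action" field on ActionTaken entries is a guess at the struct's shape. The notes only state that serialization uses compact separators and that both response keys are read with per-element error handling.

```python
import json


def canonical_body(payload: dict) -> bytes:
    """Serialize with compact separators (no space after ',' or ':')."""
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def read_actions(response: dict) -> list[str]:
    """Read the new 'actions' key, falling back to legacy 'actions_taken'."""
    raw = response.get("actions") or response.get("actions_taken") or []
    parsed = []
    for item in raw:
        try:
            if isinstance(item, str):
                # Legacy shape: plain debug names.
                parsed.append(item)
            else:
                # New shape: a struct; "action" is a hypothetical field name.
                parsed.append(item["action"])
        except (KeyError, TypeError):
            # One malformed entry must not abort the whole loop.
            continue
    return parsed
```

Keeping the try/except inside the loop is the point of the change: a bad element is skipped and the remaining entries are still processed.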
Tests
-----
* tests/test_messages.py (new, 282 lines) — catalog
completeness (every NR-* code has a default message),
override / reset behavior, render path.
* tests/test_integrations_fastapi.py (new, 289 lines) — HTTP
status mapping per error code, response shape, ASGI
middleware path for WorkflowKilledInterrupt, hybrid
composition.
* tests/test_decision_split.py (new, 199 lines) — pins the
decision / infrastructure error split.
* Updates to tests/test_runtime.py, tests/test_extractors.py
reflecting transport canonical-JSON + actions-renamed
changes.
Release plumbing
----------------
* pyproject.toml: version bumped 0.7.0 -> 0.7.6
* src/nullrun/__version__.py: __version__ = "0.7.6"
* CHANGELOG.md: full 0.7.6 entry covering additions,
transport changes, metadata improvements
Tests pass locally (per session log) — pytest on Windows /
Python 3.14.2 is green.
ee270ba to
992fdc0
…padding
PR #35 (release/0.7.6) failed all four CI jobs (test 3.10/3.11/3.12,
coverage, codecov/patch) from one shared root cause plus one latent bug
that it masked. This commit lands the fixes plus the last-mile tests
that bring coverage above the 82% threshold.
CI failure root cause
---------------------
* tests/test_integrations_fastapi.py does from fastapi import ...
at module top level. CI installs only pip install -e '.[dev]',
and fastapi was declared as an *optional* [fastapi] extra,
NOT in [dev]. Pytest collection aborted with
ModuleNotFoundError: No module named 'fastapi' → all 4 jobs red.
* Fix: add fastapi>=0.100,<1.0 to [dev]. This follows the precedent of
langchain-core, which is already in [dev] for the same import-time
contract: nullrun.instrumentation.langgraph is eager-imported
from nullrun.decorators at collection time, so the test extras
must cover the import chain.
Latent bug surfaced by the first fix
------------------------------------
The same PR refactored Transport._send_batch_with_retry_info to
route the /track/batch body through _signed_request_body for
canonical-JSON serialization (matching /gate and /execute). The two
sibling call sites use the module-level helper _signed_request_body
(no self.); this one used self._signed_request_body by typo.
Result: AttributeError on every batch flush, breaking 15 existing
tests across test_transport.py / test_track_batch_retry.py /
test_integration_contract.py / test_signal_safety.py. The bug stayed
hidden for as long as the fastapi collection error aborted pytest. Fixed
to _signed_request_body(...) with a docstring noting why it is
module-level and what the bug looked like.
Coverage padding (codecov/patch was failing on this too)
--------------------------------------------------------
Total coverage on the failing CI run was 81.98%, 0.02pp under the
fail-under=82 gate. After the two fixes above it would have
recovered to ~82.0% on the dot, so I added minimal tests for the
cheapest-to-cover gaps:
* tests/test_breaker_main.py (new): covers the 5 statements in
nullrun.breaker.__main__.main() (0% → 100%). The module
exists so python -m nullrun.breaker exits cleanly instead of
failing with No module named nullrun.breaker.__main__; the
earlier fix was a print followed by return 0, but no test
exercised it.
* tests/test_status.py: extends TestSummary with seven
scenarios covering each conditional branch of NullRunStatus.summary()
(organization_id, workflow_id, workflow_state != Normal,
backend_reachable=False, ws_connected=False, recent_errors).
status.py jumps 84.52% → 98.81%.
* tests/test_integrations_fastapi.py: four tests on
_build_headers covering non-numeric, zero, negative, and
resume_after (the WorkflowPausedException code path).
integrations/fastapi.py jumps 90.22% → 94.57%.
After all three: TOTAL 81.98% → 82.46%, comfortably above the gate.
Verification
------------
* Local pytest: 997 passed, 13 skipped, 0 failed
(Windows / Python 3.14.2, 8m47s; same env the original commit
was validated in).
* python -m coverage report: 82.46%, no fail-under complaint.
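The latent bug described above is easy to reproduce in isolation. The sketch below uses a stand-in helper and a stripped-down Transport class; the method names send_batch_buggy and send_batch_fixed and the helper body are hypothetical, and only the self. versus module-level distinction mirrors the commit.

```python
def _signed_request_body(payload: dict) -> bytes:
    """Module-level helper (stand-in body; the real one signs the request)."""
    return repr(payload).encode("utf-8")


class Transport:
    def send_batch_buggy(self, payload: dict) -> bytes:
        # Typo: looks the helper up on the instance, where it does not exist,
        # so this raises AttributeError on every call.
        return self._signed_request_body(payload)

    def send_batch_fixed(self, payload: dict) -> bytes:
        # Correct: call the module-level function directly.
        return _signed_request_body(payload)
```

Because the failure only happens when the method runs, it stayed invisible until pytest could actually collect and execute the transport tests.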
…ng/tools
Patch coverage on PR #35 was 62.38% against a 65% threshold (codecov
target 70% / threshold 5pp). The two files with the largest deltas against
master were auto.py (+286) and langgraph.py (+221), both dominated
by Phase 4.1 additions:
* auto._normalize_finish_reason + _FINISH_REASON_MAP
* auto._openai_extractor second-tier fields (cache_read_tokens,
cache_write_tokens, reasoning_tokens, finish_reason, tool_names)
* auto._anthropic_extractor cache_read / cache_write
* langgraph._safe_get_gen_message
* langgraph._get_finish_reason (5-source fallback chain)
* langgraph.extract_usage_from_response second-tier fields
These are pure / near-pure functions with no network or vendor SDK
calls. Coverage padding is cheap: pin the canonical wire shapes
once and the backend ingest contract gets a free live spec.
Local numbers:
* auto.py 63.44% -> 64.01% (file-level, +57 statements)
* langgraph.py 78.50% -> 86.01% (file-level, +32 statements)
* TOTAL 82.46% -> 83.13% (already above 82% gate)
41 tests, all green. Existing test_extractors.py and
test_langgraph_callback.py left untouched; the new tests
deliberately target the Phase 4.1 fields (cache_read /
cache_write / reasoning / finish_reason / tool_names) that the
older tests didn't pin.
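For readers unfamiliar with this kind of helper, here is a rough sketch of a finish-reason normalizer with a fallback chain. The map contents, the canonical names on the right-hand side, the "other" bucket, and both function names are assumptions made for illustration; the commit does not list what _FINISH_REASON_MAP contains or which five sources _get_finish_reason consults.

```python
from typing import Optional

# Hypothetical mapping from vendor-specific values to canonical names.
# The left-hand values are finish/stop reasons used by the OpenAI and
# Anthropic APIs; the right-hand names are assumptions.
_FINISH_REASON_MAP = {
    "stop": "stop",
    "end_turn": "stop",
    "stop_sequence": "stop",
    "length": "length",
    "max_tokens": "length",
    "tool_calls": "tool_use",
    "tool_use": "tool_use",
    "content_filter": "content_filter",
}


def normalize_finish_reason(raw: object) -> Optional[str]:
    """Map a vendor value to a canonical name; None when nothing usable."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    return _FINISH_REASON_MAP.get(raw.strip().lower(), "other")


def first_finish_reason(*candidates: object) -> Optional[str]:
    """Fallback chain: return the first candidate that normalizes."""
    for candidate in candidates:
        normalized = normalize_finish_reason(candidate)
        if normalized is not None:
            return normalized
    return None
```

Functions of this shape are pure, which is why they are cheap to cover: each test is a single input/output pair.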
maltsev-dev added a commit that referenced this pull request on Jun 27, 2026
Conflict resolution between release/0.7.7 (T4 per-call context for /gate) and origin/master (Release/0.7.6 #35, which bumped the SDK to 0.7.6): * pyproject.toml: keep 0.7.7 (the HEAD side). 0.7.6 on master is superseded by 0.7.7 once this merges. * CHANGELOG.md: keep BOTH the new 0.7.7 block (from HEAD) and the 0.7.6 block (from master). They document different releases and are listed in chronological order with the older 0.7.6 block below. * src/nullrun/{__init__.py, runtime.py, transport.py}: auto-merged cleanly - master doesn't touch the T4 hunks. Auto-merge result equals HEAD, but the merge commit is still needed to record the parent relationship and clear the conflict state on the PR.
maltsev-dev added a commit that referenced this pull request on Jun 27, 2026
…36)
* fix(gate): forward real model + tools to /gate pre-flight (T4)
Pre-0.7.7 every SDK /gate call for any workflow with a budget was
hard-blocked because the runtime hard-coded the literal string
"budget-precheck" as the model. The backend's PolicyEvaluationGraph
treated any synthetic cost_limit rule with score > 0.8 as Block,
so the pricing lookup never landed on a real model and the rule
fired with the wrong score.
This commit:
* Adds nullrun.set_call_context(model=..., tools=[...]) plus
get_call_model / get_call_tools helpers (and the underlying
_call_model_var / _call_tools_var contextvars in
nullrun.context).
* Wires the call context into check_workflow_budget: the /gate
payload now carries the real model name (or None when unset)
and the user-supplied tool list. tools=[] vs missing-None are
distinguished on the wire per gate/internal.rs::check_tool_block.
* Transport.check forwards the tools key when set (it was
silently dropped pre-fix).
* tests/conftest.py reset_runtime clears the new contextvars so
a test's set_call_context(...) doesn't leak into the next
test's wire payload.
* New tests/test_gate_real_path.py pins down the regression:
default request allows a clean workflow, real block still
honored, no policy-N residue on the wire, set_call_context
flows into the body, no-context means no tools key, and the
helpers are reachable from nullrun.*.
Bumps version to 0.7.7. No breaking changes: new helpers
default to None / empty, so existing call sites keep working.
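A small sketch of the call-context mechanism described above, built on the standard contextvars module. The helper names match the commit message, but the bodies are guesses, and build_gate_payload and the workflow_id key are hypothetical stand-ins for whatever check_workflow_budget actually sends.

```python
from contextvars import ContextVar
from typing import Optional

# Names follow the commit message; the implementations are assumptions.
_call_model_var: ContextVar[Optional[str]] = ContextVar("call_model", default=None)
_call_tools_var: ContextVar[Optional[list]] = ContextVar("call_tools", default=None)


def set_call_context(model: Optional[str] = None, tools: Optional[list] = None) -> None:
    """Record the model and tool list for the current context."""
    _call_model_var.set(model)
    _call_tools_var.set(None if tools is None else list(tools))


def get_call_model() -> Optional[str]:
    return _call_model_var.get()


def get_call_tools() -> Optional[list]:
    return _call_tools_var.get()


def build_gate_payload(workflow_id: str) -> dict:
    """Hypothetical payload builder: model is always present (may be None),
    while the tools key appears only when a tool list was set."""
    payload = {"workflow_id": workflow_id, "model": get_call_model()}
    tools = get_call_tools()
    if tools is not None:
        # An explicit empty list is forwarded; a missing list is not.
        payload["tools"] = tools
    return payload
```

The None versus [] distinction is what lets the backend tell "caller declared no tools" apart from "caller said nothing about tools".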
maltsev-dev added a commit that referenced this pull request on Jun 28, 2026
* release: 0.7.8 — fail-loud on deprecated surface
Two silent fail-OPEN footguns are converted to explicit
DeprecationWarning / RuntimeError so misconfigurations show up at
SDK init instead of being diagnosed from a missing proto trace.
Deprecated:
* NullRunRuntime.start_recording() and .stop_recording() now emit
DeprecationWarning. They have been silent no-op stubs since
Sprint 2.1 (0.4.0) — decision history is now on the backend
dashboard at /control-center/decision-history. Both methods
will be removed in 0.9.0.
* NULLRUN_USE_GRPC=1 now raises RuntimeError at SDK init instead
of silently falling back to HTTP with an info log. gRPC is on
the roadmap but not implemented; unset the env var to use HTTP.
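The fail-loud behavior described above could look roughly like this. RuntimeSketch is a hypothetical stand-in for NullRunRuntime, the environ parameter exists only to make the sketch testable, the "== 1" check is an assumption about how the variable is read, and the warning text is invented; the start of the error message follows the CI log quoted further down in this thread.

```python
import os
import warnings
from typing import Mapping, Optional


class RuntimeSketch:
    """Hypothetical stand-in for NullRunRuntime."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        env = os.environ if environ is None else environ
        # Fail at init instead of silently falling back to HTTP.
        if env.get("NULLRUN_USE_GRPC") == "1":
            raise RuntimeError(
                "NULLRUN_USE_GRPC is set but the gRPC transport is not yet "
                "implemented. Unset it to use HTTP."
            )

    def start_recording(self) -> None:
        # Still a no-op, but no longer a silent one.
        warnings.warn(
            "start_recording() is a no-op and will be removed in 0.9.0",
            DeprecationWarning,
            stacklevel=2,
        )
```

stacklevel=2 makes the warning point at the caller's line rather than at the stub itself, which is usually what a deprecation notice should do.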
Hardening (init path):
* Transport._post_auth_with_retry (new) — retry transient 503 / 504
+ network blips during /api/v1/auth/verify. Backend emits 503
+ Retry-After: 5 on transient DB errors (handlers.rs:11346-51).
Pre-fix the first 503 surfaced as NR-A001 to the user as if the
API key were bad. Three attempts, exponential backoff
(0.5s → 1s → 2s), honors Retry-After when present. Auth-key
failures (401) are NOT retried — a wrong key on attempt 1 is a
wrong key on attempt 3.
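The retry policy above can be sketched as a small loop. This is an illustration, not Transport._post_auth_with_retry itself: the send callable returning (status, headers), the injectable sleep, and treating OSError as the "network blip" are all assumptions. The sketch sleeps only between attempts, so three attempts use the first two backoff steps.

```python
import time
from typing import Callable, Optional, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, dict]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Retry 503/504 and network errors; return any other status at once."""
    status = None
    for attempt in range(attempts):
        headers: dict = {}
        error = None
        try:
            status, headers = send()
        except OSError as exc:  # assumption: network blips surface as OSError
            error = exc
        if error is None and status not in (503, 504):
            # Includes 401: a wrong key is not retried.
            return status
        if attempt == attempts - 1:
            if error is not None:
                raise error
            return status
        # Exponential backoff: 0.5s, 1s, 2s, ...
        delay = base_delay * (2 ** attempt)
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                delay = float(retry_after)  # honor the server's hint
            except ValueError:
                pass  # non-numeric header: keep the backoff delay
        sleep(delay)
    return status
```

Passing sleep as a parameter keeps tests fast: a test can record the requested delays in a list instead of actually waiting.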
Transport refactor:
* Transport._add_hmac_headers (new) — pulls the HMAC header
construction out of _signed_request_body so /track/batch,
/gate, /check, /execute all share one source of truth for
Content-Type / X-Signature / X-Signature-Timestamp / X-API-Key
/ Authorization headers. HMAC formula unchanged.
* generate_hmac_signature + verify_hmac_signature accept str | bytes
for body. Legacy str callers (and the FastAPI integration) keep
working without an explicit .encode().
* actions_taken → actions on /track/batch response. Backend renamed
BatchTrackResponse.actions_taken (debug names) → actions
(ActionTaken structs with human-readable strings moved to
messages). Read both keys for forward-compat.
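A sketch of what accepting str | bytes for the signed body might look like. The signing string used here (timestamp, a dot, then the body) and the SHA-256 hex digest are assumptions chosen for illustration, since the commit says only that the HMAC formula is unchanged, and the function signatures are hypothetical rather than those of generate_hmac_signature / verify_hmac_signature.

```python
import hashlib
import hmac
from typing import Union

Body = Union[str, bytes]


def _to_bytes(body: Body) -> bytes:
    """Accept legacy str callers without requiring an explicit .encode()."""
    return body.encode("utf-8") if isinstance(body, str) else body


def generate_signature(secret: str, timestamp: str, body: Body) -> str:
    # Assumption: the signed message is "<timestamp>." followed by the body.
    message = timestamp.encode("utf-8") + b"." + _to_bytes(body)
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_signature(secret: str, timestamp: str, body: Body, signature: str) -> bool:
    expected = generate_signature(secret, timestamp, body)
    # Constant-time comparison to avoid leaking match length via timing.
    return hmac.compare_digest(expected, signature)
```

Normalizing to bytes in one helper means a str body and its UTF-8 encoding always produce the same signature.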
Test updates:
* tests/test_framework_patches — alignment with retry + actions
rename.
* tests/test_high_reliability_fixes — re-pinned for _post_auth_with_retry.
* tests/test_hmac_signing — expanded for str/bytes body + new
_add_hmac_headers helper.
* tests/test_integration_contract — backend actions rename covered.
* tests/test_transport — retry semantics.
Bumps version to 0.7.8. No breaking changes for callers who don't
touch start_recording / stop_recording / NULLRUN_USE_GRPC.
* test(grpc): align test_grpc_removed with 0.7.8 NULLRUN_USE_GRPC contract
The 0.7.8 commit changed NULLRUN_USE_GRPC=1 from silent no-op +
INFO log to an explicit RuntimeError, but the regression test
in tests/test_grpc_removed.py still pinned the old behavior
(``test_nullrun_use_grpc_does_not_crash_init`` asserting
make_runtime() succeeded and an INFO line was logged).
CI on PR #38 failed on this test:
FAILED tests/test_grpc_removed.py::TestGrpcRemoved
::test_nullrun_use_grpc_does_not_crash_init
E RuntimeError: NULLRUN_USE_GRPC is set but the gRPC
transport is not yet implemented. ...
This commit updates the test to pin the new 0.7.8 contract:
the env var must raise RuntimeError, and the error message
must name the offending variable + point at the docs page.
The test is renamed from
``test_nullrun_use_grpc_does_not_crash_init`` to
``test_nullrun_use_grpc_raises_runtime_error`` so the test
name itself documents the new contract.
The module docstring (point 2 in the contract list) is
updated to say "raises RuntimeError" instead of "does NOT
crash init — it logs an INFO line and silently falls back
to HTTP". The 0.3.1 -> 0.7.8 evolution is documented in the
test docstring as a contract-evolution footnote for future
maintainers.
Imports: removed unused `import logging` and `caplog`
parameter (no longer asserting on log records); added
`import pytest` for `pytest.raises`.
No production-code change. No version bump. The fix is
self-contained to tests/test_grpc_removed.py.
* style(runtime): sort stdlib imports (ruff I001)
The 0.7.8 commit (fail-loud on deprecated surface) added
``import warnings`` mid-block in src/nullrun/runtime.py:34,
breaking alphabetical order:
asyncio
logging
os
warnings <-- out of order
threading
time
uuid
Ruff on PR #38 CI (Run ruff check src/) flagged it as I001.
Reorder to alphabetical:
asyncio
logging
os
threading
time
uuid
warnings
Verified:
* ruff check src/ -> All checks passed!
* pytest tests/test_grpc_removed.py tests/test_runtime_branches.py
-> 47 passed
No behavior change, no production logic touched. Pure lint fix.
maltsev-dev added a commit that referenced this pull request on Jun 28, 2026
* release: 0.7.6 — FastAPI integration + user-facing message catalog
Additive patch on top of the 0.7.0 thin-client refactor. No
breaking changes.
Added
-----
* nullrun.integrations.fastapi — one-line FastAPI integration
that turns every NullRunDecision / NullRunInfrastructureError
thrown by @nullrun.protect endpoints into a clean JSON
response with the right HTTP status code. No per-endpoint
except blocks required.
Response shape:
{"error_code": "NR-B004",
"user_message": "You've reached the usage limit...",
"category": "decision"}
HTTP status mapping:
* NR-B004 (budget), NR-L001 (loop), NR-R001 (rate) -> 429
with optional Retry-After
* NR-T001 (tool blocked), NR-X001 (generic block) -> 403
* NR-W003 (paused) -> 503 with Retry-After
* NR-W002 (killed) -> 503; WorkflowKilledInterrupt is a
BaseException subclass so Starlette's
add_exception_handler refuses it — handled via ASGI
middleware instead (hybrid pattern, documented in
module docstring).
* NullRunInfrastructureError subclasses -> 503 (our side,
not user's).
* nullrun.messages — default user-facing message catalog.
Every NR-* error code has an English default message owned
by NULLRUN, not customer code. Customer Support Bots hitting
a budget cap show the same wording across every NullRun-backed
application.
* format_user_message(exc) — render exception as user-facing
string
* set_user_message(code, text) — per-process override for
branded variants
* get_user_message(code) — raw lookup
* reset_overrides() — clear all overrides (for tests)
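The status mapping above can be sketched as a plain lookup table. This is an illustration only, not the module's code: `STATUS_BY_CODE`, `build_response`, and the `retry_after` parameter are hypothetical names, and the 503 fallback for unknown codes is an assumption. Only the error codes, status codes, and response keys come from the list above.

```python
from typing import Dict, Optional, Tuple

# Error code -> HTTP status, copied from the mapping listed above.
STATUS_BY_CODE: Dict[str, int] = {
    "NR-B004": 429,  # budget
    "NR-L001": 429,  # loop
    "NR-R001": 429,  # rate
    "NR-T001": 403,  # tool blocked
    "NR-X001": 403,  # generic block
    "NR-W003": 503,  # paused
    "NR-W002": 503,  # killed
}


def build_response(
    error_code: str,
    user_message: str,
    category: str,
    retry_after: Optional[int] = None,
) -> Tuple[int, Dict[str, str], Dict[str, str]]:
    """Return (status, headers, body) for one error (hypothetical helper)."""
    # Assumption: unknown codes fall back to 503, like infrastructure errors.
    status = STATUS_BY_CODE.get(error_code, 503)
    headers: Dict[str, str] = {}
    # Retry-After only makes sense on the 429 / 503 rows.
    if retry_after is not None and status in (429, 503):
        headers["Retry-After"] = str(retry_after)
    body = {
        "error_code": error_code,
        "user_message": user_message,
        "category": category,
    }
    return status, headers, body
```

Keeping the table as data rather than a chain of except blocks is what lets one handler serve every endpoint.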
Changed
-------
* Transport._send_batch canonical JSON serialization — route the
/track/batch body through _signed_request_body for consistent
compact-separator serialization. HMAC itself is unaffected,
but consistent serialization removes a special-case from the
wire-format contract tests.
* Transport._send_batch actions response handling — backend
renamed BatchTrackResponse.actions_taken (debug names) ->
BatchTrackResponse.actions (ActionTaken structs). Read both
for forward-compat; per-element try/except so one malformed
entry doesn't abort the whole loop.
* pyproject.toml metadata — long-form description with search
keywords, Maintainer: populated via maintainers=[...],
expanded classifiers (Linux / Windows / macOS, Python 3.13,
CPython, Security / AI / WWW/HTTP topics), project URL
expander.
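The forward-compatible read of the batch response described above might look like the following sketch. `read_actions` is a hypothetical name, and the `"action"` field on the new struct shape is an assumption; the source only says that both keys are read and that each element is parsed inside its own try/except.

```python
from typing import Any, Dict, List


def read_actions(response: Dict[str, Any]) -> List[str]:
    """Collect action names from either response key, skipping bad entries."""
    # Prefer the new key; fall back to the legacy one.
    raw = response.get("actions")
    if raw is None:
        raw = response.get("actions_taken", [])
    if not isinstance(raw, list):
        return []
    names: List[str] = []
    for entry in raw:
        try:
            if isinstance(entry, str):
                names.append(entry)  # legacy shape: plain debug name
            else:
                names.append(entry["action"])  # assumed struct field name
        except (KeyError, TypeError):
            # One malformed entry must not abort the whole loop.
            continue
    return names
```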
Tests
-----
* tests/test_messages.py (new, 282 lines) — catalog
completeness (every NR-* code has a default message),
override / reset behavior, render path.
* tests/test_integrations_fastapi.py (new, 289 lines) — HTTP
status mapping per error code, response shape, ASGI
middleware path for WorkflowKilledInterrupt, hybrid
composition.
* tests/test_decision_split.py (new, 199 lines) — pins the
decision / infrastructure error split.
* Updates to tests/test_runtime.py, tests/test_extractors.py
reflecting transport canonical-JSON + actions-renamed
changes.
Release plumbing
----------------
* pyproject.toml: version bumped 0.7.0 -> 0.7.6
* src/nullrun/__version__.py: __version__ = "0.7.6"
* CHANGELOG.md: full 0.7.6 entry covering additions,
transport changes, metadata improvements
Tests pass locally (per session log) — pytest on Windows /
Python 3.14.2 is green.
* ci: fix PR #35 — fastapi dep + Transport._send_batch typo + coverage padding
PR #35 (release/0.7.6) failed all four CI jobs (test 3.10/3.11/3.12,
coverage, codecov/patch) on the same root cause + one latent bug
masked by it. This commit lands the fixes plus the last-mile tests
that bring coverage above the 82% threshold.
CI failure root
---------------
* tests/test_integrations_fastapi.py does from fastapi import ...
at module top-level. CI installs only pip install -e '.[dev]',
and fastapi was declared as an *optional* [fastapi] extra,
NOT in [dev]. Pytest collection aborted with
ModuleNotFoundError: No module named 'fastapi' → all 4 jobs red.
* Fix: add fastapi>=0.100,<1.0 to [dev]. Same precedent as
langchain-core (already in [dev] for the same import-time
contract: nullrun.instrumentation.langgraph is eager-imported
from nullrun.decorators at collection time, so the test extras
must cover the import chain).
Latent bug surfaced by the first fix
------------------------------------
The same PR refactored Transport._send_batch_with_retry_info to
route the /track/batch body through _signed_request_body for
canonical-JSON serialization (matching /gate and /execute). The two
sibling call sites use the module-level helper _signed_request_body
(no self.); this one used self._signed_request_body by typo.
Result: AttributeError on every batch flush, breaking 15 existing
tests across test_transport.py / test_track_batch_retry.py /
test_integration_contract.py / test_signal_safety.py. As long as
the fastapi collection error aborted pytest, this was hidden. Fixed
to _signed_request_body(...) with a docstring noting why it is
module-level and what the bug looked like.
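A reduced reproduction of that typo, for readers who want to see the failure mode. The `Transport` class and the helper body here are stand-ins, not the SDK's code; the only point is that a module-level function is not reachable through `self`.

```python
import json


def _signed_request_body(payload: dict) -> bytes:
    """Module-level helper: compact-separator JSON (stand-in for the real one)."""
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


class Transport:
    def send_batch_buggy(self, payload: dict) -> bytes:
        # Bug: the helper lives on the module, not on Transport, so this
        # attribute lookup raises AttributeError at call time.
        return self._signed_request_body(payload)

    def send_batch_fixed(self, payload: dict) -> bytes:
        # Fix: call the module-level function directly.
        return _signed_request_body(payload)
```

Because the lookup only fails when the method runs, nothing flags it at import time, which is why a collection error elsewhere could hide it.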
Coverage padding (codecov/patch was failing on this too)
--------------------------------------------------------
Total coverage on the failing CI run was 81.98% — 0.02pp under the
fail-under=82 gate. After the two fixes above it would have
recovered to ~82.0% on the dot, so I added minimal tests for the
cheapest-to-cover gaps:
* tests/test_breaker_main.py (new) — covers the 5 statements in
nullrun.breaker.__main__.main() (0% → 100%). The module
exists so python -m nullrun.breaker exits cleanly instead of
failing with No module named nullrun.breaker.__main__; the
earlier fix made main() print and then return 0, but no
test exercised it.
* tests/test_status.py — extends TestSummary with seven
scenarios covering each conditional branch of NullRunStatus.summary()
(organization_id, workflow_id, workflow_state != Normal,
backend_reachable=False, ws_connected=False, recent_errors).
status.py jumps 84.52% → 98.81%.
* tests/test_integrations_fastapi.py — four tests on
_build_headers covering non-numeric, zero, negative, and
resume_after (the WorkflowPausedException code path).
integrations/fastapi.py jumps 90.22% → 94.57%.
After all three: TOTAL 81.98% → 82.46%, comfortably above the gate.
Verification
------------
* Local pytest: 997 passed, 13 skipped, 0 failed
(Windows / Python 3.14.2, 8m47s — same env the original commit
was validated in).
* python -m coverage report — 82.46%, no fail-under complaint.
* test: cover Phase 4.1 instrumentation — finish_reason + cache/reasoning/tools
Patch coverage on PR #35 was 62.38% against a 65% threshold (codecov
target 70% / threshold 5pp). The two biggest delta-holders against
master were auto.py (+286) and langgraph.py (+221), both dominated
by Phase 4.1 additions:
* auto._normalize_finish_reason + _FINISH_REASON_MAP
* auto._openai_extractor second-tier fields (cache_read_tokens,
cache_write_tokens, reasoning_tokens, finish_reason, tool_names)
* auto._anthropic_extractor cache_read / cache_write
* langgraph._safe_get_gen_message
* langgraph._get_finish_reason (5-source fallback chain)
* langgraph.extract_usage_from_response second-tier fields
These are pure / near-pure functions with no network or vendor SDK
calls. Coverage padding is cheap — pin the canonical wire shapes
once and the backend ingest contract gets a free live spec.
Local numbers:
* auto.py 63.44% -> 64.01% (file-level, +57 statements)
* langgraph.py 78.50% -> 86.01% (file-level, +32 statements)
* TOTAL 82.46% -> 83.13% (already above 82% gate)
41 tests, all green. Existing test_extractors.py and
test_langgraph_callback.py left untouched — these tests
deliberately target the Phase 4.1 fields (cache_read /
cache_write / reasoning / finish_reason / tool_names) that the
older tests didn't pin.
* fix(gate): forward real model + tools to /gate pre-flight (T4)
Pre-0.7.7 every SDK /gate call for any workflow with a budget was
hard-blocked because the runtime hard-coded the literal string
"budget-precheck" as the model. The pricing lookup never landed on
a real model, so the synthetic cost_limit rule fired with the wrong
score, and the backend's PolicyEvaluationGraph treats any such rule
with score > 0.8 as Block.
This commit:
* Adds nullrun.set_call_context(model=..., tools=[...]) plus
get_call_model / get_call_tools helpers (and the underlying
_call_model_var / _call_tools_var contextvars in
nullrun.context).
* Wires the call context into check_workflow_budget: the /gate
payload now carries the real model name (or None when unset)
and the user-supplied tool list. tools=[] vs missing-None are
distinguished on the wire per gate/internal.rs::check_tool_block.
* Transport.check forwards the tools key when set (it was
silently dropped pre-fix).
* tests/conftest.py reset_runtime clears the new contextvars so
a test's set_call_context(...) doesn't leak into the next
test's wire payload.
* New tests/test_gate_real_path.py pins down the regression:
default request allows a clean workflow, real block still
honored, no policy-N residue on the wire, set_call_context
flows into the body, no-context means no tools key, and the
helpers are reachable from nullrun.*.
Bumps version to 0.7.7. No breaking changes - new helpers
default to None / empty so existing call sites keep working.
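A minimal sketch of the call-context mechanism, assuming plain contextvars with None defaults. The helper names follow the commit message; `build_gate_payload` and the `workflow_id` key are hypothetical and stand in for whatever check_workflow_budget actually sends.

```python
import contextvars
from typing import Any, Dict, List, Optional

_call_model_var = contextvars.ContextVar("call_model", default=None)
_call_tools_var = contextvars.ContextVar("call_tools", default=None)


def set_call_context(model: Optional[str] = None,
                     tools: Optional[List[str]] = None) -> None:
    """Record the model and tool list for the current context."""
    _call_model_var.set(model)
    _call_tools_var.set(None if tools is None else list(tools))


def get_call_model() -> Optional[str]:
    return _call_model_var.get()


def get_call_tools() -> Optional[List[str]]:
    return _call_tools_var.get()


def build_gate_payload(workflow_id: str) -> Dict[str, Any]:
    """Hypothetical /gate body builder."""
    payload: Dict[str, Any] = {
        "workflow_id": workflow_id,
        "model": get_call_model(),  # None when no context was set
    }
    tools = get_call_tools()
    if tools is not None:
        payload["tools"] = tools  # [] is sent as-is; None omits the key
    return payload
```

The `is not None` check is what keeps an explicit empty tool list distinguishable from "no tools supplied" on the wire.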
* release: 0.7.8 — fail-loud on deprecated surface
Two silent fail-OPEN footguns are converted to explicit
DeprecationWarning / RuntimeError so misconfigurations show up at
SDK init instead of being diagnosed from a missing proto trace.
Deprecated:
* NullRunRuntime.start_recording() and .stop_recording() now emit
DeprecationWarning. They have been silent no-op stubs since
Sprint 2.1 (0.4.0) — decision history is now on the backend
dashboard at /control-center/decision-history. Both methods
will be removed in 0.9.0.
* NULLRUN_USE_GRPC=1 now raises RuntimeError at SDK init instead
of silently falling back to HTTP with an info log. gRPC is on
the roadmap but not implemented; unset the env var to use HTTP.
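The two fail-loud conversions can be illustrated roughly as below. `check_transport_env` and the `Runtime` class are hypothetical stand-ins, and treating any non-empty value of the variable as "set" is an assumption; the warning text is illustrative, not the SDK's wording.

```python
import os
import warnings
from typing import Mapping, Optional


def check_transport_env(environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise at init when the unimplemented gRPC transport is requested."""
    env = os.environ if environ is None else environ
    # Assumption: any non-empty value counts as "set".
    if env.get("NULLRUN_USE_GRPC"):
        raise RuntimeError(
            "NULLRUN_USE_GRPC is set but the gRPC transport is not yet implemented."
        )


class Runtime:
    def start_recording(self) -> None:
        # Still a no-op, but no longer a silent one.
        warnings.warn(
            "start_recording() is a no-op stub and will be removed in 0.9.0",
            DeprecationWarning,
            stacklevel=2,
        )
```

`stacklevel=2` points the warning at the caller's line, which is where the fix has to happen.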
Hardening (init path):
* Transport._post_auth_with_retry (new) — retry transient 503 / 504
+ network blips during /api/v1/auth/verify. Backend emits 503
+ Retry-After: 5 on transient DB errors (handlers.rs:11346-51).
Pre-fix the first 503 surfaced as NR-A001 to the user as if the
API key were bad. Three attempts, exponential backoff
(0.5s → 1s → 2s), honors Retry-After when present. Auth-key
failures (401) are NOT retried — a wrong key on attempt 1 is a
wrong key on attempt 3.
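The retry policy described above could be sketched like this. It is not the real _post_auth_with_retry: `send` is a hypothetical callable returning `(status_code, retry_after_seconds_or_None)`, and the sketch assumes sleeps happen only between attempts, so with three attempts only the first two backoff steps are used.

```python
import time
from typing import Callable, Optional, Tuple

TRANSIENT_STATUSES = (503, 504)


def post_auth_with_retry(
    send: Callable[[], Tuple[int, Optional[float]]],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> int:
    """Call send() until it returns a non-transient status or attempts run out."""
    last_error: Optional[Exception] = None
    status: Optional[int] = None
    for attempt in range(max_attempts):
        retry_after: Optional[float] = None
        try:
            status, retry_after = send()
            last_error = None
        except ConnectionError as exc:  # network blip: retry
            status, last_error = None, exc
        if status is not None and status not in TRANSIENT_STATUSES:
            return status  # includes 401: a wrong key is never retried
        if attempt < max_attempts - 1:
            backoff = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Honor Retry-After when the server sent one.
            sleep(retry_after if retry_after is not None else backoff)
    if last_error is not None:
        raise last_error
    assert status is not None
    return status
```

Injecting `sleep` keeps the policy testable without real delays.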
Transport refactor:
* Transport._add_hmac_headers (new) — pulls the HMAC header
construction out of _signed_request_body so /track/batch,
/gate, /check, /execute all share one source of truth for
Content-Type / X-Signature / X-Signature-Timestamp / X-API-Key
/ Authorization headers. HMAC formula unchanged.
* generate_hmac_signature + verify_hmac_signature accept str | bytes
for body. Legacy str callers (and the FastAPI integration) keep
working without an explicit .encode().
* actions_taken → actions on /track/batch response. Backend renamed
BatchTrackResponse.actions_taken (debug names) → actions
(ActionTaken structs with human-readable strings moved to
messages). Read both keys for forward-compat.
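As an illustration of the str | bytes acceptance, here is a sketch of a signing pair. The function names match the commit message, but the parameter lists and the signing input (`"<timestamp>." + body`, HMAC-SHA256, hex digest) are assumptions; the actual HMAC formula is not shown in this message.

```python
import hashlib
import hmac
from typing import Union


def generate_hmac_signature(secret: str, body: Union[str, bytes],
                            timestamp: str) -> str:
    """Sign a request body given as either str or bytes."""
    # Normalize once so str and bytes callers get identical signatures.
    body_bytes = body.encode("utf-8") if isinstance(body, str) else body
    # Assumed signing input; the real formula may differ.
    message = timestamp.encode("ascii") + b"." + body_bytes
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_hmac_signature(secret: str, body: Union[str, bytes],
                          timestamp: str, signature: str) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    expected = generate_hmac_signature(secret, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

Any signature over raw bytes is sensitive to serialization, so `{"a":1}` and `{"a": 1}` sign differently; that is the reason every endpoint goes through the same compact-separator body builder.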
Test updates:
* tests/test_framework_patches — alignment with retry + actions
rename.
* tests/test_high_reliability_fixes — re-pinned for _post_auth_with_retry.
* tests/test_hmac_signing — expanded for str/bytes body + new
_add_hmac_headers helper.
* tests/test_integration_contract — backend actions rename covered.
* tests/test_transport — retry semantics.
Bumps version to 0.7.8. No breaking changes for callers who don't
touch start_recording / stop_recording / NULLRUN_USE_GRPC.
* test(grpc): align test_grpc_removed with 0.7.8 NULLRUN_USE_GRPC contract
The 0.7.8 commit changed NULLRUN_USE_GRPC=1 from silent no-op +
INFO log to an explicit RuntimeError, but the regression test
in tests/test_grpc_removed.py still pinned the old behavior
(``test_nullrun_use_grpc_does_not_crash_init`` asserting
make_runtime() succeeded and an INFO line was logged).
CI on PR #38 failed on this test:
FAILED tests/test_grpc_removed.py::TestGrpcRemoved
::test_nullrun_use_grpc_does_not_crash_init
E RuntimeError: NULLRUN_USE_GRPC is set but the gRPC
transport is not yet implemented. ...
This commit updates the test to pin the new 0.7.8 contract:
the env var must raise RuntimeError, and the error message
must name the offending variable + point at the docs page.
The test is renamed from
``test_nullrun_use_grpc_does_not_crash_init`` to
``test_nullrun_use_grpc_raises_runtime_error`` so the test
name itself documents the new contract.
The module docstring (point 2 in the contract list) is
updated to say "raises RuntimeError" instead of "does NOT
crash init — it logs an INFO line and silently falls back
to HTTP". The 0.3.1 -> 0.7.8 evolution is documented in the
test docstring as a contract-evolution footnote for future
maintainers.
Imports: removed unused `import logging` and `caplog`
parameter (no longer asserting on log records); added
`import pytest` for `pytest.raises`.
No production-code change. No version bump. The fix is
confined to tests/test_grpc_removed.py.
* style(runtime): sort stdlib imports (ruff I001)
The 0.7.8 commit (fail-loud on deprecated surface) added
``import warnings`` mid-block in src/nullrun/runtime.py:34,
breaking alphabetical order:
asyncio
logging
os
warnings <-- out of order
threading
time
uuid
Ruff on PR #38 CI (Run ruff check src/) flagged it as I001.
Reorder to alphabetical:
asyncio
logging
os
threading
time
uuid
warnings
Verified:
* ruff check src/ -> All checks passed!
* pytest tests/test_grpc_removed.py tests/test_runtime_branches.py
-> 47 passed
No behavior change, no production logic touched. Pure lint fix.
* release: 0.8.0 — SDK wire-format audit (model/provider extraction)
Closes a class of silent fail-OPEN paths that were sending
model=None or model="unknown" on /track for many LLM
vendors. Every such event cost the backend a model_pricing
lookup that returned no row, fell through to DEFAULT_RATE
(~$30/M), and emitted a fallback warning the operator
couldn't reproduce because the offending observation was
buried in another package's telemetry.
No public-API break. No behavior change for callers whose
instrumentation already populates model correctly. Pure
wire-payload hygiene.
runtime.py — track():
* Strips None values from the wire payload (pre-0.8.0
forwarded every key except _WIRE_STRIP_FIELDS, including
keys whose value was None). Putting {"model": null} on
the wire triggered backend unwrap_or("default") and a
fallback warning. Dropping None keeps the diagnostic
signal loud (the new WARN below fires on missing-key,
which is what we want operators to see) instead of
silent (the JSON-null case).
* Adds logger.warning("track(): llm_call event missing
'model' field — backend will fall back to DEFAULT_RATE.
event=...") — the single signal an operator needs to
reproduce "which observation produced an llm_call
without model set". Activated only for llm_call; other
event types are silent.
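The two track() changes above can be sketched together. `build_wire_payload`, the `"type"` key, and the contents of `_WIRE_STRIP_FIELDS` are assumptions for illustration, and the log text is abbreviated from the message quoted above.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("nullrun.sketch")

# Hypothetical contents; the real _WIRE_STRIP_FIELDS is not shown.
_WIRE_STRIP_FIELDS = frozenset({"_internal"})


def build_wire_payload(event: Dict[str, Any]) -> Dict[str, Any]:
    """Drop stripped fields and None values, then warn on a missing model."""
    payload = {
        key: value
        for key, value in event.items()
        if key not in _WIRE_STRIP_FIELDS and value is not None
    }
    # Only llm_call events are checked; other event types stay silent.
    if payload.get("type") == "llm_call" and "model" not in payload:
        logger.warning(
            "track(): llm_call event missing 'model' field. event=%r", payload
        )
    return payload
```

Because None is stripped before the check, a `{"model": None}` event and an event with no model key take the same warning path.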
instrumentation/langgraph.py — NullRunCallback.on_llm_end:
* New _extract_model_from_response + _extract_provider_from_response
helpers (mirror _get_finish_reason's best-effort
pattern). Fallback chain: invocation_params → response
metadata → AIMessage response_metadata → llm_output →
direct attribute. "unknown" is now a true last resort,
not the common case.
instrumentation/llama_index.py:
* extract_from_event fallback chain: event.response.model
→ event.response.raw.model → usage['model']. Mock
providers and adapter-style ChatResponse now ship a
real model id.
instrumentation/autogen.py:
* on_messages fallback chain: self.model → result.model.
OpenAI's response carries the actual model id (may
differ from request if the server resolved an alias).
instrumentation/auto.py — _emit_from_span (openai-agents):
* span model fallback chain: span['model'] →
usage['model'] → span['response_metadata']['model_name'].
Some custom tracer configs leave span['model'] empty;
the other two sources usually have it.
Sets model on the event only when we have a real value
(empty/None is dropped — relies on the new None-strip
in track() to keep the operator warning loud).
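The span fallback chain above amounts to a first-non-empty lookup. In this sketch `extract_span_model` and `build_event` are hypothetical names, and spans and usage are assumed to be plain dicts; the three sources and their order come from the list above.

```python
from typing import Any, Dict, Mapping, Optional


def extract_span_model(span: Mapping[str, Any],
                       usage: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty model id from the three sources, in order."""
    candidates = (
        span.get("model"),
        usage.get("model"),
        (span.get("response_metadata") or {}).get("model_name"),
    )
    for value in candidates:
        if isinstance(value, str) and value.strip():
            return value
    return None


def build_event(span: Mapping[str, Any],
                usage: Mapping[str, Any]) -> Dict[str, Any]:
    """Set model only when a real value was found."""
    event: Dict[str, Any] = {"type": "llm_call"}  # assumed event type key
    model = extract_span_model(span, usage)
    if model is not None:
        event["model"] = model
    return event
```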
Bumps version to 0.8.0. No breaking changes for callers
who don't touch the wire path directly.
maltsev-dev added a commit that referenced this pull request Jun 29, 2026
…40)
* fix: 0.8.2 — coverage wire-shape (metadata nesting) + model fallback
Two coordinated fixes from the 0.8.0 wire-format audit, plus
contract tests for the backend batch response schema:
1. Coverage counters under metadata
- src/nullrun/runtime.py: track_coverage() emits seen/tracked/
streaming_skipped dicts under event.metadata instead of the
top level. SdkTrackRequest uses explicit fields with no
#[serde(flatten)] catchall, so top-level keys were silently
dropped by serde and the dashboard's last_coverage_pct was
permanently null.
- tests/test_coverage_report.py: pin the wire shape (regression
test).
2. Model name extraction fallback (Issue 2)
- src/nullrun/instrumentation/auto.py: when the response body
extractor returns None for model (OpenAI Responses API,
Anthropic streaming edge cases), fall back to the model string
the user embedded in the request body via
ChatOpenAI(model='gpt-4.1-mini'). Without this, every such call
was zero-billed (backend unwrap_or('default') + DEFAULT_RATE
≈ $0/call).
- tests/test_model_fallback.py: unit-test the helper.
3. Backend batch response schema contract tests
- tests/test_batch_response_parsing.py: pin the post-2026-06-27
BatchTrackResponse shape (actions: Vec<ActionTaken>, messages:
Vec<String>) and document that the legacy actions_taken:
Vec<String> field is intentionally dropped in 0.8.0.
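The coverage wire-shape fix in item 1 can be pictured with a small sketch. `build_coverage_event` and the `"type"` value are hypothetical; the only detail taken from the message is that the three counter dicts sit under `metadata` rather than at the top level, where a struct with explicit fields and no flatten catchall would ignore them.

```python
from typing import Any, Dict, Mapping


def build_coverage_event(seen: Mapping[str, int],
                         tracked: Mapping[str, int],
                         streaming_skipped: Mapping[str, int]) -> Dict[str, Any]:
    """Nest the coverage counters under metadata (hypothetical builder)."""
    return {
        "type": "coverage",  # assumed event type name
        "metadata": {
            "seen": dict(seen),
            "tracked": dict(tracked),
            "streaming_skipped": dict(streaming_skipped),
        },
    }
```

A regression test for this shape only needs to assert that the counters are absent from the top level and present under `metadata`.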