diff --git a/python/docs/examples/pytest_plugin.md b/python/docs/examples/pytest_plugin.md index c464e564e..5a40d450d 100644 --- a/python/docs/examples/pytest_plugin.md +++ b/python/docs/examples/pytest_plugin.md @@ -306,6 +306,10 @@ outcomes into `TestStatus`: | Non-`AssertionError` exception escapes the test (e.g. `ValueError`, `TimeoutError`) | `ERROR`, with the formatted traceback (last 10 frames plus the first frame) on `step.error_info.error_message` | | Manual `step.current_step.update({"status": ...})` | Whatever you set; the step exit handler honors a manually-resolved status | +For the full contract, including skips, xfail/xpass, hard exits (`SystemExit`, +`KeyboardInterrupt`), setup/teardown phase failures, and propagation rules, +see the [Pass/Fail Behavior guide](../guides/pytest_plugin/pass_fail_behavior.md). + A failure or error at any depth propagates upward: the parent substep, the function step, the class/module/package steps above it, and the session report all get marked failed. diff --git a/python/docs/guides/pytest_plugin/pass_fail_behavior.md b/python/docs/guides/pytest_plugin/pass_fail_behavior.md new file mode 100644 index 000000000..6e9b1d6e3 --- /dev/null +++ b/python/docs/guides/pytest_plugin/pass_fail_behavior.md @@ -0,0 +1,164 @@ +# Pass/Fail Behavior + +The pytest plugin maps every pytest outcome to a `TestStatus` on the +corresponding Sift step. Use this page to look up what a given test will +produce, and how that result rolls up to the parent steps and the report. + +## `TestStatus` values + +The statuses below come from `sift_client.sift_types.test_report.TestStatus`. + +| Status | Meaning | +| ------------- |------------------------------------------------------------------------------------------------------------------------| +| `PASSED` | The step completed and every check it owns succeeded. | +| `FAILED` | An assertion, a `pytest.fail(...)`, a failed `report_outcome`, or a failing measurement marked it. | +| `ERROR` | An unexpected exception escaped the test body or a fixture (setup or teardown). | +| `ABORTED` | A hard exit (`SystemExit`, observed `KeyboardInterrupt`) interrupted the test. | +| `SKIPPED` | The test was skipped at collection time, at runtime, or from a fixture. | +| `IN_PROGRESS` | Test in progress or the plugin never observed a final outcome (e.g. a session-aborting interrupt killed pytest first). | + +## Normal test outcomes + +| Scenario | Trigger | Outcome | +| ----------------------------------------- | ------------------------------------ | -------- | +| Test passes | function body returns cleanly | `PASSED` | +| Assertion failure | `assert 1 == 2` | `FAILED` | +| `pytest.fail("...")` from the body | `pytest.fail("intentional failure")` | `FAILED` | +| Uncaught non-assertion exception | `raise ValueError("boom")` | `ERROR` | + +A non-assertion exception gets its formatted traceback recorded on +`step.error_info.error_message`. + +## Hard exits + +Hard exits the plugin can observe map to `ABORTED`. If pytest tears the +session down before the plugin sees the exit, the step stays at +`IN_PROGRESS` instead of resolving. + +| Scenario | Trigger | Outcome | +| ---------------------------------------------- | ------------------------- | -------------------------------------------------------------------- | +| `SystemExit` from the test body | `sys.exit(1)` | `ABORTED` | +| `KeyboardInterrupt` the plugin observes | `raise KeyboardInterrupt` | `ABORTED` | +| Session-aborting `KeyboardInterrupt` | Ctrl-C terminates pytest | `IN_PROGRESS` (session ends before the plugin's hooks fire) | + +### Abort propagation through nested substeps + +Every step that was open when the abort fired records +`ABORTED`. + +```python title="test_abort.py" +import sys + + +def test_x(step): + with step.substep(name="completed_sub"): + pass # closes as PASSED before the abort + with step.substep(name="outer_sub") as outer_sub: + with outer_sub.substep(name="inner_sub"): + sys.exit(1) # ABORTED applied to inner_sub, outer_sub, and the test step +``` + +The Sift report shows `completed_sub` as `PASSED` and the three steps +still open at the abort (`inner_sub`, `outer_sub`, and the test step +itself) as `ABORTED`. + +## Skips + +| Scenario | Trigger | Outcome | +| ------------------------------------- | --------------------------------------------- | --------- | +| Collection-time skip | `@pytest.mark.skip(reason=...)` | `SKIPPED` | +| Conditional collection-time skip | `@pytest.mark.skipif(True, reason=...)` | `SKIPPED` | +| Runtime skip from the test body | `pytest.skip("...")` | `SKIPPED` | +| Skip raised inside a fixture | `@pytest.fixture` calls `pytest.skip("...")` | `SKIPPED` | + +`SKIPPED` does not propagate as a failure. A skipped substep or test does +not block its parent from resolving to `PASSED`. + +## Expected failures (xfail / xpass) + +xfail marks declare that a test is expected to fail. The plugin follows +the same semantics pytest does. + +| Scenario | Trigger | Outcome | +| ----------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------- | +| xfail-marked test that fails | `@pytest.mark.xfail` + `assert 1 == 2` | `PASSED` (the test fulfilled the xfail expectation) | +| Strict xfail that unexpectedly passes | `@pytest.mark.xfail(strict=True)` + `assert True` | `FAILED` (the mark no longer matches reality) | +| Non-strict xfail that unexpectedly passes | `@pytest.mark.xfail()` + `assert True` | `PASSED` (`strict=False` does not insist on the failure) | +| `xfail(raises=...)` with wrong exception | `@pytest.mark.xfail(raises=ValueError)` + `raise KeyError` | `FAILED` (the `raises=` mismatch is a real test failure) | +| `xfail(run=False)` | `@pytest.mark.xfail(run=False)` | `SKIPPED` (the body never ran) | + +## Influencing outcomes from test code + +A test can also set the step's outcome directly via the helpers below. +Substeps your test opens follow the same propagation rules as the ones +the plugin opens for you. + +### Manual status override + +`step.current_step.update({...})` sets the status directly. The step's +exit handler does not overwrite it. + +```python +from sift_client.sift_types.test_report import TestStatus + + +def test_manual(step): + step.current_step.update({"status": TestStatus.FAILED}) +``` + +### `report_outcome` for externally computed checks + +`report_outcome(name, result, reason)` records a named check whose +pass/fail was computed elsewhere (a subprocess, a remote system, your own +comparison logic). A failing outcome marks the step `FAILED`. + +```python +def test_external_check(step): + result, reason = run_external_validator() + step.report_outcome("ext-validator", result, reason) +``` + +### Measurements with bounds + +`step.measure(name=, value=, bounds=)` records a measurement and resolves +the step to `FAILED` if the value is out of bounds. The call returns the +pass/fail boolean and does not raise, so multiple measurements can run +without short-circuiting. + +```python +def test_battery(step): + step.measure(name="voltage", value=12.1, bounds={"min": 11.5, "max": 13.0}, unit="V") + step.measure(name="current", value=0.42, bounds={"max": 1.0}, unit="A") +``` + +### Substep failures + +A failed substep propagates failure to its parent step. A manually-set +`SKIPPED` on a substep does not. + +```python +def test_with_substep(step): + with step.substep(name="check") as inner: + inner.measure(name="value", value=99.0, bounds={"min": 0.0, "max": 5.0}) + # The outer step resolves to FAILED because the substep failed. +``` + +## Propagation rules + +Every non-`PASSED`/`SKIPPED` step marks its parent as failed. What the +parent records depends on whether its own scope had an abort and whether +a child already failed: + +- A hard exit (`SystemExit` or an observed `KeyboardInterrupt`) in the + step's own scope records `ABORTED`. `ABORTED` propagates through every + step the abort passes through on its way up. +- A child that already recorded a non-`PASSED`/`SKIPPED` outcome marks + the parent as `FAILED`. This holds whether or not an exception is still + propagating through the parent's scope: only the originating substep + records `ERROR`; ancestors inherit `FAILED`. The traceback stays on + the originating step's `error_info`. +- A step records `ERROR` only when its own scope raised a non-Assertion + exception AND no child has failed. + +`SKIPPED` does not propagate. A status set explicitly via +`current_step.update` is kept. diff --git a/python/lib/sift_client/_tests/pytest_plugin/_fakes.py b/python/lib/sift_client/_tests/pytest_plugin/_fakes.py deleted file mode 100644 index 460100daa..000000000 --- a/python/lib/sift_client/_tests/pytest_plugin/_fakes.py +++ /dev/null @@ -1,132 +0,0 @@ -"""Test doubles for the pytester-driven pytest-plugin tests. - -The fake ``ReportContext`` is a drop-in for the real one that records every -step creation to a JSON file at session exit. Used by ``test_parametrize.py`` -to assert the step tree produced by an inner pytester pytest run. -""" - -from __future__ import annotations - -import itertools -import json -from typing import TYPE_CHECKING, Any -from unittest.mock import MagicMock - -if TYPE_CHECKING: - from pathlib import Path - - -class FakeStep: - def __init__(self, id_: str, name: str, parent_step_id: str | None, step_path: str) -> None: - self.id_ = id_ - self.name = name - self.parent_step_id = parent_step_id - self.step_path = step_path - self.status: Any = None - self.description: Any = None - self.error_info: Any = None - - def update(self, fields: dict[str, Any]) -> None: - for k, v in fields.items(): - setattr(self, k, v) - - -class FakeReport: - def __init__(self) -> None: - self.id_ = "report-id" - - def update(self, fields: dict[str, Any]) -> None: - pass - - -class FakeReportContext: - def __init__(self, steps_file: Path) -> None: - self.steps_file = steps_file - self.report = FakeReport() - self.client = MagicMock() - self.step_stack: list[FakeStep] = [] - self.step_number_at_depth: dict[int, int] = {} - self.open_step_results: dict[str, bool] = {} - self.any_failures = False - self.log_file: Path | None = None - self.steps: list[dict[str, Any]] = [] - self._ids = itertools.count(1) - - def __enter__(self) -> FakeReportContext: - return self - - def __exit__(self, *_: Any) -> None: - self.steps_file.write_text(json.dumps(self.steps)) - - def new_step( - self, - name: str, - description: str | None = None, - assertion_as_fail_not_error: bool = True, - metadata: dict[str, Any] | None = None, - ) -> Any: - # Reuse the real NewStep machinery — it talks to this fake via the - # methods below. - from sift_client.util.test_results.context_manager import NewStep - - return NewStep( - self, # type: ignore[arg-type] - name=name, - description=description, - assertion_as_fail_not_error=assertion_as_fail_not_error, - metadata=metadata, - ) - - def get_next_step_path(self) -> str: - top = self.step_stack[-1] if self.step_stack else None - path = top.step_path if top else "" - next_n = self.step_number_at_depth.get(len(self.step_stack), 0) + 1 - prefix = f"{path}." if path else "" - return f"{prefix}{next_n}" - - def create_step( - self, - name: str, - description: str | None = None, - metadata: dict[str, Any] | None = None, - ) -> FakeStep: - step_path = self.get_next_step_path() - parent = self.step_stack[-1] if self.step_stack else None - step = FakeStep( - id_=f"step-{next(self._ids)}", - name=name, - parent_step_id=parent.id_ if parent else None, - step_path=step_path, - ) - self.step_number_at_depth[len(self.step_stack)] = ( - self.step_number_at_depth.get(len(self.step_stack), 0) + 1 - ) - self.step_stack.append(step) - self.open_step_results[step.step_path] = True - self.steps.append( - { - "id": step.id_, - "name": name, - "parent_step_id": step.parent_step_id, - "step_path": step_path, - } - ) - return step - - def record_step_outcome(self, outcome: bool, step: FakeStep) -> None: - if not outcome: - self.open_step_results[step.step_path] = False - self.any_failures = True - - def resolve_and_propagate_step_result(self, step: FakeStep, error_info: Any = None) -> bool: - result = self.open_step_results.get(step.step_path, True) - if error_info: - result = False - return result - - def exit_step(self, step: FakeStep) -> None: - self.step_number_at_depth[len(self.step_stack)] = 0 - stack_top = self.step_stack.pop() - self.open_step_results.pop(step.step_path) - if stack_top.id_ != step.id_: - raise ValueError("popped step was not the top of the stack") diff --git a/python/lib/sift_client/_tests/pytest_plugin/_step_status_capture.py b/python/lib/sift_client/_tests/pytest_plugin/_step_status_capture.py new file mode 100644 index 000000000..e92d1726e --- /dev/null +++ b/python/lib/sift_client/_tests/pytest_plugin/_step_status_capture.py @@ -0,0 +1,139 @@ +"""Read step status sequences from a Sift offline-mode log file. + +The contract suite drives each scenario through an inner pytester session +run with ``--sift-offline``, which causes the real plugin + ``ReportContext`` +to write every test-result API call to a JSONL log. This module parses +that log into a per-step status timeline that ``test_pass_fail.py`` asserts +against, with no test-only ``ReportContext`` fake required. +""" + +from __future__ import annotations + +import json +from dataclasses import dataclass, field +from typing import TYPE_CHECKING + +from sift_client._internal.low_level_wrappers._test_results_log import iter_log_data_lines +from sift_client.sift_types.test_report import TestStatus + +if TYPE_CHECKING: + from pathlib import Path + + +@dataclass +class CapturedStep: + step_id: str + name: str + step_path: str + parent_step_id: str | None + statuses: list[TestStatus] = field(default_factory=list) + + +_PROTO_STATUS_NAMES = { + "TEST_STATUS_UNSPECIFIED": TestStatus.UNSPECIFIED, + "TEST_STATUS_DRAFT": TestStatus.DRAFT, + "TEST_STATUS_PASSED": TestStatus.PASSED, + "TEST_STATUS_FAILED": TestStatus.FAILED, + "TEST_STATUS_ABORTED": TestStatus.ABORTED, + "TEST_STATUS_ERROR": TestStatus.ERROR, + "TEST_STATUS_IN_PROGRESS": TestStatus.IN_PROGRESS, + "TEST_STATUS_SKIPPED": TestStatus.SKIPPED, +} + + +def _status(name: str | None) -> TestStatus: + if name is None: + return TestStatus.UNSPECIFIED + return _PROTO_STATUS_NAMES.get(name, TestStatus.UNSPECIFIED) + + +def parse_log(log_path: Path) -> dict[str, CapturedStep]: + """Parse the offline log into ``{step_id: CapturedStep}``. + + Walks the JSONL file in order, building a ``CapturedStep`` for each + ``CreateTestStep`` entry and appending the new status from each + ``UpdateTestStep`` entry. + """ + steps: dict[str, CapturedStep] = {} + for request_type, response_id, json_str in iter_log_data_lines(log_path): + payload = json.loads(json_str) + test_step = payload.get("testStep", {}) + if request_type == "CreateTestStep" and response_id: + steps[response_id] = CapturedStep( + step_id=response_id, + name=test_step.get("name", ""), + step_path=test_step.get("stepPath", ""), + parent_step_id=test_step.get("parentStepId") or None, + statuses=[_status(test_step.get("status"))], + ) + elif request_type == "UpdateTestStep": + step_id = test_step.get("testStepId") + new_status = test_step.get("status") + if step_id and step_id in steps and new_status is not None: + steps[step_id].statuses.append(_status(new_status)) + return steps + + +_active_log: Path | None = None +_cached: dict[str, CapturedStep] | None = None + + +def set_log(path: Path) -> None: + """Point subsequent queries at a new log file. Clears the parse cache.""" + global _active_log, _cached + _active_log = path + _cached = None + + +def _steps() -> dict[str, CapturedStep]: + global _cached + if _cached is None: + if _active_log is None or not _active_log.exists(): + _cached = {} + else: + _cached = parse_log(_active_log) + return _cached + + +def steps_by_name(name: str) -> list[CapturedStep]: + return [s for s in _steps().values() if s.name == name] + + +def test_step(name: str) -> CapturedStep | None: + """The step the autouse ``step`` fixture creates for the test function. + + Multiple steps can share a name (e.g. when the makereport hook records an + inline step for a collection-time skip on top of the autouse step). The + autouse step is the shallowest by path depth. + """ + matches = steps_by_name(name) + if not matches: + return None + return min(matches, key=lambda s: s.step_path.count(".")) + + +def final_status(name: str) -> TestStatus | None: + step = test_step(name) + return step.statuses[-1] if step and step.statuses else None + + +def load_steps(log_path: Path) -> list[dict]: + """Load the offline log as a list of step records keyed by hierarchy fields. + + Each record has ``id``, ``name``, ``parent_step_id``, ``step_path``, the + shape ``test_hierarchy.py`` expects for its ``_by_name`` and + ``_ancestor_names`` walkers. Returns an empty list if the log was never + created (e.g. every item in the inner session was ``sift_exclude``-d, so + the plugin's ``report_context`` fixture never fired). + """ + if not log_path.exists(): + return [] + return [ + { + "id": s.step_id, + "name": s.name, + "parent_step_id": s.parent_step_id, + "step_path": s.step_path, + } + for s in parse_log(log_path).values() + ] diff --git a/python/lib/sift_client/_tests/pytest_plugin/step_status_states.md b/python/lib/sift_client/_tests/pytest_plugin/step_status_states.md new file mode 100644 index 000000000..7e366a512 --- /dev/null +++ b/python/lib/sift_client/_tests/pytest_plugin/step_status_states.md @@ -0,0 +1,105 @@ +# Pytest-plugin step-status: test scenarios + +Reference for the pass/fail scenarios covered by +[`test_pass_fail.py`](test_pass_fail.py). Each row pairs a scenario with the +`TestStatus` the plugin records, and maps to the user-facing contract in +[`docs/guides/pytest_plugin/pass_fail_behavior.md`](../../../../docs/guides/pytest_plugin/pass_fail_behavior.md). + +`TestStatus` values come from `sift_client.sift_types.test_report.TestStatus`: +`PASSED`, `FAILED`, `ERROR`, `SKIPPED`, `ABORTED`, `IN_PROGRESS`. Hard process +exits the plugin can observe (`SystemExit`, `KeyboardInterrupt` when pytest +delivers a call-phase report) map to `ABORTED`. A session-aborting interrupt +that fires before the plugin sees it leaves the step in `IN_PROGRESS`. + +## Case ID scheme + +Each scenario has a stable case ID of the form `PREFIX-NN`. Tests in +`test_pass_fail.py` reference their case ID in a leading comment so a test can +be traced back to its row here without rereading the scenario: + +| Prefix | Section | +| ------- | ---------------------------------------- | +| `CALL` | Call-phase exit paths | +| `SKIP` | Skip paths | +| `XFAIL` | xfail / xpass | +| `PHASE` | Setup / teardown phases | +| `COLL` | Collection / fixture-resolution failures | +| `API` | Plugin-API exit paths | + + +## Call-phase exit paths + +| Case | Scenario | Trigger | Outcome | +| --------- | ------------------------------- | ------------------------------------ | -------------------------------------------------------------------------------------------------------- | +| `CALL-01` | Test passes | function body returns cleanly | `PASSED` | +| `CALL-02` | Assert failure in call phase | `assert 1 == 2` | `FAILED` | +| `CALL-03` | Generic exception in call phase | `raise ValueError("boom")` | `ERROR` | +| `CALL-04` | `pytest.fail("...")` from body | `pytest.fail("intentional failure")` | `FAILED` | +| `CALL-05` | `SystemExit` from the test body | `sys.exit(1)` | `ABORTED` | +| `CALL-06` | `KeyboardInterrupt` in body | `raise KeyboardInterrupt` | `IN_PROGRESS` — session aborts before the plugin sees the interrupt; `ABORTED` if the plugin does see it | +| `CALL-07` | Substep raises non-Assertion exception | `with step.substep(...): raise ValueError("boom")` | Substep `ERROR`, test step `FAILED` (child-failed signal outranks the propagating exception) | + +## Skip paths + +| Case | Scenario | Trigger | Outcome | +| --------- | -------------------------------- | -------------------------------------------- | ------------------------------------------------------------------------ | +| `SKIP-01` | Collection-time skip | `@pytest.mark.skip(reason=...)` | `SKIPPED` — only the makereport hook records a step; no autouse step ran | +| `SKIP-02` | Conditional collection-time skip | `@pytest.mark.skipif(True, reason=...)` | `SKIPPED` — same route as `@pytest.mark.skip` | +| `SKIP-03` | Runtime skip in body | `pytest.skip("...")` | Outer step `SKIPPED`; no duplicate nested step | +| `SKIP-04` | Skip raised inside a fixture | `@pytest.fixture` calls `pytest.skip("...")` | Outer step `SKIPPED` (setup-phase skip); no duplicate nested step | + +## xfail / xpass + +| Case | Scenario | Trigger | Outcome | +| ---------- | ----------------------------------------- | ---------------------------------------------------------- | -------------------------------------------------------- | +| `XFAIL-01` | xfail-marked test that fails | `@pytest.mark.xfail` + `assert 1 == 2` | `PASSED` — test fulfilled the xfail expectation | +| `XFAIL-02` | Strict xfail that unexpectedly passes | `@pytest.mark.xfail(strict=True)` + `assert True` | `FAILED` — mark no longer matches reality | +| `XFAIL-03` | Non-strict xfail that unexpectedly passes | `@pytest.mark.xfail()` + `assert True` | `PASSED` — `strict=False` doesn't insist on the failure | +| `XFAIL-04` | `xfail(raises=...)` with wrong exception | `@pytest.mark.xfail(raises=ValueError)` + `raise KeyError` | `FAILED` — `raises=` mismatch is a real test failure | +| `XFAIL-05` | `xfail(run=False)` | `@pytest.mark.xfail(run=False)` (body never executed) | `SKIPPED` — the test never ran | + +## Setup / teardown phases + +| Case | Scenario | Trigger | Outcome | +| ---------- | -------------------------------------------- | ------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- | +| `PHASE-01` | Setup-phase fixture failure (RuntimeError) | `@pytest.fixture` raises before `yield`; test body never runs | `ERROR` — plugin reads the setup-phase report and maps `failed` → `ERROR` (a `phase=setup` annotation is a planned follow-up) | +| `PHASE-02` | Teardown-phase fixture failure | `@pytest.fixture` raises after `yield`; test body passed | `FAILED` — plugin upgrades a passed step when the teardown report shows `failed` (a `phase=teardown` annotation is a planned follow-up) | +| `PHASE-03` | Call-phase fail **plus** teardown-phase fail | `assert 1 == 2` in body AND `@pytest.fixture` raises after `yield` | `FAILED` — call-phase failure dominates; surfacing the teardown error alongside is a planned follow-up | + +## Collection / fixture-resolution failures + +| Case | Scenario | Trigger | Outcome | +| --------- | --------------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------ | +| `COLL-01` | Missing fixture | `def test_x(nonexistent_fixture):` | `ERROR` — missing fixture surfaces as a setup-phase failure (a `phase=setup` annotation is a planned follow-up) | + +## Plugin-API exit paths (in-test mutations) + +| Case | Scenario | Trigger | Outcome | +| -------- | --------------------------------- | ------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- | +| `API-01` | Manual status override | `step.current_step.update({"status": TestStatus.FAILED})` | `FAILED` | +| `API-02` | `report_outcome(result=False)` | `step.report_outcome("the_check", False, "did not match")` | `FAILED` | +| `API-03` | `measure(...)` out-of-bounds | `step.measure(name="m", value=10.0, bounds={"min": 0.0, "max": 5.0})` | `FAILED` | +| `API-04` | Failed measurement on a substep | `with step.substep(...) as s: s.measure(... out-of-bounds)` | `FAILED` — propagates from substep to parent | +| `API-05` | Manually-skipped substep | `with step.substep(...) as s: s.current_step.update({"status": SKIPPED})` | Parent step `PASSED` — skip does not propagate as a failure | +| `API-06` | Hard exit inside a nested substep | `with step.substep(...) as s: with s.substep(...): sys.exit(1)` | Every open step on the unwind path records `ABORTED`; a sibling substep that closed before the abort keeps its prior status | + +## Out of scope + +Scenarios deliberately not covered by this suite: + +- **Timeout** — needs `pytest-timeout` or a manual signal harness. +- **Signal (SIGKILL / SIGTERM)** — cannot be caught from inside the process; + needs a subprocess-level harness. +- **`pytest.exit("...")`** — niche; the "aborts subsequent tests" behavior + is hard to characterize cleanly because each `pytester` invocation is + its own session. +- **`os._exit()`** — bypasses Python cleanup entirely; can't be tested + in-process because it would kill the outer pytest run. Guaranteed + data-loss case alongside `SystemExit` / `SIGKILL`. +- **Parametrize-level marks** (`pytest.param(..., marks=pytest.mark.xfail / skip)`) + — routes through a different selection path but produces the same + `report.outcome`, so behavior matches the function-level marks already + covered above. +- **Import error / syntax error / `conftest.py` error** — these fail + collection entirely; no `item` is produced and no plugin hook fires, so + no Sift step is recorded. diff --git a/python/lib/sift_client/_tests/pytest_plugin/test_hierarchy.py b/python/lib/sift_client/_tests/pytest_plugin/test_hierarchy.py index cecad2df8..1efd4e817 100644 --- a/python/lib/sift_client/_tests/pytest_plugin/test_hierarchy.py +++ b/python/lib/sift_client/_tests/pytest_plugin/test_hierarchy.py @@ -4,65 +4,43 @@ classes (including nested), parametrize axes — plus the ini opt-out flags, failure-cleanup semantics, and the drain helper. -Each test spins up an inner pytest run via ``pytester`` whose conftest swaps -in a ``FakeReportContext`` (from ``_fakes.py``) that records every step -creation to a JSON file. The outer test reads that file and asserts the -resulting step tree. +Each test spins up an inner pytest run via ``pytester`` configured with +``--sift-offline`` and a known log path. The plugin writes every test-result +API call to that JSONL log, and the outer test parses it via +``_step_status_capture.load_steps`` to reconstruct the step tree. """ from __future__ import annotations -import json -from pathlib import Path as _Path from textwrap import dedent from typing import TYPE_CHECKING import pytest +from sift_client._tests.pytest_plugin import _step_status_capture as capture + if TYPE_CHECKING: from pathlib import Path -_STEPS_FILE_ENV = "SIFT_FAKE_STEPS_FILE" - -# ``_fakes.py`` is excluded from the wheel by ``pyproject.toml``'s -# ``packages.find`` rule that strips ``sift_client._tests``. The inner -# pytester subprocess uses the installed package and cannot import from -# ``sift_client._tests``. Embed the fake source directly into the inner -# conftest so the subprocess gets a fully self-contained module to load. -_FAKES_SOURCE = (_Path(__file__).parent / "_fakes.py").read_text() - -_INNER_CONFTEST = f""" -{_FAKES_SOURCE} - -import os -from pathlib import Path -from unittest.mock import MagicMock - -import pytest - -pytest_plugins = ["sift_client.pytest_plugin"] +_INNER_CONFTEST = 'pytest_plugins = ["sift_client.pytest_plugin"]\n' -@pytest.fixture(scope="session") -def sift_client(): - return MagicMock() - -@pytest.fixture(scope="session", autouse=True) -def report_context(sift_client): - import sift_client.pytest_plugin as plugin_module - steps_file = Path(os.environ[{_STEPS_FILE_ENV!r}]) - with FakeReportContext(steps_file) as ctx: - plugin_module.REPORT_CONTEXT = ctx - yield ctx -""" +def _base_ini_lines(log_path: Path) -> list[str]: + """Default ini settings every inner pytester run needs.""" + return [ + "[pytest]", + "sift_offline = true", + f"sift_log_file = {log_path}", + "sift_git_metadata = false", + ] @pytest.fixture -def steps_file(pytester: pytest.Pytester, monkeypatch: pytest.MonkeyPatch) -> Path: - path = pytester.path / "captured_steps.json" +def log_file(pytester: pytest.Pytester) -> Path: + path = pytester.path / "sift.log" pytester.makeconftest(_INNER_CONFTEST) - monkeypatch.setenv(_STEPS_FILE_ENV, str(path)) + pytester.makefile(".ini", pytest="\n".join(_base_ini_lines(path)) + "\n") return path @@ -85,9 +63,7 @@ def _ancestor_names(steps: list[dict], leaf: dict) -> list[str]: return chain -def test_class_methods_cluster_under_class_step( - pytester: pytest.Pytester, steps_file: Path -) -> None: +def test_class_methods_cluster_under_class_step(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_klass=dedent( """ @@ -102,7 +78,7 @@ def test_b(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert len(by_name["TestFoo"]) == 1 class_id = by_name["TestFoo"][0]["id"] @@ -110,7 +86,7 @@ def test_b(self): assert by_name["test_b"][0]["parent_step_id"] == class_id -def test_nested_classes_produce_nested_steps(pytester: pytest.Pytester, steps_file: Path) -> None: +def test_nested_classes_produce_nested_steps(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_nested=dedent( """ @@ -123,7 +99,7 @@ def test_a(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert len(by_name["TestOuter"]) == 1 assert len(by_name["TestInner"]) == 1 @@ -136,7 +112,7 @@ def test_a(self): ] -def test_class_parametrize_nests_under_class(pytester: pytest.Pytester, steps_file: Path) -> None: +def test_class_parametrize_nests_under_class(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_cp=dedent( """ @@ -151,7 +127,7 @@ def test_a(self, v): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) class_id = by_name["TestFoo"][0]["id"] test_a_id = by_name["test_a"][0]["id"] @@ -160,7 +136,7 @@ def test_a(self, v): assert by_name["v=2"][0]["parent_step_id"] == test_a_id -def test_two_sibling_classes_in_module(pytester: pytest.Pytester, steps_file: Path) -> None: +def test_two_sibling_classes_in_module(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_sib=dedent( """ @@ -176,7 +152,7 @@ def test_y(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) mod_id = by_name["test_sib.py"][0]["id"] assert by_name["TestA"][0]["parent_step_id"] == mod_id @@ -186,7 +162,7 @@ def test_y(self): assert len(by_name["TestB"]) == 1 -def test_mixed_class_and_free_function(pytester: pytest.Pytester, steps_file: Path) -> None: +def test_mixed_class_and_free_function(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_mix=dedent( """ @@ -201,7 +177,7 @@ def test_free(): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) mod_id = by_name["test_mix.py"][0]["id"] # Class method parents to TestA; free function parents directly to module. @@ -211,7 +187,7 @@ def test_free(): def test_class_with_all_excluded_methods_no_class_step( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: pytester.makepyfile( test_excl=dedent( @@ -231,14 +207,14 @@ def test_b(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert "TestFoo" not in by_name assert "test_a" not in by_name assert "test_b" not in by_name -def test_sift_exclude_on_class_propagates(pytester: pytest.Pytester, steps_file: Path) -> None: +def test_sift_exclude_on_class_propagates(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_clsexcl=dedent( """ @@ -256,14 +232,14 @@ def test_b(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert "TestFoo" not in by_name assert "test_a" not in by_name def test_class_docstring_becomes_step_description( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: pytester.makepyfile( test_doc=dedent( @@ -278,7 +254,7 @@ def test_a(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # The fake records step creation but not all fields — check the class # step was recorded, then read the description via the FakeStep's @@ -289,7 +265,7 @@ def test_a(self): def test_transition_between_class_chains_drains_parametrize( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: pytester.makepyfile( test_trans=dedent( @@ -310,7 +286,7 @@ def test_y(self, w): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # Each class opens exactly once; parametrize parents under the right class. assert len(by_name["TestA"]) == 1 @@ -396,7 +372,7 @@ def __exit__(self, *_: object) -> None: def test_failing_test_in_class_does_not_orphan_class_step( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: """A failing class method must not block the class step from cleaning up. @@ -422,7 +398,7 @@ def test_c(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2, failed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert len(by_name["TestFoo"]) == 1 assert len(by_name["TestBar"]) == 1 @@ -439,7 +415,7 @@ def test_c(self): def test_failing_parametrized_method_in_class_closes_full_chain( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: """A failing parametrized class method must not orphan its parametrize parents.""" pytester.makepyfile( @@ -460,7 +436,7 @@ def test_b(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2, failed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) foo_id = by_name["TestFoo"][0]["id"] test_a_id = by_name["test_a"][0]["id"] @@ -476,18 +452,18 @@ def test_b(self): # --------------------------------------------------------------------------- -def _write_ini(pytester: pytest.Pytester, **overrides: object) -> None: - """Write a pytest.ini with the given sift_* overrides set under [pytest].""" - lines = ["[pytest]"] +def _write_ini(pytester: pytest.Pytester, log_file: Path, **overrides: object) -> None: + """Write a pytest.ini with the given sift_* overrides, preserving the + offline/log/git-metadata defaults the ``log_file`` fixture installs. + """ + lines = _base_ini_lines(log_file) for key, value in overrides.items(): lines.append(f"{key} = {value}") pytester.makefile(".ini", pytest="\n".join(lines) + "\n") -def test_sift_class_step_false_skips_class_steps( - pytester: pytest.Pytester, steps_file: Path -) -> None: - _write_ini(pytester, sift_class_step="false") +def test_sift_class_step_false_skips_class_steps(pytester: pytest.Pytester, log_file: Path) -> None: + _write_ini(pytester, log_file, sift_class_step="false") pytester.makepyfile( test_noclass=dedent( """ @@ -502,7 +478,7 @@ def test_b(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert "TestFoo" not in by_name mod_id = by_name["test_noclass.py"][0]["id"] @@ -511,9 +487,9 @@ def test_b(self): def test_sift_module_step_false_skips_module_step( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: - _write_ini(pytester, sift_module_step="false") + _write_ini(pytester, log_file, sift_module_step="false") pytester.makepyfile( test_nomod=dedent( """ @@ -525,7 +501,7 @@ def test_a(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert "test_nomod.py" not in by_name # TestFoo attaches to the report root (no parent recorded by the fake). @@ -534,9 +510,9 @@ def test_a(self): def test_sift_parametrize_nesting_false_keeps_flat_leaves( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: - _write_ini(pytester, sift_parametrize_nesting="false") + _write_ini(pytester, log_file, sift_parametrize_nesting="false") pytester.makepyfile( test_flat=dedent( """ @@ -550,7 +526,7 @@ def test_a(v): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # No parametrize parent step. assert "test_a" not in by_name @@ -564,7 +540,7 @@ def test_a(v): def test_sift_module_step_false_still_drains_across_modules( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: """sift_module_step=false must not merge same-named classes across modules. @@ -572,7 +548,7 @@ def test_sift_module_step_false_still_drains_across_modules( (even when it's not rendered as a step), so two modules each declaring ``class TestFoo`` produce two distinct ``TestFoo`` frames in the diff. """ - _write_ini(pytester, sift_module_step="false") + _write_ini(pytester, log_file, sift_module_step="false") pytester.makepyfile( test_a=dedent( """ @@ -591,7 +567,7 @@ def test_y(self): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # Two distinct TestFoo class steps — one per module — not a shared frame. assert len(by_name["TestFoo"]) == 2 @@ -605,7 +581,7 @@ def test_y(self): def test_package_step_default_opens_for_init_dirs( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: """Default: a directory with ``__init__.py`` produces a parent package step.""" pytester.mkpydir("pkg_a") @@ -619,7 +595,7 @@ def test_one(): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert "pkg_a" in by_name pkg_id = by_name["pkg_a"][0]["id"] @@ -628,7 +604,7 @@ def test_one(): def test_same_named_packages_in_different_dirs_do_not_merge( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: """Two packages with the same display name but different paths must stay distinct. @@ -663,7 +639,7 @@ def test_two(): # name on disk don't collide during sys.path-based import. result = pytester.runpytest_subprocess("-v", "--import-mode=importlib") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # Two distinct ``utils`` package steps — one per project. assert len(by_name["utils"]) == 2 @@ -677,10 +653,10 @@ def test_two(): def test_sift_package_step_false_skips_package_steps( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: """With ``sift_package_step=false`` the directory step is suppressed.""" - _write_ini(pytester, sift_package_step="false") + _write_ini(pytester, log_file, sift_package_step="false") pytester.mkpydir("pkg_a") (pytester.path / "pkg_a" / "test_x.py").write_text( dedent( @@ -692,7 +668,7 @@ def test_one(): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert "pkg_a" not in by_name # The module step still opens and is now the top-level frame. @@ -700,10 +676,11 @@ def test_one(): def test_all_three_flags_false_matches_legacy_behavior( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: _write_ini( pytester, + log_file, sift_module_step="false", sift_class_step="false", sift_parametrize_nesting="false", @@ -722,7 +699,7 @@ def test_a(self, v): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # No module, class, or parametrize parents — just bracket-mangled leaves. assert "test_legacy.py" not in by_name @@ -740,7 +717,7 @@ def test_a(self, v): def test_single_parametrize_clusters_under_originalname( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: pytester.makepyfile( test_rail=dedent( @@ -755,7 +732,7 @@ def test_rail(v): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # Module step + one shared `test_rail` parent + two leaves. assert len(by_name["test_rail.py"]) == 1 @@ -768,7 +745,7 @@ def test_rail(v): def test_stacked_parametrize_nests_outer_to_inner( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: pytester.makepyfile( test_iso=dedent( @@ -784,7 +761,7 @@ def test_iso(voltage, component): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=4) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # One `test_iso` parent, two `voltage='…'` parents, four `component='…'` leaves. assert len(by_name["test_iso"]) == 1 @@ -806,7 +783,7 @@ def test_iso(voltage, component): assert leaf["parent_step_id"] in voltage_ids -def test_fixture_parametrization_participates(pytester: pytest.Pytester, steps_file: Path) -> None: +def test_fixture_parametrization_participates(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_widget=dedent( """ @@ -823,7 +800,7 @@ def test_widget(widget): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=2) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) assert len(by_name["test_widget"]) == 1 parent_id = by_name["test_widget"][0]["id"] @@ -832,7 +809,7 @@ def test_widget(widget): def test_module_boundary_isolates_parametrize_stack( - pytester: pytest.Pytester, steps_file: Path + pytester: pytest.Pytester, log_file: Path ) -> None: pytester.makepyfile( test_a=dedent( @@ -856,7 +833,7 @@ def test_two(w): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=4) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) by_name = _by_name(steps) # Each module step contains its own `test_one`/`test_two` parametrize subtree. mod_a = by_name["test_a.py"][0] @@ -865,9 +842,7 @@ def test_two(w): assert by_name["test_two"][0]["parent_step_id"] == mod_b["id"] -def test_leaf_parent_chain_terminates_at_report( - pytester: pytest.Pytester, steps_file: Path -) -> None: +def test_leaf_parent_chain_terminates_at_report(pytester: pytest.Pytester, log_file: Path) -> None: pytester.makepyfile( test_chain=dedent( """ @@ -882,7 +857,7 @@ def test_chain(a, b): ) result = pytester.runpytest_subprocess("-v") result.assert_outcomes(passed=1) - steps = json.loads(steps_file.read_text()) + steps = capture.load_steps(log_file) leaf = next(s for s in steps if s["name"].startswith("b=")) chain = _ancestor_names(steps, leaf) # leaf b=… → a=… → test_chain → test_chain.py (module step) → root diff --git a/python/lib/sift_client/_tests/pytest_plugin/test_pass_fail.py b/python/lib/sift_client/_tests/pytest_plugin/test_pass_fail.py new file mode 100644 index 000000000..0e1540ce7 --- /dev/null +++ b/python/lib/sift_client/_tests/pytest_plugin/test_pass_fail.py @@ -0,0 +1,562 @@ +"""Contract suite: maps each pytest exit path to the ``TestStatus`` the +Sift pytest plugin is required to record on the outer step. + +Each scenario writes a tiny inner test file and runs it through pytester +with a fake ``sift_client`` injected via a generated conftest. The fake +records every step status write into ``_step_status_capture.CAPTURED_STEPS`` +so this outer test can assert on what the plugin produced. + +Assertions encode the contract from +``docs/guides/pytest_plugin/pass_fail_behavior.md``. Tests for scenarios the +plugin does not yet handle correctly are expected to **fail today** — they +are the punch list. ``lib/sift_client/_tests/pytest_plugin/step_status_states.md`` +tracks each scenario's observed-today behavior next to the target so the +remaining gaps are visible without running the suite. +""" + +from __future__ import annotations + +import textwrap + +import pytest + +from sift_client._tests.pytest_plugin import _step_status_capture as capture +from sift_client.sift_types.test_report import TestStatus + +pytest_plugins = ["pytester"] + + +_INNER_CONFTEST_SRC = ''' +"""Auto-generated conftest. Loading the Sift plugin is the only thing the +inner session needs. ``--sift-offline`` on the CLI causes the plugin's +default ``sift_client`` fixture to construct a placeholder client and the +real ``ReportContext`` writes every API call to the JSONL log without +contacting Sift. +""" + +pytest_plugins = ["sift_client.pytest_plugin"] +''' + + +@pytest.fixture +def inner(pytester): + """Install the inner conftest. Returns ``pytester``.""" + pytester.makeconftest(_INNER_CONFTEST_SRC) + return pytester + + +# Prepended to every inner test file. Pytest skips marker-based ``skip`` items +# before any autouse fixture runs, which would leave ``REPORT_CONTEXT`` unset +# and the plugin's inline-skip recording inert. A single passing item up-front +# forces ``report_context`` to initialize so the makereport hook can record +# the skip into the same session's JSONL. +_WARMUP = "def test_sift_warmup(): pass\n\n" + + +def _run(pytester, body: str) -> None: + pytester.makepyfile(_WARMUP + textwrap.dedent(body)) + log_path = pytester.path / "sift.log" + capture.set_log(log_path) + pytester.runpytest_inprocess( + "--sift-offline", + f"--sift-log-file={log_path}", + "--no-sift-git-metadata", + ) + + +# --------------------------------------------------------------------------- +# Call-phase exit paths +# --------------------------------------------------------------------------- + + +def test_pass_maps_to_passed(inner): + # Case: CALL-01 + _run( + inner, + """ + def test_x(): + assert True + """, + ) + assert capture.final_status("test_x") == TestStatus.PASSED + + +def test_assert_failure_maps_to_failed(inner): + # Case: CALL-02 + _run( + inner, + """ + def test_x(): + assert 1 == 2 + """, + ) + assert capture.final_status("test_x") == TestStatus.FAILED + + +def test_generic_exception_maps_to_error(inner): + # Case: CALL-03 + _run( + inner, + """ + def test_x(): + raise ValueError("boom") + """, + ) + assert capture.final_status("test_x") == TestStatus.ERROR + + +def test_system_exit_maps_to_aborted(inner): + # Case: CALL-05 + _run( + inner, + """ + import sys + def test_x(): + sys.exit(1) + """, + ) + assert capture.final_status("test_x") == TestStatus.ABORTED + + +def test_pytest_fail_maps_to_failed(inner): + # Case: CALL-04 + _run( + inner, + """ + import pytest + def test_x(): + pytest.fail("intentional failure") + """, + ) + assert capture.final_status("test_x") == TestStatus.FAILED + + +def test_keyboard_interrupt_leaves_step_in_progress(inner): + # Case: CALL-06 + # KeyboardInterrupt aborts the session before the call-phase makereport + # fires; the plugin can't observe the interrupt. The contract is that + # the step is left in IN_PROGRESS rather than being silently resolved + # to PASSED — a session-aborting interrupt should not look like a clean + # pass in the report. + try: + _run( + inner, + """ + def test_x(): + raise KeyboardInterrupt + """, + ) + except KeyboardInterrupt: + pass + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.IN_PROGRESS + + +def test_substep_exception_records_error_with_failed_parent(inner): + # Case: CALL-07 + _run( + inner, + """ + def test_x(step): + with step.substep(name="inner"): + raise ValueError("boom") + """, + ) + # Only the originating substep records ERROR. The test step inherits the + # child-failed signal and resolves to FAILED, even though the same + # ValueError propagated through its scope. + inner_sub = next(iter(capture.steps_by_name("inner")), None) + test_x = capture.test_step("test_x") + assert inner_sub is not None + assert test_x is not None + assert inner_sub.statuses[-1] == TestStatus.ERROR + assert test_x.statuses[-1] == TestStatus.FAILED + + +# --------------------------------------------------------------------------- +# Skip paths +# --------------------------------------------------------------------------- + + +def test_pytest_skip_in_body_maps_to_skipped(inner): + # Case: SKIP-03 + _run( + inner, + """ + import pytest + def test_x(): + pytest.skip("not today") + """, + ) + # Runtime skip in the body resolves the outer step to SKIPPED. The + # makereport hook must not create a duplicate nested step. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.SKIPPED + duplicates = [s for s in capture.steps_by_name("test_x") if s is not outer] + assert not duplicates, f"expected no duplicate nested step; got {len(duplicates)}" + + +def test_pytest_mark_skip_records_skipped(inner): + # Case: SKIP-01 + _run( + inner, + """ + import pytest + @pytest.mark.skip(reason="collection-time skip") + def test_x(): + assert False + """, + ) + # Collection-time skip: the autouse step fixture never runs. Only the + # makereport hook creates a step, with status SKIPPED. + assert capture.final_status("test_x") == TestStatus.SKIPPED + + +def test_pytest_mark_skipif_records_skipped(inner): + # Case: SKIP-02 + _run( + inner, + """ + import pytest + @pytest.mark.skipif(True, reason="conditional skip") + def test_x(): + assert False + """, + ) + # `skipif` with a truthy condition follows the same path as + # `@pytest.mark.skip`; only the makereport hook records a step. + assert capture.final_status("test_x") == TestStatus.SKIPPED + + +def test_skip_inside_fixture_setup(inner): + # Case: SKIP-04 + _run( + inner, + """ + import pytest + + @pytest.fixture + def skipping_fixture(): + pytest.skip("environment not ready") + + def test_x(skipping_fixture): + assert True + """, + ) + # A setup-phase skip resolves the outer step to SKIPPED. The makereport + # hook must not create a duplicate nested step. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.SKIPPED + duplicates = [s for s in capture.steps_by_name("test_x") if s is not outer] + assert not duplicates, f"expected no duplicate nested step; got {len(duplicates)}" + + +# --------------------------------------------------------------------------- +# xfail / xpass +# --------------------------------------------------------------------------- + + +def test_xfail_marked_test_that_fails(inner): + # Case: XFAIL-01 + _run( + inner, + """ + import pytest + @pytest.mark.xfail(reason="known issue") + def test_x(): + assert 1 == 2 + """, + ) + # xfail + expected failure fulfills the contract; outer step resolves to + # PASSED. No duplicate nested step from the makereport hook. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.PASSED + duplicates = [s for s in capture.steps_by_name("test_x") if s is not outer] + assert not duplicates, f"expected no duplicate nested step; got {len(duplicates)}" + + +def test_xfail_strict_unexpected_pass(inner): + # Case: XFAIL-02 + _run( + inner, + """ + import pytest + @pytest.mark.xfail(strict=True, reason="should fail") + def test_x(): + assert True + """, + ) + # strict xfail that passes must surface as FAILED: either the bug was + # fixed (remove the mark) or the test stopped exercising what it claimed. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.FAILED + + +def test_xfail_non_strict_unexpected_pass(inner): + # Case: XFAIL-03 + _run( + inner, + """ + import pytest + @pytest.mark.xfail(reason="might pass sometimes") + def test_x(): + assert True + """, + ) + # Non-strict xfail does not insist on the failure, so a passing run is + # PASSED. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.PASSED + + +def test_xfail_raises_mismatch(inner): + # Case: XFAIL-04 + _run( + inner, + """ + import pytest + @pytest.mark.xfail(raises=ValueError, reason="expected ValueError") + def test_x(): + raise KeyError("wrong exception") + """, + ) + # `raises=` mismatch is a real test failure — the contract required a + # specific exception type and a different one was thrown. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.FAILED + + +def test_xfail_run_false(inner): + # Case: XFAIL-05 + _run( + inner, + """ + import pytest + @pytest.mark.xfail(run=False, reason="never run") + def test_x(): + assert False + """, + ) + # The test never ran; outer step is SKIPPED. + assert capture.final_status("test_x") == TestStatus.SKIPPED + + +# --------------------------------------------------------------------------- +# Setup-phase / teardown-phase fixture failures +# --------------------------------------------------------------------------- + + +def test_setup_phase_fixture_failure(inner): + # Case: PHASE-01 + _run( + inner, + """ + import pytest + + @pytest.fixture + def bad_setup(): + raise RuntimeError("setup boom") + + def test_x(bad_setup): + assert True + """, + ) + # A fixture that raises before `yield` fails the setup phase. The outer + # step must surface this as ERROR; the test body never executed and a + # silently green step would hide the failure. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.ERROR + + +def test_teardown_phase_fixture_failure(inner): + # Case: PHASE-02 + _run( + inner, + """ + import pytest + + @pytest.fixture + def bad_teardown(): + yield + raise RuntimeError("teardown boom") + + def test_x(bad_teardown): + assert True + """, + ) + # A fixture that raises after `yield` fails the teardown phase. The + # outer step's status reflects the teardown failure as FAILED rather + # than the call-phase pass. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.FAILED + + +def test_call_fail_plus_teardown_fail(inner): + # Case: PHASE-03 + _run( + inner, + """ + import pytest + + @pytest.fixture + def bad_teardown(): + yield + raise RuntimeError("teardown boom") + + def test_x(bad_teardown): + assert 1 == 2 + """, + ) + # Call-phase failure dominates the outer step status; the contract also + # requires the teardown error to be surfaced somewhere on the step + # (mechanism TBD — see pass_fail_behavior.md). This test asserts the + # status today; tighten once a surfacing mechanism is chosen. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.FAILED + + +# --------------------------------------------------------------------------- +# Collection-phase failures +# --------------------------------------------------------------------------- + + +def test_missing_fixture_maps_to_error(inner): + # Case: COLL-01 + _run( + inner, + """ + def test_x(nonexistent_fixture): + assert True + """, + ) + # An unresolved fixture is a setup-phase failure. The outer step + # surfaces as ERROR rather than a misleading green pass for a test + # that never executed. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.ERROR + + +# --------------------------------------------------------------------------- +# Plugin-API exit paths (in-test mutations) +# --------------------------------------------------------------------------- + + +def test_manual_status_update_to_failed(inner): + # Case: API-01 + _run( + inner, + """ + from sift_client.sift_types.test_report import TestStatus + def test_x(step): + step.current_step.update({"status": TestStatus.FAILED}) + """, + ) + assert capture.final_status("test_x") == TestStatus.FAILED + + +def test_report_outcome_false_maps_to_failed(inner): + # Case: API-02 + _run( + inner, + """ + def test_x(step): + step.report_outcome("the_check", False, "did not match") + """, + ) + # Outer step sees a failed substep and rolls up to FAILED. + assert capture.final_status("test_x") == TestStatus.FAILED + + +def test_measure_out_of_bounds_maps_to_failed(inner): + # Case: API-03 + _run( + inner, + """ + def test_x(step): + step.measure(name="m", value=10.0, bounds={"min": 0.0, "max": 5.0}) + """, + ) + assert capture.final_status("test_x") == TestStatus.FAILED + + +def test_substep_failure_propagates_to_parent(inner): + # Case: API-04 + _run( + inner, + """ + def test_x(step): + with step.substep(name="inner") as inner_step: + inner_step.measure(name="m", value=10.0, bounds={"min": 0.0, "max": 5.0}) + """, + ) + # `test_measure_out_of_bounds_maps_to_failed` exercises a failed + # measurement on the function step itself; this one verifies the same + # failure on a nested substep propagates up to the parent. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.FAILED + + +def test_skipped_substep_does_not_fail_parent(inner): + # Case: API-05 + _run( + inner, + """ + from sift_client.sift_types.test_report import TestStatus + def test_x(step): + with step.substep(name="optional_check") as cal: + cal.current_step.update( + {"status": TestStatus.SKIPPED}, + log_file=step.report_context.log_file, + ) + """, + ) + # A manually-resolved SKIPPED on a substep must not propagate as a failure + # to the parent. The outer step has no measurements of its own and resolves + # to PASSED. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.PASSED + + +def test_abort_inside_substep_marks_every_open_step_aborted(inner): + # Case: API-06 + _run( + inner, + """ + import sys + def test_x(step): + with step.substep(name="completed_sub"): + pass + with step.substep(name="outer_sub") as outer_sub: + with outer_sub.substep(name="inner_sub"): + sys.exit(1) + """, + ) + # SystemExit unwinds the substep stack on the way out. Every step that was + # open when the abort fired (inner substep, outer substep, test step) + # must record ABORTED. The sibling substep that closed cleanly before the + # abort must retain its PASSED status. + outer = capture.test_step("test_x") + assert outer is not None + assert outer.statuses[-1] == TestStatus.ABORTED + outer_sub = next(iter(capture.steps_by_name("outer_sub")), None) + inner_sub = next(iter(capture.steps_by_name("inner_sub")), None) + completed_sub = next(iter(capture.steps_by_name("completed_sub")), None) + assert outer_sub is not None + assert inner_sub is not None + assert completed_sub is not None + assert outer_sub.statuses[-1] == TestStatus.ABORTED + assert inner_sub.statuses[-1] == TestStatus.ABORTED + assert completed_sub.statuses[-1] == TestStatus.PASSED diff --git a/python/lib/sift_client/pytest_plugin.py b/python/lib/sift_client/pytest_plugin.py index 7c4c1c2f5..c3b303ac8 100644 --- a/python/lib/sift_client/pytest_plugin.py +++ b/python/lib/sift_client/pytest_plugin.py @@ -5,14 +5,16 @@ from dataclasses import dataclass from datetime import datetime, timezone from pathlib import Path +from types import SimpleNamespace from typing import TYPE_CHECKING, Any, Generator, Tuple import pytest from sift_client import SiftClient, SiftConnectionConfig from sift_client.errors import SiftWarning -from sift_client.sift_types.test_report import TestStatus +from sift_client.sift_types.test_report import ErrorInfo, TestStatus from sift_client.util.test_results import ReportContext +from sift_client.util.test_results.context_manager import format_truncated_traceback class SiftPytestPluginWarning(SiftWarning): @@ -508,17 +510,162 @@ def _resolve_log_file(pytestconfig: pytest.Config | None) -> str | Path | bool | return Path(raw) +def _error_info_from_longrepr(longrepr: Any) -> ErrorInfo: + """Fall back to the report's longrepr when no Python exception is available.""" + return ErrorInfo(error_code=1, error_message=str(longrepr) if longrepr is not None else "") + + +def _resolve_initial_status(new_step: NewStep, item: pytest.Item) -> None: + """Resolve the function step's status from pytest's per-phase reports. + + Reads ``_sift_phase_setup`` / ``_sift_phase_call`` and the test's xfail marker, + then mutates ``new_step.current_step`` in place and flips + ``new_step._sift_managed_externally`` so ``NewStep.__exit__`` emits the + resolved status without re-classifying. + + When the call phase reports ``passed`` and no override is needed (i.e. the + test's own status or substep failures should drive the result), this leaves + the step alone so the default ``__exit__`` resolution stays in charge. + """ + current_step = new_step.current_step + if current_step is None: + # The step never opened (the autouse fixture short-circuited or was + # disabled). Nothing to resolve. + return + setup_phase = getattr(item, "_sift_phase_setup", None) + call_phase = getattr(item, "_sift_phase_call", None) + xfail_marker = item.get_closest_marker("xfail") + xfail_runs = xfail_marker.kwargs.get("run", True) if xfail_marker is not None else True + + status: TestStatus | None = None + error_info: ErrorInfo | None = None + keep_managed = False + + if setup_phase is not None and setup_phase.report.outcome == "failed": + status = TestStatus.ERROR + excinfo = setup_phase.call.excinfo + if excinfo is not None: + error_info = format_truncated_traceback(excinfo.type, excinfo.value, excinfo.tb) + else: + error_info = _error_info_from_longrepr(setup_phase.report.longrepr) + elif setup_phase is not None and setup_phase.report.outcome == "skipped": + status = TestStatus.SKIPPED + elif call_phase is None: + # Setup completed but the call-phase report never fired — the inner + # pytester session was aborted (e.g. by KeyboardInterrupt) before the + # plugin could observe the outcome. Leave the step at IN_PROGRESS so + # the report does not lie about a clean pass. + keep_managed = True + else: + wasxfail = getattr(call_phase.report, "wasxfail", None) + if wasxfail is not None: + if call_phase.report.outcome == "failed": + # Strict xpass: pytest synthesizes a failure when an xfail(strict=True) + # test unexpectedly passes. The xfail mark no longer matches reality. + status = TestStatus.FAILED + elif call_phase.report.outcome == "skipped": + if xfail_marker is not None and xfail_runs is False: + # xfail(run=False): the test body never executed. + status = TestStatus.SKIPPED + else: + # xfail + expected failure: the test fulfilled its xfail expectation. + status = TestStatus.PASSED + else: + # Non-strict xpass: passes that weren't required to fail. + status = TestStatus.PASSED + elif call_phase.report.outcome == "passed": + # Default __exit__ resolves PASSED/FAILED from open_step_results and any + # status the test code may have set. Don't override it here. + return + elif call_phase.report.outcome == "skipped": + status = TestStatus.SKIPPED + elif call_phase.report.outcome == "failed": + excinfo = call_phase.call.excinfo + children_passed = new_step.report_context.open_step_results.get( + current_step.step_path, True + ) + if excinfo is None: + status = TestStatus.FAILED + elif isinstance(excinfo.value, AssertionError): + status = TestStatus.FAILED + elif isinstance(excinfo.value, pytest.fail.Exception): + status = TestStatus.FAILED + elif isinstance(excinfo.value, (KeyboardInterrupt, SystemExit)): + # Hard exits the plugin can observe: pytest converted the + # raise into a call-phase report. The session-aborting variant + # (call_phase is None) lands earlier and stays IN_PROGRESS. + status = TestStatus.ABORTED + error_info = format_truncated_traceback(excinfo.type, excinfo.value, excinfo.tb) + elif xfail_marker is not None: + # xfail(raises=X) with a non-matching exception: the contract failed. + status = TestStatus.FAILED + error_info = format_truncated_traceback(excinfo.type, excinfo.value, excinfo.tb) + elif not children_passed: + # A substep already recorded the error and carries the traceback; + # the test step only inherits the child-failed signal. + status = TestStatus.FAILED + else: + status = TestStatus.ERROR + error_info = format_truncated_traceback(excinfo.type, excinfo.value, excinfo.tb) + + if status is None and not keep_managed: + return + + if status is not None: + # BaseType is frozen; mutate via __dict__ the same way _apply_client_to_instance does. + current_step.__dict__["status"] = status + if error_info is not None: + current_step.__dict__["error_info"] = error_info + new_step._sift_managed_externally = True + + +def _finalize_after_teardown(item: pytest.Item, teardown_report: pytest.TestReport) -> None: + """Upgrade a closed step to FAILED when the teardown phase failed. + + The autouse step fixture has already exited by the time the teardown + makereport hook fires, so call ``step.update`` again to override the status + server-side and propagate the failure to the still-open parent step. + """ + step: NewStep | None = getattr(item, "_sift_step", None) + if step is None: + return + current_step = step.current_step + if current_step is None: + return + if teardown_report.outcome == "failed" and current_step.status == TestStatus.PASSED: + current_step.update({"status": TestStatus.FAILED}) + step.report_context.mark_step_failed_after_close(current_step) + + @pytest.hookimpl(tryfirst=True, hookwrapper=True) def pytest_runtest_makereport(item: pytest.Item, call: pytest.CallInfo[Any]): - """Capture pytest outcomes so assertion failures and skips land on the Sift step.""" + """Capture per-phase reports and finalize step status after teardown. + + Stashes both ``rep_`` (the ``CallInfo``, kept for pytest plugins that + expect that conventional attribute) and ``_sift_phase_`` (a + ``SimpleNamespace(call, report)`` used by ``_resolve_initial_status``). The + collection-time skip path is strictly gated on ``_sift_step`` being unset + so it does not duplicate steps the fixture already created. + """ outcome = yield report = outcome.get_result() - if report.outcome == "skipped": - # Skipped tests bypass the autouse `step` fixture, so we record the step manually here. - if REPORT_CONTEXT: - with REPORT_CONTEXT.new_step(name=item.name) as new_step: - new_step.current_step.update({"status": TestStatus.SKIPPED}) setattr(item, "rep_" + report.when, call) + setattr(item, "_sift_phase_" + report.when, SimpleNamespace(call=call, report=report)) + + # Collection-time skip (``@pytest.mark.skip`` / ``skipif``): the autouse + # ``step`` fixture never runs, so the hook is the only place that can + # record a step. Presence of ``_sift_step`` is the "fixture ran" signal. + if ( + REPORT_CONTEXT + and report.when == "setup" + and report.outcome == "skipped" + and getattr(item, "_sift_step", None) is None + ): + with REPORT_CONTEXT.new_step(name=item.name) as inline_step: + inline_step.current_step.update({"status": TestStatus.SKIPPED}) + + if report.when == "teardown": + _finalize_after_teardown(item, report) def _report_context_impl( @@ -748,13 +895,9 @@ def _step_impl( with report_context.new_step( name=name, description=existing_docstring, assertion_as_fail_not_error=False ) as new_step: + node._sift_step = new_step yield new_step - if hasattr(node, "rep_call") and node.rep_call.excinfo: - new_step.update_step_from_result( - node.rep_call.excinfo, - node.rep_call.excinfo.value, - node.rep_call.excinfo.tb, - ) + _resolve_initial_status(new_step, node) @pytest.fixture(autouse=True) diff --git a/python/lib/sift_client/util/test_results/context_manager.py b/python/lib/sift_client/util/test_results/context_manager.py index bd2ec917f..3454ef5e2 100644 --- a/python/lib/sift_client/util/test_results/context_manager.py +++ b/python/lib/sift_client/util/test_results/context_manager.py @@ -43,6 +43,17 @@ logger = logging.getLogger(__name__) +def format_truncated_traceback( + exc: type[BaseException] | None, + exc_value: BaseException | None, + tb: object | None, +) -> ErrorInfo: + """Format an ErrorInfo from a traceback, keeping the first frame and the last 10.""" + stack = traceback.format_exception(exc, exc_value, tb) # type: ignore[arg-type] + stack = [stack[0], *stack[-10:]] if len(stack) > 10 else stack + return ErrorInfo(error_code=1, error_message="".join(stack)) + + def log_replay_instructions(log_file: str | Path | None) -> None: """Surface replay instructions when an import/replay attempt fails. @@ -363,30 +374,33 @@ def record_step_outcome(self, outcome: bool, step: TestStep): self.open_step_results[step.step_path] = False self.any_failures = True - def resolve_and_propagate_step_result( - self, - step: TestStep, - error_info: ErrorInfo | None = None, - ) -> bool: - """Resolve the result of a step and propagate the result to the parent step if it failed.""" - result = self.open_step_results.get(step.step_path, True) - if error_info: - result = False - if step.status != TestStatus.IN_PROGRESS: - # The step was manually completed so use that result. - # Skipped steps are considered passed. - result = step.status in (TestStatus.PASSED, TestStatus.SKIPPED) - - # Update the parent step results if this step failed (true by default so no need to do anything if we didn't fail). - if not result: + def mark_step_failed_after_close(self, step: TestStep): + """Mark a step's parent as failed after the step has already been popped from the stack. + + Used by the pytest plugin when a teardown-phase report fires after the + fixture's ``__exit__`` has already resolved and exited the step. + """ + self.any_failures = True + path_parts = step.step_path.split(".") + if len(path_parts) > 1: + self.open_step_results[".".join(path_parts[:-1])] = False + + def propagate_step_result(self, step: TestStep, status: TestStatus) -> bool: + """Propagate this step's final status to the parent step. + + Status is the governor: anything outside ``{PASSED, SKIPPED}`` counts + as a failure for the parent. ``error_info`` is intentionally not + consulted here; it is free-form diagnostic data that may sit on a + step regardless of status. + """ + succeeded = status in (TestStatus.PASSED, TestStatus.SKIPPED) + if not succeeded: self.any_failures = True self.open_step_results[step.step_path] = False path_parts = step.step_path.split(".") if len(path_parts) > 1: - parent_step_path = ".".join(path_parts[:-1]) - self.open_step_results[parent_step_path] = False - - return result + self.open_step_results[".".join(path_parts[:-1])] = False + return succeeded def exit_step(self, step: TestStep): """Exit a step and update the report context.""" @@ -407,6 +421,10 @@ class NewStep(AbstractContextManager): client: SiftClient assertion_as_fail_not_error: bool = True current_step: TestStep | None = None + # Set by the pytest plugin's ``_resolve_initial_status`` to signal that + # status was already resolved upstream and ``__exit__`` should skip + # re-classifying. Read via ``getattr`` so unset is treated as False. + _sift_managed_externally: bool = False def __init__( self, @@ -471,34 +489,55 @@ def update_step_from_result( returns: The false if step failed or errored, true otherwise. """ + current_step = self.current_step + if current_step is None: + # The step was never opened; nothing to resolve. Treat as a pass + # so callers that branch on the return value don't see a spurious + # failure. + return True + error_info = None - assert self.current_step is not None + aborted = False + errored = False if exc: if isinstance(exc_value, AssertionError) and not self.assertion_as_fail_not_error: # If we're not showing assertion errors (i.e. pytest), mark step as failed but don't set error info. - self.report_context.record_step_outcome(False, self.current_step) + self.report_context.record_step_outcome(False, current_step) + elif isinstance(exc_value, (KeyboardInterrupt, SystemExit)): + # Hard exit propagating through the substep stack: record as + # ABORTED so every in-progress step on the way out reflects + # the abort rather than coercing to ERROR. + aborted = True + error_info = format_truncated_traceback(exc, exc_value, tb) else: - stack = traceback.format_exception(exc, exc_value, tb) # type: ignore - stack = [stack[0], *stack[-10:]] if len(stack) > 10 else stack - trace = "".join(stack) - error_info = ErrorInfo( - error_code=1, - error_message=trace, - ) - - # Resolve the status of this step (i.e. fail if children failed) and propagate the result to the parent step. - result = self.report_context.resolve_and_propagate_step_result( - self.current_step, error_info - ) - - # Mark the step as completed - status = self.current_step.status + errored = True + error_info = format_truncated_traceback(exc, exc_value, tb) + + # Status is the governor: anything other than IN_PROGRESS was set + # deliberately (manual override, plugin pre-resolution, etc.) and must + # not be silently overwritten by side-channel signals. When the step is + # still IN_PROGRESS, resolve from independent state: aborts first, then + # a child-failed signal (parents inherit FAILED, not the originating + # ERROR), then the step's own captured exception, then the children-pass + # default. error_info is diagnostic and never drives status. + status = current_step.status if status == TestStatus.IN_PROGRESS: - # Update the status only if the step was in progress i.e. not updated elsewhere. - status = TestStatus.PASSED if result else TestStatus.FAILED - if error_info: - status = TestStatus.ERROR - self.current_step.update( + children_passed = self.report_context.open_step_results.get( + current_step.step_path, True + ) + if aborted: + status = TestStatus.ABORTED + elif not children_passed: + status = TestStatus.FAILED + elif errored: + status = TestStatus.ERROR + else: + status = TestStatus.PASSED + + # Propagate based on the resolved status; error_info rides along as + # pure diagnostics and does not affect propagation. + result = self.report_context.propagate_step_result(current_step, status) + current_step.update( { "status": status, "end_time": datetime.now(timezone.utc), @@ -509,6 +548,28 @@ def update_step_from_result( return result def __exit__(self, exc, exc_value, tb): + if getattr(self, "_sift_managed_externally", False): + # The pytest fixture already resolved status from phase reports. + # Propagate based on that resolved status, emit one update_step + # with the resolved values, and pop from the stack without + # re-classifying. + current_step = self.current_step + if current_step is None: + # The step was never opened; nothing to propagate. + return True + result = self.report_context.propagate_step_result(current_step, current_step.status) + current_step.update( + { + "status": current_step.status, + "end_time": datetime.now(timezone.utc), + "error_info": current_step.error_info, + }, + ) + self.report_context.exit_step(current_step) + if hasattr(self, "force_result"): + result = self.force_result + return result + result = self.update_step_from_result(exc, exc_value, tb) # Now that the step is updated. Let the report context handle removing it from the stack and updating the report context. diff --git a/python/mkdocs.yml b/python/mkdocs.yml index 5108b7e4a..af174aa4f 100644 --- a/python/mkdocs.yml +++ b/python/mkdocs.yml @@ -62,6 +62,9 @@ nav: # Will migrate to Guides in the future - Pytest Plugin: examples/pytest_plugin.md - Pytest Plugin Quickstart: examples/pytest_plugin_quickstart.md + - Guides: + - Pytest Plugin: + - Pass/Fail Behavior: guides/pytest_plugin/pass_fail_behavior.md # - Guides: # - Logging # - Error Handling