Slice 2 — US6 transcript redaction (Phase 8, T037-T039) by brettheap · Pull Request #4 · opensoft/openCloser

brettheap · 2026-05-23T00:11:53Z

Stacks on PR #3.

Summary by Sourcery

Introduce a configurable transcript redaction layer and integrate it into session artifact writing to satisfy US6 privacy requirements.

New Features:

Add a RedactionLayer abstraction with regex-based and no-op policies plus configurable retention modes for transcript handling.

Enhancements:

Route transcript text through the RedactionLayer before writing artifacts, including support for omitting transcript files under summary-only retention while preserving existing session-result contracts.

Documentation:

Update US6 task checklist documentation to mark redaction implementation and tests as complete.

Tests:

Add unit and integration tests to cover redaction behavior, summary-only retention, no-op policy behavior, and malformed redaction policy readiness failures.

…daction Add a default-on RedactionLayer that transforms transcript text before any disk write (FR-028..FR-030; contracts/redaction-layer.md): - RedactionLayer.from_config builds a RegexRedactionPolicy (default [REDACTED]) with built-in email + phone patterns plus user-supplied patterns, or a NoOpPolicy when [redaction] policy = "noop". - Layer carries the retention mode: "full" writes the redacted transcript, "summary-only" writes no transcript file while preserving the session-result summary + transcript-pointer contract from Slice 1. - A malformed user regex raises ValueError at layer construction, surfacing as a readiness failure (SC-009). - writer.write_session_artifacts now accepts an optional redaction_layer and routes transcript text through it before disk write; the artifacts boundary may import opencloser.redaction (test_imports.py updated). - Tests cover redacted-token replacement, summary-only retention, no-op preserving Slice 1 bytes, and the malformed-policy readiness gate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

sourcery-ai · 2026-05-23T00:12:20Z

Reviewer's Guide

Adds a configurable transcript redaction layer and integrates it into session artifact writing, with tests and specs updates to cover regex/no-op policies and summary-only retention behavior.

File-Level Changes

Change	Details	Files
Route transcript writing through a configurable RedactionLayer that supports full vs summary-only retention while preserving the existing artifact contract.	Add an optional redaction_layer parameter to write_session_artifacts and document its behavior. Default to a regex-based RedactionLayer when none is provided, applying redaction before any transcript disk write. Skip writing the transcript file when the layer retention mode is "summary-only", while still emitting session-result JSON and existing artifacts.	`src/opencloser/artifacts/writer.py`
Introduce a redaction module implementing regex-based and no-op redaction policies plus a RedactionLayer builder from configuration.	Define DEFAULT_REPLACEMENT and built-in regex patterns for emails and phone numbers used by RegexRedactionPolicy. Implement NoOpPolicy and RegexRedactionPolicy strategies with validation of user-supplied regex patterns, raising ValueError on invalid patterns. Implement RedactionLayer with retention-mode handling, including default() and from_config() constructors wired to RedactionPolicyConfig.	`src/opencloser/redaction/layer.py`
Extend import-boundary tests to treat the redaction module as part of the artifacts slice boundary.	Register opencloser.redaction as an artifacts-boundary module to enforce import constraints and contract coverage.	`tests/test_imports.py`
Add unit and integration tests covering redaction behavior, configuration, and writer integration for all policies and retention modes.	Add unit tests for policy behavior, default redaction, extra patterns, malformed regex readiness failures, and writer integration with regex, summary-only, and no-op policies. Add integration tests for User Story 6 that exercise redacted transcript output, summary-only omission of transcript files, no-op policy preserving unredacted transcripts, and malformed-pattern readiness failure.	`tests/unit/test_redaction.py` `tests/integration/test_us6_redaction.py`
Update Story 6 task checklist to mark redaction implementation and tests as complete.	Mark T037, T038, and T039 checklist items as completed in the slice specification markdown.	`specs/002-mock-call-real-crm/tasks.md`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

Copilot

Pull request overview

Adds a transcript redaction boundary and integrates it into artifact writing to support default regex-based redaction, an explicit no-op policy, and a summary-only retention mode.

Changes:

Introduce opencloser.redaction.layer with RedactionLayer, RegexRedactionPolicy, and NoOpPolicy.
Route transcript writes in write_session_artifacts through the redaction layer and omit transcript output under summary-only retention.
Add unit + integration tests for redaction behavior and update dependency-direction allowlists.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/opencloser/redaction/layer.py`	Implements the redaction layer + policies and config construction.
`src/opencloser/artifacts/writer.py`	Applies redaction before transcript writes and supports summary-only transcript omission.
`tests/unit/test_redaction.py`	Unit tests for policy behavior and writer-level behavior.
`tests/integration/test_us6_redaction.py`	US6 acceptance-style integration tests for redaction/retention modes.
`tests/test_imports.py`	Allows the artifacts boundary to import the new redaction boundary.
`specs/002-mock-call-real-crm/tasks.md`	Marks US6 redaction tasks as completed.

+        if effective_layer.retention_mode() == "full":
+            transcript_path = session_dir / _TRANSCRIPT_FILENAME
+            _write_text_atomic(transcript_path, effective_layer.redact(transcript_text))
+        # else summary-only retention (FR-030): no transcript file is written.


    if transcript_text is not None:
-        transcript_path = session_dir / _TRANSCRIPT_FILENAME
-        _write_text_atomic(transcript_path, transcript_text)
+        effective_layer = redaction_layer or RedactionLayer.default()
+        if effective_layer.retention_mode() == "full":
+            transcript_path = session_dir / _TRANSCRIPT_FILENAME
+            _write_text_atomic(transcript_path, effective_layer.redact(transcript_text))


+def test_us6_summary_only_retention_writes_no_transcript(tmp_artifact_dir: Path) -> None:
+    """Spec §Story 6 acceptance: summary-only retention writes NO transcript file;
+    the session-result summary remains."""
+    layer = RedactionLayer.from_config(
+        RedactionPolicyConfig(policy="regex", retention="summary-only")
+    )
+    paths = write_session_artifacts(
+        artifact_root=tmp_artifact_dir,
+        session_id="ses_us6_sum",
+        redaction_layer=layer,
+        **_make_inputs("ses_us6_sum"),
+    )
+    assert paths.transcript is None
+    assert not (tmp_artifact_dir / "ses_us6_sum" / "transcript.txt").exists()
+    sr = paths.session_result.read_text(encoding="utf-8")
+    # Summary preserved; transcript pointer field still present (Slice 1 contract).
+    assert f'"summary": "{_SUMMARY}"' in sr
+


    "artifacts": {
        "opencloser.models",
+        # contracts/redaction-layer.md §Behavior: the writer calls the RedactionLayer
+        # immediately before any transcript disk write (FR-028..FR-030).
+        "opencloser.redaction",
        "opencloser.artifacts",  # intra-boundary submodules
    },


+    inputs = _make_artifact_inputs()
+    layer = RedactionLayer.from_config(
+        RedactionPolicyConfig(policy="regex", retention="summary-only")
+    )
+    paths = write_session_artifacts(
+        artifact_root=tmp_artifact_dir,
+        session_id="ses_red_2",
+        redaction_layer=layer,
+        **inputs,
+    )


+    inputs = _make_artifact_inputs()
+    noop_layer = RedactionLayer(policy=NoOpPolicy(), _retention_mode="full")
+
+    # Write with explicit no-op layer.
+    write_session_artifacts(
+        artifact_root=tmp_artifact_dir / "noop",
+        session_id="ses_x",
+        redaction_layer=noop_layer,
+        **inputs,
+    )


sourcery-ai

Hey - I've found 1 issue

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/opencloser/redaction/layer.py" line_range="52-61" />
<code_context>
+            raise ValueError(f"Unknown redaction policy: {config.policy!r}")
+        return cls(policy=policy, _retention_mode=config.retention)
+
+    @classmethod
+    def default(cls) -> RedactionLayer:
+        """Default-on layer: regex policy + full retention (FR-028 default)."""
+        return cls(
</code_context>
<issue_to_address>
**suggestion:** Default layer construction duplicates logic already available via configuration.

`default()` currently hardcodes a `RegexRedactionPolicy` and retention, instead of reusing the config-based path (`from_config`). Consider constructing the default via a single config/builder so the implicit default and `[redaction]` config stay aligned. Otherwise, future changes to `RedactionPolicyConfig` (e.g., replacement text or retention defaults) will require updating both places and risk divergence.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b615b7d975

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T00:49:45Z

    if transcript_text is not None:
-        transcript_path = session_dir / _TRANSCRIPT_FILENAME
-        _write_text_atomic(transcript_path, transcript_text)
+        effective_layer = redaction_layer or RedactionLayer.default()


Wire writer to configured redaction layer

Using RedactionLayer.default() when redaction_layer is omitted causes runtime behavior to ignore config/slice2.toml [redaction] entirely. In the current codebase, orchestrator call sites (src/opencloser/core/orchestrator.py around lines 256 and 601) do not pass redaction_layer, so custom patterns, policy="noop", and retention="summary-only" are never applied, and malformed regexes are not surfaced as readiness failures as intended. This makes the new redaction feature effectively non-configurable in production paths.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-23T00:49:45Z

+        if effective_layer.retention_mode() == "full":
+            transcript_path = session_dir / _TRANSCRIPT_FILENAME
+            _write_text_atomic(transcript_path, effective_layer.redact(transcript_text))
+        # else summary-only retention (FR-030): no transcript file is written.


Encode summary-only retention in session-result artifact

The summary-only branch skips writing transcript.txt but leaves session-result.json unchanged, so consumers cannot tell this was an intentional retention mode and may still see a stale transcript pointer value from normalized_result. FR-030 requires the session-result artifact to retain summary information while also carrying retention-mode semantics; without that marker (and corresponding pointer nulling), downstream tooling cannot distinguish policy-driven omission from an artifact write problem.

Useful? React with 👍 / 👎.

Address Copilot inline findings + SonarCloud duplication on PR #4: - writer.py: under summary-only retention, clear the exported ``transcript_pointer`` BEFORE writing session-result.json so the artifact never advertises a transcript file that the writer is about to skip (contracts/redaction-layer.md §Behavior; FR-030). - writer.py: cache the default RedactionLayer at module level so repeated writes without an explicit layer don't re-compile the built-in patterns. - test_imports.py: update the module docstring to match the allowlist — artifacts may import models AND redaction (the writer calls the layer). - tests/fixtures/artifact_inputs.py: extract the shared ``write_session_artifacts`` input builder used by the artifact-writer, US6 unit, and US6 integration tests. Removes the 16.6%-on-new-code duplication SonarCloud flagged, and keeps the directory ``session_id`` consistent with the JSON-payload ``session_id`` in every test. - Add assertion that summary-only retention writes ``transcript_pointer: null`` in session-result.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc381bda64

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T02:28:44Z

+    if transcript_text is not None and not summary_only:
        transcript_path = session_dir / _TRANSCRIPT_FILENAME
-        _write_text_atomic(transcript_path, transcript_text)
+        _write_text_atomic(transcript_path, effective_layer.redact(transcript_text))


Delete existing transcript on summary-only retention

When summary_only is active, the function only skips the write path (if transcript_text is not None and not summary_only) but never removes an existing transcript.txt in session_dir. Because this writer is explicitly designed to re-emit artifacts into an existing per-session directory, a prior run (or retry) can leave a transcript behind and summary-only mode will still return success while sensitive transcript bytes remain on disk, violating the FR-030 retention guarantee.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

+    # Decide retention BEFORE writing session-result, so the exported pointer never
+    # advertises a transcript file that summary-only retention is about to skip
+    # (contracts/redaction-layer.md §Behavior; FR-030).
+    effective_layer = redaction_layer or _DEFAULT_REDACTION_LAYER
+    summary_only = (
+        transcript_text is not None and effective_layer.retention_mode() == "summary-only"
+    )
+    if summary_only and normalized_result.transcript_pointer is not None:
+        normalized_result = normalized_result.model_copy(update={"transcript_pointer": None})


+    """Write all per-session artifacts into ``<artifact_root>/<session_id>/`` atomically.
+
+    Transcript text always passes through the configured ``RedactionLayer`` before disk
+    write (FR-028..FR-030). When the layer's retention mode is ``"summary-only"``, no
+    transcript file is written; the session-result summary and the rest of the Slice 1
+    artifact contract are preserved.


+    @classmethod
+    def from_patterns(
+        cls,
+        extra_patterns: list[str] | tuple[str, ...] = (),
+        *,
+        replacement: str = DEFAULT_REPLACEMENT,
+    ) -> RegexRedactionPolicy:
+        compiled: list[re.Pattern[str]] = []
+        for raw in (*_BUILTIN_PATTERNS, *extra_patterns):
+            try:
+                compiled.append(re.compile(raw))
+            except re.error as exc:
+                raise ValueError(f"Invalid redaction regex {raw!r}: {exc}") from exc
+        return cls(_compiled=tuple(compiled), replacement=replacement)
+
+    def redact(self, transcript_text: str) -> str:
+        out = transcript_text
+        for pattern in self._compiled:
+            out = pattern.sub(self.replacement, out)
+        return out


+        persona_version="alf-appointment-setter@0.1.0",
+        final_disposition=Disposition.INTERESTED_CALLBACK_REQUESTED,
+        summary=summary,
+        transcript_pointer="transcript.txt",


Address Copilot, Codex, and Sourcery findings on cc381bd: - writer.py: clear ``transcript_pointer`` whenever NO transcript file will be written — previously only summary-only retention nulled it, so a caller passing ``transcript_text=None`` with a non-null pointer would emit a dangling pointer (Copilot). - writer.py: under summary-only retention, unlink any pre-existing ``transcript.txt`` from an earlier run so idempotent re-emit cannot leave PII bytes on disk (Codex P1, FR-030 + FR-019). - writer.py: reword the function docstring — each file write is atomic via tempfile + os.replace, but the artifact set as a whole is not transactional (Copilot). - redaction/layer.py: ``RedactionLayer.default()`` now delegates to ``from_config(RedactionPolicyConfig())`` so the implicit and config-driven defaults cannot drift (Sourcery). - tests/fixtures/artifact_inputs.py: ``transcript_pointer`` mirrors ``transcript_text`` presence — no more fixture-built dangling pointers when callers exercise transcript-less paths (Copilot). - New integration tests: stale ``transcript.txt`` is removed when re-emitting under summary-only retention; ``transcript_text=None`` leaves ``transcript_pointer`` null in session-result.json. Deferred (will track separately): - ReDoS guardrails for user-supplied regex patterns (operational hardening). - Codex P1 on orchestrator wiring: the orchestrator does not yet pass a configured ``RedactionLayer`` from ``[redaction]``. The Phase 8 scope was T037-T039 only; the orchestrator wiring is a follow-up that touches ``core/orchestrator.py``, CLI config loading, and Slice2Config plumbing. - Codex P2 on a retention-mode marker in session-result: the now-null ``transcript_pointer`` already serves as that marker; adding a new field would expand the Slice 1 schema beyond what the contract asks for. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

brettheap · 2026-05-23T02:48:19Z

Round-2 review fixes pushed in 9b34631.

Addressed:

Copilot — transcript_pointer now cleared whenever no transcript file is written (not just summary-only). Fixture mirrors transcript_text presence.
Codex P1 — stale transcript.txt from a prior full-retention run is unlinked when re-emitting under summary-only retention (FR-030 + FR-019). New test: test_us6_summary_only_removes_stale_transcript.
Sourcery — RedactionLayer.default() delegates to from_config(RedactionPolicyConfig()) so the implicit default cannot drift from the config-driven default.
Copilot — docstring reworded: per-file atomicity via tempfile + os.replace, not transactional across the artifact set.

Deferred (tracked as follow-ups, not in Phase 8 / T037–T039 scope):

Codex P1 — orchestrator wiring of the configured RedactionLayer. The orchestrator still passes no layer at core/orchestrator.py:256, 601, so production paths use the cached default (regex / full / built-in patterns only). Wiring [redaction] end-to-end requires touching process_one_queue_item, the Slice 2 CLI config loader, and Slice2Config → SliceConfig plumbing — out of the Phase 8 T037–T039 scope explicitly set for this PR. Will be picked up alongside the Slice 2 CLI work.
Copilot — ReDoS guardrails for user-supplied regex patterns. Real concern but operational hardening; deferred to a future redaction-hardening pass (length cap, pattern-complexity validation, or swap to regex lib with timeouts).
Codex P2 — retention-mode marker in session-result. The now-null transcript_pointer already distinguishes policy-driven omission from a write failure (which would leave the prior pointer in place or the JSON unparseable). Adding a separate retention_mode field to NormalizedResult would expand the Slice 1 schema beyond what contracts/redaction-layer.md requires.

All 262 tests pass; redaction module 100% covered.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

+    elif summary_only:
+        # FR-030: idempotent re-emit under summary-only must not leave a stale
+        # transcript on disk from an earlier run with a different retention mode.


+        # Mirror the orchestrator: only advertise a transcript file when one will
+        # actually be written. Keeps the fixture from generating dangling pointers.


Address remaining live findings on PR #4 (most prior Copilot comments were stale re-flags of issues already fixed; the two below are genuinely new): - writer.py: unlink any pre-existing ``transcript.txt`` whenever the writer is not going to write a transcript — previously the stale-file cleanup only ran under summary-only retention, so a re-emit with ``transcript_text=None`` could leave PII bytes on disk while session-result.json reported a null pointer. Now the cleanup runs for both ``transcript_text=None`` and summary-only paths (Copilot). - tests/fixtures/artifact_inputs.py: reword the comment so it no longer overclaims — the fixture only knows about ``transcript_text`` presence; the writer still has the final say on the emitted pointer under summary-only retention (Copilot). - Strengthen ``test_us6_no_transcript_text_nulls_pointer_and_removes_stale`` to also prove the stale-file unlink path (primes the session dir with a prior transcript, re-emits with no transcript text, asserts both the null pointer and the absent file). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

brettheap · 2026-05-23T03:58:57Z

Round 3 pushed in 21b7aca.

Fixed:

writer.py now unlinks any pre-existing transcript.txt whenever a transcript will not be written — previously the cleanup only ran for summary-only retention, so a re-emit with transcript_text=None could leave stale PII on disk while session-result.json reported transcript_pointer: null (Copilot).
tests/fixtures/artifact_inputs.py comment reworded — the fixture only knows transcript_text presence; the writer still has final say on the emitted pointer under summary-only retention (Copilot).
Strengthened test_us6_no_transcript_text_nulls_pointer_and_removes_stale to prime the session dir with a transcript from a prior run and assert it's removed on re-emit.

Stale Copilot re-flags (already addressed in earlier rounds, no action needed):

test_us6_redaction.py:80 — transcript_pointer is None assertion already present at line 79.
test_imports.py:75 — docstring already updated to reference redaction.
writer.py:91 — pointer is already nulled for all non-write cases (not just summary-only).

Still deferred (per previous comment): orchestrator wiring, ReDoS guardrails, and retention-mode marker — explicitly outside the Phase 8 T037–T039 scope set for this PR.

262 tests pass; redaction module 100% covered; SonarCloud + Sourcery green.

Address SonarCloud python:S1192 (CRITICAL) on PR #4: the persona-version literal ``"alf-appointment-setter@0.1.0"`` appeared three times in the shared artifact-inputs fixture. Hoist to a module-private constant ``_PERSONA_VERSION``. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

+        # OR no transcript_text supplied), any transcript file from an earlier run
+        # must be removed so session-result.json's null pointer stays consistent with
+        # what is actually on disk and no stale PII is preserved.
+        (session_dir / _TRANSCRIPT_FILENAME).unlink(missing_ok=True)


Address Copilot finding on writer.py:113 (eb962d5): unlink any stale ``transcript.txt`` BEFORE writing ``session-result.json``. Previously the unlink ran after session-result.json was already on disk advertising ``transcript_pointer: null``; if the unlink failed (locked file or permission denied) we would leave the on-disk state both privacy-inconsistent (claimed null, real PII still present) and harder for an operator to detect. Other Copilot comments on this commit were stale re-flags of issues already addressed in earlier rounds (transcript_pointer null assertion, test_imports docstring, pointer-null logic across both summary-only and no-transcript-text paths). ReDoS guardrails remain deferred per the prior PR comment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

+# Cached so repeated calls without an explicit layer don't re-compile the built-in
+# redaction patterns on every session write.
+_DEFAULT_REDACTION_LAYER = RedactionLayer.default()
+


+# Built-in redaction patterns applied by ``RegexRedactionPolicy`` in addition to any
+# user-supplied patterns from ``[redaction] patterns``. Covers common direct-identifier
+# leakage in scripted demo transcripts (phone numbers, emails).
+_BUILTIN_PATTERNS: tuple[str, ...] = (
+    # Email addresses.
+    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
+    # North American phone numbers — at least one separator required between segments
+    # so bare 10-digit IDs are not redacted. Matches: 555-123-4567, (555) 123-4567,
+    # 555.123.4567, +1 555 123 4567.
+    r"(?:\+?\d{1,2}[\s.-])?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}",
+)


Address Copilot finding on layer.py:30 (d9b3acd): the built-in PII regex patterns lived in ``RegexRedactionPolicy`` and were silently prepended to whatever the operator configured, while ``RedactionPolicyConfig.patterns`` defaulted to ``[]``. That meant ``[redaction] patterns = ["mycustom"]`` in the TOML would invisibly include email + phone regexes too, and the contract default ("phone numbers, emails, ...") was code-driven rather than config-driven. Fix: move the default pattern set onto the config as ``BUILTIN_REDACTION_PATTERNS`` (used as the ``patterns`` field's default_factory). ``RegexRedactionPolicy.from_patterns()`` now compiles exactly what it is given — no implicit prepending. Operators who want the built-ins plus extras spell that out explicitly via ``BUILTIN_REDACTION_PATTERNS`` in their config; config and code can no longer drift. Test changes: ``test_regex_policy_match_variants`` now passes ``BUILTIN_REDACTION_PATTERNS`` explicitly (the policy no longer ships them implicitly). The prior ``test_from_config_user_extra_patterns`` is split into two tests covering both the replace and the explicit-extend cases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d0caef45c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T05:10:15Z

+    # North American phone numbers — at least one separator required between segments
+    # so bare 10-digit IDs are not redacted. Matches: 555-123-4567, (555) 123-4567,
+    # 555.123.4567, +1 555 123 4567.
+    r"(?:\+?\d{1,2}[\s.-])?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}",


Redact contiguous 10-digit phone numbers by default

The new built-in phone regex requires separators between all digit groups, so transcripts containing common normalized forms like 5551234567 will not be redacted under the default policy. This weakens FR-028 privacy behavior for deployments relying on defaults, and it also diverges from the project’s documented default patterns in config/slice2.toml/specs/002-mock-call-real-crm/data-model.md that allow optional separators. Any call transcript that captures digits-only phone numbers will currently leak raw PII into transcript.txt.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

+"""Default ``[redaction] patterns`` for the regex policy: email + NA phone.
+
+Lives on the config so the implicit and config-driven defaults cannot diverge —
+``RegexRedactionPolicy`` compiles exactly the patterns it is given, no implicit
+prepending. Operators who supply a custom ``patterns`` list replace these
+defaults outright (set ``policy = "noop"`` to disable redaction entirely)."""
+


Address two new findings on 8d0caef: - Codex P2 (models.py:709 phone regex): the built-in NA-phone pattern required at least one separator between digit groups, so bare 10-digit forms like ``5551234567`` slipped through under the default policy — a privacy gap that also diverged from the documented default in ``config/slice2.toml``. Make the separators OPTIONAL so digits-only forms are also redacted. The trade-off (potential over-match on bare 10-digit IDs) is documented inline; operators with that concern can configure stricter patterns explicitly. - Copilot (models.py:717 "useless expression"): convert the variable docstring under ``_BUILTIN_REDACTION_PATTERNS`` to ``#`` comments so no future ruff B018 / flake8-bugbear config could flag it. (Ruff currently passes either way; this just removes the ambiguity.) Update ``test_regex_policy_match_variants`` so the digits-only phone input now expects 1 redaction instead of 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c276bad4e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T05:20:25Z

+    # Decide retention BEFORE writing session-result, so the exported pointer never
+    # advertises a transcript file that the writer is about to skip
+    # (contracts/redaction-layer.md §Behavior; FR-030).
+    effective_layer = redaction_layer or _DEFAULT_REDACTION_LAYER


Wire artifact writer to configured redaction layer

This fallback always selects RedactionLayer.default() when callers omit redaction_layer, which makes runtime behavior ignore operator-provided [redaction] config. I checked the two production write_session_artifacts call sites in src/opencloser/core/orchestrator.py (lines 256-265 and 601-610), and neither passes a layer; additionally, RedactionLayer.from_config(...) is not used anywhere under src/ (rg -n). In practice, custom patterns, policy = "noop", and retention = "summary-only" from config/slice2.toml will not take effect, and malformed user regexes cannot surface as readiness failures.

Useful? React with 👍 / 👎.

Wire the configured RedactionLayer through the orchestrator boundary (Codex P1, re-flagged across multiple rounds). Previously ``process_one_queue_item`` did not accept a ``RedactionLayer`` and neither of its two ``write_session_artifacts`` call sites passed one, so the artifact writer always fell back to its cached default-on layer. Operator- supplied ``[redaction]`` config (custom patterns, ``policy = "noop"``, ``retention = "summary-only"``) had no effect in the production call path, and a malformed user regex could not surface as a readiness failure inside a running session. Changes: - ``process_one_queue_item`` now accepts an optional ``redaction_layer: RedactionLayer | None`` and plumbs it through both ``_finalize_blocked`` and ``_run_allowed_session`` → ``_finalize_session``, reaching ``write_session_artifacts`` at both call sites. Slice 1 callers may omit it — the writer's cached default-on layer keeps PII off disk by default. Slice 2 callers build the layer via ``RedactionLayer.from_config(slice2_config.redaction)``. - ``tests/test_imports.py``: add ``opencloser.redaction`` to the ``core`` boundary's allowlist with a referencing comment. The orchestrator legitimately wires this boundary, mirroring how it already wires ``artifacts``/``transport``/``persona``/``crm``. - Two new integration tests in ``tests/integration/test_us6_redaction.py`` prove the wiring end-to-end: one runs ``process_one_queue_item`` with a default-on layer + a conversation that contains a phone + email and asserts the on-disk transcript shows ``[REDACTED]``; the other runs with a summary-only layer and asserts no transcript file is written and the session-result pointer is null. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

        task=task,
        transcript_text=transcript_text,
        conflicting_events=conflicting_events or None,
+        redaction_layer=redaction_layer,
    )


Close DB/artifact consistency gap flagged by Copilot on orchestrator.py:627 (bac6df1). Previously ``_finalize_session`` persisted the ``NormalizedResult`` to the ``normalized_results`` DB table BEFORE the writer applied the redaction layer's retention decision to ``session-result.json``. Under summary-only retention (or no-transcript-text paths) the writer would null the exported ``transcript_pointer`` and skip / unlink ``transcript.txt`` — but the DB row still advertised ``"transcript.txt"``, leaving consumers to chase a file that would never exist. Fix: in ``_finalize_session``, apply the same retention decision to ``normalized.transcript_pointer`` BEFORE ``store.insert_normalized_result``. The artifact writer makes the same call when it serializes the JSON; the DB and on-disk artifact now agree. Strengthen the summary-only orchestrator integration test to assert the ``normalized_results`` row's ``transcript_pointer`` is also null. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-us6-redaction

sonarqubecloud · 2026-05-23T13:39:28Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Two new Phase 9 polish tasks address the genuine task-level gaps surfaced by the post-checklist /speckit-checklist reverification pass (commit 10b5c30): - T049 (CHK046-CHK052): document the run-report schema in contracts/cli-slice2.md — required fields per spec §Constitution Alignment §Auditability, the artifact format, and the dry-run vs write-enabled field sets. Makes observability requirements first-class. - T050 (CHK068-CHK069): document the write-back progress state machine in data-model.md §1 and the exit-status mapping in contracts/cli-slice2.md — the four resume states with mutually exclusive, exhaustive criteria, allowed transitions, and triggering events. The remaining reverification.md items either validate requirement quality of the existing scope (no new task needed) or are already covered by existing tasks (T017-T048). Re-running /speckit.tasks wholesale would risk renumbering T035-T039 which are referenced in merged PR #4/#5/#6 commits; surgical adds preserve that history. §Task Summary totals updated: 49 -> 51 (Polish 6 -> 8); 28 unique IDs parallel marker count 26 -> 28. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

/#5/#6 + cross-validated MEDIUMs) Closes 3 HIGH findings from the 2026-05-24 swarm code-review pass plus 2 cross-validated MEDIUMs: HIGH #4 (conflict-detection + dataverse-adapter): TOCTOU race between the conflict-check GET and the queue-status PATCH was the exact failure mode T045 was supposed to close. Now closed via Dataverse `If-Match` / `@odata.etag` optimistic concurrency: the baseline captures the row's etag at load time; emit_queue_status_update sends `If-Match: <etag>` on the PATCH; 412 Precondition Failed → CrmConflictError(["@odata.etag"]). HIGH #5 (resume-coordinator + conflict-detection): `_snapshot_resume_baseline` silently swallowed MappingError + TransientDataverseError, bypassing the CHK061 conflict-stop on resume. Now: errors propagate; only "row not found" (`loaded is None`) is a silent skip. Transient/permanent classification is done by the outer resume_session except → RESUME_NEEDED vs BLOCKED. HIGH #6 (conflict-detection): `last_session_id` was not in the baseline, so a concurrent session writing to that column between our load and our PATCH would clobber it undetected. Now captured in QueueBaseline and compared in _check_conflict; mismatch → CrmConflictError. MEDIUM (conflict-detection + dataverse-adapter — cross-validated): status_value None-vs-0 asymmetry. Baseline coerced missing→0; current GET returned None; produced spurious conflicts AND missed real ones. Fixed by typing QueueBaseline.status_value as int|None and dropping the coercion. MEDIUM (dataverse-adapter): deleted row mid-run produced exit_status="failed" instead of "blocked" + conflict_detected. Now `_check_conflict` raises CrmConflictError(["__row_deleted__"]) on missing row. Implementation: - models.py: QueueBaseline.status_value → int|None; new last_session_id + etag fields; preserve_values typed dict[str, Any] (was object). - queue_loader.py: _baseline_from_row captures last_session_id + @odata.etag; raw status (no coercion). - adapter.py: _check_conflict reads last_session_field, compares status raw, raises on deleted-row; emit_queue_status_update sends If-Match header and maps 412 → CrmConflictError. - client.py: DataverseClient.patch accepts optional headers kwarg. - resume.py: _snapshot_resume_baseline re-raises errors; only None skips. - fake.py: emit `@odata.etag` per row on GET (monotonic version counter); honor If-Match on PATCH (412 on mismatch); bump version on every PATCH. Tests added (5 in test_us4_idempotency.py): - test_t045_conflict_on_human_adding_preserve_field_value — None→non-None - test_t045_conflict_on_last_session_id_change — concurrent-session clobber - test_t045_deleted_row_yields_conflict_not_failed — row-deletion semantics - test_t045_if_match_412_raises_conflict_error — TOCTOU/ETag race closed - test_resume_snapshot_failure_re_raises_to_blocked — snapshot 401 → blocked Result: 616 pass (+5 new Pass 1B tests) / 2 pre-existing constitution_sync failures unrelated. ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

All three are factually-wrong package docstrings / comments left over from Slice 2's early-scaffold phase; the underlying code shipped many commits ago but the surrounding prose was never refreshed. 1. `src/opencloser/redaction/__init__.py` — the docstring claimed the package was a "boundary scaffold only" with `RedactionLayer` landing in a later sub-PR (#4). `layer.py` is present and exercised by the `RedactionLayer.from_config` readiness path in `slice2/runner.py`. Updated to describe what ships. 2. `src/opencloser/slice2/__init__.py` — same shape: claimed runner / resume-coordinator land in later sub-PRs (#5, #6). Both `runner.py` and `resume.py` exist and are the surface modules the CLI imports. Rewrote to list the surface modules and their roles. 3. `src/opencloser/core/orchestrator.py:158` — comment promised that `transport.pre_validate_fixture()` raises `MalformedFixtureError`, but the hook can also raise plain `ValueError` from `_resolve_fixture_path` (e.g. path-traversal / invalid fixture_id). Reworded to state both shapes — the operator-visible guarantee (`no session row, no eligibility-decision row, no attempt consumed, no Dataverse queue change` per SC-006) is unchanged. No behavior change. Imports + US5 fixture-validation tests pass. ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 23, 2026 00:11

Copilot started reviewing on behalf of brettheap May 23, 2026 00:12 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

brettheap marked this pull request as ready for review May 23, 2026 00:42

sourcery-ai Bot reviewed May 23, 2026

View reviewed changes

Comment thread src/opencloser/redaction/layer.py

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

brettheap requested a review from Copilot May 23, 2026 02:31

Copilot started reviewing on behalf of brettheap May 23, 2026 02:32 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

brettheap requested a review from Copilot May 23, 2026 02:51

Copilot started reviewing on behalf of brettheap May 23, 2026 02:51 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 23, 2026 04:46

Copilot started reviewing on behalf of brettheap May 23, 2026 04:46 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

brettheap requested a review from Copilot May 23, 2026 04:55

Copilot started reviewing on behalf of brettheap May 23, 2026 04:55 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

brettheap requested a review from Copilot May 23, 2026 05:08

Copilot started reviewing on behalf of brettheap May 23, 2026 05:08 View session

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

Copilot AI reviewed May 23, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 23, 2026 05:27

Copilot started reviewing on behalf of brettheap May 23, 2026 05:27 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

Comment thread src/opencloser/core/orchestrator.py

Comment on lines 623 to 627

task=task,

transcript_text=transcript_text,

conflicting_events=conflicting_events or None,

redaction_layer=redaction_layer,

)

brettheap and others added 2 commits May 23, 2026 05:34

Merge remote-tracking branch 'origin/002-mock-call-real-crm' into 002…

a8f1d1b

…-us6-redaction

Copilot AI review requested due to automatic review settings May 23, 2026 13:38

Copilot started reviewing on behalf of brettheap May 23, 2026 13:38 View session

brettheap merged commit c4068aa into 002-mock-call-real-crm May 23, 2026
2 of 3 checks passed

brettheap review requested due to automatic review settings May 23, 2026 13:59

brettheap mentioned this pull request May 24, 2026

Transport: validate fixture at place_call time to avoid partial session state #2

Closed

		# Mirror the orchestrator: only advertise a transcript file when one will
		# actually be written. Keeps the fixture from generating dangling pointers.

Conversation

brettheap commented May 23, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

brettheap commented May 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

brettheap commented May 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

brettheap commented May 23, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented May 23, 2026 •

edited

Loading