feat: Fix dockerfile installs, add copyable checkpoint instructions #1058

Henry-811 merged 3 commits into dev/v0.1.76
Conversation
📝 Walkthrough

This PR introduces comprehensive secret-redaction utilities and applies them across event logging, backend output, and logger stream processing. It adds a checkpoint instructions management tool with CLI setup capability, refines Codex backend MCP workflow handling with session-resumption logic, and updates the Docker build to use lock-file-based dependency resolution.
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 3 failed (3 warnings)
Pull request overview
This PR improves MassGen’s runtime and developer tooling by (1) making the Docker build install dependencies from uv.lock for reproducible containers and (2) adding secret redaction to logs/events plus several Codex workflow-safety behaviors (with tests) to prevent credential leakage and improve session correctness.
Changes:
- Add a reusable secret-redaction utility and apply it to stream logging, structured event emission, and Codex diagnostic logs.
- Make `Dockerfile.sudo` install dependencies from `uv.lock` via `uv export` to avoid dependency re-resolution drift during image builds.
- Add a standalone CLI (`massgen-checkpoint-setup`) and documentation/templates for injecting managed checkpoint instructions into project markdown files; expand Codex workflow handling and tests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Summary per file:

| File | Description |
|---|---|
| `pyproject.toml` | Adds a new CLI script entry for checkpoint setup instructions injection. |
| `massgen/utils/redact_secrets.py` | Introduces recursive secret redaction for strings and structured data. |
| `massgen/logger_config.py` | Redacts secrets before logging streamed chunks. |
| `massgen/events.py` | Redacts secrets before writing events.jsonl and notifying listeners. |
| `massgen/docker/Dockerfile.sudo` | Switches container dependency install to uv.lock-based export + pip install for reproducibility. |
| `massgen/backend/codex.py` | Redacts config/raw-event logs; adjusts workflow MCP naming and session-resume behavior; adds runtime MCP cleanup helper. |
| `massgen/backend/base.py` | Adds a default `filter_enforcement_tool_calls()` hook for enforcement behavior. |
| `massgen/mcp_tools/standalone/setup_instructions.py` | New CLI to inject/update a managed checkpoint instructions block in CLAUDE.md/AGENTS.md. |
| `massgen/mcp_tools/standalone/README.md` | Documents how to run `massgen-checkpoint-setup`. |
| `massgen/mcp_tools/standalone/checkpoint_instructions.md` | Provides the managed checkpoint instruction template content. |
| `massgen/tests/test_logger_config.py` | Adds test coverage ensuring stream logs redact secrets. |
| `massgen/tests/test_display_notification_events.py` | Adds test coverage ensuring tool events are redacted before JSONL write/read. |
| `massgen/tests/test_codex_reasoning_config.py` | Adds tests for enforcement filtering default + redacted Codex config preview logs. |
| `massgen/tests/test_codex_mcp_tool_visibility.py` | Adds tests for server-prefixed workflow tool names, fresh-session enforcement, and runtime MCP cleanup. |
```python
_SENSITIVE_KEY_PATTERN = r'["\']?[A-Za-z][A-Za-z0-9_]*(?:API_KEY|ACCESS_TOKEN|REFRESH_TOKEN|TOKEN|SECRET|PASSWORD|PRIVATE_KEY|AUTHORIZATION)["\']?'
_QUOTED_ASSIGNMENT_RE = re.compile(
    rf"(?P<prefix>{_SENSITIVE_KEY_PATTERN}\s*[:=]\s*)(?P<quote>[\"'])(?P<value>.*?)(?P=quote)",
    re.IGNORECASE,
)
_UNQUOTED_ASSIGNMENT_RE = re.compile(
    rf"(?P<prefix>{_SENSITIVE_KEY_PATTERN}\s*[:=]\s*)(?![\"'])(?P<value>[^\s,]+)",
    re.IGNORECASE,
)
```
There was a problem hiding this comment.
_SENSITIVE_KEY_PATTERN as written won’t match keys that are exactly the sensitive suffix (e.g., api_key, token, secret, etc.) because the leading [A-Za-z][A-Za-z0-9_]* part must consume at least one character before the suffix group. This means redact_secrets_in_text() can miss common patterns like api_key=... / token: .... Consider changing the pattern to allow an optional prefix (e.g., (?:[A-Za-z][A-Za-z0-9_]*_)?) before the sensitive suffix, or otherwise ensuring bare suffix keys are matched.
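A minimal sketch of that suggestion, with the identifier prefix made optional so bare keys like `api_key` match too (names here are illustrative, not the PR's final code):

```python
import re

# Optional "<ident>_" prefix before the sensitive suffix, so bare keys
# like "api_key" or "token" are caught as well as "OPENAI_API_KEY".
SENSITIVE_KEY = (
    r'["\']?(?:[A-Za-z][A-Za-z0-9_]*_)?'
    r"(?:API_KEY|ACCESS_TOKEN|REFRESH_TOKEN|TOKEN|SECRET|PASSWORD|PRIVATE_KEY|AUTHORIZATION)"
    r'["\']?'
)
QUOTED_ASSIGNMENT = re.compile(
    rf"(?P<prefix>{SENSITIVE_KEY}\s*[:=]\s*)(?P<quote>[\"'])(?P<value>.*?)(?P=quote)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace quoted values assigned to sensitive-looking keys."""
    return QUOTED_ASSIGNMENT.sub(
        lambda m: m.group("prefix") + m.group("quote") + "[REDACTED]" + m.group("quote"),
        text,
    )
```

With the leading identifier no longer mandatory, both `api_key = "…"` and `OPENAI_API_KEY = "…"` are redacted.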
```python
if MARKER_START in content:
    # Replace existing managed block
    content = _BLOCK_RE.sub(block, content, count=1)
```
apply_instructions() treats any file containing MARKER_START as having a valid managed block. If the file has a stray/partial start marker (missing MARKER_END), _BLOCK_RE.sub(...) will do nothing and the script will silently leave the file unpatched. Consider requiring both markers before replacing, or using subn() and falling back to append/repair (or raising) when no replacement occurs.
Suggested change:

```diff
-if MARKER_START in content:
-    # Replace existing managed block
-    content = _BLOCK_RE.sub(block, content, count=1)
+has_start = MARKER_START in content
+has_end = MARKER_END in content
+if has_start and has_end:
+    # Replace existing managed block
+    content, replacements = _BLOCK_RE.subn(block, content, count=1)
+    if replacements != 1:
+        raise ValueError(
+            f"{target} contains MassGen checkpoint markers, but no complete "
+            "managed block could be replaced"
+        )
+elif has_start or has_end:
+    raise ValueError(
+        f"{target} contains an incomplete MassGen checkpoint managed block; "
+        "repair or remove the stray marker before re-running setup"
+    )
```
```dockerfile
# Installs from the git repository being built (not PyPI) to ensure version sync.
#
# IMPORTANT: We install dependencies from uv.lock (not from pyproject.toml ranges)
# so the container is guaranteed to use the same transitive dependency versions
# as the host. Plain `pip install .` re-resolves from PyPI and can pick up newer
# transitive deps that break compatibility (e.g. fakeredis 2.35 removed the
# FakeConnection symbol that pydocket 0.18.x imports, which silently broke MCP
# server startup inside the container).
```
PR metadata/title focuses on Dockerfile install fixes, but this PR also adds secret redaction across logging/events plus multiple Codex behavior changes and tests. Consider updating the PR title/description (or splitting into separate PRs) so reviewers and release notes accurately reflect the broader scope and risk surface.
```python
_TEMPLATE_PATH = Path(__file__).parent / "checkpoint_instructions.md"


# Regex that matches the full managed block (markers + content between them).
# DOTALL so `.` matches newlines.
_BLOCK_RE = re.compile(
    re.escape(MARKER_START) + r".*?" + re.escape(MARKER_END),
    re.DOTALL,
)


def load_template() -> str:
    """Return the checkpoint instructions template content."""
    return _TEMPLATE_PATH.read_text(encoding="utf-8")
```
load_template() reads checkpoint_instructions.md from the filesystem next to the module. This will break if the template isn’t included in source distributions (MANIFEST.in currently doesn’t appear to include massgen/mcp_tools/standalone/*.md) or if installation layout changes. Consider loading via importlib.resources (package data) and/or ensuring the template is explicitly included in sdist/wheel packaging.
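A hedged sketch of the `importlib.resources` approach the comment suggests; the package path is taken from the PR's file list and assumed correct:

```python
from importlib import resources


def load_template() -> str:
    """Return the checkpoint instructions template content.

    Reads the template as package data via importlib.resources, which
    keeps working from wheels and zipped installs, not just a source
    checkout sitting next to the module.
    """
    return (
        resources.files("massgen.mcp_tools.standalone")
        .joinpath("checkpoint_instructions.md")
        .read_text(encoding="utf-8")
    )
```

The packaging side still has to ship the file (e.g. via package-data configuration); `importlib.resources` only changes how it is located at runtime.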
Actionable comments posted: 5
🧹 Nitpick comments (3)
massgen/backend/base.py (1)

894-907: Use the repo's Google-style docstring format for this new hook.

This hook is new public backend surface area, but its docstring skips the usual `Args:`/`Returns:` sections used elsewhere in the repo. Please align it with the Python docstring convention. As per coding guidelines: `**/*.py`: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `massgen/backend/base.py` around lines 894-907, the docstring for the new public hook `filter_enforcement_tool_calls` is missing the repo's Google-style sections; update it by adding an `Args:` section documenting `tool_calls (list[dict[str, Any]])` and `unknown_tool_calls (list[dict[str, Any]])` and a `Returns:` section describing the returned `list[dict[str, Any]]`; keep the existing explanatory text as a short summary paragraph, and include types and brief descriptions for both parameters and the return value.

massgen/tests/test_display_notification_events.py (1)

191-223: Please cover the listener path too.

`EventEmitter.emit()` now redacts before notifying listeners, but this test only locks down JSONL persistence and `EventReader` round-tripping. Adding a listener assertion here would keep the live-callback path from regressing separately.

✅ Minimal extension:

```diff
 def test_tool_events_redact_secrets_before_writing_jsonl(self, tmp_path: Path):
     emitter = EventEmitter(tmp_path)
+    received = []
+    emitter.add_listener(received.append)
     openai_key = "sk-proj-testsecret1234567890abcdefghijklmnopqrstuvwxyz"
     bearer_token = "AIzaSyTestSecret1234567890abcdefghijklmnop"
@@
     events = reader.read_all()
     assert events[0].data["args"]["OPENAI_API_KEY"] == "[REDACTED]"
     assert events[0].data["args"]["nested"]["token"] == "[REDACTED]"
     assert 'OPENAI_API_KEY = "[REDACTED]"' in events[1].data["result"]
     assert "Authorization: Bearer [REDACTED]" in events[1].data["result"]
+    assert received[0].data["args"]["OPENAI_API_KEY"] == "[REDACTED]"
+    assert "Authorization: Bearer [REDACTED]" in received[1].data["result"]
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `massgen/tests/test_display_notification_events.py` around lines 191-223, extend the test to also verify the listener/callback path: register a listener on the `EventEmitter` before calling `emit_tool_start`/`emit_tool_complete` that collects events into a local list, then assert that the listener-received events contain the redacted values, ensuring `EventEmitter.emit` notifies listeners with redacted data.

massgen/utils/redact_secrets.py (1)

43-99: Add Google-style docstrings to the new redaction helpers.

These helpers are new shared utility surface, but the docstrings are prose-only. Please add `Args:`/`Returns:` sections so they match the repo's Python docstring standard. As per coding guidelines: `**/*.py`: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `massgen/utils/redact_secrets.py` around lines 43-99, the docstrings for the new helpers (`_normalize_key_name`, `is_sensitive_key`, `redact_secrets_in_text`, `redact_secrets`) are missing Google-style Args/Returns sections; update each function's docstring to include a brief description plus an `Args:` block documenting its parameters and a `Returns:` block describing the return type and meaning, keeping wording concise and consistent with existing repo style.
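For reference, the requested Google-style shape for one of these helpers could look like the sketch below; the body is an illustrative stand-in, not the module's actual logic:

```python
def is_sensitive_key(key: str) -> bool:
    """Check whether a mapping key looks like it holds a secret.

    Args:
        key: The dictionary key or field name to inspect.

    Returns:
        True if the normalized key ends with a sensitive suffix such as
        ``API_KEY``, ``TOKEN``, or ``PASSWORD``, otherwise False.
    """
    # Normalize quoting, case, and hyphen/underscore variants before matching.
    normalized = key.strip().strip("\"'").upper().replace("-", "_")
    suffixes = (
        "API_KEY", "ACCESS_TOKEN", "REFRESH_TOKEN", "TOKEN",
        "SECRET", "PASSWORD", "PRIVATE_KEY", "AUTHORIZATION",
    )
    return normalized.endswith(suffixes)
```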
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@massgen/tests/test_codex_mcp_tool_visibility.py`:
- Around line 107-240: Duplicate the three Codex tests
(test_codex_workflow_instructions_use_server_prefixed_tool_names,
test_codex_forces_fresh_session_after_missing_workflow_tool_call,
test_codex_non_workflow_turn_removes_runtime_workflow_server) to provide parity
for one base_with_custom_tool_and_mcp backend and for the claude_code backend:
replace CodexBackend with the equivalent backend classes (e.g.,
BaseWithCustomToolAndMCPBackend and ClaudeCodeBackend), keep the same
messages/tools, mocking of backend._stream_local, tmp_path usage, and the same
assertions that check AGENTS.md, config.toml mcp_servers entries and
resume_flags; ensure tests import/construct the correct backend classes and run
under pytest.mark.asyncio so regressions in shared/native backends are covered.
In `@massgen/tests/test_codex_reasoning_config.py`:
- Around line 89-90: Add Google-style docstrings to the new test functions
(e.g., test_codex_filter_enforcement_tool_calls_defaults_to_all_calls)
describing the test purpose, setup, and expected outcome; ensure each docstring
is a proper triple-quoted Google-style block under the function definition
summarizing what the test verifies, any key inputs (like the CodexBackend
instantiation), and the expected behavior/assertions, and apply the same change
to the other new/changed test functions referenced in the review.
In `@massgen/tests/test_logger_config.py`:
- Around line 76-77: Add a Google-style docstring to the new test function
test_log_stream_chunk_redacts_secrets describing what the test verifies (that
log_stream chunking redacts secrets); include a one-line summary and optional
short details about behavior under test (no args/returns needed for tests) so
the function complies with the project docstring guideline.
- Around line 80-81: The two hard-coded secret-looking strings openai_key and
bearer_token are tripping Ruff rule S105; mark them as intentional test fixtures
by adding an inline suppression (e.g., append "# noqa: S105") to each assignment
of openai_key and bearer_token in test_logger_config.py, or alternatively
replace them with clearly fake placeholders built at runtime (e.g.,
os.environ.get() fallback) to avoid literal secret patterns; update the
assignments for openai_key and bearer_token accordingly so the linter no longer
flags them.
In `@massgen/utils/redact_secrets.py`:
- Around line 21-29: The current _SENSITIVE_KEY_PATTERN only allows underscores
and alphanumerics after the first letter, so header-style keys with hyphens
(e.g. "X-API-Key") are not matched and thus leak; update _SENSITIVE_KEY_PATTERN
to permit hyphens in the identifier (e.g. change the second character class to
include "-" such as [A-Za-z][A-Za-z0-9_-]* or similar) so that both
_QUOTED_ASSIGNMENT_RE and _UNQUOTED_ASSIGNMENT_RE (which use that pattern) will
correctly match and redact hyphenated header names.
---
Nitpick comments:
In `@massgen/backend/base.py`:
- Around line 894-907: The docstring for the new public hook
filter_enforcement_tool_calls is missing the repo's Google-style sections;
update its docstring to follow the project's convention by adding an "Args:"
section documenting tool_calls (list[dict[str, Any]]) and unknown_tool_calls
(list[dict[str, Any]]) and a "Returns:" section describing the returned
list[dict[str, Any]]; keep the existing explanatory text but move it into a
short summary paragraph and ensure types and brief descriptions are included for
both parameters and the return value.
In `@massgen/tests/test_display_notification_events.py`:
- Around line 191-223: Extend the
test_tool_events_redact_secrets_before_writing_jsonl test to also verify the
listener/callback path by registering a listener on the EventEmitter (attach a
callback to the EventEmitter instance before calling
emit_tool_start/emit_tool_complete) that collects events into a local list, then
after emitter.close() assert that the listener-received events contain the
redacted values (e.g. events_list[0].data["args"]["OPENAI_API_KEY"] ==
"[REDACTED]", events_list[0].data["args"]["nested"]["token"] == "[REDACTED]",
and that events_list[1].data["result"] contains 'OPENAI_API_KEY = "[REDACTED]"'
and 'Authorization: Bearer [REDACTED]') to ensure EventEmitter.emit notifies
listeners with redacted data.
In `@massgen/utils/redact_secrets.py`:
- Around line 43-99: The docstrings for the new helpers (_normalize_key_name,
is_sensitive_key, redact_secrets_in_text, redact_secrets) are missing
Google-style Args/Returns sections; update each function's docstring to include
a brief description plus an "Args:" block that documents parameters (e.g., key:
str, text: str, value: Any) and a "Returns:" block that describes the return
type and meaning (e.g., normalized string, bool, redacted string, redacted
structure), keeping wording concise and consistent with existing repo style.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9bc69c9b-77a3-4295-9cde-b5dc59ef94f9
📒 Files selected for processing (14)
- massgen/backend/base.py
- massgen/backend/codex.py
- massgen/docker/Dockerfile.sudo
- massgen/events.py
- massgen/logger_config.py
- massgen/mcp_tools/standalone/README.md
- massgen/mcp_tools/standalone/checkpoint_instructions.md
- massgen/mcp_tools/standalone/setup_instructions.py
- massgen/tests/test_codex_mcp_tool_visibility.py
- massgen/tests/test_codex_reasoning_config.py
- massgen/tests/test_display_notification_events.py
- massgen/tests/test_logger_config.py
- massgen/utils/redact_secrets.py
- pyproject.toml
👮 Files not reviewed due to content moderation or server errors (5)
- massgen/docker/Dockerfile.sudo
- pyproject.toml
- massgen/mcp_tools/standalone/README.md
- massgen/mcp_tools/standalone/checkpoint_instructions.md
- massgen/mcp_tools/standalone/setup_instructions.py
```python
@pytest.mark.asyncio
async def test_codex_workflow_instructions_use_server_prefixed_tool_names(
    tmp_path: Path,
):
    """Codex instructions should reference the server-qualified MCP tool handles."""
    backend = CodexBackend(cwd=str(tmp_path))
    backend.api_key = "test"

    messages = [
        {"role": "system", "content": "system instructions"},
        {"role": "user", "content": "Please complete the task."},
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "new_answer",
                "description": "Submit your answer",
            },
        },
        {
            "type": "function",
            "function": {
                "name": "vote",
                "description": "Vote for the best answer",
            },
        },
    ]

    async def mock_stream(prompt, resume_session):  # noqa: ARG001
        from massgen.backend.base import StreamChunk

        yield StreamChunk(type="done", usage={})

    monkeypatch = pytest.MonkeyPatch()
    monkeypatch.setattr(backend, "_stream_local", mock_stream)
    try:
        async for _ in backend.stream_with_tools(messages, tools):
            pass
    finally:
        monkeypatch.undo()

    agents_md = (tmp_path / ".codex" / "AGENTS.md").read_text(encoding="utf-8")
    config = tomllib.loads((tmp_path / ".codex" / "config.toml").read_text(encoding="utf-8"))

    assert "`massgen_workflow_tools/new_answer`" in agents_md
    assert "`massgen_workflow_tools/vote`" in agents_md
    assert "massgen_workflow_tools" in config.get("mcp_servers", {})


@pytest.mark.asyncio
async def test_codex_forces_fresh_session_after_missing_workflow_tool_call(
    tmp_path: Path,
):
    """If a workflow turn ended without a decision, the next turn should not resume it."""
    backend = CodexBackend(cwd=str(tmp_path))
    backend.api_key = "test"
    backend.session_id = "seed-session"

    messages = [{"role": "user", "content": "Please complete the task."}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "new_answer",
                "description": "Submit your answer",
            },
        },
    ]
    resume_flags: list[bool] = []

    async def mock_stream(prompt, resume_session):  # noqa: ARG001
        from massgen.backend.base import StreamChunk

        resume_flags.append(resume_session)
        yield StreamChunk(type="content", content="plain text only")
        yield StreamChunk(type="done", usage={})

    monkeypatch = pytest.MonkeyPatch()
    monkeypatch.setattr(backend, "_stream_local", mock_stream)
    try:
        async for _ in backend.stream_with_tools(messages, tools):
            pass

        backend.session_id = "next-session"
        async for _ in backend.stream_with_tools(messages, tools):
            pass
    finally:
        monkeypatch.undo()

    assert resume_flags == [True, False]


@pytest.mark.asyncio
async def test_codex_non_workflow_turn_removes_runtime_workflow_server(
    tmp_path: Path,
):
    """A later non-workflow turn should clear the runtime workflow MCP server."""
    backend = CodexBackend(cwd=str(tmp_path))
    backend.api_key = "test"

    messages = [{"role": "user", "content": "Please complete the task."}]
    workflow_tools = [
        {
            "type": "function",
            "function": {
                "name": "new_answer",
                "description": "Submit your answer",
            },
        },
    ]

    async def mock_stream(prompt, resume_session):  # noqa: ARG001
        from massgen.backend.base import StreamChunk

        yield StreamChunk(type="done", usage={})

    monkeypatch = pytest.MonkeyPatch()
    monkeypatch.setattr(backend, "_stream_local", mock_stream)
    try:
        async for _ in backend.stream_with_tools(messages, workflow_tools):
            pass

        first_config = tomllib.loads((tmp_path / ".codex" / "config.toml").read_text(encoding="utf-8"))
        assert "massgen_workflow_tools" in first_config.get("mcp_servers", {})

        async for _ in backend.stream_with_tools(messages, []):
            pass
    finally:
        monkeypatch.undo()

    second_config = tomllib.loads((tmp_path / ".codex" / "config.toml").read_text(encoding="utf-8"))
    assert "massgen_workflow_tools" not in second_config.get("mcp_servers", {})
    assert not any(isinstance(server, dict) and server.get("name") == "massgen_workflow_tools" for server in backend.mcp_servers)
```
Add backend-parity coverage for this workflow-tool behavior.
These cases lock down the Codex path, but this is still a non-trivial tooling contract. Per repo policy, it also needs parity coverage for at least one base_with_custom_tool_and_mcp backend and claude_code, otherwise regressions in the shared/native backends can slip through.
Based on learnings "Any non-trivial tooling feature should add backend parity tests for at least: one base_with_custom_tool_and_mcp backend, claude_code, and codex" and "Backend parity is critical: do not assume one backend implementation covers all backends; base_with_custom_tool_and_mcp.py changes do not cover native backends like claude_code.py and codex.py".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_codex_mcp_tool_visibility.py` around lines 107 - 240,
Duplicate the three Codex tests
(test_codex_workflow_instructions_use_server_prefixed_tool_names,
test_codex_forces_fresh_session_after_missing_workflow_tool_call,
test_codex_non_workflow_turn_removes_runtime_workflow_server) to provide parity
for one base_with_custom_tool_and_mcp backend and for the claude_code backend:
replace CodexBackend with the equivalent backend classes (e.g.,
BaseWithCustomToolAndMCPBackend and ClaudeCodeBackend), keep the same
messages/tools, mocking of backend._stream_local, tmp_path usage, and the same
assertions that check AGENTS.md, config.toml mcp_servers entries and
resume_flags; ensure tests import/construct the correct backend classes and run
under pytest.mark.asyncio so regressions in shared/native backends are covered.
```python
def test_codex_filter_enforcement_tool_calls_defaults_to_all_calls(tmp_path: Path):
    backend = CodexBackend(cwd=str(tmp_path))
```
Add docstrings to the new tests.
These new test functions are missing docstrings, and this repo requires docstrings on changed Python functions.
As per coding guidelines "For new or changed functions, include Google-style docstrings".
Also applies to: 236-237
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_codex_reasoning_config.py` around lines 89 - 90, Add
Google-style docstrings to the new test functions (e.g.,
test_codex_filter_enforcement_tool_calls_defaults_to_all_calls) describing the
test purpose, setup, and expected outcome; ensure each docstring is a proper
triple-quoted Google-style block under the function definition summarizing what
the test verifies, any key inputs (like the CodexBackend instantiation), and the
expected behavior/assertions, and apply the same change to the other new/changed
test functions referenced in the review.
```python
@pytest.mark.usefixtures("_isolate_test_logs")
def test_log_stream_chunk_redacts_secrets() -> None:
```
Add a docstring to the new test.
This changed test function is missing the required docstring.
As per coding guidelines "For new or changed functions, include Google-style docstrings".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_logger_config.py` around lines 76 - 77, Add a Google-style
docstring to the new test function test_log_stream_chunk_redacts_secrets
describing what the test verifies (that log_stream chunking redacts secrets);
include a one-line summary and optional short details about behavior under test
(no args/returns needed for tests) so the function complies with the project
docstring guideline.
```python
openai_key = "sk-proj-testsecret1234567890abcdefghijklmnopqrstuvwxyz"
bearer_token = "AIzaSyTestSecret1234567890abcdefghijklmnop"
```
Silence the intentional fake token so Ruff stays green.
Line 81 is already flagged as S105, so this test will keep failing lint until the literal is marked as an intentional test fixture or otherwise rewritten.
Possible minimal fix
```diff
-openai_key = "sk-proj-testsecret1234567890abcdefghijklmnopqrstuvwxyz"
-bearer_token = "AIzaSyTestSecret1234567890abcdefghijklmnop"
+openai_key = "sk-proj-testsecret1234567890abcdefghijklmnopqrstuvwxyz"  # noqa: S105
+bearer_token = "AIzaSyTestSecret1234567890abcdefghijklmnop"  # noqa: S105
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
openai_key = "sk-proj-testsecret1234567890abcdefghijklmnopqrstuvwxyz"  # noqa: S105
bearer_token = "AIzaSyTestSecret1234567890abcdefghijklmnop"  # noqa: S105
```
🧰 Tools
🪛 Ruff (0.15.9)
[error] 81-81: Possible hardcoded password assigned to: "bearer_token"
(S105)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_logger_config.py` around lines 80 - 81, The two hard-coded
secret-looking strings openai_key and bearer_token are tripping Ruff rule S105;
mark them as intentional test fixtures by adding an inline suppression (e.g.,
append "# noqa: S105") to each assignment of openai_key and bearer_token in
test_logger_config.py, or alternatively replace them with clearly fake
placeholders built at runtime (e.g., os.environ.get() fallback) to avoid literal
secret patterns; update the assignments for openai_key and bearer_token
accordingly so the linter no longer flags them.
```python
_SENSITIVE_KEY_PATTERN = r'["\']?[A-Za-z][A-Za-z0-9_]*(?:API_KEY|ACCESS_TOKEN|REFRESH_TOKEN|TOKEN|SECRET|PASSWORD|PRIVATE_KEY|AUTHORIZATION)["\']?'
_QUOTED_ASSIGNMENT_RE = re.compile(
    rf"(?P<prefix>{_SENSITIVE_KEY_PATTERN}\s*[:=]\s*)(?P<quote>[\"'])(?P<value>.*?)(?P=quote)",
    re.IGNORECASE,
)
_UNQUOTED_ASSIGNMENT_RE = re.compile(
    rf"(?P<prefix>{_SENSITIVE_KEY_PATTERN}\s*[:=]\s*)(?![\"'])(?P<value>[^\s,]+)",
    re.IGNORECASE,
)
```
Hyphenated secret headers still leak in plain-text logs.
_SENSITIVE_KEY_PATTERN only accepts _, so strings like X-API-Key: ... or X-Access-Token=... won't match the assignment regexes and will be logged verbatim. Since this helper is now used on arbitrary log text, that leaves a common header format unredacted.
🔐 Suggested fix

```diff
-_SENSITIVE_KEY_PATTERN = r'["\']?[A-Za-z][A-Za-z0-9_]*(?:API_KEY|ACCESS_TOKEN|REFRESH_TOKEN|TOKEN|SECRET|PASSWORD|PRIVATE_KEY|AUTHORIZATION)["\']?'
+_SENSITIVE_KEY_PATTERN = (
+    r'["\']?[A-Za-z][A-Za-z0-9_-]*'
+    r'(?:API_KEY|ACCESS_TOKEN|REFRESH_TOKEN|TOKEN|SECRET|PASSWORD|PRIVATE_KEY|AUTHORIZATION)'
+    r'["\']?'
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/utils/redact_secrets.py` around lines 21 - 29, The current
_SENSITIVE_KEY_PATTERN only allows underscores and alphanumerics after the first
letter, so header-style keys with hyphens (e.g. "X-API-Key") are not matched and
thus leak; update _SENSITIVE_KEY_PATTERN to permit hyphens in the identifier
(e.g. change the second character class to include "-" such as
[A-Za-z][A-Za-z0-9_-]* or similar) so that both _QUOTED_ASSIGNMENT_RE and
_UNQUOTED_ASSIGNMENT_RE (which use that pattern) will correctly match and redact
hyphenated header names.
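A runnable sketch of the widened pattern. Note that the compound suffixes are also made separator-agnostic here, since `X-API-Key` uses a hyphen where the pattern's `API_KEY` expects an underscore; that is an extra assumption beyond the suggested fix:

```python
import re

SENSITIVE_KEY = (
    r'["\']?[A-Za-z][A-Za-z0-9_-]*'  # "-" added so "X-API-Key" style identifiers match
    r"(?:API[_-]KEY|ACCESS[_-]TOKEN|REFRESH[_-]TOKEN|TOKEN|SECRET|"
    r"PASSWORD|PRIVATE[_-]KEY|AUTHORIZATION)"
    r'["\']?'
)
QUOTED_ASSIGNMENT = re.compile(
    rf"(?P<prefix>{SENSITIVE_KEY}\s*[:=]\s*)(?P<quote>[\"'])(?P<value>.*?)(?P=quote)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Redact quoted values assigned to sensitive-looking keys, hyphenated or not."""
    return QUOTED_ASSIGNMENT.sub(
        lambda m: m.group("prefix") + m.group("quote") + "[REDACTED]" + m.group("quote"),
        text,
    )
```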
PR Title Format

Your PR title must follow the format: `<type>: <brief description>`

Valid types:
- `fix:` - Bug fixes
- `feat:` - New features
- `breaking:` - Breaking changes
- `docs:` - Documentation updates
- `refactor:` - Code refactoring
- `test:` - Test additions/modifications
- `chore:` - Maintenance tasks
- `perf:` - Performance improvements
- `style:` - Code style changes
- `ci:` - CI/CD configuration changes

Examples:
- `fix: resolve memory leak in data processing`
- `feat: add export to CSV functionality`
- `breaking: change API response format`
- `docs: update installation guide`

Description

Brief description of the changes in this PR

Type of change

- Bug fix (`fix:`) - Non-breaking change which fixes an issue
- New feature (`feat:`) - Non-breaking change which adds functionality
- Breaking change (`breaking:`) - Fix or feature that would cause existing functionality to not work as expected
- Documentation (`docs:`) - Documentation updates
- Refactoring (`refactor:`) - Code changes that neither fix a bug nor add a feature
- Tests (`test:`) - Adding missing tests or correcting existing tests
- Chore (`chore:`) - Maintenance tasks, dependency updates, etc.
- Performance (`perf:`) - Code changes that improve performance
- Style (`style:`) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
- CI (`ci:`) - Changes to CI/CD configuration files and scripts

Checklist
Pre-commit status
How to Test
Add test method for this PR.
Test CLI Command
Write down the test bash command. If there are prerequisites, please note them explicitly.
Expected Results
Description/screenshots of expected results.
Additional context
Add any other context about the PR here.
Summary by CodeRabbit

New Features
- New `massgen-checkpoint-setup` command to automatically inject checkpoint instructions into project files.

Bug Fixes

Documentation