Updated project with new changes #2
base: master
Conversation
📝 Walkthrough

This pull request introduces comprehensive infrastructure and feature management enhancements, including mandatory database persistence requirements in spec creation, a complete database layer reorganization with SQLAlchemy models and connection management, new server endpoints for knowledge files and IDE integration, enhanced security and authentication utilities, session event tracking with rate-limit handling in agents, and extensive testing infrastructure alongside frontend UI improvements for IDE selection and project management.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment

Note: Docstrings generation - SKIPPED

Note: Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes.
Docstrings generation was requested by @getworken.

* #2 (comment)

The following files were modified:

* `agent.py`
* `api/agent_types.py`
* `api/config.py`
* `api/connection.py`
* `api/dependency_resolver.py`
* `api/feature_repository.py`
* `api/logging_config.py`
* `api/migrations.py`
* `api/models.py`
* `autonomous_agent_demo.py`
* `client.py`
* `mcp_server/feature_mcp.py`
* `parallel_orchestrator.py`
* `progress.py`
* `quality_gates.py`
* `rate_limit_utils.py`
* `security.py`
* `server/main.py`
* `server/routers/agent.py`
* `server/routers/assistant_chat.py`
* `server/routers/devserver.py`
* `server/routers/features.py`
* `server/routers/filesystem.py`
* `server/routers/projects.py`
* `server/routers/schedules.py`
* `server/routers/settings.py`
* `server/routers/spec_creation.py`
* `server/routers/terminal.py`
* `server/schemas.py`
* `server/services/assistant_chat_session.py`
* `server/services/dev_server_manager.py`
* `server/services/expand_chat_session.py`
* `server/services/process_manager.py`
* `server/services/spec_chat_session.py`
* `server/services/terminal_manager.py`
* `server/utils/auth.py`
* `server/utils/process_utils.py`
* `server/utils/validation.py`
* `server/websocket.py`
* `start_ui.py`
* `structured_logging.py`
* `tests/conftest.py`
* `tests/test_async_examples.py`
* `tests/test_repository_and_config.py`
* `tests/test_security.py`
* `tests/test_security_integration.py`
* `ui/src/App.tsx`
* `ui/src/components/AssistantPanel.tsx`
* `ui/src/components/ConversationHistory.tsx`
* `ui/src/components/DebugLogViewer.tsx`
* `ui/src/components/IDESelectionModal.tsx`
* `ui/src/components/ProjectSelector.tsx`
* `ui/src/components/ProjectSetupRequired.tsx`
* `ui/src/components/ResetProjectModal.tsx`
* `ui/src/components/ScheduleModal.tsx`
* `ui/src/components/SettingsModal.tsx`
* `ui/src/components/ThemeSelector.tsx`
* `ui/src/hooks/useAssistantChat.ts`
* `ui/src/hooks/useConversations.ts`
* `ui/src/hooks/useProjects.ts`
* `ui/src/lib/api.ts`
Caution: The CodeRabbit agent's plans did not produce any file changes.
Actionable comments posted: 9
Note: Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
server/routers/filesystem.py (1)
459-468: Missing normalization before path traversal check.

The directory name check for `.` and `..` on line 464 operates on the raw input without NFKC normalization. An attacker could use Unicode fullwidth full stops (U+FF0E) to bypass this check, since a fullwidth `..` would pass the literal `..` comparison. Apply `normalize_name` to the directory name before the security checks, for consistency with the normalization applied elsewhere in this file.

Proposed fix

```diff
 # Validate directory name
 name = request.name.strip()
+# Normalize to prevent Unicode bypass attacks
+name = normalize_name(name)
 if not name:
     raise HTTPException(status_code=400, detail="Directory name cannot be empty")
```
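A quick illustration of the bypass this comment describes, assuming `normalize_name` is a thin wrapper over `unicodedata.normalize` (the helper's real implementation may differ):

```python
import unicodedata

def normalize_name(name: str) -> str:
    # NFKC folds compatibility characters: the fullwidth full stop
    # (U+FF0E) becomes a plain ASCII "." after normalization.
    return unicodedata.normalize("NFKC", name)

raw = "\uff0e\uff0e"  # renders like "..", but is two fullwidth dots
print(raw == "..")                   # False - the literal check is bypassed
print(normalize_name(raw) == "..")   # True  - normalization closes the gap
```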
server/utils/process_utils.py (1)

44-156: Taskkill fallback runs after parent termination, rendering it ineffective.

The current code (lines 144–156) calls `taskkill /T /PID <pid>` only after the parent has already been terminated or force-killed (lines 133–142). At that point, the parent PID no longer exists, and `taskkill` will fail with "ERROR: The process not found" — it cannot traverse a dead parent's process tree. The `/T` flag only works when the parent PID currently exists in the process table. Move the taskkill fallback earlier, while the parent still exists:

♻️ Suggested adjustment

```diff
 gone, still_alive = psutil.wait_procs(children, timeout=timeout)
 result.children_terminated = len(gone)
 logger.debug(
     "Children after graceful wait: %d terminated, %d still alive",
     len(gone), len(still_alive)
 )

+# On Windows, use taskkill while the parent still exists if any children remain
+if IS_WINDOWS and still_alive:
+    _kill_windows_process_tree_taskkill(proc.pid)
+
 # Force kill any remaining children
 for child in still_alive:
     try:
         logger.debug("Force-killing child PID %d", child.pid)
         child.kill()
         result.children_killed += 1
     except (psutil.NoSuchProcess, psutil.AccessDenied) as e:
         logger.debug("Child PID %d gone during force-kill: %s", child.pid, e)

 if result.children_killed > 0:
     result.status = "partial"

 # Now terminate the parent
 logger.debug("Terminating parent PID %d", proc.pid)
 proc.terminate()
 try:
     proc.wait(timeout=timeout)
     logger.debug("Parent PID %d terminated gracefully", proc.pid)
 except subprocess.TimeoutExpired:
     logger.debug("Parent PID %d did not terminate, force-killing", proc.pid)
     proc.kill()
     proc.wait()
     result.parent_forcekilled = True
     result.status = "partial"

-# On Windows, use taskkill as a final cleanup to catch any orphans
-# that psutil may have missed (e.g., conhost.exe, deeply nested processes)
-if IS_WINDOWS:
-    try:
-        remaining = psutil.Process(proc.pid).children(recursive=True)
-        if remaining:
-            logger.warning(
-                "Found %d remaining children after psutil cleanup, using taskkill",
-                len(remaining)
-            )
-            _kill_windows_process_tree_taskkill(proc.pid)
-    except psutil.NoSuchProcess:
-        pass  # Parent already dead, good
```

server/routers/schedules.py (1)
135-145: Add `max_concurrency` to `Schedule` initialization.

`ScheduleCreate` includes `max_concurrency` with a default of 3, but the `Schedule(...)` constructor on lines 135-145 doesn't pass it, causing the user-provided value to be silently ignored.

Fix

```diff
 schedule = Schedule(
     project_name=project_name,
     start_time=data.start_time,
     duration_minutes=data.duration_minutes,
     days_of_week=data.days_of_week,
     enabled=data.enabled,
     yolo_mode=data.yolo_mode,
     model=data.model,
+    max_concurrency=data.max_concurrency,
 )
```

server/services/expand_chat_session.py (1)
187-249: Clean up temp files when client initialization fails.

If Claude client creation throws, the session exits without calling `close()`, leaving the settings/MCP config files behind.

🧹 Suggested cleanup on failure

```diff
 try:
     self.client = ClaudeSDKClient(
         options=ClaudeAgentOptions(
             @@
         )
     )
     await self.client.__aenter__()
     self._client_entered = True
 except Exception:
     logger.exception("Failed to create Claude client")
+    # Clean up temp files created earlier in start()
+    if self._settings_file and self._settings_file.exists():
+        try:
+            self._settings_file.unlink()
+        except Exception as e:
+            logger.warning(f"Error removing settings file: {e}")
+        finally:
+            self._settings_file = None
+    if self._mcp_config_file and self._mcp_config_file.exists():
+        try:
+            self._mcp_config_file.unlink()
+        except Exception as e:
+            logger.warning(f"Error removing MCP config file: {e}")
+        finally:
+            self._mcp_config_file = None
     yield {
         "type": "error",
         "content": "Failed to initialize Claude"
     }
     return
```

.claude/commands/create-spec.md (1)
231-273: Clarify feature-count guidance for stateless apps.

Phase 3b allows a stateless path with no DB/persistence tests, but the ranges and example totals always say "includes 5 infrastructure." That can mislead users into adding infra features even when database is explicitly "none." Consider conditional wording or a stateless alternative.

💡 Suggested wording tweak

```diff
-- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
-- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
-- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)
+- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure when a database is required)
+- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure when a database is required)
+- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure when a database is required)

-> - **Infrastructure (required)**: 5 features (database setup, persistence verification)
+> - **Infrastructure (required when database is needed)**: 5 features (database setup, persistence verification)

-> **Total: ~N features** (including 5 infrastructure)
+> **Total: ~N features** (including infrastructure when applicable)
```
🤖 Fix all issues with AI agents
In @.claude/commands/expand-project.md:
- Around line 173-190: The fallback currently embeds a JSON array inside an XML
tag (<features_to_create>), which is invalid; replace the hybrid with a
correctly structured XML payload by converting the JSON objects into <feature>
elements (with child tags <category>, <name>, <description>, and a <steps>
container with multiple <step> entries) inside <features_to_create>, or—if the
system expects raw JSON—remove the <features_to_create> wrapper and document the
fallback as JSON; update the documentation text to explicitly state which format
the parser expects and show the corresponding valid example using either the
<features_to_create> XML structure or a plain JSON array.
In `@agent.py`:
- Around line 182-205: The current parent-process detection calls
clear_stuck_features() in both ImportError and general Exception handlers,
reintroducing the parallel-mode race; change the logic so
clear_stuck_features(project_dir) is only invoked when psutil import and parent
detection succeed and the parent_name check confirms we're not running under the
orchestrator (i.e., "python" not in parent_name.lower()); on ImportError or any
exception during detection, do NOT call clear_stuck_features — instead skip
clearing (optionally log a warning) so we avoid unsafe clearing when detection
fails; update the block around the psutil import/parent_process/name check and
the try/except handlers (references: psutil, parent_process, parent_name,
clear_stuck_features) accordingly.
In `@client.py`:
- Around line 47-51: The unconditional DEFAULT_MAX_OUTPUT_TOKENS = "131072"
should be set only when an alternative Anthropic-compatible API is configured;
change the assignment to check os.getenv("ANTHROPIC_BASE_URL") (or equivalent
config) and only set DEFAULT_MAX_OUTPUT_TOKENS to "131072" when
ANTHROPIC_BASE_URL is present, otherwise set a safer default (or leave
unset/None) to avoid exceeding standard Anthropic model limits; apply this same
guarded assignment change for DEFAULT_MAX_OUTPUT_TOKENS in client.py and the
same lines in server/services/spec_chat_session.py,
server/services/expand_chat_session.py, and
server/services/assistant_chat_session.py, referencing the
DEFAULT_MAX_OUTPUT_TOKENS constant and the ANTHROPIC_BASE_URL environment/config
variable.
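As a rough sketch of what the guarded assignment could look like (names follow the prompt; the safer non-GLM default is an assumption):

```python
import os

# Raise the output-token ceiling only when an alternative
# Anthropic-compatible endpoint (e.g. a GLM gateway) is configured;
# standard Anthropic models reject a max_tokens value this large.
if os.getenv("ANTHROPIC_BASE_URL"):
    DEFAULT_MAX_OUTPUT_TOKENS = "131072"
else:
    DEFAULT_MAX_OUTPUT_TOKENS = None  # defer to the SDK/model default
```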
In `@registry.py`:
- Around line 31-56: VALID_MODELS is currently built from CLAUDE_MODELS and
won’t include Ollama IDs when AVAILABLE_MODELS is swapped; update the code so
VALID_MODELS is derived from AVAILABLE_MODELS (or recomputed whenever
AVAILABLE_MODELS changes) instead of CLAUDE_MODELS. Locate the constants
VALID_MODELS and AVAILABLE_MODELS and change the list comprehension that builds
VALID_MODELS to use AVAILABLE_MODELS (or add a small helper/function to refresh
VALID_MODELS after detecting Ollama) so validators accept the actual active
model IDs; ensure DEFAULT_OLLAMA_MODEL and DEFAULT_MODEL remain unchanged.
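A minimal sketch of the suggested derivation, with hypothetical model-list shapes (the real `registry.py` structures may differ):

```python
# Hypothetical shapes for illustration only.
CLAUDE_MODELS = [{"id": "claude-sonnet-4"}, {"id": "claude-opus-4"}]
OLLAMA_MODELS = [{"id": "qwen2.5-coder"}]

# Swapped to OLLAMA_MODELS (or a mix) when Ollama is detected.
AVAILABLE_MODELS = CLAUDE_MODELS + OLLAMA_MODELS

def refresh_valid_models(models: list[dict]) -> list[str]:
    """Recompute the IDs validators should accept from the active list."""
    return [m["id"] for m in models]

VALID_MODELS = refresh_valid_models(AVAILABLE_MODELS)
```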
In `@security.py`:
- Around line 68-85: Replace the partial previews in the logging block so no
parts of command/reason are logged: stop calling redact_string for previews and
instead log only the deterministic hashes (command_hash, reason_hash) and the
lengths (len(command), len(reason)) or a fixed masked string; update the
logger.info call to include only those hashes/lengths or fully masked
placeholders and remove any trailing whitespace on blank lines in this block
(affecting the lines around command_hash/reason_hash, redact_string, and
logger.info) to fix CI W293.
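One way the hash-only logging could look (function and field names here are illustrative, not the module's actual API):

```python
import hashlib
import logging

logger = logging.getLogger(__name__)

def log_denied(command: str, reason: str) -> None:
    # Deterministic hashes let operators correlate repeat denials
    # without ever writing the sensitive strings to the log.
    command_hash = hashlib.sha256(command.encode()).hexdigest()[:16]
    reason_hash = hashlib.sha256(reason.encode()).hexdigest()[:16]
    logger.info(
        "Command denied: command_hash=%s (len=%d) reason_hash=%s (len=%d)",
        command_hash, len(command), reason_hash, len(reason),
    )
```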
In `@server/routers/projects.py`:
- Around line 710-734: The upload_knowledge_file endpoint uses file.filename
directly to build filepath and is vulnerable to path traversal; validate and
sanitize the filename before writing: ensure filename is non-empty, contains no
path separators or "..", and is not absolute (you can get a safe name via
pathlib.Path(file.filename).name or os.path.basename and compare to the original
to detect tampering), reject names with null bytes or dangerous characters, then
construct filepath = knowledge_dir / safe_name and verify
filepath.resolve().is_relative_to(knowledge_dir.resolve()) (or compare parents)
before writing; on invalid names return HTTPException(status_code=400) and keep
the rest of upload_knowledge_file, KnowledgeFileUpload, get_knowledge_dir
references intact.
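A sketch of the validation flow the prompt asks for, assuming FastAPI's `UploadFile` (the helper name is hypothetical):

```python
from pathlib import Path

from fastapi import HTTPException, UploadFile

def safe_knowledge_path(knowledge_dir: Path, file: UploadFile) -> Path:
    name = file.filename or ""
    safe_name = Path(name).name  # strips any directory components
    if (
        not safe_name
        or safe_name != name          # original carried separators -> tampering
        or safe_name in (".", "..")
        or "\x00" in safe_name
    ):
        raise HTTPException(status_code=400, detail="Invalid filename")
    filepath = (knowledge_dir / safe_name).resolve()
    # Belt and braces: confirm the final path stays inside the knowledge dir
    if not filepath.is_relative_to(knowledge_dir.resolve()):
        raise HTTPException(status_code=400, detail="Invalid filename")
    return filepath
```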
In `@server/services/expand_chat_session.py`:
- Around line 379-417: The code sets mcp_tool_succeeded = True as soon as a
ToolResult/ToolResultBlock with "feature_create_bulk" is seen, preventing the
XML fallback even when the tool errored or parsing failed; change the logic in
the ToolResult/ToolResultBlock branch to first check getattr(block, "is_error",
False) and skip processing if true, then attempt JSON parsing of content and
extract created_features, and only after successfully extracting non-empty
created_features (and appending ids to self.created_feature_ids and incrementing
self.features_created) set mcp_tool_succeeded = True; ensure any
JSONDecodeError/AttributeError keeps mcp_tool_succeeded False so the subsequent
XML fallback runs.
In `@server/services/process_manager.py`:
- Around line 241-268: The current logic unconditionally deletes the lock when
self.pid is None, which can remove a valid lock owned by a still-running
process; instead, when our_pid is None, read and parse the lock file into
lock_pid (as done below) and then verify liveness of lock_pid (e.g.,
os.kill(lock_pid, 0) on Unix or psutil.pid_exists/psutil.Process to check alive
and optionally inspect cmdline) before removing it; only call
self.lock_file.unlink(...) if the lock_pid is not alive or the process
command/identity does not match the expected agent, otherwise leave the lock in
place and log a warning.
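A minimal sketch of the liveness check, using `psutil` since it is already a dependency (the function name is illustrative):

```python
from pathlib import Path

import psutil

def remove_lock_if_stale(lock_file: Path) -> bool:
    """Delete the lock only when its owner is provably gone."""
    try:
        lock_pid = int(lock_file.read_text().strip())
    except (OSError, ValueError):
        return False  # unreadable/corrupt lock: leave it and report upstream
    if psutil.pid_exists(lock_pid):
        # Optionally inspect psutil.Process(lock_pid).cmdline() here to
        # confirm the PID really belongs to an agent before trusting it.
        return False
    lock_file.unlink(missing_ok=True)
    return True
```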
In `@tests/test_security.py`:
- Around line 75-123: Tests like test_extract_commands and helpers like
check_hook currently only print and return values so pytest treats them as
passing; update each test_* function (e.g., test_extract_commands and any other
test_* in this file) to assert on failures instead of returning counts by adding
an assertion such as assert failed == 0 (or assert passed == expected_count) at
the end, and update check_hook to raise an AssertionError or return a boolean
used by an assert in callers (e.g., assert check_hook(cmd, should_block) is
True) so any mismatch fails the test run; ensure each test uses pytest-style
assertions rather than print+return.
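The shape of that change, with a stand-in for the real hook under test:

```python
def is_command_blocked(cmd: str) -> bool:
    # Stand-in for the real security hook exercised by these tests.
    return cmd.startswith("rm -rf")

def check_hook(cmd: str, should_block: bool) -> bool:
    return is_command_blocked(cmd) == should_block

def test_extract_commands():
    cases = [("rm -rf /", True), ("ls -la", False)]
    for cmd, should_block in cases:
        # assert (not print/return) so any mismatch fails the pytest run
        assert check_hook(cmd, should_block), f"unexpected result for {cmd!r}"
```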
🟡 Minor comments (19)
.github/workflows/ci.yml-21-22 (1)
21-22: Ensure the full test suite runs in CI.

The workflow currently runs only security tests (`tests/test_security.py` and `tests/test_security_integration.py`). No other test execution exists in the CI pipeline. Add a step to run the full test suite, or confirm that the limited coverage is intentional.

start_ui.sh-33-37 (1)
33-37: Define `SCRIPT_DIR` before using it.

`$SCRIPT_DIR` isn't set, so the pre-activation check is likely ineffective (or points to `/venv`).

🧩 Suggested fix

```diff
 cd "$(dirname "$0")"
+SCRIPT_DIR="$(pwd)"

 # AutoCoder UI Launcher for Unix/Linux/macOS
 @@
 # Activate virtual environment if it exists
 if [ -d "$SCRIPT_DIR/venv" ]; then
```

.claude/templates/testing_prompt.template.md-24-27 (1)
24-27: Add a language tag to the fenced MCP block.

markdownlint MD040 expects a language identifier; `text` keeps it neutral here.

🛠️ Proposed fix

````diff
-```
+```text
 # 4. Get progress statistics
 Use the feature_get_stats tool
````

server/main.py-168-182 (1)

168-182: Handle malformed base64 in Basic Auth decoding.

`base64.b64decode()` can raise `binascii.Error` (and will for `validate=True`). Without catching it, malformed headers can bubble into 500s.

🛡️ Safer decode + explicit error handling

```diff
-import base64
+import base64
+import binascii
 @@
-    decoded = base64.b64decode(encoded_credentials).decode("utf-8")
+    decoded = base64.b64decode(encoded_credentials, validate=True).decode("utf-8")
     username, password = decoded.split(":", 1)
 @@
-    except (ValueError, UnicodeDecodeError):
+    except (ValueError, UnicodeDecodeError, binascii.Error):
         return Response(
             status_code=401,
             content="Invalid authorization header",
             headers={"WWW-Authenticate": 'Basic realm="Autocoder"'},
         )
```
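To see why the explicit `binascii.Error` handler matters, a small standalone demo (the header values are invented):

```python
import base64
import binascii

for header in ("dXNlcjpwYXNz", "not!!valid@@base64"):
    try:
        decoded = base64.b64decode(header, validate=True).decode("utf-8")
        print("decoded:", decoded)
    except (binascii.Error, UnicodeDecodeError) as exc:
        # Without this handler the middleware would surface a 500
        # instead of a clean 401 for malformed credentials.
        print("rejected:", exc.__class__.__name__)
```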
.claude/templates/coding_prompt.template.md-33-38 (1)

33-38: Add a language tag to the MCP tools snippet (MD040).

Line 35 triggers markdownlint; mark this fence as `text` or `bash`.

✏️ Suggested fix

````diff
-```
+```text
 # 5. Get progress statistics
 Use the feature_get_stats tool
````

.claude/templates/coding_prompt.template.md-472-479 (1)

472-479: Align token-efficiency guidance with the allowed tools list.

The Token Efficiency section tells agents to use `feature_get_summary`, but the "ALLOWED Feature Tools" list doesn't include it. Either add it to the allowed list or swap the guidance to an allowed tool (e.g., `feature_get_stats`).

✏️ Suggested fix

```diff
-- **Use `feature_get_summary`** for status checks (lighter than `feature_get_by_id`)
+- **Use `feature_get_stats`** for status checks (lighter than `feature_get_by_id`)
```

.claude/templates/initializer_prompt.template.md-45-78 (1)
45-78: Add a language tag to the feature_create_bulk example (MD040).

markdownlint flags line 45. Add `json` (or `text`) to the fenced block.

✏️ Suggested fix

````diff
-```
+```json
 Use the feature_create_bulk tool with features=[
   {
     "category": "functional",
     "name": "Brief feature name",
````

security.py-9-18 (1)
9-18: Fix import ordering per CI (I001).

CI reports the import block is unsorted at line 9. Reorder standard imports to satisfy the linter.

🧹 Suggested fix

```diff
-import logging
-import hashlib
+import hashlib
+import logging
 import os
 import re
 import shlex
 import threading
```

server/routers/projects.py-461-517 (1)
461-517: Inconsistent use of `cmd` vs `cmd_path` on Unix.

On Windows (line 507), the code correctly uses `cmd_path` (the resolved executable path). On Unix (line 510), it uses `cmd` (the command name string). While this may work due to `start_new_session=True` and shell PATH resolution, it's inconsistent and could cause issues if the executable isn't in PATH for the subprocess.

Proposed fix for consistency

```diff
 if sys.platform == "win32":
     subprocess.Popen([cmd_path, project_path])
 else:
     # Unix-like systems
-    subprocess.Popen([cmd, project_path], start_new_session=True)
+    subprocess.Popen([cmd_path, project_path], start_new_session=True)
```

mcp_server/feature_mcp.py-616-624 (1)
616-624: Inconsistent behavior with `feature_mark_passing` regarding `last_failed_at`.

When `tested_ok=True`, this function clears `last_failed_at` (line 619), but `feature_mark_passing` (lines 268-275) does not clear `last_failed_at`. This inconsistency could lead to confusion when querying feature history. Consider aligning behavior: either both functions should clear `last_failed_at` when marking passing, or neither should.

mcp_server/feature_mcp.py-1484-1486 (1)
1484-1486: `last_failed_at` updated on any error log, even for non-failure scenarios.

Logging an error (e.g., `lint_error`) updates `last_failed_at`, but the feature might still be passing. This could misrepresent when the feature actually failed a test. Consider only updating `last_failed_at` when the error type indicates an actual feature failure (e.g., `test_failure`), or make this behavior opt-in via a parameter.

mcp_server/feature_mcp.py-614-614 (1)
614-614: Fix whitespace issues flagged by CI.

The CI pipeline flagged whitespace issues on this line and others (lines 625, 1002, 1025). Remove trailing whitespace and whitespace on blank lines.

structured_logging.py-436-439 (1)
436-439: Use `datetime.now(timezone.utc)` instead of deprecated `datetime.utcnow()`.

`datetime.utcnow()` is deprecated in Python 3.12+. Use `datetime.now(timezone.utc)` for consistency with the rest of the codebase:

```diff
 # Default to last 24 hours
 if not since:
-    since = datetime.utcnow() - timedelta(hours=24)
+    since = datetime.now(timezone.utc) - timedelta(hours=24)
 if not until:
-    until = datetime.utcnow()
+    until = datetime.now(timezone.utc)
```
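For reference, the aware replacement as a standalone snippet:

```python
from datetime import datetime, timedelta, timezone

# Aware timestamps compare safely across the codebase and avoid the
# Python 3.12+ DeprecationWarning that datetime.utcnow() now emits.
until = datetime.now(timezone.utc)
since = until - timedelta(hours=24)
print(since.isoformat(), "->", until.isoformat())
```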
ui/src/components/ResetProjectModal.tsx-28-44 (1)

28-44: Block close interactions while a reset is pending.

The backdrop and the header X still close the modal while `resetProject.isPending` is true, which can hide in-flight status/errors.

🔒 Suggested guard

```diff
-<div className="neo-modal-backdrop" onClick={onClose}>
+<div className="neo-modal-backdrop" onClick={resetProject.isPending ? undefined : onClose}>
 ...
-<button
-  onClick={onClose}
-  className="neo-btn neo-btn-ghost p-2"
->
+<button
+  onClick={onClose}
+  disabled={resetProject.isPending}
+  className="neo-btn neo-btn-ghost p-2"
+>
   <X size={24} />
 </button>
```

ui/src/components/ResetProjectModal.tsx-16-25 (1)
16-25: Handle non-throwing reset failures.

`resetProject` returns `{ success, message }`; if `success` is false but the HTTP status is 200, the modal closes without surfacing the failure.

🐛 Suggested handling

```diff
-await resetProject.mutateAsync({ name: projectName, fullReset })
-onReset?.()
-onClose()
+const result = await resetProject.mutateAsync({ name: projectName, fullReset })
+if (!result.success) {
+  setError(result.message || 'Failed to reset project')
+  return
+}
+onReset?.()
+onClose()
```

tests/test_security.py-10-31 (1)
10-31: CI formatting failures: import order + trailing whitespace.

The pipeline flagged I001 and W293. Please re-run the formatter (or manually fix import ordering and whitespace-only blank lines).
Also applies to: 67-81
server/services/assistant_chat_session.py-246-256 (1)
246-256: Unused `skip_greeting` parameter.

The `skip_greeting` parameter is documented but never actually used in the method body. The greeting decision at line 369 is based on `is_new_conversation`, not `skip_greeting`. Either remove the parameter or implement the intended behavior.

Option 1: Remove unused parameter

```diff
-async def start(self, skip_greeting: bool = False) -> AsyncGenerator[dict, None]:
+async def start(self) -> AsyncGenerator[dict, None]:
     """
     Initialize session with the Claude client.

     Creates a new conversation if none exists, then sends an initial greeting.
     For resumed conversations, skips the greeting since history is loaded from DB.

     Yields message chunks as they stream in.
-
-    Args:
-        skip_greeting: If True, skip sending the greeting (for resuming conversations)
     """
```

Option 2: Use the parameter

```diff
 # Send initial greeting only for NEW conversations
 # Resumed conversations already have history loaded from the database
-if is_new_conversation:
+if is_new_conversation and not skip_greeting:
```

server/services/assistant_chat_session.py-304-307 (1)
304-307: MCP config file lacks unique naming and cleanup.

Unlike `expand_chat_session.py`, which uses `{uuid.uuid4().hex}` in the filename to prevent conflicts between concurrent sessions, this implementation uses a fixed filename `.claude_mcp_config.json`. Additionally, the file is not tracked for cleanup in the `close()` method.

Suggested fix for unique naming and cleanup

```diff
+import uuid
+
 class AssistantChatSession:
     def __init__(self, project_name: str, project_dir: Path, conversation_id: Optional[int] = None):
         ...
         self._history_loaded: bool = False
+        self._mcp_config_file: Optional[Path] = None  # Track for cleanup

     async def close(self) -> None:
         """Clean up resources and close the Claude client."""
         if self.client and self._client_entered:
             try:
                 await self.client.__aexit__(None, None, None)
             except Exception as e:
                 logger.warning(f"Error closing Claude client: {e}")
             finally:
                 self._client_entered = False
                 self.client = None
+        # Clean up MCP config file
+        if self._mcp_config_file and self._mcp_config_file.exists():
+            try:
+                self._mcp_config_file.unlink()
+            except Exception as e:
+                logger.warning(f"Error removing MCP config file: {e}")

     async def start(self, skip_greeting: bool = False) -> AsyncGenerator[dict, None]:
         ...
-        mcp_config_file = self.project_dir / ".claude_mcp_config.json"
+        mcp_config_file = self.project_dir / f".claude_mcp_config.assistant.{uuid.uuid4().hex}.json"
+        self._mcp_config_file = mcp_config_file
```

api/logging_config.py-63-66 (1)
63-66: Race condition in `_logging_configured` flag check.

The check-then-act on `_logging_configured` is not thread-safe. If two threads call `setup_logging()` simultaneously, both could pass the check before either sets the flag, leading to duplicate handlers.

🔒 Suggested fix using threading.Lock

```diff
+import threading
+
 # Track if logging has been configured
 _logging_configured = False
+_logging_lock = threading.Lock()

 def setup_logging(
     log_dir: Optional[Path] = None,
     log_file: str = DEFAULT_LOG_FILE,
     console_level: int = DEFAULT_CONSOLE_LOG_LEVEL,
     file_level: int = DEFAULT_FILE_LOG_LEVEL,
     root_level: int = DEFAULT_LOG_LEVEL,
 ) -> None:
     ...
     global _logging_configured
-    if _logging_configured:
-        return
+    with _logging_lock:
+        if _logging_configured:
+            return
+        # ... rest of the function ...
+        _logging_configured = True
```
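The locked one-time-init pattern in isolation, as a minimal sketch (not the module's exact code):

```python
import threading

_configured = False
_lock = threading.Lock()

def setup_once() -> None:
    global _configured
    with _lock:
        if _configured:
            return  # a second caller only proceeds after the first finishes
        # ... attach handlers exactly once here ...
        _configured = True
```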
🧹 Nitpick comments (30)
ui/src/components/ProjectSetupRequired.tsx (2)
48-52: Simplify `handleRetryInitializer` by removing the redundant status update.

Setting `initializerStatus` to `'idle'` is unnecessary since `handleSpecComplete` immediately sets it to `'starting'`. Also, consider making this function async for clarity.

♻️ Suggested refactor

```diff
-const handleRetryInitializer = () => {
+const handleRetryInitializer = async () => {
   setInitializerError(null)
-  setInitializerStatus('idle')
-  handleSpecComplete('', yoloModeSelected)
+  await handleSpecComplete('', yoloModeSelected)
 }
```
161-172: Consider adding ARIA attributes for better accessibility.

The error notification should announce to screen readers. Adding `role="alert"` ensures the error message is immediately announced when it appears.

♿ Suggested accessibility improvement

```diff
 {initializerError && (
-  <div className="mt-6 p-4 bg-[var(--color-neo-danger)] text-white border-3 border-[var(--color-neo-border)]">
+  <div
+    role="alert"
+    className="mt-6 p-4 bg-[var(--color-neo-danger)] text-white border-3 border-[var(--color-neo-border)]"
+  >
     <p className="font-bold mb-2">Failed to start agent</p>
     <p className="text-sm">{initializerError}</p>
```

pyproject.toml (1)
19-28: Consider scoping DeprecationWarning filters to specific modules or tests.

Globally ignoring `DeprecationWarning` can mask important upgrade signals from dependencies. The `asyncio_default_fixture_loop_scope` option is compatible with the pinned pytest-asyncio version (0.24+, where this feature was introduced). However, applying warning filters selectively (by module or test marker) is safer and helps catch dependency upgrade issues early.

start_ui.py (1)
153-159: Guard `node_modules` checks against non-directory or unreadable paths.

`node_modules.iterdir()` will raise if the path is a file or unreadable, which aborts startup. Treat those cases as "needs install" to keep the launcher resilient.

♻️ Suggested hardening

```diff
-if not node_modules.exists():
+if not node_modules.exists() or not node_modules.is_dir():
     needs_install = True
-elif not any(node_modules.iterdir()):
-    # Treat empty node_modules as stale (failed/partial install)
-    needs_install = True
-    print("  Note: node_modules is empty, reinstalling...")
+else:
+    try:
+        if not any(node_modules.iterdir()):
+            # Treat empty node_modules as stale (failed/partial install)
+            needs_install = True
+            print("  Note: node_modules is empty, reinstalling...")
+    except OSError:
+        needs_install = True
+        print("  Note: node_modules is not accessible, reinstalling...")
```

ui/src/components/ErrorBoundary.tsx (1)
67-79: Consider adding an alert role for screen readers.

Since the fallback replaces the entire UI, a live region helps announce the error state to assistive tech.

♿ Optional tweak

```diff
-<div className="max-w-2xl w-full bg-white dark:bg-gray-800 rounded-lg shadow-lg p-6">
+<div
+  className="max-w-2xl w-full bg-white dark:bg-gray-800 rounded-lg shadow-lg p-6"
+  role="alert"
+  aria-live="assertive"
+>
```

server/utils/auth.py (1)
82-104: Reject malformed base64 explicitly.

`base64.b64decode()` without `validate=True` can silently accept junk characters. Tightening this avoids ambiguous parsing.

🔍 Tighten base64 decoding

```diff
-decoded = base64.b64decode(encoded).decode("utf-8")
+decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
 @@
-decoded = base64.b64decode(token).decode("utf-8")
+decoded = base64.b64decode(token, validate=True).decode("utf-8")
```

server/routers/settings.py (1)
174-178: Consider returning the count of cleared commands.

The `clear_denied_commands()` function returns the count of cleared commands, but this value is discarded. Returning it in the response would provide useful feedback to the caller.

Proposed enhancement

```diff
 @router.delete("/denied-commands")
 async def clear_denied_commands_list():
     """Clear the denied commands history."""
-    clear_denied_commands()
-    return {"status": "cleared"}
+    count = clear_denied_commands()
+    return {"status": "cleared", "count": count}
```

quality_gates.py (2)
39-71: Move the `time` import to module level.

The `import time` statement inside `_run_command` should be moved to the top of the file with other imports. Function-level imports add overhead on each call and can make dependencies harder to track.

Proposed fix

At the top of the file (around line 18):

```diff
 from datetime import datetime
 from pathlib import Path
 from typing import TypedDict
+import time
```

In the function (line 51):

```diff
 def _run_command(cmd: list[str], cwd: Path, timeout: int = 60) -> tuple[int, str, int]:
     ...
-    import time
     start = time.time()
```
292-353: Consider using `datetime.now(timezone.utc)` instead of `datetime.utcnow()`.

`datetime.utcnow()` is deprecated in Python 3.12+ and returns a naive datetime without timezone info. Using `datetime.now(timezone.utc)` returns a timezone-aware datetime and is the recommended approach.

Proposed fix

At the top of the file:

```diff
-from datetime import datetime
+from datetime import datetime, timezone
```

In `verify_quality` (line 350):

```diff
-"timestamp": datetime.utcnow().isoformat(),
+"timestamp": datetime.now(timezone.utc).isoformat(),
```

server/routers/projects.py (1)
641-675: Move the `datetime` import to module level.

The `from datetime import datetime` import on line 665 inside the function should be at the module level. This causes import overhead on every call and is inconsistent with Python best practices.

Proposed fix

At the top of the file with other imports:

```diff
 import re
 import shutil
 import subprocess
 import sys
+from datetime import datetime
 from pathlib import Path
```

Remove the inline import on line 665.
autonomous_agent_demo.py (1)
57-87: Extract `safe_asyncio_run` to a shared utility module.

This function is duplicated verbatim across `autonomous_agent_demo.py` (lines 57-87) and `parallel_orchestrator.py` (lines 93-123). Extract it to `server/utils/async_utils.py` (following the existing pattern of `process_utils.py`) to reduce maintenance burden and prevent future divergence.

Both files set the Windows event loop policy before imports, so ensure the policy is applied in the entry point (autonomous_agent_demo.py) before importing the new utility module.

mcp_server/feature_mcp.py (2)
48-50: Consider importing `_utc_now` from `api.models` instead of duplicating it.

This function is identical to the one defined in `api/models.py` (lines 27-29). Importing it would reduce duplication and ensure consistency.

```diff
-def _utc_now() -> datetime:
-    """Return current UTC time."""
-    return datetime.now(timezone.utc)
+from api.models import _utc_now
```
1398-1411: Consider consolidating statistics queries into a single aggregation.

The three separate COUNT queries (total, success, failure) could be combined into one query using conditional aggregation, similar to the pattern used in `feature_get_stats`:

```python
from sqlalchemy import case, func

result = session.query(
    func.count(FeatureAttempt.id).label('total'),
    func.sum(case((FeatureAttempt.outcome == "success", 1), else_=0)).label('success'),
    func.sum(case((FeatureAttempt.outcome == "failure", 1), else_=0)).label('failure')
).filter(FeatureAttempt.feature_id == feature_id).first()
```

structured_logging.py (1)
360-362: LIKE search without escaping special characters.

The `search` parameter is used directly in a LIKE query. If the search term contains `%` or `_`, they'll be interpreted as wildcards. Consider escaping these characters if literal matching is intended:

```python
if search:
    # Escape LIKE special characters for literal matching
    escaped = search.replace('\\', '\\\\').replace('%', '\\%').replace('_', '\\_')
    conditions.append("message LIKE ? ESCAPE '\\'")
    params.append(f"%{escaped}%")
```

ui/src/components/IDESelectionModal.tsx (1)
56-77: Add explicit radio semantics for the IDE options.

These buttons represent an exclusive choice; adding `role="radiogroup"` / `role="radio"` (plus `type="button"`) improves screen-reader clarity.

♿ Suggested a11y tweak

```diff
-<div className="space-y-2">
+<div className="space-y-2" role="radiogroup" aria-label="IDE selection">
   {IDE_OPTIONS.map((ide) => (
     <button
+      type="button"
+      role="radio"
+      aria-checked={selectedIDE === ide.id}
       key={ide.id}
       onClick={() => setSelectedIDE(ide.id)}
```
ui/src/App.tsx (1)

242-269: Consider user-facing feedback for IDE open/save failures.

Right now errors only hit `console.error`; a toast/snackbar would help users understand why "Open in IDE" failed.

tests/conftest.py (2)
73-88: Database session fixture lacks proper transaction isolation.

The `db_session` fixture calls `rollback()` in finally, but since `temp_db` already committed data during database creation, a simple rollback won't provide true test isolation. Tests using this fixture that commit changes will affect subsequent tests. Consider using nested transactions (savepoints) for proper isolation:

Suggested fix using savepoints

```diff
 @pytest.fixture
 def db_session(temp_db: Path):
     """Get a database session for testing.

-    Provides a session that is automatically rolled back after each test.
+    Provides a session wrapped in a transaction that is rolled back after each test.
     """
     from api.database import create_database

     _, SessionLocal = create_database(temp_db)
-    session = SessionLocal()
+    connection = SessionLocal().get_bind().connect()
+    transaction = connection.begin()
+    session = SessionLocal(bind=connection)
     try:
         yield session
     finally:
-        session.rollback()
+        transaction.rollback()
+        connection.close()
         session.close()
```
223-255: Redundant engine cache invalidation.

The `populated_db` fixture yields `temp_db` and then calls `invalidate_engine_cache(temp_db)` in cleanup. However, `temp_db` (which `populated_db` depends on) already invalidates the cache in its own cleanup. This double invalidation is harmless but unnecessary.

Remove redundant cleanup

```diff
 @pytest.fixture
 def populated_db(temp_db: Path, sample_feature_data: dict) -> Generator[Path, None, None]:
     """Create a database populated with sample features.

     Returns the project directory path.
     """
     from api.database import Feature, create_database, invalidate_engine_cache

     _, SessionLocal = create_database(temp_db)
     session = SessionLocal()
     try:
         # Add sample features
         for i in range(5):
             feature = Feature(
                 priority=i + 1,
                 category=f"category_{i % 2}",
                 name=f"Feature {i + 1}",
                 description=f"Description for feature {i + 1}",
                 steps=[f"Step {j}" for j in range(3)],
                 passes=i < 2,  # First 2 features are passing
                 in_progress=i == 2,  # Third feature is in progress
             )
             session.add(feature)
         session.commit()
     finally:
         session.close()

     yield temp_db
-
-    # Dispose cached engine to prevent file locks on Windows
-    invalidate_engine_cache(temp_db)
+    # Note: temp_db fixture handles engine cleanup
```

progress.py (1)
146-170: Inner exception handling is overly broad.

The `except sqlite3.OperationalError` at line 158 catches all operational errors, but the fallback query is specifically for a missing `in_progress` column. Other operational errors (e.g., disk I/O errors) would incorrectly trigger the fallback instead of bubbling up.

Suggested fix to check the error message

```diff
 except sqlite3.OperationalError as e:
-    # Fallback for databases without in_progress column
+    # Fallback only for databases without in_progress column
+    if "in_progress" not in str(e).lower() and "no such column" not in str(e).lower():
+        raise  # Re-raise other operational errors
     cursor.execute("""
         SELECT
             COUNT(*) as total,
```

tests/test_repository_and_config.py (2)
18-34: Consider using the `db_session` fixture for cleaner tests.

Each test method manually creates a session and wraps it in try/finally. The `db_session` fixture from `conftest.py` already handles this pattern. However, since the tests need the `populated_db` fixture rather than `temp_db`, this pattern is acceptable for now.
398-422: Tests manipulate internal module state.

The tests directly set `api.config._config = None` to reset the singleton, which couples tests to internal implementation details. If the codebase provides a public `reload_config()` function (which it does, per line 417), prefer using that consistently.

Suggested improvement

```diff
 def test_get_config_singleton(self, monkeypatch, tmp_path):
     """Test that get_config returns a singleton."""
-    # Note: get_config uses the default config loading, which reads .env
-    # This test just verifies the singleton pattern works
-    import api.config
-    api.config._config = None
-
-    from api.config import get_config
+    from api.config import get_config, reload_config
+    # Reset to ensure clean state
+    reload_config()
+
     config1 = get_config()
     config2 = get_config()

     assert config1 is config2
```

api/feature_repository.py (2)
32-34: Duplicated `_utc_now` helper function.

This helper is duplicated in `mcp_server/feature_mcp.py` (lines 47-49 per relevant snippets). Consider extracting it to a shared utilities module to maintain the DRY principle.
225-245: `mark_passing` overwrites `completed_at` even for already-passing features.

Unlike `mark_in_progress`, which guards against redundant updates (line 218), `mark_passing` will update `completed_at` even if the feature is already passing. This could lose the original completion timestamp if called multiple times.

Consider adding a guard condition

```diff
 def mark_passing(self, feature_id: int) -> Optional[Feature]:
     ...
     feature = self.get_by_id(feature_id)
-    if feature:
+    if feature and not feature.passes:
         feature.passes = True
         feature.in_progress = False
         feature.completed_at = _utc_now()
         _commit_with_retry(self.session)
         self.session.refresh(feature)
     return feature
```

server/websocket.py (2)
275-278: Fire-and-forget task without exception handling.

`asyncio.create_task()` creates a background task, but exceptions raised within `cleanup_stale_agents` will be silently lost if not handled. This could mask errors during cleanup.

🔧 Suggested improvement to handle task exceptions

```diff
 def _schedule_cleanup(self) -> None:
     """Schedule cleanup if needed (non-blocking)."""
     if self._should_cleanup():
-        asyncio.create_task(self.cleanup_stale_agents())
+        task = asyncio.create_task(self.cleanup_stale_agents())
+        task.add_done_callback(lambda t: t.exception() if not t.cancelled() else None)
```
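A self-contained demo of why the done-callback matters; without it the RuntimeError below would only surface as a "Task exception was never retrieved" warning at shutdown (names are illustrative):

```python
import asyncio
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)

def _log_task_failure(task: asyncio.Task) -> None:
    if task.cancelled():
        return
    exc = task.exception()  # retrieving it also silences the shutdown warning
    if exc is not None:
        logger.error("Background cleanup failed", exc_info=exc)

async def main() -> None:
    async def cleanup_stale_agents() -> None:
        raise RuntimeError("boom")

    task = asyncio.create_task(cleanup_stale_agents())
    task.add_done_callback(_log_task_failure)
    await asyncio.sleep(0)  # let the task run...
    await asyncio.sleep(0)  # ...and the done-callback fire

asyncio.run(main())
```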
210-212: Redundant cleanup scheduling.

`_schedule_cleanup()` is called at line 197 inside the lock when state changes, and then again at line 211 unconditionally after the lock is released. The second call will always duplicate work, since the time-based check (`_should_cleanup`) hasn't changed between lines 197 and 211.

♻️ Remove redundant call

```diff
-# Periodic cleanup of stale agents (every 5 minutes)
-self._schedule_cleanup()
 return None
```

api/logging_config.py (1)
170-176: Late import of the `os` module.

The `os` module is imported inside the function at line 171. While this works, it's unconventional and can cause confusion. Consider moving the import to the top of the file with other imports.

♻️ Move import to top of file

```diff
 import logging
+import os
 import sys
 from logging.handlers import RotatingFileHandler
 ...
 def setup_orchestrator_logging(...):
     ...
     # Log session start
-    import os
     logger.info("=" * 60)
```

tests/test_security_integration.py (2)
28-69: Code duplication with `tests/test_security.py`.

The `temporary_home` context manager is duplicated from `tests/test_security.py` (lines 33-71). Consider extracting this to a shared test utilities module to maintain DRY principles.

♻️ Extract to shared test utilities

Create a `tests/conftest.py` or `tests/test_utils.py`:

```python
# tests/test_utils.py
import os
import sys
from contextlib import contextmanager

@contextmanager
def temporary_home(home_path):
    """Context manager to temporarily set HOME (and Windows equivalents)."""
    # ... implementation ...
```

Then import in both test files:

```python
from tests.test_utils import temporary_home
```
420-424: Bare `except Exception` swallows context.

When a test raises an exception, the traceback import happens inside the except block. Consider importing `traceback` at the top of the file for cleaner code.

♻️ Move import to top of file

```diff
 import asyncio
 import os
 import sys
 import tempfile
+import traceback
 from contextlib import contextmanager
 ...
     except Exception as e:
         print(f"❌ FAIL: Test raised exception: {e}")
-        import traceback
-        traceback.print_exc()
         failed += 1
```

api/connection.py (2)
419-436: `get_db` is a generator but lacks a `@contextmanager` decorator.

The `get_db` function uses `yield` but is documented as a "dependency for FastAPI". For FastAPI dependencies, this is correct (FastAPI handles generator dependencies); however, the function cannot be used as a regular context manager without wrapping. The current implementation works correctly for FastAPI's dependency injection system, so no code change is needed, but the docstring could clarify the limitation:

```python
def get_db() -> Session:
    """
    Dependency for FastAPI to get a database session.

    Note: This is a generator dependency for FastAPI's DI system.
    For standalone usage, use the get_db_session() context manager instead.
    ...
    """
```
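The two usage styles side by side, with a stub session factory standing in for the real `SessionLocal`:

```python
from contextlib import contextmanager

class _StubSession:
    def close(self) -> None:
        print("session closed")

SessionLocal = _StubSession  # stand-in for the real sessionmaker

def get_db():
    """Generator form - what FastAPI's Depends() consumes."""
    session = SessionLocal()
    try:
        yield session
    finally:
        session.close()

# The same generator doubles as a context manager for standalone use:
get_db_session = contextmanager(get_db)

with get_db_session() as session:
    pass  # use the session; close() runs on exit
```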
43-91: Network path detection may have false negatives.

The `/proc/mounts` parsing checks if `path_str.startswith(mount_point)`, but this could fail to match if the mount point has a trailing slash or if paths aren't normalized identically. Consider using `Path` comparison for robustness.

🔧 Suggested improvement for path comparison

```diff
 # Check if path is under this mount point and if it's a network FS
-if path_str.startswith(mount_point):
+try:
+    path_resolved = Path(path_str).resolve()
+    mount_resolved = Path(mount_point).resolve()
+    if path_resolved == mount_resolved or mount_resolved in path_resolved.parents:
+        if fs_type in ("nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"):
+            return True
+except (ValueError, OSError):
+    # Path comparison failed, fall back to the string prefix check
+    if path_str.startswith(mount_point):
         if fs_type in ("nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"):
             return True
```
Reviewed excerpt from `.claude/commands/expand-project.md`:

````text
**IMPORTANT - XML Fallback:**
If the `feature_create_bulk` tool is unavailable or fails, output features in this XML format as a backup:

```xml
<features_to_create>
[
  {
    "category": "functional",
    "name": "Feature name",
    "description": "Description",
    "steps": ["Step 1", "Step 2"]
  }
]
</features_to_create>
```

The system will parse this XML and create features automatically.
````
Critical: Malformed XML/JSON hybrid format in fallback.
The XML fallback format is incorrectly structured. Lines 176-187 show XML tags (<features_to_create>) wrapping JSON array syntax ([{ ... }]), which is neither valid XML nor proper JSON. This hybrid format will likely fail to parse correctly.
If the system expects XML, the content should use XML elements:
```xml
<features_to_create>
  <feature>
    <category>functional</category>
    <name>Feature name</name>
    <description>Description</description>
    <steps>
      <step>Step 1</step>
      <step>Step 2</step>
    </steps>
  </feature>
</features_to_create>
```

If the system expects JSON wrapped in XML tags for extraction, clarify this in the documentation and explain why the XML wrapper is needed.
If JSON alone is sufficient, remove the XML tags and document it as a JSON fallback instead.
📋 Recommended fix using proper XML structure
````diff
 **IMPORTANT - XML Fallback:**
 If the `feature_create_bulk` tool is unavailable or fails, output features in this XML format as a backup:

 ```xml
 <features_to_create>
-[
-  {
-    "category": "functional",
-    "name": "Feature name",
-    "description": "Description",
-    "steps": ["Step 1", "Step 2"]
-  }
-]
+  <feature>
+    <category>functional</category>
+    <name>Feature name</name>
+    <description>Description</description>
+    <steps>
+      <step>Step 1</step>
+      <step>Step 2</step>
+    </steps>
+  </feature>
 </features_to_create>
 ```
````
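If the XML form is adopted, parsing it is straightforward with the standard library; a sketch (the real fallback parser may differ):

```python
import xml.etree.ElementTree as ET

payload = """\
<features_to_create>
  <feature>
    <category>functional</category>
    <name>Feature name</name>
    <description>Description</description>
    <steps>
      <step>Step 1</step>
      <step>Step 2</step>
    </steps>
  </feature>
</features_to_create>
"""

root = ET.fromstring(payload)
features = [
    {
        "category": f.findtext("category"),
        "name": f.findtext("name"),
        "description": f.findtext("description"),
        "steps": [s.text for s in f.find("steps").findall("step")],
    }
    for f in root.findall("feature")
]
print(features)  # one dict per <feature> element
```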
🤖 Prompt for AI Agents
In @.claude/commands/expand-project.md around lines 173-190: The fallback
currently embeds a JSON array inside an XML tag (<features_to_create>), which is
invalid; replace the hybrid with a correctly structured XML payload by
converting the JSON objects into <feature> elements (with child tags <category>,
<name>, <description>, and a <steps> container with multiple <step> entries)
inside <features_to_create>, or—if the system expects raw JSON—remove the
<features_to_create> wrapper and document the fallback as JSON; update the
documentation text to explicitly state which format the parser expects and show
the corresponding valid example using either the <features_to_create> XML
structure or a plain JSON array.
Reviewed code from `agent.py`:

```python
# IMPORTANT: Do NOT clear stuck features in parallel mode!
# The orchestrator manages feature claiming atomically.
# Clearing here causes race conditions where features are marked in_progress
# by the orchestrator but immediately cleared by the agent subprocess on startup.
#
# For single-agent mode or manual runs, clearing is still safe because
# there's only one agent at a time and it happens before claiming any features.
#
# Only clear if we're NOT in a parallel orchestrator context
# (detected by checking if this agent is a subprocess spawned by orchestrator)
try:
    import psutil
    parent_process = psutil.Process().parent()
    parent_name = parent_process.name() if parent_process else ""

    # Only clear if parent is NOT python (i.e., we're running manually, not from orchestrator)
    if "python" not in parent_name.lower():
        clear_stuck_features(project_dir)
except (ImportError, ModuleNotFoundError):
    # psutil not available - assume single-agent mode and clear
    clear_stuck_features(project_dir)
except Exception:
    # If parent process check fails, err on the safe side and clear
    clear_stuck_features(project_dir)
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, find and read agent.py around lines 182-205
fd -type f -name "agent.py" | head -5
```

Repository: getworken/autocoder-coderabbit
Length of output: 244

🏁 Script executed:

```shell
# Fix the fd command syntax
fd --type file -name "agent.py"
```

Repository: getworken/autocoder-coderabbit
Length of output: 304

🏁 Script executed:

```shell
# Simple search for agent.py
find . -name "agent.py" -type f
```

Repository: getworken/autocoder-coderabbit
Length of output: 110

🏁 Script executed:

```shell
# Read agent.py around lines 182-205
cat -n agent.py | sed -n '175,215p'
```

Repository: getworken/autocoder-coderabbit
Length of output: 2177

🏁 Script executed:

```shell
# Check if psutil is in requirements/dependencies
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" | head -10
```

Repository: getworken/autocoder-coderabbit
Length of output: 109

🏁 Script executed:

```shell
# Check requirements.txt and pyproject.toml for psutil
echo "=== requirements.txt ===" && cat requirements.txt && echo -e "\n=== pyproject.toml ===" && cat pyproject.toml
```

Repository: getworken/autocoder-coderabbit
Length of output: 1220

🏁 Script executed:

```shell
# Find clear_stuck_features function definition
rg -n "def clear_stuck_features" .
```

Repository: getworken/autocoder-coderabbit
Length of output: 56

🏁 Script executed:

```shell
# Search more broadly for clear_stuck_features
rg "clear_stuck_features" -n
```

Repository: getworken/autocoder-coderabbit
Length of output: 280

🏁 Script executed:

```shell
# Check the imports in agent.py to see where clear_stuck_features comes from
cat -n agent.py | sed -n '1,50p'
```

Repository: getworken/autocoder-coderabbit
Length of output: 1736

🏁 Script executed:

```shell
# Find and examine clear_stuck_features in progress.py
find . -name "progress.py" -type f
```

Repository: getworken/autocoder-coderabbit
Length of output: 87

🏁 Script executed:

```shell
# Read progress.py and find the clear_stuck_features function
rg -n "def clear_stuck_features" progress.py -A 30
```

Repository: getworken/autocoder-coderabbit
Length of output: 56

🏁 Script executed:

```shell
# Search for clear_stuck_features in progress.py with broader context
rg "clear_stuck_features" progress.py -A 20 -B 2
```

Repository: getworken/autocoder-coderabbit
Length of output: 56

🏁 Script executed:

```shell
# List all contents of progress.py to find the function
cat progress.py | head -200
```

Repository: getworken/autocoder-coderabbit
Length of output: 6644

🏁 Script executed:

```shell
# Search for clear_stuck_features in the entire progress.py file
wc -l progress.py && echo "---" && rg -n "clear" progress.py
```

Repository: getworken/autocoder-coderabbit
Length of output: 93

🏁 Script executed:

```shell
# Check the full progress.py file to see if clear_stuck_features is defined
cat progress.py | tail -150
```

Repository: getworken/autocoder-coderabbit
Length of output: 5709

🏁 Script executed:

```shell
# Search entire codebase for clear_stuck_features definition
rg "def clear_stuck_features" .
```

Repository: getworken/autocoder-coderabbit
Length of output: 56
Skip clearing stuck features when orchestrator detection fails.
Both the `except (ImportError, ModuleNotFoundError)` and `except Exception` blocks fall back to calling `clear_stuck_features()`, which reintroduces the parallel-mode race condition the code explicitly guards against. While `psutil` is a required dependency (so `ImportError` is unlikely), the general `Exception` handler poses a real risk: if parent process detection fails for any reason, clearing still proceeds. Safer approach: only clear when detection succeeds and confirms we're not in parallel mode; skip clearing otherwise.
🛠️ Suggested safer fallback
```diff
-try:
-    import psutil
-    parent_process = psutil.Process().parent()
-    parent_name = parent_process.name() if parent_process else ""
-
-    # Only clear if parent is NOT python (i.e., we're running manually, not from orchestrator)
-    if "python" not in parent_name.lower():
-        clear_stuck_features(project_dir)
-except (ImportError, ModuleNotFoundError):
-    # psutil not available - assume single-agent mode and clear
-    clear_stuck_features(project_dir)
-except Exception:
-    # If parent process check fails, err on the safe side and clear
-    clear_stuck_features(project_dir)
+should_clear = False
+try:
+    import psutil
+    parent_process = psutil.Process().parent()
+    parent_name = parent_process.name() if parent_process else ""
+
+    # Only clear if parent is NOT python (i.e., running manually, not from orchestrator)
+    should_clear = "python" not in parent_name.lower()
+except (ImportError, ModuleNotFoundError):
+    logger.warning("psutil not available; skipping clear_stuck_features to avoid parallel-mode races")
+except Exception:
+    logger.warning("Unable to determine parent process; skipping clear_stuck_features", exc_info=True)
+
+if should_clear:
+    clear_stuck_features(project_dir)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# IMPORTANT: Do NOT clear stuck features in parallel mode!
# The orchestrator manages feature claiming atomically.
# Clearing here causes race conditions where features are marked in_progress
# by the orchestrator but immediately cleared by the agent subprocess on startup.
#
# For single-agent mode or manual runs, clearing is still safe because
# there's only one agent at a time and it happens before claiming any features.
#
# Only clear if we're NOT in a parallel orchestrator context
# (detected by checking if this agent is a subprocess spawned by orchestrator)
should_clear = False
try:
    import psutil
    parent_process = psutil.Process().parent()
    parent_name = parent_process.name() if parent_process else ""

    # Only clear if parent is NOT python (i.e., running manually, not from orchestrator)
    should_clear = "python" not in parent_name.lower()
except (ImportError, ModuleNotFoundError):
    logger.warning("psutil not available; skipping clear_stuck_features to avoid parallel-mode races")
except Exception:
    logger.warning("Unable to determine parent process; skipping clear_stuck_features", exc_info=True)

if should_clear:
    clear_stuck_features(project_dir)
```

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
In `@agent.py` around lines 182 - 205, The current parent-process detection calls
clear_stuck_features() in both ImportError and general Exception handlers,
reintroducing the parallel-mode race; change the logic so
clear_stuck_features(project_dir) is only invoked when psutil import and parent
detection succeed and the parent_name check confirms we're not running under the
orchestrator (i.e., "python" not in parent_name.lower()); on ImportError or any
exception during detection, do NOT call clear_stuck_features — instead skip
clearing (optionally log a warning) so we avoid unsafe clearing when detection
fails; update the block around the psutil import/parent_process/name check and
the try/except handlers (references: psutil, parent_process, parent_name,
clear_stuck_features) accordingly.
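A possible follow-up beyond the heuristic: an explicit handshake between orchestrator and agent. A minimal sketch, assuming a hypothetical AUTOCODER_ORCHESTRATED variable that the orchestrator would export when spawning agent subprocesses (neither the variable nor the helper exists in this codebase):

```python
import os

def should_clear_stuck_features() -> bool:
    """Decide whether clearing stuck features on startup is safe.

    Hypothetical alternative to parent-process sniffing: the orchestrator
    sets AUTOCODER_ORCHESTRATED=1 in each agent subprocess's environment,
    so only a manual run (flag absent) ever clears.
    """
    return os.environ.get("AUTOCODER_ORCHESTRATED") != "1"
```

Unlike name sniffing, the flag is unambiguous as long as every orchestrator code path sets it, so the heuristic and its failure modes disappear entirely.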
| "CLAUDE_CODE_MAX_OUTPUT_TOKENS", # Max output tokens (default 32000, GLM 4.7 supports 131072) | ||
| ] | ||
|
|
||
| # Default max output tokens for GLM 4.7 compatibility (131k output limit) | ||
| DEFAULT_MAX_OUTPUT_TOKENS = "131072" |
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# First, let's find and explore the client.py file
find . -name "client.py" -type f | head -20
```

Repository: getworken/autocoder-coderabbit
Length of output: 85

🏁 Script executed:

```bash
#!/bin/bash
# List the structure to understand the repository
ls -la | head -30
```

Repository: getworken/autocoder-coderabbit
Length of output: 1927

🏁 Script executed:

```bash
#!/bin/bash
# Look for the specific lines mentioned in the review
find . -name "client.py" -type f -exec wc -l {} \;
```

Repository: getworken/autocoder-coderabbit
Length of output: 89
🌐 Web query:
Anthropic Claude API maximum output tokens limit specifications
💡 Result:
In the Anthropic Messages API, the max_tokens parameter sets the maximum number of output tokens Claude may generate, and the maximum allowed value depends on the model. (docs.anthropic.com)

Current Anthropic API max output limits (per response):

- Claude Sonnet 4 (`claude-sonnet-4-20250514`): 64,000 tokens (docs.anthropic.com)
- Claude Sonnet 3.7 (`claude-3-7-sonnet-20250219`): 64,000 tokens (optionally 128,000 with beta header `output-128k-2025-02-19`) (docs.anthropic.com)
- Claude Opus 4 / 4.1 (`claude-opus-4-20250514`, `claude-opus-4-1-20250805`): 32,000 tokens (docs.anthropic.com)
- Claude 3.5 Sonnet (`claude-3-5-sonnet-20241022`, `claude-3-5-sonnet-20240620`): 8,192 tokens (docs.anthropic.com)
- Claude 3.5 Haiku (`claude-3-5-haiku-20241022`): 8,192 tokens (docs.anthropic.com)
- Claude 3 Haiku (`claude-3-haiku-20240307`): 4,096 tokens (docs.anthropic.com)

Note: If you enable extended thinking, those thinking tokens count toward your max_tokens budget. (docs.anthropic.com)

Citations:
- https://docs.anthropic.com/zh-TW/api/messages?utm_source=openai
- https://docs.anthropic.com/en/docs/about-claude/models/all-models?utm_source=openai
- https://docs.anthropic.com/fr/api/messages?utm_source=openai
🌐 Web query:
GLM 4.7 maximum output tokens limit specifications
💡 Result:
GLM‑4.7’s maximum output (completion) length is 128,000 tokens (i.e., you can generate up to 128K output tokens in a single response). [1]
Notes:
- GLM‑4.7’s context length is 200K tokens, so in typical chat usage your prompt tokens + generated output tokens must fit within the 200K context window. [1]
- Some third‑party hosts/listings show a slightly different cap (e.g., “Max Tokens 131K”), which can reflect provider-specific serving limits rather than the base model spec. [2]
Sources: [1] Z.AI developer docs (GLM‑4.7 overview) • [2] SiliconFlow model card/listing for GLM‑4.7
🏁 Script executed:

```bash
cat -n client.py | sed -n '40,60p'
```

Repository: getworken/autocoder-coderabbit
Length of output: 1150

🏁 Script executed:

```bash
cat -n client.py | sed -n '295,315p'
```

Repository: getworken/autocoder-coderabbit
Length of output: 1074

🏁 Script executed:

```bash
#!/bin/bash
# Check if ANTHROPIC_BASE_URL is used elsewhere in the codebase
rg "ANTHROPIC_BASE_URL" --context 2
```

Repository: getworken/autocoder-coderabbit
Length of output: 6163

🏁 Script executed:

```bash
#!/bin/bash
# Search for where 131072 is used or documented
rg "131072|131k|131K" -i
```

Repository: getworken/autocoder-coderabbit
Length of output: 1298

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's any documentation about this value
grep -r "131" . --include="*.md" --include="*.txt" | grep -i token
```

Repository: getworken/autocoder-coderabbit
Length of output: 56
Guard the 131k default so it applies only to alternative APIs.
Setting CLAUDE_CODE_MAX_OUTPUT_TOKENS to 131,072 unconditionally will exceed Anthropic model limits (which range from 4K to 64K) and cause request failures when using the standard Anthropic API. This default is only appropriate for alternative APIs like GLM.
Modify lines 303-305 to set the default only when ANTHROPIC_BASE_URL is configured:
🔧 Proposed fix

```diff
-    # Set default max output tokens for GLM 4.7 compatibility if not already set
-    if "CLAUDE_CODE_MAX_OUTPUT_TOKENS" not in sdk_env:
-        sdk_env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"] = DEFAULT_MAX_OUTPUT_TOKENS
+    # Set default max output tokens only for alternative APIs (e.g., GLM)
+    if "CLAUDE_CODE_MAX_OUTPUT_TOKENS" not in sdk_env and os.getenv("ANTHROPIC_BASE_URL"):
+        sdk_env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"] = DEFAULT_MAX_OUTPUT_TOKENS
```

Apply the same fix to server/services/spec_chat_session.py, server/services/expand_chat_session.py, and server/services/assistant_chat_session.py (lines 303-305 in each file).
🤖 Prompt for AI Agents
In `@client.py` around lines 47 - 51, The unconditional DEFAULT_MAX_OUTPUT_TOKENS
= "131072" should be set only when an alternative Anthropic-compatible API is
configured; change the assignment to check os.getenv("ANTHROPIC_BASE_URL") (or
equivalent config) and only set DEFAULT_MAX_OUTPUT_TOKENS to "131072" when
ANTHROPIC_BASE_URL is present, otherwise set a safer default (or leave
unset/None) to avoid exceeding standard Anthropic model limits; apply this same
guarded assignment change for DEFAULT_MAX_OUTPUT_TOKENS in client.py and the
same lines in server/services/spec_chat_session.py,
server/services/expand_chat_session.py, and
server/services/assistant_chat_session.py, referencing the
DEFAULT_MAX_OUTPUT_TOKENS constant and the ANTHROPIC_BASE_URL environment/config
variable.
```diff
 # Available models with display names (Claude models)
 # To add a new model: add an entry here with {"id": "model-id", "name": "Display Name"}
-AVAILABLE_MODELS = [
+CLAUDE_MODELS = [
     {"id": "claude-opus-4-5-20251101", "name": "Claude Opus 4.5"},
     {"id": "claude-sonnet-4-5-20250929", "name": "Claude Sonnet 4.5"},
 ]

+# Common Ollama models for local inference
+OLLAMA_MODELS = [
+    {"id": "llama3.3:70b", "name": "Llama 3.3 70B"},
+    {"id": "llama3.2:latest", "name": "Llama 3.2"},
+    {"id": "codellama:34b", "name": "Code Llama 34B"},
+    {"id": "deepseek-coder:33b", "name": "DeepSeek Coder 33B"},
+    {"id": "qwen2.5:72b", "name": "Qwen 2.5 72B"},
+    {"id": "mistral:latest", "name": "Mistral"},
+]
+
+# Default to Claude models (will be overridden if Ollama is detected)
+AVAILABLE_MODELS = CLAUDE_MODELS
+
 # List of valid model IDs (derived from AVAILABLE_MODELS)
-VALID_MODELS = [m["id"] for m in AVAILABLE_MODELS]
+VALID_MODELS = [m["id"] for m in CLAUDE_MODELS]

 # Default model and settings
 DEFAULT_MODEL = "claude-opus-4-5-20251101"
+DEFAULT_OLLAMA_MODEL = "llama3.3:70b"
```
VALID_MODELS won’t reflect Ollama availability.
VALID_MODELS is derived from CLAUDE_MODELS, but AVAILABLE_MODELS is designed to switch to Ollama when detected. That means validators using VALID_MODELS will still reject Ollama IDs. Please derive VALID_MODELS from the active/combined model set, or ensure it’s updated whenever AVAILABLE_MODELS changes.
🔧 Suggested fix

```diff
-# List of valid model IDs (derived from AVAILABLE_MODELS)
-VALID_MODELS = [m["id"] for m in CLAUDE_MODELS]
+# List of valid model IDs (derived from available model catalogs)
+VALID_MODELS = [m["id"] for m in (CLAUDE_MODELS + OLLAMA_MODELS)]
```

🤖 Prompt for AI Agents
In `@registry.py` around lines 31 - 56, VALID_MODELS is currently built from
CLAUDE_MODELS and won’t include Ollama IDs when AVAILABLE_MODELS is swapped;
update the code so VALID_MODELS is derived from AVAILABLE_MODELS (or recomputed
whenever AVAILABLE_MODELS changes) instead of CLAUDE_MODELS. Locate the
constants VALID_MODELS and AVAILABLE_MODELS and change the list comprehension
that builds VALID_MODELS to use AVAILABLE_MODELS (or add a small helper/function
to refresh VALID_MODELS after detecting Ollama) so validators accept the actual
active model IDs; ensure DEFAULT_OLLAMA_MODEL and DEFAULT_MODEL remain
unchanged.
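A minimal sketch of the "refresh helper" alternative mentioned in the prompt, assuming AVAILABLE_MODELS remains a module-level list that gets reassigned on Ollama detection (the function name is illustrative):

```python
def refresh_valid_models() -> list[str]:
    """Recompute VALID_MODELS from whatever catalog is currently active.

    Call this right after AVAILABLE_MODELS is reassigned (e.g., once Ollama
    detection completes) so validators track the live catalog.
    """
    global VALID_MODELS
    VALID_MODELS = [m["id"] for m in AVAILABLE_MODELS]
    return VALID_MODELS
```

The union approach in the suggested fix is simpler but accepts Ollama IDs even when no Ollama server is reachable; the refresh approach keeps validation strict at the cost of one extra call site.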
```python
# Redact sensitive data before logging to prevent secret leakage
# Use deterministic hash for identification without exposing content
command_hash = hashlib.sha256(command.encode('utf-8')).hexdigest()[:16]
reason_hash = hashlib.sha256(reason.encode('utf-8')).hexdigest()[:16]

# Create redacted preview (first 20 + last 20 chars with mask in between)
def redact_string(s: str, max_preview: int = 20) -> str:
    if len(s) <= max_preview * 2:
        return s[:max_preview] + "..." if len(s) > max_preview else s
    return f"{s[:max_preview]}...{s[-max_preview:]}"

command_preview = redact_string(command, 20)
reason_preview = redact_string(reason, 20)

logger.info(
    f"[SECURITY] Command denied (hash: {command_hash}): {command_preview} "
    f"Reason (hash: {reason_hash}): {reason_preview}"
)
```
Avoid logging denied command previews to prevent secret leakage.
Even with hashing, Line 73+ logs partial command/reason strings. Secrets can easily appear in the first 20 characters (e.g., tokens in headers). Prefer logging only hashes/lengths or fully masked strings. Also strip trailing whitespace on blank lines in this block (CI W293 at Line 67/72/78/81).
🔒 Suggested fix

```diff
-    # Create redacted preview (first 20 + last 20 chars with mask in between)
-    def redact_string(s: str, max_preview: int = 20) -> str:
-        if len(s) <= max_preview * 2:
-            return s[:max_preview] + "..." if len(s) > max_preview else s
-        return f"{s[:max_preview]}...{s[-max_preview:]}"
-
-    command_preview = redact_string(command, 20)
-    reason_preview = redact_string(reason, 20)
-
-    logger.info(
-        f"[SECURITY] Command denied (hash: {command_hash}): {command_preview} "
-        f"Reason (hash: {reason_hash}): {reason_preview}"
-    )
+    logger.info(
+        "[SECURITY] Command denied (hash: %s, len: %d). Reason hash: %s, len: %d",
+        command_hash, len(command), reason_hash, len(reason)
+    )
```

📝 Committable suggestion
```python
# Redact sensitive data before logging to prevent secret leakage
# Use deterministic hash for identification without exposing content
command_hash = hashlib.sha256(command.encode('utf-8')).hexdigest()[:16]
reason_hash = hashlib.sha256(reason.encode('utf-8')).hexdigest()[:16]

logger.info(
    "[SECURITY] Command denied (hash: %s, len: %d). Reason hash: %s, len: %d",
    command_hash, len(command), reason_hash, len(reason)
)
```
🧰 Tools
🪛 GitHub Actions: CI
[error] 72-72: W293: Blank line contains whitespace
[error] 78-78: W293: Blank line contains whitespace
[error] 81-81: W293: Blank line contains whitespace
🤖 Prompt for AI Agents
In `@security.py` around lines 68 - 85, Replace the partial previews in the
logging block so no parts of command/reason are logged: stop calling
redact_string for previews and instead log only the deterministic hashes
(command_hash, reason_hash) and the lengths (len(command), len(reason)) or a
fixed masked string; update the logger.info call to include only those
hashes/lengths or fully masked placeholders and remove any trailing whitespace
on blank lines in this block (affecting the lines around
command_hash/reason_hash, redact_string, and logger.info) to fix CI W293.
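A regression test can pin the fix down by capturing log output and asserting no payload text leaks. A sketch with pytest's caplog fixture; log_denied_command is a stand-in name for whatever function in security.py emits the [SECURITY] line:

```python
import hashlib
import logging

def test_denied_command_log_leaks_nothing(caplog):
    secret = "curl -H 'Authorization: Bearer sk-live-abc123'"
    with caplog.at_level(logging.INFO):
        log_denied_command(secret, "matched deny rule")  # hypothetical entry point
    logged = " ".join(record.getMessage() for record in caplog.records)
    assert "sk-live-abc123" not in logged  # no secret material
    assert secret not in logged            # no raw command either
    # The deterministic hash should still be present for correlation.
    assert hashlib.sha256(secret.encode("utf-8")).hexdigest()[:16] in logged
```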
| @router.post("/{name}/knowledge", response_model=KnowledgeFileContent) | ||
| async def upload_knowledge_file(name: str, file: KnowledgeFileUpload): | ||
| """Upload a knowledge file to a project.""" | ||
| _init_imports() | ||
| _, _, get_project_path, _, _ = _get_registry_functions() | ||
|
|
||
| name = validate_project_name(name) | ||
| project_dir = get_project_path(name) | ||
|
|
||
| if not project_dir: | ||
| raise HTTPException(status_code=404, detail=f"Project '{name}' not found") | ||
|
|
||
| if not project_dir.exists(): | ||
| raise HTTPException(status_code=404, detail="Project directory not found") | ||
|
|
||
| knowledge_dir = get_knowledge_dir(project_dir) | ||
| knowledge_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| filepath = knowledge_dir / file.filename | ||
|
|
||
| try: | ||
| filepath.write_text(file.content, encoding="utf-8") | ||
| return KnowledgeFileContent(name=file.filename, content=file.content) | ||
| except Exception as e: | ||
| raise HTTPException(status_code=500, detail=f"Failed to write file: {e}") |
Missing filename validation in upload endpoint.
The upload_knowledge_file endpoint doesn't validate file.filename before using it to construct the file path. While the KnowledgeFileUpload schema may have validation, the endpoint should defensively validate the filename to prevent path traversal via specially crafted filenames.
Proposed fix
`@router.post`("/{name}/knowledge", response_model=KnowledgeFileContent)
async def upload_knowledge_file(name: str, file: KnowledgeFileUpload):
"""Upload a knowledge file to a project."""
_init_imports()
_, _, get_project_path, _, _ = _get_registry_functions()
name = validate_project_name(name)
project_dir = get_project_path(name)
if not project_dir:
raise HTTPException(status_code=404, detail=f"Project '{name}' not found")
if not project_dir.exists():
raise HTTPException(status_code=404, detail="Project directory not found")
+ # Validate filename (prevent path traversal)
+ if not re.match(r'^[a-zA-Z0-9_\-\.]+\.md$', file.filename):
+ raise HTTPException(status_code=400, detail="Invalid filename. Must be alphanumeric with .md extension")
+
knowledge_dir = get_knowledge_dir(project_dir)
knowledge_dir.mkdir(parents=True, exist_ok=True)
filepath = knowledge_dir / file.filename🤖 Prompt for AI Agents
In `@server/routers/projects.py` around lines 710 - 734, The upload_knowledge_file
endpoint uses file.filename directly to build filepath and is vulnerable to path
traversal; validate and sanitize the filename before writing: ensure filename is
non-empty, contains no path separators or "..", and is not absolute (you can get
a safe name via pathlib.Path(file.filename).name or os.path.basename and compare
to the original to detect tampering), reject names with null bytes or dangerous
characters, then construct filepath = knowledge_dir / safe_name and verify
filepath.resolve().is_relative_to(knowledge_dir.resolve()) (or compare parents)
before writing; on invalid names return HTTPException(status_code=400) and keep
the rest of upload_knowledge_file, KnowledgeFileUpload, get_knowledge_dir
references intact.
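Beyond the regex allowlist, a resolve-and-contain check is a useful belt-and-braces layer; a minimal sketch (the helper name is illustrative; Path.is_relative_to requires Python 3.9+):

```python
from pathlib import Path
from typing import Optional

def safe_knowledge_path(knowledge_dir: Path, filename: str) -> Optional[Path]:
    """Return the resolved target only if `filename` stays inside knowledge_dir."""
    # Reject anything that is not a bare file name (catches "../x", "/abs", "a/b").
    if not filename or Path(filename).name != filename:
        return None
    target = (knowledge_dir / filename).resolve()
    if not target.is_relative_to(knowledge_dir.resolve()):
        return None
    return target
```

The resolve() comparison also catches cases the regex cannot express, such as symlinked entries already present under the knowledge directory.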
```python
# Detect successful feature_create_bulk tool calls
# Handle both ToolResult and ToolResultBlock naming conventions
elif block_type in ("ToolResultBlock", "ToolResult"):
    # Try to get tool name from tool_use_id correlation or direct attribute
    tool_use_id = getattr(block, "tool_use_id", None)
    tool_name = tool_use_map.get(tool_use_id, "") or getattr(block, "tool_name", "")
    if "feature_create_bulk" in tool_name:
        mcp_tool_succeeded = True
        logger.info("Detected successful feature_create_bulk MCP tool call")

        # Extract created features from tool result
        tool_content = getattr(block, "content", [])
        if tool_content:
            for content_block in tool_content:
                if hasattr(content_block, "text"):
                    try:
                        result_data = json.loads(content_block.text)
                        created_features = result_data.get("created_features", [])

                        if created_features:
                            self.features_created += len(created_features)
                            # Safely extract feature IDs, filtering out any without valid IDs
                            self.created_feature_ids.extend(
                                [f.get("id") for f in created_features if f.get("id") is not None]
                            )

                            yield {
                                "type": "features_created",
                                "count": len(created_features),
                                "features": created_features,
                                "source": "mcp"  # Tag source for debugging
                            }

                            logger.info(f"Created {len(created_features)} features for {self.project_name} (via MCP)")
                    except (json.JSONDecodeError, AttributeError) as e:
                        logger.warning(f"Failed to parse MCP tool result: {e}")

# Only parse XML if MCP tool wasn't used (fallback mechanism)
if not mcp_tool_succeeded:
```
🧩 Analysis chain
🏁 Script executed:

```bash
cat -n server/services/expand_chat_session.py | sed -n '370,430p'
```

Repository: getworken/autocoder-coderabbit
Length of output: 4183

🏁 Script executed:

```bash
head -50 server/services/expand_chat_session.py
```

Repository: getworken/autocoder-coderabbit
Length of output: 1417

🏁 Script executed:

```bash
rg -A 3 -B 3 "is_error|ToolResult.*error" server/services/ 2>/dev/null | head -50
```

Repository: getworken/autocoder-coderabbit
Length of output: 991

🏁 Script executed:

```bash
rg "getattr.*error" server/services/spec_chat_session.py -B 2 -A 2
```

Repository: getworken/autocoder-coderabbit
Length of output: 579

🏁 Script executed:

```bash
rg -B 5 "mcp_tool_succeeded" server/services/expand_chat_session.py | head -40
```

Repository: getworken/autocoder-coderabbit
Length of output: 1180
Gate MCP success on error checking and actual feature extraction to preserve XML fallback.
mcp_tool_succeeded is set to True immediately upon seeing a feature_create_bulk ToolResult block (line 386), before validating that the result is not an error. If the ToolResult has is_error=True or content parsing fails, XML parsing is skipped and features won't be created via either path.
Move the mcp_tool_succeeded = True assignment to occur only after successfully extracting created_features, and check is_error before processing. This preserves the XML fallback mechanism when the MCP tool fails or returns an error response.
Suggested fix pattern

```diff
 elif block_type in ("ToolResultBlock", "ToolResult"):
     # Try to get tool name from tool_use_id correlation or direct attribute
     tool_use_id = getattr(block, "tool_use_id", None)
     tool_name = tool_use_map.get(tool_use_id, "") or getattr(block, "tool_name", "")
-    if "feature_create_bulk" in tool_name:
-        mcp_tool_succeeded = True
-        logger.info("Detected successful feature_create_bulk MCP tool call")
+    is_error = getattr(block, "is_error", False) or bool(getattr(block, "error", None))
+    if "feature_create_bulk" in tool_name and not is_error:
+        logger.info("Detected feature_create_bulk MCP tool call")

         # Extract created features from tool result
         tool_content = getattr(block, "content", [])
         if tool_content:
+            parsed_any = False
             for content_block in tool_content:
                 if hasattr(content_block, "text"):
                     try:
                         result_data = json.loads(content_block.text)
+                        if result_data.get("error"):
+                            continue
                         created_features = result_data.get("created_features", [])

                         if created_features:
+                            parsed_any = True
                             self.features_created += len(created_features)
                             # Safely extract feature IDs, filtering out any without valid IDs
                             self.created_feature_ids.extend(
                                 [f.get("id") for f in created_features if f.get("id") is not None]
                             )

                             yield {
                                 "type": "features_created",
                                 "count": len(created_features),
                                 "features": created_features,
                                 "source": "mcp"  # Tag source for debugging
                             }

                             logger.info(f"Created {len(created_features)} features for {self.project_name} (via MCP)")
                     except (json.JSONDecodeError, AttributeError) as e:
                         logger.warning(f"Failed to parse MCP tool result: {e}")
+            if parsed_any:
+                mcp_tool_succeeded = True
```

🤖 Prompt for AI Agents
In `@server/services/expand_chat_session.py` around lines 379 - 417, The code sets
mcp_tool_succeeded = True as soon as a ToolResult/ToolResultBlock with
"feature_create_bulk" is seen, preventing the XML fallback even when the tool
errored or parsing failed; change the logic in the ToolResult/ToolResultBlock
branch to first check getattr(block, "is_error", False) and skip processing if
true, then attempt JSON parsing of content and extract created_features, and
only after successfully extracting non-empty created_features (and appending ids
to self.created_feature_ids and incrementing self.features_created) set
mcp_tool_succeeded = True; ensure any JSONDecodeError/AttributeError keeps
mcp_tool_succeeded False so the subsequent XML fallback runs.
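If the same gate is needed in spec_chat_session.py as well, a small shared helper keeps the two in sync; a sketch, with the attribute names assumed from the getattr probing above:

```python
def tool_result_failed(block) -> bool:
    """Best-effort probe for an errored tool result across SDK block variants.

    The attribute names (is_error, error) mirror the getattr probing the
    surrounding code already uses; a block exposing neither is treated as
    successful.
    """
    return bool(getattr(block, "is_error", False)) or getattr(block, "error", None) is not None
```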
```python
if not self.lock_file.exists():
    return

try:
    # Read lock file to verify it's ours
    lock_content = self.lock_file.read_text().strip()

    # Check if we own this lock
    our_pid = self.pid
    if our_pid is None:
        # We don't have a running process, but lock exists
        # This is unexpected - remove it anyway
        self.lock_file.unlink(missing_ok=True)
        logger.debug("Removed orphaned lock file (no running process)")
        return

    # Parse lock content
    if ":" in lock_content:
        lock_pid_str, _ = lock_content.split(":", 1)
        lock_pid = int(lock_pid_str)
    else:
        lock_pid = int(lock_content)

    # If lock PID matches our process, remove it
    if lock_pid == our_pid:
        self.lock_file.unlink(missing_ok=True)
        logger.debug(f"Removed lock file for our process (PID {our_pid})")
    else:
```
Avoid removing a valid lock when no local process handle exists.
If the server restarts (or the manager loses its process handle), self.pid is None and the lock is removed unconditionally. That can drop a lock for a still‑running agent and allow duplicate processes. Consider verifying the lock PID’s liveness/command before removal.
🛠️ Proposed fix

```diff
-    # Check if we own this lock
-    our_pid = self.pid
-    if our_pid is None:
-        # We don't have a running process, but lock exists
-        # This is unexpected - remove it anyway
-        self.lock_file.unlink(missing_ok=True)
-        logger.debug("Removed orphaned lock file (no running process)")
-        return
-
-    # Parse lock content
-    if ":" in lock_content:
-        lock_pid_str, _ = lock_content.split(":", 1)
-        lock_pid = int(lock_pid_str)
-    else:
-        lock_pid = int(lock_content)
+    # Parse lock content
+    if ":" in lock_content:
+        lock_pid_str, _ = lock_content.split(":", 1)
+        lock_pid = int(lock_pid_str)
+    else:
+        lock_pid = int(lock_content)
+
+    # Check if we own this lock
+    our_pid = self.pid
+    if our_pid is None:
+        # No local process handle — only remove if the lock PID is stale
+        if psutil.pid_exists(lock_pid):
+            try:
+                proc = psutil.Process(lock_pid)
+                cmdline = " ".join(proc.cmdline())
+                if "autonomous_agent_demo.py" in cmdline:
+                    logger.debug(
+                        "Lock belongs to a running agent (PID %s); leaving intact",
+                        lock_pid,
+                    )
+                    return
+            except (psutil.NoSuchProcess, psutil.AccessDenied):
+                pass
+        self.lock_file.unlink(missing_ok=True)
+        logger.debug("Removed orphaned lock file (no local process)")
+        return
```

🤖 Prompt for AI Agents
In `@server/services/process_manager.py` around lines 241 - 268, The current logic
unconditionally deletes the lock when self.pid is None, which can remove a valid
lock owned by a still-running process; instead, when our_pid is None, read and
parse the lock file into lock_pid (as done below) and then verify liveness of
lock_pid (e.g., os.kill(lock_pid, 0) on Unix or psutil.pid_exists/psutil.Process
to check alive and optionally inspect cmdline) before removing it; only call
self.lock_file.unlink(...) if the lock_pid is not alive or the process
command/identity does not match the expected agent, otherwise leave the lock in
place and log a warning.
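For the liveness probe the prompt suggests, os.kill(pid, 0) works without psutil on POSIX (on Windows, psutil.pid_exists is the safer choice); a minimal sketch:

```python
import os

def pid_alive(pid: int) -> bool:
    """Probe process liveness without psutil.

    Signal 0 performs the existence and permission checks but delivers nothing.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the lock is stale
    except PermissionError:
        return True   # exists, but owned by another user
    return True
```

Note this cannot distinguish a recycled PID from the original agent, which is why the proposed fix also inspects the command line.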
```python
def check_hook(command: str, should_block: bool) -> bool:
    """Check a single command against the security hook (helper function)."""
    input_data = {"tool_name": "Bash", "tool_input": {"command": command}}
    result = asyncio.run(bash_security_hook(input_data))
    was_blocked = result.get("decision") == "block"

    if was_blocked == should_block:
        status = "PASS"
    else:
        status = "FAIL"
        expected = "blocked" if should_block else "allowed"
        actual = "blocked" if was_blocked else "allowed"
        reason = result.get("reason", "")
        print(f"  {status}: {command!r}")
        print(f"    Expected: {expected}, Got: {actual}")
        if reason:
            print(f"    Reason: {reason}")
        return False

    print(f"  {status}: {command!r}")
    return True


def test_extract_commands():
    """Test the command extraction logic."""
    print("\nTesting command extraction:\n")
    passed = 0
    failed = 0

    test_cases = [
        ("ls -la", ["ls"]),
        ("npm install && npm run build", ["npm", "npm"]),
        ("cat file.txt | grep pattern", ["cat", "grep"]),
        ("/usr/bin/node script.js", ["node"]),
        ("VAR=value ls", ["ls"]),
        ("git status || git init", ["git", "git"]),
    ]

    for cmd, expected in test_cases:
        result = extract_commands(cmd)
        if result == expected:
            print(f"  PASS: {cmd!r} -> {result}")
            passed += 1
        else:
            print(f"  FAIL: {cmd!r}")
            print(f"    Expected: {expected}, Got: {result}")
            failed += 1

    return passed, failed
```
Pytest will treat these tests as always passing.
The test_* functions only print and return counts; pytest ignores return values, so failures won’t fail CI. Add assertions (or raise) on failed to enforce the checks.
✅ Make pytest fail on test failures

```diff
 def check_hook(command: str, should_block: bool) -> bool:
 @@
     print(f"  {status}: {command!r}")
     return True

+def _assert_no_failures(passed: int, failed: int, label: str) -> None:
+    assert failed == 0, f"{label}: {failed} failure(s)"
+
 def test_extract_commands():
 @@
-    return passed, failed
+    _assert_no_failures(passed, failed, "extract_commands")
+    return passed, failed
```

Apply the same pattern to the rest of the test_* functions.
🧰 Tools
🪛 GitHub Actions: CI
[error] 78-78: W293: Blank line contains whitespace
[error] 81-81: W293: Blank line contains whitespace
🤖 Prompt for AI Agents
In `@tests/test_security.py` around lines 75 - 123, Tests like
test_extract_commands and helpers like check_hook currently only print and
return values so pytest treats them as passing; update each test_* function
(e.g., test_extract_commands and any other test_* in this file) to assert on
failures instead of returning counts by adding an assertion such as assert
failed == 0 (or assert passed == expected_count) at the end, and update
check_hook to raise an AssertionError or return a boolean used by an assert in
callers (e.g., assert check_hook(cmd, should_block) is True) so any mismatch
fails the test run; ensure each test uses pytest-style assertions rather than
print+return.
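As an alternative to asserting on accumulated counters, pytest.mark.parametrize reports each case independently; a sketch over the same test table (the import path for extract_commands is assumed):

```python
import pytest
from security import extract_commands  # import path assumed

@pytest.mark.parametrize(
    "cmd, expected",
    [
        ("ls -la", ["ls"]),
        ("npm install && npm run build", ["npm", "npm"]),
        ("cat file.txt | grep pattern", ["cat", "grep"]),
        ("/usr/bin/node script.js", ["node"]),
        ("VAR=value ls", ["ls"]),
        ("git status || git init", ["git", "git"]),
    ],
)
def test_extract_commands_param(cmd, expected):
    # Each case fails independently with a readable diff.
    assert extract_commands(cmd) == expected
```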
Summary by CodeRabbit
New Features
Improvements