
fix(discovery): Allow starting discovery when project is running but discovery idle#181

Merged

frankbria merged 2 commits into main from fix/start-discovery on Jan 4, 2026

Conversation

@frankbria frankbria (Owner) commented Jan 4, 2026

Summary

  • Fixes issue where "Start Discovery" button was blocked when project status was "running" but discovery hadn't actually started
  • The /start endpoint now checks discovery state via LeadAgent.get_discovery_status() before responding
  • If discovery is "idle", starts discovery (202 Accepted)
  • If discovery is "discovering" or "completed", returns appropriate status (200 OK)

Root Cause

When a project is created, it's set to status "running" but discovery may not have started (state: "idle"). The /start endpoint was returning "already running" without checking discovery state, blocking the Start Discovery button.
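The decision logic described above can be sketched as a small pure function. This is an illustration only: the function name and return shape are hypothetical stand-ins for the actual endpoint in codeframe/ui/routers/agents.py.

```python
# Illustrative sketch of the /start decision logic (names hypothetical).
# Maps project status + discovery state to (HTTP status code, action).

def resolve_start_action(project_status: str, discovery_state: str) -> tuple[int, str]:
    """Decide what /start should do, mirroring the fix described above."""
    if project_status != "running":
        return 202, "start_agent"          # normal start path
    if discovery_state == "idle":
        return 202, "start_discovery"      # the fix: "running" no longer blocks
    if discovery_state == "discovering":
        return 200, "already_running"
    if discovery_state == "completed":
        return 200, "completed"
    return 202, "start_discovery"          # error fallback: try to start anyway

print(resolve_start_action("running", "idle"))         # (202, 'start_discovery')
print(resolve_start_action("running", "discovering"))  # (200, 'already_running')
```

The fallback branch matches the commit message: on a status-check error, the endpoint tries to start discovery rather than blocking.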

Changes

File Changes

  • codeframe/ui/routers/agents.py: Added discovery state checking logic before returning "already running"
  • tests/agents/test_agent_lifecycle.py: Added 4 new tests, updated 1 existing test
  • web-ui/.../DiscoveryProgress.test.tsx: Added test for "already running" response
  • tests/e2e/test_start_agent_flow.spec.ts: Added 2 E2E tests for discovery workflow

Test plan

  • Backend unit tests: 22/22 agent lifecycle tests pass
  • Frontend unit tests: 1316/1316 tests pass
  • New tests cover all discovery state scenarios (idle, discovering, completed, error fallback)
  • Manual test: Create project, verify Start Discovery button works if discovery is idle
  • E2E test: Run npx playwright test test_start_agent_flow.spec.ts

Summary by CodeRabbit

  • New Features
    • Start-project flow now respects discovery state: automatically initiates discovery when idle, reports in-progress status while discovering, and confirms completion when finished.
    • Background discovery start and clearer, contextual messages for each discovery state.
    • Robust fallback: if status checks fail, discovery is still initiated and users receive a starting message.
    • Previous behavior remains when discovery integration is not configured.


fix(discovery): Allow starting discovery when project is running but discovery idle

When a project is created, it's set to status "running" but discovery
may not have started (state: "idle"). Previously, the /start endpoint
returned "already running" without checking discovery state, blocking
the "Start Discovery" button.

Changes:
- agents.py: Check discovery state via LeadAgent when project is running
  - If discovery is "idle": start discovery (202 Accepted)
  - If discovery is "discovering": return "already running" (200 OK)
  - If discovery is "completed": return appropriate status (200 OK)
  - On status check error: fallback to starting discovery

- Updated unit tests with 4 new test cases for discovery state handling
- Updated existing test to mock discovery state correctly
- Added frontend test for "already running" response handling
- Added E2E tests for discovery start workflow
@coderabbitai coderabbitai bot (Contributor) commented Jan 4, 2026

Walkthrough

Adds discovery-state awareness to the project start endpoint: when a project is RUNNING and an API key is set, the router queries LeadAgent for discovery status (idle / discovering / completed / error) and either starts discovery in the background or returns an appropriate 200/202 response.

Changes

Cohort / File(s) Summary
Backend Route Handler
codeframe/ui/routers/agents.py
Imports LeadAgent and updates start_project_agent to check discovery state when project status is RUNNING and an API key is configured; starts discovery in the background if idle or on status-check failure, otherwise returns 200 for discovering/completed.
Backend Unit Tests
tests/agents/test_agent_lifecycle.py
Extends existing test to mock LeadAgent.get_discovery_status and adds TestStartEndpointDiscoveryState with tests for idle (expect 202/start), discovering (200/already running), completed (200), and status-check failure (202/fallback). Uses monkeypatch for ANTHROPIC_API_KEY and mocks for LeadAgent and start_agent.
E2E Tests
tests/e2e/test_start_agent_flow.spec.ts
Adds two end-to-end cases: start discovery when project is running but discovery idle, and handle discovery already in progress gracefully; validates UI flows and discovery progress visibility.
UI Component Tests
web-ui/src/components/__tests__/DiscoveryProgress.test.tsx
Adds "should handle 'already running' response gracefully" test that simulates idle→discovering progression, mocks startProject to return running, asserts no error and that discovery progress is refreshed.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Router as Agent Router
    participant LeadAgent
    participant BG as Background Task

    Client->>Router: POST /projects/:id/start
    Router->>Router: Fetch project & check status
    Router->>Router: Check API key configured?
    alt API key configured
        Router->>LeadAgent: get_discovery_status(project_id)
        alt discovery == idle
            LeadAgent-->>Router: idle
            Router->>BG: start_agent(project_id) (async)
            Router-->>Client: 202 Accepted (starting discovery)
        else discovery == discovering
            LeadAgent-->>Router: discovering
            Router-->>Client: 200 OK (discovery in progress)
        else discovery == completed
            LeadAgent-->>Router: completed
            Router-->>Client: 200 OK (project running)
        else status check error
            LeadAgent-->>Router: error/exception
            Router->>BG: start_agent(project_id) (fallback)
            Router-->>Client: 202 Accepted (starting discovery)
        end
    else No API key
        Router-->>Client: 200 OK (project running)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes


Poem

🐰 A curious rabbit hops to see,
States of discovery, idle to be,
If idle, I start with a quiet cheer,
If running, I watch and stay right here,
Hooray — progress hums, the path is clear! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main change: enabling discovery start when project is running but discovery is idle, which matches the core functionality added across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

@claude claude bot commented Jan 4, 2026

Code Review - PR #181

Great work identifying and fixing the discovery state synchronization issue! This is a thoughtful solution that properly checks discovery state before blocking the "Start Discovery" button. Here's my detailed review:

✅ Strengths

  1. Root Cause Analysis: Excellent identification of the issue - project status "running" doesn't guarantee discovery has started
  2. Comprehensive Testing: Strong test coverage with 4 new unit tests + 2 E2E tests covering all discovery states (idle, discovering, completed, error fallback)
  3. Graceful Error Handling: Smart fallback behavior when discovery status check fails - tries to start discovery rather than blocking
  4. Clear Documentation: Good docstring updates explaining the new behavior

🔍 Code Quality Issues

1. Resource Leak - LeadAgent Not Cleaned Up ⚠️ HIGH PRIORITY

# codeframe/ui/routers/agents.py:82-86
temp_agent = LeadAgent(project_id=project_id, db=db, api_key=api_key)
discovery_status = temp_agent.get_discovery_status()

Problem: Creating a LeadAgent instance is heavyweight (initializes provider, discovery framework, agent pool manager, etc.) but it's never cleaned up. This happens on every /start request for running projects.

Impact:

  • Memory leak: Each request creates objects that aren't garbage collected
  • Potential resource exhaustion with repeated requests
  • Unnecessary overhead for a simple status check

Recommendation: Extract discovery status directly from database instead of creating a full LeadAgent instance:

# Option A: Add a database method
discovery_state = db.get_discovery_state(project_id)

# Option B: Use existing DB discovery methods
discovery_answers = db.get_discovery_answers(project_id)
# Derive state from answer count vs required questions

The discovery state is already persisted in the database (see LeadAgent._save_discovery_state() in lead_agent.py:373-376), so you shouldn't need to instantiate a full agent just to read it.
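As a rough illustration of Option B, the discovery state could be derived from persisted answer counts without instantiating an agent. The helper name and thresholds below are hypothetical; the real database schema and state semantics may differ.

```python
# Hypothetical sketch: infer discovery state from answer counts stored in the
# database, avoiding a heavyweight LeadAgent just for a status read.

def derive_discovery_state(answered: int, required: int) -> str:
    """Map (answered, required) question counts to a discovery state string."""
    if required <= 0 or answered <= 0:
        return "idle"          # nothing answered yet (or nothing to answer)
    if answered < required:
        return "discovering"   # partially answered
    return "completed"         # all required questions answered

print(derive_discovery_state(0, 5))  # idle
print(derive_discovery_state(3, 5))  # discovering
print(derive_discovery_state(5, 5))  # completed
```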

2. Duplicate Start Requests Not Handled

# Lines 91 and 112 - both start discovery
background_tasks.add_task(start_agent, project_id, db, running_agents, api_key)

Problem: When discovery state is "idle" OR status check fails, code calls start_agent(). The comment says "start_agent handles duplicates gracefully" (line 111), but looking at shared.py:257-284, it actually raises ValueError on duplicates:

# shared.py:283-284
if existing_agent is not None:
    raise ValueError(f"Agent already running for project {project_id}")

Impact:

  • Race condition: Two simultaneous /start requests for an idle project could both pass the status check and both try to start
  • The second one will raise ValueError, crashing the background task
  • No error handling for background_tasks failures

Recommendation:

# Wrap background task with try/except or check running_agents first
if project_id not in running_agents:
    background_tasks.add_task(start_agent, project_id, db, running_agents, api_key)
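A race-safe version of that guard would hold a lock around the check-and-claim step, similar in spirit to the `_agents_lock` pattern already in shared.py. The names and the placeholder value below are illustrative, not the actual implementation.

```python
import asyncio

# Sketch of an atomic "check then claim" guard so two concurrent /start
# requests for the same idle project cannot both schedule start_agent.

_agents_lock = asyncio.Lock()
running_agents: dict[str, object] = {}

async def try_claim_start(project_id: str) -> bool:
    """Atomically claim the right to start an agent; False if already claimed."""
    async with _agents_lock:
        if project_id in running_agents:
            return False
        # Placeholder entry; start_agent would replace it with the real agent.
        running_agents[project_id] = object()
        return True

async def demo() -> list[bool]:
    # Two concurrent /start requests: exactly one should win the claim.
    return list(await asyncio.gather(try_claim_start("p1"), try_claim_start("p1")))

print(asyncio.run(demo()))  # [True, False]
```

Only the winner would call background_tasks.add_task; the loser can return the 200 "already running" response instead of crashing a background task.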

3. Inconsistent HTTP Status Codes

Lines 91 and 112 both return 202 implicitly via a bare dict return, while lines 96-99 use an explicit JSONResponse:

# Line 91 - implicit 202
return {"message": f"Starting discovery...", "status": "starting"}

# Line 112 - implicit 202  
return {"message": f"Starting discovery...", "status": "starting"}

# Compare to lines 96-99 - explicit JSONResponse
return JSONResponse(
    status_code=200,
    content={"message": f"...", "status": "running"}
)

Recommendation: Be consistent - either use JSONResponse everywhere or rely on FastAPI's default 200 status and change the endpoint's response_model to support 202.

Currently, lines 91 and 112 return 202 because FastAPI infers it from the endpoint signature, but it's not obvious. Explicit is better than implicit.

🧪 Test Coverage Concerns

4. E2E Tests Too Permissive

// test_start_agent_flow.spec.ts:141-148
const discoveryActive = page.locator('[data-testid="discovery-answer-input"]')
  .or(page.locator('text=/discovery complete/i'))
  .or(page.locator('text=/not started/i'));  // ← Accepts "not started"?

await page.waitForTimeout(1000);  // ← Hard timeout instead of state verification
await expect(discoverySection).toBeVisible();  // ← Only checks visibility

Problem: Test passes if discovery section exists, even if discovery didn't actually start. The .or() chain is too broad - "not started" should be a failure case.

Recommendation: Assert on specific success states and fail on error states:

// Should see EITHER a question input OR completion message
const discoveryStarted = page.locator('[data-testid="discovery-answer-input"]')
  .or(page.locator('text=/discovery complete/i'));

await expect(discoveryStarted).toBeVisible({ timeout: 5000 });

// Verify NO error messages
await expect(page.locator('text=/failed to start/i')).not.toBeVisible();

📝 Minor Issues

  1. API Key Check Redundant: Lines 117-123 check for API key but this same check already happens at line 78 (before the discovery state logic). If there's no API key by line 117, it means the discovery state check completed - so the fallback path (line 117-123) is unreachable dead code.

  2. Test Mocking Incomplete: In test_agent_lifecycle.py:212-219, the mock doesn't include a close() or cleanup() method for the temp LeadAgent, so the resource leak persists even in tests.

🎯 Recommendations Summary

Must Fix:

  1. Replace temp LeadAgent creation with direct database query for discovery state
  2. Add duplicate check or error handling before calling start_agent in background task

Should Fix:

  3. Make HTTP status codes explicit with JSONResponse throughout
  4. Improve E2E test assertions to verify actual discovery start, not just UI existence

Nice to Have:

  5. Remove unreachable API key check at lines 117-123
  6. Add cleanup/context manager support to LeadAgent for proper resource management


Overall Assessment

This PR correctly identifies the problem and the core logic is sound, but the implementation has a significant resource leak (issue #1) and potential race condition (issue #2) that should be addressed before merging. The fix will be straightforward once these issues are resolved.

Recommendation: Request changes to address resource leak and race condition before merging.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
codeframe/ui/routers/agents.py (2)

84-94: Potential duplicate LeadAgent instantiation.

When discovery is idle (line 90-94), a temporary LeadAgent is created to check status, then immediately start_agent is called which creates another LeadAgent instance (per the start_agent snippet in relevant code). This results in two LeadAgent instances being created for the same project.

Consider either:

  1. Reusing the temp_agent by storing it in running_agents before calling start_agent
  2. Checking if an agent already exists in running_agents before creating temp_agent
🔎 Proposed optimization to avoid duplicate agent creation
             if discovery_state == "idle":
                 # Discovery not started - proceed to start discovery
                 logger.info(f"Project {project_id} is running but discovery is idle - starting discovery")
+                # Store the temp_agent to avoid creating a duplicate in start_agent
+                from codeframe.ui.shared import shared_state
+                async with shared_state._agents_lock:
+                    if project_id not in shared_state._running_agents:
+                        shared_state._running_agents[project_id] = temp_agent
+                        running_agents[project_id] = temp_agent
                 background_tasks.add_task(start_agent, project_id, db, running_agents, api_key)
                 return {"message": f"Starting discovery for project {project_id}", "status": "starting"}

Note: This is a performance consideration. The current implementation works correctly but may be wasteful if LeadAgent initialization is non-trivial.


113-118: Reasonable fallback when API key unavailable.

When the API key is not configured, the endpoint can't instantiate a LeadAgent to check discovery state, so returning a generic "already running" message is appropriate. Consider adding a log message here for observability.

         else:
             # No API key available - can't check discovery status or start
+            logger.debug(f"Project {project_id} is running but no API key to check discovery state")
             return JSONResponse(
                 status_code=200,
                 content={"message": f"Project {project_id} is already running", "status": "running"},
             )
tests/e2e/test_start_agent_flow.spec.ts (3)

139-147: Unused locator and weak final assertion.

The discoveryActive locator (lines 139-141) is defined but never used in any assertion. The final assertion only checks that discoverySection is visible, which was already verified at line 115. This makes the test's verification incomplete.

Consider either:

  1. Removing the unused locator
  2. Adding a meaningful assertion using discoveryActive
🔎 Proposed fix to strengthen the test assertion
     // At this point, discovery should be started (either auto or via button)
     // Verify that we see either:
     // - Discovery question input
     // - Discovery in progress indicator
     // - Discovery complete message
-    const discoveryActive = page.locator('[data-testid="discovery-answer-input"]')
-      .or(page.locator('text=/discovery complete/i'))
-      .or(page.locator('text=/not started/i'));
-
-    // Give some time for the state to update
-    await page.waitForTimeout(1000);
-
-    // The test passes if discovery section is visible and properly initialized
-    await expect(discoverySection).toBeVisible();
+    // Give some time for the state to update
+    await page.waitForTimeout(1000);
+
+    // Verify discovery section shows meaningful content (not just visible)
+    const discoveryContent = page.locator('[data-testid="discovery-answer-input"]')
+      .or(page.locator('text=/discovery complete/i'))
+      .or(page.locator('text=/not started/i'))
+      .or(page.getByTestId('mock-progress-bar'));
+
+    await expect(discoverySection).toBeVisible();
+    // At least one of these states should be present
+    await expect(discoveryContent).toBeVisible({ timeout: 5000 });
   });

130-132: Consider replacing waitForTimeout with explicit element waits.

Using page.waitForTimeout(2000) introduces arbitrary delays that can cause flaky tests. Prefer waiting for specific state changes.

🔎 More robust wait alternative
-      // Wait for discovery to actually start (question appears or progress updates)
-      await page.waitForTimeout(2000);
+      // Wait for discovery to actually start (question appears or button state changes)
+      await Promise.race([
+        page.getByTestId('discovery-answer-input').waitFor({ state: 'visible', timeout: 5000 }),
+        page.locator('text=/not started/i').waitFor({ state: 'hidden', timeout: 5000 }),
+      ]).catch(() => {
+        // Either condition is acceptable - discovery may or may not have fully started
+      });

164-172: Test assertion could be more explicit.

The conditional check means the test passes whether or not an alert is visible. If you specifically want to ensure no error alert appears, consider a more direct assertion.

-    const errorAlert = page.getByRole('alert');
-    const hasError = await errorAlert.isVisible().catch(() => false);
-
-    // Should not show any error on initial load
-    if (hasError) {
-      const errorText = await errorAlert.textContent();
-      // "Failed to start discovery" is an error we want to catch
-      expect(errorText).not.toContain('Failed to start discovery');
-    }
+    // Verify no "Failed to start discovery" error is shown
+    // Using a short timeout since we expect this NOT to be visible
+    await expect(page.locator('text=/failed to start discovery/i'))
+      .not.toBeVisible({ timeout: 2000 });
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0b4437 and f472b70.

📒 Files selected for processing (4)
  • codeframe/ui/routers/agents.py
  • tests/agents/test_agent_lifecycle.py
  • tests/e2e/test_start_agent_flow.spec.ts
  • web-ui/src/components/__tests__/DiscoveryProgress.test.tsx
🧰 Additional context used
📓 Path-based instructions (5)
tests/e2e/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Files:

  • tests/e2e/test_start_agent_flow.spec.ts
web-ui/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

web-ui/src/**/*.{ts,tsx}: Use TypeScript 5.3+ for frontend development with React 18, Next.js 14, Tailwind CSS, and Hugeicons for icons
Use TypeScript strict mode and ensure 100% type safety with no hardcoded types or any usage in frontend code

Files:

  • web-ui/src/components/__tests__/DiscoveryProgress.test.tsx
web-ui/src/components/**/*.tsx

📄 CodeRabbit inference engine (CLAUDE.md)

web-ui/src/components/**/*.tsx: Use shadcn/ui Nova template components with semantic color palette (bg-card, text-foreground, etc.) and avoid hardcoded color values
Use cn() utility for conditional Tailwind CSS classes and follow Nova's compact spacing conventions

Files:

  • web-ui/src/components/__tests__/DiscoveryProgress.test.tsx
web-ui/src/**/*.tsx

📄 CodeRabbit inference engine (CLAUDE.md)

Replace all icon usage with Hugeicons (@hugeicons/react) and do not mix with lucide-react

Files:

  • web-ui/src/components/__tests__/DiscoveryProgress.test.tsx
codeframe/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

codeframe/**/*.py: Use Python 3.11+ for backend development with FastAPI, AsyncAnthropic, SQLite with async support (aiosqlite), and tiktoken for token counting
Use token counting via tiktoken library for token budget management with ~50,000 token limit per conversation
Use asyncio patterns with AsyncAnthropic for async/await in Python backend for concurrent operations
Implement quality gates with multi-stage pre-completion checks (tests → type → coverage → review) and Git + SQLite + context snapshots for project state rollback
Use tiered memory system (HOT/WARM/COLD) with importance scoring using hybrid exponential decay algorithm for context management with 30-50% token reduction
Implement session lifecycle management with auto-save/restore using file-based storage at .codeframe/session_state.json

Files:

  • codeframe/ui/routers/agents.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.845Z
Learning: Applies to web-ui/src/app/page.tsx : Implement automatic project discovery start after project creation with loading state transitions and 'Start Discovery' button for idle projects
📚 Learning: 2026-01-04T06:26:12.845Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.845Z
Learning: Applies to web-ui/src/app/page.tsx : Implement automatic project discovery start after project creation with loading state transitions and 'Start Discovery' button for idle projects

Applied to files:

  • tests/e2e/test_start_agent_flow.spec.ts
  • web-ui/src/components/__tests__/DiscoveryProgress.test.tsx
  • codeframe/ui/routers/agents.py
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4

Applied to files:

  • codeframe/ui/routers/agents.py
📚 Learning: 2025-12-17T19:21:30.131Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:30.131Z
Learning: In tests/agents/*.py, when testing bottleneck detection logic, ensure that tests exercising detect_bottlenecks are async and mock _get_agent_workload to return a value below AGENT_OVERLOAD_THRESHOLD (5) while providing a non-empty tasks list to prevent early return. This guarantees the code path for low workload is exercised and behavior under threshold is verified.

Applied to files:

  • tests/agents/test_agent_lifecycle.py
🧬 Code graph analysis (3)
web-ui/src/components/__tests__/DiscoveryProgress.test.tsx (2)
web-ui/src/types/api.ts (1)
  • DiscoveryProgressResponse (135-139)
web-ui/src/lib/api.ts (1)
  • projectsApi (30-54)
codeframe/ui/routers/agents.py (2)
codeframe/agents/lead_agent.py (2)
  • LeadAgent (28-2126)
  • get_discovery_status (393-447)
codeframe/ui/shared.py (1)
  • start_agent (257-360)
tests/agents/test_agent_lifecycle.py (3)
codeframe/core/models.py (2)
  • ProjectStatus (34-43)
  • status (246-247)
codeframe/persistence/database.py (1)
  • update_project (286-288)
codeframe/agents/lead_agent.py (1)
  • get_discovery_status (393-447)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Backend Unit Tests
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: E2E Smoke Tests (Chromium)
  • GitHub Check: claude-review
🔇 Additional comments (6)
codeframe/ui/routers/agents.py (1)

107-112: Broad exception handling with fallback is appropriate here.

The fallback to starting discovery when the status check fails is a reasonable defensive pattern. The comment correctly notes that start_agent handles duplicates gracefully. The warning log provides visibility into these edge cases.

web-ui/src/components/__tests__/DiscoveryProgress.test.tsx (1)

448-505: Well-structured test for the "already running" edge case.

The test correctly validates that:

  1. startProject is called when the button is clicked
  2. No error is shown when the backend returns a "running" status (non-error response)
  3. The component still refreshes discovery progress after the response

The comment on line 449 clearly explains the scenario being tested.

tests/agents/test_agent_lifecycle.py (4)

191-234: Updated test correctly reflects new behavior.

The test now properly mocks LeadAgent.get_discovery_status to return "discovering" state, which is the condition under which the endpoint should return 200 "already running". The docstring update clearly explains the new behavior.


609-648: Comprehensive test for the core fix scenario.

This test directly validates the fix: when project is RUNNING but discovery is idle, the endpoint should return 202 and start discovery. The assertion at line 647 verifies that start_agent was actually called.


649-689: Good coverage for "discovery already active" scenario.

The test correctly verifies that when discovery is already in progress:

  1. The endpoint returns 200 OK
  2. start_agent is NOT called (avoiding duplicate work)

736-768: Important error fallback test.

This test validates the defensive behavior: when discovery status check fails (e.g., database error), the endpoint falls back to starting discovery rather than failing. This ensures the user can still start discovery even if there's a transient error checking status.
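The fallback behavior this test exercises can be sketched with a mocked status check. This is a simplified stand-in, not the actual endpoint or test code; handle_start is a hypothetical condensation of the router logic.

```python
from unittest.mock import MagicMock

# Sketch: if the discovery status check raises, fall back to starting
# discovery (202) instead of failing the request.

def handle_start(agent, start_agent) -> tuple[int, str]:
    try:
        state = agent.get_discovery_status().get("state", "idle")
    except Exception:
        start_agent()              # defensive fallback on status-check error
        return 202, "starting"
    if state == "idle":
        start_agent()
        return 202, "starting"
    return 200, "running"

agent = MagicMock()
agent.get_discovery_status.side_effect = RuntimeError("db error")
started = []
print(handle_start(agent, lambda: started.append(True)))  # (202, 'starting')
```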

@frankbria frankbria linked an issue Jan 4, 2026 that may be closed by this pull request

Check running_agents for existing agent before creating temp_agent
for status check. This avoids creating two LeadAgent instances when
one already exists.

Addresses CodeRabbit feedback on PR #181.
@frankbria frankbria merged commit 759c232 into main Jan 4, 2026
6 of 9 checks passed
@frankbria frankbria deleted the fix/start-discovery branch January 4, 2026 09:52
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
codeframe/ui/routers/agents.py (2)

114-119: Error handling falls back gracefully.

The exception handler catches all errors when checking discovery status and falls back to starting discovery. This is a safe approach since start_agent handles duplicates gracefully. The warning log at line 115 is appropriate.

Consider enhancing the log message to include the exception type for better debugging.

🔎 Optional enhancement: Include exception type in log
-                logger.warning(f"Failed to check discovery status for project {project_id}: {e}")
+                logger.warning(f"Failed to check discovery status for project {project_id}: {type(e).__name__}: {e}")

127-130: Consider refactoring to avoid duplicate API key fetch.

The API key is fetched from the environment at both line 82 (for discovery state check) and line 127 (for starting the agent). While this doesn't cause incorrect behavior, it's slightly redundant.

Consider fetching the API key once at the beginning of the function after authorization checks.

🔎 Proposed refactor to eliminate duplicate fetch
     # Authorization check
     if not db.user_has_project_access(current_user.id, project_id):
         raise HTTPException(status_code=403, detail="Access denied")
+
+    # Get API key from environment once
+    api_key = os.environ.get("ANTHROPIC_API_KEY")

     # cf-10.2: Handle idempotent behavior - check discovery state if already running
     if project["status"] == ProjectStatus.RUNNING.value:
         # Check discovery state before returning "already running"
         # If discovery is idle, we should still start discovery
-        api_key = os.environ.get("ANTHROPIC_API_KEY")
         if api_key:
             try:
                 # First check if agent already exists in running_agents to avoid duplicate instantiation
                 existing_agent = running_agents.get(project_id)
                 if existing_agent:
                     # Reuse existing agent for status check
                     discovery_status = existing_agent.get_discovery_status()
                 else:
                     # Create temporary agent only for status check (will be replaced by start_agent if needed)
                     temp_agent = LeadAgent(project_id=project_id, db=db, api_key=api_key)
                     discovery_status = temp_agent.get_discovery_status()

                 discovery_state = discovery_status.get("state", "idle")

                 if discovery_state == "idle":
                     # Discovery not started - proceed to start discovery
                     logger.info(f"Project {project_id} is running but discovery is idle - starting discovery")
                     background_tasks.add_task(start_agent, project_id, db, running_agents, api_key)
                     return {"message": f"Starting discovery for project {project_id}", "status": "starting"}
                 elif discovery_state == "discovering":
                     # Discovery already in progress
                     return JSONResponse(
                         status_code=200,
                         content={"message": f"Project {project_id} discovery already in progress", "status": "running"},
                     )
                 elif discovery_state == "completed":
                     # Discovery completed
                     return JSONResponse(
                         status_code=200,
                         content={"message": f"Project {project_id} discovery already completed", "status": "completed"},
                     )
             except Exception as e:
                 logger.warning(f"Failed to check discovery status for project {project_id}: {e}")
                 # Fall back to normal start flow on error - try to start discovery
                 # This is safer than blocking since start_agent handles duplicates gracefully
                 background_tasks.add_task(start_agent, project_id, db, running_agents, api_key)
                 return {"message": f"Starting discovery for project {project_id}", "status": "starting"}
         else:
             # No API key available - can't check discovery status or start
             return JSONResponse(
                 status_code=200,
                 content={"message": f"Project {project_id} is already running", "status": "running"},
             )

-    # cf-10.2: Get API key from environment
-    api_key = os.environ.get("ANTHROPIC_API_KEY")
     if not api_key:
         raise HTTPException(status_code=500, detail="ANTHROPIC_API_KEY not configured")

     # cf-10.2: Start agent in background task (non-blocking)
     background_tasks.add_task(start_agent, project_id, db, running_agents, api_key)

     # cf-10.2: Return 202 Accepted immediately
     return {"message": f"Starting Lead Agent for project {project_id}", "status": "starting"}
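The branch structure above reduces to a small state-to-response mapping. Below is a minimal sketch, assuming the 202/200 codes from the PR description; the helper name and tuple return are illustrative only, since the real endpoint builds JSONResponse objects and schedules start_agent as a background task:

```python
# Illustrative mapping of discovery state to the /start endpoint's response.
# Unknown states fall back to starting, mirroring the error-fallback branch,
# which is safe because start_agent handles duplicates gracefully.
def start_response(discovery_state: str) -> tuple[int, str]:
    if discovery_state == "idle":
        return 202, "starting"      # kick off discovery
    if discovery_state == "discovering":
        return 200, "running"       # already in progress; idempotent no-op
    if discovery_state == "completed":
        return 200, "completed"     # nothing left to do
    return 202, "starting"          # fallback: let start_agent dedupe

for state in ("idle", "discovering", "completed"):
    print(state, start_response(state))
```

Repeated calls with the same state always produce the same response, which is what makes the Start Discovery button safe to click more than once.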
📥 Commits

Reviewing files that changed from the base of the PR and between f472b70 and ae55ad9.

📒 Files selected for processing (1)
  • codeframe/ui/routers/agents.py
🔇 Additional comments (5)
codeframe/ui/routers/agents.py (5)

23-23: LGTM!

The LeadAgent import is necessary for the discovery-state checking logic introduced in this PR.


48-63: LGTM!

The docstring updates accurately reflect the new discovery-state-aware behavior and all possible return codes.


95-113: Discovery state branching logic is correct.

The three-way branch (idle/discovering/completed) with appropriate HTTP status codes (202 for starting, 200 for already running/completed) correctly implements idempotent behavior. The status messages are clear and informative.


89-89: get_discovery_status() method is properly implemented on LeadAgent. The method exists at line 393 of codeframe/agents/lead_agent.py with return type Dict[str, Any] and correctly returns a dictionary with a "state" key containing values "idle", "discovering", or "completed", as used at lines 89 and 93.


85-93: The race condition is real but mitigated by design.

At line 86, running_agents.get(project_id) is read without synchronization. One request could find no agent while another concurrently registers one, leaving the first to build a redundant temporary agent anyway. However, verification confirms this is acceptable because:

  1. The temporary agent is not stored in running_agents, so it's garbage collected after the status check
  2. If start_agent is called (lines 100 or 118), it uses shared_state._agents_lock to prevent duplicate instantiation
  3. The get_discovery_status() method exists on LeadAgent and is safe to invoke

The race window is narrow, and its only consequence is one brief, unnecessary instantiation. No additional locking is required.
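The duplicate prevention described in point 2 can be sketched as a double-checked get-or-create. The asyncio.Lock, the dict shape, and the function names here are assumptions for illustration; the actual lock lives in codeframe/ui/shared.py:

```python
import asyncio

_agents_lock = asyncio.Lock()           # stand-in for shared_state._agents_lock
running_agents: dict[int, object] = {}  # project_id -> agent

async def get_or_create_agent(project_id: int, factory):
    # Fast path: no lock needed when the agent already exists.
    if project_id in running_agents:
        return running_agents[project_id]
    async with _agents_lock:
        # Re-check inside the lock so two concurrent /start calls
        # cannot both instantiate an agent for the same project.
        if project_id not in running_agents:
            running_agents[project_id] = factory()
        return running_agents[project_id]

async def demo():
    created = []

    def factory():
        created.append(1)
        return object()

    a, b = await asyncio.gather(
        get_or_create_agent(7, factory),
        get_or_create_agent(7, factory),
    )
    return a is b, len(created)

same_agent, creations = asyncio.run(demo())
print(same_agent, creations)  # True 1: one instantiation, shared result
```

The unsynchronized fast path mirrors the unguarded running_agents.get() at line 86: it can race, but the re-check inside the lock guarantees at most one instantiation is ever stored.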

Development

Successfully merging this pull request may close these issues.

[P0] Clicking 'Start Discovery' doesn't start discovery process
