Description
User Story
As a CIDX API consumer (Claude Code MCP client or REST API user),
I want all long-running operations to execute asynchronously with background job tracking,
So that my connections don't timeout and I can monitor operation progress.
Background
Code inspection revealed 2 critical MCP endpoints that are SYNCHRONOUS and violate the 30-second rule:
- add_golden_repo: Blocks for minutes to 4+ hours (with temporal indexing)
- remove_golden_repo: Blocks for 30+ seconds (large directory cleanup)
Additionally, refresh_golden_repo has BROKEN code: the MCP handler expects a job_id, but the implementation returns a Dict.
These violations cause:
- Production timeouts with Claude Code MCP integration (30-second MCP timeout)
- Poor user experience (hanging connections)
- Broken functionality (refresh_golden_repo crashes)
Current State Analysis
✅ Already Async (Working Correctly)
- activate_repository - Returns job_id, uses BackgroundJobManager ✅
- deactivate_repository - Returns job_id, uses BackgroundJobManager ✅
- sync_repository - Returns job_id, uses BackgroundJobManager ✅
🔴 Still Synchronous (VIOLATIONS)
- add_golden_repo (handlers.py:451)
  - Calls BLOCKING golden_repo_manager.add_golden_repo()
  - Waits for: git clone + cidx init + cidx index
  - Can take 4+ hours with temporal indexing
- remove_golden_repo (handlers.py:465)
  - Calls BLOCKING golden_repo_manager.remove_golden_repo()
  - Deletes large .code-indexer/ directories
  - Can take 30+ seconds for large repos
- refresh_golden_repo (handlers.py:482) - BROKEN
  - MCP handler expects: job_id = method()
  - Method returns: {"success": True, "message": "..."}
  - Crashes when accessing undefined job_id key
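The refresh_golden_repo mismatch can be reproduced in isolation. The sketch below uses stand-in functions, not the real CIDX handler or manager, and assumes the handler reads a job_id key from the returned dict; the exact failure in production depends on how the handler consumes the value.

```python
from typing import Any, Dict


def refresh_golden_repo_sync(alias: str) -> Dict[str, Any]:
    """Stand-in for the current manager method: returns a result dict."""
    return {"success": True, "message": f"Golden repository '{alias}' refreshed"}


def handler_expecting_job_id(alias: str) -> Dict[str, Any]:
    """Stand-in for a handler that assumes the result carries a job_id."""
    result = refresh_golden_repo_sync(alias)
    # Assumed access pattern: the dict has no "job_id" key, so this raises.
    return {"success": True, "job_id": result["job_id"]}


try:
    handler_expecting_job_id("my-repo")
    crashed = False
except KeyError as exc:
    crashed = True
    missing_key = exc.args[0]
```

Either side of the contract can be changed to remove the mismatch, which is what the two options under Task 1 describe.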
❓ Need Verification
- REST: DELETE /api/admin/golden-repos/{alias} - Check if uses BackgroundJobManager
- REST: PUT /api/repos/{user_alias}/branch - Check if uses BackgroundJobManager
- MCP: switch_branch - Check if uses BackgroundJobManager
Acceptance Criteria
AC1: Fix Critical MCP Violations (2 endpoints + 1 broken)
Priority 1 - CRITICAL (4+ hour operations):
- MCP: add_golden_repo → Submit to BackgroundJobManager, return job_id immediately
- MCP: remove_golden_repo → Submit to BackgroundJobManager, return job_id immediately
Priority 1 - BROKEN CODE:
- MCP: refresh_golden_repo → Fix implementation to actually return job_id OR fix handler to match return type
AC2: Verify and Document Already-Async Endpoints
- Confirm activate_repository uses BackgroundJobManager ✅
- Confirm deactivate_repository uses BackgroundJobManager ✅
- Confirm sync_repository uses BackgroundJobManager ✅
- Update issue to reflect actual state
AC3: Audit Remaining Suspicious Endpoints
- Check switch_branch - Is it async or sync?
- Check REST DELETE /api/admin/golden-repos/{alias} - Already async?
- Check REST PUT /api/repos/{user_alias}/branch - Already async?
- Document findings
AC4: Response Time Guarantee
- All refactored endpoints return in <1 second
- Background processing handles heavy work
- Job status queryable via /api/jobs/{job_id}
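The submit-and-return pattern behind AC4 can be sketched with a thread pool. SketchJobManager below is a hypothetical stand-in, not the real BackgroundJobManager: its submit_job mirrors the call shape used in this issue (operation name, callable, submitter_username, is_admin, then keyword arguments for the callable), and the status names are assumptions.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class SketchJobManager:
    """Hypothetical stand-in; not the real CIDX BackgroundJobManager."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: Dict[str, Future] = {}

    def submit_job(
        self,
        operation: str,
        func: Callable[..., Any],
        *,
        submitter_username: str,
        is_admin: bool = False,
        **kwargs: Any,
    ) -> str:
        """Queue func(**kwargs) on a worker thread and return a job_id at once."""
        job_id = str(uuid.uuid4())
        self._futures[job_id] = self._pool.submit(func, **kwargs)
        return job_id

    def get_status(self, job_id: str) -> Dict[str, Any]:
        """Report job state; the status names here are assumptions."""
        future = self._futures[job_id]
        if not future.done():
            return {"job_id": job_id, "status": "running"}
        error = future.exception()
        if error is not None:
            return {"job_id": job_id, "status": "failed", "error": str(error)}
        return {"job_id": job_id, "status": "completed", "result": future.result()}


def slow_remove(alias: str) -> Dict[str, Any]:
    """Pretend to delete a large index directory."""
    time.sleep(0.5)
    return {"success": True, "message": f"Golden repository '{alias}' removed"}


manager = SketchJobManager()
started = time.monotonic()
job_id = manager.submit_job(
    "remove_golden_repo", slow_remove, submitter_username="admin", is_admin=True, alias="my-repo"
)
elapsed = time.monotonic() - started  # time spent submitting, not running the job
```

The caller only pays for queueing the work, so the response time no longer depends on how long the clone, index, or cleanup takes.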
AC5: Testing
- Unit tests for refactored endpoints
- Integration tests: submit → poll → verify completion
- Timeout tests: verify <1s response
- Fix test: verify refresh_golden_repo doesn't crash
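A unit test for the <1s requirement might look like the sketch below. RecordingJobManager, blocking_add_golden_repo, and add_golden_repo_handler are hypothetical test doubles shaped like the Task 2 refactor; the real handler signature and response wrapper may differ.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class RecordingJobManager:
    """Test double: records submissions and never runs the job."""

    def __init__(self) -> None:
        self.submitted: List[Tuple[str, Callable[..., Any], Dict[str, Any]]] = []

    def submit_job(self, operation: str, func: Callable[..., Any], **kwargs: Any) -> str:
        self.submitted.append((operation, func, kwargs))
        return f"job-{len(self.submitted)}"


def blocking_add_golden_repo(**kwargs: Any) -> None:
    """Fails loudly if the handler ever calls the blocking work inline."""
    raise AssertionError("handler must not run the blocking call inline")


def add_golden_repo_handler(
    params: Dict[str, Any], username: str, job_manager: RecordingJobManager
) -> Dict[str, Any]:
    """Hypothetical handler following the submit-and-return shape."""
    job_id = job_manager.submit_job(
        "add_golden_repo",
        blocking_add_golden_repo,
        submitter_username=username,
        is_admin=True,
        repo_url=params["url"],
        alias=params["alias"],
        default_branch=params.get("branch", "main"),
    )
    return {
        "success": True,
        "job_id": job_id,
        "message": f"Golden repository '{params['alias']}' addition started",
    }


def test_add_golden_repo_returns_job_id_quickly() -> None:
    manager = RecordingJobManager()
    started = time.monotonic()
    response = add_golden_repo_handler(
        {"url": "https://example.com/repo.git", "alias": "my-repo"}, "admin", manager
    )
    elapsed = time.monotonic() - started

    assert elapsed < 1.0
    assert response["job_id"] == "job-1"
    operation, _func, kwargs = manager.submitted[0]
    assert operation == "add_golden_repo"
    assert kwargs["default_branch"] == "main"
```

Because the double never executes the submitted callable, the test checks only the handler's contract: it queues the work and answers immediately.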
Implementation Tasks
Task 1: Fix refresh_golden_repo (BROKEN CODE)
Option A: Make implementation async (RECOMMENDED)

    # In golden_repo_manager.py
    def refresh_golden_repo(self, alias: str) -> str:
        """Returns job_id for background refresh."""
        if alias not in self.golden_repos:
            raise GoldenRepoError(f"Golden repository '{alias}' not found")
        # Submit to BackgroundJobManager. This requires the manager to have
        # access to a BackgroundJobManager instance; if it has none, submit
        # from the handler via app.background_job_manager instead.
        job_id = background_job_manager.submit_job(
            "refresh_golden_repo",
            self._do_refresh_golden_repo,
            submitter_username="admin",
            is_admin=True,
            alias=alias
        )
        return job_id

    def _do_refresh_golden_repo(self, alias: str) -> Dict[str, Any]:
        """Background worker for refresh (current sync code moves here)."""
        # ... existing lines 908-954 move here ...

Option B: Fix handler to match sync return (NOT RECOMMENDED)

    # In handlers.py:482
    result = app.golden_repo_manager.refresh_golden_repo(alias)
    return _mcp_response({
        "success": result["success"],
        "message": result["message"]
    })

Task 2: Refactor add_golden_repo to Async
Current (handlers.py:451):

    result = app.golden_repo_manager.add_golden_repo(...)  # BLOCKS
    return _mcp_response({"success": True, "message": result["message"]})

Refactor to:

    # Check if golden_repo_manager has background_job_manager reference
    # If not, use app.background_job_manager
    job_id = app.background_job_manager.submit_job(
        "add_golden_repo",
        app.golden_repo_manager.add_golden_repo,
        submitter_username=user.username,
        is_admin=True,
        repo_url=params["url"],
        alias=params["alias"],
        default_branch=params.get("branch", "main")
    )
    return _mcp_response({
        "success": True,
        "job_id": job_id,
        "message": f"Golden repository '{params['alias']}' addition started"
    })

Task 3: Refactor remove_golden_repo to Async
Current (handlers.py:465):

    app.golden_repo_manager.remove_golden_repo(alias)  # BLOCKS
    return _mcp_response({"success": True, "message": "..."})

Refactor to:

    job_id = app.background_job_manager.submit_job(
        "remove_golden_repo",
        app.golden_repo_manager.remove_golden_repo,
        submitter_username=user.username,
        is_admin=True,
        alias=alias
    )
    return _mcp_response({
        "success": True,
        "job_id": job_id,
        "message": f"Golden repository '{alias}' removal started"
    })

Task 4: Verify Other Endpoints
- Check if switch_branch is already async (line search in handlers.py)
- Check REST endpoints call async or sync versions
- Update issue with accurate count of violations
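The line search could be partly automated. The sketch below parses handler source text and flags which top-level functions call some .submit_job(...) directly. The SAMPLE source and its function names are made up for illustration and say nothing about the real handlers.py.

```python
import ast
from typing import Dict


def audit_handlers(source: str) -> Dict[str, bool]:
    """Map each top-level function name to whether it calls *.submit_job(...)."""
    report: Dict[str, bool] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            report[node.name] = any(
                isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and call.func.attr == "submit_job"
                for call in ast.walk(node)
            )
    return report


# Made-up source text; only parsed, never executed.
SAMPLE = '''
def sync_style_handler(params, user):
    return app.some_manager.do_work(params["alias"])

def async_style_handler(params, user):
    job_id = app.background_job_manager.submit_job("do_work", do_work)
    return {"job_id": job_id}
'''

report = audit_handlers(SAMPLE)
```

A False result is only a hint: a handler that delegates to a manager method which submits the job internally would also be reported as False, so flagged handlers still need a manual look.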
Testing Requirements
- Fix Verification:
  - refresh_golden_repo no longer crashes
  - add_golden_repo returns in <1s with job_id
  - remove_golden_repo returns in <1s with job_id
- End-to-End:
  - Submit add_golden_repo with temporal indexing
  - Poll job status every 5 seconds
  - Verify completes successfully after hours
  - Verify repository ready for activation
- Timeout Prevention:
  - Call add_golden_repo via MCP
  - Verify MCP doesn't timeout after 30s
  - Verify can query job status while indexing
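The submit-then-poll flow in the end-to-end test can be sketched as a small helper. The fetch_status callable stands in for whatever queries /api/jobs/{job_id}; the "status" field and its "running", "completed", and "failed" values are assumptions, since the job status payload is not specified in this issue.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 6 * 60 * 60,
    sleep: Callable[[float], Any] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        # Assumed terminal states; adjust to the real job status values.
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")
        sleep(interval)


# Demo with canned responses and a recorded sleep, so nothing actually waits.
responses = iter(
    [{"status": "running"}, {"status": "running"}, {"status": "completed"}]
)
sleeps = []
final = wait_for_job(lambda _job_id: next(responses), "job-1", sleep=sleeps.append)
```

Injecting the sleep function keeps the helper testable without real delays, while a real end-to-end run would use the default time.sleep and the 5-second interval.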
Success Metrics
- ✅ Zero MCP timeout errors in production
- ✅ All MCP endpoints return in <1 second
- ✅ refresh_golden_repo doesn't crash
- ✅ Can add 4+ hour temporal repos without timeout
Estimated Effort
- Fix broken refresh_golden_repo: 2 hours
- Refactor add_golden_repo: 4 hours
- Refactor remove_golden_repo: 2 hours
- Testing: 1 day
- Total: 2-3 days
Priority
CRITICAL - refresh_golden_repo is BROKEN (crashes), and add_golden_repo blocks MCP for hours, causing timeouts.