Refactor 9 synchronous API endpoints to asynchronous pattern (30s timeout violations) #501

@jsbattig

User Story

As a CIDX API consumer (Claude Code MCP client or REST API user),
I want all long-running operations to execute asynchronously with background job tracking,
So that my connections don't time out and I can monitor operation progress.

Background

Code inspection found 2 critical MCP endpoints that still run synchronously and exceed the 30-second MCP timeout:

  • add_golden_repo: Blocks for minutes to 4+ hours (with temporal indexing)
  • remove_golden_repo: Blocks for 30+ seconds (large directory cleanup)

Additionally, refresh_golden_repo is BROKEN: the MCP handler expects a job_id, but the implementation returns a Dict.

These violations cause:

  • Production timeouts with Claude Code MCP integration (30-second MCP timeout)
  • Poor user experience (hanging connections)
  • Broken functionality (refresh_golden_repo crashes)

Current State Analysis

✅ Already Async (Working Correctly)

  • activate_repository - Returns job_id, uses BackgroundJobManager ✅
  • deactivate_repository - Returns job_id, uses BackgroundJobManager ✅
  • sync_repository - Returns job_id, uses BackgroundJobManager ✅

🔴 Still Synchronous (VIOLATIONS)

  1. add_golden_repo (handlers.py:451)

    • Calls BLOCKING golden_repo_manager.add_golden_repo()
    • Waits for: git clone + cidx init + cidx index
    • Can take 4+ hours with temporal indexing
  2. remove_golden_repo (handlers.py:465)

    • Calls BLOCKING golden_repo_manager.remove_golden_repo()
    • Deletes large .code-indexer/ directories
    • Can take 30+ seconds for large repos
  3. refresh_golden_repo (handlers.py:482) - BROKEN

    • MCP handler expects: job_id = method()
    • Method returns: {"success": True, "message": "..."}
    • Crashes when accessing undefined job_id key
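The exact failing line in handlers.py is not shown in this issue, so the following is only a sketch of the contract mismatch. `refresh_sync_stub` and `handler_stub` are hypothetical stand-ins, not the real code. The guard shows one way to make the mismatch fail with an explicit error at the call site.

```python
from typing import Any, Dict


def refresh_sync_stub(alias: str) -> Dict[str, Any]:
    """Stand-in for the current implementation: returns a result dict."""
    return {"success": True, "message": f"Golden repository '{alias}' refreshed"}


def handler_stub(alias: str) -> Dict[str, Any]:
    """Stand-in for the MCP handler: treats the return value as a job ID."""
    job_id = refresh_sync_stub(alias)
    # Contract guard: the handler was written for a str job ID.
    if not isinstance(job_id, str):
        raise TypeError(f"expected job_id as str, got {type(job_id).__name__}")
    return {"success": True, "job_id": job_id}
```

With the stub above, `handler_stub("demo")` raises `TypeError` because the implementation hands back a dict.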

❓ Need Verification

  • REST: DELETE /api/admin/golden-repos/{alias} - Check whether it uses BackgroundJobManager
  • REST: PUT /api/repos/{user_alias}/branch - Check whether it uses BackgroundJobManager
  • MCP: switch_branch - Check whether it uses BackgroundJobManager

Acceptance Criteria

AC1: Fix Critical MCP Violations (2 endpoints + 1 broken)

Priority 1 - CRITICAL (blocking operations, from 30+ seconds up to 4+ hours):

  • MCP: add_golden_repo → Submit to BackgroundJobManager, return job_id immediately
  • MCP: remove_golden_repo → Submit to BackgroundJobManager, return job_id immediately

Priority 1 - BROKEN CODE:

  • MCP: refresh_golden_repo → Fix implementation to actually return job_id OR fix handler to match return type

AC2: Verify and Document Already-Async Endpoints

  • Confirm activate_repository uses BackgroundJobManager ✅
  • Confirm deactivate_repository uses BackgroundJobManager ✅
  • Confirm sync_repository uses BackgroundJobManager ✅
  • Update this issue (including the endpoint count in the title) to reflect the actual state

AC3: Audit Remaining Suspicious Endpoints

  • Check switch_branch - Is it async or sync?
  • Check REST DELETE /api/admin/golden-repos/{alias} - Already async?
  • Check REST PUT /api/repos/{user_alias}/branch - Already async?
  • Document findings

AC4: Response Time Guarantee

  • All refactored endpoints return in <1 second
  • Background processing handles heavy work
  • Job status queryable via /api/jobs/{job_id}
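As a rough illustration of the submit-then-poll contract in AC4, here is a minimal in-memory sketch. `InMemoryJobManager` is a hypothetical stand-in, not the real BackgroundJobManager. Its `submit_job` signature only mirrors the calls shown in this issue, and `get_job_status` and the status names are assumptions.

```python
import threading
import time
import uuid
from typing import Any, Callable, Dict


class InMemoryJobManager:
    """Hypothetical stand-in for BackgroundJobManager (not the real class)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit_job(self, operation_type: str, func: Callable[..., Any], *,
                   submitter_username: str, is_admin: bool = False,
                   **kwargs: Any) -> str:
        """Start func(**kwargs) on a worker thread and return a job ID at once."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {
                "operation_type": operation_type,
                "username": submitter_username,
                "is_admin": is_admin,
                "status": "running",
                "result": None,
                "error": None,
            }
        worker = threading.Thread(
            target=self._run, args=(job_id, func, kwargs), daemon=True
        )
        worker.start()
        return job_id

    def _run(self, job_id: str, func: Callable[..., Any],
             kwargs: Dict[str, Any]) -> None:
        try:
            update = {"status": "completed", "result": func(**kwargs)}
        except Exception as exc:  # failures are recorded, not raised to the caller
            update = {"status": "failed", "error": str(exc)}
        with self._lock:
            self._jobs[job_id].update(update)

    def get_job_status(self, job_id: str) -> Dict[str, Any]:
        """Return a snapshot of the job record (what /api/jobs/{job_id} might expose)."""
        with self._lock:
            return dict(self._jobs[job_id])


def slow_operation(alias: str) -> Dict[str, Any]:
    """Simulates heavy work such as clone + index."""
    time.sleep(0.2)
    return {"success": True, "message": f"'{alias}' done"}
```

The caller gets the job ID back before `slow_operation` finishes. Any exception raised by the worker shows up only in the job record, so clients have to check the status instead of relying on the original response.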

AC5: Testing

  • Unit tests for refactored endpoints
  • Integration tests: submit → poll → verify completion
  • Timeout tests: verify <1s response
  • Fix test: verify refresh_golden_repo doesn't crash
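A unit test for a refactored endpoint could look roughly like the sketch below. `add_golden_repo_handler` is a hypothetical handler shaped like the Task 2 refactor, since the real handlers.py is not shown here. The app and user objects are stand-ins and the repository URL is a placeholder.

```python
import time
from types import SimpleNamespace
from typing import Any, Dict
from unittest.mock import Mock


def add_golden_repo_handler(app: Any, user: Any, params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical handler shaped like the Task 2 refactor (not the real handlers.py)."""
    job_id = app.background_job_manager.submit_job(
        "add_golden_repo",
        app.golden_repo_manager.add_golden_repo,
        submitter_username=user.username,
        is_admin=True,
        repo_url=params["url"],
        alias=params["alias"],
        default_branch=params.get("branch", "main"),
    )
    return {"success": True, "job_id": job_id}


def test_add_golden_repo_returns_job_id_quickly() -> None:
    app = SimpleNamespace(
        background_job_manager=Mock(),
        golden_repo_manager=Mock(),
    )
    app.background_job_manager.submit_job.return_value = "job-123"
    user = SimpleNamespace(username="admin")

    start = time.monotonic()
    response = add_golden_repo_handler(
        app, user, {"url": "https://example.com/repo.git", "alias": "demo"}
    )
    elapsed = time.monotonic() - start

    assert response == {"success": True, "job_id": "job-123"}
    assert elapsed < 1.0
    # The blocking method must be handed to the job manager, never called inline.
    app.golden_repo_manager.add_golden_repo.assert_not_called()
    app.background_job_manager.submit_job.assert_called_once()
    # The default branch falls back to "main" when the caller omits it.
    assert app.background_job_manager.submit_job.call_args.kwargs["default_branch"] == "main"
```

Mocking the job manager keeps the test fast and checks the handler contract only. The submit → poll → completion path still needs an integration test against the real manager.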

Implementation Tasks

Task 1: Fix refresh_golden_repo (BROKEN CODE)

Option A: Make implementation async (RECOMMENDED)

# In golden_repo_manager.py
def refresh_golden_repo(self, alias: str) -> str:
    """Submit a background refresh and return its job_id."""
    if alias not in self.golden_repos:
        raise GoldenRepoError(f"Golden repository '{alias}' not found")

    # Submit to BackgroundJobManager.
    # NOTE: background_job_manager must be reachable from this module
    # (see the reference check in Task 2).
    job_id = background_job_manager.submit_job(
        "refresh_golden_repo",
        self._do_refresh_golden_repo,
        submitter_username="admin",
        is_admin=True,
        alias=alias,
    )
    return job_id

def _do_refresh_golden_repo(self, alias: str) -> Dict[str, Any]:
    """Background worker for refresh (current sync code moves here)."""
    # ... existing lines 908-954 move here ...

Option B: Fix handler to match sync return (NOT RECOMMENDED)

# In handlers.py:482
result = app.golden_repo_manager.refresh_golden_repo(alias)
return _mcp_response({
    "success": result["success"],
    "message": result["message"]
})

Task 2: Refactor add_golden_repo to Async

Current (handlers.py:451):

result = app.golden_repo_manager.add_golden_repo(...)  # BLOCKS
return _mcp_response({"success": True, "message": result["message"]})

Refactor to:

# Check if golden_repo_manager has background_job_manager reference
# If not, use app.background_job_manager
job_id = app.background_job_manager.submit_job(
    "add_golden_repo",
    app.golden_repo_manager.add_golden_repo,
    submitter_username=user.username,
    is_admin=True,
    repo_url=params["url"],
    alias=params["alias"],
    default_branch=params.get("branch", "main")
)
return _mcp_response({
    "success": True,
    "job_id": job_id,
    "message": f"Golden repository '{params['alias']}' addition started"
})

Task 3: Refactor remove_golden_repo to Async

Current (handlers.py:465):

app.golden_repo_manager.remove_golden_repo(alias)  # BLOCKS
return _mcp_response({"success": True, "message": "..."})

Refactor to:

job_id = app.background_job_manager.submit_job(
    "remove_golden_repo",
    app.golden_repo_manager.remove_golden_repo,
    submitter_username=user.username,
    is_admin=True,
    alias=alias
)
return _mcp_response({
    "success": True,
    "job_id": job_id,
    "message": f"Golden repository '{alias}' removal started"
})
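One side effect of the refactor is that errors such as an unknown alias no longer reach the caller directly; they end up in the job record. A possible mitigation, sketched here under assumptions, is to run cheap validation before submitting, as Task 1 Option A does. `remove_golden_repo_handler`, the local `GoldenRepoError`, and the stand-in `app` and `user` objects are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Dict
from unittest.mock import Mock


class GoldenRepoError(Exception):
    """Local stand-in for the project's GoldenRepoError."""


def remove_golden_repo_handler(app: Any, user: Any, alias: str) -> Dict[str, Any]:
    """Hypothetical handler: validate cheaply, then queue the slow removal."""
    # Fast check up front so an unknown alias fails in the request itself,
    # mirroring the guard in Task 1, Option A.
    if alias not in app.golden_repo_manager.golden_repos:
        raise GoldenRepoError(f"Golden repository '{alias}' not found")

    job_id = app.background_job_manager.submit_job(
        "remove_golden_repo",
        app.golden_repo_manager.remove_golden_repo,
        submitter_username=user.username,
        is_admin=True,
        alias=alias,
    )
    return {
        "success": True,
        "job_id": job_id,
        "message": f"Golden repository '{alias}' removal started",
    }


# Usage with stand-in objects:
app = SimpleNamespace(
    golden_repo_manager=SimpleNamespace(
        golden_repos={"demo": {}}, remove_golden_repo=Mock()
    ),
    background_job_manager=Mock(),
)
app.background_job_manager.submit_job.return_value = "job-456"
user = SimpleNamespace(username="admin")
response = remove_golden_repo_handler(app, user, "demo")
```

The check is only a fast pre-filter. The background worker should still handle the case where the repository disappears between submission and execution.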

Task 4: Verify Other Endpoints

  • Check whether switch_branch is already async (search handlers.py for its handler)
  • Check whether the REST endpoints call the async or sync versions
  • Update issue with accurate count of violations

Testing Requirements

  1. Fix Verification:

    • refresh_golden_repo no longer crashes
    • add_golden_repo returns in <1s with job_id
    • remove_golden_repo returns in <1s with job_id
  2. End-to-End:

    • Submit add_golden_repo with temporal indexing
    • Poll job status every 5 seconds
    • Verify the job completes successfully, even after several hours
    • Verify the repository is ready for activation
  3. Timeout Prevention:

    • Call add_golden_repo via MCP
    • Verify the MCP call doesn't time out after 30s
    • Verify job status can be queried while indexing is in progress
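The 5-second polling step in the end-to-end test could be driven by a helper along these lines. This is a sketch: `wait_for_job` and `fetch_status` are hypothetical names, and the in-progress status values ("pending", "running") are assumptions to adjust to whatever /api/jobs/{job_id} actually returns.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 6 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_status(job_id) until the job leaves its in-progress states.

    Raises TimeoutError if the job is still in progress once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("status") not in ("pending", "running"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.get('status')} after {timeout}s"
            )
        sleep(interval)
```

Passing `fetch_status` and `sleep` as parameters lets tests swap in fakes, so the polling logic can be checked without waiting hours. In a real run `fetch_status` would wrap the HTTP call to the job status endpoint.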

Success Metrics

  • Zero MCP timeout errors in production
  • All MCP endpoints return in <1 second
  • refresh_golden_repo doesn't crash
  • Repos that need 4+ hours of temporal indexing can be added without a timeout

Estimated Effort

  • Fix broken refresh_golden_repo: 2 hours
  • Refactor add_golden_repo: 4 hours
  • Refactor remove_golden_repo: 2 hours
  • Testing: 1 day
  • Total: 2-3 days

Priority

CRITICAL - refresh_golden_repo is BROKEN (crashes), and add_golden_repo blocks MCP for hours, causing timeouts.
