Skip to content

feat(agents): distribute MCP tool capabilities across agent roster#4091

Closed
mrveiss wants to merge 1 commit intoDev_new_guifrom
feature/3386-mcp-distribution
Closed

feat(agents): distribute MCP tool capabilities across agent roster#4091
mrveiss wants to merge 1 commit intoDev_new_guifrom
feature/3386-mcp-distribution

Conversation

@mrveiss
Copy link
Copy Markdown
Owner

@mrveiss mrveiss commented Apr 10, 2026

Summary

Implement Phase 1 of MCP tool distribution across all 29 agents in the AutoBot platform. All agents now have access to memory management, with specialized agents gaining sequential thinking, structured thinking, and task management capabilities for improved multi-step problem solving and cross-agent collaboration.

Changes

  • Added memory_mcp to all 29 agents for persistent knowledge tracking
  • Added sequential_thinking_mcp to 17 analysis/decision-making agents
  • Added structured_thinking_mcp to 12 agents requiring systematic analysis
  • Added shrimp_task_manager_mcp to 10 agents managing complex workflows
  • All tool assignments documented in MCP_DISTRIBUTION_SUMMARY.md

Testing

  • Python syntax validation of agent_config.py: PASS
  • Agent configurations remain structurally intact
  • All 29 agent IDs and base configurations preserved

Closes #3386

🤖 Generated with Claude Code

…3386)

Implement Phase 1 of MCP tool distribution across all 29 agents in the AutoBot platform.
All agents now have access to memory management, with specialized agents gaining sequential thinking,
structured thinking, and task management capabilities for improved multi-step problem solving and
cross-agent collaboration.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@mrveiss
Copy link
Copy Markdown
Owner Author

mrveiss commented Apr 10, 2026

Unified service account model deployed and verified in production. All services running as autobot:autobot with shared /var/lib/autobot/ directory. Permission conflicts resolved.

@mrveiss mrveiss closed this Apr 10, 2026
@github-actions
Copy link
Copy Markdown

✅ SSOT Configuration Compliance: Passing

🎉 No hardcoded values detected that have SSOT config equivalents!

mrveiss added a commit that referenced this pull request Apr 10, 2026
All AutoBot services on a host now use the same autobot:autobot account,
eliminating the confusing split between backend (autobot) and ai-stack (autobot-ai).

Changes:
- ai-stack role defaults: ai_user/ai_group now 'autobot' instead of 'autobot-ai'
- ai_data_dir changed from /var/lib/autobot-ai to /var/lib/autobot (shared)
- Removed separate autobot-ai account creation in ai-stack tasks
- Removed ai_user/ai_group override in setup_wizard auto-inject

Benefits:
- Simpler permissions and ownership model
- No permission conflicts during co-location
- Consistent with backend service model
- Eliminates need for auto-inject overrides

Related: #3501, #3097, #4088 (EnvironmentFile fix)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
mrveiss added a commit that referenced this pull request Apr 10, 2026
* perf(frontend): lazy-load Cytoscape graph components (#3998)

- Defer Cytoscape (300KB library) import until network view is activated
- Replace static imports with dynamic imports in ImportTreeChart and FunctionCallGraph
- Add loading and error states for Cytoscape initialization
- Cytoscape now loads separately via dynamic import, reducing initial bundle
- Split Cytoscape into separate chunks (434KB gzip: 137KB) and fcose (122KB gzip: 33KB)
- Default tree/list views load instantly without Cytoscape overhead
- Network view shows loading state while library is fetched
- Suspense boundaries already in place in parent CodebaseDependenciesPanel

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(frontend): remove broken pause/resume calls in visibility handler (#3999)

The useBackoffPoller composable already handles visibility changes internally
by clearing timers when the tab is hidden and resuming when visible. The
ChatInterface was attempting to call non-existent pause() and resume() methods,
which would cause runtime errors.

This fix removes the redundant visibility change handler and lets the poller
manage its own lifecycle, eliminating the bug and reducing code duplication.

The implementation now relies on useBackoffPoller's built-in visibility detection
which properly pauses polling when the browser tab is not visible.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(backend): add missing get_redis_client import in advanced_cache_manager.py

NameError: name 'get_redis_client' is not defined at startup
File: utils/advanced_cache_manager.py:197 during AdvancedCacheManager initialization
This caused all backend worker processes to crash immediately on startup
preventing port 8001 from becoming available (60s timeout during provisioning)

Related: Issue #4059 - migrate remaining get_redis_client callers

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(ansible): ChromaDB host detection for co-located vs distributed deployments

Issue #4084: On WSL2 co-located deployments, ChromaDB runs inside container
at 127.0.0.1:8100, not on Windows host at 10.255.255.254.

Fix: Detect if ai-stack role is deployed on same node:
- Co-located (ai-stack present) → use 127.0.0.1
- Distributed (ai-stack absent) → use 10.255.255.254 (Windows host)

File: autobot-slm-backend/ansible/roles/backend/tasks/main.yml
This ensures ChromaDB connection works on all deployment patterns.

Related: Issue #3541 (original WSL2 logic)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(frontend): LiveEventService uses wrong host for WebSocket connection

Issue: LiveEventService.getUrl() used window.location.host which connects
to whatever server served the frontend (e.g., SLM manager at 172.16.168.20)
but the /api/ws/live endpoint is on the backend (localhost:8001).

Result: Frontend gets 403 Forbidden errors when trying to connect to live events.

Fix: Use config.websocketUrl (properly configured to backend host:port)
instead of window.location.host.

File: autobot-frontend/src/services/LiveEventService.ts
- Import config from ssot-config
- Use config.websocketUrl + /live endpoint

This ensures WebSocket connections route to the correct backend server.

Related: Issue #601 SSOT config system

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(frontend): correct ssot-config import in LiveEventService

Import config as default export (not named export).
config is exported as default from ssot-config.ts, not as named export.

File: autobot-frontend/src/services/LiveEventService.ts

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(ai-stack): ChromaDB health check uses correct endpoint

Issue: health_check() was calling /health endpoint on ChromaDB
but ChromaDB doesn't have that endpoint. It uses /api/v2 for heartbeat.

Result: Health check always failed with Connection refused error

Fix: Change health check endpoint from /health to /api/v2
This is the correct ChromaDB heartbeat endpoint

File: autobot-backend/services/ai_stack_client.py:291
Endpoint: /health → /api/v2

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(ansible): AI Stack systemd services use correct EnvironmentFile path (#4088)

- Changed autobot-chromadb.service.j2 to reference /etc/autobot/autobot-ai-stack.env
- Changed autobot-ai-stack.service.j2 to reference /etc/autobot/autobot-ai-stack.env
- Matches where the environment template is deployed (ai-stack/tasks/main.yml:51)
- Fixes: chromadb and ai-stack services failing to start due to missing env file

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor(ai-stack): unified service account to autobot:autobot (#4091)

All AutoBot services on a host now use the same autobot:autobot account,
eliminating the confusing split between backend (autobot) and ai-stack (autobot-ai).

Changes:
- ai-stack role defaults: ai_user/ai_group now 'autobot' instead of 'autobot-ai'
- ai_data_dir changed from /var/lib/autobot-ai to /var/lib/autobot (shared)
- Removed separate autobot-ai account creation in ai-stack tasks
- Removed ai_user/ai_group override in setup_wizard auto-inject

Benefits:
- Simpler permissions and ownership model
- No permission conflicts during co-location
- Consistent with backend service model
- Eliminates need for auto-inject overrides

Related: #3501, #3097, #4088 (EnvironmentFile fix)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(backend): sync docs directory for knowledge base population (#4100)

- Added rsync task to deploy docs/ from code_source to /opt/autobot/docs
- Enables doc_indexer.py to find documentation files for KB population
- Excludes archives/ to keep deployment lean
- Excludes __pycache__, .pyc, .git to avoid bloat

Fixes: Knowledge base population endpoint now able to index AutoBot documentation

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat(rag): semantic chunking, fact extraction, entity resolution (#3395) (#4093)

Implement Phase 2-3 of RAG optimization with atomic fact extraction and entity resolution:

**Phase 2 - Atomic Facts Extraction:**
- Added AtomicFact model for representing discrete factual statements
- Implemented FactExtractor cognifier with dual-mode extraction:
  - NLP mode: Fast regex-based pattern matching for large document sets
  - LLM mode: High-accuracy extraction using language models
- Facts deduplicated by normalized subject-predicate-object triples
- Cross-validation via supported_by_count tracking

**Phase 3 - Entity Resolution:**
- Added EntityResolver cognifier for multi-strategy deduplication:
  - Exact canonical name matching
  - Predefined synonym mappings (e.g., "AutoBot" = "AutoBot AI")
  - Fuzzy string similarity matching (configurable threshold)
- Preserves source chunk references and extraction confidence

**Infrastructure:**
- Updated PipelineContext to support facts extraction
- Extended PipelineResult metrics for fact tracking
- Comprehensive test coverage (26 tests, all passing)
- Flake8 compliant code with proper documentation

Expected improvements: 25-40% retrieval accuracy, 30-50% faster troubleshooting.

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor(config): migrate infrastructure callers from ConfigManager to ssot_config (#3829) (#4101)

Migrate 13 files that used ConfigManager.get_host(), get_port(),
get_service_url(), get_redis_config(), and get_ollama_url() to call
ssot_config directly, eliminating duplicate infrastructure lookups.

Changes:
- autobot_shared/ssot_config.py: add PortConfig.chrome_cdp field (default
  9222, AUTOBOT_CHROME_CDP_PORT) and AutoBotConfig.chrome_cdp_url property
- autobot-backend/utils/service_registry.py: port/host lookups -> ssot_config
- autobot-backend/utils/system_validator.py: base_urls and critical_ports
  -> ssot_config.backend_url / frontend_url / ollama_url and vm.*/port.*
- autobot-backend/utils/system_metrics.py: ollama URL -> ssot_config.ollama_url
- autobot-backend/utils/model_optimizer.py: ollama URL -> ssot_config.ollama_url
- autobot-backend/startup_validator.py: get_host, get_redis_config,
  get_service_url -> ssot_config equivalents
- autobot-backend/auth_middleware.py: get_redis_config -> ssot_config.redis.enabled
- autobot-backend/knowledge/base.py: redis host/port/password -> ssot_config.vm/port/redis
- autobot-backend/llm_interface_pkg/interface.py: get_service_url("ollama")
  -> ssot_config.ollama_url
- autobot-backend/api/system.py: get_service_url, get_redis_config, get_host,
  get_port -> ssot_config equivalents
- autobot-backend/api/llm.py: get_service_url("ollama") -> ssot_config.ollama_url
- autobot-backend/api/research_browser.py: get_service_url -> ssot_config.browser_service_url
- autobot-backend/research_browser_manager.py: get_service_url("chrome") ->
  ssot_config.chrome_cdp_url

ConfigManager is retained for runtime/mutable YAML config (feature flags,
circuit-breaker thresholds, LLM provider settings, security config) per
the migration plan in the issue.

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* arch(mcp): add process/container isolation for MCP tool bridges (#4089)

* arch(mcp): add process/container isolation for MCP tool bridges (#3229)

Introduces an opt-in subprocess/container isolation layer for high-risk
MCP bridges (filesystem_mcp, browser_mcp, vnc_mcp) so a misbehaving tool
can no longer stall or crash the backend event loop.

New modules:
- services/mcp_isolation_config.py: per-bridge policy (mode + rlimits),
  all resolved from environment for zero-code tuning
- services/mcp_isolated_runtime.py: IsolatedBridgeClient with JSON-RPC
  stdio framing, crash auto-restart, circuit breaker, env scrubbing
- services/mcp_bridge_workers/worker_entrypoint.py: child process that
  applies RLIMIT_CPU / RLIMIT_AS / RLIMIT_NOFILE / RLIMIT_NPROC and
  serves a JSON-RPC loop over stdin/stdout, reusing existing bridge code

Wiring:
- MCPDispatcher._call_bridge routes through the isolated registry when
  policy requires it; dispatcher API and caller surface unchanged

Infrastructure:
- autobot-mcp-bridge@.service.j2: hardened systemd template (seccomp,
  ProtectSystem, CPUQuota, MemoryMax, TasksMax, no-new-privileges)
- docker/mcp-bridges.yml: cgroup-limited containers with read-only root,
  cap_drop ALL, and internal-only network

Defaults are off (MCP_ISOLATION_MODE=inprocess) so rollout is a no-op
until explicitly enabled per bridge on the SLM Manager.

Tests: 17 new (9 config + 8 runtime), 15 existing dispatcher tests still
pass. Closes the acceptance criteria in #3229:
- High-risk bridges execute in isolated process/container
- Backend stability unaffected by misbehaving bridge (SIGKILL + restart)
- MCP registry/dispatcher unchanged from caller perspective
- Resource limits configurable per bridge category

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp-isolation): resolve race conditions in async request handling (#3229)

- Make _next_id() async and call within lock to prevent concurrent request ID collisions
- Move response processing inside lock in call_tool() to prevent data race
- Add error handling for stdin/stdout connection failures in worker_entrypoint.py
- Add comprehensive type hints to IsolatedBridgeClient methods

Fixes critical race conditions identified in code review.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(knowledge): async background task for populate_autobot_docs (#4103)

- Endpoint returns immediately with task_id instead of blocking
- Actual indexing runs in background via BackgroundTasks
- 442 markdown files now embedded asynchronously without timeout
- Added status polling endpoint (TODO: persistent task tracking)

Fixes: service timeout when indexing large documentation sets
Related: #4103 (populate hangs backend due to synchronous embedding)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat(agents): distribute MCP tool capabilities across agent roster (#3386) (#4104)

Implement Phase 1 of MCP tool distribution across all 29 agents in the AutoBot platform.
All agents now have access to memory management, with specialized agents gaining sequential thinking,
structured thinking, and task management capabilities for improved multi-step problem solving and
cross-agent collaboration.

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(frontend): correct const/let declarations for lazy-loaded Cytoscape modules (#3998)

- Change cytoscapeModule and fcoseModule from const to let to allow runtime assignment
- Fixes TypeScript type errors in lazy-loading implementation

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
mrveiss added a commit that referenced this pull request Apr 12, 2026
All AutoBot services on a host now use the same autobot:autobot account,
eliminating the confusing split between backend (autobot) and ai-stack (autobot-ai).

Changes:
- ai-stack role defaults: ai_user/ai_group now 'autobot' instead of 'autobot-ai'
- ai_data_dir changed from /var/lib/autobot-ai to /var/lib/autobot (shared)
- Removed separate autobot-ai account creation in ai-stack tasks
- Removed ai_user/ai_group override in setup_wizard auto-inject

Benefits:
- Simpler permissions and ownership model
- No permission conflicts during co-location
- Consistent with backend service model
- Eliminates need for auto-inject overrides

Related: #3501, #3097, #4088 (EnvironmentFile fix)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant