Remove all Khoj integration: Docker provisioning, pgvector, semantic search backend, install-khoj/start-khoj CLI commands, settings tab, search toggle, MCP armillary_semantic tool, KhojConfigBlock, and all related tests. -2844 lines across 18 files. 2 files deleted entirely (cli_khoj.py, khoj_service.py).
armillary now uses ripgrep only for search: simpler install, zero Docker dependency, same core value.
280 tests remain (was 313; 33 Khoj-only tests removed).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR fully removes the Khoj/Docker/pgvector semantic search integration so armillary’s search experience is ripgrep-only across CLI, UI, MCP tools, config, and tests.
Changes:
- Removed Khoj backend/client code, CLI commands, config block, and UI settings/search toggles.
- Simplified CLI/UI/MCP search to always use ripgrep and updated docs accordingly.
- Deleted Khoj-specific tests and updated remaining tests to match the new behavior.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_search.py | Removes Khoj backend tests; keeps ripgrep parser coverage. |
| tests/test_mcp_server.py | Removes armillary_semantic tool test. |
| tests/test_config.py | Removes Khoj config round-trip assertions and references. |
| tests/test_cli.py | Removes Khoj CLI/config-init detection and install-khoj/start-khoj test suites. |
| src/armillary/ui/settings.py | Drops Khoj settings tab; updates settings page text/tabs. |
| src/armillary/ui/settings_tabs.py | Removes Khoj tab implementation and connection test helper. |
| src/armillary/ui/search.py | Removes semantic toggle + backend switching; ripgrep-only flow. |
| src/armillary/search.py | Deletes Khoj backend/client code; leaves ripgrep backend as sole implementation. |
| src/armillary/mcp_server.py | Removes armillary_semantic tool and related config wiring. |
| src/armillary/khoj_service.py | Deletes Docker provisioning/health-check helper module. |
| src/armillary/config.py | Removes KhojConfigBlock from the config schema. |
| src/armillary/cli.py | Unregisters cli_khoj commands. |
| src/armillary/cli_tools.py | Removes --khoj flag and Khoj backend selection; ripgrep-only CLI search. |
| src/armillary/cli_khoj.py | Deletes install-khoj / start-khoj commands. |
| src/armillary/cli_config.py | Removes Khoj detection step and --skip-khoj-detect flag. |
| src/armillary/cli_config_ceremony.py | Removes Khoj probing/auto-enable and YAML emission of Khoj block. |
| README.md | Removes Khoj mentions; updates feature list, requirements, and MCP tool list. |
| CLAUDE.md | Removes Khoj integration section. |
Comments suppressed due to low confidence (1)
src/armillary/search.py:29
`SearchHit.line` is still typed as `int | None`, but with Khoj removed the code/documentation now implies a line number is always present. To simplify downstream handling and keep the type contract accurate, consider making `line: int` and updating `_parse_ripgrep_jsonl` to skip matches without a `line_number` (or coerce safely).
@dataclass(frozen=True)
class SearchHit:
    """A single match across all indexed projects.

    `path` is the absolute file the match came from. `line` is the
    1-based line number. `preview` is the matched line — kept short
    enough to render in a dashboard table cell.
    """

    path: Path
    line: int | None
    preview: str
    backend: str
  console = Console()
  total_hits = 0
  for project in projects:
      try:
          hits = backend.search(query, root=project.path, max_results=max_results)
-     except Exception as exc:  # noqa: BLE001 — KhojResponseError, URLError, etc.
+     except Exception as exc:  # noqa: BLE001 — permission errors, broken files, etc.
search() currently calls ripgrep with --max-count set to the full --max for every project, even when only a few results remain before hitting the global cap. This can do unnecessary work on later projects and slow down large caches. Consider computing a per-project remaining budget (e.g., remaining = max_results - total_hits) and passing that (clamped to >=1) into backend.search(...).
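The per-project budget suggested above could look roughly like the sketch below. It is not the code from this PR: `Hit`, `search_with_budget`, and the backend's `search(query, root=..., max_results=...)` shape are assumptions modelled on the diff context, and the loop stops once the cap is reached instead of clamping the budget to 1.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for armillary's SearchHit."""

    path: Path
    line: Optional[int]
    preview: str


def search_with_budget(backend, query: str, roots: Iterable[Path], max_results: int) -> List[Hit]:
    """Search each root, asking the backend only for what is still needed."""
    hits: List[Hit] = []
    for root in roots:
        remaining = max_results - len(hits)
        if remaining <= 0:
            # Global cap reached: skip the remaining projects entirely.
            break
        found = backend.search(query, root=root, max_results=remaining)
        # Defensive slice in case a backend returns more than requested.
        hits.extend(found[:remaining])
    return hits
```

With a cap of 5 and a first project that yields 3 hits, the second project is asked for at most 2 and any later project is never searched.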
  from armillary.cache import Cache
  from armillary.config import ConfigError, load_config
- from armillary.search import KhojConfig, KhojSearch, LiteralSearch, SearchHit
+ from armillary.search import LiteralSearch, SearchHit
Now that Khoj is removed, MCP search depends entirely on ripgrep. LiteralSearch.search() returns an empty list when rg is missing, which makes armillary_search respond with "No matches" and can mislead agents. Add an explicit LiteralSearch.is_available() check in the MCP tool and return a clear error message when ripgrep is not installed/on PATH.
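As a minimal sketch of that availability check, assuming ripgrep is located with the standard library's `shutil.which`: `search_or_explain`, `run_search`, and the injectable `which` parameter are hypothetical names for illustration, not armillary's API.

```python
import shutil
from typing import Callable, List, Optional


def search_or_explain(
    query: str,
    run_search: Callable[[str], List[str]],
    *,
    executable: str = "rg",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return search output, or a clear error when ripgrep is missing."""
    if which(executable) is None:
        # A missing binary must not be reported as "No matches".
        return f"ripgrep (`{executable}`) is not installed or not on PATH."
    hits = run_search(query)
    if not hits:
        return f"No matches for '{query}'."
    return "\n".join(hits)
```

Keeping the "missing tool" and "no matches" messages distinct lets an agent tell a broken environment apart from an empty result.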
- CLI search passes remaining count to ripgrep instead of full --max
- MCP armillary_search returns clear error if rg is missing
- Fix test mock to include is_available()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
src/armillary/search.py
Outdated
    1-based line number. `preview` is the matched line — kept short
    enough to render in a dashboard table cell.
SearchHit.line remains typed as int | None, but the updated docstring now implies it is always a 1-based line number. Either make line a plain int (and validate/skip ripgrep events missing line_number), or update the docstring to explicitly allow None so the model contract stays accurate.
-     1-based line number. `preview` is the matched line — kept short
-     enough to render in a dashboard table cell.
+     1-based line number when available, or `None` if ripgrep did not
+     provide one. `preview` is the matched line — kept short enough to
+     render in a dashboard table cell.
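The other option raised in the comment, making `line` a plain `int` and skipping events that lack a line number, might be sketched as follows. It assumes the event layout of `rg --json` (`type`, `data.path.text`, `data.lines.text`, `data.line_number`); `LineHit` and `parse_rg_jsonl` are illustrative names, not the project's `_parse_ripgrep_jsonl`.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class LineHit:
    """Hypothetical hit type whose line number is always present."""

    path: Path
    line: int
    preview: str


def parse_rg_jsonl(output: str) -> List[LineHit]:
    """Parse `rg --json` output, dropping matches without a line number."""
    hits: List[LineHit] = []
    for raw in output.splitlines():
        if not raw.strip():
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Ignore lines that are not valid JSON events.
        if not isinstance(event, dict) or event.get("type") != "match":
            continue  # Skip begin/end/summary events.
        data = event.get("data") or {}
        path_text = (data.get("path") or {}).get("text")
        line_number = data.get("line_number")
        if path_text is None or line_number is None:
            continue  # Keeps the `line: int` contract honest.
        preview = ((data.get("lines") or {}).get("text") or "").rstrip("\n")
        hits.append(LineHit(Path(path_text), int(line_number), preview))
    return hits
```

Skipping incomplete events trades a few possibly lost matches for a simpler type downstream; the docstring change suggested above is the more conservative alternative.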
  max_results = _clamp_max_results(max_results)
  if not LiteralSearch.is_available():
      return "ripgrep (`rg`) is not installed. Install it: `brew install ripgrep`."
  backend = LiteralSearch()
armillary_search’s docstring claims the tool is “always available”, but the implementation now returns an error when rg is missing. Please update the docstring to reflect that ripgrep is required (or adjust behavior if it truly should be always available).
…equirement Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.
  if total_hits == 0:
      typer.secho(f"No matches for '{query}'.", fg=typer.colors.YELLOW)
total_hits == 0 currently prints "No matches" even if every project search attempt raised an exception earlier in the loop (those exceptions are logged and then continue). This can yield a misleading success exit with a no-matches message when the search actually failed. Consider tracking whether any per-project errors occurred and (a) emitting a different summary message and/or (b) returning a non-zero exit code when all projects fail (or when any failures happen in strict/scripted mode).
Track error count; if every project raised an exception, exit with error instead of misleading "No matches" success message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
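One way to express the error-count rule from this commit is sketched below. `summarize_search` and its parameters are hypothetical, and the returned exit code would be passed to whatever mechanism the CLI uses to exit.

```python
from typing import Tuple


def summarize_search(
    query: str,
    *,
    projects_searched: int,
    projects_failed: int,
    total_hits: int,
) -> Tuple[int, str]:
    """Return (exit_code, message) for the end of a multi-project search."""
    if projects_searched > 0 and projects_failed == projects_searched:
        # Every project raised: report a failure, not an empty result.
        return 1, f"Search failed in all {projects_failed} project(s)."
    if total_hits == 0:
        message = f"No matches for '{query}'."
        if projects_failed:
            message += f" ({projects_failed} project(s) could not be searched.)"
        return 0, message
    return 0, f"{total_hits} match(es) for '{query}'."
```

Partial failures still exit with 0 here but are mentioned in the message; a stricter mode for scripts could return a non-zero code whenever `projects_failed` is positive.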
Summary
Complete removal of Khoj semantic search integration. armillary now uses ripgrep only.
Deleted:
- cli_khoj.py: install-khoj, start-khoj commands
- khoj_service.py: Docker provisioning, pgvector, health checks
Cleaned (18 files, -2844 lines):
- config.py: removed KhojConfigBlock
- search.py: removed KhojSearch, KhojConfig, KhojResponseError
- mcp_server.py: removed armillary_semantic tool
- cli_tools.py: removed --khoj flag from search
- cli_config.py: removed --skip-khoj-detect flag
- cli_config_ceremony.py: removed Khoj detection step
- ui/search.py: removed semantic toggle
- ui/settings.py: removed Khoj tab
- ui/settings_tabs.py: removed Khoj settings + admin credentials
Why
ADR 0008 Round 5: 3/3 personas + UX reviewer agreed Khoj is too much infrastructure (Docker + pgvector) for a personal dev tool. Ripgrep + future tag index covers 95% of use cases.
Test plan
🤖 Generated with Claude Code