remove: kill Khoj/Docker/semantic search — ripgrep only#25

Merged
justi merged 4 commits into main from kill-khoj
Apr 13, 2026
Conversation

@justi
Owner

@justi justi commented Apr 12, 2026

Summary

Complete removal of Khoj semantic search integration. armillary now uses ripgrep only.

Deleted:

  • cli_khoj.py — install-khoj, start-khoj commands
  • khoj_service.py — Docker provisioning, pgvector, health checks

Cleaned (18 files, -2844 lines):

  • config.py — removed KhojConfigBlock
  • search.py — removed KhojSearch, KhojConfig, KhojResponseError
  • mcp_server.py — removed armillary_semantic tool
  • cli_tools.py — removed --khoj flag from search
  • cli_config.py — removed --skip-khoj-detect flag
  • cli_config_ceremony.py — removed Khoj detection step
  • ui/search.py — removed semantic toggle
  • ui/settings.py — removed Khoj tab
  • ui/settings_tabs.py — removed Khoj settings + admin credentials
  • README.md, CLAUDE.md — removed all Khoj mentions
  • 4 test files — removed 33 Khoj-only tests

Why

ADR 0008 Round 5: 3/3 personas + UX reviewer agreed Khoj is too much infrastructure (Docker + pgvector) for a personal dev tool. Ripgrep + future tag index covers 95% of use cases.

Test plan

  • 280 tests pass (33 Khoj tests removed, 0 broken)
  • ruff check + format clean
  • Zero Khoj references remain in src/ and tests/
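The "zero references" check can be reproduced with a small script. This is only a sketch: the function name `find_references` and the case-insensitive substring match are assumptions for illustration, not the check that was actually run for this PR.

```python
from pathlib import Path


def find_references(roots, needle="khoj"):
    """Return (path, line_number) pairs for lines containing `needle`.

    Matching is case-insensitive. Roots that are not directories are skipped.
    """
    found = []
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue
        for path in sorted(root_path.rglob("*")):
            if not path.is_file():
                continue
            try:
                # errors="ignore" lets binary files pass through without raising.
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            for number, line in enumerate(text.splitlines(), start=1):
                if needle in line.lower():
                    found.append((path, number))
    return found


if __name__ == "__main__":
    leftovers = find_references(["src", "tests"])
    for path, number in leftovers:
        print(f"{path}:{number}")
    print(f"{len(leftovers)} reference(s) found")
```

Run from the repository root, an empty result would match the claim in the test plan.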

🤖 Generated with Claude Code

Remove all Khoj integration: Docker provisioning, pgvector, semantic
search backend, install-khoj/start-khoj CLI commands, settings tab,
search toggle, MCP armillary_semantic tool, KhojConfigBlock, and all
related tests.

-2844 lines across 18 files. 2 files deleted entirely (cli_khoj.py,
khoj_service.py). armillary now uses ripgrep only for search —
simpler install, zero Docker dependency, same core value.

280 tests remain (was 313 — 33 Khoj-only tests removed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 12, 2026 23:44

Copilot AI left a comment

Pull request overview

This PR fully removes the Khoj/Docker/pgvector semantic search integration so armillary’s search experience is ripgrep-only across CLI, UI, MCP tools, config, and tests.

Changes:

  • Removed Khoj backend/client code, CLI commands, config block, and UI settings/search toggles.
  • Simplified CLI/UI/MCP search to always use ripgrep and updated docs accordingly.
  • Deleted Khoj-specific tests and updated remaining tests to match the new behavior.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_search.py | Removes Khoj backend tests; keeps ripgrep parser coverage. |
| tests/test_mcp_server.py | Removes armillary_semantic tool test. |
| tests/test_config.py | Removes Khoj config round-trip assertions and references. |
| tests/test_cli.py | Removes Khoj CLI/config-init detection and install-khoj/start-khoj test suites. |
| src/armillary/ui/settings.py | Drops Khoj settings tab; updates settings page text/tabs. |
| src/armillary/ui/settings_tabs.py | Removes Khoj tab implementation and connection test helper. |
| src/armillary/ui/search.py | Removes semantic toggle + backend switching; ripgrep-only flow. |
| src/armillary/search.py | Deletes Khoj backend/client code; leaves ripgrep backend as sole implementation. |
| src/armillary/mcp_server.py | Removes armillary_semantic tool and related config wiring. |
| src/armillary/khoj_service.py | Deletes Docker provisioning/health-check helper module. |
| src/armillary/config.py | Removes KhojConfigBlock from the config schema. |
| src/armillary/cli.py | Unregisters cli_khoj commands. |
| src/armillary/cli_tools.py | Removes --khoj flag and Khoj backend selection; ripgrep-only CLI search. |
| src/armillary/cli_khoj.py | Deletes install-khoj / start-khoj commands. |
| src/armillary/cli_config.py | Removes Khoj detection step and --skip-khoj-detect flag. |
| src/armillary/cli_config_ceremony.py | Removes Khoj probing/auto-enable and YAML emission of Khoj block. |
| README.md | Removes Khoj mentions; updates feature list, requirements, and MCP tool list. |
| CLAUDE.md | Removes Khoj integration section. |
Comments suppressed due to low confidence (1)

src/armillary/search.py:29

  • SearchHit.line is still typed as int | None, but with Khoj removed the code/documentation now implies a line number is always present. To simplify downstream handling and keep the type contract accurate, consider making line: int and updating _parse_ripgrep_jsonl to skip matches without a line_number (or coerce safely).
@dataclass(frozen=True)
class SearchHit:
    """A single match across all indexed projects.

    `path` is the absolute file the match came from. `line` is the
    1-based line number. `preview` is the matched line — kept short
    enough to render in a dashboard table cell.
    """

    path: Path
    line: int | None
    preview: str
    backend: str
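
One way to tighten the contract suggested above is to drop ripgrep events that lack a line number while parsing. The sketch below assumes the documented `rg --json` message shape (`type`, `data.path.text`, `data.lines.text`, `data.line_number`). The names `Hit` and `parse_rg_jsonl` are hypothetical stand-ins, not armillary's actual `SearchHit` or `_parse_ripgrep_jsonl`.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for SearchHit with a non-optional line."""

    path: Path
    line: int
    preview: str


def parse_rg_jsonl(output: str) -> list[Hit]:
    """Parse `rg --json` output, keeping only matches with a line number."""
    hits: list[Hit] = []
    for raw in output.splitlines():
        if not raw.strip():
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if not isinstance(event, dict) or event.get("type") != "match":
            continue  # skip begin/end/context/summary messages
        data = event.get("data", {})
        line_number = data.get("line_number")
        path_text = data.get("path", {}).get("text")
        if not isinstance(line_number, int) or path_text is None:
            continue  # no usable line number or path: drop the event
        preview = data.get("lines", {}).get("text", "").rstrip("\n")
        hits.append(Hit(Path(path_text), line_number, preview))
    return hits


sample = "\n".join(
    [
        json.dumps({"type": "begin", "data": {"path": {"text": "/tmp/a.py"}}}),
        json.dumps(
            {
                "type": "match",
                "data": {
                    "path": {"text": "/tmp/a.py"},
                    "lines": {"text": "import khoj\n"},
                    "line_number": 3,
                },
            }
        ),
        json.dumps(
            {
                "type": "match",
                "data": {
                    "path": {"text": "/tmp/a.py"},
                    "lines": {"text": "no number\n"},
                    "line_number": None,
                },
            }
        ),
    ]
)
hits = parse_rg_jsonl(sample)
```

With this approach the second match in `sample` is discarded, so downstream code never has to handle `None`. The alternative raised later in this review, keeping `int | None` and documenting it, avoids silently dropping results.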


Comment on lines 72 to +77
  console = Console()
  total_hits = 0
  for project in projects:
      try:
          hits = backend.search(query, root=project.path, max_results=max_results)
-     except Exception as exc:  # noqa: BLE001 — KhojResponseError, URLError, etc.
+     except Exception as exc:  # noqa: BLE001 — permission errors, broken files, etc.

Copilot AI Apr 12, 2026


search() currently calls ripgrep with --max-count set to the full --max for every project, even when only a few results remain before hitting the global cap. This can do unnecessary work on later projects and slow down large caches. Consider computing a per-project remaining budget (e.g., remaining = max_results - total_hits) and passing that (clamped to >=1) into backend.search(...).
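
A minimal sketch of the remaining-budget idea, assuming a backend callable that accepts a per-call limit. `Hit`, `search_with_budget`, and `fake_backend` are hypothetical names for illustration and do not reflect armillary's real signatures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for a search result."""

    project: str
    preview: str


def search_with_budget(
    projects: list[str],
    search_one: Callable[[str, int], list[Hit]],
    max_results: int,
) -> list[Hit]:
    """Search projects in order, asking each only for what is still needed."""
    collected: list[Hit] = []
    for project in projects:
        remaining = max_results - len(collected)
        if remaining <= 0:
            break  # global cap reached: later projects are not searched
        hits = search_one(project, remaining)
        # Defensive slice in case a backend returns more than requested.
        collected.extend(hits[:remaining])
    return collected


def fake_backend(project: str, limit: int) -> list[Hit]:
    """Pretend every project has 5 matches and honour the requested limit."""
    return [Hit(project, f"match {i}") for i in range(5)][:limit]


results = search_with_budget(["a", "b", "c"], fake_backend, max_results=7)
```

Here project `a` contributes 5 hits, `b` is asked for only 2, and `c` is never searched. The sketch stops once the budget is exhausted rather than clamping the limit to at least 1; either choice avoids asking later projects for the full `--max`.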

Comment on lines 18 to 21

 from armillary.cache import Cache
 from armillary.config import ConfigError, load_config
-from armillary.search import KhojConfig, KhojSearch, LiteralSearch, SearchHit
+from armillary.search import LiteralSearch, SearchHit


Copilot AI Apr 12, 2026


Now that Khoj is removed, MCP search depends entirely on ripgrep. LiteralSearch.search() returns an empty list when rg is missing, which makes armillary_search respond with "No matches" and can mislead agents. Add an explicit LiteralSearch.is_available() check in the MCP tool and return a clear error message when ripgrep is not installed/on PATH.
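
The availability check could look roughly like the following, using the standard library's `shutil.which` to test for the binary on PATH. The function names and the error wording are assumptions for illustration; the real `LiteralSearch.is_available()` may be implemented differently.

```python
import shutil


def rg_is_available(executable: str = "rg") -> bool:
    """Return True when the executable can be found on PATH."""
    return shutil.which(executable) is not None


def search_tool(query: str, executable: str = "rg") -> str:
    """Hypothetical tool body: fail loudly when the search binary is missing."""
    if not rg_is_available(executable):
        return (
            f"Error: `{executable}` is not installed or not on PATH; "
            "search cannot run."
        )
    # A real tool would invoke the search backend here; elided in this sketch.
    return f"(would search for {query!r})"
```

Returning an explicit error string keeps "ripgrep is missing" distinguishable from a genuine empty result, which is the point of the comment above.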

- CLI search passes remaining count to ripgrep instead of full --max
- MCP armillary_search returns clear error if rg is missing
- Fix test mock to include is_available()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.



Comment on lines +22 to +23
1-based line number. `preview` is the matched line — kept short
enough to render in a dashboard table cell.

Copilot AI Apr 13, 2026


SearchHit.line remains typed as int | None, but the updated docstring now implies it is always a 1-based line number. Either make line a plain int (and validate/skip ripgrep events missing line_number), or update the docstring to explicitly allow None so the model contract stays accurate.

Suggested change
-    1-based line number. `preview` is the matched line — kept short
-    enough to render in a dashboard table cell.
+    1-based line number when available, or `None` if ripgrep did not
+    provide one. `preview` is the matched line — kept short enough to
+    render in a dashboard table cell.

Comment on lines 117 to 120
    max_results = _clamp_max_results(max_results)
    if not LiteralSearch.is_available():
        return "ripgrep (`rg`) is not installed. Install it: `brew install ripgrep`."
    backend = LiteralSearch()

Copilot AI Apr 13, 2026


armillary_search’s docstring claims the tool is “always available”, but the implementation now returns an error when rg is missing. Please update the docstring to reflect that ripgrep is required (or adjust behavior if it truly should be always available).

…equirement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.



Comment on lines 105 to 107
    if total_hits == 0:
        typer.secho(f"No matches for '{query}'.", fg=typer.colors.YELLOW)


Copilot AI Apr 13, 2026


total_hits == 0 currently prints "No matches" even if every project search attempt raised an exception earlier in the loop (those exceptions are logged and then continue). This can yield a misleading success exit with a no-matches message when the search actually failed. Consider tracking whether any per-project errors occurred and (a) emitting a different summary message and/or (b) returning a non-zero exit code when all projects fail (or when any failures happen in strict/scripted mode).
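
A sketch of the error-tracking idea, separated from the CLI so the logic is easy to test. `summarize_search` and its return shape (exit code plus message) are hypothetical, and the message wording is illustrative rather than what armillary prints.

```python
from typing import Callable


def summarize_search(
    projects: list[str],
    search_one: Callable[[str], list[str]],
) -> tuple[int, str]:
    """Return (exit_code, message), separating failure from 'no matches'."""
    total_hits = 0
    errors = 0
    for project in projects:
        try:
            total_hits += len(search_one(project))
        except Exception:  # noqa: BLE001 - mirrors the broad catch in the loop
            errors += 1
    if projects and errors == len(projects):
        # Every attempt failed, so "No matches" would be misleading.
        return 1, f"Search failed in all {errors} project(s)."
    if total_hits == 0:
        return 0, "No matches."
    return 0, f"{total_hits} match(es)."


def always_fails(project: str) -> list[str]:
    """Simulate a backend that raises for every project."""
    raise OSError(f"cannot read {project}")


def always_empty(project: str) -> list[str]:
    """Simulate a backend that succeeds but finds nothing."""
    return []


failed = summarize_search(["a", "b"], always_fails)
empty = summarize_search(["a", "b"], always_empty)
```

Only the all-projects-failed case is treated as an error here; partial failures still report whatever hits were found. A stricter mode could return non-zero whenever `errors > 0`.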

Track error count; if every project raised an exception, exit with
error instead of misleading "No matches" success message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@justi justi merged commit 4ff8561 into main Apr 13, 2026
3 checks passed
@justi justi deleted the kill-khoj branch April 13, 2026 15:30