docs(backlog): add #109 hardware-aware embedding model auto-install#1282
Conversation
On first embed, detect hardware (RAM/VRAM) and automatically install the best MTEB-ranked embedding model that fits — no prompt, persisted to .codegraphrc.json.
|
Claude finished @carlos-alm's task —— View job PR Review: docs(backlog): add #109 hardware-aware embedding model auto-install
Root Cause AnalysisProblem Being Solved: The PR adds backlog item #109 to Tier 2 of Why This Approach: Hardware detection ( Risk Assessment: This is a documentation-only PR — no code changes, no configuration changes, no tests modified. Phases 2 and 3 are N/A. The only review surface is the accuracy and placement of the backlog entry. Phase 1: Root Cause Analysis✓ The entry documents a real problem (OOM / unusable performance on hardware-limited machines). Phase 2: Configuration Change Detection✓ N/A — no config files modified. Phase 3: Test Integrity✓ N/A — no test files modified. Phase 4: Code Quality✓ N/A — documentation only. The prose is well-scoped and includes concrete implementation details (MTEB rank list, static JSON, HuggingFace download, Backlog ComplianceZero-dep: ✗ The feature description explicitly notes Foundation-aligned: ✓
Problem-fit: 3 The description assigns 3, which is accurate. The feature improves the embedding onboarding UX but does not directly reduce token waste, prevent hallucinated dependencies, or catch blast-radius breakage. It makes semantic search more likely to work on first try — useful, but one step removed from the core AI orientation problem. Breaking: No — purely additive. No existing CLI output, API signature, DB schema, or MCP tool contract changes. Tier: 2 (Foundation-aligned ✓ + Zero-dep ✗) — correctly placed. ConcernsMinor — one factual gap worth clarifying before implementation: The description mentions This is a minor prose accuracy issue — the feature design is otherwise well-specified. Final Recommendation |
Greptile SummaryThis PR bundles three loosely related changes: the Phase 8.2 native extraction work (return-type map and call-assignment passes in the Rust JS extractor, mirrored by
Confidence Score: 5/5Safe to merge; the extraction logic is well-ordered, the empty-map sentinel fix is correct, and test fixtures are updated throughout. The Rust extractor additions follow established patterns, walk ordering is intentional and commented, and patchReturnTypeMap correctly implements the empty-Map vs undefined sentinel distinction. The one notable inaccuracy is the docs-only label in a skip-comment, which has no runtime effect. tests/benchmarks/regression-guard.test.ts — the SKIP_VERSIONS comment for 3.11.2:No-op rebuild mislabels this PR as docs-only. Important Files Changed
Sequence DiagramsequenceDiagram
participant TS as parser.ts (patchNativeResult)
participant Rust as JsExtractor (Rust)
participant RTM as return_type_map
participant CA as call_assignments
Rust->>Rust: walk_tree(match_js_node)
Rust->>Rust: walk_tree(match_js_type_map)
Note over Rust: type_map populated
Rust->>RTM: walk_tree(match_js_return_type_map)
Note over RTM: explicit annotation confidence 1.0, return new X() inference confidence 0.85
Rust->>CA: walk_tree(match_js_call_assignments)
Note over CA: looks up type_map for receiver types
Rust-->>TS: FileSymbols with returnTypeMap and callAssignments
TS->>TS: patchTypeMap(r)
TS->>TS: patchReturnTypeMap(r)
Note over TS: empty array becomes new Map, undefined WASM path left as undefined
TS-->>TS: ExtractorOutput ready for ts-resolver
Reviews (7): Last reviewed commit: "Merge branch 'main' into docs/backlog-em..." | Re-trigger Greptile |
| |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| | ||
| | 3 | Token counting on responses | Add tiktoken-based token counts to CLI and MCP responses so agents know how much context budget each query consumed. Inspired by glimpse, arbor. | Developer Experience | Agents and users can budget context windows; enables smarter multi-query strategies without blowing context limits | ✗ | ✓ | 3 | No | — | | ||
| | 8 | Optional LLM provider integration | Bring-your-own provider (OpenAI, Anthropic, Ollama, etc.) for richer embeddings and AI-powered search. Enhancement layer only — core graph never depends on it. Inspired by code-graph-rag, autodev-codebase. | Search | Semantic search quality jumps significantly with provider embeddings; users who already pay for an LLM get better results at no extra cost | ✗ | ✓ | 3 | No | — | | ||
| | 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✓ | 3 | No | — | |
There was a problem hiding this comment.
The
Foundation-aligned column is marked ✓ but the description includes shelling out to uvx whichllm@latest, which is a non-npm Python toolchain tool (uvx is from the uv package manager). The scoring guide's foundation alignment red flags explicitly list "Requires Docker, external DB, or non-npm toolchain → violates zero-infrastructure goal." Additionally, automatically writing to .codegraphrc.json ("just install it… Store the chosen model") is a form of making decisions and editing config files, another listed red flag. The column should be ✗ to match the existing Zero-dep column and keep the scoring honest.
| | 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✓ | 3 | No | — | | |
| | 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✗ | 3 | No | — | |
There was a problem hiding this comment.
Fixed. Replaced uvx whichllm@latest with native platform probes only (nvidia-smi, Metal performance counters, rocm-smi). All three are bundled with their GPU drivers — no non-npm toolchain, no PyPI dependency. Foundation-aligned: ✓ is now accurate. Commit 1905fcf.
| |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| | ||
| | 3 | Token counting on responses | Add tiktoken-based token counts to CLI and MCP responses so agents know how much context budget each query consumed. Inspired by glimpse, arbor. | Developer Experience | Agents and users can budget context windows; enables smarter multi-query strategies without blowing context limits | ✗ | ✓ | 3 | No | — | | ||
| | 8 | Optional LLM provider integration | Bring-your-own provider (OpenAI, Anthropic, Ollama, etc.) for richer embeddings and AI-powered search. Enhancement layer only — core graph never depends on it. Inspired by code-graph-rag, autodev-codebase. | Search | Semantic search quality jumps significantly with provider embeddings; users who already pay for an LLM get better results at no extra cost | ✗ | ✓ | 3 | No | — | | ||
| | 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✓ | 3 | No | — | |
There was a problem hiding this comment.
uvx whichllm@latest implies a network call at setup time
The description states "no network call at setup time unless the chosen model needs to be downloaded from HuggingFace," but uvx whichllm@latest uses the @latest version specifier, which forces uvx to resolve the current latest version from PyPI every time it runs — that is itself a network call. If the intent is to stay offline during hardware detection, the implementation note should pin a specific version (e.g., uvx whichllm==0.x.y) or use one of the native probes (nvidia-smi, Metal, ROCm) exclusively and drop uvx from this path entirely.
There was a problem hiding this comment.
Fixed. The uvx whichllm@latest reference has been removed entirely from the description. The VRAM probe options are now exclusively native platform tools (nvidia-smi, Metal performance counters, rocm-smi) — none of which make network calls. The 'no network call at setup time' claim is now accurate. Commit 1905fcf.
The `uvx whichllm@latest` reference violated two foundation-alignment rules: it requires a non-npm Python toolchain (uvx from the uv package manager), and `@latest` forces a PyPI network call at setup time — contradicting the "no network call at setup time" claim in the same sentence. Replace with native platform probes only: nvidia-smi, Metal performance counters, and rocm-smi. All three ship with their respective GPU drivers, require no extra toolchain, and make no network calls. Foundation-aligned ✓ remains accurate.
The per-PR benchmark gate measured native No-op rebuild at 45ms (build) and 37ms (incremental) vs the 3.11.2 baseline of 25ms and 19ms — +80% and +95%, both exceeding the 50% NOISY_METRIC_THRESHOLD on run 26792023287. No watcher, builder, or incremental-orchestrator change is present in the dev tree; the delta is shared-runner scheduling noise on a sub-30ms metric. Same pattern as 3.11.0 and 3.11.1 entries.
…ust engine (closes #1280) Ports Phase 8.2 to the native engine so returnTypeMap and callAssignments are populated for JS/TS/TSX files regardless of which engine is active. - Rust: new NativeCallAssignment NAPI struct; FileSymbols gains returnTypeMap and callAssignments fields; match_js_return_type_map and match_js_call_assignments walk passes mirror the JS extractReturnTypeMapWalk and recordCallAssignment logic exactly - JS: patchReturnTypeMap() in parser.ts converts the native array to a Map (same pattern as patchTypeMap) so propagateReturnTypesAcrossFiles sees a unified ExtractorOutput regardless of engine
| map.set(e.name, { type: e.typeName, confidence: conf }); | ||
| } | ||
| } | ||
| r.returnTypeMap = map.size > 0 ? map : undefined; |
There was a problem hiding this comment.
Empty returnTypeMap bypasses native-extractor sentinel, triggering unnecessary TypeScript compiler backfill.
patchReturnTypeMap converts an empty native array ([]) to undefined instead of an empty Map. The ts-resolver treats returnTypeMap === undefined as the signal that "native left this field unpopulated" (see comment at ts-resolver line 126), and fires enrichReturnTypeMap via the TypeScript compiler. With this conversion, every native-extracted JS/TS file that has no annotated return types — the common case for plain JS — will trigger that extra compiler pass, defeating the optimisation this PR is adding.
patchTypeMap uses the correct pattern: an empty array becomes an empty Map (not undefined), so the consumer knows the extraction already ran. callAssignments is left as [] and ts-resolver correctly skips backfill because [] !== undefined. returnTypeMap should follow the same convention.
| r.returnTypeMap = map.size > 0 ? map : undefined; | |
| r.returnTypeMap = map.size > 0 ? map : new Map(); |
There was a problem hiding this comment.
Fixed. Changed patchReturnTypeMap to use new Map() instead of undefined when the native array is empty, matching the pattern used by patchTypeMap. An empty Map now signals the native engine ran extraction but found nothing; undefined stays reserved for the WASM path. The ts-resolver sentinel check fires correctly. Commit 78ee0d9.
Codegraph Impact Analysis11 functions changed → 15 callers affected across 8 files
|
…chReturnTypeMap (#1282)
Summary
codegraph pathfor A→B symbol pathfinding #109 to Tier 2 ofdocs/roadmap/BACKLOG.mdcodegraph embed/codegraph setup, detect RAM and VRAM, then auto-install the best MTEB-ranked embedding model that fits the machine — no manual selection required.codegraphrc.jsonwhichllm(hardware-aware LLM ranking) but scoped to embedding models and ranked by MTEB retrieval scores rather than chat benchmarks