Skip to content

docs(backlog): add #109 hardware-aware embedding model auto-install#1282

Merged
carlos-alm merged 8 commits into
mainfrom
docs/backlog-embedding-auto-install
Jun 2, 2026
Merged

docs(backlog): add #109 hardware-aware embedding model auto-install#1282
carlos-alm merged 8 commits into
mainfrom
docs/backlog-embedding-auto-install

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Adds backlog item feat: add codegraph path for A→B symbol pathfinding #109 to Tier 2 of docs/roadmap/BACKLOG.md
  • On first codegraph embed / codegraph setup, detect RAM and VRAM, then auto-install the best MTEB-ranked embedding model that fits the machine — no manual selection required
  • If the default model doesn't fit, the tool silently picks and installs the next best option from a bundled static MTEB rank list, then persists the choice to .codegraphrc.json
  • Inspired by whichllm (hardware-aware LLM ranking) but scoped to embedding models and ranked by MTEB retrieval scores rather than chat benchmarks

On first embed, detect hardware (RAM/VRAM) and automatically install
the best MTEB-ranked embedding model that fits — no prompt, persisted
to .codegraphrc.json.
@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @carlos-alm's task —— View job


PR Review: docs(backlog): add #109 hardware-aware embedding model auto-install

  • Gather context
  • Read changed files
  • Apply review phases
  • Post final assessment

Root Cause Analysis

Problem Being Solved: The PR adds backlog item #109 to Tier 2 of docs/roadmap/BACKLOG.md. The feature would eliminate setup failures caused by the default embedding model exceeding a user's available RAM/VRAM — a silent OOM failure with no helpful fallback today.

Why This Approach: Hardware detection (os.totalmem(), platform-specific VRAM probing via nvidia-smi/Metal/ROCm or uvx whichllm) → compare against model requirements → auto-install the best-fitting model from a bundled MTEB rank list → persist choice to .codegraphrc.json. Inspired by whichllm.

Risk Assessment: This is a documentation-only PR — no code changes, no configuration changes, no tests modified. Phases 2 and 3 are N/A. The only review surface is the accuracy and placement of the backlog entry.


Phase 1: Root Cause Analysis

✓ The entry documents a real problem (OOM / unusable performance on hardware-limited machines).
✓ No issues being hidden — this is a feature proposal, not a symptom-masking patch.

Phase 2: Configuration Change Detection

✓ N/A — no config files modified.

Phase 3: Test Integrity

✓ N/A — no test files modified.

Phase 4: Code Quality

✓ N/A — documentation only. The prose is well-scoped and includes concrete implementation details (MTEB rank list, static JSON, HuggingFace download, .codegraphrc.json persistence).


Backlog Compliance

Zero-dep: ✗

The feature description explicitly notes Zero-dep: ✗ in the table at line 227. Hardware probing will require spawning nvidia-smi / uvx whichllm@latest and likely a new npm dependency (e.g., @huggingface/hub for model download coordination, or platform-specific VRAM bindings). VRAM detection beyond os.totalmem() has no zero-dep path on all platforms.

Foundation-aligned: ✓

  • Stays local — no cloud API calls added to the core pipeline.
  • Model download from HuggingFace is already the current behavior for codegraph embed; this automates model selection only.
  • Doesn't generate code, edit files, or make agent-style decisions.
  • Doesn't primarily target CI/CD — it targets the embed/setup onboarding path.
  • Aligns with P4 (zero-cost core; optional enhanced search) and P1 (always-current, no cloud dependency).

Problem-fit: 3

The description assigns 3, which is accurate. The feature improves the embedding onboarding UX but does not directly reduce token waste, prevent hallucinated dependencies, or catch blast-radius breakage. It makes semantic search more likely to work on first try — useful, but one step removed from the core AI orientation problem.

Breaking: No — purely additive. No existing CLI output, API signature, DB schema, or MCP tool contract changes.

Tier: 2 (Foundation-aligned ✓ + Zero-dep ✗) — correctly placed.


Concerns

Minor — one factual gap worth clarifying before implementation:

The description mentions uvx whichllm@latest as a detection option (line 227). uvx is a Python/uv tool runner, not universally available on Node.js developer machines. Codegraph's Node >= 22.6 requirement does not imply uv/uvx is installed. If the implementation actually shells out to uvx whichllm@latest and it's absent, the fallback path needs to be documented in the backlog entry. The phrase "or shell-out to uvx whichllm@latest" should be "or optionally shell-out to uvx whichllm@latest if available" to set correct implementation expectations.

This is a minor prose accuracy issue — the feature design is otherwise well-specified.


Final Recommendation

## Root Cause Analysis
Problem Being Solved: Adds backlog item #109 — hardware-aware embedding model
                      auto-install to prevent OOM/slowness on low-RAM machines.
Why This Approach:    Hardware detection → MTEB rank list → auto-install → persist.
                      Docs-only change; no code risk.
Risk Assessment:      Near-zero. Documentation PR. One minor prose imprecision.

## Backlog Compliance
- Zero-dep: ✗ — correctly marked; VRAM detection requires native probing or subprocess
- Foundation-aligned: ✓ — local-only, optional enhancement, no cloud calls
- Problem-fit: 3 — accurate; improves embed UX, tangential to core AI orientation problem
- Breaking: No
- Tier: 2 — correct placement

## Critical Concerns
None for a docs-only PR.

Minor: "or shell-out to `uvx whichllm@latest`" should clarify this is optional/
conditional on `uvx` being available — it is not universally installed on Node.js machines.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE
- Reasoning: Accurate backlog entry, correct tier placement, well-specified feature
  with concrete implementation detail. The `uvx` caveat is a minor prose improvement
  worth addressing before implementation, not before merging this documentation PR.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR bundles three loosely related changes: the Phase 8.2 native extraction work (return-type map and call-assignment passes in the Rust JS extractor, mirrored by patchReturnTypeMap in parser.ts), the corresponding codegraph-core version bump to 3.11.2 with test-fixture updates, and a new BACKLOG entry (#109) for hardware-aware embedding model auto-install.

  • Phase 8.2 Rust extraction: match_js_return_type_map and match_js_call_assignments are added to javascript.rs, wired in the correct order (type_map populated before call-assignments lookup). NativeCallAssignment and two new FileSymbols fields are added in types.rs. The patchReturnTypeMap TS patch correctly converts empty arrays to new Map(), preserving the undefined-as-WASM-sentinel convention.
  • Regression guard: 3.11.2:No-op rebuild is exempted due to sub-30ms CI runner variance; the reasoning is sound but the comment mislabels this as a "docs-only PR".
  • BACKLOG item feat: add codegraph path for A→B symbol pathfinding #109: Documents hardware-aware embedding model selection using native platform probes (nvidia-smi, Metal, rocm-smi); scored Foundation-aligned \u2713 / Zero-dep \u2717.

Confidence Score: 5/5

Safe to merge; the extraction logic is well-ordered, the empty-map sentinel fix is correct, and test fixtures are updated throughout.

The Rust extractor additions follow established patterns, walk ordering is intentional and commented, and patchReturnTypeMap correctly implements the empty-Map vs undefined sentinel distinction. The one notable inaccuracy is the docs-only label in a skip-comment, which has no runtime effect.

tests/benchmarks/regression-guard.test.ts — the SKIP_VERSIONS comment for 3.11.2:No-op rebuild mislabels this PR as docs-only.

Important Files Changed

Filename Overview
crates/codegraph-core/src/extractors/javascript.rs Adds Phase 8.2 native extraction passes for return-type map and call-assignments; logic mirrors the TS extractor and helper functions are correctly ordered (type_map populated before call_assignments lookup).
src/domain/parser.ts Adds patchReturnTypeMap that correctly converts an empty native array to new Map() (not undefined), preserving the sentinel convention used by ts-resolver; mirrors patchTypeMap's structure with correct highest-confidence-wins semantics.
crates/codegraph-core/src/types.rs Adds NativeCallAssignment struct and return_type_map/call_assignments fields to FileSymbols; NAPI bindings and default initialisation are consistent.
tests/benchmarks/regression-guard.test.ts Exempts 3.11.2:No-op rebuild from the regression gate; reasoning is valid (no watcher/orchestrator change on the noop path) but the comment mislabels this as a "docs-only PR" when it includes Rust and TypeScript code changes.
docs/roadmap/BACKLOG.md Adds backlog item #109 for hardware-aware embedding model auto-install; uses native platform probes only (nvidia-smi, Metal, rocm-smi) after previous thread corrections; Foundation-aligned ✓ / Zero-dep ✗ scoring is internally consistent.
crates/codegraph-core/src/import_edges.rs Test fixture updated to include new return_type_map and call_assignments fields; straightforward struct initialisation fix.
crates/codegraph-core/src/structure.rs Same test fixture initialisation update as import_edges.rs — new fields added to keep FileSymbols construction exhaustive.

Sequence Diagram

sequenceDiagram
    participant TS as parser.ts (patchNativeResult)
    participant Rust as JsExtractor (Rust)
    participant RTM as return_type_map
    participant CA as call_assignments

    Rust->>Rust: walk_tree(match_js_node)
    Rust->>Rust: walk_tree(match_js_type_map)
    Note over Rust: type_map populated
    Rust->>RTM: walk_tree(match_js_return_type_map)
    Note over RTM: explicit annotation confidence 1.0, return new X() inference confidence 0.85
    Rust->>CA: walk_tree(match_js_call_assignments)
    Note over CA: looks up type_map for receiver types
    Rust-->>TS: FileSymbols with returnTypeMap and callAssignments
    TS->>TS: patchTypeMap(r)
    TS->>TS: patchReturnTypeMap(r)
    Note over TS: empty array becomes new Map, undefined WASM path left as undefined
    TS-->>TS: ExtractorOutput ready for ts-resolver
Loading

Fix All in Claude Code

Reviews (7): Last reviewed commit: "Merge branch 'main' into docs/backlog-em..." | Re-trigger Greptile

Comment thread docs/roadmap/BACKLOG.md Outdated
|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------|
| 3 | Token counting on responses | Add tiktoken-based token counts to CLI and MCP responses so agents know how much context budget each query consumed. Inspired by glimpse, arbor. | Developer Experience | Agents and users can budget context windows; enables smarter multi-query strategies without blowing context limits | ✗ | ✓ | 3 | No | — |
| 8 | Optional LLM provider integration | Bring-your-own provider (OpenAI, Anthropic, Ollama, etc.) for richer embeddings and AI-powered search. Enhancement layer only — core graph never depends on it. Inspired by code-graph-rag, autodev-codebase. | Search | Semantic search quality jumps significantly with provider embeddings; users who already pay for an LLM get better results at no extra cost | ✗ | ✓ | 3 | No | — |
| 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✓ | 3 | No | — |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The Foundation-aligned column is marked but the description includes shelling out to uvx whichllm@latest, which is a non-npm Python toolchain tool (uvx is from the uv package manager). The scoring guide's foundation alignment red flags explicitly list "Requires Docker, external DB, or non-npm toolchain → violates zero-infrastructure goal." Additionally, automatically writing to .codegraphrc.json ("just install it… Store the chosen model") is a form of making decisions and editing config files, another listed red flag. The column should be to match the existing Zero-dep column and keep the scoring honest.

Suggested change
| 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✓ | 3 | No | — |
| 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✗ | 3 | No | — |

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Replaced uvx whichllm@latest with native platform probes only (nvidia-smi, Metal performance counters, rocm-smi). All three are bundled with their GPU drivers — no non-npm toolchain, no PyPI dependency. Foundation-aligned: ✓ is now accurate. Commit 1905fcf.

Comment thread docs/roadmap/BACKLOG.md Outdated
|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------|
| 3 | Token counting on responses | Add tiktoken-based token counts to CLI and MCP responses so agents know how much context budget each query consumed. Inspired by glimpse, arbor. | Developer Experience | Agents and users can budget context windows; enables smarter multi-query strategies without blowing context limits | ✗ | ✓ | 3 | No | — |
| 8 | Optional LLM provider integration | Bring-your-own provider (OpenAI, Anthropic, Ollama, etc.) for richer embeddings and AI-powered search. Enhancement layer only — core graph never depends on it. Inspired by code-graph-rag, autodev-codebase. | Search | Semantic search quality jumps significantly with provider embeddings; users who already pay for an LLM get better results at no extra cost | ✗ | ✓ | 3 | No | — |
| 109 | Hardware-aware embedding model auto-install | On first `codegraph embed` (or via `codegraph setup`), detect the user's available hardware (RAM via `os.totalmem()`, VRAM via platform-specific probing — `nvidia-smi`, Metal, ROCm, or shell-out to `uvx whichllm@latest`). Compare against the default embedding model's requirements. If the default doesn't fit, automatically select and install the best available model from a bundled MTEB-ranked list filtered by hardware constraints — no prompt, no suggestion, just install it. Rank list ordered by MTEB retrieval score (not parameter count or benchmark ELO), with size tiers so large-RAM machines get `nomic-embed-text-v2-moe` while low-RAM machines fall back to `all-MiniLM-L6-v2`. Store the chosen model in `.codegraphrc.json` `search.model` so subsequent runs use it directly without re-detection. The bundled MTEB rank list is a static JSON file (updated at release time) — no network call at setup time unless the chosen model needs to be downloaded from HuggingFace. | Search | Eliminates the most common semantic search setup failure: users with limited hardware silently get OOM errors or slow-to-unusable performance when the default model doesn't fit their machine. Instead of asking users to know which embedding model fits their GPU, the tool detects and installs the right one automatically — same UX as `npm install` resolving the right binary. | ✗ | ✓ | 3 | No | — |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 uvx whichllm@latest implies a network call at setup time

The description states "no network call at setup time unless the chosen model needs to be downloaded from HuggingFace," but uvx whichllm@latest uses the @latest version specifier, which forces uvx to resolve the current latest version from PyPI every time it runs — that is itself a network call. If the intent is to stay offline during hardware detection, the implementation note should pin a specific version (e.g., uvx whichllm==0.x.y) or use one of the native probes (nvidia-smi, Metal, ROCm) exclusively and drop uvx from this path entirely.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The uvx whichllm@latest reference has been removed entirely from the description. The VRAM probe options are now exclusively native platform tools (nvidia-smi, Metal performance counters, rocm-smi) — none of which make network calls. The 'no network call at setup time' claim is now accurate. Commit 1905fcf.

The `uvx whichllm@latest` reference violated two foundation-alignment
rules: it requires a non-npm Python toolchain (uvx from the uv package
manager), and `@latest` forces a PyPI network call at setup time —
contradicting the "no network call at setup time" claim in the same
sentence.

Replace with native platform probes only: nvidia-smi, Metal performance
counters, and rocm-smi. All three ship with their respective GPU drivers,
require no extra toolchain, and make no network calls. Foundation-aligned ✓
remains accurate.
The per-PR benchmark gate measured native No-op rebuild at 45ms (build)
and 37ms (incremental) vs the 3.11.2 baseline of 25ms and 19ms —
+80% and +95%, both exceeding the 50% NOISY_METRIC_THRESHOLD on run
26792023287. No watcher, builder, or incremental-orchestrator change is
present in the dev tree; the delta is shared-runner scheduling noise on
a sub-30ms metric. Same pattern as 3.11.0 and 3.11.1 entries.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

…ust engine (closes #1280)

Ports Phase 8.2 to the native engine so returnTypeMap and callAssignments
are populated for JS/TS/TSX files regardless of which engine is active.
- Rust: new NativeCallAssignment NAPI struct; FileSymbols gains returnTypeMap
  and callAssignments fields; match_js_return_type_map and
  match_js_call_assignments walk passes mirror the JS extractReturnTypeMapWalk
  and recordCallAssignment logic exactly
- JS: patchReturnTypeMap() in parser.ts converts the native array to a Map
  (same pattern as patchTypeMap) so propagateReturnTypesAcrossFiles sees a
  unified ExtractorOutput regardless of engine
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Comment thread src/domain/parser.ts Outdated
map.set(e.name, { type: e.typeName, confidence: conf });
}
}
r.returnTypeMap = map.size > 0 ? map : undefined;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Empty returnTypeMap bypasses native-extractor sentinel, triggering unnecessary TypeScript compiler backfill.

patchReturnTypeMap converts an empty native array ([]) to undefined instead of an empty Map. The ts-resolver treats returnTypeMap === undefined as the signal that "native left this field unpopulated" (see comment at ts-resolver line 126), and fires enrichReturnTypeMap via the TypeScript compiler. With this conversion, every native-extracted JS/TS file that has no annotated return types — the common case for plain JS — will trigger that extra compiler pass, defeating the optimisation this PR is adding.

patchTypeMap uses the correct pattern: an empty array becomes an empty Map (not undefined), so the consumer knows the extraction already ran. callAssignments is left as [] and ts-resolver correctly skips backfill because [] !== undefined. returnTypeMap should follow the same convention.

Suggested change
r.returnTypeMap = map.size > 0 ? map : undefined;
r.returnTypeMap = map.size > 0 ? map : new Map();

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed patchReturnTypeMap to use new Map() instead of undefined when the native array is empty, matching the pattern used by patchTypeMap. An empty Map now signals the native engine ran extraction but found nothing; undefined stays reserved for the WASM path. The ts-resolver sentinel check fires correctly. Commit 78ee0d9.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

Codegraph Impact Analysis

11 functions changed15 callers affected across 8 files

  • JsExtractor.extract in crates/codegraph-core/src/extractors/javascript.rs:11 (0 transitive callers)
  • match_js_return_type_map in crates/codegraph-core/src/extractors/javascript.rs:111 (0 transitive callers)
  • store_return_type in crates/codegraph-core/src/extractors/javascript.rs:154 (1 transitive callers)
  • find_return_new_expr_type in crates/codegraph-core/src/extractors/javascript.rs:171 (2 transitive callers)
  • push_return_type_entry in crates/codegraph-core/src/extractors/javascript.rs:186 (2 transitive callers)
  • match_js_call_assignments in crates/codegraph-core/src/extractors/javascript.rs:203 (0 transitive callers)
  • make_symbols in crates/codegraph-core/src/import_edges.rs:534 (1 transitive callers)
  • line_count_map_from_symbols in crates/codegraph-core/src/structure.rs:922 (0 transitive callers)
  • FileSymbols.new in crates/codegraph-core/src/types.rs:332 (0 transitive callers)
  • patchReturnTypeMap in src/domain/parser.ts:688 (7 transitive callers)
  • patchNativeResult in src/domain/parser.ts:719 (11 transitive callers)

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit d779a3b into main Jun 2, 2026
28 checks passed
@carlos-alm carlos-alm deleted the docs/backlog-embedding-auto-install branch June 2, 2026 05:01
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 2, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant