Merged
Set up the code-index project with uv and hatchling, including all dependencies (tree-sitter, faiss-cpu, sentence-transformers, networkx, cdlib, mcp, click). Create the package structure under src/code_index with subpackages for graph, storage, ingestion, embeddings, and query.

Implement core modules:
- config.py: constants for index paths, embedding settings, graph schema
- graph/models.py: Node/Edge dataclasses and KnowledgeGraph with O(1) lookup and indexed edge iteration
- storage/db.py: full IndexDatabase class with SQLite WAL mode, FTS5 search, batch insert, and graph serialization
- cli.py: click-based CLI entry point

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
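To make "O(1) lookup and indexed edge iteration" concrete, here is a minimal sketch of how such a graph could be laid out. It is not the code in graph/models.py; the field names, the string-typed labels, and the `out_edges`/`in_edges` method names are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    label: str  # assumed label names, e.g. "Function", "Class", "File"
    name: str


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: str  # e.g. "CALLS", "IMPORTS"


class KnowledgeGraph:
    """Hypothetical layout: a dict for nodes, plus edge lists keyed by (node id, edge type)."""

    def __init__(self):
        self._nodes = {}
        self._out = defaultdict(list)
        self._in = defaultdict(list)

    def add_node(self, node):
        self._nodes[node.id] = node

    def add_edge(self, edge):
        # Index each edge twice so both directions can be iterated without a scan.
        self._out[(edge.source, edge.type)].append(edge)
        self._in[(edge.target, edge.type)].append(edge)

    def get_node(self, node_id):
        # Average O(1) dict lookup; returns None for unknown ids.
        return self._nodes.get(node_id)

    def out_edges(self, node_id, edge_type):
        # .get avoids creating empty entries in the defaultdict on a miss.
        return iter(self._out.get((node_id, edge_type), ()))

    def in_edges(self, node_id, edge_type):
        return iter(self._in.get((node_id, edge_type), ()))
```

Keying the edge index by `(node id, edge type)` means a traversal that only cares about one edge type never touches the others.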
Implement EmbeddingEngine class (Step 2B) that vectorizes code graph nodes using sentence-transformers and indexes them with FAISS IndexFlatIP for cosine similarity search. Supports batch embedding, progress callbacks, persistence to disk, and id_map recovery from SQLite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
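An inner-product index returns cosine similarity only when the vectors are L2-normalized first. The sketch below shows that relationship in plain Python, with a list of `(node_id, vector)` pairs standing in for the FAISS index. The function names are hypothetical and this is not the EmbeddingEngine code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(index, query, k):
    """index: list of (node_id, unit vector). Returns the top-k (node_id, score) pairs.

    With unit-length vectors on both sides, the inner product equals the cosine.
    """
    q = l2_normalize(query)
    scored = [(node_id, inner_product(q, vec)) for node_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A brute-force scan like this is what a flat index does conceptually; the real index does the same comparison in optimized native code.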
- Ingestion parsers: filesystem walker, parser_base types, Python parser, TypeScript parser
- Plugin config: plugin.json, .mcp.json
- Skills: guide, explore-code, search-code, impact-analysis, debug-code, index-codebase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s (Wave 3)

- Import resolver with Python/TS resolution and tiered SymbolTable
- Heritage processor: EXTENDS, IMPLEMENTS, HAS_METHOD, OVERRIDES edges
- Call processor: CALLS edges with confidence-scored name resolution
- Community processor: Leiden clustering with NetworkX fallback
- Process detector: BFS execution flow tracing from entry points

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pipeline: 10-phase indexing from file discovery to embedding generation
- Query engine: hybrid search (BM25+semantic+RRF), context, impact analysis, change detection, coordinated rename, cluster/process views
- Fulltext, semantic, and hybrid search modules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MCP server: 7 tools (query, context, impact, detect_changes, rename, list_repos, get_clusters) and 7 resources with lazy initialization
- CLI: analyze, status, clean, list, search, mcp commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Test fixtures: sample Python and TypeScript projects
- Unit tests: graph models, parsers, import resolver, RRF merge
- Integration tests: full pipeline + query engine on sample fixtures
- Updated query engine with improved structure and logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RRF merge stores scores under 'score' key but query() was reading 'rrf_score'. Found by audit agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
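For context on the key mismatch, reciprocal rank fusion can be sketched as below: each result list contributes 1/(k + rank) per document, and the merged rows carry the fused value under a single key. The function name, the row shape, and the constant `k=60` (a commonly used default) are assumptions; only the `'score'` key name comes from the commit message above.

```python
def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of ids into one list of {'id', 'score'} rows.

    rankings: iterable of lists, each ordered best-first (e.g. BM25 hits, semantic hits).
    k: smoothing constant; 60 is a conventional choice, assumed here.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    rows = [{"id": doc_id, "score": score} for doc_id, score in scores.items()]
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows
```

A caller that reads `row["rrf_score"]` from these rows would raise `KeyError` or, with `.get`, silently see no score, which is the class of bug the commit describes.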
- Fix tree-sitter 0.25 API: use QueryCursor instead of deprecated Query.matches()
- Fix TypeScript parser: use tree_sitter_javascript for JS, fix heritage parsing
- Fix db.py: use FTS5 rebuild command, allow cross-thread SQLite access
- Fix test assertions to match actual API responses

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- import_resolver: Fix File node ID mismatch causing zero IMPORTS edges (was using full path as name instead of filename)
- import_resolver: Move tsconfig loading outside per-import loop
- server.py: Fix KeyError crashes from process dict key mismatches (process_name vs name, matched_symbols vs symbols)
- cli.py: Fix same dict key mismatches in search display
- test_integration: Add db.close() teardown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tries local_files_only=True first, falls back to download if not cached. Eliminates repeated HF Hub requests and rate limit warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
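The cache-first pattern described here can be sketched independently of any model library. `load_model` below is a hypothetical callable that accepts a `local_files_only` keyword. Which exception a real loader raises on a cache miss varies by library, so catching `OSError` is an assumption.

```python
def load_cached_first(load_model, model_name):
    """Try the local cache first; only touch the network if the cache misses.

    load_model: hypothetical callable(model_name, local_files_only=bool).
    """
    try:
        # No network request is made on this path.
        return load_model(model_name, local_files_only=True)
    except OSError:
        # Assumed cache-miss signal; fall back to a normal download.
        return load_model(model_name, local_files_only=False)
```

After the first successful download, every later call takes the first branch, which is what removes the repeated Hub requests.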
- eval_communities: cohesion, separation, file coherence, size balance, naming
- eval_processes: entry point quality, cross-file, length balance, uniqueness, coverage
- eval_search: hit rate, rank quality, anti-pattern avoidance, test-file penalty
- run_evals: combined report runner with weighted overall scores

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Communities:
- Add IMPORTS to structural edges (was missing cross-file signal)
- Stop pruning degree-1 nodes (killed coverage)
- Lower Leiden resolution (1.0/0.8 vs 2.0/1.0) for finer clusters
- Add orphan assignment: unassigned symbols join nearest community by directory

Processes:
- Increase max_processes 75->500, max_branching 4->8, max_trace_depth 10->12
- Lower min_steps 3->2 to capture short but real flows
- Broaden entry point regex (HTTP methods, CRUD, webhooks, tests)
- Allow entry points with 1-2 callers (not just zero)

Results on ~/decoda:
- Community coverage: 42.7% -> 99.6%
- Communities: 141 -> 617
- Cohesion: 0.692 -> 0.941
- Processes: 40 -> 181
- Overall: 0.786 -> 0.807

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
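One way to read "unassigned symbols join nearest community by directory" is sketched below: an orphan takes the most common community in its own directory, walking up to parent directories until one has assigned symbols. This interpretation, the function name, and the input shapes are all assumptions, not the community processor's actual logic.

```python
import posixpath
from collections import Counter


def assign_orphans(symbol_files, communities):
    """symbol_files: symbol -> repo-relative file path.
    communities: symbol -> community id, for already-assigned symbols only.
    Returns a new mapping that also covers orphans where a directory match exists.
    """
    # Count community membership per directory.
    by_dir = {}
    for symbol, community in communities.items():
        directory = posixpath.dirname(symbol_files[symbol])
        by_dir.setdefault(directory, Counter())[community] += 1

    result = dict(communities)
    for symbol, path in symbol_files.items():
        if symbol in result:
            continue
        directory = posixpath.dirname(path)
        while True:
            if directory in by_dir:
                result[symbol] = by_dir[directory].most_common(1)[0][0]
                break
            parent = posixpath.dirname(directory)
            if parent == directory:  # reached the top without a match
                break
            directory = parent
    return result
```

Symbols with no assigned neighbor anywhere up the directory chain stay unassigned in this sketch.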
Reads test patterns from:
- pyproject.toml [tool.pytest.ini_options] testpaths + norecursedirs
- pytest.ini, setup.cfg
- jest.config.js/ts testPathIgnorePatterns
- package.json jest.testMatch
- vitest.config.ts include patterns
- Falls back to common conventions (test_*, *.test.ts, *.spec.ts, __tests__/)

On decoda: 551 test files excluded (11.3%), indexing 40% faster (83s vs 149s), search score 0.879 -> 0.913 (zero test files in results), process grouping 12.5% -> 50%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
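The fallback conventions listed above can be sketched as a simple path check. This covers only the fallback patterns named in the commit message, not the config-file readers, and the function and constant names are hypothetical.

```python
import fnmatch

# Filename globs and directory names taken from the fallback list above.
FALLBACK_FILE_PATTERNS = ("test_*", "*.test.ts", "*.spec.ts")
FALLBACK_TEST_DIRS = ("__tests__",)


def is_test_file(rel_path):
    """Return True if a repo-relative, '/'-separated path looks like a test file."""
    parts = rel_path.split("/")
    if any(part in FALLBACK_TEST_DIRS for part in parts[:-1]):
        return True
    filename = parts[-1]
    # fnmatchcase keeps matching case-sensitive on every platform.
    return any(fnmatch.fnmatchcase(filename, pat) for pat in FALLBACK_FILE_PATTERNS)
```

Matching on the filename rather than the whole path keeps a directory such as `test_data/` from excluding ordinary modules by accident.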
After the initial BFS from entry points, a second pass picks up uncovered nodes by tracing backward to a root then forward to a terminal. This ensures more of the call graph participates in processes.

On decoda: process coverage 1.3% -> 9.4%, processes 178 -> 1964, overall score 0.810 -> 0.839, search score 0.913 -> 0.980

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
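A simplified sketch of the second pass: starting from an uncovered node, follow callers back until none remain, then follow callees forward from the original node until none remain. Always taking the first unvisited neighbor and capping the path at `max_depth` edges are simplifications assumed here; the function name and adjacency-dict inputs are hypothetical.

```python
def trace_through(node, callers, callees, max_depth=12):
    """Build one path root -> ... -> node -> ... -> terminal that covers `node`.

    callers / callees: dicts mapping a symbol to the symbols that call it / it calls.
    The whole path is capped at max_depth edges (an assumed limit).
    """
    seen = {node}

    # Walk backward from the node toward a root (a symbol with no unvisited caller).
    backward = [node]
    while len(backward) <= max_depth:
        preds = [p for p in callers.get(backward[-1], ()) if p not in seen]
        if not preds:
            break
        backward.append(preds[0])
        seen.add(preds[0])

    # Reverse so the path starts at the root, then extend forward to a terminal.
    path = backward[::-1]
    while len(path) <= max_depth:
        succs = [s for s in callees.get(path[-1], ()) if s not in seen]
        if not succs:
            break
        path.append(succs[0])
        seen.add(succs[0])
    return path
```

The `seen` set guarantees termination on cyclic call graphs, since no symbol is added to the path twice.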
…solution

Root cause: resolver used absolute paths internally but graph stores relative paths, so nothing matched. Also only checked repo-root tsconfig, missing monorepo subdir configs (console/tsconfig.json).

Fixes:
- All path matching now uses repo-relative strings
- .resolve() on relative imports to collapse ../
- Monorepo tsconfig discovery (scans subdirs for tsconfig.json)
- @/* path alias resolution with correct base_dir
- Import-flow pass in process detector for call-graph-isolated symbols

Results on decoda:
- IMPORTS edges: 1,727 -> 22,478 (13x)
- TS import resolution: 0% -> 79%
- Overall: 0.810 -> 0.842

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
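The two path fixes can be illustrated with pure string handling on repo-relative paths. This sketch uses `posixpath.normpath` to collapse `../` rather than the `.resolve()` call the commit mentions, so it never touches the filesystem. The function names and the `alias_target` mapping of `@/` to `src` are hypothetical.

```python
import posixpath


def resolve_relative_import(importer, specifier):
    """Resolve './x' or '../x' against the importing file, staying repo-relative."""
    base = posixpath.dirname(importer)
    return posixpath.normpath(posixpath.join(base, specifier))


def resolve_alias(specifier, base_dir, alias_target, prefix="@/"):
    """Resolve an '@/...' alias relative to the directory holding the tsconfig.

    base_dir: repo-relative directory of the tsconfig (e.g. 'console').
    alias_target: directory the alias maps to under base_dir (assumed, e.g. 'src').
    Returns None when the specifier does not use the alias.
    """
    if not specifier.startswith(prefix):
        return None
    rest = specifier[len(prefix):]
    return posixpath.normpath(posixpath.join(base_dir, alias_target, rest))
```

Because both functions return repo-relative strings, their results can be compared directly against node paths stored in the graph.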
When multiple candidates match a call, prefer the one from an imported file, then same directory, instead of giving up. This resolves internal helpers that were invisible because their name existed in multiple files.

On decoda:
- CALLS edges: 21,998 -> 54,766 (2.5x)
- Search process grouping: 50% -> 100%
- Overall: 0.842 -> 0.848

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
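The preference order can be sketched as a tiered filter over the files that define a symbol with the called name. Requiring a unique match within a tier is a conservative choice made for this sketch; the function name and argument shapes are hypothetical.

```python
import posixpath


def pick_candidate(caller_file, candidate_files, imported_files):
    """Choose which definition a call most likely refers to.

    caller_file: repo-relative path of the file containing the call.
    candidate_files: files that each define a symbol with the called name.
    imported_files: set of files the caller imports.
    Returns the chosen file, or None if the call stays ambiguous.
    """
    if not candidate_files:
        return None
    if len(candidate_files) == 1:
        return candidate_files[0]

    caller_dir = posixpath.dirname(caller_file)
    tiers = (
        # Tier 1: definitions in files the caller imports.
        [f for f in candidate_files if f in imported_files],
        # Tier 2: definitions in the caller's own directory.
        [f for f in candidate_files if posixpath.dirname(f) == caller_dir],
    )
    for tier in tiers:
        if len(tier) == 1:  # only accept an unambiguous winner (assumption)
            return tier[0]
    return None
```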
Fixes impact latency regression from 2.5x more CALLS edges. Before: 1,092ms avg. After: 57ms avg (19x faster). Also skips MEMBER_OF/STEP_IN_PROCESS/CONTAINS during BFS traversal and collects process participation inline instead of a second pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced SQLite-backed BFS with in-memory KnowledgeGraph lookups. Removed the 200-node cap: full traversal now runs in <1ms because dict lookups are ~100x faster than SQLite queries.

Results: _send_email_notification now shows 416 affected symbols (was 82 with cap), 0ms (was 163ms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
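A minimal sketch of an uncapped, in-memory impact traversal: a breadth-first walk over incoming edges held in a dict, skipping the three structural edge types named in the previous commit. The function name and the `incoming` adjacency shape are assumptions, not the engine's actual data structures.

```python
from collections import deque

# Edge types skipped during traversal, as listed in the preceding commit message.
SKIPPED_EDGE_TYPES = {"MEMBER_OF", "STEP_IN_PROCESS", "CONTAINS"}


def affected_symbols(incoming, start):
    """Return every symbol that transitively depends on `start`.

    incoming: dict mapping a node id to a list of (edge_type, source_id) pairs.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for edge_type, source in incoming.get(current, ()):
            if edge_type in SKIPPED_EDGE_TYPES or source in seen:
                continue
            seen.add(source)
            queue.append(source)
    seen.discard(start)  # the start symbol is not its own dependent
    return seen
```

Each node is enqueued at most once, so the walk is linear in the number of reachable nodes and edges, with no per-step database round trip.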
- Added `fastembed` version 0.8.0 to the project dependencies in `pyproject.toml`.
- Updated `uv.lock` to include `fastembed` and its dependencies.
- Enhanced CLI output to display matched symbols and updated scoring metrics in `cli.py`.
- Refined impact and change result formatting in `server.py` to improve clarity and detail.
- Adjusted integration test setup to ensure proper database closure after tests.

These changes improve dependency management and enhance the output of the CLI and server functionalities.
- Introduced `EdgeType` and `NodeLabel` enums to standardize edge and node types across the codebase.
- Updated `config.py` to include new constants for confidence thresholds and community detection parameters.
- Created `server_formatters.py` to handle markdown formatting for various server responses, improving output clarity.
- Refactored existing code to utilize the new enums and constants, enhancing maintainability and readability.

These changes streamline the configuration management and improve the server's response formatting capabilities.
- Modified the `.mcp.json` file to specify a new directory for the code index server.
- Updated `server.py` to include initialization options when running the server, improving its setup process.
- Changed the scoring metric in `engine.py` to use `rrf_score` instead of `score`, ensuring more accurate results.
- Introduced a new static method in `db.py` to sanitize FTS queries, enhancing search reliability and preventing syntax errors.

These changes improve the configuration management and search capabilities of the code index system.
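One common way to keep user input from tripping FTS5 query syntax is to wrap every whitespace-separated token in double quotes, doubling any embedded quote, so operators and punctuation are treated as literal text. The sketch below shows that approach. It is not the static method added to `db.py`, and the function name is hypothetical.

```python
def sanitize_fts_query(raw):
    """Turn free-form input into a sequence of quoted FTS5 string tokens.

    Each token becomes a double-quoted FTS5 string, with embedded double quotes
    escaped by doubling. Returns '' for blank input; callers should skip the
    MATCH query in that case.
    """
    tokens = raw.split()
    return " ".join('"' + token.replace('"', '""') + '"' for token in tokens)
```

Quoting every token trades away operator support (`AND`, `NEAR`, prefix `*`) in exchange for queries that always parse.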
…on and debugging

- Deleted the `debug-code`, `explore-code`, `guide`, `impact-analysis`, `index-codebase`, and `search-code` skills as they were redundant or outdated.
- Added new skills: `discover`, `find`, `investigate`, `review`, and `setup` to enhance code exploration, debugging, and review processes.
- Updated the knowledge graph configuration to reflect new node and edge types, improving the overall structure and functionality of the code index.
- Enhanced search capabilities with semantic search and improved dependency tracing for better debugging and impact analysis.

These changes streamline the workflow for developers, making it easier to explore codebases and assess changes effectively.