initial commit by nnourr · Pull Request #1 · nnourr/code-index

nnourr · 2026-04-14T19:01:05Z

No description provided.

Set up the code-index project with uv and hatchling, including all dependencies (tree-sitter, faiss-cpu, sentence-transformers, networkx, cdlib, mcp, click). Create the package structure under src/code_index with subpackages for graph, storage, ingestion, embeddings, and query.

Implement core modules:
- config.py: constants for index paths, embedding settings, graph schema
- graph/models.py: Node/Edge dataclasses and KnowledgeGraph with O(1) lookup and indexed edge iteration
- storage/db.py: full IndexDatabase class with SQLite WAL mode, FTS5 search, batch insert, and graph serialization
- cli.py: click-based CLI entry point

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
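For readers skimming the PR, here is a minimal sketch of what "O(1) lookup and indexed edge iteration" could look like. It is not the code in graph/models.py; the field names, the method names, and the `(node_id, edge_type)` index key are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    label: str  # hypothetical values: "Function", "Class", "File"
    name: str


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: str  # hypothetical values: "CALLS", "IMPORTS"


class KnowledgeGraph:
    """Illustrative graph: dict-backed node lookup plus per-type edge indexes."""

    def __init__(self):
        self._nodes = {}                # node id -> Node, average O(1) lookup
        self._out = defaultdict(list)   # (source id, edge type) -> edges
        self._in = defaultdict(list)    # (target id, edge type) -> edges

    def add_node(self, node):
        self._nodes[node.id] = node

    def add_edge(self, edge):
        self._out[(edge.source, edge.type)].append(edge)
        self._in[(edge.target, edge.type)].append(edge)

    def get_node(self, node_id):
        return self._nodes.get(node_id)

    def outgoing(self, node_id, edge_type):
        # .get avoids creating empty entries in the defaultdict on a miss
        return list(self._out.get((node_id, edge_type), ()))

    def incoming(self, node_id, edge_type):
        return list(self._in.get((node_id, edge_type), ()))
```

Indexing edges by both endpoint and type means a traversal that only cares about one edge type never has to scan unrelated edges.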

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement EmbeddingEngine class (Step 2B) that vectorizes code graph nodes using sentence-transformers and indexes them with FAISS IndexFlatIP for cosine similarity search. Supports batch embedding, progress callbacks, persistence to disk, and id_map recovery from SQLite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
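An inner-product index can serve cosine similarity because the inner product of two unit-length vectors equals their cosine. The sketch below shows that idea in plain Python as a stand-in; it does not call FAISS or sentence-transformers, and the `search` helper and its `id_map` list are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(index, id_map, query, top_k=3):
    """Brute-force inner-product search over pre-normalized vectors.

    index:  list of unit-length vectors
    id_map: id_map[i] is the node id for index[i] (assumed layout)
    """
    q = normalize(query)
    scored = [(inner_product(q, vec), id_map[i]) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: -pair[0])
    return scored[:top_k]
```

Because both the stored vectors and the query are normalized, the returned scores fall in [-1, 1] and can be read directly as cosine similarities.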

- Ingestion parsers: filesystem walker, parser_base types, Python parser, TypeScript parser
- Plugin config: plugin.json, .mcp.json
- Skills: guide, explore-code, search-code, impact-analysis, debug-code, index-codebase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…s (Wave 3)

- Import resolver with Python/TS resolution and tiered SymbolTable
- Heritage processor: EXTENDS, IMPLEMENTS, HAS_METHOD, OVERRIDES edges
- Call processor: CALLS edges with confidence-scored name resolution
- Community processor: Leiden clustering with NetworkX fallback
- Process detector: BFS execution flow tracing from entry points

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Pipeline: 10-phase indexing from file discovery to embedding generation
- Query engine: hybrid search (BM25+semantic+RRF), context, impact analysis, change detection, coordinated rename, cluster/process views
- Fulltext, semantic, and hybrid search modules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
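The hybrid search above combines a BM25 ranking and a semantic ranking with reciprocal rank fusion (RRF). As a rough sketch of the technique, assuming each ranker returns an ordered list of node ids: the function name, the tuple return shape, and the constant `k=60` (a commonly used default) are assumptions, not values taken from this PR.

```python
def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (id, score) pairs, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

RRF only looks at ranks, so BM25 scores and cosine similarities never need to be put on a common scale before merging.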

- MCP server: 7 tools (query, context, impact, detect_changes, rename, list_repos, get_clusters) and 7 resources with lazy initialization
- CLI: analyze, status, clean, list, search, mcp commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Test fixtures: sample Python and TypeScript projects
- Unit tests: graph models, parsers, import resolver, RRF merge
- Integration tests: full pipeline + query engine on sample fixtures
- Updated query engine with improved structure and logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The RRF merge stores scores under 'score' key but query() was reading 'rrf_score'. Found by audit agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix tree-sitter 0.25 API: use QueryCursor instead of deprecated Query.matches()
- Fix TypeScript parser: use tree_sitter_javascript for JS, fix heritage parsing
- Fix db.py: use FTS5 rebuild command, allow cross-thread SQLite access
- Fix test assertions to match actual API responses

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- import_resolver: Fix File node ID mismatch causing zero IMPORTS edges (was using full path as name instead of filename)
- import_resolver: Move tsconfig loading outside per-import loop
- server.py: Fix KeyError crashes from process dict key mismatches (process_name vs name, matched_symbols vs symbols)
- cli.py: Fix same dict key mismatches in search display
- test_integration: Add db.close() teardown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Tries local_files_only=True first, falls back to download if not cached. Eliminates repeated HF Hub requests and rate limit warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- eval_communities: cohesion, separation, file coherence, size balance, naming
- eval_processes: entry point quality, cross-file, length balance, uniqueness, coverage
- eval_search: hit rate, rank quality, anti-pattern avoidance, test-file penalty
- run_evals: combined report runner with weighted overall scores

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Communities:
- Add IMPORTS to structural edges (was missing cross-file signal)
- Stop pruning degree-1 nodes (killed coverage)
- Lower Leiden resolution (1.0/0.8 vs 2.0/1.0) for finer clusters
- Add orphan assignment: unassigned symbols join nearest community by directory

Processes:
- Increase max_processes 75->500, max_branching 4->8, max_trace_depth 10->12
- Lower min_steps 3->2 to capture short but real flows
- Broaden entry point regex (HTTP methods, CRUD, webhooks, tests)
- Allow entry points with 1-2 callers (not just zero)

Results on ~/decoda:
- Community coverage: 42.7% -> 99.6%
- Communities: 141 -> 617
- Cohesion: 0.692 -> 0.941
- Processes: 40 -> 181
- Overall: 0.786 -> 0.807

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
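One possible reading of the orphan assignment described above, sketched in Python. It assumes "nearest by directory" means the closest ancestor directory that already holds assigned symbols, and that the majority community in that directory wins; both rules and the function name are assumptions, not confirmed by the commit.

```python
from collections import Counter
from pathlib import PurePosixPath


def assign_orphans(symbol_files, communities):
    """Attach unassigned symbols to a community based on directory.

    symbol_files: symbol id -> repo-relative file path
    communities:  symbol id -> community id (already-assigned symbols only)
    Returns a new mapping that also covers orphans where a match exists.
    """
    # Count communities per directory, using assigned symbols only.
    by_dir = {}
    for symbol, community in communities.items():
        directory = PurePosixPath(symbol_files[symbol]).parent
        by_dir.setdefault(directory, Counter())[community] += 1

    result = dict(communities)
    for symbol, file_path in symbol_files.items():
        if symbol in result:
            continue
        directory = PurePosixPath(file_path).parent
        # Try the symbol's own directory first, then walk up the tree.
        for candidate in (directory, *directory.parents):
            if candidate in by_dir:
                result[symbol] = by_dir[candidate].most_common(1)[0][0]
                break
    return result
```

Symbols with no assigned neighbor anywhere up the directory chain stay unassigned in this sketch, which would leave coverage slightly below 100%.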

Reads test patterns from:
- pyproject.toml [tool.pytest.ini_options] testpaths + norecursedirs
- pytest.ini, setup.cfg
- jest.config.js/ts testPathIgnorePatterns
- package.json jest.testMatch
- vitest.config.ts include patterns
- Falls back to common conventions (test_*, *.test.ts, *.spec.ts, __tests__/)

On decoda:
- 551 test files excluded (11.3%)
- Indexing time: 149s -> 83s (about 44% less)
- Search score: 0.879 -> 0.913 (zero test files in results)
- Process grouping: 12.5% -> 50%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
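The fallback conventions listed above can be approximated with glob matching. This sketch covers only the fallback case (no config file parsing), and the constant and function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Fallback conventions named in the commit message.
FALLBACK_FILE_PATTERNS = ("test_*", "*.test.ts", "*.spec.ts")
FALLBACK_DIR_NAMES = ("__tests__",)


def is_test_file(rel_path):
    """Return True if a repo-relative path looks like a test file."""
    path = PurePosixPath(rel_path)
    # Any file below a __tests__/ directory counts as a test file.
    if any(part in FALLBACK_DIR_NAMES for part in path.parts[:-1]):
        return True
    # Otherwise match the file name alone, case-sensitively.
    return any(fnmatchcase(path.name, pattern) for pattern in FALLBACK_FILE_PATTERNS)
```

Matching on the file name rather than the full path keeps a source file such as `contest_results.py` from being excluded just because "test" appears somewhere in it.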

After the initial BFS from entry points, a second pass picks up uncovered nodes by tracing backward to a root then forward to a terminal. This ensures more of the call graph participates in processes.

On decoda:
- Process coverage: 1.3% -> 9.4%
- Processes: 178 -> 1964
- Overall score: 0.810 -> 0.839
- Search score: 0.913 -> 0.980

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
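A rough sketch of the second pass described above, assuming the call graph is available as two adjacency dicts. Following only the first unvisited caller or callee at each step is a simplification chosen for brevity, and all names here are hypothetical; the `min_steps=2` default mirrors the threshold mentioned in an earlier commit.

```python
def trace_through(node, callers, callees):
    """Walk backward from node to a root, then forward to a terminal.

    callers: node -> list of nodes that call it
    callees: node -> list of nodes it calls
    Returns the path ordered root ... node ... terminal.
    """
    seen = {node}
    path = [node]

    current = node
    while True:  # backward leg: climb until no unvisited caller remains
        preds = [p for p in callers.get(current, ()) if p not in seen]
        if not preds:
            break
        current = preds[0]
        seen.add(current)
        path.append(current)
    path.reverse()

    current = node
    while True:  # forward leg: descend until no unvisited callee remains
        succs = [s for s in callees.get(current, ()) if s not in seen]
        if not succs:
            break
        current = succs[0]
        seen.add(current)
        path.append(current)
    return path


def second_pass(all_nodes, covered, callers, callees, min_steps=2):
    """Build extra processes for nodes the entry-point BFS did not reach."""
    covered = set(covered)
    processes = []
    for node in all_nodes:
        if node in covered:
            continue
        path = trace_through(node, callers, callees)
        if len(path) >= min_steps:
            processes.append(path)
            covered.update(path)
    return processes
```

The `seen` set guards both legs against cycles, so a recursive call chain cannot loop forever.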

…solution

Root cause: resolver used absolute paths internally but graph stores relative paths, so nothing matched. Also only checked repo-root tsconfig, missing monorepo subdir configs (console/tsconfig.json).

Fixes:
- All path matching now uses repo-relative strings
- .resolve() on relative imports to collapse ../
- Monorepo tsconfig discovery (scans subdirs for tsconfig.json)
- @/* path alias resolution with correct base_dir
- Import-flow pass in process detector for call-graph-isolated symbols

Results on decoda:
- IMPORTS edges: 1,727 -> 22,478 (13x)
- TS import resolution: 0% -> 79%
- Overall: 0.810 -> 0.842

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When multiple candidates match a call, prefer the one from an imported file, then same directory, instead of giving up. This resolves internal helpers that were invisible because their name existed in multiple files.

On decoda:
- CALLS edges: 21,998 -> 54,766 (2.5x)
- Search process grouping: 50% -> 100%
- Overall: 0.842 -> 0.848

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
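The tie-breaking order above (imported file first, then same directory) might be sketched as follows. The function signature and the confidence values 1.0, 0.9, and 0.7 are placeholders invented for this example, not the thresholds used in the PR.

```python
from pathlib import PurePosixPath


def resolve_call(caller_file, candidate_files, imported_files):
    """Pick the file that most likely defines a called name.

    caller_file:     repo-relative path of the file containing the call
    candidate_files: files that define a symbol with the called name
    imported_files:  set of files the caller imports
    Returns (file, confidence) or None when the call stays ambiguous.
    """
    if not candidate_files:
        return None
    if len(candidate_files) == 1:
        return (candidate_files[0], 1.0)  # unique name, no tie to break

    # Tier 1: exactly one candidate lives in a file the caller imports.
    imported = [c for c in candidate_files if c in imported_files]
    if len(imported) == 1:
        return (imported[0], 0.9)

    # Tier 2: exactly one candidate sits in the caller's directory.
    caller_dir = PurePosixPath(caller_file).parent
    same_dir = [c for c in candidate_files if PurePosixPath(c).parent == caller_dir]
    if len(same_dir) == 1:
        return (same_dir[0], 0.7)

    return None  # still ambiguous: emit no CALLS edge
```

Requiring exactly one match per tier keeps the resolver from guessing when two imported files define the same name.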

Fixes impact latency regression from 2.5x more CALLS edges. Before: 1,092ms avg. After: 57ms avg (19x faster). Also skips MEMBER_OF/STEP_IN_PROCESS/CONTAINS during BFS traversal and collects process participation inline instead of a second pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replaced SQLite-backed BFS with in-memory KnowledgeGraph lookups. Removed the 200-node cap: full traversal now runs in <1ms because dict lookups are ~100x faster than SQLite queries.

Results: _send_email_notification now shows 416 affected symbols (was 82 with cap), 0ms (was 163ms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
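To illustrate the kind of traversal the last two commits describe, here is a minimal in-memory BFS over incoming edges that skips the three structural edge types. The `incoming` dict layout and the function name are assumptions; the real engine works against KnowledgeGraph.

```python
from collections import deque

# Edge types the commit says are skipped during impact traversal.
SKIPPED_EDGE_TYPES = {"MEMBER_OF", "STEP_IN_PROCESS", "CONTAINS"}


def impacted_symbols(incoming, start):
    """Return {symbol: depth} for everything that transitively depends on start.

    incoming: target id -> list of (source id, edge type) pairs (assumed layout)
    """
    seen = {start}
    depths = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for source, edge_type in incoming.get(node, ()):
            if edge_type in SKIPPED_EDGE_TYPES or source in seen:
                continue
            seen.add(source)
            depths[source] = depth + 1
            queue.append((source, depth + 1))
    return depths
```

Each node is enqueued at most once, so the traversal is linear in the number of reachable edges and needs no node cap to terminate.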

- Added `fastembed` version 0.8.0 to the project dependencies in `pyproject.toml`.
- Updated `uv.lock` to include `fastembed` and its dependencies.
- Enhanced CLI output to display matched symbols and updated scoring metrics in `cli.py`.
- Refined impact and change result formatting in `server.py` to improve clarity and detail.
- Adjusted integration test setup to ensure proper database closure after tests.

These changes improve dependency management and enhance the output of the CLI and server functionalities.

- Introduced `EdgeType` and `NodeLabel` enums to standardize edge and node types across the codebase.
- Updated `config.py` to include new constants for confidence thresholds and community detection parameters.
- Created `server_formatters.py` to handle markdown formatting for various server responses, improving output clarity.
- Refactored existing code to utilize the new enums and constants, enhancing maintainability and readability.

These changes streamline the configuration management and improve the server's response formatting capabilities.

- Modified the `.mcp.json` file to specify a new directory for the code index server.
- Updated `server.py` to include initialization options when running the server, improving its setup process.
- Changed the scoring metric in `engine.py` to use `rrf_score` instead of `score`, ensuring more accurate results.
- Introduced a new static method in `db.py` to sanitize FTS queries, enhancing search reliability and preventing syntax errors.

These changes improve the configuration management and search capabilities of the code index system.
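One common way to sanitize free-text input for an FTS5 `MATCH` clause is to wrap every whitespace-separated token in double quotes, doubling any embedded quotes, so operators and punctuation are treated as literal text. The sketch below shows that approach; it is an assumption about the technique and is not the static method added to `db.py`.

```python
def sanitize_fts_query(raw):
    """Quote each token so FTS5 treats it as a string, not as query syntax.

    Inside an FTS5 string, a literal double quote is written as two
    double quotes, the same escaping SQL uses for identifiers.
    """
    tokens = raw.split()
    quoted = ['"' + token.replace('"', '""') + '"' for token in tokens]
    return " ".join(quoted)
```

For example, `sanitize_fts_query('get_user(id)')` returns `"get_user(id)"`, which no longer trips the parser on the parentheses. An all-whitespace input yields an empty string, so a caller would likely want to skip the search in that case.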

…on and debugging

- Deleted the `debug-code`, `explore-code`, `guide`, `impact-analysis`, `index-codebase`, and `search-code` skills as they were redundant or outdated.
- Added new skills: `discover`, `find`, `investigate`, `review`, and `setup` to enhance code exploration, debugging, and review processes.
- Updated the knowledge graph configuration to reflect new node and edge types, improving the overall structure and functionality of the code index.
- Enhanced search capabilities with semantic search and improved dependency tracing for better debugging and impact analysis.

These changes streamline the workflow for developers, making it easier to explore codebases and assess changes effectively.

nnourr and others added 28 commits April 3, 2026 19:14

Add .gitignore and remove cached bytecode files

a67f317

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix rrf_score key mismatch in QueryEngine.query()

d29293e

The RRF merge stores scores under 'score' key but query() was reading 'rrf_score'. Found by audit agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix server.py dict key mismatches for process formatting

8215acc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use cached model locally, only download from HF Hub on first run

74faa07

Tries local_files_only=True first, falls back to download if not cached. Eliminates repeated HF Hub requests and rate limit warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

add install script

5a4dcfe

add setup script

0fb1c03

add setup script

ebc27c1

nnourr merged commit bf8c00d into main Apr 14, 2026

nnourr merged 28 commits into main from Noureldeen-Ahmed/in-this-brand-new-project-i-want-to-create-a-code
