
initial commit #1

Merged
nnourr merged 28 commits into main from
Noureldeen-Ahmed/in-this-brand-new-project-i-want-to-create-a-code
Apr 14, 2026

nnourr commented Apr 14, 2026

No description provided.

nnourr and others added 28 commits April 3, 2026 19:14
Set up the code-index project with uv and hatchling, including all
dependencies (tree-sitter, faiss-cpu, sentence-transformers, networkx,
cdlib, mcp, click). Create the package structure under src/code_index
with subpackages for graph, storage, ingestion, embeddings, and query.

Implement core modules:
- config.py: constants for index paths, embedding settings, graph schema
- graph/models.py: Node/Edge dataclasses and KnowledgeGraph with O(1)
  lookup and indexed edge iteration
- storage/db.py: full IndexDatabase class with SQLite WAL mode, FTS5
  search, batch insert, and graph serialization
- cli.py: click-based CLI entry point
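Below is a minimal sketch of the "O(1) lookup and indexed edge iteration" idea in graph/models.py. The field names, the string edge types, and the `outgoing`/`incoming` methods are assumptions for illustration, not the module's actual API. Nodes live in a dict keyed by ID, and each node keeps its own outgoing and incoming edge lists. As a result, iterating one node's edges never scans the full edge set.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str         # hypothetical ID scheme, e.g. "Function:src/app.py:main"
    label: str      # e.g. "File", "Class", "Function"
    name: str
    file_path: str  # repo-relative


@dataclass(frozen=True)
class Edge:
    source: str     # source node ID
    target: str     # target node ID
    type: str       # e.g. "CALLS", "IMPORTS", "EXTENDS"


class KnowledgeGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        # Per-node adjacency lists: a node's edges are found without scanning all edges.
        self._out: defaultdict[str, list[Edge]] = defaultdict(list)
        self._in: defaultdict[str, list[Edge]] = defaultdict(list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        self._out[edge.source].append(edge)
        self._in[edge.target].append(edge)

    def get_node(self, node_id: str) -> Node | None:
        # O(1) average-case dict lookup by ID.
        return self.nodes.get(node_id)

    def outgoing(self, node_id: str, edge_type: str | None = None) -> list[Edge]:
        # .get() avoids inserting empty lists for unknown IDs.
        return [e for e in self._out.get(node_id, []) if edge_type in (None, e.type)]

    def incoming(self, node_id: str, edge_type: str | None = None) -> list[Edge]:
        return [e for e in self._in.get(node_id, []) if edge_type in (None, e.type)]
```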

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement EmbeddingEngine class (Step 2B) that vectorizes code graph
nodes using sentence-transformers and indexes them with FAISS IndexFlatIP
for cosine similarity search. Supports batch embedding, progress
callbacks, persistence to disk, and id_map recovery from SQLite.
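IndexFlatIP ranks by raw inner product. It therefore yields cosine similarity only when every stored and query vector is L2-normalized first. With FAISS this is usually done with `faiss.normalize_L2` before `add`/`search`, or with `normalize_embeddings=True` in sentence-transformers' `encode`. The plain-Python sketch below reproduces the same exhaustive scoring. It also keeps an `id_map` from row position back to node ID. `FlatIPIndex` and its methods are hypothetical stand-ins, not the EmbeddingEngine interface.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class FlatIPIndex:
    """Exhaustive inner-product search over unit vectors (what a flat IP index computes)."""

    def __init__(self) -> None:
        self._vectors: list[list[float]] = []
        self.id_map: list[str] = []  # row position -> graph node ID

    def add(self, node_ids: list[str], vectors: list[list[float]]) -> None:
        for node_id, vec in zip(node_ids, vectors):
            self.id_map.append(node_id)
            # Normalizing at insert time makes the inner product equal cosine similarity.
            self._vectors.append(l2_normalize(vec))

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        q = l2_normalize(query)
        scored = [
            (self.id_map[row], sum(a * b for a, b in zip(q, vec)))
            for row, vec in enumerate(self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```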

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Ingestion parsers: filesystem walker, parser_base types, Python parser, TypeScript parser
- Plugin config: plugin.json, .mcp.json
- Skills: guide, explore-code, search-code, impact-analysis, debug-code, index-codebase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s (Wave 3)

- Import resolver with Python/TS resolution and tiered SymbolTable
- Heritage processor: EXTENDS, IMPLEMENTS, HAS_METHOD, OVERRIDES edges
- Call processor: CALLS edges with confidence-scored name resolution
- Community processor: Leiden clustering with NetworkX fallback
- Process detector: BFS execution flow tracing from entry points

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pipeline: 10-phase indexing from file discovery to embedding generation
- Query engine: hybrid search (BM25+semantic+RRF), context, impact analysis,
  change detection, coordinated rename, cluster/process views
- Fulltext, semantic, and hybrid search modules
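Reciprocal Rank Fusion merges the BM25 and semantic rankings using only rank positions. This sidesteps the fact that BM25 scores and cosine similarities live on different scales. The sketch assumes each input is an ordered list of node IDs. It uses k = 60, the common default from the original RRF paper; the value used in engine.py is not stated here. Writing and reading the fused score under a single key name avoids the kind of 'score' vs 'rrf_score' mismatch fixed later in this PR.

```python
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[dict]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # One key name, shared by producer and consumer ("rrf_score" here).
    merged = [{"id": doc_id, "rrf_score": score} for doc_id, score in scores.items()]
    merged.sort(key=lambda r: r["rrf_score"], reverse=True)
    return merged
```

A document ranked moderately well by both retrievers (here "b" and "c") outranks one that appears near the top of only one list.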

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MCP server: 7 tools (query, context, impact, detect_changes, rename,
  list_repos, get_clusters) and 7 resources with lazy initialization
- CLI: analyze, status, clean, list, search, mcp commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Test fixtures: sample Python and TypeScript projects
- Unit tests: graph models, parsers, import resolver, RRF merge
- Integration tests: full pipeline + query engine on sample fixtures
- Updated query engine with improved structure and logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RRF merge stores scores under the 'score' key, but query() was reading 'rrf_score'.
Found by an audit agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix tree-sitter 0.25 API: use QueryCursor instead of deprecated Query.matches()
- Fix TypeScript parser: use tree_sitter_javascript for JS, fix heritage parsing
- Fix db.py: use FTS5 rebuild command, allow cross-thread SQLite access
- Fix test assertions to match actual API responses
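The storage pieces mentioned so far fit together roughly as below: WAL mode, FTS5 search, batch insert, the FTS5 `rebuild` command, and cross-thread access. This is a sketch under assumptions. The table and column names, the external-content layout, and the helper names are hypothetical, not db.py's schema. It also needs an SQLite build with FTS5 enabled, which most CPython distributions ship.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    pk INTEGER PRIMARY KEY,
    id TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    content TEXT
);
-- External-content FTS5 table: the index reads row text from `nodes`.
CREATE VIRTUAL TABLE IF NOT EXISTS nodes_fts
    USING fts5(name, content, content='nodes', content_rowid='pk');
"""


def open_index_db(path: str) -> sqlite3.Connection:
    # check_same_thread=False lets another thread use this connection;
    # the caller is still responsible for serializing access.
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")  # readers and the writer stop blocking each other
    conn.executescript(SCHEMA)
    return conn


def bulk_insert(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO nodes (id, name, content) VALUES (?, ?, ?)", rows)
        # The 'rebuild' command regenerates the whole FTS index from `nodes`.
        conn.execute("INSERT INTO nodes_fts(nodes_fts) VALUES ('rebuild')")


def search_names(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    sql = "SELECT name FROM nodes_fts WHERE nodes_fts MATCH ? ORDER BY rank LIMIT ?"
    return [row[0] for row in conn.execute(sql, (query, limit))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_index_db(os.path.join(tmp, "index.db"))
        bulk_insert(conn, [("fn:1", "send_email", "def send_email(to): ...")])
        print(search_names(conn, "email"))  # ['send_email']
        conn.close()
```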

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- import_resolver: Fix File node ID mismatch causing zero IMPORTS edges
  (was using full path as name instead of filename)
- import_resolver: Move tsconfig loading outside per-import loop
- server.py: Fix KeyError crashes from process dict key mismatches
  (process_name vs name, matched_symbols vs symbols)
- cli.py: Fix same dict key mismatches in search display
- test_integration: Add db.close() teardown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tries local_files_only=True first, falls back to download if not cached.
Eliminates repeated HF Hub requests and rate limit warnings.
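The cache-first fallback can be sketched as a small wrapper. It assumes a loader that accepts a `local_files_only` keyword; recent sentence-transformers releases accept `SentenceTransformer(name, local_files_only=True)`. Which exception signals a cache miss varies by library version, so catching `OSError` and `ValueError` here is an assumption. `load_cached_first` is a hypothetical helper name.

```python
from typing import Any, Callable


def load_cached_first(loader: Callable[..., Any], model_name: str) -> Any:
    """Load from the local HF cache; hit the network only if that fails."""
    try:
        # Offline attempt: no Hub request when the model is already cached.
        return loader(model_name, local_files_only=True)
    except (OSError, ValueError):
        # Assumed cache-miss exceptions; the exact type depends on library versions.
        return loader(model_name, local_files_only=False)


# Usage (assumption: a sentence-transformers release whose constructor accepts
# local_files_only; the model name is only a placeholder):
#   from sentence_transformers import SentenceTransformer
#   model = load_cached_first(SentenceTransformer, "all-MiniLM-L6-v2")
```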

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- eval_communities: cohesion, separation, file coherence, size balance, naming
- eval_processes: entry point quality, cross-file, length balance, uniqueness, coverage
- eval_search: hit rate, rank quality, anti-pattern avoidance, test-file penalty
- run_evals: combined report runner with weighted overall scores

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Communities:
- Add IMPORTS to structural edges (was missing cross-file signal)
- Stop pruning degree-1 nodes (killed coverage)
- Lower Leiden resolution (1.0/0.8 vs 2.0/1.0) for finer clusters
- Add orphan assignment: unassigned symbols join nearest community by directory
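Below is a sketch of the orphan-assignment step. It reads "nearest community by directory" as follows: give an unassigned symbol the majority community of its own directory. If that directory has no members, use the closest parent directory that does. That interpretation, the `assign_orphans` name, and the input shapes are assumptions.

```python
from collections import Counter
from pathlib import PurePosixPath


def assign_orphans(node_dirs: dict[str, str], membership: dict[str, int]) -> dict[str, int]:
    """node_dirs: node ID -> repo-relative directory; membership: node ID -> community ID."""

    def key(directory: str) -> str:
        return str(PurePosixPath(directory))  # "" and "." both become "."

    # Count community members per directory.
    votes: dict[str, Counter] = {}
    for node_id, community in membership.items():
        votes.setdefault(key(node_dirs[node_id]), Counter())[community] += 1

    result = dict(membership)
    for node_id, directory in node_dirs.items():
        if node_id in result:
            continue
        path = PurePosixPath(directory)
        # Walk up from the symbol's own directory toward the repo root.
        for candidate in (path, *path.parents):
            counter = votes.get(str(candidate))
            if counter:
                result[node_id] = counter.most_common(1)[0][0]
                break
    return result
```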

Processes:
- Increase max_processes 75->500, max_branching 4->8, max_trace_depth 10->12
- Lower min_steps 3->2 to capture short but real flows
- Broaden entry point regex (HTTP methods, CRUD, webhooks, tests)
- Allow entry points with 1-2 callers (not just zero)
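The process-tracing knobs above map onto a bounded BFS roughly like this: max_processes, max_branching, max_trace_depth, min_steps, and entry points with up to 2 callers. Only the parameter defaults come from the commit. The regex is a hypothetical simplification of the broadened entry-point pattern, symbols are plain names rather than node IDs, and `trace_processes` is an illustrative name.

```python
import re
from collections import deque

# Hypothetical stand-in for the broadened entry-point regex (HTTP verbs, CRUD, webhooks, main).
ENTRY_NAME = re.compile(
    r"^(main|handle|get|post|put|patch|delete|create|update|webhook)(_\w+)?$", re.IGNORECASE
)


def trace_processes(
    calls: dict[str, list[str]],
    max_processes: int = 500,
    max_branching: int = 8,
    max_trace_depth: int = 12,
    min_steps: int = 2,
) -> list[list[str]]:
    """BFS over CALLS edges from entry points; `calls` maps a symbol to its callees."""
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Entry points: names matching the pattern with at most 2 callers (not only zero).
    entries = [n for n in calls if ENTRY_NAME.match(n) and len(callers.get(n, ())) <= 2]

    processes: list[list[str]] = []
    for entry in entries:
        if len(processes) >= max_processes:
            break
        steps, seen = [entry], {entry}
        queue = deque([(entry, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth >= max_trace_depth:
                continue
            for callee in calls.get(node, [])[:max_branching]:  # cap fan-out per step
                if callee not in seen:
                    seen.add(callee)
                    steps.append(callee)
                    queue.append((callee, depth + 1))
        if len(steps) >= min_steps:
            processes.append(steps)
    return processes
```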

Results on ~/decoda:
- Community coverage: 42.7% -> 99.6%
- Communities: 141 -> 617
- Cohesion: 0.692 -> 0.941
- Processes: 40 -> 181
- Overall: 0.786 -> 0.807

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reads test patterns from:
- pyproject.toml [tool.pytest.ini_options] testpaths + norecursedirs
- pytest.ini, setup.cfg
- jest.config.js/ts testPathIgnorePatterns
- package.json jest.testMatch
- vitest.config.ts include patterns
- Falls back to common conventions (test_*, *.test.ts, *.spec.ts, __tests__/)

On decoda: 551 test files excluded (11.3%), indexing 40% faster (83s vs 149s),
search score 0.879 -> 0.913 (zero test files in results), process grouping
12.5% -> 50%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the initial BFS from entry points, a second pass picks up uncovered
nodes by tracing backward to a root then forward to a terminal. This ensures
more of the call graph participates in processes.
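The second pass can be pictured as follows. For each node not yet in any process, walk callers back to a node with no unvisited callers (a root). Then walk callees forward from the original node to one with no unvisited callees (a terminal). Taking the first caller or callee at each step, the depth limit, and the function names are assumptions for illustration.

```python
def backward_forward_path(node, calls, callers, max_depth=12):
    """Trace node back to a root, then forward to a terminal; returns root..terminal."""
    seen = {node}
    back = [node]
    cur = node
    for _ in range(max_depth):
        prev = next((c for c in sorted(callers.get(cur, ())) if c not in seen), None)
        if prev is None:
            break
        back.append(prev)
        seen.add(prev)
        cur = prev
    forward = []
    cur = node
    for _ in range(max_depth):
        nxt = next((c for c in calls.get(cur, []) if c not in seen), None)
        if nxt is None:
            break
        forward.append(nxt)
        seen.add(nxt)
        cur = nxt
    return back[::-1] + forward


def second_pass(calls, covered, min_steps=2, max_depth=12):
    """calls: symbol -> callees; covered: symbols already placed by the first BFS pass."""
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    all_nodes = set(calls) | set(callers)
    covered = set(covered)
    new_processes = []
    for node in sorted(all_nodes - covered):
        if node in covered:  # picked up by a path found earlier in this pass
            continue
        path = backward_forward_path(node, calls, callers, max_depth)
        if len(path) >= min_steps:
            new_processes.append(path)
            covered.update(path)
    return new_processes
```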

On decoda: process coverage 1.3% -> 9.4%, processes 178 -> 1964,
overall score 0.810 -> 0.839, search score 0.913 -> 0.980

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…solution

Root cause: resolver used absolute paths internally but graph stores relative
paths, so nothing matched. Also only checked repo-root tsconfig, missing
monorepo subdir configs (console/tsconfig.json).

Fixes:
- All path matching now uses repo-relative strings
- .resolve() on relative imports to collapse ../
- Monorepo tsconfig discovery (scans subdirs for tsconfig.json)
- @/* path alias resolution with correct base_dir
- Import-flow pass in process detector for call-graph-isolated symbols
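The sketch below shows path handling after the fix, using repo-relative strings throughout. Here `posixpath.normpath` collapses `../` while keeping the path relative. This is a swapped-in choice: `Path.resolve()` would also make the path absolute. The alias map shape (`{"@/": "console/src/"}`, as if derived from a discovered console/tsconfig.json) and the extension probing order are assumptions.

```python
from __future__ import annotations

import posixpath

# Extensions/index files probed after the spec is normalized (an assumed order).
SUFFIXES = ("", ".ts", ".tsx", ".js", "/index.ts", "/index.tsx", "/index.js")


def resolve_import(
    importer: str,
    spec: str,
    repo_files: set[str],
    aliases: dict[str, str] | None = None,
) -> str | None:
    """Map an import string to a repo-relative file path, or None if external/unresolved."""
    if spec.startswith("."):
        # Relative import: anchor at the importing file's directory.
        base = posixpath.join(posixpath.dirname(importer), spec)
    else:
        for prefix, target_dir in (aliases or {}).items():
            if spec.startswith(prefix):  # e.g. "@/lib/api" with prefix "@/"
                base = posixpath.join(target_dir, spec[len(prefix):])
                break
        else:
            return None  # bare package such as "react": not a repo file
    base = posixpath.normpath(base)  # collapses "../" and "./", stays relative
    for suffix in SUFFIXES:
        if base + suffix in repo_files:
            return base + suffix
    return None
```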

Results on decoda:
- IMPORTS edges: 1,727 -> 22,478 (13x)
- TS import resolution: 0% -> 79%
- Overall: 0.810 -> 0.842

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When multiple candidates match a call, prefer the one from an imported
file, then same directory, instead of giving up. This resolves internal
helpers that were invisible because their name existed in multiple files.
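The tie-break order above can be sketched as a function that accepts a winner only when a tier narrows the candidates to exactly one. The input shapes are assumptions, as is the fallback when a tier is still ambiguous (try the next tier, else give up). The call processor's confidence scoring is left out.

```python
from __future__ import annotations

import posixpath


def pick_call_target(
    caller_file: str,
    candidates: list[tuple[str, str]],
    imported_files: set[str],
) -> str | None:
    """candidates: (symbol_id, repo-relative file) pairs that share the called name."""
    if len(candidates) == 1:
        return candidates[0][0]
    caller_dir = posixpath.dirname(caller_file)
    tiers = (
        # Tier 1: defined in a file the caller imports.
        [sym for sym, path in candidates if path in imported_files],
        # Tier 2: defined in the caller's own directory.
        [sym for sym, path in candidates if posixpath.dirname(path) == caller_dir],
    )
    for tier in tiers:
        if len(tier) == 1:  # accept only an unambiguous winner
            return tier[0]
    return None  # still ambiguous: leave the call unresolved
```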

On decoda:
- CALLS edges: 21,998 -> 54,766 (2.5x)
- Search process grouping: 50% -> 100%
- Overall: 0.842 -> 0.848

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes the impact-analysis latency regression caused by the 2.5x increase in CALLS edges.
Before: 1,092ms avg. After: 57ms avg (19x faster).

Also skips MEMBER_OF/STEP_IN_PROCESS/CONTAINS during BFS traversal
and collects process participation inline instead of a second pass.
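Below is a sketch of the upstream impact walk. It skips the structural edge types and collects process participation during the traversal. The edge tuples, the direction of STEP_IN_PROCESS (symbol -> process), and the function name are assumptions. In the project these indexes would come from the in-memory KnowledgeGraph rather than being rebuilt on each call.

```python
from collections import defaultdict, deque

# Structural edges that do not propagate impact (skipped during traversal).
SKIPPED = {"MEMBER_OF", "STEP_IN_PROCESS", "CONTAINS"}


def impact(edges, target):
    """edges: (source_id, target_id, edge_type) tuples. Returns (affected, processes)."""
    dependents = defaultdict(list)   # node -> nodes that depend on it
    processes_of = defaultdict(set)  # symbol -> processes it is a step in
    for src, dst, edge_type in edges:
        if edge_type == "STEP_IN_PROCESS":
            processes_of[src].add(dst)
        elif edge_type not in SKIPPED:
            dependents[dst].append(src)

    affected = {}  # dependent node -> BFS depth from the changed symbol
    processes = set(processes_of[target])
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        for dependent in dependents[node]:
            if dependent not in seen:
                seen.add(dependent)
                affected[dependent] = depth + 1
                # Collected inline, so no second pass over `affected` is needed.
                processes |= processes_of[dependent]
                queue.append((dependent, depth + 1))
    return affected, processes
```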

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced SQLite-backed BFS with in-memory KnowledgeGraph lookups.
Removed the 200-node cap: full traversal now runs in <1ms because
dict lookups are ~100x faster than SQLite queries.

Results: _send_email_notification now shows 416 affected symbols
(was 82 with cap), 0ms (was 163ms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added `fastembed` version 0.8.0 to the project dependencies in `pyproject.toml`.
- Updated `uv.lock` to include `fastembed` and its dependencies.
- Enhanced CLI output to display matched symbols and updated scoring metrics in `cli.py`.
- Refined impact and change result formatting in `server.py` to improve clarity and detail.
- Adjusted integration test setup to ensure proper database closure after tests.

These changes improve dependency management and enhance the output of the CLI and server.
- Introduced `EdgeType` and `NodeLabel` enums to standardize edge and node types across the codebase.
- Updated `config.py` to include new constants for confidence thresholds and community detection parameters.
- Created `server_formatters.py` to handle markdown formatting for various server responses, improving output clarity.
- Refactored existing code to utilize the new enums and constants, enhancing maintainability and readability.

These changes streamline the configuration management and improve the server's response formatting capabilities.
- Modified the `.mcp.json` file to specify a new directory for the code index server.
- Updated `server.py` to include initialization options when running the server, improving its setup process.
- Changed the scoring metric in `engine.py` to use `rrf_score` instead of `score`, ensuring more accurate results.
- Introduced a new static method in `db.py` to sanitize FTS queries, enhancing search reliability and preventing syntax errors.

These changes improve the configuration management and search capabilities of the code index system.
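One way to sanitize FTS queries is shown below, offered as a sketch rather than the actual db.py static method. It quotes every whitespace-separated term so FTS5 parses it as a string. Embedded double quotes are escaped by doubling them, as the FTS5 syntax requires. Adjacent quoted terms are combined with an implicit AND.

```python
def sanitize_fts_query(raw: str) -> str:
    """Turn free text into an FTS5 query that cannot hit a syntax error."""
    # Wrapping each term in double quotes makes FTS5 treat it as a string
    # (a phrase), so AND/OR/NOT/NEAR, ':', '*', '(' and '.' are not parsed
    # as query syntax. Embedded quotes are escaped by doubling.
    terms = raw.split()
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)
```

Callers may want to skip the search entirely when the sanitized result is empty.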
…on and debugging

- Deleted the `debug-code`, `explore-code`, `guide`, `impact-analysis`, `index-codebase`, and `search-code` skills as they were redundant or outdated.
- Added new skills: `discover`, `find`, `investigate`, `review`, and `setup` to enhance code exploration, debugging, and review processes.
- Updated the knowledge graph configuration to reflect new node and edge types, improving the overall structure and functionality of the code index.
- Enhanced search capabilities with semantic search and improved dependency tracing for better debugging and impact analysis.

These changes streamline the workflow for developers, making it easier to explore codebases and assess changes effectively.
nnourr merged commit bf8c00d into main on Apr 14, 2026