SDL-MCP v0.11.5
Added
- AST-aware search edit languages: Extended `search.edit` `targeting: "identifier"` and `targeting: "structural"` beyond TypeScript/JavaScript to all built-in tree-sitter adapters, with plugin `structuralMatcher` descriptors for opt-in language support.
- Pass-1 drain diagnostics: Added opt-in batch row counts and sub-timings for pass-1 write flushing so `deleteOldSymbols`, file upserts, symbol references, symbol upserts, and `DEPENDS_ON` edge inserts can be profiled independently against a temporary graph DB.
- SCIP generator diagnostics: Index refresh results and audit payloads now report generated SCIP indexes, skipped generated files, and non-fatal generator/ingest failures.
- Algorithm refresh controls: Added `indexing.algorithmRefresh` config for worker-bounded PageRank/K-core and Louvain policy limits.
- Provider-first SCIP execution: Added the provider-first indexing foundation for SCIP refreshes, including provider fact collection, stable provider IDs, LadybugDB graph-row materialization, CLI/SSE fallback reporting, full-refresh execution for completely covered SCIP indexes, and explicit errors when `indexing.pipeline: "providerFirst"` is configured but provider execution or coverage is incomplete.
- Provider-first shadow staging: Provider-first SCIP runs now write provider-materialized rows as streaming table-shaped CSV artifacts plus a manifest beside the active graph DB, preserving node-before-relationship load order. When same-run legacy fallback parses uncovered or provider-unusable files, the final shadow staging pass also includes those just-written active graph rows. The same phase bulk-loads those artifacts into a fresh shadow `.lbug`, uses explicit CSV null handling so empty-string sentinels remain value-faithful, disables parallel CSV reading so quoted newlines in provider symbol text load correctly, builds secondary indexes when supported, checkpoints, and validates actual loaded row counts against expected counts before reporting the shadow DB loaded. Artifact or shadow-load failures are reported as skipped staging/load work and do not block the active LadybugDB materialization path after provider graph validation succeeds; unsupported `CREATE INDEX` runtimes and secondary-index build failures are surfaced as non-fatal shadow DB warnings.
- Provider-first shadow finalization and activation: Loaded provider-first shadows are now finalized after active graph finalization by writing auxiliary dependency symbols, final active edges, version rows, symbol versions, metrics, file summaries, clusters, processes, shadow clusters, and derived-state rows to finalization CSV artifacts and loading them into the shadow `.lbug` with LadybugDB `COPY`, then validating active-versus-shadow counts before activation preflight. Finalized shadow summaries keep public real-symbol counts separate from unresolved or external auxiliary dependency symbols copied for edge parity and expose the finalization copy mode plus artifact manifest for diagnostics. Finalization now seeds non-repo-linked edge-target symbol nodes without adding missing repo links, and relationship rows with quoted endpoints, record separators in endpoint/property text, or CSV-quoted `pass2-cpp` provenance fall back to parameterized writes so unresolved quoted or multi-line dependency, cluster, process, shadow-cluster, and C++ pass-2 edge rows do not break activation. Live provider-first runs now close the active LadybugDB pool, swap the finalized shadow `.lbug` into the active path, reopen the active DB for follow-up reads, keep the previous active DB as a backup, and roll back if activation or reopen fails.
- Provider-first phase timings: Executed provider-first runs now include a normal CLI timing block with total provider-first wall time, the slowest provider-first bucket, phase durations for collection, coverage scan, active materialization, same-run legacy fallback, final shadow staging, shadow finalization, and activation, plus active materialization subphase timings for provider-owned symbol deletion, file upserts, symbol upserts, stale external pruning, external-symbol merges, and edge inserts. Provider-owned symbol upserts also report the combined `nodeAndRelCreate` bucket so materialization tuning can track the provider symbol-node and ownership relationship COPY load separately from the outer symbol phase. These timings do not require broad `--diagnostics` output.
- Provider-first live progress: Index progress now has a first-class `providerFirst` stage with substages for coverage scanning, provider metadata/document/external-symbol/source-line collection, normalization, graph-row shaping, validation, coverage analysis, active materialization, shadow staging, shadow finalization, and shadow activation. The existing CLI, HTTP reindex stream, and MCP progress notifications all receive the same stage/substage payloads with counters when totals are known and heartbeat messages when totals are not.
- Provider-first fallback diagnostics: Provider-first runs that still parse uncovered or provider-unusable files through same-run legacy fallback now report fallback file count, total and average fallback time, the slowest fallback subphase, pass1/pass1-drain/pass2/finalize buckets, nested `finalizeIndexing` subphases, derived subphase buckets, version snapshot details (`latest`, `create`, `snapshot`, `readPages`, `writePages`), deferred-index details (`secondary`, `config`, `retrieval`), deferred retrieval lifecycle details (`symbolDiscovery`, `symbolFts`, `symbolVectors`, `entityDiscovery`, `entityFts`, `fileSummaryVectors`, `agentFeedbackVectors`), secondary buckets such as shared-state initialization, import re-resolution, edge finalization, versioning, deferred indexes, and memory sync, plus an unaccounted residual and sample fallback paths. Nested finalization, version snapshot, and deferred-index timings are collected for provider-first fallback even when broad index diagnostics are disabled, including Metrics write execution/wait buckets, FileSummary load/build/write subphases, and retrieval index lifecycle subphases.
- Provider-first pass-2 diagnostics: Provider-first fallback diagnostics now include pass-2 target-selection, import-cache, resolver-dispatch, write-active, write-queue, COPY placeholder-repair, placeholder symbol-metadata repair, placeholder repo-link repair, COPY insert, and generic repair-insert timing buckets. COPY insert timing is split into temporary CSV materialization and `COPY DEPENDS_ON FROM`, and generic repair-insert timing is further split into row preparation, source repo-link symbol metadata, source repo-link relationship repair, endpoint metadata, target metadata, target repo-link repair, relationship create, and relationship update. Pass-2 write counters also report flushes, COPY edges, placeholder rows, skipped repeated placeholder rows, repair cause rows, repair cause drift, effective repair rows, and small COPY batches. Resolver diagnostics include per-resolver phase timing buckets plus count/size metrics such as file bytes read, include-index files parsed, cache hits, and extracted call counts for attribution inside heavy pass-2 resolvers. This makes large-repo fallback runs distinguish resolver CPU/read work, LadybugDB writer time, placeholder target repair, repo-link repair, unsafe endpoint repair, small COPY-safe batch fallback, and relationship import volume without enabling broad index diagnostics.
- Index runtime provenance: `sdl-mcp index` now prints the loaded SDL-MCP package version, Node.js version, and command module path, and delegated HTTP indexing reports the server process runtime identity so benchmark logs can distinguish current builds from stale global installs or long-lived servers. Per-repo summaries also report caller-visible wall time separately from indexed-phase duration so SCIP generator/pre-refresh cost is visible in normal logs.
- SCIP generator cache: Generated scip-io artifacts are cached under `~/.sdl-mcp/cache/scip-io/` by source/config fingerprint and restored on unchanged refreshes, avoiding expensive compiler generator runs on repeated full indexes. Warm hits use a latest stat-signature manifest before falling back to the exact content-hash fingerprint path, keeping large unchanged repos from hashing every input file just to restore cached SCIP output. Usable generated indexes are now cached even when `scip-io` exits nonzero because another requested language failed, so mixed-language repos can reuse the successful language artifact on the next unchanged run. CLI summaries report generator cache hit/store diagnostics when the cache participates, including separate generator, prepare, save, and restore timing buckets when available.
- Provider-first coverage denominators: C/C++ provider-first summaries now split the broad SDL-MCP scan scope from SCIP semantic eligibility. The semantic denominator is the union of scan-scope files named by discovered `compile_commands.json` entries and provider-emitted C/C++ header/include documents inside scan scope, while provider document counts still report the SCIP docs emitted inside scan scope.
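
The activation sequence in the "Provider-first shadow finalization and activation" entry (swap the shadow into the active path, keep a backup, roll back on failure) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `activateShadow`, `ActivationResult`, the `.bak` suffix, and the `reopen` callback are hypothetical names, not SDL-MCP or LadybugDB APIs, and the caller is assumed to have already closed every handle on the active DB.

```typescript
import * as fs from "node:fs";

// Hypothetical result shape; not taken from SDL-MCP.
export interface ActivationResult {
  activated: boolean;
  backupPath?: string;
  error?: string;
}

// Swap a finalized shadow DB file into the active path.
// `reopen` is caller-supplied and should throw if the swapped-in DB cannot be opened.
export function activateShadow(
  activePath: string,
  shadowPath: string,
  reopen: (dbPath: string) => void,
): ActivationResult {
  const backupPath = `${activePath}.bak`; // assumed backup naming
  // Keep the previous active DB as a backup before touching the active path.
  fs.renameSync(activePath, backupPath);
  try {
    fs.renameSync(shadowPath, activePath); // swap the shadow into place
    reopen(activePath); // follow-up reads must work against the new DB
    return { activated: true, backupPath };
  } catch (err) {
    // Roll back: move the failed DB back to the shadow path if it was
    // already swapped in, then restore the previous active DB.
    if (fs.existsSync(activePath)) fs.renameSync(activePath, shadowPath);
    fs.renameSync(backupPath, activePath);
    return { activated: false, error: String(err) };
  }
}
```

Renaming rather than copying keeps each step cheap on a single filesystem, and the previous DB is never deleted until a later run overwrites the backup.
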
Changed
- CI benchmark guardrails: Kept `benchmark:ci` on Linux but made it a JavaScript/LadybugDB regression gate by explicitly disabling the native addon and removing the native-build dependency. Native crash and parity coverage stays with the native-build and sync-memory jobs, while the benchmark lane continues to catch threshold regressions without Linux exit-139 flakes from the addon path.
- CI pass-1 write stabilization: Added `SDL_MCP_PASS1_STABLE_DB_WRITES=1` for CI indexing lanes so pass-1 LadybugDB flushes do not overlap native tree-sitter or Rust parsing in the same Node process. This preserves native sync-memory coverage while avoiding hosted-runner exit-139 crashes at the parser/graph-write boundary.
- CI sync memory setup: Restored locked external benchmark repo setup in the sync-memory job so the default SDL-MCP config can index every configured repository before exporting CI memory artifacts.
- Provider-first provider collection: Normal CLI timing output now breaks down the provider collection bucket into SCIP metadata, document decode, external-symbol loading, source-line loading, normalization, row shaping, and validation subphases. Provider source-line loading also retains broad import-alias context only for alias-bearing import lines, reducing provider-first memory and normalization work for large repos with many plain imports.
- Pass-1 write batching: Centralized LadybugDB write chunk sizing, raised safe edge/reference/file defaults to reduce pass-1 prepared statement count, let pass-1 skip redundant existing `DEPENDS_ON` refreshes after source-symbol replacement, moved full-refresh stale-symbol deletion ahead of pass-1 flushes, and collapsed file-scoped stale symbol relationship cleanup into one `DETACH DELETE` pass while keeping ID-keyed metrics/reference/cache cleanup explicit.
- SCIP generated index handling: Raised decoder file caps to 512 MiB and added generated split-index fallback with SHA-256 dedupe for identical TypeScript/JavaScript split artifacts.
- Provider-first safety: Allowed active SCIP provider-first execution only after coverage validation, so `auto` uses legacy fallback when SCIP execution or coverage is incomplete and explicit `providerFirst` fails loudly instead of replacing the live graph with partial provider data.
- Provider-first readiness: Split semantic readiness from provider-first graph readiness so SCIP provider-first indexing skips inline semantic refresh, reports deferred semantic readiness in CLI output, and leaves semantic derived state dirty for later refresh.
- Provider-first SCIP calls: Promoted SCIP reference occurrences to exact `call` edges only when repo source lines prove the expected symbol text plus invocation syntax, including constructor symbols whose SCIP display name is `<constructor>` but whose source token is the owning class in `new ClassName(...)`, synthetic type-literal member symbols whose source token is the member suffix, Python nested callable descriptors such as `eventclass().wrapper` whose source token is the terminal callable `wrapper`, Python import aliases where SCIP ranges cover the full `name as local_name` clause, C++/clang qualified, member, and template calls whose retained local token window proves the terminal callable and invocation syntax, C++ constructor declarations such as `APInt Offset(...)` when the provider symbol is the constructor and the declarator range points at the variable name, C++/clang location-only macro descriptors such as `.../assert.h:77:11`! whose source token is an invoked identifier, and single-line or multi-line named-import aliases such as `import { original as localAlias }`. Python module initializer references that expand to a qualified member invocation, such as `lit.util` inside `lit.util.warning(...)`, remain neutral because the module is not the invoked callable. Non-import TypeScript `as` expressions are ignored for alias proof. Readable non-call references, such as property keys and broad value reads, remain neutral occurrence facts without blocking call-proof readiness. C++ qualifier-only references, template-argument references, invocation-like text inside string/comment literals, and mismatched all-caps macro wrapper tokens remain neutral occurrence data instead of call-proof failures. Unresolved references, invocation-shaped stale SCIP ranges on the actual callable token, and unavailable source lines stay conservative. Provider-first source loading now includes bounded import blocks plus small C++ reference windows so alias and multi-line C++ proof work in real SCIP execution, not just in direct normalization tests. Provider-only SCIP runs compute graph-derived cluster/process/algorithm state only when call proof is complete; otherwise they leave graph-derived readiness dirty with an operator-visible health reason. CLI coverage output now groups incomplete call-proof reasons with reference counts, affected file counts, sample paths, and bounded expected/actual samples for symbol-text mismatches.
- Provider-first C++ call-proof follow-up: Treats overlapping clang macro-expansion ranges such as `clEnumValN(...)` and location-only macro tokens such as `offsetof(...)` as neutral when scip-clang maps namespace, type, enum, or member symbols onto the invoked macro token, proves constructor calls in member-initializer lists when the occurrence range points at the member name but nearby declarations expose the constructed type, proves trailing local-class declarator constructors such as `struct RestorePath { ... } restore_path(path);`, proves typedef-alias constructor declarations such as `MutexLock l(&mutex)` when SCIP also exposes the alias type occurrence in the same descriptor scope, proves unary/operator-token references such as `operator~` and `operator()` when clang ranges the source operator token, normalizes backtick-wrapped clang descriptor names and balanced trailing template arguments such as `~V8` and `ScopedHashTableScope<K, V>` to their source spelling, and keeps implicit result/conversion references neutral for C++ named casts, callable-object invocations, constructor conversions over a different expression call token, and conversion-operator declarations such as `operator ArrayRef<T>()`.
- Provider-first C++ multi-line call proof: Allows clang/cxx multi-line provider ranges to enter the bounded C++ token-window proof when source lines are retained, proving multi-line template/member invocations while keeping broad non-call ranges and C++ control-flow keywords neutral.
- Provider-first C++ literal and declarator proof: Keeps clang/cxx string literal spans reported as implicit `StringRef` constructor references neutral, including UTF-8 byte-column ranges over non-ASCII literals, and proves constructor references in comma-separated declarations such as `APFloat MA(Sem), SC(Sem)` from the shared exposed type.
- Algorithm refresh policy: Lowered the default Louvain `maxCallEdges` threshold from `50000` to `10000` so optional shadow community detection is policy-skipped before it dominates provider-first full-index wall time. PageRank and K-core still run by default.
- Version snapshot batching: Replaced per-symbol `SymbolVersion` snapshot writes with cursor-paged symbol snapshot reads. Fresh full-index versions now stream larger bounded read pages into one buffered CSV artifact and one LadybugDB `COPY`, while repair/reuse paths keep chunked `UNWIND MERGE` writes so incomplete or reused snapshots can still be safely filled.
- Provider-first memory release: Explicitly releases decoded provider fact payloads and discarded graph-row copies after provider coverage analysis so large SCIP runs do not carry occurrence/source-line arrays into same-run legacy fallback and version snapshot creation.
- Provider-first active materialization: Provider-first active graph writes now use a known-fresh symbol writer that writes provider `Symbol`, `SYMBOL_IN_FILE`, and `SYMBOL_IN_REPO` rows to temporary CSV artifacts and imports them with LadybugDB `COPY`, plus a known-endpoint edge loader that writes provider `DEPENDS_ON` rows to a temporary CSV and imports them with LadybugDB `COPY` inside the active transaction. This avoids generic relationship existence checks, endpoint repair, stale optional-field preservation, per-row endpoint matching, and the extra symbol or edge relationship probes needed by broader legacy writes before shadow activation while leaving the legacy writer defaults unchanged.
- Provider-first active-row reuse: Medium repeat provider sets now retire stale active rows in chunks and reload them with the known-fresh COPY writer up to 100k provider symbols, avoiding the slow merge-safe symbol path for LLVM-sized C/C++ refreshes. Larger active provider-row reuse is gated by a recorded generated-SCIP input fingerprint; if the generated provider artifact changes, SDL-MCP runs merge-safe file and symbol upserts instead of reusing stale active rows solely because the existing provider symbol set is large.
- Metrics materialization: Full post-index metric refresh now replaces the repo's current `Metrics` rows with a delete-plus-`COPY` transaction from a buffered temporary CSV, while partial incremental refreshes keep merge-safe batched upserts. Full refreshes also persist a repo-level metrics payload fingerprint and skip the delete-plus-`COPY` entirely when fan/churn/test/canonical/centrality values are unchanged, leaving `updatedAt` stable instead of rewriting rows just to move timestamps. This removes the large full-index `UNWIND MERGE` metrics write loop without risking data loss for scoped updates.
- Provider-first fallback Metrics writes: Incremental Metrics writes now probe existing Metrics IDs inside the write transaction, COPY-load absence-proven missing rows above the threshold, directly create small missing batches, and `MATCH`-update existing rows. Provider-first fallback diagnostics surface the nested Metrics write phases so fresh isolated benchmark runs can distinguish probe, CSV materialization, COPY, direct-create, and existing-row update costs.
- Metrics test-reference cache: Test-reference discovery now persists per-repo matched test symbols in the SDL-MCP temp cache with file size and content-hash metadata, so repeated one-shot CLI full indexes can reuse unchanged test-file matches instead of re-reading and re-tokenizing every test file. Metrics refreshes reuse the already-loaded indexed file list and content hashes as the candidate test-file set, avoiding a second filesystem glob walk over large repositories and filesystem stats for unchanged cached test files while keeping ignored or unscanned files out of graph metrics. Warm runs collect candidate names from cached refs and changed test-file tokens before building the current symbol-name lookup, so large provider-first repos no longer allocate name buckets for every symbol when only a small subset appears in tests. Duplicate-heavy symbol names are treated as low-signal test-reference tokens to avoid attaching one test file to hundreds of same-named symbols in large repositories.
- FileSummary materialization: File summary refresh now compares the newly generated `summary` and `searchText` with existing `FileSummary` rows and skips unchanged payloads instead of rewriting every file-level summary just to advance `updatedAt`. Full-repo summary refreshes use direct `Symbol.repoId` symbol reads instead of giant `fileId IN (...)` predicates or extra `SYMBOL_IN_REPO` traversals, derive exported-name search text from the same symbol facts, and in provider-first full runs consume provider-owned symbol facts directly from the already materialized provider rows so the summary phase only queries LadybugDB for fallback-owned files. Changed existing rows use a node-only update path that avoids File/Repo relationship probes, and first-time summary rows are loaded through temporary CSV artifacts with LadybugDB `COPY` for `FileSummary`, `FILE_SUMMARY_IN_REPO`, and `SUMMARY_OF_FILE`. The merge-safe upsert path remains the fallback if the known-new COPY insert fails. The reported `updated` count reflects rows actually written.
- Provider-first pass-2 imports: Provider-first full runs now seed pass-2 import caches from provider-owned graph rows. The generic import resolver gets exported `(symbolId, name)` facts without re-reading provider-owned files from LadybugDB, and Python pass-2 also receives exported kind/range details so imported class-method resolution can avoid per-target `getSymbolsByFile` reads for provider-owned modules while preserving the DB fallback for legacy-owned files.
- Provider-first pass-2 writes: Full-mode pass-2 now splits COPY-safe call edges from repair-only edges. Large COPY-safe batches use the known-symbol `DEPENDS_ON` COPY path after source-symbol replacement and bulk placeholder repair for safe unresolved targets, while small batches and rows with unsafe relationship endpoints or copied edge properties that require CSV quoting keep the generic writer with full-mode skip flags so COPY setup overhead does not dominate small fallback runs and rare C++ provenance with commas, quotes, or newlines cannot break the relationship CSV load; incremental pass-2 remains on the refresh-capable generic writer.
- Provider-first C/C++ pass-2 fallback performance: C++ pass-2 now groups call sites by owning symbol once per file, warms the shared include index before per-file resolution, uses a resolved-value include-index cache after warmup, reuses the pass-level C++ include index for current-file symbol mapping, scopes include-index import parsing to active pass-2 target files, reuses pass-1 C++ imports/content for include-index resolution and pass-1 source text for current-file parsing without reusing pass-1 calls, builds namespace-member lookups from the same cached repo symbol rows with a linear namespace-prefix pass instead of a namespace-by-symbol cross product, and avoids promise-await overhead for synchronous pass-2 batch submissions. It also raises the sequential full-mode pass-2 flush default to `256` files / `32,768` edges, skips no-op fresh-copy incoming-symbol deletes when none of the incoming IDs exist in the current graph, avoids repeated unresolved-placeholder target repair during full-mode pass-2 COPY batches once a target was successfully ensured earlier in the same pass, and groups placeholder repo-link repair by `repoId` so each chunk matches the repo once instead of per row. C pass-2 now reuses a pass-level repo symbol index for current-file symbol lookup, groups calls by owning symbol once per file, and shares the same synchronous batch-submit path, cutting large LLVM fallback C resolver work while preserving pass-2 C edge counts. Full-mode generic repair now also skips the existing-relationship probe for fresh pass-2 call edges, groups target placeholder repo-link repair by repo, and keeps non-real target metadata off the file-backed cleanup join, preserving duplicate-protection for general `insertEdges` callers while reducing the provider-first repair writer. The provider-first fallback benchmark harness also clears graph DB path environment overrides for child index runs so per-repeat graph DB artifacts are authoritative.
- Provider-first unresolved-call cleanup: Final edge cleanup now gathers distinct unresolved call target IDs, applies the existing builtin classifier once per target, and deletes repo-scoped call relationships to builtin targets directly in LadybugDB with duplicate-safe relationship deletion. This preserves dotted-call semantics such as `console.log` while avoiding materializing every unresolved call edge and then replaying `(from, to, type)` delete rows through the generic edge deleter.
- Cluster/process replacement writes: Full cluster and process replacement now uses dedicated post-delete relationship insert helpers for `BELONGS_TO_CLUSTER` and `PARTICIPATES_IN` rows. The generic upsert helpers still keep probe-and-update semantics for partial callers, while full replacement skips redundant `OPTIONAL MATCH` and second update passes after old relationships have already been removed.
- Provider-first fallback pass-1 stability: Complete same-run provider-first legacy fallback now uses the tuned legacy machinery, including native Rust pass-1 when configured and available, parser workers for TypeScript pass-1, normal configured concurrency, and the batch persistence accumulator. Intentionally partial provider-first fallback remains on inline TypeScript parsing and direct per-file LadybugDB writes because shadow activation is already blocked by the skipped tail and that mixed partial path has hit hard native and worker exits on large C++ repos.
- Provider-first shadow handoff: Full provider-first runs now skip expensive shadow DB staging and finalization when call-proof gaps or fallback-cap gaps have already made graph-derived state dirty and activation impossible. The active graph still finalizes normally, and CLI output reports the skipped shadow reason instead of spending time building a shadow DB that cannot be finalized or swapped into place.
- Provider-first fallback caps: Same-run legacy fallback after provider-first materialization is now guarded by `indexing.providerFirst.maxLegacyFallbackFiles` (default `1000000`). The high default keeps full provider-first graphs complete by routing broad uncovered or provider-unusable tails through the tuned legacy fallback path; lowering the cap remains available for partial iteration or resource protection, and capped runs report the skipped count while leaving graph-derived readiness deferred.
- Provider-first semantic fallback cap: Added `indexing.providerFirst.maxSemanticEligibleFallbackFiles` (default `0`) for the special case where the full fallback gap is over `maxLegacyFallbackFiles` but a semantic-eligible subset is known. SDL-MCP skips that partial subset by default because the skipped outside-semantic tail still blocks shadow finalization and activation; users can raise the new cap when active-graph fallback coverage is worth the extra indexing time despite the partial graph.
- Provider-first deferred index work: Provider-first runs that defer semantic refresh now defer Symbol FTS, entity FTS, and Symbol/FileSummary vector-index creation out of the indexing wall-clock. A later non-fresh startup or readiness refresh bootstraps retrieval indexes, while semantic embedding refresh rebuilds HNSW vector indexes after actual vector rows are written.
- Cluster/process materialization: Canonical cluster and process refresh now batch parent node updates and flatten member/step relationship rows into one batch per refresh instead of issuing one parent write plus one relationship batch per cluster or process. Provider-first fallback diagnostics now surface `clusterWrite.*` and `processWrite.*` subphase timings so derived-state write bottlenecks remain visible in normal CLI output.
- SCIP generator language filtering: `scip-io` pre-refresh now derives a `--lang` filter from the repo's configured `languages`, so TypeScript/JavaScript/Rust repos do not invoke unrelated Java, C#, Go, C++, or other generators. Explicit `--lang`/`-l` entries in `scip.generator.args` still override the automatic filter, and split fallback ignores unchanged pre-existing split files so stale artifacts from unrelated languages are not ingested.
- Provider-first coverage diagnostics: CLI coverage output now breaks down the `provider unusable` bucket by reason, including missing coverage facts and provider-covered files with no usable symbols, plus skipped-symbol reason counts when provider symbols were emitted but SDL-MCP did not materialize them. Rust module descriptors ending in `/` now materialize as SDL `module` symbols instead of being reported as unknown descriptor suffixes, reducing provider-unusable fallback for Rust `mod.rs` files. Repeated rust-analyzer crate namespace symbols such as `crate/` are coalesced into one provider symbol so expected namespace duplication does not trip the unsafe duplicate-symbol guard. C++ provider symbols emitted with the `cxx` scheme now use clang-style descriptor mapping, and ambiguous native symbols with definitions in multiple files are skipped as provider-unusable facts instead of aborting the whole SCIP provider run.
- Provider-first staging format: `indexing.providerFirst.stagingFormat: "parquet"` currently records a Parquet-to-CSV fallback reason in the staging manifest because CSV is the implemented bulk-load artifact format for this phase.
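
The split described in the "Provider-first pass-2 writes" entry, where COPY-safe edges go to a bulk path and everything else stays on the generic writer, might look roughly like the sketch below. The row shape, the `minCopyBatch` threshold, and the function names are hypothetical and chosen for illustration; the release notes do not state the actual threshold or field names.

```typescript
// Hypothetical edge row; field names are illustrative, not SDL-MCP's schema.
export interface EdgeRow {
  fromId: string;
  toId: string;
  provenance?: string;
}

export interface Pass2Split {
  copy: EdgeRow[]; // rows suitable for a bulk CSV COPY load
  repair: EdgeRow[]; // rows left to the generic (parameterized) writer
}

// Characters that would force CSV quoting: quotes, commas, and line breaks.
const NEEDS_CSV_QUOTING = /[",\r\n]/;

export function isCopySafe(row: EdgeRow): boolean {
  const values = [row.fromId, row.toId, row.provenance ?? ""];
  return !values.some((value) => NEEDS_CSV_QUOTING.test(value));
}

export function splitPass2Edges(rows: EdgeRow[], minCopyBatch: number): Pass2Split {
  const copy: EdgeRow[] = [];
  const repair: EdgeRow[] = [];
  for (const row of rows) {
    (isCopySafe(row) ? copy : repair).push(row);
  }
  // A small COPY-safe batch is not worth the COPY setup overhead,
  // so the whole flush stays on the generic writer in that case.
  if (copy.length < minCopyBatch) {
    return { copy: [], repair: rows.slice() };
  }
  return { copy, repair };
}
```

Under these assumptions, a row whose target is `std::map<K, V>` contains a comma and stays on the generic writer, while plain identifier rows are bulk-loaded once enough of them accumulate.
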
Fixed
- Retrieval index failure surfacing: Required retrieval index creation failures during the post-full-index deferred bootstrap now stop the index run with the failed index names instead of being swallowed and later appearing only as `FTS: ABSENT` in `doctor`.
- File watcher ignores: Compiled repository ignore globs into the Chokidar startup predicate so modern Chokidar versions prune ignored directories such as `node_modules`, `.bun`, `dist-*`, and build outputs before opening watches. Watcher event filtering now reuses the same glob semantics and the scanner's language-extension mapping, keeping watcher scope aligned with indexed source files and preventing `EMFILE: too many open files, watch` storms in dependency-heavy repos.
- MCP tool schema compatibility: Flattened root-level `oneOf`/`anyOf`/`allOf` composition in advertised tool input schemas, including `sdl.search.edit`, `sdl.symbol.edit`, and gateway wire schemas, so Claude/Anthropic clients that require top-level `type: "object"` schemas can accept SDL-MCP tool lists while runtime Zod validation remains strict.
- Cluster refresh crash: Dropped and rebuilt the Cluster FTS index around topology-changing cluster replacement so LadybugDB does not access-violate when deleting the old cluster set during delegated incremental indexing. Rebuilds now fail closed when FTS is available, skip only when the global Cluster table is empty, and recreate Cluster FTS when rows return after an absent-index state.
- FTS bootstrap safety: Deferred FTS index creation for empty entity tables and made FTS existence checks table-aware so same-name indexes on other tables do not mask missing required indexes.
- Plugin adapter startup: Loaded configured plugin adapters during server, CLI serve, direct indexing, and CLI tool startup so plugin `structuralMatcher` descriptors are available to `search.edit`, and resolved configured plugin paths relative to the config file with trusted-root containment.
- Search edit hardening: Reused realpath-validated, handle-based, size-capped file reads for single and batch previews; capped structural `requiredCaptures` maps to bounded safe keys; and added an aggregate structural query time budget that is checked before candidate parsing.
- Batch search edit safety: Recomputed batch operation ranges from the aggregate-read content instead of diffing against stale per-operation preview output, deduplicated same-file byte accounting across child operation previews, and compacted stored skipped-file summaries.
- Index drop confirmation: Made missing `DROP_FTS_INDEX`/`DROP_VECTOR_INDEX` procedure/function handling fail closed unless strict `SHOW_INDEXES` introspection confirms the table index is absent, with missing table metadata treated as unconfirmed.
- LadybugDB extension reloads: Guarded replacement connection `LOAD EXTENSION` calls with a pre-load WAL checkpoint and cleared global extension capability state on per-connection load failures so recycled sessions do not bypass the dirty-WAL crash guard.
- CLI plugin loading: Delayed direct CLI plugin imports until local adapter registry state is needed, avoiding plugin execution for delegated indexing and metadata-only tool commands.
- Gateway validation parity: Mirrored direct `search.edit` string and array caps in the repo gateway schema.
- Structural search-edit validation: Candidate-specific tree-sitter query compilation failures now surface as validation errors instead of false no-match previews.
- LadybugDB algorithm refresh: Drop and rebuild repo-scoped projected graphs before post-index algorithm refresh so long-lived HTTP server connections do not reuse stale projections during incremental indexing.
- Large-repo indexing memory: Released pass-1/pass-2 symbol-map bridge caches before version snapshot creation so large full indexes do not carry full-repo symbol maps into post-index finalization.
- Incremental metrics recovery: No-op incremental refreshes now inspect the current graph for incomplete version snapshots, missing metrics/file summaries, stale or absent derived state, and configured SCIP indexes before returning. Missing `Metrics` rows are repaired through a dedicated LadybugDB aggregate/write path instead of hydrating the full edge graph, while SCIP edge changes still use the full recomputation path for correctness.
- Algorithm refresh timeouts: Canonical cluster/process refresh now completes independently of optional graph algorithms; PageRank/K-core run in a killable worker, centrality writes are preserved before Louvain, and large call graphs skip Louvain by policy instead of timing out the post-index session.
- Provider-first graph facts: Kept SCIP provider symbol IDs stable across line movement, stopped promoting broad SCIP reference occurrences to exact call edges, pruned stale SCIP external symbols during full provider materialization, and batched external-symbol writes to avoid thousands of single-writer round trips.
- Provider-first duplicate SCIP facts: Coalesced duplicate SCIP documents for the same normalized repo-relative path and duplicate symbols/occurrences inside a document before provider fact emission, preserving unique facts without producing duplicate `File` or `Symbol` primary keys in large multi-root SCIP indexes.
- Provider-first SCIP symbol ownership: Stopped materializing referenced-only SCIP `SymbolInformation` metadata as file-local symbols. Occurrences in reference documents now resolve to the definition document's provider symbol, preventing `scip-python` duplicate native symbol failures while preserving true multi-definition collisions for validation.
- Provider-first scan scope: Safe repo-relative SCIP documents that fall outside the configured repository scan scope, such as files for languages not listed in the repo config, are now filtered from provider facts and rows instead of aborting the run. Absolute or path-traversing provider paths still fail coverage validation.
- Scanner language extensions: Repository scans now derive source extensions from the built-in adapter registry for configured C, C++, Python, PHP, Kotlin, and shell languages, so provider-first coverage no longer ignores valid `.cc`, `.cxx`, `.hh`, `.hpp`, `.hxx`, `.h`, `.pyw`, `.phtml`, `.kts`, `.bash`, and `.zsh` files solely because the repo language was configured by its short SDL-MCP language ID. Configured C++ scan scope now also includes provider-emitted C/C++ companion extensions `.c`, `.h`, `.def`, and `.inc`, and configured Python includes `.pyi`, so safe SCIP documents for headers, generated include fragments, and Python stubs are not filtered out before provider materialization.
- Provider-first generator warnings: Full provider-first runs now continue after `scip-io` reports per-language generator failures when a usable SCIP index still decodes into provider rows; missing configured SCIP index files remain fatal for explicit `providerFirst` and legacy-fallback triggers for `auto`.
- Provider-first external snapshot boundary: Version snapshots and shadow finalization now exclude `external=true` dependency support symbols even if stale metadata still labels them as `symbolStatus: "real"`. Shadow parity treats those nodes as auxiliary and copies `SymbolVersion` and `Metrics` rows only for repo-owned non-external symbols, so activated provider-first shadows do not inflate public symbol or version counts.
- Scanner glob scope: Corrected wildcard directory ignores such as `**/dist-*/**` so generated `dist-*` directories remain excluded while checked-in source files named `dist-runtime.ts` or `dist-stdio-smoke.test.ts` stay in the scanned repo scope.
- Derived-state readiness: Stopped graph-derived startup recovery from clearing semantic summary/embedding dirty flags, and skipped semantic-only stale rows instead of enqueueing a graph refresh that cannot clear them.
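
The scanner glob scope fix can be illustrated with a minimal sketch. This is not SDL-MCP's scanner code: `segmentToRegExp` and `isIgnoredByDirectoryGlob` are hypothetical helpers, and the sketch only handles patterns shaped like `**/<segment>/**`. The point it shows is that the wildcard segment is tested against directory names only, never against the file name.

```typescript
// Hypothetical helper: turn one glob segment such as "dist-*" into an anchored RegExp.
function segmentToRegExp(segment: string): RegExp {
  // Escape regex metacharacters except "*", then let "*" match within one path segment.
  const escaped = segment.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, "[^/]*") + "$");
}

// Hypothetical helper: true when any directory in relPath matches the middle
// segment of a "**/<segment>/**" ignore pattern.
function isIgnoredByDirectoryGlob(relPath: string, pattern: string): boolean {
  const segment = /^\*\*\/([^/]+)\/\*\*$/.exec(pattern)?.[1];
  if (segment === undefined) {
    throw new Error(`unsupported pattern shape: ${pattern}`);
  }
  const dirRegExp = segmentToRegExp(segment);
  // Drop the last path component (the file name) so only directories are tested.
  const directories = relPath.split("/").slice(0, -1);
  return directories.some((dir) => dirRegExp.test(dir));
}

const pattern = "**/dist-*/**";
const inGeneratedDir = isIgnoredByDirectoryGlob("packages/app/dist-web/index.js", pattern);
const sourceFile = isIgnoredByDirectoryGlob("src/dist-runtime.ts", pattern);
const testFile = isIgnoredByDirectoryGlob("tests/dist-stdio-smoke.test.ts", pattern);
```

Under these assumptions a file inside a `dist-web` directory is ignored, while `dist-runtime.ts` and `dist-stdio-smoke.test.ts` stay in scope because the `dist-*` segment is never compared with the final path component.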
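
The table-aware FTS existence check from the FTS bootstrap safety entry can be sketched in the same spirit. The `IndexRow` shape, the helper names, and the index name `search_fts` are assumptions made for illustration and are not taken from SDL-MCP or LadybugDB; the sketch only contrasts a name-only lookup with one that also matches the owning table.

```typescript
// Hypothetical shape of one row returned by an index-introspection query.
interface IndexRow {
  tableName: string;
  indexName: string;
}

// Name-only lookup: a same-name index on any table counts as "present".
function hasIndexByNameOnly(rows: IndexRow[], indexName: string): boolean {
  return rows.some((row) => row.indexName === indexName);
}

// Table-aware lookup: the index must exist on the table that requires it.
function hasIndexOnTable(rows: IndexRow[], tableName: string, indexName: string): boolean {
  return rows.some((row) => row.tableName === tableName && row.indexName === indexName);
}

// Only the Cluster table has the (hypothetical) index; Symbol is still missing it.
const existing: IndexRow[] = [{ tableName: "Cluster", indexName: "search_fts" }];

const maskedByName = hasIndexByNameOnly(existing, "search_fts");
const symbolHasIndex = hasIndexOnTable(existing, "Symbol", "search_fts");
const clusterHasIndex = hasIndexOnTable(existing, "Cluster", "search_fts");
```

With the name-only lookup, the index on `Cluster` would hide the missing index on `Symbol`; matching on the table as well reports it as absent, so it can still be created.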
48 non-merge commits from 1 contributor since v0.11.4.