v1.1.2 - Purple Haze

@orneryd orneryd released this 26 May 19:14
· 123 commits to main since this release

[v1.1.2] Purple Haze - 2026-05-26

Headline release: Bolt over WebSocket lands end-to-end so browser-based Neo4j drivers connect to NornicDB without a proxy, and per-database BM25 + vector index master switches ship as a first-class memory and warmup-cost lever for multi-tenant deployments. Three independently reported Cypher correctness regressions (mcp-neo4j-memory) are fixed with deeply-asserted parity against Neo4j 5.x DDL and Lucene wildcard semantics. A profile-led overhaul of the shortestPath traversal stack drops latency ~400× on the demo workload. No on-disk format changes; existing v1.1.x databases upgrade transparently.

shortestPath benchmark — 500K nodes / ~3.97M edges

Single-process traversal latency at scale. Generated by:

go test ./pkg/cypher/ -run TestLargeScaleShortestPath_HopBuckets \
    -timeout 30m -largescale -v

Hardware: Apple M3 Max. Storage chain: BadgerDB → AsyncEngine → NamespacedEngine → StorageExecutor (the same chain the HTTP server uses).

Setup

  • 500,000 Star nodes across 1,000 sectors (500 per sector)
  • 3,969,418 HYPERLANE edges (forward+reverse), avg ~16 per node
  • Direct BulkCreateNodes / BulkCreateEdges, property index built after bulk insert + count verify (mirrors a typical "load → index → query" workflow)
  • Reference BFS over 200 random sources to bucket pairs by exact hop distance
  • 30 random pairs benched per depth bucket

Latency table (depths 1–60, 30 samples per bucket)

hops min median p95 max
1 17.5µs 94.8µs 319.5µs 328.0µs
2 15.6µs 106.6µs 137.0µs 342.8µs
3 19.7µs 162.3µs 225.5µs 229.9µs
4 21.0µs 342.7µs 4.6ms 6.3ms
5 22.1µs 1.23ms 4.97ms 36.4ms
6 21.0µs 1.26ms 1.57ms 1.96ms
7 18.9µs 1.56ms 4.98ms 16.0ms
8 22.8µs 2.39ms 24.6ms 39.6ms
9 19.9µs 46.3ms 51.7ms 53.6ms
10 27.1µs 44.5ms 51.4ms 53.2ms
11 22.9µs 51.7ms 57.3ms 67.3ms
12 21.5µs 63.4ms 79.6ms 85.3ms
13 25.5µs 76.6ms 84.2ms 90.8ms
14 25.8µs 78.9ms 89.1ms 106.3ms
15 23.6µs 80.8ms 93.1ms 93.3ms
16 24.0µs 97.9ms 112.7ms 115.6ms
17 24.5µs 108.8ms 113.3ms 125.2ms
18 30.0µs 111.0ms 118.7ms 131.0ms
19 24.3µs 121.1ms 135.3ms 141.1ms
20 23.5µs 134.2ms 146.0ms 148.3ms
21 25.3µs 137.1ms 150.0ms 161.2ms
22 25.0µs 147.3ms 167.0ms 175.1ms
23 29.5µs 152.4ms 162.5ms 163.4ms
24 23.3µs 157.4ms 171.3ms 172.6ms
25 25.5µs 180.7ms 280.1ms 357.7ms
26 25.6µs 181.4ms 207.3ms 301.1ms
27 23.7µs 182.0ms 192.6ms 195.4ms
28 27.1µs 188.7ms 202.5ms 210.3ms
29 25.8µs 211.4ms 232.3ms 236.4ms
30 24.0µs 210.9ms 223.5ms 229.4ms
31 23.4µs 212.3ms 234.7ms 237.9ms
32 22.5µs 218.8ms 239.4ms 242.9ms
33 23.5µs 241.4ms 260.9ms 268.6ms
34 24.0µs 239.3ms 254.6ms 255.3ms
35 26.6µs 246.1ms 259.4ms 264.6ms
36 24.0µs 275.2ms 290.6ms 296.0ms
37 24.9µs 274.5ms 289.1ms 293.8ms
38 24.4µs 277.5ms 292.0ms 299.0ms
39 26.1µs 282.4ms 297.4ms 301.7ms
40 28.8µs 612.7ms 640.3ms 642.4ms
41 26.8µs 614.3ms 644.0ms 652.3ms
42 24.5µs 618.7ms 648.0ms 652.6ms
43 25.0µs 640.0ms 665.9ms 666.9ms
44 27.7µs 694.4ms 781.8ms 800.9ms
45 26.0µs 684.7ms 725.2ms 732.4ms
46 25.5µs 683.1ms 704.1ms 710.8ms
47 25.8µs 709.6ms 738.5ms 751.8ms
48 26.7µs 751.3ms 867.6ms 897.4ms
49 24.6µs 764.2ms 940.3ms 984.9ms
50 25.3µs 770.0ms 794.9ms 798.8ms
51 25.5µs 789.7ms 835.4ms 854.2ms
52 28.7µs 818.3ms 864.7ms 872.7ms
53 25.4µs 817.7ms 845.3ms 884.1ms
54 25.4µs 821.3ms 856.9ms 862.5ms
55 26.4µs 879.1ms 1.02s 1.03s
56 25.5µs 1.00s 1.08s 1.11s
57 25.0µs 976.8ms 1.08s 1.17s
58 25.6µs 972.6ms 1.12s 1.13s
59 28.7µs 1.01s 1.06s 1.08s
60 25.4µs 1.02s 1.05s 1.07s

Practical takeaway

For a 500K-node, ~4M-edge graph, this delivers:

  • Sub-millisecond shortestPath at depths 1–4
  • Single-digit milliseconds at depths 5–8
  • Roughly linear median growth of ~7ms per hop from depth 9 through depth 39, then a step up to ~610ms at depth 40
  • ~1s for paths spanning the full sector chain (depth 60)

All of this is in-process; HTTP/Bolt overhead adds the usual 1–2ms on top.

Added

  • Bolt over WebSocket — browser drivers connect natively. The Bolt port (:7687 by default) now multiplexes four wire-level transports off one listener, sniffing the first 5 bytes of every accepted connection: bolt:// (raw TCP, today's path), bolt+s:// (TLS), ws:// (WebSocket over plain TCP), wss:// (TLS + WebSocket). The architecture mirrors Neo4j's TransportSelectionHandler: WebSocket frames carry the same Bolt magic + version negotiation + PackStream + chunked framing that raw TCP does, so existing drivers (Go, Java, Python, JavaScript browser, .NET) speak the same protocol on either transport. Operator-configurable knobs cover origin allowlist (default *), max message size (default 65 536 bytes, matching Neo4j's MAX_WEBSOCKET_FRAME_SIZE), ping/pong cadence (default 30 s ping / 60 s pong), pre-HELLO auth deadline, transport-sniff timeout, mTLS ClientAuthMode (none/request/request_verify/require_verify), RequireTLS (rejects every plaintext upgrade with the canonical Neo4j error), WebSocketEnabled=false (returns HTTP 426 on real WS upgrades while still serving the discovery probe to health checks), and operator-driven cert rotation via 5-second tls.Config.GetCertificate re-read with atomic-rename semantics. A plain GET / on the Bolt port returns a Neo4j-parity discovery response (200 OK + 5 required headers; empty body for Community parity, JSON describing the OAuth provider when NORNICDB_AUTH_PROVIDER=oauth). Phase-3 throughput, allocation, and round-trip benchmarks ship for all four transports; ws stays within a 5 % budget vs raw tcp and ws_tls within 0.3 % of tcp_tls.

    Auth: HELLO scheme=bearer/basic always wins. As a deliberate exception for first-party browser clients the WS upgrade reads the nornicdb_token cookie and Authorization: Bearer … header; either is honored as an "implicit bearer" when HELLO is scheme=none. Cookie wins on conflict; raw TCP has no HTTP layer so the implicit path is unreachable there.

    Configuration: 13 new NORNICDB_BOLT_* env vars (TLS cert/key/require/CA/auth-mode, WS enabled/origins/max-message/write-buffer/ping/pong, sniff/auth timeouts) plumbed through env → CLI → YAML. Documented in docs/operations/configuration.md (Bolt over WebSocket + TLS section), docs/operations/environment-variables.md, docs/user-guides/connecting-bolt.md (Neo4j-compatible scheme table for every official driver), and pkg/bolt/README.md. Metric schema migrated: bolt_connections_active becomes a GaugeVec, bolt_connections_total gains a closed-enum transport label (cardinality 3 → 12), plus new bolt_connections_rejected_total{reason} and bolt_websocket_oversized_total counters. The Grafana dashboards in dashboards/ continue to work; queries that filtered only on result should be updated to also project transport.
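The first-bytes sniffing described at the top of this entry can be pictured with a small Go sketch. It is not NornicDB's actual code: the `Transport` type and `classify` function are hypothetical names, and the only wire facts it relies on are the four-byte Bolt magic preamble (`60 60 B0 17`), the TLS handshake record prefix (`16 03`), and an HTTP request line starting with `GET `.

```go
package main

import (
	"bytes"
	"fmt"
)

// Transport is a hypothetical label for what the first bytes look like.
type Transport string

const (
	RawBolt Transport = "bolt"    // Bolt magic preamble on plain TCP
	TLS     Transport = "tls"     // TLS handshake record; sniff again after the handshake
	HTTP    Transport = "http"    // HTTP request line: WebSocket upgrade or discovery GET
	Unknown Transport = "unknown" // anything else is rejected in this sketch
)

// boltMagic is the four-byte preamble every Bolt client sends first.
var boltMagic = []byte{0x60, 0x60, 0xB0, 0x17}

// classify inspects the first bytes read from an accepted connection.
func classify(first []byte) Transport {
	switch {
	case bytes.HasPrefix(first, boltMagic):
		return RawBolt
	case len(first) >= 2 && first[0] == 0x16 && first[1] == 0x03:
		// 0x16 is the TLS handshake content type, 0x03 the TLS major version.
		return TLS
	case bytes.HasPrefix(first, []byte("GET ")):
		return HTTP
	default:
		return Unknown
	}
}

func main() {
	fmt.Println(classify([]byte{0x60, 0x60, 0xB0, 0x17, 0x00}))
	fmt.Println(classify([]byte{0x16, 0x03, 0x01, 0x02, 0x00}))
	fmt.Println(classify([]byte("GET /")))
}
```

A real listener would also have to replay the sniffed bytes to whichever handler takes over, and repeat the check on the decrypted stream after a TLS handshake to tell bolt+s:// from wss://.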

  • NornicDB browser UI uses Bolt over WebSocket end-to-end. The embedded admin UI swapped its HTTP /tx/commit Cypher transport for the official neo4j-driver browser build over ws:// / wss://, configured automatically from the discovery response. Same-origin nornicdb_token cookie carries auth into every query so the UI's executeCypher path is one network round trip with no token-juggling JavaScript. Vite plugins (neo4jBrowserChannelPlugin, nodeShimPlugin) wire the driver's browser channel correctly under Vite 8 / Rolldown. The HTTP server's UI handler now serves SPA routes with trailing slashes (/databases/) directly instead of returning HTTP 400 — refreshing on any nested route works.
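The UI's reliance on the same-origin nornicdb_token cookie follows the implicit-bearer rule from the Bolt over WebSocket entry. A minimal Go sketch of that precedence is shown here; the `Upgrade` type and `resolveAuth` function are hypothetical names used for illustration, not NornicDB's API.

```go
package main

import "fmt"

// Upgrade holds credentials seen on the WebSocket upgrade request.
// The type and its fields are hypothetical.
type Upgrade struct {
	CookieToken string // value of the nornicdb_token cookie, "" if absent
	HeaderToken string // token from "Authorization: Bearer ...", "" if absent
}

// resolveAuth returns the scheme and credential to authenticate with.
// up is nil for raw TCP connections, which have no HTTP layer.
func resolveAuth(helloScheme, helloCred string, up *Upgrade) (scheme, cred string) {
	// An explicit HELLO scheme always wins.
	if helloScheme == "bearer" || helloScheme == "basic" {
		return helloScheme, helloCred
	}
	// The implicit bearer applies only to scheme=none on a WebSocket upgrade.
	if helloScheme == "none" && up != nil {
		if up.CookieToken != "" {
			return "bearer", up.CookieToken // cookie wins on conflict
		}
		if up.HeaderToken != "" {
			return "bearer", up.HeaderToken
		}
	}
	return helloScheme, helloCred
}

func main() {
	up := &Upgrade{CookieToken: "cookie-token", HeaderToken: "header-token"}
	fmt.Println(resolveAuth("basic", "user:pass", up)) // explicit HELLO wins
	fmt.Println(resolveAuth("none", "", up))           // cookie beats header
	fmt.Println(resolveAuth("none", "", nil))          // raw TCP: no implicit path
}
```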

  • Per-database search index master switches and warming triggers. Four new orthogonal keys configure BM25 fulltext and vector ANN behavior independently per database:

    • NORNICDB_SEARCH_BM25_ENABLED (boolean, default true) — master switch for BM25 fulltext search.
    • NORNICDB_SEARCH_BM25_WARMING (enum: startup|lazy, default startup) — eager build at boot or deferred until first query.
    • NORNICDB_SEARCH_VECTOR_ENABLED (boolean, default true) — master switch for every vector search strategy (HNSW, IVF-HNSW, brute-force, GPU, Metal, Qdrant pass-through). When false, node embeddings are NOT iterated into the in-memory ANN substrate — the strongest available memory-pressure lever.
    • NORNICDB_SEARCH_VECTOR_WARMING (enum: startup|lazy, default startup).

    Defaults reproduce today's behavior; existing deployments need no change. Configurable via env, CLI flags (--search-bm25-enabled, etc.), nornicdb.yaml global memory: block, and yaml databases: map for per-database overrides. Runtime overrides via PUT /admin/databases/{name}/config always win over global defaults in both directions (per-DB true enables a globally-disabled index; per-DB false disables a globally-enabled one). Lazy-warming is a synchronous-wait contract: the first inbound search request from any entry point (HTTP, Bolt, GraphQL, gRPC, Cypher procedures) blocks inside Service.EnsureWarm until the build completes; concurrent first-readers all wait on the same sync.Once. The build runs in the DB's long-lived context so a request that times out during the wait does NOT abort the build.

    Migration: none required. Documented in docs/operations/configuration.md#per-database-search-index-control, docs/operations/low-memory-mode.md, docs/user-guides/hybrid-search.md, and the OpenAPI spec. See docs/plans/per-database-search-index-flags-plan.md for design context.
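The lazy-warming contract (one build, many waiters, request timeouts that do not abort the build) can be sketched in Go as follows. This is an illustrative variation, not the real Service.EnsureWarm: `lazyIndex`, `ensureWarm`, and `runDemo` are hypothetical names, and the sketch starts the build inside `sync.Once` and has callers wait on a channel so that an expiring request context stops only the wait.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// lazyIndex is a hypothetical stand-in for a per-database search index.
type lazyIndex struct {
	once   sync.Once
	ready  chan struct{}   // closed when the build has finished
	dbCtx  context.Context // long-lived context owned by the database
	builds atomic.Int64    // how many times the build actually ran
}

func newLazyIndex(dbCtx context.Context) *lazyIndex {
	return &lazyIndex{ready: make(chan struct{}), dbCtx: dbCtx}
}

// build simulates index construction under the database context.
func (ix *lazyIndex) build() {
	ix.builds.Add(1)
	select {
	case <-time.After(20 * time.Millisecond): // pretend work
	case <-ix.dbCtx.Done(): // only a database shutdown stops the build
	}
	close(ix.ready)
}

// ensureWarm starts the build exactly once and waits for it. A request
// context that expires ends the wait, not the build.
func (ix *lazyIndex) ensureWarm(reqCtx context.Context) error {
	ix.once.Do(func() { go ix.build() })
	select {
	case <-ix.ready:
		return nil
	case <-reqCtx.Done():
		return reqCtx.Err()
	}
}

// runDemo fires n concurrent first readers and reports how many builds
// ran and how many readers failed.
func runDemo(n int) (builds, failed int64) {
	ix := newLazyIndex(context.Background())
	var wg sync.WaitGroup
	var errs atomic.Int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := ix.ensureWarm(context.Background()); err != nil {
				errs.Add(1)
			}
		}()
	}
	wg.Wait()
	return ix.builds.Load(), errs.Load()
}

func main() {
	builds, failed := runDemo(8)
	fmt.Println(builds, failed) // 1 0
}
```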

  • Lucene wildcard parity for fulltext indexes. db.index.fulltext.queryNodes and db.index.fulltext.queryRelationships accept all three Lucene wildcard shapes:

    • * maps to MatchAllDocsQuery; every document in the index.
    • *:* — Solr-style equivalent of *.
    • <prop>:* — field-presence query; every doc that has a non-empty value for the named property.

    Each shape honors the index's declared scope (label list for nodes, relationship-type list for edges) and declared property allowlist. An undeclared field returns empty (matching Neo4j-Lucene posting-list semantics). The previous behavior — wildcard queries returning 0 rows or, conversely, returning every node regardless of label scope — is fixed.
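As a rough illustration of the three shapes and the property allowlist, here is a small Go sketch. The names `classifyWildcard` and `fieldAllowed` are hypothetical, and the parsing is deliberately simplified compared with a real Lucene query parser.

```go
package main

import (
	"fmt"
	"strings"
)

type wildcardKind int

const (
	notWildcard   wildcardKind = iota // an ordinary fulltext query
	matchAll                          // "*" or "*:*"
	fieldPresence                     // "<prop>:*"
)

// classifyWildcard recognises the three wildcard shapes. For fieldPresence
// it also returns the property name.
func classifyWildcard(q string) (wildcardKind, string) {
	q = strings.TrimSpace(q)
	if q == "*" || q == "*:*" {
		return matchAll, ""
	}
	prop, ok := strings.CutSuffix(q, ":*")
	if ok && prop != "" && !strings.ContainsAny(prop, ":* ") {
		return fieldPresence, prop
	}
	return notWildcard, ""
}

// fieldAllowed applies the index's declared property allowlist; a field
// the index does not declare yields no results.
func fieldAllowed(prop string, declared []string) bool {
	for _, d := range declared {
		if d == prop {
			return true
		}
	}
	return false
}

func main() {
	declared := []string{"name", "type"}
	for _, q := range []string{"*", "*:*", "name:*", "secret:*", "alice"} {
		kind, prop := classifyWildcard(q)
		allowed := kind != fieldPresence || fieldAllowed(prop, declared)
		fmt.Printf("%q kind=%d prop=%q allowed=%v\n", q, kind, prop, allowed)
	}
}
```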

  • Relationship-scoped fulltext indexes. CREATE FULLTEXT INDEX <name> [IF NOT EXISTS] FOR ()-[r:Type]-() ON EACH [r.prop1, r.prop2] (Neo4j 5.x DDL form) is now supported. db.index.fulltext.queryRelationships('idx', '...') scans only relationships whose type matches the index's declared scope, instead of every edge in the graph. Persistence is forwards/backwards compatible: the new RelationshipTypes schema field uses omitempty, so old binaries reading new files see no extra key, and new binaries reading old files see an empty slice (which falls back to the legacy unscoped behavior). No on-disk schema-version bump.
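The omitempty compatibility argument can be demonstrated with a short Go sketch. It assumes a JSON encoding purely for illustration (the release notes do not say which codec the schema uses), and the `newIndex` struct and its field tags are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// newIndex is a hypothetical schema entry with the added field.
type newIndex struct {
	Name              string   `json:"name"`
	Labels            []string `json:"labels,omitempty"`
	RelationshipTypes []string `json:"relationship_types,omitempty"`
}

// encodeNodeIndex serializes a node-scoped index. Because the new field is
// empty and tagged omitempty, no extra key is written.
func encodeNodeIndex() string {
	b, _ := json.Marshal(newIndex{Name: "idx", Labels: []string{"Memory"}})
	return string(b)
}

// decodeLegacy reads a file written before the field existed and returns
// the length of the resulting RelationshipTypes slice.
func decodeLegacy() int {
	var ix newIndex
	_ = json.Unmarshal([]byte(`{"name":"idx","labels":["Memory"]}`), &ix)
	return len(ix.RelationshipTypes)
}

func main() {
	fmt.Println(encodeNodeIndex()) // {"name":"idx","labels":["Memory"]}
	fmt.Println(decodeLegacy())    // 0
}
```

Note that the "no extra key" property holds only while the slice is empty; a relationship-scoped index written by a new binary does carry the new key.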

  • /cyber demo route — cyber-physical graph visualization. Interactive 3D visualization seeded with sectors, hyperlanes, and traversable paths against a cyber_demo database, exercising the same hot-path Cypher cookbook as /demo (UnwindSimpleMergeBatch + UnwindMultiMatchCreateBatch). Pinned for benchmark and operator-demo scenarios.

Changed

  • shortestPath traversal latency cut ~400× on the demo workload (M3 Max, ~1 000 nodes / ~5 000 edges). Profile-led cleanup spanning storage, Cypher, and UI:

    • AsyncEngine adds a per-node inverted index over edgeCache so GetOutgoingEdges / GetIncomingEdges run in O(degree) instead of O(total cached edges). The BFS-frontier full-cache scan that scaled with total seeded edges is gone.
    • BadgerEngine adds an edge-body cache and per-node adjacency-ID cache. BFS-style reads on a stable graph skip Badger entirely after the first visit. Cache returns shared pointers (read-only contract) so repeated hits don't pay copyEdge.
    • New AdjacentEdgesEngine capability fetches both directions in a single view txn; plumbed through AsyncEngine, NamespacedEngine, and WALEngine.
    • NamespacedEngine.toUserEdge / toUserNode drop a deep-copy branch; all Get*Edges callers treat results read-only and clone via CopyNode/CopyEdge before mutating.
    • Cypher shortestPath BFS now uses parent-pointer reconstruction instead of per-neighbor GetNode during traversal; one BatchGetNodes at the end materializes the path. Calls GetAdjacentEdges when the storage chain supports it.
    • Cypher findNodeByPattern consults SchemaManager.PropertyIndexLookup before falling back to a label scan (mirrors merge.go).

    Cumulative result on the in-process bench: warm bench 14.5 ms / 156K allocs → 36 µs / 229 allocs; latency mean ~12 ms → 874 µs; latency p99 ~26 ms → 2.2 ms.
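Two of the ideas above, the per-node adjacency index and parent-pointer path reconstruction, are easy to show in isolation. The following Go sketch uses hypothetical names (`Edge`, `buildOutIndex`, `shortestPath`) and an in-memory edge list; it illustrates the technique only and does not reflect NornicDB's storage interfaces.

```go
package main

import "fmt"

// Edge is a hypothetical directed edge between two node IDs.
type Edge struct{ From, To string }

// buildOutIndex groups edges by source node so a neighbor lookup costs
// O(degree) instead of a scan over every cached edge.
func buildOutIndex(edges []Edge) map[string][]string {
	out := make(map[string][]string)
	for _, e := range edges {
		out[e.From] = append(out[e.From], e.To)
	}
	return out
}

// shortestPath runs BFS, recording only each node's parent, and rebuilds
// the path once at the end instead of materializing nodes per neighbor.
func shortestPath(out map[string][]string, src, dst string) []string {
	parent := map[string]string{src: src}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			break
		}
		for _, next := range out[cur] {
			if _, seen := parent[next]; !seen {
				parent[next] = cur
				queue = append(queue, next)
			}
		}
	}
	if _, ok := parent[dst]; !ok {
		return nil // dst is unreachable from src
	}
	// Walk parent pointers from dst back to src, then reverse in place.
	var path []string
	for n := dst; ; n = parent[n] {
		path = append(path, n)
		if n == src {
			break
		}
	}
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}
	return path
}

func main() {
	edges := []Edge{{"a", "b"}, {"b", "c"}, {"a", "d"}, {"d", "e"}, {"e", "c"}}
	fmt.Println(shortestPath(buildOutIndex(edges), "a", "c")) // [a b c]
}
```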

  • Strict-typed property round-trip preserved end-to-end. A long-standing widening regression — caller writes []float64 / []string / []int64, storage hands back []interface{} on every read — is fixed. The msgpack property codec inspects array headers and decodes homogeneous arrays into their declared concrete slice types; mixed arrays still fall back to []interface{}. Maps recurse the same way. The Cypher path's substituteParams short-circuits typed list parameters ($rows = []float64) so they stay as $name references through the parser instead of being stringified into Cypher list literals (which forced re-decode as []interface{}). Threaded ctx through ~70 expression-evaluator functions across binding-where, case, comparison, operators, math, traversal, link-prediction, knowledge-policy, vector procs, and APOC helpers so $param references resolve at evaluate time inside reduce(), list comprehensions, WHERE, and every other expression context — no widening, no re-parse.
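The narrowing rule (homogeneous arrays become concrete slices, mixed arrays stay []interface{}) can be sketched in Go. This version narrows an already-decoded []interface{}, whereas the release notes describe the codec deciding from msgpack array headers, so treat it as an illustration of the rule rather than the codec itself; `typed` and `narrow` are hypothetical helpers.

```go
package main

import "fmt"

// typed converts in to []T when every element has dynamic type T.
func typed[T any](in []interface{}) ([]T, bool) {
	out := make([]T, len(in))
	for i, v := range in {
		t, ok := v.(T)
		if !ok {
			return nil, false
		}
		out[i] = t
	}
	return out, true
}

// narrow returns a concrete slice for homogeneous float64, int64, or string
// arrays and returns the input unchanged for empty or mixed arrays.
func narrow(in []interface{}) interface{} {
	if len(in) == 0 {
		return in
	}
	switch in[0].(type) {
	case float64:
		if out, ok := typed[float64](in); ok {
			return out
		}
	case int64:
		if out, ok := typed[int64](in); ok {
			return out
		}
	case string:
		if out, ok := typed[string](in); ok {
			return out
		}
	}
	return in
}

func main() {
	fmt.Printf("%T\n", narrow([]interface{}{1.5, 2.5}))
	fmt.Printf("%T\n", narrow([]interface{}{"a", "b"}))
	fmt.Printf("%T\n", narrow([]interface{}{"a", int64(1)}))
}
```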

  • db.index.vector.queryNodes returns empty results with a WARN log on vector-disabled databases instead of erroring or instantiating a fresh enabled service that bypasses the operator's flag. Composite Cypher pipelines that gracefully handle empty vector results continue to succeed; operators see the misconfiguration in subsystem=vector_search log lines.

  • Qdrant gRPC bridge honors the per-DB vector master switch. External Qdrant clients querying a database with NORNICDB_SEARCH_VECTOR_ENABLED=false see a deterministic structured error rather than a service whose ANN substrate isn't populated.

Fixed

  • mcp-neo4j-memory regressions — three independently reproducible Cypher correctness defects resolved.

    1. Map-parameter property access stored as literal text. WITH $entity AS entity MERGE (e:Memory {name: entity.name}) previously stored the literal string "{name:'Alice', type:'Person'}.name" instead of evaluating entity.name. The WITH-binding substitution treated entity.<key> as a standalone identifier and replaced just entity, leaving an orphaned .name suffix. Fixed by expanding <ident>.<key> into the property's Cypher literal value before the standalone-identifier replacer runs. Token boundary checks (word / underscore / dot) keep unrelated identifiers untouched. The same pattern in UNWIND [$r] AS r MATCH (a),(b) WHERE a.name = r.source AND b.name = r.target MERGE (a)-[:REL]->(b) now matches and creates the expected edge.

    2. Aggregating RETURN after CALL…YIELD…WITH…WHERE returned 0 rows. A bare RETURN collect(...) is required by Cypher to produce exactly one row even when the WHERE filters every input. The MATCH-WITH-RETURN aggregation path looked up cr.values["entity.name"] (a literal string keyed by alias) and silently produced an empty list when collect(entity.name) ran over it. New resolveInnerForRow evaluates each aggregate's inner expression three ways — bare alias, alias.property against a stored *storage.Node, or general expression with WITH-bound nodes as context — and applies uniformly to count, sum, and collect. WITH-followed-by-WHERE-followed-by-aggregating-RETURN now produces exactly one row holding the aggregation's identity value (collect → [], count → 0).

    3. CALL dbms.components() reported hard-coded "1.0.0". Wired to pkg/buildinfo.Version() (which loads from the embedded VERSION file at build time). Same fix applied to dbms.listConfig's nornicdb.version row. cypher-shell --version-style probes now see the actual running binary version.

  • Cypher SET errors no longer silently swallowed. A conflict-rejected UpdateNode / UpdateEdge previously looked like a successful SET to ExecuteCypher callers — the SET-RETURN row carried the pre-update state on disk while the executor reported success. Errors now propagate so MVCC commit conflicts surface as loud query failures instead of silent data loss. Paired with: RebuildTemporalIndexes + RebuildMVCCHeads moved from a background task into the synchronous tail of Open() so first-query writes can't race a startup head-rewrite that clears the entire prefixMVCCNodeHead range mid-commit.

  • DROP INDEX now tears down per-property vector data. Previously DROP INDEX <name> only removed the schema entry, leaving per-property vector data orphaned in the in-memory vectorIndex / HNSW / cluster substrates. A subsequent CREATE VECTOR INDEX with the same name appeared to "do nothing" because the orphaned state shadowed the new one. New search.Service.RemovePropertyVectorIndex tears the in-memory state down; executeDropIndex calls it before returning so a recreate from scratch is clean.

  • WAL chunk recovery now batches snapshot restore. RecoverWithTransactions and RecoverFromWALWithResult were calling BulkCreateNodes / BulkCreateEdges with the entire snapshot in one go, exhausting Badger's per-transaction write budget on snapshots above ~10 K nodes/edges. New BulkCreateNodesForRecovery / BulkCreateEdgesForRecovery chunk the restore into transaction-sized batches.
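The batching idea is straightforward to sketch in Go. The helpers below (`chunk`, `restore`) are hypothetical and the batch size of 10 is arbitrary; the real functions size their batches to fit Badger's per-transaction budget, which this sketch does not model.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

// restore commits each batch separately, standing in for one storage
// transaction per batch, and stops at the first error.
func restore[T any](items []T, batchSize int, commit func([]T) error) error {
	for _, batch := range chunk(items, batchSize) {
		if err := commit(batch); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ids := make([]int, 25) // pretend snapshot of 25 nodes
	var sizes []int
	err := restore(ids, 10, func(b []int) error {
		sizes = append(sizes, len(b))
		return nil
	})
	fmt.Println(sizes, err) // [10 10 5] <nil>
}
```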

  • Search-flag precedence honored end-to-end. Three independent gaps in the v1.1.1 search-flag contract caused operator-set values to be silently dropped at startup:

    1. cmd/nornicdb/runServe was hand-copying a subset of cfg fields into a fresh nornicdb.DefaultConfig(); the four Search* fields were missing from the copy block, so env+CLI values landed in cfg but never reached dbConfig. dbConfig is now an alias of cfg so any field added to Config flows through automatically.
    2. nornicdb.Open warmed search indexes in a background goroutine that raced server.New's SetDbSearchFlagsResolver. When the resolver was nil at warmup time, default-DB warmup fell through to global defaults instead of per-DB overrides. New Config.DeferSearchWarmup + db.MarkSearchWarmupReady gate the warmup until the resolver is installed; pkg/server opts in.
    3. applyEnvVars unconditionally wrote (true, "startup") before checking the env var, breaking LoadFromFile's precedence ladder — a YAML file setting search_bm25_enabled: false was silently overwritten when the env var was unset. The env path now only writes when the var is actually present.

    Operators who set NORNICDB_SEARCH_BM25_ENABLED=false (or the CLI / YAML equivalents) now see the flag honored from the first warmup line in the log.
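The third gap comes down to distinguishing an unset variable from a set one. A minimal Go sketch of the corrected pattern follows, using the standard `os.LookupEnv`; the `searchConfig` struct and `applyEnv` function are hypothetical stand-ins for the real config code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// searchConfig is a hypothetical slice of the real configuration.
type searchConfig struct{ BM25Enabled bool }

// applyEnv overrides cfg only when the variable is present and parses as a
// boolean. An unset variable leaves the file-provided value untouched.
func applyEnv(cfg *searchConfig, lookup func(string) (string, bool)) {
	raw, ok := lookup("NORNICDB_SEARCH_BM25_ENABLED")
	if !ok {
		return
	}
	if v, err := strconv.ParseBool(raw); err == nil {
		cfg.BM25Enabled = v
	}
}

func main() {
	cfg := searchConfig{BM25Enabled: false} // value loaded from YAML
	applyEnv(&cfg, os.LookupEnv)
	fmt.Println("bm25 enabled:", cfg.BM25Enabled)
}
```

Passing the lookup function in as a parameter keeps the precedence logic testable without mutating the process environment.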

  • Transactional MATCH … MERGE correctly routes before CREATE. A regression where a MERGE inside a transaction containing a preceding MATCH was being dispatched to the CREATE path instead of the merge path is fixed. The dispatcher now consults MERGE keywords ahead of CREATE.

Internal

  • 38 deeply-asserted regression tests added across pkg/cypher (13 in mcp_memory_bugs_test.go covering every shape from the bug report plus relationship-side parity), pkg/storage (6 in schema_fulltext_relationship_test.go proving forward + backward + idempotent persistence), pkg/cypher/demo_shortest_path_bench_test.go (latency distribution + three benchmarks), pkg/storage/async_engine_edge_index_test.go (edge-cache inverted index across CRUD + flush + bulk paths), pkg/storage/async_engine_label_index_test.go (labelIndex flush eviction, GetNodesByLabel cache+engine merge), and the search-flag suites (pkg/search/index_flags_test.go and pkg/server/server_search_flags_test.go).
  • CI: storage tests split into smaller test groups so the runner doesn't exceed memory limits on shared CI hardware.
  • Bolt-side benchmarks: BenchmarkBolt_StreamRecords_EndToEnd_* plus per-transport variants (tcp, tcp_tls, ws, ws_tls) ship as part of the regression suite.

Documentation

  • docs/user-guides/connecting-bolt.md — driver-by-driver connection guide for the four Bolt transports (bolt://, bolt+s://, ws://, wss://) plus driver-side aliases (bolt+ssc://, neo4j://, neo4j+s://, neo4j+ssc://). Per-driver code snippets (Java, Python, JavaScript browser, JavaScript Node, .NET, Go).
  • docs/user-guides/graph-traversal.md — full Memgraph-style traversal vocabulary documented with the workload→procedure reference table: BFS / DFS via apoc.path.expandConfig, weighted shortest path via apoc.algo.dijkstra and apoc.algo.aStar, all-simple-paths via apoc.algo.allSimplePaths, neighborhood queries via apoc.neighbors.byhop/tohop, subgraph extraction via apoc.path.subgraphNodes, centrality (PageRank, betweenness, closeness), community detection (Louvain, label propagation, weakly connected components), and the GDS link-prediction family (commonNeighbors, adamicAdar, jaccard, preferentialAttachment, resourceAllocation, predict) plus gds.fastRP.stream for node embeddings.
  • docs/plans/bolt-over-websocket-plan.md — full implementation plan with phasing, test coverage matrix, and Neo4j-compatibility notes for the WS transport landing.
  • docs/plans/operator-declared-graphql-schema-plan.md — design plan for operator-declared GraphQL schema (read-only with relationship traversal, SDL stored in system DB, no auto-inference). Implementation deferred; plan committed as the source of truth for the future cut.

Technical Details

  • Range covered: v1.1.1..main (31 commits)
  • Primary focus areas: Bolt-over-WebSocket transport multiplexing with full TLS / origin / mTLS / cert-rotation surface, neo4j-driver browser-build integration in the embedded UI, per-DB search index master switches with synchronous lazy-warming, mcp-neo4j-memory Cypher parity (map-param property access, fulltext label/type scope, Lucene wildcard family, post-YIELD aggregation), shortestPath traversal latency reduction via per-node edge-cache indexes and parent-pointer BFS, typed property round-trip preservation through msgpack codec + Cypher param substitution, deterministic DROP INDEX teardown of vector substrates, WAL recovery batching for large snapshots.

What's Changed

  • fix(storage): chunk recovery restore batches by @linuxdynasty in #179
  • feat(search): per-database BM25 + vector index master switches with lazy warming by @orneryd in #177
  • fix(cypher): route transactional match merge before create by @linuxdynasty in #178
  • feat(bolt,ui): Bolt over WebSocket end-to-end with browser driver by @orneryd in #181

Full Changelog: v1.1.1...v1.1.2