
feat: pluggable TemplateStore for horizontal parser scale-out#293

Merged: mikemiles-dev merged 1 commit into main from feat/template-store on May 3, 2026

Conversation

@mikemiles-dev
Owner

Add a TemplateStore trait so V9 and IPFIX templates can be persisted to and re-read from an external backend (Redis, NATS KV, etc.). With a store configured, the parser writes through every learned template, consults the store on every cache miss, and propagates LRU evictions, RFC 7011 §8.1 withdrawals, and explicit clear_*_templates calls. This unblocks running multiple stateless parser replicas behind a UDP load balancer without source-IP-affinity routing.

AutoScopedParser auto-derives a per-source scope so exporters using the same template ID with different layouts do not collide in the store. The trait sees opaque Vec<u8> payloads encoded with a small versioned custom binary wire format; no serde_json or other runtime serializer is added to the dependency tree. An InMemoryTemplateStore reference impl is provided for tests.

New public API:

  • TemplateStore, TemplateStoreKey, TemplateKind, TemplateStoreError
  • InMemoryTemplateStore
  • NetflowParserBuilder::with_template_store / with_template_store_scope
  • NetflowParser::set_template_store_scope

Eight integration tests in tests/template_store.rs cover write-through, cross-replica read-through, baseline (no-store) unchanged, clear_* propagation, IPFIX withdrawal eviction, and per-source scoping. README gains a "Pluggable Template Storage" section under the Template Management Guide; RELEASES.md notes the feature under 1.0.3.

@mikemiles-dev mikemiles-dev force-pushed the feat/template-store branch 5 times, most recently from da90828 to c9b77a0 Compare May 3, 2026 20:25
Add a TemplateStore trait so V9 and IPFIX templates can be persisted
to and re-read from an external backend such as Redis or NATS KV.
With a store configured the parser:

  - writes through every learned template to the store
  - consults the store on every primary-cache miss before declaring
    a template unknown (read-through), repopulating the in-process
    LRU on hit so subsequent records take the hot path
  - propagates LRU evictions, RFC 7011 §8.1 template withdrawals,
    and explicit clear_*_templates calls so the store stays in sync

This unblocks running multiple stateless parser replicas behind a
UDP load balancer without source-IP-affinity routing — replica B
can boot cold and start serving data records for templates that
replica A learned, as long as both share the same store.
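
The write-through and read-through behavior described above can be illustrated with a toy model. This is a minimal sketch, not the crate's code: `SharedStore` and `Replica` are hypothetical names, a plain `HashMap` stands in for the in-process LRU, and templates are treated as opaque bytes keyed only by template ID.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a shared backend such as Redis or NATS KV.
#[derive(Default)]
pub struct SharedStore {
    entries: Mutex<HashMap<u16, Vec<u8>>>,
}

impl SharedStore {
    pub fn get(&self, id: u16) -> Option<Vec<u8>> {
        self.entries.lock().unwrap().get(&id).cloned()
    }

    pub fn put(&self, id: u16, bytes: Vec<u8>) {
        self.entries.lock().unwrap().insert(id, bytes);
    }
}

/// Hypothetical parser replica: a local cache in front of an optional store.
pub struct Replica {
    cache: HashMap<u16, Vec<u8>>,
    store: Option<Arc<SharedStore>>,
}

impl Replica {
    pub fn new(store: Option<Arc<SharedStore>>) -> Self {
        Replica { cache: HashMap::new(), store }
    }

    /// Write-through: a learned template goes to the store and the cache.
    pub fn learn(&mut self, id: u16, template: Vec<u8>) {
        if let Some(store) = &self.store {
            store.put(id, template.clone());
        }
        self.cache.insert(id, template);
    }

    /// Read-through: on a cache miss, consult the store and repopulate
    /// the cache so later lookups take the hot path.
    pub fn lookup(&mut self, id: u16) -> Option<&Vec<u8>> {
        if !self.cache.contains_key(&id) {
            let restored = self.store.as_ref()?.get(id)?;
            self.cache.insert(id, restored);
        }
        self.cache.get(&id)
    }
}
```

In this model, a second `Replica` sharing the same `SharedStore` resolves a template ID it never saw announced, while a replica built with `None` keeps reporting the template as unknown.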

API additions
-------------
  - TemplateStore trait (get / put / remove with explicit error
    semantics: Ok(None) means absent; Err means backend failure)
  - TemplateStoreKey { scope: Arc<str>, kind: TemplateKind, template_id }
  - TemplateKind enum: V9Data, V9Options, IpfixData, IpfixOptions,
    IpfixV9Data, IpfixV9Options
  - TemplateStoreError { Backend(Box<...>), Codec(String) }
  - InMemoryTemplateStore reference impl (Mutex<HashMap>)
  - NetflowParserBuilder::with_template_store(Arc<dyn TemplateStore>)
  - NetflowParserBuilder::with_template_store_scope(impl Into<Arc<str>>)
  - NetflowParser::set_template_store_scope(impl Into<Arc<str>>)
  - TemplateEvent::Restored variant — fires when a template is
    pulled in via read-through; observability tools that count
    Learned can also count Restored after a parser restart
  - Three new CacheMetrics counters: template_store_restored,
    template_store_codec_errors, template_store_backend_errors
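
For readers implementing a backend, the shape of the trait can be sketched as follows. The type names come from the list above, but every signature here is an assumption made for illustration (including `u16` for `template_id`, the `Send + Sync` bounds, and the borrowed `&[u8]` value); `MapStore` is a hypothetical stand-in, not the shipped InMemoryTemplateStore. Check the crate documentation for the real definitions.

```rust
use std::collections::HashMap;
use std::error::Error;
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum TemplateKind {
    V9Data,
    V9Options,
    IpfixData,
    IpfixOptions,
    IpfixV9Data,
    IpfixV9Options,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct TemplateStoreKey {
    pub scope: Arc<str>,
    pub kind: TemplateKind,
    pub template_id: u16, // assumed width
}

#[derive(Debug)]
pub enum TemplateStoreError {
    Backend(Box<dyn Error + Send + Sync>),
    Codec(String),
}

/// Assumed trait shape: Ok(None) means absent, Err means backend failure.
pub trait TemplateStore: Send + Sync {
    fn get(&self, key: &TemplateStoreKey) -> Result<Option<Vec<u8>>, TemplateStoreError>;
    fn put(&self, key: &TemplateStoreKey, value: &[u8]) -> Result<(), TemplateStoreError>;
    fn remove(&self, key: &TemplateStoreKey) -> Result<(), TemplateStoreError>;
}

/// Hypothetical Mutex<HashMap> backend for tests.
#[derive(Default)]
pub struct MapStore {
    inner: Mutex<HashMap<TemplateStoreKey, Vec<u8>>>,
}

impl TemplateStore for MapStore {
    fn get(&self, key: &TemplateStoreKey) -> Result<Option<Vec<u8>>, TemplateStoreError> {
        // A poisoned lock is reported as a backend failure, not as "absent".
        let map = self
            .inner
            .lock()
            .map_err(|e| TemplateStoreError::Backend(e.to_string().into()))?;
        Ok(map.get(key).cloned())
    }

    fn put(&self, key: &TemplateStoreKey, value: &[u8]) -> Result<(), TemplateStoreError> {
        let mut map = self
            .inner
            .lock()
            .map_err(|e| TemplateStoreError::Backend(e.to_string().into()))?;
        map.insert(key.clone(), value.to_vec());
        Ok(())
    }

    fn remove(&self, key: &TemplateStoreKey) -> Result<(), TemplateStoreError> {
        let mut map = self
            .inner
            .lock()
            .map_err(|e| TemplateStoreError::Backend(e.to_string().into()))?;
        map.remove(key);
        Ok(())
    }
}
```

Keeping "absent" and "backend failed" as distinct results lets the caller count backend errors separately from ordinary misses, which matches the separate metrics counters listed above.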

Wire format
-----------
A small versioned binary format (WIRE_VERSION = 1) is used to
encode templates as opaque Vec<u8> payloads. The store sees only
bytes — no serde_json or other runtime serializer is added to
the dependency tree. Codec errors on read are counted in metrics
AND the corrupted key is removed so a fresh template announce can
repopulate cleanly. The module docs cover the upgrade story for
future wire-version bumps (drain-before-upgrade or version-namespaced
scope).
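
To make the idea of a versioned opaque payload concrete, here is a minimal sketch of one possible layout: a version byte, a big-endian field count, then (type, length) pairs. The actual byte layout used by the crate is not described in this PR, so everything below except the constant name `WIRE_VERSION` and its value is an assumption.

```rust
pub const WIRE_VERSION: u8 = 1;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldSpec {
    pub field_type: u16,
    pub field_length: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    Empty,
    UnsupportedVersion(u8),
    Truncated,
}

/// Encode as: [version: u8][count: u16 BE][(type: u16 BE, length: u16 BE) * count].
/// The sketch assumes at most u16::MAX fields.
pub fn encode(fields: &[FieldSpec]) -> Vec<u8> {
    let mut out = Vec::with_capacity(3 + fields.len() * 4);
    out.push(WIRE_VERSION);
    out.extend_from_slice(&(fields.len() as u16).to_be_bytes());
    for f in fields {
        out.extend_from_slice(&f.field_type.to_be_bytes());
        out.extend_from_slice(&f.field_length.to_be_bytes());
    }
    out
}

/// Decode, rejecting unknown versions and payloads of the wrong size.
pub fn decode(bytes: &[u8]) -> Result<Vec<FieldSpec>, DecodeError> {
    let (&version, rest) = bytes.split_first().ok_or(DecodeError::Empty)?;
    if version != WIRE_VERSION {
        return Err(DecodeError::UnsupportedVersion(version));
    }
    if rest.len() < 2 {
        return Err(DecodeError::Truncated);
    }
    let count = u16::from_be_bytes([rest[0], rest[1]]) as usize;
    let body = &rest[2..];
    if body.len() != count * 4 {
        return Err(DecodeError::Truncated);
    }
    Ok(body
        .chunks_exact(4)
        .map(|c| FieldSpec {
            field_type: u16::from_be_bytes([c[0], c[1]]),
            field_length: u16::from_be_bytes([c[2], c[3]]),
        })
        .collect())
}
```

A decoder that fails loudly on an unknown version byte is what makes the "count the codec error and remove the key" policy workable: a payload written by a newer or corrupted writer is never misread as a valid template.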

Multi-source
------------
AutoScopedParser auto-derives a per-source scope of the form
"v9:{addr}/{source_id}", "ipfix:{addr}/{obs_domain}", or
"legacy:{addr}" so two exporters using the same template ID with
different layouts do not collide in the store.
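
The scope strings above can be derived with a few lines of formatting. This is a sketch under assumptions: `SourceId` and `derive_scope` are hypothetical names, and `{addr}` is taken to be a full `SocketAddr` (IP and port), which the PR text does not confirm.

```rust
use std::net::SocketAddr;
use std::sync::Arc;

/// Hypothetical description of what identifies an exporter stream.
pub enum SourceId {
    V9 { source_id: u32 },
    Ipfix { observation_domain_id: u32 },
    Legacy,
}

/// Build a per-source scope string in the formats quoted above.
pub fn derive_scope(addr: SocketAddr, source: SourceId) -> Arc<str> {
    let scope = match source {
        SourceId::V9 { source_id } => format!("v9:{}/{}", addr, source_id),
        SourceId::Ipfix { observation_domain_id } => {
            format!("ipfix:{}/{}", addr, observation_domain_id)
        }
        SourceId::Legacy => format!("legacy:{}", addr),
    };
    Arc::from(scope)
}
```

Because the scope is part of the store key, two exporters that both announce template ID 256 end up under different keys as long as their address or source/domain ID differs.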

Performance
-----------
Hot path impact when no store is configured: a single
Option::is_none branch. With a store configured: zero atomic
refcount bumps on the read-through path. The store handle is
held as a borrowed &Arc<dyn TemplateStore> rather than cloned;
every other field touched (metrics, templates, scope, etc.) is
accessed via direct field access so the borrow checker can split
disjoint borrows. Method calls that would re-borrow the whole
struct (`self.store_key(...)`, `template.is_valid(self)`) are
avoided by inlining or using limit-taking validation variants.

The scope is held as Arc<str> rather than String so the per-key
clone is a refcount bump rather than a heap allocation.
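
The borrowing pattern described above can be sketched on a toy struct. All names here (`Store`, `FixedStore`, `Parser`, `restore_all`, `scope_for_key`) are hypothetical; the point is only that a borrowed `&Arc<dyn Store>` can stay live while other fields are mutated through direct field access, and that cloning an `Arc<str>` shares the existing allocation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub trait Store {
    fn get(&self, scope: &str, id: u16) -> Option<Vec<u8>>;
}

/// Hypothetical fixed-content store used to exercise the sketch.
pub struct FixedStore(pub HashMap<u16, Vec<u8>>);

impl Store for FixedStore {
    fn get(&self, _scope: &str, id: u16) -> Option<Vec<u8>> {
        self.0.get(&id).cloned()
    }
}

pub struct Parser {
    store: Option<Arc<dyn Store>>,
    scope: Arc<str>,
    cache: HashMap<u16, Vec<u8>>,
    restored: u64,
}

impl Parser {
    pub fn new(store: Option<Arc<dyn Store>>, scope: &str) -> Self {
        Parser { store, scope: Arc::from(scope), cache: HashMap::new(), restored: 0 }
    }

    pub fn restore_all(&mut self, ids: &[u16]) -> usize {
        // Borrow the handle instead of cloning it: no refcount bump.
        let store = match &self.store {
            Some(s) => s,
            None => return 0,
        };
        let mut count = 0;
        for &id in ids {
            if let Some(bytes) = store.get(&self.scope, id) {
                // `store` is still borrowed from `self.store` here, yet these
                // direct field accesses compile because the fields are disjoint.
                // A `&mut self` helper method call at this point would not.
                self.cache.insert(id, bytes);
                self.restored += 1;
                count += 1;
            }
        }
        count
    }

    /// Cloning Arc<str> bumps a refcount; it does not copy the string.
    pub fn scope_for_key(&self) -> Arc<str> {
        Arc::clone(&self.scope)
    }
}
```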

Validation
----------
V9 Template / OptionsTemplate gain `is_valid_with_limits` methods
that take numeric limits instead of a parser reference. The IPFIX
CommonTemplate trait gains the same. The live-parse and
read-through paths both call these helpers so validation rules
cannot drift between paths.
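
As an illustration of why a limit-taking helper keeps the two paths aligned, consider the sketch below. Only the method name `is_valid_with_limits` comes from this PR; the struct layout, the two limits, and the specific rules are assumptions chosen for the example.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub field_type: u16,
    pub field_length: u16,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Template {
    pub template_id: u16,
    pub fields: Vec<Field>,
}

impl Template {
    /// Takes plain numbers, so callers do not need to lend out a whole parser.
    /// The rules here (non-empty, bounded count, bounded size) are illustrative.
    pub fn is_valid_with_limits(&self, max_field_count: usize, max_total_size: usize) -> bool {
        let total: usize = self.fields.iter().map(|f| usize::from(f.field_length)).sum();
        !self.fields.is_empty()
            && self.fields.len() <= max_field_count
            && total <= max_total_size
    }
}

/// Live-parse path: template just decoded from an exporter packet.
pub fn accept_live(t: &Template, max_field_count: usize, max_total_size: usize) -> bool {
    t.is_valid_with_limits(max_field_count, max_total_size)
}

/// Read-through path: template just decoded from a store payload.
pub fn accept_restored(t: &Template, max_field_count: usize, max_total_size: usize) -> bool {
    t.is_valid_with_limits(max_field_count, max_total_size)
}
```

With a single helper behind both entry points, a template rejected on the wire is also rejected when it arrives from the store.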

Pending-flow replay
-------------------
Templates restored via read-through are added to the per-parse
"learned IDs" set passed to pending-flow replay, so queued data
records for a previously-missing template resolve as soon as the
template is recovered from the store, not only when the exporter
re-announces it.
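
The replay behavior can be modeled as draining a queue keyed by template ID. This is a minimal sketch with hypothetical names (`PendingFlows`, `queue`, `replay`, `pending_for`); the crate's real pending-flow structures are not shown in this PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical queue of data records waiting for their template.
#[derive(Default)]
pub struct PendingFlows {
    queued: HashMap<u16, Vec<Vec<u8>>>,
}

impl PendingFlows {
    /// Park a data record whose template is not known yet.
    pub fn queue(&mut self, template_id: u16, record: Vec<u8>) {
        self.queued.entry(template_id).or_default().push(record);
    }

    pub fn pending_for(&self, template_id: u16) -> usize {
        self.queued.get(&template_id).map_or(0, |v| v.len())
    }

    /// Drain every queued record whose template ID is in `learned`.
    /// The caller adds IDs to `learned` both when a template is announced
    /// and when it is restored from the store.
    pub fn replay(&mut self, learned: &HashSet<u16>) -> Vec<(u16, Vec<u8>)> {
        let mut out = Vec::new();
        for id in learned {
            if let Some(records) = self.queued.remove(id) {
                out.extend(records.into_iter().map(|r| (*id, r)));
            }
        }
        out
    }
}
```

Treating a restored ID the same as an announced one means queued records are released on the same parse call that recovered the template.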

Tests
-----
17 integration tests in tests/template_store.rs cover:
  - write-through on learn (V9 + IPFIX)
  - cross-replica read-through (V9 + IPFIX + IPFIX options)
  - baseline behavior unchanged when no store is configured
  - clear_*_templates propagating to the store
  - IPFIX template withdrawal propagating to the store
  - AutoScopedParser per-source scope isolation
  - backend Err propagation (FaultStore fault injection)
  - corrupted-payload codec rejection + cleanup
  - LRU eviction on a full cache propagating to the store
  - duplicate-template-ID write-through overwriting
  - TemplateEvent::Restored event firing
  - pending-flow replay after read-through
  - set_template_store_scope retrofit after build

All 216 lib + 17 integration + 66 doctests pass.

Docs
----
README gains a "Pluggable Template Storage (Horizontal Scale-Out)"
section under the Template Management Guide with usage and a
backend-impl sketch. RELEASES.md notes the feature under 1.0.3.
@mikemiles-dev mikemiles-dev force-pushed the feat/template-store branch from c9b77a0 to a7f5990 Compare May 3, 2026 20:28
@mikemiles-dev mikemiles-dev merged commit 9eb2db9 into main May 3, 2026
12 checks passed