Roadmap: v0.3 — make the runtime real #7

@mfw78

Goal

v0.2 shipped the contract: nexum:host WIT package, unified host-error, minimal manifest parser with [capabilities].required + [capabilities.http].allow enforcement, the engine/host vocabulary, and stubs for every host interface. v0.3 makes the contract real: every host stub does actual work, the lifecycle FSM runs end-to-end, and authoring a module is a 30-minute on-ramp instead of a research project.

The platform is a generic, programmable runtime for off-chain monitoring and automation. It is not, and must never become, a TWAP watch-tower with a runtime bolted on. The runtime ships the capabilities; modules use them. v0.3's job is to make those capabilities genuinely available.

We steer the workload selection by use cases we already know we want: CoW Protocol watch-tower (TWAP, Ethflow, ComposableCoW), balance/health monitors, and governance vote tracking. But no workstream below is "deliver TWAP." TWAP is one of the validations at the bottom (see Success criteria). If shipping TWAP requires special-casing it in the engine, that's a bug in the abstraction, not a v0.3 deliverable.

What's NOT in v0.3

Scope-keeping. These all stay deferred:

  • Mobile / WebView / super-app engines. Architecture supports them via the nexum:host contract; no second engine ships in 0.3. Server-only remains the reference implementation.
  • query-module host runtime. WIT stays published as experimental; the wallet/ruleset-eval use case still has no hosted runtime.
  • Identity surface split (identity-read / identity-sign with signing-result.user-rejected). Stays single identity interface, personal_sign-only. Deferred to 0.4+, gated on a named wallet partner.
  • nexum:core / nexum:web3 / nexum:p2p package split. Architectural reviewer's recommendation; gated on a second customer who doesn't want web3 deps.
  • Module-to-module RPC (peer.call). No demand yet.
  • nexum-host embedder crate + stable C ABI. Required for non-Rust hosts; gated on the mobile partner.

Workstreams

Roughly ordered by dependency. Each is a checkbox; nested checkboxes are sub-deliverables. Open one issue per workstream when work begins; each becomes its own PR (or small series).


1. Real backends for every host interface

The engine currently stubs every interface to Err(HostError { kind: Unsupported }). v0.3 wires real implementations behind the same WIT contract.

  • chain — alloy Provider per chain, configured in a runtime-level nexum-engine.toml (separate from per-module nexum.toml). Multi-chain map. Tower middleware: timeout, retry (exponential + jitter), rate-limit, fallback endpoint. WebSocket subscriptions auto-reconnect with missed-block backfill.
    • chain::request forwards to alloy with JSON-RPC framing in the host.
    • chain::request_batch actually batches on the wire (alloy RequestPacket::Batch), not a per-call loop fallback.
  • local-store: redb (pure Rust, ACID). One database file per module; per-module quota from [module.resources].max_state_bytes (see workstream 3). Each event runs inside an implicit write transaction; commit on Ok, rollback on Err. The design lives in docs/04-state-store.md.
  • http: real fetch (reqwest or similar). The [capabilities.http].allow enforcement already landed in 0.2; v0.3 hooks up the actual transport. Non-2xx responses return as ok(response) per the WIT contract; only transport-level failures surface as host-error.
  • identity — keystore-backed signing. Minimum: file-based encrypted keystore (EIP-2335 or sealed-box; decide in open questions). Honors the per-module [capabilities.identity].methods subset (currently parsed, not enforced).
  • remote-store — Bee client for Swarm: upload/download/read-feed/write-feed. May land partial in 0.3 if no in-scope module needs it; in that case ship the trait wired to kind: unsupported with a feature flag and finish in 0.4.
  • messaging — nwaku client for Waku content-topic publish/query. Same defer-if-unused stance as Swarm.
  • logging — route the existing stdout sink through tracing so it lands in the same pipeline as engine logs (workstream 5).
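
The commit-on-Ok, rollback-on-Err rule for local-store can be sketched without redb. This is a minimal in-memory illustration, not the planned backend: `ModuleStore`, `EventTxn`, and `run_event` are hypothetical names, and a `HashMap` stands in for the per-module database file.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for one module's store
/// (the roadmap plans a redb file per module instead).
#[derive(Default)]
struct ModuleStore {
    committed: HashMap<String, Vec<u8>>,
}

/// Writes staged while one event handler runs; `None` marks a delete.
struct EventTxn<'a> {
    store: &'a ModuleStore,
    staged: HashMap<String, Option<Vec<u8>>>,
}

impl<'a> EventTxn<'a> {
    /// Reads see the handler's own staged writes first, then committed state.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        match self.staged.get(key) {
            Some(staged) => staged.clone(),
            None => self.store.committed.get(key).cloned(),
        }
    }

    fn set(&mut self, key: &str, value: &[u8]) {
        self.staged.insert(key.to_string(), Some(value.to_vec()));
    }

    fn delete(&mut self, key: &str) {
        self.staged.insert(key.to_string(), None);
    }
}

/// Runs one event handler inside an implicit transaction:
/// staged writes are applied only if the handler returns Ok.
fn run_event<F>(store: &mut ModuleStore, handler: F) -> Result<(), String>
where
    F: FnOnce(&mut EventTxn<'_>) -> Result<(), String>,
{
    let mut txn = EventTxn { store: &*store, staged: HashMap::new() };
    let outcome = handler(&mut txn);
    let staged = txn.staged;
    if outcome.is_ok() {
        for (key, value) in staged {
            match value {
                Some(bytes) => {
                    store.committed.insert(key, bytes);
                }
                None => {
                    store.committed.remove(&key);
                }
            }
        }
    }
    // On Err the staged map is simply dropped, which is the rollback.
    outcome
}
```

A handler that writes and then returns Err leaves the committed state exactly as it was before the event, which is the property workstreams 2 and 3 rely on.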

2. Event subscriptions — the runtime reads the manifest's [[subscription]] blocks

Today the engine dispatches a single hard-coded Block event for smoke-testing. The manifest's subscription declarations are documented but unread.

  • block source: eth_subscribe("newHeads") via the alloy provider. Shared per chain (one subscription, fan out to N modules).
  • log source: eth_subscribe("logs", filter) with address + topic filters parsed from the manifest.
  • cron source — schedule-based ticks. (Schedule syntax decision in open questions.)
  • message source — Waku content-topic subscriptions. Defers with the messaging backend.
  • Dispatch semantics — concurrent across modules, sequential within a module (ordered delivery). Each event runs inside the implicit write transaction (workstream 1).
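
One way to get "concurrent across modules, sequential within a module" is a dedicated queue and worker per module. The sketch below uses std threads and channels purely for illustration; the engine would more likely use Tokio tasks, and `Event`, `dispatch`, and the module names are assumptions, not the engine's API.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event type; the real engine has block, log, cron, and message sources.
#[derive(Clone, Debug, PartialEq)]
enum Event {
    Block(u64),
}

/// Fans every event out to every module. Each module owns one queue and one
/// worker thread, so modules run concurrently while each module sees events
/// strictly in the order they were sent.
fn dispatch(module_names: &[&str], events: &[Event]) -> Vec<(String, Vec<Event>)> {
    let mut senders = Vec::new();
    let mut workers = Vec::new();

    for name in module_names {
        let (tx, rx) = mpsc::channel::<Event>();
        let name = name.to_string();
        senders.push(tx);
        workers.push(thread::spawn(move || {
            let mut handled = Vec::new();
            // The loop ends once every sender for this queue has been dropped.
            for event in rx {
                handled.push(event); // stand-in for invoking the module's handler
            }
            (name, handled)
        }));
    }

    // One upstream subscription, fanned out to N module queues.
    for event in events {
        for tx in &senders {
            tx.send(event.clone()).expect("worker should still be running");
        }
    }
    drop(senders);

    workers
        .into_iter()
        .map(|worker| worker.join().expect("worker panicked"))
        .collect()
}
```

Because a channel is FIFO and each module has a single consumer, ordered delivery falls out of the structure rather than needing a lock around the handler.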

3. Module lifecycle — restart policy + resource enforcement

docs/02 describes the Load → Init → Run → Restart → Dead FSM. Today only Init + Run exist (and Init is hard-coded).

  • Restart with exponential backoff (1s → 2s → 4s → … → 5min cap, per docs/02).
  • Poison-pill detection — N consecutive failures ([module.restart].max_consecutive_failures) → Dead. Operator-visible (metric + log).
  • wasmtime::ResourceLimiter for max_memory_bytes. memory.grow denied past cap.
  • Fuel metering: Config::consume_fuel(true), Store::set_fuel(max_fuel_per_event) per event. Trap → rollback → restart.
  • Epoch interruption: Config::epoch_interruption(true) with a Tokio epoch ticker. Yield long-running events back to the runtime; cap wall-clock time per event.
  • Storage cap enforcement — host-side tracking of local-store byte usage against max_state_bytes. set returns HostError { kind: invalid-input } on cap breach.
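
The restart schedule and poison-pill rule above reduce to a small pure function. This is a sketch under two assumptions: the first failure waits 1s, and a module goes Dead once its consecutive-failure count reaches max_consecutive_failures. `restart_delay`, `on_failure`, and `NextState` are hypothetical names.

```rust
use std::time::Duration;

const BASE_DELAY: Duration = Duration::from_secs(1);
const MAX_DELAY: Duration = Duration::from_secs(300); // the 5min cap

/// Delay before the next restart: 1s, 2s, 4s, ... capped at 5 minutes.
/// `consecutive_failures` counts failures since the last successful event.
fn restart_delay(consecutive_failures: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow; 2^16 s is already past the cap.
    let exponent = consecutive_failures.saturating_sub(1).min(16);
    let secs = BASE_DELAY.as_secs().saturating_mul(1u64 << exponent);
    Duration::from_secs(secs).min(MAX_DELAY)
}

#[derive(Debug, PartialEq)]
enum NextState {
    Restart { after: Duration },
    Dead,
}

/// Poison-pill check: too many consecutive failures moves the module to Dead.
fn on_failure(consecutive_failures: u32, max_consecutive_failures: u32) -> NextState {
    if consecutive_failures >= max_consecutive_failures {
        NextState::Dead
    } else {
        NextState::Restart { after: restart_delay(consecutive_failures) }
    }
}
```

Keeping the schedule pure makes it easy to unit-test separately from the wasmtime store teardown and re-instantiation that surround it.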

4. Optional-capability trap stubs — finish the manifest §3 promise

v0.2 enforces [capabilities].required and the http allowlist. The trap-stub fallback for absent-but-declared optional capabilities is the piece that was deferred.

  • When a module declares [capabilities].optional = ["messaging"] and the host doesn't provide it (or the operator denied it), install a per-import trap stub instead of failing instantiation. Calls return HostError { kind: unsupported } naming the module + missing capability.
  • Remove the "all-required" 0.1-compat fallback (fallback_manifest) — by 0.3 every module ships a nexum.toml, and the deprecation warning has had its grace period.
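
The linking decision described above can be sketched as a plan computed before instantiation. Everything here is illustrative: `Binding`, `plan_imports`, and `trap_stub_call` are hypothetical names, and the real engine would install the stubs through the wasmtime linker rather than return a list.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum HostErrorKind {
    Unsupported,
}

#[derive(Debug, PartialEq)]
struct HostError {
    kind: HostErrorKind,
    message: String,
}

#[derive(Debug, PartialEq)]
enum Binding {
    Real,
    TrapStub,
}

/// Decides how each declared capability is bound.
/// A missing required capability fails instantiation; a missing optional one gets a stub.
fn plan_imports(
    required: &[&str],
    optional: &[&str],
    provided: &HashSet<&str>,
) -> Result<Vec<(String, Binding)>, String> {
    let mut plan = Vec::new();
    for cap in required {
        if !provided.contains(cap) {
            return Err(format!("missing required capability `{cap}`"));
        }
        plan.push((cap.to_string(), Binding::Real));
    }
    for cap in optional {
        let binding = if provided.contains(cap) {
            Binding::Real
        } else {
            Binding::TrapStub
        };
        plan.push((cap.to_string(), binding));
    }
    Ok(plan)
}

/// What any call into a stubbed import returns: an error naming module and capability.
fn trap_stub_call(module: &str, capability: &str) -> Result<Vec<u8>, HostError> {
    Err(HostError {
        kind: HostErrorKind::Unsupported,
        message: format!("module `{module}`: capability `{capability}` is not provided by this host"),
    })
}
```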

5. Observability

docs/06 describes the targets. Implement them.

  • Logs: tracing + tracing-subscriber → JSON to stdout.
  • Metrics: metrics + metrics-exporter-prometheus, exposed at :9090/metrics. Three groups:
    • Runtime-level: modules loaded / dead.
    • Per-module: events handled, latency, fuel consumed, restart count, state bytes.
    • Per-chain RPC: request rate, error rate, fallback hits, blocks-behind.
  • Health endpoint — HTTP JSON at :8080/health reporting per-module state + per-chain RPC health. Suitable for k8s liveness/readiness.

6. Component supply chain

  • Hash verification — the manifest's component = "sha256:..." is parsed but not verified. On load, compute sha256(wasm_bytes) and refuse to instantiate on mismatch.
  • Content-address fetchers: docs/02's table lists bzz / ipfs / oci / https schemes. Implement the resolution flow: hash → local store → backend fetch → verify → cache.
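
The resolution flow (hash → local store → backend fetch → verify → cache) can be sketched with the digest function injected. To stay dependency-free, the example uses a non-cryptographic placeholder digest built on std's `DefaultHasher`; the engine would use SHA-256, as the manifest's sha256: prefix implies. `Resolver`, `resolve`, and `placeholder_digest` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// NOT cryptographic: a stand-in so the sketch runs without external crates.
/// A real implementation would compute SHA-256 over the component bytes.
fn placeholder_digest(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    format!("placeholder:{:016x}", hasher.finish())
}

struct Resolver {
    /// Local content-addressed cache: digest → component bytes.
    cache: HashMap<String, Vec<u8>>,
    digest: fn(&[u8]) -> String,
}

impl Resolver {
    /// `fetch` stands in for the bzz / ipfs / oci / https backends.
    fn resolve(
        &mut self,
        expected: &str,
        fetch: impl FnOnce(&str) -> Option<Vec<u8>>,
    ) -> Result<Vec<u8>, String> {
        // 1. Local store first: a hit needs no network access.
        if let Some(bytes) = self.cache.get(expected) {
            return Ok(bytes.clone());
        }
        // 2. Backend fetch.
        let bytes = fetch(expected)
            .ok_or_else(|| format!("no backend could fetch {expected}"))?;
        // 3. Verify before anything is cached or instantiated.
        let actual = (self.digest)(&bytes);
        if actual != expected {
            return Err(format!("hash mismatch: expected {expected}, got {actual}"));
        }
        // 4. Cache only verified bytes.
        self.cache.insert(expected.to_string(), bytes.clone());
        Ok(bytes)
    }
}
```

Verifying before caching means a bad backend can never poison the local store, so a cache hit can be trusted without re-hashing.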

7. Operator-facing infrastructure

  • nexum-engine.toml — runtime config, distinct from per-module nexum.toml. Multi-chain RPC providers, content backends, keystore path, observability endpoints.
  • Operator CLI: nexum run, nexum module list / load / unload, nexum state inspect, nexum health.
  • Dockerfile + GHCR image.
  • Operator quickstart doc: install → config → drop a module → nexum run → curl :8080/health. The tech writer flagged this gap in the v0.2 review.

8. cargo-nexum toolchain (minimal)

v0.2 deferred the cargo subcommand entirely; v0.3 ships the minimum useful surface.

  • cargo nexum new <name> — scaffold a module with nexum.toml, Cargo.toml, src/lib.rs, .gitignore.
  • cargo nexum check — validate the manifest schema + cargo check --target wasm32-wasip2.
  • cargo nexum run --mock — local-mock runner with a fake chain (preloaded blocks/logs from a fixture file) so authors can iterate without an RPC endpoint. Single biggest DX win in the toolchain — don't drop.
  • Defer to 0.4: cargo nexum package, cargo nexum publish, cargo nexum migrate --from 0.2.

9. nexum-sdk and shepherd-sdk crates

Today modules talk directly to wit-bindgen-generated traits. The SDK is the ergonomic layer.

  • nexum-sdk crate:
    • provider(chain_id) -> alloy::Provider<HostTransport> — alloy Provider backed by chain::request / chain::request_batch. Routes batch packets through the WIT batch call.
    • Signer — typed wrapper over identity::* (per the naming established in 0.2's docs).
    • Messaging, RemoteStore — typed wrappers.
    • TypedState<T>: local-store with postcard (or bincode) serialisation.
    • log::{info!, warn!, error!} macros routing through logging::log.
    • #[nexum::module] proc macro generating bindgen + Guest impl from named handlers (#[on_block], #[on_log], #[on_tick], #[on_message]).
    • nexum_sdk::testing::MockHost — unit tests in native Rust.
  • shepherd-sdk crate: re-exports nexum-sdk, adds Cow (typed cow-api client), #[shepherd::module] proc macro extending #[nexum::module] with CoW imports.

10. Typed config (config-value variant)

Deferred from 0.2 because it was meaningless without the manifest parser. Both pieces land together in 0.3.

  • Add variant config-value { string, integer, boolean, list } to nexum:host/types.wit.
  • Change type config = list<tuple<string, string>> → list<tuple<string, config-value>>.
  • Host's manifest reader emits typed values from TOML.
  • #[derive(NexumConfig)] proc macro in nexum-sdk codegen-validates against a Rust struct.
  • Re-instate typed-config language in the docs (the deferral notes added in 0.2 come out).
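
On the guest side, the proposed variant might map to a Rust enum along these lines. This is a hand-written sketch, not wit-bindgen output: the integer width (i64) and the list element type (strings) are assumptions the roadmap does not specify, and `get_integer` is a hypothetical helper of the kind #[derive(NexumConfig)] could generate.

```rust
/// Hand-written mirror of the proposed `config-value` variant.
/// Assumptions: integers are 64-bit signed, lists hold strings.
#[derive(Debug, Clone, PartialEq)]
enum ConfigValue {
    String(String),
    Integer(i64),
    Boolean(bool),
    List(Vec<String>),
}

/// Typed lookup over list<tuple<string, config-value>>.
/// A wrong type is reported as an error instead of being silently re-parsed from a string.
fn get_integer(config: &[(String, ConfigValue)], key: &str) -> Result<i64, String> {
    match config.iter().find(|(k, _)| k == key) {
        Some((_, ConfigValue::Integer(n))) => Ok(*n),
        Some((_, other)) => Err(format!(
            "config key `{key}`: expected integer, found {other:?}"
        )),
        None => Err(format!("config key `{key}` is missing")),
    }
}
```

With the 0.2 string-only config, the same lookup would have to parse on every read and could not tell a TOML integer 5 from the string "5".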

11. Module discovery

docs/03 describes static / ENS / on-chain registry. Today only "static path on disk via CLI arg" works.

  • Static — operator config lists local paths.
  • ENS — resolve contenthash (ENSIP-7) → Swarm/IPFS ref → fetch → verify → load.
  • On-chain registry — watch contract events (or ENS TextChanged) for registrations; auto-load/upgrade.

12. Documentation

  • Author quickstart: cargo nexum new → cargo nexum run --mock → publish. 5-minute path to first running module.
  • Per-language authoring guides — Rust first; JS / Go / Python deferred to 0.4 unless a contributor picks them up.
  • Diátaxis reorg: move the 00–08 design docs to docs/design/; build out concepts/, guides/, reference/, operations/. This is the tech writer's reorg proposal from the v0.2 review.
  • Glossary — explicit definitions of module / component / world / interface / capability / engine / host. Tech writer flagged this as critical.
  • 0.2 → 0.3 migration notes if any additive change touches the public surface (typed config will).

Dependency graph

   ┌─ chain backend ─────────┐
   ├─ local-store backend ───┤
   ├─ http backend ──────────┼─→ any meaningful module
   ├─ identity backend ──────┤
   └─ subscriptions ─────────┘
                │
                ↓
   restart + resource enforcement (parallel)
                │
                ↓
   observability (validates the above runs in prod shape)

   SDK + cargo-nexum + typed config (parallel to backends; gates DX
   quality but not whether modules can run at all)

   Component-hash verification + module discovery (independent;
   can land late in 0.3)

Success criteria

v0.3 ships when all of these hold:

  1. A real-world off-chain monitoring module runs end-to-end on the engine against a forked mainnet (or testnet) — exercising chain reads, local-store persistence, an event subscription, signing (where applicable), and at least one outbound transaction or API call.
  2. A second module, using a different combination of capabilities (or a different domain extension package, or none), runs on the same engine binary with no engine changes. The runtime claim "generic and programmable" is verifiable, not aspirational.
  3. The engine survives a 24-hour soak test: module restarts on simulated crash; RPC fallback on simulated provider outage; metrics report sane numbers throughout.
  4. A new author goes from zero to a running custom module in under 30 minutes using cargo nexum new + cargo nexum run --mock.

The grant deliverables (TWAP monitor, Ethflow monitor — GRANT_APPLICATION.md milestone 2) naturally satisfy criteria (1) and (2) when paired with a simpler counter-example module. They're the steering use cases, not the deliverable; the deliverable is the runtime.

Open questions (decide before each affected workstream starts)

  • Keystore format for the file-based identity backend — EIP-2335 (geth/reth compatible) or sealed-box (simpler)?
  • cron schedule syntax — standard cron (*/5 * * * *) or duration-style (5m)? Cron is more flexible; duration is simpler. Could support both.
  • redb commit policy — per-event commit (default, safest, max fsync) or batchable via manifest? Reviewer 3 flagged this in the v0.2 efficiency review. Defer unless a real workload pushes for it.
  • CoW API auth (and any other domain-extension that needs operator credentials) — how do API keys get into cow-api::Host without leaking into the module? Likely a runtime-level nexum-engine.toml section, not the per-module manifest. Need a general pattern, not a CoW special case.
  • Native Component Model async (future<T>) for genuinely-blocking host functions (chain RPC, signing) — adopt in 0.3, or stick with block_on-via-fibers and defer to 0.4? Cross-platform reviewer flagged the fiber dependency as the largest portability bug.
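
For the cron-syntax question above, the duration-style option is small enough to sketch in full, which is part of its appeal. The accepted units (s, m, h) and the rejection of a zero interval are assumptions; `parse_duration_schedule` is a hypothetical name.

```rust
use std::time::Duration;

/// Parses duration-style schedules such as "30s", "5m", or "2h".
/// Standard cron expressions are deliberately rejected here.
fn parse_duration_schedule(spec: &str) -> Result<Duration, String> {
    let spec = spec.trim();
    // Split at the first non-digit: everything before is the count, the rest is the unit.
    let split = spec
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("`{spec}`: missing unit (expected s, m, or h)"))?;
    let (digits, unit) = spec.split_at(split);
    let count: u64 = digits
        .parse()
        .map_err(|_| format!("`{spec}`: missing or invalid count"))?;
    let unit_secs: u64 = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3600,
        other => return Err(format!("`{spec}`: unknown unit `{other}`")),
    };
    if count == 0 {
        return Err(format!("`{spec}`: interval must be greater than zero"));
    }
    count
        .checked_mul(unit_secs)
        .map(Duration::from_secs)
        .ok_or_else(|| format!("`{spec}`: interval is too large"))
}
```

Supporting both syntaxes would then mean trying this parser first and falling back to a cron parser when it fails.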
