Context
Follow-up items from PR #353 (plugin SDK FFI hardening). These were validated as real findings but deferred because they are architectural, low-risk, or would expand the PR scope significantly.
Tracked per review discussion.
Follow-up Items
FFI Safety / Soundness
- Logger Send+Sync unsound with non-null enabled_user_data (logger.rs:34-35). Current hosts pass null, but the Send+Sync impl is a promise to plugin authors. Tighten via fn-pointer-only callbacks or Arc<dyn Send+Sync>. Breaking SDK API change; needs a migration path.
- PluginMetadataStorage Send+Sync by convention (metadata_storage.rs:52-54). Stored in OnceLock, never mutated after init, but public mutable fields technically break the "never-mutated-after-init" invariant. Consider pub(crate) visibility.
- Post-create_instance handle leak (wrapper.rs:1023-1029). The block between create_instance returning Ok(handle) and Ok(Self { ... }) (e.g. set_log_enabled_callback) can leak the handle if a future addition panics or fails. Wrap in a scopeguard-style fallback that calls destroy_instance if the constructor doesn't reach Ok.
EnvFilter / Callsite
- EnvFilter cache poisoning: shared callsite Identifier (wrapper.rs:781-821, 896-938). All leaked per-(target, level) Metadata entries share a single PLUGIN_LOG_CALLSITE Identifier. tracing-subscriber's EnvFilter caches Interest by callsite ID, so directives like RUST_LOG=whisper=debug,kokoro=off can be masked by whichever target registers first. Fix: one leaked Callsite impl per (target, level) entry, or avoid register_callsite entirely. Add an EnvFilter integration test.
- PluginLogCallsite not registered (noted in code). The perf win from v9's enabled-check short-circuit is at risk until callsite registration lands properly.
Timeouts / Blocking
- begin_call rollback can run destroy_instance on arbitrary threads (wrapper.rs:286-308). Plugins may assume worker-thread context (e.g. ORT thread-locals). Document or enforce.
- InstanceState::Drop → plugin destroy can land on tokio worker threads (wrapper.rs:359-366). Blocking destroy wedges tokio. Mitigated by worker thread detach but not guaranteed.
- Shared timeout_warned across send and reply paths (wrapper.rs:1198-1236). One-shot dedup loses the reply-side warn when send times out first. Consider splitting into two atomics or rate-limiting (e.g. one warn/min) rather than one-shot.
- on_upstream_hint FFI calls have no timeout (wrapper.rs:748-782). A slow/wedged hint permanently wedges the worker; all subsequent Tick/Process fail the send-side timeout with "worker likely wedged in prior FFI call". Already acknowledged in module docs (l.46-50), but the end state is node death with a confusing error. Either give hint calls their own timeout, rate-limit hints, or drop when worker occupancy is high.
Resource Cache
- ResourceCache::clear() doesn't reset counters (resource_cache.rs:143-176). After clear, hits + misses + init_races > entries. Document as monotonic-by-design or expose reset_stats().
- ResourceCache::clear() races with in-flight get_or_init (resource_cache.rs:143-157). Only tests use it today. Consider a generation counter before production callers appear.
- get_or_init race path runs init twice (resource_cache.rs:194-225). Fine now, but for GB-sized ML models this wastes RAM + seconds. Add a TODO for a CondVar/Notify second-check design.
Worker / Performance
- Worker Vec reuse lost on panic (wrapper.rs:524, 596, 672). A recurrently panicking plugin forces malloc thrashing. Minor DoS risk. Already documented.
- input_pins()/output_pins() clone on every call (wrapper.rs:1029-1035). Hot path for sources.
CI / Tooling
- Add Miri to CI so the api_ptr_preserves_provenance test actually runs (wrapper.rs:2540-2552). Currently #[cfg(miri)] only.
- cdylib panic-across-FFI test fixture: real loaded plugin exercising panic paths. Unit coverage documents why extern-stub panic testing aborts. Consider a dedicated test fixture crate.
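A test gated on #[cfg(miri)] is compiled only when Miri sets that cfg (for example under cargo miri test), so a CI job that runs plain cargo test never executes it. As a rough illustration of the kind of check involved, the sketch below round-trips a pointer through *const c_void using pointer casts only. HostApi and roundtrip_api_ptr are hypothetical names for this example, not the SDK's actual types, and the real test at wrapper.rs:2540-2552 may look quite different.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for an API table handed across the FFI boundary.
#[repr(C)]
pub struct HostApi {
    pub version: u32,
}

/// Erase the pointee type the way an FFI boundary would, then recover it.
/// Only pointer-to-pointer casts are used, so the pointer keeps its
/// provenance; a round-trip through an integer is what Miri's stricter
/// provenance checks are meant to flag.
pub fn roundtrip_api_ptr(api: &HostApi) -> u32 {
    let erased: *const c_void = (api as *const HostApi).cast();
    let typed: *const HostApi = erased.cast();
    // SAFETY: `typed` is derived from `api`, which is live for this call.
    unsafe { (*typed).version }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Deliberately NOT gated on #[cfg(miri)]: this runs under `cargo test`
    // as a plain functional check and under Miri as a provenance check.
    #[test]
    fn api_ptr_roundtrip_reads_original_value() {
        let api = HostApi { version: 9 };
        assert_eq!(roundtrip_api_ptr(&api), 9);
    }
}
```

Whether to drop the cfg gate or keep it and add a Miri CI job is a judgment call; the gate only makes sense if the test body is meaningless or too slow outside Miri.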
SDK Ergonomics / Cleanup
- logger.rs:78-84: CString::new per log() call despite target_cstr: Option<Arc<CString>> cache field. Reuse the cached value.
- cstring_sanitize in conversions.rs:541-543 silently falls back to empty. Align with ffi_guard::cstring_lossy which logs.
- create_instance clones full NodeMetadata just for .kind (lib.rs:930, 1148). Once storage is cached, read from it.
- Document field drop-order invariant (library after handle) in NativePluginWrapper (wrapper.rs:212).
- set_log_enabled_callback called during instance creation with no blocking bound (wrapper.rs:997-1003). A pathological plugin could wedge engine startup. Low risk but worth documenting.
- error_to_c thread-local lifetime (conversions.rs:1010-1030). A nested host call can silently invalidate the plugin's captured pointer. Doc now says "copy immediately"; consider enforcing via API change in a future version.
- Promote assert_send_sync const-assert pattern in SDK docs (parakeet_node.rs:75-79). The pattern const _: () = { const fn assert_send_sync<T: Send + Sync>() {}; assert_send_sync::<MyNode>(); }; is a good alternative to unsafe impl Send+Sync. Worth documenting as the preferred approach for plugin authors.
Metrics
- plugin.calls can double-count post-timeout completions (metrics.rs:93-97). Consider renaming to call_observations. Already documented in code.
- metrics.rs:100-102: use kind: &'static str to avoid the per-instance to_string() allocation in build_labels. Low priority.
- len() returns 0 when poisoned (resource_cache.rs:121). Conflates empty/poisoned. Consistent with std idioms, so OK; document explicitly.
Wire / ABI
- Bytes::clone() pointer stability cross-crate invariant (conversions.rs:433-458). Regression test exists, but consider pinning bytes version or adding debug_assert.
- v9 plugin missing optional set_log_enabled_callback (lib.rs:621-623, 771-773). Default logger_mut() -> None means v9 plugins silently get no enabled callback benefit.
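The silent opt-out in the last item can be sketched as follows. This is a minimal model under stated assumptions, not the SDK's code: the Plugin trait, the Logger field, and the bool return value are all hypothetical, and the real signatures in lib.rs may differ. It shows how a default logger_mut() returning None makes the install a no-op, and one way the host could at least observe that.

```rust
/// Hypothetical stand-in for the SDK logger; only the relevant field.
#[derive(Default)]
pub struct Logger {
    pub enabled_callback: Option<fn(u8) -> bool>,
}

/// Hypothetical plugin trait with the defaulted accessor described above.
pub trait Plugin {
    /// Default: no logger exposed, so the host has nowhere to store
    /// the enabled callback.
    fn logger_mut(&mut self) -> Option<&mut Logger> {
        None
    }
}

/// Host-side install. Returns whether the callback was actually stored,
/// so the caller can log or count plugins that fell through the default.
pub fn set_log_enabled_callback<P: Plugin + ?Sized>(
    plugin: &mut P,
    cb: fn(u8) -> bool,
) -> bool {
    match plugin.logger_mut() {
        Some(logger) => {
            logger.enabled_callback = Some(cb);
            true
        }
        None => false,
    }
}

/// Plugin that relies on the trait default (the silent case).
pub struct LegacyPlugin;
impl Plugin for LegacyPlugin {}

/// Plugin that overrides the accessor and gets the callback.
#[derive(Default)]
pub struct LoggingPlugin {
    pub logger: Logger,
}
impl Plugin for LoggingPlugin {
    fn logger_mut(&mut self) -> Option<&mut Logger> {
        Some(&mut self.logger)
    }
}

/// Example callback: every level is enabled.
pub fn always_enabled(_level: u8) -> bool {
    true
}
```

Returning a bool (or logging once at load time) keeps the default non-breaking for existing plugins while making the missing short-circuit visible to the host.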