feat(plugin-sdk): FFI panic guards, RAII call guard, timeout, provenance, worker thread, v9 wire bump#353
Conversation
Wrap every extern "C" trampoline on both the client SDK and host side with catch_unwind boundaries so that a panic in plugin code never unwinds across the C ABI boundary (which is instant UB). Client SDK (sdks/plugin-sdk/native/): - New ffi_guard module with typed guard helpers: guard_result, guard_handle, guard_ptr, guard_tick, guard_schema, guard_source_config, guard_unit - All trampolines in native_plugin_entry!, native_source_plugin_entry!, and __plugin_shared_ffi! now wrapped via the appropriate guard - 11 unit tests verifying each guard catches panics and returns the correct sentinel value Host (crates/plugin-native/): - New ffi_guard_result, ffi_guard_alloc_video, ffi_guard_alloc_audio, ffi_guard_unit helpers in wrapper.rs - All extern "C" callbacks wrapped: plugin_log_callback, output_callback_shim, telemetry_callback_shim, alloc_video_shim, alloc_audio_shim, free_video_buffer, free_audio_buffer - 4 unit tests verifying host-side guards No wire-format or API version change. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
…ing safety Addresses all 10 items from PR #353 review: Major: 1. Add ConvertedPacketGuard RAII guard in output_callback_shim to ensure pooled buffers are freed if a panic occurs between packet_from_c() and Vec::push(). 2. Host-side FFI guards now preserve panic messages via conversions::error_to_c() and log them with tracing::error!, using ffi_guard::panic_message() (now pub) to avoid reimplementing message extraction. 3. __plugin_destroy_instance wraps cleanup() in a nested catch_unwind so a cleanup panic does not abort the process if Drop also panics. cleanup() doc updated with # Panics note. 4. Replace all 16 .expect() calls in metadata macros with ffi_guard::cstring_lossy() which strips null bytes and logs via tracing::error! instead of panicking. Minor: 5. guard_source_config now emits tracing::error! when panic causes silent reclassification as non-source. 6. panic_message extended to handle Box<dyn Error + Send + Sync>. 7. Filed #354 for integration test with panicking dylib. 8. Host guards delegate to pub ffi_guard::panic_message() (DRY). Nits: 9. Fixed misleading 'log' comment in guard_handle. 10. Added doc comments to all host-side guard functions. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Replace dead error_to_c calls in guard_handle, guard_ptr, guard_unit with tracing::error! (item 1: no error-fetch ABI exists) - Remove dead error_to_c from guard_source_config (already logs) - Expand guard_source_config doc to explain observable behavior when is_source:false reclassification occurs (item 2) - Add 'cannot panic' comment to __plugin_flush_noop explaining why it is intentionally unguarded (item 3) - Items 4-6: acknowledged, no code changes needed Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Replace dead error_to_c() call in __plugin_destroy_instance's nested catch_unwind with tracing::error! so cleanup panics are observable. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Deduplicate host guards via generic ffi_guard_with(label, on_panic, f) - Standardize on match catch_unwind style in both SDK and host - Restore error_to_c misnomer note as doc comment on the function - Wrap __plugin_flush_noop in guard_result for consistency - Add real shim tests: output_callback_shim (null args + valid text packet) and telemetry_callback_shim (null args) - Verified host surfaces SDK panic messages at tracing::error level (process_packet, flush, tick, update_params all extract CResult error_message) Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…ent, add alloc shim tests - guard_ptr now takes a label parameter instead of hardcoding 'get_metadata', making it safe to reuse for other *const T trampolines - ConvertedPacketGuard doc accurately describes its narrow scope (fires only on Vec::push OOM panic, not on broader error paths) - ffi_guard_with doc notes that on_panic runs outside catch_unwind - Add alloc_video_shim and alloc_audio_shim null-user_data tests - Panic metrics counter punted to PR 7 (host observability) Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…e fix - Replace `api_addr: usize` with `ApiPtr` wrapper to preserve pointer provenance under strict-provenance rules. - Add `CallGuard` RAII type so `finish_call` runs even on panic/early-return. All 10 begin_call/finish_call sites converted. - Add `call_with_timeout` helper wrapping `spawn_blocking` with configurable `tokio::time::timeout`. Default: 5 minutes (generous for ML plugins). - Plumb `call_timeout` through `LoadedNativePlugin` → `NativeNodeWrapper` → `InstanceState`. - Document callback lifetime contract in module doc. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…warn, dedup - Replace `handle_addr: AtomicUsize` with `AtomicPtr<c_void>` to preserve pointer provenance for plugin handles (completing the strict-provenance fix). - Return error instead of silently swallowing packets when `begin_call` returns `None` inside `process_packet` and `flush` closures. - Log `warn!` when a call timeout fires (documents that the blocking task and its `CallGuard` remain alive until completion). - Inline safety justifications on `unsafe impl Send/Sync for ApiPtr`. - Fix stale comment in `runtime_param_schema` — `finish_call` is now pure atomics, not `error_to_c`. - Deduplicate inline `update_params` handling by reusing `apply_params_update`. - Add `begin_call_returns_none_after_request_drop` test. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Fix TOCTOU race: begin_call re-checks drop_requested after incrementing in_flight_calls (Dekker pattern with SeqCst ordering), closing the window where a concurrent destroy_instance could free the handle while begin_call holds it. - Add Drop impl for InstanceState — defense-in-depth cleanup in case NativeNodeWrapper::drop is bypassed. - Add node_name context to call_with_timeout error messages so timeout logs identify the offending node in multi-plugin pipelines. - Document timeout+drop instance leak in module-level doc comment. - Add TODO comments for timeout coverage gaps: runtime_param_schema, get_source_config (sync, no timeout), on_upstream_hint (missing call_with_timeout), DEFAULT_CALL_TIMEOUT config wiring. - Add NOTE to api_ptr_preserves_provenance test clarifying that raw pointer comparison does not verify provenance; suggest Miri. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- apply_params_update: return error on destroyed instance instead of silently swallowing (consistent with flush/process_packet/tick). - Wrap destroy_instance FFI call with ffi_guard_unit to prevent double-panic / UB when called from InstanceState::Drop. - Document synchronous destroy on async worker in module-level doc. - Add Arc<InstanceState> lifetime note to CallGuard doc. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…m_hint, add stress test - begin_call rollback: if fetch_sub brings in_flight_calls to 0 and drop_requested is set, call destroy_instance to prevent stranding. - Route on_upstream_hint through call_with_timeout instead of bare spawn_blocking — prevents a hung plugin hint from stalling the source tick loop. - Add concurrent_begin_call_and_request_drop stress test (50 rounds, 8 caller threads + 1 request_drop thread) asserting invariants: in_flight == 0, handle is null, drop_requested is set. - Remove ceremonial Arc::clone in destroy_instance (self.library is already live; the clone pattern is only load-bearing inside spawn_blocking closures). - Add idempotency note to finish_call doc. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Upgrade finish_call and destroy_instance to SeqCst ordering for correct Dekker pair on weakly-ordered architectures. - Rate-limit timeout warning: per-instance AtomicBool one-shot flag prevents log spam from a permanently wedged plugin. - Remove dead DESTROY_COUNT from stress test. - Gate api_ptr_preserves_provenance behind #[cfg(miri)] — pointer comparison alone does not prove provenance. - Document catch_unwind asymmetry in runtime_param_schema and get_source_config TODO comments. - Remove redundant node_name_owned allocation in apply_params_update. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
| .unwrap_or_default(); | ||
| warn!(error = %msg, "Plugin runtime_param_schema failed"); | ||
| } | ||
| self.state.finish_call(); | ||
| return None; | ||
| } | ||
|
|
||
| // success=true, null json_schema → plugin has no runtime schema. | ||
| if result.json_schema.is_null() { | ||
| self.state.finish_call(); | ||
| return None; | ||
| } | ||
|
|
||
| // success=true, non-null json_schema → JSON string containing the schema. | ||
| // SAFETY: result.json_schema points to a thread-local CString set by | ||
| // error_to_c (used here as a generic "String → *const c_char" helper). | ||
| // We must copy the string BEFORE any other FFI call on this thread | ||
| // (including finish_call) that could invoke error_to_c again and | ||
| // overwrite the thread-local buffer. | ||
| // SAFETY: result.json_schema points to a thread-local CString managed | ||
| // by the plugin SDK. We copy the string BEFORE dropping the guard so | ||
| // no subsequent FFI call can overwrite the thread-local buffer. | ||
| let json_str = unsafe { conversions::c_str_to_string(result.json_schema) }.ok(); | ||
| self.state.finish_call(); | ||
| drop(guard); | ||
| json_str.and_then(|s| serde_json::from_str(&s).ok()) | ||
| } |
There was a problem hiding this comment.
📝 Info: get_source_config and runtime_param_schema bypass worker thread (synchronous FFI on async task)
Both runtime_param_schema (crates/plugin-native/src/wrapper.rs:695) and get_source_config (crates/plugin-native/src/wrapper.rs:1158-1169) call plugin FFI functions directly on the calling thread rather than routing through the worker. This is safe because: (1) runtime_param_schema is called by the engine before run() is spawned, so no worker thread exists yet; (2) get_source_config is called after the Ready→Start handshake completes and the last apply_params_update has been awaited, so the worker is idle. However, both block the calling tokio worker thread without timeout protection or catch_unwind, as noted by the TODO comments. The plugin-side trampolines DO have guard_schema/guard_source_config wrappers, so plugin panics are caught on the plugin side — but host-side panics in callbacks would propagate.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
| (outputs, error) | ||
| }) | ||
| .await | ||
| .map_err(|e| { | ||
| StreamKitError::Runtime(format!("Plugin processing task panicked: {e}")) | ||
| })?; | ||
| .await?; |
There was a problem hiding this comment.
🚩 Timeout propagation exits run_processor/run_source without state emission
When call_with_timeout returns a timeout error for process_packet (wrapper.rs:791) or tick (wrapper.rs:1149), the ? operator propagates the error out of run_processor/run_source immediately. This skips the cleanup code that normally emits NodeState::Stopped or NodeState::Failed and aborts input tasks. The pipeline coordinator would see the node's task return Err(...) without a preceding state update. This is a pre-existing pattern (the old spawn_blocking(...).await.map_err(...)? had the same issue for JoinError), but the timeout feature makes this path significantly more reachable. A scopeguard or cleanup block before the ? would address this.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Acknowledged. The timeout propagation skipping state emission is a pre-existing pattern (as noted). We'll address this in PR 7 (host observability) where we're adding proper state tracking and cleanup guards across all error paths. The worker thread change doesn't worsen this — same ? propagation semantics as before, just through a different dispatch mechanism.
Replace per-packet tokio::task::spawn_blocking dispatch with a dedicated OS thread per plugin instance that receives work over a bounded channel. The worker owns reusable CallbackContext buffers and hoisted Arc clones, eliminating ~7 Arc::clone operations and a Vec allocation per packet. The async side sends WorkerRequest messages and awaits oneshot replies, preserving the existing timeout behavior via tokio::time::timeout on the reply channel. No behavioral change — same begin_call/CallGuard pattern, same callback shims, same error handling. All existing tests pass. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
InstanceWorker::Drop was synchronously joining the worker thread, which would block the tokio async runtime if the worker was stuck in a long FFI call (e.g. after a timeout). This regressed from the old spawn_blocking approach where JoinHandles were simply dropped. Fix: detach the worker thread on drop. The Arc<InstanceState> held by the worker ensures the plugin instance stays alive until the FFI call completes, and request_drop/destroy_instance handle deferred cleanup. Also fix Tick handler to propagate callback_ctx.error on success, consistent with Process and Flush handlers. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Add reply+timeout to OnUpstreamHint so a wedged plugin cannot
stall the tick loop indefinitely.
- Wrap each FFI call in catch_unwind to preserve per-operation
panic diagnostics ("Plugin <op> panicked: <msg>") instead of
generic "Worker thread died".
- Shorten thread name prefix to "skit-plg-" to preserve more
node_id within Linux 15-byte prctl(PR_SET_NAME) limit.
- Remove Arc wrapper from input_pin_cstrs (no longer cloned into
per-packet closures).
- Remove redundant let _lib = Arc::clone(&state.library) in
worker_thread_main; InstanceState.library keeps it alive.
- Remove redundant try_send(Shutdown) before InstanceWorker Drop
(channel close already signals the worker to exit).
- Remove Shutdown variant (now dead code).
- Suppress clippy::missing_const_for_fn on tx() per reviewer
preference for plain fn.
- Fix docstring accuracy re Vec allocation claim.
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Wire up LoadedPluginRegistry::unregister via Drop on LoadedNativePlugin, fixing ghost entries after unload and probe-path registry pollution. - Rename counters: plugin.calls, plugin.errors, plugin.panics (drop .total suffix to avoid double _total from OTel Prometheus exporter). - Rename instance_count to load_count for accurate semantics. - Change record_call/record_panic op parameter to &'static str to eliminate per-call heap allocation on the hot path. - Add plugin.kind and node.id fields to worker-thread panic error! logs. - Remove unused plugin_path field from LoadedNativePlugin. - Soften LoadedPluginRegistry doc; add OTel init dependency note. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Eliminate hot-path heap allocations: pre-build [KeyValue; 2] label arrays on InstanceState at construction time instead of allocating kind.to_string() per FFI call. Change op parameter to &'static str. - Collapse record_call + record_panic into single record(outcome) method using CallOutcome enum (Success/Error/Panic). - Make registry internal: drop public plugin_registry(), list(), and PluginInfo. Registry is now write-only bookkeeping for register/unregister; diagnostics endpoint deferred. - Simplify register() signature to just kind (path/api_version were unused after making registry internal), fixing stale plugin_path on re-registration. - Add timeout recording from caller (async) side via record_timeout() so timed-out calls are distinguishable in metrics from slow successes. - Move `use tracing::Instrument` to top-of-file imports. - Remove #[allow(clippy::missing_const_for_fn)] (String already disqualifies const). Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Fix success/error classification drift: derive metric outcome from the final error value (err.is_none()) instead of a separate result.success && callback_ctx.error.is_none() check, so metrics and return path agree. - Fix CallOutcome::Panic double-count: only bump panics_total, not errors_total, so dashboards can sum errors + panics safely. - Make metric_labels non-optional in await_reply — all callers pass labels, preventing future refactors from silently losing timeout counts. - Remove LoadedPluginRegistry entirely — it was write-only dead code with no consumer. Can be re-introduced when a diagnostics endpoint is wired up. - Rename span from native_plugin to native_plugin.run to match OTel metric namespacing convention. - Add Debug derive on CallOutcome. - Add doc comment on build_labels noting one-time-per-instance cost. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Add dedicated plugin.timeouts counter so timeouts are distinguishable from FFI errors in dashboards. - record_timeout now also bumps calls_total so the call attempt is counted and error ratios stay correct for wedged plugins. - Document that a timeout followed by worker completion produces two calls_total entries for the same logical call. - Add comment on on_upstream_hint explaining that hints are best-effort advisory signals and intentionally not instrumented with metrics. - Trim trailing blank lines in lib.rs. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Update comment to reflect actual behavior: record_timeout bumps plugin.calls + plugin.timeouts, not plugin.errors. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Switch error!() fields from dotted (plugin.kind, node.id) to snake_case (plugin_kind, node_id) matching the codebase convention for tracing events. Dotted names remain on spans only. - Document that plugin_kind on InstanceState is the raw kind (e.g. whisper), not the namespaced form (plugin::native::whisper). - Trim stray trailing blank line in lib.rs. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
| CPacketType::Binary => { | ||
| // Upgrade to BinaryWithMeta so we have a CBinaryPacket to | ||
| // attach the zero-copy handle to. | ||
| // | ||
| // `bp.data` inherits the original `self.packet.data` pointer, | ||
| // while `buffer_handle` holds a `Bytes::clone()` (Arc bump — | ||
| // same underlying buffer). This relies on `Bytes::clone()` | ||
| // preserving `as_ptr()` identity, which holds for the | ||
| // current `bytes` crate implementation (shared-reference | ||
| // Bytes always clones the Arc, not the data). | ||
| let mut bp = Box::new(CBinaryPacket { | ||
| data: self.packet.data.cast::<u8>(), | ||
| data_len: self.packet.len, | ||
| content_type: std::ptr::null(), | ||
| metadata: std::ptr::null(), | ||
| buffer_handle: std::ptr::null_mut(), | ||
| free_fn: None, | ||
| }); | ||
| let cloned = Box::new(source_bytes.clone()); | ||
| bp.buffer_handle = Box::into_raw(cloned).cast::<c_void>(); | ||
| bp.free_fn = Some(free_binary_buffer_handle); | ||
| self.packet = CPacket { | ||
| packet_type: CPacketType::BinaryWithMeta, | ||
| data: std::ptr::from_mut::<CBinaryPacket>(&mut *bp).cast::<c_void>(), | ||
| len: std::mem::size_of::<CBinaryPacket>(), | ||
| }; | ||
| self.owned = CPacketOwned::BinaryWithMeta(BinaryWithMetaOwned { | ||
| content_type: None, | ||
| metadata: None, | ||
| binary: bp, | ||
| }); | ||
| }, |
There was a problem hiding this comment.
🚩 set_binary_buffer_handle's Binary→BinaryWithMeta wire upgrade may surprise v9 C plugins
When the host sends a plain Binary packet to a v9 plugin, set_binary_buffer_handle (conversions.rs:428-459) upgrades it to BinaryWithMeta on the wire (with null content_type and metadata) to attach the zero-copy handle. The v9 version doc (types.rs:39-42) correctly documents that v9 C plugins must handle BinaryWithMeta even with null fields. However, a v9 plugin that only switches on CPacketType::Binary and doesn't handle BinaryWithMeta would silently drop packets that were Binary before this change. This is documented as a deliberate wire change but could be a source of subtle bugs in third-party v9 plugins.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Acknowledged — this is an intentional v9 wire change. The upgrade is documented in:
- The
set_binary_buffer_handledoc comment itself ("Wire note (S2)") types.rs:39-42v9 version docs requiring v9 C plugins to handleBinaryWithMetawith null fields- The v9 migration notes for plugin authors
A v9 plugin that only matches CPacketType::Binary would miss these packets, but this is a documented contract change that comes with the version bump. Third-party plugins upgrading to v9 must audit their packet_type match arms — this should be called out explicitly in the changelog before release.
Add a generic ResourceCache<K, V> that replaces the hand-rolled LazyLock<Mutex<HashMap>> pattern used across all native plugins. - New module: resource_cache.rs with get_or_init, stats, clear - ResourceSupport trait updated with associated Key type and provided get_or_init_resource default implementation - Migrate parakeet plugin as proof of the new API shape - 6 unit tests including concurrent access Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Remove manual unsafe Send/Sync impls; rely on auto-derive - Fix stats race: losing thread in init race counted as miss (both threads paid full init cost) - Change get_or_init closure to FnOnce(&K) to avoid clone on every call from get_or_init_resource - Document poisoned-lock behavior in len() - Document ResourceManager relationship and deinit_resource removal in module-level and trait-level docs - Note that ResourceSupport trait is scaffolding for future plugin migrations and native_plugin_entry! auto-wiring Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…t trait - Replace String errors with CacheError enum (Poisoned | Init) - Add init_races counter to CacheStats for contention visibility - Remove ResourceSupport trait (zero implementers, free-function init_resource cannot capture &logger/&config) - Document clear() vs in-flight init race in docstring - Use Entry API instead of option_if_let_else - Fix doc example to use .map_err(|e| e.to_string()) Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…rors
- Add is_poisoned() method so callers can distinguish empty from poisoned
- Add init_races_counted test with double-barrier to force all threads
into init simultaneously, asserting init_races >= 1
- Prefix parakeet cache error with context: 'recognizer cache: {e}'
- Remove Resource trait from prelude (no plugin implements it)
- Remove stale migration note from module docs
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
… Send/Sync - Document is_empty() and clear() behavior on poisoned mutex - Document stats() non-atomic snapshot semantics - Trim speculative 'future bridge' from module docs - Replace ParakeetNode manual unsafe Send/Sync with const assertion - Prefix parakeet cache error with 'recognizer cache:' context - Scope concurrent tests to Arc<ResourceCache> instead of static Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…nment, guard removal - plugin_log_callback: check subscriber filter before acquiring PLUGIN_LOG_METADATA_CACHE mutex (finding #2) - await_reply(None): add DEFAULT_CALL_TIMEOUT backstop so reply side is never unbounded (finding #9) - native_call_timeout_secs: clamp values below 1 to prevent instant timeouts (finding #13) - free_packet_buffer_handle: use raw-pointer reads instead of &* deref for Stacked Borrows consistency (finding #6) - ConvertedPacketGuard: remove — guard.consume() evaluated before push so guard never fires; Packet::drop handles unwind cleanup (finding #12) - set_binary_buffer_handle: free previously-set handle before overwrite to prevent leak on double-call (Devin Review) Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- free_binary_buffer_handle: swap raw catch_unwind to guard_unit for consistent panic logging across FFI boundaries (finding #2) - plugin_log_field_set: document value-equality invariant (finding #5) - DEFAULT_CALL_TIMEOUT: document relationship to config and backstop - WORKER_CHANNEL_CAPACITY: name the channel capacity constant Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- set_binary_buffer_handle: document lifetime invariant for Binary upgrade path — caller must keep CPacketRepr alive across FFI call - null_binary_buffer_handle: strengthen safety doc to require heap- allocated storage (not .rodata) - from_node_metadata: document Vec pointer-stability invariant - error_to_c: tighten doc — callers must copy immediately, nested calls invalidate previous pointer - Logger Send+Sync: name user_data/enabled_user_data as the specific fields requiring the manual impl - call_guard_panic_then_destroy_invariant: test that in_flight_calls reaches 0 after panic and destroy_instance still succeeds Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…ot infinite The await_reply backstop was added but the set_call_timeout doc still claimed None gives infinite wait. Align doc with implementation: None falls back to DEFAULT_CALL_TIMEOUT (300s). Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Summary
Native plugin SDK/host hardening across FFI safety, host resiliency, v9 ABI behavior, observability, and plugin developer ergonomics.
catch_unwindpanic boundaries around SDK trampolines, host callback shims, synchronous host FFI calls, and worker-thread plugin invocations.NodeState::Failedreporting.RUST_LOG=whisper=debug) and adds target-inspection regression coverage.static mutmetadata withOnceLock<PluginMetadataStorage>, makes output metadata backing storage pointer-stable, and aligns legacy Opus metadata conversion.ResourceCachehelpers and migrates Parakeet directly toResourceCache::get_or_init.Bytes::clone()pointer-stability regression test, removal of unsafeunwrap_unchecked,cstring_lossycleanup, clearerguard_unitlabels, panic-safefinish_call, explicit worker-panic cleanup docs,InstanceState::dropregression coverage, clearer C ABI ownership docs, and v9 missing-callback warning context.Round 3 validation (13 findings)
Send+Syncunsoundplugin_log_callbackmutex before enabledBytes::clone()pointer stabilitybegin_callrollback destroy on arbitrary threadsset_log_enabled_callbackon hot pathfree_packet_buffer_handle&*derefPluginMetadataStorageSend+Syncawait_reply(None)no backstopInstanceState::Dropon tokio workersResourceCache::clear()racesConvertedPacketGuardno protectionnative_call_timeout_secs=0Round 4 validation (5 findings)
free_binary_buffer_handleswallows panicsguard_unitResourceCache::clear()counterstimeout_warnedsend/replyplugin_log_field_set()invariantRound 5 validation (9 findings)
on_upstream_hintno timeoutset_binary_buffer_handlelifetime invariantnull_binary_buffer_handle.rodata safetycreate_instancehandle leakfrom_node_metadataVec pointer stabilitytimeout_warnedrate-limitingget_or_initrace wastes RAMerror_to_cnested invalidationin_flight_callstestcall_guard_panic_then_destroy_invarianttestAlso fixed: Logger
Send+SyncSAFETY comment tightened to nameuser_data/enabled_user_datafields.All deferred items consolidated in follow-up issue: #365
Review & Testing Checklist for Human
CPacket::lensize checks safely avoid reading new v9 fields.Failedon reply-side, send-side, and UpdateParams timeout paths.Notes
Local verification performed:
just lint-skit✓cargo test -p streamkit-plugin-native(31 tests) ✓cargo test -p streamkit-plugin-sdk-native(42 tests) ✓cargo check -p streamkit-server✓git diff --check✓No
perf-baselines.json,info.log, orremote_skit.logchanges committed.Link to Devin session: https://staging.itsdev.in/sessions/b181603833744776a948491c1b17cec9
Requested by: @streamer45