Skip to content

feat(plugin-sdk): FFI panic guards, RAII call guard, timeout, provenance, worker thread, v9 wire bump#353

Merged
streamer45 merged 57 commits intomainfrom
devin/1776623122-plugin-sdk-hardening
Apr 24, 2026
Merged

feat(plugin-sdk): FFI panic guards, RAII call guard, timeout, provenance, worker thread, v9 wire bump#353
streamer45 merged 57 commits intomainfrom
devin/1776623122-plugin-sdk-hardening

Conversation

@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor

@staging-devin-ai-integration staging-devin-ai-integration Bot commented Apr 19, 2026

Summary

Native plugin SDK/host hardening across FFI safety, host resiliency, v9 ABI behavior, observability, and plugin developer ergonomics.

  • Adds catch_unwind panic boundaries around SDK trampolines, host callback shims, synchronous host FFI calls, and worker-thread plugin invocations.
  • Moves native plugin execution to a per-instance worker thread with bounded request channels, configurable reply/send timeouts, timeout metrics, telemetry events, and NodeState::Failed reporting.
  • Bumps the native plugin ABI to v9 for zero-copy binary packet handles and the optional log-enabled callback used to skip filtered log formatting.
  • Fixes v9 plugin log event metadata so subscriber filters see the plugin kind target (e.g. RUST_LOG=whisper=debug) and adds target-inspection regression coverage.
  • Replaces static mut metadata with OnceLock<PluginMetadataStorage>, makes output metadata backing storage pointer-stable, and aligns legacy Opus metadata conversion.
  • Adds ResourceCache helpers and migrates Parakeet directly to ResourceCache::get_or_init.
  • Review follow-ups included in this PR: configurable native call timeout, send-side/update-params timeout failure propagation, telemetry drop logging/health accounting validation, Bytes::clone() pointer-stability regression test, removal of unsafe unwrap_unchecked, cstring_lossy cleanup, clearer guard_unit labels, panic-safe finish_call, explicit worker-panic cleanup docs, InstanceState::drop regression coverage, clearer C ABI ownership docs, and v9 missing-callback warning context.

Round 3 validation (13 findings)

# Finding Verdict Action
1 Logger Send+Sync unsound Follow-up Tightening is a breaking SDK API change
2 plugin_log_callback mutex before enabled Fixed Filter check before cache mutex
3 Bytes::clone() pointer stability Follow-up Already has regression test
4 begin_call rollback destroy on arbitrary threads Follow-up FFI-guarded + idempotent
5 set_log_enabled_callback on hot path Follow-up Called once at creation
6 free_packet_buffer_handle &* deref Fixed Raw-pointer reads for Stacked Borrows
7 Worker Vec reuse lost on panic Follow-up Already documented
8 PluginMetadataStorage Send+Sync Follow-up OnceLock, never mutated
9 await_reply(None) no backstop Fixed DEFAULT_CALL_TIMEOUT backstop
10 InstanceState::Drop on tokio workers Follow-up Worker thread detach
11 ResourceCache::clear() races Follow-up Best-effort, tests only
12 ConvertedPacketGuard no protection Fixed Removed dead guard
13 native_call_timeout_secs=0 Fixed Clamped to min 1

Round 4 validation (5 findings)

# Finding Verdict Action
1 EnvFilter cache poisoning (shared callsite ID) Follow-up Architectural — per-target callsite
2 free_binary_buffer_handle swallows panics Fixed Swapped to guard_unit
3 ResourceCache::clear() counters Follow-up Monotonic by design
4 Shared timeout_warned send/reply Follow-up Trade-off documented
5 plugin_log_field_set() invariant Fixed Added doc comment

Round 5 validation (9 findings)

# Finding Verdict Action
1 on_upstream_hint no timeout Follow-up Acknowledged in module docs
2 set_binary_buffer_handle lifetime invariant Fixed Documented lifetime for Binary upgrade
3 null_binary_buffer_handle .rodata safety Fixed Strengthened safety doc
4 Post-create_instance handle leak Follow-up Scopeguard needed
5 from_node_metadata Vec pointer stability Fixed Added invariant comment
6 timeout_warned rate-limiting Follow-up One-shot vs rate-limit trade-off
7 get_or_init race wastes RAM Follow-up CondVar TODO
8 error_to_c nested invalidation Fixed Tightened doc to "copy immediately"
9 Panic mid-FFI in_flight_calls test Fixed Added call_guard_panic_then_destroy_invariant test

Also fixed: Logger Send+Sync SAFETY comment tightened to name user_data/enabled_user_data fields.

All deferred items consolidated in follow-up issue: #365

Review & Testing Checklist for Human

  • Load a v8-compiled native plugin with this v9 host and verify CPacket::len size checks safely avoid reading new v9 fields.
  • Build representative official native plugins and run their sample pipelines, especially a plugin that emits/receives binary packets and a source plugin.
  • Exercise a wedged/slow plugin call and confirm the node transitions to Failed on reply-side, send-side, and UpdateParams timeout paths.
  • Verify v9 plugin logs appear with the plugin kind target and filtered log levels avoid plugin-side formatting.
  • For Parakeet, run two instances with the same model params and confirm the recognizer resource is shared.

Notes

Local verification performed:

  • just lint-skit
  • cargo test -p streamkit-plugin-native (31 tests) ✓
  • cargo test -p streamkit-plugin-sdk-native (42 tests) ✓
  • cargo check -p streamkit-server
  • git diff --check

No perf-baselines.json, info.log, or remote_skit.log changes committed.

Link to Devin session: https://staging.itsdev.in/sessions/b181603833744776a948491c1b17cec9
Requested by: @streamer45


Open in Devin Review (Staging)

Wrap every extern "C" trampoline on both the client SDK and host side
with catch_unwind boundaries so that a panic in plugin code never
unwinds across the C ABI boundary (which is instant UB).

Client SDK (sdks/plugin-sdk/native/):
- New ffi_guard module with typed guard helpers: guard_result,
  guard_handle, guard_ptr, guard_tick, guard_schema,
  guard_source_config, guard_unit
- All trampolines in native_plugin_entry!, native_source_plugin_entry!,
  and __plugin_shared_ffi! now wrapped via the appropriate guard
- 11 unit tests verifying each guard catches panics and returns the
  correct sentinel value

Host (crates/plugin-native/):
- New ffi_guard_result, ffi_guard_alloc_video, ffi_guard_alloc_audio,
  ffi_guard_unit helpers in wrapper.rs
- All extern "C" callbacks wrapped: plugin_log_callback,
  output_callback_shim, telemetry_callback_shim, alloc_video_shim,
  alloc_audio_shim, free_video_buffer, free_audio_buffer
- 4 unit tests verifying host-side guards

No wire-format or API version change.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown
Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review (Staging)
Debug

Playground

streamkit-devin and others added 2 commits April 20, 2026 13:04
…ing safety

Addresses all 10 items from PR #353 review:

Major:
1. Add ConvertedPacketGuard RAII guard in output_callback_shim to
   ensure pooled buffers are freed if a panic occurs between
   packet_from_c() and Vec::push().
2. Host-side FFI guards now preserve panic messages via
   conversions::error_to_c() and log them with tracing::error!,
   using ffi_guard::panic_message() (now pub) to avoid
   reimplementing message extraction.
3. __plugin_destroy_instance wraps cleanup() in a nested
   catch_unwind so a cleanup panic does not abort the process if
   Drop also panics.  cleanup() doc updated with # Panics note.
4. Replace all 16 .expect() calls in metadata macros with
   ffi_guard::cstring_lossy() which strips null bytes and logs
   via tracing::error! instead of panicking.

Minor:
5. guard_source_config now emits tracing::error! when panic causes
   silent reclassification as non-source.
6. panic_message extended to handle Box<dyn Error + Send + Sync>.
7. Filed #354 for integration test with panicking dylib.
8. Host guards delegate to pub ffi_guard::panic_message() (DRY).

Nits:
9. Fixed misleading 'log' comment in guard_handle.
10. Added doc comments to all host-side guard functions.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Replace dead error_to_c calls in guard_handle, guard_ptr, guard_unit
  with tracing::error! (item 1: no error-fetch ABI exists)
- Remove dead error_to_c from guard_source_config (already logs)
- Expand guard_source_config doc to explain observable behavior when
  is_source:false reclassification occurs (item 2)
- Add 'cannot panic' comment to __plugin_flush_noop explaining why
  it is intentionally unguarded (item 3)
- Items 4-6: acknowledged, no code changes needed

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

streamkit-devin and others added 4 commits April 20, 2026 14:32
Replace dead error_to_c() call in __plugin_destroy_instance's nested
catch_unwind with tracing::error! so cleanup panics are observable.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Deduplicate host guards via generic ffi_guard_with(label, on_panic, f)
- Standardize on match catch_unwind style in both SDK and host
- Restore error_to_c misnomer note as doc comment on the function
- Wrap __plugin_flush_noop in guard_result for consistency
- Add real shim tests: output_callback_shim (null args + valid text
  packet) and telemetry_callback_shim (null args)
- Verified host surfaces SDK panic messages at tracing::error level
  (process_packet, flush, tick, update_params all extract CResult
  error_message)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…ent, add alloc shim tests

- guard_ptr now takes a label parameter instead of hardcoding
  'get_metadata', making it safe to reuse for other *const T trampolines
- ConvertedPacketGuard doc accurately describes its narrow scope
  (fires only on Vec::push OOM panic, not on broader error paths)
- ffi_guard_with doc notes that on_panic runs outside catch_unwind
- Add alloc_video_shim and alloc_audio_shim null-user_data tests
- Panic metrics counter punted to PR 7 (host observability)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…e fix

- Replace `api_addr: usize` with `ApiPtr` wrapper to preserve pointer
  provenance under strict-provenance rules.
- Add `CallGuard` RAII type so `finish_call` runs even on panic/early-return.
  All 10 begin_call/finish_call sites converted.
- Add `call_with_timeout` helper wrapping `spawn_blocking` with configurable
  `tokio::time::timeout`.  Default: 5 minutes (generous for ML plugins).
- Plumb `call_timeout` through `LoadedNativePlugin` → `NativeNodeWrapper`
  → `InstanceState`.
- Document callback lifetime contract in module doc.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
@staging-devin-ai-integration staging-devin-ai-integration Bot changed the title feat(plugin-sdk): add catch_unwind panic guards to all FFI trampolines feat(plugin-sdk): FFI panic guards + RAII call guard, timeout, pointer provenance Apr 20, 2026
@staging-devin-ai-integration staging-devin-ai-integration Bot changed the title feat(plugin-sdk): FFI panic guards + RAII call guard, timeout, pointer provenance feat(plugin-sdk): FFI panic guards + host resiliency (RAII, timeout, provenance) Apr 20, 2026
…warn, dedup

- Replace `handle_addr: AtomicUsize` with `AtomicPtr<c_void>` to preserve
  pointer provenance for plugin handles (completing the strict-provenance fix).
- Return error instead of silently swallowing packets when `begin_call`
  returns `None` inside `process_packet` and `flush` closures.
- Log `warn!` when a call timeout fires (documents that the blocking task
  and its `CallGuard` remain alive until completion).
- Inline safety justifications on `unsafe impl Send/Sync for ApiPtr`.
- Fix stale comment in `runtime_param_schema` — `finish_call` is now pure
  atomics, not `error_to_c`.
- Deduplicate inline `update_params` handling by reusing `apply_params_update`.
- Add `begin_call_returns_none_after_request_drop` test.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
@staging-devin-ai-integration staging-devin-ai-integration Bot changed the title feat(plugin-sdk): FFI panic guards + host resiliency (RAII, timeout, provenance) feat(plugin-sdk): FFI panic guards + RAII call guard, timeout, pointer provenance Apr 20, 2026
streamkit-devin and others added 2 commits April 21, 2026 07:16
- Fix TOCTOU race: begin_call re-checks drop_requested after
  incrementing in_flight_calls (Dekker pattern with SeqCst ordering),
  closing the window where a concurrent destroy_instance could free the
  handle while begin_call holds it.
- Add Drop impl for InstanceState — defense-in-depth cleanup in case
  NativeNodeWrapper::drop is bypassed.
- Add node_name context to call_with_timeout error messages so timeout
  logs identify the offending node in multi-plugin pipelines.
- Document timeout+drop instance leak in module-level doc comment.
- Add TODO comments for timeout coverage gaps: runtime_param_schema,
  get_source_config (sync, no timeout), on_upstream_hint (missing
  call_with_timeout), DEFAULT_CALL_TIMEOUT config wiring.
- Add NOTE to api_ptr_preserves_provenance test clarifying that raw
  pointer comparison does not verify provenance; suggest Miri.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- apply_params_update: return error on destroyed instance instead of
  silently swallowing (consistent with flush/process_packet/tick).
- Wrap destroy_instance FFI call with ffi_guard_unit to prevent
  double-panic / UB when called from InstanceState::Drop.
- Document synchronous destroy on async worker in module-level doc.
- Add Arc<InstanceState> lifetime note to CallGuard doc.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

…m_hint, add stress test

- begin_call rollback: if fetch_sub brings in_flight_calls to 0 and
  drop_requested is set, call destroy_instance to prevent stranding.
- Route on_upstream_hint through call_with_timeout instead of bare
  spawn_blocking — prevents a hung plugin hint from stalling the
  source tick loop.
- Add concurrent_begin_call_and_request_drop stress test (50 rounds,
  8 caller threads + 1 request_drop thread) asserting invariants:
  in_flight == 0, handle is null, drop_requested is set.
- Remove ceremonial Arc::clone in destroy_instance (self.library is
  already live; the clone pattern is only load-bearing inside
  spawn_blocking closures).
- Add idempotency note to finish_call doc.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Upgrade finish_call and destroy_instance to SeqCst ordering for
  correct Dekker pair on weakly-ordered architectures.
- Rate-limit timeout warning: per-instance AtomicBool one-shot flag
  prevents log spam from a permanently wedged plugin.
- Remove dead DESTROY_COUNT from stress test.
- Gate api_ptr_preserves_provenance behind #[cfg(miri)] — pointer
  comparison alone does not prove provenance.
- Document catch_unwind asymmetry in runtime_param_schema and
  get_source_config TODO comments.
- Remove redundant node_name_owned allocation in apply_params_update.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Copy link
Copy Markdown
Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 14 additional findings in Devin Review.

Open in Devin Review (Staging)
Debug

Playground

Comment thread crates/plugin-native/src/wrapper.rs Outdated
Comment on lines 433 to 451
.unwrap_or_default();
warn!(error = %msg, "Plugin runtime_param_schema failed");
}
self.state.finish_call();
return None;
}

// success=true, null json_schema → plugin has no runtime schema.
if result.json_schema.is_null() {
self.state.finish_call();
return None;
}

// success=true, non-null json_schema → JSON string containing the schema.
// SAFETY: result.json_schema points to a thread-local CString set by
// error_to_c (used here as a generic "String → *const c_char" helper).
// We must copy the string BEFORE any other FFI call on this thread
// (including finish_call) that could invoke error_to_c again and
// overwrite the thread-local buffer.
// SAFETY: result.json_schema points to a thread-local CString managed
// by the plugin SDK. We copy the string BEFORE dropping the guard so
// no subsequent FFI call can overwrite the thread-local buffer.
let json_str = unsafe { conversions::c_str_to_string(result.json_schema) }.ok();
self.state.finish_call();
drop(guard);
json_str.and_then(|s| serde_json::from_str(&s).ok())
}
Copy link
Copy Markdown
Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration Bot Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: get_source_config and runtime_param_schema bypass worker thread (synchronous FFI on async task)

Both runtime_param_schema (crates/plugin-native/src/wrapper.rs:695) and get_source_config (crates/plugin-native/src/wrapper.rs:1158-1169) call plugin FFI functions directly on the calling thread rather than routing through the worker. This is safe because: (1) runtime_param_schema is called by the engine before run() is spawned, so no worker thread exists yet; (2) get_source_config is called after the Ready→Start handshake completes and the last apply_params_update has been awaited, so the worker is idle. However, both block the calling tokio worker thread without timeout protection or catch_unwind, as noted by the TODO comments. The plugin-side trampolines DO have guard_schema/guard_source_config wrappers, so plugin panics are caught on the plugin side — but host-side panics in callbacks would propagate.

Open in Devin Review (Staging)

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Comment thread crates/plugin-native/src/wrapper.rs Outdated
Comment on lines +789 to +791
(outputs, error)
})
.await
.map_err(|e| {
StreamKitError::Runtime(format!("Plugin processing task panicked: {e}"))
})?;
.await?;
Copy link
Copy Markdown
Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration Bot Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Timeout propagation exits run_processor/run_source without state emission

When call_with_timeout returns a timeout error for process_packet (wrapper.rs:791) or tick (wrapper.rs:1149), the ? operator propagates the error out of run_processor/run_source immediately. This skips the cleanup code that normally emits NodeState::Stopped or NodeState::Failed and aborts input tasks. The pipeline coordinator would see the node's task return Err(...) without a preceding state update. This is a pre-existing pattern (the old spawn_blocking(...).await.map_err(...)? had the same issue for JoinError), but the timeout feature makes this path significantly more reachable. A scopeguard or cleanup block before the ? would address this.

Open in Devin Review (Staging)

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. The timeout propagation skipping state emission is a pre-existing pattern (as noted). We'll address this in PR 7 (host observability) where we're adding proper state tracking and cleanup guards across all error paths. The worker thread change doesn't worsen this — same ? propagation semantics as before, just through a different dispatch mechanism.

Replace per-packet tokio::task::spawn_blocking dispatch with a
dedicated OS thread per plugin instance that receives work over a
bounded channel. The worker owns reusable CallbackContext buffers
and hoisted Arc clones, eliminating ~7 Arc::clone operations and
a Vec allocation per packet.

The async side sends WorkerRequest messages and awaits oneshot
replies, preserving the existing timeout behavior via
tokio::time::timeout on the reply channel.

No behavioral change — same begin_call/CallGuard pattern, same
callback shims, same error handling. All existing tests pass.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

InstanceWorker::Drop was synchronously joining the worker thread,
which would block the tokio async runtime if the worker was stuck
in a long FFI call (e.g. after a timeout).  This regressed from the
old spawn_blocking approach where JoinHandles were simply dropped.

Fix: detach the worker thread on drop.  The Arc<InstanceState> held
by the worker ensures the plugin instance stays alive until the FFI
call completes, and request_drop/destroy_instance handle deferred
cleanup.

Also fix Tick handler to propagate callback_ctx.error on success,
consistent with Process and Flush handlers.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
@staging-devin-ai-integration staging-devin-ai-integration Bot changed the title feat(plugin-sdk): FFI panic guards + RAII call guard, timeout, pointer provenance feat(plugin-sdk): FFI panic guards, RAII call guard, timeout, pointer provenance, per-instance worker Apr 21, 2026
staging-devin-ai-integration[bot]

This comment was marked as resolved.

- Add reply+timeout to OnUpstreamHint so a wedged plugin cannot
  stall the tick loop indefinitely.
- Wrap each FFI call in catch_unwind to preserve per-operation
  panic diagnostics ("Plugin <op> panicked: <msg>") instead of
  generic "Worker thread died".
- Shorten thread name prefix to "skit-plg-" to preserve more
  node_id within Linux 15-byte prctl(PR_SET_NAME) limit.
- Remove Arc wrapper from input_pin_cstrs (no longer cloned into
  per-packet closures).
- Remove redundant let _lib = Arc::clone(&state.library) in
  worker_thread_main; InstanceState.library keeps it alive.
- Remove redundant try_send(Shutdown) before InstanceWorker Drop
  (channel close already signals the worker to exit).
- Remove Shutdown variant (now dead code).
- Suppress clippy::missing_const_for_fn on tx() per reviewer
  preference for plain fn.
- Fix docstring accuracy re Vec allocation claim.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamkit-devin and others added 2 commits April 23, 2026 07:41
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Wire up LoadedPluginRegistry::unregister via Drop on LoadedNativePlugin,
  fixing ghost entries after unload and probe-path registry pollution.
- Rename counters: plugin.calls, plugin.errors, plugin.panics (drop .total
  suffix to avoid double _total from OTel Prometheus exporter).
- Rename instance_count to load_count for accurate semantics.
- Change record_call/record_panic op parameter to &'static str to eliminate
  per-call heap allocation on the hot path.
- Add plugin.kind and node.id fields to worker-thread panic error! logs.
- Remove unused plugin_path field from LoadedNativePlugin.
- Soften LoadedPluginRegistry doc; add OTel init dependency note.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

streamkit-devin and others added 6 commits April 23, 2026 14:00
- Eliminate hot-path heap allocations: pre-build [KeyValue; 2] label
  arrays on InstanceState at construction time instead of allocating
  kind.to_string() per FFI call.  Change op parameter to &'static str.
- Collapse record_call + record_panic into single record(outcome)
  method using CallOutcome enum (Success/Error/Panic).
- Make registry internal: drop public plugin_registry(), list(), and
  PluginInfo.  Registry is now write-only bookkeeping for
  register/unregister; diagnostics endpoint deferred.
- Simplify register() signature to just kind (path/api_version were
  unused after making registry internal), fixing stale plugin_path on
  re-registration.
- Add timeout recording from caller (async) side via record_timeout()
  so timed-out calls are distinguishable in metrics from slow successes.
- Move `use tracing::Instrument` to top-of-file imports.
- Remove #[allow(clippy::missing_const_for_fn)] (String already
  disqualifies const).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Fix success/error classification drift: derive metric outcome from
  the final error value (err.is_none()) instead of a separate
  result.success && callback_ctx.error.is_none() check, so metrics
  and return path agree.
- Fix CallOutcome::Panic double-count: only bump panics_total, not
  errors_total, so dashboards can sum errors + panics safely.
- Make metric_labels non-optional in await_reply — all callers pass
  labels, preventing future refactors from silently losing timeout
  counts.
- Remove LoadedPluginRegistry entirely — it was write-only dead code
  with no consumer.  Can be re-introduced when a diagnostics endpoint
  is wired up.
- Rename span from native_plugin to native_plugin.run to match OTel
  metric namespacing convention.
- Add Debug derive on CallOutcome.
- Add doc comment on build_labels noting one-time-per-instance cost.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Add dedicated plugin.timeouts counter so timeouts are
  distinguishable from FFI errors in dashboards.
- record_timeout now also bumps calls_total so the call attempt is
  counted and error ratios stay correct for wedged plugins.
- Document that a timeout followed by worker completion produces
  two calls_total entries for the same logical call.
- Add comment on on_upstream_hint explaining that hints are
  best-effort advisory signals and intentionally not instrumented
  with metrics.
- Trim trailing blank lines in lib.rs.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Update comment to reflect actual behavior: record_timeout bumps
plugin.calls + plugin.timeouts, not plugin.errors.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Switch error!() fields from dotted (plugin.kind, node.id) to
  snake_case (plugin_kind, node_id) matching the codebase convention
  for tracing events.  Dotted names remain on spans only.
- Document that plugin_kind on InstanceState is the raw kind
  (e.g. whisper), not the namespaced form (plugin::native::whisper).
- Trim stray trailing blank line in lib.rs.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Copy link
Copy Markdown
Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 19 additional findings in Devin Review.

Open in Devin Review (Staging)
Debug

Playground

Comment on lines +428 to +459
CPacketType::Binary => {
// Upgrade to BinaryWithMeta so we have a CBinaryPacket to
// attach the zero-copy handle to.
//
// `bp.data` inherits the original `self.packet.data` pointer,
// while `buffer_handle` holds a `Bytes::clone()` (Arc bump —
// same underlying buffer). This relies on `Bytes::clone()`
// preserving `as_ptr()` identity, which holds for the
// current `bytes` crate implementation (shared-reference
// Bytes always clones the Arc, not the data).
let mut bp = Box::new(CBinaryPacket {
data: self.packet.data.cast::<u8>(),
data_len: self.packet.len,
content_type: std::ptr::null(),
metadata: std::ptr::null(),
buffer_handle: std::ptr::null_mut(),
free_fn: None,
});
let cloned = Box::new(source_bytes.clone());
bp.buffer_handle = Box::into_raw(cloned).cast::<c_void>();
bp.free_fn = Some(free_binary_buffer_handle);
self.packet = CPacket {
packet_type: CPacketType::BinaryWithMeta,
data: std::ptr::from_mut::<CBinaryPacket>(&mut *bp).cast::<c_void>(),
len: std::mem::size_of::<CBinaryPacket>(),
};
self.owned = CPacketOwned::BinaryWithMeta(BinaryWithMetaOwned {
content_type: None,
metadata: None,
binary: bp,
});
},
Copy link
Copy Markdown
Contributor Author

@staging-devin-ai-integration staging-devin-ai-integration Bot Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 set_binary_buffer_handle's Binary→BinaryWithMeta wire upgrade may surprise v9 C plugins

When the host sends a plain Binary packet to a v9 plugin, set_binary_buffer_handle (conversions.rs:428-459) upgrades it to BinaryWithMeta on the wire (with null content_type and metadata) to attach the zero-copy handle. The v9 version doc (types.rs:39-42) correctly documents that v9 C plugins must handle BinaryWithMeta even with null fields. However, a v9 plugin that only switches on CPacketType::Binary and doesn't handle BinaryWithMeta would silently drop packets that were Binary before this change. This is documented as a deliberate wire change but could be a source of subtle bugs in third-party v9 plugins.

Open in Devin Review (Staging)

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — this is an intentional v9 wire change. The upgrade is documented in:

  1. The set_binary_buffer_handle doc comment itself ("Wire note (S2)")
  2. types.rs:39-42 v9 version docs requiring v9 C plugins to handle BinaryWithMeta with null fields
  3. The v9 migration notes for plugin authors

A v9 plugin that only matches CPacketType::Binary would miss these packets, but this is a documented contract change that comes with the version bump. Third-party plugins upgrading to v9 must audit their packet_type match arms — this should be called out explicitly in the changelog before release.

streamkit-devin and others added 11 commits April 23, 2026 16:11
Add a generic ResourceCache<K, V> that replaces the hand-rolled
LazyLock<Mutex<HashMap>> pattern used across all native plugins.

- New module: resource_cache.rs with get_or_init, stats, clear
- ResourceSupport trait updated with associated Key type and
  provided get_or_init_resource default implementation
- Migrate parakeet plugin as proof of the new API shape
- 6 unit tests including concurrent access

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Remove manual unsafe Send/Sync impls; rely on auto-derive
- Fix stats race: losing thread in init race counted as miss
  (both threads paid full init cost)
- Change get_or_init closure to FnOnce(&K) to avoid clone on
  every call from get_or_init_resource
- Document poisoned-lock behavior in len()
- Document ResourceManager relationship and deinit_resource
  removal in module-level and trait-level docs
- Note that ResourceSupport trait is scaffolding for future
  plugin migrations and native_plugin_entry! auto-wiring

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…t trait

- Replace String errors with CacheError enum (Poisoned | Init)
- Add init_races counter to CacheStats for contention visibility
- Remove ResourceSupport trait (zero implementers, free-function
  init_resource cannot capture &logger/&config)
- Document clear() vs in-flight init race in docstring
- Use Entry API instead of option_if_let_else
- Fix doc example to use .map_err(|e| e.to_string())

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…rors

- Add is_poisoned() method so callers can distinguish empty from poisoned
- Add init_races_counted test with double-barrier to force all threads
  into init simultaneously, asserting init_races >= 1
- Prefix parakeet cache error with context: 'recognizer cache: {e}'
- Remove Resource trait from prelude (no plugin implements it)
- Remove stale migration note from module docs

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
… Send/Sync

- Document is_empty() and clear() behavior on poisoned mutex
- Document stats() non-atomic snapshot semantics
- Trim speculative 'future bridge' from module docs
- Replace ParakeetNode manual unsafe Send/Sync with const assertion
- Prefix parakeet cache error with 'recognizer cache:' context
- Scope concurrent tests to Arc<ResourceCache> instead of static

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

streamkit-devin and others added 2 commits April 24, 2026 12:57
…nment, guard removal

- plugin_log_callback: check subscriber filter before acquiring
  PLUGIN_LOG_METADATA_CACHE mutex (finding #2)
- await_reply(None): add DEFAULT_CALL_TIMEOUT backstop so reply side
  is never unbounded (finding #9)
- native_call_timeout_secs: clamp values below 1 to prevent instant
  timeouts (finding #13)
- free_packet_buffer_handle: use raw-pointer reads instead of &* deref
  for Stacked Borrows consistency (finding #6)
- ConvertedPacketGuard: remove — guard.consume() evaluated before push
  so guard never fires; Packet::drop handles unwind cleanup (finding #12)
- set_binary_buffer_handle: free previously-set handle before overwrite
  to prevent leak on double-call (Devin Review)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- free_binary_buffer_handle: swap raw catch_unwind to guard_unit for
  consistent panic logging across FFI boundaries (finding #2)
- plugin_log_field_set: document value-equality invariant (finding #5)
- DEFAULT_CALL_TIMEOUT: document relationship to config and backstop
- WORKER_CHANNEL_CAPACITY: name the channel capacity constant

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- set_binary_buffer_handle: document lifetime invariant for Binary
  upgrade path — caller must keep CPacketRepr alive across FFI call
- null_binary_buffer_handle: strengthen safety doc to require heap-
  allocated storage (not .rodata)
- from_node_metadata: document Vec pointer-stability invariant
- error_to_c: tighten doc — callers must copy immediately, nested
  calls invalidate previous pointer
- Logger Send+Sync: name user_data/enabled_user_data as the specific
  fields requiring the manual impl
- call_guard_panic_then_destroy_invariant: test that in_flight_calls
  reaches 0 after panic and destroy_instance still succeeds

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

…ot infinite

The await_reply backstop was added but the set_call_timeout doc still
claimed None gives infinite wait.  Align doc with implementation:
None falls back to DEFAULT_CALL_TIMEOUT (300s).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
@streamer45 streamer45 merged commit 3d7cd83 into main Apr 24, 2026
17 checks passed
@streamer45 streamer45 deleted the devin/1776623122-plugin-sdk-hardening branch April 24, 2026 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants