Skip to content

Add Nemotron-ASR streaming inference to Rust SDK#613

Open
rui-ren wants to merge 15 commits intomainfrom
ruiren/live-audio-stream-rust
Open

Add Nemotron-ASR streaming inference to Rust SDK#613
rui-ren wants to merge 15 commits intomainfrom
ruiren/live-audio-stream-rust

Conversation

@rui-ren
Copy link
Copy Markdown
Contributor

@rui-ren rui-ren commented Apr 8, 2026

Add Nemotron-ASR streaming inference to Rust SDK"

Description

Ports the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.

The existing AudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included

New files

  • sdk/rust/src/openai/live_audio_client.rs — Streaming session with start(), append(), get_transcription_stream(), stop(), plus types, cancellation support, and unit tests
  • sdk/rust/tests/integration/live_audio_test.rs — E2E integration test with synthetic PCM audio
  • samples/rust/live-audio-transcription-example/ — Full sample with real microphone capture (cpal) and resampling

Modified files

  • sdk/rust/src/detail/core_interop.rs — Added StreamingRequestBuffer FFI struct and execute_command_with_binary() for binary audio data
  • sdk/rust/src/openai/audio_client.rs — Added create_live_transcription_session() factory method
  • sdk/rust/src/detail/model.rs, model_variant.rs — Wired factory method to Model
  • sdk/rust/src/openai/mod.rs, src/lib.rs — Module registration and public exports
  • sdk/rust/Cargo.toml — Added tokio-util dependency for CancellationToken

API surface

let audio_client = model.create_audio_client();
let session = audio_client.create_live_transcription_session();

session.settings.sample_rate = 16000;
session.settings.channels = 1;
session.settings.language = Some("en".into());

session.start(None).await?;

// Push audio from microphone callback
session.append(&pcm_bytes, None).await?;

// Read results as async stream
use tokio_stream::StreamExt;
let mut stream = session.get_transcription_stream()?;
while let Some(result) = stream.next().await {
    let result = result?;
    println!("{}", result.content[0].text);
}

session.stop(None).await?;

C# API parity

C# Rust Status
CreateLiveTranscriptionSession() create_live_transcription_session()
StartAsync(CancellationToken) start(Option<CancellationToken>)
AppendAsync(ReadOnlyMemory<byte>, CancellationToken) append(&[u8], Option<CancellationToken>)
GetTranscriptionStream(CancellationToken) get_transcription_stream()
StopAsync(CancellationToken) + cancel-safe cleanup stop(Option<CancellationToken>) + cancel-safe cleanup
IAsyncDisposable.DisposeAsync() Drop with best-effort native stop
LiveAudioTranscriptionResponse.Content[0].Text response.content[0].text
LiveAudioTranscriptionResponse.Content[0].Transcript response.content[0].transcript
LiveAudioTranscriptionResponse.IsFinal response.is_final
LiveAudioTranscriptionResponse.StartTime/EndTime response.start_time / response.end_time
LiveAudioTranscriptionOptions (SampleRate, Channels, BitsPerSample, Language, PushQueueCapacity) LiveAudioTranscriptionOptions (sample_rate, channels, bits_per_sample, language, push_queue_capacity)
CoreErrorResponse.TryParse() CoreErrorResponse::try_parse()
Native commands: audio_stream_start, audio_stream_push, audio_stream_stop Same commands via execute_command / execute_command_with_binary

Design highlights

  • CancellationToken supportstart/append/stop accept Option<CancellationToken> via tokio_util::sync::CancellationToken
  • Cancel-safe stopstop() always performs native audio_stream_stop even if token fires, preventing native session leaks (matches C# StopAsync pattern)
  • Response envelopeLiveAudioTranscriptionResponse uses content: Vec<ContentPart> matching C#'s ConversationItem.Content[0].Text/Transcript
  • Bounded push queue — Backpressure via bounded channel (capacity=100); prevents unbounded memory growth
  • Push loop on blocking threadexecute_command_with_binary FFI calls run on spawn_blocking, keeping async runtime free
  • Settings freeze — Audio format settings are cloned at start() and immutable during the session
  • Drop safety — Best-effort synchronous audio_stream_stop in Drop to prevent native session leaks
  • FFI null pointer safety — Empty binary slices use std::ptr::null() to avoid dangling pointer across FFI boundary

Verified working

  • ✅ SDK build succeeds (0 errors, 0 clippy warnings)
  • ✅ 13 unit tests passing (JSON deserialization, settings defaults, error parsing, content envelope)
  • ✅ E2E pipeline: Microphone (48kHz/2ch/F32) → Resample (16kHz/mono/16-bit) → SDK → Core.dll → onnxruntime-genai.dll → nemotron model
  • ✅ Synthetic audio test: 30 chunks (96KB PCM) pushed with clean session lifecycle
  • ✅ Live microphone test: real-time capture, session start/stop, no native errors

Stats

  • 14 files changed, 1,329 additions, 2 deletions

Copilot AI review requested due to automatic review settings April 8, 2026 19:45
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
foundry-local Ready Ready Preview, Comment Apr 25, 2026 1:50am

Request Review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds live (chunked) PCM audio transcription streaming to the Foundry Local Rust SDK, aligning the Rust API with the existing C# live audio transcription session feature and extending the SDK beyond file-based transcription.

Changes:

  • Introduces LiveAudioTranscriptionSession + associated response/options/types and stream wrapper in the Rust SDK.
  • Extends the Rust FFI bridge with execute_command_with_binary() to send JSON params + binary PCM payloads.
  • Adds integration test coverage and a new Rust sample demonstrating microphone capture (cpal) and streaming transcription.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
sdk/rust/src/openai/live_audio_client.rs New streaming session implementation, types, cancellation support, and unit tests
sdk/rust/src/detail/core_interop.rs Adds StreamingRequestBuffer + optional execute_command_with_binary symbol support
sdk/rust/src/openai/audio_client.rs Adds create_live_transcription_session() factory method
sdk/rust/src/detail/model.rs Exposes Model::create_live_transcription_session()
sdk/rust/src/detail/model_variant.rs Wires variant factory for live transcription sessions
sdk/rust/src/openai/mod.rs Registers and re-exports live audio transcription module/types
sdk/rust/src/lib.rs Public re-exports for the new live transcription session/types
sdk/rust/Cargo.toml Adds tokio-util for CancellationToken
sdk/rust/tests/integration/main.rs Registers the new integration test module
sdk/rust/tests/integration/live_audio_test.rs New E2E-ish integration test using synthetic PCM audio
samples/rust/live-audio-transcription-example/src/main.rs New microphone/synthetic streaming transcription sample
samples/rust/live-audio-transcription-example/Cargo.toml Declares sample dependencies (cpal, tokio, sdk path dep)
samples/rust/Cargo.toml Adds the new sample crate to the workspace
codex-feedback.md Adds review/validation notes for the feature

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/tests/integration/live_audio_test.rs Outdated
Comment thread samples/rust/live-audio-transcription-example/src/main.rs Outdated
Comment thread samples/rust/live-audio-transcription-example/Cargo.toml Outdated
Comment thread codex-feedback.md Outdated
Comment thread sdk/rust/tests/integration/live_audio_test.rs
Comment thread sdk/rust/src/openai/live_audio_client.rs
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread samples/rust/live-audio-transcription-example/src/main.rs Outdated
Comment thread samples/rust/live-audio-transcription-example/src/main.rs Outdated
Comment thread samples/rust/live-audio-transcription-example/src/main.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
@rui-ren rui-ren changed the title Add live audio transcription streaming support to Foundry Local Rust SDK Add Nemotron-ASR streaming inference to Rust SDK Apr 14, 2026
ruiren_microsoft and others added 9 commits April 23, 2026 15:13
Port the C# live audio transcription feature (PR #485) to the Rust SDK
with full API parity.

New files:
- src/openai/live_audio_client.rs: LiveAudioTranscriptionSession with
  start/append/get_transcription_stream/stop lifecycle, response types,
  CoreErrorResponse, and unit tests
- tests/integration/live_audio_test.rs: E2E test with synthetic PCM audio

Modified files:
- src/detail/core_interop.rs: StreamingRequestBuffer FFI struct and
  execute_command_with_binary method for binary audio data
- src/openai/audio_client.rs: create_live_transcription_session() factory
- src/detail/model.rs, model_variant.rs: create_live_transcription_session()
- src/openai/mod.rs, src/lib.rs: Module and public type exports

API surface:
  let audio_client = model.create_audio_client();
  let session = audio_client.create_live_transcription_session();
  session.settings.sample_rate = 16000;
  session.start().await?;
  session.append(&pcm_bytes).await?;
  let mut stream = session.get_transcription_stream()?;
  // use tokio_stream::StreamExt;
  while let Some(result) = stream.next().await { ... }
  session.stop().await?;

Design highlights:
- Bounded push channel with backpressure (capacity=100)
- Push loop runs on blocking thread via spawn_blocking
- Fail-fast on native errors (no retry logic)
- Settings frozen at start() via clone snapshot
- Output channel completed on stop() after final result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- core_interop.rs: Use std::ptr::null() for empty binary_data slices
  to avoid passing dangling pointer across FFI boundary
- live_audio_client.rs: Call native audio_stream_stop synchronously
  in Drop to prevent native session leaks when stop() is not called

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address codex-feedback.md parity gaps:

1. CancellationToken support: start/append/stop now accept
   Option<CancellationToken> (via tokio_util::sync::CancellationToken).
   stop() uses cancel-safe pattern matching C# StopAsync  native
   session stop is always performed even if token fires.

2. Response envelope matches C#: LiveAudioTranscriptionResponse now
   has content: Vec<ContentPart> with text/transcript fields, so
   callers use result.content[0].text (identical to C# Content[0].Text).

3. Added tokio-util dependency for CancellationToken.

Updated E2E sample and integration test to use new API shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Sample: update download progress callback from &str to f64 to match
  upstream API change (PR #608)
- Apply cargo fmt to all SDK and sample files for CI compliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The function and its AudioClient import triggered -D warnings (dead_code)
in the CI build. The E2E test creates the session directly via
model.create_audio_client() and doesn't use this helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SDK: fix append() deadlock (clone tx before await), start()
cancellation leak, double-stop, get_transcription_stream() async,
remove unused field, fix error parsing, remove manual Unpin,
improve error messages, refactor stop() into helpers, rewrite
push_loop per reviewer suggestion.

Sample: extract convert_audio(), edition 2024, remove
codex-feedback.md, document BufferSize::Default, replace
Arc+spawn with channel-based mic forwarding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread sdk/rust/src/openai/live_audio_client.rs
Comment thread sdk/rust/src/detail/model.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
@nenad1002
Copy link
Copy Markdown
Contributor

Run cargo fmt & cargo clippy

Comment thread sdk/rust/src/detail/model_variant.rs Outdated
…implify clone

- Remove create_live_transcription_session() from Model and ModelVariant
  per bmehta001/kunal-vaishnavi: session should only be created via
  AudioClient, not directly from Model (matches C# pattern)
- Simplify .as_ref().cloned() to .clone() per nenad1002

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Always await start_future to completion  if cancellation was requested,
check is_cancelled() afterwards and call audio_stream_stop to clean up
any native session that was created. This prevents native handle leaks
when CancellationToken fires during start().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/tests/integration/live_audio_test.rs
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
Comment thread sdk/rust/src/openai/live_audio_client.rs Outdated
- push_loop: match on FoundryLocalError::CommandExecution { reason }
  to extract raw JSON for CoreErrorResponse::try_parse instead of
  e.to_string() which includes Display prefix
- Integration test: propagate stream errors via shared state and
  assert no errors after stop()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These are always positive values  no reason to use i32 in Rust.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t, fix push_loop

- build_start_request: use json!() macro directly instead of Map::insert
- write_final_result: use idiomatic combinator chain instead of nested ifs
- push_loop: drop(output_tx) + return on fatal error so stream completes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants