Add Nemotron-ASR streaming inference to Rust SDK#613
Pull request overview
This PR adds live (chunked) PCM audio transcription streaming to the Foundry Local Rust SDK, aligning the Rust API with the existing C# live audio transcription session feature and extending the SDK beyond file-based transcription.
Changes:
- Introduces `LiveAudioTranscriptionSession` + associated response/options/types and stream wrapper in the Rust SDK.
- Extends the Rust FFI bridge with `execute_command_with_binary()` to send JSON params + binary PCM payloads.
- Adds integration test coverage and a new Rust sample demonstrating microphone capture (cpal) and streaming transcription.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sdk/rust/src/openai/live_audio_client.rs | New streaming session implementation, types, cancellation support, and unit tests |
| sdk/rust/src/detail/core_interop.rs | Adds StreamingRequestBuffer + optional execute_command_with_binary symbol support |
| sdk/rust/src/openai/audio_client.rs | Adds create_live_transcription_session() factory method |
| sdk/rust/src/detail/model.rs | Exposes Model::create_live_transcription_session() |
| sdk/rust/src/detail/model_variant.rs | Wires variant factory for live transcription sessions |
| sdk/rust/src/openai/mod.rs | Registers and re-exports live audio transcription module/types |
| sdk/rust/src/lib.rs | Public re-exports for the new live transcription session/types |
| sdk/rust/Cargo.toml | Adds tokio-util for CancellationToken |
| sdk/rust/tests/integration/main.rs | Registers the new integration test module |
| sdk/rust/tests/integration/live_audio_test.rs | New E2E-ish integration test using synthetic PCM audio |
| samples/rust/live-audio-transcription-example/src/main.rs | New microphone/synthetic streaming transcription sample |
| samples/rust/live-audio-transcription-example/Cargo.toml | Declares sample dependencies (cpal, tokio, sdk path dep) |
| samples/rust/Cargo.toml | Adds the new sample crate to the workspace |
| codex-feedback.md | Adds review/validation notes for the feature |
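The integration test listed above feeds the session synthetic PCM audio. As a rough sketch of what such input could look like (the function names, the sine tone, and the 16-bit little-endian mono layout are assumptions for illustration, not taken from the PR's test code):

```rust
use std::f32::consts::TAU;

/// Generate `duration_ms` of a sine tone as 16-bit little-endian mono PCM bytes.
/// Hypothetical helper: the real test may build its audio differently.
pub fn synth_pcm16_mono(sample_rate: u32, freq_hz: f32, duration_ms: u32) -> Vec<u8> {
    let total_samples = (sample_rate as u64 * duration_ms as u64 / 1000) as usize;
    let mut bytes = Vec::with_capacity(total_samples * 2);
    for n in 0..total_samples {
        let t = n as f32 / sample_rate as f32;
        // Scale to half of full range to leave headroom.
        let sample = (0.5 * (TAU * freq_hz * t).sin() * i16::MAX as f32) as i16;
        bytes.extend_from_slice(&sample.to_le_bytes());
    }
    bytes
}

/// Split a PCM buffer into fixed-duration chunks, as a streaming caller
/// might do before appending each chunk to a session.
pub fn chunks_of(pcm: &[u8], chunk_ms: u32, sample_rate: u32) -> std::slice::Chunks<'_, u8> {
    // Two bytes per 16-bit mono sample; never allow a zero chunk size.
    let bytes_per_chunk = (sample_rate as usize * chunk_ms as usize / 1000) * 2;
    pcm.chunks(bytes_per_chunk.max(2))
}
```

Each chunk yielded by `chunks_of` could then be passed to the session's `append()` in a loop.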
3422045 to 0f8ae7a
Port the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.

New files:
- src/openai/live_audio_client.rs: LiveAudioTranscriptionSession with start/append/get_transcription_stream/stop lifecycle, response types, CoreErrorResponse, and unit tests
- tests/integration/live_audio_test.rs: E2E test with synthetic PCM audio

Modified files:
- src/detail/core_interop.rs: StreamingRequestBuffer FFI struct and execute_command_with_binary method for binary audio data
- src/openai/audio_client.rs: create_live_transcription_session() factory
- src/detail/model.rs, model_variant.rs: create_live_transcription_session()
- src/openai/mod.rs, src/lib.rs: Module and public type exports

API surface:

    let audio_client = model.create_audio_client();
    let session = audio_client.create_live_transcription_session();
    session.settings.sample_rate = 16000;
    session.start().await?;
    session.append(&pcm_bytes).await?;
    let mut stream = session.get_transcription_stream()?;
    // use tokio_stream::StreamExt;
    while let Some(result) = stream.next().await { ... }
    session.stop().await?;

Design highlights:
- Bounded push channel with backpressure (capacity=100)
- Push loop runs on blocking thread via spawn_blocking
- Fail-fast on native errors (no retry logic)
- Settings frozen at start() via clone snapshot
- Output channel completed on stop() after final result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
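To illustrate the "bounded push channel with backpressure" point, here is a minimal sketch that swaps in `std::sync::mpsc::sync_channel` and a plain thread for the tokio channel and `spawn_blocking` the commit describes. The function names are hypothetical and this is not the SDK's implementation:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Push `chunks` through a bounded queue to a consumer thread and
/// return the number of bytes the consumer saw.
pub fn push_through_bounded_queue(chunks: Vec<Vec<u8>>, capacity: usize) -> usize {
    let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
    let consumer = thread::spawn(move || {
        let mut total = 0usize;
        // Iteration ends once every sender has been dropped.
        for chunk in rx {
            total += chunk.len();
        }
        total
    });
    for chunk in chunks {
        // send() blocks while the queue is full: this is the backpressure.
        tx.send(chunk).expect("consumer exited early");
    }
    drop(tx); // signal end of input
    consumer.join().expect("consumer panicked")
}

/// Show that a full queue rejects a non-blocking send.
pub fn queue_is_full_after(capacity: usize) -> bool {
    let (tx, _rx) = sync_channel::<u8>(capacity);
    for _ in 0..capacity {
        tx.try_send(0).unwrap();
    }
    matches!(tx.try_send(0), Err(TrySendError::Full(_)))
}
```

With a capacity of 100, a producer that outruns the native push loop would stall on the 101st queued chunk instead of growing memory without bound.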
- core_interop.rs: Use std::ptr::null() for empty binary_data slices to avoid passing dangling pointer across FFI boundary
- live_audio_client.rs: Call native audio_stream_stop synchronously in Drop to prevent native session leaks when stop() is not called

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
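The empty-slice fix can be sketched as follows. An empty Rust slice still has a non-null (dangling but aligned) pointer, so the conversion substitutes `std::ptr::null()` explicitly. The helper name `ffi_ptr_and_len` is hypothetical; the PR's actual code in core_interop.rs may be structured differently:

```rust
use std::ptr;

/// Convert a byte slice into a (pointer, length) pair for an FFI call,
/// passing a null pointer for empty input instead of a dangling one.
pub fn ffi_ptr_and_len(data: &[u8]) -> (*const u8, usize) {
    if data.is_empty() {
        // Native code can check for null; it cannot detect a dangling pointer.
        (ptr::null(), 0)
    } else {
        (data.as_ptr(), data.len())
    }
}
```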
Address codex-feedback.md parity gaps:
1. CancellationToken support: start/append/stop now accept Option<CancellationToken> (via tokio_util::sync::CancellationToken). stop() uses a cancel-safe pattern matching C# StopAsync: the native session stop is always performed even if the token fires.
2. Response envelope matches C#: LiveAudioTranscriptionResponse now has content: Vec<ContentPart> with text/transcript fields, so callers use result.content[0].text (identical to C# Content[0].Text).
3. Added tokio-util dependency for CancellationToken.

Updated E2E sample and integration test to use new API shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
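A rough sketch of the response envelope shape described in item 2. Only the field names (`content`, `text`, `transcript`, `is_final`, `start_time`, `end_time`) come from this PR; the field types and the `first_text` helper are assumptions for illustration:

```rust
/// One content part of a transcription result (field types assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct ContentPart {
    pub text: String,
    pub transcript: String,
}

/// Envelope mirroring the C# shape, so callers read `content[0].text`.
#[derive(Debug, Clone, PartialEq)]
pub struct LiveAudioTranscriptionResponse {
    pub content: Vec<ContentPart>,
    pub is_final: bool,
    pub start_time: Option<f64>, // type assumed
    pub end_time: Option<f64>,   // type assumed
}

impl LiveAudioTranscriptionResponse {
    /// Hypothetical convenience accessor that avoids panicking on an
    /// empty `content` vector, unlike direct indexing with `content[0]`.
    pub fn first_text(&self) -> Option<&str> {
        self.content.first().map(|part| part.text.as_str())
    }
}
```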
- Sample: update download progress callback from &str to f64 to match upstream API change (PR #608)
- Apply cargo fmt to all SDK and sample files for CI compliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The function and its AudioClient import triggered -D warnings (dead_code) in the CI build. The E2E test creates the session directly via model.create_audio_client() and doesn't use this helper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SDK: fix append() deadlock (clone tx before await), start() cancellation leak, double-stop, get_transcription_stream() async, remove unused field, fix error parsing, remove manual Unpin, improve error messages, refactor stop() into helpers, rewrite push_loop per reviewer suggestion.

Sample: extract convert_audio(), edition 2024, remove codex-feedback.md, document BufferSize::Default, replace Arc+spawn with channel-based mic forwarding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9754696 to fbd9d89
Run cargo fmt & cargo clippy
…implify clone
- Remove create_live_transcription_session() from Model and ModelVariant per bmehta001/kunal-vaishnavi: session should only be created via AudioClient, not directly from Model (matches C# pattern)
- Simplify .as_ref().cloned() to .clone() per nenad1002

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Always await start_future to completion, even if cancellation was requested; afterwards, check is_cancelled() and call audio_stream_stop to clean up any native session that was created. This prevents native handle leaks when the CancellationToken fires during start().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
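The cleanup order described in this commit can be sketched synchronously. Here an `AtomicBool` stands in for the `CancellationToken` and `MockNative` stands in for the native core; both, along with `start_cancel_safe` and `StartError`, are hypothetical names and not the SDK's types:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the native core: just counts open sessions.
#[derive(Debug, Default)]
pub struct MockNative {
    pub active_sessions: u32,
}

impl MockNative {
    fn audio_stream_start(&mut self) {
        self.active_sessions += 1;
    }
    fn audio_stream_stop(&mut self) {
        self.active_sessions = self.active_sessions.saturating_sub(1);
    }
}

#[derive(Debug, PartialEq)]
pub enum StartError {
    Cancelled,
}

/// Let the native start finish first, then honor cancellation by
/// stopping whatever session was created, so no handle is leaked.
pub fn start_cancel_safe(native: &mut MockNative, cancelled: &AtomicBool) -> Result<(), StartError> {
    native.audio_stream_start(); // always runs to completion
    if cancelled.load(Ordering::SeqCst) {
        native.audio_stream_stop(); // clean up the session we just opened
        return Err(StartError::Cancelled);
    }
    Ok(())
}
```

Abandoning the start call midway would leave the caller unable to tell whether a native session exists, which is why the check happens only after it completes.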
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
- push_loop: match on FoundryLocalError::CommandExecution { reason }
to extract raw JSON for CoreErrorResponse::try_parse instead of
e.to_string() which includes Display prefix
- Integration test: propagate stream errors via shared state and
assert no errors after stop()
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
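The push_loop change above can be illustrated with a stand-in error type. `SdkError` below is hypothetical (the PR matches on `FoundryLocalError::CommandExecution { reason }`), and the Display prefix text is an assumption; the point is only that matching on the variant yields the raw payload while `to_string()` does not:

```rust
use std::fmt;

/// Hypothetical stand-in for the SDK error type.
#[derive(Debug)]
pub enum SdkError {
    CommandExecution { reason: String },
    Other(String),
}

impl fmt::Display for SdkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The prefix makes the Display output unsuitable for JSON parsing.
            SdkError::CommandExecution { reason } => {
                write!(f, "command execution failed: {reason}")
            }
            SdkError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

/// Return the raw payload carried by a command-execution error, if any,
/// so it can be handed to a JSON parser without the Display prefix.
pub fn raw_error_payload(err: &SdkError) -> Option<&str> {
    match err {
        SdkError::CommandExecution { reason } => Some(reason.as_str()),
        _ => None,
    }
}
```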
These are always positive values, so there is no reason to use i32 in Rust.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t, fix push_loop
- build_start_request: use json!() macro directly instead of Map::insert
- write_final_result: use idiomatic combinator chain instead of nested ifs
- push_loop: drop(output_tx) + return on fatal error so stream completes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
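The push_loop point relies on a general channel property: once the last sender is dropped, the receiving side sees end-of-stream. A minimal sketch with `std::sync::mpsc` standing in for the async output channel (the function signatures here are hypothetical, not the SDK's):

```rust
use std::sync::mpsc::{channel, Sender};

/// Forward results to the output channel; on the first error, report it,
/// drop the sender, and return so the consumer's stream completes.
pub fn push_loop(chunks: &[Result<&str, &str>], output_tx: Sender<Result<String, String>>) {
    for chunk in chunks {
        match chunk {
            Ok(text) => {
                if output_tx.send(Ok(text.to_string())).is_err() {
                    return; // receiver is gone, nothing left to do
                }
            }
            Err(e) => {
                let _ = output_tx.send(Err(e.to_string()));
                // Explicit drop mirrors the commit; returning would drop it too.
                drop(output_tx);
                return;
            }
        }
    }
}

/// Run the loop and collect everything the consumer would observe.
pub fn collect_stream(chunks: &[Result<&str, &str>]) -> Vec<Result<String, String>> {
    let (tx, rx) = channel();
    push_loop(chunks, tx);
    // Terminates because push_loop has dropped the only sender.
    rx.into_iter().collect()
}
```

Nothing after the fatal error reaches the consumer, and the consumer's loop ends instead of waiting indefinitely.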
Add Nemotron-ASR streaming inference to Rust SDK
Description
Ports the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.
The existing `AudioClient` only supports file-based transcription. This PR introduces `LiveAudioTranscriptionSession` that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.
What's included
New files
- `sdk/rust/src/openai/live_audio_client.rs`: Streaming session with `start()`, `append()`, `get_transcription_stream()`, `stop()`, plus types, cancellation support, and unit tests
- `sdk/rust/tests/integration/live_audio_test.rs`: E2E integration test with synthetic PCM audio
- `samples/rust/live-audio-transcription-example/`: Full sample with real microphone capture (cpal) and resampling
Modified files
- `sdk/rust/src/detail/core_interop.rs`: Added `StreamingRequestBuffer` FFI struct and `execute_command_with_binary()` for binary audio data
- `sdk/rust/src/openai/audio_client.rs`: Added `create_live_transcription_session()` factory method
- `sdk/rust/src/detail/model.rs`, `model_variant.rs`: Wired factory method to `Model`
- `sdk/rust/src/openai/mod.rs`, `src/lib.rs`: Module registration and public exports
- `sdk/rust/Cargo.toml`: Added `tokio-util` dependency for `CancellationToken`
API surface
C# API parity
| C# | Rust |
|---|---|
| `CreateLiveTranscriptionSession()` | `create_live_transcription_session()` |
| `StartAsync(CancellationToken)` | `start(Option<CancellationToken>)` |
| `AppendAsync(ReadOnlyMemory<byte>, CancellationToken)` | `append(&[u8], Option<CancellationToken>)` |
| `GetTranscriptionStream(CancellationToken)` | `get_transcription_stream()` |
| `StopAsync(CancellationToken)` + cancel-safe cleanup | `stop(Option<CancellationToken>)` + cancel-safe cleanup |
| `IAsyncDisposable.DisposeAsync()` | `Drop` with best-effort native stop |
| `LiveAudioTranscriptionResponse.Content[0].Text` | `response.content[0].text` |
| `LiveAudioTranscriptionResponse.Content[0].Transcript` | `response.content[0].transcript` |
| `LiveAudioTranscriptionResponse.IsFinal` | `response.is_final` |
| `LiveAudioTranscriptionResponse.StartTime/EndTime` | `response.start_time` / `response.end_time` |
| `LiveAudioTranscriptionOptions` (SampleRate, Channels, BitsPerSample, Language, PushQueueCapacity) | `LiveAudioTranscriptionOptions` (sample_rate, channels, bits_per_sample, language, push_queue_capacity) |
| `CoreErrorResponse.TryParse()` | `CoreErrorResponse::try_parse()` |
| `audio_stream_start`, `audio_stream_push`, `audio_stream_stop` | `execute_command` / `execute_command_with_binary` |
Design highlights
- `start`/`append`/`stop` accept `Option<CancellationToken>` via `tokio_util::sync::CancellationToken`
- `stop()` always performs native `audio_stream_stop` even if the token fires, preventing native session leaks (matches the C# `StopAsync` pattern)
- `LiveAudioTranscriptionResponse` uses `content: Vec<ContentPart>`, matching C#'s `ConversationItem.Content[0].Text/Transcript`
- `execute_command_with_binary` FFI calls run on `spawn_blocking`, keeping the async runtime free
- Settings are frozen at `start()` and immutable during the session
- Native `audio_stream_stop` is called in `Drop` to prevent native session leaks
- Empty binary slices use `std::ptr::null()` to avoid passing a dangling pointer across the FFI boundary
Verified working
Stats