Add Nemotron-ASR streaming inference to Python SDK#612
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds end-to-end live audio (PCM chunk) streaming transcription to the Foundry Local Python SDK, including session lifecycle management, native interop support for binary payloads, and tests/samples to validate Windows DLL loading and Nemotron ASR streaming.
Changes:
- Introduces `LiveAudioTranscriptionSession` plus supporting response/options/error types for streaming microphone-style PCM input.
- Extends `CoreInterop` with a `StreamingRequestBuffer` and `execute_command_with_binary()` to push raw audio to the native core.
- Adds unit and E2E coverage and a sample app, including Windows DLL preload workarounds for brotli/LoadLibrary behavior.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sdk/python/src/openai/live_audio_transcription_client.py | Implements the streaming session (start/append/stream/stop) and background push loop. |
| sdk/python/src/openai/live_audio_transcription_types.py | Adds response/options/error DTOs and JSON parsing helpers. |
| sdk/python/src/detail/core_interop.py | Adds binary-command execution path and Windows DLL loading hardening for ORT/GenAI. |
| sdk/python/src/openai/audio_client.py | Adds factory method to create the live transcription session. |
| `sdk/python/src/openai/__init__.py` | Exports new session and types from the openai package surface. |
| sdk/python/test/openai/test_live_audio_transcription.py | Unit tests for parsing/options/state guards and mocked streaming behavior. |
| sdk/python/test/openai/test_live_audio_transcription_e2e.py | Windows-only E2E test exercising real native DLLs and nemotron model pipeline. |
| sdk/python/test/openai/conftest.py | Preloads ORT/GenAI DLLs for E2E to avoid brotli-related DLL search changes. |
| sdk/python/test/conftest.py | Preloads ORT/GenAI DLLs early in all tests to avoid Windows DLL search conflicts. |
| samples/python/live-audio-transcription/src/app.py | Demonstration app using PyAudio to stream microphone PCM into the session. |
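
The Windows DLL loading hardening listed for `core_interop.py` could look roughly like the sketch below. It is not the code from this PR: the helper name `load_with_altered_search_path` is hypothetical, and it assumes Python 3.8+, where `ctypes.WinDLL` accepts a `winmode` argument that is forwarded to `LoadLibraryEx` as its flags.

```python
import ctypes
import os
import sys

# Win32 LoadLibraryEx flag value, as documented by Microsoft.
LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008


def load_with_altered_search_path(dll_path: str):
    """Load a DLL so its dependencies resolve from the DLL's own directory first.

    Hypothetical helper for illustration. Returns None on non-Windows
    platforms, where the flag does not apply.
    """
    if sys.platform != "win32":
        return None
    # The altered search strategy only applies to absolute paths.
    absolute = os.path.abspath(dll_path)
    # ctypes passes `winmode` to LoadLibraryEx as the flags argument.
    return ctypes.WinDLL(absolute, winmode=LOAD_WITH_ALTERED_SEARCH_PATH)
```

Loading the ORT/GenAI libraries this way before anything else triggers a DLL search is one way to keep a stale system-level copy from being picked up first.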
…microsoft/Foundry-Local into ruiren/live-audio-stream-python
Per kunal-vaishnavi feedback: conftest.py should not contain E2E-specific helpers. Moved _get_e2e_test_pkgs_path, _preload_and_init_e2e, and e2e_manager fixture into test_live_audio_transcription_e2e.py. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
…test cleanup
- Revert foundry-local-core to 1.0.0rc1 (no dev builds on main)
- Re-add try/except for execute_command_with_binary (1.0.0rc1 lacks symbol)
- Document blocking backpressure behavior in append()
- Remove unnecessary time.sleep(0.5) in error test
- Add stop() cleanup in error test (required for non-daemon thread)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add Nemotron-ASR streaming inference to Python SDK
Description
Adds real-time audio streaming support to the Foundry Local Python SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).
This is the Python port of C# PR #485 with full feature parity. The existing `AudioClient` only supports file-based transcription. This PR introduces `LiveAudioTranscriptionSession`, which accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as a synchronous generator.

What's included
New files
- `src/openai/live_audio_transcription_client.py`: Streaming session with `start()`, `append()`, `get_transcription_stream()`, `stop()`
- `src/openai/live_audio_transcription_types.py`: `LiveAudioTranscriptionResponse` (ConversationItem-shaped), `LiveAudioTranscriptionOptions`, `CoreErrorResponse`, `TranscriptionContentPart`
- `test/openai/test_live_audio_transcription.py`: 22 unit tests for deserialization, settings, state guards, streaming pipeline
- `test/openai/test_live_audio_transcription_e2e.py`: E2E test with real native DLLs and nemotron model
- `test/openai/conftest.py`: DLL preload for E2E tests
- `samples/python/live-audio-transcription/src/app.py`: Live microphone transcription demo

Modified files
- `src/openai/audio_client.py`: Added `create_live_transcription_session()` factory method
- `src/detail/core_interop.py`: Added `StreamingRequestBuffer` struct, `execute_command_with_binary()`, `start_audio_stream`, `push_audio_data`, `stop_audio_stream` methods, and `_load_dll_win()` for robust DLL loading on Windows
- `src/openai/__init__.py`: Exported new live transcription types
- `test/conftest.py`: Pre-load ORT/GenAI DLLs before brotli import to avoid Windows DLL search conflicts

API surface
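
As a rough illustration of the call order (start, append, read the stream, stop), here is a minimal sketch. `FakeLiveSession` is a hypothetical stand-in that only mirrors the method names listed in this PR; the real session comes from `create_live_transcription_session()`, and its signatures, blocking behavior, and response objects may differ.

```python
from typing import Iterator, List


class FakeLiveSession:
    """Hypothetical stand-in mirroring the session method names from this PR."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []
        self._started = False

    def start(self) -> None:
        self._started = True

    def append(self, pcm: bytes) -> None:
        if not self._started:
            raise RuntimeError("session not started")
        # Copy the input, as the PR describes for append().
        self._chunks.append(bytes(pcm))

    def get_transcription_stream(self) -> Iterator[str]:
        # The real stream yields LiveAudioTranscriptionResponse objects;
        # this stand-in only reports how many bytes each chunk carried.
        for chunk in self._chunks:
            yield f"<{len(chunk)} bytes received>"

    def stop(self) -> None:
        self._started = False

    def __enter__(self) -> "FakeLiveSession":
        self.start()
        return self

    def __exit__(self, *exc) -> None:
        self.stop()


def run_demo() -> List[str]:
    # The context manager calls start() on entry and stop() on exit.
    with FakeLiveSession() as session:
        session.append(b"\x00\x01" * 160)  # one small PCM chunk
        session.append(b"\x00\x01" * 80)
        return list(session.get_transcription_stream())
```

In a real application the chunks would come from a microphone callback (the sample app uses PyAudio) rather than literal byte strings.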
C# parity
| C# | Python |
|---|---|
| `CreateLiveTranscriptionSession()` | `create_live_transcription_session()` |
| `StartAsync(ct)` | `start()` |
| `AppendAsync(ReadOnlyMemory<byte>, ct)` | `append(bytes)` |
| `GetTranscriptionStream()` | `get_transcription_stream()` |
| `StopAsync(ct)` | `stop()` |
| `IAsyncDisposable` | context manager (`with`) |
| `LiveAudioTranscriptionOptions` | `LiveAudioTranscriptionOptions` |
| `LiveAudioTranscriptionResponse` | `LiveAudioTranscriptionResponse` |

Design highlights
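
The queue-and-worker design described in this section can be sketched as follows. This is a simplified illustration, not the PR's implementation: `ChunkPusher`, `max_pending`, and the injected `push` callable are hypothetical names, and the real session pushes chunks to the native core instead of a Python callable.

```python
import queue
import threading
from typing import Callable, List, Optional


class ChunkPusher:
    """Sketch: a bounded queue feeding a single background push thread."""

    def __init__(self, push: Callable[[bytes], None], max_pending: int = 8) -> None:
        self._push = push
        self._queue: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run)

    def start(self) -> None:
        self._thread.start()

    def append(self, data) -> None:
        # bytes(...) copies, so the caller may reuse its buffer right away.
        # put() blocks while the queue is full, which provides backpressure.
        self._queue.put(bytes(data))

    def stop(self) -> None:
        self._queue.put(None)  # sentinel: drain pending chunks, then exit
        self._thread.join()    # needed because the thread is non-daemon

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:
                return
            self._push(item)


def demo() -> List[bytes]:
    received: List[bytes] = []
    pusher = ChunkPusher(received.append, max_pending=2)
    pusher.start()
    buf = bytearray(b"abcd")
    pusher.append(buf)
    buf[:] = b"wxyz"  # the caller reuses the same buffer
    pusher.append(buf)
    pusher.stop()
    return received
```

Because only the worker thread calls `push`, chunks reach the consumer in the order they were queued, even when `append()` is called from a microphone callback thread.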
- `LiveAudioTranscriptionResponse` uses the OpenAI Realtime `ConversationItem` shape (`content[0].text` / `transcript`) for forward compatibility
- `queue.Queue` serializes audio pushes from any thread (safe for mic callbacks) with backpressure
- Session settings are fixed at `start()` and immutable during the session
- `append()` copies input data to avoid issues with callers reusing buffers (e.g., PyAudio)
- `start_audio_stream` and `stop_audio_stream` route through `execute_command`; `push_audio_data` routes through `execute_command_with_binary`, so no new native entry points are required
- `LoadLibraryExW` with `LOAD_WITH_ALTERED_SEARCH_PATH` is used on Windows to prevent conflicts with stale system-level ORT DLLs

Verified working