From 75b91aa13e10ba19c9f4d0f2fc71b25c118bbea0 Mon Sep 17 00:00:00 2001
From: Rachit Mehta
Date: Tue, 2 Dec 2025 10:49:15 -0500
Subject: [PATCH 1/5] add bidi to README

---
 README.md | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/README.md b/README.md
index 3ff0ec2e4..b3887de10 100644
--- a/README.md
+++ b/README.md
@@ -184,6 +184,84 @@ Built-in providers:
 
 Custom providers can be implemented using [Custom Providers](https://strandsagents.com/latest/user-guide/concepts/model-providers/custom_model_provider/)
 
+### Bidirectional Streaming
+
+> **⚠️ Experimental Feature**: Bidirectional streaming is currently in experimental status. APIs may change in future releases as we refine the feature based on user feedback and evolving model capabilities.
+
+Build real-time voice and audio conversations with persistent streaming connections. Unlike traditional request-response patterns, bidirectional streaming maintains long-running conversations where users can interrupt, provide continuous input, and receive real-time audio responses.
+
+**Key Features:**
+- Real-time audio input/output streaming
+- Automatic interruption detection
+- Concurrent tool execution during conversations
+- Support for text, audio, and image inputs
+- Provider-agnostic event system
+
+**Supported Model Providers:**
+- Amazon Nova Sonic (`amazon.nova-sonic-v1:0`)
+- Google Gemini Live (`gemini-2.5-flash-native-audio-preview-09-2025`)
+- OpenAI Realtime API (`gpt-realtime`)
+
+**Quick Example:**
+
+```python
+from strands.experimental.bidi import BidiAgent
+from strands.experimental.bidi.models import BidiNovaSonicModel
+from strands.experimental.bidi.io import BidiAudioIO, BidiTextIO
+from strands_tools import calculator
+
+# Create bidirectional agent with audio model
+model = BidiNovaSonicModel()
+agent = BidiAgent(model=model, tools=[calculator])
+
+# Setup audio and text I/O
+audio_io = BidiAudioIO()
+text_io = BidiTextIO()
+
+# Run with real-time audio streaming
+await agent.run(
+    inputs=[audio_io.input()],
+    outputs=[audio_io.output(), text_io.output()]
+)
+```
+
+**Configuration Options:**
+
+```python
+# Configure audio settings
+model = BidiNovaSonicModel(
+    provider_config={
+        "audio": {
+            "input_rate": 16000,
+            "output_rate": 16000,
+            "voice": "matthew"
+        },
+        "inference": {
+            "max_tokens": 2048,
+            "temperature": 0.7
+        }
+    }
+)
+
+# Configure I/O devices
+audio_io = BidiAudioIO(
+    input_device_index=0,  # Specific microphone
+    output_device_index=1,  # Specific speaker
+    input_buffer_size=10,
+    output_buffer_size=10
+)
+```
+
+**Event Types:**
+
+The bidirectional streaming system uses a rich event model:
+
+- **Input Events**: `BidiTextInputEvent`, `BidiAudioInputEvent`, `BidiImageInputEvent`
+- **Output Events**: `BidiAudioStreamEvent`, `BidiTranscriptStreamEvent`, `BidiInterruptionEvent`, `BidiUsageEvent`, `ToolUseStreamEvent`
+- **Lifecycle Events**: `BidiConnectionStartEvent`, `BidiResponseStartEvent`, `BidiResponseCompleteEvent`, `BidiConnectionCloseEvent`
+
+All events are strongly typed and JSON-serializable for easy integration with web applications and logging systems.
+
 ### Example tools
 
 Strands offers an optional strands-agents-tools package with pre-built tools for quick experimentation:

From 185db15972c6998760f0b7de8fac687e939bf398 Mon Sep 17 00:00:00 2001
From: Rachit Mehta
Date: Tue, 2 Dec 2025 11:37:27 -0500
Subject: [PATCH 2/5] address comments

---
 README.md | 67 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 32 insertions(+), 35 deletions(-)

diff --git a/README.md b/README.md
index b3887de10..5038b8661 100644
--- a/README.md
+++ b/README.md
@@ -184,11 +184,22 @@ Built-in providers:
 
 Custom providers can be implemented using [Custom Providers](https://strandsagents.com/latest/user-guide/concepts/model-providers/custom_model_provider/)
 
-### Bidirectional Streaming
+### Example tools
 
-> **⚠️ Experimental Feature**: Bidirectional streaming is currently in experimental status. APIs may change in future releases as we refine the feature based on user feedback and evolving model capabilities.
+Strands offers an optional strands-agents-tools package with pre-built tools for quick experimentation:
+
+```python
+from strands import Agent
+from strands_tools import calculator
+agent = Agent(tools=[calculator])
+agent("What is the square root of 1764")
+```
+
+It's also available on GitHub via [strands-agents/tools](https://github.com/strands-agents/tools).
 
-Build real-time voice and audio conversations with persistent streaming connections. Unlike traditional request-response patterns, bidirectional streaming maintains long-running conversations where users can interrupt, provide continuous input, and receive real-time audio responses.
+### [Bidirectional Streaming](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/quickstart)
+
+> **⚠️ Experimental Feature**: Bidirectional streaming is currently in experimental status. APIs may change in future releases as we refine the feature based on user feedback and evolving model capabilities.
 
 **Key Features:**
 - Real-time audio input/output streaming
@@ -205,24 +216,31 @@ Build real-time voice and audio conversations with persistent streaming connecti
 **Quick Example:**
 
 ```python
+import asyncio
 from strands.experimental.bidi import BidiAgent
 from strands.experimental.bidi.models import BidiNovaSonicModel
 from strands.experimental.bidi.io import BidiAudioIO, BidiTextIO
+from strands.experimental.bidi.tools import stop_conversation
 from strands_tools import calculator
 
-# Create bidirectional agent with audio model
-model = BidiNovaSonicModel()
-agent = BidiAgent(model=model, tools=[calculator])
+async def main():
+    # Create bidirectional agent with audio model
+    model = BidiNovaSonicModel()
+    agent = BidiAgent(model=model, tools=[calculator, stop_conversation])
 
-# Setup audio and text I/O
-audio_io = BidiAudioIO()
-text_io = BidiTextIO()
+    # Setup audio and text I/O
+    audio_io = BidiAudioIO()
+    text_io = BidiTextIO()
 
-# Run with real-time audio streaming
-await agent.run(
-    inputs=[audio_io.input()],
-    outputs=[audio_io.output(), text_io.output()]
-)
+    # Run with real-time audio streaming
+    # Say "stop conversation" to gracefully end the conversation
+    await agent.run(
+        inputs=[audio_io.input()],
+        outputs=[audio_io.output(), text_io.output()]
+    )
+
+if __name__ == "__main__":
+    asyncio.run(main())
 ```
 
 **Configuration Options:**
@@ -252,29 +270,8 @@ audio_io = BidiAudioIO(
 )
 ```
 
-**Event Types:**
-
-The bidirectional streaming system uses a rich event model:
-
-- **Input Events**: `BidiTextInputEvent`, `BidiAudioInputEvent`, `BidiImageInputEvent`
-- **Output Events**: `BidiAudioStreamEvent`, `BidiTranscriptStreamEvent`, `BidiInterruptionEvent`, `BidiUsageEvent`, `ToolUseStreamEvent`
-- **Lifecycle Events**: `BidiConnectionStartEvent`, `BidiResponseStartEvent`, `BidiResponseCompleteEvent`, `BidiConnectionCloseEvent`
-
 All events are strongly typed and JSON-serializable for easy integration with web applications and logging systems.
 
-### Example tools
-
-Strands offers an optional strands-agents-tools package with pre-built tools for quick experimentation:
-
-```python
-from strands import Agent
-from strands_tools import calculator
-agent = Agent(tools=[calculator])
-agent("What is the square root of 1764")
-```
-
-It's also available on GitHub via [strands-agents/tools](https://github.com/strands-agents/tools).
-
 ## Documentation
 
 For detailed guidance & examples, explore our documentation:

From f9f0e2d223570c1c82c2f7b925a90fbb0448137f Mon Sep 17 00:00:00 2001
From: Rachit Mehta
Date: Tue, 2 Dec 2025 11:38:18 -0500
Subject: [PATCH 3/5] address comments

---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 5038b8661..d241a5b54 100644
--- a/README.md
+++ b/README.md
@@ -270,8 +270,6 @@ audio_io = BidiAudioIO(
 )
 ```
 
-All events are strongly typed and JSON-serializable for easy integration with web applications and logging systems.
-
 ## Documentation
 
 For detailed guidance & examples, explore our documentation:

From 6cbac512c1384dc002bd3edcb7a028cb1eebf580 Mon Sep 17 00:00:00 2001
From: Rachit Mehta
Date: Tue, 2 Dec 2025 11:41:41 -0500
Subject: [PATCH 4/5] address comments

---
 README.md | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index d241a5b54..7af8af333 100644
--- a/README.md
+++ b/README.md
@@ -197,16 +197,11 @@ agent("What is the square root of 1764")
 
 It's also available on GitHub via [strands-agents/tools](https://github.com/strands-agents/tools).
 
-### [Bidirectional Streaming](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/quickstart)
+### Bidirectional Streaming
 
 > **⚠️ Experimental Feature**: Bidirectional streaming is currently in experimental status. APIs may change in future releases as we refine the feature based on user feedback and evolving model capabilities.
 
-**Key Features:**
-- Real-time audio input/output streaming
-- Automatic interruption detection
-- Concurrent tool execution during conversations
-- Support for text, audio, and image inputs
-- Provider-agnostic event system
+Build real-time voice and audio conversations with persistent streaming connections. Unlike traditional request-response patterns, bidirectional streaming maintains long-running conversations where users can interrupt, provide continuous input, and receive real-time audio responses. Get started with your first BidiAgent by following the [Quickstart]((https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/quickstart)) guide.
 
 **Supported Model Providers:**
 - Amazon Nova Sonic (`amazon.nova-sonic-v1:0`)

From 873da65a6e9c80666750de0780b48a7eaf3252cb Mon Sep 17 00:00:00 2001
From: Rachit Mehta
Date: Tue, 2 Dec 2025 11:44:56 -0500
Subject: [PATCH 5/5] address comments

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7af8af333..e7d1b2a7e 100644
--- a/README.md
+++ b/README.md
@@ -201,7 +201,7 @@ It's also available on GitHub via [strands-agents/tools](https://github.com/stra
 
 > **⚠️ Experimental Feature**: Bidirectional streaming is currently in experimental status. APIs may change in future releases as we refine the feature based on user feedback and evolving model capabilities.
 
-Build real-time voice and audio conversations with persistent streaming connections. Unlike traditional request-response patterns, bidirectional streaming maintains long-running conversations where users can interrupt, provide continuous input, and receive real-time audio responses. Get started with your first BidiAgent by following the [Quickstart]((https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/quickstart)) guide.
+Build real-time voice and audio conversations with persistent streaming connections. Unlike traditional request-response patterns, bidirectional streaming maintains long-running conversations where users can interrupt, provide continuous input, and receive real-time audio responses. Get started with your first BidiAgent by following the [Quickstart](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/quickstart) guide.
 
 **Supported Model Providers:**
 - Amazon Nova Sonic (`amazon.nova-sonic-v1:0`)