Realtime session config falls back to legacy format when voice is set

### Please read this first

- **Have you read the docs?** Yes – [Agents SDK docs](https://openai.github.io/openai-agents-js/)
- **Have you searched for related issues?** Yes – no existing report matched this behavior.

### Describe the bug

When a realtime session config includes the top-level `voice` field, the config converter flags the entire payload as legacy. This happens whether the user supplies `voice` directly or it is injected automatically because the `RealtimeAgent` was instantiated with `voice`. As a result, GA-specific audio settings (e.g. `audio.input.format`, `audio.output.format`, `audio.output.voice`) are discarded and the session falls back to legacy defaults (`audio/pcm` at 24 kHz). Simply calling `new RealtimeAgent({ voice: '...' })` is enough to reset deliberately chosen telephony formats such as `audio/pcmu`.

### Debug information

- Agents SDK version: `@openai/agents-realtime` v0.1.3
- Runtime environment: Node.js 22.16.0

### Repro steps

1. Add the following test file at `packages/agents-realtime/test/realtimeVoiceConfigRegression.test.ts`:

   ```ts
   import { describe, it, expect } from 'vitest';
   import { toNewSessionConfig } from '../src/clientMessages';
   import { RealtimeAgent } from '../src/realtimeAgent';
   import { RealtimeSession } from '../src/realtimeSession';
   import { OpenAIRealtimeBase } from '../src/openaiRealtimeBase';
   import type { RealtimeClientMessage } from '../src/clientMessages';

   const TELEPHONY_AUDIO_FORMAT = { type: 'audio/pcmu' as const };

   class CapturingTransport extends OpenAIRealtimeBase {
     status: 'connected' | 'disconnected' | 'connecting' | 'disconnecting' = 'disconnected';
     mergedConfig: any = null;
     events: RealtimeClientMessage[] = [];

     async connect(options: { initialSessionConfig?: any }) {
       this.mergedConfig = (this as any)._getMergedSessionConfig(options.initialSessionConfig ?? {});
     }

     sendEvent(event: RealtimeClientMessage) {
       this.events.push(event);
     }

     mute() {}
     close() {}
     interrupt() {}

     get muted() {
       return false;
     }
   }

   describe('Realtime session voice config regression', () => {
     it('drops GA audio formats when top-level voice is present', () => {
       const converted = toNewSessionConfig({
         voice: 'alloy',
         audio: {
           input: { format: TELEPHONY_AUDIO_FORMAT },
           output: { format: TELEPHONY_AUDIO_FORMAT },
         },
       });

       expect(converted.audio?.input?.format).toEqual(TELEPHONY_AUDIO_FORMAT);
       expect(converted.audio?.output?.format).toEqual(TELEPHONY_AUDIO_FORMAT);
     });

     it('resets audio formats when connecting a session for an agent with voice configured', async () => {
       const transport = new CapturingTransport();
       const agent = new RealtimeAgent({
         name: 'voice-agent',
         instructions: 'Respond cheerfully.',
         voice: 'alloy',
       });

       const session = new RealtimeSession(agent, {
         transport,
         model: 'gpt-realtime',
         config: {
           audio: {
             input: { format: TELEPHONY_AUDIO_FORMAT },
             output: {
               format: TELEPHONY_AUDIO_FORMAT,
               voice: 'marin',
             },
           },
         },
       });

       await session.connect({ apiKey: 'dummy-key' });

       expect(transport.mergedConfig?.audio?.input?.format).toEqual(TELEPHONY_AUDIO_FORMAT);
       expect(transport.mergedConfig?.audio?.output?.format).toEqual(TELEPHONY_AUDIO_FORMAT);
     });
   });
   ```

2. Run the test:

   ```bash
   pnpm test -- --run packages/agents-realtime/test/realtimeVoiceConfigRegression.test.ts
   ```

3. Observe the failure:

   ```
   AssertionError: expected { type: 'audio/pcm', rate: 24000 } to deeply equal { type: 'audio/pcmu' }
   ```

### Expected behavior

Configs that are otherwise GA-shaped should remain in GA form when `voice` is present, so that GA fields like `audio.output.voice` are preserved. Only configs containing legacy-only fields (e.g. `inputAudioFormat`) should trigger the legacy conversion path.
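
To make the expectation concrete, here is a rough sketch of the detection rule I have in mind. It is not taken from the SDK source: `looksLegacy`, `normalizeVoice`, and `LEGACY_ONLY_KEYS` are hypothetical names, and the key list beyond `inputAudioFormat` is an assumption. The idea is that `voice` alone never marks a config as legacy; it is folded into `audio.output.voice` instead, and an explicitly set `audio.output.voice` takes precedence.

```ts
// Hypothetical sketch only; none of these names come from the SDK source.
type SessionConfig = Record<string, unknown>;

// Keys assumed to appear only in the legacy config shape.
// `inputAudioFormat` is from this report; `outputAudioFormat` is an assumption.
const LEGACY_ONLY_KEYS = ['inputAudioFormat', 'outputAudioFormat'] as const;

// A config is treated as legacy only if it carries a legacy-only key.
// The shared top-level `voice` field is deliberately not part of this check.
function looksLegacy(config: SessionConfig): boolean {
  return LEGACY_ONLY_KEYS.some((key) => key in config);
}

// For GA-shaped configs, move top-level `voice` into `audio.output.voice`
// without touching the GA audio formats.
function normalizeVoice(config: SessionConfig): SessionConfig {
  if (looksLegacy(config)) {
    return config; // left untouched for the legacy conversion path
  }
  const { voice, ...rest } = config;
  if (voice === undefined) {
    return rest;
  }
  const audio = (rest.audio ?? {}) as {
    input?: unknown;
    output?: Record<string, unknown>;
  };
  return {
    ...rest,
    audio: {
      ...audio,
      // Spread order: an explicit audio.output.voice overrides top-level voice.
      output: { voice, ...(audio.output ?? {}) },
    },
  };
}

const normalized = normalizeVoice({
  voice: 'alloy',
  audio: {
    input: { format: { type: 'audio/pcmu' } },
    output: { format: { type: 'audio/pcmu' } },
  },
});
console.log(JSON.stringify(normalized, null, 2));
```

Letting the explicit `audio.output.voice` win is a design choice of this sketch, not something the current behavior confirms; the opposite precedence would also be defensible as long as the audio formats survive.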

