Merged
5 changes: 5 additions & 0 deletions .changeset/dull-beans-arrive.md
@@ -0,0 +1,5 @@
---
'@openai/agents-realtime': patch
---

Improve the types of turnDetection and inputAudioTranscription in RealtimeAgent configuration
4 changes: 2 additions & 2 deletions examples/docs/voice-agents/turnDetection.ts
@@ -7,8 +7,8 @@ const session = new RealtimeSession(agent, {
turnDetection: {
type: 'semantic_vad',
eagerness: 'medium',
-create_response: true,
Member Author: changed the example, but either works!

-interrupt_response: true,
+createResponse: true,
+interruptResponse: true,
},
},
});
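The example now spells the options in camelCase, and the review comment notes that the earlier snake_case spelling still works. A minimal sketch of why both compile, using trimmed local copies of the two shapes from `clientMessages.ts`. The `TurnDetection*` aliases here are local stand-ins that mirror the diff, not imports from `@openai/agents-realtime`, and `some_future_option` is a made-up key for illustration.

```typescript
// Local copies of the two turn-detection shapes from clientMessages.ts,
// trimmed to the keys the example uses; redeclared so this compiles alone.
type TurnDetectionAsIs = {
  type?: 'semantic_vad' | 'server_vad';
  create_response?: boolean;
  eagerness?: 'auto' | 'low' | 'medium' | 'high';
  interrupt_response?: boolean;
};

type TurnDetectionCamelCase = {
  type?: 'semantic_vad' | 'server_vad';
  createResponse?: boolean;
  eagerness?: 'auto' | 'low' | 'medium' | 'high';
  interruptResponse?: boolean;
};

// Either shape, plus arbitrary extra keys (the escape hatch discussed in review).
type TurnDetection = (TurnDetectionAsIs | TurnDetectionCamelCase) &
  Record<string, any>;

// The updated example (camelCase)...
const camelCase: TurnDetection = {
  type: 'semantic_vad',
  eagerness: 'medium',
  createResponse: true,
  interruptResponse: true,
};

// ...and the previous example (snake_case) both type-check.
const snakeCase: TurnDetection = {
  type: 'semantic_vad',
  eagerness: 'medium',
  create_response: true,
  interrupt_response: true,
};

// Keys outside both shapes are accepted through Record<string, any>.
// `some_future_option` is hypothetical, for illustration only.
const withExtra: TurnDetection = {
  type: 'server_vad',
  some_future_option: 42,
};
```

Because `Record<string, any>` is intersected with the union, unknown keys are not rejected as excess properties, which is what lets a custom transport layer pass through extra session settings.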
41 changes: 39 additions & 2 deletions packages/agents-realtime/src/clientMessages.ts
@@ -33,15 +33,52 @@ export type RealtimeTracingConfig =
}
| 'auto';

export type RealtimeInputAudioTranscriptionConfig = {
language?: string;
model?:
| 'gpt-4o-transcribe'
| 'gpt-4o-mini-transcribe'
| 'whisper-1'
| (string & {});
prompt?: string;
};

export type RealtimeTurnDetectionConfigAsIs = {
type?: 'semantic_vad' | 'server_vad';
create_response?: boolean;
eagerness?: 'auto' | 'low' | 'medium' | 'high';
interrupt_response?: boolean;
prefix_padding_ms?: number;
silence_duration_ms?: number;
threshold?: number;
};

// The Realtime API accepts snake_cased keys, so when this type is used, the SDK converts the keys to snake_case before passing them to the API
export type RealtimeTurnDetectionConfigCamelCase = {
type?: 'semantic_vad' | 'server_vad';
createResponse?: boolean;
eagerness?: 'auto' | 'low' | 'medium' | 'high';
interruptResponse?: boolean;
prefixPaddingMs?: number;
silenceDurationMs?: number;
threshold?: number;
};
Collaborator: Can we make this also accept other properties inside these two settings? I'm thinking of how, theoretically, you could roll your own Realtime Transport Layer right now with other session config. But it's also fine to guide people to providerData for that and have them override this entire property.

Member Author: Thanks, updated.


export type RealtimeTurnDetectionConfig = (
| RealtimeTurnDetectionConfigAsIs
| RealtimeTurnDetectionConfigCamelCase
) &
Record<string, any>;

export type RealtimeSessionConfig = {
model: string;
instructions: string;
modalities: ('text' | 'audio')[];
voice: string;
inputAudioFormat: RealtimeAudioFormat;
outputAudioFormat: RealtimeAudioFormat;
-inputAudioTranscription: Record<string, any>;
-turnDetection: Record<string, any>;
+inputAudioTranscription: RealtimeInputAudioTranscriptionConfig;
+turnDetection: RealtimeTurnDetectionConfig;
toolChoice: ModelSettingsToolChoice;
tools: FunctionToolDefinition[];
tracing?: RealtimeTracingConfig | null;
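The `model` field of `RealtimeInputAudioTranscriptionConfig` lists known transcription models but ends in `(string & {})`. This is a common TypeScript idiom: with a plain `| string`, the compiler reduces the whole union to `string` and editors stop suggesting the literal names, whereas `string & {}` is not merged away, so the suggestions survive while any other model string still type-checks. A self-contained sketch; `TranscriptionConfig` is a local mirror of the type in the diff, and `KNOWN_MODELS` and `isKnownModel` are hypothetical helpers, not SDK API.

```typescript
// Local mirror of RealtimeInputAudioTranscriptionConfig from clientMessages.ts.
type TranscriptionConfig = {
  language?: string;
  model?:
    | 'gpt-4o-transcribe'
    | 'gpt-4o-mini-transcribe'
    | 'whisper-1'
    | (string & {}); // keeps the literals above visible to autocomplete
  prompt?: string;
};

// Hypothetical list of the literal names spelled out in the type.
const KNOWN_MODELS = [
  'gpt-4o-transcribe',
  'gpt-4o-mini-transcribe',
  'whisper-1',
] as const;

// Hypothetical helper: reports whether a configured model is one of the
// literals; any other string is still a valid TranscriptionConfig['model'].
function isKnownModel(model: TranscriptionConfig['model']): boolean {
  return (
    model !== undefined && (KNOWN_MODELS as readonly string[]).includes(model)
  );
}

// A listed model name, suggested by the editor.
const known: TranscriptionConfig = {
  model: 'gpt-4o-mini-transcribe',
  language: 'en',
};

// An arbitrary model string (made up here) is accepted as well.
const custom: TranscriptionConfig = { model: 'my-custom-transcriber' };
```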
46 changes: 45 additions & 1 deletion packages/agents-realtime/src/openaiRealtimeBase.ts
@@ -5,6 +5,8 @@ import {
RealtimeClientMessage,
RealtimeSessionConfig,
RealtimeTracingConfig,
RealtimeTurnDetectionConfig,
RealtimeTurnDetectionConfigAsIs,
RealtimeUserInput,
} from './clientMessages';
import {
@@ -390,7 +392,7 @@ export abstract class OpenAIRealtimeBase
config.inputAudioTranscription ??
DEFAULT_OPENAI_REALTIME_SESSION_CONFIG.inputAudioTranscription,
turn_detection:
-config.turnDetection ??
+OpenAIRealtimeBase.buildTurnDetectionConfig(config.turnDetection) ??
DEFAULT_OPENAI_REALTIME_SESSION_CONFIG.turnDetection,
tool_choice:
config.toolChoice ?? DEFAULT_OPENAI_REALTIME_SESSION_CONFIG.toolChoice,
@@ -406,6 +408,48 @@ export abstract class OpenAIRealtimeBase
return sessionData;
}

private static buildTurnDetectionConfig(
c: RealtimeTurnDetectionConfig | undefined,
): RealtimeTurnDetectionConfigAsIs | undefined {
if (typeof c === 'undefined') {
return undefined;
}
const {
type,
createResponse,
create_response,
eagerness,
interruptResponse,
interrupt_response,
prefixPaddingMs,
prefix_padding_ms,
silenceDurationMs,
silence_duration_ms,
threshold,
...rest
} = c;

const config: RealtimeTurnDetectionConfigAsIs & Record<string, any> = {
type,
create_response: createResponse ? createResponse : create_response,
eagerness,
interrupt_response: interruptResponse
? interruptResponse
: interrupt_response,
prefix_padding_ms: prefixPaddingMs ? prefixPaddingMs : prefix_padding_ms,
silence_duration_ms: silenceDurationMs
? silenceDurationMs
: silence_duration_ms,
threshold,
...rest,
};
// Remove undefined values from the config
Member Author: When I verified the behavior, undefined values could affect connection establishment, so I added this logic. But if my observation is wrong or I'm missing something, please feel free to adjust this part.

Object.keys(config).forEach((key) => {
if (config[key] === undefined) delete config[key];
});
return Object.keys(config).length > 0 ? config : undefined;
}
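`buildTurnDetectionConfig` destructures both spellings of each option, prefers the camelCase one, writes it under the snake_case key the Realtime API expects, passes everything else through via `...rest`, and drops `undefined` entries. The sketch below reproduces that flow with local stand-in types and a hypothetical `toWireTurnDetection` name. One deliberate difference: it uses `??` instead of the ternaries in the diff, because a check like `createResponse ? createResponse : create_response` treats an explicit `false` (or `0` for the millisecond fields) as missing.

```typescript
// Local mirrors of the two shapes from clientMessages.ts.
type TurnDetectionAsIs = {
  type?: 'semantic_vad' | 'server_vad';
  create_response?: boolean;
  eagerness?: 'auto' | 'low' | 'medium' | 'high';
  interrupt_response?: boolean;
  prefix_padding_ms?: number;
  silence_duration_ms?: number;
  threshold?: number;
};

type TurnDetectionCamelCase = {
  type?: 'semantic_vad' | 'server_vad';
  createResponse?: boolean;
  eagerness?: 'auto' | 'low' | 'medium' | 'high';
  interruptResponse?: boolean;
  prefixPaddingMs?: number;
  silenceDurationMs?: number;
  threshold?: number;
};

type TurnDetection = (TurnDetectionAsIs | TurnDetectionCamelCase) &
  Record<string, any>;

// Hypothetical name; mirrors buildTurnDetectionConfig but uses `??` so that
// explicit `false` and `0` values are preserved.
function toWireTurnDetection(
  c: TurnDetection | undefined,
): (TurnDetectionAsIs & Record<string, any>) | undefined {
  if (c === undefined) return undefined;
  const {
    createResponse,
    create_response,
    interruptResponse,
    interrupt_response,
    prefixPaddingMs,
    prefix_padding_ms,
    silenceDurationMs,
    silence_duration_ms,
    ...rest // type, eagerness, threshold, and any extra keys pass through
  } = c;

  const out: TurnDetectionAsIs & Record<string, any> = {
    ...rest,
    // camelCase wins when both spellings are present.
    create_response: createResponse ?? create_response,
    interrupt_response: interruptResponse ?? interrupt_response,
    prefix_padding_ms: prefixPaddingMs ?? prefix_padding_ms,
    silence_duration_ms: silenceDurationMs ?? silence_duration_ms,
  };

  // Drop keys whose value is undefined, as the diff does.
  for (const key of Object.keys(out)) {
    if (out[key] === undefined) delete out[key];
  }
  return Object.keys(out).length > 0 ? out : undefined;
}

const wire = toWireTurnDetection({
  type: 'server_vad',
  createResponse: false,
  silenceDurationMs: 500,
});
// → { type: 'server_vad', create_response: false, silence_duration_ms: 500 }
```

With the truthiness-based version, the same input would lose `create_response: false`, since the `undefined` fallback is then deleted.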

/**
* Sets the internal tracing config. This is used to track the tracing config that has been set
* during the session.create event.
2 changes: 1 addition & 1 deletion packages/agents-realtime/src/realtimeSession.ts
@@ -519,7 +519,7 @@ export class RealtimeSession<
this.#transport.on('turn_done', (event) => {
const item = event.response.output[event.response.output.length - 1];
const textOutput = getLastTextFromAudioOutputMessage(item) ?? '';
-const itemId = item.id ?? '';
+const itemId = item?.id ?? '';
Member Author: This is an unrelated existing bug I found while testing.

this.emit('agent_end', this.#context, this.#currentAgent, textOutput);
this.#currentAgent.emit('agent_end', this.#context, textOutput);

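The `item?.id` change guards a case such as an empty `event.response.output` (presumably what the tests hit; the comment does not say). Indexing at `length - 1` then yields `undefined`, and unless `noUncheckedIndexedAccess` is enabled TypeScript still types the result as the element type, so `item.id` compiles but throws at runtime. A reduced sketch; `OutputItem`, `TurnDoneEvent`, and `lastOutputItemId` are hypothetical simplifications, not the SDK's real types.

```typescript
// Hypothetical, reduced stand-ins for the shapes the turn_done handler reads;
// the SDK's real event and item types are richer than this.
type OutputItem = { id?: string };
type TurnDoneEvent = { response: { output: OutputItem[] } };

function lastOutputItemId(event: TurnDoneEvent): string {
  const output = event.response.output;
  // Without `noUncheckedIndexedAccess`, this is typed as OutputItem, but it
  // is `undefined` at runtime when `output` is empty.
  const item = output[output.length - 1];
  // `item.id` would throw a TypeError in that case; `item?.id` yields
  // undefined instead, and `?? ''` falls back to an empty string.
  return item?.id ?? '';
}

lastOutputItemId({ response: { output: [] } }); // ''
lastOutputItemId({ response: { output: [{ id: 'a' }, { id: 'b' }] } }); // 'b'
```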