

@jamesrochabrun
Owner

Summary

This PR adds MCP (Model Context Protocol) server support and image input to the SwiftOpenAI Realtime API implementation.

Changes Included

1. Image Input Support (ea86a16)

  • Convert Content from a struct to an enum that supports both text and image inputs
  • Add an image case using the base64 data URL format (data:image/{format};base64,{bytes})
  • Add a flexible Item initializer that accepts multiple content types
  • Implement custom Codable encoding so each content case serializes correctly
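As a rough illustration of the text/image Content enum described above, here is a minimal sketch of how such an enum and a multi-content Item might encode to JSON. The type names mirror the PR, but the fields, the Item initializer, and the wire keys ("input_text", "input_image", "image_url") are assumptions for illustration, not SwiftOpenAI's actual API.

```swift
import Foundation

// Hypothetical stand-in for the PR's Content enum. Case names and the JSON keys
// ("input_text", "input_image", "image_url") are assumptions, not the library's API.
enum Content: Encodable {
    case text(String)
    case image(format: String, data: Data)

    private enum CodingKeys: String, CodingKey {
        case type
        case text
        case imageURL = "image_url"
    }

    // Custom encoding: each case writes its own "type" discriminator and payload.
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .text(let text):
            try container.encode("input_text", forKey: .type)
            try container.encode(text, forKey: .text)
        case .image(let format, let data):
            // Build the data:image/{format};base64,{bytes} URL from raw image bytes.
            let url = "data:image/\(format);base64,\(data.base64EncodedString())"
            try container.encode("input_image", forKey: .type)
            try container.encode(url, forKey: .imageURL)
        }
    }
}

// Hypothetical message item; the initializer accepts any mix of content parts.
struct Item: Encodable {
    let type: String
    let role: String
    let content: [Content]

    init(role: String = "user", content: [Content]) {
        self.type = "message"
        self.role = role
        self.content = content
    }
}
```

Modeling Content as an enum means every encoding path is covered by an exhaustive switch, so adding another input kind later forces the encoder to be updated as well.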

2. MCP Server Configuration (12a8dc2)

  • Add a new RealtimeTool enum that supports both function tools and MCP tools
  • Rename the Tool struct to FunctionTool to avoid naming conflicts
  • Integrate with Tool.MCPTool from the shared Tool enum
  • Enable MCP server configuration in Realtime API sessions
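A minimal sketch of the function/MCP split, assuming simplified stand-in types: the FunctionTool and MCPTool structs here are hypothetical (the real FunctionTool and Tool.MCPTool may have different fields and initializers), and the "server_label", "server_url", and "require_approval" keys are assumed to follow OpenAI's MCP tool shape rather than taken from the PR. The enum writes a "type" discriminator and then lets the wrapped tool add its own fields to the same JSON object.

```swift
import Foundation

// Hypothetical, simplified function tool (the real one likely also carries parameters).
struct FunctionTool: Encodable {
    let name: String
    let description: String
}

// Hypothetical, simplified MCP tool; the snake_case keys are assumptions.
struct MCPTool: Encodable {
    let serverLabel: String
    let serverURL: String
    let requireApproval: String

    enum CodingKeys: String, CodingKey {
        case serverLabel = "server_label"
        case serverURL = "server_url"
        case requireApproval = "require_approval"
    }
}

// One entry in a session's tool list: either a function tool or an MCP server.
enum RealtimeTool: Encodable {
    case function(FunctionTool)
    case mcp(MCPTool)

    private enum TypeKey: String, CodingKey { case type }

    func encode(to encoder: Encoder) throws {
        // Write the discriminator, then let the wrapped tool encode into the same object.
        var container = encoder.container(keyedBy: TypeKey.self)
        switch self {
        case .function(let tool):
            try container.encode("function", forKey: .type)
            try tool.encode(to: encoder)
        case .mcp(let tool):
            try container.encode("mcp", forKey: .type)
            try tool.encode(to: encoder)
        }
    }
}
```

Wrapping both kinds in one enum lets a session configuration hold a single heterogeneous [RealtimeTool] array while each tool type keeps its own Codable definition.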

3. MCP Message Types and Error Logging (da4200f)

  • Add three new message cases to OpenAIRealtimeMessage:
    • mcpListToolsInProgress
    • mcpListToolsCompleted([String: Any])
    • mcpListToolsFailed(String?)
  • Add MCP error logging in OpenAIRealtimeSession
  • Extract error details (message, code, reason) from both nested and top-level fields
  • Log the full JSON payload to help debug authentication and configuration issues
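The sketch below illustrates how the three new message cases and the nested-then-top-level error lookup could fit together. It assumes the server sends events typed "mcp_list_tools.in_progress", "mcp_list_tools.completed", and "mcp_list_tools.failed"; MCPMessage, mcpMessage(from:), and extractMCPError(from:) are hypothetical names, not code from OpenAIRealtimeSession or OpenAIRealtimeMessage.

```swift
import Foundation

// Hypothetical subset of OpenAIRealtimeMessage covering only the three new cases.
enum MCPMessage {
    case mcpListToolsInProgress
    case mcpListToolsCompleted([String: Any])
    case mcpListToolsFailed(String?)
}

// Hypothetical helper: checks a nested "error" object first, then top-level
// fields, for the first string-valued message, code, or reason.
func extractMCPError(from json: [String: Any]) -> String? {
    let keys = ["message", "code", "reason"]
    if let nested = json["error"] as? [String: Any] {
        for key in keys {
            if let value = nested[key] as? String { return value }
        }
    }
    for key in keys {
        if let value = json[key] as? String { return value }
    }
    return nil
}

// Hypothetical mapper from a decoded server event to a message case.
// The event type strings are assumptions.
func mcpMessage(from json: [String: Any]) -> MCPMessage? {
    guard let type = json["type"] as? String else { return nil }
    switch type {
    case "mcp_list_tools.in_progress":
        return .mcpListToolsInProgress
    case "mcp_list_tools.completed":
        return .mcpListToolsCompleted(json)
    case "mcp_list_tools.failed":
        // Print the full payload so authentication or configuration problems are visible.
        if JSONSerialization.isValidJSONObject(json),
           let data = try? JSONSerialization.data(withJSONObject: json, options: [.prettyPrinted]),
           let text = String(data: data, encoding: .utf8) {
            print("MCP list tools failed:\n\(text)")
        }
        return .mcpListToolsFailed(extractMCPError(from: json))
    default:
        return nil
    }
}
```

Checking the nested error object before top-level fields prefers the more specific server-provided message, while still surfacing something useful when a payload only carries a top-level reason or code.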

Files Modified

  • Sources/OpenAI/Public/Parameters/Realtime/OpenAIRealtimeConversationItemCreate.swift
  • Sources/OpenAI/Public/Parameters/Realtime/OpenAIRealtimeSessionConfiguration.swift
  • Sources/OpenAI/Private/Realtime/OpenAIRealtimeSession.swift
  • Sources/OpenAI/Public/ResponseModels/Realtime/OpenAIRealtimeMessage.swift

Testing

These changes have been tested with:

  • MCP server integration (GitHub Copilot, custom servers)
  • Image input with screenshots
  • Error handling for authentication failures
  • Realtime conversation flows with multiple content types

Breaking Changes

  • The Tool struct in OpenAIRealtimeSessionConfiguration is renamed to FunctionTool
  • This is a minor breaking change that affects only users who configure tools in Realtime sessions

jamesrochabrun and others added 4 commits November 14, 2025 11:20
- Extend Content type to support both text and image inputs
- Add image case with base64 data URL format
- Support data:image/{format};base64,{bytes} format
- Add flexible Item initializer for multi-content messages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add RealtimeTool enum supporting both function and MCP tools
- Rename Tool to FunctionTool to avoid naming conflicts
- Support Tool.MCPTool from shared Tool enum
- Enable MCP server integration in Realtime sessions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add mcpListToolsInProgress, mcpListToolsCompleted, mcpListToolsFailed message cases
- Implement comprehensive MCP error logging with full JSON payload inspection
- Extract error details from nested and top-level fields (message, code, reason)
- Add debug logging for MCP tool discovery lifecycle

This enables proper MCP (Model Context Protocol) server integration diagnostics
and helps identify authentication and configuration issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
jamesrochabrun merged commit 50bedb1 into main on Nov 15, 2025
3 checks passed