m1guelpf/swift-realtime-openai

A modern Swift SDK for OpenAI's Realtime API

This library provides a simple interface for implementing multi-modal conversations using OpenAI's new Realtime API.

It can automatically record the user's microphone and play back the assistant's responses, and it also gives you a transparent layer over the API for advanced use cases.

Installation

Swift Package Manager

The Swift Package Manager lets developers easily integrate packages into their Xcode projects and other packages, and is fully integrated into the Swift compiler.

SPM Through Xcode Project

SPM Through Xcode Package

Once you have your Swift package set up, add the repository URL to the dependencies value of your Package.swift file.

dependencies: [
    .package(url: "https://github.com/m1guelpf/swift-realtime-openai.git", .branch("main"))
]
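
If you're depending on the library from another Swift package rather than an Xcode project, you'll also need to reference the product from your target. A minimal sketch, assuming the product is named OpenAI to match the import OpenAI statements used below:

targets: [
    .target(
        name: "MyApp", // your target's name
        dependencies: [
            // Product name assumed from the `import OpenAI` statements in this README
            .product(name: "OpenAI", package: "swift-realtime-openai"),
        ]
    )
]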

Getting started πŸš€

You can build an iMessage-like app with built-in AI chat in less than 60 lines of code (UI included!):

import OpenAI
import SwiftUI

struct ContentView: View {
	@State private var newMessage: String = ""
	@State private var conversation = Conversation(authToken: OPENAI_KEY)

	var messages: [Item.Message] {
		conversation.entries.compactMap { switch $0 {
			case let .message(message): return message
			default: return nil
		} }
	}

	var body: some View {
		VStack(spacing: 0) {
			ScrollView {
				VStack(spacing: 12) {
					ForEach(messages, id: \.id) { message in
						MessageBubble(message: message)
					}
				}
				.padding()
			}

			HStack(spacing: 12) {
				HStack {
					TextField("Chat", text: $newMessage, onCommit: { sendMessage() })
						.frame(height: 40)
						.submitLabel(.send)

					if newMessage != "" {
						Button(action: sendMessage) {
							Image(systemName: "arrow.up.circle.fill")
								.resizable()
								.aspectRatio(contentMode: .fill)
								.frame(width: 28, height: 28)
								.foregroundStyle(.white, .blue)
						}
					}
				}
				.padding(.leading)
				.padding(.trailing, 6)
				.overlay(RoundedRectangle(cornerRadius: 20).stroke(.quaternary, lineWidth: 1))
			}
			.padding()
		}
		.navigationTitle("Chat")
		.navigationBarTitleDisplayMode(.inline)
		.onAppear { try! conversation.startHandlingVoice() }
	}

	func sendMessage() {
		guard newMessage != "" else { return }

		Task {
			try await conversation.send(from: .user, text: newMessage)
			newMessage = ""
		}
	}
}

Or, if you just want a simple app that lets the user talk and the AI respond:

import OpenAI
import SwiftUI

struct ContentView: View {
	@State private var conversation = Conversation(authToken: OPENAI_KEY)

	var body: some View {
		Text("Say something!")
			.onAppear { try! conversation.startListening() }
	}
}
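
Note: capturing the microphone on iOS requires adding an NSMicrophoneUsageDescription entry to your app's Info.plist, as with any app that records audio.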

Features

  • A simple interface for interacting with the API directly
  • A higher-level Conversation wrapper that manages the conversation for you
  • Optional recording of the user's mic and streaming it to the API
  • Optional playback of model responses as they stream in
  • Support for interrupting the model
  • WebSocket and WebRTC support

Architecture

Conversation

The Conversation class provides a high-level interface for managing a conversation with the model. It wraps the RealtimeAPI class and handles the details of sending and receiving messages, as well as managing the conversation history. It can optionally also handle recording the user's mic and sending it to the API, as well as playing model responses as they stream in.

Reading messages

You can access the messages in the conversation through the messages property. Note that this won't include function calls and their responses, only the messages exchanged between the user and the model. To access the full conversation history, use the entries property. For example:

ScrollView {
    ScrollViewReader { scrollView in
        VStack(spacing: 12) {
            ForEach(conversation.messages, id: \.id) { message in
                MessageBubble(message: message).id(message.id)
            }
        }
        .onReceive(conversation.messages.publisher) { _ in
            withAnimation { scrollView.scrollTo(conversation.messages.last?.id, anchor: .center) }
        }
    }
}

Customizing the session

You can customize the current session using the setSession(_: Session) or updateSession(withChanges: (inout Session) -> Void) methods. Note that they require a session to have already been established, so it's recommended you call them from a whenConnected(_: @Sendable () async throws -> Void) callback, or await waitForConnection() first. For example:

try await conversation.whenConnected {
    try await conversation.updateSession { session in
        // update system prompt
        session.instructions = "You are a helpful assistant."

        // enable transcription of users' voice messages
        session.inputAudioTranscription = Session.InputAudioTranscription()

        // ...
    }
}

Handling voice conversations

The Conversation class can automatically handle 2-way voice conversations. Calling startListening() starts capturing the user's voice, streaming it to the model, and playing back the model's responses. Calling stopListening() stops listening but continues playing back responses.

If you just want to play model responses, call startHandlingVoice(). To stop both listening and playing back responses, call stopHandlingVoice().
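
As a rough sketch of how these methods fit together (assuming a SwiftUI view holding the conversation from the examples above, plus a hypothetical isListening flag):

@State private var isListening = false

var micButton: some View {
	Button(isListening ? "Stop listening" : "Start listening") {
		if isListening {
			conversation.stopListening() // mic off, responses keep playing
		} else {
			try? conversation.startListening() // mic on + response playback
		}
		isListening.toggle()
	}
}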

Manually sending messages

To send a text message, call the send(from: Item.ItemRole, text: String, response: Response.Config? = nil) method, providing the role of the sender (.user, .assistant, or .system) and the contents of the message. You can optionally provide a Response.Config object to customize the response, such as enabling or disabling function calls.

To manually send an audio message (or part of one), call the send(audioDelta: Data, commit: Bool = false) method with a valid audio chunk. If commit is true, the model will consider the message finished and begin responding to it. Otherwise, it may wait for more audio, depending on your Session.turnDetection settings.
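
For example (a minimal sketch; audioChunk and finalChunk stand in for your own audio pipeline):

// Send a text message as the user and let the model respond
try await conversation.send(from: .user, text: "What's the weather like today?")

// Stream audio manually; committing the final chunk marks the message as finished
try await conversation.send(audioDelta: audioChunk)
try await conversation.send(audioDelta: finalChunk, commit: true)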

Manually sending events

To manually send an event to the API, use the send(event: RealtimeAPI.ClientEvent) method. Note that this bypasses some of the logic in the Conversation class such as handling interrupts, so you should prefer to use other methods whenever possible.
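
For example, to manually ask the model to generate a response (using the same ClientEvent cases shown in the RealtimeAPI section below):

// Bypasses Conversation's interrupt handling; prefer the higher-level send methods when you can
try await conversation.send(event: .createResponse())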

RealtimeAPI

To interact with the API directly, create a new instance of RealtimeAPI, providing one of the available connectors. There are helper methods that let you create an instance from an auth token or a URLRequest, like so:

let api = RealtimeAPI.webSocket(authToken: YOUR_OPENAI_API_KEY) // model defaults to "gpt-4o-realtime-preview"; or RealtimeAPI.webSocket(connectingTo: urlRequest)
let api = RealtimeAPI.webRTC(authToken: YOUR_OPENAI_API_KEY) // model defaults to "gpt-4o-realtime-preview"; or RealtimeAPI.webRTC(connectingTo: urlRequest)

You can listen for new events through the events property, like so:

for try await event in api.events {
    switch event {
        case let .sessionCreated(event):
            print(event.session.id)
        default:
            break // handle other server events as needed
    }
}

To send an event to the API, call the send method with a ClientEvent instance:

try await api.send(event: .updateSession(session))
try await api.send(event: .appendInputAudioBuffer(encoding: audioData))
try await api.send(event: .createResponse())

License

This project is licensed under the MIT License - see the LICENSE file for details.
