6 changes: 6 additions & 0 deletions .changeset/dull-ligers-bow.md
@@ -0,0 +1,6 @@
---
'firebase': minor
'@firebase/ai': minor
---

Add `sendTextRealtime()`, `sendAudioRealtime()`, and `sendVideoRealtime()` to the `LiveSession` class, and deprecate `sendMediaChunks()` and `sendMediaStream()`.
19 changes: 19 additions & 0 deletions common/api-review/ai.api.md
@@ -991,12 +991,18 @@ export class LiveSession {
constructor(webSocketHandler: WebSocketHandler, serverMessages: AsyncGenerator<unknown>);
close(): Promise<void>;
inConversation: boolean;
inVideoRecording: boolean;
isClosed: boolean;
receive(): AsyncGenerator<LiveServerContent | LiveServerToolCall | LiveServerToolCallCancellation>;
send(request: string | Array<string | Part>, turnComplete?: boolean): Promise<void>;
sendAudioRealtime(blob: GenerativeContentBlob): Promise<void>;
sendFunctionResponses(functionResponses: FunctionResponse[]): Promise<void>;
// @deprecated
sendMediaChunks(mediaChunks: GenerativeContentBlob[]): Promise<void>;
// @deprecated (undocumented)
sendMediaStream(mediaChunkStream: ReadableStream<GenerativeContentBlob>): Promise<void>;
sendTextRealtime(text: string): Promise<void>;
sendVideoRealtime(blob: GenerativeContentBlob): Promise<void>;
}

// @public
@@ -1279,6 +1285,14 @@ export interface StartChatParams extends BaseParams {
tools?: Tool[];
}

// @beta
export function startVideoRecording(liveSession: LiveSession, options?: StartVideoRecordingOptions): Promise<VideoRecordingController>;

// @beta
export interface StartVideoRecordingOptions {
videoSource?: 'camera' | 'screen';
}

// @public
export class StringSchema extends Schema {
constructor(schemaParams?: SchemaParams, enumValues?: string[]);
@@ -1390,6 +1404,11 @@ export interface VideoMetadata {
startOffset: string;
}

// @beta
export interface VideoRecordingController {
stop: () => Promise<void>;
}

// @beta
export interface VoiceConfig {
prebuiltVoiceConfig?: PrebuiltVoiceConfig;
4 changes: 4 additions & 0 deletions docs-devsite/_toc.yaml
@@ -194,6 +194,8 @@ toc:
path: /docs/reference/js/ai.startaudioconversationoptions.md
- title: StartChatParams
path: /docs/reference/js/ai.startchatparams.md
- title: StartVideoRecordingOptions
path: /docs/reference/js/ai.startvideorecordingoptions.md
- title: StringSchema
path: /docs/reference/js/ai.stringschema.md
- title: TextPart
@@ -216,6 +218,8 @@
path: /docs/reference/js/ai.vertexaibackend.md
- title: VideoMetadata
path: /docs/reference/js/ai.videometadata.md
- title: VideoRecordingController
path: /docs/reference/js/ai.videorecordingcontroller.md
- title: VoiceConfig
path: /docs/reference/js/ai.voiceconfig.md
- title: WebAttribution
148 changes: 144 additions & 4 deletions docs-devsite/ai.livesession.md
@@ -29,7 +29,8 @@ export declare class LiveSession

| Property | Modifiers | Type | Description |
| --- | --- | --- | --- |
| [inConversation](./ai.livesession.md#livesessioninconversation) | | boolean | <b><i>(Public Preview)</i></b> Indicates whether this Live session is being controlled by an <code>AudioConversationController</code>. |
| [inConversation](./ai.livesession.md#livesessioninconversation) | | boolean | <b><i>(Public Preview)</i></b> Indicates whether this Live session is being controlled by an [AudioConversationController](./ai.audioconversationcontroller.md#audioconversationcontroller_interface)<!-- -->. |
| [inVideoRecording](./ai.livesession.md#livesessioninvideorecording) | | boolean | <b><i>(Public Preview)</i></b> Indicates whether this Live session is being controlled by a [VideoRecordingController](./ai.videorecordingcontroller.md#videorecordingcontroller_interface)<!-- -->. |
| [isClosed](./ai.livesession.md#livesessionisclosed) | | boolean | <b><i>(Public Preview)</i></b> Indicates whether this Live session is closed. |

## Methods
@@ -39,23 +40,39 @@ export declare class LiveSession
| [close()](./ai.livesession.md#livesessionclose) | | <b><i>(Public Preview)</i></b> Closes this session. All methods on this session will throw an error once this resolves. |
| [receive()](./ai.livesession.md#livesessionreceive) | | <b><i>(Public Preview)</i></b> Yields messages received from the server. This can only be used by one consumer at a time. |
| [send(request, turnComplete)](./ai.livesession.md#livesessionsend) | | <b><i>(Public Preview)</i></b> Sends content to the server. |
| [sendAudioRealtime(blob)](./ai.livesession.md#livesessionsendaudiorealtime) | | <b><i>(Public Preview)</i></b> Sends audio data to the server in realtime. |
| [sendFunctionResponses(functionResponses)](./ai.livesession.md#livesessionsendfunctionresponses) | | <b><i>(Public Preview)</i></b> Sends function responses to the server. |
| [sendMediaChunks(mediaChunks)](./ai.livesession.md#livesessionsendmediachunks) | | <b><i>(Public Preview)</i></b> Sends realtime input to the server. |
| [sendMediaStream(mediaChunkStream)](./ai.livesession.md#livesessionsendmediastream) | | <b><i>(Public Preview)</i></b> Sends a stream of [GenerativeContentBlob](./ai.generativecontentblob.md#generativecontentblob_interface)<!-- -->. |
| [sendMediaStream(mediaChunkStream)](./ai.livesession.md#livesessionsendmediastream) | | <b><i>(Public Preview)</i></b> |
| [sendTextRealtime(text)](./ai.livesession.md#livesessionsendtextrealtime) | | <b><i>(Public Preview)</i></b> Sends text to the server in realtime. |
| [sendVideoRealtime(blob)](./ai.livesession.md#livesessionsendvideorealtime) | | <b><i>(Public Preview)</i></b> Sends video data to the server in realtime. |

## LiveSession.inConversation

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Indicates whether this Live session is being controlled by an `AudioConversationController`<!-- -->.
Indicates whether this Live session is being controlled by an [AudioConversationController](./ai.audioconversationcontroller.md#audioconversationcontroller_interface)<!-- -->.

<b>Signature:</b>

```typescript
inConversation: boolean;
```

## LiveSession.inVideoRecording

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Indicates whether this Live session is being controlled by a [VideoRecordingController](./ai.videorecordingcontroller.md#videorecordingcontroller_interface)<!-- -->.

<b>Signature:</b>

```typescript
inVideoRecording: boolean;
```

## LiveSession.isClosed

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
@@ -135,6 +152,45 @@ Promise&lt;void&gt;

If this session has been closed.

## LiveSession.sendAudioRealtime()

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Sends audio data to the server in realtime.

The server requires that audio data be base64-encoded 16-bit PCM at 16kHz, little-endian.

<b>Signature:</b>

```typescript
sendAudioRealtime(blob: GenerativeContentBlob): Promise<void>;
```

#### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| blob | [GenerativeContentBlob](./ai.generativecontentblob.md#generativecontentblob_interface) | The base64-encoded PCM data to send to the server in realtime. |

<b>Returns:</b>

Promise&lt;void&gt;

#### Exceptions

If this session has been closed.

### Example


```javascript
// const pcmData = ... base64-encoded 16-bit PCM at 16kHz little-endian.
const blob = { mimeType: "audio/pcm", data: pcmData };
liveSession.sendAudioRealtime(blob);

```
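
A fuller sketch of producing that PCM data: the `float32ToPcmBase64` helper below is illustrative only (it is not part of the SDK), and assumes `Float32Array` samples already captured at 16kHz, for example from an `AudioWorklet`.

```javascript
// Hypothetical helper (not part of the SDK): converts Float32 samples in
// [-1, 1] to base64-encoded 16-bit little-endian PCM, the format the
// server expects.
function float32ToPcmBase64(samples) {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, scale to the 16-bit signed range, and write little-endian.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, /* littleEndian */ true);
  }
  let binary = '';
  for (const byte of new Uint8Array(buffer)) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// const samples = ... Float32Array captured at 16kHz.
// await liveSession.sendAudioRealtime({
//   mimeType: "audio/pcm",
//   data: float32ToPcmBase64(samples)
// });
```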

## LiveSession.sendFunctionResponses()

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
@@ -167,6 +223,11 @@ If this session has been closed.
> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

> Warning: This API is now obsolete.
>
> Use `sendTextRealtime()`<!-- -->, `sendAudioRealtime()`<!-- -->, and `sendVideoRealtime()` instead.
>

Sends realtime input to the server.

<b>Signature:</b>
@@ -194,7 +255,12 @@ If this session has been closed.
> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Sends a stream of [GenerativeContentBlob](./ai.generativecontentblob.md#generativecontentblob_interface)<!-- -->.
> Warning: This API is now obsolete.
>
> Use `sendTextRealtime()`<!-- -->, `sendAudioRealtime()`<!-- -->, and `sendVideoRealtime()` instead.
>

Sends a stream of [GenerativeContentBlob](./ai.generativecontentblob.md#generativecontentblob_interface)<!-- -->.

<b>Signature:</b>

@@ -216,3 +282,77 @@ Promise&lt;void&gt;

If this session has been closed.
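
Since both `sendMediaChunks()` and `sendMediaStream()` are deprecated, existing call sites can be routed to the realtime methods. A minimal migration sketch (the `sendBlobRealtime` helper is hypothetical, not part of the SDK, and assumes audio blobs carry an `audio/*` mime type and video frames an `image/*` or `video/*` one):

```javascript
// Hypothetical helper (not part of the SDK): routes a GenerativeContentBlob
// that previously went through sendMediaChunks() to the appropriate
// realtime method, based on its mimeType.
async function sendBlobRealtime(session, blob) {
  if (blob.mimeType.startsWith('audio/')) {
    return session.sendAudioRealtime(blob);
  }
  if (blob.mimeType.startsWith('image/') || blob.mimeType.startsWith('video/')) {
    return session.sendVideoRealtime(blob);
  }
  throw new Error(`Unsupported mimeType: ${blob.mimeType}`);
}

// Before (deprecated):
// await liveSession.sendMediaChunks([audioBlob, frameBlob]);

// After:
// await sendBlobRealtime(liveSession, audioBlob);
// await sendBlobRealtime(liveSession, frameBlob);
```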

## LiveSession.sendTextRealtime()

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Sends text to the server in realtime.

<b>Signature:</b>

```typescript
sendTextRealtime(text: string): Promise<void>;
```

#### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | The text data to send. |

<b>Returns:</b>

Promise&lt;void&gt;

#### Exceptions

If this session has been closed.

### Example


```javascript
liveSession.sendTextRealtime("Hello, how are you?");

```

## LiveSession.sendVideoRealtime()

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Sends video data to the server in realtime.

The server requires that video be sent as individual frames at 1 FPS. It is recommended to set `mimeType` to `image/jpeg`<!-- -->.

<b>Signature:</b>

```typescript
sendVideoRealtime(blob: GenerativeContentBlob): Promise<void>;
```

#### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| blob | [GenerativeContentBlob](./ai.generativecontentblob.md#generativecontentblob_interface) | The base64-encoded video data to send to the server in realtime. |

<b>Returns:</b>

Promise&lt;void&gt;

#### Exceptions

If this session has been closed.

### Example


```javascript
// const videoFrame = ... base64-encoded JPEG data
const blob = { mimeType: "image/jpeg", data: videoFrame };
liveSession.sendVideoRealtime(blob);

```
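
The `startVideoRecording()` helper handles frame capture automatically, but the 1 FPS JPEG requirement can also be met manually. A sketch under assumptions (the caller wires up a playing `<video>` element; `dataUrlToBase64` is an illustrative helper, not part of the SDK):

```javascript
// Hypothetical helper (not part of the SDK): canvas.toDataURL() returns
// "data:image/jpeg;base64,<data>"; the server wants only <data>.
function dataUrlToBase64(dataUrl) {
  return dataUrl.slice(dataUrl.indexOf(',') + 1);
}

// const video = document.querySelector('video'); // playing camera/screen feed
// const canvas = document.createElement('canvas');
// canvas.width = video.videoWidth;
// canvas.height = video.videoHeight;
// setInterval(() => {
//   canvas.getContext('2d').drawImage(video, 0, 0, canvas.width, canvas.height);
//   liveSession.sendVideoRealtime({
//     mimeType: "image/jpeg",
//     data: dataUrlToBase64(canvas.toDataURL('image/jpeg'))
//   });
// }, 1000); // one frame per second, as the server requires
```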

63 changes: 63 additions & 0 deletions docs-devsite/ai.md
@@ -24,6 +24,7 @@ The Firebase AI Web SDK.
| [getLiveGenerativeModel(ai, modelParams)](./ai.md#getlivegenerativemodel_f2099ac) | <b><i>(Public Preview)</i></b> Returns a [LiveGenerativeModel](./ai.livegenerativemodel.md#livegenerativemodel_class) class for real-time, bidirectional communication.<!-- -->The Live API is only supported in modern browser windows and Node &gt;<!-- -->= 22. |
| <b>function(liveSession, ...)</b> |
| [startAudioConversation(liveSession, options)](./ai.md#startaudioconversation_01c8e7f) | <b><i>(Public Preview)</i></b> Starts a real-time, bidirectional audio conversation with the model. This helper function manages the complexities of microphone access, audio recording, playback, and interruptions. |
| [startVideoRecording(liveSession, options)](./ai.md#startvideorecording_762a78a) | <b><i>(Public Preview)</i></b> Starts a real-time, unidirectional video stream to the model. This helper function manages the complexities of video source access, frame capture, and encoding. |

## Classes

@@ -131,6 +132,7 @@ The Firebase AI Web SDK.
| [SpeechConfig](./ai.speechconfig.md#speechconfig_interface) | <b><i>(Public Preview)</i></b> Configures speech synthesis. |
| [StartAudioConversationOptions](./ai.startaudioconversationoptions.md#startaudioconversationoptions_interface) | <b><i>(Public Preview)</i></b> Options for [startAudioConversation()](./ai.md#startaudioconversation_01c8e7f)<!-- -->. |
| [StartChatParams](./ai.startchatparams.md#startchatparams_interface) | Params for [GenerativeModel.startChat()](./ai.generativemodel.md#generativemodelstartchat)<!-- -->. |
| [StartVideoRecordingOptions](./ai.startvideorecordingoptions.md#startvideorecordingoptions_interface) | <b><i>(Public Preview)</i></b> Options for <code>startVideoRecording</code>. |
| [TextPart](./ai.textpart.md#textpart_interface) | Content part interface if the part represents a text string. |
| [ThinkingConfig](./ai.thinkingconfig.md#thinkingconfig_interface) | Configuration for "thinking" behavior of compatible Gemini models.<!-- -->Certain models utilize a thinking process before generating a response. This allows them to reason through complex problems and plan a more coherent and accurate answer. |
| [ToolConfig](./ai.toolconfig.md#toolconfig_interface) | Tool config. This config is shared for all tools provided in the request. |
@@ -140,6 +142,7 @@ The Firebase AI Web SDK.
| [URLMetadata](./ai.urlmetadata.md#urlmetadata_interface) | <b><i>(Public Preview)</i></b> Metadata for a single URL retrieved by the [URLContextTool](./ai.urlcontexttool.md#urlcontexttool_interface) tool. |
| [UsageMetadata](./ai.usagemetadata.md#usagemetadata_interface) | Usage metadata about a [GenerateContentResponse](./ai.generatecontentresponse.md#generatecontentresponse_interface)<!-- -->. |
| [VideoMetadata](./ai.videometadata.md#videometadata_interface) | Describes the input video content. |
| [VideoRecordingController](./ai.videorecordingcontroller.md#videorecordingcontroller_interface) | <b><i>(Public Preview)</i></b> A controller for managing an active video recording session. |
| [VoiceConfig](./ai.voiceconfig.md#voiceconfig_interface) | <b><i>(Public Preview)</i></b> Configuration for the voice to be used in speech synthesis. |
| [WebAttribution](./ai.webattribution.md#webattribution_interface) | |
| [WebGroundingChunk](./ai.webgroundingchunk.md#webgroundingchunk_interface) | A grounding chunk from the web.<!-- -->Important: If using Grounding with Google Search, you are required to comply with the [Service Specific Terms](https://cloud.google.com/terms/service-terms) for "Grounding with Google Search". |
@@ -410,6 +413,66 @@ async function startConversation() {

```

### startVideoRecording(liveSession, options) {:#startvideorecording_762a78a}

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Starts a real-time, unidirectional video stream to the model. This helper function manages the complexities of video source access, frame capture, and encoding.

Important: This function must be called in response to a user gesture (e.g., a button click) to comply with browser security policies for accessing camera or screen content. The backend requires video frames to be sent at 1 FPS as individual JPEGs. This helper enforces that constraint.

<b>Signature:</b>

```typescript
export declare function startVideoRecording(liveSession: LiveSession, options?: StartVideoRecordingOptions): Promise<VideoRecordingController>;
```

#### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| liveSession | [LiveSession](./ai.livesession.md#livesession_class) | An active [LiveSession](./ai.livesession.md#livesession_class) instance. |
| options | [StartVideoRecordingOptions](./ai.startvideorecordingoptions.md#startvideorecordingoptions_interface) | Configuration options for the video recording. |

<b>Returns:</b>

Promise&lt;[VideoRecordingController](./ai.videorecordingcontroller.md#videorecordingcontroller_interface)<!-- -->&gt;

A `Promise` that resolves with a `VideoRecordingController`<!-- -->.

#### Exceptions

`AIError` if the environment is unsupported, a recording is active, or the session is closed.

`DOMException` if issues occur with media access (e.g., permissions denied).

### Example


```javascript
const liveSession = await model.connect();
let videoController;

// This function must be called from within a click handler.
async function startRecording() {
try {
videoController = await startVideoRecording(liveSession, {
videoSource: 'screen' // or 'camera'
});
} catch (e) {
// Handle AI-specific errors, DOMExceptions for permissions, etc.
console.error("Failed to start video recording:", e);
}
}

// To stop the recording later:
// if (videoController) {
// await videoController.stop();
// }

```

## AIErrorCode

Standardized error codes that [AIError](./ai.aierror.md#aierror_class) can have.