# 🎙️ Context Summarization with Realtime API

Build an end-to-end **voice bot** that listens to your mic, speaks back in real time, and **summarises long conversations** so quality never drops.

---

## 🏃‍♂️ What You'll Build
1. **Live microphone streaming** → OpenAI *Realtime* (voice-to-voice) endpoint.
2. **Instant transcripts & speech playback** on every turn.
3. **Conversation state container** that stores **every** user/assistant message.
4. **Automatic “context trim”**: when the token window nears 32k, older turns are compressed into a summary.
5. **Extensible design** you can adapt to support customer-support bots, kiosks, or multilingual assistants.

---

## 🎯 Learning Objectives
By the end of this notebook you will be able to:

| Skill | Why it matters |
|-------|----------------|
| Capture audio with `sounddevice` | Low-latency input is critical for natural UX |
| Use WebSockets with the OpenAI **Realtime** API | Streaming beats polling for speed & simplicity |
| Track token usage and detect when to summarize context | Prevents quality loss in long chats |
| Summarise & prune history on-the-fly | Keeps conversations coherent without manual resets |

---

## 🔧 Prerequisites

| Requirement | Details |
|-------------|---------|
| **Python ≥ 3.10** | Required for the union type syntax (PEP 604) used in the code below |
| **OpenAI API key** | Set `OPENAI_API_KEY` in your shell or paste inline (*not ideal for prod*) |
| Mic + speakers | Grant OS permission if prompted |


**Need help setting up the key?**  
> Follow the [official quick-start guide](https://platform.openai.com/docs/quickstart#step-2-set-your-api-key).


*Notes:*
1. Why 32k? OpenAI's public guidance notes that quality begins to decline well before the full 128k token limit; 32k is a conservative threshold observed in practice.

2. Token window = all tokens (text and audio tokens) the model currently keeps in memory for the session.

---

### 🚀 One-liner install (run in a fresh cell)

In [None]:
# Run once to install or upgrade dependencies (comment out if already installed)
# !pip install --upgrade openai websockets sounddevice simpleaudio numpy

In [1]:
# Essential imports & constants
import asyncio, base64, io, json, os, sys, wave
from dataclasses import dataclass, field
from typing import List, Literal

import numpy as np               # PCM conversion helpers
import sounddevice as sd         # microphone capture
import simpleaudio               # speaker playback
import websockets                # WebSocket client
import openai                    # OpenAI Python SDK >= 1.14.0

# Audio/config knobs
SAMPLE_RATE_HZ    = 24_000   # Sample rate the Realtime endpoint expects for pcm16
CHUNK_DURATION_MS = 40       # ≈ latency granularity
BYTES_PER_SAMPLE  = 2        # pcm16 = 2 bytes/sample
SUMMARY_TRIGGER   = 2_000    # Summarise when context ≥ this (far below the ≈32k discussed above, so the demo triggers quickly)
KEEP_LAST_TURNS   = 4        # Keep these turns verbatim
SUMMARY_MODEL     = "gpt-4o-mini"  # Cheaper, fast summariser

In [2]:
# Set your API key safely
openai.api_key = os.getenv("OPENAI_API_KEY", "")
if not openai.api_key:
    raise ValueError("OPENAI_API_KEY not found – please set env var or edit this cell.")

## 2 · Key Concepts Behind the Realtime Voice API

This section gives you the mental model you'll need before diving into code. Skim it now; refer back whenever something in the notebook feels “magic”.

---

### 2.1 Realtime vs Chat Completions: Why WebSockets?

|  | **Chat Completions (HTTP)** | **Realtime (WebSocket)** |
|---|---|---|
| Transport | Stateless request → response | Persistent, bi-directional socket |
| Best for | Plain text or batched jobs | *Live* audio + incremental text |
| Latency model | 1 RTT per message | Sub-200 ms deltas during one open session |
| Event types | *None* (single JSON) | `session.*`, `input_audio_buffer.append`, `response.*`, … |


**Flow**: you talk ▸ server transcribes ▸ assistant replies ▸ you talk again.  
> Mirrors natural conversation while keeping event handling simple.

---

### 2.2 Audio Encoding Fundamentals

| Parameter | Value | Why it matters |
|-----------|-------|----------------|
| **Format** | PCM-16 (signed 16-bit) | Widely supported; no compression delay |
| **Sample rate** | 24 kHz | Required by Realtime endpoint |
| **Chunk size** | ≈ 40 ms | Smaller chunks → snappier response ↔ higher packet overhead |

`chunk_bytes  = sample_rate * bytes_per_sample * chunk_duration_s`
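
As a quick sanity check of the formula above, the sketch below plugs in the notebook's own values (24 kHz, 2 bytes per sample, 40 ms). The helper name `chunk_bytes` is illustrative only and is not part of any library.

```python
def chunk_bytes(sample_rate_hz: int, bytes_per_sample: int, chunk_ms: int) -> int:
    """Bytes of mono PCM audio contained in one chunk of `chunk_ms` milliseconds."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000

size = chunk_bytes(24_000, 2, 40)
print(size)        # 1920 bytes per 40 ms chunk
print(size // 2)   # 960 samples per chunk
```

At 40 ms per chunk the uploader sends 25 events per second, which is the packet-overhead side of the trade-off in the table.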

---

### 2.3 Token Context Windows

* GPT-4o Realtime accepts **up to 128K tokens** in theory.  
* In practice, answer quality starts to drift around **≈ 32K tokens**.  
* Every user/assistant turn consumes tokens → the window **only grows**.
* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.
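
The strategy in the last bullet can be sketched on a plain list before any API is involved. This is a minimal illustration, not the notebook's implementation: `compact_history` and the stand-in `fake_summary` function are hypothetical names, and a real summariser would call an LLM.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)

def compact_history(
    history: List[Message],
    keep_last: int,
    summarise: Callable[[str], str],
) -> List[Message]:
    """Replace everything except the last `keep_last` messages with one summary message."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last:
        return list(history)              # nothing old enough to fold away
    old, recent = history[:-keep_last], history[-keep_last:]
    transcript = "\n".join(f"{role}: {text}" for role, text in old)
    return [("assistant", summarise(transcript))] + recent

# Stand-in summariser (hypothetical): a real one would call an LLM.
def fake_summary(text: str) -> str:
    return f"[summary of {len(text.splitlines())} messages]"

history = [("user", f"message {i}") for i in range(6)]
compacted = compact_history(history, keep_last=4, summarise=fake_summary)
print(compacted)
```

With six messages and `keep_last=4`, the two oldest messages collapse into one summary entry and the four newest stay verbatim.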

---

### 2.4 Conversation State

Instead of scattered globals, the notebook uses one **state object**:

In [3]:
@dataclass
class Turn:
    """One utterance in the dialogue (user **or** assistant)."""
    role: Literal["user", "assistant"]
    item_id: str                    # Server-assigned identifier
    text: str | None = None         # Filled once transcript is ready

@dataclass
class ConversationState:
    """All mutable data the session needs: nothing more, nothing less."""
    history: List[Turn] = field(default_factory=list)         # Ordered log
    waiting: dict[str, asyncio.Future] = field(default_factory=dict)  # Pending transcript fetches
    summary_count: int = 0

    latest_tokens: int = 0          # Window size after last reply
    summarising: bool = False       # Guard so we don't run two summaries at once

A quick helper to peek at the transcript:

In [4]:
def print_history(state) -> None:
    """Pretty-print the running transcript so far."""
    print("── Conversation so far ───────────────")
    for turn in state.history:
        text_preview = (turn.text or "").strip().replace("\n", " ")
        print(f"[{turn.role:<9}] {text_preview}  ({turn.item_id})")
    print("──────────────────────────────────────────")

## 3 · Token Utilisation – Text vs Voice

Large-token windows are precious: every extra token you burn costs latency + money.  
For **audio** the bill climbs faster than for plain text because amplitude, timing, and other acoustic details must be represented.

*Rule of thumb*: **1 word of text ≈ 1 token**. In the run recorded below, 10.75 s of speech consumed **105 audio input tokens** (roughly 10 audio tokens per second), about **2.5×** the 42 tokens of the same sentence sent as text.  
The multiplier depends on how long the audio runs for a given sentence, so measure it for your own workload.

---

### 3.1 Hands-on comparison 📊

The cells below:

1. **Send `TEXT` to Chat Completions** → read `prompt_tokens`.  
2. **Turn the same `TEXT` into speech** with TTS.  
3. **Stream the speech into the Realtime API** (with input transcription enabled) → read `audio_tokens` from the usage block of `response.done`.  
4. Print a ratio so you can see the multiplier on *your* hardware / account.

In [20]:
# ╔══════════════════════════════════════════════════════════════════╗
# ║ 3 · Token Utilisation – Text vs Voice                            ║
# ╚══════════════════════════════════════════════════════════════════╝

TEXT = (
    "Hello there, I am measuring tokens for text versus voice because we want to better compare the number of tokens used when sending a message as text versus when converting it to speech.."
)
STT_MODEL   = "gpt-4o-transcribe"
TTS_MODEL   = "gpt-4o-mini-tts"
RT_MODEL    = "gpt-4o-realtime-preview"          # S2S model
VOICE       = "shimmer"

TARGET_SR   = 24_000
PCM_SCALE   = 32_767
CHUNK_MS    = 120                                # stream step


HEADERS = {
    "Authorization": f"Bearer {openai.api_key}",
    "OpenAI-Beta":   "realtime=v1",
}

show = lambda l, v: print(f"{l:<28}: {v}")

# ─── Helpers ────────────────────────────────────────────────────────
def float_to_pcm16(x: np.ndarray) -> bytes:
    """Convert float samples in [-1, 1] to little-endian 16-bit PCM bytes."""
    return (np.clip(x, -1, 1) * PCM_SCALE).astype("<i2").tobytes()

def chunk_pcm(pcm: bytes, ms: int = CHUNK_MS) -> List[bytes]:
    """Split PCM bytes into chunks of `ms` milliseconds (2 bytes per sample)."""
    step = TARGET_SR * 2 * ms // 1000
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

# ─── 1 · Count text tokens ──────────────────────────────────────────
# Assumption: CHAT_MODEL is not defined in the cells shown here. Any chat model
# can report prompt tokens, so substitute the one you normally use; the count
# may differ slightly between tokenizers.
CHAT_MODEL = "gpt-4o-mini"

chat = openai.chat.completions.create(
    model=CHAT_MODEL,
    messages=[{"role": "user", "content": TEXT}],
    max_tokens=1,
    temperature=0,
)
text_tokens = chat.usage.prompt_tokens
show("📄 Text prompt tokens", text_tokens)

# ─── 2 · Synthesis to WAV & PCM16 ───────────────────────────────────
wav_bytes = openai.audio.speech.create(
    model=TTS_MODEL, input=TEXT, voice=VOICE, response_format="wav"
).content

with wave.open(io.BytesIO(wav_bytes)) as w:
    pcm_bytes = w.readframes(w.getnframes())
duration_sec = len(pcm_bytes) / (2 * TARGET_SR)
show("🔊 Audio length (s)", f"{duration_sec:.2f}")

📄 Text prompt tokens        : 42
🔊 Audio length (s)          : 10.75


In [21]:
# ─── 3 · Realtime streaming & token harvest ─────────────────────────
async def count_audio_tokens(pcm: bytes) -> int:
    """Stream `pcm` to the Realtime API and return the audio input tokens reported in usage."""
    url = f"wss://api.openai.com/v1/realtime?model={RT_MODEL}"
    chunks = chunk_pcm(pcm)

    # Note: websockets >= 14 names this keyword `additional_headers`;
    # `extra_headers` is the spelling used by older releases.
    async with websockets.connect(url, extra_headers=HEADERS,
                                  max_size=1 << 24) as ws:

        # Wait for session.created
        while json.loads(await ws.recv())["type"] != "session.created":
            pass

        # Configure modalities + voice
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "voice": VOICE,
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": STT_MODEL},
            }
        }))

        # Stream user audio chunks (no manual commit; server VAD handles it)
        for c in chunks:
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(c).decode(),
            }))

        async for raw in ws:
            ev = json.loads(raw)
            t = ev.get("type")

            if t == "response.done":
                usage = ev["response"]["usage"]
                return usage["input_token_details"]["audio_tokens"]

audio_tokens = await count_audio_tokens(pcm_bytes)
show("🎤 Audio input tokens", audio_tokens)

# ─── 4 · Comparison ─────────────────────────────────────────────────
ratio = audio_tokens / text_tokens if text_tokens else float("inf")
show("⚖️  Audio/Text ratio", f"{ratio:.1f}×")
print(f"\n≈{int(audio_tokens/duration_sec)} audio-tokens / sec vs ≈1 token / word.")

🎤 Audio input tokens        : 105
⚖️  Audio/Text ratio        : 2.5×

≈9 audio-tokens / sec vs ≈1 token / word.


This toy example uses a short input, but as transcripts get longer, the difference between text token count and voice token count grows substantially.
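
To make the printed figures easier to reason about, the sketch below re-derives them from the recorded run (42 text tokens, 105 audio tokens, 10.75 s) and extrapolates how much user speech would reach a token budget. The helper `seconds_until` is a hypothetical name, and the extrapolation assumes the per-second rate stays constant and ignores assistant output and text tokens, so treat it as a rough estimate only.

```python
text_tokens = 42        # from the run above
audio_tokens = 105      # from the run above
duration_sec = 10.75    # from the run above

ratio = audio_tokens / text_tokens
tokens_per_sec = audio_tokens / duration_sec

def seconds_until(budget_tokens: int, rate: float) -> float:
    """Seconds of user speech that would consume `budget_tokens` at `rate` tokens/s."""
    return budget_tokens / rate

print(f"{ratio:.1f}x, {tokens_per_sec:.1f} tokens/s")   # 2.5x, 9.8 tokens/s
minutes = seconds_until(2_000, tokens_per_sec) / 60
print(f"{minutes:.1f} min of input audio to reach 2,000 tokens")
```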


---

## 3 · Streaming Audio
We'll stream raw PCM-16 microphone data straight into the Realtime API.

The pipeline is: mic ─► asyncio.Queue ─► WebSocket ─► Realtime API

### 3.1 Capture Microphone Input
We'll start with a coroutine that:

* Opens the default mic at **24 kHz, mono, PCM-16** (one of the [formats](https://platform.openai.com/docs/api-reference/realtime-sessions/create#realtime-sessions-create-input_audio_format) Realtime accepts).  
* Slices the stream into **≈ 40 ms** blocks.  
* Dumps each block into an `asyncio.Queue` so another task (next section) can forward it to OpenAI.


In [22]:
# ╔══════════════════════════════════════════════════════════════════╗
# ║ 3.1 · Microphone → asyncio.Queue                                 ║
# ╚══════════════════════════════════════════════════════════════════╝

import asyncio, sys
import sounddevice as sd

# ── Audio constants (match Realtime requirements) ──────────────────
SAMPLE_RATE_HZ     = 24_000        # 24-kHz mono
CHUNK_DURATION_MS  = 40            # ≈40-ms frames
QUEUE_MAXSIZE      = 32            # Back-pressure buffer

async def mic_to_queue(pcm_queue: asyncio.Queue[bytes]) -> None:
    """
    Capture raw PCM-16 microphone audio and push ~CHUNK_DURATION_MS chunks
    to *pcm_queue* until the surrounding task is cancelled.

    Parameters
    ----------
    pcm_queue : asyncio.Queue[bytes]
        Destination queue for PCM-16 frames (little-endian int16).
    """
    blocksize = int(SAMPLE_RATE_HZ * CHUNK_DURATION_MS / 1000)
    loop = asyncio.get_running_loop()

    def _enqueue(chunk: bytes) -> None:
        try:
            pcm_queue.put_nowait(chunk)          # 1-shot enqueue
        except asyncio.QueueFull:
            # Drop frame if upstream (WebSocket) can't keep up.
            pass

    def _callback(indata, _frames, _time, status):
        if status:                               # XRuns, device changes, etc.
            print("⚠️", status, file=sys.stderr)
        # The callback runs on the audio thread and asyncio.Queue is not
        # thread-safe, so hand the chunk over to the event loop.
        loop.call_soon_threadsafe(_enqueue, bytes(indata))

    # RawInputStream is synchronous; wrap in context manager to auto-close.
    with sd.RawInputStream(
        samplerate=SAMPLE_RATE_HZ,
        blocksize=blocksize,
        dtype="int16",
        channels=1,
        callback=_callback,
    ):
        try:
            # Keep coroutine alive until cancelled by caller.
            await asyncio.Event().wait()
        finally:
            print("⏹️  Mic stream closed.")
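
The audio callback is invoked on a separate audio thread, while `asyncio.Queue` is documented as not thread-safe. The sketch below shows one way to hand data from a plain thread to an asyncio consumer without a microphone. The producer thread and the function name `demo_threadsafe_handoff` are hypothetical stand-ins for the real audio callback.

```python
import asyncio
import threading

async def demo_threadsafe_handoff(n_chunks: int = 3) -> list[bytes]:
    """Push fake audio chunks from a worker thread into an asyncio.Queue."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue[bytes] = asyncio.Queue(maxsize=32)

    def enqueue(chunk: bytes) -> None:
        # Runs on the event-loop thread, so touching the queue is safe here.
        try:
            queue.put_nowait(chunk)
        except asyncio.QueueFull:
            pass  # drop the chunk, mirroring the mic callback's policy

    def fake_audio_thread() -> None:
        # Stands in for the audio callback thread (hypothetical producer).
        for i in range(n_chunks):
            loop.call_soon_threadsafe(enqueue, bytes([i]) * 4)

    worker = threading.Thread(target=fake_audio_thread)
    worker.start()
    received = [await queue.get() for _ in range(n_chunks)]
    worker.join()
    return received

if __name__ == "__main__":
    print(asyncio.run(demo_threadsafe_handoff()))
```

Inside a notebook, where an event loop is already running, call `await demo_threadsafe_handoff()` instead of `asyncio.run(...)`.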

### 3.2 Send Audio Chunks to the API

Our mic task is now filling an `asyncio.Queue` with raw PCM-16 blocks.  
Next step: pull chunks off that queue, **base64-encode** them (the protocol requires JSON-safe text), and ship each block to the Realtime WebSocket as an `input_audio_buffer.append` event.


In [5]:
b64 = lambda blob: base64.b64encode(blob).decode()

async def queue_to_websocket(pcm_queue: asyncio.Queue[bytes], ws):
    """Read audio chunks from queue and send as JSON events."""
    try:
        while (chunk := await pcm_queue.get()) is not None:
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": b64(chunk),
            }))
    except websockets.ConnectionClosed:
        print("WebSocket closed – stopping uploader")
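
To see what actually goes over the wire, this offline sketch builds one `input_audio_buffer.append` event from a chunk of silence and decodes it again. `make_append_event` is an illustrative helper name, and no network connection is involved.

```python
import base64
import json

def make_append_event(chunk: bytes) -> str:
    """Serialise one PCM chunk as an `input_audio_buffer.append` JSON event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(chunk).decode("ascii"),
    })

chunk = bytes(1920)                      # 40 ms of silence at 24 kHz, 16-bit mono
event = make_append_event(chunk)
payload = json.loads(event)["audio"]
decoded = base64.b64decode(payload)

print(len(chunk), len(payload))          # 1920 raw bytes -> 2560 base64 characters
print(decoded == chunk)                  # True
```

Base64 inflates the payload by one third, which is part of the per-chunk overhead to weigh when choosing a chunk duration.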

### 3.3 Handle Incoming Events
Once audio reaches the server, the Realtime API pushes a stream of JSON events back over the **same** WebSocket.  
Understanding these events is critical for:

* Printing live transcripts  
* Playing incremental audio back to the user  
* Keeping an accurate `ConversationState` so context trimming works later  

| Event type | Typical timing | What you should do with it |
|------------|----------------|----------------------------|
| **`session.created`** | Immediately after connection | Verify the handshake; stash the `session_id` if you need it for server logs. |
| **`conversation.item.created`** (user) | Right after the user stops talking | Place a *placeholder* `Turn` in `state.history`. Transcript may still be `null`. |
| **`conversation.item.retrieved`** | A few hundred ms later | Fill in any missing user transcript once STT completes. |
| **`response.audio.delta`** | Streaming chunks while the assistant speaks | Append bytes to a local buffer, play them (low-latency) as they arrive. |
| **`response.done`** | After final assistant token | Add assistant text + usage stats, update `state.latest_tokens`. |
| **`conversation.item.deleted`** | Whenever you prune old turns | Remove superseded items from `conversation.item`. |
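
One way to organise the handling described in the table is a small dispatch map keyed by event `type`. The sketch below is an illustration rather than the notebook's approach: the handler names are hypothetical, the JSON payloads are hand-written and heavily simplified, and `some.unknown.event` is a made-up type used to show that unrecognised events are skipped.

```python
import json
from typing import Callable, Dict

def handle_session_created(event: dict, log: list) -> None:
    log.append("handshake ok")

def handle_audio_delta(event: dict, log: list) -> None:
    log.append(f"audio chunk ({len(event['delta'])} base64 chars)")

def handle_response_done(event: dict, log: list) -> None:
    log.append(f"reply finished, {event['response']['usage']['total_tokens']} tokens")

HANDLERS: Dict[str, Callable[[dict, list], None]] = {
    "session.created": handle_session_created,
    "response.audio.delta": handle_audio_delta,
    "response.done": handle_response_done,
}

def dispatch(raw: str, log: list) -> None:
    """Decode one server message and route it by its `type`; ignore unknown types."""
    event = json.loads(raw)
    handler = HANDLERS.get(event.get("type"))
    if handler is not None:
        handler(event, log)

log: list = []
for raw in (
    '{"type": "session.created"}',
    '{"type": "response.audio.delta", "delta": "AAAA"}',
    '{"type": "some.unknown.event"}',
    '{"type": "response.done", "response": {"usage": {"total_tokens": 228}}}',
):
    dispatch(raw, log)
print(log)
```

A dictionary lookup keeps each handler small and testable. A long `if/elif` chain works equally well for a handful of event types.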


## 4 · Dynamic Context Management & Summarisation

### 4.1 Detect When to Summarise
We monitor `latest_tokens`, taken from the usage block of each `response.done` event. When it reaches `SUMMARY_TRIGGER` and the history holds more than `KEEP_LAST_TURNS` turns, we spin up a background summarisation coroutine.
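
The trigger condition can be written as a small pure function, which makes it easy to test in isolation. This is a sketch under the notebook's default values (trigger 2,000 tokens, four turns kept); the function name `should_summarise` is illustrative.

```python
def should_summarise(
    latest_tokens: int,
    n_turns: int,
    *,
    trigger: int = 2_000,
    keep_last: int = 4,
    already_running: bool = False,
) -> bool:
    """True when the window is large enough, there are old turns to fold, and no summary is in flight."""
    return latest_tokens >= trigger and n_turns > keep_last and not already_running

print(should_summarise(1_897, 7))                        # False: below the trigger
print(should_summarise(2_138, 9))                        # True
print(should_summarise(2_138, 4))                        # False: nothing older than the kept turns
print(should_summarise(2_138, 9, already_running=True))  # False: a summary is already running
```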

### 4.2 Generate & Insert a Summary
We summarise every turn except the last `KEEP_LAST_TURNS` (four) into French. Later we ask the voice agent which language the summary was written in, to test whether inserting the summary into the Realtime API conversation context was successful.

In [7]:
async def run_summary_llm(text: str) -> str:
    """Call a lightweight model to summarise `text`."""
    resp = await asyncio.to_thread(lambda: openai.chat.completions.create(
        model=SUMMARY_MODEL,
        temperature=0,
        messages=[
            {"role": "system", "content": "Summarise in French the following conversation "
                            "in one concise paragraph so it can be used as "
                            "context for future dialogue."},
            {"role": "user", "content": text},
        ],
    ))
    return resp.choices[0].message.content.strip()

In [16]:
async def summarise_and_prune(ws, state):
    """Summarise old turns, delete them server-side, and prepend a single summary
    turn locally + remotely."""
    state.summarising = True
    try:
        print(
            f"⚠️  Token window ≈{state.latest_tokens} ≥ {SUMMARY_TRIGGER}. Summarising…",
        )
        old_turns = state.history[:-KEEP_LAST_TURNS]
        convo_text = "\n".join(f"{t.role}: {t.text}" for t in old_turns if t.text)

        if not convo_text:
            print("Nothing to summarise (transcripts still pending).")
            return  # leave history untouched instead of inserting an empty summary

        summary_text = await run_summary_llm(convo_text)
        state.summary_count += 1
        summary_id = f"sum_{state.summary_count:03d}"
        # Keep everything after the summarised prefix, including turns that
        # arrived while the summary was being generated.
        state.history[:] = [Turn("assistant", summary_id, summary_text)] + state.history[len(old_turns):]

        print_history(state)

        # Create summary on server
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "previous_item_id": "root",
            "item": {
                "id": summary_id,
                "type": "message",
                "role": "assistant",
                "content": [{"type": "text", "text": summary_text}],
            },
        }))

        # Delete old items
        for turn in old_turns:
            await ws.send(json.dumps({
                "type": "conversation.item.delete",
                "item_id": turn.item_id,
            }))

        print(f"✅ Summary inserted ({summary_id})")
    finally:
        # Always release the guard, even if the summary call or a send fails.
        state.summarising = False
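
The two kinds of messages sent during pruning can be inspected offline. The sketch below only builds the event dictionaries in the order the function above sends them: one `conversation.item.create` anchored at `"root"`, followed by one `conversation.item.delete` per replaced item. `build_prune_events` and the item ids are hypothetical.

```python
from typing import List

def build_prune_events(summary_id: str, summary_text: str,
                       old_item_ids: List[str]) -> List[dict]:
    """Events that insert one summary at the start, then delete the items it replaces."""
    create = {
        "type": "conversation.item.create",
        "previous_item_id": "root",          # place the summary first
        "item": {
            "id": summary_id,
            "type": "message",
            "role": "assistant",
            "content": [{"type": "text", "text": summary_text}],
        },
    }
    deletes = [{"type": "conversation.item.delete", "item_id": i} for i in old_item_ids]
    return [create] + deletes

events = build_prune_events("sum_001", "Résumé de la conversation.", ["item_a", "item_b"])
for ev in events:
    print(ev["type"], ev.get("item_id", ev.get("item", {}).get("id")))
```

Sending the summary before the deletions means the conversation never passes through a state in which the old turns are gone and the summary has not yet arrived.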

In [9]:
async def fetch_full_item(
    ws, item_id: str, state: ConversationState, attempts: int = 1
):
    """
    Ask the server for a full conversation item; retry up to 5× if the
    transcript field is still null.  Resolve the waiting future when done.
    """
    # If there is already a pending fetch, just await it
    if item_id in state.waiting:
        return await state.waiting[item_id]

    fut = asyncio.get_running_loop().create_future()
    state.waiting[item_id] = fut

    await ws.send(json.dumps({
        "type": "conversation.item.retrieve",
        "item_id": item_id,
    }))
    # This future must be resolved by whichever code handles the matching
    # `conversation.item.retrieved` event; otherwise the await never returns.
    item = await fut

    # Remove the marker first so that a retry sends a fresh request
    state.waiting.pop(item_id, None)

    # If transcript still missing retry (max 5×)
    content = item.get("content") or [{}]
    if attempts < 5 and not content[0].get("transcript"):
        await asyncio.sleep(0.4 * attempts)
        return await fetch_full_item(ws, item_id, state, attempts + 1)

    return item
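
The retry logic above is recursive. An iterative variant with the same linear back-off (0.4 s × attempt number by default) may be easier to follow. The sketch below is a generic helper, not the notebook's code: `retry_until_ready`, `demo`, and the fake fetcher are hypothetical, and the demo shortens the delay so it finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable

async def retry_until_ready(
    fetch: Callable[[], Awaitable[dict]],
    is_ready: Callable[[dict], bool],
    max_attempts: int = 5,
    base_delay: float = 0.4,
) -> dict:
    """Call `fetch` until `is_ready(result)` holds or attempts run out."""
    result: dict = {}
    for attempt in range(1, max_attempts + 1):
        result = await fetch()
        if is_ready(result):
            break
        if attempt < max_attempts:
            await asyncio.sleep(base_delay * attempt)   # linear back-off
    return result

async def demo() -> tuple:
    calls = 0

    async def fake_fetch() -> dict:
        # Pretend the transcript only becomes available on the third request.
        nonlocal calls
        calls += 1
        transcript = "hello" if calls >= 3 else None
        return {"content": [{"transcript": transcript}]}

    item = await retry_until_ready(
        fake_fetch,
        lambda it: bool(it["content"][0].get("transcript")),
        base_delay=0.01,
    )
    return item, calls

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a notebook cell, use `await demo()` rather than `asyncio.run(demo())`.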


## 5 · End-to-End Workflow Demonstration

Run the two cells below to launch an interactive session. Press Ctrl-C to stop recording.

In [13]:
# --------------------------------------------------------------------------- #
# 🎤 Realtime session                                                          #
# --------------------------------------------------------------------------- #
async def realtime_session(model="gpt-4o-realtime-preview", voice="shimmer", enable_playback=True):
    """
    Main coroutine: connects to the Realtime endpoint, spawns helper tasks,
    and processes incoming events in a big async-for loop.
    """
    state = ConversationState()  # Reset state for each run

    pcm_queue: asyncio.Queue[bytes] = asyncio.Queue()
    assistant_audio: List[bytes] = []

    # ----------------------------------------------------------------------- #
    # Open the WebSocket connection to the Realtime API                       #
    # ----------------------------------------------------------------------- #
    url = f"wss://api.openai.com/v1/realtime?model={model}"
    headers = {"Authorization": f"Bearer {openai.api_key}", "OpenAI-Beta": "realtime=v1"}

    # Note: websockets >= 14 names this keyword `additional_headers`;
    # `extra_headers` is the spelling used by older releases.
    async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:
        # ------------------------------------------------------------------- #
        # Wait until server sends session.created                             #
        # ------------------------------------------------------------------- #
        while json.loads(await ws.recv())["type"] != "session.created":
            pass
        print("session.created ✅")

        # ------------------------------------------------------------------- #
        # Configure session: voice, modalities, audio formats, transcription  #
        # ------------------------------------------------------------------- #
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": voice,
                "modalities": ["audio", "text"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-transcribe"},
            },
        }))

        # ------------------------------------------------------------------- #
        # Launch background tasks: mic capture → queue → websocket            #
        # ------------------------------------------------------------------- #
        mic_task = asyncio.create_task(mic_to_queue(pcm_queue))
        upl_task = asyncio.create_task(queue_to_websocket(pcm_queue, ws))

        print("🎙️ Speak now (Ctrl-C to quit)…")

        try:
            # ------------------------------------------------------------------- #
            # Main event loop: process incoming events from the websocket         #
            # ------------------------------------------------------------------- #
            async for event_raw in ws:
                event = json.loads(event_raw)
                etype = event["type"]

                # --------------------------------------------------------------- #
                # User just spoke ⇢ conversation.item.created (role = user)        #
                # --------------------------------------------------------------- #
                if etype == "conversation.item.created" and event["item"].get("role") == "user":
                    item = event["item"]
                    text = None
                    if item["content"]:
                        text = item["content"][0].get("transcript")

                    state.history.append(Turn("user", item["id"], text))

                    # If transcript not yet available, fetch it later
                    if text is None:
                        asyncio.create_task(fetch_full_item(ws, item["id"], state))

                # --------------------------------------------------------------- #
                # Transcript fetched ⇢ conversation.item.retrieved                 #
                # --------------------------------------------------------------- #
                elif etype == "conversation.item.retrieved":
                    item = event["item"]
                    content = (item.get("content") or [{}])[0]
                    # Fill missing transcript in history
                    for t in state.history:
                        if t.item_id == item["id"]:
                            t.text = content.get("transcript")
                            break
                    # Wake up the fetch_full_item() call waiting on this item
                    fut = state.waiting.get(item["id"])
                    if fut is not None and not fut.done():
                        fut.set_result(item)

                # --------------------------------------------------------------- #
                # Assistant audio arrives in deltas                               #
                # --------------------------------------------------------------- #
                elif etype == "response.audio.delta":
                    assistant_audio.append(base64.b64decode(event["delta"]))

                # --------------------------------------------------------------- #
                # Assistant reply finished ⇢ response.done                        #
                # --------------------------------------------------------------- #
                elif etype == "response.done":
                    for item in event["response"]["output"]:
                        if item.get("role") == "assistant":
                            txt = item["content"][0]["transcript"]
                            state.history.append(Turn("assistant", item["id"], txt))
                            # print(f"\n🤖 {txt}\n")
                    state.latest_tokens = event["response"]["usage"]["total_tokens"]
                    print(f"── response.done  (window ≈{state.latest_tokens} tokens) ──")
                    print_history(state)

                    # Fetch any still-missing user transcripts
                    for turn in state.history:
                        if (turn.role == "user"
                            and turn.text is None
                            and turn.item_id not in state.waiting):
                            asyncio.create_task(
                                fetch_full_item(ws, turn.item_id, state)
                            )

                    # Playback collected audio once reply completes
                    if enable_playback and assistant_audio:
                        simpleaudio.play_buffer(b"".join(assistant_audio), 1, BYTES_PER_SAMPLE, SAMPLE_RATE_HZ)
                        assistant_audio.clear()

                    # Summarise if context too large – fire in background so we don't block dialogue
                    if state.latest_tokens >= SUMMARY_TRIGGER and len(state.history) > KEEP_LAST_TURNS and not state.summarising:
                        asyncio.create_task(summarise_and_prune(ws, state))

        except KeyboardInterrupt:
            print("\nStopping…")
        finally:
            mic_task.cancel()
            await pcm_queue.put(None)
            await upl_task

In [None]:
# Run the realtime session (this cell blocks until you stop it)
await realtime_session()

```raw
🎙️ Speak now (Ctrl-C to quit)…
── response.done  (window ≈228 tokens) ──
── Conversation so far ───────────────
[user     ]   (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
──────────────────────────────────────────
── response.done  (window ≈0 tokens) ──
── Conversation so far ───────────────
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
──────────────────────────────────────────
── response.done  (window ≈422 tokens) ──
── Conversation so far ───────────────
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
[user     ]   (item_BTfMIbULFByNpzbBMjP18)
[assistant] Sure thing! Why don't scientists trust atoms? Because they make up everything!  (item_BTfMI7oH0KvSsGxEfSOTP)
──────────────────────────────────────────
── response.done  (window ≈1897 tokens) ──
── Conversation so far ───────────────
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
[user     ] Can you tell me a joke?  (item_BTfMIbULFByNpzbBMjP18)
[assistant] Sure thing! Why don't scientists trust atoms? Because they make up everything!  (item_BTfMI7oH0KvSsGxEfSOTP)
[user     ]   (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
──────────────────────────────────────────
── response.done  (window ≈2138 tokens) ──
── Conversation so far ───────────────
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
[user     ] Can you tell me a joke?  (item_BTfMIbULFByNpzbBMjP18)
[assistant] Sure thing! Why don't scientists trust atoms? Because they make up everything!  (item_BTfMI7oH0KvSsGxEfSOTP)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ]   (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
——————————————————————————————————————————
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
——————————————————————————————————————————
—— response.done  (window ≈0 tokens) ——
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
——————————————————————————————————————————
—— response.done  (window ≈0 tokens) ——
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
[user     ]   (item_BTfTVHuCRBUzG82xQlx1o)
——————————————————————————————————————————
—— response.done  (window ≈2082 tokens) ——
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
[user     ]   (item_BTfTVHuCRBUzG82xQlx1o)
[user     ]   (item_BTfTV6S7x7gTfHgBhZFst)
[assistant] The language of the first summary of our conversation was French.  (item_BTfTVSiLtjisYYInKT40R)
——————————————————————————————————————————
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur demande de l'aide et une blague, tandis que l'assistant répond avec une blague sur les atomes et une autre sur un épouvantail, montrant son rôle d'interaction amicale et humoristique. L'utilisateur évoque également une histoire de 500 mots sur une boulangère nommée Lucy, qui remporte un concours de tartes, ce qui apporte la renommée à son village.  (sum_002)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
[user     ]   (item_BTfTVHuCRBUzG82xQlx1o)
[user     ] The summary of our conversation  (item_BTfTV6S7x7gTfHgBhZFst)
[assistant] The language of the first summary of our conversation was French.  (item_BTfTVSiLtjisYYInKT40R)
```
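
The transcript above shows older turns collapsing into a single summary item (`sum_001`, later `sum_002`) while the most recent turns stay verbatim. The pruning step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual implementation: `Turn`, `compress_history`, `stub_summarize` and the `keep_last=4` default are hypothetical names and values chosen to mirror the log, and the summariser is a stub rather than a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str      # "user" or "assistant"
    item_id: str   # e.g. "item_..." for normal turns, "sum_001" for a summary
    text: str


def compress_history(
    history: List[Turn],
    summarize: Callable[[List[Turn]], str],
    keep_last: int = 4,            # assumption: picked to match the transcript
    summary_id: str = "sum_001",   # assumption: caller supplies the next id
) -> List[Turn]:
    """Replace everything except the last `keep_last` turns with one summary turn."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    cut = max(len(history) - keep_last, 0)
    old, recent = history[:cut], history[cut:]
    if not old:
        # Nothing old enough to compress yet.
        return list(history)
    # The transcript displays the summary under the assistant role.
    summary = Turn("assistant", summary_id, summarize(old))
    return [summary] + recent


def stub_summarize(turns: List[Turn]) -> str:
    # Stand-in for a real summarisation call.
    return f"Summary of {len(turns)} earlier turns."


# Nine turns, as in the log just before the first summary appears.
history = [
    Turn("user" if i % 2 else "assistant", f"item_{i}", f"message {i}")
    for i in range(1, 10)
]
pruned = compress_history(history, stub_summarize)
for turn in pruned:
    print(f"[{turn.role:<9}] {turn.text}  ({turn.item_id})")
```

In a live session the same change would also have to be mirrored on the server side (the Realtime API exposes `conversation.item.create` and `conversation.item.delete` client events for this); that part is left out of the sketch.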

---

## 6 · Real-World Applications & Extension Ideas
- **Customer-support voicebots:** Summaries enable seamless hand-off to human agents while preserving privacy.
- **Multilingual assistants:** Swap `SUMMARY_MODEL` and system prompt to translate & condense context across languages.
- **Accessibility tools:** Real-time captioning plus summarised notes for hearing-impaired users.
- **Embedded devices:** Edge streaming with local VAD to conserve data.
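
For the multilingual idea, one possible shape for the "swap the system prompt" step is sketched below. Everything here is illustrative: `build_summary_messages`, `target_language` and `max_words` are hypothetical names, and the `SUMMARY_MODEL` value is only a placeholder for whatever the notebook configures. The function just builds the message list and makes no API call.

```python
from typing import Dict, List, Tuple

SUMMARY_MODEL = "gpt-4o-mini"  # placeholder; use the value set in the notebook


def build_summary_messages(
    turns: List[Tuple[str, str]],      # (role, text) pairs
    target_language: str = "English",  # hypothetical parameter
    max_words: int = 60,               # hypothetical parameter
) -> List[Dict[str, str]]:
    """Build chat messages asking for a translated, condensed summary."""
    system = (
        f"Summarise the conversation below in {target_language}, "
        f"using at most {max_words} words. Translate where needed and "
        "keep names, numbers and decisions."
    )
    # Skip turns whose transcript is empty (these do appear in live logs).
    transcript = "\n".join(
        f"{role}: {text.strip()}" for role, text in turns if text.strip()
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]


turns = [
    ("user", "Hey, what's up?"),
    ("user", "  "),
    ("assistant", "Not much!"),
]
messages = build_summary_messages(turns, target_language="French")
print(messages[0]["content"])
print(messages[1]["content"])
```

The resulting list could then be passed to a chat-completion call such as `client.chat.completions.create(model=SUMMARY_MODEL, messages=messages)` from the OpenAI Python SDK.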

> **Extension Challenge:** Integrate a browser-based UI with Web Speech API so users need no Python.


## Next Steps & Further Reading
Run the notebook, then try integrating context summarization into your own application.

A few things you can try, for example:
- Evaluate whether context summarization helps on your evals and for your use case.
- Experiment with different summarization methods.
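
One lightweight way to start comparing summarization methods is to check how much each one shrinks the history and how many hand-picked key facts survive. The harness below is a rough sketch under stated assumptions: `approx_tokens` is a crude characters-per-token heuristic rather than a real tokenizer, `fact_recall` is plain substring matching, and the two "summarisers" are naive baselines standing in for model-backed ones. All names are hypothetical.

```python
from typing import Callable, Dict, List


def approx_tokens(text: str) -> int:
    # Crude estimate (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def fact_recall(summary: str, facts: List[str]) -> float:
    """Fraction of key facts that appear verbatim (case-insensitive)."""
    if not facts:
        return 1.0
    lowered = summary.lower()
    return sum(fact.lower() in lowered for fact in facts) / len(facts)


def head_truncate(turns: List[str], limit: int = 80) -> str:
    # Baseline 1: keep only the first `limit` characters of the history.
    return " ".join(turns)[:limit]


def last_turn_only(turns: List[str]) -> str:
    # Baseline 2: keep only the most recent turn.
    return turns[-1] if turns else ""


def compare(
    summarizers: Dict[str, Callable[[List[str]], str]],
    turns: List[str],
    facts: List[str],
) -> Dict[str, Dict[str, float]]:
    original = approx_tokens(" ".join(turns))
    report = {}
    for name, summarize in summarizers.items():
        summary = summarize(turns)
        report[name] = {
            "tokens": approx_tokens(summary),
            "compression": round(approx_tokens(summary) / original, 2),
            "fact_recall": fact_recall(summary, facts),
        }
    return report


turns = [
    "User asks for a joke.",
    "Assistant tells a joke about atoms.",
    "User asks for a story.",
    "Assistant tells a story about a baker named Lucy who wins a royal pie contest.",
]
facts = ["atoms", "Lucy", "pie contest"]
report = compare(
    {"head_truncate": head_truncate, "last_turn_only": last_turn_only},
    turns,
    facts,
)
for name, metrics in report.items():
    print(name, metrics)
```

Substring recall is a blunt instrument; for a real evaluation you would likely swap in task-specific checks or a model-graded rubric, and measure token counts with the tokenizer your model actually uses.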

Resources:
- https://platform.openai.com/docs/guides/realtime 
- https://platform.openai.com/docs/guides/realtime-conversations
- https://platform.openai.com/docs/api-reference/realtime
- https://voiceaiandvoiceagents.com/