Skip to content

A framework-agnostic voice agent library for voice-controlled UI interactions, powered by OpenAI's Realtime API.

License

Notifications You must be signed in to change notification settings

rajnandan1/atticus

Repository files navigation

Atticus

Atticus Banner

A framework-agnostic voice agent library for voice-controlled UI interactions, powered by OpenAI's Realtime API.

Installation

npm/yarn

npm install @rajnandan1/atticus

CDN (Vanilla HTML/JS)

<!-- Use the IIFE build via unpkg or jsdelivr -->
<script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>

<!-- Or specific version -->
<script src="https://unpkg.com/@rajnandan1/atticus@v1.1.3/dist/index.global.js"></script>

<!-- jsdelivr alternative -->
<script src="https://cdn.jsdelivr.net/npm/@rajnandan1/atticus@latest/dist/index.global.js"></script>

The script tag exposes Atticus globally - see Vanilla HTML Usage below.

Quick Start

import { Atticus } from "atticus";

// Get a client secret from your backend (which calls OpenAI's API)
const clientSecret = await fetchClientSecret();

const agent = new Atticus({
    clientSecret,
    voice: "shimmer", // Optional: alloy, ash, ballad, coral, echo, sage, shimmer, verse
    language: "en", // Optional: supports 40+ languages
    agent: {
        name: "Assistant",
        instructions:
            "You are a helpful assistant that helps users interact with the UI.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
    },
});

// Listen to events
agent.on("connected", () => console.log("Connected!"));
agent.on("message", (msg) => console.log("Message:", msg));
agent.on("error", (err) => console.error("Error:", err));

// Connect and start talking
await agent.connect();

// Disconnect when done
agent.disconnect();

Vanilla HTML/JS Usage

Atticus works perfectly with vanilla HTML/JS using a script tag:

<!DOCTYPE html>
<html>
    <head>
        <title>Atticus Voice Demo</title>
    </head>
    <body>
        <button id="connectBtn">Connect</button>
        <div id="status">Idle</div>

        <!-- Include Atticus -->
        <script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>

        <script>
            // Atticus is now available globally
            let agent = null;

            document
                .getElementById("connectBtn")
                .addEventListener("click", async () => {
                    if (agent && agent.isConnected) {
                        agent.disconnect();
                        return;
                    }

                    // Get client secret from your backend
                    const response = await fetch("/api/session", {
                        method: "POST",
                    });
                    const { clientSecret } = await response.json();

                    agent = new Atticus.Atticus({
                        clientSecret,
                        agent: {
                            name: "Assistant",
                            instructions: "You are a helpful voice assistant.",
                        },
                        voice: "shimmer",
                        language: "en",
                        ui: {
                            enabled: true,
                            rootElement: document.body,
                        },
                    });

                    agent.on("connected", () => {
                        document.getElementById("status").textContent =
                            "Connected!";
                    });

                    agent.on("message", (msg) => {
                        console.log("Message:", msg);
                    });

                    await agent.connect();
                });
        </script>
    </body>
</html>

See index.html for a complete example.

UI-Aware Mode

Enable UI awareness to let users control your interface with voice. Actions are automatically executed by default:

const agent = new Atticus({
    clientSecret,
    agent: {
        name: "UI Assistant",
        instructions: "Help users fill out the form on this page.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app")!,
    },
});

// Actions are auto-executed! Just listen for logging/feedback
agent.on("action", (action) => {
    console.log("Action executed:", action.outputText);
    console.log("Code:", action.outputCode);
});

await agent.connect();

// Now say: "Fill the name field with John Doe"
// The library will automatically execute the action!

Manual Action Execution

If you want to handle actions yourself:

const agent = new Atticus({
    clientSecret,
    agent: { name: "Assistant", instructions: "..." },
    doNotExecuteActions: true, // Disable auto-execution
    ui: { enabled: true, rootElement: document.body },
});

agent.on("action", async (action) => {
    // Validate or modify action before execution
    if (action.actionType === "click") {
        const result = await agent.executeAction(action);
        console.log("Result:", result);
    }
});

Preserving Nested DOM with data-preserve

By default, Atticus compresses the DOM to fit within token limits. To preserve specific sections with their full nested structure (useful for complex components or data that shouldn't be simplified), add the data-preserve attribute:

<!-- This section will maintain its full nested structure -->
<div
    class="product-list"
    data-preserve="List of available products with prices"
>
    <div class="product">
        <h3>Product Name</h3>
        <p>Description</p>
        <span class="price">$99.99</span>
        <button>Add to Cart</button>
    </div>
    <!-- More products... -->
</div>

<!-- Another preserved section -->
<form data-preserve="Contact form with name, email, and message fields">
    <input type="text" name="name" placeholder="Name" />
    <input type="email" name="email" placeholder="Email" />
    <textarea name="message" placeholder="Message"></textarea>
    <button type="submit">Submit</button>
</form>

The data-preserve attribute value should describe the content to help the AI understand what's inside. Preserved sections are included in full detail in the DOM context sent to the AI agent.

const agent = new Atticus({
    clientSecret,
    agent: {
        name: "Shopping Assistant",
        instructions: "Help users find and add products to their cart.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
    },
});

// AI will see the full product list structure and can interact with specific products

Configuration

interface AtticusConfig {
    // Required: OpenAI client secret (ephemeral key)
    clientSecret: string;

    // Required: Agent configuration
    agent: {
        name: string;
        instructions: string;
    };

    // Optional: Voice for the agent (default: 'alloy')
    // Options: 'alloy', 'ash', 'ballad', 'coral', 'echo', 'sage', 'shimmer', 'verse'
    voice?: AtticusVoice;

    // Optional: Language code (default: 'en')
    // Supports: en, es, fr, de, it, pt, ru, ja, ko, zh, hi, ar, and 30+ more
    language?: string;

    // Optional: OpenAI model (default: 'gpt-4o-realtime-preview')
    model?: string;

    // Optional: Auto-greet on connect (default: true)
    autoGreet?: boolean;

    // Optional: Greeting message (default: language-specific greeting)
    greetingMessage?: string;

    // Optional: Debug logging (default: false)
    debug?: boolean;

    // Optional: Disable auto-execution of UI actions (default: false)
    doNotExecuteActions?: boolean;

    // Optional: UI awareness configuration
    ui?: {
        enabled: boolean;
        rootElement: Element;
        autoUpdate?: boolean;
        autoUpdateInterval?: number; // ms, default: 5000
        d2SnapOptions?: {
            maxTokens?: number; // default: 4096
            assignUniqueIDs?: boolean; // default: true
        };
    };
}

Voice Options

Voice Description
alloy Neutral, balanced (default)
ash Soft, gentle
ballad Warm, expressive
coral Clear, friendly
echo Smooth, conversational
sage Calm, wise
shimmer Bright, energetic
verse Articulate, professional

Supported Languages

Atticus supports 40+ languages with native greetings. Set the language option:

const agent = new Atticus({
    clientSecret,
    language: "hi", // Hindi - will greet with "नमस्ते!"
    agent: { name: "Assistant", instructions: "Help users with their tasks." },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
    },
});
Code Language Code Language Code Language
en English ja Japanese pl Polish
hi Hindi ko Korean nl Dutch
es Spanish zh Chinese sv Swedish
fr French ar Arabic da Danish
de German bn Bengali no Norwegian
it Italian ta Tamil fi Finnish
pt Portuguese te Telugu tr Turkish
ru Russian th Thai uk Ukrainian

Events

Event Payload Description
connected - Successfully connected
disconnected - Disconnected
error string Error occurred
statusChange AtticusStatus Connection status changed
conversationStateChange ConversationState Conversation state changed
message Message New message received
historyChange Message[] Conversation history updated
stateChange AtticusState Any state changed
agentStart - Agent started speaking
agentEnd - Agent stopped speaking
userAudio - User audio detected
action UIAction UI action executed (or requested if doNotExecuteActions=true)

UI Action Types

When UI mode is enabled, the agent can perform these actions:

Action Description Example Code
click Click elements document.getElementById('btn').click()
type Enter text document.getElementById('input').value = 'Hello'
scroll Scroll page/elements window.scrollTo(0, 500)
focus Focus form elements document.getElementById('field').focus()
select Select dropdown options document.getElementById('select').value = 'option1'
hover Hover over elements -
navigate Navigate pages window.location.href = '/page'
read Read information (no code) -

API

Methods

  • connect() - Connect to the voice agent
  • disconnect() - Disconnect from the voice agent
  • toggle() - Toggle connection state
  • interrupt() - Interrupt the AI while speaking
  • sendMessage(text) - Send a text message
  • updateDOM(element | html) - Manually update DOM context
  • refreshDOM() - Refresh DOM from root element
  • startAutoUpdate() - Start auto-updating DOM
  • stopAutoUpdate() - Stop auto-updating DOM
  • executeAction(action) - Manually execute a UI action
  • getState() - Get complete state object
  • destroy() - Clean up resources

Properties

  • status - Connection status (idle | connecting | connected | error)
  • conversationState - Conversation state (idle | ai_speaking | user_turn | user_speaking)
  • error - Error message (if any)
  • history - Conversation history
  • isConnected - Is connected
  • isAiSpeaking - Is AI speaking
  • isUserSpeaking - Is user speaking
  • language - Configured language
  • currentDOM - Current DOM context
  • isUIEnabled - Is UI mode enabled

Getting a Client Secret

The client secret (ephemeral key) must be obtained from OpenAI's API. You can get it directly via curl or from your backend.

Option 1: Direct curl (for testing)

curl -X POST "https://api.openai.com/v1/realtime/sessions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-realtime-preview-2024-12-17",
    "voice": "shimmer"
  }'

Response:

{
    "id": "sess_xxx",
    "object": "realtime.session",
    "model": "gpt-4o-realtime-preview-2024-12-17",
    "client_secret": {
        "value": "ek_xxx...",
        "expires_at": 1234567890
    }
}

Copy the client_secret.value and use it with Atticus.

Option 2: Backend (Node.js/Express) - Recommended for Production

app.post("/api/session", async (req, res) => {
    const response = await fetch(
        "https://api.openai.com/v1/realtime/sessions",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                model: "gpt-4o-realtime-preview-2024-12-17",
                voice: "shimmer",
            }),
        }
    );

    const data = await response.json();
    res.json({ clientSecret: data.client_secret.value });
});

Frontend

async function fetchClientSecret() {
    const response = await fetch("/api/session", { method: "POST" });
    const data = await response.json();
    return data.clientSecret;
}

const clientSecret = await fetchClientSecret();
const agent = new Atticus({
    clientSecret,
    agent: {
        name: "Assistant",
        instructions: "Help users interact with the page.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
});

Running the Demo

# Clone the repo
git clone https://github.com/rajnandan1/atticus.git
cd atticus

# Install dependencies
npm install

# Start dev server (builds + serves demo)
npm run dev

# Open http://localhost:3000/demo/

License

MIT

About

A framework-agnostic voice agent library for voice-controlled UI interactions, powered by OpenAI's Realtime API.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published