Atticus

A framework-agnostic voice agent library for voice-controlled UI interactions, powered by OpenAI's Realtime API.

Installation

npm/yarn

npm install @rajnandan1/atticus
# or
yarn add @rajnandan1/atticus

CDN (Vanilla HTML/JS)

<!-- Use the IIFE build via unpkg or jsdelivr -->
<script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>

<!-- Or pin a specific version -->
<script src="https://unpkg.com/@rajnandan1/atticus@1.1.3/dist/index.global.js"></script>

<!-- jsdelivr alternative -->
<script src="https://cdn.jsdelivr.net/npm/@rajnandan1/atticus@latest/dist/index.global.js"></script>

The script tag exposes a global Atticus namespace, with the class available as Atticus.Atticus. See Vanilla HTML/JS Usage below.

Quick Start

import { Atticus } from "@rajnandan1/atticus";

// Get a client secret from your backend (which calls OpenAI's API)
const clientSecret = await fetchClientSecret();

const agent = new Atticus({
    clientSecret,
    voice: "shimmer", // Optional: alloy, ash, ballad, coral, echo, sage, shimmer, verse
    language: "en", // Optional: supports 40+ languages
    agent: {
        name: "Assistant",
        instructions:
            "You are a helpful assistant that helps users interact with the UI.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
    },
});

// Listen to events
agent.on("connected", () => console.log("Connected!"));
agent.on("message", (msg) => console.log("Message:", msg));
agent.on("error", (err) => console.error("Error:", err));

// Connect and start talking
await agent.connect();

// Disconnect when done
agent.disconnect();

Vanilla HTML/JS Usage

Atticus also works in plain HTML/JS, loaded through a script tag:

<!DOCTYPE html>
<html>
    <head>
        <title>Atticus Voice Demo</title>
    </head>
    <body>
        <button id="connectBtn">Connect</button>
        <div id="status">Idle</div>

        <!-- Include Atticus -->
        <script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>

        <script>
            // Atticus is now available globally
            let agent = null;

            document
                .getElementById("connectBtn")
                .addEventListener("click", async () => {
                    if (agent && agent.isConnected) {
                        agent.disconnect();
                        return;
                    }

                    // Get client secret from your backend
                    const response = await fetch("/api/session", {
                        method: "POST",
                    });
                    const { clientSecret } = await response.json();

                    agent = new Atticus.Atticus({
                        clientSecret,
                        agent: {
                            name: "Assistant",
                            instructions: "You are a helpful voice assistant.",
                        },
                        voice: "shimmer",
                        language: "en",
                        ui: {
                            enabled: true,
                            rootElement: document.body,
                        },
                    });

                    agent.on("connected", () => {
                        document.getElementById("status").textContent =
                            "Connected!";
                    });

                    agent.on("message", (msg) => {
                        console.log("Message:", msg);
                    });

                    await agent.connect();
                });
        </script>
    </body>
</html>

See index.html for a complete example.

UI-Aware Mode

Enable UI awareness to let users control your interface with voice. Actions are automatically executed by default:

const agent = new Atticus({
    clientSecret,
    agent: {
        name: "UI Assistant",
        instructions: "Help users fill out the form on this page.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app")!,
    },
});

// Actions are auto-executed! Just listen for logging/feedback
agent.on("action", (action) => {
    console.log("Action executed:", action.outputText);
    console.log("Code:", action.outputCode);
});

await agent.connect();

// Now say: "Fill the name field with John Doe"
// The library will automatically execute the action!

Manual Action Execution

If you want to handle actions yourself:

const agent = new Atticus({
    clientSecret,
    agent: { name: "Assistant", instructions: "..." },
    doNotExecuteActions: true, // Disable auto-execution
    ui: { enabled: true, rootElement: document.body },
});

agent.on("action", async (action) => {
    // Validate or modify action before execution
    if (action.actionType === "click") {
        const result = await agent.executeAction(action);
        console.log("Result:", result);
    }
});
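
One way to use manual mode is to gate actions behind an allow-list before calling executeAction. The sketch below is library-independent and only assumes the actionType, outputText, and outputCode fields used above; UIActionLike, ALLOWED_TYPES, and isActionAllowed are hypothetical names, not part of Atticus.

```typescript
// Hypothetical minimal shape; the real UIAction type may carry more fields.
interface UIActionLike {
    actionType: string;
    outputText?: string;
    outputCode?: string;
}

// Action types this page is willing to run without asking the user.
const ALLOWED_TYPES: string[] = ["click", "type", "focus", "scroll", "select"];

// Returns true only for allow-listed action types whose generated code
// does not touch window.location (i.e. does not navigate away).
function isActionAllowed(action: UIActionLike): boolean {
    if (ALLOWED_TYPES.indexOf(action.actionType) === -1) {
        return false;
    }
    const code = action.outputCode ?? "";
    return code.indexOf("window.location") === -1;
}
```

Inside the action handler, agent.executeAction(action) would then be called only when isActionAllowed(action) returns true. The substring check is deliberately crude and is no substitute for a real review of the generated code.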

Preserving Nested DOM with `data-preserve`

By default, Atticus compresses the DOM to fit within token limits. To preserve specific sections with their full nested structure (useful for complex components or data that shouldn't be simplified), add the data-preserve attribute:

<!-- This section will maintain its full nested structure -->
<div
    class="product-list"
    data-preserve="List of available products with prices"
>
    <div class="product">
        <h3>Product Name</h3>
        <p>Description</p>
        <span class="price">$99.99</span>
        <button>Add to Cart</button>
    </div>
    <!-- More products... -->
</div>

<!-- Another preserved section -->
<form data-preserve="Contact form with name, email, and message fields">
    <input type="text" name="name" placeholder="Name" />
    <input type="email" name="email" placeholder="Email" />
    <textarea name="message" placeholder="Message"></textarea>
    <button type="submit">Submit</button>
</form>

The data-preserve attribute value should describe the content to help the AI understand what's inside. Preserved sections are included in full detail in the DOM context sent to the AI agent.

const agent = new Atticus({
    clientSecret,
    agent: {
        name: "Shopping Assistant",
        instructions: "Help users find and add products to their cart.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
    },
});

// AI will see the full product list structure and can interact with specific products
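
For sections that are rendered dynamically, the attribute can also be set from script. The helper below is a minimal sketch: markPreserved and AttributeTarget are hypothetical names, and the only thing taken from this README is the data-preserve attribute itself.

```typescript
// Minimal structural type so the helper accepts any DOM Element (or a test double).
interface AttributeTarget {
    setAttribute(name: string, value: string): void;
}

// Sets data-preserve to a trimmed, non-empty description and returns the value written.
function markPreserved(target: AttributeTarget, description: string): string {
    const text = description.trim();
    if (text.length === 0) {
        throw new Error("data-preserve needs a description of the section's content");
    }
    target.setAttribute("data-preserve", text);
    return text;
}
```

In a browser this could be called as markPreserved(document.querySelector(".product-list")!, "List of available products with prices"). Whether Atticus picks up attributes added after connecting is not stated here, so calling agent.refreshDOM() afterwards is probably the safer choice.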

Configuration

interface AtticusConfig {
    // Required: OpenAI client secret (ephemeral key)
    clientSecret: string;

    // Required: Agent configuration
    agent: {
        name: string;
        instructions: string;
    };

    // Optional: Voice for the agent (default: 'alloy')
    // Options: 'alloy', 'ash', 'ballad', 'coral', 'echo', 'sage', 'shimmer', 'verse'
    voice?: AtticusVoice;

    // Optional: Language code (default: 'en')
    // Supports: en, es, fr, de, it, pt, ru, ja, ko, zh, hi, ar, and 30+ more
    language?: string;

    // Optional: OpenAI model (default: 'gpt-4o-realtime-preview')
    model?: string;

    // Optional: Auto-greet on connect (default: true)
    autoGreet?: boolean;

    // Optional: Greeting message (default: language-specific greeting)
    greetingMessage?: string;

    // Optional: Debug logging (default: false)
    debug?: boolean;

    // Optional: Disable auto-execution of UI actions (default: false)
    doNotExecuteActions?: boolean;

    // Optional: UI awareness configuration
    ui?: {
        enabled: boolean;
        rootElement: Element;
        autoUpdate?: boolean;
        autoUpdateInterval?: number; // ms, default: 5000
        d2SnapOptions?: {
            maxTokens?: number; // default: 4096
            assignUniqueIDs?: boolean; // default: true
        };
    };
}
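
To make the documented defaults concrete, the sketch below resolves a partial configuration into the values described in the comments above. ConfigSketch and resolveDefaults are illustrative names only; the library applies its own defaults internally and may do so differently.

```typescript
// Trimmed-down, hypothetical copy of the optional scalar fields documented above.
interface ConfigSketch {
    voice?: string;
    language?: string;
    model?: string;
    autoGreet?: boolean;
    debug?: boolean;
    doNotExecuteActions?: boolean;
}

// Defaults as stated in the AtticusConfig comments.
const DOCUMENTED_DEFAULTS = {
    voice: "alloy",
    language: "en",
    model: "gpt-4o-realtime-preview",
    autoGreet: true,
    debug: false,
    doNotExecuteActions: false,
};

// Fills every omitted option with its documented default.
function resolveDefaults(config: ConfigSketch): typeof DOCUMENTED_DEFAULTS {
    return {
        voice: config.voice ?? DOCUMENTED_DEFAULTS.voice,
        language: config.language ?? DOCUMENTED_DEFAULTS.language,
        model: config.model ?? DOCUMENTED_DEFAULTS.model,
        autoGreet: config.autoGreet ?? DOCUMENTED_DEFAULTS.autoGreet,
        debug: config.debug ?? DOCUMENTED_DEFAULTS.debug,
        doNotExecuteActions:
            config.doNotExecuteActions ?? DOCUMENTED_DEFAULTS.doNotExecuteActions,
    };
}
```

For example, resolveDefaults({ voice: "shimmer", autoGreet: false }) keeps the two explicit values and fills in language "en" and the default model.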

Voice Options

| Voice | Description |
| --- | --- |
| `alloy` | Neutral, balanced (default) |
| `ash` | Soft, gentle |
| `ballad` | Warm, expressive |
| `coral` | Clear, friendly |
| `echo` | Smooth, conversational |
| `sage` | Calm, wise |
| `shimmer` | Bright, energetic |
| `verse` | Articulate, professional |

Supported Languages

Atticus supports 40+ languages with native greetings. Set the language option:

const agent = new Atticus({
    clientSecret,
    language: "hi", // Hindi - will greet with "नमस्ते!"
    agent: { name: "Assistant", instructions: "Help users with their tasks." },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
    },
});

| Code | Language | Code | Language | Code | Language |
| --- | --- | --- | --- | --- | --- |
| `en` | English | `ja` | Japanese | `pl` | Polish |
| `hi` | Hindi | `ko` | Korean | `nl` | Dutch |
| `es` | Spanish | `zh` | Chinese | `sv` | Swedish |
| `fr` | French | `ar` | Arabic | `da` | Danish |
| `de` | German | `bn` | Bengali | `no` | Norwegian |
| `it` | Italian | `ta` | Tamil | `fi` | Finnish |
| `pt` | Portuguese | `te` | Telugu | `tr` | Turkish |
| `ru` | Russian | `th` | Thai | `uk` | Ukrainian |
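
The language option can be derived from the visitor's browser locale. The sketch below maps a locale tag such as "pt-BR" to one of the codes listed above and falls back to English otherwise; pickLanguage and LISTED_CODES are hypothetical helpers, and the list only covers the codes shown in this table, not the full set Atticus supports.

```typescript
// Codes copied from the table above.
const LISTED_CODES: string[] = [
    "en", "hi", "es", "fr", "de", "it", "pt", "ru",
    "ja", "ko", "zh", "ar", "bn", "ta", "te", "th",
    "pl", "nl", "sv", "da", "no", "fi", "tr", "uk",
];

// Maps a locale tag such as "pt-BR" or "zh_CN" to a listed two-letter code,
// or returns the fallback when the primary subtag is not listed.
function pickLanguage(locale: string, fallback: string = "en"): string {
    const primary = locale.toLowerCase().split(/[-_]/)[0];
    return LISTED_CODES.indexOf(primary) !== -1 ? primary : fallback;
}
```

In a browser this could be wired up as language: pickLanguage(navigator.language). Tags whose primary subtag differs from the listed code (for example "nb-NO" for Norwegian) fall through to the fallback in this simple version.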

Events

| Event | Payload | Description |
| --- | --- | --- |
| `connected` | - | Successfully connected |
| `disconnected` | - | Disconnected |
| `error` | `string` | Error occurred |
| `statusChange` | `AtticusStatus` | Connection status changed |
| `conversationStateChange` | `ConversationState` | Conversation state changed |
| `message` | `Message` | New message received |
| `historyChange` | `Message[]` | Conversation history updated |
| `stateChange` | `AtticusState` | Any state changed |
| `agentStart` | - | Agent started speaking |
| `agentEnd` | - | Agent stopped speaking |
| `userAudio` | - | User audio detected |
| `action` | `UIAction` | UI action executed (or requested if `doNotExecuteActions` is `true`) |
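
The payload-free events are enough to drive a small status indicator. The reducer below is a sketch under assumptions: the event names come from the table above, while ViewState, INITIAL_VIEW, and applyEvent are hypothetical, and treating userAudio as "the user is speaking" is an interpretation of "User audio detected".

```typescript
// Subset of the events above that carry no payload.
type AgentEvent = "connected" | "disconnected" | "agentStart" | "agentEnd" | "userAudio";

// What a status indicator needs to render.
interface ViewState {
    connected: boolean;
    speaker: "none" | "agent" | "user";
}

const INITIAL_VIEW: ViewState = { connected: false, speaker: "none" };

// Pure reducer: returns the next view state without mutating the previous one.
function applyEvent(state: ViewState, event: AgentEvent): ViewState {
    switch (event) {
        case "connected":
            return { connected: true, speaker: "none" };
        case "disconnected":
            return { connected: false, speaker: "none" };
        case "agentStart":
            return { connected: state.connected, speaker: "agent" };
        case "agentEnd":
            return { connected: state.connected, speaker: "none" };
        case "userAudio":
            return { connected: state.connected, speaker: "user" };
    }
}
```

Each agent.on(...) handler would then call applyEvent with its own event name and re-render from the result.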

UI Action Types

When UI mode is enabled, the agent can perform these actions:

| Action | Description | Example Code |
| --- | --- | --- |
| `click` | Click elements | `document.getElementById('btn').click()` |
| `type` | Enter text | `document.getElementById('input').value = 'Hello'` |
| `scroll` | Scroll page/elements | `window.scrollTo(0, 500)` |
| `focus` | Focus form elements | `document.getElementById('field').focus()` |
| `select` | Select dropdown options | `document.getElementById('select').value = 'option1'` |
| `hover` | Hover over elements | - |
| `navigate` | Navigate pages | `window.location.href = '/page'` |
| `read` | Read information (no code) | - |

API

Methods

connect() - Connect to the voice agent
disconnect() - Disconnect from the voice agent
toggle() - Toggle connection state
interrupt() - Interrupt the AI while speaking
sendMessage(text) - Send a text message
updateDOM(element | html) - Manually update DOM context
refreshDOM() - Refresh DOM from root element
startAutoUpdate() - Start auto-updating DOM
stopAutoUpdate() - Stop auto-updating DOM
executeAction(action) - Manually execute a UI action
getState() - Get complete state object
destroy() - Clean up resources

Properties

status - Connection status (idle | connecting | connected | error)
conversationState - Conversation state (idle | ai_speaking | user_turn | user_speaking)
error - Error message (if any)
history - Conversation history
isConnected - Is connected
isAiSpeaking - Is AI speaking
isUserSpeaking - Is user speaking
language - Configured language
currentDOM - Current DOM context
isUIEnabled - Is UI mode enabled

Getting a Client Secret

The client secret (ephemeral key) must be obtained from OpenAI's API. You can get it directly via curl or from your backend.

Option 1: Direct curl (for testing)

curl -X POST "https://api.openai.com/v1/realtime/sessions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-realtime-preview-2024-12-17",
    "voice": "shimmer"
  }'

Response:

{
    "id": "sess_xxx",
    "object": "realtime.session",
    "model": "gpt-4o-realtime-preview-2024-12-17",
    "client_secret": {
        "value": "ek_xxx...",
        "expires_at": 1234567890
    }
}

Copy the client_secret.value and use it with Atticus.
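
Because the secret is ephemeral, it helps to validate the response and track its expiry before handing it to Atticus. The sketch below only relies on the client_secret.value and client_secret.expires_at fields shown in the response above; extractClientSecret and isExpired are hypothetical helpers, and expires_at is assumed to be Unix time in seconds.

```typescript
// Only the response fields this helper reads.
interface SessionResponse {
    client_secret?: { value?: string; expires_at?: number };
}

interface ClientSecret {
    value: string;
    expiresAt: number; // assumed: Unix time in seconds
}

// Pulls the secret out of the session response, failing loudly if it is missing.
function extractClientSecret(body: SessionResponse): ClientSecret {
    const secret = body.client_secret;
    if (!secret || typeof secret.value !== "string" || typeof secret.expires_at !== "number") {
        throw new Error("Response does not contain client_secret.value and expires_at");
    }
    return { value: secret.value, expiresAt: secret.expires_at };
}

// True when the secret has expired, or will expire within skewSeconds.
function isExpired(secret: ClientSecret, nowMs: number, skewSeconds: number = 5): boolean {
    return nowMs / 1000 >= secret.expiresAt - skewSeconds;
}
```

A caller might check isExpired(secret, Date.now()) right before agent.connect() and request a fresh secret when it returns true.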

Option 2: Backend (Node.js/Express) - Recommended for Production

app.post("/api/session", async (req, res) => {
    const response = await fetch(
        "https://api.openai.com/v1/realtime/sessions",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                model: "gpt-4o-realtime-preview-2024-12-17",
                voice: "shimmer",
            }),
        }
    );

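    // Surface upstream failures instead of reading client_secret from an error body
    if (!response.ok) {
        return res
            .status(response.status)
            .json({ error: "Failed to create realtime session" });
    }

    const data = await response.json();
    res.json({ clientSecret: data.client_secret.value });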
});

Frontend

async function fetchClientSecret() {
    const response = await fetch("/api/session", { method: "POST" });
    const data = await response.json();
    return data.clientSecret;
}

const clientSecret = await fetchClientSecret();
const agent = new Atticus({
    clientSecret,
    agent: {
        name: "Assistant",
        instructions: "Help users interact with the page.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app"),
    },
});

Running the Demo

# Clone the repo
git clone https://github.com/rajnandan1/atticus.git
cd atticus

# Install dependencies
npm install

# Start dev server (builds + serves demo)
npm run dev

# Open http://localhost:3000/demo/

License

MIT
