A framework-agnostic voice agent library for voice-controlled UI interactions, powered by OpenAI's Realtime API.
npm install @rajnandan1/atticus

<!-- Use the IIFE build via unpkg or jsdelivr -->
<script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>
<!-- Or specific version -->
<script src="https://unpkg.com/@rajnandan1/atticus@v1.1.3/dist/index.global.js"></script>
<!-- jsdelivr alternative -->
<script src="https://cdn.jsdelivr.net/npm/@rajnandan1/atticus@latest/dist/index.global.js"></script>

The script tag exposes Atticus globally; see Vanilla HTML Usage below.

import { Atticus } from "@rajnandan1/atticus";
// Get a client secret from your backend (which calls OpenAI's API)
const clientSecret = await fetchClientSecret();
const agent = new Atticus({
clientSecret,
voice: "shimmer", // Optional: alloy, ash, ballad, coral, echo, sage, shimmer, verse
language: "en", // Optional: supports 40+ languages
agent: {
name: "Assistant",
instructions:
"You are a helpful assistant that helps users interact with the UI.",
},
ui: {
enabled: true,
rootElement: document.getElementById("app"),
},
});
// Listen to events
agent.on("connected", () => console.log("Connected!"));
agent.on("message", (msg) => console.log("Message:", msg));
agent.on("error", (err) => console.error("Error:", err));
// Connect and start talking
await agent.connect();
// Disconnect when done
agent.disconnect();

Atticus also works in vanilla HTML/JS through a script tag:

<!DOCTYPE html>
<html>
<head>
<title>Atticus Voice Demo</title>
</head>
<body>
<button id="connectBtn">Connect</button>
<div id="status">Idle</div>
<!-- Include Atticus -->
<script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>
<script>
// Atticus is now available globally
let agent = null;
document
.getElementById("connectBtn")
.addEventListener("click", async () => {
if (agent && agent.isConnected) {
agent.disconnect();
return;
}
// Get client secret from your backend
const response = await fetch("/api/session", {
method: "POST",
});
const { clientSecret } = await response.json();
agent = new Atticus.Atticus({
clientSecret,
agent: {
name: "Assistant",
instructions: "You are a helpful voice assistant.",
},
voice: "shimmer",
language: "en",
ui: {
enabled: true,
rootElement: document.body,
},
});
agent.on("connected", () => {
document.getElementById("status").textContent =
"Connected!";
});
agent.on("message", (msg) => {
console.log("Message:", msg);
});
await agent.connect();
});
</script>
</body>
</html>

See index.html for a complete example.

Enable UI awareness to let users control your interface with voice. Actions are automatically executed by default:
const agent = new Atticus({
clientSecret,
agent: {
name: "UI Assistant",
instructions: "Help users fill out the form on this page.",
},
ui: {
enabled: true,
rootElement: document.getElementById("app")!,
},
});
// Actions are auto-executed! Just listen for logging/feedback
agent.on("action", (action) => {
console.log("Action executed:", action.outputText);
console.log("Code:", action.outputCode);
});
await agent.connect();
// Now say: "Fill the name field with John Doe"
// The library will automatically execute the action!

If you want to handle actions yourself, set doNotExecuteActions to true and call executeAction when you are ready:

const agent = new Atticus({
clientSecret,
agent: { name: "Assistant", instructions: "..." },
doNotExecuteActions: true, // Disable auto-execution
ui: { enabled: true, rootElement: document.body },
});
agent.on("action", async (action) => {
// Validate or modify action before execution
if (action.actionType === "click") {
const result = await agent.executeAction(action);
console.log("Result:", result);
}
});

By default, Atticus compresses the DOM to fit within token limits. To preserve specific sections with their full nested structure (useful for complex components or data that shouldn't be simplified), add the data-preserve attribute:

<!-- This section will maintain its full nested structure -->
<div
class="product-list"
data-preserve="List of available products with prices"
>
<div class="product">
<h3>Product Name</h3>
<p>Description</p>
<span class="price">$99.99</span>
<button>Add to Cart</button>
</div>
<!-- More products... -->
</div>
<!-- Another preserved section -->
<form data-preserve="Contact form with name, email, and message fields">
<input type="text" name="name" placeholder="Name" />
<input type="email" name="email" placeholder="Email" />
<textarea name="message" placeholder="Message"></textarea>
<button type="submit">Submit</button>
</form>

The data-preserve attribute value should describe the content to help the AI understand what's inside. Preserved sections are included in full detail in the DOM context sent to the AI agent.

const agent = new Atticus({
clientSecret,
agent: {
name: "Shopping Assistant",
instructions: "Help users find and add products to their cart.",
},
ui: {
enabled: true,
rootElement: document.getElementById("app"),
},
});
// AI will see the full product list structure and can interact with specific products

interface AtticusConfig {
// Required: OpenAI client secret (ephemeral key)
clientSecret: string;
// Required: Agent configuration
agent: {
name: string;
instructions: string;
};
// Optional: Voice for the agent (default: 'alloy')
// Options: 'alloy', 'ash', 'ballad', 'coral', 'echo', 'sage', 'shimmer', 'verse'
voice?: AtticusVoice;
// Optional: Language code (default: 'en')
// Supports: en, es, fr, de, it, pt, ru, ja, ko, zh, hi, ar, and 30+ more
language?: string;
// Optional: OpenAI model (default: 'gpt-4o-realtime-preview')
model?: string;
// Optional: Auto-greet on connect (default: true)
autoGreet?: boolean;
// Optional: Greeting message (default: language-specific greeting)
greetingMessage?: string;
// Optional: Debug logging (default: false)
debug?: boolean;
// Optional: Disable auto-execution of UI actions (default: false)
doNotExecuteActions?: boolean;
// Optional: UI awareness configuration
ui?: {
enabled: boolean;
rootElement: Element;
autoUpdate?: boolean;
autoUpdateInterval?: number; // ms, default: 5000
d2SnapOptions?: {
maxTokens?: number; // default: 4096
assignUniqueIDs?: boolean; // default: true
};
};
}

| Voice | Description |
|---|---|
| `alloy` | Neutral, balanced (default) |
| `ash` | Soft, gentle |
| `ballad` | Warm, expressive |
| `coral` | Clear, friendly |
| `echo` | Smooth, conversational |
| `sage` | Calm, wise |
| `shimmer` | Bright, energetic |
| `verse` | Articulate, professional |

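
The required fields and the voice names listed above can be checked before the agent is constructed. The following is a minimal sketch, not part of Atticus: `checkAtticusConfig` and `VOICES` are hypothetical names introduced here, and the checks cover only the required fields and the voice option.

```javascript
// Voice names copied from the voice list above.
const VOICES = ["alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse"];

// Hypothetical helper (not an Atticus API): returns a list of problems
// found in a config object, or an empty list if nothing obvious is wrong.
function checkAtticusConfig(config) {
  if (!config || typeof config !== "object") {
    return ["config must be an object"];
  }
  const problems = [];
  if (typeof config.clientSecret !== "string" || config.clientSecret === "") {
    problems.push("clientSecret is required");
  }
  if (!config.agent || typeof config.agent.name !== "string") {
    problems.push("agent.name is required");
  }
  if (!config.agent || typeof config.agent.instructions !== "string") {
    problems.push("agent.instructions is required");
  }
  // voice is optional, so only validate it when it is present.
  if (config.voice !== undefined && !VOICES.includes(config.voice)) {
    problems.push("unknown voice: " + config.voice);
  }
  return problems;
}
```

Running the check before `new Atticus(config)` lets a page show a clear message instead of failing at connect time.
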
Atticus supports 40+ languages with native greetings. Set the language option:
const agent = new Atticus({
clientSecret,
language: "hi", // Hindi - will greet with "नमस्ते!"
agent: { name: "Assistant", instructions: "Help users with their tasks." },
ui: {
enabled: true,
rootElement: document.getElementById("app"),
},
});

| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| `en` | English | `ja` | Japanese | `pl` | Polish |
| `hi` | Hindi | `ko` | Korean | `nl` | Dutch |
| `es` | Spanish | `zh` | Chinese | `sv` | Swedish |
| `fr` | French | `ar` | Arabic | `da` | Danish |
| `de` | German | `bn` | Bengali | `no` | Norwegian |
| `it` | Italian | `ta` | Tamil | `fi` | Finnish |
| `pt` | Portuguese | `te` | Telugu | `tr` | Turkish |
| `ru` | Russian | `th` | Thai | `uk` | Ukrainian |

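
One way to choose the language option is to derive it from the browser locale. The sketch below is an illustration rather than library behavior: `pickLanguage` and `SUPPORTED_LANGUAGES` are hypothetical names, and the list contains only the codes shown in the table above, not the full set Atticus supports.

```javascript
// Codes copied from the language table above (a subset of what Atticus supports).
const SUPPORTED_LANGUAGES = [
  "en", "hi", "es", "fr", "de", "it", "pt", "ru",
  "ja", "ko", "zh", "ar", "bn", "ta", "te", "th",
  "pl", "nl", "sv", "da", "no", "fi", "tr", "uk",
];

// Hypothetical helper: reduce a locale such as "pt-BR" to its base code
// and fall back to a default when the base code is not in the list.
function pickLanguage(locale, fallback = "en") {
  if (typeof locale !== "string") return fallback;
  const base = locale.toLowerCase().split(/[-_]/)[0];
  return SUPPORTED_LANGUAGES.includes(base) ? base : fallback;
}
```

In a browser this could be called as `pickLanguage(navigator.language)` and the result passed as `language` in the config.
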
| Event | Payload | Description |
|---|---|---|
| `connected` | - | Successfully connected |
| `disconnected` | - | Disconnected |
| `error` | `string` | Error occurred |
| `statusChange` | `AtticusStatus` | Connection status changed |
| `conversationStateChange` | `ConversationState` | Conversation state changed |
| `message` | `Message` | New message received |
| `historyChange` | `Message[]` | Conversation history updated |
| `stateChange` | `AtticusState` | Any state changed |
| `agentStart` | - | Agent started speaking |
| `agentEnd` | - | Agent stopped speaking |
| `userAudio` | - | User audio detected |
| `action` | `UIAction` | UI action executed (or requested if doNotExecuteActions=true) |

When UI mode is enabled, the agent can perform these actions:
| Action | Description | Example Code |
|---|---|---|
| `click` | Click elements | `document.getElementById('btn').click()` |
| `type` | Enter text | `document.getElementById('input').value = 'Hello'` |
| `scroll` | Scroll page/elements | `window.scrollTo(0, 500)` |
| `focus` | Focus form elements | `document.getElementById('field').focus()` |
| `select` | Select dropdown options | `document.getElementById('select').value = 'option1'` |
| `hover` | Hover over elements | - |
| `navigate` | Navigate pages | `window.location.href = '/page'` |
| `read` | Read information (no code) | - |

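
When actions are handled manually, the action types listed above can serve as an allow-list. This is a minimal sketch under stated assumptions: `isActionAllowed` and `SAFE_ACTION_TYPES` are hypothetical names, the choice to leave out `navigate` is only an example policy, and the only field read from the action is `actionType`.

```javascript
// Example policy: allow every listed action type except "navigate".
const SAFE_ACTION_TYPES = new Set([
  "click", "type", "scroll", "focus", "select", "hover", "read",
]);

// Hypothetical helper: true only when the action carries an allowed actionType.
function isActionAllowed(action, allowed = SAFE_ACTION_TYPES) {
  return Boolean(action) &&
    typeof action.actionType === "string" &&
    allowed.has(action.actionType);
}

// Usage sketch (needs an agent created with doNotExecuteActions: true):
// agent.on("action", async (action) => {
//   if (isActionAllowed(action)) await agent.executeAction(action);
// });
```
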
Methods:

- `connect()` - Connect to the voice agent
- `disconnect()` - Disconnect from the voice agent
- `toggle()` - Toggle connection state
- `interrupt()` - Interrupt the AI while speaking
- `sendMessage(text)` - Send a text message
- `updateDOM(element | html)` - Manually update DOM context
- `refreshDOM()` - Refresh DOM from root element
- `startAutoUpdate()` - Start auto-updating DOM
- `stopAutoUpdate()` - Stop auto-updating DOM
- `executeAction(action)` - Manually execute a UI action
- `getState()` - Get complete state object
- `destroy()` - Clean up resources

Properties:

- `status` - Connection status (`idle` | `connecting` | `connected` | `error`)
- `conversationState` - Conversation state (`idle` | `ai_speaking` | `user_turn` | `user_speaking`)
- `error` - Error message (if any)
- `history` - Conversation history
- `isConnected` - Is connected
- `isAiSpeaking` - Is AI speaking
- `isUserSpeaking` - Is user speaking
- `language` - Configured language
- `currentDOM` - Current DOM context
- `isUIEnabled` - Is UI mode enabled

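
The status and conversation state values above can be turned into a label for a status indicator. The sketch below assumes the state object carries `status`, `conversationState`, and `error` fields named like the properties listed here; that shape is an assumption, and `describeState` is a hypothetical helper, not an Atticus API.

```javascript
// Hypothetical helper: map a state object to a short label.
// Assumes fields named status, conversationState, and error.
function describeState(state) {
  if (state.status === "error") {
    return "Error: " + (state.error || "unknown");
  }
  if (state.status === "connecting") return "Connecting...";
  if (state.status !== "connected") return "Idle";
  switch (state.conversationState) {
    case "ai_speaking":
      return "Assistant is speaking";
    case "user_speaking":
      return "Listening";
    case "user_turn":
      return "Your turn";
    default:
      return "Connected";
  }
}

// Usage sketch (assumes the stateChange payload has the fields above):
// agent.on("stateChange", (state) => {
//   document.getElementById("status").textContent = describeState(state);
// });
```
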
The client secret (ephemeral key) must be obtained from OpenAI's API. You can get it directly via curl or from your backend.
curl -X POST "https://api.openai.com/v1/realtime/sessions" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-realtime-preview-2024-12-17",
"voice": "shimmer"
}'

Response:

{
"id": "sess_xxx",
"object": "realtime.session",
"model": "gpt-4o-realtime-preview-2024-12-17",
"client_secret": {
"value": "ek_xxx...",
"expires_at": 1234567890
}
}

Copy the client_secret.value and use it with Atticus.

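
The response shape above can be unpacked defensively before the secret is handed to the browser. This is a minimal sketch, assuming `expires_at` is a Unix timestamp in seconds; `extractClientSecret` is a hypothetical helper and not part of Atticus or the OpenAI SDK.

```javascript
// Hypothetical helper: pull client_secret.value out of a session response
// and reject responses that are malformed or already expired.
// Assumption: expires_at is a Unix timestamp in seconds.
function extractClientSecret(session, nowSeconds = Math.floor(Date.now() / 1000)) {
  const secret = session && session.client_secret;
  if (!secret || typeof secret.value !== "string") {
    throw new Error("Session response has no client_secret.value");
  }
  if (typeof secret.expires_at === "number" && secret.expires_at <= nowSeconds) {
    throw new Error("Client secret has already expired");
  }
  return secret.value;
}
```

An error response from the API typically has no `client_secret` field, so this check turns that case into an explicit error instead of a `TypeError` on property access.
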
app.post("/api/session", async (req, res) => {
const response = await fetch(
"https://api.openai.com/v1/realtime/sessions",
{
method: "POST",
headers: {
Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gpt-4o-realtime-preview-2024-12-17",
voice: "shimmer",
}),
}
);
const data = await response.json();
res.json({ clientSecret: data.client_secret.value });
});

async function fetchClientSecret() {
const response = await fetch("/api/session", { method: "POST" });
const data = await response.json();
return data.clientSecret;
}
const clientSecret = await fetchClientSecret();
const agent = new Atticus({
clientSecret,
agent: {
name: "Assistant",
instructions: "Help users interact with the page.",
},
ui: {
enabled: true,
rootElement: document.getElementById("app"),
},
});

# Clone the repo
git clone https://github.com/rajnandan1/atticus.git
cd atticus
# Install dependencies
npm install
# Start dev server (builds + serves demo)
npm run dev
# Open http://localhost:3000/demo/