# @jsilvanus/chattydeer

A Node.js LLM chat toolkit built on top of `@jsilvanus/embedeer`. It provides `Explainer`, `LLMAdapter`, `ChatSession`, prompt utilities, and higher-level agentic helpers for multi-turn tool calling. Recent additions include `createChatProvider`, `runAgentLoop`, and `createOpenAiChatHandler` (an OpenAI `/v1/chat/completions` handler).
## Installation

```sh
npm install @jsilvanus/chattydeer
```

## Quick start

### Explainer

```js
import { Explainer } from '@jsilvanus/chattydeer';

const explainer = await Explainer.create('llama-3.2-3b', { deterministic: true }); // deterministic: stable, reproducible outputs (useful for tests)

const result = await explainer.explain({
  task: 'narrate', // intent: what you want the explainer to produce (e.g. 'summarize', 'narrate')
  domain: 'evolution', // domain helps choose domain-specific phrasing or templates
  context: { filePath: 'src/auth/handler.ts' }, // optional contextual metadata (file path, URL, etc.)
  evidence: [
    // Evidence items: structured blocks the explainer will reason over
    { id: 1, source: 'src/auth/handler.ts', excerpt: '2024-03-15 *** LARGE CHANGE' },
  ],
  maxTokens: 256, // limit the response length (tokens)
});

console.log(result.explanation); // "The auth handler underwent a major rewrite..."

await explainer.destroy();
```

### ChatSession with tools

```js
import { ChatSession } from '@jsilvanus/chattydeer';

const session = await ChatSession.create('llama-3.2-3b', {
  systemPrompt: 'You are a gitsema guide assistant. Use tools to answer questions.',
  tools: [
    {
      name: 'semantic_search',
      description: 'Search the codebase semantically by natural-language query.',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query'],
      },
    },
    {
      name: 'recent_commits',
      description: 'Return the N most recent commits touching a file.',
      parameters: {
        type: 'object',
        properties: {
          filePath: { type: 'string' },
          n: { type: 'number' },
        },
        required: ['filePath'],
      },
    },
  ],
});

// The session runs an agentic loop:
// LLM requests tool calls → ChatSession executes them → results are fed back → repeat
const answer = await session.send('Which files changed most in the last month?', {
  executeTool: async (name, args) => {
    if (name === 'semantic_search') return mySearch(args.query);
    if (name === 'recent_commits') return myCommits(args.filePath, args.n);
    throw new Error(`Unknown tool: ${name}`);
  },
  maxIterations: 10, // safeguard against infinite loops
});
console.log(answer);

// Continue the conversation in the same session
const followUp = await session.send('Can you summarise the top file in one sentence?', {
  executeTool: async (name, args) => { /* ... */ },
});

await session.destroy();
```

## Preloading models

To reduce the latency of the first request, you can preload a model into the underlying embedding/generation layer. Two common options:
- Use `@jsilvanus/embedeer` directly to download and cache a model ahead of time:

  ```js
  import { loadModel } from '@jsilvanus/embedeer';

  // downloads and caches the model files so subsequent creation is fast
  await loadModel('Xenova/all-MiniLM-L6-v2', { token: process.env.HF_TOKEN });
  ```

- Or create and initialize an `LLMAdapter`/`ChatSession` once at startup and reuse it:

  ```js
  import { LLMAdapter, ChatSession } from '@jsilvanus/chattydeer';

  // warm a generator pipeline (keeps it in memory)
  const adapter = await LLMAdapter.create('llama-3.2-3b', { token: process.env.HF_TOKEN });

  // reuse the adapter for sessions
  const session = await ChatSession.create('irrelevant', { adapter });
  // later: session.send(...)
  ```

Both approaches reduce cold-start latency. Use `loadModel()` when you only need to ensure model artifacts are present on disk; use `LLMAdapter.create()` when you want an initialized in-process generator ready to serve requests.
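For the second option, a small memoized async factory keeps exactly one warm adapter even when several callers race to create it at startup. This is a generic sketch (`memoizeAsync` and the stub factory are illustrative, not library exports):

```js
// Memoize an async factory so concurrent callers share one in-flight creation.
// In practice the factory would be () => LLMAdapter.create('llama-3.2-3b', { ... }).
function memoizeAsync(factory) {
  let promise = null;
  return () => (promise ??= factory());
}

// Demo with a stub factory standing in for LLMAdapter.create(...)
let creations = 0;
const getAdapter = memoizeAsync(async () => {
  creations += 1;
  return { generate: async (prompt) => ({ text: `echo: ${prompt}` }) };
});

Promise.all([getAdapter(), getAdapter()]).then(([a, b]) => {
  console.log(creations, a === b); // → 1 true
});
```

Because the promise (not the resolved value) is cached, even requests that arrive while the adapter is still loading share the same instance.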
## API

### Explainer

- `Explainer.create(modelName, opts)` — create an explainer bound to a model
- `explainer.explain(request)` — explain using structured evidence blocks; returns `{ explanation, labels, references, meta }`
- `explainer.destroy()` — release underlying resources
### ChatSession

- `ChatSession.create(modelName, opts)` — create a session; accepts `tools`, `systemPrompt`, `adapter`
- `session.send(userMessage, opts)` — send a message and run the agentic tool-call loop; returns a final plain-text answer
  - `opts.executeTool` — `async (name, args, callId) => result` — called for each tool the LLM requests
  - `opts.maxIterations` — loop guard (default `10`)
  - `opts.maxTokens` — tokens per LLM call (default `512`)
- `session.history` — read-only snapshot of all `ChatMessage` objects in the conversation
- `session.destroy()` — release underlying resources
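Conceptually, the tool-call loop that `session.send` runs looks like the following sketch (an illustrative mock, not the library's implementation; `agentLoop` and `mockModel` are hypothetical names):

```js
// Minimal sketch of an agentic tool-call loop like the one session.send runs.
// `model` is a stub standing in for the LLM; real calls go through the library.
async function agentLoop(model, userMessage, { executeTool, maxIterations = 10 }) {
  const messages = [{ role: 'user', content: userMessage }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await model(messages);
    if (!reply.toolCall) return reply.content; // final plain-text answer
    const result = await executeTool(reply.toolCall.name, reply.toolCall.args);
    messages.push({ role: 'tool', content: JSON.stringify(result) });
  }
  throw new Error('maxIterations exceeded');
}

// Demo with a scripted mock model: one tool call, then a final answer.
const mockModel = (() => {
  let turn = 0;
  return async () =>
    turn++ === 0
      ? { toolCall: { name: 'semantic_search', args: { query: 'auth' } } }
      : { content: 'src/auth/handler.ts changed most.' };
})();

agentLoop(mockModel, 'Which files changed most?', {
  executeTool: async (name, args) => [`hit for ${args.query}`],
}).then((answer) => console.log(answer)); // → src/auth/handler.ts changed most.
```

The `maxIterations` guard is what stops a model that keeps requesting tools from looping forever.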
Additional ChatSession notes:

- `session.append(msg)` — append a `ChatMessage` to the session (useful for programmatic injection of tool or system messages).
- Chat history is kept in-memory on the `ChatSession` instance (the `_history` array). `session.history` returns a snapshot copy. There is no built-in durable persistence; to persist history, serialize `session.history` yourself and replay or rehydrate it into a new `ChatSession`.
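For example, a minimal persistence layer might serialize the history snapshot to JSON and replay it into a fresh session later (a sketch; `saveHistory`/`loadHistory` are hypothetical helpers, not library exports):

```js
import { writeFileSync, readFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Persist a session's history snapshot to disk as JSON.
function saveHistory(history, path) {
  writeFileSync(path, JSON.stringify(history, null, 2), 'utf8');
}

// Load a previously saved history; returns an array of plain message objects.
function loadHistory(path) {
  return JSON.parse(readFileSync(path, 'utf8'));
}

// Rehydrating (sketch): create a fresh session, then replay each message:
//   const session = await ChatSession.create('llama-3.2-3b', { /* ... */ });
//   for (const msg of loadHistory(file)) session.append(msg);

// Round-trip demo with plain message objects:
const file = join(tmpdir(), 'chat-history.json');
saveHistory(
  [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
  file,
);
console.log(loadHistory(file).length); // → 2
```

This works because `session.history` returns plain, JSON-serializable message objects rather than live internal state.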
### LLMAdapter

- `LLMAdapter.create(modelName, opts)` — low-level text-generation adapter
- `adapter.generate(prompt, opts)` — returns `{ text, raw, meta }`
- `adapter.destroy()` — release underlying resources
### Utilities

- `explainForGitsema(payload, opts)` — gitsema-compatible adapter
- `renderTemplate(domain, vars)` — render a domain-specific prompt template
- `estimateTokensFromChars(chars)` / `trimEvidenceForBudget(prelude, evidence, budget)` — prompt utilities
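The two prompt utilities' exact heuristics are internal; a common approximation is roughly 4 characters per token, which this sketch assumes (`estimateTokens` and `trimEvidence` below are illustrative stand-ins, not the library's implementations):

```js
// Rough token estimate from a character count (assumes ~4 chars/token).
function estimateTokens(chars) {
  return Math.ceil(chars / 4);
}

// Keep evidence items in order until prelude + evidence exceed the budget.
function trimEvidence(prelude, evidence, budgetTokens) {
  const kept = [];
  let used = estimateTokens(prelude.length);
  for (const item of evidence) {
    const cost = estimateTokens(item.excerpt.length);
    if (used + cost > budgetTokens) break;
    kept.push(item);
    used += cost;
  }
  return kept;
}

const evidence = [
  { id: 1, excerpt: 'a'.repeat(400) }, // ~100 tokens
  { id: 2, excerpt: 'b'.repeat(400) }, // ~100 tokens
];
console.log(trimEvidence('short prelude', evidence, 150).map((e) => e.id)); // → [ 1 ]
```

Trimming from the tail preserves the highest-priority evidence when items are supplied in order of relevance.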
### Agentic helpers

- `createChatProvider(httpUrl, model, apiKey)` — factory that returns a `ChatCompletionProvider` with `complete()` and `stream()` methods; useful if you need to proxy to an OpenAI-compatible endpoint.
- `runAgentLoop(session, opts)` — run an agentic tool-calling loop using a `ChatCompletionProvider`. Options include `provider`, `tools`, `executeTool`, `maxRoundtrips`, `maxTokens`, `temperature`, `onMessage`, and `redactContent` (a hook that can redact sensitive content before it leaves the process).
- `createOpenAiChatHandler(provider, tools?, executeTool?, opts?)` — returns an Express `RequestHandler` implementing a lightweight subset of OpenAI's `POST /v1/chat/completions` API that maps requests to `runAgentLoop`. The handler honors the `redactContent` hook and is suitable for mounting inside an existing HTTP server.
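A `redactContent` hook is just a string transform applied before content leaves the process; a minimal sketch might mask key-like strings and email addresses (the patterns below are illustrative, not part of the library):

```js
// Illustrative redactContent hook: mask secrets before they leave the process.
function redactContent(text) {
  return text
    // Mask things that look like API keys/tokens (e.g. sk-..., hf_...)
    .replace(/\b(?:sk|hf)[-_][A-Za-z0-9]{8,}\b/g, '[REDACTED_KEY]')
    // Mask email addresses
    .replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[REDACTED_EMAIL]');
}

console.log(redactContent('Contact dev@example.com, key sk-abcdef123456'));
// → Contact [REDACTED_EMAIL], key [REDACTED_KEY]
```

You would pass such a hook via the options, e.g. `runAgentLoop(session, { provider, tools, executeTool, redactContent })`.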
See `explainer-contract.md` for the full `Explainer` interface contract.

## License

MIT
