# @jsilvanus/chattydeer

A Node.js LLM chat toolkit built on top of `@jsilvanus/embedeer`. It provides `Explainer`, `LLMAdapter`, `ChatSession`, prompt utilities, and higher-level agentic helpers for multi-turn tool calling. Recent additions include `createChatProvider`, `runAgentLoop`, and `createOpenAiChatHandler` (an OpenAI `/v1/chat/completions` handler).
## Installation

```sh
npm install @jsilvanus/chattydeer
```

## Quick start

### Explainer

```js
import { Explainer } from '@jsilvanus/chattydeer';

const explainer = await Explainer.create('llama-3.2-3b', { deterministic: true }); // deterministic: stable, reproducible outputs (useful for tests)

const result = await explainer.explain({
  task: 'narrate', // intent: what you want the explainer to produce (e.g. 'summarize', 'narrate')
  domain: 'evolution', // domain helps choose domain-specific phrasing or templates
  context: { filePath: 'src/auth/handler.ts' }, // optional contextual metadata (file path, URL, etc.)
  evidence: [
    // Evidence items: structured blocks the explainer will reason over
    { id: 1, source: 'src/auth/handler.ts', excerpt: '2024-03-15 *** LARGE CHANGE' },
  ],
  maxTokens: 256, // limit the response length (tokens)
});

console.log(result.explanation); // "The auth handler underwent a major rewrite..."

await explainer.destroy();
```

### ChatSession with tools

```js
import { ChatSession } from '@jsilvanus/chattydeer';

const session = await ChatSession.create('llama-3.2-3b', {
  systemPrompt: 'You are a gitsema guide assistant. Use tools to answer questions.',
  tools: [
    {
      name: 'semantic_search',
      description: 'Search the codebase semantically by natural-language query.',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query'],
      },
    },
    {
      name: 'recent_commits',
      description: 'Return the N most recent commits touching a file.',
      parameters: {
        type: 'object',
        properties: {
          filePath: { type: 'string' },
          n: { type: 'number' },
        },
        required: ['filePath'],
      },
    },
  ],
});

// The session runs an agentic loop:
// LLM requests tool calls → ChatSession executes them → results are fed back → repeat
const answer = await session.send('Which files changed most in the last month?', {
  executeTool: async (name, args) => {
    if (name === 'semantic_search') return mySearch(args.query);
    if (name === 'recent_commits') return myCommits(args.filePath, args.n);
    throw new Error(`Unknown tool: ${name}`);
  },
  maxIterations: 10, // safeguard against infinite loops
});
console.log(answer);

// Continue the conversation in the same session
const followUp = await session.send('Can you summarise the top file in one sentence?', {
  executeTool: async (name, args) => { /* ... */ },
});

await session.destroy();
```

## Preloading models

To reduce the latency of the first request, you can preload a model into the underlying embedding/generation layer. Two common options:
- Use `@jsilvanus/embedeer` directly to download and cache a model ahead of time:

  ```js
  import { loadModel } from '@jsilvanus/embedeer';

  // downloads and caches the model files so subsequent creation is fast
  await loadModel('Xenova/all-MiniLM-L6-v2', { token: process.env.HF_TOKEN });
  ```

- Or create and initialize an `LLMAdapter`/`ChatSession` once at startup and reuse it:

  ```js
  import { LLMAdapter, ChatSession } from '@jsilvanus/chattydeer';

  // warm a generator pipeline (keeps it in memory)
  const adapter = await LLMAdapter.create('llama-3.2-3b', { token: process.env.HF_TOKEN });

  // reuse the adapter for sessions
  const session = await ChatSession.create('irrelevant', { adapter });
  // later: session.send(...)
  ```

Both approaches reduce cold-start latency. Use `loadModel()` when you only need to ensure model artifacts are present on disk; use `LLMAdapter.create()` when you want an initialized in-process generator ready to serve requests.
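For the second option, a small memoized async factory keeps exactly one warm adapter even when several callers race to create it at startup. This is a generic sketch (`memoizeAsync` and the stub factory are illustrative, not library exports):

```js
// Memoize an async factory so concurrent callers share one in-flight creation.
// In practice the factory would be () => LLMAdapter.create('llama-3.2-3b', { ... }).
function memoizeAsync(factory) {
  let promise = null;
  return () => (promise ??= factory());
}

// Demo with a stub factory standing in for LLMAdapter.create(...)
let creations = 0;
const getAdapter = memoizeAsync(async () => {
  creations += 1;
  return { generate: async (prompt) => ({ text: `echo: ${prompt}` }) };
});

Promise.all([getAdapter(), getAdapter()]).then(([a, b]) => {
  console.log(creations, a === b); // → 1 true
});
```

Because the promise (not the resolved value) is cached, even requests that arrive while the adapter is still loading share the same instance.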
## API

### Explainer

- `Explainer.create(modelName, opts)` — create an explainer bound to a model
- `explainer.explain(request)` — explain using structured evidence blocks; returns `{ explanation, labels, references, meta }`
- `explainer.destroy()` — release underlying resources
### ChatSession

- `ChatSession.create(modelName, opts)` — create a session; accepts `tools`, `systemPrompt`, `adapter`
- `session.send(userMessage, opts)` — send a message and run the agentic tool-call loop; returns a final plain-text answer
  - `opts.executeTool` — `async (name, args, callId) => result` — called for each tool the LLM requests
  - `opts.maxIterations` — loop guard (default `10`)
  - `opts.maxTokens` — tokens per LLM call (default `512`)
- `session.history` — read-only snapshot of all `ChatMessage` objects in the conversation
- `session.destroy()` — release underlying resources
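Conceptually, the tool-call loop that `session.send` runs looks like the following sketch (an illustrative mock, not the library's implementation; `agentLoop` and `mockModel` are hypothetical names):

```js
// Minimal sketch of an agentic tool-call loop like the one session.send runs.
// `model` is a stub standing in for the LLM; real calls go through the library.
async function agentLoop(model, userMessage, { executeTool, maxIterations = 10 }) {
  const messages = [{ role: 'user', content: userMessage }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await model(messages);
    if (!reply.toolCall) return reply.content; // final plain-text answer
    const result = await executeTool(reply.toolCall.name, reply.toolCall.args);
    messages.push({ role: 'tool', content: JSON.stringify(result) });
  }
  throw new Error('maxIterations exceeded');
}

// Demo with a scripted mock model: one tool call, then a final answer.
const mockModel = (() => {
  let turn = 0;
  return async () =>
    turn++ === 0
      ? { toolCall: { name: 'semantic_search', args: { query: 'auth' } } }
      : { content: 'src/auth/handler.ts changed most.' };
})();

agentLoop(mockModel, 'Which files changed most?', {
  executeTool: async (name, args) => [`hit for ${args.query}`],
}).then((answer) => console.log(answer)); // → src/auth/handler.ts changed most.
```

The `maxIterations` guard is what stops a model that keeps requesting tools from looping forever.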
Additional ChatSession notes:

- `session.append(msg)` — append a `ChatMessage` to the session (useful for programmatic injection of tool or system messages).
- Chat history is kept in-memory on the `ChatSession` instance (the `_history` array). `session.history` returns a snapshot copy. There is no built-in durable persistence; to persist history, serialize `session.history` yourself and replay or rehydrate it into a new `ChatSession`.
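For example, a minimal persistence layer might serialize the history snapshot to JSON and replay it into a fresh session later (a sketch; `saveHistory`/`loadHistory` are hypothetical helpers, not library exports):

```js
import { writeFileSync, readFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Persist a session's history snapshot to disk as JSON.
function saveHistory(history, path) {
  writeFileSync(path, JSON.stringify(history, null, 2), 'utf8');
}

// Load a previously saved history; returns an array of plain message objects.
function loadHistory(path) {
  return JSON.parse(readFileSync(path, 'utf8'));
}

// Rehydrating (sketch): create a fresh session, then replay each message:
//   const session = await ChatSession.create('llama-3.2-3b', { /* ... */ });
//   for (const msg of loadHistory(file)) session.append(msg);

// Round-trip demo with plain message objects:
const file = join(tmpdir(), 'chat-history.json');
saveHistory(
  [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
  file,
);
console.log(loadHistory(file).length); // → 2
```

This works because `session.history` returns plain, JSON-serializable message objects rather than live internal state.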
### LLMAdapter

- `LLMAdapter.create(modelName, opts)` — low-level text-generation adapter
- `adapter.generate(prompt, opts)` — returns `{ text, raw, meta }`
- `adapter.destroy()` — release underlying resources
### Utilities

- `explainForGitsema(payload, opts)` — gitsema-compatible adapter
- `renderTemplate(domain, vars)` — render a domain-specific prompt template
- `estimateTokensFromChars(chars)` / `trimEvidenceForBudget(prelude, evidence, budget)` — prompt utilities
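The two prompt utilities' exact heuristics are internal; a common approximation is roughly 4 characters per token, which this sketch assumes (`estimateTokens` and `trimEvidence` below are illustrative stand-ins, not the library's implementations):

```js
// Rough token estimate from a character count (assumes ~4 chars/token).
function estimateTokens(chars) {
  return Math.ceil(chars / 4);
}

// Keep evidence items in order until prelude + evidence exceed the budget.
function trimEvidence(prelude, evidence, budgetTokens) {
  const kept = [];
  let used = estimateTokens(prelude.length);
  for (const item of evidence) {
    const cost = estimateTokens(item.excerpt.length);
    if (used + cost > budgetTokens) break;
    kept.push(item);
    used += cost;
  }
  return kept;
}

const evidence = [
  { id: 1, excerpt: 'a'.repeat(400) }, // ~100 tokens
  { id: 2, excerpt: 'b'.repeat(400) }, // ~100 tokens
];
console.log(trimEvidence('short prelude', evidence, 150).map((e) => e.id)); // → [ 1 ]
```

Trimming from the tail preserves the highest-priority evidence when items are supplied in order of relevance.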
### Agentic helpers

- `createChatProvider(httpUrl, model, apiKey)` — factory that returns a `ChatCompletionProvider` with `complete()` and `stream()` methods; useful if you need to proxy to an OpenAI-compatible endpoint.
- `runAgentLoop(session, opts)` — run an agentic tool-calling loop using a `ChatCompletionProvider`. Options include `provider`, `tools`, `executeTool`, `maxRoundtrips`, `maxTokens`, `temperature`, `onMessage`, and `redactContent` (a hook that can redact sensitive content before it leaves the process).
- `createOpenAiChatHandler(provider, tools?, executeTool?, opts?)` — returns an Express `RequestHandler` implementing a lightweight subset of OpenAI's `POST /v1/chat/completions` API that maps requests to `runAgentLoop`. The handler honors the `redactContent` hook and is suitable for mounting inside an existing HTTP server.
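A `redactContent` hook is just a string transform applied before content leaves the process; a minimal sketch might mask key-like strings and email addresses (the patterns below are illustrative, not part of the library):

```js
// Illustrative redactContent hook: mask secrets before they leave the process.
function redactContent(text) {
  return text
    // Mask things that look like API keys/tokens (e.g. sk-..., hf_...)
    .replace(/\b(?:sk|hf)[-_][A-Za-z0-9]{8,}\b/g, '[REDACTED_KEY]')
    // Mask email addresses
    .replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[REDACTED_EMAIL]');
}

console.log(redactContent('Contact dev@example.com, key sk-abcdef123456'));
// → Contact [REDACTED_EMAIL], key [REDACTED_KEY]
```

You would pass such a hook via the options, e.g. `runAgentLoop(session, { provider, tools, executeTool, redactContent })`.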
See `explainer-contract.md` for the full `Explainer` interface contract.

## License

MIT
