
chattydeer

Embedeer Logo: a deer with vector numbers between antlers. Logo generated by ChatGPT. Public Domain.

A Node.js chat completions toolkit

A Node.js LLM chat toolkit built on top of @jsilvanus/embedeer.

Provides Explainer, LLMAdapter, ChatSession, prompt utilities, and higher-level agentic helpers for multi-turn tool calling. Recent additions include createChatProvider, runAgentLoop, and createOpenAiChatHandler (an OpenAI-compatible /v1/chat/completions handler).

Install

npm install @jsilvanus/chattydeer

Usage

Single-turn explanation

import { Explainer } from '@jsilvanus/chattydeer';

const explainer = await Explainer.create('llama-3.2-3b', { deterministic: true }); // deterministic: stable, reproducible outputs (useful for tests)

const result = await explainer.explain({
  task: 'narrate',                     // intent: what you want the explainer to produce (e.g. 'summarize', 'narrate')
  domain: 'evolution',                 // domain helps choose domain-specific phrasing or templates
  context: { filePath: 'src/auth/handler.ts' }, // optional contextual metadata (file path, URL, etc.)
  evidence: [
    // Evidence items: structured blocks the explainer will reason over
    { id: 1, source: 'src/auth/handler.ts', excerpt: '2024-03-15 *** LARGE CHANGE' },
  ],
  maxTokens: 256,                       // limit the response length (tokens)
});

console.log(result.explanation); // "The auth handler underwent a major rewrite..."
await explainer.destroy();

Multi-turn chat with function calling (agentic loop)

import { ChatSession } from '@jsilvanus/chattydeer';

const session = await ChatSession.create('llama-3.2-3b', {
  systemPrompt: 'You are a gitsema guide assistant. Use tools to answer questions.',
  tools: [
    {
      name: 'semantic_search',
      description: 'Search the codebase semantically by natural-language query.',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query'],
      },
    },
    {
      name: 'recent_commits',
      description: 'Return the N most recent commits touching a file.',
      parameters: {
        type: 'object',
        properties: {
          filePath: { type: 'string' },
          n:        { type: 'number' },
        },
        required: ['filePath'],
      },
    },
  ],
});

// The session runs an agentic loop:
//   LLM requests tool calls → ChatSession executes them → results fed back → repeat
const answer = await session.send('Which files changed most in the last month?', {
  executeTool: async (name, args) => {
    if (name === 'semantic_search')  return mySearch(args.query);
    if (name === 'recent_commits')   return myCommits(args.filePath, args.n);
    throw new Error(`Unknown tool: ${name}`);
  },
  maxIterations: 10, // safeguard against infinite loops
});

console.log(answer);

// Continue the conversation in the same session
const followUp = await session.send('Can you summarise the top file in one sentence?', {
  executeTool: async (name, args) => { /* ... */ },
});

await session.destroy();

Preloading / warming a model

To reduce the latency of the first request you can preload a model into the underlying embedding/generation layer. Two common options:

  • Use @jsilvanus/embedeer directly to download and cache a model ahead of time:

import { loadModel } from '@jsilvanus/embedeer';

// downloads and caches the model files so subsequent creation is fast
await loadModel('Xenova/all-MiniLM-L6-v2', { token: process.env.HF_TOKEN });

  • Or create and initialize an LLMAdapter/ChatSession once at startup and reuse it:

import { LLMAdapter, ChatSession } from '@jsilvanus/chattydeer';

// warm a generator pipeline (keeps it in memory)
const adapter = await LLMAdapter.create('llama-3.2-3b', { token: process.env.HF_TOKEN });

// reuse the warm adapter for sessions (the model name is ignored when an adapter is supplied)
const session = await ChatSession.create('irrelevant', { adapter });
// later: await session.send(...)

Both approaches reduce cold-start latency. Use loadModel() when you only need to ensure model artifacts are present on disk; use LLMAdapter.create() when you want an initialized in-process generator ready to serve requests.

API

Explainer

  • Explainer.create(modelName, opts) — create an explainer bound to a model
  • explainer.explain(request) — explain using structured evidence blocks; returns { explanation, labels, references, meta }
  • explainer.destroy() — release underlying resources

ChatSession

  • ChatSession.create(modelName, opts) — create a session; accepts tools, systemPrompt, adapter
  • session.send(userMessage, opts) — send a message and run the agentic tool-call loop; returns a final plain-text answer
    • opts.executeTool — async (name, args, callId) => result — called for each tool the LLM requests
    • opts.maxIterations — loop guard (default 10)
    • opts.maxTokens — tokens per LLM call (default 512)
  • session.history — read-only snapshot of all ChatMessage objects in the conversation
  • session.destroy() — release underlying resources

Additional ChatSession notes:

  • session.append(msg) — append a ChatMessage to the session (useful for programmatic injection of tool or system messages).
  • Chat history is kept in-memory on the ChatSession instance (the _history array). session.history returns a snapshot copy. There is no built-in durable persistence; to persist history, serialize session.history yourself and replay or rehydrate into a new ChatSession.
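The serialize-and-replay approach described above can be sketched as follows (a minimal sketch: it assumes history entries are plain JSON-serializable objects and that session.append() accepts them back unchanged):

```javascript
// Sketch: persisting and rehydrating chat history. Assumes history
// entries are plain JSON-serializable objects ({ role, content, ... })
// and that session.append() accepts them back unchanged.
function serializeHistory(session) {
  // session.history already returns a snapshot copy, so it is
  // safe to serialize directly
  return JSON.stringify(session.history);
}

function rehydrateHistory(session, json) {
  // replay each stored message into a fresh session
  for (const msg of JSON.parse(json)) {
    session.append(msg);
  }
}
```

Write the serialized string to disk or a database between runs, then rehydrate into a newly created ChatSession at startup.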

LLMAdapter

  • LLMAdapter.create(modelName, opts) — low-level text-generation adapter
  • adapter.generate(prompt, opts) — returns { text, raw, meta }
  • adapter.destroy() — release underlying resources

Utilities

  • explainForGitsema(payload, opts) — gitsema-compatible adapter
  • renderTemplate(domain, vars) — render a domain-specific prompt template
  • estimateTokensFromChars(chars) / trimEvidenceForBudget(prelude, evidence, budget) — token-budget helpers: estimate a token count from a character count, and trim evidence blocks so the prompt fits a token budget
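To illustrate what these budget helpers do, here is a sketch of the idea only: the 4-chars-per-token ratio and the greedy in-order trimming below are assumptions made for this example, not necessarily the library's exact implementation:

```javascript
// Illustration of the token-budget idea behind estimateTokensFromChars()
// and trimEvidenceForBudget(). The 4-chars-per-token ratio is a common
// rough heuristic, assumed here for the sketch.
const CHARS_PER_TOKEN = 4;

function estimateTokens(chars) {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Keep evidence items in order until adding the next one would push
// the prelude + evidence past the token budget.
function trimEvidence(prelude, evidence, budget) {
  let used = estimateTokens(prelude.length);
  const kept = [];
  for (const item of evidence) {
    const cost = estimateTokens(item.excerpt.length);
    if (used + cost > budget) break;
    used += cost;
    kept.push(item);
  }
  return kept;
}
```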

Agentic helpers

  • createChatProvider(httpUrl, model, apiKey) — factory that returns a ChatCompletionProvider with complete() and stream() methods; useful if you need to proxy to an OpenAI-compatible endpoint.
  • runAgentLoop(session, opts) — run an agentic tool-calling loop using a ChatCompletionProvider. Options include provider, tools, executeTool, maxRoundtrips, maxTokens, temperature, onMessage, and redactContent (a hook that can redact sensitive content before it leaves the process).
  • createOpenAiChatHandler(provider, tools?, executeTool?, opts?) — returns an Express RequestHandler implementing a lightweight subset of OpenAI's POST /v1/chat/completions API that maps requests to runAgentLoop. The handler honors the redactContent hook and is suitable for mounting inside an existing HTTP server.
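Both runAgentLoop and createOpenAiChatHandler honor the redactContent hook. A minimal sketch of such a hook follows; the patterns are illustrative, and the (content: string) => string signature is an assumption for this example:

```javascript
// Example redactContent hook: mask anything that looks like a bearer
// token or an email address before content leaves the process.
// The exact hook signature is assumed to be (content) => content here.
function redactContent(content) {
  return content
    .replace(/Bearer\s+[A-Za-z0-9._-]+/g, 'Bearer [REDACTED]')
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[EMAIL REDACTED]');
}
```

Pass the hook in the runAgentLoop options (or the handler's opts) to scrub tool results and model output uniformly.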

See explainer-contract.md for the full Explainer interface contract.

License

MIT
