probeo-io/anymodel

@probeo/anymodel

OpenRouter-compatible LLM router with unified batch support. Self-hosted, zero fees.

Route requests across OpenAI, Anthropic, and Google with a single API. Add any OpenAI-compatible provider. Run as an SDK or standalone HTTP server.

Install

npm install @probeo/anymodel

Quick Start

Set your API keys as environment variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=AIza...

SDK Usage

import { AnyModel } from "@probeo/anymodel";

const client = new AnyModel();

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

Streaming

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Write a haiku" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Supported Providers

Set the env var and go. Models are auto-discovered from each provider's API.

| Provider | Env Var | Example Model |
| --- | --- | --- |
| OpenAI | OPENAI_API_KEY | openai/gpt-4o |
| Anthropic | ANTHROPIC_API_KEY | anthropic/claude-sonnet-4-6 |
| Google | GOOGLE_API_KEY | google/gemini-2.5-pro |
| Mistral | MISTRAL_API_KEY | mistral/mistral-large-latest |
| Groq | GROQ_API_KEY | groq/llama-3.3-70b-versatile |
| DeepSeek | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| xAI | XAI_API_KEY | xai/grok-3 |
| Together | TOGETHER_API_KEY | together/meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Fireworks | FIREWORKS_API_KEY | fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct |
| Perplexity | PERPLEXITY_API_KEY | perplexity/sonar-pro |
| Ollama | OLLAMA_BASE_URL | ollama/llama3.3 |

Ollama runs locally with no API key — just set OLLAMA_BASE_URL (defaults to http://localhost:11434/v1).
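For a non-default Ollama host, pointing the router elsewhere is a one-line change (the URL below is illustrative):

```shell
# Point anymodel at an Ollama instance on another machine.
# The /v1 suffix targets Ollama's OpenAI-compatible endpoint.
export OLLAMA_BASE_URL=http://192.168.1.50:11434/v1
```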

Model Naming

Models use provider/model format:

anthropic/claude-sonnet-4-6
openai/gpt-4o
google/gemini-2.5-pro
mistral/mistral-large-latest
groq/llama-3.3-70b-versatile
deepseek/deepseek-chat
xai/grok-3
perplexity/sonar-pro
ollama/llama3.3

Flex Pricing (OpenAI)

OpenAI's flex service tier cuts request cost by 50% in exchange for higher, more variable latency:

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
  service_tier: "flex",
});

Fallback Routing

Try multiple models in order. If one fails, the next is attempted:

const response = await client.chat.completions.create({
  model: "",
  models: [
    "anthropic/claude-sonnet-4-6",
    "openai/gpt-4o",
    "google/gemini-2.5-pro",
  ],
  route: "fallback",
  messages: [{ role: "user", content: "Hello" }],
});

Tool Calling

Works across all providers with a unified interface:

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "What's the weather in NYC?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

if (response.choices[0].message.tool_calls) {
  for (const call of response.choices[0].message.tool_calls) {
    console.log(call.function.name, call.function.arguments);
  }
}

Structured Output

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "List 3 colors" }],
  response_format: { type: "json_object" },
});

Batch Processing

Process many requests with native provider batch APIs or concurrent fallback. OpenAI, Anthropic, and Google batches are processed server-side — OpenAI at 50% cost, Anthropic with async processing for up to 10K requests, Google at 50% cost via batchGenerateContent. Other providers fall back to concurrent execution automatically.

Submit and wait

const results = await client.batches.createAndPoll({
  model: "openai/gpt-4o-mini",
  requests: [
    { custom_id: "req-1", messages: [{ role: "user", content: "Summarize AI" }] },
    { custom_id: "req-2", messages: [{ role: "user", content: "Summarize ML" }] },
    { custom_id: "req-3", messages: [{ role: "user", content: "Summarize NLP" }] },
  ],
});

for (const result of results.results) {
  console.log(result.custom_id, result.response?.choices[0].message.content);
}

Submit now, check later

Submit a batch and get back an ID immediately — no need to keep the process running for native batches (OpenAI, Anthropic, Google):

// Submit and get the batch ID
const batch = await client.batches.create({
  model: "anthropic/claude-haiku-4-5",
  requests: [
    { custom_id: "req-1", messages: [{ role: "user", content: "Summarize AI" }] },
    { custom_id: "req-2", messages: [{ role: "user", content: "Summarize ML" }] },
  ],
});
console.log(batch.id); // "batch-abc123"
console.log(batch.batch_mode); // "native" or "concurrent"

// Check status any time — even after a process restart
const status = client.batches.get("batch-abc123");
console.log(status.status); // "pending", "processing", "completed", "failed"

// Wait for results when you're ready (reconnects to provider API)
const results = await client.batches.poll("batch-abc123");

// Or read results directly once the batch has completed
const finished = client.batches.results("batch-abc123");

List and cancel

// List all batches on disk
const all = client.batches.list();
for (const b of all) {
  console.log(b.id, b.batch_mode, b.status, b.provider_name);
}

// Cancel a running batch (also cancels at the provider for native batches)
await client.batches.cancel("batch-abc123");

Batch configuration

const client = new AnyModel({
  batch: {
    pollInterval: 10000, // default poll interval in ms (default: 5000)
    concurrencyFallback: 10, // concurrent request limit for non-native providers (default: 5)
  },
  io: {
    readConcurrency: 30, // concurrent file reads (default: 20)
    writeConcurrency: 15, // concurrent file writes (default: 10)
  },
});

// Override poll interval per call
const results = await client.batches.createAndPoll(request, {
  interval: 3000, // poll every 3s for this batch
  onProgress: (batch) => {
    console.log(`${batch.completed}/${batch.total} done`);
  },
});

Batches are persisted to ./.anymodel/batches/ in the current working directory and survive process restarts.

Automatic max_tokens

When max_tokens isn't set on a batch request, anymodel automatically calculates a safe value per-request based on the estimated input size and the model's context window. This prevents truncated responses and context overflow errors without requiring you to hand-tune each request in a large batch. The estimation uses a ~4 chars/token heuristic with a 5% safety margin — conservative enough to avoid overflows, lightweight enough to skip tokenizer dependencies.
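A minimal sketch of that heuristic, using the ~4 chars/token ratio and 5% margin stated above (the function name and signature are illustrative, not the library's internals):

```typescript
const CHARS_PER_TOKEN = 4;   // rough heuristic, avoids a tokenizer dependency
const SAFETY_MARGIN = 1.05;  // 5% headroom on the input estimate

// Estimate a safe max_tokens for one request given the model's context window.
function estimateMaxTokens(
  messages: { content: string }[],
  contextWindow: number,
): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  const inputTokens = Math.ceil((chars / CHARS_PER_TOKEN) * SAFETY_MARGIN);
  // Whatever the input doesn't consume is left for the completion.
  return Math.max(contextWindow - inputTokens, 1);
}
```

For a 400-character prompt against an 8,192-token window, this leaves 8,192 − ⌈100 × 1.05⌉ = 8,087 tokens for the completion.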

Models Endpoint

const models = await client.models.list();
const anthropicModels = await client.models.list({ provider: "anthropic" });

Generation Stats

const response = await client.chat.completions.create({ ... });
const stats = client.generation.get(response.id);
console.log(stats.latency, stats.tokens_prompt, stats.tokens_completion);

Configuration

Programmatic

const client = new AnyModel({
  anthropic: { apiKey: "sk-ant-..." },
  openai: { apiKey: "sk-..." },
  google: { apiKey: "AIza..." },
  aliases: {
    default: "anthropic/claude-sonnet-4-6",
    fast: "anthropic/claude-haiku-4-5",
    smart: "anthropic/claude-opus-4-6",
  },
  defaults: {
    temperature: 0.7,
    max_tokens: 4096,
    retries: 2,
    timeout: 120, // HTTP timeout in seconds (default: 120 = 2 min, flex: 600 = 10 min)
  },
});

// Use aliases as model names
const response = await client.chat.completions.create({
  model: "fast",
  messages: [{ role: "user", content: "Quick answer" }],
});

Config File

Create anymodel.config.json in your project root:

{
  "anthropic": {
    "apiKey": "${ANTHROPIC_API_KEY}"
  },
  "aliases": {
    "default": "anthropic/claude-sonnet-4-6",
    "fast": "anthropic/claude-haiku-4-5"
  },
  "defaults": {
    "temperature": 0.7,
    "max_tokens": 4096
  },
  "batch": {
    "pollInterval": 5000,
    "concurrencyFallback": 5
  },
  "io": {
    "readConcurrency": 20,
    "writeConcurrency": 10
  }
}

${ENV_VAR} references are interpolated from environment variables.
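The interpolation rule can be pictured as a simple substitution pass (a hypothetical sketch; how anymodel handles unset variables is an assumption here):

```typescript
// Replace each ${NAME} in a config value with the matching environment
// variable, falling back to an empty string when the variable is unset.
function interpolateEnv(value: string): string {
  return value.replace(/\$\{(\w+)\}/g, (_match, name: string) => process.env[name] ?? "");
}
```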

Config Resolution Order

  1. Programmatic options (highest priority)
  2. Local anymodel.config.json
  3. Global ~/.anymodel/config.json
  4. Environment variables (lowest priority)

Configs are deep-merged, not replaced.
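"Deep-merged" means a higher-priority config overrides nested keys individually rather than replacing whole sections. An illustrative merge (not the library's actual implementation):

```typescript
type Config = { [key: string]: unknown };

// Merge `override` into `base`, recursing into plain objects so that
// sibling keys from the lower-priority config survive.
function deepMerge(base: Config, override: Config): Config {
  const out: Config = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = out[key];
    if (
      value && typeof value === "object" && !Array.isArray(value) &&
      existing && typeof existing === "object" && !Array.isArray(existing)
    ) {
      out[key] = deepMerge(existing as Config, value as Config);
    } else {
      out[key] = value; // scalars, arrays, and new keys replace outright
    }
  }
  return out;
}
```

So a local config containing only `{ "defaults": { "temperature": 0.2 } }` overrides the temperature from a global config while keeping its `max_tokens`.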

Custom Providers

Add any OpenAI-compatible endpoint:

const client = new AnyModel({
  custom: {
    ollama: {
      baseURL: "http://localhost:11434/v1",
      models: ["llama3.3", "mistral"],
    },
    together: {
      baseURL: "https://api.together.xyz/v1",
      apiKey: "your-key",
    },
  },
});

const response = await client.chat.completions.create({
  model: "ollama/llama3.3",
  messages: [{ role: "user", content: "Hello from Ollama" }],
});

Provider Preferences

Control which providers are used and in what order:

const response = await client.chat.completions.create({
  model: "",
  models: ["anthropic/claude-sonnet-4-6", "openai/gpt-4o", "google/gemini-2.5-pro"],
  route: "fallback",
  provider: {
    order: ["anthropic", "openai"],
    ignore: ["google"],
  },
  messages: [{ role: "user", content: "Hello" }],
});

Transforms

Automatically truncate long conversations to fit within context windows:

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: veryLongConversation,
  transforms: ["middle-out"],
});

middle-out preserves the system prompt and most recent messages, removing from the middle.
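In spirit, the transform behaves like the sketch below, simplified to truncate by message count (the real transform presumably budgets by tokens; the function and its parameter are illustrative):

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };

// Keep the system prompt plus the most recent messages; drop the middle.
function middleOut(messages: Msg[], maxMessages: number): Msg[] {
  if (messages.length <= maxMessages) return messages;
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  const keep = Math.max(maxMessages - system.length, 0);
  return [...system, ...rest.slice(rest.length - keep)];
}
```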

Server Mode

Run as a standalone HTTP server compatible with the OpenAI SDK:

npx anymodel serve --port 4141

Then point any OpenAI-compatible client at it:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4141/api/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello via server" }],
});

Server Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/chat/completions | Chat completion (streaming supported) |
| GET | /api/v1/models | List available models |
| GET | /api/v1/generation/:id | Get generation stats |
| POST | /api/v1/batches | Create a batch |
| GET | /api/v1/batches | List batches |
| GET | /api/v1/batches/:id | Get batch status |
| GET | /api/v1/batches/:id/results | Get batch results |
| POST | /api/v1/batches/:id/cancel | Cancel a batch |
| GET | /health | Health check |

Examples

See examples/basic.ts for runnable demos of completions, streaming, tool calling, fallback routing, batch processing, and generation stats.

# Run all examples
npx tsx examples/basic.ts

# Run a specific example
npx tsx examples/basic.ts stream
npx tsx examples/basic.ts tools
npx tsx examples/basic.ts batch

Built-in Resilience

  • Retries: Automatic retry with exponential backoff on 429/502/503 errors (configurable via defaults.retries)
  • Rate limit tracking: Per-provider rate limit state, automatically skips rate-limited providers during fallback routing
  • Parameter stripping: Unsupported parameters are automatically removed before forwarding to providers
  • Smart batch defaults: Automatic max_tokens estimation per-request in batches — calculates safe values from input size and model context limits, preventing truncation and overflow without manual tuning
  • Memory-efficient batching: Concurrent batch requests are streamed from disk — only N requests (default 5) are in-flight at a time, making 10K+ request batches safe without memory spikes
  • High-volume IO: All batch file operations use concurrency-limited async queues with atomic durable writes (temp file + fsync + rename) to prevent corruption on crash. Defaults: 20 concurrent reads, 10 concurrent writes — configurable via io.readConcurrency and io.writeConcurrency
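The retry behavior in the first bullet amounts to something like this sketch (status codes and retry count come from the text above; the backoff delays are assumed for illustration):

```typescript
const RETRYABLE = new Set([429, 502, 503]);

// Retry a request on retryable HTTP errors with exponential backoff.
async function withRetries<T>(fn: () => Promise<T>, retries = 2): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status;
      if (attempt >= retries || status === undefined || !RETRYABLE.has(status)) {
        throw err; // non-retryable error, or retry budget exhausted
      }
      // Exponential backoff: 250ms, 500ms, 1000ms, ...
      await new Promise((resolve) => setTimeout(resolve, 250 * 2 ** attempt));
    }
  }
}
```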

Roadmap

  • A/B testing — split routing (% traffic to each model) and compare mode (same request to multiple models, return all responses with stats)
  • Cost tracking — per-request and aggregate cost calculation from provider pricing
  • Caching — response caching with configurable TTL for identical requests
  • Native batch APIs — OpenAI Batch API (JSONL upload, 50% cost), Anthropic Message Batches (10K requests, async), and Google Gemini Batch (50% cost). Auto-detects provider and routes to native API, falls back to concurrent for other providers
  • Result export — saveResults() to write batch results to a configurable output directory
  • Prompt logging — optional request/response logging for debugging and evaluation


License

MIT
