modelscan registry

An open registry of large-language-model metadata. It ships as a single machine-readable JSON file, models.json, that describes each model's identity, authorship, modalities, context and output limits, capabilities, and lifecycle, plus per-source commercial offers (prices, endpoints, rate limits), each tagged with its originating source as provenance.

Public site: https://modelscan.io/

Consume it

The canonical, always-current file:

https://raw.githubusercontent.com/modelscan/registry/main/models.json

From the shell, with curl and jq:

curl -s https://raw.githubusercontent.com/modelscan/registry/main/models.json | jq '.models | length'

From JavaScript (top-level await, so run it as an ES module):

const { models } = await fetch(
  'https://raw.githubusercontent.com/modelscan/registry/main/models.json',
).then((r) => r.json())

The file is a single object: { schema_version, generated_at, count, models[] }. It is validated in CI against schema/models.schema.json (JSON Schema, draft 2020-12).
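
Before relying on a downloaded copy, it can help to check the envelope locally. The following is a minimal sketch, not the project's validator: checkEnvelope is a hypothetical helper, the sample values are made up, and it assumes count is meant to equal models.length, which this README does not state. The JSON Schema remains the authoritative contract.

```javascript
// Hypothetical helper: light structural check of the models.json envelope.
// Assumption: `count` is intended to equal models.length (not confirmed by the README).
function checkEnvelope(registry) {
  const problems = []
  for (const key of ['schema_version', 'generated_at', 'count', 'models']) {
    if (!(key in registry)) problems.push(`missing key: ${key}`)
  }
  if (!Array.isArray(registry.models)) {
    problems.push('models is not an array')
  } else if (registry.count !== registry.models.length) {
    problems.push(
      `count is ${registry.count} but models has ${registry.models.length} entries`,
    )
  }
  return problems
}

// Made-up sample data, for illustration only.
const sample = {
  schema_version: '1',
  generated_at: '2025-01-01T00:00:00Z',
  count: 1,
  models: [{ id: 'claude-opus-4-7' }],
}
console.log(checkEnvelope(sample)) // []
```

An empty array means nothing looked wrong at this level; it says nothing about the individual model records.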

What a model looks like

{
  "id": "claude-opus-4-7",                  // canonical id (URL slug / program reference)
  "model": "Claude Opus 4.7",               // display name
  "author": "anthropic",                    // developer (a provider id)
  "alias_id": ["anthropic/claude-opus-4-7", "us.anthropic.claude-opus-4-7"],
  "input_modalities": ["text", "image"],
  "output_modalities": ["text"],
  "context_length": 200000,
  "max_output_tokens": 64000,
  "reasoning": true,
  "tool_calling": true,
  "release_timestamp": 1730000000,
  "endpoints": ["chat"],                    // API operations any source exposes
  "other_parameters": { "knowledge_cutoff": "2025-03" },
  "offers": [                               // one per source — prices, route, limits + provenance
    {
      "source": "openrouter",
      "currency": "USD",
      "prices": [{ "input": { "amount": 15, "unit": "per_1m_tokens" },
                   "output": { "amount": 75, "unit": "per_1m_tokens" } }]
    }
  ]
}
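
To make the prices structure concrete, here is a sketch that estimates the cost of one request from an offer shaped like the one in the example record. It assumes the first entry of prices[] applies unconditionally and that both amounts use the per_1m_tokens unit, as in the example. estimateCost is a hypothetical helper; offers with several tiers or other units would need more handling.

```javascript
// Hypothetical helper: cost of one request under an offer's first price tier.
// Assumes amounts are quoted per one million tokens (unit "per_1m_tokens").
function estimateCost(offer, inputTokens, outputTokens) {
  const tier = offer.prices[0]
  for (const side of ['input', 'output']) {
    if (tier[side].unit !== 'per_1m_tokens') {
      throw new Error(`unsupported unit: ${tier[side].unit}`)
    }
  }
  const amount =
    (tier.input.amount * inputTokens + tier.output.amount * outputTokens) / 1e6
  // Keep currency and source next to the number so provenance is not lost.
  return { amount, currency: offer.currency, source: offer.source }
}

// The offer from the example record.
const offer = {
  source: 'openrouter',
  currency: 'USD',
  prices: [
    {
      input: { amount: 15, unit: 'per_1m_tokens' },
      output: { amount: 75, unit: 'per_1m_tokens' },
    },
  ],
}

// 10,000 input tokens and 2,000 output tokens:
// (15 * 10000 + 75 * 2000) / 1e6 = 300000 / 1e6 = 0.3 USD
console.log(estimateCost(offer, 10000, 2000))
```

The amount is returned in the offer's own currency, with no conversion applied.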

Key ideas

  • Stable identity. Every model has one canonical id. Dated snapshots fold to their base id, and the dated / vendor-prefixed forms are preserved in alias_id — so the same model is never split into two rows across sources. author is always a provider id, so a developer never appears under two spellings.
  • Two currencies. Pricing is kept in its native currency — never lossily converted: USD offers from OpenRouter / LiteLLM, CNY offers from Alibaba Bailian (百炼) / Volcengine Ark (火山方舟). A single model can carry both, side by side.
  • Facts vs offers. Top-level fields are source-agnostic facts merged per field across sources. Commercial data (prices, endpoint paths, rate limits) lives in offers[], one per source, each carrying its source as provenance — so you can see where every number came from.
  • Tiered & conditional pricing. prices[] is a list of tiers; a tier may carry conditions (input-length thresholds, or a variant label for axes like video resolution / audio).
  • Lifecycle. A model that disappears from every source is marked deprecation: { status: "delisted", since } and kept, never deleted.
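
The identity, currency, and lifecycle rules above translate into a few small lookups. This is a sketch under stated assumptions: the field names id, alias_id, offers, currency, and deprecation.status come from this README, while the helper names (buildIndex, offersByCurrency, isDelisted) and the second sample record are invented for illustration.

```javascript
// Map every canonical id and every alias to its model record,
// so any spelling of a model resolves to the same row.
function buildIndex(models) {
  const index = new Map()
  for (const model of models) {
    index.set(model.id, model)
    for (const alias of model.alias_id ?? []) index.set(alias, model)
  }
  return index
}

// Group a model's offers by native currency, without converting amounts.
function offersByCurrency(model) {
  const groups = {}
  for (const offer of model.offers ?? []) {
    const list = groups[offer.currency] || (groups[offer.currency] = [])
    list.push(offer)
  }
  return groups
}

// A delisted model stays in the file; it is only flagged.
function isDelisted(model) {
  return model.deprecation?.status === 'delisted'
}

const models = [
  {
    id: 'claude-opus-4-7',
    alias_id: ['anthropic/claude-opus-4-7', 'us.anthropic.claude-opus-4-7'],
    offers: [{ source: 'openrouter', currency: 'USD' }],
  },
  {
    id: 'example-retired-model', // made-up record
    alias_id: [],
    offers: [],
    deprecation: { status: 'delisted', since: '2025-01-01' }, // format of `since` is assumed
  },
]

const index = buildIndex(models)
console.log(index.get('us.anthropic.claude-opus-4-7').id) // claude-opus-4-7
console.log(Object.keys(offersByCurrency(models[0]))) // the currencies present for this model
console.log(models.filter((m) => !isDelisted(m)).length) // 1
```

Indexing aliases alongside canonical ids means callers can pass whichever identifier their upstream source uses.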

See schema/models.schema.json for the full contract.

Contributing

Corrections and additions are welcome — see CONTRIBUTING.md. models.json is machine-generated, so fixes are applied as maintainer overrides rather than direct edits to the generated file.

License

models.json, the schema, and the docs are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). Use them anywhere, including commercially; just attribute modelscan registry (https://modelscan.io/).
