lm15-dev/lm15-python

lm15

PyPI · Python 3.10+ · MIT

One interface for OpenAI, Anthropic, and Gemini. Zero dependencies.

|                            | lm15    | google-genai | litellm |
|----------------------------|---------|--------------|---------|
| install                    | 72ms    | 137ms        | 184ms   |
| import                     | 95ms    | 2,656ms      | 4,534ms |
| total (install → response) | 1,090ms | 3,992ms      | 5,840ms |
| dependencies               | 0       | 25           | 55      |
| disk footprint             | 408K    | 41M          | 155M    |

Median of 10 cold-start runs. Fresh venv, single completion against gemini-3.1-flash-lite-preview. Benchmark source.
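The import-time column can be approximated with the stdlib alone. A minimal sketch of the methodology (not the actual benchmark script, which also times venv creation, installation, and a live completion):

```python
import statistics
import subprocess
import sys
import time

def cold_import_ms(module: str, runs: int = 10) -> float:
    """Median wall-clock time (ms) to start a fresh interpreter and import `module`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

print(f"json cold import: {cold_import_ms('json'):.0f}ms")
```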

import lm15

resp = lm15.call("claude-sonnet-4-5", "Hello.")
print(resp.text)

Switch models by changing the string. Same types, same streaming, same tool calling. That's it.

Yes, we know.

Install

pip install lm15

Set at least one provider key:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=...         # or GOOGLE_API_KEY

Or use a .env file and configure once:

import lm15
lm15.configure(env=".env")

# No env= needed on any subsequent call
resp = lm15.call("gpt-4.1-mini", "Hello.")
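Because lm15 has no dependencies, its .env handling is necessarily stdlib-based. A sketch of what a minimal zero-dependency loader looks like (lm15's actual parser may handle quoting and export syntax differently):

```python
import os
import tempfile
from pathlib import Path

def load_env(path: str) -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments; existing vars win."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway file standing in for your .env
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# provider keys\nDEMO_API_KEY=sk-demo\n")

load_env(f.name)
print(os.environ["DEMO_API_KEY"])  # sk-demo
```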

Discover what is available:

import lm15

print(lm15.providers_info())
for m in lm15.models(provider="openai")[:5]:
    print(m.id)

Usage

Streaming

for text in lm15.call("gpt-4.1-mini", "Write a haiku."):
    print(text, end="")

Full event access:

for event in lm15.call("gpt-4.1-mini", "Write a haiku.").events():
    match event.type:
        case "text":     print(event.text, end="")
        case "thinking": print(f"💭 {event.text}", end="")
        case "finished": print(f"\n📊 {event.response.usage}")

Tools (auto-execute)

Pass Python functions — schema is inferred, execution is automatic:

def get_weather(city: str) -> str:
    """Get weather by city."""
    return f"22°C in {city}"

resp = lm15.call("gpt-4.1-mini", "Weather in Montreal?", tools=[get_weather])
print(resp.text)  # "It's 22°C in Montreal."
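Schema inference like this is typically built on `inspect` and type hints. A stdlib sketch of the idea (not lm15's actual implementation, which will also handle docstrings, optionals, and containers):

```python
import inspect
from typing import get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def infer_schema(fn) -> dict:
    """Build a JSON Schema for a function's parameters from its type hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = inspect.signature(fn).parameters
    return {
        "type": "object",
        "properties": {name: {"type": _JSON_TYPES[hints[name]]} for name in params},
        "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
    }

def get_weather(city: str) -> str:
    """Get weather by city."""
    return f"22°C in {city}"

print(infer_schema(get_weather))
# {'type': 'object', 'properties': {'city': {'type': 'string'}}, 'required': ['city']}
```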

Tools (manual)

from lm15 import FunctionTool

weather = FunctionTool(name="get_weather", description="Get weather", parameters={...})
gpt = lm15.model("gpt-4.1-mini")

resp = gpt.call("Weather in Montreal?", tools=[weather])
results = {tc.id: "22°C, sunny" for tc in resp.tool_calls}
resp = gpt.submit_tools(results)
print(resp.text)

Inspect before sending

req = lm15.prepare("gpt-4.1-mini", "Weather?", tools=[get_weather])
print(req.tools[0].name)        # "get_weather"
print(req.tools[0].parameters)  # inferred JSON Schema
print(req.messages)             # constructed messages

resp = lm15.send(req)           # send when ready

Images, audio, video, documents

from lm15 import Part

# Image from URL
resp = lm15.call("gemini-2.5-flash", ["Describe this.", Part.image(url="https://example.com/cat.jpg")])

# Image generation → vision (cross-model)
resp = lm15.call("gpt-4.1-mini", "Draw a cat.", output="image")
resp2 = lm15.call("claude-sonnet-4-5", ["What's this?", resp.image])

# Document
resp = lm15.call("claude-sonnet-4-5", ["Summarize.", Part.document(url="https://example.com/paper.pdf")])

# Upload via provider file API
doc = lm15.upload("claude-sonnet-4-5", "contract.pdf")
resp = lm15.call("claude-sonnet-4-5", ["Find liability clauses.", doc])

Structured output (JSON)

resp = lm15.call("gpt-4.1-mini", "Extract: 'Alice is 30.'",
    system="Return JSON: {name, age}", prefill="{")
data = resp.json  # parsed dict — raises ValueError if not valid JSON
print(data["name"], data["age"])  # Alice 30
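The reason `prefill="{"` helps: the model continues after the prefilled text, so the full JSON is the prefill plus the completion. A stdlib illustration of the recombination (the completion string here is a hypothetical model output, not a real API call):

```python
import json

prefill = "{"
# Hypothetical continuation the model produces after the prefilled "{"
completion = '"name": "Alice", "age": 30}'

data = json.loads(prefill + completion)
print(data["name"], data["age"])  # Alice 30
```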

Image and audio bytes

# Get generated image as raw bytes
resp = lm15.call("gpt-4.1-mini", "Draw a cat.", output="image")
with open("cat.png", "wb") as f:
    f.write(resp.image_bytes)  # decoded bytes, no base64 wrangling

# Same for audio
resp = lm15.call("gpt-4o-mini-tts", "Say hello.", output="audio")
with open("hello.wav", "wb") as f:
    f.write(resp.audio_bytes)

Reasoning

resp = lm15.call("claude-sonnet-4-5", "Prove √2 is irrational.", reasoning=True)
print(resp.thinking)  # chain of thought
print(resp.text)      # final answer

Conversation

gpt = lm15.model("gpt-4.1-mini", system="You remember everything.")

gpt.call("My name is Max.")
gpt.call("I like chess.")
resp = gpt.call("What do you know about me?")
print(resp.text)  # knows both

Prompt caching

Reduces cost and latency for repeated prefixes — system prompts, long documents, agent loops:

agent = lm15.model("claude-sonnet-4-5",
    system="<long system prompt>",
    tools=[read_file, write_file],
    prompt_caching=True,
)

resp = agent.call("Add tests for auth.")
while resp.finish_reason == "tool_call":
    results = execute(resp.tool_calls)
    resp = agent.submit_tools(results)
    print(f"Cache hit: {resp.usage.cache_read_tokens} tokens")

Prefill

resp = lm15.call("claude-sonnet-4-5", "Output JSON for a person.", prefill="{")

Reusable model with config

gpt = lm15.model("gpt-4.1-mini", system="You are terse.", retries=3, cache=True, temperature=0)
resp = gpt.call("Hello.")

# Override per call
resp = gpt.call("Be creative.", temperature=1.5)

# Derive new models
claude = gpt.copy(model="claude-sonnet-4-5")

Config from dicts

config = {"model": "gpt-4.1-mini", "system": "You are terse.", "temperature": 0}
resp = lm15.call(prompt="Summarize DNA.", **config)

Built-in tools

resp = lm15.call("gpt-4.1-mini", "Latest AI news", tools=["web_search"])
for c in resp.citations:
    print(c.title, c.url)

Provider support

Capability OpenAI Anthropic Gemini
complete
stream
embeddings
files
batches
images
audio
prompt caching auto

Architecture

lm15.call / lm15.acall / lm15.model   ← high-level surface
                │
                ▼
          Result / AsyncResult
                │
                ▼
LMRequest ──▶ UniversalLM ──▶ MiddlewarePipeline ──▶ ProviderAdapter ──▶ Transport
                  │                                        │
                  │ resolve_provider(model)                 │ build_request / parse_stream_event
                  ▼                                        ▼
            capabilities.py                         providers/{openai,anthropic,gemini}.py

The high-level surface (lm15.call, lm15.acall, lm15.model, Result) is a thin layer over LMRequest, UniversalLM, and provider adapters. Third parties can still build their own surface on top of the same internals.
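To make the diagram concrete, here is a sketch of what a frozen request paired with an adapter protocol can look like. Names are borrowed from the diagram; the real `LMRequest` and `ProviderAdapter` definitions in lm15 carry far more fields and methods:

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Protocol

@dataclass(frozen=True)
class LMRequest:
    """Immutable request snapshot; safe to log, cache, or replay."""
    model: str
    prompt: str

class ProviderAdapter(Protocol):
    """Each adapter turns a universal request into provider wire format."""
    def build_request(self, req: LMRequest) -> dict: ...

class FakeAdapter:
    def build_request(self, req: LMRequest) -> dict:
        return {"model": req.model, "messages": [{"role": "user", "content": req.prompt}]}

req = LMRequest("gpt-4.1-mini", "Hello.")
print(FakeAdapter().build_request(req))

try:
    req.model = "other"  # frozen dataclass: mutation raises
except FrozenInstanceError:
    print("immutable")
```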

Why this exists

  • Stdlib only. No requests, no httpx, no aiohttp. Transport is urllib or optional pycurl.
  • Frozen dataclasses all the way down. High level: Result out. Low level: LMRequest / LMResponse stay fully accessible. No mutable builder chains.
  • Nothing is hidden. Every internal type is importable. Provider escape hatches are always there.
  • Plugin discovery via entry points. Third-party providers install and register without touching lm15 core.

Docs

| Topic | Path |
|-------|------|
| API v2 spec (legacy) | docs/API_SPEC_V2.md |
| Getting started | docs/GETTING_STARTED.md |
| Core concepts | docs/CONCEPTS.md |
| Architecture | docs/ARCHITECTURE.md |
| Provider contract | docs/CONTRACT.md |
| Portability spec | docs/PORTABILITY.md |
| Transport design | docs/DESIGN_TRANSPORT.md |
| Error handling | docs/ERRORS.md |
| Streaming | docs/STREAMING.md |
| Writing an adapter | docs/ADAPTER_GUIDE.md |
| Adding a provider | docs/ADD_PROVIDER_GUIDE.md |
| Completeness testing | docs/COMPLETENESS.md |
| Production checklist | docs/PRODUCTION_CHECKLIST.md |

Cookbooks v2: docs/COOKBOOKS_V2/ — practical examples + references:

  1. Hello World
  2. Streaming
  3. Tools (auto-execute)
  4. Tools (manual loop)
  5. Multimodal
  6. Reasoning
  7. Conversation
  8. Prompt caching
  9. Model config
  10. Building an agent
  11. call()/acall()/Result reference
  12. Model discovery and provider status
  13. Live sessions (real-time audio & video)

Cookbooks v1 (low-level): docs/COOKBOOKS/ — 8 examples using the internal LMRequest/UniversalLM API directly.

License

MIT

About

Universal LM core with pluggable provider adapters for OpenAI, Anthropic, and Gemini.
