
Python Runtime Examples

Mike edited this page May 28, 2026 · 2 revisions

Runtime Examples

Multi-unit runtime

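import xlocllm

# One unit per model role; a single runtime serves all three.
units = [
    xlocllm.unit("LLM", "Qwen-3.5-0.8b"),
    xlocllm.unit("embedding", "multilingual-e5-small"),
    xlocllm.unit("reranker", "bge-reranker-base"),
]
# The with block handles cleanup, so no explicit close() is needed.
with xlocllm.runtime(units, port=1146) as rt:
    rt.run()
    print(rt.models())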

OpenAI test harness with local native runtime

from openai import OpenAI
import xlocllm

# Two units served by one runtime: a quantized (q4) LLM and an embedding model.
unit = xlocllm.unit("LLM", "Qwen-3.5-0.8b", quant="q4")
unit1 = xlocllm.unit("embedding", "multilingual-e5-small")

rt = xlocllm.runtime([unit, unit1], mode="native", port=1146)
rt.run()

try:
    # Below this line, use the normal OpenAI Python library.
    client = OpenAI(base_url=rt.url, api_key="xlocllm")
    print(client.models.list())

    response = client.chat.completions.create(
        model=unit.model,
        messages=[{"role": "user", "content": "Write a smoke test checklist."}],
    )
    print(response.choices[0].message.content)
finally:
    # Stop the runtime even if one of the requests above raises.
    rt.close()
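
The harness above loads an embedding unit but only exercises the chat endpoint. Below is a minimal sketch of how the embedding side could be smoke-tested through the same client. It assumes the runtime also serves the OpenAI-compatible embeddings endpoint, which this page does not confirm. `embed_texts` is a hypothetical helper, not part of xlocllm or the OpenAI library; it relies only on the standard `client.embeddings.create` call.

```python
from typing import Any, List, Sequence


def embed_texts(client: Any, model: str, texts: Sequence[str]) -> List[List[float]]:
    """Request embeddings for `texts` and return one vector per input, in input order.

    `client` is expected to behave like an `openai.OpenAI` instance.
    Hypothetical helper for illustration only.
    """
    response = client.embeddings.create(model=model, input=list(texts))
    # Each returned item carries the index of the input it belongs to;
    # sort on it so the vectors line up with `texts`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With the runtime from the harness still running (that is, before `rt.close()`), a call might look like `vectors = embed_texts(client, unit1.model, ["hello", "bonjour"])`, followed by a check that `len(vectors) == 2`.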

Chat UI

import xlocllm

llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b")  # same LLM unit as in the examples above
with xlocllm.runtime([llm]) as rt:
    rt.run()
    rt.chatui(session="demo", use_rag=True)

Production-style explicit port

import xlocllm

llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b")  # same LLM unit as in the examples above
runtime = xlocllm.runtime([llm], port=12000, mode="native")
runtime.run()
print("Use this base_url:", runtime.url)
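
When the port is fixed rather than chosen automatically, it can help to confirm nothing else is bound to it before starting the runtime. The sketch below uses only the standard library. `is_port_free` is a hypothetical helper, and the `127.0.0.1` default is an assumption, since this page does not say which address the runtime binds to.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical pre-flight check; the host default is an assumption.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" (or a permission error).
            return False
    return True
```

One possible use is to call `is_port_free(12000)` before `xlocllm.runtime(...)` and exit with a clear message when it returns False. The check is advisory only: another process can still claim the port between the probe and the runtime's own bind.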
