
Python Unit Examples

Mike edited this page May 28, 2026 · 1 revision

Unit Examples

Fast native LLM

import xlocllm

llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b", quant="q4")
with xlocllm.runtime([llm]) as rt:
    rt.run()
    print(rt.chat("Give me three local AI use cases."))
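The `quant="q4"` argument trades numeric precision for size. This rough estimate assumes `0.8b` means about 0.8 billion parameters and `q4` means about 4 bits per weight. Under those assumptions, the weights alone come out roughly eight times smaller than a 32-bit (`fp32`) checkpoint. The helper below is a hypothetical back-of-the-envelope sketch, not part of xlocllm. It ignores quantization scales, the KV cache and runtime overhead, so actual memory use will be higher.

```python
# Hypothetical helper, not an xlocllm API: weight-only memory estimate.
# Bit widths per quant name are assumptions for illustration.
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "q8": 8, "q4": 4}


def weight_memory_gb(n_params: float, quant: str) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes).

    Ignores quantization scales, the KV cache, activations and runtime
    overhead, so real memory use will be higher.
    """
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9  # bits -> bytes -> GB


for quant in ("fp32", "q4"):
    print(quant, round(weight_memory_gb(0.8e9, quant), 2), "GB")
# fp32 3.2 GB
# q4 0.4 GB
```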

Browser WebGPU LLM

import xlocllm

with xlocllm.webgpu:
    llm = xlocllm.unit("LLM", "SmolLM2-360M-Instruct-q4f16_1-MLC")
    with xlocllm.runtime([llm], mode="web") as rt:
        rt.run()
        print(rt.chat("Hello from WebGPU"))
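The WebGPU example loads a prebuilt MLC model, and the model id encodes its quantization. In MLC's naming, `q4f16_1` denotes 4-bit weights with float16 compute, and the trailing `_1` marks the quantization variant. The sketch below is a hypothetical helper, not part of xlocllm, that reads this tag. It assumes the `q{weight_bits}f{float_bits}_{variant}-MLC` suffix convention.

```python
import re

# Hypothetical helper: extracts the quantization tag from an MLC-style model id.
# Assumes ids end in "-q{weight_bits}f{float_bits}_{variant}-MLC".
_MLC_QUANT = re.compile(r"-q(\d+)f(\d+)(?:_(\d+))?-MLC$")


def parse_mlc_quant(model_id: str) -> dict:
    match = _MLC_QUANT.search(model_id)
    if match is None:
        raise ValueError(f"no MLC quantization tag in {model_id!r}")
    weight_bits, float_bits, variant = match.groups()
    return {
        "weight_bits": int(weight_bits),
        "float_bits": int(float_bits),
        "variant": int(variant) if variant is not None else None,
    }


print(parse_mlc_quant("SmolLM2-360M-Instruct-q4f16_1-MLC"))
# {'weight_bits': 4, 'float_bits': 16, 'variant': 1}
```

Variants with `f16` generally need the browser's WebGPU adapter to expose the `shader-f16` feature. Where that feature is missing, an `f32` variant of the same model is the usual fallback.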

Browser CPU/WASM classifier

import xlocllm

with xlocllm.web:
    clf = xlocllm.unit(
        "text-classification",
        "Xenova/distilbert-base-uncased-finetuned-sst-2-english",
    )
    with xlocllm.runtime([clf], mode="web") as rt:
        rt.run()
        print(rt.invoke("text.classify", {"text": "xlocllm is useful"}))
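`distilbert-base-uncased-finetuned-sst-2-english` is a binary sentiment model. Its two output logits map to `NEGATIVE` (index 0) and `POSITIVE` (index 1). This page does not document the exact shape of what `rt.invoke("text.classify", ...)` returns, so the following is only a sketch. It assumes a pipeline that turns the raw logits into a `{label, score}` pair via softmax, as text-classification pipelines commonly do. `top_label` is a hypothetical name.

```python
import math

# Label order of the SST-2 DistilBERT checkpoint: 0 -> NEGATIVE, 1 -> POSITIVE.
LABELS = ("NEGATIVE", "POSITIVE")


def top_label(logits):
    """Softmax the raw logits and return the best label with its probability."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


print(top_label([-2.0, 2.0]))  # label 'POSITIVE', score ≈ 0.982
```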

Embedding + reranker + RAG

import xlocllm

emb = xlocllm.unit("embedding", "multilingual-e5-small")
rerank = xlocllm.unit("reranker", "bge-reranker-base")
rag = xlocllm.rag(emb=emb, rerank=rerank, name="docs")
llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b-fp32", rag=rag)

with xlocllm.runtime([llm]) as rt:
    rt.run()
    rag.add(["Refunds take up to five business days."], ids=["refund"])
    print(rt.chat("How long does a refund take?"))
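A typical embedding + reranker pipeline, which is presumably what `xlocllm.rag(emb=..., rerank=...)` sets up, retrieves in two stages. First, the embedding model narrows the indexed documents to a few candidates by vector similarity. Then the reranker rescores those candidates against the query before the best ones are placed in the LLM's prompt. The sketch below illustrates that flow in plain Python. It is not xlocllm's implementation, and it swaps in toy stand-ins: bag-of-words cosine similarity replaces `multilingual-e5-small`, and word overlap replaces `bge-reranker-base`. All function names are hypothetical.

```python
import math
import re
from collections import Counter


# Toy stand-in for the embedding model: lowercase bag-of-words counts.
def embed(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Toy stand-in for the reranker: fraction of query words found in the document.
def rerank_score(query, doc):
    q, d = set(embed(query)), set(embed(doc))
    return len(q & d) / len(q) if q else 0.0


def retrieve(query, docs, k=3, top_n=1):
    """Stage 1: vector search keeps k candidates; stage 2: reranker keeps top_n."""
    qv = embed(query)
    candidates = sorted(docs.items(), key=lambda kv: cosine(qv, embed(kv[1])), reverse=True)[:k]
    reranked = sorted(candidates, key=lambda kv: rerank_score(query, kv[1]), reverse=True)
    return reranked[:top_n]


def build_prompt(query, hits):
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return f"Answer using the context.\n{context}\nQuestion: {query}"


docs = {
    "refund": "Refunds take up to five business days.",
    "shipping": "Orders ship within two days.",
}
question = "How long does a refund take?"
hits = retrieve(question, docs)
print(hits[0][0])  # refund
print(build_prompt(question, hits))
```

The reranker only rescores the top `k` candidates, which keeps the more expensive model away from the full index. With the real E5 models, inputs are expected to carry `query: ` and `passage: ` prefixes. This page does not say whether xlocllm adds them for you.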

Custom ONNX regression

import xlocllm

reg = xlocllm.unit(
    "model.onnx",
    type="regression",
    name="local-regression",
    input_name="float_input",
)
with xlocllm.runtime([reg]) as rt:
    rt.run()
    print(reg.predict([[1.0, 2.0, 3.0]]))
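The argument to `predict` is a 2-D batch: `[[1.0, 2.0, 3.0]]` is one sample with three features, not a flat list. ONNX regressors converted from scikit-learn with skl2onnx commonly declare a float input named `float_input` with shape `(n_samples, n_features)`. The hypothetical helper below, which is not an xlocllm API, checks a batch against an assumed feature count before it is passed to `predict`.

```python
from collections.abc import Sequence


# Hypothetical helper: validates rows as a (n_samples, n_features) float batch.
def as_feature_batch(rows: Sequence, n_features: int) -> list[list[float]]:
    batch = []
    for i, row in enumerate(rows):
        if isinstance(row, (int, float)):
            raise ValueError("rows must be sequences; wrap a single sample as [[...]]")
        if len(row) != n_features:
            raise ValueError(f"row {i} has {len(row)} features, expected {n_features}")
        batch.append([float(x) for x in row])  # coerce ints to floats
    return batch


print(as_feature_batch([[1, 2, 3], [4.5, 5, 6]], n_features=3))
# [[1.0, 2.0, 3.0], [4.5, 5.0, 6.0]]
```

If you call the same model through onnxruntime directly, you would feed this batch as `session.run(None, {"float_input": np.array(batch, dtype=np.float32)})`.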
