-
Notifications
You must be signed in to change notification settings - Fork 0
About
Mike edited this page May 28, 2026
·
1 revision
xlocllm закрывает задачу локального inference без внешнего API: Python код выбирает модель,
группирует ее в runtime, поднимает локальный bridge на 127.0.0.1 и отдает совместимые с OpenAI endpoints.
-
ModelInfo- запись каталога модели с runtime, hardware и task metadata. -
Unit- capability/model пара, напримерLLM + Qwenилиembedding + multilingual-e5-small. -
Runtime- набор units, которые должны жить и запускаться вместе. -
Bridge/NativeBridge- локальный HTTP control plane для выбранного режима.
| Режим | Как включить | Где выполняется модель | Когда выбирать |
|---|---|---|---|
| native |
xlocllm.mode = "native" или default |
локальные native engines: llama.cpp/GGUF, ONNX Runtime | серверные и production Python сценарии, RAG, CPU/GPU локально |
| webgpu |
with xlocllm.webgpu: или @xlocllm.webgpu
|
paired browser window, WebGPU/WebNN если доступно | демо и browser-backed inference с GPU |
| web |
with xlocllm.web: или @xlocllm.web
|
paired browser window, CPU/WASM fallback | модели без WebGPU, легкие Transformers.js задачи |
Глобально SDK различает mode="native" и mode="web"; webgpu и web - это scoped helpers,
которые выставляют browser device defaults (webgpu или wasm) для создаваемых units.
По умолчанию xlocllm хранит bridge metadata, native engine/model cache, vector stores и browser profiles:
- Windows:
%LOCALAPPDATA%\xlocllm - Linux/macOS:
$XDG_STATE_HOME/xlocllmили~/.local/state/xlocllm
Environment variables:
| Переменная | Назначение |
|---|---|
XLOCLLM_HOME |
переопределить state/cache directory |
XLOCLLM_WEB_URL |
использовать кастомный web runtime URL |
XLOCLLM_LOG_LEVEL |
уровень логов uvicorn |
XLOCLLM_NATIVE_DISABLE_INSTALL=1 |
запретить managed native dependency install и падать с диагностикой |
- xlocllm
- Quickstart
- About
- Functions Python
- Functions TypeScript
- Use cases
- Examples Python
- Examples TypeScript
- Shared GPU mode
-
Models catalog
- Models The best
- Models Full model list
- Models Use your model
- For native mode
- Models Native LLM tiny small
- Models Native LLM medium
- Models Native LLM large
- Models Native embedding
- Models Native reranker
- Models Native translator
- Models Native tts
- Models Native vlm
- Models Native asr
- Models Native ocr
- Models Native image-classification
- Models Native object-detection
- Models Native image-segmentation
- Models Native depth-estimation
- Models Native document-layout
- Models Native table-detection
- Models Native document-qa
- Models Native language-id
- Models Native audio-classification
- Models Native text-classification
- Models Native ner
- Models Native zero-shot-text
- Models Native summarization
- Models Native text2text
- Models Native code
- For webgpu mode
- For web mode
- Models Web LLM
- Models Web embedding
- Models Web reranker
- Models Web translator
- Models Web tts
- Models Web vlm
- Models Web asr
- Models Web ocr
- Models Web image-classification
- Models Web object-detection
- Models Web image-segmentation
- Models Web depth-estimation
- Models Web document-layout
- Models Web table-detection
- Models Web document-qa
- Models Web zero-shot-image
- Models Web language-id
- Models Web audio-classification
- Models Web text-classification
- Models Web ner
- Models Web zero-shot-text
- Models Web summarization
- Models Web text2text
- Models Web code
- Dev