Use Cases Quick Dev with OpenAI
Mike edited this page May 28, 2026 · 2 revisions
This scenario fits when an application is already written against the official Python `openai` library, but during development you want to run requests quickly and locally, without an external API.

Use it when:
- you need to check prompts, agents, RAG, or tool-calling glue locally;
- you want to keep the existing OpenAI-compatible client code almost unchanged;
- you need a stable local `/v1` endpoint for tests;
- inference must run in `native` mode rather than in a browser runtime.
```python
import xlocllm
from openai import OpenAI

# Declare the local models: one LLM and one embedding model.
unit = xlocllm.unit("LLM", "Qwen-3.5-0.8b", quant="q4")
unit1 = xlocllm.unit("embedding", "multilingual-e5-small")

# Create a native-mode runtime for both units and start it.
rt = xlocllm.runtime([unit, unit1], mode="native", port=1146)
rt.run()

# Existing OpenAI client code starts here.
client = OpenAI(base_url=rt.url, api_key="xlocllm")
response = client.chat.completions.create(
    model=unit.model,
    messages=[
        {"role": "system", "content": "You are a concise local dev assistant."},
        {"role": "user", "content": "Draft a test plan for a login form."},
    ],
    temperature=0,
    max_tokens=256,
)
print(response.choices[0].message.content)

# Shut down the local runtime when done.
rt.close()
```

Everything below `client = OpenAI(...)` remains ordinary OpenAI library code: `client.chat.completions.create(...)`, `client.embeddings.create(...)`, your test wrappers, fixtures, and application code.
| OpenAI cloud code | xlocllm native quick-dev code |
|---|---|
| `OpenAI(api_key=...)` | `OpenAI(base_url=rt.url, api_key="xlocllm")` |
| remote model name | `unit.model` from `xlocllm.unit(...)` |
| external hosted inference | `mode="native"` local runtime |
| cloud billing/network dependency | local cache + local loopback API |
Models are selected through `xlocllm.unit(...)`, the runtime receives the list `[unit, unit1, ...]`, and from there the client works as it would with any OpenAI-compatible server. For quick dev it is usually enough to change `base_url` and set `model` to the local `unit.model`.
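
One way to keep the cloud and local variants side by side is to isolate the two values that change, the client arguments and the model name, in small helpers. The sketch below is only an illustration: `openai_client_kwargs` and `pick_model` are hypothetical names, and reading the cloud key from `OPENAI_API_KEY` is an assumption about how the application is configured.

```python
import os
from typing import Dict, Mapping, Optional


def openai_client_kwargs(
    local_url: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Build keyword arguments for OpenAI(...).

    If `local_url` is given (for example rt.url), point the client at the
    local runtime; otherwise fall back to the cloud API key from `env`.
    """
    if local_url:
        # Local runtime: same placeholder key as in the example above.
        return {"base_url": local_url, "api_key": "xlocllm"}
    # Cloud: a real key is required; raises KeyError if it is missing.
    return {"api_key": env["OPENAI_API_KEY"]}


def pick_model(local_model: Optional[str], cloud_model: str) -> str:
    """Use the local unit's model name when available, else the cloud one."""
    return local_model or cloud_model
```

In quick-dev mode this would be used as `client = OpenAI(**openai_client_kwargs(rt.url))` with `model=pick_model(unit.model, "<remote model name>")`. Passing `None` for both local values restores the cloud configuration without touching the rest of the application code.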