llocal is a small, polished terminal chat client for local OpenAI-compatible model servers.
Bring a server. llocal gives you the interface.
```
llocal
  |
  v
http://127.0.0.1:8080/v1/chat/completions
  |
  v
llama.cpp, Ollama, vLLM, Transformers Serve, or whatever speaks the shape
```
The extra l is for localhost. Also for plausible deniability.
Local model runtimes are already good at loading weights, using Metal/CUDA/CPU backends, managing KV cache, and generating tokens.
They are not always good at being a pleasant terminal chat interface.
llocal keeps that boundary clean:
- The server runs the model.
- The TUI handles the human loop.
- The API shape stays boring.
- OpenAI-compatible `/v1/chat/completions` client (request shape sketched after this list)
- Markdown rendering with Charmbracelet Glamour
- Auto token budgeting, defaulting to useful long answers
- `/continue` after token-limit cutoffs
- Scrollable viewport
- Transcript saving
- Keyboard-first controls
- No accounts, no telemetry, no hosted default
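Any server that accepts the standard chat-completions shape works. For reference, here is a minimal Go sketch of that request, using the default endpoint and model shown below; this is the public OpenAI-compatible format, not llocal's internal client:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Minimal chat-completions request body in the OpenAI-compatible shape.
	body, err := json.Marshal(map[string]any{
		"model": "local",
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello."},
		},
	})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post(
		"http://127.0.0.1:8080/v1/chat/completions",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```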
From source:
```
git clone https://github.com/mager/llocal.git
cd llocal
go install ./cmd/llocal
```

Or run without installing:

```
go run ./cmd/llocal
```

llocal does not load model weights. Start a local OpenAI-compatible server first.
Example with llama.cpp:
```
brew install llama.cpp
llama-server \
  -m /path/to/model.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  --ctx-size 8192
```

Then run:

```
llocal
```

Defaults:

```
endpoint: http://127.0.0.1:8080
model:    local
tokens:   auto
temp:     0.70
```
Flags:
```
llocal \
  --endpoint http://127.0.0.1:8080 \
  --model local \
  --tokens 0 \
  --temp 0.7
```

Environment:

```
LLOCAL_ENDPOINT=http://127.0.0.1:8080 \
LLOCAL_MODEL=local \
llocal
```

`LOCAL_LLM_ENDPOINT` and `LOCAL_LLM_MODEL` also work as compatibility aliases.
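To picture how these layers interact, here is a minimal sketch of flag-over-env-over-default resolution in Go; the helper `envOr` and this structure are illustrative, not lifted from llocal's source:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the first non-empty value among the given
// environment variables, falling back to def.
func envOr(def string, keys ...string) string {
	for _, k := range keys {
		if v := os.Getenv(k); v != "" {
			return v
		}
	}
	return def
}

func main() {
	// Flags override environment variables, which override defaults.
	// Primary variables are checked before the compatibility aliases.
	endpoint := flag.String("endpoint",
		envOr("http://127.0.0.1:8080", "LLOCAL_ENDPOINT", "LOCAL_LLM_ENDPOINT"),
		"base URL of the OpenAI-compatible server")
	model := flag.String("model",
		envOr("local", "LLOCAL_MODEL", "LOCAL_LLM_MODEL"),
		"model name sent with each request")
	flag.Parse()
	fmt.Println(*endpoint, *model)
}
```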
Commands:

```
/help                show commands
/continue            continue after a token-limit cutoff
/model               show endpoint and model
/reset               clear the conversation
/save transcript.md  save the current chat
/tokens auto         estimate max tokens from the prompt
/tokens 4096         manually set max tokens
/temp 0.4            set temperature
/quit                quit
```
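As a rough picture of how input splits between commands and chat, here is a hypothetical dispatcher; the function and its replies are illustrative, not llocal's actual handler:

```go
package main

import (
	"fmt"
	"strings"
)

// dispatch routes one input line: slash commands are handled locally,
// anything else is sent to the model. Sketch only.
func dispatch(line string) string {
	if !strings.HasPrefix(line, "/") {
		return "send to model: " + line
	}
	fields := strings.Fields(line)
	switch fields[0] {
	case "/reset":
		return "conversation cleared"
	case "/tokens":
		if len(fields) > 1 {
			return "max tokens set to " + fields[1]
		}
		return "usage: /tokens auto|<n>"
	case "/temp":
		if len(fields) > 1 {
			return "temperature set to " + fields[1]
		}
		return "usage: /temp <t>"
	default:
		return "unknown command: " + fields[0]
	}
}

func main() {
	fmt.Println(dispatch("/tokens 4096"))
	fmt.Println(dispatch("hello there"))
}
```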
Scroll:
```
PageUp / PageDown
Ctrl+U / Ctrl+D
Ctrl+G   top
Ctrl+B   bottom
Mouse wheel
```
The default is tokens=auto.
Tiny prompts get tiny budgets. Most real prompts get enough room to avoid the constant irritation of "please continue."
If a response still hits the cap, llocal tells you to use `/continue`.
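One plausible shape for the auto heuristic, assuming a rough four-characters-per-token estimate for English text; the name `autoBudget` and every constant here are illustrative, not llocal's actual rule:

```go
package main

import "fmt"

// autoBudget scales the reply budget with prompt size, clamped to a
// sane range. Illustrative sketch only.
func autoBudget(prompt string) int {
	est := len(prompt) / 4 // crude token estimate: ~4 chars per token
	budget := est * 8      // leave generous room for the answer
	if budget < 256 {
		budget = 256 // floor so tiny prompts still get a usable reply
	}
	if budget > 4096 {
		budget = 4096 // cap to avoid runaway generations
	}
	return budget
}

func main() {
	fmt.Println(autoBudget("Explain KV cache reuse in one paragraph."))
}
```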
Development:

```
make run
make build
go test ./...
```

Pronounce it however you want.
I say "local" and let the extra l sit there bothering people.