llocal is a small, polished terminal chat client for local OpenAI-compatible model servers.
Bring a server. llocal gives you the interface.
```
llocal
  |
  v
http://127.0.0.1:8080/v1/chat/completions
  |
  v
llama.cpp, Ollama, vLLM, Transformers Serve, or whatever speaks the shape
```
The extra l is for localhost. Also for plausible deniability.
Local model runtimes are already good at loading weights, using Metal/CUDA/CPU backends, managing KV cache, and generating tokens.
They are not always good at being a pleasant terminal chat interface.
llocal keeps that boundary clean:
- The server runs the model.
- The TUI handles the human loop.
- The API shape stays boring.
- OpenAI-compatible `/v1/chat/completions` client (request shape sketched after this list)
- Markdown rendering with Charmbracelet Glamour
- Auto token budgeting, defaulting to useful long answers
- `/continue` after token-limit cutoffs
- Scrollable viewport
- Transcript saving
- Keyboard-first controls
- No accounts, no telemetry, no hosted default
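Any server that accepts the standard chat-completions shape works. For reference, here is a minimal Go sketch of that request, using the default endpoint and model shown below; this is the public OpenAI-compatible format, not llocal's internal client:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Minimal chat-completions request body in the OpenAI-compatible shape.
	body, err := json.Marshal(map[string]any{
		"model": "local",
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello."},
		},
	})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post(
		"http://127.0.0.1:8080/v1/chat/completions",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```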
From source:
```
git clone https://github.com/mager/llocal.git
cd llocal
go install ./cmd/llocal
```

Or run without installing:

```
go run ./cmd/llocal
```

llocal does not load model weights. Start a local OpenAI-compatible server first.
Example with llama.cpp:
```
brew install llama.cpp
llama-server \
  -m /path/to/model.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  --ctx-size 8192
```

Then run:

```
llocal
```

Defaults:

```
endpoint: http://127.0.0.1:8080
model:    local
tokens:   auto
temp:     0.70
```
Flags:
```
llocal \
  --endpoint http://127.0.0.1:8080 \
  --model local \
  --tokens 0 \
  --temp 0.7
```

Environment:

```
LLOCAL_ENDPOINT=http://127.0.0.1:8080 \
LLOCAL_MODEL=local \
llocal
```

`LOCAL_LLM_ENDPOINT` and `LOCAL_LLM_MODEL` also work as compatibility aliases.
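To picture how these layers interact, here is a minimal sketch of flag-over-env-over-default resolution in Go; the helper `envOr` and this structure are illustrative, not lifted from llocal's source:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the first non-empty value among the given
// environment variables, falling back to def.
func envOr(def string, keys ...string) string {
	for _, k := range keys {
		if v := os.Getenv(k); v != "" {
			return v
		}
	}
	return def
}

func main() {
	// Flags override environment variables, which override defaults.
	// Primary variables are checked before the compatibility aliases.
	endpoint := flag.String("endpoint",
		envOr("http://127.0.0.1:8080", "LLOCAL_ENDPOINT", "LOCAL_LLM_ENDPOINT"),
		"base URL of the OpenAI-compatible server")
	model := flag.String("model",
		envOr("local", "LLOCAL_MODEL", "LOCAL_LLM_MODEL"),
		"model name sent with each request")
	flag.Parse()
	fmt.Println(*endpoint, *model)
}
```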
Commands:

```
/help                show commands
/continue            continue after a token-limit cutoff
/model               show endpoint and model
/reset               clear the conversation
/save transcript.md  save the current chat
/tokens auto         estimate max tokens from the prompt
/tokens 4096         manually set max tokens
/temp 0.4            set temperature
/quit                quit
```
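As a rough picture of how input splits between commands and chat, here is a hypothetical dispatcher; the function and its replies are illustrative, not llocal's actual handler:

```go
package main

import (
	"fmt"
	"strings"
)

// dispatch routes one input line: slash commands are handled locally,
// anything else is sent to the model. Sketch only.
func dispatch(line string) string {
	if !strings.HasPrefix(line, "/") {
		return "send to model: " + line
	}
	fields := strings.Fields(line)
	switch fields[0] {
	case "/reset":
		return "conversation cleared"
	case "/tokens":
		if len(fields) > 1 {
			return "max tokens set to " + fields[1]
		}
		return "usage: /tokens auto|<n>"
	case "/temp":
		if len(fields) > 1 {
			return "temperature set to " + fields[1]
		}
		return "usage: /temp <t>"
	default:
		return "unknown command: " + fields[0]
	}
}

func main() {
	fmt.Println(dispatch("/tokens 4096"))
	fmt.Println(dispatch("hello there"))
}
```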
Scroll:
```
PageUp / PageDown
Ctrl+U / Ctrl+D
Ctrl+G   top
Ctrl+B   bottom
Mouse wheel
```
The default is tokens=auto.
Tiny prompts get tiny budgets. Most real prompts get enough room to avoid the constant irritation of "please continue."
If a response still hits the cap, llocal tells you to use `/continue`.
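One plausible shape for the auto heuristic, assuming a rough four-characters-per-token estimate for English text; the name `autoBudget` and every constant here are illustrative, not llocal's actual rule:

```go
package main

import "fmt"

// autoBudget scales the reply budget with prompt size, clamped to a
// sane range. Illustrative sketch only.
func autoBudget(prompt string) int {
	est := len(prompt) / 4 // crude token estimate: ~4 chars per token
	budget := est * 8      // leave generous room for the answer
	if budget < 256 {
		budget = 256 // floor so tiny prompts still get a usable reply
	}
	if budget > 4096 {
		budget = 4096 // cap to avoid runaway generations
	}
	return budget
}

func main() {
	fmt.Println(autoBudget("Explain KV cache reuse in one paragraph."))
}
```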
Development:

```
make run
make build
go test ./...
```

Pronounce it however you want.
I say "local" and let the extra l sit there bothering people.