llocal

llocal is a small, polished terminal chat client for local OpenAI-compatible model servers.

Bring a server. llocal gives you the interface.

llocal
  |
  v
http://127.0.0.1:8080/v1/chat/completions
  |
  v
llama.cpp, Ollama, vLLM, Transformers Serve, or whatever speaks the shape

The extra l is for localhost. Also for plausible deniability.
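The "shape" in question is the standard OpenAI chat completions request: a JSON POST with a model name and a list of messages. As a rough smoke test against a server on the default endpoint (the model value here is illustrative; many local servers accept any string):

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'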

Why

Local model runtimes are already good at loading weights, using Metal/CUDA/CPU backends, managing KV cache, and generating tokens.

They are not always good at being a pleasant terminal chat interface.

llocal keeps that boundary clean:

  • The server runs the model.
  • The TUI handles the human loop.
  • The API shape stays boring.

Features

  • OpenAI-compatible /v1/chat/completions client
  • Markdown rendering with Charmbracelet Glamour
  • Auto token budgeting that defaults to enough room for long answers
  • /continue after token-limit cutoffs
  • Scrollable viewport
  • Transcript saving
  • Keyboard-first controls
  • No accounts, no telemetry, no hosted default

Install

From source:

git clone https://github.com/mager/llocal.git
cd llocal
go install ./cmd/llocal

Or run without installing:

go run ./cmd/llocal

Start a Local Server

llocal does not load model weights. Start a local OpenAI-compatible server first.

Example with llama.cpp:

brew install llama.cpp

llama-server \
  -m /path/to/model.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  --ctx-size 8192
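
Before launching the client, you can check that the server is reachable. Most OpenAI-compatible servers, llama-server included, expose a model listing endpoint:

curl http://127.0.0.1:8080/v1/models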

Then run:

llocal

Defaults:

endpoint: http://127.0.0.1:8080
model:    local
tokens:   auto
temp:     0.70

Configure

Flags:

llocal \
  --endpoint http://127.0.0.1:8080 \
  --model local \
  --tokens 0 \
  --temp 0.7

Environment:

LLOCAL_ENDPOINT=http://127.0.0.1:8080 \
LLOCAL_MODEL=local \
llocal

LOCAL_LLM_ENDPOINT and LOCAL_LLM_MODEL also work as compatibility aliases.

Commands

/help                show commands
/continue            continue after a token-limit cutoff
/model               show endpoint and model
/reset               clear the conversation
/save transcript.md  save the current chat
/tokens auto         estimate max tokens from the prompt
/tokens 4096         manually set max tokens
/temp 0.4            set temperature
/quit                quit

Scroll:

PageUp / PageDown
Ctrl+U / Ctrl+D
Ctrl+G top
Ctrl+B bottom
Mouse wheel

Token Mode

The default is tokens=auto.

Tiny prompts get tiny budgets. Most real prompts get enough room to avoid the constant irritation of "please continue."

If a response still hits the cap, llocal tells you to use:

/continue
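
If you would rather set the budget yourself, pass a fixed cap at startup (or use /tokens from inside the chat):

llocal --tokens 4096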

Development

make run
make build
go test ./...

Name

Pronounce it however you want.

I say "local" and let the extra l sit there bothering people.
