Retro terminal CLI for working with Mistral models through two backends:
local llama.cpp deployments and remote Mistral Cloud through the
official mistralai Python SDK.
The client is currently supported on Linux only.
The repository is intentionally focused on one product surface:
- use local llama.cpp and remote Mistral Cloud from one consistent REPL
- handle image and document turns through backend-appropriate multimodal flows
- keep local OS tools and MCP tools available when the task needs them
- prefer source-backed answers for factual questions when MCP/web tools are available
- support coding, document work, research, OCR, and general assistant workflows
- a dedicated interactive CLI with retro green/orange presentation
- a general-purpose multimodal assistant experience for Mistral models
- always-on local tools: shell, read_file, write_file, list_dir, search_text
- optional FireCrawl MCP tools loaded from mcp.json using FIRECRAWL_API_KEY from your environment
- optional Mistral Cloud premium web search (web_search_premium) on the remote backend via /premium-web, --premium-web, or tools.premium_web_search in config
- /image and /doc attachment commands with a terminal-native picker
- /remote on|off to switch between local llama.cpp and Mistral cloud
- /remote model [small|medium] to change the remote model (only when remote is on)
- /compact to inspect, tune, or manually compact chat-completions context
- tests for completion, streaming, cancellation recovery, and multimodal payloads
make sync
uv run python -m mistralcli
For a complete command-by-command walkthrough, see the user guide.
- Python >=3.10 (tested on 3.10, 3.11, 3.12, 3.13, 3.14)
- Linux
- uv
- a local llama.cpp server at http://127.0.0.1:8080 if you want local mode
- MISTRAL_API_KEY in your shell if you want remote mode
- FIRECRAWL_API_KEY in your shell if you want FireCrawl MCP tools (disabled while premium web search is active on a remote session)
- MISTRAL_API_KEY billing/access for Mistral premium web search when enabled
- pdftoppm available in PATH if you want PDF document rasterization via /doc
- a real terminal with TTY support if you want the interactive attachment picker
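The optional requirements above can be checked before launching the CLI. The following is a minimal sketch, not part of mistralcli: the `preflight` function, its parameters, and the feature names in the returned dictionary are all hypothetical. It only uses the environment variables and the `pdftoppm` binary named in the list.

```python
import os
import shutil
import sys


def preflight(env=None, which=shutil.which, is_tty=None):
    """Report which optional features the current environment can support.

    Hypothetical helper: the function name and result keys are illustrative
    and do not come from the mistralcli code base.
    """
    env = os.environ if env is None else env
    if is_tty is None:
        # The attachment picker needs an interactive terminal on both ends.
        is_tty = sys.stdin.isatty() and sys.stdout.isatty()
    return {
        "remote_mode": bool(env.get("MISTRAL_API_KEY")),
        "firecrawl_mcp": bool(env.get("FIRECRAWL_API_KEY")),
        "pdf_rasterization": which("pdftoppm") is not None,
        "attachment_picker": bool(is_tty),
    }
```

Calling `preflight()` with no arguments inspects the real environment; the parameters exist only so the check can be exercised without touching it.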
Useful one-shot smoke test:
uv run python -m mistralcli --version
uv run python -m mistralcli --once "Return only the word ok." --no-stream
Reasoning can be requested or disabled at startup:
uv run python -m mistralcli --reasoning
uv run python -m mistralcli --no-reasoning
The supported install path for workstations and servers is a built wheel plus
uv tool install.
Build distributable artifacts on one machine:
make build
This creates:
- dist/mistralcli-<version>-py3-none-any.whl
- dist/mistralcli-<version>.tar.gz
Version tags such as v3.5.1 also trigger a GitHub Actions release build that
publishes the wheel and source archive as GitHub release assets after passing
the normal repo checks and a wheel-install smoke test.
Install from a local wheel:
uv tool install ./mistralcli-<version>-py3-none-any.whl
Reinstall or upgrade from a newer wheel:
uv tool install --force ./mistralcli-<version>-py3-none-any.whl
Install directly from a GitHub release asset without cloning the repo:
uv tool install \
"https://github.com/ibitato/MistralClient/releases/download/v3.5.1/mistralcli-3.5.1-py3-none-any.whl"
An optional convenience wrapper is available in scripts/install.sh.
It still uses uv tool install under the hood, and its job is only to help
with local wheel discovery, optional release URLs, and cleanup of legacy
repo-local installs.
After installation:
mistralcli --version
mistralcli --print-defaults
mistralcli
The public CLI behavior stays centered around mistralcli, but the internal
implementation is now split into smaller domain modules:
- src/mistralcli/session.py is the thin MistralSession facade
- session_runtime.py, session_transport.py, session_conversations.py, session_tools.py, session_context.py, and session_primitives.py own the main session domains
- src/mistralcli/cli.py is the thin CLI entrypoint facade
- cli_config.py, cli_repl.py, cli_commands.py, cli_shortcuts.py, and cli_state.py own CLI-specific runtime responsibilities
- tests/cli_support.py contains shared CLI fixtures and fakes, while the CLI behavior tests are split by domain under tests/test_cli_*.py
This keeps individual Python units easier to navigate without changing the user-facing command surface.
Inside the REPL:
- /help for actionable usage
- /defaults to inspect runtime parameters
- /status to inspect the current live session snapshot
- --version or -v to print the installed CLI version
- /tools to inspect loaded tools
- /timeout [VALUE] to inspect or change the active request timeout
- /run -- ... to execute a shell command
- /ls [PATH] to inspect the tree
- /find -- ... to search text in the workspace
- /edit PATH -- ... to write text files
- /image to pick and analyze images in the terminal
- /doc to pick and analyze documents in the terminal
- /remote on|off to switch cloud mode
- /remote model [small|medium] to change the remote model
- /conv ... to manage Mistral Cloud Conversations and local bookmarks
- /reasoning [on|off|toggle] to request or suppress backend reasoning
- /thinking [on|off|toggle] to show or hide returned thinking blocks
- /compact [status|now|auto on|auto off|threshold N|reserve N|keep N] to manage context
- /reset, /system ..., /exit
Interactive TTY behavior:
- the prompt is rendered as a retro green MC> composer in TTY sessions
- long prompts wrap in the composer instead of overflowing one raw line
- the TTY composer is multiline, with visual wrap, Alt+Enter/Ctrl+J line breaks, and preserved newlines in pasted text; input is buffered and nothing is sent until you press Enter
- a bottom status bar appears during active turns and shows live phase, backend, attachments, live context estimate, and backend token accounting
- /status prints that same live session state on demand between turns
- fenced code blocks in assistant answers are highlighted in a dedicated cyan code style so snippets stand out from normal prose
- standalone Markdown separators such as --- are rendered as terminal divider lines outside code fences
- assistant reasoning and answer text stream with a fast typewriter-style cadence in TTY mode
- assistant prose wraps cleanly without splitting words in the middle
Typical tasks include:
- general chat and question answering
- image and document analysis
- OCR and extraction from attached files
- summaries, comparisons, translations, and drafting
- local workspace automation with tools
- programming and debugging when you want it
The model always sees these local tools, but they are intentionally specialized:
- shell is the primary tool for Linux and OS inspection: rg, grep, find, git, ps, systemctl, package managers, logs, env vars, permissions, and system-level discovery.
- search_text is only for searching text inside files under a workspace path. It is for repo/source lookup and returns one matching line per file.
- list_dir is for directory orientation before reading or searching deeper.
- read_file is for reading one specific known text file.
- write_file is for saving or updating text on disk when the task requires it.
Examples:
- "Find files mentioning timeout in src/" -> search_text
- "Check running nginx processes" -> shell
- "Search the OS for docker service files" -> shell
- "Show what is in /etc/systemd" -> list_dir or shell
- "Read pyproject.toml" -> read_file
Remote mode requirements:
- export MISTRAL_API_KEY in your shell
- remote mode defaults to mistral-small-latest and --remote-model can switch to mistral-medium-3.5
- backend switching resets the active conversation
- optional Conversations mode uses client.beta.conversations and is off by default
- --conversations starts in Conversations mode; --conversation-store on|off controls server-side persistence and defaults to on
- --conversation-resume {last,new,prompt} controls whether Conversations mode resumes the last known stored remote conversation; the default is last
- --conversation-name, --conversation-description, and repeated --conversation-meta KEY=VALUE set pending metadata for the next remote conversation start
- --reasoning and --no-reasoning control whether reasoning is requested
- --thinking and --no-thinking control whether returned thinking is rendered
- store=off runs stateless one-shot Conversation calls, so it does not preserve conversation_id across turns
- the CLI keeps a local registry at ~/.local/state/mistralcli/conversations.json (or $XDG_STATE_HOME/...) for aliases, tags, notes, and last-active resume state
- the default request timeout is 300000 ms (5 minutes)
Conversations management:
- /conv on enables Conversations mode and resumes the last stored conversation when the resume policy is last
- /conv list --page 0 --size 20 --meta owner=dlopez lists remote conversations; metadata filters are applied by the CLI using remote details plus its local registry cache
- /conv show <id> inspects remote metadata for one conversation
- /conv use <id> reattaches the current session to an existing remote conversation
- /conv history [id] and /conv messages [id] inspect remote history
- /conv restart <entry_id> [id] branches from a specific remote history entry
- /conv delete [id] deletes a remote conversation
- /conv alias, /conv note, /conv tag, /conv bookmarks, and /conv forget manage the CLI-side local overlay
- /conv alias release-review assigns an alias directly to the active conversation; /conv alias <id> release-review still works for any known id
- Mistral does not expose a remote update API for existing conversation name/metadata, so aliases and bookmarks are stored locally by the CLI
- Mistral model Conversations currently preserve name and description, but may not return custom metadata in get/list; when a conversation is created from this CLI, the requested metadata is preserved in the local registry so /conv list --meta ... remains useful for those sessions
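The client-side metadata filtering described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the CLI's actual code: `parse_meta`, `filter_by_meta`, the conversation dictionary shape, and the rule that remote metadata overrides the local registry are all hypothetical.

```python
def parse_meta(pairs):
    """Turn repeated KEY=VALUE strings into a dict, rejecting malformed items."""
    meta = {}
    for item in pairs:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        meta[key] = value
    return meta


def filter_by_meta(conversations, local_registry, wanted):
    """Keep conversations whose merged metadata contains every wanted pair.

    Assumption: remote metadata wins when the backend returns it, and the
    local registry (keyed by conversation id) fills the gaps otherwise.
    """
    matches = []
    for conv in conversations:
        merged = dict(local_registry.get(conv["id"], {}))
        merged.update(conv.get("metadata") or {})
        if all(merged.get(key) == value for key, value in wanted.items()):
            matches.append(conv)
    return matches
```

With this shape, a filter such as `owner=dlopez` still matches a conversation whose remote record came back without metadata, as long as the local registry remembered the pair at creation time.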
Context management:
- default chat completions are client-managed and send the full local history
- the CLI estimates context before each non-Conversations request because the SDK does not expose a backend tokenizer for this path
- local mode defaults to the configured local model window of 262144 tokens; remote chat completions default to 256000 tokens
- auto-compaction is enabled by default at 90% of the configured window and reserves 8192 tokens for the next response
- /compact summarizes older turns into one compact assistant message and keeps the most recent 6 user turns verbatim
- if compaction cannot bring a request under the hard window, the CLI blocks the turn before sending it to the backend
- Conversations mode is not compacted locally; its server-side context handling remains backend-managed by Mistral
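To make the numbers above concrete, here is a small sketch of one way the send / compact / block decision could be expressed. The text does not say exactly how the CLI combines the threshold and the reserve, so the comparison used here is an assumption, and `ContextPolicy` and `plan_turn` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPolicy:
    window: int = 262144     # local default window; remote chat defaults to 256000
    threshold: float = 0.90  # auto-compaction trigger as a fraction of the window
    reserve: int = 8192      # tokens held back for the next response


def plan_turn(estimated_tokens, policy=ContextPolicy(), can_compact_to=None):
    """Return "send", "compact", or "block" for a pending request.

    Assumption: compaction triggers once estimate + reserve exceeds
    threshold * window, and the hard limit is the window itself.
    can_compact_to is the estimated request size after compaction, if known.
    """
    needed = estimated_tokens + policy.reserve
    if needed <= policy.threshold * policy.window:
        return "send"
    if can_compact_to is not None and can_compact_to + policy.reserve > policy.window:
        # Even the compacted history would not fit under the hard window.
        return "block"
    return "compact"
```

Under these assumptions, a local request estimated at 200000 tokens is sent as-is, while one estimated at 230000 tokens triggers compaction because 230000 + 8192 exceeds 90% of 262144.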
Typical environment setup:
export MISTRAL_API_KEY=...
export FIRECRAWL_API_KEY=...
Attachment picker flow:
- /image and /doc use a pure terminal picker with no GUI requirements
- first browse directories in the terminal picker
- use [use] to keep the current directory or [..] to move to the parent
- then use a fuzzy list to pick one matching file
- Enter selects the highlighted entry and Ctrl-C cancels
- if the picker cannot run, the CLI falls back to manual path entry
The local runtime is expected to be running outside this repo with llama.cpp.
The validated launch profile for the current test stack is:
llama-server \
-hf unsloth/Mistral-Small-4-119B-2603-GGUF:UD-Q5_K_XL \
--host 0.0.0.0 --port 8080 \
--jinja --flash-attn off --no-mmap \
--chat-template-file ./mistral-small-4-reasoning.jinja \
--ctx-size 262144 \
-ngl 99 \
--temp 0.3 --top-p 0.95 --top-k 40 --min-p 0.0 \
--parallel 1 --ctx-checkpoints 32 --cache-prompt \
--threads "$(nproc)"
Recommended runtime defaults used by the CLI:
- temperature=0.3
- top_p=0.95
- prompt_mode=reasoning
- timeout_ms=300000
- streaming on by default
- max_tokens unset unless you override it
- Conversations mode off by default; store=on when enabled
- auto context compaction on at 90%, preserving 6 recent turns
The lower default temperature is intentional: Mistral's API guidance describes
lower values as more focused and deterministic, which is a better default for
this CLI's factual, tool-assisted workflow. Raise it with --temperature or
MISTRAL_LOCAL_TEMPERATURE when you explicitly want more creative variation.
Remote mode keeps the same sampling defaults, but it does not send
prompt_mode=reasoning. The live Mistral cloud API rejects that setting for
mistral-small-latest or mistral-medium-3.5, so the CLI uses the official SDK with
reasoning_effort=high when reasoning is enabled, and reasoning_effort=none
when it is disabled. /thinking only affects terminal rendering.
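The backend-specific reasoning settings described above amount to a small mapping. The sketch below is illustrative only: `reasoning_params` is a hypothetical function, and the behavior for local mode with reasoning disabled (omitting prompt_mode entirely) is an assumption, since the text only states the enabled default.

```python
def reasoning_params(backend, reasoning_enabled):
    """Map a backend name and reasoning flag to extra request parameters."""
    if backend == "remote":
        # The cloud API rejects prompt_mode=reasoning for the remote models
        # named above, so reasoning_effort is used instead.
        return {"reasoning_effort": "high" if reasoning_enabled else "none"}
    if backend == "local":
        # Assumption: with reasoning off, the local request omits prompt_mode.
        return {"prompt_mode": "reasoning"} if reasoning_enabled else {}
    raise ValueError(f"unknown backend: {backend!r}")
```

Note that /thinking does not appear here: it changes only what the terminal renders, not what is requested.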
Conversations mode is an optional Mistral Cloud path. It resets the current chat
when enabled, starts a fresh remote conversation_id on the next user turn, and
keeps the normal chat-completions path as the default. When reasoning is
enabled, the CLI requests thinking traces for Conversations too, but the
backend may still omit them on some turns; the CLI reports that explicitly
when thinking display is on.
Attachment handling is backend-aware:
- local /image sends image_url blocks to llama.cpp
- local /doc rasterizes supported documents into page images for OCR/vision
- remote /image uses the official SDK vision flow with image_url
- remote /doc uses the official SDK document flow with document_url for pdf and docx
- remote plain-text documents are embedded directly as text because they are already machine-readable
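The routing rules above can be summarized as a single decision function. This is a sketch, not the CLI's implementation: `attachment_flow`, the returned flow labels, and the set of plain-text suffixes are hypothetical and chosen only to mirror the list.

```python
from pathlib import Path

# Hypothetical set of suffixes treated as plain text; not taken from the source.
PLAIN_TEXT_SUFFIXES = {".txt", ".md", ".csv"}


def attachment_flow(backend, command, path):
    """Pick a payload strategy for an attachment, following the list above."""
    if backend not in {"local", "remote"}:
        raise ValueError(f"unknown backend: {backend!r}")
    if command == "image":
        # Both backends receive images as image_url content.
        return "image_url"
    if command != "doc":
        raise ValueError(f"unknown attachment command: {command!r}")
    if backend == "local":
        # Local documents are rasterized into page images for OCR/vision.
        return "rasterize_to_page_images"
    suffix = Path(path).suffix.lower()
    if suffix in {".pdf", ".docx"}:
        return "document_url"
    if suffix in PLAIN_TEXT_SUFFIXES:
        return "inline_text"
    raise ValueError(f"unsupported remote document type: {suffix or path!r}")
```

For example, `attachment_flow("remote", "doc", "report.pdf")` selects the document_url flow, while the same file on the local backend is rasterized.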
The repository now includes the exact reasoning template at
mistral-small-4-reasoning.jinja. In this
local setup it is effectively required if you want reasoning requested by
default, because it sets reasoning_effort=high in the llama.cpp chat
template.
For the detailed backend and runtime runbook, see docs/backends-and-runtime.md. For day-to-day CLI usage, see docs/user-guide.md.
make check
make test
make docs
make check runs formatting, lint, mypy, pyright, and docs checks.
make test runs the full pytest suite, including local integration tests
that require the llama.cpp server.
Remote cloud integration runs automatically when MISTRAL_API_KEY is present
in the environment and skips cleanly when it is absent.
make docs regenerates the checked-in API reference from public docstrings.
make typecheck runs both static type checkers, with mypy covering src
and tests and pyright providing a second pass over the configured project
tree.
For the intended retro palette in the interactive REPL, prefer:
export TERM=xterm-256color
The interactive REPL clears the screen on startup and after conversation reset actions so the app stays pinned to the top of the terminal.
- Secrets are not stored in the repository.
- mcp.json uses ${FIRECRAWL_API_KEY} interpolation instead of a checked-in token.
- Remote cloud mode reads MISTRAL_API_KEY from the user environment at runtime.
- The CLI exposes powerful local tools: shell, read_file, write_file, list_dir, and search_text.
- Run it in a workspace and environment you trust.
- Keep .venv, .env, and any ad hoc credential files out of version control.
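The ${FIRECRAWL_API_KEY} interpolation mentioned above keeps the token out of the checked-in file and resolves it at load time. A minimal sketch of that idea follows; `expand_env` is a hypothetical helper, and failing loudly on a missing variable is an assumption rather than documented CLI behavior.

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} in strings with values from env.

    Works on the nested dicts and lists produced by json.load. Raises
    KeyError when a referenced variable is missing (an assumed policy).
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(lookup, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value
```

Applied to the parsed contents of a config file, this leaves non-string values untouched and substitutes the secret only in memory, so the file on disk never contains it.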
- src/mistralcli/ - CLI, session, tools and attachment handling
- tests/ - unit and integration tests
- docs/user-guide.md - practical end-user guide for the CLI
- docs/configuration.md - configuration files, environment variables, and CLI flags
- docs/backends-and-runtime.md - backend-specific deployment and runtime notes
- docs/testing-matrix.md - feature-to-test coverage map for maintainers
- docs/reference.md - generated API reference from public docstrings
- mistral-small-4-reasoning.jinja - versioned llama.cpp reasoning template
- mcp.json - optional FireCrawl MCP config that expands FIRECRAWL_API_KEY at runtime
MIT. See LICENSE.