A local-first coding sidekick for Antigravity, VS Code and Cursor. Stream live model output, chat with your repo, edit files, run terminal commands, and keep everything on-device without shipping your soul to the cloud.
This extension was built because I was tired of being ghosted by AI every time I hit 30,000 feet. No Wi-Fi? No problem. LLeM was made specifically for those long-haul flights where you just want to vibe and code without needing an internet connection.
Note
Credits & Origin: Huge shoutout to the OG inspiration, connect-ai. We took that foundation, gave it a massive refactor, and cranked everything up to eleven. We're talking boosted performance, fresh features, and a serious security audit to keep your local workflow locked down. We didn't just download it; we leveled it up.
Special Thanks: Seriously, LLeM wouldn't even exist if it weren't for the connect-ai team sharing their code with the world. Major respect for that open-source energy; you guys made this happen.
Fair Play: If you're planning to build on top of this or create something new based on this code, please keep the good karma flowing and make sure to shout out the original creators connect-ai and their contributions. Respect the hustle!
Important
LLeM is 100% local. Your code never leaves your machine. No cloud, no drama, just pure local intelligence.
- [UI/UX] Live Stream Metadata: Action progress badges now display real-time statistics, including total duration, chunk count, and character count, giving you full visibility into the AI's generation performance.
- [Robustness] AI Self-Correction Loop: When a file edit fails due to context mismatch, LLeM now automatically feeds the actual file content back to the AI for an immediate, accurate retry.
- [UI/UX] Action Progress Visualization: Live streaming now shows clean, "Codex-style" progress badges for file operations instead of raw XML, keeping you informed without the clutter.
- [Reliability] Tag Normalization: Improved handling of aborted or incomplete streams to ensure actions are executed even if the connection drops.
Since LLeM is currently in early flight, we distribute it via .vsix files. Follow these steps to get airborne:
- Go to the LLeM GitHub Repository.
- Look at the Releases section on the right sidebar.
- Click on the latest release tag (e.g., `v3.1.3`).
- Under the Assets section, click on the `.vsix` file (e.g., `llem-3.1.3.vsix`) to download it to your machine.
- Open VS Code or Cursor.
- Open the Extensions view by clicking the square icon in the left sidebar or pressing `Cmd+Shift+X` (macOS) or `Ctrl+Shift+X` (Windows/Linux).
- Click the `...` (More Actions) menu icon in the top right corner of the Extensions view title bar.
- Select Install from VSIX... from the dropdown menu.
- Locate and select the `.vsix` file you just downloaded.
- Once installed, you might need to click Reload or restart your editor.
- Local-First Workflow: Connects directly to local engines like Ollama or LM Studio. No cloud, no API costs.
- Live Streaming: Real-time output rendered inside a custom VS Code chat panel with full Markdown and code block support.
- Agentic Actions: Trigger file creations, non-destructive edits, and terminal commands directly from the AI's response.
- Persistent History: Conversations are automatically saved to `~/.llem-history`, supporting session recovery, renaming, and bulk deletion.
- Workspace Awareness: Real-time monitoring of your project files. Drop files/folders into chat for instant, high-fidelity context injection.
- The Brain (Markdown Vault): Sync your notes with an Obsidian-compatible vault. Supports visual network maps and local Git synchronization.
- Performance First: Multi-layered caching, request throttling, and token-usage monitoring to keep your dev environment snappy.
- Model-Aware Prompt Budgeting: Automatically trims prompt weight for big local models so 24B+ and 26B-class runs stay responsive instead of drowning in context.
- Built-In Diagnostics: Inspect prompt size, first-token delay, section-by-section context weight, and streaming throughput directly from the LLeM diagnostics panel.
To get started, you'll need a local model runtime running on your machine.
Typical URL: http://127.0.0.1:11434
```bash
# Pull a model and serve
ollama pull gemma4:e4b
ollama serve
```

For larger local runs, a 24B+ Gemma-family model is a better fit for the new performance profile flow:

```bash
# Example 26B-class local setup
ollama pull gemma6:26b
ollama serve
```

Typical URL: http://127.0.0.1:1234
- Download and load your favorite model.
- Enable the Local Server.
- Confirm the server is active.
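Before opening the chat panel, it can help to confirm the engine endpoint actually answers. The sketch below is a hedged example: `/api/tags` (Ollama) and `/v1/models` (LM Studio) are the usual default list endpoints for those tools, but adjust the host/port if you changed them.

```shell
# Probe the default local engine endpoints before chatting.
# Endpoint paths are the common defaults for Ollama and LM Studio;
# adjust if your setup uses different ports.
check() {
  if curl -sf --max-time 2 "$1" >/dev/null; then
    echo "$2 reachable"
  else
    echo "$2 not reachable"
  fi
}

check "http://127.0.0.1:11434/api/tags" "Ollama"
check "http://127.0.0.1:1234/v1/models" "LM Studio"
```

If either line prints "not reachable", start the engine (or fix the URL in your LLeM settings) before chatting.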
Open your VS Code settings.json to customize the experience.
| Setting | Description | Default |
|---|---|---|
| `llem.engineUrl` | Local/remote model endpoint URL. | `http://127.0.0.1:11434` |
| `llem.defaultModel` | The default model slug used for requests. | `gemma4:e4b` |
| `llem.performancePreset` | Prompt and generation budget profile. Use `auto`, `balanced`, or `large-local-26b`. | `auto` |
| `llem.requestTimeout` | Request timeout in seconds. | `300` |
| `llem.vaultPath` | Path to your markdown vault. | `~/.llem-vault` |
| `llem.bridgeEnabled` | Enable the local HTTP bridge on port 4825. | `false` |
| `llem.bridgeToken` | Security token for authenticated bridge callers. | (empty) |
| `llem.mcpEnabled` | Enable MCP server discovery and tool calls. | `true` |
| `llem.mcpServers` | LLeM-owned MCP server definitions keyed by server name. Includes `context-mode` by default. | `context-mode` via `npx -y context-mode` |
| `llem.mcpConfigSources` | MCP sources to import. LLeM asks before enabling external Antigravity, VS Code, Codex, or Claude Code sources. | `["workspace"]` |
| `llem.mcpConfigPaths` | Extra JSON/TOML MCP config files to import. | `[]` |
| `llem.maxHistoryItems` | Maximum number of sessions to keep in history. | `100` |
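Putting a few of these together, a typical `settings.json` block might look like this (values are illustrative starting points, not requirements):

```json
{
  "llem.engineUrl": "http://127.0.0.1:11434",
  "llem.defaultModel": "gemma4:e4b",
  "llem.performancePreset": "auto",
  "llem.requestTimeout": 300,
  "llem.vaultPath": "~/.llem-vault",
  "llem.maxHistoryItems": 100
}
```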
Tip
If you're using a slower model or long prompts, try bumping up `llem.requestTimeout`.
LLeM can discover and call MCP tools from the chat loop. It includes context-mode by default using npx -y context-mode, and it can also import MCP server configs you already use in VS Code, Claude Code, Codex, Antigravity, project-level MCP files, or LLeM-specific .llem/mcp.json files.
Current support:
- `stdio`, `http`, `sse`, and Streamable HTTP MCP servers are executed and callable.
- HTTP-based servers use the MCP SDK client transports and forward configured `headers` as request headers.
- External config files are read-only. LLeM never rewrites your Claude Code, Codex, or Antigravity MCP files.
- Secrets in `env` and `headers` should stay in environment variables or your existing tool config; avoid committing them to project files.
The default MCP setup already contains:
```json
{
  "llem.mcpServers": {
    "context-mode": {
      "command": "npx",
      "args": ["-y", "context-mode"],
      "timeoutSeconds": 30
    }
  }
}
```

Open the LLeM settings menu from the chat gear, choose MCP servers, then select a server to test its connection. In chat, ask for MCP-backed work normally; LLeM will use:
```
<list_mcp_tools/>
```

then call a discovered tool with:

```
<call_mcp_tool server="context-mode" tool="ctx_stats">{}</call_mcp_tool>
```

You usually do not need to type these tags yourself. They are the internal action format LLeM gives to the local model.
When context-mode is called through MCP, the chat Action Report shows that it ran and includes any reported context or token savings.
Edit VS Code `settings.json` and add servers under `llem.mcpServers`. Server names are the object keys.
```json
{
  "llem.mcpServers": {
    "context-mode": {
      "command": "npx",
      "args": ["-y", "context-mode"],
      "timeoutSeconds": 30
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "${WORKSPACE_ROOT:-.}"],
      "env": {
        "NODE_ENV": "production"
      },
      "cwd": "${WORKSPACE_ROOT:-.}",
      "timeoutSeconds": 30
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_PERSONAL_ACCESS_TOKEN}"
      },
      "startupTimeoutSeconds": 20,
      "toolTimeoutSeconds": 60
    }
  }
}
```

Supported fields:
| Field | Description |
|---|---|
| `type` | Optional transport type. Omit for command-based stdio; use `stdio` explicitly if desired. |
| `command` | Executable to launch, for example `npx`, `node`, `python`, or an absolute path. |
| `args` | Array of command arguments. |
| `env` | Environment variables passed to the MCP server. Supports `${VAR}` and `${VAR:-default}` expansion. |
| `cwd` | Working directory for the server process. Supports env expansion. |
| `disabled` | Set `true` to keep a server configured but unavailable. |
| `timeoutSeconds` | Default timeout for startup and tool calls. |
| `startupTimeoutSeconds` | Timeout for server startup and tool listing. |
| `toolTimeoutSeconds` | Timeout for individual tool calls. |
| `enabledTools` | Optional allow-list of tool names. |
| `disabledTools` | Optional deny-list of tool names. |
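HTTP-based servers use a slightly different shape: instead of `command`, you provide a `url` (plus optional `headers`, which LLeM forwards as request headers). The sketch below is illustrative; the server name, URL, and `MY_MCP_TOKEN` variable are placeholders, and it assumes the same `${VAR}` expansion applies to header values:

```json
{
  "llem.mcpServers": {
    "my-http-server": {
      "type": "http",
      "url": "https://example.com/mcp",
      "headers": {
        "Authorization": "Bearer ${MY_MCP_TOKEN}"
      },
      "toolTimeoutSeconds": 60
    }
  }
}
```

Keeping the token in an environment variable rather than the settings file follows the secrets guidance above.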
If you prefer project or user config files instead of VS Code settings, create one of these files:
- Workspace config: `.llem/mcp.json` in the open workspace root
- User config: `%USERPROFILE%/.llem/mcp.json` on Windows, or `~/.llem/mcp.json` on macOS/Linux
These files use the same `mcpServers` object shape as `.mcp.json`, plus an optional `contextMode` value.
```json
{
  "contextMode": "auto",
  "mcpServers": {
    "playwright": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@playwright/mcp"],
      "startupTimeoutSeconds": 20,
      "toolTimeoutSeconds": 60
    }
  }
}
```

`contextMode` controls how aggressively LLeM should treat MCP-backed context as available:
| Value | Behavior |
|---|---|
| `off` | Do not automatically use MCP-backed context. |
| `auto` | Let LLeM use MCP-backed context when the request calls for it. |
| `always` | Prefer MCP-backed context whenever it is available. |

Invalid `contextMode` values are ignored and reported as config warnings.
LLeM imports workspace MCP config by default. When the extension starts, it looks for Antigravity, VS Code, Codex, and Claude Code MCP config files and asks before enabling those imports:
```json
{
  "llem.mcpConfigSources": ["workspace"]
}
```

Import priority, from lowest to highest, is:
- Extra raw config paths from `llem.mcpConfigPaths`
- Antigravity user config
- Workspace `.vscode/mcp.json`
- Claude Code user and project config
- Codex user config
- Workspace `.codex/config.toml`
- Workspace `.mcp.json`
- User `.llem/mcp.json`
- Workspace `.llem/mcp.json`
- VS Code `llem.mcpServers`
When two sources define the same server name, the higher-priority source wins.
Project-level MCP files use the common `mcpServers` shape:
```json
{
  "mcpServers": {
    "context-mode": {
      "command": "npx",
      "args": ["-y", "context-mode"]
    }
  }
}
```

LLeM reads this from:

- `.vscode/mcp.json` in the open workspace root when the `vscode` source is enabled
- `.mcp.json` in the open workspace root
- `.llem/mcp.json` in the open workspace root
- `~/.llem/mcp.json` for user-level LLeM config
- `~/.claude.json` under the current project entry
- `~/.claude/settings.json` when it contains `mcpServers`
Codex servers are read from `$CODEX_HOME/config.toml` when `CODEX_HOME` is set, otherwise from `~/.codex/config.toml`.
```toml
[mcp_servers.playwright]
command = "npx"
args = ["-y", "@playwright/mcp"]
enabled = true
startup_timeout_sec = 20
tool_timeout_sec = 60
enabled_tools = ["browser_navigate", "browser_snapshot"]
disabled_tools = ["browser_install"]
```

Codex field mapping:
| Codex TOML field | LLeM behavior |
|---|---|
| `enabled = false` | Treated as `disabled: true`. |
| `startup_timeout_sec` | Mapped to `startupTimeoutSeconds`. |
| `tool_timeout_sec` | Mapped to `toolTimeoutSeconds`. |
| `enabled_tools` | Used as a tool allow-list. |
| `disabled_tools` | Used as a tool deny-list. |
Antigravity servers are read from `~/.gemini/antigravity/mcp_config.json` when the `antigravity` source is enabled.

If another tool exposes a raw MCP config file, add it to `llem.mcpConfigPaths`. JSON files with a top-level `mcpServers` object and Codex-style TOML files are supported.
```json
{
  "llem.mcpConfigPaths": [
    "C:/Users/you/AppData/Roaming/Antigravity/mcp.json",
    "~/shared/mcp/config.toml"
  ]
}
```

- If a server does not appear, open LLeM settings → MCP servers and check its source/status.
- If a server appears as unsupported, check that its transport is one of `stdio`, `http`, `sse`, or Streamable HTTP and that required fields like `command` or `url` are present.
- If `npx` servers fail on Windows, confirm `node`, `npm`, and `npx` are available in the VS Code process environment.
- If env expansion produces empty values, define the variable before launching VS Code/Cursor, or use `${VAR:-default}`.
- If a tool returns too much data, LLeM truncates the result before feeding it back into the model to keep chat context usable.
For bigger local models such as `gemma6:26b` or other 24B+ Gemma-family builds:

- prefer Ollama for the current optimized path,
- switch `llem.performancePreset` to `large-local-26b` if you want tighter prompt budgets immediately,
- keep `llem.performancePreset` on `auto` if you want LLeM to detect 26B-class models by name or metadata,
- raise `llem.requestTimeout` to around `600` seconds on slower or memory-constrained machines,
- pair a 26B default with a smaller fallback model if you want fast iteration for simple edits.

Current-machine guidance:

- on Apple Silicon systems around the `34 GB` class, `large-local-26b` is the recommended preset for 26B local models,
- on other machines, start with the same preset and only widen timeout or context if your hardware can comfortably handle it.
LLeM now exposes a model-sensitive prompt and generation budget setting through `llem.performancePreset`.
- `auto`: Recommended default. LLeM checks the selected model name and, when available, Ollama metadata such as `parameter_size`. If the model looks like a `24B+` local run, it automatically switches into the 26B profile.
- `balanced`: Keeps the wider default context and generation budget. This is the better fit for smaller local models when raw responsiveness is already good.
- `large-local-26b`: Uses a tighter prompt budget and smaller Ollama generation window so big local models spend less time chewing through workspace context before the first token lands.
When large-local-26b is active, LLeM intentionally becomes more selective about context:
- active editor context gets first priority,
- attached text files are budgeted per file and across the whole turn,
- workspace tree and vault index are clipped more aggressively,
- and older low-relevance chat history is pruned before the current request is allowed to grow out of control.
This is designed to improve real-world latency, not benchmark token counts in isolation. The point is to keep the answer useful while reducing the hidden prompt tax that large local models pay.
Use LLeM: Show Diagnostics when tuning a larger model. The diagnostics channel now surfaces the key numbers you need:
- selected model and resolved performance profile,
- estimated prompt size before send,
- final request size after pruning,
- history, attachment, active-editor, workspace, and vault character breakdowns,
- pruned message count and attachment trim amount,
- first-token latency,
- total stream duration,
- and token throughput.
If a 26B-class model still feels sluggish, the fastest knobs to check are:
- `llem.performancePreset`
- `llem.requestTimeout`
- total attachment size in the current turn
- whether the active file or vault index is unusually large
In practice, this makes it much easier to see whether the bottleneck is model load time, prompt size, or generation speed.
- Node.js (v18+)
- npm
- Compile: `npm run compile`
- Build VSIX: `npm run package:vsix`
- Local Test VSIX: `npm run package:vsix:local`
- Context Limits: Large file attachments might hit the context window limit of your local model.
- Large-Model Warmup: The first request to a 24B+ local model can still feel slow even after prompt trimming, especially right after loading the model into memory.
- Server Check: Make sure your local engine (Ollama/LM Studio) is actually running before you start chatting.
v3.1.1 builds on the v3.1.0 chat UX refresh and adds the missing piece: editing earlier user messages in a Gemini Web-style flow.
You can now go back to a previous user message, click Edit, revise the prompt, and continue from there.
- the old thread stays intact,
- LLeM creates a new branch from the point before that message,
- the edited prompt is resubmitted into that branch,
- and any reusable attached files from the original message can travel with the edit flow.
This keeps the conversation history safe while making prompt iteration much faster and less destructive.
With Copy, Branch, Edit, 👍, and 👎, each finished exchange can now be reused in multiple ways:
- Copy a strong answer,
- Branch an assistant response into a new direction,
- Edit a user message to retry from an earlier point,
- Like a response style you want repeated,
- Dislike a response style you want avoided later.
That makes LLeM feel much closer to modern consumer chat tools while staying inside VS Code and staying local-first.
Preference memory continues to apply across:
- normal follow-up turns,
- new chats,
- chat branches,
- and edited-message branches.
So if you teach LLeM what kind of answers you like, that preference signal survives even when you fork or revise the conversation path.
Clickable file references in chat are now more accurate:
- only editable file types can be opened from chat,
- basename-only references like `extension.ts` can resolve to a real workspace file when the match is unambiguous,
- and chat attachments preserve enough metadata to reopen the right source more reliably.
The webview renderer now handles leftover inline Markdown markers more gracefully in normal prose.
- inline bold and emphasis markers render more reliably in mixed-language text,
- bullet items such as `- **"로컬 환경에 뿌리내린(Local-first) 지능형 에이전트"**` now display with the intended emphasis,
- and the fallback logic avoids touching fenced code blocks while cleaning up visible chat output.
- added editable earlier-message branching from the webview action bar,
- preserved reusable attachment payloads in display history for edit/retry flows,
- added branch generation from the point before a selected user message,
- improved workspace filename resolution for clickable chat file references,
- added a safe inline-Markdown fallback for webview chat rendering,
- kept reply-style preference memory persistent across branch variants.
The big shift here is that LLeM is no longer just good at answering or branching. It is now better at revising. That means less copy-paste, less losing context, and much smoother iteration when you're tightening prompts or trying alternate implementation directions.
v3.1.0 is the first release that makes each completed reply feel more like a modern chat product, while still keeping the whole workflow local-first.
Once an assistant reply finishes streaming, LLeM now shows a compact action row directly under that message.
- Copy: Copies just that specific assistant response to your clipboard.
- Branch: Creates a brand-new chat branch from that response so you can explore a different direction without losing the original thread.
- 👍 Like: Marks that answer style as something the user wants more of.
- 👎 Dislike: Marks that answer style as something the user wants less of.
This interaction model is intentionally inspired by the post-reply controls you see in Gemini Web, but adapted to LLeM's local VS Code workflow.
Branching is now a first-class concept inside the chat experience.
- You can branch from any completed assistant response.
- The new branch becomes its own saved chat session.
- The original conversation remains untouched in history.
- The branch inherits the visible conversation context up to the selected reply, making it easy to explore alternate plans, implementations, or follow-up prompts.
This is especially useful when you want to:
- compare two implementation strategies,
- keep one thread focused on debugging while another explores a refactor,
- or preserve a "good state" before taking the conversation in a different direction.
Likes and dislikes are not cosmetic. They now update a persistent memory layer that survives:
- new chats,
- branched chats,
- and extension restarts.
When you give feedback on a reply, LLeM stores a compact memory of that preference and uses it to steer future responses. In practice, that means:
- replies you like help reinforce the kind of tone, structure, and answer shape you want,
- replies you dislike tell the assistant to avoid similar response patterns later unless you explicitly ask for them.
This preference memory is injected into the system context for future requests, so LLeM can adapt over time instead of acting like every conversation starts from zero.
This release also tightens the file interaction model inside chat:
- only editable file types are shown as clickable in message content,
- only editable attachments can be opened from chat,
- and dropped file attachments preserve enough metadata to open the correct source more reliably.
That keeps chat interactions cleaner and avoids misleading "clickable" affordances on files that are not actually editable in the intended way.
Under the hood, this release adds several important building blocks:
- a shared editable-file classifier used by both the webview and extension host,
- per-message feedback state in persisted chat history,
- a new response preference manager backed by extension global state,
- message-level UI actions for copy, branching, and feedback,
- and branch session generation from the currently visible conversation timeline.
LLeM has always focused on local execution, real file edits, and practical repo-aware assistance. With v3.1.0, the chat UX becomes much more iterative:
- you can fork thought paths without losing your place,
- quickly reuse or share strong replies,
- and gradually teach the assistant how you want it to respond.
Still local. Still yours. Just much more adaptable.
Sup world! v3.0.5 is officially out in the wild and it's our first public release.
- Branding on Point: We ditched the boring stuff for a fresh icon and a UI that actually looks good.
- Gemma Optimization: We tweaked the engine to hunt down Ollama's or LM Studio's default model automatically.
- Chat History 2.0: Full persistence layer implemented. Your conversations now survive VS Code restarts.
- Workspace Sync: Instant UI updates when you rename, delete, or add files to your project.
- Security Audit: Completed a deep-dive security pass on the Bridge Server, adding rate limiting and token-based auth.
- Better Vibes: Smoother logging and descriptive errors so you're never left guessing.
- Public Launch: This is it. The first time we're letting this thing out of the hangar for everyone to use.
Local-first, offline-always. Let's cook.
This release focuses on making agentic file edits visible, debuggable, and easier to trust when running local models such as Ollama Gemma-family models.
- Codex-style file change summaries in chat: When LLeM creates, edits, or deletes files, the chat now shows a compact change card with one row per file. Each row includes the action, file name, and line-level `+/-` counts so you can immediately see what changed without opening the filesystem first.
- Whole-turn change totals: Multi-file edits now include a footer such as `2 files changed +75 -20`, giving a clear overview of the total edit impact for the current agent action.
- Clickable changed files: File rows in the change summary can be clicked to open the affected file directly from the chat UI.
- Review Changes shortcut: The change summary includes a `Review changes` button that opens VS Code's Source Control view, making it faster to inspect the workspace diff after an agent run.
- Stronger edit failure visibility: If the model emits an `<edit_file>` action but none of the `<find>` blocks match the current file, LLeM now reports it as a clear failure: `Edit failed ... replacement 0/N`. This makes silent no-op edits much harder to miss.
- Immediate Action Report streaming: External action results are now posted into the live chat stream as soon as they happen. File edits, failed replacements, safety blocks, MCP activity, and terminal actions no longer wait until later continuation logic to become visible.
- Action Report preserved in the final answer: The final assistant message keeps the action report attached, so the user can scroll back later and still see exactly what LLeM tried, what succeeded, and what failed.
- Cleaner regenerate behavior: `Regenerate reply` now removes the previous assistant response from the chat UI before streaming the replacement, so regeneration feels like a true retry instead of an extra appended answer.
- Follow-up recovery guidance for local models: When an edit fails because the `<find>` text does not match, LLeM now gives the follow-up model turn a stronger system observation telling it to retry with exact current file content instead of explaining the failure away.
- Post-mortem logging for file actions: File create/edit/delete paths now write structured diagnostics for validation blocks, missing files, invalid edit bodies, zero-replacement edits, successful writes, and exceptions. These logs include trace IDs, parsed action counts, file paths, replacement metadata, and previews to help reconstruct what happened after a failed run.
- Safer testable logging outside VS Code: The logger now lazily loads the VS Code API and falls back to diagnostics-file logging during Node-based tests, so action logging can be covered without requiring an extension host.
- Regression coverage for edit metadata: Tests now verify that file action results include structured change metadata for created, edited, and deleted files.
- MCP behavior carried forward: The release keeps HTTP/SSE MCP support, approval-based MCP imports from Antigravity/VS Code/Codex/Claude Code, and on-demand `context-mode` reporting from the previous MCP work.
- Bumped the VSIX build from `3.3.38` to `3.3.39`.
- Support HTTP and SSE MCP transports while keeping context-mode as an on-demand MCP tool instead of a forced preflight.
- Packaged `release/llem-3.3.39.vsix`.
This release improves MCP transport support and keeps context-mode visible when it is actually called through MCP.
- HTTP, SSE, and Streamable HTTP MCP servers are now callable through the MCP SDK client transports, so Codex-imported servers such as Figma, Linear, and Notion no longer appear as unsupported solely because they use HTTP.
- Configured HTTP `headers` are forwarded as request headers for HTTP-based MCP servers.
- The chat Action Report shows visible `context-mode` activity when a model-driven or manual MCP call runs it. When the MCP result includes savings fields, LLeM summarizes context characters saved, tokens saved, before/after token counts, and compression ratio.
- `context-mode` result parsing accepts both normal JSON-like MCP payloads and text content that contains embedded JSON, so different server response shapes can still produce useful savings summaries.
- Regression coverage was added for context-mode savings extraction, per-server MCP tool listing, supported HTTP/SSE config resolution, and HTTP MCP tool calls.
- LLeM now looks for existing MCP configs from Antigravity, VS Code, Codex, and Claude Code when the extension starts, then asks before enabling those external sources. This keeps imports explicit while still making setup easy.
- Antigravity import now reads the Gemini Antigravity path directly: `~/.gemini/antigravity/mcp_config.json`.
- VS Code workspace MCP import was added via `.vscode/mcp.json`, alongside the existing workspace `.mcp.json`, Codex TOML, Claude Code JSON, and LLeM-specific `.llem/mcp.json` sources.
- The default `llem.mcpConfigSources` value is now `["workspace"]`. External sources are added only after user approval, while LLeM's own default `context-mode` server remains available through `llem.mcpServers`.
- Users can choose `Import`, `Not now`, or `Never ask` when external MCP configs are found. `Never ask` is remembered in extension global state so the same source is not prompted repeatedly.
- MCP settings docs now describe the import priority, the Antigravity path, the VS Code workspace MCP path, and the approval-based import flow.
- Tests were added for Antigravity import, VS Code MCP import, external source discovery, skipped configured/dismissed imports, and package schema defaults.
- Added the first full MCP tool path in LLeM: MCP server discovery, stdio server startup, tool listing, tool calling, timeout handling, and allow/deny tool filtering.
- Added `context-mode` as the built-in default MCP server through `npx -y context-mode`, making context-aware MCP support available without manual server setup.
- Added MCP settings for enabling/disabling MCP, defining LLeM-owned servers, importing known config sources, and adding extra JSON/TOML config paths.
- Added MCP UI support in the settings menu so imported servers can be inspected and health-checked from the chat gear.
- Added README guidance for MCP server setup, source priority, supported transports, JSON/TOML config shapes, and in-chat MCP action tags.
- Bumped the VSIX build from `3.3.34` to `3.3.35`.
- Bumped version and upgraded axios before VSIX build.
- Packaged `release/llem-3.3.35.vsix`.

- Bumped the VSIX build from `3.3.33` to `3.3.34`.
- Fixed image lightbox close behavior so the top-right close button and backdrop dismiss reliably. Reduced action-history bloat by keeping only the most recent file context per turn and trimming file/web observation payloads.
- Packaged `release/llem-3.3.34.vsix`.

- Bumped the VSIX build from `3.3.32` to `3.3.33`.
- Reduced action-history bloat by keeping only the most recent file context per turn and trimming file/web observation payloads. Improved live output masking, offline vision detection, image lightbox preview, and request startup logging.
- Packaged `release/llem-3.3.33.vsix`.

- Bumped the VSIX build from `3.3.31` to `3.3.32`.
- Masked create_file and edit_file code from live output and now show progress-only streaming states. Improved offline vision-model detection using local Ollama manifests and added vision decision logging.
- Packaged `release/llem-3.3.32.vsix`.

- Bumped the VSIX build from `3.3.30` to `3.3.31`.
- Improved offline vision-model detection using local Ollama manifests. Fixed capability checks to use the active engine endpoint and added vision decision logging.
- Packaged `release/llem-3.3.31.vsix`.

- Bumped the VSIX build from `3.3.29` to `3.3.30`.
- Implemented Intelligent Repetition Guard with tiered backoff (3s, 10s, 30s), non-blocking queue scheduling, and automated retry orchestration with UI cooldown feedback.
- Packaged `release/llem-3.3.30.vsix`.

- Bumped the VSIX build from `3.3.28` to `3.3.29`.
- Implemented File System Access Transparency with user-approved out-of-workspace operations and high-fidelity UI feedback.
- Packaged `release/llem-3.3.29.vsix`.

- Bumped the VSIX build from `3.3.27` to `3.3.28`.
- Action transparency and loop prevention improvements.
- Packaged `release/llem-3.3.28.vsix`.

- Bumped the VSIX build from `3.3.26` to `3.3.27`.
- Added live stream metadata (duration, chunks, chars) to action progress UI.
- Packaged `release/llem-3.3.27.vsix`.

- Bumped the VSIX build from `3.3.23` to `3.3.24`.
- Implemented AI self-correction loop and Codex-style action progress visualization.
- Packaged `release/llem-3.3.24.vsix`.
- Bumped the VSIX build from `3.3.21` to `3.3.22`.
- B-1 fix: Repeated/watchdog-aborted responses are no longer pushed to the chat history. Previously, the aborted assistant message would linger in history and seed the next turn with a contaminated context, causing cascading repetition loops. Now the pipeline returns immediately without writing the bad response to history.
- B-2 fix: Consecutive `assistant → assistant` or `user → user` message pushes during agentic action loops are now de-duplicated. If a continuation user message arrives when the last history entry is already a `user` entry, the content is merged rather than creating a second entry.
- B-3 fix: Images are no longer forwarded to text-only models (gemma, llama, mistral, etc.). The model name is inspected for known vision indicators (`llava`, `vision`, `:vl`, `bakllava`, `moondream`, etc.) and a clear in-chat notice is shown when an image is skipped.
- B-4 fix: `RequestRetryGuard` fingerprints now use a normalized, punctuation-stripped 300-character prompt core instead of the raw prompt string. Rephrased retries of the same request are blocked even when the exact wording changes.
- FileStateGuard: New `src/fileStateGuard.ts` computes SHA-256 hashes before and after every `edit_file` action. A `no-effect` warning is surfaced when the file is unchanged (typically a `<find>` mismatch). After 3 consecutive no-effect edits on the same file, `loop-detected` is returned and further edits on that path are blocked via `ActionLoopGuard`.
- Packaged `release/llem-3.3.22.vsix`.
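The B-4 normalization above (lowercase, strip punctuation, collapse whitespace, keep a 300-character core) could look something like this. A hypothetical sketch, not the actual `RequestRetryGuard` code; the function name is illustrative.

```typescript
// Illustrative prompt-core normalization for retry fingerprinting:
// two rephrasings of the same request should produce the same core.
function promptCore(prompt: string): string {
  return prompt
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // strip punctuation and symbols
    .replace(/\s+/g, " ")             // collapse runs of whitespace
    .trim()
    .slice(0, 300);                   // 300-character fingerprint core
}
```

Hashing this core (rather than the raw prompt) is what lets a guard block a retry even when the user reworded the request.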
- Bumped the VSIX build from `3.3.20` to `3.3.21`.
- Live stream output now shows raw AI text without any HTML/Markdown parsing during generation: `<edit_file>`, `<find>`, and `<replace>` action tags are visible as-is while streaming.
- The final reply (after the stream completes) continues to render as full Markdown with code highlighting, file badges, and action summaries.
- Removed the `sanitizeAssistantDisplayText()` call from the live `renderStreamNow()` path so the raw model output is never silently stripped mid-stream.
- Packaged `release/llem-3.3.21.vsix`.
- Bumped the VSIX build from `3.3.19` to `3.3.20`.
- Hardened assistant output sanitization to prevent leaked action tags and scratchpad text in streamed replies.
- Packaged `release/llem-3.3.20.vsix`.
- Bumped the VSIX build from `3.3.18` to `3.3.19`.
- Fixed RepetitionWatchdog false positives that could truncate edit-file streams during repeated action-tag/code sequences. Added regression coverage for repeated closing-tag action streams.
- Packaged `release/llem-3.3.19.vsix`.
- Bumped the VSIX build from `3.3.17` to `3.3.18`.
- Repackaged the current workspace state through the formal VSIX release flow.
- Packaged `release/llem-3.3.18.vsix`.
- Bumped the VSIX build from `3.3.16` to `3.3.17`.
- Fixed RepetitionWatchdog false positives on markdown structure tokens so tables, fences, headers, list markers, blockquotes, and task markers no longer abort valid replies.
- Added regression tests for markdown-safe watchdog behavior.
- Packaged `release/llem-3.3.17.vsix`.
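One way to make a repetition watchdog markdown-safe is to exempt pure structural tokens from repetition counting, since table pipes, fences, and list markers legitimately repeat many times in a valid reply. A hypothetical sketch; the pattern set is assembled from the token classes the entry lists, not taken from the actual watchdog.

```typescript
// Illustrative filter: structural markdown tokens that repeat by design
// (table rules, code fences, headers, list markers, blockquotes, task
// markers) should not count toward repetition detection.
const MARKDOWN_STRUCTURE = /^(\|[\s:-]*\|?|```\w*|#{1,6}|[-*+]|\d+\.|>|- \[[ x]\])$/;

function countsTowardRepetition(token: string): boolean {
  return !MARKDOWN_STRUCTURE.test(token.trim());
}
```

The watchdog then only accumulates strikes for tokens where `countsTowardRepetition` is true, which is the behavior the regression tests above would pin down.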
- Bumped the VSIX build from `3.3.16` to `3.3.16`.
- Added structured repetition abort handling, retry and action loop guards, safer file mutation validation, restored clickable editable files, and added default-browser opening for chat URL links.
- Packaged `release/llem-3.3.16.vsix`.
- Bumped the extension version from `3.3.15` to `3.3.16`.
- Fixed Korean IME Enter handling so composing Hangul no longer sends a duplicated trailing message.
- Added composition-aware Enter submission logic with regression coverage for `isComposing` and IME confirm keycode `229`.
- Hardened stream loop handling so repetition detection is promoted into structured pipeline state instead of being treated like a normal completion.
- Stopped follow-up execution after repetition aborts, including watchdog-triggered stops and turn-to-turn repeated continuation loops.
- Added request fingerprinting and retry fencing so the same request cannot immediately restart after a repetition stop.
- Added action loop guarding so repeated `create_file` and `edit_file` patterns are blocked before they spin in place.
- Added file mutation guarding so the same file cannot be mutated twice at the same time during model-driven actions.
- Rejected incomplete `<find>`/`<replace>` edit bodies before disk write, preventing truncated edit actions from corrupting files.
- Rejected obviously truncated `create_file` output, such as unbalanced fenced code blocks, before writing files.
- Generalized plan-first enforcement for implementation requests, not just special design-guideline file names.
- Added implementation planning mode so code-generation requests are guided toward a compact file split and smaller Next.js/TypeScript steps first.
- Added a stronger post-processing guard that blocks action-tag execution if the model disobeys the initial plan-only response.
- Restored clickable editable-file behavior in chat by improving local file link validation, workspace-path resolution, and message rerendering after workspace file sync.
- Added default-browser opening for URL links in chat by routing external links through the extension host with `vscode.env.openExternal(...)`.
- Expanded tests for stream outcome handling, retry guards, action loop guards, file mutation guards, design planning mode, editable file resolution, external link routing, and file-safety edge cases.
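The "unbalanced fenced code blocks" truncation check above can be done with a simple parity test: a well-formed markdown body opens and closes every fence, so an odd number of fence markers means the output was cut off mid-block. A hypothetical sketch, assuming fences always start at the beginning of a line; the function name is illustrative.

```typescript
// Illustrative truncation check for create_file output: an odd number
// of line-leading ``` markers means a code block was never closed,
// which usually indicates the model's output was cut off.
function hasBalancedFences(body: string): boolean {
  const fences = body.match(/^```/gm) ?? [];
  return fences.length % 2 === 0;
}
```

Rejecting such a body before the disk write is cheaper and safer than writing a half-file and relying on a later repair pass.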
- Bumped the VSIX build from `3.3.14` to `3.3.15`.
- Fixed Korean IME Enter handling to prevent duplicate trailing messages.
- Added regression tests for composition-safe prompt submission.
- Packaged `release/llem-3.3.15.vsix`.
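The composition-aware Enter logic mentioned in this entry and the one above hinges on two signals: `KeyboardEvent.isComposing`, and the legacy `keyCode` value `229` that some hosts report while an IME is confirming a composition. A minimal sketch of the decision, assuming a plain keydown handler; the function name and event shape are illustrative.

```typescript
// Illustrative composition-aware Enter guard: while an IME is composing
// Hangul, Enter confirms the composition rather than submitting, and
// that confirm keypress may surface as keyCode 229.
interface EnterEvent {
  key: string;
  isComposing: boolean;
  keyCode: number;
}

function shouldSubmitOnEnter(e: EnterEvent): boolean {
  if (e.key !== "Enter") return false;
  if (e.isComposing || e.keyCode === 229) return false; // IME confirm, not a send
  return true;
}
```

Without the `229` fallback, a real DOM keydown for the IME confirm can slip past an `isComposing`-only check on some platforms, which is exactly the duplicated-trailing-message bug described here.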
- Bumped the VSIX build from `3.3.14` to `3.3.14`.
- Added queued request pause/resume and reordering.
- Added direct editing for queued items.
- Expanded queue tests and stabilized the package test suite.
- Packaged `release/llem-3.3.14.vsix`.
- Bumped the VSIX build from `3.3.11` to `3.3.12`.
- Fix stop button UI and edit banner visibility.
- Packaged `release/llem-3.3.12.vsix`.
- Bumped the VSIX build from `3.3.10` to `3.3.11`.
- Fix main-view layout causing input to overflow.
- Packaged `release/llem-3.3.11.vsix`.
- Bumped the VSIX build from `3.3.9` to `3.3.10`.
- Fix terminal executing logged messages.
- Packaged `release/llem-3.3.10.vsix`.
- Bumped the VSIX build from `3.3.8` to `3.3.9`.
- Fix immediate deletion of history items in the UI.
- Packaged `release/llem-3.3.9.vsix`.
- Bumped the VSIX build from `3.3.7` to `3.3.8`.
- Fix edit banner visibility on initial chat load.
- Packaged `release/llem-3.3.8.vsix`.
- Bumped the VSIX build from `3.3.7` to `3.3.7`.
- Fix edit banner visibility on initial chat load.
- Packaged `release/llem-3.3.7.vsix`.
- Bumped the VSIX build from `3.3.6` to `3.3.7`.
- Fix terminal rendering and layout stability, and improve hardware summary quality.
- Packaged `release/llem-3.3.7.vsix`.
- Bumped the VSIX build from `3.3.5` to `3.3.6`.
- Implemented a sequence-aware RepetitionWatchdog and improved action parsing to prevent infinite loops.
- Packaged `release/llem-3.3.6.vsix`.
- Bumped the VSIX build from `3.2.8` to `3.2.9`.
- Fixed model output streaming issues with buffering and enhanced token extraction for reasoning fields.
- Packaged `release/llem-3.2.9.vsix`.
- Bumped the VSIX build from `3.2.6` to `3.2.7`.
- Fixed AI response truncation, improved action tag stripping with smart-quote support, and tuned model performance profiles for 26B models.
- Packaged `release/llem-3.2.7.vsix`.
- Bumped the VSIX build from `3.2.4` to `3.2.5`.
- Enabled unlimited response length by setting predict token limits to `-1`. Added handling for unlimited output in both the Ollama and LM Studio engines.
- Packaged `release/llem-3.2.5.vsix`.
- Bumped the VSIX build from `3.2.2` to `3.2.3`.
- Increased token prediction limits to 4096+ to prevent response truncation. Fixed the LM Studio `max_tokens` mapping.
- Packaged `release/llem-3.2.3.vsix`.
- Bumped the VSIX build from `3.2.0` to `3.2.1`.
- Implemented a repetition penalty for large models to prevent hallucination loops, fixed model selection persistence in `settings.json`, and added overwrite protection for user settings.
- Packaged `release/llem-3.2.1.vsix`.
- Bumped the VSIX build from `3.1.8` to `3.1.9`.
- Reverted to standard `settings.json` persistence and fixed the model selection overwrite issue.
- Packaged `release/llem-3.1.9.vsix`.
- Bumped the VSIX build from `3.1.7` to `3.1.8`.
- Added Codex-style message actions for user and assistant replies.
- Restored copy and edit flows for existing user messages.
- Added an edit-in-new-branch composer state.
- Packaged `release/llem-3.1.8.vsix`.
- Bumped the VSIX build from `3.1.6` to `3.1.7`.
- Made the model dropdown persist the real active default model and pass runtime engine/model metadata into each request.
- Removed the earlier-message editing banner and edit entrypoint from the chat UI so the message composer stays in normal send mode.
- Packaged `release/llem-3.1.7.vsix`.
- Bumped the VSIX build from `3.1.5` to `3.1.6`.
- Added file-based diagnostics for stream debugging with per-request raw chunk capture and parsed token traces.
- Logged final assistant text cleanup so empty replies can be traced from transport through final rendering.
- Packaged `release/llem-3.1.6.vsix`.
- Bumped the VSIX build from `3.1.4` to `3.1.5`.
- Improved stream parsing for object-shaped output chunks.
- Fixed the empty-reply state when the model returned text in newer OpenAI-compatible stream formats.
- Packaged `release/llem-3.1.5.vsix`.
- Bumped the VSIX build from `3.1.3` to `3.1.4`.
- Fixed recurring empty replies by broadening stream parsing for additional LM Studio and Ollama response shapes.
- Added raw stream preview logging when parsed output ends up empty so future payload mismatches are diagnosable instantly.
- Packaged `release/llem-3.1.4.vsix`.
- Bumped the VSIX build from `3.1.2` to `3.1.3`.
- Added model-aware performance presets for 26B-class local Ollama runs.
- Added prompt budgeting and richer diagnostics for large local Gemma-family models.
- Expanded the README with detailed performance profile guidance, 26B tuning notes, and diagnostics tips.
- Packaged `release/llem-3.1.3.vsix`.
- Bumped the VSIX build from `3.1.1` to `3.1.2`.
- Fixed empty-reply turns by hardening stream parsing for Ollama and LM Studio.
- Flushed trailing stream buffers so the final token is not lost when a stream ends without a newline.
- Saved assistant replies consistently into chat history so follow-up turns keep the right conversation context.
- Updated the chat UI to distinguish truly empty replies from successfully completed output.
- Packaged `release/llem-3.1.2.vsix`.
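The "broadened stream parsing" theme running through the 3.1.x entries boils down to tolerating several payload shapes for the same token. A hypothetical sketch of that idea, assuming the common OpenAI-compatible and Ollama response fields; this is not LLeM's actual extractor.

```typescript
// Illustrative tolerant token extraction: try the newer
// OpenAI-compatible delta shape first, then fall back to older
// completion-style and Ollama fields, so a format change surfaces
// as text instead of an empty reply.
function extractToken(chunk: unknown): string {
  const c = chunk as any;
  return (
    c?.choices?.[0]?.delta?.content ?? // OpenAI-compatible chat streaming
    c?.choices?.[0]?.text ??           // older completion-style payloads
    c?.message?.content ??             // Ollama /api/chat responses
    c?.response ??                     // Ollama /api/generate responses
    ""                                 // unknown shape: log it, don't crash
  );
}
```

When every fallback misses, logging the raw chunk (as the 3.1.4 entry describes) is what makes the next unknown payload shape diagnosable instantly.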