v0.4.1: VLM UI + Persistence + HTTP (#35)
Merged
- StoredMessage.images: [ImageAttachment] mirrors ChatMessage.images added in PR #33. Custom decoder defaults to empty when key absent, so pre-v0.4.1 conversation JSON loads unchanged.
- save(_:) internalises image URLs: any attachment outside the conversation's own images dir gets copied to <directory>/<conv-uuid>/images/<image-uuid>.<ext>, then the URL is rewritten to point there. Best-effort: copy failure logs to stderr and falls through with the original URL preserved.
- delete(id:) tears down both the JSON sidecar and the per-conversation directory (recursive remove). Pre-v0.4.1 conversations with no per-conversation directory no-op cleanly.
- 4 new tests in ConversationStoreImagesTests (.serialized for tmpdir safety): external-image copy, idempotent internal-URL preservation, delete-tears-down-conv-dir, legacy-JSON-decode.

115/115 Core green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
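The decoder default and the best-effort copy described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the field names on `StoredMessage` and `ImageAttachment`, and the free function `internalise(_:imagesDir:)`, are assumptions made for the example.

```swift
import Foundation

// Hypothetical stand-ins for the PR's types; the real field names may differ.
struct ImageAttachment: Codable, Equatable {
    var url: URL
}

struct StoredMessage: Codable {
    var role: String
    var content: String
    var images: [ImageAttachment]

    enum CodingKeys: String, CodingKey { case role, content, images }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        role = try c.decode(String.self, forKey: .role)
        content = try c.decode(String.self, forKey: .content)
        // A missing "images" key (legacy JSON) becomes an empty list instead of an error.
        images = try c.decodeIfPresent([ImageAttachment].self, forKey: .images) ?? []
    }
}

/// Copies an attachment into `imagesDir` unless it already lives there.
/// Best-effort: on failure it logs to stderr and returns the attachment unchanged.
func internalise(_ attachment: ImageAttachment, imagesDir: URL) -> ImageAttachment {
    let dirPath = imagesDir.standardizedFileURL.path
    // Already internal: leave the URL untouched so repeated saves are idempotent.
    if attachment.url.standardizedFileURL.path.hasPrefix(dirPath + "/") {
        return attachment
    }
    var dest = imagesDir.appendingPathComponent(UUID().uuidString)
    let ext = attachment.url.pathExtension
    if !ext.isEmpty { dest = dest.appendingPathExtension(ext) }
    do {
        try FileManager.default.createDirectory(at: imagesDir, withIntermediateDirectories: true)
        try FileManager.default.copyItem(at: attachment.url, to: dest)
        return ImageAttachment(url: dest)
    } catch {
        FileHandle.standardError.write(Data("image copy failed: \(error)\n".utf8))
        return attachment
    }
}
```

Decoding `{"role":"user","content":"hi"}` with this sketch yields a message whose `images` is empty, which is the behaviour the legacy-JSON-decode test is described as covering.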
- UIChatMessage gains images: [ImageAttachment] mirroring StoredMessage + Core ChatMessage. Hydrated from stored messages on conversation reload; stripped to images: [] on send when the loaded model isn't a VLM.
- ChatViewModel.attachedImages staging bag; canAttachImages / attachImage(at:) / removeAttachedImage(at:) / clearAttachedImages() helpers wired to the input view and to the model-modality gate (canAttachImages is true only when coordinator.currentModel.format == .mlxVLM). send() picks the bag up + clears it. generate()'s ChatMessage map now passes images through to Core / engine.
- ChatInputView gets a horizontal thumbnail strip above the text field, a paperclip button driving SwiftUI .fileImporter (image UTTypes only: png/jpeg/webp/gif/heic/bmp), and an enabled-state gate with explanatory tooltip when the loaded model is text-only. Send button now also enables when the user has staged images but no text (image-only ask is legitimate on a VLM).
- ChatMessageView renders an inline LazyVGrid of 96pt thumbnails above the bubble for any message that has attachments. Click a thumbnail to open the file in Preview via NSWorkspace.shared.open.
- AsyncThumbnailImage helper (NSImage-backed) lives next to ChatInputView and is reused by ChatMessageView.

Local Xcode App Build green; 115/115 Core tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
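The modality gate and the send-enable rule above could look something like the sketch below. It is UI-free and purely illustrative: `ComposerState`, `canSend`, `takeImagesForSend()` and the `.mlxLLM` case are hypothetical names, and requiring a VLM for an image-only send is an assumption drawn from the "legitimate on a VLM" remark.

```swift
import Foundation

// Only .mlxVLM appears in the PR text; .mlxLLM is a placeholder for "any text-only format".
enum ModelFormat { case mlxLLM, mlxVLM }

// Hypothetical, UI-free model of the staging bag and its gates.
struct ComposerState {
    var text = ""
    var attachedImages: [URL] = []
    var modelFormat: ModelFormat

    // Images may only be attached while a vision-language model is loaded.
    var canAttachImages: Bool { modelFormat == .mlxVLM }

    // Send enables for non-blank text, or for staged images on a VLM (image-only ask).
    var canSend: Bool {
        let hasText = !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
        return hasText || (canAttachImages && !attachedImages.isEmpty)
    }

    // Hands the staged images to the outgoing message and clears the bag.
    // On a text-only model the images are stripped to [].
    mutating func takeImagesForSend() -> [URL] {
        let images = canAttachImages ? attachedImages : []
        attachedImages.removeAll()
        return images
    }
}
```

Keeping the gate as a computed property means the paperclip button, the tooltip, and the send path can all read one source of truth instead of re-checking the model format separately.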
ChatCompletionRequest.Message.content now decodes either:
- a plain string (every existing client)
- or an OpenAI multimodal array of {type, text|image_url} parts
Implementation lives in a new MultimodalContent enum that tries
String first and falls through to [Part] — so legacy callers keep
working unchanged.
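
A minimal sketch of that String-first decoding strategy, assuming a shape for `Part` based on the OpenAI request format; the `Message` wrapper, the `imageURLs` accessor, and joining text parts with no separator are assumptions made for illustration, not the PR's actual definitions.

```swift
import Foundation

enum MultimodalContent: Decodable {
    case text(String)
    case parts([Part])

    struct ImageURL: Decodable { let url: String }

    struct Part: Decodable {
        let type: String
        let text: String?
        let imageURL: ImageURL?
        enum CodingKeys: String, CodingKey { case type, text, imageURL = "image_url" }
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Try the legacy plain-string form first, then fall through to the parts array.
        if let plain = try? container.decode(String.self) {
            self = .text(plain)
        } else {
            self = .parts(try container.decode([Part].self))
        }
    }

    // Concatenated text parts (the separator, here none, is an assumption).
    var text: String {
        switch self {
        case .text(let s): return s
        case .parts(let parts):
            return parts.filter { $0.type == "text" }.compactMap { $0.text }.joined()
        }
    }

    // Raw image_url strings, in request order; nothing is fetched or decoded here.
    var imageURLs: [String] {
        guard case .parts(let parts) = self else { return [] }
        return parts.filter { $0.type == "image_url" }.compactMap { $0.imageURL?.url }
    }
}

// Hypothetical wrapper standing in for ChatCompletionRequest.Message.
struct Message: Decodable {
    let role: String
    let content: MultimodalContent
}
```

Because the string attempt uses `try?`, a request that is neither a string nor a valid parts array still fails with the error from the array decode, which tends to be the more informative of the two.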
handleChatCompletions extracts text via .content.text (concatenated
text parts), images via .content.extractImages():
- data:<mime>;base64,<bytes> URLs decode to a tmpfile-backed
ImageAttachment (jpeg/png/webp/gif/heic/bmp). Caps: 4 images per
message, 10 MB per image. Oversized / unknown-MIME parts silently
drop.
- http(s):// and file:// are not fetched (defence-in-depth, even
though the server is localhost-bound).
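
The two rules above can be sketched as a small, self-contained decoder. The function names `decodeDataURL` and `extractImageFiles`, the MIME-to-extension table, reading "10 MB" as 10 * 1024 * 1024 bytes, and dropping (rather than rejecting) images past the fourth are all assumptions for illustration.

```swift
import Foundation

// Caps quoted above; the exact byte interpretation of "10 MB" is an assumption.
let maxImagesPerMessage = 4
let maxImageBytes = 10 * 1024 * 1024

// Accepted MIME types mapped to file extensions (illustrative table).
let imageExtensionsByMIME: [String: String] = [
    "image/jpeg": "jpg", "image/png": "png", "image/webp": "webp",
    "image/gif": "gif", "image/heic": "heic", "image/bmp": "bmp",
]

/// Parses `data:<mime>;base64,<bytes>`. Returns nil for anything else,
/// including http(s):// and file:// URLs, which are never fetched.
func decodeDataURL(_ url: String) -> (ext: String, bytes: Data)? {
    guard url.hasPrefix("data:"), let comma = url.firstIndex(of: ",") else { return nil }
    let header = url[url.index(url.startIndex, offsetBy: 5)..<comma]  // e.g. "image/png;base64"
    let fields = header.split(separator: ";").map(String.init)
    guard fields.count == 2, fields[1] == "base64",
          let ext = imageExtensionsByMIME[fields[0].lowercased()],
          let bytes = Data(base64Encoded: String(url[url.index(after: comma)...])),
          bytes.count <= maxImageBytes
    else { return nil }
    return (ext: ext, bytes: bytes)
}

/// Writes each acceptable data URL to a temp file, silently dropping the rest.
func extractImageFiles(from urls: [String],
                       into dir: URL = FileManager.default.temporaryDirectory) -> [URL] {
    var files: [URL] = []
    for url in urls where files.count < maxImagesPerMessage {
        guard let image = decodeDataURL(url) else { continue }  // unknown MIME, oversized, or not data:
        let file = dir.appendingPathComponent(UUID().uuidString)
            .appendingPathExtension(image.ext)
        guard (try? image.bytes.write(to: file)) != nil else { continue }
        files.append(file)
    }
    return files
}
```

Rejecting by scheme before any I/O keeps the handler from ever touching the network or the local filesystem on a caller's behalf, which matches the defence-in-depth intent stated above.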
Decoded ImageAttachments flow through ChatMessage.images → engine
(VLM model receives them; LLM model logs + drops, per PR #34).
Ollama /api/chat / /api/generate stays text-only: Ollama carries images in a
separate top-level field, and supporting it is left as a follow-up.
115/115 Core tests still green; existing chatCompletionsNonStreaming
proves the string-form fallback path still works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Third and final v0.4.1 PR — lights up the user-facing surfaces for vision-language models. Closes the v0.4.1 rollout begun in #33 (Foundation) and #34 (Engine).
Plan: `docs/superpowers/plans/2026-05-10-v0.4.1-vlm.md`.
What lands
Chat input — image picker + thumbnail strip
Chat bubbles — inline thumbnails
Conversation persistence
OpenAI multimodal HTTP
```json
{"role":"user","content":[
{"type":"text","text":"What's this?"},
{"type":"image_url","image_url":{"url":"data:image/png;base64,…"}}
]}
```
Test plan
Stack
Builds on `main` after:
After this PR merges, v0.4.1 is feature-complete and a tag is reasonable.
🤖 Generated with Claude Code