Local server for dagger cli #5
Merged
…ct ones that won't fit in GPU memory. What to do next: why is it hanging??
lukemarsden added a commit that referenced this pull request on Oct 8, 2025

**Issue #1-3: WolfLobbyID Handling**
- Add WolfLobbyID to SessionMetadata (was missing)
- Save WolfLobbyID when creating external agent session
- Fix token response to return lobby ID instead of PIN

**Issue #5: Moonlight Credentials**
- Add api.credentials = 'helix' in MoonlightStreamViewer
- Matches moonlight-web-config/config.json setting

**Documentation**:
- docs/STREAMING_ISSUES_FOUND.md - Complete review findings
- 12 issues documented (3 critical fixed, 2 need action, 7 minor/future)

Remaining: Wolf host pairing needed before streaming works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
lukemarsden added a commit that referenced this pull request on Oct 16, 2025

**Issue #1-3: WolfLobbyID Handling**
- Add WolfLobbyID to SessionMetadata (was missing)
- Save WolfLobbyID when creating external agent session
- Fix token response to return lobby ID instead of PIN

**Issue #5: Moonlight Credentials**
- Add api.credentials = 'helix' in MoonlightStreamViewer
- Matches moonlight-web-config/config.json setting

**Documentation**:
- docs/STREAMING_ISSUES_FOUND.md - Complete review findings
- 12 issues documented (3 critical fixed, 2 need action, 7 minor/future)

Remaining: Wolf host pairing needed before streaming works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
lukemarsden added a commit that referenced this pull request on Nov 14, 2025

PROVEN FACTS (from core dump + source analysis):

Thread Flow:
1. Thread 99 = HTTPS server (wolf.cpp:187, port 47984)
2. Processing /cancel endpoint (endpoints::https::cancel)
3. Fires StopStreamEvent SYNCHRONOUSLY (event_bus.hpp:171)
4. Handler calls gst_element_send_event FROM HTTPS THREAD
5. GStreamer recursively traverses pipeline (frames #7→#5)
6. Blocks on mutex 0x70537c0062b0 in libgstbase-1.0.so.0
7. Thread 40 (audio pipeline owner) is HEALTHY in ppoll
8. Only Thread 99 waiting on this mutex - no contention

GStreamer Analysis:
- gst_element_send_event IS thread-safe (uses recursive STATE_LOCK)
- Documented as "MT safe" - can be called from any thread
- But empirically CAUSES DEADLOCK when called from HTTPS thread
- GStreamer has both recursive (STATE/PAD) and NON-recursive (live_lock) mutexes

The Mystery:
- WHO holds mutex 0x70537c0062b0? NOT Thread 40, not any visible thread
- Options: abandoned by crashed thread, corrupted, or race condition
- Cannot prove exact mechanism without debugging symbols

CONFIRMED FIXES:
1. HTTPS connection leak (100% certain) - add close() in error handler
2. Replace gst_element_send_event with g_main_loop_quit (80% confidence)
   - Eliminates cross-thread pipeline calls
   - g_main_loop_quit IS thread-safe (documented)
   - Even though gst_element_send_event claims to be safe, empirically fails

Gaps in Evidence:
- No debug symbols for libgstbase (can't see frame #4 function)
- Core dump partially corrupted
- Can't identify mutex owner
- Need symbols + reproduction to prove exact mechanism
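The idea behind fix 2 (the request thread only signals the loop owner to quit, and never calls into the pipeline itself) can be illustrated with a small Go analogy. This is a sketch under stated assumptions, not Wolf's C++ code and not the GStreamer or GLib API: the `pipeline` type, `run`, `requestStop`, and `stopAndWait` are hypothetical names, with a closed channel standing in for `g_main_loop_quit`.

```go
package main

import (
	"fmt"
	"sync"
)

// pipeline is a hypothetical stand-in for a media pipeline whose state is
// touched only by the goroutine that runs run().
type pipeline struct {
	quit     chan struct{} // closed to ask the owner loop to exit
	quitOnce sync.Once     // makes requestStop idempotent
	done     chan struct{} // closed by the owner once teardown finished
	stopped  bool          // written only by the owner goroutine
}

func newPipeline() *pipeline {
	return &pipeline{quit: make(chan struct{}), done: make(chan struct{})}
}

// run is the owner loop. It blocks until a quit is requested, then tears
// the pipeline down on its own goroutine, so no other goroutine ever has
// to take pipeline-internal locks.
func (p *pipeline) run() {
	defer close(p.done)
	<-p.quit
	p.stopped = true
}

// requestStop may be called from any goroutine (for example a request
// handler), any number of times. It never blocks and never touches
// pipeline state; it only signals the owner.
func (p *pipeline) requestStop() {
	p.quitOnce.Do(func() { close(p.quit) })
}

// stopAndWait signals the owner twice (to show idempotence) and waits for
// the owner to finish its teardown.
func stopAndWait(p *pipeline) bool {
	p.requestStop()
	p.requestStop()
	<-p.done
	return p.stopped
}

func main() {
	p := newPipeline()
	go p.run()
	fmt.Println("stopped:", stopAndWait(p))
}
```

The design point is that the caller's only shared object is the signal itself, so a stuck or unknown lock inside the pipeline cannot block the request handler.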
lukemarsden added a commit that referenced this pull request on Mar 18, 2026

Spec-Ref: helix-specs@792cfa369:001588_read-helixs-design2026
lukemarsden added a commit that referenced this pull request on Mar 18, 2026

Spec-Ref: helix-specs@6c5123c58:001588_read-helixs-design2026
lukemarsden added a commit that referenced this pull request on Mar 18, 2026

Issue #1 (stuck "Starting Desktop"):
- Add defer in StartDesktop to clear external_agent_status on any error
- Give waitForDesktopBridge its own 90s context decoupled from dockerCtx

Issue #4 (status not cleared on stop):
- StopDesktop unconditionally clears external_agent_status and status_message

Issue #5 (no restart button in Starting state):
- Frontend: show Stop button in "starting" state in both screenshot and stream modes
- Show "may have failed to start" message after 2-minute timeout

Issue #10a (duplicate sessions per spectask):
- Re-read task from DB before CreateSession; skip if PlanningSessionID already set

Issue #10b (scanner targets wrong sessions):
- processPendingPromptsForIdleSessions now filters to canonical planning_session_id only

Issue #2 (duplicate message sends):
- Add ClaimPromptForSending() atomic store method (UPDATE WHERE status IN pending/failed)
- Both interrupt and any-pending delivery paths use claim before send

Issue #7 (promotion race gives empty zvol):
- resolveDockerDataDir: acquire read lock before fresh zvol creation; re-check after

Issue #3: Already handled by existing open_thread on agent_ready reconnect

Issue #6: Fixed in merged PR #1947 (RecoverStaleBuilds 60s retry)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Spec-Ref: helix-specs@04b515c3c:001588_read-helixs-design2026
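The claim-before-send pattern described for Issue #2 can be sketched as follows. This is a minimal in-memory illustration, not the Helix store: a mutex-guarded map stands in for the conditional `UPDATE ... WHERE status IN (pending, failed)`, and the `promptStore` type, the `deliver` helper, and the `"sending"` status value are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// promptStore is a hypothetical in-memory stand-in for the database table.
type promptStore struct {
	mu     sync.Mutex
	status map[string]string // prompt ID -> status
}

// claimPromptForSending flips the status to "sending" only if it is
// currently "pending" or "failed", and reports whether this caller won
// the claim. The check and the write happen under one lock, mirroring a
// single conditional UPDATE whose rows-affected count is 0 or 1.
func (s *promptStore) claimPromptForSending(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch s.status[id] {
	case "pending", "failed":
		s.status[id] = "sending"
		return true
	}
	return false
}

// deliver simulates n delivery paths racing on the same prompt and
// returns how many of them actually performed a send.
func deliver(s *promptStore, id string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	sends := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.claimPromptForSending(id) { // claim first, send only on success
				mu.Lock()
				sends++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return sends
}

func main() {
	s := &promptStore{status: map[string]string{"p1": "pending"}}
	fmt.Println("sends:", deliver(s, "p1", 8)) // prints "sends: 1"
}
```

However many paths race, only the caller whose claim succeeds sends, which is the property that removes the duplicate message sends.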
lukemarsden added a commit that referenced this pull request on Apr 30, 2026

Three pre-baked profiles for the customer's actual deployment, each sized to its hardware and using the current best open-weights models (April 2026: DeepSeek-V4-Pro, GLM-4.7-Flash, Qwen3.6-35B-A3B, Qwen3.5-27B). Verified composeparse handles each unchanged.

Profiles:
- design/sample-profiles/customer-node1-4xA100.yaml
  4× A100 80GB. 4 services on GPUs 0-2 (qwen3 embeddings sharing GPU 0, GLM-4.7-Flash 31B on GPU 1, Qwen3.6-35B-A3B MoE on GPU 2). GPU 3 reserved for desktops via Decision 15. A100 has no NVENC so video encoding falls back to libx264 software, which is fine for 1-2 sessions (verified live in cloud GPU campaign run #4). composeparse output: 4 services, GPUCount=3 → 4-GPU host has 1 GPU of explicit headroom.
- design/sample-profiles/customer-node2to4-4xL40S.yaml
  4× L40S 48GB. Same 4-services-on-3-GPUs shape as Node 1; sized for L40S's smaller VRAM (Qwen3.5-27B + Qwen3.6-35B-A3B FP8 fit single cards). Deployed identically to all three nodes (2, 3, 4); the inference router round-robins. L40S has full NVENC + display engine → hardware-accelerated desktops on GPU 3.
- design/sample-profiles/customer-node5-8xMI300X.yaml
  8× MI300X 192GB = 1.5 TiB total VRAM. Runs DeepSeek-V4-Pro 862B FP8 with TP=8 across all 8 cards via rocm/vllm. **Inference-only**: MI300X CDNA-3 is compute-only, can't render desktops (Mesa radeonsi refuses graphics context; verified live in cloud GPU campaign run #5). The rocm/vllm image needs explicit `entrypoint: ["vllm", "serve"]` because unlike vllm/vllm-openai its default entrypoint is /bin/bash (also verified live).

Wiring across the 4 surfaces:
- api/pkg/runner/composeparse/sample_profiles_test.go: locked in customer-node1 (4 services, 3 GPUs), customer-node2to4 (4, 3), customer-node5 (1 service, 8 GPUs). Future parser changes that break any of these will fail tests.
- design/sample-profiles/README.md: table updated with all three plus a new section explaining the per-node deployment.
- frontend/src/components/dashboard/profileBlocks.ts: three new curated entries in the Profile Gallery: "Customer Node 1 — 4×A100 80GB", "Customer Nodes 2-4 — 4×L40S 48GB (each)", and "Customer Node 5 — 8×MI300X big-iron (inference-only)". Each card has accurate pros/cons including the desktop-headroom story per node.
- integration-test/gpucloud/matrix.yaml: the existing disabled node1-a100-4x / node2-l40s-4x / node3-l40s-4x / node4-l40s-4x / node5-mi300x-8x entries now point at the new richer customer-nodeN-... profiles instead of the generic placeholders they had before.

UI flexibility audit (separate question from the user): the compose YAML field in EditRunnerProfile.tsx is a plain `<TextField multiline minRows={20}>` textarea, so smart users have full flexibility to define arbitrary services with any Docker image (incl. custom builds + private registries), any env vars, any CLI args, any GPU pinning. Validation is server-side via composeparse on save (parses the YAML, extracts model list + GPU count); there's no client-side allowlist or schema enforcement.

Test results: TestParse_SampleProfiles green for all 9 profiles including the new 3. Frontend builds clean (39s). Harness dry-run shows the new entries are correctly disabled (so accidental cloud spend is impossible without flipping enabled: true).

Spec-Ref: helix-specs@ac4cc3643:001959_we-need-to-replace-all
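The headroom arithmetic quoted for Node 1 (4 services, GPUCount=3, 1 GPU spare on a 4-GPU host) can be reproduced with a short sketch. It assumes that GPUCount is the number of distinct GPU device IDs referenced across services; how composeparse actually derives the count is not shown in the commit message. The `service` type, the helper names, and the placeholder service names are hypothetical.

```go
package main

import "fmt"

// service is a hypothetical, already-parsed view of one compose service:
// just its name and the GPU device IDs it is pinned to.
type service struct {
	name      string
	deviceIDs []int
}

// gpuCount returns the number of distinct GPU device IDs used by any
// service. Services that share a GPU are counted once (an assumption
// about how the profile's GPU count is defined).
func gpuCount(services []service) int {
	seen := map[int]bool{}
	for _, s := range services {
		for _, id := range s.deviceIDs {
			seen[id] = true
		}
	}
	return len(seen)
}

// headroom is how many GPUs on the host are left unclaimed by the profile.
func headroom(hostGPUs int, services []service) int {
	return hostGPUs - gpuCount(services)
}

func main() {
	// Shape taken from the Node 1 description: four services on GPUs 0-2,
	// two of them sharing GPU 0. Service names are placeholders.
	node1 := []service{
		{"svc-a", []int{0}},
		{"svc-b", []int{0}},
		{"svc-c", []int{1}},
		{"svc-d", []int{2}},
	}
	fmt.Println(len(node1), gpuCount(node1), headroom(4, node1)) // prints "4 3 1"
}
```

Under the same assumption, a single service pinned to device IDs 0 through 7 yields a count of 8 and zero headroom on an 8-GPU host, matching the inference-only Node 5 shape.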
conceptual review welcome