Long running #3
Merged
… how to patch the python code for sd-scripts
lukemarsden
added a commit
that referenced
this pull request
Nov 14, 2025
THREE CRITICAL BUGS found causing HTTPS deadlock within 16 hours:

BUG #1: GStreamer Thread-Safety Violation (PRIMARY ROOT CAUSE)
- gst_element_send_event() called from HTTPS thread (wrong context!)
- Must be called from pipeline's g_main_loop_run() thread
- HTTPS thread blocks on GStreamer internal mutex (0x70537c0062b0)
- Located in streaming.cpp:124, 132, 176, 184, 401, 524
- FIX: Use g_main_loop_quit() instead (thread-safe)

BUG #2: NVIDIA Driver Mutex Deadlock (SECONDARY)
- Multiple GStreamer pipelines compete for NVIDIA mutex (0x705580003b80)
- Circular deadlock: HTTPS→GStreamer→NVIDIA→?
- Core dump shows 2 threads stuck on same NVIDIA mutex
- Inside proprietary libEGL_nvidia.so.0 (no symbols)
- FIX: Separate CUDA contexts per pipeline OR remove NVIDIA from SSL

BUG #3: HTTPS Connection Leak (CONTRIBUTING FACTOR)
- custom-https.cpp error handler doesn't close sockets
- 17 leaked connections in 16 hours (~1/hour leak rate!)
- Connections stuck in CLOSE_WAIT forever
- From: external browsers, moonlight-web, localhost
- FIX: Add socket->close() in error handler

COMPLETE DEADLOCK CHAIN:
1. HTTPS request fires StopStreamEvent (endpoints.hpp:484)
2. Event handler runs in HTTPS thread (synchronous dispatch)
3. Calls gst_element_send_event() - WRONG THREAD (Bug #1)
4. Blocks on GStreamer mutex
5. GStreamer holds mutex, waiting on NVIDIA
6. NVIDIA mutex held by another operation
7. ALL new HTTPS requests block on continue_lock()
8. System appears completely hung for HTTPS

EVIDENCE:
- HTTP (port 47989) still works perfectly
- HTTPS (port 47984) completely hung
- Core dump shows exact mutex addresses and call stacks
- 17 leaked CLOSE_WAIT connections
- Thread 99 stuck in gst_element_send_event from wrong context

CRITICAL FIX: Replace all gst_element_send_event(eos) with g_main_loop_quit() in event handlers at streaming.cpp:124,132,176,184,401,524
lukemarsden
added a commit
that referenced
this pull request
Mar 18, 2026
lukemarsden
added a commit
that referenced
this pull request
Mar 18, 2026
Issue #1 (stuck "Starting Desktop"):
- Add defer in StartDesktop to clear external_agent_status on any error
- Give waitForDesktopBridge its own 90s context decoupled from dockerCtx

Issue #4 (status not cleared on stop):
- StopDesktop unconditionally clears external_agent_status and status_message

Issue #5 (no restart button in Starting state):
- Frontend: show Stop button in "starting" state in both screenshot and stream modes
- Show "may have failed to start" message after 2-minute timeout

Issue #10a (duplicate sessions per spectask):
- Re-read task from DB before CreateSession; skip if PlanningSessionID already set

Issue #10b (scanner targets wrong sessions):
- processPendingPromptsForIdleSessions now filters to canonical planning_session_id only

Issue #2 (duplicate message sends):
- Add ClaimPromptForSending() atomic store method (UPDATE WHERE status IN pending/failed)
- Both interrupt and any-pending delivery paths use claim before send

Issue #7 (promotion race gives empty zvol):
- resolveDockerDataDir: acquire read lock before fresh zvol creation; re-check after

Issue #3: Already handled by existing open_thread on agent_ready reconnect

Issue #6: Fixed in merged PR #1947 (RecoverStaleBuilds 60s retry)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spec-Ref: helix-specs@04b515c3c:001588_read-helixs-design2026
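The claim-before-send idea behind the Issue #2 fix can be sketched roughly as follows. This is not the actual Helix store code: the commit describes a conditional SQL UPDATE, whereas this sketch models the same compare-and-set with an in-memory map guarded by a mutex. The `PromptStore` type, the `Add` helper, and the `sending` status are assumptions made for illustration; only the method name `ClaimPromptForSending` and the `pending`/`failed` statuses come from the commit message.

```go
package main

import "sync"

// PromptStatus is a hypothetical status type for queued prompts.
type PromptStatus string

const (
	StatusPending PromptStatus = "pending"
	StatusFailed  PromptStatus = "failed"
	// StatusSending is an assumed name for the claimed state.
	StatusSending PromptStatus = "sending"
)

// PromptStore is an in-memory stand-in for the real database-backed store.
type PromptStore struct {
	mu     sync.Mutex
	status map[string]PromptStatus
}

// NewPromptStore returns an empty store.
func NewPromptStore() *PromptStore {
	return &PromptStore{status: make(map[string]PromptStatus)}
}

// Add registers a prompt with the given status (illustrative helper).
func (s *PromptStore) Add(id string, st PromptStatus) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.status[id] = st
}

// ClaimPromptForSending mimics "UPDATE ... WHERE status IN (pending, failed)":
// the check and the state change happen under one lock, so at most one caller
// sees true for a given prompt. A caller that gets false must not send.
func (s *PromptStore) ClaimPromptForSending(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.status[id]
	if !ok || (st != StatusPending && st != StatusFailed) {
		return false
	}
	s.status[id] = StatusSending
	return true
}
```

Under this sketch, both delivery paths would call `ClaimPromptForSending` first and send only when it returns true, so a prompt picked up by two paths at once is delivered by whichever path wins the claim.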
lukemarsden
added a commit
that referenced
this pull request
Apr 26, 2026
…art-up Embed auth fix #3: validate token in /auth/{authenticated,user} directly
philwinder
added a commit
that referenced
this pull request
Apr 28, 2026
Previously the activation prompt only carried Body. The Worker had to call read_events to learn Subject, From, ThreadID, Extra — exactly the round-trip that caused the docs-engineer to misroute issue #3 to PR #2 during the github demo's E2E run. renderTrigger now formats every populated envelope field into the prompt, omitting empties for cleanliness. The Trigger.Body field is dropped; callers pass the full Message instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
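A minimal sketch of the behaviour described above, assuming a `Message` struct with string fields and a string-keyed `Extra` map. The field names follow the commit message, but the types, the `Label: value` output layout, and the sorted `Extra.` keys are assumptions, not the actual implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Message is a hypothetical envelope; field types are assumed.
type Message struct {
	Subject  string
	From     string
	ThreadID string
	Body     string
	Extra    map[string]string
}

// renderTrigger formats every populated envelope field into the prompt
// and skips empty ones. The output layout here is illustrative only.
func renderTrigger(m Message) string {
	var b strings.Builder
	writeField := func(label, value string) {
		if value != "" {
			fmt.Fprintf(&b, "%s: %s\n", label, value)
		}
	}
	writeField("Subject", m.Subject)
	writeField("From", m.From)
	writeField("ThreadID", m.ThreadID)

	// Sort Extra keys so the rendered prompt is deterministic.
	keys := make([]string, 0, len(m.Extra))
	for k := range m.Extra {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		writeField("Extra."+k, m.Extra[k])
	}

	// The body follows the header fields after a blank line.
	if m.Body != "" {
		b.WriteString("\n")
		b.WriteString(m.Body)
		b.WriteString("\n")
	}
	return b.String()
}
```

Because the caller passes the whole `Message`, the Worker can see the subject, sender, and thread in the prompt itself and does not need a separate lookup to learn them.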
philwinder
added a commit
that referenced
this pull request
May 4, 2026
Previously the activation prompt only carried Body. The Worker had to call read_events to learn Subject, From, ThreadID, Extra — exactly the round-trip that caused the docs-engineer to misroute issue #3 to PR #2 during the github demo's E2E run. renderTrigger now formats every populated envelope field into the prompt, omitting empties for cleanliness. The Trigger.Body field is dropped; callers pass the full Message instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukemarsden
added a commit
that referenced
this pull request
May 15, 2026
Spec-Ref: helix-specs@abaaa8c45:002021_investigate-notion
No description provided.