fix(worker/ocs): resolve SSE timeout and server startup issues #96

Merged
hrygo merged 3 commits into hrygo:main from aaronwong1989:fix/ocs-sse-timeout-and-server-startup
May 1, 2026

@aaronwong1989
Contributor

Summary

This PR fixes two high-priority issues affecting OpenCode Server worker stability and observability:

Issue #85: SSE Client Timeout Breaks Long-lived Streaming

Problem: SingletonProcessManager created an http.Client with a 30s Timeout. The readSSE() goroutine reused this client for SSE requests. Because http.Client.Timeout also bounds the time spent reading the response body, the long-lived streaming connection was cut off after 30s. Additionally, Terminate()/Kill() did not cancel the in-flight SSE HTTP request, leaving the readSSE goroutine blocked.

Solution:

  • Add a separate sseClient field with no Timeout for SSE connections
  • Use a cancellable context in readSSE() for clean shutdown
  • Store sseCancel in Worker and call it in Terminate()/Kill()
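
The approach in this list can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: newClients, the onLine callback, the demo helper, and the local httptest server are all assumptions made for the example. Only the names sseClient, sseCancel, and readSSE come from the description above.

```go
package main

import (
	"bufio"
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newClients returns a client with an overall Timeout for ordinary API calls
// and a second client with no Timeout for long-lived SSE streams.
// (Hypothetical helper; the PR stores the second client in an sseClient field.)
func newClients() (api, sse *http.Client) {
	api = &http.Client{Timeout: 30 * time.Second}
	sse = &http.Client{} // zero Timeout: reading the body has no deadline
	return api, sse
}

// readSSE passes each line of the stream to onLine until the server closes
// the stream or ctx is cancelled.
func readSSE(ctx context.Context, c *http.Client, url string, onLine func(string)) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Accept", "text/event-stream")
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		onLine(sc.Text())
	}
	if ctx.Err() != nil {
		return ctx.Err() // cancellation is what unblocked the read
	}
	return sc.Err()
}

// demo streams from a local test server, cancels, and reports the first line
// received and whether readSSE returned because of the cancellation.
func demo() (first string, cancelled bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		fmt.Fprint(w, "data: hello\n\n")
		w.(http.Flusher).Flush()
		<-r.Context().Done() // hold the stream open until the client leaves
	}))
	defer srv.Close()

	_, sse := newClients()
	ctx, sseCancel := context.WithCancel(context.Background())
	defer sseCancel()
	lines := make(chan string, 16)
	done := make(chan error, 1)
	go func() {
		done <- readSSE(ctx, sse, srv.URL, func(l string) { lines <- l })
	}()

	select {
	case first = <-lines:
	case <-time.After(5 * time.Second):
		return "", false
	}
	sseCancel() // the call that Terminate()/Kill() would make
	select {
	case err := <-done:
		return first, errors.Is(err, context.Canceled)
	case <-time.After(5 * time.Second):
		return first, false
	}
}

func main() {
	first, cancelled := demo()
	fmt.Printf("first line: %q, unblocked by cancel: %v\n", first, cancelled)
}
```

Keeping the 30s client for ordinary requests preserves a bound on short API calls, while the stream's lifetime is controlled only by the context.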

Issue #79 Finding 1: Silent HTTP Server Startup Failure

Problem: The serverErr channel was created but never read. If the HTTP server failed to bind (port in use, permission denied), the error was silently lost: the gateway appeared to start successfully but was actually non-functional.

Solution:

  • Use select to race the shutdown signal against serverErr
  • Log the error and exit if ListenAndServe fails
  • Prevents silent startup failure
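
A rough sketch of the select pattern described above, assuming a buffered serverErr channel fed by the ListenAndServe goroutine. The helpers startServer, waitForExit, and portConflictDetected are hypothetical names for this example and are not taken from gateway_run.go. The 5s timer exists only so the demo cannot hang.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// startServer runs ListenAndServe in the background and reports any real
// failure (bind error, permission denied) on the returned channel.
func startServer(srv *http.Server) <-chan error {
	serverErr := make(chan error, 1)
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			serverErr <- err
		}
	}()
	return serverErr
}

// waitForExit blocks until a shutdown signal arrives (returns nil) or the
// server fails (returns that error), whichever happens first.
func waitForExit(sig <-chan os.Signal, serverErr <-chan error) error {
	select {
	case <-sig:
		return nil
	case err := <-serverErr:
		return err
	}
}

// portConflictDetected occupies a port, starts a server on the same address,
// and reports whether the bind failure surfaced through waitForExit.
func portConflictDetected() bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false
	}
	defer ln.Close()

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(sig)
	// Safety net for the demo only: stop waiting after 5s if nothing fails.
	timer := time.AfterFunc(5*time.Second, func() { sig <- syscall.SIGTERM })
	defer timer.Stop()

	srv := &http.Server{Addr: ln.Addr().String()}
	return waitForExit(sig, startServer(srv)) != nil
}

func main() {
	// A real gateway would log the error and exit non-zero at this point.
	fmt.Println("startup error detected:", portConflictDetected())
}
```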

Changes

  • internal/worker/opencodeserver/singleton.go: Add sseClient field, update Acquire() signature
  • internal/worker/opencodeserver/worker.go: Use sseClient, add context cancellation
  • cmd/hotplex/gateway_run.go: Handle serverErr channel, remove unused waitForSignal()
  • internal/worker/opencodeserver/singleton_test.go: Update for new Acquire() signature

Test Plan

  • make test - All tests pass
  • make lint - Zero issues
  • Manual test: SSE connection remains responsive beyond the previous 30s timeout
  • Manual test: Terminate/Kill unblocks readSSE goroutine promptly
  • Manual test: Port conflict triggers error log and clean exit

Related Issues

🤖 Generated with Claude Code

This commit fixes two high-priority issues affecting OpenCode Server worker:

**Issue hrygo#85: SSE client timeout breaks long-lived streaming**
- Add separate sseClient without Timeout for SSE connections
- Use cancellable context in readSSE for clean shutdown
- Prevents SSE stream interruption after 30s HTTPTimeout
- Enables proper goroutine cleanup on Terminate/Kill

**Issue hrygo#79 Finding 1: serverErr channel never consumed**
- Fix silent HTTP server startup failure
- Use select to race signal against server error
- Log error and exit if ListenAndServe fails
- Prevents gateway appearing to start when port is unavailable

**Changes:**
- internal/worker/opencodeserver/singleton.go: Add sseClient field
- internal/worker/opencodeserver/worker.go: Use sseClient + context cancellation
- cmd/hotplex/gateway_run.go: Handle serverErr channel
- Remove unused waitForSignal function

**Verification:**
- make test: All tests pass
- make lint: Zero issues
- Closes hrygo#91 (verified as already fixed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 53.65854% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.55%. Comparing base (a0275ac) to head (9b7fc35).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/worker/opencodeserver/worker.go | 37.03% | 17 Missing ⚠️ |
| internal/worker/opencodeserver/singleton.go | 85.71% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #96      +/-   ##
==========================================
- Coverage   59.56%   59.55%   -0.01%     
==========================================
  Files         134      134              
  Lines       15944    15969      +25     
==========================================
+ Hits         9497     9511      +14     
- Misses       5856     5865       +9     
- Partials      591      593       +2     

| Flag | Coverage Δ |
|---|---|
| unittests | 59.55% <53.65%> (-0.01%) ⬇️ |


Sisyphus 🏔️ and others added 2 commits May 1, 2026 17:22
…t cancellation

Add unit tests to cover new code paths:
- Test sseClient initialization in NewSingletonProcessManager
- Test sseClient is returned by Acquire()
- Test Terminate() and Kill() call sseCancel
- Test New() worker initialization
- Test nil sseCancel handling in Terminate/Kill

These tests improve patch coverage without requiring opencode binary.
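
The kind of check listed above might look like the following sketch. The worker type here is a hypothetical stand-in, not the real Worker from internal/worker/opencodeserver, and the mutex-guarded stopSSE helper is an assumption about how a nil sseCancel could be handled safely.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for the real Worker type.
type worker struct {
	mu        sync.Mutex
	sseCancel context.CancelFunc // nil until an SSE request has started
}

// stopSSE cancels the in-flight SSE request, if any. It is safe to call
// repeatedly and before any SSE request exists.
func (w *worker) stopSSE() {
	w.mu.Lock()
	cancel := w.sseCancel
	w.sseCancel = nil
	w.mu.Unlock()
	if cancel != nil {
		cancel()
	}
}

func (w *worker) Terminate() { w.stopSSE() }
func (w *worker) Kill()      { w.stopSSE() }

// terminateCancels reports whether Terminate cancels the stored context.
func terminateCancels() bool {
	ctx, cancel := context.WithCancel(context.Background())
	w := &worker{sseCancel: cancel}
	w.Terminate()
	return ctx.Err() == context.Canceled
}

// nilCancelIsSafe reports whether Terminate and Kill tolerate a nil sseCancel
// without panicking.
func nilCancelIsSafe() (ok bool) {
	defer func() { ok = recover() == nil }()
	w := &worker{}
	w.Terminate()
	w.Kill()
	return true
}

func main() {
	fmt.Println("Terminate cancels SSE context:", terminateCancels())
	fmt.Println("nil sseCancel is safe:", nilCancelIsSafe())
}
```

Neither check needs a running opencode binary, which matches the goal stated in the commit message.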

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Test that Terminate and Kill properly handle releaseOnce to ensure
singleton reference is released exactly once.
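
One way the exactly-once release could work is with sync.Once, sketched below. The manager and worker types and the reference counter are hypothetical and exist only for illustration. Only the name releaseOnce comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// manager is a hypothetical reference-counted singleton.
type manager struct{ refs int32 }

func (m *manager) Acquire()    { atomic.AddInt32(&m.refs, 1) }
func (m *manager) Release()    { atomic.AddInt32(&m.refs, -1) }
func (m *manager) Refs() int32 { return atomic.LoadInt32(&m.refs) }

// worker holds one reference to the manager and gives it back at most once.
type worker struct {
	mgr         *manager
	releaseOnce sync.Once
}

// release drops this worker's reference; later calls are no-ops.
func (w *worker) release() { w.releaseOnce.Do(w.mgr.Release) }

func (w *worker) Terminate() { w.release() }
func (w *worker) Kill()      { w.release() }

// refsAfterShutdown acquires two references, shuts one worker down several
// times, and returns the remaining reference count.
func refsAfterShutdown() int32 {
	m := &manager{}
	m.Acquire() // reference held by worker a
	m.Acquire() // reference held by some other worker
	a := &worker{mgr: m}
	a.Terminate()
	a.Kill()
	a.Terminate()
	return m.Refs()
}

func main() {
	fmt.Println("references left:", refsAfterShutdown())
}
```

Without the sync.Once guard, the three shutdown calls would drive the count below the one reference the other worker still holds.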

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hrygo hrygo merged commit 53f7fdd into hrygo:main May 1, 2026
7 of 9 checks passed
@aaronwong1989 aaronwong1989 deleted the fix/ocs-sse-timeout-and-server-startup branch May 1, 2026 10:21

2 participants