Harden api-proxy startup healthcheck to reduce early unhealthy flaps #2155

Merged
lpcox merged 3 commits into main from copilot/fix-api-proxy-healthcheck-flaps
Apr 22, 2026

Copilot AI commented Apr 22, 2026

api-proxy startup was intermittently marked unhealthy before the Node server fully stabilized, which could block agent startup via depends_on: service_healthy. This change increases healthcheck tolerance during bootstrap while preserving the same readiness endpoint.

  • Healthcheck startup tolerance (api-proxy)

    • Increased start_period from 2s to 10s
    • Increased retries from 5 to 10
    • Increased timeout from 1s to 2s
    • Kept probe path and cadence unchanged (curl -f http://localhost:10000/health, interval: 1s)
  • Unit test contract update

    • Extended generateDockerCompose api-proxy healthcheck assertions to lock the new timeout, retries, and start_period values.
healthcheck: {
  test: ['CMD', 'curl', '-f', 'http://localhost:10000/health'],
  interval: '1s',
  timeout: '2s',
  retries: 10,
  start_period: '10s',
}
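
As a rough way to see what the new values buy, the sketch below estimates how long a container that never answers its probe can run before Docker marks it unhealthy. It assumes Docker's documented behaviour that failures inside start_period do not count toward retries, and it ignores scheduling jitter. The `Healthcheck` type and the `unhealthyWindowSeconds` helper are illustrative names, not code from this PR.

```typescript
// Illustrative types and helper; not taken from src/docker-manager.ts.
interface Healthcheck {
  interval: number;    // seconds between probes
  timeout: number;     // seconds before a hung probe counts as failed
  retries: number;     // consecutive failures needed to mark unhealthy
  startPeriod: number; // seconds during which failures are not counted
}

// Approximate seconds until a never-healthy container is marked unhealthy.
// fastFail: every probe fails immediately (e.g. connection refused).
// slowFail: every probe hangs until the timeout expires.
function unhealthyWindowSeconds(h: Healthcheck): { fastFail: number; slowFail: number } {
  return {
    fastFail: h.startPeriod + h.retries * h.interval,
    slowFail: h.startPeriod + h.retries * (h.interval + h.timeout),
  };
}

// Values before and after this change, as listed in the PR description.
const before: Healthcheck = { interval: 1, timeout: 1, retries: 5, startPeriod: 2 };
const after: Healthcheck = { interval: 1, timeout: 2, retries: 10, startPeriod: 10 };

console.log(unhealthyWindowSeconds(before)); // { fastFail: 7, slowFail: 12 }
console.log(unhealthyWindowSeconds(after));  // { fastFail: 20, slowFail: 40 }
```

Under these assumptions the bootstrap budget grows from roughly 7 to 12 seconds to roughly 20 to 40 seconds, while a healthy container is still detected on the same 1s cadence.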

Copilot AI changed the title from "[WIP] Fix api-proxy healthcheck flaps during startup" to "Harden api-proxy startup healthcheck to reduce early unhealthy flaps" Apr 22, 2026
Copilot AI requested a review from lpcox April 22, 2026 23:01
lpcox marked this pull request as ready for review April 22, 2026 23:07
lpcox requested a review from Mossaka as a code owner April 22, 2026 23:07
Copilot AI review requested due to automatic review settings April 22, 2026 23:07

Copilot AI left a comment

Pull request overview

Hardens the api-proxy container startup healthcheck in generated Docker Compose to reduce early “unhealthy” flaps that can block the agent via depends_on: service_healthy.

Changes:

  • Increased api-proxy healthcheck tolerance by updating timeout, retries, and start_period.
  • Updated unit tests to assert the new healthcheck parameters.
| File | Description |
| --- | --- |
| src/docker-manager.ts | Adjusts api-proxy healthcheck parameters (timeout/retries/start_period) to be more tolerant during bootstrap. |
| src/docker-manager.test.ts | Extends the api-proxy healthcheck test assertions to match the updated compose output. |
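
For context on why an early unhealthy verdict blocks the agent, here is a minimal sketch of a compose object in which one service waits on another's health. The service name `agent`, the `Service` type, and the `healthGates` helper are assumptions for illustration; the exact shape emitted by generateDockerCompose is not shown in this PR.

```typescript
// Hypothetical, simplified compose shapes (not the repo's real types).
type Condition = 'service_started' | 'service_healthy' | 'service_completed_successfully';

interface Service {
  healthcheck?: {
    test: string[];
    interval: string;
    timeout: string;
    retries: number;
    start_period: string;
  };
  depends_on?: Record<string, { condition: Condition }>;
}

const services: Record<string, Service> = {
  'api-proxy': {
    healthcheck: {
      test: ['CMD', 'curl', '-f', 'http://localhost:10000/health'],
      interval: '1s',
      timeout: '2s',
      retries: 10,
      start_period: '10s',
    },
  },
  // Assumed name for the dependent service.
  agent: {
    depends_on: { 'api-proxy': { condition: 'service_healthy' } },
  },
};

// Names of the services that must report healthy before `name` is started.
function healthGates(name: string): string[] {
  const deps = services[name]?.depends_on ?? {};
  return Object.entries(deps)
    .filter(([, dep]) => dep.condition === 'service_healthy')
    .map(([depName]) => depName);
}

console.log(healthGates('agent')); // [ 'api-proxy' ]
```

With `condition: 'service_healthy'`, Compose does not start the dependent service until the dependency's healthcheck passes, so a dependency that is marked unhealthy during bootstrap fails the whole startup.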

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment thread src/docker-manager.test.ts Outdated
Comment on lines +2366 to +2369
const proxy = result.services['api-proxy'] as any;
expect(proxy.healthcheck).toBeDefined();
expect((proxy.healthcheck as any).test).toEqual(['CMD', 'curl', '-f', 'http://localhost:10000/health']);
expect(proxy.healthcheck.test).toEqual(['CMD', 'curl', '-f', 'http://localhost:10000/health']);
expect(proxy.healthcheck.timeout).toBe('2s');

Copilot AI Apr 22, 2026

In this test, casting the entire proxy service to any is unnecessary and reduces type safety; DockerService already includes a typed healthcheck field. Prefer keeping proxy typed and using the existing pattern used elsewhere in this file (e.g., proxy.healthcheck!.... after toBeDefined()) or assigning const healthcheck = proxy.healthcheck! for assertions.
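
The pattern this comment recommends can be sketched as follows. The `DockerService` and `Healthcheck` shapes here are simplified stand-ins for the repo's real types, and `expectDefined` / `expectEqual` are tiny local substitutes for the test framework's `expect(...)` matchers so the snippet runs on its own.

```typescript
// Simplified stand-ins for the repo's types (assumed shapes).
interface Healthcheck {
  test: string[];
  interval: string;
  timeout: string;
  retries: number;
  start_period: string;
}
interface DockerService {
  healthcheck?: Healthcheck;
}

// Stand-in for the relevant slice of the generateDockerCompose result.
const resultServices: { 'api-proxy': DockerService } = {
  'api-proxy': {
    healthcheck: {
      test: ['CMD', 'curl', '-f', 'http://localhost:10000/health'],
      interval: '1s',
      timeout: '2s',
      retries: 10,
      start_period: '10s',
    },
  },
};

// Local substitutes for expect(x).toBeDefined() and expect(x).toEqual(y).
function expectDefined(value: unknown): void {
  if (value === undefined) throw new Error('expected value to be defined');
}
function expectEqual(actual: unknown, expected: unknown): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
  }
}

const proxy = resultServices['api-proxy']; // stays typed as DockerService, no `as any`
expectDefined(proxy.healthcheck);
const healthcheck = proxy.healthcheck!;    // non-null assertion after the defined check

expectEqual(healthcheck.test, ['CMD', 'curl', '-f', 'http://localhost:10000/health']);
expectEqual(healthcheck.timeout, '2s');
expectEqual(healthcheck.retries, 10);
expectEqual(healthcheck.start_period, '10s');
```

Keeping `proxy` typed means a misspelled field such as `healthcheck.start_perod` would fail at compile time instead of silently evaluating to `undefined`.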

lpcox commented Apr 22, 2026

@copilot integrate this review feedback #2155 (review)

Copilot AI commented Apr 22, 2026

> @copilot integrate this review feedback #2155 (review)

Integrated the review feedback by removing the unnecessary any cast in src/docker-manager.test.ts and using a typed healthcheck variable for assertions. Addressed in commit d0e87f3.

@github-actions

🔥 Smoke Test: Copilot BYOK (Offline) Mode

Test Result
GitHub MCP (list_pull_requests #2147 "chore: bump Copilot CLI to v1.0.34 in smoke-copilot")
GitHub.com connectivity
File write/read (smoke-test-copilot-byok-24807652602.txt)
BYOK inference (agent → api-proxy → api.githubcopilot.com)

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com

Overall: PASS — PR by @app/copilot-swe-agent, assignees: @lpcox, @Copilot

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Smoke Test Results

PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
| --- | --- | --- | --- |
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.14.1 | v20.20.2 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environment.

Tested by Smoke Chroot

@github-actions

Smoke Test Results: GitHub Actions Services Connectivity

| Check | Status | Details |
| --- | --- | --- |
| Redis PING (port 6379) | ❌ Failed | redis-cli not available — apt-get is non-functional in this environment |
| pg_isready (port 5432) | ❌ Failed | host.docker.internal:5432 - no response |
| psql SELECT 1 (smoketest DB) | ❌ Failed | PostgreSQL unreachable (connection timed out) |

All 3 checks failed. The service containers at host.docker.internal are not reachable from this environment. apt-get is also non-functional, preventing installation of redis-cli.

🔌 Service connectivity validated by Smoke Services

@github-actions

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color passed ✅ PASS
Go env passed ✅ PASS
Go uuid passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx passed ✅ PASS
Node.js execa passed ✅ PASS
Node.js p-limit passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #2155 · ● 1.1M ·

lpcox merged commit 06d76ad into main Apr 22, 2026
59 of 64 checks passed
lpcox deleted the copilot/fix-api-proxy-healthcheck-flaps branch April 22, 2026 23:59
@github-actions

Smoke Test Results

| Test | Status |
| --- | --- |
| GitHub MCP (list merged PRs) | ✅ PR #2155: "Harden api-proxy startup healthcheck to reduce early unhealthy flaps" |
| GitHub.com HTTP connectivity | ❌ Pre-step data not available (template vars unsubstituted) |
| File write/read | ❌ Pre-step data not available (template vars unsubstituted) |

Overall: FAIL — 2 of 3 tests could not be verified due to unsubstituted workflow template variables.

PR author: @app/copilot-swe-agent | Assignees: @lpcox, @Copilot

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Smoke test matrix
PRs: Harden api-proxy startup healthcheck to reduce early unhealthy flaps; chore: bump Copilot CLI to v1.0.34 in smoke-copilot

  1. GitHub MCP-style PR review ✅
  2. safeinputs-gh CLI query ❌ (tool unavailable in this runtime)
  3. Playwright github.com title check ✅
  4. Tavily search ❌ (Tavily MCP unavailable)
  5. File write /tmp/gh-aw/agent/smoke-test-codex-24811093204.txt ✅
  6. Bash cat verification ✅
  7. Discussion interaction + oracle comment ✅
  8. npm ci && npm run build ✅
Overall status: FAIL

Warning

⚠️ Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

github-actions bot mentioned this pull request Apr 23, 2026
Development

Successfully merging this pull request may close these issues.

api-proxy healthcheck flaps during startup, causing Docker Compose to fail

3 participants