feat(elbv2): real health probes with state machine #756
Merged
vieiralucas merged 2 commits into main on Apr 25, 2026
A background tokio prober walks every target group with HealthCheckEnabled, hits each registered target on its configured protocol/port/path at the configured interval, and flips state.target.health.state per the healthy/unhealthy threshold counts. HTTP/HTTPS probes match against matcher.http_code (single, range, list, mixed). TCP/TLS probes succeed on connect. UDP/GENEVE return healthy without active probing (matches AWS NLB semantics).

Behavior:
- Newly registered targets start at "initial" until first probe completes
- Consecutive successes >= healthy_threshold -> "healthy"
- Consecutive failures >= unhealthy_threshold -> "unhealthy" with Target.FailedHealthChecks reason
- DescribeTargetHealth returns the live state

Set FAKECLOUD_ELBV2_DISABLE_HEALTH_PROBES=true to opt out and keep the historical synthetic "healthy" default for placeholder IPs that do not actually answer.

E2E: spawns a tiny TCP server returning configurable HTTP status codes, registers it as a target, asserts the prober flips healthy with 200 and unhealthy after consecutive 503s.

Removes "Health probes" from the elbv2.md "Not yet implemented" section and adds a dedicated "Health probes" section.
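The threshold behavior described above can be sketched as a small counter-based state machine. This is a minimal illustration, not the code in this PR: `HealthState`, `TargetHealth`, and `record` are hypothetical names, and the sketch assumes a target keeps its current state (including "initial") until one of the thresholds is reached.

```rust
/// Health states named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HealthState {
    Initial,
    Healthy,
    Unhealthy,
}

/// Hypothetical per-target tracker; names are illustrative only.
#[derive(Debug)]
struct TargetHealth {
    state: HealthState,
    consecutive_successes: u32,
    consecutive_failures: u32,
}

impl TargetHealth {
    /// A newly registered target starts at `Initial`.
    fn new() -> Self {
        TargetHealth {
            state: HealthState::Initial,
            consecutive_successes: 0,
            consecutive_failures: 0,
        }
    }

    /// Record one probe result. A success resets the failure streak and
    /// vice versa; the state flips only once a streak reaches its threshold.
    fn record(
        &mut self,
        success: bool,
        healthy_threshold: u32,
        unhealthy_threshold: u32,
    ) -> HealthState {
        if success {
            self.consecutive_successes = self.consecutive_successes.saturating_add(1);
            self.consecutive_failures = 0;
            if self.consecutive_successes >= healthy_threshold {
                self.state = HealthState::Healthy;
            }
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
            self.consecutive_successes = 0;
            if self.consecutive_failures >= unhealthy_threshold {
                self.state = HealthState::Unhealthy;
            }
        }
        self.state
    }
}
```

With both thresholds set to 2, two successful probes in a row move a target to `Healthy`, and a single failure after that leaves it `Healthy` until a second consecutive failure flips it to `Unhealthy`.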
2 issues found across 9 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="crates/fakecloud-elbv2/src/prober.rs">
<violation number="1" location="crates/fakecloud-elbv2/src/prober.rs:29">
P2: The hard-coded reqwest client timeout (30s) caps HTTP/HTTPS health checks and can override `HealthCheckTimeoutSeconds` for target groups.</violation>
<violation number="2" location="crates/fakecloud-elbv2/src/prober.rs:221">
P1: `job.port as u16` can wrap invalid `i32` ports, causing TCP/TLS probes to hit the wrong port instead of failing fast.</violation>
</file>
- Drop the 30s reqwest client-level timeout. Per-probe `tokio::time::timeout` using `HealthCheckTimeoutSeconds` is now the only timeout, so that knob is authoritative and not capped at 30s.
- Validate health-check port range in `build_job` (1..=65535) and use `u16::try_from` at the TCP/TLS connect site so an out-of-range i32 port fails the probe fast instead of wrapping to a different valid port.
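To illustrate the port fix, here is a minimal sketch of a checked conversion. `checked_port` is a hypothetical helper, not a function from this PR; only `u16::try_from` and the 1..=65535 range come from the commit message.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert a health-check port stored as `i32`
/// into a usable TCP port, rejecting anything outside 1..=65535.
fn checked_port(port: i32) -> Result<u16, String> {
    match u16::try_from(port) {
        // `try_from` already rejects negatives and values above 65535;
        // port 0 is excluded explicitly.
        Ok(p) if p >= 1 => Ok(p),
        _ => Err(format!("health check port {} is outside 1..=65535", port)),
    }
}
```

By contrast, an `as` cast truncates silently: `70000_i32 as u16` evaluates to `4464`, a different but valid port, which is the wrap-around the review flagged.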
Summary
- Background prober walks every target group with `HealthCheckEnabled`, hits each registered target on its configured protocol/port/path at the configured interval, and flips target health state per healthy/unhealthy threshold counts
- HTTP/HTTPS probes match against `Matcher.HttpCode` (single, range, list, mixed). TCP/TLS probes succeed on connect. UDP/GENEVE return healthy without active probing
- Newly registered targets start at `initial` (matches AWS) until the first probe completes
- `FAKECLOUD_ELBV2_DISABLE_HEALTH_PROBES=true` opts out and keeps the historical synthetic `healthy` default, useful when targets are placeholder IPs

Test plan
- `matcher_matches` (single, range, list, mixed)
- E2E: `healthy` then `unhealthy` after status flip
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `cargo fmt --check` clean

Summary by cubic
Adds real ELBv2 health checks with a background prober that updates target health based on group settings. HTTP/HTTPS honor code matchers; TCP/TLS succeed on connect; UDP/GENEVE are treated healthy.
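The matcher forms mentioned in this PR (single, range, list, mixed) could be handled along these lines. This is a sketch under assumptions: the name `matcher_matches` appears in the test plan, but the signature and parsing rules here are guesses, with malformed parts simply treated as non-matching.

```rust
/// Returns true when `status` satisfies an HttpCode-style matcher such as
/// "200", "200-299", "200,202", or "200-204,301".
/// Assumed signature; malformed parts never match.
fn matcher_matches(matcher: &str, status: u16) -> bool {
    matcher.split(',').map(str::trim).any(|part| {
        match part.split_once('-') {
            // Range form "lo-hi": inclusive on both ends.
            Some((lo, hi)) => match (lo.trim().parse::<u16>(), hi.trim().parse::<u16>()) {
                (Ok(lo), Ok(hi)) => lo <= status && status <= hi,
                _ => false,
            },
            // Single code form "200".
            None => part.parse::<u16>().map_or(false, |code| code == status),
        }
    })
}
```

Splitting on commas first and then on a dash lets one function cover all four forms, since a mixed matcher is just a list whose items are either single codes or ranges.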
New Features
- Background prober checks target groups with `HealthCheckEnabled` on the configured protocol/port/path at `HealthCheckIntervalSeconds`.
- HTTP/HTTPS honor `Matcher.HttpCode`; TCP/TLS succeed on connect; UDP/GENEVE are marked healthy. State flips via consecutive success/failure thresholds; `DescribeTargetHealth` shows live state.
- New targets start at `initial`; set `FAKECLOUD_ELBV2_DISABLE_HEALTH_PROBES=true` to keep all targets `healthy`. Docs add a "Health probes" section.

Bug Fixes
- The per-probe timeout uses `HealthCheckTimeoutSeconds` and is authoritative.

Written for commit d8cbe34. Summary will update on new commits.