feat(ilp): multi-host ws ingress + egress failover hardening #20

Open
kafka1991 wants to merge 4 commits into vi_sf from ingress_role


@kafka1991 kafka1991 commented May 6, 2026

Summary

Multi-host failover for ws/wss ingress; egress QwpQueryClient failover loop refactored onto a shared host-health tracker; role-aware 503 + X-QuestDB-Role upgrade-reject handling unified across both.

Ingress

  • addr= accepts multi-host (comma list, repeated key, IPv4, bracketed and bare IPv6); TCP/UDP unchanged single-host.
  • QwpWebSocketSender.buildAndConnect() walks the shared QwpHostHealthTracker on every reconnect (initial, I/O loop, orphan-drainer replays); synchronized to serialise concurrent callers.
  • auth_timeout_ms (default 15s) bounds the upgrade response read per endpoint — catches "TCP accepts but never replies" blackholes the OS connect timeout misses.
  • gorilla=on|off plumbed through connect() (was a post-connect setter).
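The address forms listed above (comma list, IPv4 or hostname, bracketed IPv6, bare IPv6) could be split roughly as follows. This is a minimal sketch, not the PR's parser: `AddrListSketch`, `Endpoint`, and the fallback port 9000 are assumptions made for illustration, and a bare IPv6 literal is treated as carrying no port because its colons are ambiguous.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the PR): splits an addr= value into host/port pairs.
final class AddrListSketch {
    static final int DEFAULT_PORT = 9000; // assumed fallback port, for illustration only

    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Splits a comma-separated list and parses each non-empty entry.
    static List<Endpoint> parse(String addr) {
        List<Endpoint> out = new ArrayList<>();
        for (String raw : addr.split(",")) {
            String s = raw.trim();
            if (!s.isEmpty()) {
                out.add(parseOne(s));
            }
        }
        return out;
    }

    static Endpoint parseOne(String s) {
        if (s.startsWith("[")) {
            // Bracketed IPv6: "[::1]" or "[::1]:9001".
            int close = s.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("unclosed '[' in " + s);
            }
            String host = s.substring(1, close);
            String rest = s.substring(close + 1);
            if (rest.isEmpty()) {
                return new Endpoint(host, DEFAULT_PORT);
            }
            if (!rest.startsWith(":")) {
                throw new IllegalArgumentException("unexpected text after ']' in " + s);
            }
            return new Endpoint(host, Integer.parseInt(rest.substring(1)));
        }
        int first = s.indexOf(':');
        int last = s.lastIndexOf(':');
        if (first != last) {
            // More than one colon without brackets: bare IPv6, no port can be attached.
            return new Endpoint(s, DEFAULT_PORT);
        }
        if (first < 0) {
            // Hostname or IPv4 without a port.
            return new Endpoint(s, DEFAULT_PORT);
        }
        return new Endpoint(s.substring(0, first), Integer.parseInt(s.substring(first + 1)));
    }
}
```

For example, `AddrListSketch.parse("db1:9000, [::1]:9001, fe80::1")` yields three endpoints, the last one with the assumed default port.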

Egress (QwpQueryClient)

  • reconnectViaTracker() replaces reconnectSkippingIndex(failedIndex) — picks by tracker priority instead of "skip the failed index" round-robin.
  • lb_strategy=random|first (default random): shuffles the initial endpoint list so N clients spread across N hosts.
  • failover_max_duration_ms (default 30s): wall-clock cap complementary to failover_max_attempts.
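  • Backoff jitter is now full-jitter, drawn uniformly from [0, capped]. The previous additive jitter drew from [base, 2·base) and then clamped at failover_max_backoff_ms, so once base reached the cap every draw clamped to the same value and the jitter self-cancelled.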
  • Default target=any still reads from any role. With target=primary|replica, SERVER_INFO mismatches and 503 + X-QuestDB-Role rejects are classified by the reported role: PRIMARY_CATCHUP as transient, REPLICA as topological.
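To illustrate the jitter change, here is a minimal sketch comparing the two schemes as the description words them. `BackoffJitterSketch`, `initialMs`, and the doubling schedule are assumptions; only the [0, capped] and clamped [base, 2·base) ranges come from the text above.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical comparison of the two jitter schemes; not the PR's code.
// Assumes initialMs > 0 and attempt >= 0.
final class BackoffJitterSketch {
    // Exponential delay before jitter: initialMs * 2^attempt (shift bounded to avoid overflow).
    static long base(long initialMs, int attempt) {
        return initialMs << Math.min(attempt, 20);
    }

    // Full jitter: uniform in [0, capped], where capped = min(base, maxBackoffMs).
    static long fullJitter(long initialMs, long maxBackoffMs, int attempt) {
        long capped = Math.min(base(initialMs, attempt), maxBackoffMs);
        return ThreadLocalRandom.current().nextLong(capped + 1);
    }

    // Previous scheme as described: uniform in [base, 2*base), then clamped to the cap.
    // Once base >= maxBackoffMs, every draw clamps to maxBackoffMs and the jitter is gone.
    static long additiveThenClamp(long initialMs, long maxBackoffMs, int attempt) {
        long b = base(initialMs, attempt);
        long jittered = b + ThreadLocalRandom.current().nextLong(b);
        return Math.min(jittered, maxBackoffMs);
    }
}
```

With `initialMs = 100`, a cap of 1000 ms, and `attempt = 10`, `additiveThenClamp` always returns 1000, so clients that failed together retry together, while `fullJitter` keeps spreading retries over the whole interval up to the cap.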

Shared

  • QwpHostHealthTracker: priority Healthy → Unknown → TransientReject → TransportError → TopologyReject; sticky-Healthy across rounds.
  • QwpIngressRoleRejectedException carries role + host:port with isTransient / isTopological.
  • WebSocketClient.getUpgradeRejectRole() parses X-QuestDB-Role from 503 upgrade responses.
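The priority ordering above can be pictured as an enum whose declaration order doubles as the pick order. The sketch below is a simplified stand-in, not QwpHostHealthTracker itself: the class name, method names, and tie-breaking by configured order are assumptions, and the per-round bookkeeping behind sticky-Healthy is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a host-health tracker; not the PR's class.
final class HostHealthSketch {
    // Declared in the priority order from the description: lower ordinal is tried first.
    enum State { HEALTHY, UNKNOWN, TRANSIENT_REJECT, TRANSPORT_ERROR, TOPOLOGY_REJECT }

    private final List<String> hosts = new ArrayList<>();
    private final List<State> states = new ArrayList<>();

    // Every configured host starts as UNKNOWN until an attempt is recorded.
    HostHealthSketch(List<String> configured) {
        for (String h : configured) {
            hosts.add(h);
            states.add(State.UNKNOWN);
        }
    }

    // Records the outcome of the latest attempt against a host.
    void record(String host, State state) {
        int i = hosts.indexOf(host);
        if (i < 0) {
            throw new IllegalArgumentException("unknown host: " + host);
        }
        states.set(i, state);
    }

    // Returns the host with the best state; ties go to the earlier configured host.
    String pickNext() {
        int best = -1;
        for (int i = 0; i < hosts.size(); i++) {
            if (best < 0 || states.get(i).compareTo(states.get(best)) < 0) {
                best = i;
            }
        }
        return best < 0 ? null : hosts.get(best);
    }
}
```

Unlike a "skip the failed index" round-robin, a host that answered with a transient reject is still preferred over one that failed at the transport level, and a host last seen healthy is always tried first.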

Config keys

| Key | Default | Side | Purpose |
|---|---|---|---|
| auth_timeout_ms | 15000 | ingress + egress | per-endpoint upgrade timeout |
| gorilla | on | ingress | DoD timestamp encoding |
| lb_strategy | random | egress | initial endpoint pick |
| failover_max_duration_ms | 30000 | egress | failover wall-clock cap |
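As a rough illustration of how these defaults would combine with user overrides, the sketch below seeds a map with the four defaults from the table and applies overrides on top. The class name and the `key=value;` override syntax are assumptions made for the example, and ingress and egress keys are mixed in one map purely for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical defaults resolver; the "key=value;" syntax is assumed for illustration.
final class ConfDefaultsSketch {
    static Map<String, String> resolve(String overrides) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Defaults copied from the config-key table.
        conf.put("auth_timeout_ms", "15000");
        conf.put("gorilla", "on");
        conf.put("lb_strategy", "random");
        conf.put("failover_max_duration_ms", "30000");
        for (String pair : overrides.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed segments
            }
            conf.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return conf;
    }
}
```

Calling `ConfDefaultsSketch.resolve("lb_strategy=first;auth_timeout_ms=5000")` keeps `gorilla=on` and `failover_max_duration_ms=30000` while replacing the other two values.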

Connect/error logs now emit the currently-bound endpoint, not the first configured host.

@kafka1991 kafka1991 changed the title feat(qwp): ingress support failover feat(ilp): ingress support failover May 6, 2026
@kafka1991 kafka1991 changed the title feat(ilp): ingress support failover feat(qwp): multi-host ws ingress + egress failover hardening May 6, 2026
@kafka1991 kafka1991 changed the title feat(qwp): multi-host ws ingress + egress failover hardening feat(ilp): multi-host ws ingress + egress failover hardening May 6, 2026