Conversation

@BourgoisMickael
Contributor

Now fail if ALL backends of any one client fail.

Previously, the deep healthcheck would fail if ALL backends/locations
failed globally across all clients (data, metadata, vault, kms).

This change modifies the logic to fail if ANY client has ALL its
backends/locations failing. This ensures:

  1. For a data backend with multiple sproxyd location constraints:

    • Returns HTTP 200 if at least ONE location is healthy
    • Returns HTTP 500 only if ALL locations fail
  2. Each client (data, metadata, vault, kms) is evaluated independently

    • If ALL locations of the data client fail, overall check fails
    • If ALL locations of metadata fail, overall check fails
    • etc.

The new logic uses:

  • results.some() to check across clients
  • keys.every() within each client to check all its locations
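
For illustration, here is a minimal sketch of that check, assuming `results` is an array of per-client result objects keyed by backend/location name (the function name and result shape are illustrative, not the exact code in lib/utilities/healthcheckHandler.js):

```js
// results: one entry per client (data, metadata, vault, kms);
// each entry maps a backend/location name to its healthcheck result.
function deepHealthcheckFailed(results) {
    return results.some(clientResult => {
        const keys = Object.keys(clientResult);
        // Skip clients that expose no backends at all.
        if (keys.length === 0) {
            return false;
        }
        // One client with ALL of its backends/locations in error is
        // enough to fail the whole deep healthcheck.
        return keys.every(key => clientResult[key].error);
    });
}
```

With this shape, a single healthy sproxyd location keeps the response at HTTP 200, while a fully failed metadata, vault, or kms client alone is enough to return HTTP 500.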

@bert-e
Contributor

bert-e commented Jan 20, 2026

Hello bourgoismickael,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

  • /after_pull_request: Wait for the given pull request id to be merged before continuing with the current one.
  • /bypass_author_approval: Bypass the pull request author's approval
  • /bypass_build_status: Bypass the build and test status
  • /bypass_commit_size: Bypass the check on the size of the changeset (TBA)
  • /bypass_incompatible_branch: Bypass the check on the source branch prefix
  • /bypass_jira_check: Bypass the Jira issue check
  • /bypass_peer_approval: Bypass the pull request peers' approval
  • /bypass_leader_approval: Bypass the pull request leaders' approval
  • /approve: Instruct Bert-E that the author has approved the pull request. ✍️
  • /create_pull_requests: Allow the creation of integration pull requests.
  • /create_integration_branches: Allow the creation of integration branches.
  • /no_octopus: Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
  • /unanimity: Change review acceptance criteria from one reviewer at least to all reviewers
  • /wait: Instruct Bert-E not to run until further notice.

Available commands

  • /help: Print Bert-E's manual in the pull request.
  • /status: Print Bert-E's current status in the pull request (TBA)
  • /clear: Remove all comments from Bert-E from the history (TBA)
  • /retry: Re-start a fresh build (TBA)
  • /build: Re-start a fresh build (TBA)
  • /force_reset: Delete integration branches & pull requests, and restart merge process from the beginning.
  • /reset: Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Contributor

bert-e commented Jan 20, 2026

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@codecov

codecov bot commented Jan 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.39%. Comparing base (9b02185) to head (ab3b7be).
⚠️ Report is 13 commits behind head on development/9.2.
✅ All tests successful. No failed tests found.

Additional details and impacted files


Files with missing lines Coverage Δ
lib/utilities/healthcheckHandler.js 92.45% <100.00%> (+2.25%) ⬆️

... and 1 file with indirect coverage changes

@@                 Coverage Diff                 @@
##           development/9.2    #6055      +/-   ##
===================================================
- Coverage            84.41%   84.39%   -0.03%     
===================================================
  Files                  206      206              
  Lines                13016    13018       +2     
===================================================
- Hits                 10987    10986       -1     
- Misses                2029     2032       +3     
Flag Coverage Δ
file-ft-tests 67.45% <100.00%> (+<0.01%) ⬆️
kmip-ft-tests 28.12% <100.00%> (+0.01%) ⬆️
mongo-v0-ft-tests 68.70% <100.00%> (+<0.01%) ⬆️
mongo-v1-ft-tests 68.69% <100.00%> (+<0.01%) ⬆️
multiple-backend 35.29% <100.00%> (+<0.01%) ⬆️
sur-tests 36.39% <100.00%> (+<0.01%) ⬆️
sur-tests-inflights 37.40% <100.00%> (-0.03%) ⬇️
unit 69.98% <100.00%> (+0.01%) ⬆️
utapi-v2-tests 34.29% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.


Copilot AI left a comment

Pull request overview

This PR fixes the deep healthcheck logic to fail when ANY client has ALL its backends/locations failing, rather than only failing when ALL backends across ALL clients fail. This ensures better detection of client-specific failures, particularly for multi-location data backends.

Changes:

  • Modified the failure detection logic from checking all backends globally to checking each client independently
  • Added empty client handling to skip clients with no backends


Contributor

@anurag4DSB left a comment

This behaviour needs to be documented in the official documentation so customers and CS know what to expect, and release notes are a must for this change in behaviour.

@francoisferrand
Contributor

This behaviour needs to be documented in the official documentation so customers and CS know what to expect, and release notes are a must for this change in behaviour.

  • Does this actually change behavior in S3C? When we discussed this, our understanding was that it actually does not change the behavior (since each "client" returns a single result), so we would just go back to the existing behavior of S3C.
  • The case with multiple results is the multi-location backend in Zenko, which will indeed return one result per location. In that case, the behavior did not change either with respect to that client.
  • In the end, we expect the only change of behavior (at product level) is that in Artesca, cloudserver may now fail the healthcheck when either every vault or every mongodb is dead (since k8s handles the routing), instead of only when every vault and every mongo is down (as today). So the change is really just a corner case, which nobody will ever notice in Artesca IMHO.
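
To make the comparison concrete, here is a hedged illustration of the two result shapes being discussed (keys and values are made up for the example, not the actual healthcheck payload):

```js
// S3C-style: each client reports a single result, so "all of its
// backends failing" is the same as "its one result failing".
const s3cResults = [
    { sproxyd: { error: null } },   // data
    { bucketd: { error: null } },   // metadata
];

// Zenko-style multi-location data client: one entry per location.
// The data client only counts as failed if every location is in error.
const zenkoResults = [
    {
        'us-east-1': { error: null },
        'eu-west-1': { error: new Error('down') },
    },                              // data
    { mongodb: { error: null } },   // metadata
];
```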

done();
});
});

Contributor

For completeness, should we have a test as well for when metadata or vault fails?

Contributor Author

I don't think so; all clients are handled generically, so there are no more cases to test for specific clients.
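
For reference, a self-contained sketch of how that generic behaviour could be unit-tested against a predicate like the deepHealthcheckFailed function sketched above (mocha-style; the data and names are illustrative, not the project's actual test code):

```js
const assert = require('assert');
// deepHealthcheckFailed: the illustrative predicate from the sketch above.

describe('deep healthcheck failure predicate (illustrative)', () => {
    it('stays healthy while at least one data location is up', () => {
        const results = [
            { loc1: { error: new Error('down') }, loc2: { error: null } }, // data
            { mongodb: { error: null } },                                  // metadata
        ];
        assert.strictEqual(deepHealthcheckFailed(results), false);
    });

    it('fails when ALL locations of a single client (e.g. metadata) are down', () => {
        const results = [
            { loc1: { error: null } },                  // data healthy
            { mongodb: { error: new Error('down') } },  // metadata fully down
        ];
        assert.strictEqual(deepHealthcheckFailed(results), true);
    });
});
```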

@BourgoisMickael force-pushed the bugfix/CLDSRV-836-deep-healthcheck branch 2 times, most recently from ab3b7be to 6dd43a5 on January 22, 2026 16:02
@BourgoisMickael
Contributor Author

/approve

@bert-e
Contributor

bert-e commented Jan 22, 2026

Integration data created

I have created the integration data for the additional destination branches.

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: approve

@bert-e
Contributor

bert-e commented Jan 22, 2026

Build failed

The build for commit did not succeed in branch w/9.3/bugfix/CLDSRV-836-deep-healthcheck

The following options are set: approve

@BourgoisMickael
Contributor Author

ping

@scality deleted a comment from bert-e on Jan 22, 2026
@bert-e
Contributor

bert-e commented Jan 22, 2026

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/9.2

  • ✔️ development/9.3

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Jan 22, 2026

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/9.2

  • ✔️ development/9.3

The following branches have NOT changed:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1

Please check the status of the associated issue CLDSRV-836.

Goodbye bourgoismickael.

@bert-e merged commit 00c52d9 into development/9.2 on Jan 22, 2026
55 checks passed
@bert-e deleted the bugfix/CLDSRV-836-deep-healthcheck branch on January 22, 2026 18:53