Conversation

@BourgoisMickael
Contributor

Now fail if ALL backends of any one client fail.

Previously, the deep healthcheck would fail if ALL backends/locations
failed globally across all clients (data, metadata, vault, kms).

This change modifies the logic to fail if ANY client has ALL its
backends/locations failing. This ensures:

  1. For a data backend with multiple sproxyd location constraints:

    • Returns HTTP 200 if at least ONE location is healthy
    • Returns HTTP 500 only if ALL locations fail
  2. Each client (data, metadata, vault, kms) is evaluated independently

    • If ALL locations of the data client fail, overall check fails
    • If ALL locations of metadata fail, overall check fails
    • etc.

The new logic uses:

  • results.some() to check across clients
  • keys.every() within each client to check all its locations
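
For illustration, here is a minimal sketch of that check, assuming `results` is an array of per-client result objects keyed by backend/location name (the function name and result shape are illustrative, not the exact code in lib/utilities/healthcheckHandler.js):

```js
// results: one entry per client (data, metadata, vault, kms);
// each entry maps a backend/location name to its healthcheck result.
function deepHealthcheckFailed(results) {
    return results.some(clientResult => {
        const keys = Object.keys(clientResult);
        // Skip clients that expose no backends at all.
        if (keys.length === 0) {
            return false;
        }
        // One client with ALL of its backends/locations in error is
        // enough to fail the whole deep healthcheck.
        return keys.every(key => clientResult[key].error);
    });
}
```

With this shape, a single healthy sproxyd location keeps the response at HTTP 200, while a fully failed metadata, vault, or kms client alone is enough to return HTTP 500.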

@bert-e
Contributor

bert-e commented Jan 20, 2026

Hello bourgoismickael,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

  • /after_pull_request: Wait for the given pull request id to be merged before continuing with the current one.
  • /bypass_author_approval: Bypass the pull request author's approval
  • /bypass_build_status: Bypass the build and test status
  • /bypass_commit_size: Bypass the check on the size of the changeset (TBA)
  • /bypass_incompatible_branch: Bypass the check on the source branch prefix
  • /bypass_jira_check: Bypass the Jira issue check
  • /bypass_peer_approval: Bypass the pull request peers' approval
  • /bypass_leader_approval: Bypass the pull request leaders' approval
  • /approve: Instruct Bert-E that the author has approved the pull request. ✍️
  • /create_pull_requests: Allow the creation of integration pull requests.
  • /create_integration_branches: Allow the creation of integration branches.
  • /no_octopus: Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
  • /unanimity: Change review acceptance criteria from one reviewer at least to all reviewers
  • /wait: Instruct Bert-E not to run until further notice.

Available commands

  • /help: Print Bert-E's manual in the pull request.
  • /status: Print Bert-E's current status in the pull request (TBA)
  • /clear: Remove all comments from Bert-E from the history (TBA)
  • /retry: Re-start a fresh build (TBA)
  • /build: Re-start a fresh build (TBA)
  • /force_reset: Delete integration branches & pull requests, and restart merge process from the beginning.
  • /reset: Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Contributor

bert-e commented Jan 20, 2026

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@codecov

codecov bot commented Jan 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.39%. Comparing base (9b02185) to head (ab3b7be).
⚠️ Report is 13 commits behind head on development/9.2.
✅ All tests successful. No failed tests found.

Additional details and impacted files


Files with missing lines Coverage Δ
lib/utilities/healthcheckHandler.js 92.45% <100.00%> (+2.25%) ⬆️

... and 1 file with indirect coverage changes

@@                 Coverage Diff                 @@
##           development/9.2    #6055      +/-   ##
===================================================
- Coverage            84.41%   84.39%   -0.03%     
===================================================
  Files                  206      206              
  Lines                13016    13018       +2     
===================================================
- Hits                 10987    10986       -1     
- Misses                2029     2032       +3     
Flag Coverage Δ
file-ft-tests 67.45% <100.00%> (+<0.01%) ⬆️
kmip-ft-tests 28.12% <100.00%> (+0.01%) ⬆️
mongo-v0-ft-tests 68.70% <100.00%> (+<0.01%) ⬆️
mongo-v1-ft-tests 68.69% <100.00%> (+<0.01%) ⬆️
multiple-backend 35.29% <100.00%> (+<0.01%) ⬆️
sur-tests 36.39% <100.00%> (+<0.01%) ⬆️
sur-tests-inflights 37.40% <100.00%> (-0.03%) ⬇️
unit 69.98% <100.00%> (+0.01%) ⬆️
utapi-v2-tests 34.29% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.


Copilot AI left a comment

Pull request overview

This PR fixes the deep healthcheck logic to fail when ANY client has ALL its backends/locations failing, rather than only failing when ALL backends across ALL clients fail. This ensures better detection of client-specific failures, particularly for multi-location data backends.

Changes:

  • Modified the failure detection logic from checking all backends globally to checking each client independently
  • Added empty client handling to skip clients with no backends


Contributor

@anurag4DSB left a comment

This behaviour needs to be documented in the official documentation so customers and CS know what to expect, and release notes are a must for this change in behaviour.

@francoisferrand
Contributor

This behaviour needs to be documented in the official documentation so customers and CS know what to expect, and release notes are a must for this change in behaviour.

  • Does this actually change behavior in S3C? When we discussed this, our understanding was that it actually does not change the behavior (since each "client" returns a single result), so we would just go back to the existing behavior of S3C.
  • The case with multiple results is the multi-location backend in Zenko, which will indeed return one result per location. In that case, the behavior did not change either with respect to that client.
  • In the end, we expect the only change of behavior (at product level) is that in Artesca, cloudserver may now fail the healthcheck when either every vault or every mongodb is dead (since k8s handles the routing), instead of only when every vault and every mongo is down (as today). So the change is really just a corner case, which nobody will ever notice in Artesca IMHO.
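
To make the comparison concrete, here is a hedged illustration of the two result shapes being discussed (keys and values are made up for the example, not the actual healthcheck payload):

```js
// S3C-style: each client reports a single result, so "all of its
// backends failing" is the same as "its one result failing".
const s3cResults = [
    { sproxyd: { error: null } },   // data
    { bucketd: { error: null } },   // metadata
];

// Zenko-style multi-location data client: one entry per location.
// The data client only counts as failed if every location is in error.
const zenkoResults = [
    {
        'us-east-1': { error: null },
        'eu-west-1': { error: new Error('down') },
    },                              // data
    { mongodb: { error: null } },   // metadata
];
```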

done();
});
});

Contributor

For completeness, should we have a test as well for when metadata or vault fails?

Contributor Author

I don't think so; all clients are handled generically, so there are no more cases to test for specific clients.
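
For reference, a self-contained sketch of how that generic behaviour could be unit-tested against a predicate like the deepHealthcheckFailed function sketched above (mocha-style; the data and names are illustrative, not the project's actual test code):

```js
const assert = require('assert');
// deepHealthcheckFailed: the illustrative predicate from the sketch above.

describe('deep healthcheck failure predicate (illustrative)', () => {
    it('stays healthy while at least one data location is up', () => {
        const results = [
            { loc1: { error: new Error('down') }, loc2: { error: null } }, // data
            { mongodb: { error: null } },                                  // metadata
        ];
        assert.strictEqual(deepHealthcheckFailed(results), false);
    });

    it('fails when ALL locations of a single client (e.g. metadata) are down', () => {
        const results = [
            { loc1: { error: null } },                  // data healthy
            { mongodb: { error: new Error('down') } },  // metadata fully down
        ];
        assert.strictEqual(deepHealthcheckFailed(results), true);
    });
});
```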

@BourgoisMickael force-pushed the bugfix/CLDSRV-836-deep-healthcheck branch 2 times, most recently from ab3b7be to 6dd43a5 on January 22, 2026 16:02
@BourgoisMickael
Contributor Author

/approve

@bert-e
Contributor

bert-e commented Jan 22, 2026

Integration data created

I have created the integration data for the additional destination branches.

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: approve

@bert-e
Contributor

bert-e commented Jan 22, 2026

Build failed

The build for commit did not succeed in branch w/9.3/bugfix/CLDSRV-836-deep-healthcheck

The following options are set: approve

@BourgoisMickael
Contributor Author

ping

@scality deleted a comment from bert-e on Jan 22, 2026
@bert-e
Contributor

bert-e commented Jan 22, 2026

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/9.2

  • ✔️ development/9.3

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Jan 22, 2026

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/9.2

  • ✔️ development/9.3

The following branches have NOT changed:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1

Please check the status of the associated issue CLDSRV-836.

Goodbye bourgoismickael.

@bert-e merged commit 00c52d9 into development/9.2 on Jan 22, 2026
55 checks passed
@bert-e deleted the bugfix/CLDSRV-836-deep-healthcheck branch on January 22, 2026 18:53