
fix: correct cache client status reporting#5793

Open
CAICAIIs wants to merge 5 commits into fluid-cloudnative:master from CAICAIIs:cache-client-ready-fix


@CAICAIIs CAICAIIs commented Apr 17, 2026

Ⅰ. Describe what this PR does

This PR corrects cache runtime client status reporting.

It updates client phase handling so the client is reported as Ready, PartialReady, or NotReady more accurately, including the zero-desired-replica case.
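
As a rough illustration of the phase handling described above, the mapping from replica counts to a phase could look like the sketch below. The `Phase` type, its constants, and the `clientPhase` helper are hypothetical names chosen for this example, not the identifiers used in `status.go`. Treating zero desired replicas as Ready is also an assumption, since the description does not say which phase that case maps to.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the phase type used by the runtime status.
type Phase string

const (
	PhaseNotReady     Phase = "NotReady"
	PhasePartialReady Phase = "PartialReady"
	PhaseReady        Phase = "Ready"
)

// clientPhase maps desired and ready replica counts to a phase.
// Assumption: zero desired replicas is reported as Ready because there is
// nothing left to wait for. The PR text does not confirm this choice.
func clientPhase(desired, ready int32) Phase {
	switch {
	case desired == 0:
		return PhaseReady
	case ready == 0:
		return PhaseNotReady
	case ready >= desired:
		// >= rather than == so a transient surplus of ready replicas
		// is not reported as PartialReady.
		return PhaseReady
	default:
		return PhasePartialReady
	}
}

func main() {
	cases := [][2]int32{{0, 0}, {3, 0}, {3, 1}, {3, 3}}
	for _, c := range cases {
		fmt.Printf("desired=%d ready=%d phase=%s\n", c[0], c[1], clientPhase(c[0], c[1]))
	}
}
```

Every combination of counts falls into exactly one branch, so the phase field is never left unset.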

It also keeps runtime readiness and dataset binding based on master and worker readiness, and adds unit tests to cover the related status transitions and retry behavior.

Ⅱ. Does this pull request fix one issue?

NONE

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Added unit tests for:

  • client not ready
  • client partial ready
  • client zero desired replicas
  • client fully ready
  • runtime readiness recomputation on retry

Ⅳ. Describe how to verify it

Run:

  • go test ./pkg/ddc/cache/engine

Ⅴ. Special notes for reviews

This PR only updates cache client status handling and related test coverage.

Signed-off-by: CAICAIIs <3360776475@qq.com>
Signed-off-by: CAICAIIs <3360776475@qq.com>

fluid-e2e-bot bot commented Apr 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cheyang for approval by writing /assign @cheyang in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


fluid-e2e-bot bot commented Apr 17, 2026

Hi @CAICAIIs. Thanks for your PR.

I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the runtime status logic to incorporate the client component's readiness into the overall runtime state. Specifically, setClientComponentStatus now returns a boolean indicating readiness, and CheckAndUpdateRuntimeStatus has been updated to aggregate this value. New unit tests were also introduced to cover various readiness scenarios. Feedback suggests improving the robustness of the replica comparison by using >= instead of == and simplifying the conditional logic for determining the component phase.

Comment thread pkg/ddc/cache/engine/status.go Outdated
Signed-off-by: CAICAIIs <3360776475@qq.com>

xliuqq commented Apr 18, 2026

@Syspretor @cheyang Does the fuse client affect dataset binding?


Copilot AI left a comment


Pull request overview

This PR tightens cache runtime readiness evaluation so that datasets are only bound once the client component is fully ready, and adds unit tests covering the corrected behavior.

Changes:

  • Update cache runtime readiness logic to include client readiness (and require all client replicas ready) before considering the runtime ready.
  • Adjust client component status calculation to explicitly set NotReady/PartialReady/Ready phases based on desired vs ready replicas.
  • Add unit tests validating runtime readiness and setup duration behavior across client not-ready/partial-ready/ready transitions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/ddc/cache/engine/status.go | Includes client readiness in overall runtime readiness gating; updates client status phase/ready computation. |
| pkg/ddc/cache/engine/status_test.go | Adds unit tests ensuring runtime isn’t marked ready (and setup duration not set) until client is fully ready. |


Comment thread pkg/ddc/cache/engine/status.go Outdated
Comment on lines 125 to 127
    if masterReady && workerReady && clientReady {
        runtimeReady = true
    } else {

Copilot AI Apr 19, 2026


runtimeReady is mutated inside the RetryOnConflict closure but never reset to false when the readiness condition is not met. If the first attempt sets runtimeReady=true and a subsequent retry observes not-ready components, runtimeReady will remain true, which can incorrectly record SetupDuration and cause callers to treat the runtime as ready. Compute runtimeReady from the current masterReady/workerReady/clientReady on every retry attempt (e.g., assign runtimeReady = masterReady && workerReady && clientReady unconditionally, or reset it at the top of each closure invocation).

Suggested change
    - if masterReady && workerReady && clientReady {
    -     runtimeReady = true
    - } else {
    + runtimeReady = masterReady && workerReady && clientReady
    + if !runtimeReady {
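
The stale-flag problem described in this comment can be reproduced in isolation. The sketch below is only an illustration: `retryOnConflict` is a simplified stand-in for client-go's `retry.RetryOnConflict`, and `observation`, `runStale`, and `runRecomputed` are hypothetical names. The single `componentsReady` field stands for the combined readiness condition seen by one attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict simulates an update conflict that triggers another attempt.
var errConflict = errors.New("conflict")

// retryOnConflict is a simplified stand-in for client-go's retry.RetryOnConflict:
// it re-runs fn while fn returns errConflict, up to maxAttempts times.
func retryOnConflict(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// observation is what a single attempt sees: whether the components are
// ready, and the error returned by the (simulated) status update.
type observation struct {
	componentsReady bool
	err             error
}

// runStale mirrors the flagged pattern: the flag is only ever set to true.
func runStale(obs []observation) bool {
	runtimeReady := false
	i := 0
	_ = retryOnConflict(len(obs), func() error {
		o := obs[i]
		i++
		if o.componentsReady {
			runtimeReady = true
		}
		return o.err
	})
	return runtimeReady
}

// runRecomputed assigns the flag unconditionally on every attempt.
func runRecomputed(obs []observation) bool {
	runtimeReady := false
	i := 0
	_ = retryOnConflict(len(obs), func() error {
		o := obs[i]
		i++
		runtimeReady = o.componentsReady
		return o.err
	})
	return runtimeReady
}

func main() {
	// First attempt sees ready components but hits a conflict;
	// the retry sees not-ready components and succeeds.
	obs := []observation{{true, errConflict}, {false, nil}}
	fmt.Println("stale:", runStale(obs))
	fmt.Println("recomputed:", runRecomputed(obs))
}
```

In this scenario the stale variant still reports ready after the retry, while the recomputed variant reflects what the last attempt observed.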

Comment thread pkg/ddc/cache/engine/status.go Outdated

codecov bot commented Apr 19, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 58.67%. Comparing base (61b67ae) to head (e15c9e7).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/ddc/cache/engine/status.go | 92.85% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5793      +/-   ##
==========================================
+ Coverage   58.46%   58.67%   +0.20%     
==========================================
  Files         473      473              
  Lines       32222    32224       +2     
==========================================
+ Hits        18839    18906      +67     
+ Misses      11836    11757      -79     
- Partials     1547     1561      +14     



cheyang commented Apr 19, 2026

Replying to @xliuqq's question: Fuse client should NOT affect dataset binding. The dataset should become Bound once master and worker are ready, regardless of client readiness.

This means the current PR direction — gating runtimeReady on clientReady — is incorrect. The client phase should be reported in the status for observability, but it must not block the runtime from becoming ready or the dataset from being bound.

The PR does fix real bugs in setClientComponentStatus (missing phase when DesiredReplicas==0 or ReadyReplicas==0), and the refactored phase logic is correct for status reporting. However, clientReady should be removed from the runtimeReady condition in CheckAndUpdateRuntimeStatus.

Suggested approach:

  1. Keep the setClientComponentStatus signature change returning (ready bool, err) — useful for status display
  2. Keep the fixed phase branching logic (covers all states correctly)
  3. Remove clientReady from runtimeReady gating — revert to masterReady && workerReady
  4. Keep unit tests but update them to reflect that runtime becomes ready regardless of client state
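
The suggested approach can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in `status.go`: the `summary` struct and the `summarize` function are hypothetical names. Client readiness is recorded for display, and only master and worker readiness decide whether the runtime is ready.

```go
package main

import "fmt"

// summary is a hypothetical container for the values a status update would
// record. It is not the actual runtime status type.
type summary struct {
	ClientReady  bool // reported for observability only
	RuntimeReady bool // gates setup duration and dataset binding
}

// summarize records client readiness but derives runtime readiness from
// master and worker readiness alone, as proposed in the review comment.
func summarize(masterReady, workerReady, clientReady bool) summary {
	return summary{
		ClientReady:  clientReady,
		RuntimeReady: masterReady && workerReady,
	}
}

func main() {
	// Client not ready: the runtime is still ready, so the dataset can bind.
	fmt.Printf("%+v\n", summarize(true, true, false))
	// Worker not ready: the runtime is not ready, whatever the client state.
	fmt.Printf("%+v\n", summarize(true, false, true))
}
```

Keeping the two values separate lets the status show a lagging client without delaying dataset binding.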

@CAICAIIs — would you like to adjust the PR accordingly?

@CAICAIIs (Contributor, Author) replied, quoting @cheyang:

> would you like to adjust the PR accordingly

Thanks, this clarifies the intended semantics.

I agree the current gating is wrong: client/fuse readiness must not block runtime ready or dataset binding. I’ll revise the PR so it only fixes client status reporting and test coverage, and I’ll remove clientReady from the runtimeReady condition.

Signed-off-by: CAICAIIs <3360776475@qq.com>
@CAICAIIs CAICAIIs force-pushed the cache-client-ready-fix branch from bfc0875 to 50327b6 Compare April 19, 2026 15:31
Signed-off-by: CAICAIIs <3360776475@qq.com>

@CAICAIIs CAICAIIs changed the title from "fix: require full client readiness before binding dataset" to "fix: correct cache client status reporting" on Apr 19, 2026