Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Continue processing PVs for network fencing when no node IPs found #13768

Merged
merged 3 commits into from Feb 14, 2024

Conversation

Madhu-1
Copy link
Member

@Madhu-1 Madhu-1 commented Feb 14, 2024

The cephfs PVC might exist on the kubernetes node object but due to some timing issues the ip might not be visible on the ceph cluster or the client might have already been evicted or disconnected from the ceph cluster. In this case, we will not be able to get IP details for the subvolume and we don't have any check for empty ip's in the code rook tries to create NetworkFence CR with an empty Ip's, and the NetworkFence will get moved to the Failed state. This PR adds the necessary checks and logging to prevent this one.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

The cephfs PVC might exist on the kubernetes
node object but due to some timing issues the
ip might not be visible on the ceph cluster or
the client might already got evicted or disconnected
from ceph cluster. In this case we will not be
able to get IP details for the subvolume and we dont
have any check for empty ip's in the code and rook
tries to create NetworkFence CR with empty Ip's
and the NetworkFence will get moved to the Failed state.
This PR adds the necessary check and logging to prevent
this one.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
to keep the code consistent and to propogate
more details about error rook uses error wraping.
updating current code whereever its required to
wrap the details.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>

if len(ips) == 0 {
logger.Infof("no active mds clients found for cephfs volume %q", cephFSPV.Name)
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By returning nil, shouldn't this mean the fencing was successful? But the fencing is being skipped. How will the fencing be retried in the future if we are returning nil? Don't we need to return an error so the fencing will try again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes true, we are doing the same for RBD

if len(ips) != 0 {
err = c.createNetworkFence(ctx, rbdPV, node, cluster, ips, rbdDriver)
if err != nil {
return pkgerror.Wrapf(err, "failed to create network fence for node %q", node.Name)
}
}
, if we don't any client using that subvolume/rbdimage we are assuming that no client is connected and no need for blocklisting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both rbd and cephfs, the calling method is in a for loop. If nil is returned, the for loop breaks out. In the case of no IPs found, should the for loop keep iterating through other PVs instead of aborting? Or all PVs will have the same result of no IPs found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are correct we should try for other PVC, added new comment for introducing new check. PTAL

@Madhu-1 Madhu-1 requested a review from travisn February 14, 2024 17:51
Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a whitespace suggestion

if len(ips) == 0 {
logger.Infof("no active rbd clients found for rbd volume %q", rbdPV.Name)
return errActiveClientNotFound

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove blank line

retry other cephfs/rbd pvc if there
are no active clients found on ceph cluster.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Comment on lines +248 to +251
// continue to fence next rbd volume if active client not found
if stderrors.Is(err, errActiveClientNotFound) {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR title says cephfs. This seems to be operating on RBD PVs. Is this intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR was originally just for cephfs, but also expanded for rbd. I've updated the title.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlaineEXE sorry i missed the ping, Thanks @travisn for updating it

@travisn travisn changed the title core: fix cephfs pvc network fencing core: Continue processing PVs for network fencing when no node IPs found Feb 14, 2024
@travisn travisn merged commit 8ec85dc into rook:master Feb 14, 2024
50 of 51 checks passed
mergify bot added a commit that referenced this pull request Feb 14, 2024
core: Continue processing PVs for network fencing when no node IPs found (backport #13768)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants