Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator: load cluster owner info in LoadClusterInfo #14079

Merged

Conversation

BlaineEXE
Copy link
Member

@BlaineEXE BlaineEXE commented Apr 16, 2024

The CreateOrLoadClusterInfo (and therefore LoadClusterInfo) methods were not loading the ClusterInfo.OwnerInfo. This should help future CRD controllers get the full ClusterInfo struct without having missing information. Current controllers that need ClusterInfo fill the field themselves.

During testing, I observed one corner case where an upgraded cluster was missing the rook-ceph-csi-config configmap. The cluster had a CephFilesystemSubVolumeGroup resource created, and the reconcile for that resource was attempting to create the missing CSI configmap and failing with a nil pointer exception due to the missing OwnerInfo field in ClusterInfo. This cluster condition hasn't been reproduced in healthy environments, and it is unknown how the CSI configmap came to be missing. However, the case did expose the missing loaded info as a potential for causing nil pointer exceptions during corner cases when code is otherwise correct.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

The CreateOrLoadClusterInfo (and therefore LoadClusterInfo) methods were
not loading the ClusterInfo.OwnerInfo. This should help future CRD
controllers get the full ClusterInfo struct without having missing
information. Current controllers that need ClusterInfo fill the field
themselves.

During testing, I observed one corner case where an upgraded cluster was
missing the `rook-ceph-csi-config` configmap. The cluster had a
CephFilesystemSubVolumeGroup resource created, and the reconcile for
that resource was attempting to create the missing CSI configmap and
failing with a nil pointer exception due to the missing OwnerInfo field
in ClusterInfo. This cluster condition hasn't been reproduced in healthy
environments, and it is unknown how the CSI configmap came to be
missing. However, the case did expose the missing loaded info as a
potential for causing nil pointer exceptions during corner cases when
code is otherwise correct.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
@BlaineEXE
Copy link
Member Author

I don't anticipate this showing up in user environments because the corner case here was the missing rook-ceph-csi-config configmap, which should be present in all Rook clusters for the last 5 years. I really have no idea how the user's (test) environment came to lose that configmap, but I suspect it to be an accident or an issue of installing and updating too rapidly.

But it seemed like something that would be a good fix anyway for the sake of developers. I've had nil pointer issues in the past that end up being caused by Rook only partially completing struct information during runtime. It's always tedious and annoying, and I think it's worth trying to make that less common for devs in the future.

@@ -141,6 +141,9 @@ func CreateOrLoadClusterInfo(clusterdContext *clusterd.Context, context context.
} else {
return nil, maxMonID, monMapping, errors.New("failed to find either the cluster admin key or the username")
}
if len(secrets.OwnerReferences) > 0 {
clusterInfo.OwnerInfo = k8sutil.NewOwnerInfoWithOwnerRef(&secrets.GetOwnerReferences()[0], namespace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems safe to assume there is always an ownerref on this secret, like you said we have had it there for a long time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think it should be safe because of this code above:

	secrets, err := clusterdContext.Clientset.CoreV1().Secrets(namespace).Get(context, AppName, metav1.GetOptions{})
	if err != nil {
		if !kerrors.IsNotFound(err) {
			return nil, maxMonID, monMapping, errors.Wrap(err, "failed to get mon secrets")
		}
		if ownerInfo == nil {
			return nil, maxMonID, monMapping, ClusterInfoNoClusterNoSecret
		}

When a CephCluster is first created and the Secret is initially created and populated (by initClusterInfo()/PopulateExternalClusterInfo()), the secret shouldn't get created unless there is a CephCluster given to act as the owner.

@travisn travisn merged commit a9fded2 into rook:master Apr 16, 2024
52 of 53 checks passed
@BlaineEXE BlaineEXE deleted the load-cluster-owner-info-in-LoadClusterInfo branch April 16, 2024 22:05
satoru-takeuchi added a commit that referenced this pull request Apr 16, 2024
operator: load cluster owner info in LoadClusterInfo (backport #14079)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants