
csi: fix missing namespace bug in csi cluster config map #14154

Merged

Conversation


@BlaineEXE BlaineEXE commented May 2, 2024

Someone testing the Multus holder pod removal feature encountered an issue where the migration process failed to lead to a system state where PVCs could be created successfully.

The root cause was found to be a ceph csi config map wherein the primary CephCluster entry was lacking a value for the "namespace" field.

I observed this once in my development on the holder pod removal feature, but I was unable to reproduce it and assumed it was my own error. Since this has now been seen in a user environment, the error must be a race condition, and I am unable to determine the exact source of the bug.

I do not believe this bug would be present if the code that updates the CSI configmap were properly idempotent, but it has many conditions based on prior states, and I was unable to determine how to resolve this underlying implementation pattern issue.

Instead, I opted to separate the clusterKey parameter into two clear parts:

  1. clusterID for when clusterKey is used as an analogue for clusterID
  2. clusterNamespace for when clusterKey is used as an analogue for clusterNamespace

I added unit tests to ensure that SaveClusterConfig() can detect when the namespace is missing and, using the new clusterNamespace parameter, always knows what value to use when correcting the bug in already-installed systems.
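
To illustrate the repair behavior, here is a minimal, self-contained Go sketch. It uses a simplified entry type and a hypothetical fixMissingNamespace() helper, not the actual Rook types or the internals of SaveClusterConfig():

```go
package main

import (
	"encoding/json"
	"fmt"
)

// csiClusterConfigEntry is a pared-down stand-in for the real config map entry.
type csiClusterConfigEntry struct {
	ClusterID string `json:"clusterID"`
	Namespace string `json:"namespace"`
}

// fixMissingNamespace fills in an empty "namespace" field on the entry whose
// clusterID matches, which is the repair needed for already-installed systems.
func fixMissingNamespace(entries []csiClusterConfigEntry, clusterID, clusterNamespace string) []csiClusterConfigEntry {
	for i := range entries {
		if entries[i].ClusterID == clusterID && entries[i].Namespace == "" {
			entries[i].Namespace = clusterNamespace
		}
	}
	return entries
}

func main() {
	// Same shape as the buggy config map data shown further below: namespace is empty.
	raw := `[{"clusterID":"rook-ceph","namespace":""}]`
	var entries []csiClusterConfigEntry
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		panic(err)
	}
	out, _ := json.Marshal(fixMissingNamespace(entries, "rook-ceph", "rook-ceph"))
	fmt.Println(string(out)) // [{"clusterID":"rook-ceph","namespace":"rook-ceph"}]
}
```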

I also verified that this update works when the function simultaneously removes netNamespaceFilePath entries, and that those entries are removed properly.

Finally, manual testing also verifies the change.


If users find that PVCs don't work after following the steps to remove Multus holder pods, they should upgrade to the upcoming Rook v1.14.3 to get this bug fix, which should resolve the issue. Users can determine whether they are affected by this bug at any time using this command:

kubectl -n rook-ceph get cm rook-ceph-csi-config -oyaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["10.104.66.98:6789"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]},"namespace":""}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-05-02T19:37:37Z"
  name: rook-ceph-csi-config
  namespace: rook-ceph
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: 7359139c-4149-4fba-8096-d77bfe018800
  resourceVersion: "2750"
  uid: a91c133b-5883-4745-9c0d-f2fc864b8b03

In the example output above, notice that the final entry in the config data shows "namespace":"". This means the bug is present in this cluster. Users should not follow the steps to disable holder pods until the issue is resolved. Users can manually resolve the issue by editing the configmap and inserting the CephCluster namespace into the value, like this: "namespace":"rook-ceph" (assuming the namespace is rook-ceph).

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

-centry.ClusterID = clusterKey
-centry.Namespace = newCsiClusterConfigEntry.Namespace
+centry.ClusterID = clusterID
+centry.Namespace = clusterNamespace
BlaineEXE (Member, author):

I can't confirm this, but I suspect that the root cause of this issue is that there is a place in the code where (currently or in the past) newCsiClusterConfigEntry.Namespace was empty, which led the CSI configmap to have an empty value for the field.

Then, because this value isn't modified or updated after the initial creation, the issue persisted forever.

This behavior is non-idempotent, and I want to stress that operator code should always be idempotent (and simple) with very few exceptions. Non-idempotent code in operators is prone to bugs like these.

The only exception should be when an idempotent implementation adds significant latency (more than a second) to reconcile loops.

I would suggest that the SaveClusterConfig() method should be reworked to focus on simple, idempotent logic.

I also notice that this function is doing two (or more) jobs, which also contributes to complexity and risk of bugs. I suggest splitting it into SaveClusterConfig() and RemoveClusterConfig() so that the add and remove behaviors don't add unnecessary test cases (and room for bugs).
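
As a rough sketch of what that split could look like, reusing the simplified csiClusterConfigEntry type from the sketch above (the function names are suggestions, not the current Rook API):

```go
// saveEntry adds or overwrites the entry for a cluster ID. Writing the full
// desired entry unconditionally makes the result independent of prior state.
func saveEntry(entries []csiClusterConfigEntry, desired csiClusterConfigEntry) []csiClusterConfigEntry {
	for i := range entries {
		if entries[i].ClusterID == desired.ClusterID {
			entries[i] = desired
			return entries
		}
	}
	return append(entries, desired)
}

// removeEntry handles deletion on its own, so the save path never needs
// remove-specific conditions (and vice versa).
func removeEntry(entries []csiClusterConfigEntry, clusterID string) []csiClusterConfigEntry {
	out := make([]csiClusterConfigEntry, 0, len(entries))
	for _, e := range entries {
		if e.ClusterID != clusterID {
			out = append(out, e)
		}
	}
	return out
}
```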

I also want to remind everyone that even though I believe I may be fixing the root cause by making this particular change, this won't resolve the issue in already-installed clusters.

Member:

I also want to remind everyone that even though I believe I may be fixing the root cause by making this particular change, this won't resolve the issue in already-installed clusters.

This won't be resolved by the change on lines 169-175 above?

BlaineEXE (Member, author):

Lines 169-175 fix the problem if it exists in an already-installed cluster, but they don't fix the root cause of the bug.

I think (but I'm not very certain) that this might be the root cause fix. However, this line's fix only applies when the entry is first created, so it won't do anything for brownfield clusters that have already experienced the error.

This is also helping to illustrate my point about creating proper idempotency in reconciliations. The code used here to create the first instance is different from the code used above to update existing instances. When there are two branch flows for create and update, any discrepancy between the two can lead to bugs like this.
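
To make the hazard concrete, here is a contrived Go illustration (not the actual Rook code) of how separate create and update branches can drift, again using the simplified entry type from the earlier sketch:

```go
// buggyReconcile shows the anti-pattern: the create branch sets Namespace, but
// the update branch never touches it, so an empty value written once in the
// past is never repaired on later reconciles.
func buggyReconcile(entries []csiClusterConfigEntry, clusterID, clusterNamespace string) []csiClusterConfigEntry {
	for i := range entries {
		if entries[i].ClusterID == clusterID {
			// update branch: Namespace is left as-is
			return entries
		}
	}
	// create branch: only this path sets Namespace
	return append(entries, csiClusterConfigEntry{ClusterID: clusterID, Namespace: clusterNamespace})
}
```

Computing the full desired entry once and writing it on both paths, as in the saveEntry sketch above, removes the possibility of this kind of drift.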

@BlaineEXE BlaineEXE marked this pull request as ready for review May 2, 2024 19:48

@@ -302,11 +310,9 @@ func updateCsiConfigMapOwnerRefs(ctx context.Context, namespace string, clientse
// SaveClusterConfig updates the config map used to provide ceph-csi with
// basic cluster configuration. The clusterNamespace and clusterInfo are
// used to determine what "cluster" in the config map will be updated and
// the clusterNamespace value is expected to match the clusterID
// value that is provided to ceph-csi uses in the storage class.
// The locker l is typically a mutex and is used to prevent the config
Member:

Independently of this PR... Is "locker l" obsolete? I don't see it here anymore. Or it must be referring to the package var configMutex. Maybe we can just remove this comment.

BlaineEXE (Member, author):

Yeah, that makes sense too 👍

BlaineEXE (Member, author):

I tested this locally by:

  1. install rook
  2. manually edit the rook-ceph-csi-config to remove the namespace value
  3. restart the rook operator
  4. observe that the namespace is re-added to the cluster config in the configmap

@BlaineEXE BlaineEXE force-pushed the csi-config-map-fix-empty-namespace branch from 61d13e7 to 4ff8812 on May 2, 2024 19:58
@@ -178,7 +186,7 @@ func updateCsiClusterConfig(curr, clusterKey string, newCsiClusterConfigEntry *C
}
}
for i, centry := range cc {
if centry.ClusterID == clusterKey {
Madhu-1 (Member) commented May 3, 2024:

@BlaineEXE I have a couple of other alternatives:

  • Can we move the above fix from 169 to 176 here? It ensures that it works for all clusters, even in the case where newCsiClusterConfigEntry is also nil.
  • We can have an empty-namespace check along with the clusterID check here, changing
    if cc[i].Namespace == clusterNamespace {
    to
    if cc[i].Namespace == clusterNamespace || (cc[i].Namespace == "" && cc[i].ClusterID == clusterNamespace) {
        ...
    }

I also think there might be an issue with the IsHolderEnabled() check in the updateNetNamespaceFilePath function. This is because other controllers use this function to update the configmap, and it depends on the holder variable set by the CSI controller. We can look at it later.

The current change looks good to me, but it involves more changes; we could also go with option 3, which is a one-line change. I will let you and @travisn decide on it.

BlaineEXE (Member, author):

I think I'd prefer to keep the "fix" separate since it can then log when it is fixing the condition. I'd like to be able to look for that log line in must-gathers from any future multus issues that users find.

As for your comment here...

I also think there might be an issue with the IsHolderEnabled() check in the updateNetNamespaceFilePath function. This is because other controllers use this function to update the configmap, and it depends on the holder variable set by the CSI controller. We can look at it later.

I tend to agree. Having IsHolderEnabled() use a global variable seems risky to me. I considered changing that when I implemented the multus changes several weeks back, but I found that it required more code changes than seemed reasonable, and I also didn't have high confidence that I wouldn't introduce new bugs by making the changes. Ultimately, since we are going to remove holder pods entirely from the code base in a couple releases, I decided that it was best to leave it as-is.

From a technical standpoint, I also don't believe users will see any race conditions related to the global variable, for reasons below.

When the CSI controller sees a change in multus/holder enable state, it updates the global variable before updating the csi config and before making pod changes. Unless I missed something (always possible), I don't believe there are race conditions in the CSI controller itself. Additionally, the rados namespace and subvolume reconcilers don't use the global variable when deciding to set the netns file path, so they should be immune to race conditions related to the global var.

The only case where I think a race condition could exist is if a user has multiple multus-enabled CephClusters. However, I don't think Rook actually supports that case because it can't apply multiple sets of network selection annotations to the CSI pods. If any users were trying to do that today, I think we would see a bug report from them.

All of that said, there are no guarantees that users won't see bugs, but I have done as much investigation and due diligence as possible to ensure that users won't have issues.

@@ -1112,7 +1112,9 @@ func (c *Cluster) saveMonConfig() error {
Monitors: monEndpoints,
},
}
if err := csi.SaveClusterConfig(c.context.Clientset, c.Namespace, c.ClusterInfo, csiConfigEntry); err != nil {

clusterId := c.Namespace // cluster id is same as cluster namespace for CephClusters
Member:

For a rados namespace it can be different...

BlaineEXE (Member, author):

This is the mon code, though. The rados namespace code uses a separate clusterID and namespace.

@BlaineEXE BlaineEXE merged commit 1af97d0 into rook:master May 3, 2024
53 checks passed
@BlaineEXE BlaineEXE deleted the csi-config-map-fix-empty-namespace branch May 3, 2024 18:18
mergify bot added a commit that referenced this pull request May 3, 2024
csi: fix missing namespace bug in csi cluster config map (backport #14154)