csi: fix missing namespace bug in csi cluster config map #14154
Conversation
```diff
-centry.ClusterID = clusterKey
-centry.Namespace = newCsiClusterConfigEntry.Namespace
+centry.ClusterID = clusterID
+centry.Namespace = clusterNamespace
```
I can't confirm this, but I suspect that the root cause of this issue is that there is (or was) a place in the code where `newCsiClusterConfigEntry.Namespace` was empty, which led the CSI configmap to have an empty value for the field. Then, because this value isn't modified or updated after the initial creation, the issue persisted forever.
This behavior is non-idempotent, and I want to stress that operator code should always be idempotent (and simple), with very few exceptions; non-idempotent code in operators is prone to bugs like these. The only exception should be when the idempotent implementation adds significant latency (more than a second) to reconcile loops.
I would suggest that the `SaveClusterConfig()` method be reworked to focus on simple, idempotent logic. I also notice that this function is doing 2 (or more) jobs, which also contributes to complexity and risk of bugs. I suggest splitting it into `SaveClusterConfig()` and `RemoveClusterConfig()` so that the add and remove behaviors aren't adding unnecessary test cases (and room for bugs).
I also want to remind everyone that even though I believe I may be fixing the root cause by making this particular change, this won't resolve the issue in already-installed clusters.
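The idempotent rework suggested above could be sketched roughly like this. This is only an illustration of the pattern, not Rook's actual code: the struct, function names, and signatures are simplified assumptions.

```go
package main

import "fmt"

// CsiClusterConfigEntry mirrors the shape of one entry in the ceph-csi
// cluster config (field set is illustrative, not Rook's exact type).
type CsiClusterConfigEntry struct {
	ClusterID string
	Namespace string
}

// saveClusterConfigEntry is a hypothetical idempotent "save": it always
// computes the desired entry from its inputs and overwrites whatever is
// stored, so re-running it converges on the same state and repairs an
// entry whose namespace was persisted empty.
func saveClusterConfigEntry(cc []CsiClusterConfigEntry, clusterID, clusterNamespace string) []CsiClusterConfigEntry {
	for i := range cc {
		if cc[i].ClusterID == clusterID {
			// Update path sets every field unconditionally.
			cc[i].Namespace = clusterNamespace
			return cc
		}
	}
	// Create path uses the same desired state as the update path.
	return append(cc, CsiClusterConfigEntry{ClusterID: clusterID, Namespace: clusterNamespace})
}

// removeClusterConfigEntry is the hypothetical counterpart suggested in the
// review: deletion lives in its own function instead of branching inside save.
func removeClusterConfigEntry(cc []CsiClusterConfigEntry, clusterID string) []CsiClusterConfigEntry {
	out := cc[:0]
	for _, e := range cc {
		if e.ClusterID != clusterID {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Start from a "broken" entry with an empty namespace; one save repairs it.
	cc := []CsiClusterConfigEntry{{ClusterID: "rook-ceph", Namespace: ""}}
	cc = saveClusterConfigEntry(cc, "rook-ceph", "rook-ceph")
	fmt.Printf("%+v\n", cc)
	cc = removeClusterConfigEntry(cc, "rook-ceph")
	fmt.Println(len(cc))
}
```

Because both the create and update paths write the same desired state, running the save any number of times yields the same configmap content.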
> I also want to remind everyone that even though I believe I may be fixing the root cause by making this particular change, this won't resolve the issue in already-installed clusters.

This won't be resolved by the change on lines 169-175 above?
Lines 169-175 fix the problem if it exists in an already-installed cluster, but they don't fix the root cause of the bug.
I think (but I'm not very certain) that this might be the root cause fix. However, this line's fix only applies when the entry is first created, so it won't do anything for brownfield clusters that have already experienced the error.
This is also helping to illustrate my point about creating proper idempotency in reconciliations. The code used here to create the first instance is different from the code used above to update existing instances. When there are two branch flows for create and update, any discrepancy between the two can lead to bugs like this.
```diff
@@ -302,11 +310,9 @@ func updateCsiConfigMapOwnerRefs(ctx context.Context, namespace string, clientse
 // SaveClusterConfig updates the config map used to provide ceph-csi with
 // basic cluster configuration. The clusterNamespace and clusterInfo are
 // used to determine what "cluster" in the config map will be updated and
 // the clusterNamespace value is expected to match the clusterID
 // value that is provided to ceph-csi uses in the storage class.
 // The locker l is typically a mutex and is used to prevent the config
```
Independently from this PR... is "locker l" obsolete? I don't see it here anymore. Or it must be referring to the package var `configMutex`. Maybe we can just remove this comment.
Yeah, that makes sense too 👍
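The staleness being discussed could be illustrated like this: locking moved from a `locker l` parameter into a package-level variable, so the old doc comment no longer matches the code. This is a simplified sketch, not Rook's actual `SaveClusterConfig` signature or storage.

```go
package main

import (
	"fmt"
	"sync"
)

// configMutex stands in for the package var mentioned in the review; callers
// no longer pass a locker, the package serializes access itself.
var configMutex sync.Mutex

// csiConfig is a toy stand-in for the configmap contents.
var csiConfig = map[string]string{}

// saveClusterConfig serializes concurrent reconcile loops that touch the
// shared config, so no caller needs to manage a lock of its own.
func saveClusterConfig(clusterID, namespace string) {
	configMutex.Lock()
	defer configMutex.Unlock()
	csiConfig[clusterID] = namespace
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			saveClusterConfig("rook-ceph", "rook-ceph")
		}()
	}
	wg.Wait()
	fmt.Println(csiConfig["rook-ceph"])
}
```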
I tested this locally by:
Someone testing the Multus holder pod removal feature encountered an issue where the migration process failed to lead to a system state where PVCs could be created successfully. The root cause was found to be a ceph csi config map wherein the primary CephCluster entry was lacking a value for the "namespace" field.

I observed this once in my development on the holder pod removal feature, but I was unable to reproduce and assumed it was my own error. Since this has been seen in a user environment, it must be that the error is a race condition, and I am unable to determine the exact source of the bug.

I do not believe this bug would be present if the code that updates the CSI configmap were properly idempotent, but it has many conditions based on prior states, and I was unable to determine how to resolve this underlying implementation pattern issue. Instead, I opted to separate the `clusterKey` parameter into two clear parts:

1. `clusterID` for when `clusterKey` is used as an analogue for `clusterID`
2. `clusterNamespace` for when `clusterKey` is used as an analogue for `clusterNamespace`

I added unit tests to ensure that `SaveClusterConfig()` will be able to detect when the namespace is currently missing, and using the new `clusterNamespace` field, it should always know what value to use as input when correcting the bug in already-installed systems. I also verified that this update works when the function simultaneously removes netNamespaceFilePath entries, and that those entries are removed properly. Finally, manual testing also verifies the change.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
Force-pushed from 61d13e7 to 4ff8812
```diff
@@ -178,7 +186,7 @@ func updateCsiClusterConfig(curr, clusterKey string, newCsiClusterConfigEntry *C
 		}
 	}
 	for i, centry := range cc {
 		if centry.ClusterID == clusterKey {
```
@BlaineEXE I have a couple of other alternatives:

1. Can we move the above fix from lines 169-176 here? That ensures it works for all clusters even in case `newCsiClusterConfigEntry` is also nil.
2. We can have an empty-namespace check alongside the clusterID check here (`rook/pkg/operator/ceph/csi/cluster_config.go`, line 140 in 8ec9f94):

```go
if cc[i].Namespace == clusterNamespace || (cc[i].Namespace == "" && cc[i].ClusterID == clusterNamespace) {
	...
}
```
I also think there might be an issue with the `IsHolderEnabled()` check in the `updateNetNamespaceFilePath` function. This is because other controllers use this function to update the configmap, and it is dependent on the holder variable set by the CSI controller. We can look at it later.

The current change looks good to me, but it involves more changes; we could also go with option 3, which is a one-line change. I will let you and @travisn decide on it.
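The empty-namespace check proposed above can be demonstrated as a runnable sketch. The entry struct and helper name below are simplified assumptions, not Rook's actual types; only the matching condition comes from the suggestion.

```go
package main

import "fmt"

// entry is a minimal stand-in for one csi cluster config entry.
type entry struct {
	ClusterID string
	Namespace string
}

// repairNamespace applies the suggested condition: an entry matches when its
// namespace equals clusterNamespace, or when the namespace was persisted
// empty but the clusterID matches (the buggy state this PR addresses).
func repairNamespace(cc []entry, clusterNamespace string) []entry {
	for i := range cc {
		if cc[i].Namespace == clusterNamespace ||
			(cc[i].Namespace == "" && cc[i].ClusterID == clusterNamespace) {
			cc[i].Namespace = clusterNamespace
		}
	}
	return cc
}

func main() {
	// A broken entry with an empty namespace gets repaired in place.
	cc := repairNamespace([]entry{{ClusterID: "rook-ceph", Namespace: ""}}, "rook-ceph")
	fmt.Println(cc[0].Namespace)
}
```

The second clause is what lets this run correct already-broken clusters, since the empty namespace alone can no longer match anything.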
I think I'd prefer to keep the "fix" separate since it can then log when it is fixing the condition. I'd like to be able to look for that log line in must-gathers from any future multus issues that users find.
As for your comment here...

> I also think there might be an issue with the `IsHolderEnabled()` check in the `updateNetNamespaceFilePath` function. This is because other controllers use this function to update the configmap and it is dependent on the holder variable set by the CSI controller. We can look at it later.
I tend to agree. Having `IsHolderEnabled()` use a global variable seems risky to me. I considered changing that when I implemented the multus changes several weeks back, but I found that it required more code changes than seemed reasonable, and I also didn't have high confidence that I wouldn't introduce new bugs by making the changes. Ultimately, since we are going to remove holder pods entirely from the code base in a couple releases, I decided that it was best to leave it as-is.
From a technical standpoint, I also don't believe users will see any race conditions related to the global variable, for reasons below.
When the CSI controller sees a change in multus/holder enable state, it updates the global variable before updating the csi config and before making pod changes. Unless I missed something (always possible), I don't believe there are race conditions in the CSI controller itself. Additionally, the rados namespace and subvolume reconcilers don't use the global variable when deciding to set the netns file path, so they should be immune to race conditions related to the global var.
The only case where I think a race condition could exist is if a user has multiple multus-enabled CephClusters. However, I don't think Rook actually supports that case because it can't apply multiple sets of network selection annotations to the CSI pods. If any users were trying to do that today, I think we would see a bug report from them.
All of that said, there are no guarantees that users won't see bugs, but I have done as much investigation and due diligence as possible to ensure that users won't have issues.
```diff
@@ -1112,7 +1112,9 @@ func (c *Cluster) saveMonConfig() error {
 			Monitors: monEndpoints,
 		},
 	}
 	if err := csi.SaveClusterConfig(c.context.Clientset, c.Namespace, c.ClusterInfo, csiConfigEntry); err != nil {
+
+	clusterId := c.Namespace // cluster id is same as cluster namespace for CephClusters
```
For rados namespaces it can be different...
This is the mon code, though. The rados namespace code uses a separate clusterID and namespace.
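The distinction above can be sketched in code. For a rados namespace, the csi clusterID is a derived unique ID rather than the CephCluster namespace, while the entry's namespace field still records the CephCluster's namespace. The hashing scheme and function name below are hypothetical illustrations, not Rook's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// makeRadosNamespaceClusterID illustrates the point from the review: for a
// CephBlockPoolRadosNamespace the clusterID must be unique per rados
// namespace, so it cannot simply reuse the cluster namespace the way the
// mon code does. The sha256-based derivation here is an assumption.
func makeRadosNamespaceClusterID(clusterNamespace, pool, radosNamespace string) (clusterID, namespace string) {
	sum := sha256.Sum256([]byte(clusterNamespace + "/" + pool + "/" + radosNamespace))
	// Truncate the digest for a compact ID; the namespace passes through.
	return fmt.Sprintf("%x", sum[:8]), clusterNamespace
}

func main() {
	id, ns := makeRadosNamespaceClusterID("rook-ceph", "replicapool", "ns1")
	fmt.Println(id, ns)
}
```

This is why collapsing both concepts into one `clusterKey` parameter was error-prone: the two values coincide for CephClusters but diverge for rados namespaces.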
csi: fix missing namespace bug in csi cluster config map (backport #14154)
Someone testing the Multus holder pod removal feature encountered an issue where the migration process failed to lead to a system state where PVCs could be created successfully.

The root cause was found to be a ceph csi config map wherein the primary CephCluster entry was lacking a value for the "namespace" field.

I observed this once in my development on the holder pod removal feature, but I was unable to reproduce and assumed it was my own error. Since this has been seen in a user environment, it must be that the error is a race condition, and I am unable to determine the exact source of the bug.

I do not believe this bug would be present if the code that updates the CSI configmap were properly idempotent, but it has many conditions based on prior states, and I was unable to determine how to resolve this underlying implementation pattern issue.

Instead, I opted to separate the `clusterKey` parameter into two clear parts:

1. `clusterID` for when `clusterKey` is used as an analogue for `clusterID`
2. `clusterNamespace` for when `clusterKey` is used as an analogue for `clusterNamespace`

I added unit tests to ensure that `SaveClusterConfig()` will be able to detect when the namespace is currently missing, and using the new `clusterNamespace` field, it should always know what value to use as input when correcting the bug in already-installed systems. I also verified that this update works when the function simultaneously removes netNamespaceFilePath entries, and that those entries are removed properly.

Finally, manual testing also verifies the change.
If any users find that PVCs don't work after following steps to remove multus holder pods, users should upgrade to Rook v1.14.3 (upcoming) to get this bug fix. It should resolve the issue for them. Users can determine if they are affected by this bug at any time using this command:
In the example output above, notice that the final entry in the config data shows `"namespace":""`. The bug is present in this cluster. Users should not follow steps to disable holder pods until the issue is resolved. Users can manually resolve the issue by editing the configmap and inserting the CephCluster namespace into the values, like this: `"namespace":"rook-ceph"` (assuming the namespace is rook-ceph).
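A check along the lines described above could be scripted roughly as follows. This is a sketch: the `rook-ceph` namespace, the `rook-ceph-csi-config` configmap name, and the `csi-cluster-config-json` data key are assumptions about a default installation and may differ in yours.

```shell
# Dump the ceph-csi cluster config and look for an entry whose namespace
# field was persisted empty (the buggy state). Assumes a default install:
# namespace rook-ceph, configmap rook-ceph-csi-config.
kubectl -n rook-ceph get configmap rook-ceph-csi-config \
  -o jsonpath='{.data.csi-cluster-config-json}' \
  | grep -q '"namespace":""' \
  && echo "bug present: an entry has an empty namespace" \
  || echo "all entries have a namespace set"
```

If the first message prints, edit the configmap to set the empty `namespace` value to the CephCluster's namespace before disabling holder pods.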