
nfs: restart nfs servers when configmap is updated #9104

Merged
leseb merged 1 commit into rook:master from BlaineEXE:nfs-restart-with-configmap on Nov 26, 2021

Conversation

BlaineEXE (Member Author):

When the configuration configmap is updated for a CephNFS server, the
NFS application should restart to ensure it is running with the latest
config.

Fixes #9028

Signed-off-by: Blaine Gardner blaine.gardner@redhat.com

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.


svcs, err := r.context.Clientset.CoreV1().Services(ns).List(context.TODO(), metav1.ListOptions{})
assert.NoError(t, err)
// Each NFS server gets a service.
BlaineEXE (Member Author):

This is the current behavior, but I'm not sure this is right. Should this be the design?

BlaineEXE (Member Author), Nov 8, 2021:

This is part of the design doc, so I guess it seems good.

The operator creates a k8s service for each of the ganesha server pods
to allow each of them to have a stable IP address.

@BlaineEXE BlaineEXE marked this pull request as ready for review November 5, 2021 22:22
pkg/operator/ceph/nfs/nfs_test.go (review thread outdated, resolved)
pkg/operator/ceph/nfs/nfs.go (review thread outdated, resolved)
When the configuration configmap is updated for a CephNFS server, the
NFS application should restart to ensure it is running with the latest
config.

Fixes rook#9028

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
@@ -141,6 +141,12 @@ func (r *ReconcileCephNFS) makeDeployment(nfs *cephv1.CephNFS, cfg daemonConfig)
 	ObjectMeta: metav1.ObjectMeta{
 		Name:   resourceName,
 		Labels: getLabels(nfs, cfg.ID, true),
+		Annotations: map[string]string{
Reviewer (Member):

You're seeing an update to the annotations cause a pod restart? I was thinking the annotations didn't affect the pod restart, but that must only be the case when the annotations are on the deployment instead of the podTemplateSpec.

Another reviewer (Member):

I thought I had a similar comment in my initial review; somehow I cannot find it anymore. Why not use the deployment annotation instead of the pod's one?

BlaineEXE (Member Author), Nov 9, 2021:

I got distracted from my testing by other things. I'll start testing again today to verify that annotations can cause pods to restart. I'm pretty sure they can.

A changed annotation on the deployment won't cause the pod to restart, but a changed annotation on the pod should. Relatedly, that's why the ceph_version and rook_version labels are on deployments and not pods: on the deployment, they don't cause Pods to restart.

BlaineEXE (Member Author):

Confirmed that the annotation causes a pod restart as desired.

leseb (Member):

One more question: the configmap is updated during the reconcile, right? So regardless of whether the annotation is on the pod or the deployment, the pod is going to be restarted, right? If this is true, I don't understand why we don't place the annotation on the deployment instead.
Quickly glancing at the code, I doubt the CephNFS config should be changed or supports any changes, apart from GaneshaServerSpec. The only CR spec that impacts the configmap is n.Spec.RADOS, especially n.Spec.RADOS.Namespace; is it safe to change it? Also, changing n.Spec.RADOS.Pool recently stopped doing anything, since it always defaults to .nfs.
If all of the above is true and changing n.Spec.RADOS.Namespace is risky, then I'm wondering why we would need to restart the pod when the config changes. Perhaps if there is a misconfiguration in the CR? I just don't see where it could be.
Thanks.

ID              string              // letter ID of daemon (e.g., a, b, c, ...)
ConfigConfigMap string              // name of configmap holding config
DataPathMap     *config.DataPathMap // location to store data in container
Reviewer (Member):

How is this configmap related to the RADOS object where the config is stored? If the RADOS config is updated, do we get events for that to restart the daemon as well?

BlaineEXE (Member Author):

The RADOS config contains additional config. This configmap is needed in order for Ganesha to find the RADOS config(s).

Reviewer (Member):

Is this something worth documenting? Or maybe an overhaul of the ceph nfs doc is better left for a separate PR.

BlaineEXE (Member Author), Nov 9, 2021:

This [configmap] isn't something that users should be touching, so I don't think it's worth documenting. It's what allows us to run the NFS servers with a generated config, without needing an init container that runs the Rook binary to create it.

Reviewer (Member):

one more question... When is the configmap updated? When exports are added/removed?

Reviewer (Member):

Only when the CephNFS CR is updated AFAIK.

BlaineEXE (Member Author):

FWIW, this provides some idea of why config reloading is necessary, though it is from 2017: nfs-ganesha/nfs-ganesha#173


leseb (Member) left a review:
I still need some more clarification. Thanks


BlaineEXE (Member Author):

> One more question: the configmap is updated during the reconcile, right? So regardless of whether the annotation is on the pod or the deployment, the pod is going to be restarted, right? If this is true, I don't understand why we don't place the annotation on the deployment instead.

This is during reconcile, but if the pod template spec in the deployment doesn't change, then the pod doesn't get restarted, even if the deployment itself changes. That is why the annotation is needed on the pod.

> Quickly glancing at the code, I doubt the CephNFS config should be changed or supports any changes, apart from GaneshaServerSpec. The only CR spec that impacts the configmap is n.Spec.RADOS, especially n.Spec.RADOS.Namespace; is it safe to change it? Also, changing n.Spec.RADOS.Pool recently stopped doing anything, since it always defaults to .nfs.
> If all of the above is true and changing n.Spec.RADOS.Namespace is risky, then I'm wondering why we would need to restart the pod when the config changes. Perhaps if there is a misconfiguration in the CR? I just don't see where it could be.

This is all also correct. These updates are only useful on Ceph Octopus, where the config is still allowed to change. I was thinking that this would be desirable for migrating to the .nfs pool before upgrading from Octopus to Pacific. It could allow the upgrade to happen with no downtime instead of an unknown amount of downtime.


leseb commented Nov 26, 2021


Sorry, earlier I meant GaneshaRADOSSpec, but the underlying specs were correct (RADOS.Namespace).
Understood for the migration: we need to migrate data from the old pool to .nfs and then change n.Spec.RADOS.Pool to .nfs. Then, with this patch, the pod will restart so that the FSAL can be updated. I think this last comment is good for posterity; now we know WHY this patch is needed.

leseb (Member) left a review:

Finally approving based on #9104 (comment). Thanks!

@leseb leseb merged commit bd962e3 into rook:master Nov 26, 2021
@BlaineEXE BlaineEXE deleted the nfs-restart-with-configmap branch November 30, 2021 15:42
Successfully merging this pull request may close these issues.

Automatically restart NFS-Ganesha container when the ConfigMap is updated