
nfs: restart nfs servers when configmap is updated #9104

Merged
leseb merged 1 commit into rook:master from BlaineEXE:nfs-restart-with-configmap on Nov 26, 2021

Conversation

BlaineEXE (Member Author):

When the configuration configmap is updated for a CephNFS server, the
NFS application should restart to ensure it is running with the latest
config.

Fixes #9028

Signed-off-by: Blaine Gardner blaine.gardner@redhat.com

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.


svcs, err := r.context.Clientset.CoreV1().Services(ns).List(context.TODO(), metav1.ListOptions{})
assert.NoError(t, err)
// Each NFS server gets a service.
BlaineEXE (Member Author):

This is the current behavior, but I'm not sure this is right. Should this be the design?

BlaineEXE (Member Author), Nov 8, 2021:

This is part of the design doc, so I guess it seems good.

The operator creates a k8s service for each of the ganesha server pods
to allow each of them to have a stable IP address.

@BlaineEXE BlaineEXE marked this pull request as ready for review November 5, 2021 22:22
pkg/operator/ceph/nfs/nfs_test.go (review thread outdated, resolved)
pkg/operator/ceph/nfs/nfs.go (review thread outdated, resolved)
When the configuration configmap is updated for a CephNFS server, the
NFS application should restart to ensure it is running with the latest
config.

Fixes rook#9028

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
@@ -141,6 +141,12 @@ func (r *ReconcileCephNFS) makeDeployment(nfs *cephv1.CephNFS, cfg daemonConfig)
 	ObjectMeta: metav1.ObjectMeta{
 		Name:   resourceName,
 		Labels: getLabels(nfs, cfg.ID, true),
+		Annotations: map[string]string{
Reviewer (Member):

You're seeing an update to the annotations cause a pod restart? I was thinking the annotations didn't affect the pod restart, but that must only be the case when the annotations are on the deployment instead of the podTemplateSpec.

Another reviewer (Member):

I thought I had a similar comment in my initial review; somehow I cannot find it anymore. Why not use the deployment annotation instead of the pod's one?

BlaineEXE (Member Author), Nov 9, 2021:

I got distracted from my testing by other things. I'll start testing again today to verify that annotations can cause pods to restart. I'm pretty sure they can.

A changed annotation on the deployment won't cause the pod to restart, but a changed annotation on the pod should. Relatedly, that's why the ceph_version and rook_version labels are on deployments and not pods: on the deployment, they don't cause Pods to restart.

BlaineEXE (Member Author):

Confirmed that the annotation causes a pod restart as desired.

leseb (Member):

One more question: the configmap is updated during the reconcile, right? So regardless of whether the annotation is on the pod or the deployment, the pod is going to be restarted, right? If this is true, I don't understand why we don't place the annotation on the deployment instead.
Quickly glancing at the code, I doubt the CephNFS config should be changed or supports any changes, apart from GaneshaServerSpec. The only CR spec that impacts the configmap is n.Spec.RADOS, especially n.Spec.RADOS.Namespace; is it safe to change it? Also, changing n.Spec.RADOS.Pool recently stopped doing anything, since it always defaults to .nfs.
If all of the above is true and changing n.Spec.RADOS.Namespace is risky, then I'm wondering why we would need to restart the pod when the config changes. Perhaps if there is a misconfiguration in the CR? I just don't see where it could be.
Thanks.

ID              string              // letter ID of daemon (e.g., a, b, c, ...)
ConfigConfigMap string              // name of configmap holding config
DataPathMap     *config.DataPathMap // location to store data in container
Reviewer (Member):

How is this configmap related to the RADOS object where the config is stored? If the RADOS config is updated, do we get events for that to restart the daemon as well?

BlaineEXE (Member Author):

The RADOS config contains additional config. This configmap is needed in order for Ganesha to find the RADOS config(s).

Reviewer (Member):

Is this something worth documenting? Or maybe an overhaul of the ceph nfs doc is better left for a separate PR.

BlaineEXE (Member Author), Nov 9, 2021:

This [configmap] isn't something that users should be touching, so I don't think it's worth documenting. It's what allows us to run the NFS servers with a generated config, without needing an init container that runs the Rook binary to create it.

Reviewer (Member):

one more question... When is the configmap updated? When exports are added/removed?

Reviewer (Member):

Only when the CephNFS CR is updated AFAIK.

BlaineEXE (Member Author):

FWIW, this provides some idea of why config reloading is necessary, though it is from 2017: nfs-ganesha/nfs-ganesha#173


leseb (Member) left a review:
I still need some more clarification. Thanks


BlaineEXE (Member Author):

> One more question: the configmap is updated during the reconcile, right? So regardless of whether the annotation is on the pod or the deployment, the pod is going to be restarted, right? If this is true, I don't understand why we don't place the annotation on the deployment instead.

This is during reconcile, but if the pod template spec in the deployment doesn't change, then the pod doesn't get restarted, even if the deployment itself changes. That is why the annotation is needed on the pod.

> Quickly glancing at the code, I doubt the CephNFS config should be changed or supports any changes, apart from GaneshaServerSpec. The only CR spec that impacts the configmap is n.Spec.RADOS, especially n.Spec.RADOS.Namespace; is it safe to change it? Also, changing n.Spec.RADOS.Pool recently stopped doing anything, since it always defaults to .nfs.
> If all of the above is true and changing n.Spec.RADOS.Namespace is risky, then I'm wondering why we would need to restart the pod when the config changes. Perhaps if there is a misconfiguration in the CR? I just don't see where it could be.

This is all also correct. These updates are only useful on Ceph Octopus, where the config is still allowed to change. I was thinking that this would be desirable for migrating to the .nfs pool before upgrading from Octopus to Pacific. It could allow the upgrade to happen with no downtime instead of an unknown amount of downtime.


leseb commented Nov 26, 2021


Sorry, earlier I meant GaneshaRADOSSpec, but the underlying specs were correct (RADOS.Namespace).
Understood for the migration: we need to migrate data from the old pool to .nfs and then change n.Spec.RADOS.Pool to .nfs. Then, with this patch, the pod will restart so that the FSAL can be updated. I think this last comment is good for posterity; now we know WHY this patch is needed.

leseb (Member) left a review:

Finally approving based on #9104 (comment). Thanks!

@leseb leseb merged commit bd962e3 into rook:master Nov 26, 2021
@BlaineEXE BlaineEXE deleted the nfs-restart-with-configmap branch November 30, 2021 15:42
Successfully merging this pull request may close these issues.

Automatically restart NFS-Ganesha container when the ConfigMap is updated