
nfs: add livness-probe to nfs-ganesha container #12845

Merged: 2 commits into rook:master from the nfs-liveness-probe branch, Nov 13, 2023

Conversation

@synarete (Contributor) commented Sep 4, 2023

Use the K8s LivenessProbe mechanism to check the OK-status of the nfs-ganesha container. The probe expects NFS TCP port 2049 to be active, that is, willing to accept new connections. Permissive values are defined for the liveness probe to ensure that the NFS service enters a failed state only when it has a non-recoverable error.

The current definition of LivenessProbe is expected to guard the nfs pod from at least the following two cases:

  • Deadlocks: where an nfs-ganesha server is running, but unable to serve new connections due to internal bad state.

  • Resource exhaustion on the host node (e.g. OOM) which prevents the server from accepting new connections.

In both cases we expect K8s to reschedule the nfs pod, most likely on a different host node.
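For illustration, the default TCP-socket check amounts to roughly the following shell one-liner (a sketch only: the real probe uses K8s's built-in tcpSocket action executed by the kubelet, and the address and timeout here are placeholder values):

# Rough shell equivalent of the K8s tcpSocket liveness action (illustration only).
# Succeeds (exit 0) only if something accepts a connection on NFS port 2049.
timeout 5 bash -c 'exec 3<>/dev/tcp/127.0.0.1/2049' && echo "nfs-ganesha is accepting connections"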

Refs rook issue #12719


@BlaineEXE (Member) commented Sep 7, 2023

I'd like to get some input from the NFS-Ganesha devs here as well to make sure adding a liveness probe will have the right behavior for the whole system. I'll reach out to them to comment.

We have recently been learning that liveness probes can be problematic for the MDS (#12789), and we learned in the past that liveness probes are problematic for RGW.

For RGW this is explained well here:

// RGW has internal mechanisms for restarting its processing if it gets stuck. any failures detected
// in a liveness probe are likely to either (1) be resolved by the RGW internally or (2) be a result
// of connection issues to the Ceph cluster. in the first case, restarting is unnecessary. in the
// second case, restarting will only cause more load to the Ceph cluster by causing RGWs to attempt
// to re-connect, potentially causing more issues with the Ceph cluster. forcing a restart of the
// RGW is more likely to cause issues than solve them, so do not implement this probe.
func noLivenessProbe() *v1.Probe {
    return nil
}

@BlaineEXE (Member):

I got some offline advice from Kaleb

see https://github.com/gluster/glusterfs/tree/devel/extras/ganesha/*
I.e. pacemaker. But to cut to the chase, the test to see if a ganesha.nfsd is live is just a test -e /proc/$pid-of-ganesha, where $pid-of-ganesha is obtained from /var/run/ganesha.pid

I think it may be good to change to a script-based probe that checks the PID instead, since that is the latest advice
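A minimal sketch of such a script-based probe, assuming the PID file lives at /var/run/ganesha.pid as in Kaleb's setup (the path and the probe semantics are assumptions, not verified against the Rook image):

#!/bin/sh
# Sketch: PID-file-based liveness check per Kaleb's advice (PID-file path assumed).
# Exits non-zero (probe failure) if the recorded process no longer exists.
set -e
pid="$(cat /var/run/ganesha.pid)"
test -n "$pid" && test -e "/proc/$pid"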

@synarete (Contributor, Author):

I got some offline advice from Kaleb

see https://github.com/gluster/glusterfs/tree/devel/extras/ganesha/*
I.e. pacemaker. But to cut to the chase, the test to see if a ganesha.nfsd is live is just a test -e /proc/$pid-of-ganesha, where $pid-of-ganesha is obtained from /var/run/ganesha.pid

I think it may be good to change to a script-based probe that checks the PID instead, since that is the latest advice

I will try to create another PR with the $pid-of-ganesha option so we have both options at hand and can choose the best one. However, keep in mind that this option targets non-K8s clusters, where one needs to do explicit checks for process aliveness. In Rook's case, if the nfs-ganesha server dies, it will be detected by K8s itself.

@BlaineEXE (Member):

Distillation of discussion today:

Using a null RPC call to the NFS-Ganesha server will be the best check we can implement today. As long as the timeout on the null RPC call is sufficiently long, it should be extremely rare to get a false-failure status.

It's not clear to me how to initiate a null RPC call, nor how long the timeout should be to make false-failures exceedingly rare. I'll reach out to NFS-Ganesha devs to clarify those 2 questions.

@BlaineEXE (Member):

@ffilz suggested:

A null RPC call can be made with (mount not required):
rpcinfo -t {server hostname/address} nfs 4
I think that uses a 30 second timeout but I'm not sure. To get more control of timeout, you would need to code up an RPC client.

As another piece of info, at least one Golang NFS client library exists https://github.com/Cyberax/go-nfs-client

It looks like rpcinfo has a --timeout flag, at least for some implementations. The larger issue is that rpcinfo is not present in the quay.io/ceph/ceph:v18 upstream image as of today.

@BlaineEXE (Member):

I can't find an alternative to rpcinfo, but it looks like it is a small install. In Fedora, it is provided by dnf install rpcbind.

[root@d575d9e94d8b /]# ls -alFh /usr/bin/rpcinfo
-rwxr-xr-x. 1 root root 69K Aug  6  2022 /usr/bin/rpcinfo*
[root@d575d9e94d8b /]# ls -alFh /usr/bin/rpcbind
-rwxr-xr-x. 1 root root 70K Aug  6  2022 /usr/bin/rpcbind*

@BlaineEXE (Member):

I opened a PR to add rpcbind to the upstream ceph image here: ceph/ceph-container#2156

I think the best thing we can do for Rook users in this case is to add a liveness probe that uses the NULL RPC check outlined by Frank and disable it by default. Any Rook users who know their images have rpcbind installed can then enable the probe.

It could be possible for Rook to implement the probe conditionally (like below), but I believe that defeats some of the intent of the probe itself.

if which rpcinfo; then 
   # do probe
else
   exit 0 # assume success if cannot check
fi

@obnoxxx (Contributor) commented Sep 11, 2023

@BlaineEXE wrote:

I got some offline advice from Kaleb

see https://github.com/gluster/glusterfs/tree/devel/extras/ganesha/*
I.e. pacemaker. But to cut to the chase, the test to see if a ganesha.nfsd is live is just a test -e /proc/$pid-of-ganesha, where $pid-of-ganesha is obtained from /var/run/ganesha.pid

not at all saying that Kaleb is wrong here, but maybe this could be refined by avoiding probing /var/run and using the pidof tool instead, like pidof ganesha ...

I think it may be good to change to a script-based probe that checks the PID instead, since that is the latest advice

yes, a separate probe script would be great!

I will try to create another PR with the $pid-of-ganesha option so we have both options at hand and can choose the best one. However, keep in mind that this option targets non-K8s clusters, where one needs to do explicit checks for process aliveness. In Rook's case, if the nfs-ganesha server dies, it will be detected by K8s itself.

I agree that making the probe script so generic that it is not specific to a k8s deployment would be ideal!

It may even be worth considering adding it to the ganesha codebase itself and having Rook consume it once it has landed?
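A sketch of the pidof variant suggested above, assuming the daemon's process name is ganesha.nfsd (per Kaleb's advice) and that the pidof tool is present in the container image:

#!/bin/sh
# Sketch: liveness check via pidof, avoiding the PID file entirely.
# pidof exits non-zero (probe failure) when no such process is running.
pidof ganesha.nfsd >/dev/null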

@BlaineEXE (Member) commented Sep 11, 2023

@synarete because we are proposing (at this point) to have the probe be disabled by default, we need to add an option to control the server liveness probe via the CephNFS CRD. There is an example of how/where that's done with CephFilesystem MDS here:

// +optional
LivenessProbe *ProbeSpec `json:"livenessProbe,omitempty"`

That would be in the server spec here:

type GaneshaServerSpec struct {

We are proposing to disable it by default today, but once Rook can assume that all builds of Ceph will have the rpcinfo tool installed by default, Rook will enable it by default. Likely, Rook can check if Ceph is v18.2.1 (v18.2.2 if necessary) or higher and assume it's safe to enable.

Because of the above, I would suggest for Rook's code something like:

probeSpec := nfs.Spec.Server.LivenessProbe
livenessProbe := genDefaultLivenessProbe(/* ... */)
if probeSpec == nil && !cephVersion.IsAtLeast(cephver.CephVersion{Major: 18, Minor: 2, Extra: 1}) {
    // do not add a probe when the `rpcinfo` tool it relies on may not be present in the image
    // ref: https://github.com/ceph/ceph-container/pull/2156
    livenessProbe = emptyProbe()
} else if probeSpec != nil && probeSpec.Disabled {
    livenessProbe = emptyProbe() // disable the probe if explicitly disabled
}
// TODO: use GetProbeWithDefaults to apply `livenessProbe` to the pod spec

[update]

In the script that runs rpcinfo, it would be good to first check if the tool exists and then give an actionable error message:

set -e

if ! which rpcinfo >/dev/null; then
  echo "rpcinfo tool not found: install the rpcinfo/rpcbind package in the ceph image, or set CephNFS 'spec.server.livenessProbe.disabled: true' to disable this health check"
  exit 1
fi

rpcinfo --timeout <T> localhost:<port> nfs 4 # <-- modify as needed

@BlaineEXE (Member):

@synarete the latest build of quay.io/ceph/daemon-base has the rpcinfo package ready to go.

@BlaineEXE (Member):

Ceph v18.2.0 is the latest release today, so we should be able to use v18.2.1 as the check for determining if we can use rpcinfo in the probe by default.

@synarete (Contributor, Author) commented Oct 16, 2023

Ceph v18.2.0 is the latest release today, so we should be able to use v18.2.1 as the check for determining if we can use rpcinfo in the probe by default.

@BlaineEXE When configuring nfs-ganesha without NFSv3, rpcinfo is not relevant as a liveness probe. Maybe we should investigate other options (e.g., using pynfs).

@ffilz commented Oct 16, 2023

Ceph v18.2.0 is the latest release today, so we should be able to use v18.2.1 as the check for determining if we can use rpcinfo in the probe by default.

@BlaineEXE When configuring nfs-ganesha without NFSv3, rpcinfo is not relevant as a liveness probe. Maybe we should investigate other options (e.g., using pynfs).

Why would rpcinfo not be relevant for liveness without NFSv3? rpcinfo is fully able to probe NFSv4.

pynfs is a test case tool and awfully heavy for doing a liveness check. It also by default creates files on the chosen export.

@BlaineEXE (Member) commented Oct 16, 2023

@synarete this thread seems to show a similar issue using rpcbind with NFSv4 (I believe with the linux kernel nfs server), but they were able to resolve it by specifying a domain name. @ffilz do you think we could be seeing a similar issue here with nfs-ganesha?

https://forums.freebsd.org/threads/nfs-doesnt-register-with-rpcbind.78203/

@ffilz commented Oct 16, 2023

@synarete this thread seems to show a similar issue using rpcbind with NFSv4 (I believe with the linux kernel nfs server), but they were able to resolve it by specifying a domain name. @ffilz do you think we could be seeing a similar issue here with nfs-ganesha?

https://forums.freebsd.org/threads/nfs-doesnt-register-with-rpcbind.78203/

I guess one issue would be that if rpcbind isn't running on the host/VM/container nfs-ganesha is running on, then rpcinfo can't be used in its default mode, but you should still be able to use:

rpcinfo -n 2049 -t {host} nfs 4

And then I believe it will not attempt to contact rpcbind.

Also, unless the Ganesha root pseudofs has been restrictively configured, the following would work for a reasonable liveness test:

host=a.b.c.d && mkdir /tmp/mnt-$host && mount $host:/ /tmp/mnt-$host && umount /tmp/mnt-$host && rmdir /tmp/mnt-$host

Maybe put that in a shell script so it can make sure to clean up /tmp/mnt-$host even if the mount fails.
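A minimal sketch of such a wrapper, with cleanup guaranteed by a trap even when the mount fails (the server address is a placeholder):

#!/bin/sh
# Sketch: mount-based liveness check of the Ganesha root pseudofs with guaranteed cleanup.
host="a.b.c.d"    # placeholder server address
mnt="/tmp/mnt-$host"
trap 'umount "$mnt" 2>/dev/null; rmdir "$mnt"' EXIT    # clean up even if mount fails
mkdir -p "$mnt"
mount "$host:/" "$mnt"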

@synarete (Contributor, Author):

rpcinfo -n 2049 -t {host} nfs 4

A bit of context: I deployed a simple rook+cephfs+nfs setup on OpenShift (4.13, AWS). All pods run OK. There is no rpcbind process running, and rpcinfo fails with Connection refused both when run from within the nfs-ganesha container and from an external pod:

[root@centos /]# rpcinfo -n 2049 -t 10.129.2.28 nfs 4                                                      
10.129.2.28: RPC: Remote system error - Connection refused

[root@rook-ceph-my-nfs /]# rpcinfo -n 2049 -t 127.0.0.1 nfs 4                                              
127.0.0.1: RPC: Remote system error - Connection refused

@ffilz Please let us know if anything is missing in the config file:

[root@rook-ceph-my-nfs /]# cat /etc/ganesha/ganesha.conf 

NFS_CORE_PARAM {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
}

MDCACHE {
        Dir_Chunk = 0;
}

EXPORT_DEFAULTS {
        Attr_Expiration_Time = 0;
}

NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
}

RADOS_KV {
        ceph_conf = "/etc/ceph/ceph.conf";
        userid = nfs-ganesha.my-nfs.a;
        nodeid = my-nfs.a;
        pool = ".nfs";
        namespace = "my-nfs";
}

RADOS_URLS {
        ceph_conf = "/etc/ceph/ceph.conf";
        userid = nfs-ganesha.my-nfs.a;
        watch_url = 'rados://.nfs/my-nfs/conf-nfs.my-nfs';
}

RGW {
        name = "client.nfs-ganesha.my-nfs.a";
}

%url    rados://.nfs/my-nfs/conf-nfs.my-nfs

@ffilz commented Oct 17, 2023

You can do:

rpcinfo -a 1.2.3.4.8.1 -T tcp nfs 4

Where 1.2.3.4 is the IP address of the server (or you can use an IPv6 address). The .8.1 at the end is port 2049 in the RPC universal-address format, where the last two fields are the high and low bytes of the port (8*256 + 1 = 2049).

-n is supposed to work, but got broken...
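For reference, a quick shell sanity check of the universal-address suffix encoding described above:

# Compute the universal-address suffix for an arbitrary TCP port.
port=2049
echo "$((port / 256)).$((port % 256))"    # prints 8.1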

@synarete (Contributor, Author):

rpcinfo -a 1.2.3.4.8.1 -T tcp nfs 4

@ffilz Many thanks for this hint. Using this notation, I was able to successfully run rpcinfo against the nfs-ganesha server, both from within the container itself and from an external pod:

[root@rook-ceph-my-nfs /]# rpcinfo -a 127.0.0.1.8.1 -T tcp nfs 4
program 100003 version 4 ready and waiting

[root@centos /]# rpcinfo -a 10.131.0.30.8.1 -T tcp nfs 4
program 100003 version 4 ready and waiting

@BlaineEXE I will now fix this PR accordingly.
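Putting the pieces together, the default probe command might end up looking something like the following (a sketch based on the command that worked above; the final command in the merged PR may differ):

#!/bin/sh
# Sketch: NULL-RPC liveness check using the universal-address form that worked above.
rpcinfo -a 127.0.0.1.8.1 -T tcp nfs 4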

@BlaineEXE (Member):

// ProbeSpec is a wrapper around Probe so it can be enabled or disabled for a Ceph daemon
type ProbeSpec struct {
    // Disabled determines whether the probe is disabled or not
    // +optional
    Disabled bool `json:"disabled,omitempty"`
    // Probe describes a health check to be performed against a container to determine whether it is
    // alive or ready to receive traffic.
    // +optional
    Probe *v1.Probe `json:"probe,omitempty"`
}
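For instance, a user could disable the probe on an existing CephNFS resource with something like the following (a hypothetical example: the resource name and namespace are placeholders, and the field path is inferred from the JSON tags above):

# Hypothetical: disable the nfs-ganesha liveness probe on a CephNFS named "my-nfs".
kubectl -n rook-ceph patch cephnfs my-nfs --type merge \
  -p '{"spec":{"server":{"livenessProbe":{"disabled":true}}}}'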

@synarete force-pushed the nfs-liveness-probe branch 2 times, most recently from 0181c98 to bec8dc3 on October 26, 2023
Review comment on a code hunk:
// liveness-probe using K8s builtin TCP-socket action
return controller.GenerateLivenessProbeTcpPort(nfsPort, failureThreshold)
A Member commented:

I like the idea of having a backup option when RPC isn't available!

@BlaineEXE (Member) left a comment:

Overall, this looks pretty great. Very concise and clear.

The only thing that I think needs updating is a slight adjustment to the user-defined behavior of the probe. You can look at how this is done in the RGW code for an example:

func configureReadinessProbe(container *v1.Container, healthCheck cephv1.ObjectHealthCheckSpec) {

Also, please let me know if you'd like me to take over the work for any reason, no questions asked. :)

pkg/operator/ceph/nfs/spec.go (outdated review thread, resolved)
Comment on lines 123 to 125
LivenessProbe: &cephv1.ProbeSpec{
    Disabled: false,
},
A Member commented:

Is it possible to also verify that the probe is present in the output?

Since this also adds the ability to override the probe, I think it would be good to override one of the properties here and check that the output has default values for the probe, except for the overridden value.

@synarete (Contributor, Author) replied:

I added a dedicated test to check the liveness-probe settings. Naturally, we can verify the correctness of rpcinfo as a liveness probe only against a real nfs-ganesha container.

@synarete force-pushed the nfs-liveness-probe branch 4 times, most recently from 7c8be8e to 2a88ae4 on October 30, 2023
@BlaineEXE (Member):

There are ongoing NFS CI issues that I'm starting to look into. I think it's important to resolve those before moving forward here. Sorry for the added delay here.

@synarete force-pushed the nfs-liveness-probe branch 4 times, most recently from 7e0eaeb to 1776528 on November 7, 2023
Allow the end user to have full control over the NFS-Ganesha liveness probe:

  - Disabled
  - Enabled, using user-provided definition
  - Enabled, using Rook's default liveness probe for Ganesha.

Signed-off-by: Shachar Sharon <ssharon@redhat.com>
Use the K8s LivenessProbe mechanism to check the OK-status of the
nfs-ganesha container. A user may define their own liveness probe, or use
a default one which expects NFS TCP port 2049 to be active, that is,
willing to accept new connections: for Ceph >= 18.2.1, issue an 'rpcinfo'
call on the local pod; otherwise, use the standard K8s TCP-socket
liveness-probe mechanism. Define permissive values for the liveness probe
to ensure that the NFS service enters a failed state only when it has a
non-recoverable error.

The current default definition of LivenessProbe is expected to guard the
nfs pod from at least the following two cases:

  - Deadlocks: where an nfs-ganesha server is running, but unable to
    serve new connections due to internal bad state.

  - Resource exhaustion on the host node (e.g. OOM), which prevents the
    server from accepting new connections or replying to a NULL RPC
    request.

In both cases we expect K8s to reschedule the nfs pod, most likely on a
different host node.

Refs rook issue rook#12719

Signed-off-by: Shachar Sharon <ssharon@redhat.com>
@BlaineEXE (Member) left a comment:

Looks great! I did a manual test also, and the probe defaulting behavior works as expected. Thanks @synarete, and sorry for the delay in reviewing.

@BlaineEXE merged commit 5230fcd into rook:master on Nov 13, 2023
48 of 51 checks passed
mergify bot added a commit that referenced this pull request Nov 13, 2023
nfs: add livness-probe to nfs-ganesha container (backport #12845)