
nfs: add livness-probe to nfs-ganesha container #12845

Merged: 2 commits into rook:master from the nfs-liveness-probe branch, Nov 13, 2023

Conversation

@synarete (Contributor) commented Sep 4, 2023

Use the K8s LivenessProbe mechanism to check the OK-status of the nfs-ganesha container. The probe expects NFS TCP port 2049 to be active, that is, willing to accept new connections. Permissive values are defined for the liveness probe to ensure that the NFS service enters a failed state only when it has a non-recoverable error.

The current definition of LivenessProbe is expected to guard the nfs pod from at least the following two cases:

  • Deadlocks: where an nfs-ganesha server is running, but unable to serve new connections due to internal bad state.

  • Resource exhaustion on the host node (e.g. OOM) which prevents the server from accepting new connections.

In both cases we expect K8s to reschedule the nfs pod, most likely on a different host node.
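For illustration, the default TCP-socket check amounts to roughly the following shell one-liner (a sketch only: the real probe uses K8s's built-in tcpSocket action executed by the kubelet, and the address and timeout here are placeholder values):

# Rough shell equivalent of the K8s tcpSocket liveness action (illustration only).
# Succeeds (exit 0) only if something accepts a connection on NFS port 2049.
timeout 5 bash -c 'exec 3<>/dev/tcp/127.0.0.1/2049' && echo "nfs-ganesha is accepting connections"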

Refs rook issue #12719


@BlaineEXE (Member) commented Sep 7, 2023

I'd like to get some input from the NFS-Ganesha devs here as well to make sure adding a liveness probe will have the right behavior for the whole system. I'll reach out to them to comment.

We have recently been learning that liveness probes can be problematic for the MDS (#12789), and we learned in the past that liveness probes are problematic for RGW.

For RGW this is explained well here:

// RGW has internal mechanisms for restarting its processing if it gets stuck. any failures detected
// in a liveness probe are likely to either (1) be resolved by the RGW internally or (2) be a result
// of connection issues to the Ceph cluster. in the first case, restarting is unnecessary. in the
// second case, restarting will only cause more load to the Ceph cluster by causing RGWs to attempt
// to re-connect, potentially causing more issues with the Ceph cluster. forcing a restart of the
// RGW is more likely to cause issues than solve them, so do not implement this probe.
func noLivenessProbe() *v1.Probe {
    return nil
}

@BlaineEXE (Member):

I got some offline advice from Kaleb

see https://github.com/gluster/glusterfs/tree/devel/extras/ganesha/*
I.e. pacemaker. But to cut to the chase, the test to see if a ganesha.nfsd is live is just a test -e /proc/$pid-of-ganesha, where $pid-of-ganesha is obtained from /var/run/ganesha.pid

I think it may be good to change to a script-based probe that checks the PID instead, since that is the latest advice
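A minimal sketch of such a script-based probe, assuming the PID file lives at /var/run/ganesha.pid as in Kaleb's setup (the path and the probe semantics are assumptions, not verified against the Rook image):

#!/bin/sh
# Sketch: PID-file-based liveness check per Kaleb's advice (PID-file path assumed).
# Exits non-zero (probe failure) if the recorded process no longer exists.
set -e
pid="$(cat /var/run/ganesha.pid)"
test -n "$pid" && test -e "/proc/$pid"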

@synarete (Contributor, Author):

I got some offline advice from Kaleb

see https://github.com/gluster/glusterfs/tree/devel/extras/ganesha/*
I.e. pacemaker. But to cut to the chase, the test to see if a ganesha.nfsd is live is just a test -e /proc/$pid-of-ganesha, where $pid-of-ganesha is obtained from /var/run/ganesha.pid

I think it may be good to change to a script-based probe that checks the PID instead, since that is the latest advice

I will try to create another PR with the $pid-of-ganesha option so we have both options at hand and can choose the best one. However, keep in mind that this option targets non-K8s clusters, where one needs to do explicit checks for process aliveness. In Rook's case, if the nfs-ganesha server dies, it will be detected by K8s itself.

@BlaineEXE (Member):

Distillation of discussion today:

Using a null RPC call to the NFS-Ganesha server will be the best check we can implement today. As long as the timeout on the null RPC call is sufficiently long, it should be extremely rare to get a false-failure status.

It's not clear to me how to initiate a null RPC call, nor how long the timeout should be to make false-failures exceedingly rare. I'll reach out to NFS-Ganesha devs to clarify those 2 questions.

@BlaineEXE (Member):

@ffilz suggested:

A null RPC call can be made with (mount not required):
rpcinfo -t {server hostname/address} nfs 4
I think that uses a 30 second timeout but I'm not sure. To get more control of timeout, you would need to code up an RPC client.

As another piece of info, at least one Golang NFS client library exists https://github.com/Cyberax/go-nfs-client

It looks like rpcinfo has a --timeout flag, at least for some implementations. The larger issue is that rpcinfo is not present in the quay.io/ceph/ceph:v18 upstream image as of today.

@BlaineEXE (Member):

I can't find an alternative to rpcinfo, but it looks like it is a small install. In Fedora, it is provided by dnf install rpcbind.

[root@d575d9e94d8b /]# ls -alFh /usr/bin/rpcinfo
-rwxr-xr-x. 1 root root 69K Aug  6  2022 /usr/bin/rpcinfo*
[root@d575d9e94d8b /]# ls -alFh /usr/bin/rpcbind
-rwxr-xr-x. 1 root root 70K Aug  6  2022 /usr/bin/rpcbind*

@BlaineEXE (Member):

I opened a PR to add rpcbind to the upstream ceph image here: ceph/ceph-container#2156

I think the best thing we can do for Rook users in this case is to add a liveness probe that uses the NULL RPC check outlined by Frank and disable it by default. Any Rook users who know their images have rpcbind installed can then enable the probe.

It could be possible for Rook to implement the probe conditionally (like below), but I believe that defeats some of the intent of the probe itself.

if which rpcinfo; then 
   # do probe
else
   exit 0 # assume success if cannot check
fi

@obnoxxx (Contributor) commented Sep 11, 2023

@BlaineEXE wrote:

I got some offline advice from Kaleb

see https://github.com/gluster/glusterfs/tree/devel/extras/ganesha/*
I.e. pacemaker. But to cut to the chase, the test to see if a ganesha.nfsd is live is just a test -e /proc/$pid-of-ganesha, where $pid-of-ganesha is obtained from /var/run/ganesha.pid

not at all saying that Kaleb is wrong here, but maybe this could be refined by avoiding probing /var/run and using the pidof tool instead, like pidof ganesha ...

I think it may be good to change to a script-based probe that checks the PID instead, since that is the latest advice

yes, a separate probe script would be great!

I will try to create another PR with the $pid-of-ganesha option so we have both options at hand and can choose the best one. However, keep in mind that this option targets non-K8s clusters, where one needs to do explicit checks for process aliveness. In Rook's case, if the nfs-ganesha server dies, it will be detected by K8s itself.

I agree that making the probe script so generic that it is not specific to a k8s deployment would be ideal!

It may even be worth considering adding it to the ganesha codebase itself and having Rook consume it once it has landed?
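A sketch of the pidof variant suggested above, assuming the daemon's process name is ganesha.nfsd (per Kaleb's advice) and that the pidof tool is present in the container image:

#!/bin/sh
# Sketch: liveness check via pidof, avoiding the PID file entirely.
# pidof exits non-zero (probe failure) when no such process is running.
pidof ganesha.nfsd >/dev/null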

@BlaineEXE (Member) commented Sep 11, 2023

@synarete because we are proposing (at this point) to have the probe be disabled by default, we need to add an option to control the server liveness probe via the CephNFS CRD. There is an example of how/where that's done with CephFilesystem MDS here:

// +optional
LivenessProbe *ProbeSpec `json:"livenessProbe,omitempty"`

That would be in the server spec here:

type GaneshaServerSpec struct {

We are proposing to disable it by default today, but once Rook can assume that all builds of Ceph will have the rpcinfo tool installed by default, Rook will enable it by default. Likely, Rook can check if Ceph is v18.2.1 (v18.2.2 if necessary) or higher and assume it's safe to enable.

Because of the above, I would suggest for Rook's code something like:

probeSpec := nfs.Spec.Server.LivenessProbe
livenessProbe := genDefaultLivenessProbe(/* ... */)
if probeSpec == nil && !cephVersion.IsAtLeast(cephver.CephVersion{Major: 18, Minor: 2, Extra: 1}) {
    // do not add a probe when the `rpcinfo` tool it relies on may not be present in the image
    // ref: https://github.com/ceph/ceph-container/pull/2156
    livenessProbe = emptyProbe()
} else if probeSpec != nil && probeSpec.Disabled {
    livenessProbe = emptyProbe() // disable the probe if explicitly disabled
}
// TODO: use GetProbeWithDefaults to apply `livenessProbe` to the pod spec

[update]

In the script that runs rpcinfo, it would be good to first check if the tool exists and then give an actionable error message:

set -e

if ! which rpcinfo >/dev/null; then
  echo "rpcinfo tool not found: install the rpcinfo/rpcbind package in the ceph image, or set CephNFS 'spec.server.livenessProbe.disabled: true' to disable this health check"
  exit 1
fi

rpcinfo --timeout <T> localhost:<port> nfs 4 # <-- modify as needed

@BlaineEXE (Member):

@synarete the latest build of quay.io/ceph/daemon-base has the rpcinfo package ready to go.

@BlaineEXE (Member):

Ceph v18.2.0 is the latest release today, so we should be able to use v18.2.1 as the check for determining if we can use rpcinfo in the probe by default.

@synarete (Contributor, Author) commented Oct 16, 2023

Ceph v18.2.0 is the latest release today, so we should be able to use v18.2.1 as the check for determining if we can use rpcinfo in the probe by default.

@BlaineEXE When configuring nfs-ganesha without NFSv3, rpcinfo is not relevant as a liveness probe. Maybe we should investigate other options (e.g., using pynfs).

@ffilz commented Oct 16, 2023

Ceph v18.2.0 is the latest release today, so we should be able to use v18.2.1 as the check for determining if we can use rpcinfo in the probe by default.

@BlaineEXE When configuring nfs-ganesha without NFSv3, rpcinfo is not relevant as a liveness probe. Maybe we should investigate other options (e.g., using pynfs).

Why would rpcinfo not be relevant for liveness without NFSv3? rpcinfo is fully able to probe NFSv4.

pynfs is a test case tool and awfully heavy for doing a liveness check. It also by default creates files on the chosen export.

@BlaineEXE (Member) commented Oct 16, 2023

@synarete this thread seems to show a similar issue using rpcbind with NFSv4 (I believe with the linux kernel nfs server), but they were able to resolve it by specifying a domain name. @ffilz do you think we could be seeing a similar issue here with nfs-ganesha?

https://forums.freebsd.org/threads/nfs-doesnt-register-with-rpcbind.78203/

@ffilz commented Oct 16, 2023

@synarete this thread seems to show a similar issue using rpcbind with NFSv4 (I believe with the linux kernel nfs server), but they were able to resolve it by specifying a domain name. @ffilz do you think we could be seeing a similar issue here with nfs-ganesha?

https://forums.freebsd.org/threads/nfs-doesnt-register-with-rpcbind.78203/

I guess one issue would be that if rpcbind isn't running on the host/VM/container nfs-ganesha is running on, then rpcinfo can't be used in its default mode, but you should still be able to use:

rpcinfo -n 2049 -t {host} nfs 4

And then I believe it will not attempt to contact rpcbind.

Also, unless the Ganesha root pseudofs has been restrictively configured, the following would work for a reasonable liveness test:

host=a.b.c.d && mkdir /tmp/mnt-$host && mount $host:/ /tmp/mnt-$host && umount /tmp/mnt-$host && rmdir /tmp/mnt-$host

Maybe put that in a shell script so it can make sure to clean up /tmp/mnt-$host even if the mount fails.
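A minimal sketch of such a wrapper, with cleanup guaranteed by a trap even when the mount fails (the server address is a placeholder):

#!/bin/sh
# Sketch: mount-based liveness check of the Ganesha root pseudofs with guaranteed cleanup.
host="a.b.c.d"    # placeholder server address
mnt="/tmp/mnt-$host"
trap 'umount "$mnt" 2>/dev/null; rmdir "$mnt"' EXIT    # clean up even if mount fails
mkdir -p "$mnt"
mount "$host:/" "$mnt"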

@synarete (Contributor, Author):

rpcinfo -n 2049 -t {host} nfs 4

A bit of context: I deployed a simple rook+cephfs+nfs setup on OpenShift (4.13, AWS). All pods run OK. There is no rpcbind process running, and rpcinfo fails with Connection refused both when run from within the nfs-ganesha container and from an external pod:

[root@centos /]# rpcinfo -n 2049 -t 10.129.2.28 nfs 4                                                      
10.129.2.28: RPC: Remote system error - Connection refused

[root@rook-ceph-my-nfs /]# rpcinfo -n 2049 -t 127.0.0.1 nfs 4                                              
127.0.0.1: RPC: Remote system error - Connection refused

@ffilz Please let us know if anything is missing in the config file:

[root@rook-ceph-my-nfs /]# cat /etc/ganesha/ganesha.conf 

NFS_CORE_PARAM {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
}

MDCACHE {
        Dir_Chunk = 0;
}

EXPORT_DEFAULTS {
        Attr_Expiration_Time = 0;
}

NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
}

RADOS_KV {
        ceph_conf = "/etc/ceph/ceph.conf";
        userid = nfs-ganesha.my-nfs.a;
        nodeid = my-nfs.a;
        pool = ".nfs";
        namespace = "my-nfs";
}

RADOS_URLS {
        ceph_conf = "/etc/ceph/ceph.conf";
        userid = nfs-ganesha.my-nfs.a;
        watch_url = 'rados://.nfs/my-nfs/conf-nfs.my-nfs';
}

RGW {
        name = "client.nfs-ganesha.my-nfs.a";
}

%url    rados://.nfs/my-nfs/conf-nfs.my-nfs

@ffilz commented Oct 17, 2023

You can do:

rpcinfo -a 1.2.3.4.8.1 -T tcp nfs 4

Where 1.2.3.4 is the IP address of the server (or you can use an IPv6 address). The .8.1 at the end is port 2049 in the RPC universal-address format, where the last two fields are the high and low bytes of the port (8*256 + 1 = 2049).

-n is supposed to work, but got broken...
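For reference, a quick shell sanity check of the universal-address suffix encoding described above:

# Compute the universal-address suffix for an arbitrary TCP port.
port=2049
echo "$((port / 256)).$((port % 256))"    # prints 8.1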

@synarete (Contributor, Author):

rpcinfo -a 1.2.3.4.8.1 -T tcp nfs 4

@ffilz Many thanks for this hint. Using this notation, I was able to successfully run rpcinfo against the nfs-ganesha server, both from within the container itself and from an external pod:

[root@rook-ceph-my-nfs /]# rpcinfo -a 127.0.0.1.8.1 -T tcp nfs 4
program 100003 version 4 ready and waiting

[root@centos /]# rpcinfo -a 10.131.0.30.8.1 -T tcp nfs 4
program 100003 version 4 ready and waiting

@BlaineEXE I will now fix this PR accordingly.
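Putting the pieces together, the default probe command might end up looking something like the following (a sketch based on the command that worked above; the final command in the merged PR may differ):

#!/bin/sh
# Sketch: NULL-RPC liveness check using the universal-address form that worked above.
rpcinfo -a 127.0.0.1.8.1 -T tcp nfs 4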

@BlaineEXE (Member):

// ProbeSpec is a wrapper around Probe so it can be enabled or disabled for a Ceph daemon
type ProbeSpec struct {
    // Disabled determines whether the probe is disabled or not
    // +optional
    Disabled bool `json:"disabled,omitempty"`
    // Probe describes a health check to be performed against a container to determine whether it is
    // alive or ready to receive traffic.
    // +optional
    Probe *v1.Probe `json:"probe,omitempty"`
}
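For instance, a user could disable the probe on an existing CephNFS resource with something like the following (a hypothetical example: the resource name and namespace are placeholders, and the field path is inferred from the JSON tags above):

# Hypothetical: disable the nfs-ganesha liveness probe on a CephNFS named "my-nfs".
kubectl -n rook-ceph patch cephnfs my-nfs --type merge \
  -p '{"spec":{"server":{"livenessProbe":{"disabled":true}}}}'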

@synarete force-pushed the nfs-liveness-probe branch 2 times, most recently from 0181c98 to bec8dc3 on October 26, 2023
Review comment on a code hunk:
// liveness-probe using K8s builtin TCP-socket action
return controller.GenerateLivenessProbeTcpPort(nfsPort, failureThreshold)
A Member commented:

I like the idea of having a backup option when RPC isn't available!

@BlaineEXE (Member) left a comment:

Overall, this looks pretty great. Very concise and clear.

The only thing that I think needs updating is a slight adjustment to the user-defined behavior of the probe. You can look at how this is done in the RGW code for an example:

func configureReadinessProbe(container *v1.Container, healthCheck cephv1.ObjectHealthCheckSpec) {

Also, please let me know if you'd like me to take over the work for any reason, no questions asked. :)

pkg/operator/ceph/nfs/spec.go (outdated review thread, resolved)
Comment on lines 123 to 125
LivenessProbe: &cephv1.ProbeSpec{
    Disabled: false,
},
A Member commented:

Is it possible to also verify that the probe is present in the output?

Since this also adds the ability to override the probe, I think it would be good to override one of the properties here and check that the output has default values for the probe, except for the overridden value.

@synarete (Contributor, Author) replied:

I added a dedicated test to check the liveness-probe settings. Naturally, we can verify the correctness of rpcinfo as a liveness probe only against a real nfs-ganesha container.

@synarete force-pushed the nfs-liveness-probe branch 4 times, most recently from 7c8be8e to 2a88ae4 on October 30, 2023
@BlaineEXE (Member):

There are ongoing NFS CI issues that I'm starting to look into. I think it's important to resolve those before moving forward here. Sorry for the added delay here.

@synarete force-pushed the nfs-liveness-probe branch 4 times, most recently from 7e0eaeb to 1776528 on November 7, 2023
Allow the end user to have full control over the NFS-Ganesha liveness probe:

  - Disabled
  - Enabled, using user-provided definition
  - Enabled, using Rook's default liveness probe for Ganesha.

Signed-off-by: Shachar Sharon <ssharon@redhat.com>
Use the K8s LivenessProbe mechanism to check the OK-status of the
nfs-ganesha container. A user may define their own liveness probe, or use
a default one which expects NFS TCP port 2049 to be active, that is,
willing to accept new connections: for Ceph >= 18.2.1, issue an 'rpcinfo'
call on the local pod; otherwise, use the standard K8s TCP-socket
liveness-probe mechanism. Define permissive values for the liveness probe
to ensure that the NFS service enters a failed state only when it has a
non-recoverable error.

The current default definition of LivenessProbe is expected to guard the
nfs pod from at least the following two cases:

  - Deadlocks: where an nfs-ganesha server is running, but unable to
    serve new connections due to internal bad state.

  - Resource exhaustion on the host node (e.g. OOM), which prevents the
    server from accepting new connections or replying to a NULL RPC
    request.

In both cases we expect K8s to reschedule the nfs pod, most likely on a
different host node.

Refs rook issue rook#12719

Signed-off-by: Shachar Sharon <ssharon@redhat.com>
@BlaineEXE (Member) left a comment:

Looks great! I did a manual test also, and the probe defaulting behavior works as expected. Thanks @synarete, and sorry for the delay in reviewing.

@BlaineEXE merged commit 5230fcd into rook:master on Nov 13, 2023
48 of 51 checks passed
mergify bot added a commit that referenced this pull request Nov 13, 2023
nfs: add livness-probe to nfs-ganesha container (backport #12845)