nfsdiag

Website: www.nfsdiag.org · Releases: github.com/lsferreira42/nfsdiag/releases/latest

nfsdiag is a command-line NFS diagnostic tool written in C. You give it an IP or hostname, and it checks everything that usually breaks in NFS: network reachability, rpcbind, NFS versions, mountd, exports, permissions, root squash, locking, stale handles, and performance.

It is not magic, and it will not replace a good server-side analysis. But it narrows down whether the problem is network, NFS config, permissions, UID/GID mapping, or something stranger.

What this tool does

test if rpcbind TCP port 111 is reachable
test if NFS TCP port 2049 is reachable
query the RPC service map from rpcbind (rpcbind v3/v4 DUMP with native IPv6, plus legacy portmapper)
measure TCP connect latency and path MTU towards the server
verify that dynamically registered mountd/lockd/statd ports are reachable through firewalls
fingerprint the server implementation heuristically from its RPC service layout
detect registered NFS, mountd, lockd/NLM and statd/NSM services
test NFS v2, v3 and v4 with RPC NULLPROC (including v4.1 and v4.2 hints)
test mountd v1, v2 and v3; optionally probe RPC over UDP
enumerate exports using mountd
check client prerequisite daemons (nfs-client.target, rpc.gssd, nfs-idmapd)
detect Kerberos tickets and configuration with --krb5, and test which sec=krb5/krb5i/krb5p flavors actually mount
mount exports automatically, trying NFSv4.2 → 4.1 → 4 → 3 in cascade
test selected exports only with repeatable --export, or several exports concurrently with --parallel N
benchmark rsize/wsize/nconnect combinations with --sweep and suggest mount options
parse and verify effective mount options from /proc/self/mountinfo
capture RPC stats (retransmissions, auth refreshes) before and after tests
extract deep latency metrics from /proc/self/mountstats
read NFS server info from /proc/fs/nfsfs/servers (protocol version, active mount count)
run filesystem checks after mount: close-to-open consistency, special files, quotas
test read/traverse permissions and directory listing
test POSIX ACLs, NFSv4 ACLs, generic xattrs, and SELinux contexts
test create/write/read/fsync; advanced I/O: copy_file_range, fallocate, O_DIRECT
test advisory locks with fcntl
detect practical root_squash behavior
simulate UID/GID access with supplemental groups
run a metadata latency benchmark (create/rename/unlink)
run a stale file handle loop looking for ESTALE
test long filenames (255-byte), special characters (spaces, colons, UTF-8 multibyte)
detect NFSv4 delegation activity via /proc/self/mountstats (DELEGRETURN operations)
check for pNFS layouts via /proc/self/mountstats
run external fio benchmarks alongside internal smoke tests
generate JSON and HTML reports; stream NDJSON; emit Prometheus metrics or JUnit XML
serve Prometheus metrics continuously over HTTP with --listen PORT
keep a per-host baseline and compare each run against it with --diff-baseline
emit event categories, stable check_id values and remediation text for automation
write evidence bundles with --output-dir
compare two JSON reports with nfsdiag diff
run local dependency/helper validation with --self-test
run Docker fixture tests for regression checks (14 scenarios)

By default the output is compact. Use --verbose to see all probe steps.
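
As a rough illustration of the first two checks in the list above, the sketch below tests whether TCP ports 111 and 2049 accept connections. It is not how nfsdiag implements the probe (the tool is written in C). It relies on bash's /dev/tcp redirection and on timeout from coreutils, and the helper names tcp_port_open and check_nfs_ports are invented for this example. The 5-second limit simply mirrors the documented --timeout default.

```bash
#!/usr/bin/env bash
# Sketch only: nfsdiag does this in C; both helpers here are hypothetical.

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
tcp_port_open() {
    local host=$1 port=$2
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # timeout(1) aborts the attempt if packets are silently dropped.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Report rpcbind (111) and NFS (2049) for one server, using the same
# [OK]/[FAIL] tags as the tool's text output.
check_nfs_ports() {
    local host=$1 port
    for port in 111 2049; do
        if tcp_port_open "$host" "$port"; then
            echo "[OK] $host:$port reachable"
        else
            echo "[FAIL] $host:$port not reachable"
        fi
    done
}
```

Calling check_nfs_ports 192.168.1.10 prints one line per port. A pass here only shows that the port accepts connections; it says nothing about RPC registration, exports or permissions, which is where the remaining checks come in.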

Important note

NFS problems are very environment-dependent. Results can change because of firewall rules, server export options, NFS version, kernel client state, UID/GID mapping, root squash, ACLs, SELinux/AppArmor on the server, server load, or stale file handles that only appear during real use.

If the tool says no ESTALE happened, it only means the tool did not reproduce it during the test window.

Quick start (OCI image)

No compilation needed:

docker run --rm --privileged ghcr.io/lsferreira42/nfsdiag 192.168.1.10

The image is published to ghcr.io/lsferreira42/nfsdiag as :latest and :vX.Y.Z on each release.

Build requirements

Debian / Ubuntu:

sudo apt-get install -y build-essential pkg-config libtirpc-dev nfs-common

Fedora / RHEL:

sudo dnf install -y gcc make pkgconf-pkg-config libtirpc-devel nfs-utils

Build

make                    # build
make check              # unit tests plus CLI self-check
make sbom               # minimal SPDX-style SBOM in build/
sudo make install       # install binary, man page and shell completions to /usr/local

Override prefix:

make PREFIX=/opt/nfsdiag install

Manual compile:

gcc -O2 -Wall -Wextra -D_GNU_SOURCE -I/usr/include/tirpc \
    src/main.c src/mount.c src/network.c src/report.c \
    src/rpc.c src/stats.c src/tests.c src/validation.c \
    -ltirpc -o nfsdiag

Packaging

make deb        # Debian/Ubuntu .deb → build/
make rpm        # Fedora/RHEL .rpm   → build/
make apk        # Alpine .apk (needs Docker) → build/
make packages   # all three

Additional packaging templates live under packaging/:

packaging/Dockerfile for the OCI image
packaging/homebrew/nfsdiag.rb
packaging/aur/PKGBUILD
packaging/nix/flake.nix

Pre-built binaries (amd64 and arm64), packages, SBOM, checksums and provenance are attached to GitHub releases.

Basic usage

sudo ./nfsdiag 192.168.1.10          # full diagnostic
./nfsdiag --verbose 192.168.1.10     # show all steps
./nfsdiag --no-mount 192.168.1.10    # network/RPC only, no mounts
sudo ./nfsdiag --export /data 192.168.1.10   # one export only
sudo ./nfsdiag --read-only 192.168.1.10      # skip write/create tests
sudo ./nfsdiag --dry-run 192.168.1.10        # print what would run, do nothing
./nfsdiag --self-test                         # local dependency/helper checks

Profiles bundle common option sets (quick, safe, full, performance, security, readonly):

sudo ./nfsdiag --profile quick 192.168.1.10
sudo ./nfsdiag --profile safe 192.168.1.10
sudo ./nfsdiag --profile full 192.168.1.10
sudo ./nfsdiag --profile performance 192.168.1.10
sudo ./nfsdiag --profile security 192.168.1.10

Output formats

Default output is tagged text ([OK], [WARN], [FAIL], [INFO]).

Summary table (box-drawing, per-export columns):

sudo ./nfsdiag --output-format=table 192.168.1.10

Streaming NDJSON (one JSON object per event — ideal for log pipelines):

sudo ./nfsdiag --output-format=ndjson 192.168.1.10 | jq 'select(.level=="fail")'

Prometheus / OpenMetrics (emitted at end of run):

sudo ./nfsdiag --output-format=prometheus 192.168.1.10

JSON and HTML reports

JSON to stdout (diagnostic text suppressed):

./nfsdiag --json 192.168.1.10

JSON to file (diagnostic text still on stdout):

./nfsdiag --json=report.json 192.168.1.10

HTML report:

./nfsdiag --html=report.html 192.168.1.10

Suppress stdout when writing to file:

./nfsdiag --quiet --json=report.json 192.168.1.10

Reports include tool version, host, timestamp, system info, per-export results (NFS version, latency, throughput, ACLs), global events, and recommendations. The JSON schema includes schema_version, timestamp_iso8601, duration_sec, event category, stable check_id, severity, and remediation text.

Evidence bundle:

sudo ./nfsdiag --output-dir ./nfsdiag-report 192.168.1.10

This writes JSON, HTML, evidence text and SHA256SUMS for the generated files.

Compare two JSON reports:

./nfsdiag diff before.json after.json

Watch mode

Re-run diagnostics every N seconds (Ctrl-C to stop):

sudo ./nfsdiag --watch 60 192.168.1.10

The terminal is cleared between iterations. All pending mounts are cleaned up on SIGINT.

Multi-host batch

Run against a list of hosts:

sudo ./nfsdiag --hosts-file /etc/nfs-servers.txt --json=audit.json

File format: one host per line; lines starting with # are comments. Use --delay-ms to rate-limit between hosts.
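
A hosts file in the format described above might look like the following sketch. The addresses, the file path and the 250 ms delay are placeholders. The grep line only previews which lines are not comments; it is not part of nfsdiag.

```bash
# Write an example hosts file (names and addresses are placeholders).
hosts_file=${HOSTS_FILE:-./nfs-servers.txt}
cat > "$hosts_file" <<'EOF'
# production filers
192.168.1.10
nfs-server.example.com
# staging
192.168.1.20
EOF

# Preview the entries that are not comment lines.
grep -v '^#' "$hosts_file"

# Then run the batch (commented out here; needs nfsdiag and root):
# sudo ./nfsdiag --hosts-file "$hosts_file" --delay-ms 250 --json=audit.json
```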

On-failure hook

Execute a script whenever any test fails:

sudo ./nfsdiag --on-fail-exec /usr/local/bin/alert.sh 192.168.1.10

The script receives: NFSDIAG_HOST, NFSDIAG_LEVEL, NFSDIAG_FAIL_COUNT, NFSDIAG_WARN_COUNT. It is invoked through a resolved trusted path with a minimal environment, never via a shell.
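
A minimal hook could look like the sketch below. It assumes the four NFSDIAG_* values arrive as environment variables, which the text above suggests but does not state outright. The function name format_alert and the message layout are invented for the example.

```bash
#!/bin/sh
# Hypothetical /usr/local/bin/alert.sh for --on-fail-exec.
# Assumes the NFSDIAG_* values arrive as environment variables.

# Build one log line from the values nfsdiag passes in.
format_alert() {
    printf 'nfsdiag: host=%s level=%s fail=%s warn=%s\n' \
        "${NFSDIAG_HOST:-unknown}" "${NFSDIAG_LEVEL:-unknown}" \
        "${NFSDIAG_FAIL_COUNT:-0}" "${NFSDIAG_WARN_COUNT:-0}"
}

# The hook runs with a minimal environment, so rely on shell built-ins
# or absolute paths instead of whatever PATH happens to contain.
format_alert >&2
```

Because the script is executed directly and not through a shell, it needs a shebang line and the execute bit. It must also pass the ownership and permission checks listed under Security notes; installing it with sudo install -o root -g root -m 0755 alert.sh /usr/local/bin/alert.sh satisfies both.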

Config file

Persist options in a key=value file:

sudo ./nfsdiag --config /etc/nfsdiag.conf 192.168.1.10

Example nfsdiag.conf:

timeout = 10
bench_bytes = 8388608
uid = 1000
gid = 1000

CLI flags override config-file values.

UID/GID and permission tests

sudo ./nfsdiag --uid 1000 --gid 1000 192.168.1.10
sudo ./nfsdiag --uid 1000 --gid 1000 --uid 65534 --gid 65534 192.168.1.10
sudo ./nfsdiag --uid 1000 --gid 1000 --groups 10,20,30 192.168.1.10

Performance and stale handle tests

sudo ./nfsdiag --bench-bytes 167772160 192.168.1.10
sudo ./nfsdiag --bench-iterations 500 192.168.1.10
sudo ./nfsdiag --bench-type=fio 192.168.1.10       # requires fio installed
sudo ./nfsdiag --stale-iterations 1000 192.168.1.10

Safety options

sudo ./nfsdiag --command-timeout 15 192.168.1.10
sudo ./nfsdiag --delay-ms 500 192.168.1.10
sudo ./nfsdiag --mount-namespace 192.168.1.10      # explicit namespace
sudo ./nfsdiag --no-mount-namespace 192.168.1.10   # opt out of automatic namespace
sudo ./nfsdiag --dangerous-fs-tests 192.168.1.10   # enable symlink/hardlink/FIFO/device probes
sudo ./nfsdiag --allow-risky-mount-options -o exec 192.168.1.10

Network/protocol options

./nfsdiag --no-mount --udp 192.168.1.10
./nfsdiag --ipv4-only --no-mount 192.168.1.10
./nfsdiag --ipv6-only --no-mount nfs-server.example.com
sudo ./nfsdiag --no-nfs4-discovery 192.168.1.10

Shell completions

sudo make install already installs the bash, zsh, and fish completions. To use them from a source checkout without installing, load them manually:

source completions/nfsdiag.bash          # bash
fpath=(completions $fpath)               # zsh (add to .zshrc before compinit)
cp completions/nfsdiag.fish ~/.config/fish/completions/   # fish

Man page

man docs/nfsdiag.8           # view locally

sudo make install installs the man page to the system man path.

Command line reference

Usage: nfsdiag [OPTIONS] <server-ip-or-hostname>

Diagnostic options:
  -e, --export PATH          Test only this export path (repeatable, up to 64)
  -o, --mount-options OPTS   Extra mount options passed to mount(8)
      --no-mount             Run network/RPC checks only; skip all mounts
      --dry-run              Print what would be done; skip mounts and fs tests
      --read-only            Do not create or write test files
      --uid UID              Simulate access as UID (repeatable, needs root)
      --gid GID              GID paired with last --uid
      --groups G1,G2         Supplemental GIDs for UID/GID simulation
      --krb5                 Check Kerberos prerequisites and test sec=krb5/krb5i/krb5p mounts
      --parallel N           Test up to N exports concurrently (1-32). Default: 1
      --sweep                Benchmark rsize/wsize/nconnect combos and suggest mount options
      --diff-baseline        Compare with the last saved run for this host, then update it
      --udp                  Also probe RPC NULLPROC over UDP
      --ipv4-only            Force IPv4 for direct TCP checks
      --ipv6-only            Force IPv6 for direct TCP checks
      --no-nfs4-discovery    Disable NFSv4 pseudo-root fallback
      --mount-namespace      Use private mount namespace (needs root/CAP_SYS_ADMIN)
      --no-mount-namespace   Disable automatic private mount namespace
      --dangerous-fs-tests   Enable symlink/hardlink/FIFO/device-node probes
      --allow-risky-mount-options
                              Permit risky mount options such as exec/suid/dev
                              and skip the default nosuid,nodev,noexec hardening
      --profile NAME         quick, safe, full, performance, security, readonly
      --hosts-file FILE      Read one host per line from FILE
      --watch SEC            Re-run diagnostics every SEC seconds until Ctrl-C
      --on-fail-exec SCRIPT  Execute SCRIPT via trusted path when any test fails
      --config FILE          Load options from FILE (key=value) before CLI args

Timeout options:
      --timeout SEC          Network/RPC connect timeout. Default: 5
      --command-timeout SEC  Timeout for mount/umount commands. Default: 30
      --fs-timeout SEC       Timeout for each filesystem test group. Default: 30
      --delay-ms MS          Delay between testing each export (rate limit). Default: 0

Benchmark options:
      --bench-bytes BYTES    Bytes for read/write benchmark. Default: 4194304
      --bench-iterations N   Metadata latency iterations. Default: 10
      --bench-type TYPE      Benchmark engine: 'internal' or 'fio'. Default: internal
      --stale-iterations N   ESTALE probe loop iterations. Default: 100

Output options:
      --json[=PATH]          Emit JSON report to PATH (use '-' or omit for stdout)
      --html[=PATH]          Emit HTML report to PATH (use '-' or omit for stdout)
      --output-dir DIR       Write JSON, HTML, evidence and checksums to DIR
      --output-format FMT    Terminal output format: text (default), table, ndjson, prometheus, junit
      --listen PORT          Serve Prometheus metrics over HTTP on PORT;
                              re-runs diagnostics every --watch SEC (default 60)
      --keep-temp            Keep temp workspace after tests
  -v, --verbose              Show all diagnostic steps
  -q, --quiet                Suppress stdout (combine with --json=FILE or --html=FILE)
  -V, --version              Print version and exit
      --self-test            Validate local dependencies and helper checks
  -h, --help                 Show this help

Exit codes: 0=pass  1=warn/fail  2=usage/runtime error

Stdout suppression: active only when --json=- or --html=- (report to stdout).
  Use --quiet to suppress stdout when writing a report to a file.

Exit codes

0: no warnings or failures
1: warning or failure found
2: usage error or local runtime error

Warnings return 1 because in automation they usually need attention.
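
In a cron job or CI step, the three exit codes can be handled as in the sketch below. The helper names describe_exit and run_and_describe are hypothetical, and the messages are only examples.

```bash
# Map an nfsdiag exit status to a short message.
# describe_exit is a hypothetical helper, not part of nfsdiag.
describe_exit() {
    case "$1" in
        0) echo "pass: no warnings or failures" ;;
        1) echo "attention: warning or failure found" ;;
        2) echo "error: usage or local runtime problem" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Run any command, describe its exit status, and pass the status on
# so the calling job still fails when nfsdiag does.
run_and_describe() {
    local rc
    "$@"
    rc=$?
    describe_exit "$rc"
    return "$rc"
}
```

For example: run_and_describe ./nfsdiag --quiet --no-mount --json=report.json 192.168.1.10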

Docker fixtures

The project has Docker fixtures to reproduce bad NFS situations.

make docker-build-all          # build all fixture images
make test-fixtures             # run all fixture tests
make test-fixture-root-squash  # run one fixture
make test-fixtures-list        # list available fixtures

Some tests need root for real kernel NFS mounts. If the host kernel cannot run NFS inside Docker, those cases are skipped.

Warning: fixture configurations use wildcard clients, insecure, and no_root_squash intentionally. These settings are test-only and must never be used in production.

Available fixtures: rpcbind-unreachable, nfs-port-unreachable, rpc-map-missing-nfs, mountd-unavailable, empty-exports, mount-denied, permission-denied, acl-unsupported, identity-denied, read-only-export, root-squash, locking-missing, stale-handle, slow-performance.

Security notes

nfsdiag is designed to run as root. Key mitigations:

Mount operations run in a private mount namespace to avoid polluting the global namespace.
Host, export path and mount options are validated before network or mount activity.
Exports are mounted with nosuid,nodev,noexec by default; risky options such as exec, suid or dev require --allow-risky-mount-options, which also skips that default hardening.
Identity simulation always resets supplemental groups so results reflect the simulated user, not root.
--on-fail-exec scripts and --config files are refused if not owned by root/current user or if group/world-writable.
Output from external commands is sanitised for terminal escape sequences before display.
Symlink, hardlink, FIFO and device-node probes require --dangerous-fs-tests.
External commands are resolved from trusted directories and run with a minimal environment.
Report files are created with O_NOFOLLOW and mode 0600.
Test file paths include cryptographically random bytes (getrandom()) to prevent symlink attacks.
XDR strings from the server are sanitised for control characters before display.
HTML reports include a Content-Security-Policy header; all server-supplied strings are HTML-escaped.
TMPDIR is validated for ownership and world-writability before use.
Child processes that simulate UID/GID clear ambient capabilities before setuid().
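
The ownership and permission rule for --on-fail-exec scripts and --config files can be checked up front with a sketch like the one below. It mirrors the rule as stated in the list above and is not nfsdiag's own code. The helper name hook_file_ok is hypothetical, and the stat -c syntax assumes GNU coreutils on Linux.

```bash
# Return 0 if FILE would pass the documented rule: owned by root or the
# current user, and not group- or world-writable.
# Returns 1 if the rule is violated, 2 if the file cannot be inspected.
hook_file_ok() {
    local file=$1 owner mode
    owner=$(stat -c '%u' "$file") || return 2
    mode=$(stat -c '%a' "$file") || return 2
    # Owner must be uid 0 or the invoking user.
    [ "$owner" -eq 0 ] || [ "$owner" -eq "$(id -u)" ] || return 1
    # Octal 022 covers the group-write and other-write bits.
    [ $(( 0$mode & 022 )) -eq 0 ]
}
```

For example, hook_file_ok /etc/nfsdiag.conf || echo "nfsdiag would refuse this file" catches a loose mode before a scheduled run fails on it.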

To avoid creating test files in exports, use --read-only.

Version bumping

make bump-version-bugfix   # 0.5.0 → 0.5.1
make bump-version-minor    # 0.5.0 → 0.6.0
make bump-version-major    # 0.5.0 → 1.0.0

Each target updates VERSION, src/nfsdiag.h, and all packaging files atomically.

Limitations

ESTALE only appears if the handle becomes stale during the test window
SELinux/AppArmor problems can look like generic permission denied
ACL info depends on what the NFS client exposes
Performance numbers are smoke-test values, not full benchmarks
Docker fixture tests depend on host kernel and Docker privileges
