feat(runtime): add configurable rlimits (nofile/nproc) for jobs#73
Open
luthermonson wants to merge 1 commit into
Open
feat(runtime): add configurable rlimits (nofile/nproc) for jobs#73luthermonson wants to merge 1 commit into
luthermonson wants to merge 1 commit into
Conversation
Adds [runtime.rlimits] config block with `nofile` and `nproc` keys. Defaults to 1024/1024 — same as containerd's built-in OCI spec — so an empty config is a no-behavior-change. Higher values let build tools call `ulimit -n N` up to the configured ceiling without needing CAP_SYS_RESOURCE, which we deliberately don't grant. The Linux spec helper replaces (not appends) `spec.Process.Rlimits` so the containerd-default RLIMIT_NOFILE entry from `oci.WithDefaultSpecForPlatform` doesn't end up duplicated alongside our own. Both soft and hard are set equal; raising the hard limit from inside the container requires CAP_SYS_RESOURCE which is intentionally not in containerCapabilities. On Windows and macOS the helper is a no-op — HCS and Vz use different resource-limit models that are configured at the VM/utility-VM level. Local CI notes: lint and tests pass cross-compiled for linux. `mage test` on this Windows host hits the documented pkcs11/ocicrypt cgo preprocessing failure for packages that transitively import it (cmd/ephemerd, pkg/containerd, pkg/dind, pkg/workflow); GOOS=linux go test -run xxx compiles them clean.
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a
[runtime.rlimits]block toconfig.tomlthat lets operators raise the per-job container'sRLIMIT_NOFILEandRLIMIT_NPROCabove containerd's default 1024.nofile = 1024,nproc = 1024— exactly what containerd'sWithDefaultSpecForPlatformalready produces, so an empty config is a no-behavior-change.ulimit -n 4096) were failing with Operation not permitted because raising the hard limit from inside the container needsCAP_SYS_RESOURCE, whichcontainerCapabilitiesdeliberately omits. Setting the hard limit higher in the OCI spec letsulimitsucceed without granting the cap.Implementation
pkg/config/config.go— newRuntimeConfig{Rlimits RuntimeRlimits{Nofile, Nproc int64}}withResolved()filling defaults for zero/negative values.pkg/runtime/runtime.go— newRlimits config.RuntimeRlimitsfield onruntime.Config, wired into the OCI opts slice next toseccompOpts().pkg/runtime/rlimits_linux.go— replaces (not appends)spec.Process.Rlimitsso the containerd-defaultRLIMIT_NOFILE=1024entry fromWithDefaultSpecForPlatformdoesn't end up duplicated.pkg/runtime/rlimits_other.go— no-op for Windows / macOS. HCS-isolated Windows containers and Vz Linux/macOS VMs use different resource-limit models that are configured at the VM level, not in the OCI process spec.cmd/ephemerd/main.go— passesRlimits: cfg.Runtime.Rlimits.Resolved()at bothruntime.Newsites.Usage
Test plan
go test ./pkg/config/...— 6 new tests forResolved()defaults, explicit values, negative fallback, mixed zero/explicit, TOML round-trip, omitted block.go test ./pkg/runtime/...— 4 new linux-only tests covering applied values, defaults-when-zero, duplicate-rlimit avoidance, and nil-Process safety.GOOS=linux ./bin/golangci-lint run ./...→ 0 issues.mage build:windowssucceeds end-to-end (700 MB ephemerd.exe + embedded linux binary).nofile = 4096and confirmulimit -ninside a job returns 4096 (noOperation not permittedfromulimit -n 2048calls).Local CI notes
mage lintand the build-side ofmage testtrip the documentedmiekg/pkcs11cgo preprocessing failure on Windows hosts (AGENTS.md); lint and test compile clean underGOOS=linux, which is what the GitHub Actions CI runs.