fix: address 10 security, stability, and correctness issues#511
fix: address 10 security, stability, and correctness issues#511
Conversation
Add json:"-" tags to prevent credential leak via the config API endpoint. Without this, an authenticated user (or anyone when auth is off) can retrieve the bcrypt hash and JWT signing key, enabling token forgery. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Use sync.Once to protect the done channel close, preventing a runtime panic when multiple goroutines (shutdown signal, pprof error, web server error) attempt to close it concurrently. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Protect job map access in dockerContainersUpdate and iniConfigUpdate with a sync.RWMutex. Without this, simultaneous Docker events and config file reloads cause fatal concurrent map read/write panics. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Replace *BareJob type assertion with a DependencyProvider interface so that ExecJob, RunJob, LocalJob, and all types embedding BareJob have their depends-on/on-success/on-failure configurations correctly collected. Previously the type assertion always failed for concrete job types, silently breaking the entire DAG/workflow feature. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Replace context.Background() with ctx.Ctx in ExecJob, RunJob, and RunServiceJob so that scheduler shutdown, job removal, and max-runtime cancellation properly propagate to Docker API calls instead of being silently ignored. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Always require CSRF tokens for login requests regardless of the X-Requested-With header. The previous bypass allowed cross-origin requests to skip CSRF validation entirely by setting this freely controllable header. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Add lastAccess tracking to RateLimiter and implement CleanupOldLimiters with a maxAge parameter. Previously the limiters map grew unbounded as new client IPs made login attempts, enabling memory exhaustion attacks. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Group hooks by priority and execute groups sequentially (lower priority first). Hooks within the same priority group still run concurrently. Previously all hooks ran concurrently regardless of priority, defeating the purpose of the priority system and causing the scheduler to shut down simultaneously with the HTTP server. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Previously watchContainer returned nil when a service task exited with a non-zero code, silently reporting success. Now it returns a proper NonZeroExitError, matching the behavior of ExecJob and RunJob. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Ignore X-Forwarded-For and X-Real-IP when the direct connection is not from a loopback address. This prevents attackers from spoofing IPs to bypass per-IP rate limiting. Connections from local reverse proxies (127.0.0.1, ::1) continue to trust forwarded headers. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
✅ Mutation Testing ResultsMutation Score: 71.88% (threshold: 60%)
What is mutation testing?Mutation testing measures test quality by introducing small changes (mutations) to the code and checking if tests detect them. A higher score means better test effectiveness.
|
There was a problem hiding this comment.
Pull request overview
Addresses a set of security, stability, and correctness issues across the web auth layer, scheduler/workflow wiring, Docker job execution, and daemon/config concurrency.
Changes:
- Harden web auth and request handling (CSRF enforcement, rate-limiter cleanup, and forwarded-IP trust rules) and update/add tests accordingly.
- Fix core correctness around workflow dependency extraction, Docker context cancellation propagation, shutdown hook ordering, and swarm service exit-code handling.
- Improve daemon/config robustness (idempotent done-channel close, mutex-protected config job maps, and updated tests).
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| web/middleware.go | Rate limiter now extracts host from RemoteAddr and only trusts XFF from loopback. |
| web/middleware_mutation_test.go | Align tests with host-only rate-limit keying and trusted-loopback XFF behavior. |
| web/auth_secure.go | Implement limiter cleanup with last-access tracking; enforce CSRF token for all login requests; tighten client IP extraction. |
| web/auth_secure_ext_test.go | Add/adjust tests for CSRF bypass removal, limiter cleanup, and trusted proxy handling in getClientIP. |
| web/auth_secure_coverage_unit_test.go | Update cleanup tests for new CleanupOldLimiters signature and behavior. |
| web/auth_integration_test.go | Fetch CSRF token via /api/csrf-token and use it for /api/login integration flows. |
| web/missing_coverage_test.go | Update cleanup invocation to pass maxAge. |
| core/workflow.go | Replace *BareJob assertion with DependencyProvider interface to support embedded BareJob types. |
| core/workflow_test.go | Update/add tests ensuring dependencies are collected for embedded BareJob job types (incl. ExecJob). |
| core/execjob.go | Propagate ctx.Ctx to provider instead of context.Background(). |
| core/runjob.go | Propagate ctx.Ctx through Docker operations for cancellation correctness. |
| core/runservice.go | Propagate ctx.Ctx; return NonZeroExitError for non-zero swarm task exit codes. |
| core/runservice_unit_test.go | Add tests for watchContainer/run non-zero exit behavior (needs loop-var capture fix for parallel subtests). |
| core/context_propagation_test.go | New tests verifying ctx propagation into provider calls for ExecJob/RunJob/RunServiceJob. |
| core/bare_job.go | Add dependency accessors to support DependencyProvider interface. |
| core/shutdown.go | Change shutdown hook execution to run concurrently within priority groups and sequentially between groups. |
| core/shutdown_test.go | Add tests for priority-group ordering and same-priority concurrency. |
| cli/daemon.go | Prevent double-close panics on done channel via sync.Once + helper. |
| cli/daemon_closedone_test.go | New tests verifying closeDone is safe concurrently and sequentially. |
| cli/config.go | Add RWMutex to protect job maps; lock around label/INI sync and use scheduler accessors for middleware updates. |
| cli/config_test.go | Fix loop variable capture and avoid deep-equality on Config containing a mutex by comparing relevant fields. |
| cli/config_security_test.go | Ensure JSON marshaling of Global config does not leak secret fields. |
| cli/config_concurrent_test.go | New race-focused tests validating locked concurrent access patterns to job maps. |
Comments suppressed due to low confidence (1)
core/shutdown.go:153
- When shutdown hooks return errors, Shutdown() currently wraps them with ErrShutdownTimeout ("shutdown timed out") even though no timeout occurred. This makes it hard for callers to distinguish hook failures from an actual timeout and is semantically incorrect. Consider returning a dedicated hook-failure sentinel (or errors.Join of the hook errors) and reserve ErrShutdownTimeout only for ctx deadline expiry.
sm.logger.Info("Graceful shutdown completed successfully")
if len(shutdownErrors) > 0 {
return fmt.Errorf("%w: %d errors occurred", ErrShutdownTimeout, len(shutdownErrors))
}
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a significant upgrade to the codebase's resilience and security posture. It systematically addresses a total of ten identified issues across security, stability, and correctness domains. The changes ensure that sensitive data is protected, concurrent operations are safe, and critical system behaviors, such as job dependencies and graceful shutdowns, function as expected under various conditions. The thorough test coverage provides confidence in the reliability of these improvements. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Add tc := tc before t.Run to prevent race on the loop variable when subtests call t.Parallel(). Without this, parallel subtests may observe the wrong tc values (typically the last iteration). Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Replace blocking wg.Wait() with a select on both wg completion and ctx.Done() per priority group. This ensures ErrShutdownTimeout is returned within the configured deadline even when a hook ignores its context and blocks indefinitely. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
…only Add web-trusted-proxies config option (CLI: --web-trusted-proxies, env: OFELIA_WEB_TRUSTED_PROXIES) accepting CIDRs like "172.17.0.0/16". When set, X-Forwarded-For is trusted from both loopback and the configured ranges. This supports reverse proxies in other containers or on non-loopback host IPs (e.g. nginx/traefik in Docker networks). Loopback addresses remain implicitly trusted regardless of config. Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive set of fixes for 10 distinct security, stability, and correctness issues. The changes are well-structured, and each fix is accompanied by dedicated tests, many of which follow a TDD approach. Key improvements include patching a credential leak via the API, fixing a CSRF bypass, preventing a rate-limiter DoS, and hardening against IP spoofing. Stability is enhanced by preventing double-close panics and concurrent map access crashes. Correctness is improved by fixing workflow dependencies, ensuring context propagation for cancellation, respecting shutdown hook priorities, and correctly handling Swarm exit codes. My review found one area for improvement regarding the consistency of an IP spoofing fix. Overall, this is an excellent and impactful set of changes.
Note: Security Review did not run due to the size of the PR.
- Remove Go 1.22+ unnecessary tc := tc rebinding (copyloopvar) - Fix import ordering (gci) - Use integer range loops (intrange) - Fix fatcontext: avoid reassigning context variable - Use wrapped static error for ParseTrustedProxies (err113) - Use require instead of assert for error checks (testifylint) - Guard against nil ctx.Ctx in ExecJob/RunJob/RunServiceJob.Run() to prevent panic in tests that construct Context manually Signed-off-by: Sebastian Mendel <sebastian.mendel@netresearch.de> Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
- Use integer range (Go 1.22+) in daemon_closedone_test.go - Fix gci formatting (missing spaces) in context_propagation_test.go - Add nolint:fatcontext for intentional context captures in test assertions Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
Aligns the rate limiter's forwarded header handling with getClientIP in auth_secure.go, which already checks both X-Forwarded-For and X-Real-IP from trusted proxies. Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
|
🚀 Released in v0.21.2 Thank you for your contribution! 🙏 This is now available in the latest release. Please test and verify everything works as expected in your environment. If you encounter any issues, please open a new issue. |
Summary
Comprehensive codebase review uncovered 10 security, stability, and correctness issues. Each fix has dedicated tests written first (TDD), all passing with
-racedetection.Security fixes (4)
json:"-"onWebPasswordHash/WebSecretKeyto hide them from/api/configX-Requested-Withheader bypass that allowed skipping CSRF validationCleanupOldLimiterswithlastAccesstracking (was an empty stub)X-Forwarded-For/X-Real-IPfrom loopback addressesStability fixes (2)
sync.Onceon daemondonechannel to prevent crash when multiple goroutines detect errorssync.RWMutexon Config job maps to prevent crash when Docker events race with config reloadCorrectness fixes (4)
DependencyProviderinterface instead of*BareJobtype assertion sodepends-on/on-success/on-failurework for ExecJob, RunJob, etc.ctx.Ctxinstead ofcontext.Background()to Docker API callsNonZeroExitErrorfor non-zero service task exit codesTest plan
go test -race ./core/ ./web/ ./cli/ ./config/ ./middlewares/ ./metrics/— all passgo vet ./...— clean