Auto-reload pipelines on TLS certificate rotation #18978
kaisecheng merged 37 commits into elastic:main
This pull request does not have a backport label. Could you fix it @kaisecheng? 🙏
Force-pushed 0090a90 to db336a9.
Pull request overview
This PR adds automatic pipeline reloads when TLS certificate/key/CA files change on disk, enabling hot-reload of pipelines affected by certificate rotation (including Kubernetes-style symlink swaps) without a Logstash restart.
Changes:
- Introduces a Java `FileWatchService` (NIO `WatchService`) with per-file callbacks and accompanying unit tests.
- Adds a Ruby `LogStash::SslFileTracker` to register/deregister SSL-related paths per pipeline and detect staleness via checksum (regular files) or mtime polling (symlinks), with specs.
- Wires the tracker into `LogStash::Agent` and all pipeline lifecycle actions; adds QA integration coverage including a TLS-enabled Elasticsearch fixture/service.
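The PR description (under `SslFileTracker`) says the tracker treats as SSL-related any plugin setting whose name starts with `ssl_` and validates as `:path`, plus an explicit allowlist covering settings such as certificate authorities and truststores. Below is a rough sketch of that selection. The schema shape (setting name to validator symbol), the helper name `ssl_paths_for`, and the allowlist contents are assumptions for illustration, not the tracker's real code.

```ruby
# Hypothetical allowlist: SSL settings that hold paths but do not validate
# as :path (for example a list of CA files). Contents are illustrative only.
SSL_PATH_ALLOWLIST = %w[ssl_certificate_authorities].freeze

# schema: setting name => validator symbol; params: configured values.
# Returns every configured SSL file path, flattened and deduplicated.
def ssl_paths_for(schema, params)
  names = schema.select { |name, validator| name.start_with?("ssl_") && validator == :path }.keys
  names |= SSL_PATH_ALLOWLIST.select { |name| schema.key?(name) }
  names.flat_map { |name| Array(params[name]) }.compact.uniq
end
```

Non-path `ssl_*` settings (such as booleans) and path settings without the `ssl_` prefix are ignored in this sketch.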
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| qa/integration/specs/tls_hot_reload_spec.rb | New integration suite covering reload behavior for regular files, symlinks, shared certs, and ES output CA changes. |
| qa/integration/services/service_locator.rb | Updates service class name resolution to support underscored service names. |
| qa/integration/services/http_proxy_service.rb | Renames service class to match updated locator naming (HttpProxyService). |
| qa/integration/services/elasticsearch_tls_service.rb | Adds a TLS-enabled Elasticsearch service wrapper for integration tests. |
| qa/integration/services/elasticsearch_setup.sh | Adds TLS-capable ES startup path and copies certs into ES config for entitlement constraints. |
| qa/integration/services/elasticsearch_teardown.sh | Cleans ES data/logs and TLS-specific artifacts/users on teardown. |
| qa/integration/framework/cert_helpers.rb | Adds helper functions to generate/write CA/leaf certs for integration testing. |
| qa/integration/fixtures/tls_hot_reload_spec.yml | Adds fixture definition enabling elasticsearch_tls alongside logstash. |
| logstash-core/src/main/java/org/logstash/common/FileWatchService.java | New core Java file watcher with callback dispatching and lazy watcher thread. |
| logstash-core/src/test/java/org/logstash/common/FileWatchServiceTest.java | JUnit tests validating modify/rename events, multi-callback behavior, deregistration, and multi-dir support. |
| logstash-core/lib/logstash/ssl_file_tracker.rb | New tracker that maps SSL file paths to pipelines and detects changes (checksum vs symlink mtime). |
| logstash-core/lib/logstash/agent.rb | Creates/owns watcher+tracker when auto-reload is enabled; checks for stale pipelines each converge and triggers reload actions. |
| logstash-core/lib/logstash/pipeline_action/create.rb | Registers SSL paths before pipeline start; deregisters on create failure. |
| logstash-core/lib/logstash/pipeline_action/reload.rb | Deregisters old pipeline paths, registers new pipeline paths, and cleans up on failed start. |
| logstash-core/lib/logstash/pipeline_action/recover.rb | Mirrors reload behavior for recover: deregister old, register new, deregister on failed start. |
| logstash-core/lib/logstash/pipeline_action/stop.rb | Deregisters SSL paths when stopping a pipeline. |
| logstash-core/lib/logstash/pipeline_action/stop_and_delete.rb | Deregisters SSL paths when stopping/deleting a pipeline. |
| logstash-core/lib/logstash/pipeline_action/delete.rb | Deregisters SSL paths after successful delete. |
| logstash-core/spec/logstash/ssl_file_tracker_spec.rb | New unit specs for registration, shared paths, stale detection, and symlink polling behaviors. |
| logstash-core/spec/logstash/agent_spec.rb | Adds specs for tracker presence/absence, SSL converge behavior, and converge-result merging. |
| logstash-core/spec/logstash/pipeline_action/create_spec.rb | Stubs agent.ssl_file_tracker and keeps existing create behavior tests passing. |
| logstash-core/spec/logstash/pipeline_action/reload_spec.rb | Adds expectations for tracker deregister/register ordering on reload success/failure. |
| logstash-core/spec/logstash/pipeline_action/stop_spec.rb | Stubs agent.ssl_file_tracker for updated stop action behavior. |
| logstash-core/spec/logstash/pipeline_action/stop_and_delete_spec.rb | Stubs agent.ssl_file_tracker for updated stop-and-delete action behavior. |
| logstash-core/spec/logstash/pipeline_action/delete_spec.rb | Stubs agent.ssl_file_tracker for updated delete action behavior. |
Force-pushed bfc0f16 to dc75eea.
```ruby
new_pipeline = LogStash::JavaPipeline.new(@pipeline_config, @metric, agent)
agent.ssl_file_tracker&.register(new_pipeline)
success = new_pipeline.start # block until the pipeline is correctly started or crashed
# Keep the SSL file registered on failure so subsequent certificate recovery can be detected. Do not deregister it.
```
We do not deregister SSL tracking on reload failure.
Reload may fail because the new TLS material is temporarily invalid. In that situation we still want to observe future file changes, because the next cert update may repair the problem.
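A minimal sketch of that ordering, using a hypothetical `FakeTracker` that only records calls and a stand-in pipeline whose `start` returns a boolean. It is not the actual `Reload` action. The `deregister(pipeline_id)` signature is an assumption. The point it illustrates: the new pipeline's paths stay registered whether or not `start` succeeds, so a later certificate fix can still trigger another reload.

```ruby
# Hypothetical stand-in that records tracker calls in order.
class FakeTracker
  attr_reader :calls

  def initialize
    @calls = []
  end

  def register(pipeline)
    @calls << [:register, pipeline.id]
  end

  def deregister(pipeline_id)
    @calls << [:deregister, pipeline_id]
  end
end

# Hypothetical pipeline: start returns whatever start_ok was set to.
FakePipeline = Struct.new(:id, :start_ok) do
  def start
    start_ok
  end
end

# Sketch of the reload ordering discussed above: drop the old pipeline's
# paths, register the new pipeline's paths, then start. On failure the new
# registration is intentionally left in place. A nil tracker is a no-op.
def reload_pipeline(tracker, old_pipeline, new_pipeline)
  tracker&.deregister(old_pipeline.id)
  tracker&.register(new_pipeline)
  new_pipeline.start
end
```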
andsel left a comment
Given the PR touches 3 layers, I've started with the first one, the file watch service. As we progress I'll look into the rest; otherwise the review becomes a nightmare.
Implement FileWatchService using Java NIO WatchService to detect file modifications via OS kernel notifications. Supports registering per-file callbacks within watched directories, lazy-starts the watcher thread on first registration, and deregisters watch keys when all files in a directory are removed.
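The real service is Java and relies on NIO `WatchService`. The Ruby model below only sketches the bookkeeping the commit describes: callbacks grouped per directory and per file name, a watcher started lazily on the first registration, and a directory entry dropped once its last file is deregistered. The class and method names are hypothetical, and event delivery is simulated by calling `dispatch` directly instead of receiving OS notifications.

```ruby
# Hypothetical Ruby model of the FileWatchService bookkeeping; no OS events.
class WatchRegistry
  attr_reader :started

  def initialize
    @dirs = {} # dir => { file_name => [callbacks] }
    @started = false
  end

  def register(path, &callback)
    @started ||= start_watcher # lazily "start" on the first registration
    files = (@dirs[File.dirname(path)] ||= {}) # a real service would add a watch key here
    (files[File.basename(path)] ||= []) << callback
  end

  def deregister(path)
    dir = File.dirname(path)
    return unless (files = @dirs[dir])
    files.delete(File.basename(path))
    @dirs.delete(dir) if files.empty? # a real service would cancel the watch key
  end

  # Simulates an event for `name` inside `dir`; only that file's callbacks run.
  def dispatch(dir, name)
    (@dirs.dig(dir, name) || []).each { |cb| cb.call(File.join(dir, name)) }
  end

  def watched_dirs
    @dirs.keys
  end

  private

  def start_watcher
    true # stands in for spawning the watcher thread
  end
end
```

Watching directories rather than files is what NIO `WatchService` requires, which is why the model keys everything by directory first.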
SslFileTracker wraps FileWatchService to track SSL certificate files across pipelines. It maintains reference counts per path so shared certs are only deregistered when the last pipeline releases them. Detection uses a hybrid strategy chosen at registration time:

- Regular files: OS kernel events via FileWatchService, gated by SHA-256 checksum to suppress spurious notifications
- Symlinks: mtime polling on each converge cycle, since NIO WatchService tracks the symlink pointer rather than the link target
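A minimal sketch of the hybrid detection, using a hypothetical `FileStamp` helper (not the tracker's actual code). The strategy is picked once with `File.symlink?`. Watched files compare a SHA-256 digest from `Digest::SHA256.file` when an event arrives. Symlinked files compare the target's mtime on each poll, because `File.mtime` follows symlinks.

```ruby
require "digest"

# Hypothetical helper illustrating the :watch vs :poll choice described above.
class FileStamp
  attr_reader :mode

  def initialize(path)
    @path = path
    @mode = File.symlink?(path) ? :poll : :watch # chosen once, at registration
    @stamp = current_stamp
  end

  # True only when the content (:watch) or the target's mtime (:poll) changed.
  # Usable from a watch callback and from the per-converge poll alike.
  def changed?
    latest = current_stamp
    return false if latest == @stamp
    @stamp = latest
    true
  end

  private

  def current_stamp
    if @mode == :watch
      Digest::SHA256.file(@path).hexdigest # spurious events leave this unchanged
    else
      File.mtime(@path) # stat follows the link, so this is the target's mtime
    end
  end
end
```

A touch that only updates mtime on a regular file is ignored, while a real content change is reported exactly once.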
Wire SslFileTracker into Agent so that when a TLS certificate file changes, the affected pipelines are automatically reloaded without requiring a full Logstash restart.

- Agent.execute injects SslFileTracker when auto-reload is enabled
- converge_state_and_update calls stale_pipelines on each cycle and queues Reload actions for pipelines with changed certs
- Pipeline actions (Create, Reload, Stop, Delete, StopAndDelete) call register/deregister on SslFileTracker to keep tracked paths in sync with the live pipeline set
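A rough sketch of how the converge step could turn stale pipeline IDs into reload actions. `ReloadAction` and `merge_ssl_reloads` are hypothetical; the real `converge_state_and_update` and pipeline action classes are not reproduced. The sketch also assumes, as its own design choice, that a pipeline already scheduled by the regular config diff, or no longer running, does not receive an extra `Reload`.

```ruby
require "set"

# Hypothetical action type standing in for the real Reload pipeline action.
ReloadAction = Struct.new(:pipeline_id)

# config_actions: actions from the normal config diff (respond to pipeline_id)
# stale_ids:      pipeline ids whose SSL files changed since the last cycle
# running_ids:    ids of pipelines currently running
def merge_ssl_reloads(config_actions, stale_ids, running_ids)
  scheduled = config_actions.map(&:pipeline_id).to_set
  extra = stale_ids.select { |id| running_ids.include?(id) && !scheduled.include?(id) }
  config_actions + extra.sort.map { |id| ReloadAction.new(id) }
end
```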
- regular file rotation triggers exactly one pipeline reload then stays stable
- symlink target swap is detected via mtime poll and reloads the pipeline
- rotating one pipeline cert does not reload the other (isolation)
- shared cert rotation reloads all pipelines referencing it
- ES output CA rotation reloads the pipeline and events continue flowing to ES
- invalid CA cert rotation triggers reload failure and stops sending events to ES
…ch cycle

Previously, refresh_pipeline_symlink_stamps compared each pipeline's registered baseline stamp against the latest stamp for every tracked SSL file on every converge cycle: O(pipelines × SSL files) work repeated each interval. Now, SslFileTracker maintains a @stale_pipeline_ids Set. Both :watch paths (FileWatchService callbacks) and :poll paths (symlink mtime polls) write directly to this Set when a change is detected. Each converge reads the Set in O(1) instead of scanning all stamps.
Two plugins referencing the same cert file via different path styles (one relative, one absolute) would previously create two separate @watched_files entries and register the file with FileWatchService twice. ssl_file_paths now expands every path with File.expand_path before deduplication, so the same file is always tracked as a single entry regardless of how it was declared in config.
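A small sketch of that normalisation, with a hypothetical `dedupe_ssl_paths` helper. `File.expand_path` resolves a relative path against a base directory (the process working directory by default) and collapses `.`/`..` segments, so both spellings map to one key. It does not resolve symlinks, so a symlinked path keeps its own identity.

```ruby
# Hypothetical helper: normalise paths before deduplicating them.
def dedupe_ssl_paths(paths, base_dir = Dir.pwd)
  paths.map { |p| File.expand_path(p, base_dir) }.uniq
end
```

For example, with `base_dir = "/etc/logstash"`, the entries `certs/ca.crt`, `./certs/../certs/ca.crt` and `/etc/logstash/certs/ca.crt` all collapse to `/etc/logstash/certs/ca.crt`.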
…d stamps
- Remove baseline check from watch callback
- Replace @registered_stamps with @id_paths ({ id => [path] }); baseline stamps are no longer needed
- Rename @stale_pipeline_ids to @stale_ids
- Rename @watched_files to @path_watched
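The tracker state after this refactor can be sketched as follows. Only the instance-variable names come from the commit. The class name, method names, return values, and the way callbacks reach `mark_changed` are assumptions. Reference counting falls out of `@path_watched`: a path stays tracked while at least one pipeline ID still uses it, and a change marks every owning ID in `@stale_ids` without scanning the other pipelines.

```ruby
require "set"

# Hypothetical sketch of the tracker state named in the refactor commit.
class TrackerState
  def initialize
    @path_watched = {}   # path => Set of pipeline ids using it
    @id_paths = {}       # id => [path]
    @stale_ids = Set.new # ids whose files changed since the last drain
  end

  # Returns the paths tracked for the first time (these would need a watcher).
  def register(id, paths)
    @id_paths[id] = paths
    paths.select do |path|
      owners = (@path_watched[path] ||= Set.new)
      first = owners.empty?
      owners << id
      first
    end
  end

  # Returns the paths no pipeline uses anymore (their watchers can be removed).
  def deregister(id)
    released = []
    (@id_paths.delete(id) || []).each do |path|
      owners = @path_watched[path]
      owners.delete(id)
      next unless owners.empty?
      @path_watched.delete(path)
      released << path
    end
    @stale_ids.delete(id)
    released
  end

  # Called from a watch callback or a poll once a path's stamp has changed.
  def mark_changed(path)
    @stale_ids.merge(@path_watched.fetch(path, Set.new))
  end

  # Returns and clears the stale ids (read once per converge cycle).
  def drain_stale_ids
    ids = @stale_ids.to_a
    @stale_ids.clear
    ids
  end
end
```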
Without this, a failed reload stops future cert change recovery because the tracker is deregistered. Added integration test for invalid cert recovery. Shortened the integration test wait time.
Add a new `ssl.reload.automatic` boolean (default false) that controls whether Logstash watches SSL cert and key files referenced by pipelines and triggers a pipeline reload when they change. The watcher and file tracker only start when both `ssl.reload.automatic` and `config.reload.automatic` are enabled, or when Centralized Pipeline Management is in use (CPM flips config.reload.automatic on at boot). A bootstrap check fails fast if `ssl.reload.automatic: true` is set without a compatible reload mode. The setting is exposed through the docker env2yaml mapping and documented in the sample logstash.yml.
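A minimal sketch of the enablement rule and the fail-fast check, assuming settings are available as a plain hash keyed by setting name. The real code uses Logstash's settings objects and bootstrap-check classes, which are not reproduced here. The error class and message are made up. Because CPM turns `config.reload.automatic` on at boot, the sketch accepts `xpack.management.enabled` as an alternative way to satisfy the requirement.

```ruby
# Hypothetical error class; the real bootstrap check raises its own error type.
class SslReloadConfigError < StandardError; end

# True when the file watcher and SSL tracker should be created.
def ssl_tracking_enabled?(settings)
  settings.fetch("ssl.reload.automatic", false) &&
    (settings.fetch("config.reload.automatic", false) ||
     settings.fetch("xpack.management.enabled", false))
end

# Fail fast when ssl.reload.automatic is requested without a reload mode.
def check_ssl_reload!(settings)
  return unless settings.fetch("ssl.reload.automatic", false)
  return if ssl_tracking_enabled?(settings)

  raise SslReloadConfigError,
        "ssl.reload.automatic requires config.reload.automatic or xpack.management.enabled"
end
```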
@andsel Changes after applying your suggestion
andsel left a comment
Left some suggestions to avoid exposing too much internal state, and some to improve readability.
This reverts commit 228e2ea.
…ine-reload-certs
@andsel this is ready for another look. Red CI is caused by the latest update of jruby-openssl
@kaisecheng I think we can rebase to
andsel left a comment
LGTM
Tested creating key and cert with

```shell
openssl req -x509 -newkey rsa:2048 \
  -keyout key_new.pem -out cert_new.pem \
  -days 365 -nodes \
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"
```

and tinkered with copying files, overwriting symlinks, and invalid certs/keys; it worked as expected.
Good work @kaisecheng 👏
One ask: rebase on main and confirm CI is green; then the PR is fine by me.
run exhaustive tests
Force-pushed a3658fb to 4c53832.
run exhaustive tests
Force-pushed 4c53832 to c4f0538.
Three cases cannot run reliably on Windows under JRuby:

* The poll-mode and shared-symlink contexts depend on `File.symlink` plus `File.utime` to advance mtime, which is unreliable on NTFS via JRuby.
* The relative-vs-absolute dedup spec uses `Pathname#relative_path_from`, which raises when Tempfile and the checkout live on different drives (Buildkite Windows runners put Tempfile on C: and the checkout on A:).
run exhaustive tests
💚 Build Succeeded
The fixes after approval are for test cases only.
Release notes
Logstash can automatically reload affected pipelines when TLS certificate files change on disk. This removes the need for a manual restart or explicit reload trigger after certificate rotation.
Adds opt-in automatic pipeline reload on TLS certificate rotation with a new `ssl.reload.automatic` option that works when `config.reload.automatic` is enabled and accepts the following values:

- `true`: reloads pipelines whose SSL certificate or key files have changed on disk
- `false` (default): does not watch SSL files

What does this PR do?

Introduces the new `ssl.reload.automatic` setting plus three layered components that together enable TLS certificate hot-reload:

1. `ssl.reload.automatic` setting

New boolean in `logstash.yml` (default `false`). When `true`, Logstash starts the file watcher and SSL tracker and reloads affected pipelines on certificate change. Requires `config.reload.automatic: true` or Centralized Pipeline Management (`xpack.management.enabled: true`) to be enabled.

2. `FileWatchService` (Java)

Uses Java NIO `WatchService` to detect file modifications. Supports per-file callbacks within watched directories. Lazily starts the watcher thread on first registration and cleans up watch keys when all files in a directory are deregistered.

When a target file changes, it fires callbacks that update the SHA-256 checksum.
3. `SslFileTracker` (Ruby)

Wraps `FileWatchService` and tracks which SSL cert/key paths belong to which pipelines. A path shared by multiple pipelines is only deregistered when the last pipeline releases it. Chooses the detection strategy at registration time:

- Regular files: OS kernel events via `FileWatchService`, gated by SHA-256 checksum to suppress duplicate notifications
- Symlinks: mtime polling on each converge cycle, since NIO `WatchService` tracks the symlink pointer rather than the link target. This covers the Kubernetes double-symlink scenario.

`SslFileTracker` registers SSL paths by scanning plugin config entries whose names start with `ssl_` and validate as `:path`, plus explicit allowlisted SSL path settings such as certificate authorities and truststores.

4. Agent wiring

- `Agent#initialize` constructs `FileWatchService` and `SslFileTracker` only when both `config.reload.automatic` and `ssl.reload.automatic` are enabled; otherwise `agent.ssl_file_tracker` is `nil` and all call sites become no-ops
- `converge_state_and_update` calls `refresh_pipeline_symlink_stamps` on each cycle and queues `Reload` actions for affected pipelines (`stale_pipeline_ids`)

Why is it important/What is the impact to the user?
Without this change, rotating a TLS certificate requires restarting Logstash or manually triggering a reload, which interrupts ingestion.

With this change, when `ssl.reload.automatic: true` is set and `config.reload.automatic: true` is enabled, Logstash detects the file change and automatically reloads only the affected pipelines. Pipelines using unrotated certs are unaffected.

The feature is off by default, so upgrading users see no behaviour change. Users who want it opt in via `ssl.reload.automatic: true` in `logstash.yml` (or `SSL_RELOAD_AUTOMATIC=true` for Docker).

For Kubernetes environments, cert-manager rotates certificates stored in `Secret`s, which are projected into pods as volume mounts updated through atomic symlink swaps. The symlink-aware mtime polling strategy handles this pattern explicitly.

Checklist

- I have made corresponding changes to the documentation

Author's Checklist
How to test this PR locally
```shell
echo "ssl.reload.automatic: true" >> config/logstash.yml
echo "config.reload.automatic: true" >> config/logstash.yml
```
Regular file rotation
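One way to exercise this case by hand, sketched in Ruby under assumptions: the target is whatever file a pipeline's SSL setting points at (for example a hypothetical `config/certs/server.crt`), and the new PEM was generated beforehand. Overwriting in place keeps the same file and produces a modify event. Writing a sibling temp file and renaming it over the target mimics tools that replace files atomically. Either way, the file content, and therefore its SHA-256 checksum, changes.

```ruby
require "fileutils"

# Overwrite the existing file's contents in place (same inode, modify event).
def rotate_in_place(new_cert, target)
  File.write(target, File.read(new_cert))
end

# Copy to a sibling temp file, then rename it over the target; rename is
# atomic on POSIX when both paths are in the same directory.
def rotate_atomically(new_cert, target)
  tmp = "#{target}.tmp"
  FileUtils.cp(new_cert, tmp)
  File.rename(tmp, target)
end
```

For example, `rotate_in_place("cert_new.pem", "config/certs/server.crt")`, using the hypothetical paths above.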
Symlink rotation (Kubernetes-style)
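A sketch of a Kubernetes-style swap, assuming the layout that `Secret` volume mounts use: `tls.crt` is a symlink to `..data/tls.crt`, and `..data` is itself a symlink to a versioned directory. Rotation writes a new directory, points a temporary link at it, and renames that link over `..data`. Helper and directory names are illustrative, and the kubelet's real writer does more (for example, cleaning up old directories). After the swap, `tls.crt` resolves to a newly written file, which is what the mtime polling picks up.

```ruby
require "fileutils"

# Create a versioned directory holding the cert and atomically repoint
# `..data` at it by renaming a temporary symlink over the old one.
def publish_version(mount, version, cert_pem)
  dir = File.join(mount, "..#{version}")
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "tls.crt"), cert_pem)
  tmp_link = File.join(mount, "..data_tmp")
  File.symlink("..#{version}", tmp_link)            # relative target inside the mount
  File.rename(tmp_link, File.join(mount, "..data")) # rename(2) replaces the old link
end

# One-time setup of the user-facing link: tls.crt -> ..data/tls.crt
def link_cert(mount)
  File.symlink(File.join("..data", "tls.crt"), File.join(mount, "tls.crt"))
end
```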
Related issues