Fix check config and take over interaction#49632
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GenAI-Assisted: Yes Human-Reviewed: Yes Tool: Cursor-CLI, Model: GPT-5.3 Codex High
- Use files instead of adding the config in the middle of the test - Remove time.Sleep - Fix duplication count
GenAI-Assisted: Yes Human-Reviewed: Yes Tool: Calude, Model: Default Sonnet 4.6
Check whether the Filestream state already exists before taking over a Log input state. This replace the previous logic that relied on the Log input setting the TTL of un-used states to -2. GenAI-Assisted: Yes Human-Reviewed: Yes Tool: Calude, Model: Default Sonnet 4.6
…-and-take-over-interaction
🤖 GitHub commentsJust comment with:
|
Ensure the whole TakeOver code runs holding a lock of the ephemeralStore, thus allowing for a consistent view of the data while the migration is happening.
…-and-take-over-interaction
This comment has been minimized.
This comment has been minimized.
|
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis pull request moves Filestream state takeover out of 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@filebeat/input/filestream/internal/input-logfile/input.go`:
- Around line 93-95: The takeover path returns early on error from
inp.prospector.TakeOver, but StopInput is only called later, so the input ID can
remain marked active; fix by deferring StopInput immediately before calling
inp.prospector.TakeOver (e.g., place a defer inp.StopInput() right before the
TakeOver call) so StopInput always runs on any early return; ensure you
capture/handle any error from StopInput (log or ignore) as appropriate while
preserving the original TakeOver error return and reference the symbols
inp.prospector.TakeOver, inp.StopInput, and inp.sourceIdentifier.ID when
locating the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: de328f37-0a30-434a-a445-df2fd35e6599
📒 Files selected for processing (13)
.buildkite/filebeat/filebeat-pipeline.ymlchangelog/fragments/1769099693-filestream-takeover-autodiscover-reingest.yamlfilebeat/input/filestream/internal/input-logfile/input.gofilebeat/input/filestream/internal/input-logfile/manager.gofilebeat/input/filestream/internal/input-logfile/manager_test.gofilebeat/input/filestream/internal/input-logfile/prospector.gofilebeat/input/filestream/internal/input-logfile/store.gofilebeat/input/filestream/internal/input-logfile/store_test.gofilebeat/input/filestream/prospector.gofilebeat/magefile.gofilebeat/tests/integration/autodiscover_test.gofilebeat/tests/integration/testdata/autodiscover/take-over-filestream-input-k8s.ymlfilebeat/tests/integration/testdata/autodiscover/take-over-log-input-k8s.yml
filebeat/tests/integration/testdata/autodiscover/take-over-filestream-input-k8s.yml
Outdated
Show resolved
Hide resolved
…-and-take-over-interaction
There was a problem hiding this comment.
Pull request overview
Fixes a Filestream take_over + autodiscover interaction where CheckConfig could trigger state migration to a temporary input ID, causing subsequent re-ingestion from offset 0. The PR moves takeover to the real input start, makes log-state takeover idempotent, and adds integration coverage.
Changes:
- Move Filestream takeover from prospector initialization to a dedicated
TakeOverstep invoked during input start (soCheckConfigcan’t migrate state). - Update registry takeover semantics to be idempotent (skip if Filestream state already exists; don’t delete Log input state).
- Add/extend unit + integration tests for autodiscover takeover and update CI pipeline to build a Docker image needed for Kind-based tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| filebeat/input/filestream/prospector.go | Extracts takeover into TakeOver() and stops doing it during Init(). |
| filebeat/input/filestream/internal/input-logfile/prospector.go | Extends Prospector interface with TakeOver() and updates docs/comments accordingly. |
| filebeat/input/filestream/internal/input-logfile/input.go | Invokes prospector.TakeOver(...) during Run(); defers StopInput. |
| filebeat/input/filestream/internal/input-logfile/manager.go | Tracks previousSrcIdentifiers and passes them into managed input for runtime takeover. |
| filebeat/input/filestream/internal/input-logfile/manager_test.go | Updates noop prospector to satisfy the new interface. |
| filebeat/input/filestream/internal/input-logfile/store.go | Changes takeover locking and semantics (don’t delete Log input states; skip takeover when Filestream key exists; resource cleanup fixes). |
| filebeat/input/filestream/internal/input-logfile/store_test.go | Adds tests for takeover-from-Log behavior and idempotency. |
| filebeat/tests/integration/autodiscover_test.go | Adds Kind-based integration test ensuring takeover does not re-ingest under autodiscover; adds helper functions. |
| filebeat/tests/integration/testdata/autodiscover/take-over-log-input-k8s.yml | New autodiscover config template for Log input. |
| filebeat/tests/integration/testdata/autodiscover/take-over-filestream-input-k8s.yml | New autodiscover config template for Filestream takeover input. |
| .buildkite/filebeat/filebeat-pipeline.yml | Builds Docker package image before running Filebeat Go integration tests. |
| changelog/fragments/1769099693-filestream-takeover-autodiscover-reingest.yaml | Adds changelog fragment for the bug fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…om:belimawr/beats into fix-check-config-and-take-over-interaction
|
@Mergifyio backport 9.2 9.3 |
✅ Backports have been createdDetails
Cherry-pick of 8a648cf has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Cherry-pick of 8a648cf has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally |
When using Filestream's take_over feature with autodiscover, files were being re-ingested from the beginning instead of continuing from the offset recorded by the Log input. Autodiscover validates each rendered configuration by instantiating the input with a temporary, suffixed ID before starting it. Because take_over ran during input initialisation, states were migrated to the temporary ID rather than the real input ID. When the real input started, the Log input states had already been consumed, so all files appeared new. The fix moves the take_over migration step from input initialisation to input start. This ensures that config validation (CheckConfig) never triggers state migration, and only the input that actually runs performs the takeover. Additionally, the Log input state is no longer deleted from the registry after migration. Instead, Filestream checks whether it already holds a state for the file before migrating, skipping the takeover if a state is found. This makes the mechanism idempotent and removes reliance on the TTL=-2 heuristic that was used to detect previously-migrated states. Last, but not least, a few other issues in the TakeOver implementation are also fixed: - Incorrect resource release - ephemeralStore is now locked throughout the whole TakeOver duration GenAI-Assisted: Yes Human-Reviewed: Yes Tool: Claude-CLI, Model: Claude 4.6 Opus (Thinking) Tool: Cursor-CLI, Model: GPT-5.3 Codex Extra High --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> (cherry picked from commit 8a648cf) # Conflicts: # filebeat/input/filestream/internal/input-logfile/input.go # filebeat/input/filestream/internal/input-logfile/store.go
When using Filestream's take_over feature with autodiscover, files were being re-ingested from the beginning instead of continuing from the offset recorded by the Log input. Autodiscover validates each rendered configuration by instantiating the input with a temporary, suffixed ID before starting it. Because take_over ran during input initialisation, states were migrated to the temporary ID rather than the real input ID. When the real input started, the Log input states had already been consumed, so all files appeared new. The fix moves the take_over migration step from input initialisation to input start. This ensures that config validation (CheckConfig) never triggers state migration, and only the input that actually runs performs the takeover. Additionally, the Log input state is no longer deleted from the registry after migration. Instead, Filestream checks whether it already holds a state for the file before migrating, skipping the takeover if a state is found. This makes the mechanism idempotent and removes reliance on the TTL=-2 heuristic that was used to detect previously-migrated states. Last, but not least, a few other issues in the TakeOver implementation are also fixed: - Incorrect resource release - ephemeralStore is now locked throughout the whole TakeOver duration GenAI-Assisted: Yes Human-Reviewed: Yes Tool: Claude-CLI, Model: Claude 4.6 Opus (Thinking) Tool: Cursor-CLI, Model: GPT-5.3 Codex Extra High --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> (cherry picked from commit 8a648cf) # Conflicts: # filebeat/input/filestream/internal/input-logfile/store.go
…9785) When using Filestream's take_over feature with autodiscover, files were being re-ingested from the beginning instead of continuing from the offset recorded by the Log input. Autodiscover validates each rendered configuration by instantiating the input with a temporary, suffixed ID before starting it. Because take_over ran during input initialisation, states were migrated to the temporary ID rather than the real input ID. When the real input started, the Log input states had already been consumed, so all files appeared new. The fix moves the take_over migration step from input initialisation to input start. This ensures that config validation (CheckConfig) never triggers state migration, and only the input that actually runs performs the takeover. Additionally, the Log input state is no longer deleted from the registry after migration. Instead, Filestream checks whether it already holds a state for the file before migrating, skipping the takeover if a state is found. This makes the mechanism idempotent and removes reliance on the TTL=-2 heuristic that was used to detect previously-migrated states. Last, but not least, a few other issues in the TakeOver implementation are also fixed: - Incorrect resource release - ephemeralStore is now locked throughout the whole TakeOver duration GenAI-Assisted: Yes Human-Reviewed: Yes Tool: Claude-CLI, Model: Claude 4.6 Opus (Thinking) Tool: Cursor-CLI, Model: GPT-5.3 Codex Extra High --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> (cherry picked from commit 8a648cf) # Conflicts: # filebeat/input/filestream/internal/input-logfile/store.go --------- Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
Proposed commit message
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesstresstest.shscript to run them under stress conditions and race detector to verify their stability../changelog/fragmentsusing the changelog tool.## Disruptive User Impact## Author's ChecklistHow to test this PR locally
The integration test
TestAutodiscoverFilestreamTakeOverDoesNotReingestKind (and Docker) to created a K8s cluster for testing.Run the tests
Manual Test: Filestream take_over does not re-ingest with autodiscover
Requirements: Linux, Docker, root (needs
/var/lib/docker/containersread access)Start a container that writes one log line per second:
Start Filebeat with the Log input via autodiscover, pointed at the container log file:
Start Filebeat:
Wait until at least 5 events appear in the output file, then stop Filebeat. Note the line count:
Restart Filebeat with the Filestream input and
take_over: enabled: true, using the same output file (no rotation):Start Filebeat:
Wait until at least 2 new lines appear in the output (check with
wc -l /tmp/fb-test/output*), confirming Filestream picked up where the Log input left off.Stop the container and count the total lines it generated:
Wait for Filebeat to log
"File is inactive. Closing.", then stop it and count total ingested events:Expected result
TOTAL_INGESTED == GENERATEDNo lines should be duplicated or missing. If
TOTAL_INGESTED > GENERATED, re-ingestion occurred — the Filestream input restarted from offset 0 instead of continuing from where the Log input stopped.Related issues
## Use cases## Screenshots## Logs