Fix check config and take over interaction#49632

Merged
belimawr merged 43 commits into elastic:main from belimawr:fix-check-config-and-take-over-interaction
Mar 30, 2026
@belimawr
Contributor

@belimawr belimawr commented Mar 24, 2026

Proposed commit message

When using Filestream's take_over feature with autodiscover, files were being
re-ingested from the beginning instead of continuing from the offset recorded
by the Log input.

Autodiscover validates each rendered configuration by instantiating the input
with a temporary, suffixed ID before starting it. Because take_over ran during
input initialisation, states were migrated to the temporary ID rather than the
real input ID. When the real input started, the Log input states had already
been consumed, so all files appeared new.

The fix moves the take_over migration step from input initialisation to input
start. This ensures that config validation (CheckConfig) never triggers state
migration, and only the input that actually runs performs the takeover.
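
The ordering problem described above can be sketched in a few lines of Go. This is a simplified model rather than the Beats implementation: the `registry` map, the `input` type, the key layout, and the `-tmp` ID suffix are all assumptions made for illustration.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the Filebeat registry:
// it maps a state key to the last read offset.
type registry map[string]int64

// input is a hypothetical, heavily simplified Filestream input.
type input struct {
	id  string
	reg registry
}

// takeOver moves the Log input state for file to this input's ID.
// It mimics the old, destructive behaviour by deleting the Log state.
func (in *input) takeOver(file string) {
	logKey := "log::" + file
	if off, ok := in.reg[logKey]; ok {
		in.reg["filestream::"+in.id+"::"+file] = off
		delete(in.reg, logKey)
	}
}

// offset reports where this input would start reading file.
func (in *input) offset(file string) int64 {
	return in.reg["filestream::"+in.id+"::"+file]
}

// startOffset simulates autodiscover: validate the config with a
// temporary ID, then start the real input. migrateOnInit selects the
// old placement of the takeover step (true) or the fixed one (false).
func startOffset(migrateOnInit bool) int64 {
	const file = "/var/log/app.log"
	reg := registry{"log::" + file: 1024}

	// CheckConfig: the input is only initialised, never started.
	check := &input{id: "my-input-tmp", reg: reg}
	if migrateOnInit {
		check.takeOver(file) // state lands under the temporary ID
	}

	// The real input performs the takeover once in both variants.
	actual := &input{id: "my-input", reg: reg}
	actual.takeOver(file)
	return actual.offset(file)
}

func main() {
	fmt.Println("takeover during init: ", startOffset(true))  // 0
	fmt.Println("takeover during start:", startOffset(false)) // 1024
}
```

With the takeover in the initialisation path, the validation instance consumes the Log state, so the real input finds nothing and starts at offset 0.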

Additionally, the Log input state is no longer deleted from the registry after
migration. Instead, Filestream checks whether it already holds a state for the
file before migrating, skipping the takeover if a state is found. This makes
the mechanism idempotent and removes reliance on the TTL=-2 heuristic that was
used to detect previously-migrated states.
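
A minimal sketch of the idempotent check, assuming a plain map as the registry. The `store` type, the key strings, and `takeOverOnce` are hypothetical names, not identifiers from the PR.

```go
package main

import "fmt"

// store is a hypothetical in-memory registry mapping state keys to offsets.
type store map[string]int64

// takeOverOnce copies the Log input state at logKey to fsKey unless a
// Filestream state already exists. The Log state is left in place.
// It reports whether a migration happened.
func takeOverOnce(s store, logKey, fsKey string) bool {
	if _, exists := s[fsKey]; exists {
		return false // Filestream already holds a state: skip
	}
	off, ok := s[logKey]
	if !ok {
		return false // nothing to take over
	}
	s[fsKey] = off
	return true
}

func main() {
	const logKey, fsKey = "log::a.log", "filestream::id::a.log"
	s := store{logKey: 100}

	fmt.Println(takeOverOnce(s, logKey, fsKey)) // true
	s[fsKey] = 250                              // Filestream keeps reading
	fmt.Println(takeOverOnce(s, logKey, fsKey)) // false
	fmt.Println(s[fsKey], s[logKey])            // 250 100
}
```

Because the second call is a no-op, running the takeover again after a restart cannot rewind the Filestream offset to the stale Log input value.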

Last, but not least, a few other issues in the TakeOver implementation
are also fixed:
- Incorrect resource release
- ephemeralStore is now locked throughout the whole TakeOver duration
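
The locking change can be pictured as holding one mutex across the whole check-then-write sequence. The sketch below assumes a hypothetical `memStore` type. It is not the real ephemeralStore, whose fields and methods are not shown in this PR description.

```go
package main

import (
	"fmt"
	"sync"
)

// memStore is a hypothetical store guarded by a single mutex.
type memStore struct {
	mu     sync.Mutex
	states map[string]int64
}

// takeOver holds mu for the full check-then-write sequence, so no other
// goroutine can observe or change the store halfway through a migration.
func (s *memStore) takeOver(from, to string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	if _, exists := s.states[to]; exists {
		return false
	}
	off, ok := s.states[from]
	if !ok {
		return false
	}
	s.states[to] = off
	return true
}

// concurrentTakeOvers runs n takeovers in parallel and returns how many
// of them actually migrated the state.
func concurrentTakeOvers(n int) int {
	s := &memStore{states: map[string]int64{"log::a.log": 42}}

	var (
		wg       sync.WaitGroup
		countMu  sync.Mutex
		migrated int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.takeOver("log::a.log", "filestream::id::a.log") {
				countMu.Lock()
				migrated++
				countMu.Unlock()
			}
		}()
	}
	wg.Wait()
	return migrated
}

func main() {
	fmt.Println(concurrentTakeOvers(8)) // 1
}
```

If the lock were released between the existence check and the write, two goroutines could both pass the check and both migrate.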

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Claude-CLI, Model: Claude 4.6 Opus (Thinking)
Tool: Cursor-CLI, Model: GPT-5.3 Codex Extra High

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.


How to test this PR locally

The integration test TestAutodiscoverFilestreamTakeOverDoesNotReingest requires Kind (and Docker) to create a K8s cluster for testing.

Run the tests

cd filebeat
go test -v -run '(?i)takeover' ./input/filestream/... -race
mage buildSystemTestBinary
go test -v -tags integration -run '(?i)takeover' ./tests/integration/... -race

Manual Test: Filestream take_over does not re-ingest with autodiscover

Requirements: Linux, Docker, root (needs /var/lib/docker/containers read access)


  1. Start a container that writes one log line per second:

    docker run -d --name flog-test mingrammer/flog -l -d 1 -s 1
    export CONTAINER_ID=$(docker inspect -f '{{.Id}}' flog-test)
    
  2. Start Filebeat with the Log input via autodiscover, pointed at the container log file:

    # filebeat-log.yml
    filebeat.autodiscover:
      providers:
        - type: docker
          templates:
            - condition:
                contains:
                  docker.container.id: ${CONTAINER_ID}
              config:
                - type: log
                  allow_deprecated_use: true
                  paths:
                    - /var/lib/docker/containers/${data.docker.container.id}/*.log
                  json:
                    message_key: log
                    keys_under_root: true
                    overwrite_keys: true
    output.file:
      path: /tmp/fb-test
      filename: output
      rotate_on_startup: false
    
    logging:
      to_stderr: true

    Start Filebeat:

    filebeat -c filebeat-log.yml
    
  3. Wait until at least 5 events appear in the output file, then stop Filebeat. Note the line count:

    wc -l /tmp/fb-test/output*
    
  4. Restart Filebeat with the Filestream input and take_over: enabled: true, using the same output file (no rotation):

    # filebeat-filestream.yml
    filebeat.autodiscover:
      providers:
        - type: docker
          templates:
            - condition:
                contains:
                  docker.container.id: ${CONTAINER_ID}
              config:
                - type: filestream
                  id: "${data.docker.container.id}-logs"
                  take_over:
                    enabled: true
                  file_identity.native: ~
                  prospector.scanner.fingerprint.enabled: false
                  close.on_state_change.inactive: 2s
                  paths:
                    - /var/lib/docker/containers/${data.docker.container.id}/*.log
                  parsers:
                    - container: ~
    output.file:
      path: /tmp/fb-test
      filename: output
      rotate_on_startup: false
    
    logging:
      to_stderr: true
      level: debug
      selectors:
        - "input.filestream"

    Start Filebeat:

    filebeat -c filebeat-filestream.yml

  5. Wait until at least 2 new lines appear in the output (check with wc -l /tmp/fb-test/output*), confirming Filestream picked up where the Log input left off.

  6. Stop the container and count the total lines it generated:

    docker stop flog-test
    GENERATED=$(docker logs flog-test 2>/dev/null | wc -l)
    echo "Container generated: $GENERATED"
    
  7. Wait for Filebeat to log "File is inactive. Closing.", then stop it and count total ingested events:

    TOTAL_INGESTED=$(cat /tmp/fb-test/output* | wc -l)
    echo "Total ingested: $TOTAL_INGESTED"
    

Expected result

TOTAL_INGESTED == GENERATED

No lines should be duplicated or missing. If TOTAL_INGESTED > GENERATED, re-ingestion occurred — the Filestream input restarted from offset 0 instead of continuing from where the Log input stopped.
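
The comparison can also be written as a small Go helper, for example when scripting the manual test. The `verdict` function and its messages are illustrative assumptions and are not part of the PR's test suite.

```go
package main

import "fmt"

// verdict compares the number of lines the container generated with the
// number of events Filebeat wrote to the output file.
func verdict(generated, ingested int) string {
	switch {
	case ingested == generated:
		return "ok"
	case ingested > generated:
		return fmt.Sprintf("re-ingestion: %d extra events", ingested-generated)
	default:
		return fmt.Sprintf("data loss: %d missing events", generated-ingested)
	}
}

func main() {
	fmt.Println(verdict(120, 120)) // ok
	fmt.Println(verdict(120, 180)) // re-ingestion: 60 extra events
	fmt.Println(verdict(120, 100)) // data loss: 20 missing events
}
```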

Related issues


github-actions bot and others added 7 commits March 23, 2026 16:30
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: GPT-5.3 Codex High
 - Use files instead of adding the config in the middle of the test
 - Remove time.Sleep
 - Fix duplication count
GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Claude, Model: Default Sonnet 4.6
Check whether the Filestream state already exists before taking over a
Log input state. This replaces the previous logic that relied on the
Log input setting the TTL of unused states to -2.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Claude, Model: Default Sonnet 4.6
@belimawr belimawr self-assigned this Mar 24, 2026
@belimawr belimawr added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team bugfix backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches labels Mar 24, 2026
Ensure the whole TakeOver code runs holding a lock of the
ephemeralStore, thus allowing for a consistent view of the data while
the migration is happening.
@belimawr belimawr marked this pull request as ready for review March 27, 2026 14:40
@belimawr belimawr requested review from a team as code owners March 27, 2026 14:40
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@coderabbitai

coderabbitai bot commented Mar 27, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 58ce258 and 57d3d36.

📒 Files selected for processing (1)
  • filebeat/tests/integration/autodiscover_test.go

Walkthrough

This pull request moves Filestream state takeover out of Init into a new TakeOver method invoked explicitly after initialization. The input manager now supplies prior source identifiers to managed inputs and calls prospector TakeOver before running. The source store takeover logic was changed to avoid overwriting existing filestream targets and to preserve log-input entries. New integration tests and autodiscover configs validate no re-ingestion occurs.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Changes comprehensively address issue #49579 by moving TakeOver from initialization to runtime start and making state migration idempotent. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with fixing the TakeOver+Autodiscover re-ingestion issue; no out-of-scope modifications detected. |



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@filebeat/input/filestream/internal/input-logfile/input.go`:
- Around line 93-95: The takeover path returns early on error from
inp.prospector.TakeOver, but StopInput is only called later, so the input ID can
remain marked active; fix by deferring StopInput immediately before calling
inp.prospector.TakeOver (e.g., place a defer inp.StopInput() right before the
TakeOver call) so StopInput always runs on any early return; ensure you
capture/handle any error from StopInput (log or ignore) as appropriate while
preserving the original TakeOver error return and reference the symbols
inp.prospector.TakeOver, inp.StopInput, and inp.sourceIdentifier.ID when
locating the change.
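
The pattern this review comment asks for can be sketched as follows. The `tracker` type, `run`, and `failingTakeOver` are hypothetical names that only mirror the idea of deferring the stop call before the takeover. The actual code in input.go is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// tracker is a hypothetical registry of active input IDs.
type tracker struct{ active map[string]bool }

func (t *tracker) start(id string) { t.active[id] = true }
func (t *tracker) stop(id string)  { delete(t.active, id) }

// run marks the input as active and then performs the takeover. stop is
// deferred before the takeover, so an early error return cannot leave
// the ID marked as active.
func run(t *tracker, id string, takeOver func() error) error {
	t.start(id)
	defer t.stop(id)

	if err := takeOver(); err != nil {
		return fmt.Errorf("takeover failed: %w", err)
	}
	// ... the input would harvest files here ...
	return nil
}

// failingTakeOver simulates a takeover that cannot complete.
func failingTakeOver() error { return errors.New("registry unavailable") }

func main() {
	t := &tracker{active: map[string]bool{}}
	err := run(t, "my-input", failingTakeOver)
	fmt.Println(err)                  // takeover failed: registry unavailable
	fmt.Println(t.active["my-input"]) // false
}
```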
📥 Commits

Reviewing files that changed from the base of the PR and between 7d7d034 and bdf4651.

📒 Files selected for processing (13)
  • .buildkite/filebeat/filebeat-pipeline.yml
  • changelog/fragments/1769099693-filestream-takeover-autodiscover-reingest.yaml
  • filebeat/input/filestream/internal/input-logfile/input.go
  • filebeat/input/filestream/internal/input-logfile/manager.go
  • filebeat/input/filestream/internal/input-logfile/manager_test.go
  • filebeat/input/filestream/internal/input-logfile/prospector.go
  • filebeat/input/filestream/internal/input-logfile/store.go
  • filebeat/input/filestream/internal/input-logfile/store_test.go
  • filebeat/input/filestream/prospector.go
  • filebeat/magefile.go
  • filebeat/tests/integration/autodiscover_test.go
  • filebeat/tests/integration/testdata/autodiscover/take-over-filestream-input-k8s.yml
  • filebeat/tests/integration/testdata/autodiscover/take-over-log-input-k8s.yml

Contributor

Copilot AI left a comment

Pull request overview

Fixes a Filestream take_over + autodiscover interaction where CheckConfig could trigger state migration to a temporary input ID, causing subsequent re-ingestion from offset 0. The PR moves takeover to the real input start, makes log-state takeover idempotent, and adds integration coverage.

Changes:

  • Move Filestream takeover from prospector initialization to a dedicated TakeOver step invoked during input start (so CheckConfig can’t migrate state).
  • Update registry takeover semantics to be idempotent (skip if Filestream state already exists; don’t delete Log input state).
  • Add/extend unit + integration tests for autodiscover takeover and update CI pipeline to build a Docker image needed for Kind-based tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| filebeat/input/filestream/prospector.go | Extracts takeover into TakeOver() and stops doing it during Init(). |
| filebeat/input/filestream/internal/input-logfile/prospector.go | Extends Prospector interface with TakeOver() and updates docs/comments accordingly. |
| filebeat/input/filestream/internal/input-logfile/input.go | Invokes prospector.TakeOver(...) during Run(); defers StopInput. |
| filebeat/input/filestream/internal/input-logfile/manager.go | Tracks previousSrcIdentifiers and passes them into managed input for runtime takeover. |
| filebeat/input/filestream/internal/input-logfile/manager_test.go | Updates noop prospector to satisfy the new interface. |
| filebeat/input/filestream/internal/input-logfile/store.go | Changes takeover locking and semantics (don’t delete Log input states; skip takeover when Filestream key exists; resource cleanup fixes). |
| filebeat/input/filestream/internal/input-logfile/store_test.go | Adds tests for takeover-from-Log behavior and idempotency. |
| filebeat/tests/integration/autodiscover_test.go | Adds Kind-based integration test ensuring takeover does not re-ingest under autodiscover; adds helper functions. |
| filebeat/tests/integration/testdata/autodiscover/take-over-log-input-k8s.yml | New autodiscover config template for Log input. |
| filebeat/tests/integration/testdata/autodiscover/take-over-filestream-input-k8s.yml | New autodiscover config template for Filestream takeover input. |
| .buildkite/filebeat/filebeat-pipeline.yml | Builds Docker package image before running Filebeat Go integration tests. |
| changelog/fragments/1769099693-filestream-takeover-autodiscover-reingest.yaml | Adds changelog fragment for the bug fix. |

belimawr and others added 3 commits March 27, 2026 16:39
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…om:belimawr/beats into fix-check-config-and-take-over-interaction
@belimawr belimawr merged commit 8a648cf into elastic:main Mar 30, 2026
53 checks passed
@github-actions
Contributor

@Mergifyio backport 9.2 9.3

@mergify
Contributor

mergify bot commented Mar 30, 2026

backport 9.2 9.3

✅ Backports have been created

Details

Cherry-pick of 8a648cf has failed:

On branch mergify/bp/9.2/pr-49632
Your branch is up to date with 'origin/9.2'.

You are currently cherry-picking commit 8a648cf55.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   .buildkite/filebeat/filebeat-pipeline.yml
	new file:   changelog/fragments/1769099693-filestream-takeover-autodiscover-reingest.yaml
	modified:   filebeat/input/filestream/internal/input-logfile/manager.go
	modified:   filebeat/input/filestream/internal/input-logfile/manager_test.go
	modified:   filebeat/input/filestream/internal/input-logfile/prospector.go
	modified:   filebeat/input/filestream/internal/input-logfile/store_test.go
	modified:   filebeat/input/filestream/prospector.go
	modified:   filebeat/tests/integration/autodiscover_test.go
	new file:   filebeat/tests/integration/testdata/autodiscover/take-over-filestream-input-k8s.yml
	new file:   filebeat/tests/integration/testdata/autodiscover/take-over-log-input-k8s.yml

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   filebeat/input/filestream/internal/input-logfile/input.go
	both modified:   filebeat/input/filestream/internal/input-logfile/store.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Cherry-pick of 8a648cf has failed:

On branch mergify/bp/9.3/pr-49632
Your branch is up to date with 'origin/9.3'.

You are currently cherry-picking commit 8a648cf55.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   .buildkite/filebeat/filebeat-pipeline.yml
	new file:   changelog/fragments/1769099693-filestream-takeover-autodiscover-reingest.yaml
	modified:   filebeat/input/filestream/internal/input-logfile/input.go
	modified:   filebeat/input/filestream/internal/input-logfile/manager.go
	modified:   filebeat/input/filestream/internal/input-logfile/manager_test.go
	modified:   filebeat/input/filestream/internal/input-logfile/prospector.go
	modified:   filebeat/input/filestream/internal/input-logfile/store_test.go
	modified:   filebeat/input/filestream/prospector.go
	modified:   filebeat/tests/integration/autodiscover_test.go
	new file:   filebeat/tests/integration/testdata/autodiscover/take-over-filestream-input-k8s.yml
	new file:   filebeat/tests/integration/testdata/autodiscover/take-over-log-input-k8s.yml

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   filebeat/input/filestream/internal/input-logfile/store.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify bot pushed a commit that referenced this pull request Mar 30, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit 8a648cf)

# Conflicts:
#	filebeat/input/filestream/internal/input-logfile/input.go
#	filebeat/input/filestream/internal/input-logfile/store.go
mergify bot pushed a commit that referenced this pull request Mar 30, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit 8a648cf)

# Conflicts:
#	filebeat/input/filestream/internal/input-logfile/store.go
belimawr added a commit that referenced this pull request Mar 30, 2026
…9785)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit 8a648cf)

# Conflicts:
#	filebeat/input/filestream/internal/input-logfile/store.go

---------

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
Labels

backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches bugfix Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Development

Successfully merging this pull request may close these issues.

[Filestream] TakeOver with Autodiscover causes files re-ingestion.

5 participants