Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debounce input reload on autodiscover #35645

Merged
merged 15 commits into from
Jun 20, 2023

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Jun 1, 2023

What does this PR do?

The Kubernetes autodiscover feature now incorporates a debounce logic when reloading inputs. By default, it waits for at least 1 second before invoking the Reload method. In case of an error, it introduces a 10-second delay before retrying.

The channel used for test synchronisation has been removed and tests now use (assert/require).Eventually.

When Autodiscover calls cfgfile.NewRunnerList to instantiate a RunnerList, it now specifies a different logger name, enabling more granular log filtering.

Debug logs now provide information about the reasons for invoking Reload.

Certain tests that perform sequential actions now utilise require instead of assert to maintain a consistent state avoid cascading failures.

Tests that required updates now leverage require.Eventually instead of wait, providing additional information on failure causes.

Documentation for cfgfile.RunnerList has been improved to enhance clarity.

Why is it important?

Fixes #34388

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

## Author's Checklist

How to test this PR locally

Start a Kubernetes cluster (I used Minikube with none driver on a VM)
Deploy some pods to generate logs constantly

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  labels:
    app: flog-01
  name: flog-01
spec:
  progressDeadlineSeconds: 600
  replicas: 4
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: flog-01
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: flog-01
    spec:
      containers:
      - args:
        - -l
        - -s
        - "1"
        - -d
        - "1"
        image: mingrammer/flog
        imagePullPolicy: Always
        name: flog
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30

Start Filebeat with the following configuration

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        id: "runner-${data.kubernetes.container.id}"
        paths:
          - /var/log/containers/*-${data.kubernetes.container.id}.log
        fields:
          role: kubernetes

logging:
  level: debug
  selectors:
    - autodiscover
    - autodiscover.cfgfile
output.elasticsearch:
  hosts:
    - https://foo.bar.cloud.elastic.co:443
  protocol: https
  username: foo
  password: bar

There should be no data loss or constant log messages like:

Error creating runner from config: Can only start an input when all related states are finished

Related issues

## Use cases
## Screenshots
## Logs

@belimawr belimawr added the bug label Jun 1, 2023
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 1, 2023
@mergify
Copy link
Contributor

mergify bot commented Jun 1, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@belimawr belimawr added the Team:Elastic-Agent Label for the Agent team label Jun 1, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 1, 2023
@belimawr belimawr added the backport-v8.8.0 Automated backport with mergify label Jun 1, 2023
@belimawr belimawr marked this pull request as ready for review June 1, 2023 16:01
@belimawr belimawr requested a review from a team as a code owner June 1, 2023 16:01
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 1, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-06-20T14:45:54.810+0000

  • Duration: 84 min 19 sec

Test stats 🧪

Test Results
Failed 0
Passed 27449
Skipped 2013
Total 29462

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@belimawr belimawr force-pushed the fix-input-reload-autodiscover branch 2 times, most recently from 6dc9b2c to 10eac4f Compare June 5, 2023 14:04
CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LGTM. Just left a couple of minor suggestions.

@belimawr
Copy link
Contributor Author

belimawr commented Jun 8, 2023

@ycombinator all changes you requested are on b82eda4.

ycombinator
ycombinator previously approved these changes Jun 8, 2023
Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@belimawr
Copy link
Contributor Author

belimawr commented Jun 9, 2023

/test

@belimawr belimawr force-pushed the fix-input-reload-autodiscover branch from b82eda4 to a255240 Compare June 9, 2023 16:25
@belimawr
Copy link
Contributor Author

belimawr commented Jun 9, 2023

Rebasing onto main is the easiest way to get the tests to run again 🤷‍♂️

@belimawr belimawr changed the title Debounce input reload on Kubernetes autodiscover Debounce input reload on autodiscover Jun 15, 2023
@belimawr
Copy link
Contributor Author

This PR is finally ready for a final review. The tests were failing because of a bug I introduced, it has been fixed on 1f38a4c and
65208cb adds tests for that case.

@ycombinator could you review it again?

@belimawr belimawr dismissed ycombinator’s stale review June 15, 2023 09:40

I changed the implementation

The Kubernetes autodiscover feature now incorporates a debounce logic
when reloading inputs. By default, it waits for at least 1 second
before invoking the Reload method. In case of an error, it introduces
a 10-second delay before retrying.

The channel used for test synchronisation has been removed and tests
now use (assert/require).Eventually.

When Autodiscover calls `cfgfile.NewRunnerList` to instantiate a
RunnerList, it now specifies a different logger name, enabling more
granular log filtering.

Debug logs now provide information about the reasons for invoking
Reload.

Certain tests that perform sequential actions now utilise `require`
instead of `assert` to maintain a consistent state avoid cascading
failures.

Tests that required updates now leverage `require.Eventually` instead
of `wait`, providing additional information on failure causes.

Documentation for `cfgfile.RunnerList` has been improved to enhance
clarity.
This tests seems flaky on CI, increasing the timeout might help.
Test the case when handle(Start|Stop) is called multiple times
and on at least on the last one they return false.
@belimawr belimawr force-pushed the fix-input-reload-autodiscover branch from 5bd1bd8 to e67e829 Compare June 19, 2023 10:07
@belimawr
Copy link
Contributor Author

Tests were failing due to CI issues (pip was failing). I re-based onto main and force-pushed.

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@belimawr belimawr merged commit d270536 into elastic:main Jun 20, 2023
91 checks passed
mergify bot pushed a commit that referenced this pull request Jun 20, 2023
The Kubernetes autodiscover feature now incorporates a debounce logic
when reloading inputs. By default, it waits for at least 1 second
before invoking the Reload method. In case of an error, it introduces
a 10-second delay before retrying.

The channel used for test synchronisation has been removed and tests
now use (assert/require).Eventually.

When Autodiscover calls `cfgfile.NewRunnerList` to instantiate a
RunnerList, it now specifies a different logger name, enabling more
granular log filtering.

Debug logs now provide information about the reasons for invoking
Reload.

Certain tests that perform sequential actions now utilise `require`
instead of `assert` to maintain a consistent state avoid cascading
failures.

Tests that required updates now leverage `require.Eventually` instead
of `wait`, providing additional information on failure causes.

Documentation for `cfgfile.RunnerList` has been improved to enhance
clarity.

(cherry picked from commit d270536)
@belimawr belimawr deleted the fix-input-reload-autodiscover branch June 20, 2023 16:43
belimawr added a commit that referenced this pull request Jun 20, 2023
The Kubernetes autodiscover feature now incorporates a debounce logic
when reloading inputs. By default, it waits for at least 1 second
before invoking the Reload method. In case of an error, it introduces
a 10-second delay before retrying.

The channel used for test synchronisation has been removed and tests
now use (assert/require).Eventually.

When Autodiscover calls `cfgfile.NewRunnerList` to instantiate a
RunnerList, it now specifies a different logger name, enabling more
granular log filtering.

Debug logs now provide information about the reasons for invoking
Reload.

Certain tests that perform sequential actions now utilise `require`
instead of `assert` to maintain a consistent state avoid cascading
failures.

Tests that required updates now leverage `require.Eventually` instead
of `wait`, providing additional information on failure causes.

Documentation for `cfgfile.RunnerList` has been improved to enhance
clarity.

(cherry picked from commit d270536)

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
@reakaleek reakaleek mentioned this pull request Jul 19, 2023
6 tasks
@belimawr
Copy link
Contributor Author

This also seems to relate to #34717, I'll edit the description of the PR adding this information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.8.0 Automated backport with mergify bug Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filebeat: Error when using autodiscover + hints + templates with filebeat 8.6
3 participants