Fix drop processor for monitoring events #2982

Merged
merged 5 commits into from Jul 10, 2023

belimawr
Contributor

@belimawr belimawr commented Jul 3, 2023

What does this PR do?

It fixes the drop processor for monitoring component logs. Instead of using the dataset, which does not indicate whether a component is a monitoring component, it now uses `component.id`.

Why is it important?

There can be an infinite loop: a monitoring component fails to send an event and logs the failure, that log entry is picked up from the logs and sent, the send fails again, and so on, creating an endless stream of error messages.
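The feedback loop described above, and the way a `component.id`-based drop breaks it, can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the `event` struct, the `shouldDrop` and `simulate` helpers, and the `filestream-monitoring` ID are hypothetical and are not the Agent's actual processor code.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a hypothetical, heavily simplified log event.
type event struct {
	ComponentID string
	Message     string
}

// shouldDrop reports whether an event comes from a monitoring component,
// judged only by the "-monitoring" suffix of its component ID.
func shouldDrop(e event) bool {
	return strings.HasSuffix(e.ComponentID, "-monitoring")
}

// simulate models the loop: each event that is shipped fails, and the failure
// is logged by the same component, producing a new event for the next round.
// It returns how many events are still circulating after the given rounds.
func simulate(rounds int, drop bool) int {
	queue := []event{{ComponentID: "filestream-monitoring", Message: "failed to send event"}}
	for i := 0; i < rounds; i++ {
		var next []event
		for _, e := range queue {
			if drop && shouldDrop(e) {
				continue // dropped before shipping, so no new error is logged
			}
			next = append(next, event{ComponentID: e.ComponentID, Message: "failed to send event"})
		}
		queue = next
	}
	return len(queue)
}

func main() {
	fmt.Println("without drop:", simulate(5, false))
	fmt.Println("with drop:", simulate(5, true))
}
```

Without the drop, the error event keeps regenerating itself every round; with it, the queue drains after the first round.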

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

## Author's Checklist

How to test this PR locally

Run Elastic-Agent with monitoring enabled, then go to Kibana and look at the `logs-*` data view. There should be no logs where `component.id` matches the regexp `.*-monitoring$`.
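The check above can also be expressed in code. The following is a minimal sketch that applies the same regexp to a list of component IDs. The `leakedMonitoringIDs` helper and the sample IDs are made up for illustration; in practice the IDs would come from the documents found in Kibana.

```go
package main

import (
	"fmt"
	"regexp"
)

// monitoringID is the regexp from the test instructions.
var monitoringID = regexp.MustCompile(`.*-monitoring$`)

// leakedMonitoringIDs returns the IDs that match the regexp, that is,
// components whose logs should not have been indexed.
func leakedMonitoringIDs(ids []string) []string {
	var leaked []string
	for _, id := range ids {
		if monitoringID.MatchString(id) {
			leaked = append(leaked, id)
		}
	}
	return leaked
}

func main() {
	// Sample IDs are illustrative assumptions, not taken from a real deployment.
	ids := []string{"log-default", "filestream-monitoring", "monitoring-helper"}
	fmt.Println(leakedMonitoringIDs(ids))
}
```

An empty result corresponds to the expected outcome of the manual test: no indexed logs from monitoring components.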

Related issues

## Use cases
## Screenshots
## Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?

@belimawr belimawr added the bug Something isn't working label Jul 3, 2023
@mergify
Contributor

mergify bot commented Jul 3, 2023

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label Jul 3, 2023
@belimawr belimawr force-pushed the fix-monitoring-drop-processor branch from 088fa14 to 563533f Compare July 3, 2023 16:33
@elasticmachine
Collaborator

elasticmachine commented Jul 3, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-07-10T12:45:17.715+0000

  • Duration: 20 min 29 sec

Test stats 🧪

Test Results
Failed 0
Passed 16
Skipped 0
Total 16

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Collaborator

elasticmachine commented Jul 3, 2023

🌐 Coverage report

| Name | Metrics % (covered/total) | Diff |
| --- | --- | --- |
| Packages | 25.974% (20/77) | 👍 |
| Files | 12.928% (34/263) | 👍 |
| Classes | 8.998% (44/489) | 👍 |
| Methods | 6.642% (127/1912) | 👎 -0.003 |
| Lines | 4.613% (1361/29506) | 👎 -0.023 |
| Conditionals | 100.0% (0/0) | 💚 |

@belimawr belimawr added the Team:Elastic-Agent Label for the Agent team label Jul 4, 2023
@belimawr belimawr force-pushed the fix-monitoring-drop-processor branch 4 times, most recently from ad0ca46 to 6ab8c1e Compare July 5, 2023 10:12
@mergify
Contributor

mergify bot commented Jul 6, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-monitoring-drop-processor upstream/fix-monitoring-drop-processor
git merge upstream/main
git push upstream fix-monitoring-drop-processor

It fixes the drop processor for monitoring component logs, instead
of using the dataset that does not include any information about
whether the component is a monitoring component it now uses the
`component.id`.
@belimawr belimawr force-pushed the fix-monitoring-drop-processor branch from 79f128b to 85f208a Compare July 7, 2023 10:20
@belimawr
Contributor Author

belimawr commented Jul 7, 2023

rebase onto main, force push

@belimawr belimawr marked this pull request as ready for review July 7, 2023 10:22
@belimawr belimawr requested a review from a team as a code owner July 7, 2023 10:22
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Contributor

@leehinman leehinman left a comment

LGTM

One non-blocking enhancement request for the test.


t.Log("waiting 20s so the components can generate some logs and " +
	"Filebeat can collect them")
time.Sleep(20 * time.Second)
Contributor

Not blocking, but do we have an option besides Sleep? Any chance we could generate and read some kind of pre and post sentinel logs, where we know a monitoring event would have been sent between the two of them?

@belimawr belimawr added the backport-v8.9.0 Automated backport with mergify label Jul 7, 2023
@mergify mergify bot removed the backport-skip label Jul 7, 2023
@mergify
Contributor

mergify bot commented Jul 8, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-monitoring-drop-processor upstream/fix-monitoring-drop-processor
git merge upstream/main
git push upstream fix-monitoring-drop-processor

@jlind23
Contributor

jlind23 commented Jul 8, 2023

buildkite test it

Removing double installation/enrollment + remove tests based on the unwanted behavior (we no more have lines in logs for monitoring)
@jlind23
Contributor

jlind23 commented Jul 10, 2023

@pierrehilbert looks like integration tests coverage is not considered. Do you know if we generate any coverage report? This breaks the sonarcloud check

@pierrehilbert
Contributor

/test

Reducing code
Contributor

I think this is an extra changelog fragment... I don't see the related change and I think it's already been merged, right?

Contributor

@pchila pchila left a comment

Aside from the extra changelog fragment (not sure if it's a correction for a previous PR or a leftover and should be removed), the rest LGTM

@pierrehilbert pierrehilbert added backport-skip and removed backport-v8.9.0 Automated backport with mergify labels Jul 10, 2023
@pierrehilbert
Contributor

Removing the backport, we are now too close to the 8.9 release.

@pierrehilbert pierrehilbert merged commit 7cb5295 into elastic:main Jul 10, 2023
23 of 24 checks passed
@cmacknz
Member

cmacknz commented Jul 10, 2023

Removing the backport, we are now too close to the 8.9 release.

@pierrehilbert I think this is a small enough change that it could be backported. It should at least be in 8.9.1.

@pierrehilbert
Contributor

I was thinking of this for 8.9.1 because we are really close to the last BC, but if that's the wrong call and the fact that it's a really small change makes us confident, let's change this :-)

@cmacknz
Member

cmacknz commented Jul 10, 2023

This one is small and fixes something that is currently completely broken, so let's backport it.

For larger changes, we are definitely in the window where we need to think carefully about backporting.

@cmacknz cmacknz added the backport-v8.9.0 Automated backport with mergify label Jul 10, 2023
mergify bot pushed a commit that referenced this pull request Jul 10, 2023
* integration:local can run a single test

* Fix drop processor for monitoring components

It fixes the drop processor for monitoring component logs, instead
of using the dataset that does not include any information about
whether the component is a monitoring component it now uses the
`component.id`.

* Update enroll_test.go

Removing double installation/enrollment + remove tests based on the unwanted behavior (we no more have lines in logs for monitoring)

* Update enroll_test.go

Reducing code

---------

Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
(cherry picked from commit 7cb5295)
cmacknz added a commit that referenced this pull request Jul 11, 2023
* integration:local can run a single test

* Fix drop processor for monitoring components

It fixes the drop processor for monitoring component logs, instead
of using the dataset that does not include any information about
whether the component is a monitoring component it now uses the
`component.id`.

* Update enroll_test.go

Removing double installation/enrollment + remove tests based on the unwanted behavior (we no more have lines in logs for monitoring)

* Update enroll_test.go

Reducing code

---------

Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
(cherry picked from commit 7cb5295)

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Labels
backport-skip backport-v8.9.0 Automated backport with mergify bug Something isn't working Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filestream monitoring processors used to prevent error loops are wrong
7 participants