[Heartbeat] Add publish pipeline timeout to run_once #35721

Merged
merged 21 commits into from Jul 28, 2023

Conversation

emilioalvap
Collaborator

@emilioalvap emilioalvap commented Jun 8, 2023

What does this PR do?

Closes #35706.

Decouples run_once from sync pipeline and adds a timeout before exiting for emitting pending events.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Note
To simplify testing, it's recommended to disable exiting on the run_once state loader error: fmt.Errorf("run_once mode fatal error: %w", err)

  1. Build heartbeat locally.
  2. Set up a few monitors and enable run_once mode.
  3. Point the output at a nonexistent ES, e.g. https://localhost:9200/ with nothing running locally.
  4. Run HB, notice it never exits.
  5. Stop it and add heartbeat.publish_timeout: 15s to heartbeat.yml
  6. Run HB again and notice it exits after 15s, with events still pending:
{"log.level":"info","@timestamp":"2023-06-09T17:52:44.677+0200","log.origin":{"file.name":"beater/signalwait.go","file.line":88},"message":"Ending run_once run.","service.name":"heartbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-06-09T17:52:44.677+0200","log.origin":{"file.name":"beater/heartbeat.go","file.line":219},"message":"Shutting down, waiting for output to complete","service.name":"heartbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-06-09T17:52:44.677+0200","log.origin":{"file.name":"beater/heartbeat.go","file.line":230},"message":"shutdown: output timer started. Waiting for max 15s.","service.name":"heartbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-06-09T17:52:59.681+0200","log.origin":{"file.name":"beater/signalwait.go","file.line":88},"message":"shutdown: time out waiting for pipeline to publish events.","service.name":"heartbeat","ecs.version":"1.6.0"}

@emilioalvap emilioalvap added bug Team:obs-ds-hosted-services Label for the Observability Hosted Services team labels Jun 8, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 8, 2023
@mergify
Contributor

mergify bot commented Jun 8, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @emilioalvap? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Collaborator

elasticmachine commented Jun 8, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-07-28T09:56:36.339+0000

  • Duration: 56 min 26 sec

Test stats 🧪

Test Results
Failed 0
Passed 2031
Skipped 25
Total 2056

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@emilioalvap emilioalvap marked this pull request as ready for review June 9, 2023 14:55
@emilioalvap emilioalvap requested a review from a team as a code owner June 9, 2023 14:55
@elasticmachine
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@emilioalvap
Collaborator Author

/test

@emilioalvap emilioalvap added the backport-skip Skip notification from the automated backport with mergify label Jun 28, 2023
Member

@vigneshshanmugam vigneshshanmugam left a comment

Just did a first pass; code changes LGTM. However, I have not had a chance to run all the test cases yet.

heartbeat/monitors/pipeline.go (resolved)
heartbeat/beater/heartbeat.go (resolved)
Member

@vigneshshanmugam vigneshshanmugam left a comment

LGTM, great work 🎉

Nothing blocking, though I added a few comments. I also tested different modes with HTTP and Browser monitors. Looking good so far.

heartbeat/monitors/signalwait.go (outdated, resolved)
heartbeat/config/config.go (resolved)
@@ -211,20 +211,34 @@ func (bt *Heartbeat) Run(b *beat.Beat) error {

 	defer bt.scheduler.Stop()

-	<-bt.done
+	// Wait until run_once ends or bt is being shut down
+	waitMonitors.AddChan(bt.done)
Member

I don't remember whether we ever waited for a kill signal from the OS or for any interrupts. Not blocking the PR, I just want to understand whether we are handling anything differently here.

Collaborator Author

Yes, any interrupt is handled by libbeat, which in turn triggers heartbeat.Stop() and finally <-bt.done. This just replicates the same condition using the wait signal approach, listening for either an interrupt or, when in run_once mode, the run_once finished event.
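
To illustrate the "wait for whichever signal comes first" idea described in this reply, here is a minimal sketch. It is not the signalwait.go implementation from this PR: `anySignal`, its named `AddChan` variant, and the channel names are all hypothetical stand-ins chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// anySignal is a hypothetical helper, not the Heartbeat type.
// Wait returns as soon as any one of the registered channels is closed.
type anySignal struct {
	once  sync.Once
	done  chan struct{}
	cause string
}

func newAnySignal() *anySignal {
	return &anySignal{done: make(chan struct{})}
}

// AddChan registers a channel under a name used only for reporting.
func (s *anySignal) AddChan(name string, c <-chan struct{}) {
	go func() {
		select {
		case <-c:
			// Only the first signal records its name and releases Wait.
			s.once.Do(func() {
				s.cause = name
				close(s.done)
			})
		case <-s.done:
			// Another channel fired first; stop watching.
		}
	}()
}

// Wait blocks until one registered channel fires and returns its name.
func (s *anySignal) Wait() string {
	<-s.done
	return s.cause
}

func main() {
	shutdown := make(chan struct{})    // stands in for bt.done
	runOnceDone := make(chan struct{}) // stands in for "run_once finished"

	w := newAnySignal()
	w.AddChan("shutdown", shutdown)
	w.AddChan("run_once", runOnceDone)

	close(runOnceDone) // simulate all run_once monitors completing
	fmt.Println("woke up because of:", w.Wait())
	// prints: woke up because of: run_once
}
```

In this sketch an interrupt (closing `shutdown`) and a completed run_once cycle both release the same `Wait()` call, which is the behavior the reply describes.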

@emilioalvap emilioalvap merged commit b2d5017 into elastic:main Jul 28, 2023
25 checks passed
@emilioalvap emilioalvap deleted the hb-run-once-exit-pipeline branch July 28, 2023 12:53
@emilioalvap emilioalvap removed their assignment Jul 28, 2023
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
Decouples run_once from sync pipeline and adds a timeout before exiting for emitting pending events.

* Add publish pipeline timeout to run_once

* Clean up ISyncClient

* Nit wait for run_once

* Rename pipeline

* Add signal tests

* Add pipeline sync tests

* Disable linter false positive

* golint

* Apply suggestions from code review

* Update heartbeat/monitors/pipeline.go

* Apply suggestions from code review

* Add changelog and docs
Labels
backport-skip Skip notification from the automated backport with mergify bug Team:obs-ds-hosted-services Label for the Observability Hosted Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Heartbeat] Run_once mode should exit early if ES is unavailable
3 participants