[Elastic Agent] Improve GRPC stop to be more relaxed. #20118

Merged 3 commits on Jul 23, 2020

Conversation

blakerouse
Contributor

What does this PR do?

It allows the GRPC client protocol to simply disconnect after receiving the expected state of Stopping. If the client was connected and then disconnects, that is accepted as a valid signal that the application has stopped.

Why is it important?

Because sometimes the TCP connection will not be flushed on disconnect and the Agent will not get the Stopping message.
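
To make the change concrete, here is a minimal sketch comparing the stop condition before and after this PR. It is not the actual Agent code: the `state` type, the `connected` flag (standing in for `doneChan != nil`), and both function names are assumptions made for illustration; only the boolean shape of the condition is taken from the diff.

```go
package main

import "fmt"

// state is a hypothetical stand-in for the observed state enum
// (proto.StateObserved_* in the diff).
type state int

const (
	stateHealthy state = iota
	stateStopping
)

// stoppedBefore mirrors the condition this PR replaces: a disconnect only
// counted as "stopped" if the Stopping state had been observed first.
func stoppedBefore(s state, connected bool) bool {
	return s == stateStopping && !connected
}

// stoppedAfter mirrors the relaxed condition. wasConn records whether the
// client was connected at the moment Stop() was called.
func stoppedAfter(wasConn bool, s state, connected bool) bool {
	return (wasConn && !connected) || (!wasConn && s == stateStopping && !connected)
}

func main() {
	// The client was connected, then dropped without the Stopping message
	// ever reaching the Agent (for example, an unflushed TCP connection).
	fmt.Println(stoppedBefore(stateHealthy, false))      // false
	fmt.Println(stoppedAfter(true, stateHealthy, false)) // true
}
```

With the old condition the Agent would keep waiting until the timeout in this scenario; with the relaxed one the disconnect alone is enough.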

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@blakerouse blakerouse self-assigned this Jul 21, 2020
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jul 21, 2020
@blakerouse blakerouse marked this pull request as ready for review July 21, 2020 22:02
@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@elasticmachine
Collaborator

elasticmachine commented Jul 21, 2020

💚 Build Succeeded


Build stats

  • Build Cause: [Pull request #20118 updated]

  • Start Time: 2020-07-21T22:03:10.581+0000

  • Duration: 32 min 3 sec

@@ -548,8 +549,10 @@ func (as *ApplicationState) Stop(timeout time.Duration) error {
s := as.status
doneChan := as.checkinDone
as.checkinLock.RUnlock()
if s == proto.StateObserved_STOPPING && doneChan == nil {
// sent stopping and now is disconnected (so its stopped)
if (wasConn && doneChan == nil) || (!wasConn && s == proto.StateObserved_STOPPING && doneChan == nil) {
Contributor

I'm not sure I follow this. If it was connected and doneChan is nil, that means it got disconnected; this seems OK.
The second part means that if the status of the application is Stopping and doneChan is nil (it got disconnected), then we're destroying, but only in the case where doneChan was nil before, so nothing changed.

Contributor Author

Yes, the second case applies only when Stop() was called while the client was disconnected from GRPC (which is very rare, but possible).

So if the client was disconnected from GRPC at the time Stop() was called, the Agent needs to know that the client did receive the stopping state. It therefore waits for the client to report that it is actually stopping and then to disconnect. This requires the client to reconnect and get the stopping message, or the timeout to occur, whichever comes first.

In the normal case, the wasConn && doneChan == nil branch will almost always be the one used in this loop.

Hopefully that explains it better.

Contributor

makes sense thanks

@blakerouse blakerouse merged commit 3811728 into elastic:master Jul 23, 2020
@blakerouse blakerouse deleted the agent-grpc-stop-disconnect branch July 23, 2020 13:07
blakerouse added a commit to blakerouse/beats that referenced this pull request Jul 23, 2020
* Improve stop to be more relaxed.

* Add changelog.

(cherry picked from commit 3811728)
blakerouse added a commit that referenced this pull request Jul 23, 2020
* Improve stop to be more relaxed.

* Add changelog.

(cherry picked from commit 3811728)
v1v added a commit to v1v/beats that referenced this pull request Jul 27, 2020
…ne-2.0

* upstream/master: (41 commits)
  adding possibility to override content-type checks, it was breaking certain webhooks that is not able to set content-headers at all. Still defaults to application/json (elastic#20232)
  fix: use a fixed worker type for tests (elastic#20130)
  [Ingest Manager] Prepare packaging for endpoint and asc files (elastic#20186)
  [Packetbeat] HTTP: Improve support for 100-continue elastic#15830 (elastic#19349)
  Increase index.max_docvalue_fields_search to 200 (elastic#20218)
  [Ingest Manager] Prevent closing closed reader (elastic#20214)
  [Metricbeat] Use MySQL Host Parser in Query metricset (elastic#20191)
  Testing: Ignore timestamp from cylance/protect dataset (elastic#20211)
  [Filebeat] Ignore cylance.protect timestamps while testing (elastic#20207)
  [CI] remove codecov step (elastic#20102)
  [docs] Indicate that SYSTEM user is required on Windows to use Endpoint (elastic#20172)
  Remove f5/firepass rsa2elk fileset (elastic#20160)
  [Elastic Agent] Improve GRPC stop to be more relaxed. (elastic#20118)
  Fix fileset field prefixing (elastic#20170)
  Fix terminating pod autodiscover issue (elastic#20084)
  Call host parser only once when building light metricsets (elastic#20149)
  [CI] fix null string with contains (elastic#20182)
  [Ingest Manager] Fix failing unit tests on windows (elastic#20127)
  [Filebeat] Update crowdstrike module (elastic#20138)
  [docs] Add x-pack role to relevant metricsets (elastic#20167)
  ...
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
* Improve stop to be more relaxed.

* Add changelog.
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…elastic#20202)

* Improve stop to be more relaxed.

* Add changelog.

(cherry picked from commit 3bbbb19)
Development

Successfully merging this pull request may close these issues.

[Elastic Agent] When GRPC connected and Stop occurs, a disconnect should be considered stopped
3 participants