Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix goroutine leak in SubOutlet #7820

Merged
merged 1 commit into from Aug 2, 2018

Conversation

@adriansr
Copy link
Member

adriansr commented Jul 31, 2018

This fixes a goroutine leaking every time that a harvester is closed.

@adriansr adriansr force-pushed the adriansr:fix/fb/goroutine_leak branch from 558a906 to bca6f3f Jul 31, 2018
// still running.
o.mutex.Lock()
defer o.mutex.Unlock()
o.closeEventChannel()

This comment has been minimized.

Copy link
@ph

ph Jul 31, 2018

Member

I am a bit confused , we have a call to close(o.done) this will trigger the select below and call o.closeEventChannel

This comment has been minimized.

Copy link
@ph

ph Jul 31, 2018

Member

The channel can still be close.

This comment has been minimized.

Copy link
@ph

ph Jul 31, 2018

Member

OK, I got it,

OnEvent -> Acquire mutex
Out of bound close -> Block on mutex if on event is currently running.

This comment has been minimized.

Copy link
@ph

ph Jul 31, 2018

Member

@adriansr I am a bit worried that we add one more synchronization mechanism to this object, we have the isOpen (atomic bool), the chClosed has some execution state, the mutex and the channels.

I think I am missing some context from the environment and I am just trying to understand what we want to prevent, it is multiple calls to `close(o.ch)?

This comment has been minimized.

Copy link
@ph

ph Jul 31, 2018

Member
// This mutex prevents the event channel to be closed if OnEvent is
// still running.

From your comment this is what we want to stop? So we could drop the atomic bool and use isOpen instead of chClosed

This comment has been minimized.

Copy link
@ph

ph Jul 31, 2018

Member

by "drop" I mean using a single mutex.

@adriansr adriansr force-pushed the adriansr:fix/fb/goroutine_leak branch from bca6f3f to 6ef0446 Jul 31, 2018
@adriansr adriansr dismissed urso’s stale review Jul 31, 2018

Changed after review

@adriansr

This comment has been minimized.

Copy link
Member Author

adriansr commented Jul 31, 2018

I've submitted a simpler version addressing @ph comments.

To my understanding, the atomic bool is still necessary to prevent closing done twice.

@andrewkroh

This comment has been minimized.

Copy link
Member

andrewkroh commented Jul 31, 2018

the atomic bool is still necessary to prevent closing done twice.

I didn't examine the code but based on the sounds of it this could be a good use case for sync.Once to prevent channel closing more than once.

// still running.
o.mutex.Lock()
defer o.mutex.Unlock()
close(o.ch)

This comment has been minimized.

Copy link
@ph

ph Jul 31, 2018

Member

as @andrewkroh noted sync.Once could replace the semaphore isOpen.

This comment has been minimized.

Copy link
@urso

urso Jul 31, 2018

Collaborator

The open flag is also used in OnEvent. Once the outlet is closed, we MUST NOT run into the select block in OnEvent. A select is non-deterministic. Even if o.done is closed, we might end up sending something to o.ch. Depending on current state this can generate either a panic or some inconsistent state .

@adriansr adriansr force-pushed the adriansr:fix/fb/goroutine_leak branch from 6ef0446 to 7aedaa5 Jul 31, 2018
@ph
ph approved these changes Jul 31, 2018
@urso
urso approved these changes Jul 31, 2018
@adriansr adriansr force-pushed the adriansr:fix/fb/goroutine_leak branch from 7aedaa5 to f7e73e8 Aug 1, 2018
This fixes a goroutine leaking every time that a harvester is closed.
@adriansr adriansr force-pushed the adriansr:fix/fb/goroutine_leak branch from f7e73e8 to 9bd463c Aug 2, 2018
@ph

This comment has been minimized.

Copy link
Member

ph commented Aug 2, 2018

@adriansr I've look at the failure on the CI, I think its one of our flaky test.

======================================================================
ERROR: Checks that states are properly removed after clean_inactive
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/test_registrar.py", line 805, in test_clean_inactive
    max_timeout=10)
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/../../../libbeat/tests/system/beat/beat.py", line 325, in wait_until
    "Waited {} seconds.".format(max_timeout))
TimeoutError: Timeout waiting for 'cond' to be true. Waited 10 seconds.

@urso we can merge as is? WDYT?

@adriansr

This comment has been minimized.

Copy link
Member Author

adriansr commented Aug 2, 2018

@ph just in case, I relaunched the failing job. Now it passed.

@ph ph merged commit b84e083 into elastic:master Aug 2, 2018
6 checks passed
6 checks passed
CLA Commit author is a member of Elasticsearch
Details
Hound No violations found. Woof!
beats-ci Build finished.
Details
codecov/patch 100% of diff hit (target 67.02%)
Details
codecov/project Absolute coverage decreased by -2.2% but relative coverage increased by +32.97% compared to 5f05ac7
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ph

This comment has been minimized.

Copy link
Member

ph commented Aug 2, 2018

@adriansr I've merged it, I will take a look as soon as possible to find why this test is not reliable.

adriansr added a commit to adriansr/beats that referenced this pull request Aug 6, 2018
This fixes a goroutine leaking every time that a harvester is closed.

(cherry picked from commit b84e083)
@adriansr adriansr added v6.4.0 and removed needs_backport labels Aug 6, 2018
urso added a commit that referenced this pull request Aug 6, 2018
Cherry-pick of PR #7820 to 6.4 branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.
ph added a commit to ph/beats that referenced this pull request Dec 17, 2018
This fixes a goroutine leaking every time that a harvester is closed.

(cherry picked from commit b84e083)
@ph ph added the v6.5.4 label Dec 17, 2018
ph added a commit to ph/beats that referenced this pull request Dec 17, 2018
This fixes a goroutine leaking every time that a harvester is closed.

(cherry picked from commit b84e083)
@ph ph added the v6.6.0 label Dec 17, 2018
ph added a commit to ph/beats that referenced this pull request Dec 17, 2018
This fixes a goroutine leaking every time that a harvester is closed.

(cherry picked from commit b84e083)
ph added a commit that referenced this pull request Dec 17, 2018
Cherry-pick of PR #7820 to 6.5 branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.
ph added a commit that referenced this pull request Dec 17, 2018
Cherry-pick of PR #7820 to 6.x branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.
lucksuper added a commit to lucksuper/beats that referenced this pull request Dec 23, 2018
…stic#9592)

Cherry-pick of PR elastic#7820 to 6.5 branch. Original message:

This fixes a goroutine leaking every time that a harvester is closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.