
Fix a race condition in the TCP input #13038

Merged
merged 7 commits into elastic:master on Jul 25, 2019

Conversation

@ph (Contributor) commented Jul 23, 2019

Pass the `net.Conn` object when creating the client instead of passing it to the `Handle()` method, which keeps a reference to it. By doing this we do not have to worry about read or write races on the internal field. The client still needs a reference to the connection so that, when the out-of-band call to `Close()` is executed, we are sure to get out of a `Read()` call early.

Tested with:

```
while true; do go test -v -race; done
```

Found in #13021
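
(A minimal sketch of the pattern described above, with illustrative names rather than the actual Filebeat types: the connection is set once at construction, the read loop only reads the field, and an out-of-band `Close()` unblocks a pending `Read()` by closing the socket.)

```go
import "net"

// client owns its connection from construction onward; the field is set once
// and never reassigned, so the read loop and Close() cannot race on it.
type client struct {
	conn net.Conn
}

func newClient(conn net.Conn) *client {
	return &client{conn: conn}
}

// run reads from the connection until it is closed or fails.
func (c *client) run() error {
	buf := make([]byte, 1024)
	for {
		if _, err := c.conn.Read(buf); err != nil {
			// A blocked Read returns as soon as Close() is called elsewhere.
			return err
		}
	}
}

// Close is the out-of-band shutdown path: closing the socket forces any
// in-flight Read() in run() to return early.
func (c *client) Close() error {
	return c.conn.Close()
}
```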

@ph requested a review from a team as a code owner July 23, 2019 19:02
@ph added the Filebeat and review labels Jul 23, 2019
@ph assigned urso Jul 23, 2019
@ph (Contributor, Author) commented Jul 23, 2019

cc @vjsamuel for awareness, I'm breaking the new interface :(

ph added 3 commits July 23, 2019 15:22
@andrewkroh (Member)

Does this fix #12982?

@vjsamuel (Contributor)

thanks for the heads up.

@urso commented Jul 24, 2019

The default split handler stores the connection object only for shutdown. This is because the server expects the handler to close the connection as well. But what if we change the server instead?

```go
type client struct {
  conn    net.Conn
  handler ConnectionHandler
}

func (c *client) Stop() {
  c.handler.Stop()
  c.conn.Close()
}

...

  conn, err := s.Listener.Accept()
  ...
  handler := s.factory(*s.config)
  client := &client{conn: conn, handler: handler}
  s.registerClient(client)

  ...
  go func() {
    ...

    client.handler.Handle(conn)

    ...
  }()
```

To reduce the housekeeping and shrink the ConnectionHandler interface even further, we can pass some shutdown context to the handler as well (no need for a client type anymore).
For example:

```go
type ConnectionHandler interface {
  Handle(CloseRef, net.Conn) error
}

type CloseRef interface { // subset of context.Context btw.
  Done() <-chan struct{}
  Err() error
}

type server struct {
  closer closeRef
  wg     sync.WaitGroup
  ...
}

func (s *server) Stop() {
  s.closer.Close()
}

  ...

  conn, err := s.Listener.Accept()
  ...
  handler := s.factory(*s.config)
  closeRef := withCloser(s.closer, conn) // shutdown on s.closer would close 'conn' and propagate shutdown.

  ...
  go func() {
    ...

    defer closeRef.Close() // close connection and unlink from signal propagation

    ...

    handler.Handle(closeRef, conn)

    ...
  }()
```

We have this use case quite often, and we fail to implement it correctly every now and then: close a connection and tell some loop to quit. It would be nice if this were just part of setting up the actual goroutine, instead of forcing us to do extra housekeeping just to implement shutdown.
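
(For illustration only, a rough sketch of what such a closer could look like; this is not the implementation that landed in the PR. The names `withCloser`, `ErrClosed`, and the `io.Closer` resource parameter are assumptions based on the snippets above.)

```go
import (
	"errors"
	"io"
	"sync"
)

var ErrClosed = errors.New("closer is closed")

// closer is a hypothetical minimal CloseRef implementation: Done()/Err()
// mimic context.Context, Close() closes the attached resource (e.g. a
// net.Conn) and propagates the shutdown signal to any child closers.
type closer struct {
	mu       sync.Mutex
	done     chan struct{}
	err      error
	parent   *closer
	children map[*closer]struct{}
	res      io.Closer // resource to close, nil for the root closer
}

func withCloser(parent *closer, res io.Closer) *closer {
	c := &closer{
		done:     make(chan struct{}),
		parent:   parent,
		children: map[*closer]struct{}{},
		res:      res,
	}
	if parent != nil {
		parent.mu.Lock()
		parent.children[c] = struct{}{}
		parent.mu.Unlock()
	}
	return c
}

func (c *closer) Done() <-chan struct{} { return c.done }

func (c *closer) Err() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.err
}

// Close marks the closer as done, closes the attached resource (unblocking
// any pending Read), propagates the close to children, and unlinks itself
// from its parent.
func (c *closer) Close() {
	c.mu.Lock()
	if c.err != nil { // already closed
		c.mu.Unlock()
		return
	}
	c.err = ErrClosed
	close(c.done)
	children := c.children
	c.children = nil
	c.mu.Unlock()

	if c.res != nil {
		c.res.Close()
	}
	for child := range children {
		child.Close()
	}
	if c.parent != nil {
		c.parent.mu.Lock()
		delete(c.parent.children, c) // no-op if the parent already closed
		c.parent.mu.Unlock()
	}
}
```

Used as in the sketch above: the server holds a root closer, each accepted connection gets `withCloser(s.closer, conn)`, and the per-connection goroutine defers `closeRef.Close()`.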

@ph (Contributor, Author) commented Jul 24, 2019

@andrewkroh Yes, this fixes exactly that.

@urso I see where you are going with that; I will add a commit to this PR with the suggestion.

@ph (Contributor, Author) commented Jul 24, 2019

@urso I have added a closeRef/Closer in the code to make sure the close is propagated; it seems to work fine.

@urso left a comment

All in all I like how the closer bundles all the complexity, while setup + shutdown handling become much more symmetric in code -> less bookkeeping and boilerplate.

Next step: generalize the closer into a common, reusable concept.
But let's keep the PR as-is for now and add an issue to clean this up later.

filebeat/inputsource/tcp/server.go (outdated)
```diff
@@ -97,7 +99,7 @@ func (c *splitHandler) Handle(conn net.Conn) error {
 	if err != nil {
 		// we are forcing a Close on the socket, lets ignore any error that could happen.
 		select {
-		case <-c.done:
+		case <-closer.Done():
```

if the splitter returns an error due to us closing the connection, shall we suppress the error then and just return nil?

@ph (Contributor, Author) replied:

That is currently what I do: I suppress errors in that scenario. In the beats input I decided otherwise, and it created a lot of confusion for users.
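
(For reference, a sketch of the suppression being discussed, assuming a `CloseRef` like the one sketched earlier; `handleReadErr` is an illustrative helper, not code from the PR.)

```go
// CloseRef mirrors the interface from the discussion above.
type CloseRef interface {
	Done() <-chan struct{}
	Err() error
}

// handleReadErr suppresses read errors that are an expected side effect of
// us closing the connection ourselves, and surfaces all other errors.
func handleReadErr(closer CloseRef, err error) error {
	if err == nil {
		return nil
	}
	select {
	case <-closer.Done():
		// We forced a Close on the socket; ignore the resulting error so
		// users do not see confusing shutdown noise in the logs.
		return nil
	default:
		return err
	}
}
```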

@ph (Contributor, Author) commented Jul 24, 2019

@urso I have made the interface public and cleaned up a bit so we can extract it without too much trouble.

```go
c.mu.Lock()
err := c.err
c.mu.Unlock()
return err
```

I don't see any code that can set `c.err` besides `Close()`. That is, one could replace the mutex with a simple atomic.Bool and implement the body like: `if !c.active.Load() { return ErrClosed }; return nil`.

@ph (Contributor, Author) replied:

We still have to lock the children map though? I can revisit it when we move this to an external repo.
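
(For completeness, a sketch of the alternative suggested above: an atomic flag instead of the mutex for `Err()`. Note that `atomic.Bool` only exists in the standard library since Go 1.19, so at the time of this PR it would have been an `int32` with `sync/atomic` or a third-party helper; field and function names here are illustrative. The children map would still need its own lock, as noted in the reply.)

```go
import (
	"errors"
	"sync/atomic"
)

var ErrClosed = errors.New("closer is closed")

type closer struct {
	active atomic.Bool // true from construction until Close()
	// The children map and the rest of the state would still be mutex-guarded.
}

func newCloser() *closer {
	c := &closer{}
	c.active.Store(true)
	return c
}

// Err needs no lock: a single atomic load decides closed vs. open.
func (c *closer) Err() error {
	if !c.active.Load() {
		return ErrClosed
	}
	return nil
}

// Close flips the flag; closing the conn and propagating to children is
// omitted in this sketch.
func (c *closer) Close() {
	c.active.Store(false)
}
```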

@ph merged commit 106ad06 into elastic:master on Jul 25, 2019