Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track closed connections and reason for closing #692

Merged
merged 8 commits into from
Jun 27, 2018
Merged

Conversation

derekcollison
Copy link
Member

This PR allows the server to maintain state on closed connections. Uses a fixed size ring buffer that is configurable. Monitoring endpoints now can return closed connections via /connz?state=closed and will report when they closed and why.

Adresses #520 and #663

Signed-off-by: Derek Collison derek@nats.io

/cc @nats-io/core

Signed-off-by: Derek Collison <derek@nats.io>
@coveralls
Copy link

coveralls commented Jun 26, 2018

Coverage Status

Coverage decreased (-0.2%) to 92.431% when pulling ec8e263 on closed into e155332 on master.

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that there is an error in ConnInfo.fill() where you don't use the passed nc. Also, the reason for the close we would want to report may not be the one that is recorded due to error detection lower in the stack.

return ConnOpen, nil
}
switch strings.ToLower(str) {
case "open":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be opened since you have closed (or add both open, opened and close, closed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think open is correct since it means perpetual vs closed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

func (reason ClosedState) String() string {
switch reason {
case ClientClosed:
return "Client"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Closed instead or Client Closed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it as Client Closed, but thought it read better looking at the JSON as reason for closed is Client.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


// Fixed sized ringbuffer for closed connections.
type closedRingBuffer struct {
total uint64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that make more sense to add a mutex in that object and so we don't need server lock for that? When I create a new object like that, I tend to favor that object having its own lock so we can reduce lock contention. Especially this is a singleton, so adding a mutex should not be costly mem wise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that too, but decided its very fast and will only hold for time to copy. I was thinking about preallocating the structs too. I could put a lock around it.

@@ -834,6 +838,35 @@ func (s *Server) createClient(conn net.Conn) *client {
return c
}

// This will save off a closed client in a ring buffer such that
// /connz can inspect. Useful for debugging, etc.
func (s *Server) saveClosedClient(c *client, nc net.Conn, reason ClosedState) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a receiving of the ring buffer object (especially if we use a dedicated lock there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it could. Feels like a server verb though that just uses the ring buffer. We do the time stamp etc.

ci.InMsgs = atomic.LoadInt64(&client.inMsgs)
ci.InBytes = atomic.LoadInt64(&client.inBytes)

// If the connection is gone, too bad, we won't set TLSVersion and TLSCipher.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you should use nc down below, not client.nc...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree will fix.

server/server.go Outdated
@@ -786,7 +790,7 @@ func (s *Server) createClient(conn net.Conn) *client {
if err := conn.Handshake(); err != nil {
c.Errorf("TLS handshake error: %v", err)
c.sendErr("Secure Connection - TLS Required")
c.closeConnection()
c.closeConnection(TLSHandshakeError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm... I tried to connect a non TLS client to a TLS server and I get Write Error, not TLS Handshake Error. I believe that since we call sendErr(), it is possible that flushOutbound() gets an error because the client will also close the connection (after receiving the error), so the actual close of the connection occurs there, with WriteError. That may then not prove that useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non TLS client won't try to handshake will it? I will look into it though, agree we want the correct reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No they won't.

Actually, what happened is that I was connecting without TLS, so the client when getting the INFO fails the connect and says that secure connection is required. The server when getting into the Handshake will fail and send the error, but since the connection is closed it will get a write error, so that kind of makes sense.

But, I just tried with a client connection with TLS but without the CA root cert so it won't be able to verify the server certificate, and I still get the Write Error as a reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will fix and add a test, thanks.

@derekcollison
Copy link
Member Author

derekcollison commented Jun 26, 2018

I also just found a bug with closed connections and optional Subs (and AuthorizedUser). Once set it stays set, will fix in the am.

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
@@ -267,7 +267,7 @@ func TestClosedMaxPayload(t *testing.T) {

func TestClosedSlowConsumerWriteDeadline(t *testing.T) {
opts := DefaultOptions()
opts.WriteDeadline = 10 * time.Microsecond // Make very small to trip.
opts.WriteDeadline = 10 * time.Millisecond // Make very small to trip.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not enough. Bumping it to 30ms in TestWriteDeadline for very same reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it was originally at 1 ms though), so that may be ok...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to pass on travis..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can change it..

@kozlovic
Copy link
Member

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants