add a metric for closed connections #2676

Merged
merged 1 commit into from Jul 20, 2020

Conversation

marten-seemann
Member

I'm not really happy with this PR. This seems overly complicated (and converting a bool to string is not pretty either).

The reason this is so complicated is that there are many reasons a QUIC connection can be closed:

  1. It can run into a (handshake or post-handshake) timeout
  2. It can receive a stateless reset because the peer lost state
  3. The application can close the connection, in which case we want to log which side did it and which error code was used
  4. The transport can close the connection, in which case we also want to log which side did it and which error code was used

@lanzafame Does this look ok to you?
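
To make the four categories above concrete, here is a minimal sketch of how they could be turned into metric labels. It is not the code in this PR: `closeKind`, `sideLabel`, `closeLabels`, and the label names are all hypothetical, and the handling of the error code is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// closeKind is a hypothetical enum for the four close categories listed above.
type closeKind int

const (
	kindTimeout closeKind = iota
	kindStatelessReset
	kindApplicationError
	kindTransportError
)

// sideLabel names the closing side explicitly, so no bool has to be
// formatted into a label value.
func sideLabel(remote bool) string {
	if remote {
		return "remote"
	}
	return "local"
}

// closeLabels returns the (assumed) label set for one closed connection.
// Timeouts and stateless resets carry only a reason; application and
// transport closes also record the side and the error code.
func closeLabels(kind closeKind, remote bool, errorCode uint64) map[string]string {
	switch kind {
	case kindTimeout:
		return map[string]string{"reason": "timeout"}
	case kindStatelessReset:
		return map[string]string{"reason": "stateless_reset"}
	case kindApplicationError, kindTransportError:
		reason := "application_error"
		if kind == kindTransportError {
			reason = "transport_error"
		}
		return map[string]string{
			"reason":     reason,
			"side":       sideLabel(remote),
			"error_code": strconv.FormatUint(errorCode, 10),
		}
	default:
		return map[string]string{"reason": "unknown"}
	}
}

func main() {
	fmt.Println(closeLabels(kindApplicationError, true, 7))
}
```

Using the raw error code as a label value can produce many distinct time series, so a real implementation might restrict it to a known set of codes.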

	case logging.TimeoutReasonIdle:
		return "idle_timeout"
	default:
		panic("unknown timeout reason")

Don't panic; just return "unknown". Metrics are never a good enough reason to crash a program.

Member Author

I tend to agree, although if we don't panic, those bugs will probably go unnoticed for a long time (until someone looks at the logs and sees a bunch of unknowns there). Anyway, this is a larger change, since we currently panic both in the logging and in the qlog package.

@lanzafame lanzafame left a comment

Other than the panic, this looks good.

@@ -106,7 +117,33 @@ func (t *connTracer) StartedConnection(local, _ net.Addr, _ logging.VersionNumbe
 )
 }
 
-func (t *connTracer) ClosedConnection(logging.CloseReason) {}
+func (t *connTracer) ClosedConnection(r logging.CloseReason) {
+	var tags []tag.Mutator
Member

I wonder whether this could be simplified by doing more of this processing when the CloseReason is created. For example, the type of reason, or whether the close was remote, could be set when creating the CloseReason rather than by interpreting errors later on. Essentially, make CloseReason plain old data (POD)?
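
A rough sketch of what that suggestion could look like, under assumptions: every name here (`closeReason`, `closeKind`, the constructors) is hypothetical and does not reflect the existing `logging.CloseReason` type. The idea is that each call site picks a constructor, so consumers only read fields.

```go
package main

import "fmt"

// closeKind is a hypothetical discriminator that is set once, at creation.
type closeKind uint8

const (
	kindTimeout closeKind = iota
	kindStatelessReset
	kindApplicationError
	kindTransportError
)

// closeReason is a hypothetical plain-data struct: no methods that
// interpret errors, just fields filled in by the constructors below.
type closeReason struct {
	Kind      closeKind
	Remote    bool   // true if the peer initiated the close
	ErrorCode uint64 // meaningful only for application and transport errors
}

func newTimeout() closeReason {
	return closeReason{Kind: kindTimeout}
}

// A stateless reset is received from the peer, so Remote is assumed true.
func newStatelessReset() closeReason {
	return closeReason{Kind: kindStatelessReset, Remote: true}
}

func newApplicationError(code uint64, remote bool) closeReason {
	return closeReason{Kind: kindApplicationError, Remote: remote, ErrorCode: code}
}

func newTransportError(code uint64, remote bool) closeReason {
	return closeReason{Kind: kindTransportError, Remote: remote, ErrorCode: code}
}

func main() {
	r := newApplicationError(0x42, true)
	fmt.Printf("%+v\n", r)
}
```

With this shape, a tracer's ClosedConnection could switch on a single field instead of unwrapping errors.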

Member Author

Yes, CloseReason is a mess. I'd like to clean this up, but I'm not really sure how.

@codecov

codecov bot commented Jul 20, 2020

Codecov Report

Merging #2676 into master will increase coverage by 0.09%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2676      +/-   ##
==========================================
+ Coverage   86.73%   86.82%   +0.09%     
==========================================
  Files         124      124              
  Lines        9878     9968      +90     
==========================================
+ Hits         8567     8654      +87     
- Misses        977      978       +1     
- Partials      334      336       +2     
Impacted Files          Coverage Δ
qlog/qlog.go            95.36% <0.00%> (-0.32%) ⬇️
logging/multiplex.go    95.56% <0.00%> (+0.49%) ⬆️
server.go               84.94% <0.00%> (+1.46%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f10894a...8a21cf7.

@marten-seemann marten-seemann merged commit c8255cb into master Jul 20, 2020
@marten-seemann marten-seemann deleted the conn-close-metric branch July 22, 2020 07:19
@aschmahmann aschmahmann mentioned this pull request Sep 22, 2020
72 tasks