Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Transport analytics #12580

Closed
bradfitz opened this issue Sep 10, 2015 · 31 comments
Closed

net/http: Transport analytics #12580

bradfitz opened this issue Sep 10, 2015 · 31 comments
Milestone

Comments

@bradfitz
Copy link
Contributor

I keep hearing requests for getting more analytics and timing information out of net/http.Transport.

I will reference some requests and use cases and similar bugs here.

Feel free to add things I'm missing.

I have a rough design idea to add an optional func field to http.Transport to let people return an opaque-to-net/http interface{} at the beginning of a request and get it passed back to them at various stages of the request, so people can do their own timing/logging. I want to be careful not to give people too many hooks to modify internals and constrain the transport implementation in the future (which includes the HTTP/2 implementation), so I'll probably start pretty conservatively, but with an eye towards adding more steps in the future as needed.

But let's gather requirements before discussing implementation too much.

@bradfitz bradfitz self-assigned this Sep 10, 2015
@bradfitz bradfitz added this to the Go1.6 milestone Sep 10, 2015
@bradfitz
Copy link
Contributor Author

Only somewhat (but perhaps should be more?) related: contexts in http requests: https://godoc.org/golang.org/x/net/context/ctxhttp

@tombergan
Copy link
Contributor

To add to this feature request, I would like to compute the following statistics, for each RoundTrip call:

  • Whether or not an idle connection was reused
  • If a new connection was dialed, the DNS lookup time (same as net: allow custom Resolver method implementation(s) #12503)
  • If a new connection was dialed, the TCP connect time
  • TTFB (time between sending the first byte of the request and receiving the first byte of the response)

I would also like the following, although they are slightly less important:

  • RemoteAddr used by the connection (same as the golang-nuts post linked in the first comment)
  • The time it took to establish a connection, either via reusing an idle connection or dialing a fresh connection (note that even when an idle connection is reused, this time can be nontrivial, because an idle connection may have become available while a dial was in progress)

I don't need the LocalAddr used by the connection (#6732) , but if there's a way to get the RemoteAddr, there might as well be a way to get the LocalAddr.

@tombergan
Copy link
Contributor

It's somewhat unfortunate that http.Transport.Dial and combines both "Resolve" and "Dial". Otherwise, I think there is a very simple set of hooks that could satisfy all the requirements so far:

type TransportHooks struct {
  AfterResolve         func(interface{}, []net.Addr, error)
  AfterDial            func(interface{}, net.Conn, error, isReused bool) // includes https setup
  OnFirstResponseByte  func(interface{})
}

@brknstrngz
Copy link

Please, oh please, add a hook called AfterResponseReceived to that list, that takes an interface{} and the net.Conn the response was received on. Together with AfterDial it could be used to implement much more flexible pooling, by returning the net.Conn to the pool after the response is received.

@bradfitz
Copy link
Contributor Author

@vgalu, I don't understand your use case. The existing Transport already does connection pooling. What do you hope to achieve? I don't want to provide access to net.Conns, lest people try to read or write from them.

@brknstrngz
Copy link

@bradfitz: #12289

It is currently quite easy to run out of ephemeral ports as there is no upper bound on the idle+active number of connections in the pool.

@Redundancy
Copy link

I'm a little concerned about the inconvenience and potential for temporal coupling and fragile assumptions when trying to put callbacks on an object like the Transport that can simultaneously service many libraries and threads. Somehow I need to safely and correctly associate those callbacks with my individual requests.

Clearly there are issues with backwards compatibility / function signatures, but in this sort of environment it would seem safer to be able to handle these callbacks in a per-request manner, either through return values or a per-request callback. In an almost ideal world, a variation of http.Client.Do would be ideal.

@tombergan
Copy link
Contributor

@bradfitz: I don't want to provide access to net.Conns, lest people try to read or write from them.

The Conn is already accessible (for freshly dialed connections) via Transport.Dial, so technically that problem already exists.

I'm curious, though, why you're leaning towards a callback interface? Callbacks can't really do anything except call time.Now and write a log entry, unless the callbacks are allowed to customize RoundTrip similarly to Transport.Dial. Further, Callbacks have issues like those raised by @Redundancy. A new method like the following seems to satisfy all of the use cases stated so far (except for @vgalu's use case, which I don't quite understand either):

// Note that this can be packaged into an http.RoundTripper via a per-request struct like:
// type foo struct { *Transport, *RoundTripAnalytics }
// where foo.RoundTrip invokes Transport.RoundTripWithAnalytics and captures the result
RoundTripWithAnalytics(*Request) (*Response, *RoundTripAnalytics, error)

@bradfitz
Copy link
Contributor Author

Let's not discuss code yet. It's a distraction at this point.

I'm trying to gather requirements.

@nightlyone
Copy link
Contributor

What about something like per-request events? Those report any life-cycle events of a request.

Defining a born and dead event, but reserving the right to insert any events in between.

All events deliver timing information and an request identification valid between born and dead (ideally valid forever).

If events could no be delivered, the amount of missed events will be counted and the first event transmitted next will be a "you missed X events" event (the miss-event), where X will be incremented every time that miss-event could not be delivered, too.

A private, buffered channel could be that API. It would be delivered as an read-only channel to the consumer.

RoundTrippers could then insert any kind of event at any point, provided they at least provide born, dead and miss-event. And they would not slowed down by slow consumers.

Note: I am not sure how to solve request identification.

@bradfitz
Copy link
Contributor Author

@nightlyone, did you have any requirements?

@brknstrngz
Copy link

@tombergan our use case is quite simple: a high speed HTTP client sends out many (thousands/second) requests to an API server. We use HTTP 1.1, so when the pool fills up, even more connections are established. We see ephemeral ports exhaustion even with fast timewait recycling of those connections that for one reason or another close (the API server sends a Connection: close from a few endpoints). Fiddling with MaxIdle has no effect for this use case, as none of the connections in the pool are ever idle.

What we could use is a cap on the pool size so that attempting to get a connection from a maxed out pool either: a) returns a channel to the caller; the caller then waits on it until a connection becomes available or b) returns an err to the caller right away.

We do not mind doing our own housekeeping, but the current API only enables part of that - obtaining connections from a pool by providing a custom Dial() - we can stick a buffered channel in there to emulate the pool behaviour. Once the response for the request sent across that connection is received, there is no way to return the connection to the pool.

@tombergan
Copy link
Contributor

@vgalu: Got it. Sounds like all you need is for the following TODO to be fixed, which seems orthogonal to the feature being considered in this issue:
https://github.com/golang/go/blob/master/src/net/http/transport.go#L116

@brknstrngz
Copy link

@tombergan yes, thanks. A generic, callback-based approach looked interesting/useful enough to chime in.

@bradfitz
Copy link
Contributor Author

Belated sketch at https://go-review.googlesource.com/16501

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/16501 mentions this issue.

@sajal
Copy link

sajal commented Oct 29, 2015

Looks good.

If i understood correctly, I associate a httptrace.ClientTrace with my http.Transport, it would fire the functions Start, GotConn, etc(and in future DNS related things that im after) the moment those event occur, and i can stash timespamps inside the httptrace.Handle which would be the context for one particular request/response cycle.

@Redundancy
Copy link

I'm still not super excited about a global handler, but I think I see how this could work.

I make a request on a http.Transport that I need to wrap / override within a library (in case someone at a higher level of application logic is tracing requests).

My ClientTrace object has a function pointer that returns an object of my choosing, so I return a handle object to associate things with at the point where I have a connection, associate that handle with the IP address it connected to (and chain the call down to any other handler).

My calling function then sends the request to something that can use the headers and path to look up the handle, and can from there determine the IP that I connected to (perhaps I stuff a GUID per-request into a request header to make it easier to associate a specific request with a handle). That way I can also more easily ignore any requests that someone else is making on the Transport.

It's a bit of a pain for anyone creating a library, because they should defer choices about the Transport to the application layer (for things like proxies, right?), but have to modify it to get the information that they need.

@tombergan
Copy link
Contributor

@Redundancy My calling function then sends the request to something that can use the headers and path to look up the handle,

ClientTrace.Start gets a pointer to the http.Request, so with a bit of boilerplate it's easy to lookup a handle given an *http.Request.

@Redundancy That way I can also more easily ignore any requests that someone else is making on the Transport.

I think you're saying that the current API makes it impossible (or at least difficult) to change the tracing functions based on the context of the RoundTrip call. I can think of APIs that don't have this problem, but they're clumsier ... can you think of a compelling example where you need different tracing functions based on calling context?

For your library example, note that it's possible to allow the library and caller to both install their own tracing functions: the caller installs their ClientTrace object first, then the library installs a delegate object that invokes the caller's ClientTrace.Foo before calling its own ClientTrace.Foo.

@rsc rsc modified the milestones: Go1.7, Go1.6 Dec 5, 2015
@Redundancy
Copy link

I'm mostly worried that by modifying a passed Transport, I'm potentially causing side effects elsewhere in the case that transport is reused or (maybe worse) the default transport if the user doesn't read the documentation properly. Copying a transport struct would also be unusual.

I just feel uncomfortable with most of the ways that I have of getting the things that I need out of requests since all choices seem to leak out of the library somehow.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/22101 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 15, 2016
For #12580 (http.Transport tracing/analytics)
Updates #13021

Change-Id: I126e494a7bd872e42c388ecb58499ecbf0f014cc
Reviewed-on: https://go-review.googlesource.com/22101
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/22124 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 16, 2016
This simply connects the contexts, pushing them down the call stack.
Future CLs will utilize them.

For #12580 (http.Transport tracing/analytics)
Updates #13021

Change-Id: I5b2074d6eb1e87d79a767fc0609c84e7928d1a16
Reviewed-on: https://go-review.googlesource.com/22124
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/22191 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 28, 2016
…uests

Updates #12580

Change-Id: I9f9578148ef2b48dffede1007317032d39f6af55
Reviewed-on: https://go-review.googlesource.com/22191
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Tom Bergan <tombergan@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/22659 mentions this issue.

gopherbot pushed a commit that referenced this issue May 1, 2016
And add a test.

Updates #12580

Change-Id: Ia7eaba09b8e7fd0eddbcaefb948d01ab10af876e
Reviewed-on: https://go-review.googlesource.com/22659
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@bradfitz
Copy link
Contributor Author

bradfitz commented May 3, 2016

Now that this is submitted, I can submit the http2 half (using the +build go1.7 tag to safely use the new package and keep http2 working for older Go versions)

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/23069 mentions this issue.

mk0x9 pushed a commit to mk0x9/go that referenced this issue May 13, 2016
The httptrace.ConnectStart and ConnectDone hooks are just about the
post-DNS connection to the host. We were accidentally also firing on
the UDP dials to DNS. Exclude those for now. We can add them back
later as separate hooks if desired. (but they'd only work for pure Go
DNS)

This wasn't noticed earlier because I was developing on a Mac at the
time, which always uses cgo for DNS. When running other tests on
Linux, I started seeing UDP dials.

Updates golang#12580

Change-Id: I2b2403f2483e227308fe008019f1100f6300250b
Reviewed-on: https://go-review.googlesource.com/23069
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/23101 mentions this issue.

gopherbot pushed a commit that referenced this issue May 14, 2016
This fixes change https://go-review.googlesource.com/#/c/23069/, which
assumes all DNS requests are UDP. This is not true -- DNS requests can
be TCP in some cases. See:
https://tip.golang.org/src/net/dnsclient_unix.go#L154
https://en.wikipedia.org/wiki/Domain_Name_System#Protocol_transport

Also, the test code added by the above change doesn't actually test
anything because the test uses a faked DNS resolver that doesn't
actually make any DNS queries. I fixed that by adding another test
that uses the system DNS resolver.

Updates #12580

Change-Id: I6c24c03ebab84d437d3ac610fd6eb5353753c490
Reviewed-on: https://go-review.googlesource.com/23101
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc
Copy link
Contributor

rsc commented May 18, 2016

What's left here? Not much time before the beta.

@bradfitz
Copy link
Contributor Author

Just a couple trivial http2 additions to call these hooks which I have pending on my laptop. They had to wait for the main CL to be submitted first. Aiming to send them out today. I was focusing on the harder bugs first.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/23206 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/23205 mentions this issue.

gopherbot pushed a commit to golang/net that referenced this issue May 18, 2016
Updates golang/go#12580

Change-Id: I95d7206a8fde494f88288b381814a5d72d42df5e
Reviewed-on: https://go-review.googlesource.com/23205
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators May 18, 2017
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants