
Instrumentation hooks and Prometheus metrics #299

Closed
wants to merge 1 commit

Conversation

mwitkow
Contributor

@mwitkow mwitkow commented Aug 18, 2015

Implements simple callback-based instrumentation hooks for gRPC. The choice of instrumentation is made through server options and ClientConn DialOptions, leaving the user in full control. The default implementation is a No-Op, incurring no overhead.

The Prometheus implementation counts:

  • number of started RPCs (to be able to see RPCs in flight)
  • number of fully handled RPCs (to the app layer) with breakdown by statusCode, including latency measurements
  • number of erred RPCs in the RPC-layer
  • number of sent/received messages for streaming RPCs
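The hook shape described above can be sketched as a small set of interfaces with a No-Op default. This is a minimal sketch; the names (`ServerMonitorFactory`, `Handled`, and so on) are approximations of the PR's monitoring package, not the final API:

```go
package main

import "fmt"

// RpcType distinguishes unary from streaming calls.
type RpcType string

const (
	Unary     RpcType = "unary"
	Streaming RpcType = "streaming"
)

// ServerMonitorFactory is the hook installed via a server option.
type ServerMonitorFactory interface {
	NewServerMonitor(rpcType RpcType, method string) ServerMonitor
}

// ServerMonitor observes the lifecycle of a single RPC.
type ServerMonitor interface {
	ReceivedMessage()    // a request message arrived (per message for streams)
	SentMessage()        // a response message was sent
	Handled(code string) // the RPC finished with the given status code
}

// noOpMonitor is the default: every callback is empty, so the
// instrumentation adds essentially no overhead when unused.
type noOpMonitor struct{}

func (noOpMonitor) NewServerMonitor(RpcType, string) ServerMonitor { return noOpMonitor{} }
func (noOpMonitor) ReceivedMessage()                               {}
func (noOpMonitor) SentMessage()                                   {}
func (noOpMonitor) Handled(string)                                 {}

func main() {
	var factory ServerMonitorFactory = noOpMonitor{}
	m := factory.NewServerMonitor(Unary, "/helloworld.Greeter/SayHello")
	m.ReceivedMessage()
	m.SentMessage()
	m.Handled("OK")
	fmt.Println("monitored one unary RPC")
}
```

A Prometheus-backed factory would return monitors that increment the counters listed above instead of doing nothing.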

A server-side screenshot of the Prometheus metrics page showing both streaming and unary RPCs is in the related bug: #240

Input needed:

  • naming for prometheus
  • testing?

@mwitkow mwitkow mentioned this pull request Aug 18, 2015
@@ -262,6 +276,14 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
}
}()
}
monitor := s.opts.serverMonitor.NewServerMonitor(monitoring.Unary, stream.Method())
defer func() {
Member

Curious: Have you benchmarked for throughput the overhead of the additional defer call on this procedure and elsewhere?

Contributor Author

No, I haven't. But the gRPC codebase seems to be full of defers. Should I guard it with a monitoring.NoOpMonitor type check?

@mwitkow
Contributor Author

mwitkow commented Aug 19, 2015

Added the client-side stream monitoring, but I am not entirely sure whether I got all the edge cases. @iamqizhao, can you take a look? Even if this is not gonna make it upstream, we still intend to use it internally :)

@pires

pires commented Aug 20, 2015

This is awesome! Looking forward to it getting merged so I can implement a sink to InfluxDB.

"google.golang.org/grpc/codes"
)

type RpcType string

Does this need to be exported?

Contributor Author

Yup, see later comment.

@miekg

miekg commented Aug 20, 2015

On the whole this looks good and something worth having IMHO.

@@ -111,6 +112,7 @@ func Invoke(ctx context.Context, method string, args, reply interface{}, cc *Cli
return toRPCErr(err)
}
}
monitor := cc.dopts.clientMonitor.NewClientMonitor(monitoring.Unary, method)
Contributor

if cc.dopts.clientMonitor != nil {
...
}

Contributor Author

Discussed below.

@iamqizhao
Contributor

The general design looks good to me.

@mwitkow
Contributor Author

mwitkow commented Sep 21, 2015

Glad to hear that right after coming back from vacation :)

Two things to discuss:

  • would you prefer the NoOpMonitor to be removed and substituted with an if monitor != nil check? There's hardly any allocation done per RPC, so the performance impact should be negligible either way, but the code would be clearer.
  • would you prefer the ClientMonitor and ServerMonitor interfaces collapsed into one? Their signatures would be the same.
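The first point, a No-Op value versus a nil check, comes down to where the branch lives: in every call site, or nowhere at all. A hedged sketch of both patterns, with illustrative names rather than the PR's exact code:

```go
package main

import "fmt"

// ClientMonitor observes a single RPC (illustrative, minimal version).
type ClientMonitor interface {
	Handled(code string)
}

// countingMonitor is a real implementation that tallies finished RPCs.
type countingMonitor struct{ handled int }

func (c *countingMonitor) Handled(string) { c.handled++ }

// Option A: allow a nil monitor and guard every call site.
func finishRPCNilCheck(m ClientMonitor, code string) {
	if m != nil {
		m.Handled(code)
	}
}

// Option B: substitute a No-Op value so call sites stay unconditional.
type noOpMonitor struct{}

func (noOpMonitor) Handled(string) {}

func finishRPCNoOp(m ClientMonitor, code string) {
	m.Handled(code) // m is never nil; the default is noOpMonitor{}
}

func main() {
	finishRPCNilCheck(nil, "OK") // safe: the guard skips the callback
	finishRPCNoOp(noOpMonitor{}, "OK")

	c := &countingMonitor{}
	finishRPCNilCheck(c, "OK")
	finishRPCNoOp(c, "Unavailable")
	fmt.Println(c.handled) // prints 2
}
```

Option B keeps the hot path free of branches at the cost of one extra dynamic dispatch to an empty method; Option A makes the "monitoring is off" state explicit in the code.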

@mwitkow mwitkow force-pushed the monitoring_take_i branch 2 times, most recently from 5c83df7 to 3598371 Compare September 28, 2015 16:57
@mwitkow
Contributor Author

mwitkow commented Sep 28, 2015

@iamqizhao I have refactored the PR to:

  • remove NewClientMonitor and NewServerMonitor in favor of a unified method with the same signature
  • rebase on top of current master, which includes the Picker changes that had broken the PR

Could you PTAL? :)

@mwitkow
Contributor Author

mwitkow commented Oct 2, 2015

@iamqizhao, I understand that you guys have tons of other work. It would at least be useful to know whether this PR is being considered for acceptance.

We're thinking about relying on it for our SLO monitoring (using different error Codes to differentiate between user faults and our system faults), and knowing whether this PR has a chance of being upstreamed would be incredibly useful.

@iamqizhao
Contributor

Yep, this PR can be accepted. But I need to check all the points where you inserted the code, and I have not had time to do that. Sorry about the delay.

BTW, I know it is painful but can you sync your code to the latest?

@mwitkow
Contributor Author

mwitkow commented Oct 5, 2015

@iamqizhao Great to hear that. I have rebased onto the latest master.

I'm wondering how to add unit tests that ensure the monitoring still works correctly across major refactors such as the ones I'm currently rebasing on. Maybe retrofit some integration tests with monitoring counters?

@iamqizhao
Contributor

FYI, I will consider merging this change after the WIP naming and load balancing change is done. Sorry about that, because this probably means you need to do a couple of extra rebasings in the next couple of weeks.

I would suggest creating a dummy monitoring impl in the end2end test and verifying it works properly.
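One way the suggested dummy monitor could look is a counting implementation that an end2end test installs via a server option, then asserts on after running some RPCs. The names here are illustrative, not taken from the PR:

```go
package main

import (
	"fmt"
	"sync"
)

// dummyMonitor is a test-only monitor that counts lifecycle callbacks.
// An end2end test would install it and then assert on the counters.
type dummyMonitor struct {
	mu      sync.Mutex
	started int
	handled map[string]int // status code -> count
}

func newDummyMonitor() *dummyMonitor {
	return &dummyMonitor{handled: make(map[string]int)}
}

func (d *dummyMonitor) Started() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.started++
}

func (d *dummyMonitor) Handled(code string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.handled[code]++
}

func main() {
	d := newDummyMonitor()
	// Simulate what two monitored RPCs would report.
	d.Started()
	d.Handled("OK")
	d.Started()
	d.Handled("Unavailable")
	fmt.Println(d.started, d.handled["OK"], d.handled["Unavailable"]) // prints: 2 1 1
}
```

Because the monitor only observes the generic RPC lifecycle, such a test should survive internal refactors as long as the hook points themselves stay in place.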

@mwitkow
Contributor Author

mwitkow commented Oct 5, 2015

That's ok :) LB and naming are things we're incredibly interested in (and we're willing to put some manpower behind a DNS SRV implementation).

Can you provide some pointers regarding how to implement the things in the end2end test?

@mwitkow
Contributor Author

mwitkow commented Nov 10, 2015

@iamqizhao, any updates here? We've been happily using this monitoring in prod for the last 2 months and are willing to help out getting it upstreamed :)

@mwitkow
Contributor Author

mwitkow commented Feb 12, 2016

@iamqizhao, any updates on the metrics API? We've been happily using this in prod for a while now :)

@raliste

raliste commented Mar 18, 2016

+1 for updates here

Actively using grpc at preyproject.com

@rolandshoemaker
Contributor

@iamqizhao has this been superseded by the server-side interceptor currently being reviewed internally, plus the proposals for client-side interceptors?

The Boulder team at Let's Encrypt is currently working on moving away from our current RPC implementation in favor of gRPC but the lack of exposed metrics hooks is slowing us down somewhat (one of the reasons for moving to gRPC was to get rid of a bunch of non-CA code we had to maintain, including various hacks to collect client and server side metrics which we'd rather not re-implement).

We'd prefer to use something native to grpc-go rather than writing another set of wrappers or using additional code generators, and are wondering if there is a prospect of this being merged, or if the other proposals have superseded it (and in either case whether you have an ETA, even if it is likely to shift).

@iamqizhao
Contributor

The ETA is by the end of this week or early next week. Sorry about the
delay. The internal review took much longer than I expected.


@mwitkow
Contributor Author

mwitkow commented May 13, 2016

I moved this implementation into a server-side interceptor under:
https://github.com/mwitkow/go-grpc-prometheus

@mwitkow mwitkow closed this May 13, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jan 19, 2019