grpc: add streaming metrics #3048

kyessenov · 2020-10-12T22:23:06Z

Signed-off-by: Kuat Yessenov <kuat@google.com>

Adds metrics for gRPC request and response message counts.
These are enabled by default, but I can add a guard.
Counters with a value of 0 are not reported. This matters for the gRPC message counts because the metrics are evaluated for all HTTP traffic: peers may interleave HTTP and gRPC traffic on the same connection, so we cannot predict in advance whether the gRPC metrics will be needed.
Interval reporting will be in a follow-up.
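
The zero-skipping rule described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: `CounterSink` and `recordIfNonZero` are hypothetical names, and the sink is a plain map standing in for whatever the stats extension really writes to.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a metrics sink: maps a metric name to its
// accumulated counter value. A name appears only after it has been reported.
using CounterSink = std::map<std::string, uint64_t>;

// Adds `value` to the named counter, skipping the report when `value` is 0.
// Returns true if the counter was actually reported.
bool recordIfNonZero(CounterSink& sink, const std::string& name,
                     uint64_t value) {
  if (value == 0) {
    // A request that carried no gRPC messages (for example plain HTTP)
    // leaves no entry behind, so no empty time series is created.
    return false;
  }
  sink[name] += value;
  return true;
}
```

With this shape, a plain HTTP request on a shared connection leaves the sink untouched, while a gRPC stream with a nonzero message count creates or increments the counter.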


douglas-reid · 2020-10-13T00:08:03Z

testdata/metric/client_request_messages.yaml.tmpl

+    value: default
+  - name: request_protocol
+    value: grpc
+  - name: response_code


do we need response_code on these metrics? it is always 200, correct?

Yeah, it is redundant. It's an artifact of the implementation that should be fixed, I think.

douglas-reid · 2020-10-13T00:09:38Z

testdata/metric/client_request_messages.yaml.tmpl

+  - name: grpc_response_status
+    value: "{{ .Vars.GrpcResponseStatus }}"
+  - name: response_flags
+    value: DC


do we expect DC in the base case for client metrics? does response_flags add value here?

DC is downstream connection termination (the client intentionally closed the connection before the server responded). I think it's valuable here.

douglas-reid · 2020-10-13T00:17:57Z

testdata/metric/client_request_messages.yaml.tmpl

+    value: server
+  - name: destination_service_namespace
+    value: default
+  - name: request_protocol


i wonder, is this useful? will these metrics already imply grpc?

Yes. TCP metrics have the same problem (the label is always tcp there). It's not useful.

douglas-reid · 2020-10-13T00:22:39Z

extensions/stats/plugin.cc

                    [](const ::Wasm::Common::RequestInfo& request_info)
                        -> uint64_t { return request_info.response_size; },
                    false},
+      // GRPC metrics.
+      MetricFactory{
+          "request_messages", MetricType::Counter,


should we consider making these request_messages_total and response_messages_total to align with prom best practices: https://prometheus.io/docs/practices/naming/

looks like we went with _count instead of _total ?

Oops, renamed to total.


kyessenov · 2020-10-13T01:03:16Z

Removed a bunch of labels and can remove more. They can be back-filled with metric customization.

extensions/common/context.cc

kyessenov · 2020-10-13T03:14:17Z

/test release-centos-test_proxy


kyessenov · 2020-10-14T18:00:25Z

/test test-tsan_proxy

istio-testing · 2020-10-14T18:43:39Z

@kyessenov: The following test failed; say `/retest` to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| release-centos-test_proxy | `1bfc85b` | link | `/test release-centos-test_proxy` |


kyessenov · 2020-10-14T19:15:17Z

/test test-tsan_proxy

kyessenov · 2020-10-14T20:56:19Z

@douglas-reid @mandarjog is this good to go for non-interval reporting? I'm afraid interval reporting is going to run into some Wasm lifecycle issues, so I want to separate it out for now.

bianpengyuan · 2020-10-15T21:04:21Z

extensions/common/context.cc

@@ -391,6 +395,19 @@ void populateTCPRequestInfo(bool outbound, RequestInfo* request_info,
  request_info->request_protocol = kProtocolTCP;
 }

+bool populateGRPCInfo(RequestInfo* request_info) {


No need to return boolean here? does not seem to be used and it always returns false.

It returns true if it parses correctly (SimpleAtoi returns a bool). It's not used yet, but it seems reasonable to have a status return.
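
A minimal sketch of the status-return pattern discussed here, under stated assumptions: the counts are assumed to arrive as decimal strings, `GrpcCounts`, `parseCount`, and `populateCounts` are hypothetical names, and `std::from_chars` is swapped in for `SimpleAtoi` so the example has no external dependencies. It is not the body of `populateGRPCInfo`.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical subset of the request info touched by this PR.
struct GrpcCounts {
  uint64_t request_message_count = 0;
  uint64_t response_message_count = 0;
};

// Parses an unsigned decimal integer. Returns true only if the input is
// non-empty and every character was consumed; `*out` is untouched on failure.
bool parseCount(std::string_view text, uint64_t* out) {
  if (text.empty()) {
    return false;
  }
  const char* begin = text.data();
  const char* end = begin + text.size();
  uint64_t parsed = 0;
  const auto result = std::from_chars(begin, end, parsed);
  if (result.ec != std::errc() || result.ptr != end) {
    return false;
  }
  *out = parsed;
  return true;
}

// Fills both counts and reports whether both parsed. On failure the
// previous values in `counts` are left unchanged.
bool populateCounts(std::string_view request_text,
                    std::string_view response_text, GrpcCounts* counts) {
  uint64_t request = 0;
  uint64_t response = 0;
  if (!parseCount(request_text, &request) ||
      !parseCount(response_text, &response)) {
    return false;
  }
  counts->request_message_count = request;
  counts->response_message_count = response;
  return true;
}
```

Returning the parse status costs nothing when the caller ignores it, and lets a later caller distinguish "no gRPC data" from "zero messages".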

extensions/common/context.cc

extensions/common/context.h

bianpengyuan · 2020-10-15T21:19:58Z

extensions/stats/plugin.cc

+          [](const ::Wasm::Common::RequestInfo& request_info) -> uint64_t {
+            return request_info.request_message_count;
+          },
+          false, count_peer_labels},


Can you add some comments here about which labels are ignored for this metric?

Also, would connection_security_policy be worth adding?

Or are we considering it already tracked by the request total, so it would be a duplicate here, and this metric should just focus on the message count?

Yeah, I only put peer labels, since the other labels apply to the outer request rather than to the individual messages within it. I'm not sure if we want more, so I am starting from the minimal set.
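
One way to picture the "minimal set" approach is a filter that keeps only an allow-listed subset of a request's labels. This is a sketch under assumptions: `Labels` and `selectLabels` are hypothetical, and which label names count as peer labels is chosen here purely for illustration, not taken from the plugin.

```cpp
#include <map>
#include <set>
#include <string>

// Label name -> label value for one metric sample.
using Labels = std::map<std::string, std::string>;

// Returns only the labels whose names are in `keep`. Labels describing the
// outer request are dropped simply by leaving them out of `keep`.
Labels selectLabels(const Labels& all, const std::set<std::string>& keep) {
  Labels out;
  for (const auto& [name, value] : all) {
    if (keep.count(name) > 0) {
      out.emplace(name, value);
    }
  }
  return out;
}
```

Starting from a small allow-list keeps the cardinality of the message-count metrics low; dropped labels can be added back later without changing the filter itself.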

test/envoye2e/driver/grpc.go

bianpengyuan · 2020-10-15T21:35:27Z

test/envoye2e/stats_plugin/stats_test.go

+		"WasmRuntime":                "envoy.wasm.runtime.null",
+		"DisableDirectResponse":      "true",
+		"UsingGrpcBackend":           "true",
+		"GrpcResponseStatus":         "2",


This is not used?

bianpengyuan

lg, only minor comments.


gargnupur · 2020-10-15T21:54:59Z

extensions/stats/plugin.cc

+      MetricFactory{
+          "response_messages_count", MetricType::Counter,
+          [](const ::Wasm::Common::RequestInfo& request_info) -> uint64_t {
+            return request_info.response_message_count;


how about a metric for the average size of these messages?

That would be difficult, I think. The messages are framed by another filter, so we don't have a way to accumulate their sizes here. Open to any suggestions on how to make that possible.

I think it would be useful.
@kyessenov is it possible to add it upstream? We need a distribution of message sizes.
We need not hold this PR for it though.

We can report the size distribution to native Envoy stats, but I don't know how to get it back out here.

not sure either... but can we use grpc stats filter only to get message size and add it in the above stats?

There might be many messages, so we'd be passing a large list of values for sizes. We would need histogram summaries I think, to make it work.
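
To illustrate why a histogram summary helps here, the sketch below folds any number of message sizes into a fixed number of buckets, so the amount of data handed between filters stays constant. `SizeHistogram` and its bucket bounds are hypothetical; this is not an Envoy or Istio API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Fixed-bucket histogram: a bounded-size summary of many message sizes.
class SizeHistogram {
 public:
  // `upper_bounds` must be sorted ascending. One extra overflow bucket is
  // added for sizes above the last bound.
  explicit SizeHistogram(std::vector<uint64_t> upper_bounds)
      : bounds_(std::move(upper_bounds)), counts_(bounds_.size() + 1, 0) {
    assert(std::is_sorted(bounds_.begin(), bounds_.end()));
  }

  // Records one message size in the first bucket whose bound is >= size.
  void record(uint64_t size) {
    const auto it = std::lower_bound(bounds_.begin(), bounds_.end(), size);
    counts_[static_cast<size_t>(it - bounds_.begin())] += 1;
    sum_ += size;
    total_ += 1;
  }

  const std::vector<uint64_t>& counts() const { return counts_; }
  uint64_t total() const { return total_; }

  // Mean message size, or 0 when nothing has been recorded.
  double mean() const {
    return total_ == 0 ? 0.0 : static_cast<double>(sum_) / total_;
  }

 private:
  std::vector<uint64_t> bounds_;
  std::vector<uint64_t> counts_;
  uint64_t sum_ = 0;
  uint64_t total_ = 0;
};
```

However many messages a stream carries, only the bucket counts, the sum, and the total would need to be passed along, which also covers the average-size question raised earlier in this thread.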


mandarjog

lgtm

add streaming metrics

d865967

Signed-off-by: Kuat Yessenov <kuat@google.com>

kyessenov requested a review from a team October 12, 2020 22:23

kyessenov requested a review from a team as a code owner October 12, 2020 22:23

istio-testing added the size/L label Oct 12, 2020

google-cla bot added the cla: yes label Oct 12, 2020

kyessenov requested review from bianpengyuan and gargnupur October 12, 2020 22:23

kyessenov added 2 commits October 12, 2020 17:04

revert change

ed42949

Signed-off-by: Kuat Yessenov <kuat@google.com>

lint

707a0ff

Signed-off-by: Kuat Yessenov <kuat@google.com>

douglas-reid reviewed Oct 13, 2020

remove extra labels

d47fb2b

Signed-off-by: Kuat Yessenov <kuat@google.com>

mandarjog reviewed Oct 13, 2020

extensions/common/context.cc

kyessenov added 2 commits October 13, 2020 10:44

do not report 0 counters

1bfc85b

Signed-off-by: Kuat Yessenov <kuat@google.com>

Merge remote-tracking branch 'upstream/master' into grpc_stream_counts

7fbb439

Merge remote-tracking branch 'upstream/master' into grpc_stream_counts

46253c9

bianpengyuan reviewed Oct 15, 2020

extensions/common/context.cc

gargnupur reviewed Oct 15, 2020

extensions/common/context.h

bianpengyuan reviewed Oct 15, 2020

test/envoye2e/driver/grpc.go

bianpengyuan reviewed Oct 15, 2020

comments

6c60226

Signed-off-by: Kuat Yessenov <kuat@google.com>

bianpengyuan approved these changes Oct 15, 2020

gargnupur reviewed Oct 15, 2020

rename count to total

7d3a5e8

Signed-off-by: Kuat Yessenov <kuat@google.com>

mandarjog approved these changes Oct 15, 2020

gargnupur approved these changes Oct 15, 2020

istio-testing merged commit 3317993 into istio:master Oct 15, 2020

kyessenov mentioned this pull request Nov 10, 2020

preliminary release notes for 1.8 istio/istio.io#8415

Merged
