Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats/opentelemetry: Add OpenTelemetry instrumentation component #7066

Closed
30 changes: 15 additions & 15 deletions stats/opentelemetry/client_metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -202,21 +202,21 @@ func (csh *clientStatsHandler) processRPCEnd(ctx context.Context, mi *metricsInf
}

const (
// ClientAttemptStartedName is the name of the client attempt started
// metric.
ClientAttemptStartedName = MetricName("grpc.client.attempt.started")
// ClientAttemptDurationName is the name of the client attempt duration
// metric.
ClientAttemptDurationName = MetricName("grpc.client.attempt.duration")
// ClientAttemptSentCompressedTotalMessageSize is the name of the client
// attempt sent total compressed message size metric.
ClientAttemptSentCompressedTotalMessageSize = MetricName("grpc.client.attempt.sent_total_compressed_message_size")
// ClientAttemptRcvdCompressedTotalMessageSize is the name of the client
// attempt rcvd total compressed message size metric.
ClientAttemptRcvdCompressedTotalMessageSize = MetricName("grpc.client.attempt.rcvd_total_compressed_message_size")
// ClientCallDurationName is the name of the client call duration metric.
ClientCallDurationName = MetricName("grpc.client.call.duration")
// ClientAttemptStarted is the number of client call attempts started.
ClientAttemptStarted Metric = "grpc.client.attempt.started"
// ClientAttemptDuration is the end-to-end time taken to complete a client
// call attempt.
ClientAttemptDuration Metric = "grpc.client.attempt.duration"
// ClientAttemptSentCompressedTotalMessageSize is the compressed message
// bytes sent per client call attempt.
ClientAttemptSentCompressedTotalMessageSize Metric = "grpc.client.attempt.sent_total_compressed_message_size"
// ClientAttemptRcvdCompressedTotalMessageSize is the compressed message
// bytes received per call attempt.
ClientAttemptRcvdCompressedTotalMessageSize Metric = "grpc.client.attempt.rcvd_total_compressed_message_size"
// ClientCallDurationName is the time taken by gRPC to complete an RPC from
// application's perspective.
ClientCallDurationName Metric = "grpc.client.call.duration"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is still named Name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, whoops. Removed.

)

// DefaultClientMetrics are the default client metrics provided by this module.
var DefaultClientMetrics = NewMetrics(ClientAttemptStartedName, ClientAttemptDurationName, ClientAttemptSentCompressedTotalMessageSize, ClientAttemptRcvdCompressedTotalMessageSize, ClientCallDurationName)
var DefaultClientMetrics = NewMetrics(ClientAttemptStarted, ClientAttemptDuration, ClientAttemptSentCompressedTotalMessageSize, ClientAttemptRcvdCompressedTotalMessageSize, ClientCallDurationName)
15 changes: 15 additions & 0 deletions stats/opentelemetry/e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,21 @@ import (
"go.opentelemetry.io/otel/sdk/metric/metricdata/metricdatatest"
)

func ExampleNewMetrics() {
// Will disable metrics, set with no metrics specified.
NewMetrics()
// Will enable these two metrics.
NewMetrics(ClientAttemptDuration, ServerCallDuration)
// Use DefaultClientMetrics to enable default client metrics. Equivalent to
// unset, which will pick up defaults.
}

func ExampleMetric_Remove() {
// Use DefaultClientMetrics without ClientAttemptDuration and
// ClientCallDurationName.
DefaultClientMetrics.Remove(ClientAttemptDuration, ClientCallDurationName)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these would be more useful on the DialOption / ServerOption, or Options, and should show how to build the DialOption. Just one line of code isn't super useful.

E.g. something like:

func ExampleDialOption_ExcludeMetrics() {
	// To exclude specific metrics, initialize Options as follows:
	otelOpts := opentelemetry.Option{
		MetricsOptions: MetricsOptions{
			Metrics: DefaultClientMetrics.Remove(opentelemetry.ClientAttemptDuration,
		},
	}
	dialOption := opentelemetry.DialOption(otelOpts)
	cc, _ := grpc.Dial("<target string>",  dialOption)
	// Handle err.
	defer cc.Close()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't the examples technically have to be linked to a single exported function from the package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there are many ways of attaching examples at different places in the package. Check the testing documentation for more details: https://pkg.go.dev/testing#hdr-Examples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added numerous examples.


var defaultTestTimeout = 5 * time.Second

type s struct {
Expand Down
55 changes: 27 additions & 28 deletions stats/opentelemetry/opentelemetry.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,20 +37,20 @@ var canonicalString = internal.CanonicalString.(func(codes.Code) string)

var joinDialOptions = internal.JoinDialOptions.(func(...grpc.DialOption) grpc.DialOption)

// MetricName is a name of a metric.
type MetricName string
// Metric is an identifier for a metric provided by this package.
type Metric string

// Metrics is a set of metrics to record. Once created, Metrics is immutable,
// however Add and Remove can make copies with specific metrics added or
// removed, respectively.
type Metrics struct {
// metrics are the set of metrics to initialize.
metrics map[MetricName]bool
metrics map[Metric]bool
}

// NewMetrics returns a Metrics with the metrics provided specified.
func NewMetrics(metrics ...MetricName) *Metrics {
newMetrics := make(map[MetricName]bool)
// NewMetrics returns a Metrics containing the Metric's provided.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another choice is to say "containing metrics." - directly referencing the parameter's name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, switched.

func NewMetrics(metrics ...Metric) *Metrics {
newMetrics := make(map[Metric]bool)
for _, metric := range metrics {
newMetrics[metric] = true
}
Expand All @@ -61,8 +61,8 @@ func NewMetrics(metrics ...MetricName) *Metrics {

// Add adds the metrics to the metrics set and returns a new copy with the
// additional metrics.
func (m *Metrics) Add(metrics ...MetricName) *Metrics {
newMetrics := make(map[MetricName]bool)
func (m *Metrics) Add(metrics ...Metric) *Metrics {
newMetrics := make(map[Metric]bool)
for metric := range m.metrics {
newMetrics[metric] = true
}
Expand All @@ -77,8 +77,8 @@ func (m *Metrics) Add(metrics ...MetricName) *Metrics {

// Remove removes the metrics from the metrics set and returns a new copy with
// the metrics removed.
func (m *Metrics) Remove(metrics ...MetricName) *Metrics {
newMetrics := make(map[MetricName]bool)
func (m *Metrics) Remove(metrics ...Metric) *Metrics {
newMetrics := make(map[Metric]bool)
for metric := range m.metrics {
newMetrics[metric] = true
}
Expand All @@ -99,17 +99,19 @@ type Options struct {

// MetricsOptions are the metrics options for OpenTelemetry instrumentation.
type MetricsOptions struct {
// MeterProvider is the MeterProvider instance that will be used for access
// to Named Meter instances to instrument an application. To enable metrics
// collection, set a meter provider. If unset, no metrics will be recorded.
// Any implementation knobs (i.e. views, bounds) set in the passed in object
// take precedence over the API calls from the interface in this component
// (i.e. it will create default views for unset views).
// MeterProvider is the MeterProvider instance that will be used to create
// instruments. To enable metrics collection, set a meter provider. If
// unset, no metrics will be recorded. Any implementation knobs (i.e. views,
// bounds) set in the MeterProvider take precedence over the API calls from
// this interface. (i.e. it will create default views for unset views).
MeterProvider metric.MeterProvider
dfawley marked this conversation as resolved.
Show resolved Hide resolved
// Metrics are the metrics to instrument. Will turn on the corresponding
// metric supported by the client and server instrumentation components if
// applicable.

// Metrics are the metrics to instrument. Will create instrument and record telemetry
// for corresponding metric supported by the client and server
// instrumentation components if applicable. If not set, the default metrics
// will be recorded.
Metrics *Metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a user sets an unsupported metric? (E.g. if they specify server metrics when they initialize the client)? Ideally DialOption would error??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembering discussing this with Yash, and I think the decision was just to leave this as a no-op in this scenario (I.e. just ignore the metrics). Right @yashykt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, no-op is what i'm thinking. Consider the case of experimental metrics. There might be a case where we decide to remove experimental metrics and we don't want removal of metrics to break user code (if they had enabled the experimental metrics).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sounds good to me. Let me know what you think Doug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, consider the case where they're initializing a DialOption and accidentally passing the default server metrics object. I could easily see this happening, as all the types line up and everything. In this scenario, they'll get no metrics at all.

If we remove an experimental metric, maybe the user wants an init-time error so they know it was removed?

And in our case, that's what will happen anyway, unless we leave the const behind for the removed metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're calling the metrics in A66 stable, which is why we have default + exported consts. I think the goal was the defaults now are stable (i.e. ones defined in A66).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would a user set Server Metrics for Dial Option :)? I guess the tradeoff is the scenario you mentioned earlier vs. being defensive now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By accident. Spot the bug:

opentelemetry.DialOption(opentelemetry.Options{
	LoggingOptions: ...,
	MetricsOptions: opentelemetry.MetricsOptions{
		MeterProvider: blahblahblah,
		Metrics: opentelemetry.DefaultServerMetrics,
		TargetAttributeFilter: func(t target) bool {
			return allowedTargets[t]
		},
	},
	TracingOptions: ...,
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's separate the metrics field to client and server metrics then for type safety? not entirely type safe but at least the user knows whether setting for client or sever and not tied to DialOption/ServerOption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline; decided to make it DefaultMetrics that contains both client and server, which will be ignored on the other side to the initialized option.


// TargetAttributeFilter is a callback that takes the target string of the
// channel and returns a bool representing whether to use target as a label
// value or use the string "other". If unset, will use the target string as
Expand Down Expand Up @@ -245,7 +247,7 @@ type serverMetrics struct {
callDuration metric.Float64Histogram
}

func createInt64Counter(setOfMetrics map[MetricName]bool, metricName MetricName, meter metric.Meter, options ...metric.Int64CounterOption) metric.Int64Counter {
func createInt64Counter(setOfMetrics map[Metric]bool, metricName Metric, meter metric.Meter, options ...metric.Int64CounterOption) metric.Int64Counter {
if _, ok := setOfMetrics[metricName]; !ok {
return nil
}
Expand All @@ -257,7 +259,7 @@ func createInt64Counter(setOfMetrics map[MetricName]bool, metricName MetricName,
return ret
}

func createInt64Histogram(setOfMetrics map[MetricName]bool, metricName MetricName, meter metric.Meter, options ...metric.Int64HistogramOption) metric.Int64Histogram {
func createInt64Histogram(setOfMetrics map[Metric]bool, metricName Metric, meter metric.Meter, options ...metric.Int64HistogramOption) metric.Int64Histogram {
if _, ok := setOfMetrics[metricName]; !ok {
return nil
}
Expand All @@ -269,7 +271,7 @@ func createInt64Histogram(setOfMetrics map[MetricName]bool, metricName MetricNam
return ret
}

func createFloat64Histogram(setOfMetrics map[MetricName]bool, metricName MetricName, meter metric.Meter, options ...metric.Float64HistogramOption) metric.Float64Histogram {
func createFloat64Histogram(setOfMetrics map[Metric]bool, metricName Metric, meter metric.Meter, options ...metric.Float64HistogramOption) metric.Float64Histogram {
if _, ok := setOfMetrics[metricName]; !ok {
return nil
}
Expand All @@ -284,14 +286,11 @@ func createFloat64Histogram(setOfMetrics map[MetricName]bool, metricName MetricN
// Users of this component should use these bucket boundaries as part of their
// SDK MeterProvider passed in. This component sends this as "advice" to the
// API, which works, however this stability is not guaranteed, so for safety the
// SDK Meter Provider should set these bounds.
// SDK Meter Provider provided should set these bounds for corresponding
// metrics.
var (
// DefaultLatencyBounds are the default bounds for latency metrics. Users of
// this component should set these bucket boundaries as part of their SDK
// MeterProvider passed in for desired latency metrics.
// DefaultLatencyBounds are the default bounds for latency metrics.
DefaultLatencyBounds = []float64{0, 0.00001, 0.00005, 0.0001, 0.0003, 0.0006, 0.0008, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.008, 0.01, 0.013, 0.016, 0.02, 0.025, 0.03, 0.04, 0.05, 0.065, 0.08, 0.1, 0.13, 0.16, 0.2, 0.25, 0.3, 0.4, 0.5, 0.65, 0.8, 1, 2, 5, 10, 20, 50, 100} // provide "advice" through API, SDK should set this too
dfawley marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be nice to show users how to do this, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as part of example. Verified it worked by playing around with example configuration wrt bounds and seeing the result.

// DefaultSizeBounds are the default bounds for metrics which record size.
// Users of this component should set these bucket boundaries as part of
// their SDK MeterProvider passed in for desired size metrics.
DefaultSizeBounds = []float64{0, 1024, 2048, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296}
)
23 changes: 12 additions & 11 deletions stats/opentelemetry/server_metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -150,17 +150,18 @@ func (ssh *serverStatsHandler) processRPCEnd(ctx context.Context, mi *metricsInf
}

const (
// ServerCallStartedName is the name of the server call started metric.
ServerCallStartedName = MetricName("grpc.server.call.started")
// ServerCallSentCompressedTotalMessageSize is the name of the server call
// sent total compressed message size metric.
ServerCallSentCompressedTotalMessageSize = MetricName("grpc.server.call.sent_total_compressed_message_size")
// ServerCallRcvdCompressedTotalMessageSize is the name of the server call
// rcvd total compressed message size metric.
ServerCallRcvdCompressedTotalMessageSize = MetricName("grpc.server.call.rcvd_total_compressed_message_size")
// ServerCallDurationName is the name of the server call duration metric.
ServerCallDurationName = MetricName("grpc.server.call.duration")
// ServerCallStarted is the number of server calls started.
ServerCallStarted Metric = "grpc.server.call.started"
// ServerCallSentCompressedTotalMessageSize is the compressed message bytes
// sent per server call.
ServerCallSentCompressedTotalMessageSize Metric = "grpc.server.call.sent_total_compressed_message_size"
// ServerCallRcvdCompressedTotalMessageSize is the compressed message bytes
// received per server call.
ServerCallRcvdCompressedTotalMessageSize Metric = "grpc.server.call.rcvd_total_compressed_message_size"
// ServerCallDuration is the end-to-end time taken to complete a call from
// server transport's perspective.
ServerCallDuration Metric = "grpc.server.call.duration"
)

// DefaultServerMetrics are the default server metrics provided by this module.
var DefaultServerMetrics = NewMetrics(ServerCallStartedName, ServerCallSentCompressedTotalMessageSize, ServerCallRcvdCompressedTotalMessageSize, ServerCallDurationName)
var DefaultServerMetrics = NewMetrics(ServerCallStarted, ServerCallSentCompressedTotalMessageSize, ServerCallRcvdCompressedTotalMessageSize, ServerCallDuration)