Question about correctness of metric.Meter implementation and use #4836

cep21 · 2024-01-20T18:57:23Z

Description

metric.Meter has the following code

type Meter interface {
...
  Int64Counter(name string, options ...Int64CounterOption) (Int64Counter, error)
...
}

Users, like grpc, have the following code.

	c.rpcResponseSize, err = c.meter.Int64Histogram("rpc."+role+".response.size",
		metric.WithDescription("Measures size of RPC response messages (uncompressed)."),
		metric.WithUnit("By"))
	if err != nil {
		otel.Handle(err)
	}

My problem is that metric.Meter does not document if it is ok to use the first returned value if the second is non-nil. In Go, this answer is generally assumed "no". In the example above, otel.Handle is called, but the function is not returned and continues. Later on, because there is no return, code could use the c.rpcResponseSize object and panic.

The only documentation for otel.Handle is

	// Handle handles any error deemed irremediable by an OpenTelemetry
	// component.
	Handle(error)

My assumption is that handle is intended to allow more than just panic. Maybe one idea is the service emails an error to a sys admin, but continues to run.

If that's the case, then creates of otel instrumentation, such as gRPC, need to either:

Do much more than otel.Handle, but actually set a noop object into their config
Implementers of metric.Meter need to always return a non-nil object as the first parameter and this contract should be clear in the documentation of metric.Meter

For the first option above, the implementation of gRPC otel should be modified. For the second option above, the documentation of metric.Meter needs to be modified.

Recommendation

The contract, IE documentation, of metric.Meter should say "If the second returned value is non-nil, then the first returned value will be a no-op object and users should call otel.Handle."

The text was updated successfully, but these errors were encountered:

MrAlias · 2024-01-23T18:43:05Z

"If the second returned value is non-nil, then the first returned value will be a no-op object and users should call otel.Handle."

This is not true if there was an instrument name error. The returned instrument will be "usable" (though it may produce conflicting metric streams) and the error is ErrInstrumentName. How the user want's to handle this situation is going to be case-by-case dependent.

It is also not true that all users should pipe their error to otel.Handle. That is one option, but it is not universal. If it were we would have just done so within the called function. We cannot make a global suggestion that users should do that, it would be incorrect.

cep21 · 2024-01-23T19:22:37Z

The returned instrument will be "usable"

Should this be part of the documentation of the interface since implementations, such as gRPC, are depending upon this behavior to be correct?

also not true that all users should pipe their error to otel.Handle

Is there guidance on when to call otel.Handle and when not to?

MrAlias · 2024-01-23T20:56:25Z

Is there guidance on when to call otel.Handle and when not to?

For authors of the OTel SDK: whenever an error that cannot be returned to the user (asynchronous, not allowed via the API).

For authors of OTel instrumentation: as little as possible. If the error can be handled, handle it; if it cannot but can be returned to the user, return it to the user; if neither can be done and it is possibly recoverable or critical to the use then send to the otel.Handler.

For users of OTel that is defined by themselves: they are setting a handler, they are in the best position to decide this.

MrAlias · 2024-01-23T21:00:15Z

Should this be part of the documentation of the interface since implementations, such as gRPC, are depending upon this behavior to be correct?

Sure, sounds good to me. But I think it should be a part of the implementation of the interface's documentation not the interface itself given that is not where the behavior comes from.

cep21 · 2024-01-23T21:07:45Z

should be a part of the implementation of the interface's documentation

I disagree with that part. Here is my reasoning:

gRPC's usage takes a metric.Meter, not an implementation of a Meter. Therefor, it must only code against the contract of the thing it takes, an interface, not the contract of an implementation of that interface, of which there could be infinite. In the future, if people want to implement their own metric.Meter, they will need to know what behavior to use to implement a correct object. If correctness requires that the first parameter is always non nil and usable, then that detail must exist in the interface itself.

MrAlias · 2024-01-23T21:12:31Z

which there could be infinite

This it the reason you cannot say the interface will always return an error for a name defined by OpenTelemetry to be invalid for their SDKs in the interface. There is nothing stopping one of those infinite implementations from deviating from this behavior, nor should there be.

cep21 · 2024-01-23T21:15:47Z

I'm sorry I'll clarify my problem. I am writing code that takes a metric.Meter. I as calling Int64Counter. If the second parameter is an error, I want to know if it is ok to call functions on the first returned Int64Counter, or if those function calls will panic.

If it is ok to call functions on the first parameter, I would like to solidify this contract by putting it in the documentation of the object I'm calling: metric.Meter.

MrAlias · 2024-01-24T16:35:33Z

Yeah, I think we can document the Meter interface to state that if a non-nil instrument is returned it should never panic when its methods are called.

cep21 · 2024-01-24T16:47:27Z

if a non-nil instrument is returned

I'm sorry, I want to document.

All function calls of Meter that return an object and an error, always return a usable non-nil first value even if the second value is an error

I would like to document this because this current behavior of Meter is different than the rest of the Go standard library. For example, http.NewRequest returns a nil first object if the second return value, error, is non nil.

Resolves open-telemetry#4836

cep21 · 2024-01-24T16:50:58Z

I've submitted an example PR here: #4853

MrAlias · 2024-01-24T17:00:12Z

You are again describing the behavior of the OTel SDK that backs the interface. I don't think we should limit other implementations of that interface to never return nil. That is too restrictive.

cep21 · 2024-01-24T17:04:30Z

I see the disagreement. Is it fair to say:

I'm saying:

If a function takes a meter.Meter, then it can only rely on the documented contract of the object it takes: meter.Meter.

You're saying:

Code can take a meter.Meter and depend upon the contract of a Meter implementation, not meter.Meter itself, and still be correct.

MrAlias · 2024-01-24T17:11:53Z

I'm saying all a user can rely on is the interface contract itself. If an implementation of that interface wants to return a partial result and an error that is fine. If they want to return an unrecoverable error and nil, also fine.

If we want to communicate to users that an implementation may return partial results through the interface that sounds fine with me. I'm not okay with saying all implementations will be expected to do this and never return nil and an error.

cep21 · 2024-01-24T17:18:18Z

Thanks for clearing this up! Back to the gRPC implementation of open telemetry with this code

	c.rpcResponseSize, err = c.meter.Int64Histogram("rpc."+role+".response.size",
		metric.WithDescription("Measures size of RPC response messages (uncompressed)."),
		metric.WithUnit("By"))
	if err != nil {
		otel.Handle(err)
	}
        // ...
        c.rpcResponseSize.Record(...)

Since this implementation is relying on the behavior that c.rpcResponseSize is non nil, is this implementation of gRPC open telemetry incorrect? Should the implementation of open telemetry for gRPC be modified to not call functions on c.rpcResponseSize if the error is nil? Maybe

	c.rpcResponseSize, err = c.meter.Int64Histogram("rpc."+role+".response.size",
		metric.WithDescription("Measures size of RPC response messages (uncompressed)."),
		metric.WithUnit("By"))
	if err != nil {
		otel.Handle(err)
+		c.rpcResponseSize = &noopHistogram{}
	}

MrAlias · 2024-01-24T17:42:35Z

I think the instrumentation needs to check if the instrument is nil

	c.rpcResponseSize, err = c.meter.Int64Histogram("rpc."+role+".response.size",
		metric.WithDescription("Measures size of RPC response messages (uncompressed)."),
		metric.WithUnit("By"))
	if err != nil {
		otel.Handle(err)
	}
        // ...
        if c.rpcResponseSize != nil {
		c.rpcResponseSize.Record(...)
	}

or, like you mentioned, just set it to a noop if it is nil

	c.rpcResponseSize, err = c.meter.Int64Histogram("rpc."+role+".response.size",
		metric.WithDescription("Measures size of RPC response messages (uncompressed)."),
		metric.WithUnit("By"))
	if err != nil {
		otel.Handle(err)
		if c.rpcResponseSize == nil {
			c.rpcResponseSize = &noopHistogram{}
		}
	}

cep21 · 2024-01-24T18:01:40Z

Thanks for clarifying. I'm trying to write my own collector and am using the existing contrib collectors as models for best practice, which is where my confusion came from.

cep21 added the bug Something isn't working label Jan 20, 2024

cep21 added a commit to cep21/opentelemetry-go that referenced this issue Jan 24, 2024

docs(meter): Update meter.Meter contract

be22270

Resolves open-telemetry#4836

cep21 mentioned this issue Jan 24, 2024

docs(meter): Update meter.Meter contract #4853

Closed

MrAlias mentioned this issue Jan 29, 2024

[otelgrpc] Do not panic for nil instruments open-telemetry/opentelemetry-go-contrib#4875

Merged

MrAlias closed this as completed in open-telemetry/opentelemetry-go-contrib#4875 Jan 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about correctness of metric.Meter implementation and use #4836

Question about correctness of metric.Meter implementation and use #4836

cep21 commented Jan 20, 2024 •

edited

MrAlias commented Jan 23, 2024

cep21 commented Jan 23, 2024

MrAlias commented Jan 23, 2024

MrAlias commented Jan 23, 2024

cep21 commented Jan 23, 2024

MrAlias commented Jan 23, 2024

cep21 commented Jan 23, 2024 •

edited

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024 •

edited

cep21 commented Jan 24, 2024

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024 •

edited

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024

Question about correctness of metric.Meter implementation and use #4836

Question about correctness of metric.Meter implementation and use #4836

Comments

cep21 commented Jan 20, 2024 • edited

Description

Recommendation

MrAlias commented Jan 23, 2024

cep21 commented Jan 23, 2024

MrAlias commented Jan 23, 2024

MrAlias commented Jan 23, 2024

cep21 commented Jan 23, 2024

MrAlias commented Jan 23, 2024

cep21 commented Jan 23, 2024 • edited

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024 • edited

cep21 commented Jan 24, 2024

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024 • edited

MrAlias commented Jan 24, 2024

cep21 commented Jan 24, 2024

cep21 commented Jan 20, 2024 •

edited

cep21 commented Jan 23, 2024 •

edited

cep21 commented Jan 24, 2024 •

edited

cep21 commented Jan 24, 2024 •

edited