Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support span event listeners #8619

Merged
merged 21 commits into from Apr 25, 2024
Merged

Conversation

tjquinno
Copy link
Member

@tjquinno tjquinno commented Apr 5, 2024

Description

Resolves #8352 using restrictive adapters around existing types (Span, Span.Builder, Scope) passed to the developer's callback methods so user code cannot change the state of the span, span builder, or scope.

How it works

  • Helidon adds the io.helidon.tracing.SpanListener interface for the life cycle listener behavior.

  • At runtime our provider implementations (OpenTelemetry, OpenTracing, Zipkin, Jaeger) of the tracing types (Tracer, Span.Builder, Span, Scope) invoke the known listeners at the right times, passing proxies for the relevant objects--span builders, spans, and scopes--to prevent user code in the listener from interfering with the key characteristics or life cycle of the actual objects.

  • How does Helidon know about listener instances?

    • Explicit registration

      Developer explicitly invokes aTracer.register(myListener). Many users would probably use Tracer.global() anyway but we wouldn't force that.

    • Service loading

      Developer implements SpanListener and prepares the META-INF/services/io.helidon.tracing.SpanListener file to refer to the implementation and, if appropriate, adds a provides...with to the application or library module-info.java. Using the service loader mechanism makes it very easy for developers to register listeners with all tracers without needing to modify the code that creates individual tracers so the listeners can be registered with all of them explicitly.

    Usually developers use Tracer.global() rather than creating individual Tracer instances in their code, but it's useful to allow developers to explicitly register listeners on individual Tracer objects. It also simplifies unit testing.

Key added type

io.helidon.tracing.SpanListener

Method When Invoked Main Usage of parameter(s)
starting(Span.Builder<?>) Before a span is started from its builder. (formerly beforeStart) Assign tags (name/value pairs).
started(Span) After a span has started. (formerly afterStart) Assign tags, add events, update baggage.
activated(Span, Scope) After a span has been activated, creating a new scope in the process. (formerly afterActivate) Note span becoming active.
closed(Span, Scope) After a scope has been closed. (formerly afterClose) Note span is no longer active.
ended(Span) * After a span has ended successfully. (formerly afterEnd) Note completion of span.
ended(Span, Throwable) * After a span has ended unsuccessfully. (formerly afterEnd) Note completion of span.

† Not all spans are activated; it is up to the application or library code that creates and manages the span. As a result Helidon might not invoke the listeners' activated and closed methods for every span.

* The successful or unsuccessful nature of a span's end is not about whether the tracing or telemetry system failed to end the span. Rather, it indicates whether the code that ended the span indicated some error in the processing which the span represents.

Error handling

Helidon catches any exception thrown by listener methods or the Helidon-provided proxies and logs a warning message describing the exception. This approach insulates the original developer's code which deals with the span from having to know whether listeners are present, whether they might throw exceptions, etc.

Limitations on what operations the listener can invoke on parameters

Lifecycle listeners cannot alter the lifecycle or the essential nature of the parameters they are passed. The arguments passed are proxies which implement the interfaces (Span.Builder, Span, Scope) but the state-changing or other forbidden operations throw UnsupportedOperationException; permitted operations delegate to the actual underlying object.

io.helidon.tracing.Span.Builder

Method Purpose OK?
build() Starts the span. -
end methods Ends the span. -
get() Starts the span. -
kind(Kind) Sets the "kind" of span (server, client, internal, etc.) -
parent(SpanContext) Sets the parent of the span to be created from the builder. -
start() Starts the span. -
start(Instant) Starts the span. -
tag methods Add a tag to the builder before the span is built.
unwrap(Class) Cast the builder to the specified implementation type. †

† Helidon returns the unwrapped object, not a safe adapter to it.

io.helidon.tracing.Span

Method Purpose OK?
activate() Makes the span "current", returning a Scope. * -
addEvent methods Associate a string (and optionally other info) with a span.
baggage() Returns the Baggage instance associated with the span.
context() Returns the SpanContext associated with the span.
status(Status) Sets the status of the span. -
any tag method Add a tag to the span.
unwrap(Class) Cast the span to the specified implementation type. †

* Helidon throws UnsupportedActivationException, a Helidon exception which extends UnsupportedOperationException and adds Scope scope(). This allows the caller to catch the exception and close the Scope which was created before a problem occurred in invoking the listeners.

† Helidon returns the unwrapped object, not a safe adapter to it.

io.helidon.tracing.Scope

Method Purpose OK?
close() Close the scope. -
isClosed() Reports whether the scope is closed.

io.helidon.tracing.SpanContext

Method Purpose OK?
asParent(Span.Builder) Sets this context as the parent of a new span builder.
baggage() Returns Baggage instance associated with the span context.
spanId() Returns the span ID.
traceId() Returns the trace ID.

Documentation

Included in the PR.

Test changes

The PR includes two new general testing types added to helidon-common-testing-junit5.

InMemoryLoggingHandler

Tests use static factory methods to create a handler and add it to a logger. The handler stores each LogRecord in a list and exposes the list so tests can examine the accumulated log records.

LogRecordMatcher

A Hamcrest matcher that deals with the LogRecord type, currently allowing for matching the thrown type or for matching the log message contents in a log record.

Copy link
Member

@ljnelson ljnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments are probably preference-oriented, so choosing Request changes feels weird but I'm ticking that box. Nice work.


|====

{empty}* Helidon throws the link:{tracing-javadoc}/UnsupportedActivationException.html[`UnsupportedActivationException`] if a listener attempts an illegal operation from inside its `afterActivation` method. This Helidon exception extends `UnsupportedOperationException` and adds the `Scope scope()` method. Callers should catch this exception and close the `Scope`; Helidon will have activated the span and created the scope _before_ it invoked the listeners.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Controversial opinion: perhaps the Exception itself should implement AutoCloseable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm not seeing what problem this solves and how it does so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough; a fair warning which I know you know already which is that no one is ever going to close that Scope! I was looking for some way to make it more obvious that you have to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a developer fails to close that Scope and asks why the scope remains active in the error situation we have the mechanism in place they can use to fix their code.

This approach also gives the developer the option to not immediately close the Scope. After all, the problem indicated by the exception is that an ill-mannered listener improperly tried to close the scope from its activated method. By throwing the exception Helidon both lets the developer's code know that and also communicates the scope that was created so the meaningful work that was supposed to take place within the scope could proceed (if that makes sense in the developer's use case) and then the developer's code can close the scope at the right time.

It's messy and involved but, I think, covers the corners.


A completely different alternative would be for Helidon to detect the listener methods' attempts to improperly alter the life cycle of the span or scope (as it does in the PR) but then log a warning instead of throw an exception.

An advantage to this approach is that the developer who wrote the main code being traced does not need to ever account for the possible presence of errant listeners. The life cycle that developer wrote is always followed regardless of what happens in the listeners, if there even are any. No special error handling code for dealing with bad listeners needs to clutter up the developer's code.

* @param listener the {@link SpanLifeCycleListener} to register
* @return the updated {@code Tracer}
*/
Tracer register(SpanLifeCycleListener listener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm old school but I'd prefer addSpanListener(SpanListener listener). I'm sure others will disagree.

* when Helidon throws this exception due to an error in a listener, the caller has no access to the {@code Scope} return value
* return value.
*/
public class UnsupportedActivationException extends UnsupportedOperationException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Controversial opinion: consider implementing AutoCloseable directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnsupportedOperationException is designed for collection that do not support a subset of operations (such as write operations on a read only collection).
I do not think it is relevant for this use case, as all operations must be (by design of the API) supported.
I think extending RuntimeException is more aligned with what this does

Copy link
Member

@ljnelson ljnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming and such, primarily.


|====

{empty}* Helidon throws the link:{tracing-javadoc}/UnsupportedActivationException.html[`UnsupportedActivationException`] if a listener attempts an illegal operation from inside its `afterActivation` method. This Helidon exception extends `UnsupportedOperationException` and adds the `Scope scope()` method. Callers should catch this exception and close the `Scope`; Helidon will have activated the span and created the scope _before_ it invoked the listeners.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough; a fair warning which I know you know already which is that no one is ever going to close that Scope! I was looking for some way to make it more obvious that you have to.

@tjquinno tjquinno requested a review from ljnelson April 16, 2024 19:34
@tjquinno tjquinno marked this pull request as draft April 16, 2024 22:27
@tjquinno tjquinno marked this pull request as ready for review April 17, 2024 15:55
Copy link
Member

@ljnelson ljnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo, a couple of nits, and a question on the documentation of SpanListener. Nothing big.

* test--using try-with-resource--will automatically clear the handler's log records and detach the handler from the logger.
* </p>
*/
public class InMemoryLoggingHandler extends Handler implements AutoCloseable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Hackles go up; there are all sorts of classloading issues with logging that I never remember. Be careful this doesn't introduce some leak somewhere.)

There is also https://docs.oracle.com/en/java/javase/21/docs/api/java.logging/java/util/logging/MemoryHandler.html in case you want to extend something that presumably does everything correctly in this area.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I know about is that the LogManager keeps only weak refs to loggers, so if you don't keep a ref yourself a given named logger might come and go seemingly randomly. Not sure if that's what you mean by the class loading issues.

I'm glad to know of MemoryHandler! I'll look at leveraging it.

Copy link
Member Author

@tjquinno tjquinno Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further review, the MemoryHandler seems designed as a front-end to a downstream handler target to which log records are pushed upon certain conditions.

The constructor uses either logging config to set it up or explicit constructor arguments, in either case rejecting a null target handler. The test use case for InMemoryLoggingHandler certainly does not need or want the target or pushing logic.

The only class loading action I see in MemoryHandler is when it locates the target handler class by name when it is specified in logging config.

I'll plan to stick with InMemoryLoggingHandler as-is unless other information comes to light.

A listener cannot affect the lifecycle of a span or scope it is notified about, but it can add tags and events and update the baggage associated with a span.
Often a listener does additional work that does not change the span or scope such as logging a message.

When Helidon invokes the listener's methods it passes proxies for the parameter types. These proxies limit the access the listener has to the span builder, span, or scope, as summarized in the following table. If a listener method tries to invoke an forbidden operation, the proxies throw an `UnsupportedOperationException` and Helidon then logs a `WARNING` message describing the invalid operation invocation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"proxies for the parameter types" ▶️ "proxies for the arguments"

Perhaps not thinking clearly, but if the outcome of this sort of thing is a warning message, then do you even have to construct/throw the UnsupportedOperationException? Can't you just do a no-op instead?

Copy link
Member Author

@tjquinno tjquinno Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text fixed.

Throwing the exception easily captures the stack trace which handily incriminates the offending listener code and it's consistent with the general practice of "forbidden" methods throwing UnsupportedOperationException.

@tjquinno tjquinno requested a review from ljnelson April 17, 2024 21:26
* when Helidon throws this exception due to an error in a listener, the caller has no access to the {@code Scope} return value
* return value.
*/
public class UnsupportedActivationException extends UnsupportedOperationException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnsupportedOperationException is designed for collection that do not support a subset of operations (such as write operations on a read only collection).
I do not think it is relevant for this use case, as all operations must be (by design of the API) supported.
I think extending RuntimeException is more aligned with what this does

tracing/tracing/src/main/java/module-info.java Outdated Show resolved Hide resolved
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is in a strange location.
In general integration tests should be under module tracing/tests or tests/integration as in other modules. This module should not be here.

Copy link
Member Author

@tjquinno tjquinno Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same approach we chose for metrics. These are tests which all providers need to pass:

metrics/
    provider-tests
    providers/
        micrometer

and, in this PR following the same pattern:

tracing/
    provider-tests
    providers/
        jaeger
        opentelemetry
        opentracing
        zipkin

tracing/tracing/src/main/java/module-info.java Outdated Show resolved Hide resolved
@tjquinno tjquinno merged commit e93ee0c into helidon-io:main Apr 25, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
3 participants