sdk/trace: add SpanProcessor #115

mauriciovasquezbernal · 2019-08-29T16:37:55Z

SpanProcessor is an interface that allows to register hooks for Span start and end invocations.

It is not already part of the spec but there is ongoing work on this: open-telemetry/opentelemetry-specification#205

This PR introduces the SpanProcessor interface in the SDK, also a MultiSpanProcessor implementation based on the java one to allow more than one processor to be registered.

Built-in span processors are not considered here and will be added in another PR soon.

According to the specs the span processors should only be invoked when IsRecordingEvents is true, however that flag is still missing: #100

The following is a small example of how span processor can be used:

...
from opentelemetry.sdk import trace

# implement `SpanProcessor` interface
class MySpanProcessor(trace.SpanProcessor):
    def on_start(self, span: "trace.Span") -> None:
        print("span started...")

    def on_end(self, span: "trace.Span") -> None:
        print("span ended...")

# create tracer instance (real users will use trace.tracer() from the api package)
tracer = trace.Tracer()

# add span processor to the tracer
tracer.add_span_processor(MySpanProcessor())

# create some spans just to test
with tracer.start_span("foo"):
    with tracer.start_span("bar"):
        with tracer.start_span("baz"):
            pass
...

This is ongoing work for #60.

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

reyang

There is potential race condition, please consider the immutable processors approach.

carlosalberto · 2019-08-29T18:54:39Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

@@ -145,6 +145,62 @@ def from_map(cls, maxlen, mapping):
 Link = namedtuple("Link", ("context", "attributes"))


+class SpanProcessor:
+    def on_start(self, span: "Span") -> None:


Shall we mention this is a SDK Span, not a API Span?

We can mention this explicitly too, but the "Span" type annotation here should already refer to the Span in this file.

BTW, please move SpanProcessor after Span, so that the type annotation can be written without making it a string.

I'll mention this explicitly in a the SpanProcessor docs.

I had tried to move it there but Span needs _NO_OP_SPAN_PROCESSOR that is an instance of SpanProcessor, so I gave up and preferred to use strings for the annotations.

Sorry if this is covered somewhere, but why would the SpanProcessor take the SDK Span as an object? Shouldn't SpanProcessors be more agnostic of the span implementation?

@toumorokoshi We specify SpanProcessors only in the SDK level, so there can't ever be something different than an SDK span. Also, a plain span would be pretty useless to a span processor since there aren't any properties on it that can be read.

@toumorokoshi as already pointed by @Oberon00 the SpanProcessors are only defined in the SDK level. We discussed in the SIG meeting about passing a readable span (something similar to SpanData) to the SpanProcessors, but we then agreed to pass directly the SDK span for the time being.

And that in the SIG was only about exporters, where this might change in the long term. I imagine SpanProcessors will continue to always just receive a sdk.Span object directly.

I imagine SpanProcessors will continue to always just receive a sdk.Span object directly.

Correct. It has been mentioned the (future) possibility to set properties on the SDK Span from processor code ;)

Oberon00

Just a few nitpicks, I think this looks good!

Oberon00 · 2019-08-30T12:39:16Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

@@ -145,6 +145,62 @@ def from_map(cls, maxlen, mapping):
 Link = namedtuple("Link", ("context", "attributes"))


+class SpanProcessor:
+    def on_start(self, span: "Span") -> None:


We can mention this explicitly too, but the "Span" type annotation here should already refer to the Span in this file.

BTW, please move SpanProcessor after Span, so that the type annotation can be written without making it a string.

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

Oberon00 · 2019-08-30T12:44:38Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

+        """
+
+    def shutdown(self) -> None:
+        """Called when a :class:`Tracer` is shutdown."""


I don't think we have a shutdown method in our tracer yet?

Oberon00 · 2019-08-30T12:45:23Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

+
+    def add_span_processor(self, span_processor: SpanProcessor):
+        """Adds a SpanProcessor to the list handled by this instance."""
+        self._span_processors = self._span_processors + (span_processor,)


@reyang: Is assignment to a member variable thread safe in Python?

I believe yes for CPython at least.

Hmm, but still the access that reads the _span_processors and the one that writes it are not atomic together, so this will still need some kind of locking (or, preferably IMHO, as this would sacrifice performance for an edge case), we document that this method is not thread safe (and recommend it is only called during an initial initialization phase or some such)

Having thread safe solution without using locks in the reader side looks complicated to me, so I agree that we can assume this function is not thread safe and document it.

We could have a single, simple lock for the SDK Tracer itself. In Java we have this one for both shutdown() and addSpanProcessor() (and given the fact we will have to add shutdown(), I see no reason to not add this now).

Observe that a lock here shouldn't be much of a problem, as it will not impact Span (or related) operations ;)

Hmm true, when we assume atomic assignment, we won't need a lock when we just read the span processor list.

If we assume atomic assignment then just a lock to protect multiple calls to add_span_processor will suffice. I don't really know if we should assume that because it could not work in all the cases, different implementations than CPython for instance.

opentelemetry-sdk/tests/trace/test_trace.py

toumorokoshi

I don't have enough context to approve / unapprove, but left some clarifying questions.

toumorokoshi · 2019-08-31T05:19:06Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

+_NO_OP_SPAN_PROCESSOR = SpanProcessor()
+
+
+class MultiSpanProcessor(SpanProcessor):


Is there a motivation to have MultiSpanProcessor be a specialized version of SpanProcessor? In general I've thought if an interface probably needs to support adding multiple processors, it would just be better to have the system consuming the SpanProcessor to take multiple?

e.g.

tracer.set_span_processor(Processor1(), Processor2())

rather than

tracer.set_span_processor(MultiSpanProcessor(Processor1(), Processor2())

I think this use of the Composite Pattern is OK, as it is IMHO a well-understood coding-pattern. Putting that code directly in the span would not be much simpler. Though I agree that it currently is simpler and I don't know if we will ever have more consumers of span processors (though it is imaginable to add processors for before and after sampling stages, for example)

toumorokoshi · 2019-08-31T05:19:53Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

+    events to a list of `SpanProcessor`.
+    """
+
+    def __init__(self):


why not accept the list of processors in the constructor?

While this seems nice, I would vote for following the YAGNI-principle here and not add that to the constructor. Users of OpenTelemetry will probably never create (or even directly access) a MultiSpanProcessor themselves.

So far there is not an envisioned case where a MultipleSpanProcessor is created from a list of SpanProcessors. Currently they are added one at the single time after the MultipleSpanProcessor instance in the Tracer is been created.

MultiSpanProcessor is internal class, it is just an implementation choice for this problem.

Any reason to not make this class public, btw? This could be useful for users too.

If we keep this internal we can modify it as we want without worry about compatibility with users. Do you have a use case in mind for that?

I imagine users wanting to plug multiple processors, simply. Unless this processor gets really complicated, or needs to be super complicated, I see no reason to not expose it.

It can be currently done using the Trace::add_span_processor method:

tracer.add_span_processor(sp1) tracer.add_span_processor(sp2) ...

If you still think it is worthy to expose it, I'll do.

Sure, we can add multiple processors already, but it would been nice to have this multi processor - in an ideal world, it would be simple and not married to the implementation, so there shouldn't be complicated to expose it.

In any case, lets keep it private for the time being then, unless somebody has objections. We can reconsider making it public later.

toumorokoshi · 2019-08-31T05:24:55Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

@@ -179,6 +245,7 @@ def __init__(
        attributes: types.Attributes = None,  # TODO
        events: typing.Sequence[Event] = None,  # TODO
        links: typing.Sequence[Link] = None,  # TODO
+        span_processor: SpanProcessor = _NO_OP_SPAN_PROCESSOR,


the PR on SpanProcessor in the spec implies that the SpanProcessor would be attached to the tracer: https://github.com/open-telemetry/opentelemetry-specification/pull/205/files#r311622907

To me this feels more intuitive as well: no need to go attach a SpanProcessor to every span I create. What was the choice to attach to the Span? enable overriding of what is specified by the tracer?

For me this looks more like an implementation choice: The span only needs to know the span processors not the whole Tracer that contains it. But since the Tracer currently switches from None/No-Op to a MultiSpanProcessor, this does indeed violate the specification for spans created before the first span processor is added.

To fix this, I suggest to always use the same MultiSpanProcessor in the Tracer right from __init__ (I also commented something to that effect in Tracer.add_span_processor). And, of course, add a unit test against that bug 😉

@toumorokoshi the way you mentioned is the one used in this PR.
Spans created with the create_span method are already attached to the span processors configured in the tracer. It is not needed to attach them later on.

@Oberon00 Although I think you are right, I also think the specification is not that clear, is it really true that SpanExporters must be called for Spans that started before they were added?

You are right, the specification is unclear here. Then I would say, do whatever is easier to implement (but it might still make sense to add a unit test to ensure we don't change the behavior we decide for)

For me this looks more like an implementation choice

Correct. The Specification mentions the processor being attached to the Tracer, but it doesn't mention internally it should not be attached to Spans.

this does indeed violate the specification for spans created before the first span processor is added.

As @mauriciovasquezbernal said, I think the specification does not clarify this. I get the feeling the current simple approach is good enough - there's also the possible related case of Spans getting, if using always the current Tracer processor, of getting on_end() and not on_start().

but it might still make sense to add a unit test to ensure we don't change the behavior we decide for

Agreed.

In my implementation a span processor will receive on_end notifications from spans that were created before it was added to the tracer. I added a test for it.

Mmm that's an interesting thing, as we might need to mention (in the Specification) that a processor could get a End call without a start (or else restrict this case).

Will think about it a little bit and fill an issue about it most likely.

Oberon00

I forgot to add https://github.com/open-telemetry/opentelemetry-python/pull/115/files#r319923224 to this review.

After these are fixed, I think the PR is ready to be merged (or at least, to be approved by me 😀)

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

Oberon00 · 2019-09-02T11:58:51Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

+
+        The span processors are invoked in the same order they are registered.
+        """
+        with self._lock:


I don't think this optimization is worth the added LOC. Just initialize an empty MultiSpanProcessor in __init__.

I agree, I removed this _NO_OP_SPAN_PROCESSOR from the Tracer.

Oberon00 · 2019-09-02T12:05:55Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

@@ -145,6 +145,62 @@ def from_map(cls, maxlen, mapping):
 Link = namedtuple("Link", ("context", "attributes"))


+class SpanProcessor:
+    def on_start(self, span: "Span") -> None:


@toumorokoshi We specify SpanProcessors only in the SDK level, so there can't ever be something different than an SDK span. Also, a plain span would be pretty useless to a span processor since there aren't any properties on it that can be read.

Oberon00 · 2019-09-02T12:12:24Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

@@ -179,6 +245,7 @@ def __init__(
        attributes: types.Attributes = None,  # TODO
        events: typing.Sequence[Event] = None,  # TODO
        links: typing.Sequence[Link] = None,  # TODO
+        span_processor: SpanProcessor = _NO_OP_SPAN_PROCESSOR,


For me this looks more like an implementation choice: The span only needs to know the span processors not the whole Tracer that contains it. But since the Tracer currently switches from None/No-Op to a MultiSpanProcessor, this does indeed violate the specification for spans created before the first span processor is added.

To fix this, I suggest to always use the same MultiSpanProcessor in the Tracer right from __init__ (I also commented something to that effect in Tracer.add_span_processor). And, of course, add a unit test against that bug 😉

Oberon00 · 2019-09-02T12:24:08Z

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

+_NO_OP_SPAN_PROCESSOR = SpanProcessor()
+
+
+class MultiSpanProcessor(SpanProcessor):


I think this use of the Composite Pattern is OK, as it is IMHO a well-understood coding-pattern. Putting that code directly in the span would not be much simpler. Though I agree that it currently is simpler and I don't know if we will ever have more consumers of span processors (though it is imaginable to add processors for before and after sampling stages, for example)

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py

SpanProcessor is an interface that allows to register hooks for Span start and end invocations. This commit adds the SpanProcessor interface to the SDK as well as the MultiSpanProcessor that allows to register multiple processors.

mauriciovasquezbernal requested review from a-feld, c24t, carlosalberto, lzchen, Oberon00, reyang and toumorokoshi as code owners August 29, 2019 16:37

reyang reviewed Aug 29, 2019

View reviewed changes

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Show resolved Hide resolved

reyang reviewed Aug 29, 2019

View reviewed changes

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Show resolved Hide resolved

reyang requested changes Aug 29, 2019

View reviewed changes

mauriciovasquezbernal force-pushed the mauricio/add_span_processor branch 2 times, most recently from 618fc7a to 974a148 Compare August 29, 2019 18:53

carlosalberto reviewed Aug 29, 2019

View reviewed changes

reyang approved these changes Aug 29, 2019

View reviewed changes

Oberon00 reviewed Aug 30, 2019

View reviewed changes

mauriciovasquezbernal force-pushed the mauricio/add_span_processor branch from 2a5fb21 to 52598a8 Compare August 30, 2019 14:24

mauriciovasquezbernal mentioned this pull request Aug 30, 2019

sdk: add Exporter interface, SimpleSpanProcessor and InMemorySpanExporter #119

Merged

toumorokoshi reviewed Aug 31, 2019

View reviewed changes

Oberon00 suggested changes Sep 2, 2019

View reviewed changes

mauriciovasquezbernal force-pushed the mauricio/add_span_processor branch 4 times, most recently from cb6ff17 to 32cf44b Compare September 3, 2019 14:21

Oberon00 approved these changes Sep 6, 2019

View reviewed changes

Oberon00 reviewed Sep 6, 2019

View reviewed changes

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved

sdk/trace: add SpanProcessor

8539936

SpanProcessor is an interface that allows to register hooks for Span start and end invocations. This commit adds the SpanProcessor interface to the SDK as well as the MultiSpanProcessor that allows to register multiple processors.

mauriciovasquezbernal force-pushed the mauricio/add_span_processor branch from 32cf44b to 8539936 Compare September 6, 2019 12:44

Oberon00 merged commit 42acdb9 into open-telemetry:master Sep 6, 2019

mauriciovasquezbernal mentioned this pull request Sep 10, 2019

Add span processor #60

Closed

mauriciovasquezbernal deleted the mauricio/add_span_processor branch April 14, 2020 21:50

srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020

Export NoopLogger and BinaryTraceContext (open-telemetry#115)

060f424

		_NO_OP_SPAN_PROCESSOR = SpanProcessor()


		class MultiSpanProcessor(SpanProcessor):

sdk/trace: add SpanProcessor #115

sdk/trace: add SpanProcessor #115

Conversation

mauriciovasquezbernal commented Aug 29, 2019 • edited Loading

reyang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mauriciovasquezbernal Aug 30, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Oberon00 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

toumorokoshi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mauriciovasquezbernal Sep 3, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Oberon00 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mauriciovasquezbernal commented Aug 29, 2019 •

edited

Loading

mauriciovasquezbernal Aug 30, 2019 •

edited

Loading

mauriciovasquezbernal Sep 3, 2019 •

edited

Loading