-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sdk/trace: add SpanProcessor #115
sdk/trace: add SpanProcessor #115
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is potential race condition, please consider the immutable processors approach.
618fc7a
to
974a148
Compare
@@ -145,6 +145,62 @@ def from_map(cls, maxlen, mapping): | |||
Link = namedtuple("Link", ("context", "attributes")) | |||
|
|||
|
|||
class SpanProcessor: | |||
def on_start(self, span: "Span") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we mention this is a SDK Span
, not a API Span
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can mention this explicitly too, but the "Span" type annotation here should already refer to the Span in this file.
BTW, please move SpanProcessor after Span, so that the type annotation can be written without making it a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll mention this explicitly in a the SpanProcessor
docs.
I had tried to move it there but Span
needs _NO_OP_SPAN_PROCESSOR
that is an instance of SpanProcessor
, so I gave up and preferred to use strings for the annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if this is covered somewhere, but why would the SpanProcessor take the SDK Span as an object? Shouldn't SpanProcessors be more agnostic of the span implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toumorokoshi We specify SpanProcessors only in the SDK level, so there can't ever be something different than an SDK span. Also, a plain span would be pretty useless to a span processor since there aren't any properties on it that can be read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toumorokoshi as already pointed by @Oberon00 the SpanProcessors are only defined in the SDK level. We discussed in the SIG meeting about passing a readable span (something similar to SpanData) to the SpanProcessors, but we then agreed to pass directly the SDK span for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that in the SIG was only about exporters, where this might change in the long term. I imagine SpanProcessors will continue to always just receive a sdk.Span object directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine SpanProcessors will continue to always just receive a sdk.Span object directly.
Correct. It has been mentioned the (future) possibility to set properties on the SDK Span
from processor code ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nitpicks, I think this looks good!
@@ -145,6 +145,62 @@ def from_map(cls, maxlen, mapping): | |||
Link = namedtuple("Link", ("context", "attributes")) | |||
|
|||
|
|||
class SpanProcessor: | |||
def on_start(self, span: "Span") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can mention this explicitly too, but the "Span" type annotation here should already refer to the Span in this file.
BTW, please move SpanProcessor after Span, so that the type annotation can be written without making it a string.
""" | ||
|
||
def shutdown(self) -> None: | ||
"""Called when a :class:`Tracer` is shutdown.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a shutdown
method in our tracer yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't.
|
||
def add_span_processor(self, span_processor: SpanProcessor): | ||
"""Adds a SpanProcessor to the list handled by this instance.""" | ||
self._span_processors = self._span_processors + (span_processor,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reyang: Is assignment to a member variable thread safe in Python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe yes for CPython at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but still the access that reads the _span_processors and the one that writes it are not atomic together, so this will still need some kind of locking (or, preferably IMHO, as this would sacrifice performance for an edge case), we document that this method is not thread safe (and recommend it is only called during an initial initialization phase or some such)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thread safe solution without using locks in the reader side looks complicated to me, so I agree that we can assume this function is not thread safe and document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a single, simple lock for the SDK Tracer
itself. In Java we have this one for both shutdown()
and addSpanProcessor()
(and given the fact we will have to add shutdown()
, I see no reason to not add this now).
Observe that a lock here shouldn't be much of a problem, as it will not impact Span
(or related) operations ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm true, when we assume atomic assignment, we won't need a lock when we just read the span processor list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we assume atomic assignment then just a lock to protect multiple calls to add_span_processor
will suffice. I don't really know if we should assume that because it could not work in all the cases, different implementations than CPython for instance.
2a5fb21
to
52598a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have enough context to approve / unapprove, but left some clarifying questions.
_NO_OP_SPAN_PROCESSOR = SpanProcessor() | ||
|
||
|
||
class MultiSpanProcessor(SpanProcessor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a motivation to have MultiSpanProcessor be a specialized version of SpanProcessor? In general I've thought if an interface probably needs to support adding multiple processors, it would just be better to have the system consuming the SpanProcessor to take multiple?
e.g.
tracer.set_span_processor(Processor1(), Processor2())
rather than
tracer.set_span_processor(MultiSpanProcessor(Processor1(), Processor2())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this use of the Composite Pattern is OK, as it is IMHO a well-understood coding-pattern. Putting that code directly in the span would not be much simpler. Though I agree that it currently is simpler and I don't know if we will ever have more consumers of span processors (though it is imaginable to add processors for before and after sampling stages, for example)
events to a list of `SpanProcessor`. | ||
""" | ||
|
||
def __init__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not accept the list of processors in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this seems nice, I would vote for following the YAGNI-principle here and not add that to the constructor. Users of OpenTelemetry will probably never create (or even directly access) a MultiSpanProcessor themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far there is not an envisioned case where a MultipleSpanProcessor
is created from a list of SpanProcessors
. Currently they are added one at the single time after the MultipleSpanProcessor
instance in the Tracer
is been created.
MultiSpanProcessor
is internal class, it is just an implementation choice for this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not make this class public, btw? This could be useful for users too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this internal we can modify it as we want without worry about compatibility with users. Do you have a use case in mind for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine users wanting to plug multiple processors, simply. Unless this processor gets really complicated, or needs to be super complicated, I see no reason to not expose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be currently done using the Trace::add_span_processor
method:
tracer.add_span_processor(sp1)
tracer.add_span_processor(sp2)
...
If you still think it is worthy to expose it, I'll do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can add multiple processors already, but it would been nice to have this multi processor - in an ideal world, it would be simple and not married to the implementation, so there shouldn't be complicated to expose it.
In any case, lets keep it private for the time being then, unless somebody has objections. We can reconsider making it public later.
@@ -179,6 +245,7 @@ def __init__( | |||
attributes: types.Attributes = None, # TODO | |||
events: typing.Sequence[Event] = None, # TODO | |||
links: typing.Sequence[Link] = None, # TODO | |||
span_processor: SpanProcessor = _NO_OP_SPAN_PROCESSOR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the PR on SpanProcessor in the spec implies that the SpanProcessor would be attached to the tracer: https://github.com/open-telemetry/opentelemetry-specification/pull/205/files#r311622907
To me this feels more intuitive as well: no need to go attach a SpanProcessor to every span I create. What was the choice to attach to the Span? enable overriding of what is specified by the tracer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this looks more like an implementation choice: The span only needs to know the span processors not the whole Tracer that contains it. But since the Tracer currently switches from None/No-Op to a MultiSpanProcessor, this does indeed violate the specification for spans created before the first span processor is added.
To fix this, I suggest to always use the same MultiSpanProcessor in the Tracer right from __init__
(I also commented something to that effect in Tracer.add_span_processor). And, of course, add a unit test against that bug 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toumorokoshi the way you mentioned is the one used in this PR.
Spans created with the create_span
method are already attached to the span processors configured in the tracer. It is not needed to attach them later on.
@Oberon00 Although I think you are right, I also think the specification is not that clear, is it really true that SpanExporters must be called for Spans that started before they were added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the specification is unclear here. Then I would say, do whatever is easier to implement (but it might still make sense to add a unit test to ensure we don't change the behavior we decide for)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this looks more like an implementation choice
Correct. The Specification mentions the processor being attached to the Tracer
, but it doesn't mention internally it should not be attached to Span
s.
this does indeed violate the specification for spans created before the first span processor is added.
As @mauriciovasquezbernal said, I think the specification does not clarify this. I get the feeling the current simple approach is good enough - there's also the possible related case of Span
s getting, if using always the current Tracer
processor, of getting on_end()
and not on_start()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it might still make sense to add a unit test to ensure we don't change the behavior we decide for
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my implementation a span processor will receive on_end
notifications from spans that were created before it was added to the tracer. I added a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm that's an interesting thing, as we might need to mention (in the Specification) that a processor could get a End
call without a start (or else restrict this case).
Will think about it a little bit and fill an issue about it most likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to add https://github.com/open-telemetry/opentelemetry-python/pull/115/files#r319923224 to this review.
After these are fixed, I think the PR is ready to be merged (or at least, to be approved by me 😀)
|
||
The span processors are invoked in the same order they are registered. | ||
""" | ||
with self._lock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this optimization is worth the added LOC. Just initialize an empty MultiSpanProcessor in __init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I removed this _NO_OP_SPAN_PROCESSOR from the Tracer.
@@ -145,6 +145,62 @@ def from_map(cls, maxlen, mapping): | |||
Link = namedtuple("Link", ("context", "attributes")) | |||
|
|||
|
|||
class SpanProcessor: | |||
def on_start(self, span: "Span") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toumorokoshi We specify SpanProcessors only in the SDK level, so there can't ever be something different than an SDK span. Also, a plain span would be pretty useless to a span processor since there aren't any properties on it that can be read.
@@ -179,6 +245,7 @@ def __init__( | |||
attributes: types.Attributes = None, # TODO | |||
events: typing.Sequence[Event] = None, # TODO | |||
links: typing.Sequence[Link] = None, # TODO | |||
span_processor: SpanProcessor = _NO_OP_SPAN_PROCESSOR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this looks more like an implementation choice: The span only needs to know the span processors not the whole Tracer that contains it. But since the Tracer currently switches from None/No-Op to a MultiSpanProcessor, this does indeed violate the specification for spans created before the first span processor is added.
To fix this, I suggest to always use the same MultiSpanProcessor in the Tracer right from __init__
(I also commented something to that effect in Tracer.add_span_processor). And, of course, add a unit test against that bug 😉
_NO_OP_SPAN_PROCESSOR = SpanProcessor() | ||
|
||
|
||
class MultiSpanProcessor(SpanProcessor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this use of the Composite Pattern is OK, as it is IMHO a well-understood coding-pattern. Putting that code directly in the span would not be much simpler. Though I agree that it currently is simpler and I don't know if we will ever have more consumers of span processors (though it is imaginable to add processors for before and after sampling stages, for example)
cb6ff17
to
32cf44b
Compare
SpanProcessor is an interface that allows to register hooks for Span start and end invocations. This commit adds the SpanProcessor interface to the SDK as well as the MultiSpanProcessor that allows to register multiple processors.
32cf44b
to
8539936
Compare
SpanProcessor
is an interface that allows to register hooks for Spanstart
andend
invocations.It is not already part of the spec but there is ongoing work on this: open-telemetry/opentelemetry-specification#205
This PR introduces the SpanProcessor interface in the SDK, also a MultiSpanProcessor implementation based on the java one to allow more than one processor to be registered.
Built-in span processors are not considered here and will be added in another PR soon.
According to the specs the span processors should only be invoked when
IsRecordingEvents
is true, however that flag is still missing: #100The following is a small example of how span processor can be used:
This is ongoing work for #60.