Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to the current context when context is not given for B3 propagator extraction #1732

Closed
wants to merge 2 commits into from
Closed

Conversation

marcinzaremba
Copy link
Contributor

@marcinzaremba marcinzaremba commented Apr 1, 2021

Description

This is a follow up to #1728.

With the current design of the propagators pipeline every single propagator is responsible for defaulting to the current context in case the context is not provided as an input, which TextMapPropagator interface mandates: https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-api/src/opentelemetry/propagators/textmap.py#L153.

Recent changes to the B3 format propagator amplified this breach of contract, because there is a possibility that no context is given and in the case of an empty carrier, the context is returned as is, which then would result in a caller receiving None instead of Context promised in the signature (see unit test).
Previously this behavior was masked, because some Context was always returned (possibly invalid) even if it was not the current one, giving an impression that it is what is expected.

Contributes to #1727

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

See implemented unit test.

Does This PR Require a Contrib Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@marcinzaremba marcinzaremba marked this pull request as ready for review April 1, 2021 09:06
@marcinzaremba marcinzaremba requested a review from a team as a code owner April 1, 2021 09:06
@marcinzaremba marcinzaremba requested review from owais and lzchen and removed request for a team April 1, 2021 09:06
@owais
Copy link
Contributor

owais commented Apr 1, 2021

Is this mandated by the spec? If I understood the change correctly, I think this would result in very surprising behavior.

with tracer.start_as_current_span('s1'):
     print(propagate.extract({})

Will print None or a context with invalid span depending on your previous PR.

If I understand correctly, after this PR, propagate.extract({}) will always return the active span (if present). Is this correct?

If so, I'm not sure about this change. Think of propagation API in isolation. It is supposed to extract remove span context from a serialized form and inject serialized form of the active span into some carrier. I think extract returning anything other than what is inside the carrier would be surprising.

I see it might be convenient to implement something like, "Use remote span as parent if present, otherwise use local active span" but I don't think propagation API should have to worry about it.

Does that make sense or did I completely misunderstand the issue?

@marcinzaremba
Copy link
Contributor Author

marcinzaremba commented Apr 1, 2021

Does that make sense or did I completely misunderstand the issue?

That makes sense.

Is this mandated by the spec?

No, however it seems to be mandated by sofar design of the Python API and it seems to be a potential design issue in general. It is enforced/documented by TextMapPropagator itself:

If so, I'm not sure about this change. Think of propagation API in isolation. It is supposed to extract remove span context from a serialized form and inject serialized form of the active span into some carrier. I think extract returning anything other than what is inside the carrier would be surprising.

I think it's hard to think about propagation API in isolation when by default context is an input of a potential operation and this said context is by nature a shared object that multiple components contribute to by taking the previous one and emitting a new one. How one can think of them in isolation? Spec does not seem to present a direct operation to merge multiple contexts. Then every potential operation would need to look like this (assuming hypothetical interfaces):

from opentelemetry import context as otel_context

old_ctx = otel_context.get_current()
new_ctx = propagator.extract(carrier, old_ctx)
# Now we supposedly two contexts in isolation. How can one then merge them?
merged_new_ctx = otel_context.merge(old_ctx, new_ctx) # hypothetical API

This does not happen. So instead a recipient of a context is responsible for a "merge" operation which in this discussion are extract method of the propagators.

If I understand correctly, after this PR, propagate.extract({}) will always return the active span (if present). Is this correct?

Yes, but that's why extract has an optional context parameter and Context in general is immutable (in implementation sense) and one needs to attach/detach it explicitly to become global. If one wants to control the operation, one passes context directly to avoid any potential surprises and if not provided then it defaults to the current one.
Otherwise why one wants to extract a context after activating a span within the same context when this span can be potentially overridden immediately after activation if one intends to "attach" the new/extracted context?

Additionally this design ambiguity seems to have a toll of inconsistent implementations. Looking at the Jaeger propagator, it does exactly said behavior of default to the current context: https://github.com/open-telemetry/opentelemetry-python/blob/main/propagator/opentelemetry-propagator-jaeger/src/opentelemetry/propagators/jaeger/__init__.py#L49 and is functioning within main for a great deal of time. Naturally, what is considered a correct implementation here?

@srikanthccv
Copy link
Member

Defaulting to current context happens down the line when set_span_in_context is called in propagator which in turn calls set_value.

def set_value(
key: str, value: "object", context: typing.Optional[Context] = None
) -> Context:
"""To record the local state of a cross-cutting concern, the
RuntimeContext API provides a function which takes a context, a
key, and a value as input, and returns an updated context
which contains the new value.
Args:
key: The key of the entry to set.
value: The value of the entry to set.
context: The context to copy, if None, the current context is used.
Returns:
A new `Context` containing the value set.
"""
if context is None:
context = get_current()
new_values = context.copy()
new_values[key] = value
return Context(new_values)

I just realised that b3 doesn't pass the received context from extract here, that is a bug and should be changed. Same as what I suggested yesterday, we should be doing return trace.set_span_in_context(trace.INVALID_SPAN, context) when we can't get the span context from carrier and passed context doesn't have some pre-existing value.

@owais
Copy link
Contributor

owais commented Apr 1, 2021

extract is not meant to merge anything. It is a very isolated or self-contained operation. It takes a carrier and extracts trace context from it. It's basically just a de-serializer. It should always only de-serialize a previously serialized span context. Explicitly passing a default context is fine as it is very obvious and there is no surprise.

If the extract method automatically defaults to current context, how is the caller supposed to know whether the carrier contained a valid context or not?

ctx = extract(carrier)
if ctx is None:
    # propagated context not found

How would one do this after the change? It seems only way would be to fetch the current context and compare the result?

If the docs say that it'll fallback on the global active context then IMO the docs are wrong and should be updated.

Naturally, what is considered a correct implementation here?
IMO, the correct behavior is for this de-serializer function (exact) to be deterministic. Provided the same input, it should always return the same output and not be influenced by some global object.

Users who need to fallback to current context can easily do it:

ctx = extract(carrier) or current_context
# or with convenience provided by function
ctx = extract(carrier, current_context)

To me this is very obvious for anyone having written a bit of Python. Dict.get, getattr, etc all behave like this. It is very explicit and always does exactly what the code suggests without any side-effects. I see zero value in defaulting to a global context TBH. IMO if we want a function to behave like this, it should probably be a convenience function in the context package or elsewhere.

That's how I see it. May be others have a different opinion.

BTW do you have a specific use case that is not possible or hard to implement without this change? Perhaps discussing in that context might help.

@marcinzaremba
Copy link
Contributor Author

marcinzaremba commented Apr 1, 2021

If the extract method automatically defaults to current context, how is the caller supposed to know whether the carrier contained a valid context or not?

ctx = extract(carrier)
if ctx is None:
    # propagated context not found

How would one do this after the change? It seems only way would be to fetch the current context and compare the result?
Users who need to fallback to current context can easily do it:

According to the current interface extract of the propagator is always supposed to return Context according to the type annotation. How one is supposed to expect None as you shown in the example? Then please change the interface accordingly to allow None as the result, from:

    @abc.abstractmethod
    def extract(
        self,
        carrier: CarrierT,
        context: typing.Optional[Context] = None,
        getter: Getter = default_getter,
    ) -> Context:

to

    @abc.abstractmethod
    def extract(
        self,
        carrier: CarrierT,
        context: typing.Optional[Context] = None,
        getter: Getter = default_getter,
    ) -> Optional[Context]:

@owais

BTW do you have a specific use case that is not possible or hard to implement without this change? Perhaps discussing in that context might help.

Look at the above. The examples that you presented above are in violation with the current type annotation, because you assume None can be returned here, however type annotation specify return value as Context and same is for upper layers which rely on this behavior. Not returning a Context object but a None seems to be a surprising behavior considering that type annotation. Again, what would you recommend to return if no headers are extracted and there is no passed context without changing this type annotation?

To me this is very obvious for anyone having written a bit of Python. Dict.get, getattr, etc all behave like this. It is very explicit and always does exactly what the code suggests without any side-effects. I see zero value in defaulting to a global context TBH. IMO if we want a function to behave like this, it should probably be a convenience function in the context package or elsewhere.

Excluding the "written bit of Python" bit, I am just referring to the code that is already in the code base, which is said type annotation that basically disallow the operation that you just described. Otherwise it would completely fine to do what proposed. However current interfaces and documentation seems to be suggesting otherwise for the potential implementers of other propagators.

@lonewolf3739 That's a good suggestion indeed. Now I understand better...the context of it :D Thanks for your insight.

@owais
Copy link
Contributor

owais commented Apr 1, 2021

Otherwise why one wants to extract a context after activating a span within the same context when this span can be potentially overridden immediately after activation if one intends to "attach" the new/extracted context?

There can be many situations where users would want to differentiate between the two but it's irrelevant because not defaulting to a global context makes all cases possible and easy but more important obvious enough to implement.

Few of situations that come to mind:

  • A batch task processes objects from a batch. The task has one span to track it's entire operation. It also has one span per item in batch. Each item could have been produced by a different process/task. So each item's span needs to be a child of or linked to the span that represents item's producer. For items that don't have a valid span context embedded, their spans will SILENTLY become children of the main task span. For example tasks processing kafka/kinesis/SQS tasks.
  • A web service where multiple libraries are instrumented (gunicorn+flask) and flask instrumentation wants always have the server span (never wants gunicorn span as parent). We can imagine an auto-instrumentation generating current active spans that may or may not be in users control.
  • A service or a worker/thread tracks it's lifetime with a long lived span. We wouldn't want request spans to use this process span as their parent.
  • A user wants to add a specific tag to a span only when it is the local root span for the service i.e, has no parent or parent was extracted from a remote context.
  • A user wants to record metrics or log about how many times it received remote context.

Extract defaulting to current context would make that awkward to implement. Consider:

remote_ctx = extract(carrier)
if remote_ctx:
   ...
...  

vs

ctx = extract(carrier)
if condition_to_detect_ctx_as_remote_or_local:
   ...
...  

The later one doesn't seem obvious/intuitive to me. On the other hand users who don't want any such checks and want to default to current context can simply write extract(carrier, current_context) or extract(carrier) or current_context.

If we automatically return current context, we make some cases very awkward to implement and due to what the API looks like, very surprising IMO. On the other hand, not defaulting makes all cases possible in a straightforward manner.

That's how I personally would expect the extract method to work. May be others think differently. We can discuss in the SIG meeting with everyone else later today.

@owais
Copy link
Contributor

owais commented Apr 1, 2021

@marcinzaremba I'm not concerned about type-annotation or returning None vs InvalidSpan but about returning a hidden global value in case the function is called with bad input.

@marcinzaremba
Copy link
Contributor Author

marcinzaremba commented Apr 1, 2021

@marcinzaremba I'm not concerned about type-annotation or returning None vs InvalidSpan but about returning a hidden global value in case the function is called with bad input.

I am not in favor of it either, don't get me wrong. What I am saying is this seems to be a current hint from the codebase (both from docstring and type annotation of the interface) that this is intended to work like that (which I follow + spec obviously). What is more it is implemented inconsistently among the propagators (B3 vs Jaeger) and what I would expect to see from the API interfaces is the following:

  • the type annotation is correct towards the expectation (if Context can be None, let's tag it as so),
  • docstring describes the expected behavior (if docstring is not correct, let's adjust it to the expected behavior),
  • implementations are consistent in implementing this guidance (let's implement them according to the guidance).

This would make a potential contribution experience much nicer and intention much clearer right from the start, which would avoid any potential discussions and ambiguity towards the matter.

@marcinzaremba
Copy link
Contributor Author

In the end I can contribute to the above if decisions are taken to make it consistent and issues created to make it happen. Thanks a lot for the discussion.

@owais
Copy link
Contributor

owais commented Apr 1, 2021

Makes sense. All these points can be addressed easily. We have a couple of types called InvalidSpan and InvalidContext which can be returned instead of None. None vs InvalidSpanContext is not an issue. Docs can be updated as well. Main point of contention for me is returning the global context from a function that looks like is supposed to deserialize a bunch of strings to a SpanContext instance.

I'll bring it up in the SIG meeting today and see what others have to say about this. You're most welcome to join and share your thoughts if you'd like.

@marcinzaremba
Copy link
Contributor Author

marcinzaremba commented Apr 1, 2021

Sounds good. Please also consider how it all plays out with #1727 (+ discussion in the PR). Returning InvalidSpanContext is encouraging to stop caring about whatever is in given context, which is a problem in CompositePropagator scenario in the first place, not to override the previously valid context. Taking this kind of implementation in mind that means every propagator would need to check if there is no valid span context already in the given context in order not to overwrite it, which sounds a little bit cumbersome, however I am not strongly opinionated about the particular solution (@lonewolf3739 also commented here about this particular issue). None sounds more natural to me however it would violate the current expectation, but examples that you shown seem like a "Pythonic" solution.

would need to check if there is no valid span context already in the given context in order not to overwrite it

This can also be addressed by having a helper method/function within the base class/SDK to handle the obvious, not to duplicate the effort among the propagators.

@marcinzaremba
Copy link
Contributor Author

marcinzaremba commented Apr 1, 2021

Another alternative solution is to introduce one more kind of a CompositePropagator that would guard against overwriting currently valid span context by another propagator, because it seems this issue is mostly isolated to this use case.

@srikanthccv
Copy link
Member

@marcinzaremba If we cannot get span context from carrier because of empty headers/parsing failure/invalid value or any other reason I suggest we do this.

if context is None:
    return trace.set_span_in_context(trace.INVALID_SPAN, context)
else:
    return context # Return context as is so we don't accidentally override with invalid

This is simple, spec-compliant and we don't need to introduce one more kind of a CompositePropagator.

@owais
Copy link
Contributor

owais commented Apr 6, 2021

@marcinzaremba What do you think about what @lonewolf3739 suggested above? This doesn't magically return active/parent context so .extract() would work as expected while never returning None.

Since this is returning None right now, it has actually resulted in a lot of breakage. Basically any web service that does not receive valid context in request headers is crashing right now.

Will @lonewolf3739's suggestion solve the problem you are facing with composite propagators?

@marcinzaremba
Copy link
Contributor Author

Superseded by #1750

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants