Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for "zero-touch telemetry" requirements #5

Merged
merged 3 commits into from
Jul 13, 2019
Merged

RFC for "zero-touch telemetry" requirements #5

merged 3 commits into from
Jul 13, 2019

Conversation

bhs
Copy link
Contributor

@bhs bhs commented Jul 1, 2019

@bhs
Copy link
Contributor Author

bhs commented Jul 1, 2019

sigh... I got a new laptop last weekend and need to fix my Git config (re the CLA failure). Will do...

time passes...

done.

@AloisReitbauer
Copy link

As mentioned on Gitter I support the activity if it covers autoinjection and auto instrumentation.

@@ -0,0 +1,57 @@
# (Open) Telemetry Without Manual Instrumentation

_Cross-language requirements for automated approaches to extracting portable telemetry data with zero source code modification._

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also add injection of tracers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AloisReitbauer can you be more specific about the requirement here? I can interpret "injection of tracers" in various ways and don't want to misunderstand/misrepresent.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below on specifying the trace at program startup and OpenTel being able to use this parameter to set - and load - the trace at runtime.

0002-telemetry-without-manual-instrumentation.md Outdated Show resolved Hide resolved
* Licensing must be permissive (e.g., ASL / BSD)
* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
* That is, vendors do not want to require users to comprehend both an OpenTel package and a vendor-specific package
* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the far biggest issue right for automatic instrumentation. Unfortunately, it is a bit more complicated than just managing spans. The biggest problem we have is double instrumentation i.e. the same action is instrumented with blackbox and whitebox instrumentation - as you call it.

Detection and disabling is one thing, but maybe one instrumentation augments the information of another. How do you want to handle these situations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not planning to make augmentation a requirement for our first take on this problem. I'll add it as a nice-to-have, though, as it certainly would be... well, nice to have.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. I think we need to get started on this. As we see more real-world use cases of double instrumentation we will also better understand how to solve the issue. Currently, there is quite some research going on at Dynatrace to figure out how to make this work best.

Components in OpenTelemetry might already solve a big part of the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that all makes perfect sense and I agree wholeheartedly with

Components in OpenTelemetry might already solve a big part of the problem.

"Components" (or "plugins" or whatever we call them) are a crucial abstraction, especially when mixing and matching various forms of instrumentation.


### Nice-to-have properties

* Run-time integration (vs compile-time integration)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For us this is a must have

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here for two reasons:

  1. Some other folks on the original GitHub issue (or maybe it was the google doc?) were suggesting we drop the requirement to make v1 easier to build, and
  2. More fundamentally, in languages like Go it's unclear how this would work without forking the compiler. So it felt odd listing this as a requirement

Certainly runtime integration is preferable, all other things being equal.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are discussing this with the Go team right now. Happy to bring them in.

I also believe it, there would be an easy way for dynamic tracer loading using an environment variable for the tracer to load and the OpenTel library to load this tracer. We need to define how situations are handled when a tracer is compiled in.

Generally, I would propose to only compile the code with a noop tracer and provider the actual tracer to use as a parameter.

Going a bit deep into the weeds here, but I want to clarify my point and what I would like this group to look at.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helpful, thanks for clarifying. Speaking only for myself, I feel uneasy mandating that Tracers are loaded via environment variables only, though I certainly acknowledge the benefits of structuring things that way in certain scenarios.

Re this point:

We are discussing this with the Go team right now. Happy to bring them in.

I think that might be premature, at least for this high-level multi-language RFC. For Go specifically, it would be fantastic to get them involved (and I imagine the various Googlers involved with the project can help with that, too).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not mean only, but there should be a well-defined way to load via environment variables.


## Trade-offs and mitigations

Approaching a problem this language-specific at the cross-language altitude is intrinsically challenging since "different languages are different" – e.g., in Go there is no way to perform the kind of runtime interpositioning that's possible in Python, Ruby, or even Java.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely true. Dynatrace has runtime instrumentation for Go :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a dig on Dynatrace, but I'm sure it doesn't work as well or as flexibly as it does in Java... so I would stand by the "that's possible in" bit here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No offense was taken :-). I believe we agree on the problem. We - meaning all tracing tool providers - need to invest a lot of magic to make this work. We often use undocumented features or even (security) loopholes of platforms to make this work.

* Licensing must be permissive (e.g., ASL / BSD)
* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
* That is, vendors do not want to require users to comprehend both an OpenTel package and a vendor-specific package
* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding interop, it'll be helpful to clarify the versioning/compatibility story open-telemetry/opentelemetry-specification#115.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reyang no argument on this at all... open-telemetry/opentelemetry-specification#115 is unresolved at the moment, so difficult to incorporate here, but certainly the versioning story (and "component"-ization story as @AloisReitbauer has brought up here) is/are a big part of the challenge for interoperable whitebox+blackbox instrumentation.

* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
* That is, vendors do not want to require users to comprehend both an OpenTel package and a vendor-specific package
* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.
* If the blackbox instrumentation starts a Span, whitebox instrumentation must be able to discover it as the active Span (and vice versa)
Copy link
Member

@reyang reyang Jul 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the whitebox instrumentation has multiple tracers, how to define the "active span"?

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/tracing-api.md#getcurrentspan

Probably we need a way for the blackbox instrumentation to discover the tracers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reyang I don't know what you mean by "multiple traces" – do you mean "multiple tracers"? If so, I was of the understanding (feel free to correct me 😉) that the "multiple tracer" scenario is more of a "tracer multiplexer" where there would still be only a single active span.

Probably we need a way for the blackbox instrumentation to discover the tracers?

Almost certainly, yes, though IMO that's "an implementation detail," albeit an important one. I.e., I wouldn't call it a "requirement", just something that we'll need to do in order to satisfy the requirements (if that distinction makes sense).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks. (and you're right, I've made a typo, it is "multiple tracers", I've updated the comment.

* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.
* If the blackbox instrumentation starts a Span, whitebox instrumentation must be able to discover it as the active Span (and vice versa)
* Relatedly, there also must be a way to discover and avoid potential conflicts/overlap/redundancy between explicit whitebox instrumentation and blackbox instrumentation of the same libraries/packages
* That is, if a developer has already added the “official” OpenTelemetry plugin for, say, gRPC, then when the blackbox instrumentation effort adds gRPC support, it should *not* “double-instrument” it and create a mess of extra spans/etc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the developer took whitebox approach and configured a specific exporter/propagator? Should the blackbox instrumentation modify/tweak it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what is most likely going to happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is naive of me, but my hope (??) was that both the blackbox and whitebox OpenTelemetry code would be portable and non-vendor-specific. I.e., that the choice of exporter/propagator should not be relevant here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally - yes. Our initial research of double instrumentation and potential issues indicates that this is anyways a problem that needs to be solved in backend analytics.

@lizthegrey
Copy link
Member

Hi, I'm here to say on behalf of @honeycombio that we're interested in contributing the Beelines (e.g. for NodeJS Express) to OpenTelemetry, which allow for the most common language frameworks automatic hooking. It's not quite "zero touch" but it's also "add one library dependency" rather than add manual spans everywhere.

Is this something you'd like me to file a separate RFC for, or should we also discuss it here?

@bhs
Copy link
Contributor Author

bhs commented Jul 11, 2019

@lizthegrey this is great to hear!

Is there a technical writeup (can just be bullets / etc, nothing formal) describing the design goals for Beelines? Would be particularly interested to understand how they would translate to a Span-oriented worldview.

I think the various conversations would be easier to navigate if that was its own thread (separate from this RFC, I mean), but that's just me.

@bhs
Copy link
Contributor Author

bhs commented Jul 11, 2019

Hi folks... I'm inclined to merge this in the next 48h if there are no more blocking concerns.

To be clear, this RFC can continue to evolve and be PR'd against. I just don't want to leave it in this pending state forever.

(cc @AloisReitbauer @reyang @lizthegrey who've all made comments)

@lizthegrey
Copy link
Member

@lizthegrey this is great to hear!

Is there a technical writeup (can just be bullets / etc, nothing formal) describing the design goals for Beelines? Would be particularly interested to understand how they would translate to a Span-oriented worldview.

I think the various conversations would be easier to navigate if that was its own thread (separate from this RFC, I mean), but that's just me.

No strong objections to the existing agent/bytecode approach, and looking forward to see how it evolves... I'll start a separate PR for talking about add one code snippet as opposed to zero-touch integration :)

@AloisReitbauer
Copy link

I already wanted to ask when we are going to merge. I am fine with it.

@bhs
Copy link
Contributor Author

bhs commented Jul 11, 2019

Yeah, I have a reminder set to merge tomorrow AM (PT) barring any additional concerns.

@lizthegrey
Copy link
Member

#7 -- here's RFC 0003 for people interested in the alternate and complementary approach to instrumentation we propose :)

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the great work!

### Requirements

Without further ado, here are a set of requirements for “official” OpenTelemetry efforts to accomplish zero-source-code-modification instrumentation (i.e., “OpenTelemetry agents”) in any given language:
* No _manual_ source code modifications allowed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be relaxed a bit. In some languages it may not be possible to attach instrumentation without some minimal O(1) initialization code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll loosen it a bit. For languages (e.g., Java) where there is a lot of prior art for truly zero-touch instrumentation, I think that should still be our goal, though.

... There are certainly a lot of legacy JARs out there that can only be instrumented from the outside in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(PTAL)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this, but will happily debate this point further in subsequent PRs to the RFC itself.

@bhs bhs merged commit 1b1914d into open-telemetry:master Jul 13, 2019
@mwear
Copy link
Member

mwear commented Jul 17, 2019

I wrote up a proposal of how auto-instrumentation could work for OpenTelemetry Ruby here. It covers the distribution, detection and installation of instrumentation, and while it's pretty specific to Ruby, some of the ideas might be applicable to other languages.

iredelmeier referenced this pull request in iredelmeier/oteps Jul 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants