
Auto instrumentation via pod mutation #455

Closed · anuraaga opened this issue Oct 11, 2021 · 7 comments
Labels: enhancement (New feature or request)

@anuraaga (Contributor) commented Oct 11, 2021

I'd like to propose a new feature for the k8s operator (which I can work on): the ability to inject and enable auto instrumentation without any changes to users' Dockerfiles. Being able to opt in pods, or even whole namespaces, to auto instrumentation could be a transformative experience on k8s, where observability is ensured by the infrastructure team without involvement from app teams.
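For illustration only, opting a pod in might look something like the following; the annotation name here is a hypothetical placeholder, not a settled API:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    # Hypothetical opt-in annotation watched by the operator's webhook;
    # the actual name and semantics would be decided during implementation.
    auto-instrumentation.opentelemetry.io/inject-java: "true"
spec:
  containers:
    - name: app
      image: my-registry/my-app:1.0.0
```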

This is somewhat related to opentelemetry-lambda, which has the similar job of injecting auto instrumentation into Lambda runtimes; the approaches here will generally mirror it.

Basic premise

Enabling auto instrumentation generally requires two things to happen:

  • Have the actual instrumentation libraries available in the container's filesystem. This could be the javaagent, the contents of node_modules for the opentelemetry-js instrumentation libraries, etc.
  • Edit the entrypoint command or an environment variable so the application uses these libraries when it runs.

These can happen as part of building an image by modifying the Dockerfile, but the k8s operator could instead inject the files and edit the runtime command without any build changes.

Packaging Instrumentation

The package format / ecosystem for k8s is docker images. For each implemented language, we would publish a docker image containing the instrumentation libraries for the language. GHCR may be an appropriate location, though any container registry could be used. For example, ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-k8s-java-autoinstrumentation

Init container / volume

The operator can mutate a pod manifest to make instrumentation libraries available to an app container by copying them from the docker image into a local volume. The simplest approach I know of is an init container with the volume mounted read-write and a simple cp command; the app container is then modified to mount the same volume read-only.
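A minimal sketch of what the mutated pod could look like, assuming the agent lives at /javaagent.jar inside the published image (the image tag, paths, and container names are illustrative assumptions, not a settled layout):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  volumes:
    # Ephemeral volume shared by the init container (RW) and app container (RO).
    - name: otel-auto-instrumentation
      emptyDir: {}
  initContainers:
    # Copies the instrumentation out of the published image into the volume.
    - name: otel-copy-instrumentation
      image: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-k8s-java-autoinstrumentation:latest
      command: ["cp", "/javaagent.jar", "/otel-auto-instrumentation/javaagent.jar"]
      volumeMounts:
        - name: otel-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  containers:
    - name: app
      image: my-registry/my-app:1.0.0
      volumeMounts:
        - name: otel-auto-instrumentation
          mountPath: /otel-auto-instrumentation
          readOnly: true
```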

Update runtime

The app's container can be mutated in a language specific way to reference the instrumentation in the mounted volume.

One corner case: if the operator updates an environment variable that is also set in the Dockerfile, the pod-spec value overrides the image's, which may require the user to copy that environment variable into their k8s yaml. There is probably a way to work around this, though.
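For example, suppose a Dockerfile sets ENV JAVA_TOOL_OPTIONS="-Xmx512m". Because a value in the pod spec replaces the image's ENV rather than merging with it, the user would have to restate their original flags alongside the injected one (paths illustrative):

```yaml
containers:
  - name: app
    image: my-registry/my-app:1.0.0
    env:
      # The pod-spec value replaces the image's ENV entirely, so the user's
      # original -Xmx512m must be restated next to the injected agent flag.
      - name: JAVA_TOOL_OPTIONS
        value: "-Xmx512m -javaagent:/otel-auto-instrumentation/javaagent.jar"
```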

Language specific details

I've tried the approach for Java using k8s yaml and it worked well; the others I haven't vetted with yaml yet. My assumption is that any hand-written yaml boilerplate I can write could instead be applied by the operator automatically.

Java

Package contents: opentelemetry javaagent

Runtime update: Add or update JAVA_TOOL_OPTIONS to reference the java agent

This is identical to the approach taken by opentelemetry-lambda

https://github.com/open-telemetry/opentelemetry-lambda/blob/main/java/layer-javaagent/scripts/otel-handler

JS

Package contents: A wrapper library that initializes instrumentation, with the node_modules generated by npm install on a package.json referencing all instrumentation libraries that are used by the wrapper.

Runtime update: Add or update NODE_OPTIONS to reference the wrapper

TODO: Find the best option for adding the wrapper / libraries to the module lookup path

This is identical to the approach taken by opentelemetry-lambda

https://github.com/open-telemetry/opentelemetry-lambda/blob/main/nodejs/packages/layer/scripts/otel-handler
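In pod-spec terms, the mutation could be as small as the following, assuming the wrapper is copied to /otel-auto-instrumentation/wrapper.js (the path is an illustrative assumption):

```yaml
env:
  # --require loads the wrapper before any application module runs.
  - name: NODE_OPTIONS
    value: "--require /otel-auto-instrumentation/wrapper.js"
```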

Python

Package contents: Site packages created by pip install of all opentelemetry-python instrumentation libraries. While most apps use opentelemetry-bootstrap to automatically determine a subset of instrumentation to include, our volume should contain all of them to allow full auto instrumentation.

Runtime update: Prepend container entrypoint with opentelemetry-instrument

TODO: Find the best option to add the instrumentation packages to the module lookup path
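A sketch of the mutated container, assuming the site packages and the opentelemetry-instrument script are mounted under /otel-auto-instrumentation (the paths and the original entrypoint are illustrative assumptions):

```yaml
containers:
  - name: app
    image: my-registry/my-app:1.0.0
    # Entrypoint prepended with opentelemetry-instrument.
    command: ["/otel-auto-instrumentation/bin/opentelemetry-instrument", "python", "app.py"]
    env:
      # Make the injected site packages importable.
      - name: PYTHONPATH
        value: "/otel-auto-instrumentation"
```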

Ruby

TBD

Dotnet

TBD

PHP

TBD

Go

Likely not possible due to static compilation

@jpkrohling (Member)

I love this idea and I know @pavolloffay is interested in this topic as well. I'd say that you can go ahead with a PoC :-)

jpkrohling added the enhancement (New feature or request) label on Oct 11, 2021
anuraaga changed the title from "Auto instrumentation via operator" to "Auto instrumentation via pod mutation" on Oct 11, 2021
@pavolloffay (Member)

hi @anuraaga, I have already built a PoC, but in a separate operator: https://github.com/pavolloffay/opentelemetry-instrumentation-operator. I was planning to submit a PR to bring it here (at least for Java initially).

For languages without an "agent" feature, we can still provide this functionality as a control plane - e.g. configuring the SDK (reporting, sampling...).
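For example, the operator could inject the standard SDK environment variables defined by the OpenTelemetry specification (the endpoint and sampler values below are illustrative):

```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: "my-app"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector:4317"
  - name: OTEL_TRACES_SAMPLER
    value: "parentbased_traceidratio"
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.25"
```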

I am willing to submit a PR with this functionality if you haven't started working on this already.

@anuraaga (Contributor, Author)

Thanks @pavolloffay - I looked through that code and the approach looks quite similar. It would be great if you could add it here for Java! I could then help extend it to another language - I think we'll be able to get most languages supported, with not only a control plane but actual auto instrumentation, which will be quite cool.

@Aneurysm9 (Member)

This is a great idea, though I'm sad Go can't take advantage of it. :)

I wonder about the footprint of copying the instrumentation libraries to a new volume for each pod. Could that get to be rather large with a large number of pods? Would it be possible to use a PersistentVolume with ReadOnlyMany mode to share access to the libraries?
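A rough sketch of what that sharing could look like, assuming a storage backend that supports ReadOnlyMany (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: otel-instrumentation-libs
spec:
  # ReadOnlyMany lets many pods mount the same pre-populated volume,
  # avoiding a per-pod copy; the backing storage must support this mode.
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
```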

@anuraaga (Contributor, Author) commented Oct 13, 2021

> I wonder about the footprint of copying the instrumentation libraries to a new volume for each pod.

Yup, this is something on my mind. A PersistentVolume comes to mind, but it has its own complexity, such as the long time it can take to provision one, capacity-related inability to do so, or whether the cluster even has a PersistentVolume controller at all (my understanding is that EKS by default doesn't, for example). I think we will want to explore this sort of optimization going forward, but I'm only aware of the init container as a foolproof, if possibly inefficient, approach.

@pavolloffay (Member) commented Oct 27, 2021

The initial implementation will be merged soon. I'm adding my task list for the follow-up PRs here.

@pavolloffay (Member)

I think we can close this issue now and create dedicated, well-defined follow-up issues.
