Add documentation on how to use OpenTelemetry +collector #3005
…us for serving. Signed-off-by: Evan Anderson <evan.k.anderson@gmail.com>
/assign
I will review and try the instructions, @evankanderson.
If you get stuck, feel free to ping me on Slack. (Oh, I just realized I forgot to count the two operators in the overall RAM overhead... it looks like they add around 120MB, with the otel one being a bit bigger because it has two containers.)
for the collector:

```shell
kubectl apply --filename collector.yaml
```
We usually tell users in the prerequisites to git clone the repo and change into its directory; otherwise this step will fail with a file-not-found error.
I'm wondering about linking to `raw.githubusercontent.com` instead, so users don't need to do a clone. What do you think?
Yeah, I think it's better to provide an HTTP URL to `raw.githubusercontent.com`.
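For illustration, the raw-URL approach suggested in this thread could look roughly like the sketch below. The repository, branch, and file path are placeholders and assumptions, not values taken from this PR. The command is printed rather than run so it can be reviewed first.

```shell
# Hypothetical values: replace with the real repository, branch, and file path.
REPO="knative/docs"                 # assumption: where collector.yaml is published
REF="main"                          # assumption: branch or tag name
FILE_PATH="path/to/collector.yaml"  # placeholder; the real path is not given here

# raw.githubusercontent.com serves files as /<owner>/<repo>/<ref>/<path>.
COLLECTOR_URL="https://raw.githubusercontent.com/${REPO}/${REF}/${FILE_PATH}"

# Print the command instead of executing it, so no cluster access is needed.
echo "kubectl apply --filename ${COLLECTOR_URL}"
```

With a URL like this, users would not need to clone the repo or change directory before applying the manifest.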
1. Finally, update the `config-observability` ConfigMap in Knative Serving and
Eventing

```shell
kubectl patch --namespace knative-serving configmap/config-observability \
```
From the diagram it looks like only `queue-proxy` and `activator` would be pushing metrics. What about the control-plane controllers? Would they also be pushing metrics or not?
Yes, they would too, I can add more elements, thanks for the callout!
We should add Eventing objects too like sources which emit metrics.
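As a rough sketch of the `config-observability` step this thread refers to, a merge patch might be assembled as below. The diff excerpt above is truncated, so the key names, the `opencensus` backend value, and the collector address here are assumptions for illustration, not the PR's actual patch. The command is printed rather than run.

```shell
# Assumptions, not taken from the truncated command in the PR:
#   - the data keys and the "opencensus" backend value
#   - the collector Service address (hypothetical name, namespace, and port)
NAMESPACE="knative-serving"
COLLECTOR_ADDRESS="otel-collector.metrics:55678"

# Build a JSON merge patch for the ConfigMap's data section.
PATCH=$(printf '{"data":{"metrics.backend-destination":"opencensus","metrics.opencensus-address":"%s"}}' "$COLLECTOR_ADDRESS")

# Print the command for review instead of running it against a cluster.
echo "kubectl patch --namespace ${NAMESPACE} configmap/config-observability --type merge --patch '${PATCH}'"
```

The same patch would presumably be repeated with `NAMESPACE="knative-eventing"` to cover the Eventing components mentioned above.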
Using the opentelemetry operator didn't work for me. I also think that using the Prometheus Operator hides how simple it is to configure Prometheus to scrape. We should provide a simpler setup, maybe a single YAML file that defines one Deployment and one ConfigMap each for the agent/collector and for Prometheus. If this example is only about metrics, we should remove any references to traces to avoid confusion, and do a separate docs example for traces.
I'll switch this to direct ConfigMap/Deployment/Service setup, rather than using the operators. Thanks for the feedback. I'll spin a new version of this this morning, hopefully.
@csantanapr Please take another look/try these again. It turns out that queue-proxy doesn't seem to put the most useful labels on the exported metrics, so I added some
@@ -0,0 +1,97 @@
This document describes how to set up the
[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/about/) to
Link seems dead, an option is https://opentelemetry.io/docs/collector
Thanks, fixed.
[opentelemetry-operator](https://github.com/open-telemetry/opentelemetry-operator),
but it's also easy to manage this service directly.

![Diagram of components reporting to collector, which is scraped by Prometheus](./system-diagram.svg)
I think we need to make clear that this is an example architecture and provide a link to more options here. Some users might want to use a Collector as an Agent, etc. Do we support that?
I think those should work; I wanted to include a minimal example here.
This document describes how to set up the
[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/about/) to
receive metrics from the Knative infrastructure components and distribute them
to Prometheus. [OpenTelemetry](https://opentelemetry.io/) is a CNCF project to
We could also use the definition advertised on the official site:
OpenTelemetry (CNCF project) is an observability framework for cloud native software. It provides a collection of tools, APIs, and SDKs which allow the instrumentation, generation, collection, and export of telemetry data (metrics, logs, and traces) for analysis in order to understand software's performance and behavior.
Thanks, copied it over.
@evankanderson any updates on this one? Is it blocked by anything?
I've addressed the most recent @skonto comments. I think this is still correct -- I've used it intermittently and didn't realize I hadn't gotten the PR submitted. 😁
/retest
/approve Thanks Evan! 🙂
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abrennan89, MontyCarter. The full list of commands accepted by this bot can be found here. The pull request process is described here.
I'll dig out relevant issue numbers in the morning; this provides instructions for setting up the OpenTelemetry collector and Prometheus (in a fairly small footprint; probably around 250MB of RAM, based on 200MB for Prometheus and 50MB for the otc-collector).
With this documentation, I'd like to start the clock ticking on migrating existing Prometheus and Stackdriver users to the OpenTelemetry collector so that we can drastically simplify the code in `pkg/metrics`.
Proposed Changes
/assign @MontyCarter @mpetason