Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation on how to use OpenTelemetry +collector #3005

Merged
merged 7 commits into from Mar 17, 2021

Conversation

evankanderson
Copy link
Member

I'll dig out relevant issue numbers in the morning; this provides instructions for setting up the OpenTelemetry collector and Prometheus (in a fairly small footprint; probably around 250MB of RAM based on 200MB for Prometheus and 50MB for the otc-collector.

With this documentation, I'd like to start the clock ticking on migrating existing Prometheus and Stackdriver users to the OpenTelemetry collector so that we can drastically simplify the code in pkg/metrics.

Proposed Changes

  • Add documentation on exporting statistics using the OpenTelemetry aggregator.

/assign @MontyCarter @mpetason

…us for serving.

Signed-off-by: Evan Anderson <evan.k.anderson@gmail.com>
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 6, 2020
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 6, 2020
@csantanapr
Copy link
Member

/assign

@csantanapr
Copy link
Member

I will review and try the instructions @evankanderson

@evankanderson
Copy link
Member Author

If you get stuck, feel free to ping me on Slack.

(Oh, I just realized I forgot to count the two operators in the overall RAM overhead... it looks like they add around 120MB, with the otel one being a bit bigger because it has two containers.)

for the collector:

```shell
kubectl apply --filename collector.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually instruct in pre-reqs to git clone the repo and change directory, if not it will be a file not found

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about linking to raw.githubusercontent.com instead, so users don't need to do a clone. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think is better to provide htttp URL to raw.githubusercontent.com

1. Finally, update the `config-observability` ConfigMap in Knative Serving and
Eventing
```shell
kubectl patch --namespace knative-serving configmap/config-observability \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the diagram it looks like only queue-proxy and activator would be pushing metrics, what about the control-plane controllers? Would they also be pushing metrics or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they would too, I can add more elements, thanks for the callout!

Copy link
Contributor

@skonto skonto Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add Eventing objects too like sources which emit metrics.

@csantanapr
Copy link
Member

Using the opentelemetry operator didn't work for me.

I think also using Prometheus Operator it hides the simplicity of configure prometheus to scrape

I think we should provide a simpler setup, maybe single yaml that defines 1 deployment and 1 ConfigMap for each agent/collector and Prometheus

If this example is only about metrics I think we should remove any references to traces to avoid confusion.

Do another docs example for traces.

@evankanderson
Copy link
Member Author

I'll switch this to direct ConfigMap/Deployment/Service setup, rather than using the operators. Thanks for the feedback.

I'll spin a new version of this this morning, hopefully.

docs/install/collecting-metrics/collector.yaml Outdated Show resolved Hide resolved
docs/install/collecting-metrics/README.md Outdated Show resolved Hide resolved
@evankanderson
Copy link
Member Author

@csantanapr Please take another look/try these again.

It turns out that queue-proxy doesn't seem to put the most useful labels on the exported metrics, so I added some label_replace examples.

@@ -0,0 +1,97 @@
This document describes how to set up the
[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/about/) to
Copy link
Contributor

@skonto skonto Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link seems dead, an option is https://opentelemetry.io/docs/collector

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

[opentelemetry-operator](https://github.com/open-telemetry/opentelemetry-operator),
but it's also easy to manage this service directly.

![Diagram of components reporting to collector, which is scraped by Prometheus](./system-diagram.svg)
Copy link
Contributor

@skonto skonto Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make clear that this is an example architecture and provide a link to more options here. Some users might want to use an Collector as an Agent etc, do we support that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those should work; I wanted to include a minimal example here.

This document describes how to set up the
[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/about/) to
receive metrics from the Knative infrastructure components and distribute them
to Prometheus. [OpenTelemetry](https://opentelemetry.io/) is a CNCF project to
Copy link
Contributor

@skonto skonto Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use the definition of what is advertised at the official site:

OpenTelemetry (CNF project) is an observability framework for cloud native software. It provides a collection of tools, APIs, and SDKs which allow the instrumentation, generation, collection, and export of telemetry data (metrics, logs, and traces) for analysis in order to understand software's performance and behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, copied it over.

@abrennan89
Copy link
Contributor

@evankanderson any updates on this one? Is it blocked by anything?

@abrennan89 abrennan89 added triage/needs-eng-input Engineering input is requested status/blocked labels Mar 5, 2021
@evankanderson
Copy link
Member Author

I've addressed the most recent @skonto comments. I think this is still correct -- I've used it intermittently and didn't realize I hadn't gotten the PR submitted. 😁

Base automatically changed from master to main March 8, 2021 17:41
@abrennan89
Copy link
Contributor

/retest

@abrennan89 abrennan89 assigned abrennan89 and unassigned mpetason Mar 17, 2021
@abrennan89 abrennan89 added this to the v0.22.0 milestone Mar 17, 2021
@abrennan89
Copy link
Contributor

/approve
/lgtm

Thanks Evan! 🙂

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 17, 2021
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abrennan89, MontyCarter

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot merged commit f2e741e into knative:main Mar 17, 2021
@evankanderson evankanderson deleted the reading-the-meter branch March 24, 2021 05:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/blocked triage/needs-eng-input Engineering input is requested
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants