Add OTEL metrics to cel input #47014
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
This pull request is now in conflicts. Could you fix it? 🙏
d45b52e to b42ba9e
b42ba9e to cdb0179
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
… to a Sum metric so it can be visualized in APM
…ponential Histograms
Added a check for an environment variable 'APM_OTLP'. If set, all metric histograms will be exported as Sum (Counter) type. This is to support sending metrics to both the APM OTLP endpoint and the managed OTLP endpoint.
Histogram defaults to exponential type. Can be changed to use regular histograms by setting environment variable USE_NON_EXPONENTIAL_HISTOGRAMS. Removed flush and shutdown functions due to the exporter being shared.
Shortened metric names. Removed option to export as plain histograms. Cleaned up README. Added a PNG of where metrics are collected
…ead of init() for global ExporterFactory initialization
chrisberkhout
left a comment
Looks good. If all or most of the issues raised are addressed, it'll be good to get this in. We can continue to refine metrics in later PRs.
…e. Use a different context for Export function so it will not be cancelled as part of application shutdown
faec
left a comment
Approving for data plane, subject to the (it looks like nearly complete) detailed review by cel input owners.
I don't want to slow this down considering the release deadline we're all pushing for, but we should probably check in once things calm down to make sure OTel metrics migration plans for different Beats components are compatible :-)
Proposed commit message
Added OTEL metrics to cel input to support collection of metrics per input periodic run in agentless environment.
Produces http and cel input metrics using the OTEL SDK and pushes the metrics to either a defined endpoint or the console at the end of each periodic run. No metrics are produced if no environment variables are set.
Produces a count for each defined metric for every periodic run. Each metric set is for a single periodic run.
Histograms are exported as Exponential Histograms.
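For readers unfamiliar with exponential histograms, the following sketch illustrates how the OpenTelemetry data model maps a positive value to a bucket index at a given scale. It is not code from this PR: the function name bucketIndex is hypothetical, and the simple logarithm formula used here can be off by one for values that sit exactly on a bucket boundary.

```go
package main

import "math"

// bucketIndex returns the exponential-histogram bucket for a positive value.
// At a given scale, base = 2^(2^-scale) and bucket i covers (base^i, base^(i+1)].
// Hypothetical helper for illustration only; not part of this PR.
func bucketIndex(value float64, scale int) int {
	scaleFactor := math.Ldexp(1, scale) // 2^scale
	return int(math.Ceil(math.Log2(value)*scaleFactor)) - 1
}
```

At scale 0 the buckets are powers of two, so a value of 3 falls in the bucket covering (2, 4]. Raising the scale by one splits every bucket in two, which is how the format trades resolution against bucket count.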
If the environment variable OTEL_EXPORTER_OTLP_ENDPOINT is set, OTEL OTLP metrics will be exported after each periodic run to the endpoint defined in OTEL_EXPORTER_OTLP_ENDPOINT.
Each input has a unique resource attribute set. Any attributes set in the environment variable OTEL_RESOURCE_ATTRIBUTES are added to the input attribute set. Existing keys will not be overwritten.
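The attribute merge described above can be sketched as follows. This is a minimal illustration using only the standard library, not the PR's implementation: the helper name mergeResourceAttributes is hypothetical, and it assumes that "existing keys" means keys already present in the input's own attribute set and that the variable uses the usual comma-separated key=value format.

```go
package main

import "strings"

// mergeResourceAttributes adds key=value pairs from an
// OTEL_RESOURCE_ATTRIBUTES-style string ("k1=v1,k2=v2") to a copy of attrs.
// Keys already present in attrs are left untouched.
// Hypothetical helper for illustration only; not part of this PR.
func mergeResourceAttributes(attrs map[string]string, env string) map[string]string {
	merged := make(map[string]string, len(attrs))
	for k, v := range attrs {
		merged[k] = v
	}
	for _, pair := range strings.Split(env, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			continue // skip malformed or empty entries
		}
		if _, exists := merged[key]; !exists {
			merged[key] = strings.TrimSpace(value)
		}
	}
	return merged
}
```

Under these assumptions, an input whose attribute set already contains service.instance.id keeps its own value even if OTEL_RESOURCE_ATTRIBUTES also defines that key, while keys such as deployment.environment are added from the environment.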
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
The default is to produce no metrics.
The only place where I made changes that could possibly affect behavior outside of this change is the HTTP transport: we are wrapping it to collect http metrics. We have many nested transport wrappers. I do not expect the other transport wrappers to be affected, but it's something to look out for.
Author's Checklist
How to test this PR locally
Reviewing this PR requires building beats, building elastic-agent, then running elastic-agent standalone against a cluster.
I used the serverless cluster on prod as this has a managed OTLP endpoint. You can also run this against a 9.3.0-SNAPSHOT using elastic-package. To use elastic-package you will need to enable the APM server: create another profile with apm enabled. Note that if you are on macOS, you will need to change the docker file to expose a port other than 8200 and rebuild elastic-package, because macOS uses that port for another service.
There are two ways to test this.
To build agentbeat: check out the branch, cd ../beats/x-pack/agentbeat and run
DEV=true SNAPSHOT=true PLATFORMS=darwin/arm64 mage build
Replace PLATFORMS with the correct platform for builds on non-macOS machines.
Not required unless you are overwriting the agentbeat in an existing elastic-agent installation.
If so, overwrite the agentbeat at /data/elastic-agent-/components
To build elastic-agent: cd into the elastic-agent repo and build elastic-agent
DEV=true EXTERNAL=false SNAPSHOT=true PLATFORMS=darwin/arm64 PACKAGES=tar.gz mage -v package
Replace PLATFORMS with the correct platform for builds on non-macOS machines. Make sure that the beats repo is in the same directory as the elastic-agent repo, since EXTERNAL=false pulls beats code from the co-located beats repo instead of from GitHub.
In the elastic-agent repo:
cd ./build/distributions
tar -xvzf <elastic-agent-.tar.gz>
cd into untarred directory elastic-agent-.
rm elastic-agent.yml (we will replace this before running the elastic-agent)
The rest of the directions are for serverless. Create an observability serverless cluster.
Get environment variables for APM
On the bottom left, click "Add Data"
Choose "Application" from choices of "What do you want to monitor?"
Choose "OpenTelemetry" from "Monitor your Application using:"
Copy the 3 environment variables from section 2. Values from OTEL_RESOURCE_ATTRIBUTES are added to the resource object that each CEL input creates to identify itself. Its presence is required.
In the OTEL_RESOURCE_ATTRIBUTES template, replace with elastic-agent, app-version with the version being used. You may choose to override deployment.environment.
Each CEL input behaves like its own application. All CEL applications require OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS values as well.
Add an integration. I have a simple CEL integration package that requires no configuration if you want an easy one to use.
a. On the lower right, choose "Install Elastic Agent"
b. In the first paragraph on the next page, click the link that says "standalone mode"
c. This takes you to the configuration page for the integration.
d. After filling out the configuration, on the lower right click "Save and Continue"
e. On the Configure Agent page: create an API key
f. Download the policy
g. Do not install the agent. Leave the page.
Copy the downloaded policy to elastic-agent.yml in ./build/distributions/elastic-agent--
Start the agent in development mode. In ./build/distributions/elastic-agent*
sudo OTEL_RESOURCE_ATTRIBUTES="<value>" OTEL_EXPORTER_OTLP_ENDPOINT="<value>" OTEL_EXPORTER_OTLP_HEADERS="<value>" ./elastic-agent run -e --develop &> output.txt
Check for data in the cluster.
On the left, choose "Discover"
In Data View, use the dropdown to select 'metrics-'
Filter by package and datastream name: package.datastream : "<package_name>.<datastream.name>"
All the metrics for a periodic run will have the same timestamp. For any timestamp there will be 19 metrics, with one document per metric, except for the http.client.* metrics, which may have multiple documents.
"input.cel.periodic.run",
"input.cel.periodic.program.run.started",
"input.cel.periodic.program.run.success",
"input.cel.periodic.batch.received",
"input.cel.periodic.batch.published",
"input.cel.periodic.event.received",
"input.cel.periodic.event.published",
"input.cel.periodic.run.duration",
"input.cel.periodic.cel.duration",
"input.cel.periodic.event.publish.duration",
"input.cel.program.batch.received",
"input.cel.program.batch.published",
"input.cel.program.event.received",
"input.cel.program.event.published",
"input.cel.program.run.duration",
"input.cel.program.cel.duration",
"input.cel.program.publish.duration",
"http.client.request.body.size", (could be multiple documents for this depending upon integration used for test)
"http.client.request.duration", (could be multiple documents for this depending upon integration used for test)
Verify that metrics exist for each of these names.
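If you export the metric names found in Discover for one timestamp, the check can be automated. The sketch below is a hypothetical helper, not part of the PR: it assumes you have already collected the observed names into a slice by some means, and it reports which of the 19 names listed above are missing.

```go
package main

import "sort"

// expectedMetrics lists the metric names given in the testing instructions.
var expectedMetrics = []string{
	"input.cel.periodic.run",
	"input.cel.periodic.program.run.started",
	"input.cel.periodic.program.run.success",
	"input.cel.periodic.batch.received",
	"input.cel.periodic.batch.published",
	"input.cel.periodic.event.received",
	"input.cel.periodic.event.published",
	"input.cel.periodic.run.duration",
	"input.cel.periodic.cel.duration",
	"input.cel.periodic.event.publish.duration",
	"input.cel.program.batch.received",
	"input.cel.program.batch.published",
	"input.cel.program.event.received",
	"input.cel.program.event.published",
	"input.cel.program.run.duration",
	"input.cel.program.cel.duration",
	"input.cel.program.publish.duration",
	"http.client.request.body.size",
	"http.client.request.duration",
}

// missingMetrics returns the expected names that do not appear in observed,
// sorted alphabetically. Duplicate observed names are fine, since the
// http.client.* metrics may produce several documents.
// Hypothetical helper for illustration only.
func missingMetrics(observed []string) []string {
	seen := make(map[string]bool, len(observed))
	for _, name := range observed {
		seen[name] = true
	}
	var missing []string
	for _, name := range expectedMetrics {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```

An empty result means every expected metric was seen for that periodic run.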
Look for metrics beginning with
input.cel.periodic.* (cel processing metrics for each periodic run) (all are counters)
input.cel.program.* (cel processing metrics for each program run. Most are histograms across all the program runs for the periodic run)
CEL periodic and program metrics: input.cel.*
To look at http metrics that are generated from the SDK
Other filtering options:
For instance, if the id in the elastic-agent.yml is "- id: cel-cel_simple-d78ef7a8-0757-4606-902e-c6a7f9320013"
then you can filter by
resource.attributes.service.instance.id : "cel-cel_simple.fakedts-d78ef7a8-0757-4606-902e-c6a7f9320013"
Related issues