Add the EDOT collector service #67

Merged — ezimuel merged 11 commits into main from feature/otel on Sep 15, 2025

Conversation

ezimuel (Collaborator) commented Jul 8, 2025

This PR adds the EDOT (Elastic Distribution of OpenTelemetry) collector to start-local via the new --edot option. This is done using the edot_collector configuration reported here.
This PR should address the request in #55.

To test the --edot option we can run the following command:

curl -fsSL https://raw.githubusercontent.com/elastic/start-local/refs/heads/feature/otel/start-local.sh | sh -s -- --edot

To be done:

  • test the EDOT collector in tests
  • document the --edot option in README.md

start-local.sh Outdated

exporters:
  elasticsearch:
    endpoint: http://elasticsearch:9200

xrmx (Member) commented:


Suggested change
endpoint: http://elasticsearch:9200
endpoints: [ "http://elasticsearch:9200" ]

Shouldn't this be plural?

ezimuel (Collaborator, Author) replied:

I just copied & pasted from here, where it's singular.

xrmx (Member) commented Jul 8, 2025:

The source of truth should be the collector documentation:
https://www.elastic.co/docs/reference/opentelemetry/edot-collector/config/default-config-standalone

This is the example config for the collector in agent mode (i.e. will export to ES) that will also collect logs and generate metrics from the host https://raw.githubusercontent.com/elastic/elastic-agent/refs/tags/v9.0.3/internal/pkg/otel/samples/linux/logs_metrics_traces.yml

I think it's a bit too much, but maybe hostmetrics are a good source of data for some smoke testing? I.e., you can check that you can query them in Elasticsearch.

ezimuel (Collaborator, Author) replied:

I need help with this configuration for the docker-compose.yml. The goal is to offer default settings for the EDOT Collector (Standalone) to be used locally to start using the Observability stack in Elastic. I need the config here for the service and the Docker specification here.

A reviewer commented:

@rogercoll let's chat to get your support on generating a configuration that can enable us to deploy EDOT collector as a gateway for the start-local effort. So far I have referenced the one we are using for K8s gateway, but we might need to hardcode the endpoints and enrichment processors.

rogercoll (Contributor) replied:

Both endpoint and endpoints configurations are supported in the Elasticsearch exporter. The first exists because the exporter embeds the upstream confighttp configuration: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/elasticsearchexporter/config.go#L69C13-L69C25

The latter was added for alignment with the elasticsearch-go client configuration: https://github.com/elastic/go-elasticsearch/blob/main/elasticsearch.go#L71

Not a strong opinion, but maybe use endpoints, as it is the one used in our public docs: https://www.elastic.co/docs/reference/opentelemetry/edot-collector/config/default-config-standalone#data-export
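For reference, the plural form would look like this in the exporter configuration (a minimal sketch following the public docs linked above; the endpoint URL is the one already used in this PR):

```yaml
exporters:
  elasticsearch:
    # `endpoints` takes a list; the singular `endpoint` form is also accepted
    endpoints: [ "http://elasticsearch:9200" ]
```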

SylvainJuge (Member) commented:
Do you plan to also generate a dedicated API key for the OTel ingestion, like the one currently provided for ES clients? Without that, users would have to perform another manual step to generate the API key before being able to send OTel data to the OTel collector.

ezimuel and others added 2 commits July 8, 2025 16:40
Co-authored-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
Co-authored-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
ezimuel (Collaborator, Author) commented Jul 8, 2025

@SylvainJuge is this API key different from the ES one? Can you point me to any documentation about it? Thanks.

I don't know if it's different, but there is a dedicated UI for it in Kibana; you can also use https://www.elastic.co/docs/api/doc/kibana/operation/operation-createagentkey to create a new one for APM agents (which also includes EDOT SDKs).

if [ -z "${esonly:-}" ]; then
if [ "$otel" = "true" ]; then
cat >> .env <<- EOM
ES_LOCAL_JAVA_OPTS="-Xms2g -Xmx2g"
xeraa commented:

Do we set that Xms to 2GB intentionally? I don't think we'll compare well to Grafana's stack and others like that ("big", "bloated",...). We should IMO still initialize as small as possible with the necessary room to grow if needed. Even if we take a small performance hit when needing to increase the heap size.

ezimuel (Collaborator, Author) replied:

@xeraa this is the default configuration proposed here. I'm not an expert on the EDOT collector, and I asked the OTel team to help with this.

xeraa replied:

Yeah, IMO this can (should) use the same settings and general approach used for the other options. We still want this to start really lightweight.

mlunadia left a comment:

Added comments about the image used and the collector config.

start-local.sh Outdated
if [ "$otel" = "true" ]; then
cat >> uninstall.sh <<- EOM
if docker rmi docker.elastic.co/elastic-agent/elastic-otel-collector:${es_version} >/dev/null 2>&1; then
echo "Image docker.elastic.co/elastic-agent/elastic-otel-collector:${es_version} removed successfully"
mlunadia commented:

@ezimuel we should be using the Elastic Agent image and trigger otel mode; there is a flag that enables the Elastic Agent container to start in otel mode (= EDOT).

rogercoll (Contributor) replied:

@mlunadia Are we interested in any other Elastic Agent binaries/features besides the OTel collector? If we are only interested in the EDOT collector, I would recommend using the elastic-agent/elastic-otel-collector image, as it is smaller than the main elastic-agent image: elastic/elastic-agent#7173

mlunadia replied Aug 7, 2025:

Just the EDOT collector for start-local; for edge we might add examples later.

# Add the OTLP configs in docker-compose.yml
cat >> docker-compose.yml <<-'EOM'
configs:
# This is the minimal yaml configuration needed to listen on all interfaces
A reviewer commented:

We'll need to work on this config, as we are missing some of the core processors like batch. As a reference we can begin with this configuration, with the exception of the inframetrics processor, which is marked for removal: https://github.com/elastic/elastic-agent/blob/main/internal/pkg/otel/samples/linux/gateway.yml


connectors:
  elasticapm:

processors:

rogercoll (Contributor) commented:

Could we add a couple of batch processor configurations? batch for the logs and traces pipelines and batch/metrics for the metrics pipelines; sample configuration: https://github.com/elastic/elastic-agent/blob/main/internal/pkg/otel/samples/linux/gateway.yml#L38-L44

(note that they need to be referenced in the pipeline's configuration too)
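A sketch of what that suggestion could look like (modeled loosely on the referenced gateway.yml; the batch sizes and timeouts here are illustrative rather than the exact values from that file, and only the processor entries of each pipeline are shown):

```yaml
processors:
  batch:
    send_batch_size: 8192
    timeout: 1s
  batch/metrics:
    send_batch_max_size: 16384
    timeout: 1s

service:
  pipelines:
    logs:
      processors: [batch]
    traces:
      processors: [batch]
    metrics:
      processors: [batch/metrics]
```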

start-local.sh Outdated
Comment on lines +765 to +770
logs_dynamic_index:
  enabled: true
metrics_dynamic_index:
  enabled: true
traces_dynamic_index:
  enabled: true
rogercoll (Contributor) commented:

This should be removed, as it is deprecated:

No-op. Documents are now always routed dynamically unless logs_index is not empty. Will be removed in a future version.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/elasticsearchexporter/README.md#elasticsearch-document-routing

start-local.sh Outdated
# Kibana settings container name
kibana_settings_container_name="kibana-local-settings${ES_LOCAL_DIR:+-${ES_LOCAL_DIR}}"
# OTEL container name
otel_container_name="otel-collector${ES_LOCAL_DIR:+-${ES_LOCAL_DIR}}"
rogercoll (Contributor) commented:

Wdyt of renaming the container to elastic-otel-collector or edot-collector to differentiate it from the upstream image?

ezimuel (Collaborator, Author) replied:

We can have different services based on the version (the --v option), so to avoid conflicts we need different names.

rogercoll (Contributor) replied:

Sorry, I should have worded that better. My suggestion would be to include the edot keyword in the container name prefix:

Suggested change
otel_container_name="otel-collector${ES_LOCAL_DIR:+-${ES_LOCAL_DIR}}"
otel_container_name="edot-collector${ES_LOCAL_DIR:+-${ES_LOCAL_DIR}}"

(EDOT = Elastic Distributions of OpenTelemetry)

processors:
  elastictrace:

exporters:

rogercoll (Contributor) commented:

Could we add the debug exporter with the default configuration? That is really helpful to quickly check if data is being received by the collector, similar to https://github.com/elastic/elastic-agent/blob/main/internal/pkg/otel/samples/linux/gateway.yml#L48
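A sketch of that suggestion (the pipeline shown is illustrative; each pipeline to be inspected would list debug alongside its existing exporters):

```yaml
exporters:
  debug:
    # the default `verbosity: basic` logs a one-line summary per batch;
    # `verbosity: detailed` dumps full payloads
    verbosity: basic

service:
  pipelines:
    traces:
      exporters: [elasticsearch, debug]
```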

ezimuel and others added 2 commits August 7, 2025 16:34
Co-authored-by: Roger Coll <roger.coll@elastic.co>
Co-authored-by: Roger Coll <roger.coll@elastic.co>
@ezimuel ezimuel changed the title WIP: Add the OTEL collector service WIP: Add the EDOT collector service Aug 22, 2025
@ezimuel ezimuel changed the title WIP: Add the EDOT collector service Add the EDOT collector service Sep 10, 2025
@ezimuel ezimuel marked this pull request as ready for review September 10, 2025 12:52
ezimuel (Collaborator, Author) commented Sep 12, 2025

@rogercoll and @mlunadia I updated the PR with the requested changes. Thanks.

rogercoll (Contributor) left a comment:

LGTM! 🎉

mapping:
  mode: otel

service:

rogercoll (Contributor) commented:

Suggested change
service:
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888

One way to test that the container is up and running would be to expose the collector's internal metrics through a Prometheus exporter interface. Then we can assert the response code of curl http://edot-collector:8888/metrics
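If that telemetry endpoint is enabled, the compose service could gate readiness on it with a healthcheck (a sketch; the port matches the suggested config above, while the availability of curl inside the collector image is an assumption):

```yaml
healthcheck:
  # succeeds only once the collector serves its internal metrics
  test: ["CMD", "curl", "-sf", "http://localhost:8888/metrics"]
  interval: 10s
  timeout: 5s
  retries: 5
```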

rogercoll (Contributor) replied Sep 15, 2025:

Another approach, which would not require exposing the collector's internal metrics, would be to send a test payload to the already available HTTP OTLP endpoint. Example:

$ curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{
    "Timestamp": "1634630400000",
    "ObservedTimestamp": "1634630401000",
    "TraceId": "abcd1234",
    "SpanId": "efgh5678",
    "SeverityText": "DEBUG",
    "SeverityNumber": "5",
    "Body": "Testing log to assert collector OTLP endpoint",
    "Resource": {
      "service.name": "start-local-testing"
    },
    "InstrumentationScope": {},
    "Attributes": {}
  }' \
  -o /dev/null -s -w "%{http_code}\n"
200

ezimuel (Collaborator, Author) replied:

Thanks @rogercoll, I think we'll choose this approach since this does not require changing the configuration.

@ezimuel ezimuel dismissed mlunadia’s stale review September 15, 2025 15:33

All suggestions have been applied

@ezimuel ezimuel merged commit 83ef641 into main Sep 15, 2025
11 checks passed
@ezimuel ezimuel mentioned this pull request Sep 16, 2025