6 changes: 3 additions & 3 deletions troubleshoot/ingest/opentelemetry/429-errors-motlp.md
@@ -51,7 +51,7 @@
Refer to the [Rate limiting section](opentelemetry://reference/motlp.md#rate-limiting) in the mOTLP reference documentation for details.

* In {{ech}}, the {{es}} capacity for your deployment might be underscaled for the current ingest rate.
* In {{serverless-full}}, rate limiting should not result from {{es}} capacity, since the platform automatically scales ingest capacity. If you suspect a scaling issue, [contact Elastic Support](contact-support.md).
* In {{serverless-full}}, rate limiting should not result from {{es}} capacity, since the platform automatically scales ingest capacity. If you suspect a scaling issue, [contact Elastic Support](/troubleshoot/ingest/opentelemetry/contact-support.md).

Check notice on line 54 in troubleshoot/ingest/opentelemetry/429-errors-motlp.md (GitHub Actions / vale): Elastic.Wordiness: Consider using 'because' instead of 'since'.
* Multiple Collectors or SDKs are sending data concurrently without load balancing or backoff mechanisms.

## Resolution
@@ -62,7 +62,7 @@

If you’ve confirmed that your ingest configuration is stable but still encounter 429 errors:

* {{serverless-full}}: [Contact Elastic Support](contact-support.md) to request an increase in ingest limits.
* {{serverless-full}}: [Contact Elastic Support](/troubleshoot/ingest/opentelemetry/contact-support.md) to request an increase in ingest limits.
* {{ech}} (ECH): Increase your {{es}} capacity by scaling or resizing your deployment:
* [Scaling considerations](../../../deploy-manage/production-guidance/scaling-considerations.md)
* [Resize deployment](../../../deploy-manage/deploy/cloud-enterprise/resize-deployment.md)
@@ -106,7 +106,7 @@
enabled: true
```

This ensures the Collector buffers data locally while waiting for the ingest endpoint to recover from throttling.
This ensures the Collector buffers data locally while waiting for the ingest endpoint to recover from throttling. For more information on export failures and queue configuration, refer to [Export failures when sending telemetry data](/troubleshoot/ingest/opentelemetry/edot-collector/trace-export-errors.md).
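
For illustration, a minimal exporter sketch with the queue and retry settings enabled might look like the following; the endpoint, credentials, queue size, and retry window are hypothetical values, not recommendations from this guide:

```yaml
exporters:
  otlp/motlp:
    # Hypothetical Elastic Managed OTLP endpoint and API key; replace with your own.
    endpoint: https://my-project.ingest.elastic.example:443
    headers:
      Authorization: "ApiKey ${env:ELASTIC_API_KEY}"
    sending_queue:
      enabled: true
      queue_size: 2000        # assumed value; size the queue to absorb short throttling periods
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s  # assumed value; how long to keep retrying before dropping data
```

A larger queue and a longer retry window let the Collector absorb short 429 bursts locally instead of dropping data.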

## Best practices

8 changes: 4 additions & 4 deletions troubleshoot/ingest/opentelemetry/connectivity.md
@@ -2,7 +2,7 @@
navigation_title: Connectivity issues
description: Troubleshoot connectivity issues between EDOT SDKs, the EDOT Collector, and Elastic.
applies_to:
serverless: all
serverless: ga
product:
edot_collector: ga
products:
@@ -75,14 +75,14 @@ Connectivity errors usually trace back to one of the following issues:
Errors can look similar whether they come from an SDK or the Collector. Identifying the source helps you isolate the problem.

:::{note}
Note: Some SDKs support setting a proxy directly (for example, using `HTTPS_PROXY`). Refer to [Proxy settings for EDOT SDKs](../opentelemetry/edot-sdks/proxy.md) for details.
Note: Some SDKs support setting a proxy directly (for example, using `HTTPS_PROXY`). Refer to [Proxy settings for EDOT SDKs](/troubleshoot/ingest/opentelemetry/edot-sdks/proxy.md) for details.
:::
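
As a sketch of the environment-variable approach, the proxy address below is hypothetical, and the exact variables an SDK honors depend on the language:

```bash
# Hypothetical proxy address; adjust for your environment.
export HTTPS_PROXY=http://proxy.internal.example:3128
# Keep local traffic (for example, a Collector on the same host) off the proxy.
export NO_PROXY=localhost,127.0.0.1

# Restart the instrumented application so the SDK exporter picks up the settings.
```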

#### SDK

Application logs report failures when the SDK cannot send data to the Collector or directly to Elastic. These often appear as `connection refused` or `timeout` messages. If you see these messages, verify that the Collector endpoint is reachable.

For guidance on enabling logs in your SDK, see [Enable SDK debug logging](../opentelemetry/edot-sdks/enable-debug-logging.md).
For guidance on enabling logs in your SDK, refer to [Enable SDK debug logging](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md).

Example (Java SDK):

@@ -154,6 +154,6 @@ If basic checks and configuration look correct but issues persist, collect more

* Review proxy settings. For more information, refer to [Proxy settings](opentelemetry://reference/edot-collector/config/proxy.md).

* If ports are confirmed open but errors persist, [enable debug logging in the SDK](../opentelemetry/edot-sdks/enable-debug-logging.md) or [in the Collector](../opentelemetry/edot-collector/enable-debug-logging.md) for more detail.
* If ports are confirmed open but errors persist, [enable debug logging in the SDK](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md) or [in the Collector](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md) for more detail.

* Contact your network administrator with test results if you suspect firewall restrictions.
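
For the test results mentioned above, a couple of quick reachability checks can be useful; the hostnames and ports below are placeholders for your actual Collector or ingest endpoint:

```bash
# Placeholder endpoint; substitute your Collector or Elastic ingest endpoint.
nc -vz otel-collector.internal.example 4317

# Placeholder HTTPS endpoint; a TLS handshake failure here often points to proxy or firewall interception.
curl -sv https://my-deployment.ingest.example:443 -o /dev/null
```
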
4 changes: 2 additions & 2 deletions troubleshoot/ingest/opentelemetry/contact-support.md
@@ -77,7 +77,7 @@ To help Elastic Support investigate the problem efficiently, please include the

### Logs and diagnostics

* Recent Collector logs with relevant errors or warning messages
* Recent Collector logs with relevant errors or warning messages. For guidance on enabling debug logging, refer to [Enable debug logging for the EDOT Collector](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md) or [Enable debug logging for EDOT SDKs](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md).
* Output from:

```bash
@@ -92,7 +92,7 @@ To help Elastic Support investigate the problem efficiently, please include the

### Data and UI symptoms

* Are traces, metrics, or logs missing from the UI?
* Are traces, metrics, or logs missing from the UI? For troubleshooting steps, refer to [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md) or [No application-level telemetry visible in {{kib}}](/troubleshoot/ingest/opentelemetry/edot-sdks/missing-app-telemetry.md).
* Are you using the [Elastic Managed OTLP endpoint](https://www.elastic.co/docs/observability/apm/otel/managed-otel-ingest/)?
* If data is missing or incomplete, consider enabling the [debug exporter](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/debugexporter/README.md) to inspect the raw signal data emitted by the Collector.

@@ -66,7 +66,7 @@ If you're deploying the EDOT Collector in a standalone configuration, try to:
./otelcol --set=service.telemetry.logs.level=debug
```

This is especially helpful for diagnosing configuration parsing issues or startup errors.
This is especially helpful for diagnosing configuration parsing issues or startup errors. For more information on enabling debug logging, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).


* Confirm required components are defined
@@ -95,7 +95,7 @@ If you're deploying the EDOT Collector in a standalone configuration, try to:
lsof -i :4317
```

If needed, adjust your configuration or free up the port.
If needed, adjust your configuration or free up the port. For network connectivity issues, refer to [Connectivity issues](/troubleshoot/ingest/opentelemetry/connectivity.md).
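
If another process owns the port and can't be stopped, one option is to move the receiver to a free port. A minimal sketch, assuming the default OTLP gRPC receiver; the alternative port is an arbitrary example:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4319   # arbitrary free port instead of the default 4317
```

Remember to update any SDKs or agents that send to the old port.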

### Kubernetes EDOT Collector

@@ -117,6 +117,8 @@ If you're deploying the EDOT Collector using the Elastic Helm charts, try to:

Common issues include volume mount errors, image pull failures, or misconfigured environment variables.

If the Collector starts but no data appears in {{kib}}, refer to [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md) for additional troubleshooting steps.

## Resources

* [Collector configuration documentation](https://opentelemetry.io/docs/collector/configuration/)
@@ -17,6 +17,8 @@ products:

If your EDOT Collector pods terminate with an `OOMKilled` status, this usually indicates sustained memory pressure or potentially a memory leak due to an introduced regression or a bug. You can use the Performance Profiler (`pprof`) extension to collect and analyze memory profiles, helping you identify the root cause of the issue.

If you're running the Collector in Kubernetes and experiencing resource allocation issues, refer to [Insufficient resources in Kubernetes](/troubleshoot/ingest/opentelemetry/edot-collector/insufficient-resources-kubestack.md) for troubleshooting steps.

## Symptoms

These symptoms typically indicate that the EDOT Collector is experiencing a memory-related failure:
@@ -25,6 +27,8 @@ These symptoms typically indicate that the EDOT Collector is experiencing a memo
- Memory usage steadily increases before the crash.
- The Collector's logs don't show clear errors before termination.

For more detailed diagnostics, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).

## Resolution

Turn on runtime profiling using the `pprof` extension and then gather memory heap profiles from the affected pod:
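
At a high level, this means adding the extension to the Collector configuration. A minimal sketch, using the upstream default listen address:

```yaml
extensions:
  pprof:
    endpoint: localhost:1777   # upstream default listen address for the profiling server

service:
  extensions: [pprof]
```

After the Collector restarts with the extension enabled, you can capture a heap profile from the affected pod, for example with `kubectl port-forward pod/<collector-pod> 1777:1777` followed by `go tool pprof http://localhost:1777/debug/pprof/heap`; the pod name is a placeholder.
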
@@ -88,4 +88,4 @@ Debug logging for the Collector is not currently configurable through {{fleet}}.

## Resources

To learn how to enable debug logging for the EDOT SDKs, refer to [Enable debug logging for EDOT SDKs](../edot-sdks/enable-debug-logging.md).
To learn how to enable debug logging for the EDOT SDKs, refer to [Enable debug logging for EDOT SDKs](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md).
48 changes: 33 additions & 15 deletions troubleshoot/ingest/opentelemetry/edot-collector/index.md
@@ -13,18 +13,36 @@

# Troubleshoot the EDOT Collector

Perform these checks when troubleshooting common Collector issues:

* Check logs: Review the Collector’s logs for error messages.
* Validate configuration: Use the `--dry-run` option to test configurations.
* Enable debug logging: Run the Collector with `--log-level=debug` for detailed logs.
* Check service status: Ensure the Collector is running with `systemctl status <collector-service>` (Linux) or `tasklist` (Windows).
* Test connectivity: Use `telnet <endpoint> <port>` or `curl` to verify backend availability.
* Check open ports: Run `netstat -tulnp` or `lsof -i` to confirm the Collector is listening.
* Monitor resource usage: Use top/htop (Linux) or Task Manager (Windows) to check CPU & memory.
* Validate exporters: Ensure exporters are properly configured and reachable.
* Verify pipelines: Use `otelctl` diagnose (if available) to check pipeline health.
* Check permissions: Ensure the Collector has the right file and network permissions.
* Review recent changes: Roll back recent config updates if the issue started after changes.

For in-depth details on troubleshooting refer to the [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/).
Use the topics in this section to troubleshoot issues with the EDOT Collector.

If you're not sure where to start, review the Collector's logs for error messages and validate your configuration using the `--dry-run` option. For more detailed diagnostics, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).
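
For example, a quick validation run might look like this; the binary name and configuration path are placeholders, and flag support can vary by Collector version (some distributions expose an equivalent `validate` subcommand instead):

```bash
# Parse and validate the configuration without starting pipelines.
./otelcol --config=otel-config.yaml --dry-run
```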

## Resource issues

* [Collector out of memory](/troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md): Diagnose and resolve out-of-memory issues in the EDOT Collector using Go's Performance Profiler.

* [Insufficient resources in {{k8s}}](/troubleshoot/ingest/opentelemetry/edot-collector/insufficient-resources-kubestack.md): Troubleshoot resource allocation issues when running the EDOT Collector in {{k8s}} environments.

## Configuration issues

* [Collector doesn't start](/troubleshoot/ingest/opentelemetry/edot-collector/collector-not-starting.md): Resolve startup failures caused by invalid configuration, port conflicts, or missing components.

* [Missing or incomplete traces due to Collector sampling](/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md): Troubleshoot missing or incomplete traces caused by sampling configuration.

* [Collector doesn't propagate client metadata](/troubleshoot/ingest/opentelemetry/edot-collector/metadata.md): Learn why the Collector doesn't extract custom attributes and how to propagate such values using EDOT SDKs.

## Connectivity and export issues

* [Export failures when sending telemetry data](/troubleshoot/ingest/opentelemetry/edot-collector/trace-export-errors.md): Resolve export failures caused by `sending_queue` overflow and {{es}} exporter timeouts.

## Debugging

* [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md): Learn how to enable debug logging for the EDOT Collector in supported environments.

## See also

Check notice on line 42 in troubleshoot/ingest/opentelemetry/edot-collector/index.md (GitHub Actions / vale): Elastic.WordChoice: Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'See', unless the term is in the UI.

* [EDOT SDKs troubleshooting](/troubleshoot/ingest/opentelemetry/edot-sdks/index.md): For end-to-end issues that may involve both the Collector and SDKs.

Check notice on line 44 in troubleshoot/ingest/opentelemetry/edot-collector/index.md (GitHub Actions / vale): Elastic.WordChoice: Consider using 'can, might' instead of 'may', unless the term is in the UI.

* [Troubleshoot EDOT](/troubleshoot/ingest/opentelemetry/index.md): Overview of all EDOT troubleshooting resources.

For in-depth details on troubleshooting, refer to the upstream [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/).
@@ -25,6 +25,8 @@ These symptoms are common when the Kube-Stack chart is deployed with insufficien
- Cluster or Daemon pods are unable to export data to the Gateway collector due to being `OOMKilled` (high memory usage).
- Pods have logs similar to: `error internal/queue_sender.go:128 Exporting failed. Dropping data.`

For detailed diagnostics on OOMKilled issues, refer to [Collector out of memory](/troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md). For more information on enabling debug logging, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).
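
To confirm the OOMKilled state and the configured limits from the cluster, checks like the following can help; the namespace and pod name are placeholders:

```bash
# List Collector pods and their restart counts; replace the namespace with your own.
kubectl get pods -n <namespace>

# Show the last termination reason (for example, OOMKilled) and configured limits for a pod.
kubectl describe pod <collector-pod> -n <namespace> | grep -A 5 "Last State"
```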

## Resolution

Follow these steps to resolve the issue.
@@ -62,7 +62,7 @@ This will not work, as the Collector doesn't automatically extract such values f

## Resolution

If you want to propagate customer IDs or project names into spans or metrics, you must instrument this in your code using one of the SDKs.
If you want to propagate customer IDs or project names into spans or metrics, you must instrument this in your code using one of the SDKs. For SDK-specific troubleshooting guidance, refer to [EDOT SDKs troubleshooting](/troubleshoot/ingest/opentelemetry/edot-sdks/index.md).

Use `span.set_attribute` in your application code, where OpenTelemetry spans are created. For example:

@@ -2,7 +2,7 @@
navigation_title: Collector sampling issues
description: Learn how to troubleshoot missing or incomplete traces in the EDOT Collector caused by sampling configuration.
applies_to:
serverless: all
serverless: ga
product:
edot_collector: ga
products:
@@ -12,11 +12,11 @@ products:

# Missing or incomplete traces due to Collector sampling

If traces or spans are missing in {{kib}}, the issue might be related to the Collectors sampling configuration.
If traces or spans are missing in {{kib}}, the issue might be related to the Collector's sampling configuration. For general troubleshooting when no data appears in {{kib}}, refer to [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md).

{applies_to}`stack: ga 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped.

Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. See [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) for more information.
Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. Refer to [Missing or incomplete traces due to SDK sampling](/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md) for more information.
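
For illustration, a tail sampling sketch that keeps error traces and only a share of healthy traffic; the policy names, wait time, and percentage are assumptions you would tune to your workload:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s              # assumed buffer time before a sampling decision is made
    policies:
      - name: keep-errors           # keep every trace that contains an error
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-some-ok-traffic  # keep an assumed 10% of the remaining traces
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

With a configuration like this, most successful traces are intentionally dropped, which can look like missing data if you expect to see every request.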

## Symptoms

@@ -79,4 +79,4 @@ Follow these steps to resolve sampling configuration issues:

- [Tail sampling processor (Collector)](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor)
- [OpenTelemetry sampling concepts - contrib documentation](https://opentelemetry.io/docs/concepts/sampling/)
- [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md)
- [Missing or incomplete traces due to SDK sampling](/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md)
@@ -2,7 +2,7 @@
navigation_title: Export errors from the EDOT Collector
description: Learn how to resolve export failures caused by `sending_queue` overflow and Elasticsearch exporter timeouts in the EDOT Collector.
applies_to:
serverless: all
serverless: ga
product:
edot_collector: ga
products:
@@ -14,6 +14,8 @@ products:

During high traffic or load testing scenarios, the EDOT Collector might fail to export telemetry data (traces, metrics, or logs) to {{es}}. This typically happens when the internal queue for outgoing data fills up faster than it can be drained, resulting in timeouts and dropped data.

If you're experiencing network connectivity issues, refer to [Connectivity issues](/troubleshoot/ingest/opentelemetry/connectivity.md). If no data appears in {{kib}}, refer to [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md).

## Symptoms

You might see one or more of the following messages in the EDOT Collector logs:
@@ -90,6 +92,8 @@ For a complete list of available metrics, refer to the upstream OpenTelemetry me

* Ensure sufficient CPU and memory for the EDOT Collector.
* Scale vertically (more resources) or horizontally (more replicas) as needed.

For Kubernetes deployments, refer to [Insufficient resources in Kubernetes](/troubleshoot/ingest/opentelemetry/edot-collector/insufficient-resources-kubestack.md) for detailed resource configuration guidance.
::::

::::{step} Optimize Elasticsearch performance
Expand All @@ -105,6 +109,8 @@ Focus tuning efforts on {{es}} performance, Collector resource allocation, and q
:::


For more detailed diagnostics, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md) to troubleshoot export failures.
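
You can also confirm whether the sending queue is saturating by checking the Collector's internal metrics. A minimal check, assuming the default internal telemetry port and the upstream metric names:

```bash
# Scrape the Collector's own metrics endpoint (default port 8888) and look at queue usage.
curl -s http://localhost:8888/metrics | grep -E 'otelcol_exporter_queue_(size|capacity)'
```

A queue size that stays close to capacity indicates the exporter cannot drain data as fast as it arrives.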

## Resources

* [Upstream documentation - OpenTelemetry Collector configuration](https://opentelemetry.io/docs/collector/configuration)
@@ -25,11 +25,11 @@

The SDK creates logs that allow you to see what it's working on and what might have failed at some point. You can find the logs in [logcat](https://developer.android.com/studio/debug/logcat), filtered by the tag `ELASTIC_AGENT`.

For more information about the SDK's internal logs, as well as how to configure them, refer to the [internal logging policy](apm-agent-android://reference/edot-android/configuration.md#internal-logging-policy) configuration.
For more information about the SDK's internal logs, as well as how to configure them, refer to the [internal logging policy](apm-agent-android://reference/edot-android/configuration.md#internal-logging-policy) configuration. For more information on enabling debug logging, refer to [Enable debug logging for EDOT SDKs](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md).
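
For example, one way to watch only these logs from a connected device or emulator, assuming `adb` is available on your path:

```bash
# Show only log entries tagged ELASTIC_AGENT, silencing everything else.
adb logcat -s ELASTIC_AGENT
```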

Check warning on line 28 in troubleshoot/ingest/opentelemetry/edot-sdks/android/index.md (GitHub Actions / vale): Elastic.PluralAbbreviations: Don't use apostrophes when making abbreviations plural. Use 'SDK' instead of ''s'.

## Connectivity to the {{stack}}

If after following the [getting started](apm-agent-android://reference/edot-android/getting-started.md) guide and configuring your {{stack}} [endpoint parameters](apm-agent-android://reference/edot-android/configuration.md#export-connectivity), you can't see your application's data in {{kib}}, you can follow the following tips to try and figure out what could be wrong.
If after following the [getting started](apm-agent-android://reference/edot-android/getting-started.md) guide and configuring your {{stack}} [endpoint parameters](apm-agent-android://reference/edot-android/configuration.md#export-connectivity), you can't see your application's data in {{kib}}, try the following tips to figure out what could be wrong. For more detailed connectivity troubleshooting, refer to [Connectivity issues](/troubleshoot/ingest/opentelemetry/connectivity.md). If telemetry data isn't appearing in {{kib}}, refer to [No application-level telemetry visible in {{kib}}](/troubleshoot/ingest/opentelemetry/edot-sdks/missing-app-telemetry.md) or [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md).

### Check out the logs
