OTLP exporters don't respect timeout #2663

aabmass · 2022-05-05T22:35:42Z

OTLP exporter as of today

try to export, with a 30 10 second timeout, fail and wait 2 seconds

try to export, with a 30 10 second timeout, fail and wait 4 seconds

try to export, with a 30 10 second timeout, fail and wait 8 seconds

try to export, with a 30 10 second timeout, fail and wait 16 seconds

try to export, with a 30 10 second timeout, fail and wait 32 seconds

try to export, with a 30 10 second timeout, fail and wait 64 seconds and finally give up

In an ideal scenario the BSP has to cancel the export call after 30s irrespective of exporter handling mechanism. But it doesn't do that with 30s export mills timeout passed to via env/arg.

aabmass · 2022-05-06T03:04:58Z

Moving some discussion from the PR here:

    def _export(
        self,
        data: TypingSequence[SDKDataT],
        timeout_millis: Optional[float] = None,
    ) -> ExportResultT:
        if timeout_millis is not None:
            timeout_seconds = timeout_millis / 10**3
        else:
            timeout_seconds = self._timeout
Help me understand this. It is generally self._timeout except when we want to finish work as early as possible force_flush or shutdown in which case we pass the timeout_millis from processor/periodicexportingreader?
From our discussion last week, IIRC you said the timeout configured directly on the exporter (via constructor or environment vars) should apply to individual requests, potentially within a series of retries.

cc @srikanthccv

I checked the Go impl (where this is a common pattern) and it actually does something different.

If the exporter's config timeout (self._timeout for us) is present, it uses that.
- Note, if that is longer than the context's timeout is, it will actually use the shorter one
If not present, it uses the just the parent context
It doesn't have a different context for individual requests vs retries (retry logic here)

This is really confusing to me because DEADLINE_EXCEEDED is retryable but it will never happen? I'm not really sure what the intention of the spec is. Maybe a few examples can help tease out the details:

BSP has as longer timeout

user sets OTEL_EXPORTER_OTLP_TIMEOUT=10000
they set OTEL_BSP_EXPORT_TIMEOUT=30000

does the export abort after 10s? Or should it retry until 30 seconds? What if the user calls force flush with 15s timeout?

BSP has a shorter timeout

user sets OTEL_EXPORTER_OTLP_TIMEOUT=30000
they set OTEL_BSP_EXPORT_TIMEOUT=10000

presumably the BSP should win out here

User passes a longer timeout than either

user sets OTEL_EXPORTER_OTLP_TIMEOUT=10000
they set OTEL_BSP_EXPORT_TIMEOUT=10000

What if the user calls force flush with a 30s timeout? Should we wait 30s or respect the shorter configured options of BSP/OTLP? Maybe there is an argument for making the timeout params Optional to indicate no deadline from the parent call

gsakkis · 2024-02-09T22:20:16Z

Any update on this? Not sure if it's the same bug but I set the timeout to 1 sec and it keeps trying to reach the (inexistent) endpoint:

WARNING - 2024-02-10 00:16:51,218 - opentelemetry.exporter.otlp.proto.grpc.exporter - exporter - Transient error StatusCode.UNAVAILABLE encountered while exporting traces to foo, retrying in 1s.
WARNING - 2024-02-10 00:16:52,227 - opentelemetry.exporter.otlp.proto.grpc.exporter - exporter - Transient error StatusCode.UNAVAILABLE encountered while exporting traces to foo, retrying in 2s.
WARNING - 2024-02-10 00:16:54,233 - opentelemetry.exporter.otlp.proto.grpc.exporter - exporter - Transient error StatusCode.UNAVAILABLE encountered while exporting traces to foo, retrying in 4s.
WARNING - 2024-02-10 00:16:58,238 - opentelemetry.exporter.otlp.proto.grpc.exporter - exporter - Transient error StatusCode.UNAVAILABLE encountered while exporting traces to foo, retrying in 8s.
WARNING - 2024-02-10 00:17:06,245 - opentelemetry.exporter.otlp.proto.grpc.exporter - exporter - Transient error StatusCode.UNAVAILABLE encountered while exporting traces to foo, retrying in 16s.
WARNING - 2024-02-10 00:17:22,255 - opentelemetry.exporter.otlp.proto.grpc.exporter - exporter - Transient error StatusCode.UNAVAILABLE encountered while exporting traces to foo, retrying in 32s.

Fixes open-telemetry#2284, open-telemetry#2663, open-telemetry#3309

havrylenkok · 2024-03-19T21:19:57Z

This behavior is EXTREMELY problematic when collector has worse performance than the application. Which for many is acceptable, after all this is telemetry data.

Especially when we are talking about Lambda, with sub-second invocations and short timeouts.

Current workaround for autoinstrumentation I see is to patch the retries out completely during the cold start:

from opentelemetry.metrics._internal import get_meter_provider
meter_provider = get_meter_provider()  # should return the global object
meter_provider._sdk_config.metric_readers[0]._exporter._retryable = lambda _: False

jonapich · 2024-05-15T11:18:10Z

I wanted to keep some form of retry so I made this patch:

from opentelemetry.exporter.otlp.proto.common._internal import _create_exp_backoff_generator  # this is a copy


def _patched_otel_create_exp_backoff_generator(*args: Any, **kwargs: Any) -> Any:
    original_max_value = kwargs.get("max_value", 0)
    new_max_value = 2
    kwargs["max_value"] = new_max_value

    for backoff in _create_exp_backoff_generator(*args, **kwargs):
        yield backoff
        if backoff == new_max_value:
            if original_max_value:
                yield original_max_value  # the original function will stop if we return its max value
            break

Here's how you can apply this patch:

import logging

def _patch_otel_backoff_strategy() -> None:
    try:
        # import without the "from" to get the reference, not a copy
        import opentelemetry.exporter.otlp.proto.common._internal
        import opentelemetry.exporter.otlp.proto.grpc.exporter

        # patch the source as well as its copy in the exporter module
        opentelemetry.exporter.otlp.proto.common._internal._create_exp_backoff_generator = (
            opentelemetry.exporter.otlp.proto.grpc.exporter._create_exp_backoff_generator
        ) = _patched_otel_create_exp_backoff_generator
    except Exception:
        logging.warning("Failed to patch the OpenTelemetry exporter backoff strategy.")

aabmass added bug Something isn't working metrics 1.10.0rc1 release candidate 1 for metrics GA labels May 5, 2022

aabmass added this to Unassigned SDK in Metrics RC May 5, 2022

This was referenced May 6, 2022

Add timeouts to metric SDK #2653

Merged

Add callback options with timeout to observable callbacks #2664

Merged

ocelotl removed the 1.10.0rc1 release candidate 1 for metrics GA label May 6, 2022

ocelotl removed this from Unassigned SDK in Metrics RC May 6, 2022

lzchen added the release:after-ga To be resolved after GA release label Jul 29, 2022

aabmass mentioned this issue Jul 20, 2023

Exporters shutdown takes longer then a minute when failing to send metrics/traces #3309

Open

Arnatious added a commit to Arnatious/opentelemetry-python that referenced this issue Mar 7, 2024

export and shutdown timeouts for all OTLP exporters

84a2e2b

Fixes open-telemetry#2284, open-telemetry#2663, open-telemetry#3309

Arnatious added a commit to Arnatious/opentelemetry-python that referenced this issue Mar 7, 2024

export and shutdown timeouts for all OTLP exporters

c3c2415

Fixes open-telemetry#2284, open-telemetry#2663, open-telemetry#3309

Arnatious added a commit to Arnatious/opentelemetry-python that referenced this issue Mar 7, 2024

export and shutdown timeouts for all OTLP exporters

6625df4

Fixes open-telemetry#2284, open-telemetry#2663, open-telemetry#3309

Arnatious added a commit to Arnatious/opentelemetry-python that referenced this issue Mar 7, 2024

export and shutdown timeouts for all OTLP exporters

bd19a3d

Fixes open-telemetry#2284, open-telemetry#2663, open-telemetry#3309

Arnatious mentioned this issue Mar 7, 2024

export and shutdown timeouts for all OTLP exporters #3764

Open

10 tasks

Arnatious added a commit to Arnatious/opentelemetry-python that referenced this issue Mar 7, 2024

export and shutdown timeouts for all OTLP exporters

3cd93c9

Fixes open-telemetry#2284, open-telemetry#2663, open-telemetry#3309

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OTLP exporters don't respect timeout #2663

OTLP exporters don't respect timeout #2663

aabmass commented May 5, 2022

aabmass commented May 6, 2022 •

edited

Loading

gsakkis commented Feb 9, 2024 •

edited

Loading

havrylenkok commented Mar 19, 2024

jonapich commented May 15, 2024

OTLP exporters don't respect timeout #2663

OTLP exporters don't respect timeout #2663

Comments

aabmass commented May 5, 2022

aabmass commented May 6, 2022 • edited Loading

BSP has as longer timeout

BSP has a shorter timeout

User passes a longer timeout than either

gsakkis commented Feb 9, 2024 • edited Loading

havrylenkok commented Mar 19, 2024

jonapich commented May 15, 2024

aabmass commented May 6, 2022 •

edited

Loading

gsakkis commented Feb 9, 2024 •

edited

Loading