Log export fails with '413 Request Entity Too Large' status code #16834

Closed
julealgon opened this issue Dec 9, 2022 · 17 comments
Labels
bug (Something isn't working), data:logs (Logs related issues), exporter/datadog (Datadog components), priority:p2 (Medium), Stale

Comments

@julealgon

Component(s)

exporter/datadog

What happened?

Description

I noticed that one of our applications had stopped pushing logs through the collector, so I checked the error log on the VM running the collector.

It appears to be failing with a 413 Request Entity Too Large error, retrying multiple times, and then discarding the data.
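For reference, the retry-and-drop pattern in the log output below matches the collector's standard exporter retry behavior. Here is a minimal sketch of the relevant settings, assuming the Datadog exporter exposes the usual exporterhelper retry options (the values shown are the documented defaults, not values from our actual config):

exporters:
  datadog:
    api:
      key: ${DATADOG_APIKEY}
    retry_on_failure:
      enabled: true            # retry failed batches with exponential backoff
      initial_interval: 5s     # delay before the first retry
      max_interval: 30s        # cap on the delay between retries
      max_elapsed_time: 300s   # give up and drop the batch after this much time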

Steps to Reproduce

It's unknown at this point what is causing this.

Expected Result

No errors, with logs flowing through to Datadog.

Actual Result

Logs are not flowing for this particular application.

Collector version

0.62.1

Environment information

Environment

OS: (e.g., "Windows Server 2019 DataCenter 10.0.17763.3650")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:

  hostmetrics:
    collection_interval: 10s
    scrapers:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      load:
      memory:
      network:
      processes:

processors:
  batch:
    timeout: 10s

  resource:
    attributes:
    - key: deployment.environment
      value: "beta"
      action: upsert

exporters:
  datadog:
    api:
      key: ${DATADOG_APIKEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [datadog]
    metrics:
      receivers: [hostmetrics, otlp]
      processors: [resource, batch]
      exporters: [datadog]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [datadog]
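As an aside, the batch processor above only sets a timeout, so nothing caps how large a single export request can get. A sketch of limiting the batch size, assuming oversized batches are what triggers the 413 (the numbers are illustrative; send_batch_max_size counts log records rather than bytes, so it can only reduce the chance of an oversized request, not guarantee a size limit):

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000      # target number of records per outgoing batch
    send_batch_max_size: 1000  # hard cap; larger batches are split before export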

Log output

2022-12-09T16:05:07.076Z	info	exporterhelper/queued_retry.go:427	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "logs", "name": "datadog", "error": "413 Request Entity Too Large", "interval": "18.341313917s"}
2022-12-09T16:05:14.630Z	error	logs/sender.go:68	Failed to send logs	{"kind": "exporter", "data_type": "logs", "name": "datadog", "error": "413 Request Entity Too Large", "msg": "{\"errors\":[{\"status\":\"413\",\"title\":\"Request Entity Too Large\",\"detail\":\"Request too large\"}]}", "status_code": "413 Request Entity Too Large"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter/internal/logs.(*Sender).SubmitLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.62.0/internal/logs/sender.go:68
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*logsExporter).consumeLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.62.0/logs_exporter.go:101
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsRequest).Export
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/logs.go:65
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/common.go:203
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:388
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsExporterWithObservability).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/logs.go:132
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:206
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/internal/bounded_memory_queue.go:61
2022-12-09T16:05:25.629Z	error	logs/sender.go:68	Failed to send logs	{"kind": "exporter", "data_type": "logs", "name": "datadog", "error": "413 Request Entity Too Large", "msg": "{\"errors\":[{\"status\":\"413\",\"title\":\"Request Entity Too Large\",\"detail\":\"Request too large\"}]}", "status_code": "413 Request Entity Too Large"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter/internal/logs.(*Sender).SubmitLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.62.0/internal/logs/sender.go:68
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*logsExporter).consumeLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.62.0/logs_exporter.go:101
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsRequest).Export
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/logs.go:65
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/common.go:203
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:388
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsExporterWithObservability).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/logs.go:132
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:206
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/internal/bounded_memory_queue.go:61
2022-12-09T16:05:25.629Z	error	exporterhelper/queued_retry.go:176	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "logs", "name": "datadog", "error": "max elapsed time expired 413 Request Entity Too Large", "dropped_items": 6604}
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).onTemporaryFailure
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:176
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:411
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsExporterWithObservability).send
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/logs.go:132
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:206
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/internal/bounded_memory_queue.go:61

Additional context

Nothing relevant at this point.

@julealgon added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Dec 9, 2022
@julealgon
Author

I initially suspected that this could have been caused by a couple of recent changes we made to the application:

  • migrated from .NET6 to .NET7
  • updated every single dependency to latest versions
  • added service name and namespace, as well as OTel SDK info (these were missing before)

However, I took a separate sample application that I use for testing and applied the exact same steps to it, including the same service information I added to the original application, and I can see its traces in Datadog.

I'm completely lost now as to what could be the root cause of this.

I'll try disabling instrumentations one by one to see if that might be related in any way.

@codeboten added the exporter/datadog (Datadog components) label on Dec 10, 2022
@github-actions
Contributor

Pinging code owners for exporter/datadog: @KSerrania @mx-psi @gbbr @knusbaum @amenasria @dineshg13. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dineshg13
Member

@julealgon can you please try with the latest v0.67.0 version of the collector? This was fixed in #16380.

@mx-psi added the data:logs (Logs related issues), waiting for author, and priority:p2 (Medium) labels and removed the needs triage (New item requiring triage) label on Dec 12, 2022
@julealgon
Author

@julealgon can you please try with the latest v0.67.0 version of the collector? This was fixed in #16380.

@dineshg13 I will try updating today along with a few other tests. However, are you sure this was fixed as part of #16380? I actually did find that issue before opening this one, but they have different HTTP status codes. It looks like that one is about the URL, while mine is caused by the payload size.

@julealgon
Author

@dineshg13 after updating to the latest version (0.67.0), I'm no longer seeing any errors in the collector console, but no logs are coming through from the application at all.

If I push logs from my local environment instead, I see them as before.

I have no idea what is going on here. I'm pushing from my local environment using the exact same application/version that I'm running in Azure.

I'm now wondering whether something about the Azure .NET 7 App Service environment is causing the issue. Just in case, I'll try updating every OTel .NET library to its latest beta version and redeploying the app.

@julealgon
Author

julealgon commented Dec 12, 2022

Ok, after further testing, I can confirm this does not seem to be an Azure-specific connection issue: after adding a file exporter, I can see data being logged there for my service.

However, it appears that scope information is completely missing. All the logs inside the scope are intact, but the scope itself (the main tracing data) is not there, so it does not show up in Datadog either.

Here is the output I saw from making a single request to my application (I've redacted some of the info):

{
    "resourceLogs": [{
            "resource": {
                "attributes": [{
                        "key": "service.name",
                        "value": {
                            "stringValue": "REDACTED"
                        }
                    }, {
                        "key": "service.namespace",
                        "value": {
                            "stringValue": "REDACTED"
                        }
                    }, {
                        "key": "service.instance.id",
                        "value": {
                            "stringValue": "5e334c5e-5ef0-4c8c-80f4-ebf9a93d2d8b"
                        }
                    }, {
                        "key": "telemetry.sdk.name",
                        "value": {
                            "stringValue": "opentelemetry"
                        }
                    }, {
                        "key": "telemetry.sdk.language",
                        "value": {
                            "stringValue": "dotnet"
                        }
                    }, {
                        "key": "telemetry.sdk.version",
                        "value": {
                            "stringValue": "1.3.1.622"
                        }
                    }, {
                        "key": "deployment.environment",
                        "value": {
                            "stringValue": "beta"
                        }
                    }
                ]
            },
            "scopeLogs": [{
                    "scope": {},
                    "logRecords": [{
                            "timeUnixNano": "1670858470630743200",
                            "severityNumber": 9,
                            "severityText": "Information",
                            "body": {
                                "stringValue": "IDX10242: Security token: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]' has a valid signature."
                            },
                            "attributes": [{
                                    "key": "dotnet.ilogger.category",
                                    "value": {
                                        "stringValue": "Microsoft.IdentityModel.LoggingExtensions.IdentityLoggerAdapter"
                                    }
                                }
                            ],
                            "traceId": "ddaa4e9c5a223c774e0b3817d88d173e",
                            "spanId": "86430313a5338a30"
                        }, {
                            "timeUnixNano": "1670858470630840500",
                            "severityNumber": 9,
                            "severityText": "Information",
                            "body": {
                                "stringValue": "IDX10239: Lifetime of the token is valid."
                            },
                            "attributes": [{
                                    "key": "dotnet.ilogger.category",
                                    "value": {
                                        "stringValue": "Microsoft.IdentityModel.LoggingExtensions.IdentityLoggerAdapter"
                                    }
                                }
                            ],
                            "traceId": "ddaa4e9c5a223c774e0b3817d88d173e",
                            "spanId": "86430313a5338a30"
                        }, {
                            "timeUnixNano": "1670858470630929700",
                            "severityNumber": 9,
                            "severityText": "Information",
                            "body": {
                                "stringValue": "IDX10234: Audience Validated.Audience: 'api://REDACTED'"
                            },
                            "attributes": [{
                                    "key": "dotnet.ilogger.category",
                                    "value": {
                                        "stringValue": "Microsoft.IdentityModel.LoggingExtensions.IdentityLoggerAdapter"
                                    }
                                }
                            ],
                            "traceId": "ddaa4e9c5a223c774e0b3817d88d173e",
                            "spanId": "86430313a5338a30"
                        }, {
                            "timeUnixNano": "1670858470630982500",
                            "severityNumber": 9,
                            "severityText": "Information",
                            "body": {
                                "stringValue": "IDX10245: Creating claims identity from the validated token: '[PII of type 'System.IdentityModel.Tokens.Jwt.JwtSecurityToken' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'."
                            },
                            "attributes": [{
                                    "key": "dotnet.ilogger.category",
                                    "value": {
                                        "stringValue": "Microsoft.IdentityModel.LoggingExtensions.IdentityLoggerAdapter"
                                    }
                                }
                            ],
                            "traceId": "ddaa4e9c5a223c774e0b3817d88d173e",
                            "spanId": "86430313a5338a30"
                        }, {
                            "timeUnixNano": "1670858470631025800",
                            "severityNumber": 9,
                            "severityText": "Information",
                            "body": {
                                "stringValue": "IDX10241: Security token validated. token: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'."
                            },
                            "attributes": [{
                                    "key": "dotnet.ilogger.category",
                                    "value": {
                                        "stringValue": "Microsoft.IdentityModel.LoggingExtensions.IdentityLoggerAdapter"
                                    }
                                }
                            ],
                            "traceId": "ddaa4e9c5a223c774e0b3817d88d173e",
                            "spanId": "86430313a5338a30"
                        }
                    ]
                }
            ]
        }
    ]
}

Notice how all logs have the trace/span IDs, but scope info is missing entirely:

            "scopeLogs": [{
                    "scope": {},

EDIT:
I think I misunderstood the output here. The resourceSpans element is missing entirely from the collector output. I see both a resourceSpans and a resourceLogs element when firing the same request from my local app.

I'll look into potential changes in the AspNetCore instrumentation library.

@julealgon
Author

@dineshg13, I'm seeing the 413 error again right now, using the latest collector:

2022-12-12T16:05:48.309Z	error	logs/sender.go:84	Failed to send logs	{"kind": "exporter", "data_type": "logs", "name": "datadog", "error": "413 Request Entity Too Large", "msg": "{\"errors\":[{\"status\":\"413\",\"title\":\"Request Entity Too Large\",\"detail\":\"Request too large\"}]}", "status_code": "413 Request Entity Too Large"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter/internal/logs.(*Sender).SubmitLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.67.0/internal/logs/sender.go:84
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*logsExporter).consumeLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.67.0/logs_exporter.go:101
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsRequest).Export
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/logs.go:65
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/common.go:208
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/queued_retry.go:387
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsExporterWithObservability).send
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/logs.go:135
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/queued_retry.go:205
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/internal/bounded_memory_queue.go:61
2022-12-12T16:05:48.309Z	error	exporterhelper/queued_retry.go:175	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "logs", "name": "datadog", "error": "max elapsed time expired 413 Request Entity Too Large", "dropped_items": 8266}
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).onTemporaryFailure
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/queued_retry.go:175
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/queued_retry.go:410
go.opentelemetry.io/collector/exporter/exporterhelper.(*logsExporterWithObservability).send
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/logs.go:135
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/queued_retry.go:205
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
	go.opentelemetry.io/collector@v0.67.0/exporter/exporterhelper/internal/bounded_memory_queue.go:61

Still not seeing any traces after bumping every single OTel library to its latest version.

@julealgon
Author

I figured out why I was not seeing tracing info, and it was due to something else entirely:

I'll still leave this one open since the 413 responses are still happening, even if they are unrelated to the lack of traces.

@dineshg13
Member

@julealgon what's the size of the logs you are trying to submit to Datadog?

@julealgon
Author

@julealgon what's the size of the logs you are trying to submit to Datadog?

How can I get that information, @dineshg13? We have many projects currently sending log data through the collector, so I don't know precisely which source is sending the potentially large logs that could be causing this error.

The exporter's logs also don't mention which payload is failing, so I'd need help identifying which payload is actually causing this problem.
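In case it helps narrow this down, here is a sketch of the debug setup I could try: running a file exporter alongside the datadog exporter on the logs pipeline, so the payloads can be inspected and measured on disk (the path is just an example):

exporters:
  datadog:
    api:
      key: ${DATADOG_APIKEY}
  file:
    path: /var/log/otelcol/logs-debug.json   # example path; each exported batch is appended as JSON

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [datadog, file]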

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

The github-actions bot added the Stale label on Feb 13, 2023
@julealgon
Author

@dineshg13 can you confirm whether this might still be an issue or not?

@dineshg13
Member

@julealgon we have fixed this issue. Can you please try using the latest collector-contrib image?

The github-actions bot removed the Stale label on Feb 14, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

The github-actions bot added the Stale label on Apr 17, 2023
@dineshg13
Member

@julealgon were you able to try this?

@julealgon
Author

@julealgon were you able to try this?

I did update the collector a while ago and the problem seemed to go away.

@mx-psi
Member

mx-psi commented Apr 18, 2023

Thanks for following up, @julealgon! I am closing this based on your comment :)

@mx-psi closed this as completed on Apr 18, 2023