Request Entity Too Large when connecting to AWS ElasticSearch #2192

Open
kalta opened this issue Apr 21, 2020 · 13 comments

@kalta

kalta commented Apr 21, 2020

Requirement

Sending traces from a client, using the Elasticsearch backend (as a service in AWS), with the Zipkin protocol over HTTP.

Problem

It works perfectly at first, but after a while it seems Jaeger starts skipping all traces, not sending anything else to Elasticsearch, and a restart of the container is needed to make it work again.

I am using the stand-alone (all-in-one) product, version 1.17.0.

This message appears in the log for each request that is discarded:

{"level":"error","ts":1587454192.6916847,"caller":"config/config.go:137","msg":"Elasticsearch could not process bulk request","request_count":65,"failed_count":0,"error":"elastic: Error 413 (Request Entity Too Large)","response":null,"stacktrace":"github.com/jaegertracing/jaeger/pkg/es/config.(*Configuration).NewClient.func2\n\tgithub.com/jaegertracing/jaeger/pkg/es/config/config.go:137\ngithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic.(*bulkWorker).commit\n\tgithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic/bulk_processor.go:588\ngithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic.(*bulkWorker).work\n\tgithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic/bulk_processor.go:487"}

I tried the ES_BULK_SIZE and ES_BULK_ACTIONS parameters without success. This is how the Docker container is started:

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS="https://***.amazonaws.com/" \
  -e ES_BULK_SIZE=100000 \
  -e ES_BULK_ACTIONS=10 \
  -e QUERY_BASE_PATH=/jaeger \
  -p 16686:16686 \
  -p 9411:9411 \
  -p 14269:14269 \
  jaegertracing/all-in-one:1.17.0

Thank you!

@pavolloffay
Member

The docker command you referenced uses lower bulk settings than the default - that is a good way to debug this. Could you please confirm that this configuration works at first and then suddenly stops? After what time duration does Jaeger start failing?

@kalta
Author

kalta commented Apr 21, 2020

Yes - it works and then suddenly starts failing, with this lower configuration too.
It works after a restart, then starts failing again, sometimes in 5 minutes, sometimes after a few hours.

@kalta
Author

kalta commented Apr 21, 2020

I can try to lower it further. And thank you for your fast response!!

@kalta
Author

kalta commented Apr 22, 2020

I can confirm the same happens even with ES_BULK_ACTIONS=1 and ES_BULK_SIZE=1000.

@kalta
Author

kalta commented Apr 22, 2020

Hi. I was sending the spans in batches of 1000. I reduced it to 100. Same result. It stops working after a while.

@pavolloffay
Member

Could you please run a test against upstream Elasticsearch (https://www.docker.elastic.co/)? Just run it as a Docker container and configure Jaeger to use it.

docker run -it --rm -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -p 9200:9200 -p 9300:9300 -e "http.host=0.0.0.0" -e "discovery.type=single-node" --name=elasticsearch docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.4
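For the second step, the original docker command can be reused with ES_SERVER_URLS pointed at that local container. A minimal sketch, assuming the two containers share a network (here via --link to the "elasticsearch" container started above; any equivalent wiring works):

docker run -d --name jaeger \
  --link elasticsearch \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS=http://elasticsearch:9200 \
  -p 16686:16686 \
  -p 9411:9411 \
  -p 14269:14269 \
  jaegertracing/all-in-one:1.17.0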

@kalta
Author

kalta commented Apr 22, 2020

Done, working for now (as before). I will report if it fails.

@kalta
Author

kalta commented Apr 23, 2020

With a direct connection to upstream Elasticsearch, it works perfectly. It is even the same version (6.8); however, something in AWS ElasticSearch makes it fail after a while. Any ideas?

@pavolloffay
Member

We are not using it. Maybe somebody from @jaegertracing/elasticsearch has ideas?

Maybe you could raise it with AWS support.

@ledor473
Member

You might be facing the Maximum Size of HTTP Request Payloads limit in AWS ESS (documented here)

As for why a container restart is needed, I would think it's because the error returned by AWS ESS causes Jaeger to re-attempt the send. But obviously, if the data was too big the first time, it will only get bigger if you keep receiving more spans and try to re-send it later.
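If you want to confirm that limit independently of Jaeger, here is a rough sketch: build a bulk body a bit over the documented cap (commonly around 10 MiB, depending on instance type) and POST it straight to the domain. It assumes the domain accepts unsigned requests from your IP, which appears to be the case since Jaeger connects without request signing. A 413 here would confirm the service-side limit rather than anything Jaeger-specific:

# generate roughly 12 MB of newline-delimited bulk actions
awk 'BEGIN {
  pad = sprintf("%250s", ""); gsub(/ /, "x", pad)
  for (i = 0; i < 40000; i++) {
    print "{\"index\":{\"_index\":\"jaeger-test\",\"_type\":\"_doc\"}}"
    print "{\"field\":\"" pad "\"}"
  }
}' > bulk_body.ndjson

# POST it to the same endpoint Jaeger uses (replace *** as in the original command)
curl -s -o /dev/null -w "%{http_code}\n" \
  -XPOST "https://***.amazonaws.com/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @bulk_body.ndjson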

@kalta
Author

kalta commented Apr 25, 2020

OK, but then the question would be why the ES_BULK_SIZE option does not work. I set it to a much lower value than the AWS limit (1 KB, via -e ES_BULK_SIZE=1000). Maybe it is not in the correct format?
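One way to rule out an environment-variable parsing problem would be to pass the same settings as explicit command-line flags to the all-in-one binary. A sketch, assuming the es.bulk.* collector flags that back those variables in 1.17:

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS="https://***.amazonaws.com/" \
  -e QUERY_BASE_PATH=/jaeger \
  -p 16686:16686 \
  -p 9411:9411 \
  -p 14269:14269 \
  jaegertracing/all-in-one:1.17.0 \
  --es.bulk.size=1000 \
  --es.bulk.actions=1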

@ledor473
Member

I'm not entirely sure, but ES_BULK_SIZE, which translates to BulkSize in the elastic client used by Jaeger, seems to control the minimum payload size:

Now, when does bulk processor send these batches? There are 3 parameters that you can control:
...
2. When the batch exceeds a certain size (in bytes).
...

Maybe one of the applications is buffering a lot of spans, which causes that behavior?
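A quick way to check whether spans are piling up in the collector would be to watch its metrics; the admin port 14269 is already published in the original command. A sketch (the exact metric names differ between Jaeger versions, so the grep is deliberately broad):

curl -s http://localhost:14269/metrics | grep -iE 'bulk|span'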

@kalta kalta closed this as completed Apr 28, 2020
@kalta kalta reopened this Apr 28, 2020
@mrgasparov

mrgasparov commented May 18, 2022

I'm facing the same issue using Open Distro 7.6.2 (unfortunately this is the only version available on our cloud provider). Is there a way to limit the number of spans sent in one bulk request using the jaeger-operator Helm chart? That would definitely be the easiest solution to this problem.
