Request Entity Too Large when connecting to AWS ElasticSearch #2192

Open
kalta opened this issue Apr 21, 2020 · 13 comments

@kalta

kalta commented Apr 21, 2020

Requirement

Sending traces from a client, using the Elasticsearch backend (as a service in AWS), with the Zipkin protocol over HTTP.

Problem

It works perfectly at first, but after a while it seems Jaeger starts skipping all traces, not sending anything else to Elasticsearch, and a restart of the container is needed to make it work again.

I am using the stand-alone (all-in-one) product, version 1.17.0.

This message appears in the log for each request that is discarded:

{"level":"error","ts":1587454192.6916847,"caller":"config/config.go:137","msg":"Elasticsearch could not process bulk request","request_count":65,"failed_count":0,"error":"elastic: Error 413 (Request Entity Too Large)","response":null,"stacktrace":"github.com/jaegertracing/jaeger/pkg/es/config.(*Configuration).NewClient.func2\n\tgithub.com/jaegertracing/jaeger/pkg/es/config/config.go:137\ngithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic.(*bulkWorker).commit\n\tgithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic/bulk_processor.go:588\ngithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic.(*bulkWorker).work\n\tgithub.com/jaegertracing/jaeger/vendor/github.com/olivere/elastic/bulk_processor.go:487"}

I tried the ES_BULK_SIZE and ES_BULK_ACTIONS parameters without success. This is how the Docker container is started:

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS="https://***.amazonaws.com/" \
  -e ES_BULK_SIZE=100000 \
  -e ES_BULK_ACTIONS=10 \
  -e QUERY_BASE_PATH=/jaeger \
  -p 16686:16686 \
  -p 9411:9411 \
  -p 14269:14269 \
  jaegertracing/all-in-one:1.17.0

Thank you!

@pavolloffay
Member

The docker command you referenced uses lower bulk settings than the default - that is a good way to debug this. Could you please confirm that this configuration works at first and then suddenly stops? After what time duration does Jaeger start failing?

@kalta
Author

kalta commented Apr 21, 2020

Yes - it works and then suddenly starts failing, with this lower configuration too.
It works after a restart, then starts failing again, sometimes in 5 minutes, sometimes after a few hours.

@kalta
Author

kalta commented Apr 21, 2020

I can try to lower it further. And thank you for your fast response!!

@kalta
Author

kalta commented Apr 22, 2020

I can confirm the same happens even with ES_BULK_ACTIONS=1 and ES_BULK_SIZE=1000.

@kalta
Author

kalta commented Apr 22, 2020

Hi. I was sending the spans in batches of 1000. I reduced it to 100. Same result. It stops working after a while.

@pavolloffay
Member

Could you please run a test against upstream Elasticsearch (https://www.docker.elastic.co/)? Just run it as a Docker container and configure Jaeger to use it.

docker run -it --rm -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -p 9200:9200 -p 9300:9300 -e "http.host=0.0.0.0" -e "discovery.type=single-node" --name=elasticsearch docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.4
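For the second step, the original docker command can be reused with ES_SERVER_URLS pointed at that local container. A minimal sketch, assuming the two containers share a network (here via --link to the "elasticsearch" container started above; any equivalent wiring works):

docker run -d --name jaeger \
  --link elasticsearch \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS=http://elasticsearch:9200 \
  -p 16686:16686 \
  -p 9411:9411 \
  -p 14269:14269 \
  jaegertracing/all-in-one:1.17.0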

@kalta
Author

kalta commented Apr 22, 2020

Done, working for now (as before). I will report if it fails.

@kalta
Author

kalta commented Apr 23, 2020

With a direct connection to upstream Elasticsearch, it works perfectly. It is even the same version (6.8); however, something in AWS ElasticSearch makes it fail after a while. Any ideas?

@pavolloffay
Member

We are not using it. Maybe somebody from @jaegertracing/elasticsearch has ideas?

Maybe you could raise it with AWS support.

@ledor473
Member

You might be facing the Maximum Size of HTTP Request Payloads limit in AWS ESS (documented here)

As for why a container restart is needed, I would think it's because the error returned by AWS ESS causes Jaeger to re-attempt the send. But obviously, if the data was too big the first time, it will only get bigger if you keep receiving more spans and try to re-send it later.
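If you want to confirm that limit independently of Jaeger, here is a rough sketch: build a bulk body a bit over the documented cap (commonly around 10 MiB, depending on instance type) and POST it straight to the domain. It assumes the domain accepts unsigned requests from your IP, which appears to be the case since Jaeger connects without request signing. A 413 here would confirm the service-side limit rather than anything Jaeger-specific:

# generate roughly 12 MB of newline-delimited bulk actions
awk 'BEGIN {
  pad = sprintf("%250s", ""); gsub(/ /, "x", pad)
  for (i = 0; i < 40000; i++) {
    print "{\"index\":{\"_index\":\"jaeger-test\",\"_type\":\"_doc\"}}"
    print "{\"field\":\"" pad "\"}"
  }
}' > bulk_body.ndjson

# POST it to the same endpoint Jaeger uses (replace *** as in the original command)
curl -s -o /dev/null -w "%{http_code}\n" \
  -XPOST "https://***.amazonaws.com/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @bulk_body.ndjson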

@kalta
Author

kalta commented Apr 25, 2020

OK, but then the question would be why the ES_BULK_SIZE option does not work. I set it to a much lower value than the AWS limit (1 KB, via -e ES_BULK_SIZE=1000). Maybe it is not in the correct format?
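One way to rule out an environment-variable parsing problem would be to pass the same settings as explicit command-line flags to the all-in-one binary. A sketch, assuming the es.bulk.* collector flags that back those variables in 1.17:

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS="https://***.amazonaws.com/" \
  -e QUERY_BASE_PATH=/jaeger \
  -p 16686:16686 \
  -p 9411:9411 \
  -p 14269:14269 \
  jaegertracing/all-in-one:1.17.0 \
  --es.bulk.size=1000 \
  --es.bulk.actions=1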

@ledor473
Member

I'm not entirely sure, but ES_BULK_SIZE, which translates to BulkSize in the elastic client used by Jaeger, seems to control the minimum payload size:

Now, when does bulk processor send these batches? There are 3 parameters that you can control:
...
2. When the batch exceeds a certain size (in bytes).
...

Maybe one of the applications is buffering a lot of spans, which causes that behavior?
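A quick way to check whether spans are piling up in the collector would be to watch its metrics; the admin port 14269 is already published in the original command. A sketch (the exact metric names differ between Jaeger versions, so the grep is deliberately broad):

curl -s http://localhost:14269/metrics | grep -iE 'bulk|span'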

@kalta kalta closed this as completed Apr 28, 2020
@kalta kalta reopened this Apr 28, 2020
@mrgasparov

mrgasparov commented May 18, 2022

I'm facing the same issue using Open Distro 7.6.2 (unfortunately this is the only version available on our cloud provider). Is there a way to limit the number of spans sent in one bulk request using the jaeger-operator Helm chart? That would definitely be the easiest solution to this problem.
