Max bytes length exceeded when percolating a document with huge geo_shape property #83418
Comments
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-analytics-geo (Team:Analytics)
I was able to reproduce it; the error that is generated is the max_bytes_length_exceeded_exception from the report above.

It basically fails in PercolateQueryBuilder when we try to build the MemoryIndex (PercolateQueryBuilder.java, line 526 at commit 3157d1c).
I don't have much knowledge of the inner workings of the percolator, but the issue seems related to geo_shape doc values. In this case the binary doc value is very big, and the MemoryIndex has a limit on its size, so when it tries to store the value in MemoryIndex#storeDocValues it throws an error. As a workaround, you just need to disable doc_values for the geo_shape field in the mapping.

I need to dig more to see if we can disable them automatically, as I think they are not needed here.
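For illustration, a mapping along these lines would apply the workaround (the field name `geometry` is an assumption; use whatever name the geo_shape field has in the real mapping):

```
PUT dt_product_polygon
{
  "mappings": {
    "properties": {
      "geometry": {
        "type": "geo_shape",
        "doc_values": false
      }
    }
  }
}
```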
It turns out that this is probably a Lucene bug, so I opened https://issues.apache.org/jira/browse/LUCENE-10405 to address it.
The issue has been fixed upstream and it will be released in Lucene 9.1. For the time being the workaround would be to disable doc values. As there is nothing more to do here, I hope you don't mind if I close the issue. Thanks for reporting!
Elasticsearch Version
7.16.3
Installed Plugins
No response
Java Version
bundled
OS Version
Linux 1f98d58db442 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Problem Description
I have a system using three types of documents: data documents with a geo_shape property, workflows that use some of that data, and search-request documents that associate data with workflows through a percolate query.

A workflow uses an Elasticsearch query for each piece of data it needs, so when a workflow is created it can easily find all of its data.

The aim is also to know which workflows are waiting for newly created data: when a new data document is created, we run a percolate query against the search-request index to retrieve all workflows that need it.

The problem: when a data document with a huge geo_shape is submitted for percolation (even if the search-request index is empty), the request fails with max_bytes_length_exceeded_exception.

If I remove the geo_shape property from the data mapping, the problem goes away.

I submitted this case to discuss.elastic.co but did not get any response: https://discuss.elastic.co/t/percolation-with-document-containing-huge-geo-shape/294557/2
Steps to Reproduce
The data type is "product_polygon".
The mapping of dt_product_polygon (i.e. the data index) is:
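A minimal sketch of such a mapping, assuming the shape field is named `geometry`:

```
PUT dt_product_polygon
{
  "mappings": {
    "properties": {
      "geometry": { "type": "geo_shape" }
    }
  }
}
```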
The mapping of sr_product_polygon (i.e. the search-request index) is:
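A minimal sketch, assuming the percolator field is named `query`; the search-request index also needs the same `geometry` field so that stored geo queries can be parsed:

```
PUT sr_product_polygon
{
  "mappings": {
    "properties": {
      "query":    { "type": "percolator" },
      "geometry": { "type": "geo_shape" }
    }
  }
}
```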
One document that contains a huge geo_shape: data.zip
The request that triggers the problem (even if sr_product_polygon is empty):
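A sketch of such a request, with a small polygon standing in for the huge one from data.zip (field names as assumed above):

```
GET sr_product_polygon/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "geometry": {
          "type": "polygon",
          "coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]]]
        }
      }
    }
  }
}
```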
with the result (a max_bytes_length_exceeded_exception):
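The shape of the failure is roughly the following (the reason string and byte count here are illustrative, not the exact values from this reproduction):

```
{
  "error": {
    "root_cause": [
      {
        "type": "max_bytes_length_exceeded_exception",
        "reason": "bytes can be at most 32766 in length; got 123456"
      }
    ],
    "type": "search_phase_execution_exception"
  },
  "status": 400
}
```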
Logs (if relevant)
No response