Inconsistencies in response codes for _bulk API request body JSON errors #92443
Comments
The difference is that Elasticsearch must read every metadata line (e.g. the `{"index": ...}` action line) to know where each document begins and ends, so a malformed metadata line makes the whole request unparseable and returns a 400. A malformed document source, by contrast, only fails that one item; had the source been valid JSON, then it would index the document as usual.

As this is more of a user question than something needing action from the Elasticsearch team, we'd like to direct these kinds of things to the Elasticsearch forum. If you can stop by there, we'd appreciate it. This allows us to use GitHub for verified bug reports, feature requests, and pull requests. There's an active community in the forum that should be able to help get an answer to your question. As such, I hope you don't mind that I close this.
We need to separate two error modes here:
1. When ingesting documents into Elasticsearch, errors such as mapping conflicts, ingest pipeline failures, or an overloaded cluster can occur. All of these can be resolved by changing the configuration or assets of the Elasticsearch cluster. In general we are trying to become more lenient with mappings so that less data is rejected and case 1 is hit much less frequently; see #89743 for related discussions.

2. Then there is case 2, where there is no way out: no change to mappings or ingest pipelines, and no retry, can fix the issue, because the JSON itself is invalid.

On the one hand I like that all the other docs with valid JSON are still ingested, and I expect that in most production environments the shipped JSON docs can be assumed to be mostly valid. But it makes debugging really hard without custom error handling on the client side. This becomes even trickier if the client side is not owned. As a potential solution (feature request ;-) ), what if we could pass a param as part of the bulk request?
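To illustrate the client-side error handling this comment says is currently required, here is a minimal Python sketch (`failed_items` is a hypothetical helper, not part of any official Elasticsearch client) that walks the `items` array of a 200 bulk response and collects the entries whose per-item status indicates a failure:

```python
import json

def failed_items(bulk_response):
    """Collect per-item failures from a _bulk response.

    A 200 response with "errors": true still contains item-level
    errors; each item maps an action name ("index", "create", ...)
    to a result carrying its own HTTP-style status code.
    """
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            if result.get("status", 200) >= 400:
                failures.append({"action": action, **result})
    return failures

# Abridged response modeled on the one quoted in this issue.
response = json.loads('''
{"took": 3, "errors": true, "items": [
  {"index": {"_index": "test", "_id": "y3rbKYUBDnKSxzO74aQ8",
             "status": 400,
             "error": {"type": "mapper_parsing_exception",
                       "reason": "failed to parse"}}}
]}
''')

for failure in failed_items(response):
    print(failure["action"], failure["status"], failure["error"]["type"])
```

Every client that cares about partial failures ends up writing a loop like this, which is what makes the 200-with-embedded-errors mode easy to miss.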
I think it would make more sense to check for well-formedness of the JSON on the sending client(s). Today there's no need for the coordinating node to even parse the JSON it receives, it just passes the raw bytes straight through to the various primaries. If we were to validate the docs on the coordinating node first we'd need to do this additional parsing work, imposing extra load in the cluster and potentially introducing an indexing bottleneck.
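The client-side well-formedness check suggested here can be sketched in a few lines: since each line of a `_bulk` body must be a standalone JSON document, parsing the lines one by one catches errors before any bytes reach the cluster. This is a minimal sketch, not an official client API; `validate_bulk_body` is a hypothetical helper:

```python
import json

def validate_bulk_body(body):
    """Return (line_number, message) for every malformed NDJSON line.

    The _bulk body is newline-delimited JSON, so each non-empty line
    can be validated independently on the client before sending.
    """
    problems = []
    for lineno, line in enumerate(body.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, str(exc)))
    return problems

# A document line with an unquoted value, as in this issue.
body = '{"index":{"_index":"test"}}\n{"name":tom}\n'
print(validate_bulk_body(body))  # reports line 2 as malformed
```

The trade-off named in the comment applies in reverse on the client: this doubles the parsing work there, but keeps the coordinating node on its raw-bytes fast path.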
Pinging @elastic/es-distributed (Team:Distributed)
related: #60442
Elasticsearch Version
8.7.0-SNAPSHOT
Installed Plugins
No response
Java Version
bundled
OS Version
macOS
Problem Description
I was investigating issues relating to invalid JSON in _bulk requests. I noticed that depending on where in the request body the JSON error is found, the response code is different:
The following returns a 400 with:
{"error":{"root_cause":[{"type":"x_content_parse_exception","reason":"[1:27] Unrecognized token 'test': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')\n at ...
(note the missing quotes around `test`), whereas the following returns a 200 with:
{"took":3,"errors":true,"items":[{"index":{"_index":"test","_id":"y3rbKYUBDnKSxzO74aQ8","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"x_content_parse_exception","reason":"[1:25] Unrecognized token 'tom': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')\n at
(note the missing quotes around `tom`). Why should these equivalent parsing errors return different response codes?
Steps to Reproduce
Call the _bulk API with the data detailed above.
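The exact request bodies are not reproduced above. Assuming the 400 came from an unquoted token in an action (metadata) line and the per-item error from an unquoted token in a document source line, the two cases look roughly like the following sketch, which only shows which line of each body is malformed JSON; the hypothetical bodies and the `first_bad_line` helper are reconstructions, not the reporter's originals:

```python
import json

# Case 1: malformed action/metadata line -> whole request rejected (HTTP 400).
body_400 = '{"index":{"_index":test}}\n{"name":"tom"}\n'

# Case 2: malformed document source line -> HTTP 200 with a per-item 400.
body_200 = '{"index":{"_index":"test"}}\n{"name":tom}\n'

def first_bad_line(body):
    """Return the 1-based number of the first NDJSON line that fails to parse."""
    for lineno, line in enumerate(body.splitlines(), start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return lineno
    return None

print(first_bad_line(body_400))  # 1: the metadata line is invalid
print(first_bad_line(body_200))  # 2: the document source line is invalid
```

Sent to `POST /_bulk` with `Content-Type: application/x-ndjson`, the first body should reproduce the whole-request 400 and the second the 200-with-item-error, matching the responses quoted in the problem description.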
Logs (if relevant)
No response