-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add failure store status in index response of data streams #112816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add failure store status in index response of data streams #112816
Conversation
|
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a really good first pass. Had a couple of comments and suggestions.
I noticed that in the example above:
{
"create": {
"_index": "ds-no-fs",
"_id": "hxDDvJEB_J3Inuia2jj3",
"status": 400,
"error": {
"type": "document_parsing_exception",
"reason": "[1:153] failed to parse field [count] of type [long] in document with id 'hxDDvJEB_J3Inuia2jj3'. Preview of field's value: 'bla'",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "For input string: \"bla\""
},
"failure_store": "not_enabled"
}
}
}
The failure_store field is inside of the error object, is that accurate still?
I'm also wondering if it would be reasonable to add the original error information on the response even if the document is redirected to the failure store. I know in earlier discussions we had considered this as a useful way to surface that info to clients while still signaling that we accepted the document, but I don't know if we've soured on the idea since then. @dakrone any thoughts on that?
server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/rest/action/document/RestBulkAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java
Outdated
Show resolved
Hide resolved
| IndexDocFailureStoreStatus indexDocFailureStoreStatus = IndexDocFailureStoreStatus.NOT_APPLICABLE_OR_UNKNOWN; | ||
| if (docWriteRequest instanceof IndexRequest indexRequest) { | ||
| executedPipelines = indexRequest.getExecutedPipelines(); | ||
| if (indexRequest.isWriteToFailureStore()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this where this field needs to be serialized for? I'm wondering if it makes more sense to derive this in the BulkOperation listener for the shard operation instead of serializing this flag across the wire. I'm mostly poking at this because I don't actually like this isWriteToFailureStore flag on the index request and would prefer if we don't immortalize it in the wire serialization. It was meant to be a stopgap originally. That said, I don't have an alternative for it... If we think this is definitely the better place to put this then I'm willing to leave it be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't know, I will look into an alternative way then first, this was already in use in the code so I thought it was here to stay.
server/src/main/java/org/elasticsearch/action/bulk/BulkOperation.java
Outdated
Show resolved
Hide resolved
Ah good point, I updated it. So, I made the decision to not nest it so it's on the top level and can be accessed in the same way in a client. |
| capabilities: [ 'failure_store_status' ] | ||
| - method: PUT | ||
| path: /_bulk | ||
| capabilities: [ 'failure_store_status' ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this test is ignored because of #113231.
|
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Mary!
Leaving this extra note as documentation: @gmarouli and I spoke and there are some options that Mary will explore to clean up the transient writeToFailureStore field on the IndexRequest. If those options don't pan out, then we will circle back and add the field to the request's wire serialization to avoid any issues going forward.
| } | ||
|
|
||
| /** | ||
| * Transient flag denoting that the local request should be routed to a failure store. Not persisted across the wire. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…12816) The failure store status is a flag that indicates how the failure store was used or could be used if enabled. The user can be informed about the usage of the failure store in the following way: When relevant we add the optional field `failure_store` . The field will be omitted when the use of the failure store is not relevant. For example, if a document was successfully indexed in a data stream, if a failure concerns an index or if the opType is not index or create. In more detail: - when we have a “success” create/index response, the field `failure_store` will not be present if the documented was indexed in a backing index. Otherwise, if it got stored in the failure store it will have the value `used`. - when we have a “rejected“ create/index response, meaning the document was not persisted in elasticsearch, we return the field `failure_store` which is either `not_enabled`, if the document could have ended up in the failure store if it was enabled, or `failed` if something went wrong and the document was not persisted in the failure store, for example, the cluster is out of space and in read-only mode. We chose to make it an optional field to reduce the impact of this field on a bulk response. The value will exist in the java object but it will not be returned to the user. The only values that will be displayed are: - `used`: meaning this document was indexed in the failure store - `not_enabled`: meaning this document was rejected but could have been stored in the failure store if it was applicable. - `failed`: meaning this failed document, failed to be stored in the failure store. Example: ``` "errors": true, "took": 202, "items": [ { "create": { "_index": ".fs-my-ds-2024.09.04-000002", "_id": "iRDDvJEB_J3Inuia2zgH", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 6, "_primary_term": 1, "status": 201, "failure_store": "used" } }, { "create": { "_index": "ds-no-fs", "_id": "hxDDvJEB_J3Inuia2jj3", "status": 400, "error": { "type": "document_parsing_exception", "reason": "[1:153] failed to parse field [count] of type [long] in document with id 'hxDDvJEB_J3Inuia2jj3'. Preview of field's value: 'bla'", "caused_by": { "type": "illegal_argument_exception", "reason": "For input string: \"bla\"" } } }, "failure_store": "not_enabled" }, { "create": { "_index": ".ds-my-ds-2024.09.04-000001", "_id": "iBDDvJEB_J3Inuia2jj3", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 7, "_primary_term": 1, "status": 201 } } ] ``` (cherry picked from commit f4f075a)
…113245) The failure store status is a flag that indicates how the failure store was used or could be used if enabled. The user can be informed about the usage of the failure store in the following way: When relevant we add the optional field `failure_store` . The field will be omitted when the use of the failure store is not relevant. For example, if a document was successfully indexed in a data stream, if a failure concerns an index or if the opType is not index or create. In more detail: - when we have a “success” create/index response, the field `failure_store` will not be present if the documented was indexed in a backing index. Otherwise, if it got stored in the failure store it will have the value `used`. - when we have a “rejected“ create/index response, meaning the document was not persisted in elasticsearch, we return the field `failure_store` which is either `not_enabled`, if the document could have ended up in the failure store if it was enabled, or `failed` if something went wrong and the document was not persisted in the failure store, for example, the cluster is out of space and in read-only mode. We chose to make it an optional field to reduce the impact of this field on a bulk response. The value will exist in the java object but it will not be returned to the user. The only values that will be displayed are: - `used`: meaning this document was indexed in the failure store - `not_enabled`: meaning this document was rejected but could have been stored in the failure store if it was applicable. - `failed`: meaning this failed document, failed to be stored in the failure store. Example: ``` "errors": true, "took": 202, "items": [ { "create": { "_index": ".fs-my-ds-2024.09.04-000002", "_id": "iRDDvJEB_J3Inuia2zgH", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 6, "_primary_term": 1, "status": 201, "failure_store": "used" } }, { "create": { "_index": "ds-no-fs", "_id": "hxDDvJEB_J3Inuia2jj3", "status": 400, "error": { "type": "document_parsing_exception", "reason": "[1:153] failed to parse field [count] of type [long] in document with id 'hxDDvJEB_J3Inuia2jj3'. Preview of field's value: 'bla'", "caused_by": { "type": "illegal_argument_exception", "reason": "For input string: \"bla\"" } } }, "failure_store": "not_enabled" }, { "create": { "_index": ".ds-my-ds-2024.09.04-000001", "_id": "iBDDvJEB_J3Inuia2jj3", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 7, "_primary_term": 1, "status": 201 } } ] ``` (cherry picked from commit f4f075a)
It would be useful for clients to be able to tell whether their document was indexed successfully, whether it went into the failure store, or it could have been stored in the failure store if it was enabled.
We propose to adjust the response the user sees in the following way:
failure_storewhich is optional and it will be omitted when the use of the failure store is not relevant. For example, if a document was successfully indexed in a data stream, if a failure concerns an index or if theopTypeis notindexorcreate.failure_storewill not be present if the documented was indexed in a backing index. Otherwise, if it got stored in the failure store it will have the valueusedfailure_storewhich is eithernot_enabled, if the document could have ended up in the failure store if it was enabled, orfailedif something went wrong and the document was not persisted in the failure store, for example, the cluster is out of space and in read-only mode.We chose to make it an optional field to reduce the impact of this field on a bulk response. The value will exist in the java object but it will not be returned to the user. The only values that will be displayed are:
used: meaning this document was indexed in the failure storenot_enabled: meaning this document was rejected but could have been stored in the failure store if it was applicable.failed: meaning this failed document, failed to be stored in the failure store.Example: