[Transform] Unattended transforms are failing due to missing configuration #107266
Comments
Pinging @elastic/ml-core (Team:ML) |
This may be resolved when we introduce the change to abort failing transforms during cluster restarts: #100891. The only explanation I can think of is that two threads are at work: thread1 is removing the persistent task during a node shutdown, and thread2 is trying to update the transform configuration. At least with #100891, the error won't fail the transform, and instead we'll retry it on another node (or on the same node when it comes back online).
The above was only partially true. There is another set of failures that seems to be followed by a Transform delete:
But the Indexer thread is still running and will eventually fail and error out.
Seems to come from this: https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/epm/elasticsearch/transform/remove.ts#L27 So likely we're calling the Stop API beforehand, or at least we are calling it as part of the delete API (via |
Yes, the backend is calling |
When `_stop?wait_for_checkpoint=false` or `_stop?force=true&wait_for_checkpoint=false` is called, there is a small chance that the Transform Indexer thread will still run if it was scheduled before the stop API was called but the threadpool had not yet run the executable. The `onStart` method now checks the state of the indexer before executing. This will mitigate errors caused by reading from Transform internal indices while the Task is stopped or deleted. This does not affect calls with `wait_for_checkpoint=true`, because the indexer state will remain `INDEXING` until the checkpoint is finished. Relate elastic#107266
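The state check described above can be illustrated with a small, self-contained model. This is only a sketch and not the actual `TransformIndexer` code: the class `IndexerStartGuard`, its methods, and the transitions in `stopWithoutWaiting` are hypothetical names and simplifications chosen for illustration. It assumes the scheduler moves the indexer to `INDEXING` and that a stop call with `wait_for_checkpoint=false` moves it out of `INDEXING` before the threadpool runs `onStart`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the race; not the Elasticsearch implementation.
final class IndexerStartGuard {
    enum IndexerState { STARTED, INDEXING, STOPPING, STOPPED, ABORTING }

    private final AtomicReference<IndexerState> state =
        new AtomicReference<>(IndexerState.STARTED);
    private int configLookups = 0;

    // Scheduler side: claim the run by moving STARTED -> INDEXING.
    boolean trigger() {
        return state.compareAndSet(IndexerState.STARTED, IndexerState.INDEXING);
    }

    // Assumed effect of a stop call with wait_for_checkpoint=false:
    // a running indexer is asked to stop, an idle one is stopped directly.
    void stopWithoutWaiting() {
        state.updateAndGet(s -> {
            if (s == IndexerState.INDEXING) return IndexerState.STOPPING;
            if (s == IndexerState.STARTED) return IndexerState.STOPPED;
            return s;
        });
    }

    // Threadpool side: runs some time after trigger(). The guard skips the
    // work (here, a counted "config lookup") unless the state is still INDEXING.
    boolean onStart() {
        if (state.get() != IndexerState.INDEXING) {
            return false; // stopped or aborted in the meantime: do nothing
        }
        configLookups++; // stands in for reading the Transform internal indices
        return true;
    }

    int configLookups() {
        return configLookups;
    }

    IndexerState state() {
        return state.get();
    }
}
```

In this model, calling `trigger()`, then `stopWithoutWaiting()`, then `onStart()` performs no lookup, while `trigger()` followed directly by `onStart()` proceeds as usual.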
Check if the Transform was aborted before failing due to missing Transform config. If the `DELETE _transform/id` API is called while the Indexer is looking up the Config, it is possible the delete API will remove the Config before the Indexer can retrieve the Config. Rather than fail the Transform, the indexer will check if the delete API has been called via the `ABORTING` state and move into its graceful shutdown sequence. Fix elastic#107266
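The decision described in this fix can be sketched as a small pure function. This is an illustrative model only: `MissingConfigHandler`, `Outcome`, and `onConfigLookup` are hypothetical names, and the config is represented as an `Optional<String>` rather than the real Transform config type. It assumes the delete API has already moved the indexer to `ABORTING` by the time the failed lookup is handled.

```java
import java.util.Optional;

// Hypothetical sketch of the decision; not the Elasticsearch implementation.
final class MissingConfigHandler {
    enum State { INDEXING, ABORTING }
    enum Outcome { CONTINUE, GRACEFUL_SHUTDOWN, FAIL }

    // Decide what to do after a config lookup, given the current indexer state.
    static Outcome onConfigLookup(Optional<String> config, State state) {
        if (config.isPresent()) {
            return Outcome.CONTINUE; // config found: carry on indexing
        }
        if (state == State.ABORTING) {
            // Config is missing because a delete is in progress:
            // shut down quietly instead of marking the Transform as failed.
            return Outcome.GRACEFUL_SHUTDOWN;
        }
        return Outcome.FAIL; // config is missing for some other reason
    }
}
```

A missing config still fails the Transform when no delete is in progress, so unrelated causes of a missing config are not hidden by the check.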
Description
Issue seen ~1-3 times per week
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java#L385-L386
Next steps: