Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] Improve reporting status of the transform that is about to finish #95672

Merged
merged 5 commits into from May 5, 2023

Conversation

przemekwitek
Copy link
Contributor

@przemekwitek przemekwitek commented Apr 28, 2023

This PR fixes 2 places that may yield issues with status reporting when the transform finishes:

  1. it makes the TransformTask.shutdown method wait until the persistent task disappears.
  2. It makes the _stats action report stopping rather than stopped state if there is no persistent task seen by the TaskManager but there still is a task in the ClusterState.

Additionally, this PR adds a test (TransformRobustnessIT.testCreateAndDeleteTransformInALoop) which shows that every time we are about to delete the transform, the persistent task is already gone.

Relates #95327

@przemekwitek przemekwitek force-pushed the debug_failing_test branch 9 times, most recently from 9623618 to a0b53e0 Compare May 4, 2023 12:43
@przemekwitek przemekwitek changed the title [Transform] Fix race condition in TransformIndexer [Transform] Improve reporting status of the transform that is about to finish May 5, 2023
@przemekwitek przemekwitek removed the WIP label May 5, 2023
@przemekwitek przemekwitek marked this pull request as ready for review May 5, 2023 08:00
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label May 5, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Hi @przemekwitek, I've created a changelog YAML for you.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although I think it would be clearer if you renamed one variable.

@przemekwitek przemekwitek merged commit d9c169f into elastic:main May 5, 2023
12 checks passed
@przemekwitek przemekwitek deleted the debug_failing_test branch May 5, 2023 11:54
@hendrikmuhs
Copy link
Contributor

2. It makes the `_stats` action report `stopping` rather than `stopped` state if there is no persistent task seen by the `TaskManager` but there still is a task in the `ClusterState`.

This is great.

1. it makes the `TransformTask.shutdown` method wait until the persistent task disappears.

The stop API has a wait_completion query parameter, the added code is a duplication of that in a different place. You find it in the transport implementation of _stop.

The change breaks wait_completion=false.

I therefore think we should revert that part. It doesn't solve the fundamental problem that if the indexer is running _stop does not unregister the persistent task. That means that with a hanging indexer thread it is not possible to remove the p-task using _stop.

@przemekwitek
Copy link
Contributor Author

2. It makes the `_stats` action report `stopping` rather than `stopped` state if there is no persistent task seen by the `TaskManager` but there still is a task in the `ClusterState`.

This is great.

1. it makes the `TransformTask.shutdown` method wait until the persistent task disappears.

The stop API has a wait_completion query parameter, the added code is a duplication of that in a different place. You find it in the transport implementation of _stop.

The change breaks wait_completion=false.

I therefore think we should revert that part. It doesn't solve the fundamental problem that if the indexer is running _stop does not unregister the persistent task. That means that with a hanging indexer thread it is not possible to remove the p-task using _stop.

Right, removing the p-task using stop is a separate problem that I'd like to tackle.
I've verified (using the TransformRobustnessIT.testCreateAndDeleteTransformInALoop test case) that this change was indeed unnecessary.

The partial revert PR is ready for review: #96060

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :ml/Transform Transform Team:ML Meta label for the ML team v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants