Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DefaultCheckpointProviderTests#testHandlingShardFailures doesn't work #104533

Closed
DaveCTurner opened this issue Jan 18, 2024 · 1 comment · Fixed by #106793
Closed

DefaultCheckpointProviderTests#testHandlingShardFailures doesn't work #104533

DaveCTurner opened this issue Jan 18, 2024 · 1 comment · Fixed by #106793
Assignees
Labels
:ml/Transform Transform Team:ML Meta label for the ML team >test Issues or PRs that are addressing/adding tests

Comments

@DaveCTurner
Copy link
Contributor

DefaultCheckpointProviderTests#testHandlingShardFailures ends with a latch.await(10, TimeUnit.SECONDS); but that doesn't check that the latch has been counted down. In fact it isn't counted down, this test always waits for 10s before timing out and passing. I'm pretty sure it needs attention, either fixing with a safeAwait(latch), or perhaps just removing if it's no longer relevant.

@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :ml Machine learning labels Jan 18, 2024
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Jan 18, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@przemekwitek przemekwitek added :ml/Transform Transform and removed :ml Machine learning labels Jan 18, 2024
@prwhelan prwhelan self-assigned this Mar 26, 2024
prwhelan added a commit to prwhelan/elasticsearch that referenced this issue Mar 26, 2024
When there are no remote or local clusters for a given source index, we
call the listener's `onFailure` method with a `CheckpointException`.
A running transform will fail and retry, eventually moving into an
unhealthy and failed state.  Any call to the stats API will note the
checkpoint failure and return.

This fixes a timeout issue calling the Transform stats API and prevents
the Transform from being stuck in indexing.

Fix elastic#106790
Fix elastic#104533
prwhelan added a commit to prwhelan/elasticsearch that referenced this issue Mar 27, 2024
When there are no remote or local clusters for a given source index, we
call the listener's `onFailure` method with a `CheckpointException`.
A running transform will fail and retry, eventually moving into an
unhealthy and failed state.  Any call to the stats API will note the
checkpoint failure and return.

This fixes a timeout issue calling the Transform stats API and prevents
the Transform from being stuck in indexing.

Fix elastic#106790
Fix elastic#104533
elasticsearchmachine pushed a commit that referenced this issue Mar 27, 2024
When there are no remote or local clusters for a given source index, we
call the listener's `onFailure` method with a `CheckpointException`.
A running transform will fail and retry, eventually moving into an
unhealthy and failed state.  Any call to the stats API will note the
checkpoint failure and return.

This fixes a timeout issue calling the Transform stats API and prevents
the Transform from being stuck in indexing.

Fix #106790
Fix #104533
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml/Transform Transform Team:ML Meta label for the ML team >test Issues or PRs that are addressing/adding tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants