
@valeriy42
Contributor

@valeriy42 valeriy42 commented Oct 31, 2025

What's going on:

  • Forecasts start in SCHEDULED status, then transition to STARTED, then FINISHED
  • The document may be indexed but not immediately searchable

Race condition:

  • Document exists but is still SCHEDULED when first checked, and assertBusy may catch it in this state if the native process is slow to start
  • waitForecastToFinish may check before the document is indexed or searchable
  • Document is indexed but not refreshed, so getForecastStats() doesn't find it initially
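The visibility gap in the last bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: `FakeForecastIndex` and its methods are hypothetical names invented for illustration, not Elasticsearch or test-framework APIs. It models the general behaviour that a write becomes searchable only after a refresh.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical toy index: writes become searchable only after refresh(). */
class FakeForecastIndex {
    private final List<String> buffered = new ArrayList<>();   // indexed, not yet searchable
    private final List<String> searchable = new ArrayList<>(); // visible to search

    /** Accepts a document, but does not make it visible to search yet. */
    void index(String forecastId) {
        buffered.add(forecastId);
    }

    /** Moves all buffered documents into the searchable set. */
    void refresh() {
        searchable.addAll(buffered);
        buffered.clear();
    }

    /** Looks only at documents that have already been refreshed. */
    Optional<String> search(String forecastId) {
        return searchable.stream().filter(forecastId::equals).findFirst();
    }
}

class VisibilityDemo {
    public static void main(String[] args) {
        FakeForecastIndex index = new FakeForecastIndex();
        index.index("forecast-1");
        // The document exists but a search issued now does not find it.
        System.out.println(index.search("forecast-1").isPresent());
        index.refresh();
        // After the refresh the same search succeeds.
        System.out.println(index.search("forecast-1").isPresent());
    }
}
```

A status check that runs between `index` and `refresh` sees nothing, which is the window the test could fall into.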

Hence, this PR mitigates the possible race condition by doing two things:

  • Refresh the index before checking forecast stats to ensure recently indexed documents are visible.
  • Modify waitForecastToFinish to first wait for document existence, then wait for FINISHED status.
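The two-stage wait described above can be sketched roughly as follows. This is not the code from the PR: `TwoStageWait`, `Status`, `waitForStatus`, and `waitUntilFinished` are hypothetical names, and a `null` lookup result stands in for "document not visible yet". The polling loop is a simplified stand-in for an `assertBusy`-style helper.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

class TwoStageWait {
    /** Hypothetical status values mirroring the states named in the description. */
    enum Status { SCHEDULED, STARTED, FINISHED }

    /** Polls until the lookup returns a non-null status in the accepted set, or the timeout elapses. */
    static Status waitForStatus(Supplier<Status> lookup, Set<Status> accepted, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Status current = lookup.get(); // null models "document not visible yet"
            if (current != null && accepted.contains(current)) {
                return current;
            }
            if (System.nanoTime() >= deadline) {
                throw new AssertionError("timed out; last status: " + current);
            }
            try {
                Thread.sleep(10); // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    /** Stage 1: wait for the document to exist in any known state. Stage 2: wait for FINISHED. */
    static Status waitUntilFinished(Supplier<Status> lookup, long timeoutMillis) {
        waitForStatus(lookup, EnumSet.of(Status.SCHEDULED, Status.STARTED, Status.FINISHED), timeoutMillis);
        return waitForStatus(lookup, EnumSet.of(Status.FINISHED), timeoutMillis);
    }

    public static void main(String[] args) {
        // Scripted sequence of observations: missing, then each state in turn.
        Iterator<Status> observations =
            Arrays.asList(null, Status.SCHEDULED, Status.STARTED, Status.FINISHED).iterator();
        System.out.println(waitUntilFinished(observations::next, 5_000));
    }
}
```

Separating the stages means a timeout in the first stage points at a missing or invisible document, while a timeout in the second points at a forecast that never completed.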

Fixes #117740

…Added logic to wait for forecast documents to exist in non-terminal states (SCHEDULED, STARTED) before checking for FINISHED status. Included index refresh to ensure visibility of recently indexed forecast stats documents.
@valeriy42 valeriy42 added the >test (Issues or PRs that are addressing/adding tests), :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.3.0 labels Oct 31, 2025
@valeriy42 valeriy42 requested a review from Copilot October 31, 2025 13:35
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a flaky test (ForecastIT#testOverflowToDisk) by addressing race conditions in forecast status verification. The fix ensures the forecast document is visible and properly tracked through its state transitions before asserting completion.

Key Changes:

  • Modified waitForecastToFinish to perform a two-stage wait: first for the forecast document to exist in any of the states SCHEDULED, STARTED, or FINISHED, then specifically for FINISHED status
  • Added index refresh in waitForecastStatus to ensure recently indexed forecast documents are visible before status checks

Comment on lines +249 to +259
int timeoutSeconds = inFipsJvm() ? 300 : 90;
// First wait for the forecast document to exist and be in a non-terminal state
// This handles the race condition where the document may be SCHEDULED or STARTED initially
waitForecastStatus(
    timeoutSeconds,
    jobId,
    forecastId,
    ForecastRequestStats.ForecastRequestStatus.SCHEDULED,
    ForecastRequestStats.ForecastRequestStatus.STARTED,
    ForecastRequestStats.ForecastRequestStatus.FINISHED
);

Copilot AI Oct 31, 2025

Corrected spelling of 'flacky' to 'flaky' in the PR title.

@valeriy42 valeriy42 changed the title from "[ML] Fix flacky ForecastIT#testOverflowToDisk" to "[ML] Fix flaky ForecastIT#testOverflowToDisk" Oct 31, 2025
@valeriy42 valeriy42 self-assigned this Oct 31, 2025
@valeriy42 valeriy42 marked this pull request as ready for review October 31, 2025 13:52
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Member

@davidkyle davidkyle left a comment

LGTM

@valeriy42 valeriy42 merged commit 7aa001c into elastic:main Oct 31, 2025
34 checks passed


Development

Successfully merging this pull request may close these issues.

[CI] ForecastIT testOverflowToDisk failing
