[ML-DataFrame] fix starting a batch data frame after stopping at runtime #45340
…opped/ started within one run
Pinging @elastic/ml-core
 * @return checkpoint in progress or 0 if task/indexer is not active
 */
public long getInProgressCheckpoint() {
    return indexerState.equals(IndexerState.INDEXING) ? checkpoint + 1L : 0;
Removed for good: the next checkpoint does not depend on the indexer state.
@@ -200,9 +204,9 @@ protected void nodeOperation(AllocatedPersistentTask task, @Nullable DataFrameTr
final long lastCheckpoint = stateHolder.get().getCheckpoint();

if (lastCheckpoint == 0) {
    logger.trace("[{}] No checkpoint found, starting the task", transformId);
    startTask(buildTask, indexerBuilder, lastCheckpoint, startTaskListener);
This is the main fix: when the last checkpoint was 0, we started the task without loading the next checkpoint.
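To illustrate the comment above, here is a minimal, self-contained sketch of the difference between the two behaviours. It is not the plugin's code: `NextCheckpointSketch`, `storedCheckpoints`, and both method names are hypothetical, the in-memory map stands in for the internal index, and the assumption that the next checkpoint is numbered `lastCheckpoint + 1` is taken from the removed `getInProgressCheckpoint()` shown earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not the actual executor code.
class NextCheckpointSketch {
    // Stand-in for the internal index: checkpoint number -> stored checkpoint payload.
    static final Map<Long, String> storedCheckpoints = new HashMap<>();

    // Behaviour described as the bug: with lastCheckpoint == 0 the lookup is skipped,
    // so a next checkpoint that was already persisted is never loaded.
    static Optional<String> nextCheckpointBeforeFix(long lastCheckpoint) {
        if (lastCheckpoint == 0) {
            return Optional.empty();
        }
        return Optional.ofNullable(storedCheckpoints.get(lastCheckpoint + 1));
    }

    // Behaviour after the fix: the next checkpoint is looked up regardless of lastCheckpoint.
    static Optional<String> nextCheckpointAfterFix(long lastCheckpoint) {
        return Optional.ofNullable(storedCheckpoints.get(lastCheckpoint + 1));
    }

    public static void main(String[] args) {
        // A batch transform was stopped while working on its first checkpoint,
        // so checkpoint 1 exists although the last completed checkpoint is still 0.
        storedCheckpoints.put(1L, "in-progress checkpoint 1");

        System.out.println(nextCheckpointBeforeFix(0).isPresent()); // prints false
        System.out.println(nextCheckpointAfterFix(0).isPresent());  // prints true
    }
}
```

The sketch only models the lookup decision; in the real task the loaded checkpoint would be handed to the indexer when the task is started.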

if (nextCheckpoint.isEmpty()) {
    // corner case which should not happen ;-)
    // reset the position to force a full re-run with checkpoint creation
getTransformCheckpoint always returns a checkpoint object: if it does not find a document in the internal index, it returns an empty object.
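The empty-object convention described here can be sketched as follows. This is an illustration under assumptions, not the real API: the `Checkpoint` class, its `EMPTY` constant, `getCheckpoint`, `resume`, and the `internalIndex` map are all hypothetical stand-ins; only the `isEmpty()` check and the "reset the position to force a full re-run" reaction come from the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a lookup that never returns null.
class EmptyCheckpointSketch {
    // Simplified stand-in for a checkpoint document.
    static final class Checkpoint {
        static final Checkpoint EMPTY = new Checkpoint(0L);
        final long number;

        Checkpoint(long number) {
            this.number = number;
        }

        boolean isEmpty() {
            return this == EMPTY;
        }
    }

    // Stand-in for the internal index: checkpoint number -> checkpoint document.
    static final Map<Long, Checkpoint> internalIndex = new HashMap<>();

    // Always returns an object: the EMPTY instance when no document is found.
    static Checkpoint getCheckpoint(long number) {
        return internalIndex.getOrDefault(number, Checkpoint.EMPTY);
    }

    // Describes what the caller does with the result instead of null-checking it.
    static String resume(long lastCheckpoint) {
        Checkpoint next = getCheckpoint(lastCheckpoint + 1);
        if (next.isEmpty()) {
            // Corner case: nothing stored, so start over and create the checkpoint.
            return "reset position and force a full re-run";
        }
        return "resume checkpoint " + next.number;
    }

    public static void main(String[] args) {
        System.out.println(resume(0)); // prints "reset position and force a full re-run"
        internalIndex.put(1L, new Checkpoint(1L));
        System.out.println(resume(0)); // prints "resume checkpoint 1"
    }
}
```

Returning a sentinel object keeps the caller free of null checks, at the cost of having to remember the `isEmpty()` test before using the result.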
.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java
…/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
run elasticsearch-ci/1
…ime (elastic#45340) fix loading of next checkpoint after data frame transform has been stopped/started within one run closes elastic#45339
fix loading of next checkpoint after data frame transform has been stopped/started within one run
closes #45339
The logic introduced in #44219 wrongly assumed that there is no next checkpoint if no checkpoint has been created yet. The fix properly loads the next checkpoint.