[ML-DataFrame] fix starting a batch data frame after stopping at runtime #45340

Merged
merged 5 commits into from
Aug 9, 2019

hendrikmuhs
Contributor

fix loading of next checkpoint after data frame transform has been stopped/started within one run

closes #45339

The logic introduced in #44219 wrongly assumed that there is no next checkpoint if no checkpoint has been created yet. The fix properly loads the next checkpoint.

@elasticmachine
Collaborator

Pinging @elastic/ml-core

 * @return checkpoint in progress or 0 if task/indexer is not active
 */
public long getInProgressCheckpoint() {
    return indexerState.equals(IndexerState.INDEXING) ? checkpoint + 1L : 0;
Contributor Author

Removed for good: the next checkpoint does not depend on the indexer state.

@@ -200,9 +204,9 @@ protected void nodeOperation(AllocatedPersistentTask task, @Nullable DataFrameTr
    final long lastCheckpoint = stateHolder.get().getCheckpoint();

    if (lastCheckpoint == 0) {
        logger.trace("[{}] No checkpoint found, starting the task", transformId);
        startTask(buildTask, indexerBuilder, lastCheckpoint, startTaskListener);
Contributor Author

This is the main fix: previously, we started the task without loading the next checkpoint when the last checkpoint was 0.
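
For illustration, a minimal sketch of the control-flow change described in this comment. It is not the actual Elasticsearch code: `CheckpointStore`, `StartDecision`, `StartLogicSketch`, and both `decide...` methods are hypothetical names, and the real executor works asynchronously with listeners. The sketch only assumes what the description states: the old branch skipped the next-checkpoint lookup whenever the last checkpoint was 0, while the fixed flow performs the lookup in every case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the internal index that stores checkpoint documents.
final class CheckpointStore {
    private final Map<Long, String> docs = new HashMap<>();

    void put(long checkpoint, String doc) {
        docs.put(checkpoint, doc);
    }

    Optional<String> load(long checkpoint) {
        return Optional.ofNullable(docs.get(checkpoint));
    }
}

// Hypothetical result type: the in-progress checkpoint (if any) the task resumes with.
final class StartDecision {
    final Optional<String> resumedCheckpoint;

    StartDecision(Optional<String> resumedCheckpoint) {
        this.resumedCheckpoint = resumedCheckpoint;
    }
}

final class StartLogicSketch {
    // Assumed behaviour before the fix: with lastCheckpoint == 0 the task starts
    // right away and never looks for a checkpoint that is already in progress.
    static StartDecision decideBeforeFix(CheckpointStore store, long lastCheckpoint) {
        if (lastCheckpoint == 0) {
            return new StartDecision(Optional.empty());
        }
        return new StartDecision(store.load(lastCheckpoint + 1));
    }

    // Assumed behaviour after the fix: the next checkpoint is always looked up,
    // regardless of the value of lastCheckpoint.
    static StartDecision decideAfterFix(CheckpointStore store, long lastCheckpoint) {
        return new StartDecision(store.load(lastCheckpoint + 1));
    }
}
```

With a stored document for checkpoint 1 and a last checkpoint of 0 (a transform stopped during its first run), `decideBeforeFix` ignores the stored checkpoint while `decideAfterFix` picks it up.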


if (nextCheckpoint.isEmpty()) {
    // corner case which should not happen ;-)
    // reset the position to force a full re-run with checkpoint creation
Contributor Author

@hendrikmuhs hendrikmuhs Aug 8, 2019

getTransformCheckpoint always returns a checkpoint object: in case it does not find a document in the internal index, it returns an empty object.
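
A rough sketch of the "empty object instead of null" convention this comment describes. The types `SketchCheckpoint` and `SketchCheckpointLookup` are hypothetical and heavily simplified; only the method name `getTransformCheckpoint` and the `isEmpty()` check are taken from this thread, and the real signatures may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified checkpoint type; not the real Elasticsearch class.
final class SketchCheckpoint {
    // Shared placeholder returned when no checkpoint document exists.
    static final SketchCheckpoint EMPTY = new SketchCheckpoint(0L);

    final long number;

    SketchCheckpoint(long number) {
        this.number = number;
    }

    boolean isEmpty() {
        return this == EMPTY;
    }
}

// Hypothetical in-memory replacement for the lookup against the internal index.
final class SketchCheckpointLookup {
    private final Map<Long, SketchCheckpoint> index = new HashMap<>();

    void store(SketchCheckpoint checkpoint) {
        index.put(checkpoint.number, checkpoint);
    }

    // Never returns null: a missing document yields the EMPTY placeholder,
    // so callers test isEmpty() rather than comparing against null.
    SketchCheckpoint getTransformCheckpoint(long number) {
        return index.getOrDefault(number, SketchCheckpoint.EMPTY);
    }
}
```

Under this convention a caller can branch on `nextCheckpoint.isEmpty()` directly, as the snippet under review does, without a separate null check.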

Hendrik Muhs and others added 2 commits August 9, 2019 08:09
…/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java

Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
@hendrikmuhs
Contributor Author

run elasticsearch-ci/1


@hendrikmuhs hendrikmuhs merged commit c42dd74 into elastic:master Aug 9, 2019
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request Aug 9, 2019
…ime (elastic#45340)

fix loading of next checkpoint after data frame transform has been stopped/started within one run

closes elastic#45339
hendrikmuhs pushed a commit that referenced this pull request Aug 9, 2019
…ime (#45340) (#45381)

fix loading of next checkpoint after data frame transform has been stopped/started within one run

closes #45339
hendrikmuhs pushed a commit that referenced this pull request Aug 9, 2019
…ime (#45340) (#45380)

fix loading of next checkpoint after data frame transform has been stopped/started within one run

closes #45339
Development

Successfully merging this pull request may close these issues.

[DataFrame] batch dataframe transform fails to start after it stopped at runtime