Skip to content
This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

Redeploys will require some way to resume state from previous jobs #319

Closed
lbradstreet opened this issue Oct 3, 2015 · 4 comments
Closed
Milestone

Comments

@lbradstreet
Copy link
Member

The idea would be that a job-id would be supplied and tasks would replay state from the previous job, assuming there are an equivalent number of peers for a particularly named task (this is already sounding a little tricky, because the tasks in the replica aren't really named, just task-ids).

It wouldn't be too bad, except that the coordination log gc will have to take into account the fact that later jobs may want to rebuild state from previous jobs.

@lbradstreet
Copy link
Member Author

When restoring the window definitions should probably be validated against the previous job to ensure that the jobs are equivalent (also the task-map uniqueness key)

@lbradstreet
Copy link
Member Author

Flink has a solution for this problem called "save points". I don't think we'll need to use the same strategy because our design is different, but it's an interesting read: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html and https://issues.apache.org/jira/browse/FLINK-2976

@lbradstreet
Copy link
Member Author

As part of this item, allow repartitioning of key space over slots.

Also, remember that you will need to replay over the windows / triggers in the same order that they were in the previous job for any ability to restore state.

@lbradstreet lbradstreet added this to the 0.9.1 milestone Mar 18, 2016
@lbradstreet
Copy link
Member Author

Achieved by resume points in 0.10.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant