
Implement checkpointing for data #56

Closed
jlowin opened this issue Jul 15, 2018 · 2 comments

jlowin (Member) commented Jul 15, 2018

Currently, tasks have a dummy "checkpoint" attribute which needs to be handled.

@jlowin jlowin added enhancement An improvement of an existing feature class: Task labels Jul 15, 2018
@jlowin jlowin self-assigned this Jul 17, 2018
@jlowin jlowin added this to the v 0.3 milestone Jul 18, 2018
@cicdw cicdw mentioned this issue Jul 27, 2018
cicdw (Member) commented Jul 30, 2018

So I started working on output caching for tasks; however, I quickly ran into a question: where should the cached data be stored?

I began implementing it analogously to #78 by putting the outputs into a task's State, which is then fed into the FlowRunner via task_states. However, after a successful first run the returned state is Success, which is effectively an output cache, except with no input validation to ensure the cache is still valid.

We could continue down this route and let the TaskRunner perform the necessary cache-validation checks whenever the task has a Success state, but this might alter the fundamental behavior of the TaskRunner when it is provided with a Success state (for example, when it receives a Success state whose cache is no longer valid).
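To make that trade-off concrete, here is a rough sketch of what cache validation on a Success state might look like inside the TaskRunner. The names (`cached_inputs`, `cache_validator`, `get_task_run_state`) are invented for illustration and are not the actual Prefect API:

```python
# Rough sketch only; all names here are placeholders, not Prefect internals.

class Success:
    """Stand-in for a terminal Success state that also carries the task's
    result and the inputs it was computed from."""

    def __init__(self, result=None, cached_inputs=None):
        self.result = result
        self.cached_inputs = cached_inputs


def get_task_run_state(task, prior_state, inputs, cache_validator):
    """Reuse a prior Success if its cache still validates; otherwise re-run."""
    if isinstance(prior_state, Success) and cache_validator(
        prior_state.cached_inputs, inputs
    ):
        # Cache hit: skip running the task entirely.
        return prior_state
    # Cache miss (or no prior state): this is the behavior change described
    # above, since a Success state would normally mean "do not run again".
    result = task.run(**inputs)
    return Success(result=result, cached_inputs=inputs)
```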

Two other options I immediately see:

  • the Task itself maintains the cache
  • prefect.Context somehow maintains a cache for all tasks via a dictionary _cached_tasks

The answer probably depends on how we envision this output cache behaving in relation to our server, and on whether the cache should work across Flows.
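For comparison, a toy illustration of the prefect.Context option from the list above; the `_cached_tasks` dictionary and its keying by task name are assumptions made for the sketch, not a proposed API:

```python
# Toy sketch of the Context-dictionary option: a process-wide dict keyed by
# task name. All names here are hypothetical.

_cached_tasks = {}  # stand-in for a `_cached_tasks` dict living on prefect.Context


def run_with_context_cache(task, inputs):
    """Return a cached result when the stored inputs match, else run the task."""
    entry = _cached_tasks.get(task.name)
    if entry is not None and entry["inputs"] == inputs:
        return entry["result"]
    result = task.run(**inputs)
    _cached_tasks[task.name] = {"inputs": inputs, "result": result}
    return result
```

One consequence of this option is that the cache lives outside any State, which is exactly why the question above about the server and cross-Flow behavior matters.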

cc: @jlowin

cicdw (Member) commented Jul 30, 2018

Another option is to create a new State, CachedState, which inherits from Success but is handled differently from a Success in that the TaskRunner will rerun the task if the cache is no longer valid. (I think I actually prefer this option the most.)
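A minimal sketch of what that might look like, again with invented attribute and method names (`cached_inputs`, `cache_validator`, `is_cache_valid`) rather than whatever design eventually lands:

```python
# Hypothetical sketch of a CachedState subclassing Success; not the final design.

class Success:
    def __init__(self, result=None):
        self.result = result


class CachedState(Success):
    """A Success that also carries enough information to decide later whether
    its cached result can be reused."""

    def __init__(self, result=None, cached_inputs=None, cache_validator=None):
        super().__init__(result=result)
        self.cached_inputs = cached_inputs
        self.cache_validator = cache_validator

    def is_cache_valid(self, current_inputs):
        if self.cache_validator is None:
            return True
        return self.cache_validator(self.cached_inputs, current_inputs)


def handle_prior_state(task, prior_state, inputs):
    """A plain Success short-circuits as usual; a CachedState with a stale
    cache causes the task to be rerun."""
    if isinstance(prior_state, CachedState) and not prior_state.is_cache_valid(inputs):
        result = task.run(**inputs)
        return CachedState(result=result, cached_inputs=inputs,
                           cache_validator=prior_state.cache_validator)
    return prior_state
```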

@cicdw cicdw mentioned this issue Jul 30, 2018
@cicdw cicdw closed this as completed in #84 Aug 1, 2018