
Implement checkpointing for data #56

Closed
jlowin opened this issue Jul 15, 2018 · 2 comments

jlowin (Member) commented Jul 15, 2018

Currently, tasks have a dummy "checkpoint" attribute which needs to be handled.

@jlowin jlowin added enhancement An improvement of an existing feature class: Task labels Jul 15, 2018
@jlowin jlowin self-assigned this Jul 17, 2018
@jlowin jlowin added this to the v 0.3 milestone Jul 18, 2018
@cicdw cicdw mentioned this issue Jul 27, 2018
cicdw (Member) commented Jul 30, 2018

So I started working on output caching for tasks; however, I quickly ran into a question: where should the cached data be stored?

I began implementing it analogously to #78 by putting the outputs into a task's State, which is then fed into the FlowRunner via task_states. However, after a successful first run the returned state is Success, which is effectively an output cache, except with no input validation to ensure the cache is still valid.

We could continue down this route and let the TaskRunner perform the necessary cache-validation checks whenever the task has a Success state, but this might alter the fundamental behavior of the TaskRunner when it is provided with a Success state (for example, when it receives a Success state whose cache is no longer valid).
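To make that trade-off concrete, here is a rough sketch of what cache validation on a Success state might look like inside the TaskRunner. The names (`cached_inputs`, `cache_validator`, `get_task_run_state`) are invented for illustration and are not the actual Prefect API:

```python
# Rough sketch only; all names here are placeholders, not Prefect internals.

class Success:
    """Stand-in for a terminal Success state that also carries the task's
    result and the inputs it was computed from."""

    def __init__(self, result=None, cached_inputs=None):
        self.result = result
        self.cached_inputs = cached_inputs


def get_task_run_state(task, prior_state, inputs, cache_validator):
    """Reuse a prior Success if its cache still validates; otherwise re-run."""
    if isinstance(prior_state, Success) and cache_validator(
        prior_state.cached_inputs, inputs
    ):
        # Cache hit: skip running the task entirely.
        return prior_state
    # Cache miss (or no prior state): this is the behavior change described
    # above, since a Success state would normally mean "do not run again".
    result = task.run(**inputs)
    return Success(result=result, cached_inputs=inputs)
```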

Two other options I immediately see:

  • the Task itself maintains the cache
  • prefect.Context somehow maintains a cache for all tasks via a dictionary _cached_tasks

The answer probably depends on how we envision this output cache behaving in relation to our server, and on whether the cache should work across Flows.
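For comparison, a toy illustration of the prefect.Context option from the list above; the `_cached_tasks` dictionary and its keying by task name are assumptions made for the sketch, not a proposed API:

```python
# Toy sketch of the Context-dictionary option: a process-wide dict keyed by
# task name. All names here are hypothetical.

_cached_tasks = {}  # stand-in for a `_cached_tasks` dict living on prefect.Context


def run_with_context_cache(task, inputs):
    """Return a cached result when the stored inputs match, else run the task."""
    entry = _cached_tasks.get(task.name)
    if entry is not None and entry["inputs"] == inputs:
        return entry["result"]
    result = task.run(**inputs)
    _cached_tasks[task.name] = {"inputs": inputs, "result": result}
    return result
```

One consequence of this option is that the cache lives outside any State, which is exactly why the question above about the server and cross-Flow behavior matters.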

cc: @jlowin

cicdw (Member) commented Jul 30, 2018

Another option is to create a new State, CachedState, which inherits from Success but is handled differently from a Success in that the TaskRunner will rerun the task if the cache is no longer valid. (I think I actually prefer this option the most.)
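A minimal sketch of what that might look like, again with invented attribute and method names (`cached_inputs`, `cache_validator`, `is_cache_valid`) rather than whatever design eventually lands:

```python
# Hypothetical sketch of a CachedState subclassing Success; not the final design.

class Success:
    def __init__(self, result=None):
        self.result = result


class CachedState(Success):
    """A Success that also carries enough information to decide later whether
    its cached result can be reused."""

    def __init__(self, result=None, cached_inputs=None, cache_validator=None):
        super().__init__(result=result)
        self.cached_inputs = cached_inputs
        self.cache_validator = cache_validator

    def is_cache_valid(self, current_inputs):
        if self.cache_validator is None:
            return True
        return self.cache_validator(self.cached_inputs, current_inputs)


def handle_prior_state(task, prior_state, inputs):
    """A plain Success short-circuits as usual; a CachedState with a stale
    cache causes the task to be rerun."""
    if isinstance(prior_state, CachedState) and not prior_state.is_cache_valid(inputs):
        result = task.run(**inputs)
        return CachedState(result=result, cached_inputs=inputs,
                           cache_validator=prior_state.cache_validator)
    return prior_state
```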

@cicdw cicdw mentioned this issue Jul 30, 2018
@cicdw cicdw closed this as completed in #84 Aug 1, 2018