Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: persisting tasks on a per-run basis #403

Open
NickCrews opened this issue Jul 30, 2023 · 0 comments
Open

ENH: persisting tasks on a per-run basis #403

NickCrews opened this issue Jul 30, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@NickCrews
Copy link
Contributor

Is your feature request related to a problem?

I'm taking about https://pytask-dev.readthedocs.io/en/stable/tutorials/making_tasks_persist.html

If I format my tasks.py file or update an unrelated task in there, then pytask will re-run the tasks. But I don't want this. I also don't want this to be a permanent feature of the task, as marked with a decorator.

Describe the solution you'd like

I want to be able to specify on a per-run basis whether or not to re-run a task if the source file changed. Ideally it should be something like

  • Keep the @pytask.mark.persist as marking default behavior: If its not present, the default is to re-run on src changes. If it is present, then the default is to NOT re-run on source changes. Therefore, we don't break the old API.
  • But, add a flag/API at runtime to override this default: On a per-task level be able to change the behavior. I the options should be "rerun", "don't rerun, and mark as updated", and "don't rerun, but don't mark as updated, so next time you run then the task will still appear stale". Maybe I'm not breaking down these options along the right axes, it could use more work.

API breaking implications

Per above, it shouldn't be breaking

Describe alternatives you've considered

Perhaps this warrants re-thinking the API for determining what is "up to date". Per those docs above:

Internally, the state of the dependencies, the source file, and the products are updated in the database such that the subsequent execution will skip the task successfully.

If we accumulate many more edge cases for when a task is up to date, things are going to get complicated. If there was a more official API when you define a task, eg @pytask.Task(func, ins, outs, *, should_run = None) and you could pass in (really brainstorming here)

  • None (keep default behavior)
  • True (rerun every time)
  • A Callable[[Not sure what the args should be], bool]
  • "on_dependency_change"
  • "on_source_change"
  • An Iterable of any of the above. If any are true, then rerun.
@NickCrews NickCrews added the enhancement New feature or request label Jul 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant