
Max job retained pipelined runs #608

Conversation

@fruttasecca fruttasecca (Member) commented Dec 21, 2021

Description

This PR adds backend (BE) functionality to automatically clean up the oldest pipeline runs of a job. This is done through a new field on the Job model, max_retained_pipeline_runs, which can be specified both when creating a job (POST) and when updating it (PUT). When the field is set, the cleanup logic checks for the oldest job pipeline runs to delete 1) when new pipeline runs are scheduled for the job and 2) when a pipeline run of the job reaches an end state. A pipeline run can only be deleted if it is in an end state (SUCCESS, FAILED, ABORTED).
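For illustration, a minimal sketch of how a client might set this field when creating and updating a job; the base URL, endpoint paths, and surrounding payload fields are assumptions, only max_retained_pipeline_runs itself comes from this PR:

```python
# Illustrative only: base URL, endpoint paths and payload shape are
# assumptions, not the exact orchest-api contract.
import requests

ORCHEST_API = "http://localhost:8080/api"  # assumed base URL

# Create a job that keeps at most 5 finished pipeline runs around.
job_spec = {
    "name": "nightly-job",              # assumed fields
    "pipeline_uuid": "<pipeline-uuid>",
    "project_uuid": "<project-uuid>",
    "max_retained_pipeline_runs": 5,    # the new field from this PR
}
resp = requests.post(f"{ORCHEST_API}/jobs/", json=job_spec)
resp.raise_for_status()
job_uuid = resp.json()["uuid"]          # assumed response shape

# Later, tighten the retention policy of the existing job.
requests.put(
    f"{ORCHEST_API}/jobs/{job_uuid}",
    json={"max_retained_pipeline_runs": 2},
).raise_for_status()
```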

Some details:

  • the deletion is asynchronous with respect to new runs being scheduled or a run reaching an end state, i.e. the cleanup does not happen in the same transaction but after it. This is because the orchest-api does not control the userdir, the orchest-webserver does.
  • each job pipeline run (as before) has a pipeline_run_index, which is used to determine the oldest job pipeline runs to delete, i.e. highest_index - max_retained_pipeline_runs is the index threshold at or below which runs are considered for deletion (see the sketch after this list)
  • among the pipeline runs considered for deletion, only the ones in an end state are actually deleted
  • max_retained_pipeline_runs = 0 is a valid value, i.e. every run will clean itself up after reaching an end state
  • most likely this setting makes sense only for cron jobs, but I have not put a constraint in place; keen to hear your opinion on whether this should be a FE "constraint" or an actual BE constraint
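
A minimal sketch of the selection logic described in the list above, assuming runs are objects with a pipeline_run_index and a status attribute; this is illustrative, not the actual orchest-api implementation:

```python
# Illustrative sketch of the retention rule; not the actual orchest-api code.
END_STATES = {"SUCCESS", "FAILED", "ABORTED"}


def runs_to_delete(runs, max_retained_pipeline_runs):
    """Return the job pipeline runs that are candidates for cleanup."""
    if max_retained_pipeline_runs is None or max_retained_pipeline_runs < 0:
        return []

    highest_index = max((r.pipeline_run_index for r in runs), default=-1)
    # Runs with an index at or below this threshold are old enough to delete.
    threshold = highest_index - max_retained_pipeline_runs

    return [
        r
        for r in runs
        if r.pipeline_run_index <= threshold and r.status in END_STATES
    ]
```

With max_retained_pipeline_runs = 0 the threshold equals the highest index, so every run in an end state (including the latest one) becomes a candidate, matching the behaviour described above.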

Related to #550

@yannickperrenet The deletion of a job pipeline run's directory could have been done in several ways. I chose to preserve the constraint that the orchest-api does not touch the userdir and to reuse the existing orchest-webserver endpoint, so the orchest-api now calls the webserver, which is quite awkward since that call in turn results in a call back to the orchest-api. Keen to hear your opinion :).

⚠️ The PR contains schema changes that will alter your DB if you build & run.

Checklist

  • I have manually tested the application to make sure the changes don’t cause any downstream issues, which includes making sure ./orchest status --ext is not reporting failures when Orchest is running.
  • In case I changed one of the services’ models.py I have performed the appropriate database migrations.

Later used to retrieve job runs to delete when max_retained_pipeline_runs has a valid value.
A job-level property must be used to keep track of how many job pipeline runs have been scheduled, because job pipeline runs can now be deleted and the parameters of a (cron)job can be edited.
For some reason alembic did not like this migration (i.e. the previous version) when creating the database from scratch.
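
A hypothetical sketch of what the Job model additions might look like; aside from max_retained_pipeline_runs, the column names, types, and defaults are assumptions for illustration:

```python
# Hypothetical sketch; only max_retained_pipeline_runs is named in the PR,
# the counter column name, types and defaults are assumptions.
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()


class Job(db.Model):
    __tablename__ = "jobs"

    uuid = db.Column(db.String(36), primary_key=True)

    # A null (or negative) value is assumed to mean "keep every run";
    # 0 means every run cleans up after itself once it reaches an end state.
    max_retained_pipeline_runs = db.Column(db.Integer, nullable=True)

    # Job-level counter of how many pipeline runs have ever been scheduled,
    # so that pipeline_run_index keeps increasing even when old runs are
    # deleted or the (cron)job parameters are edited.
    total_scheduled_pipeline_runs = db.Column(
        db.Integer, nullable=False, server_default="0"
    )
```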
@ricklamers (Member) commented:

Nice job! LGTM

Makes it so that the deletion of a job pipeline run directory is done by
the celery worker rather than the orchest-webserver. This avoids a
bidirectional relationship between the orchest-webserver and the
orchest-api. The logic related to max_retained_pipeline_runs has been
refactored into a 2PF (two-phase function), DeleteNonRetainedJobPipelineRuns.
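
As a rough illustration of the two-phase split mentioned here: database changes happen in a transaction phase, while side effects such as dispatching the celery task that removes the run directory happen in a collateral phase after the commit. The class, method, helper, and task names below are assumptions, not the actual DeleteNonRetainedJobPipelineRuns implementation:

```python
# Illustrative sketch of the two-phase function (2PF) split; names are
# assumptions, not the actual orchest-api code.
class DeleteNonRetainedJobPipelineRunsSketch:
    def __init__(self, session, celery_app):
        self.session = session        # SQLAlchemy session (assumed)
        self.celery_app = celery_app  # Celery application (assumed)
        self._run_uuids_to_delete = []

    def _transaction(self, job_uuid):
        # Phase 1: inside the DB transaction, pick the end-state runs that
        # fall below the retention threshold and delete their records.
        runs = query_non_retained_end_state_runs(self.session, job_uuid)  # assumed helper
        self._run_uuids_to_delete = [run.uuid for run in runs]
        for run in runs:
            self.session.delete(run)

    def _collateral(self):
        # Phase 2: after the transaction has committed, ask the celery
        # worker to remove the run directories from the userdir, so the
        # orchest-api itself never touches the filesystem.
        for run_uuid in self._run_uuids_to_delete:
            self.celery_app.send_task(
                "app.core.tasks.delete_job_pipeline_run_directory",  # assumed task name
                kwargs={"run_uuid": run_uuid},
            )
```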
@yannickperrenet yannickperrenet (Contributor) left a comment

This is awesome! 💯

@fruttasecca fruttasecca merged commit 29438ef into feat/delete-individual-job-pipeline-run Dec 23, 2021
@yannickperrenet yannickperrenet deleted the feat/max-job-retained-pipelined-runs branch January 18, 2022 11:19