Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a culling controller for Notebooks #6807

Conversation

apo-ger
Copy link
Contributor

@apo-ger apo-ger commented Dec 6, 2022

This PR splits the idleness/culling logic into a separate controller inside of the Notebooks Controller/Operator.

The culling-controller will:

  • reconcile only notebooks CRs.
  • set/update culling annotations
    • notebooks.kubeflow.org/last_activity
    • notebooks.kubeflow.org/last_activity_check_timestamp
  • perform idleness checks every IDLENESS_CHECK_PERIOD minutes and set the kubeflow-resource-stopped annotation, if a notebook needs to be culled.

Refs: #6767

Signed-off-by: Apostolos Gerakaris apoger@arrikto.com

@apo-ger apo-ger force-pushed the feature-apoger-notebook-controller-culling branch 2 times, most recently from 9d3f293 to 5ad3f07 Compare December 12, 2022 10:11
@google-oss-prow google-oss-prow bot added size/XXL and removed size/L labels Dec 12, 2022
@apo-ger apo-ger force-pushed the feature-apoger-notebook-controller-culling branch from 5ad3f07 to d61c5da Compare December 12, 2022 10:19
Changes:

 * Move the idleness/culling logic into a separate controller
   as part of the Notebooks Controller/Operator.

 * Introduce an "notebooks.kubeflow.org/last_activity_check_timestamp".
   annotation in each Notebook CR to keep the timestamp of the last
   performed idleness check

The controller can then compare this timestamp with the current time to
ensure that notebooks will get reconciled every IDLENESS_CHECK_PERIOD
minutes.

The culling-controller will:

* reconcile only notebooks CRs
* set/update culling annotations
  - 'notebooks.kubeflow.org/last_activity'
  - 'notebooks.kubeflow.org/last_activity_check_timestamp'
* perform idleness checks every 'IDLENESS_CHECK_PERIOD' minutes
  and set the 'kubeflow-resource-stopped' annotation, if a notebook
  needs to be culled.

Refs: kubeflow#6767

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
@apo-ger apo-ger force-pushed the feature-apoger-notebook-controller-culling branch from d61c5da to b4e0869 Compare January 24, 2023 10:35
Copy link
Member

@kimwnasptd kimwnasptd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apo-ger left some comments. The code mostly LGTM!

I also tested the controller and indeed it's making the requests by respecting the actual check period. Nice work!

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
Add a log message at the beginning of the reconciliation loop
to make it clear that a Reconcile was called for a notebook.

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
@apo-ger apo-ger force-pushed the feature-apoger-notebook-controller-culling branch from 63eae84 to 0a08ff6 Compare January 26, 2023 10:10
* Introduce make rule for running the controller locally with
  culling enabled

* Introduce a dev_culling_authorization_policy which must be
  applied when testing the culling-controller locally

Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
Signed-off-by: Apostolos Gerakaris <apoger@arrikto.com>
@apo-ger apo-ger force-pushed the feature-apoger-notebook-controller-culling branch from 72f1318 to 91fe6b4 Compare January 26, 2023 13:20
@kimwnasptd
Copy link
Member

Thanks for this work @apo-ger!

/lgtm
/approve

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kimwnasptd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 361a266 into kubeflow:master Jan 26, 2023
@apo-ger
Copy link
Contributor Author

apo-ger commented Jan 26, 2023

@kimwnasptd nice! Thank you for the review!

@shalberd
Copy link

shalberd commented Jan 26, 2023

Very good code using metav1.HasAnnotation returning either true or false, instead of checking for the presence of an annotation with an own logic block. I will integrate that in my PR as well, cool. @kimwnasptd Could you have a look at my PR? Took cues on annotations handling before this update from apo-gers code in the now-deleted culler helper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants