[Schedule] Synchronize labels on upgrade #6389

rokatyy · 2024-09-23T12:08:14Z

After implementing ML-7349, which aligns the labels in schedules and schedule objects, we need to account for objects created prior to this change. To address this, a new data migration (version 8) has been added. This migration retrieves all schedules from the database and aligns their labels accordingly.

Jira - https://iguazio.atlassian.net/browse/ML-7914

quaark

Looks good! Got a couple of comments on the alignment implementation

quaark · 2024-09-23T14:00:08Z

server/api/initial_data.py

+def _align_schedule_labels(db, db_session):
+    db_schedules: list[mlrun.common.schemas.ScheduleRecord] = db.list_schedules(
+        session=db_session
+    )
+    schedules_update = []
+    for db_schedule in db_schedules:
+        # convert list[LabelRecord] to dict
+        db_schedule_labels = {label.name: label.value for label in db_schedule.labels}
+        # merging labels
+        merged_labels = server.api.utils.scheduler.Scheduler()._merge_schedule_and_schedule_object_labels(
+            labels=db_schedule_labels,
+            scheduled_object=db_schedule.scheduled_object,
+        )
+
+        # get a Schedule object (not a ScheduleRecord) and update it
+        schedule = db._get_schedule_record(
+            db_session, db_schedule.project, db_schedule.name
+        )
+        db._update_schedule_body(
+            schedule=schedule,
+            scheduled_object=db_schedule.scheduled_object,
+            labels=merged_labels,
+        )
+        schedules_update.append(schedule)
+
+    db._upsert(db_session, schedules_update)


This method is very expensive: it performs an extra query for every schedule in the DB, so O(n) queries overall. I suggest adding an as_records: bool = False param to list_schedules (similar to list_artifacts). You can still extract the schedule struct with _transform_schedule_record_to_scheme, but then you don't need to re-query for each record after getting them all.

This method uses a lot of the db's private methods. I don't think we should make them public, but maybe this method should be part of the sqldb interface?
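
To make the cost concrete, here is a minimal sketch that counts round trips for the two approaches. It uses a hypothetical in-memory `FakeDB`, not the real sqldb class; the method bodies, the `Record` type, and the behaviour of `as_records` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a DB schedule row."""

    project: str
    name: str
    labels: dict = field(default_factory=dict)


class FakeDB:
    """Hypothetical DB wrapper that counts queries; not the real sqldb class."""

    def __init__(self, rows):
        self._rows = {(r.project, r.name): r for r in rows}
        self.queries = 0

    def list_schedules(self, as_records=False):
        self.queries += 1
        rows = list(self._rows.values())
        if as_records:
            # Assumption: as_records=True hands back the DB rows themselves.
            return rows
        # Otherwise return detached copies, like schema objects.
        return [Record(r.project, r.name, dict(r.labels)) for r in rows]

    def get_schedule_record(self, project, name):
        self.queries += 1
        return self._rows[(project, name)]


def align_with_requery(db):
    """One list query plus one lookup per schedule."""
    for schema in db.list_schedules():
        record = db.get_schedule_record(schema.project, schema.name)
        record.labels["aligned"] = "true"


def align_with_records(db):
    """A single list query; rows are updated directly."""
    for record in db.list_schedules(as_records=True):
        record.labels["aligned"] = "true"


rows = [Record("proj", f"schedule-{i}") for i in range(100)]
slow, fast = FakeDB(rows), FakeDB(rows)
align_with_requery(slow)
align_with_records(fast)
print(slow.queries, fast.queries)
```

With n schedules the first variant issues n + 1 queries and the second issues one, which is the gap the comment above is pointing at.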

alonmr

Very well done! A few improvement suggestions

alonmr · 2024-09-23T13:59:10Z

server/api/initial_data.py

+        # convert list[LabelRecord] to dict
+        db_schedule_labels = {label.name: label.value for label in db_schedule.labels}
+        # merging labels
+        merged_labels = server.api.utils.scheduler.Scheduler()._merge_schedule_and_schedule_object_labels(


Perhaps use ensure_scheduler and get_scheduler, just for consistency.
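
For readers unfamiliar with the accessor pattern this comment refers to, a minimal sketch follows. The `Scheduler` class here is a hypothetical stand-in, and the bodies of `ensure_scheduler` and `get_scheduler` are assumptions about how such a pair typically behaves, not a copy of the project's module.

```python
from typing import Optional


class Scheduler:
    """Hypothetical stand-in for the real Scheduler class."""

    instances_created = 0

    def __init__(self):
        Scheduler.instances_created += 1


_scheduler: Optional[Scheduler] = None


def ensure_scheduler() -> Scheduler:
    """Create the shared scheduler on first use, then reuse it."""
    global _scheduler
    if _scheduler is None:
        _scheduler = Scheduler()
    return _scheduler


def get_scheduler() -> Scheduler:
    """Return the shared scheduler (assumed to raise if it was never created)."""
    if _scheduler is None:
        raise RuntimeError("scheduler was not initialized")
    return _scheduler


ensure_scheduler()
for _ in range(3):
    # Each iteration reuses the same instance instead of calling Scheduler().
    scheduler = get_scheduler()
```

Going through the accessors means the migration loop shares one instance rather than constructing a new `Scheduler()` per schedule.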

alonmr · 2024-09-23T14:04:30Z

server/api/initial_data.py

+    db_schedules: list[mlrun.common.schemas.ScheduleRecord] = db.list_schedules(
+        session=db_session
+    )
+    schedules_update = []
+    for db_schedule in db_schedules:


I think we need to optimize this for large-scale systems. I would take the following steps:

1. Add an as_records flag, similar to list_artifacts.

2. Use SQLAlchemy lazy loading: return the query and iterate over it in a for loop instead of calling query.all(), which loads all schedules into memory.

3. Transform the DB schedule to a schema schedule. Then you have both, and you don't need to fetch the DB one again (saves us O(n) queries to the DB/cache).

Suggested change

-    db_schedules: list[mlrun.common.schemas.ScheduleRecord] = db.list_schedules(
-        session=db_session
-    )
-    schedules_update = []
-    for db_schedule in db_schedules:
+    schedules_update = []
+    for db_schedule in db.list_schedules(session=db_session, as_records=True):
+        schedule_record = db._transform_schedule_record_to_scheme(db_schedule)
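
The difference between loading everything and iterating lazily can be sketched with the standard-library `sqlite3` module, swapped in here so the example runs without SQLAlchemy. The table layout, `iter_schedules`, and `batch_size` are hypothetical. Note that with SQLAlchemy, plain iteration over a query may still buffer rows depending on version and driver, in which case `Query.yield_per` is the usual remedy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedules (project TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO schedules VALUES (?, ?)",
    [("proj", f"schedule-{i}") for i in range(1000)],
)


def iter_schedules(connection, batch_size=100):
    """Yield rows batch by batch so at most batch_size rows are held at once."""
    cursor = connection.execute(
        "SELECT project, name FROM schedules ORDER BY rowid"
    )
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


processed = 0
for project, name in iter_schedules(conn):
    # A real migration would transform and update the row here.
    processed += 1
```

The eager equivalent would be `cursor.fetchall()`, which materializes every row before the loop starts.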

alonmr · 2024-09-23T14:08:38Z

tests/api/test_initial_data.py

@@ -228,6 +237,39 @@ def test_create_project_summaries():
    assert migrated_project_summary.name == project.metadata.name


+def test_align_schedule_labels():


Please add some more schedules with edge cases. We need to make sure this will not break in the field.
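
As a rough idea of what such edge cases might look like, here is a table-driven sketch. `merge_labels` is a hypothetical stand-in for the real merge helper; the `task.metadata.labels` layout of the scheduled object and the rule that schedule labels win on conflict are both assumptions, so the expected values would need to be checked against the actual implementation.

```python
def merge_labels(schedule_labels, scheduled_object):
    """Hypothetical merge: object labels first, schedule labels win on conflict."""
    # Tolerate a missing object and missing or None task/metadata/labels keys.
    task = (scheduled_object or {}).get("task") or {}
    metadata = task.get("metadata") or {}
    object_labels = metadata.get("labels") or {}
    merged = dict(object_labels)
    merged.update(schedule_labels or {})
    return merged


cases = [
    # (schedule labels, scheduled object, expected merged labels)
    ({}, {}, {}),
    (None, None, {}),
    ({"a": "1"}, {"task": {"metadata": {}}}, {"a": "1"}),
    ({}, {"task": {"metadata": {"labels": {"b": "2"}}}}, {"b": "2"}),
    (
        {"k": "schedule"},
        {"task": {"metadata": {"labels": {"k": "object"}}}},
        {"k": "schedule"},
    ),
    ({"a": "1"}, {"task": {"metadata": {"labels": None}}}, {"a": "1"}),
]

for schedule_labels, scheduled_object, expected in cases:
    assert merge_labels(schedule_labels, scheduled_object) == expected
```

The cases cover empty and missing labels on either side, a scheduled object without a labels key, and a key that exists on both sides.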

rokatyy · 2024-09-23T16:15:55Z

@quaark @alonmr Great minds think alike :)

Thanks for the suggestions! I have fixed them, and also moved the code from the scheduler to helpers so that both the scheduler and the SQL DB can access it without circular imports.

alonmr

Very well done! Minor suggestion

server/api/utils/helpers.py

rokatyy added 2 commits September 23, 2024 13:04

[Schedule] Synchronize labels on upgrade

2a1567e

improve test

6066c82

liranbg requested a review from alonmr September 23, 2024 13:48

quaark requested changes Sep 23, 2024

alonmr reviewed Sep 23, 2024

Comments

5e275a7

rokatyy requested review from alonmr and quaark September 23, 2024 16:16

alonmr approved these changes Sep 23, 2024

server/api/utils/helpers.py (resolved)

liranbg approved these changes Sep 23, 2024

liranbg merged commit 74af972 into mlrun:development Sep 23, 2024
11 checks passed

rokatyy mentioned this pull request Oct 16, 2024

[Data migration] Bump latest data migration version #6554

Merged

rokatyy added a commit to rokatyy/mlrun that referenced this pull request Oct 16, 2024

[Schedule] Synchronize labels on upgrade (mlrun#6389)

997463b

rokatyy mentioned this pull request Oct 16, 2024

[Schedule] Synchronize labels on upgrade [1.7.x] #6555

Merged

roei3000b mentioned this pull request Oct 29, 2024

[ProjectSummaries] Fixing schedule count logic for jobs [1.7.x] #6597

Merged
