-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Schedule] Synchronize labels on upgrade #6389
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Got a couple of comments on the alignment implementation
server/api/initial_data.py
Outdated
def _align_schedule_labels(db, db_session): | ||
db_schedules: list[mlrun.common.schemas.ScheduleRecord] = db.list_schedules( | ||
session=db_session | ||
) | ||
schedules_update = [] | ||
for db_schedule in db_schedules: | ||
# convert list[LabelRecord] to dict | ||
db_schedule_labels = {label.name: label.value for label in db_schedule.labels} | ||
# merging labels | ||
merged_labels = server.api.utils.scheduler.Scheduler()._merge_schedule_and_schedule_object_labels( | ||
labels=db_schedule_labels, | ||
scheduled_object=db_schedule.scheduled_object, | ||
) | ||
|
||
# get a Schedule object (not a ScheduleRecord) and update it | ||
schedule = db._get_schedule_record( | ||
db_session, db_schedule.project, db_schedule.name | ||
) | ||
db._update_schedule_body( | ||
schedule=schedule, | ||
scheduled_object=db_schedule.scheduled_object, | ||
labels=merged_labels, | ||
) | ||
schedules_update.append(schedule) | ||
|
||
db._upsert(db_session, schedules_update) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This method is very expensive. It is going to preform
O(n)
queries for each schedule in the DB. I suggest adding aas_records: bool = False,
param tolist_schedules
(similar tolist_artifacts
). You can still extract the schedule struct with_transform_schedule_record_to_scheme
but then you don't need to re-query for each record after getting them all. - This method is using loads of the
db
's private methods, I don't think we should make them public, but maybe this method should be part of thesqldb
interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well done! A few improvement suggestions
server/api/initial_data.py
Outdated
# convert list[LabelRecord] to dict | ||
db_schedule_labels = {label.name: label.value for label in db_schedule.labels} | ||
# merging labels | ||
merged_labels = server.api.utils.scheduler.Scheduler()._merge_schedule_and_schedule_object_labels( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps use ensure_scheduler
and get_scheduler
just for consistency
server/api/initial_data.py
Outdated
db_schedules: list[mlrun.common.schemas.ScheduleRecord] = db.list_schedules( | ||
session=db_session | ||
) | ||
schedules_update = [] | ||
for db_schedule in db_schedules: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that we need to optimize this for scale systems. I would take the following steps:
- Add
as_records
flag similar tolist_artifacts
. - Use sqlalchemy lazy load - return the query and then for loop on the query instead of using query.all() and loading all schedules in memory.
- Transform the DB schedule to schema schedule then you have both and you don't need to get the DB one (saves us O(n) queries to the DB/cache).
db_schedules: list[mlrun.common.schemas.ScheduleRecord] = db.list_schedules( | |
session=db_session | |
) | |
schedules_update = [] | |
for db_schedule in db_schedules: | |
schedules_update = [] | |
for db_schedule in db.list_schedules(session=db_session, as_records=True): | |
schedule_record = db._transform_schedule_record_to_scheme(db_schedule) |
tests/api/test_initial_data.py
Outdated
@@ -228,6 +237,39 @@ def test_create_project_summaries(): | |||
assert migrated_project_summary.name == project.metadata.name | |||
|
|||
|
|||
def test_align_schedule_labels(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add some more schedules with edge cases. we need to make sure this will not break in field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well done! Minor suggestion
After implementing ML-7349, which aligns the labels in schedules and schedule objects, we need to account for objects created prior to this change. To address this, a new data migration (version 8) has been added. This migration retrieves all schedules from the database and aligns their labels accordingly.
Jira - https://iguazio.atlassian.net/browse/ML-7914