-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-43109: Add update-day-obs command #36
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #36 +/- ##
==========================================
- Coverage 54.35% 52.39% -1.97%
==========================================
Files 33 34 +1
Lines 1148 1212 +64
Branches 255 271 +16
==========================================
+ Hits 624 635 +11
- Misses 490 543 +53
Partials 34 34 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. For scalability It maybe better to move all visit logic inside the exposure batch loop, but it's up to you.
visit_defs = butler.registry.queryDimensionRecords( | ||
"visit_definition", | ||
where="exposure in (exps)", | ||
bind={"exps": list(exposures_to_be_updated)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This potentially needs to be in batches too, but if you are sure that it's not going to be large then it's OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been assuming that queryDimensionRecords
would be clever enough to do the batching internally if the list is too long, but if you think that's going to be problematic I can chunk it externally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern here is that WHERE exposure in (...)
will become very long, and we do not batch that expression internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will chunk. I think we do batch internally in some cases and there might even be a ticket for doing it with dimension records.
This implementation is a proof of concept that uses standard butler APIs to do the updates. It may be too slow for large repositories.
Checklist