New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-27033: Development branch for schema changes #397
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Member
timj
commented
Oct 20, 2020
•
edited
edited
- DM-26407: Store dimensions configuration in registry database #393
- DM-27034: Overhaul dimension construction and add ABCs #387
- DM-27035: miscellaneous fixes and integration after DM-27034 #398
- DM-27266: Switch format for dimensions in database to JSON #401
timj
force-pushed
the
tickets/DM-27033
branch
3 times, most recently
from
October 22, 2020 16:03
52edc3a
to
11773d0
Compare
timj
force-pushed
the
tickets/DM-27033
branch
4 times, most recently
from
October 30, 2020 16:41
d584c15
to
98729e4
Compare
TallJimbo
force-pushed
the
tickets/DM-27033
branch
3 times, most recently
from
November 4, 2020 12:44
8418ee0
to
5e4dd80
Compare
The class was already documented as immutable, so the lack of protection seems to have been an oversight. This required some workarounds and type-checking fixes, but that's to be expected given the unusual and highly dynamic nature of the class.
This allows a dependency order of core._topology -> core.timespan -> core.dimensions which will allow DatabaseTimespanRepresentation to depend on TopologicalExtentDatabaseRepresentation, as intended.
- @immutable itself now provides __copy__, and its documentation recommends __deepcopy__ be provided sometimes; - @immutable documents that it is inherited, and I've removed the decorator from some subclasses; - we now prefer __init__ and __reduce__ for pickling (but continue to use __new__ and __getnewargs__ for types that need __new__ for other reasons).
Dimensions are now stored in registry attributes table with a key "config:dimensions.yaml", serialized as YAML data. This change means that we cannot use SQLite in-memory datqabase for tests that make multiple connections (multiple Butler instances) so I had to update many unit tests to use on-disk database instead. With this commit dimensions are still part of full butler config but they will be separated in next commits.
...instead of DatabaseTimespanRepresentation. This more naturally corresponds to its soon-to-be base class, TopologicalExtentDatabaseRepresentation.
DimensionConfig is now a separate optional argument to `makeRepo`, by default it is loaded from `dimensions.yaml`. ButlerConfig does not include dimensions any more. `butler` CLI adds an option --dimension-config. `dimensions.yaml` dropped top-level `dimensions` key, and this is exactly how it is stored in database. DimensionConfig still inherits from ConfigSubset, this is mostly for convenience of finding defaults, though defaults loading behavior has been overriden, it only loads first file that it finds.
Must have been an errant keystroke long ago.
This serves two purposes: - it avoids the module names being publicly exported as symbols by higher-level __init__.py files (this is already the practice used in 'registry' and other newer daf_butler subpackages); - because elements.py is _not_ being renamed yet, it makes the future commit on which it is not just renamed but rewritten easier to understand, because it'll look like a totally new file. This commit is _just_ moving files and changing imports accordingly.
This is a three-pronged rewrite of a sizeable chunk of the dimensions codebase, inspired by DM-26336 prototyping: - There are now ABCs for DimensionElements (in _elements.py), with concrete implementations in _skypix.py and standard.py. This provides a better separation of interface and implementation and sets the stage for more to come later. We also now have an explicit interface (DimensionCombination) for DimensionElements that are not Dimensions, instead of just relying on the (previouly concrete) base class for that (though this doesn't do much yet). The interfaces of the DimensionElement classes are essentially unchanged by this, though many regular attributes have become properties. - DimensionUniverse construction now goes through a builder/visitor system (see the new construction.py module) that separates generic construction concerns from the config representation (useful for DM-26407, hope) and the post-construction state of the objects. In many cases, construction is also simplified (or at least better organized) by deferring calculation of some attributes until first use, using the new cached_getter decorator to avoid recomputing them. - The system for describing the spatial and temporal attributes of DimensionElements has changed, with a number of new classes in _topology.py. This removes the problematic (i.e. confusing and cyclical-dependency-inducing) fact that an element's spatial and temporal attributes were references to either self or some other element in order to model a priority order for related spatial elements (like tract and patch, or visit and visit_detector_region). Instead, those attributes now point to new TopologicalFamily objects that explicitly hold their member priority order. The new system will also play a role in allowing non-dimension spatial and temporal entities in queries in the near future (namely datasets in CALIBRATION collections, which are temporal via their validity ranges). Only the last of these is expected to affect code outside the dimensions package, and it's probable that even those won't affect anything outside daf_butler. So while this is pretty big change (especially for one commit), it's also a highly localized one.
This involves adding a new concrete NamedValueAbstractSet implementation (NameMappingSetView), which is now a base class of NamedValueSet - they're both the same under the hood right now, except that NamedValueSet conceptually owns its internal dictionary and considers it an implementation detail, while NameMappingSetView is explicitly just a view of an arbitrary mapping.
In older gen3 schema we did not have butler_attributes table, this small update tries to detect this situation by catching exceptions when reading from that table. This is a temporary measure to help people during migration period and fix can be removed once things become stable.
DimensionElement is now a factory for its record storage class, via a fully-qualified class name in configuration. This is *maybe* a step backward in terms of dependencies - it gives DimensionElement a (weak) dependency on classes we otherwise consider "downstream" in the registry subpackage, but it's a big step forward making those record storage classes more fully responsible for things they should have always been responsible for, like caching, and for avoiding a lot of smelly isinstance checks in record storage in the future as the database representations of different kinds of dimensions diverge further. It also gives a lot more flexibility in configuring dimension storage, because we can pass arbitrary kwargs directly from config. This transformation also doesn't feel complete: the viewOf and hasTable properties of DimensionElement really belong on the record storage classes instead (and the new implementation of DatabaseDimensionElement.viewOf reflects that by peeking at what are supposed to be - from its perspective - opaque keyword arguments to forward). But fixing that is best put on another commit (and maybe not done now), as this commit is plenty big already.
DM-27373: Improve error message for missing butler_attributes table
DM-27251: add governor dimensions and use them to check queries for consistency and completeness
This insulates existing repos from changes in the default managers in daf_butler, as long as we don't delete the old managers. That will let us introduce schema changes behind new manager classes, and only upgrade old repos when desired, not when the defaults change.
The dataset record storage classes were already being passed a DimensionRecordStorageManager at construction, so passing the DimensionUniverse to their methods (mostly added very recently on DM-27251) was just unnecessary. We'll soon want to pass DimensionRecordStorageManager to CollectionManager at construction for other reasons, so I've done that here to use the same pattern of passing dimensions once early across the board.
DM-27253: reimplement commonSkyPix overlaps with more general tables
DM-27390: replace DimensionGraph.{encode,decode}
No-one thinks this is useful and it's not clear how to populate it.
DM-24660: minor changes for future repo backwards compatibility support
This also removes those restrictions from CHAINED collection definitions. The *Restriction classes in wildcards.py are still present, as they may still be useful in the future. "May be useful" is not a good reason to keep them around, given that they'll always be in the git history, but I'll remove them on a separate commit.
DM-27409: Remove seeing from visit record
DM-27397: drop support for dataset/governor restrictions in collection searches
DM-24939: Record which dataset types and governor values are in collections.
Originally merged via GitHub #417 without rebasing; now manually merged after.
TallJimbo
force-pushed
the
tickets/DM-27033
branch
from
November 4, 2020 17:19
f187bc7
to
659bafd
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.