Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-27033: Development branch for schema changes #397

Merged
merged 85 commits into from Nov 4, 2020
Merged

Conversation

@timj timj force-pushed the tickets/DM-27033 branch 3 times, most recently from 52edc3a to 11773d0 Compare October 22, 2020 16:03
@timj timj force-pushed the tickets/DM-27033 branch 4 times, most recently from d584c15 to 98729e4 Compare October 30, 2020 16:41
@TallJimbo TallJimbo force-pushed the tickets/DM-27033 branch 3 times, most recently from 8418ee0 to 5e4dd80 Compare November 4, 2020 12:44
TallJimbo and others added 19 commits November 4, 2020 12:19
The class was already documented as immutable, so the lack of
protection seems to have been an oversight.  This required some
workarounds and type-checking fixes, but that's to be expected given
the unusual and highly dynamic nature of the class.
This allows a dependency order of

    core._topology -> core.timespan -> core.dimensions

which will allow DatabaseTimespanRepresentation to depend on
TopologicalExtentDatabaseRepresentation, as intended.
- @immutable itself now provides __copy__, and its documentation
recommends __deepcopy__ be provided sometimes;

- @immutable documents that it is inherited, and I've removed the
decorator from some subclasses;

- we now prefer __init__ and __reduce__ for pickling (but continue to
use __new__ and __getnewargs__ for types that need __new__ for other
reasons).
Dimensions are now stored in registry attributes table with a key
"config:dimensions.yaml", serialized as YAML data. This change means
that we cannot use SQLite in-memory datqabase for tests that make
multiple connections (multiple Butler instances) so I had to update many
unit tests to use on-disk database instead. With this commit dimensions
are still part of full butler config but they will be separated in next
commits.
...instead of DatabaseTimespanRepresentation.

This more naturally corresponds to its soon-to-be base class,
TopologicalExtentDatabaseRepresentation.
DimensionConfig is now a separate optional argument to `makeRepo`, by
default it is loaded from `dimensions.yaml`. ButlerConfig does not
include dimensions any more. `butler` CLI adds an option
--dimension-config. `dimensions.yaml` dropped top-level `dimensions`
key, and this is exactly how it is stored in database.

DimensionConfig still inherits from ConfigSubset, this is mostly for
convenience of finding defaults, though defaults loading behavior has
been overriden, it only loads first file that it finds.
Must have been an errant keystroke long ago.
This serves two purposes:

- it avoids the module names being publicly exported as symbols by
  higher-level __init__.py files (this is already the practice used in
  'registry' and other newer daf_butler subpackages);

- because elements.py is _not_ being renamed yet, it makes the future
  commit on which it is not just renamed but rewritten easier to
  understand, because it'll look like a totally new file.

This commit is _just_ moving files and changing imports accordingly.
This is a three-pronged rewrite of a sizeable chunk of the dimensions
codebase, inspired by DM-26336 prototyping:

 - There are now ABCs for DimensionElements (in _elements.py), with
   concrete implementations in _skypix.py and standard.py.  This
   provides a better separation of interface and implementation and
   sets the stage for more to come later.  We also now have an
   explicit interface (DimensionCombination) for DimensionElements
   that are not Dimensions, instead of just relying on the (previouly
   concrete) base class for that (though this doesn't do much yet).
   The interfaces of the DimensionElement classes are essentially
   unchanged by this, though many regular attributes have become
   properties.

 - DimensionUniverse construction now goes through a builder/visitor
   system (see the new construction.py module) that separates generic
   construction concerns from the config representation (useful for
   DM-26407, hope) and the post-construction state of the objects.  In
   many cases, construction is also simplified (or at least better
   organized) by deferring calculation of some attributes until first
   use, using the new cached_getter decorator to avoid recomputing
   them.

 - The system for describing the spatial and temporal attributes of
   DimensionElements has changed, with a number of new classes in
   _topology.py.  This removes the problematic (i.e. confusing and
   cyclical-dependency-inducing) fact that an element's spatial and
   temporal attributes were references to either self or some other
   element in order to model a priority order for related spatial
   elements (like tract and patch, or visit and
   visit_detector_region).  Instead, those attributes now point to new
   TopologicalFamily objects that explicitly hold their member
   priority order.  The new system will also play a role in allowing
   non-dimension spatial and temporal entities in queries in the near
   future (namely datasets in CALIBRATION collections, which are
   temporal via their validity ranges).

Only the last of these is expected to affect code outside the
dimensions package, and it's probable that even those won't affect
anything outside daf_butler.  So while this is pretty big change
(especially for one commit), it's also a highly localized one.
This involves adding a new concrete NamedValueAbstractSet
implementation (NameMappingSetView), which is now a base class
of NamedValueSet - they're both the same under the hood right now,
except that NamedValueSet conceptually owns its internal dictionary
and considers it an implementation detail, while NameMappingSetView is
explicitly just a view of an arbitrary mapping.
TallJimbo and others added 27 commits November 4, 2020 12:19
In older gen3 schema we did not have butler_attributes table, this small
update tries to detect this situation by catching exceptions when
reading from that table. This is a temporary measure to help people
during migration period and fix can be removed once things become
stable.
DimensionElement is now a factory for its record storage class, via
a fully-qualified class name in configuration.

This is *maybe* a step backward in terms of dependencies - it gives
DimensionElement a (weak) dependency on classes we otherwise consider
"downstream" in the registry subpackage, but it's a big step forward
making those record storage classes more fully responsible for things
they should have always been responsible for, like caching, and for
avoiding a lot of smelly isinstance checks in record storage in the
future as the database representations of different kinds of dimensions
diverge further.  It also gives a lot more flexibility in configuring
dimension storage, because we can pass arbitrary kwargs directly from
config.

This transformation also doesn't feel complete: the viewOf and hasTable
properties of DimensionElement really belong on the record storage
classes instead (and the new implementation of
DatabaseDimensionElement.viewOf reflects that by peeking at what are
supposed to be - from its perspective - opaque keyword arguments to
forward).  But fixing that is best put on another commit (and maybe not
done now), as this commit is plenty big already.
DM-27373: Improve error message for missing butler_attributes table
DM-27251: add governor dimensions and use them to check queries for consistency and completeness
This insulates existing repos from changes in the default managers
in daf_butler, as long as we don't delete the old managers.  That will
let us introduce schema changes behind new manager classes, and only
upgrade old repos when desired, not when the defaults change.
The dataset record storage classes were already being passed
a DimensionRecordStorageManager at construction, so passing the
DimensionUniverse to their methods (mostly added very recently on
DM-27251) was just unnecessary.  We'll soon want to pass
DimensionRecordStorageManager to CollectionManager at construction for
other reasons, so I've done that here to use the same pattern of
passing dimensions once early across the board.
DM-27253: reimplement commonSkyPix overlaps with more general tables
DM-27390: replace DimensionGraph.{encode,decode}
No-one thinks this is useful and it's not clear how to populate it.
DM-24660: minor changes for future repo backwards compatibility support
This also removes those restrictions from CHAINED collection
definitions.

The *Restriction classes in wildcards.py are still present, as they may
still be useful in the future.  "May be useful" is not a good reason
to keep them around, given that they'll always be in the git history,
but I'll remove them on a separate commit.
DM-27409: Remove seeing from visit record
DM-27397: drop support for dataset/governor restrictions in collection searches
DM-24939: Record which dataset types and governor values are in collections.
Originally merged via GitHub #417 without rebasing; now manually merged
after.
@TallJimbo TallJimbo merged commit b44916f into master Nov 4, 2020
@TallJimbo TallJimbo deleted the tickets/DM-27033 branch November 4, 2020 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants