New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-15365: Move SqlRegistry API back down into Registry #71
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay. That's a large API surface. I have plenty of minor comments but otherwise it's good to go.
@abstractmethod | ||
@transactional | ||
def associate(self, collection, refs): | ||
r"""Add existing Datasets to a Collection, possibly creating the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the r
is needed here since I don't see any back slashes.
raise NotImplementedError("Must be implemented by subclass") | ||
|
||
@abstractmethod | ||
@transactional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This @transactional
doesn't do anything right? Since this method is never called anyway and no-one ever does super()
to call it. I assume it's more documenting that this method should be transactional in the derived classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, transactionality is (or is not, depending on what we want) part of the API.
@transactional | ||
def registerDatasetType(self, datasetType): | ||
""" | ||
Add a new `DatasetType` to the SqlRegistry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere SqlRegistry
should be Registry
.
Raises | ||
------ | ||
ValueError | ||
If `run` already exists, but is not identical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use double backticks for run
.
def transfer(self, src, expr, collection): | ||
r"""Transfer contents from a source `SqlRegistry`, limited to those | ||
reachable from the Datasets identified by the expression `expr`, | ||
into this `SqlRegistry` and collection them with a Collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and collection them with a Collection
Needs some work on the grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to associate
.
An expression that limits the `DataUnit`\ s and (indirectly) | ||
the Datasets transferred. | ||
collection : `str` | ||
An additional Collection collection assigned to the newly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Collection collection"?
An additional Collection collection assigned to the newly | ||
imported Datasets. | ||
""" | ||
self.import_(src.export(expr), collection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This abstract method isn't abstract. It does something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An abstract method can have an implementation. E.g. convenient place for a default. Now here this was not the intention of course...
def _validateDataId(self, datasetType, dataId): | ||
"""Check if a dataId is valid for a particular `DatasetType`. | ||
|
||
TODO move this function to some other place once DataUnit relations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use same TODO syntax as for _isValidDatasetType
.
this Registry. | ||
@transactional | ||
def subset(self, collection, expr, datasetTypes): | ||
r"""Create a new `Collection` by subsetting an existing one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again there is no Collection
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There used to be ;) Will fix.
|
||
@transactional | ||
def merge(self, outputCollection, inputCollections): | ||
r"""Create a new Collection from a series of existing ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not need r"""
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made lots of doc and API-edge-case comments; we may want to spin at least some of the latter into new tickets.
I also wonder if we should just remove some of the APIs that we haven't implemented at all yet, pending a use case that clarifies how they ought to work in detail.
@@ -122,6 +124,10 @@ def __init__(self, registryConfig, schemaConfig=None, create=False): | |||
self.config = registryConfig | |||
self._pixelization = None | |||
|
|||
@contextlib.contextmanager | |||
def transaction(self): | |||
raise NotImplementedError("Must be implemented by subclass") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should document what subclasses should implement it to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of whether derived classes are required to support transactions, I think I'd say that they're required to support the transaction interface so one can write sometimes-transaction-safe registry-generic code, but they don't actually have to be transactional. A dummy transaction implementation might be something we'd want to put in the base class.
@transactional | ||
def registerDatasetType(self, datasetType): | ||
""" | ||
Add a new `DatasetType` to the SqlRegistry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should document up here (not necessarily in the one-line docstring, but before Returns
) that it's not an error to register the same DatasetType
twice.
@abstractmethod | ||
@transactional | ||
def addDataset(self, datasetType, dataId, run, producer=None): | ||
"""Add a Dataset to a Collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be better to say "Adds a Dataset entry to the Registry". In fact, if the run.collection is None
(which I think is possible but difficult to achieve) it probably should add the Dataset without adding it to any collection.
------ | ||
ValueError | ||
If a Dataset with the given `DatasetRef` already exists in the | ||
given Collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also raises something if you violate a foreign key constraint by using nonexistent DataUnit values in the data ID. Should we have a standard exception for that, too? Or perhaps just document that the data ID values must already have been registered? (I suppose the Python equivalent of "undefined behavior" is "undefined exception".)
|
||
@abstractmethod | ||
def getDataset(self, id): | ||
"""Retrieve an Dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"an" -> "a"; probably say "Dataset entry" so there's no confusion about InMemoryDatasets.
"""Record the given `DatasetRef` as an actual (not just predicted) | ||
input of the given `Quantum`. | ||
|
||
This updates both the `SqlRegistry`"s `Quantum` table and the Python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SqlRegistry
.
ValueError | ||
If ``ref`` is not already in the predicted inputs list. | ||
KeyError | ||
If ``ref`` is not a predicted consumer for ``quantum``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should say ref
and quantum
should be reversed here - but then I don't understand how this differs from the ValueError
condition.
|
||
@abstractmethod | ||
def export(self, expr): | ||
"""Export contents of the `SqlRegistry`, limited to those reachable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SqlRegistry
@transactional | ||
def import_(self, tables, collection): | ||
"""Import (previously exported) contents into the (possibly empty) | ||
`SqlRegistry`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SqlRegistry
datasetTable.c.dataset_type_name == datasetType.name, | ||
datasetCollectionTable.c.collection == collection, | ||
dataIdExpression))).fetchone() | ||
# TODO update unit values and add Run, Quantum and assembler? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have some options in the public API controlling how much we do here. Let's try just having a full=False
option, where calling with full=True
would add optional DataUnit link values (e.g. physical_filter
when visit
is present), Quantums, and (until we get rid of it) assembler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but probably best done on a new ticket.
aaa102b
to
5ed54a0
Compare
Also cleanup (Sql)Registry documentation.
5ed54a0
to
0740eb9
Compare
No description provided.