Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-15365: Move SqlRegistry API back down into Registry #71

Merged
merged 3 commits into from Aug 10, 2018

Conversation

pschella
Copy link
Collaborator

@pschella pschella commented Aug 9, 2018

No description provided.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. That's a large API surface. I have plenty of minor comments but otherwise it's good to go.

@abstractmethod
@transactional
def associate(self, collection, refs):
r"""Add existing Datasets to a Collection, possibly creating the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the r is needed here since I don't see any back slashes.

raise NotImplementedError("Must be implemented by subclass")

@abstractmethod
@transactional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This @transactional doesn't do anything right? Since this method is never called anyway and no-one ever does super() to call it. I assume it's more documenting that this method should be transactional in the derived classes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, transactionality is (or is not, depending on what we want) part of the API.

@transactional
def registerDatasetType(self, datasetType):
"""
Add a new `DatasetType` to the SqlRegistry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere SqlRegistry should be Registry.

Raises
------
ValueError
If `run` already exists, but is not identical.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use double backticks for run.

def transfer(self, src, expr, collection):
r"""Transfer contents from a source `SqlRegistry`, limited to those
reachable from the Datasets identified by the expression `expr`,
into this `SqlRegistry` and collection them with a Collection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and collection them with a Collection

Needs some work on the grammar.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to associate.

An expression that limits the `DataUnit`\ s and (indirectly)
the Datasets transferred.
collection : `str`
An additional Collection collection assigned to the newly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Collection collection"?

An additional Collection collection assigned to the newly
imported Datasets.
"""
self.import_(src.export(expr), collection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This abstract method isn't abstract. It does something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An abstract method can have an implementation. E.g. convenient place for a default. Now here this was not the intention of course...

def _validateDataId(self, datasetType, dataId):
"""Check if a dataId is valid for a particular `DatasetType`.

TODO move this function to some other place once DataUnit relations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use same TODO syntax as for _isValidDatasetType.

this Registry.
@transactional
def subset(self, collection, expr, datasetTypes):
r"""Create a new `Collection` by subsetting an existing one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again there is no Collection class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There used to be ;) Will fix.


@transactional
def merge(self, outputCollection, inputCollections):
r"""Create a new Collection from a series of existing ones.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not need r""" here.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made lots of doc and API-edge-case comments; we may want to spin at least some of the latter into new tickets.

I also wonder if we should just remove some of the APIs that we haven't implemented at all yet, pending a use case that clarifies how they ought to work in detail.

@@ -122,6 +124,10 @@ def __init__(self, registryConfig, schemaConfig=None, create=False):
self.config = registryConfig
self._pixelization = None

@contextlib.contextmanager
def transaction(self):
raise NotImplementedError("Must be implemented by subclass")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document what subclasses should implement it to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of whether derived classes are required to support transactions, I think I'd say that they're required to support the transaction interface so one can write sometimes-transaction-safe registry-generic code, but they don't actually have to be transactional. A dummy transaction implementation might be something we'd want to put in the base class.

@transactional
def registerDatasetType(self, datasetType):
"""
Add a new `DatasetType` to the SqlRegistry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document up here (not necessarily in the one-line docstring, but before Returns) that it's not an error to register the same DatasetType twice.

@abstractmethod
@transactional
def addDataset(self, datasetType, dataId, run, producer=None):
"""Add a Dataset to a Collection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to say "Adds a Dataset entry to the Registry". In fact, if the run.collection is None (which I think is possible but difficult to achieve) it probably should add the Dataset without adding it to any collection.

------
ValueError
If a Dataset with the given `DatasetRef` already exists in the
given Collection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also raises something if you violate a foreign key constraint by using nonexistent DataUnit values in the data ID. Should we have a standard exception for that, too? Or perhaps just document that the data ID values must already have been registered? (I suppose the Python equivalent of "undefined behavior" is "undefined exception".)


@abstractmethod
def getDataset(self, id):
"""Retrieve an Dataset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"an" -> "a"; probably say "Dataset entry" so there's no confusion about InMemoryDatasets.

"""Record the given `DatasetRef` as an actual (not just predicted)
input of the given `Quantum`.

This updates both the `SqlRegistry`"s `Quantum` table and the Python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SqlRegistry.

ValueError
If ``ref`` is not already in the predicted inputs list.
KeyError
If ``ref`` is not a predicted consumer for ``quantum``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should say ref and quantum should be reversed here - but then I don't understand how this differs from the ValueError condition.


@abstractmethod
def export(self, expr):
"""Export contents of the `SqlRegistry`, limited to those reachable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SqlRegistry

@transactional
def import_(self, tables, collection):
"""Import (previously exported) contents into the (possibly empty)
`SqlRegistry`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SqlRegistry

datasetTable.c.dataset_type_name == datasetType.name,
datasetCollectionTable.c.collection == collection,
dataIdExpression))).fetchone()
# TODO update unit values and add Run, Quantum and assembler?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have some options in the public API controlling how much we do here. Let's try just having a full=False option, where calling with full=True would add optional DataUnit link values (e.g. physical_filter when visit is present), Quantums, and (until we get rid of it) assembler.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but probably best done on a new ticket.

@pschella pschella merged commit 0740eb9 into master Aug 10, 2018
@ktlim ktlim deleted the tickets/DM-15365 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants