DM-13840: Prepare Butler for composite work to begin #21

pschella · 2018-03-19T16:19:34Z

Please ignore the changes to default_schema.yaml, which are mostly cherry-picked from @TallJimbo's DM-12620 and will be rebased out later.

timj · 2018-03-19T20:38:29Z

python/lsst/daf/butler/butler.py

+        datasetType : `DatasetType` instance or `str`
+            The `DatasetType`.
+        dataId : `dict`
+            An identifier with `DataUnit` names and values.


Are you referring to the class here because the keys are names that convert to instances of DataUnit classes?

No. They are the names of the tables (e.g. {"camera": "HSC", "visit": 3}).

timj

Looks okay. How are you expecting storage class name to be mapped to an actual class? Is that still expected to be done inside datastore?

timj · 2018-03-19T20:42:11Z

python/lsst/daf/butler/butler.py


        Returns
        -------
-        inMemoryDataset : `InMemoryDataset`
-            The requested `Dataset`.
+        `DatasetRef`


Needs a variable name since Returns sections are meant to be identical to Parameters sections. I won't repeat the comment in each case.
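For illustration, a numpydoc-style docstring in which the Returns entry carries a name, mirroring the Parameters entries, might look like the stub below. The return name `ref` and its description are assumptions chosen for the example; the function is a placeholder, not the actual `Butler` method.

```python
def get(datasetType, dataId):
    """Illustrative stub showing the docstring layout only.

    Parameters
    ----------
    datasetType : `DatasetType` instance or `str`
        The `DatasetType`.
    dataId : `dict`
        An identifier with `DataUnit` names and values.

    Returns
    -------
    ref : `DatasetRef`
        Hypothetical name and description; the point is that the entry
        has the same ``name : type`` shape as the Parameters entries.
    """
    raise NotImplementedError("docstring layout example only")
```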

timj · 2018-03-19T20:48:53Z

python/lsst/daf/butler/butler.py

-        refs = [self.registry.find(self.run.collection, ref) for ref in refs]
-        for ref in self.registry.disassociate(self.run.collection, refs, remove=True):
-            self.datastore.remove(ref.uri)
+        datasetType = self.registry.getDatasetType(datasetType)


getDatasetType only takes a str, but this method is documented to accept either a str or a DatasetType.

It should probably be pass-through (or perhaps fill in details) for DatasetType instances.
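A minimal sketch of the pass-through behaviour suggested above, assuming a registry whose `getDatasetType` only accepts a name. The `DatasetType` and `Registry` classes here are simplified stand-ins, and `registerDatasetType` and `resolveDatasetType` are hypothetical helpers introduced only for this example.

```python
class DatasetType:
    """Simplified stand-in for the real DatasetType (assumption)."""

    def __init__(self, name):
        self.name = name


class Registry:
    """Stand-in registry that looks up dataset types by name only."""

    def __init__(self):
        self._types = {}

    def registerDatasetType(self, datasetType):
        # Hypothetical helper used to populate the stand-in registry.
        self._types[datasetType.name] = datasetType

    def getDatasetType(self, name):
        # Mirrors the constraint from the review: accepts a str only.
        return self._types[name]


def resolveDatasetType(registry, datasetType):
    """Return a DatasetType, given either an instance or a name."""
    if isinstance(datasetType, DatasetType):
        # Pass instances through unchanged instead of looking them up.
        return datasetType
    return registry.getDatasetType(datasetType)
```

With this shape, callers can hand over whichever form they have, and only the `str` case touches the registry.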

timj · 2018-03-19T20:50:42Z

python/lsst/daf/butler/core/datasets.py

@@ -96,51 +96,50 @@ class DatasetRef(object):
    ----------
    datasetType : `DatasetType`
        The `DatasetType` for this `Dataset`.
-    units : `dict`
+    dataId : `dict`


This documentation for a dataId is more explicit than those elsewhere.

timj · 2018-03-19T20:53:07Z

python/lsst/daf/butler/core/datasets.py

+
+    @property
+    def assembler(self):
+        """Fully-qualified name of an importable Assembler object that can be


Remember that assemblers are classes with assemble and disassemble methods. You have to store the assembler class name, then to assemble you create an instance and run the assemble method.

Not according to https://confluence.lsstcorp.org/display/DM/Gen3+Butler+Composites+Design
But I'm perfectly happy for them to be so. This is part of the composite work to be done on a different ticket.

When you asked me to combine free functions into classes the code got significantly cleaner. I'll be surprised if we gain by pulling everything apart again. I was really happy with the way assembler/disassembler turned out.

I must admit that I hadn't taken that confluence page as gospel. I thought it was guiding principles, so I haven't gone in to edit it with my thoughts.

The code on the confluence page is absolutely intended as just pseudocode. You're both very much encouraged to actively rethink all of it (the code parts, that is; I hope the conceptual stuff will actually stick this time around).
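The class-based approach described in this thread (store the assembler's class name, then instantiate it and call `assemble`) could be sketched roughly as follows. `DictAssembler`, `instantiateAssembler`, and the `demo_assemblers` module name are all hypothetical and exist only for this example; the real assembler classes and their method signatures may differ.

```python
import importlib
import sys
import types


class DictAssembler:
    """Hypothetical assembler whose composite is a plain dict of components."""

    def assemble(self, components):
        # Build the composite from its named components.
        return dict(components)

    def disassemble(self, composite):
        # Split the composite back into its named components.
        return dict(composite)


# Demo only: expose the class under an importable module name so the
# fully-qualified string below can be resolved in a single file.
_demo = types.ModuleType("demo_assemblers")
_demo.DictAssembler = DictAssembler
sys.modules["demo_assemblers"] = _demo


def instantiateAssembler(qualifiedName):
    """Import the class named by ``qualifiedName`` and return an instance."""
    moduleName, _, className = qualifiedName.rpartition(".")
    module = importlib.import_module(moduleName)
    return getattr(module, className)()


assembler = instantiateAssembler("demo_assemblers.DictAssembler")
composite = assembler.assemble({"image": 1, "mask": 2})
```

Storing only the name keeps the registry entry a plain string, while the instance is created at the point where assembly actually happens.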

timj · 2018-03-19T20:57:24Z

python/lsst/daf/butler/core/run.py

+        associated, also used as a human-readable name for this Run.
+    environment : `int`
+        A Dataset that contains a description of
+        the software environment (e.g. versions) used for this Run.


I.e., a Dataset that you can butler.get() using this integer? The contents of which are arbitrary?

In that case probably DatasetRef instances. But I'm not sure.

Yes, they should be DatasetRef.

timj · 2018-03-19T20:57:44Z

python/lsst/daf/butler/core/run.py

+        A Dataset that contains a description of
+        the software environment (e.g. versions) used for this Run.
+    pipeline : `int`
+        A Dataset that contains a serialization of


Same comment as for environment.

Same answer ;)

timj · 2018-03-19T21:08:43Z

tests/test_sqlRegistry.py

+        run = registry.makeRun(collection="test")
+        ref = registry.addDataset(datasetType, dataId={"camera": "DummyCam"}, run=run)
+        self.assertIsNone(ref.assembler)
+        assembler = "some.fully.qualified.assembler"  # TODO replace by actual dummy assember once implemented


We already have assemblers that should work for your testing.

I thought that (re)designing assemblers/disassemblers would be part of the Datastore work.

I wasn't foreseeing any of the assembler/disassembler classes having to change. They all seemed to be entirely consistent with your redesign.

timj reviewed Mar 19, 2018

timj approved these changes Mar 19, 2018

TallJimbo and others added 26 commits March 20, 2018 11:55

Initial Butler metadata schema proposal.

fe11932

Remove registry_id compound primary-key component

d788988

This may be re-introduced at some later point, but now only serves to complicate the initial implementation and it isn't clear if it is needed.

Add minimal Butler API

1436363

Has placeholders for `put` and `get` but nothing else.

Add basic unittest framework for Butler

561338f

Update Run to new schema

641b32a

Add basic unittest for Run

d2b8b9c

Implement and test makeRun and getRun

943757a

Cleanup flake8 errors

ad240be

Only allow one run per collection

e69715e

This is not enforced by the schema, but is required for getRun by collection to make sense. It is also needed for the Butler constructor. We may want to revisit this later.

Enable test for Butler constructor

2ef019c

Add structure and test for non-composite Butler.put

7ea2f1c

Fix logic error in getRun and test

682ffa7

assertEquals -> assertEqual

70c9296

Update DatasetRef to new schema

db65867

Partially implement Registry.addDataset

cee46b1

Require DatasetRef to hold a DatasetType instance not str

1bcb3d1

Assembler is set with addAssembler

e0f1370

Add Registry.setAssembler.

790e3c8

Cleanup with flake8

9c368df

Add Registry.attachComponent

5e9696c

Add minimal non-composite Butler.get.

f9bdfb4

Implement Butler.getDirect and use it in test.

3d5067f

This relies on a not yet implemented Datastore interface for get and put, so the associated test is an expected failure. But it should unblock work on said interfaces.

Implement and test Registry.attachComponent

4c2afd3

Add test for Registry.setAssembler.

e3cebe3

Cleanup flake8 errors

3661840

Cleanup docstrings.

b99012f

pschella force-pushed the tickets/DM-13840 branch from 6ab6a87 to b99012f Compare March 20, 2018 16:00

pschella merged commit 0682c64 into master Mar 20, 2018

ktlim deleted the tickets/DM-13840 branch August 25, 2018 06:49
