New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-13840: Prepare Butler for composite work to begin #21
Conversation
datasetType : `DatasetType` instance or `str` | ||
The `DatasetType`. | ||
dataId : `dict` | ||
An identifier with `DataUnit` names and values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the class here because the keys are names that convert to instances of DataUnit
classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. They are names of the tables. (e.g. {"camera" : "HSC", "visit" : 3}
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay. How are you expecting storage class name to be mapped to an actual class? Is that still expected to be done inside datastore?
python/lsst/daf/butler/butler.py
Outdated
|
||
Returns | ||
------- | ||
inMemoryDataset : `InMemoryDataset` | ||
The requested `Dataset`. | ||
`DatasetRef` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a variable name since Returns
sections are meant to be identical to Parameters
sections. I won't repeat the comment in each case.
refs = [self.registry.find(self.run.collection, ref) for ref in refs] | ||
for ref in self.registry.disassociate(self.run.collection, refs, remove=True): | ||
self.datastore.remove(ref.uri) | ||
datasetType = self.registry.getDatasetType(datasetType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getDatasetType
only takes a str
but this method is documented to provide either a str
or a DatasetType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should probably be pass-through (or perhaps fill in details) for DatasetType
instances.
@@ -96,51 +96,50 @@ class DatasetRef(object): | |||
---------- | |||
datasetType : `DatasetType` | |||
The `DatasetType` for this `Dataset`. | |||
units : `dict` | |||
dataId : `dict` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation for a dataId
is more explicit than those elsewhere.
|
||
@property | ||
def assembler(self): | ||
"""Fully-qualified name of an importable Assembler object that can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember that assemblers are classes with assemble and disassemble methods. You have to store the assembler class name, then to assemble you create an instance and run the assemble
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not according to https://confluence.lsstcorp.org/display/DM/Gen3+Butler+Composites+Design
But I'm perfectly happy for them to be so. This is part of the composite work to be done on a different ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you asked me to combine free functions into classes the code got significantly cleaner. I'll be surprised if we gain by pulling everything apart again. I was really happy with the way assembler/disassembler turned out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit that I hadn't taken that confluence page as gospel. I thought it was guiding principles so I haven't gone into edit it with my thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code on the confluence page is absolutely intended as just pseudocode. You're both very much encouraged to actively rethink all of it (the code parts, that is; I hope the conceptual stuff will actually stick this time around).
python/lsst/daf/butler/core/run.py
Outdated
associated, also used as a human-readable name for this Run. | ||
environment : `int` | ||
A Dataset that contains a description of | ||
the software environment (e.g. versions) used for this Run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ie a Dataset that you can butler.get()
using this integer? The contents of which are arbitrary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case probably DatasetRef
instances. But I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they should be DatasetRef
.
python/lsst/daf/butler/core/run.py
Outdated
A Dataset that contains a description of | ||
the software environment (e.g. versions) used for this Run. | ||
pipeline : `int` | ||
A Dataset that contains a serialization of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same answer ;)
run = registry.makeRun(collection="test") | ||
ref = registry.addDataset(datasetType, dataId={"camera": "DummyCam"}, run=run) | ||
self.assertIsNone(ref.assembler) | ||
assembler = "some.fully.qualified.assembler" # TODO replace by actual dummy assember once implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have assemblers that should work for your testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that (re)designing assemblers/dissasemblers would be part of the Datastore
work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't foreseeing any of the assembler/disassembler classes having to change. They all seemed to be entirely consistent with your redesign.
This may be re-introduced at some later point, but now only serves to complicate the initial implementation and it isn't clear if it is needed.
Has placeholders for `put` and `get` but nothing else.
This is not enforced by the schema, but is required for getRun by collection to make sense. It is also needed for the Butler constructor. We may want to revisit this later.
This relies on a not yet implemented Datastore interface for get and put, so the associated test is an expected failure. But it should unblock work on said interfaces.
6ab6a87
to
b99012f
Compare
Please ignore changes to
default_schema.yaml
which are mostly cherry picked from @TallJimbo DM-12620 and will be rebased out later.