Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-13840: Prepare Butler for composite work to begin #21

Merged
merged 26 commits into from Mar 20, 2018

Conversation

pschella
Copy link
Collaborator

Please ignore changes to default_schema.yaml which are mostly cherry picked from @TallJimbo DM-12620 and will be rebased out later.

datasetType : `DatasetType` instance or `str`
The `DatasetType`.
dataId : `dict`
An identifier with `DataUnit` names and values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the class here because the keys are names that convert to instances of DataUnit classes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. They are names of the tables. (e.g. {"camera" : "HSC", "visit" : 3}).

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. How are you expecting storage class name to be mapped to an actual class? Is that still expected to be done inside datastore?


Returns
-------
inMemoryDataset : `InMemoryDataset`
The requested `Dataset`.
`DatasetRef`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a variable name since Returns sections are meant to be identical to Parameters sections. I won't repeat the comment in each case.

refs = [self.registry.find(self.run.collection, ref) for ref in refs]
for ref in self.registry.disassociate(self.run.collection, refs, remove=True):
self.datastore.remove(ref.uri)
datasetType = self.registry.getDatasetType(datasetType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getDatasetType only takes a str but this method is documented to provide either a str or a DatasetType.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably be pass-through (or perhaps fill in details) for DatasetType instances.

@@ -96,51 +96,50 @@ class DatasetRef(object):
----------
datasetType : `DatasetType`
The `DatasetType` for this `Dataset`.
units : `dict`
dataId : `dict`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation for a dataId is more explicit than those elsewhere.


@property
def assembler(self):
"""Fully-qualified name of an importable Assembler object that can be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that assemblers are classes with assemble and disassemble methods. You have to store the assembler class name, then to assemble you create an instance and run the assemble method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not according to https://confluence.lsstcorp.org/display/DM/Gen3+Butler+Composites+Design
But I'm perfectly happy for them to be so. This is part of the composite work to be done on a different ticket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you asked me to combine free functions into classes the code got significantly cleaner. I'll be surprised if we gain by pulling everything apart again. I was really happy with the way assembler/disassembler turned out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit that I hadn't taken that confluence page as gospel. I thought it was guiding principles so I haven't gone into edit it with my thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code on the confluence page is absolutely intended as just pseudocode. You're both very much encouraged to actively rethink all of it (the code parts, that is; I hope the conceptual stuff will actually stick this time around).

associated, also used as a human-readable name for this Run.
environment : `int`
A Dataset that contains a description of
the software environment (e.g. versions) used for this Run.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ie a Dataset that you can butler.get() using this integer? The contents of which are arbitrary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case probably DatasetRef instances. But I'm not sure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they should be DatasetRef.

A Dataset that contains a description of
the software environment (e.g. versions) used for this Run.
pipeline : `int`
A Dataset that contains a serialization of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for environment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer ;)

run = registry.makeRun(collection="test")
ref = registry.addDataset(datasetType, dataId={"camera": "DummyCam"}, run=run)
self.assertIsNone(ref.assembler)
assembler = "some.fully.qualified.assembler" # TODO replace by actual dummy assember once implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have assemblers that should work for your testing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that (re)designing assemblers/dissasemblers would be part of the Datastore work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't foreseeing any of the assembler/disassembler classes having to change. They all seemed to be entirely consistent with your redesign.

TallJimbo and others added 26 commits March 20, 2018 11:55
This may be re-introduced at some later point, but now only
serves to complicate the initial implementation and it isn't
clear if it is needed.
Has placeholders for `put` and `get` but nothing else.
This is not enforced by the schema, but is required
for getRun by collection to make sense.
It is also needed for the Butler constructor.
We may want to revisit this later.
This relies on a not yet implemented Datastore interface for get and
put, so the associated test is an expected failure. But it should
unblock work on said interfaces.
@pschella pschella merged commit 0682c64 into master Mar 20, 2018
@ktlim ktlim deleted the tickets/DM-13840 branch August 25, 2018 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants