New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-15189: core butler support for raw data ingest #59
Conversation
1e6c30e
to
d6b4931
Compare
d6b4931
to
7628b5a
Compare
Most of the code originally here was moved to DM-15212, so some earlier comments may no longer make sense. |
93dd568
to
82378d4
Compare
f2aae5e
to
7aa301e
Compare
The full ticket is finally ready for review. In this package (as well as obs_subaru), I recommend looking at each commit separately; this package in particular just contains a lot of miscellaneous improvements necessary for making higher-level code in obs_base and obs_subaru work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good. I have quite a few comments.
ImageF: | ||
<<: *Image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Image
only has one field and that field is being over-ridden, I'm not sure this line does anything (and the other similar changes). Are you worried that Image
might get extra fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it doesn't actually do anything; I just felt that using <<:
to indicate inheritance even when it was a no-op was a useful bit of code-as-documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not too late to add a parent:
keyword and get proper pythonic inheritance for StorageClasses (I think I can sort out print through of those class properties).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see what happens with DM-15238 first.
@@ -5,7 +5,7 @@ datastore: | |||
table: PosixDatastoreRecords | |||
create: true | |||
templates: | |||
default: "{datasetType}/{tract:?}/{patch:?}/{filter:?}/{camera:?}_{visit:?}" | |||
default: "{collection}/{datasetType}/{tract:?}/{patch:?}/{filter:?}/{camera:?}_{visit:?}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment on defaulting filenames: I'm a bit worried that with this in the base definition it's not possible for people to remove defaulting in their own datastores. Maybe I should allow default: None
to disable defaulting.
config/storageClasses.yaml
Outdated
apCorrMap: TablePersistableApCorr | ||
coaddInputs: TablePersistableCoaddInputs | ||
transmissionCurve: TablePersistableTransmissionCurve | ||
metadata: PropertyList | ||
ExposureComposite: &ExposureComposite | ||
<<: *Exposure | ||
assembler: lsst.daf.butler.assemblers.exposureAssembler.ExposureAssembler | ||
ExposureCompositeF: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change won't be needed once #76 is merged because ExposureCompositeF
becomes a simple label that allows you to control concrete/virtual at the butler level.
apCorrMap: TablePersistableApCorr | ||
coaddInputs: TablePersistableCoaddInputs | ||
transmissionCurve: TablePersistableTransmissionCurve | ||
metadata: PropertyList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getMetadata
is documented to return a PropertySet
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we discussed this on the other ticket. I'll convert to PropertySet
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually left it alone in #76 since it worked regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I'll stick with PropertyList
here, too, then.
python/lsst/daf/butler/butler.py
Outdated
@@ -266,6 +266,9 @@ def getDirect(self, ref): | |||
---------- | |||
ref : `DatasetRef` | |||
Reference to an already stored dataset. | |||
parameters : `dict` | |||
Additional StorageClass-defined options to control reading, | |||
typically used to efficiently only a subset of the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing word between "efficiently" and "only". Maybe I should work on parameters next (inMemoryDatastore will mess up with any tests where you use parameters).
if field_name in ("run", "collection"): | ||
usedRunOrCollection = True | ||
if optional: | ||
raise KeyError("'run' and 'collection' may not be optional.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you allowed to specify an optional run
if collection
is mandatory? (not that it's possible for run
to be optional of course). Maybe the error here should be "can not be optional" ("may" sounds like you might be able to be optional). Do you have a test for optional collection
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on "can be optional". Will add a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually, yeah, I suppose you could make one optional if the other is mandatory. Probably best to just remove this particular check, as there is another one to verify that at least one of those is present.
tiny = self.readFull(fileDescriptor, parameters) | ||
return fileDescriptor.storageClass.assembler().getComponent(tiny, component) | ||
|
||
def readFull(self, fileDescriptor, parameters=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are defining the parameters for get
to be any parameters understood by the Exposure constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at the moment - just bbox
and origin
. I suppose we need a place to document this at the StorageClass level. I'm open to ideas.
__all__ = ("FitsRawFormatterBase",) | ||
|
||
|
||
class FitsRawFormatterBase(FitsExposureFormatter, metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the methods here need parameters and returns sections to the numpydocs.
@@ -33,6 +33,53 @@ class FitsExposureFormatter(Formatter): | |||
|
|||
parameters = frozenset(("bbox", "origin")) | |||
|
|||
def readImageComponent(self, fileDescriptor, component): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add parameters/returns sections to docstrings.
A dictionary containing all fields in the Exposure table. | ||
""" | ||
avg = visitInfo.getDate() | ||
begin = DateTime(int(avg.nsecs(DateTime.TAI) - 0.5E9*visitInfo.getExposureTime()), DateTime.TAI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean we can't get the value of DATE-OBS
from the object but need to guess it by subtracting half the exposure time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, yes. I don't know much about VisitInfo
. I stole this recipe from obs.base.makeRawVisitInfo.offsetDate
, and I'd like to replace both that and VisitInfo
itself, but this seemed like an easy way to get the correct (or at least consistent with Gen2) answer.
7aa301e
to
255f119
Compare
I think I've addressed all comments (and rebased on latest master). One question from that rebase (which I should have asked on a previous review, probably): do we still need the |
Re ExposureComposite: I think we should move it to the test configs so that the tests can try a virtual and concrete composite. |
I gave this a quick try on this ticket, but it's not easy because aliases apparently can't be used outside the file where they're defined (so you can't easily make |
I was worrying about that cross-yaml file inheritance this morning. Would trivially be fixed if we had true python inheritance as well. Leave ExposureComposite in there for now -- I really do need to make a test for Exposure that uses virtual and one that uses concrete. |
These are useful for raw data ingest, too, so move them out of the convert package. New home still probably isn't their permanent one.
This change didn't turn out to be necessary for this ticket, but it seemed worth preserving.
The duplication of all Exposure components in order to refine the types of some is unfortunate, but should go away after DM-15238.
In retrospect, this needs to be modified to produce an Exposure, not a DecoratedImage, and we can safely remove the write method.
This can be useful when you don't want an exception to start a rollback because you're about to catch it.
4404687
to
c88c322
Compare
@timj, this is by no means done yet, but it occurred to me that we might be doing conflicting things with
Datastores
, so I wanted to get this out there in case you'd like me to spin it off to a new issue and merge sooner. I also need to write some tests for it, but since you've just refactoredDatastore
tests I definitely didn't want to do that before rebasing on that work.