New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-28698: clean up image-like Formatter inheritance relationships #377
Conversation
This moves the logic that does FITS formatter parameter validation to a helper property, and removes the unused parameter argument from several methods.
This splits the FitsImageFormatterBase hierarchy into two branches: FitsRawFormatterBase and its per-instrument concrete subclasses: These do not need write support and don't involve afw's "reader" classes, but they do need to be able to construct other components from (patched) header metadata and strip that metadata in the process. StandardFitsImageFormatterBase (new) and Fits*Formatter classes here: These do need write support (including compression) but can delegate essentially all reading (including metadata stripping) to the afw "reader" classes (we just need to work out which reader method each component dispatches to). This revealed a small problem in some test code that was trying to get compression-related headers that we'd normally want to strip (the new stripping is more aggressive than the old one), but the fix is easy.
Base class is already an ABC.
These override hooks are never going to be used, and they will complicate amplifier subimage handling in the future.
Previously this was just silently ignored.
This will reduce code duplication with the obs_lsst subclass, and probably avoid new code duplication on DM-29370.
8707879
to
9a3a195
Compare
parameters = super().checked_parameters | ||
if "bbox" in parameters: | ||
raise TypeError("Raw formatters do not support reading arbitrary subimages, as some " | ||
"implementations may be assembled on-the-fly.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the right way to disable a storage class's parameter in a particular formatter, because there's also that unsupportedParameters
class attribute.
But IIRC, that attribute is actually used by assembly/disassembly to know whether a component StorageClass accepts parameters passed when loading the parent, and hence I shouldn't actually remove bbox
(and origin
) there. If that's correct, maybe we should remove that attribute from the base Formatter
and have that logic look at component StorageClass components instead? Is there some subtle reason why we can't do that either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boundary between StorageClassDelegate and Formatter is tricky because in theory the delegate has to be able to process parameters if the formatter says it doesn't understand them. If you say None
for unsupported parameters it's saying that the formatter will handle everything. I think there is code in datastore that complains about parameters that are not understood at all at the storage class level. The delegate still has to understand all of them anyhow for a in-memory datastore to be able to handle parameters.
|
||
def read(self, component=None): | ||
# Docstring inherited. | ||
if self.fileDescriptor.readStorageClass != self.fileDescriptor.storageClass: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is another thing I'd like to see move into the base Formatter
or FileDatastore
on DM-26658, letting concrete Formatters
just dispatch on what component
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaning up the read and write methods to know they really are files is definitely on that ticket, but this seems like it should instead be a new method like if self.expected_component():
or something. The only way to remove the if statement here completely would be to say that the new Formatter class makes a distinction between read and readComponent and makes them both part of the ABC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking along the lines of "a concrete formatter can assume that component is None
if and only if readStorageClass == storageClass
". Or is that not actually a guarantee we can make?
parameters = self.fileDescriptor.parameters | ||
if parameters is None: | ||
parameters = {} | ||
self.fileDescriptor.storageClass.validateParameters(parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something the base Formatter
or FileDatastore
(and/or maybe assemblers) could guarantee, post-DM-26658, so concrete formatters don't have to do it themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Datastores already do call validateParameters. Doesn't look like assemblers do but that's not relevant for a formatter anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay. I have some confusion over the header stripping since I can't square the comments with my version of reality.
parameters = super().checked_parameters | ||
if "bbox" in parameters: | ||
raise TypeError("Raw formatters do not support reading arbitrary subimages, as some " | ||
"implementations may be assembled on-the-fly.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boundary between StorageClassDelegate and Formatter is tricky because in theory the delegate has to be able to process parameters if the formatter says it doesn't understand them. If you say None
for unsupported parameters it's saying that the formatter will handle everything. I think there is code in datastore that complains about parameters that are not understood at all at the storage class level. The delegate still has to understand all of them anyhow for a in-memory datastore to be able to handle parameters.
def checked_parameters(self): | ||
# Docstring inherited. | ||
parameters = super().checked_parameters | ||
if "bbox" in parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we said bbox
is an "unsupported parameter" then the storage class delegate would apply the bbox after read. Would we not want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting; that's a behavior possibility I hadn't considered. It's tempting because it does sort of give the user what they asked for, but in this case, loading the whole image and then taking a subimage is so contrary to the resource usage expectation of the user that it's probably safer to consider that an invalid implementation and just prevent code from trying it (I'm envisioning some huge processing job falling over because it used too much memory, and nobody being able to figure out why). So I think the code as-is yields the least bad behavior we can provide (an exception), but I'm not sure that's a general statement about parameters.
parameters = self.fileDescriptor.parameters | ||
if parameters is None: | ||
parameters = {} | ||
self.fileDescriptor.storageClass.validateParameters(parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Datastores already do call validateParameters. Doesn't look like assemblers do but that's not relevant for a formatter anyhow.
|
||
def read(self, component=None): | ||
# Docstring inherited. | ||
if self.fileDescriptor.readStorageClass != self.fileDescriptor.storageClass: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaning up the read and write methods to know they really are files is definitely on that ticket, but this seems like it should instead be a new method like if self.expected_component():
or something. The only way to remove the if statement here completely would be to say that the new Formatter class makes a distinction between read and readComponent and makes them both part of the ABC.
obj : component-dependent | ||
In-memory component object. | ||
@cached_getter | ||
def reader(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because I don't know whether to put leading underscores on "protected" things. If we do want leading underscores on this, I think they belong on every method here other than read
and write
and the handful of class attributes defined by Formatter
itself; every other method should only ever be called by other methods in the same inheritance tree. I lean slightly towards no underscores on protected things here, just because I'd prefer not to have add obs_subaru and obs_decam branches for this ticket, but if we go that way, I should probably at least rename _readerClass
-> readerClass
(or ReaderClass
) for consistency. But I don't have a general preference or a strong one.
md = lsst.afw.fits.readMetadata(self.fileDescriptor.location.path) | ||
fix_header(md) | ||
return md | ||
|
||
def stripMetadata(self): | ||
"""Remove metadata entries that are parsed into components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that comment below is wrong. Making the VisitInfo this way will not strip the headers at all. The only headers being stripped here are the WCS headers in the createSkyWcsFromMetadata line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually know, myself; the comment is prexisting and the logic in this case should be unchanged from before (just in a new spot). I'm happy to adjust the comment if you're sure making the ObservationInfo
doesn't stripping anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObservationInfo never strips. It records which headers it used. If you make VisitInfo from a header you can ask it to strip or not strip. We make VisitInfo directly from the ObservationInfo and not the header so stripping the header is impossible.
return full | ||
|
||
def readRawHeaderWcs(self, parameters=None): | ||
def readRawHeaderWcs(self): | ||
"""Read the SkyWcs stored in the un-modified raw FITS WCS header keys. | ||
""" | ||
return lsst.afw.geom.makeSkyWcs(lsst.afw.fits.readMetadata(self.fileDescriptor.location.path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit annoying that we have to read the header twice just because we might have stripped the header by calling stripMetadata and dropping the resulting WCS object on the floor. Also note that if we are going to read the header again we have to apply fix_header here to ensure that we have applied corrections that might affect the WCS. Can we guarantee that this routine is only ever called if stripMetadata has not been called (because it's only called if a component is being read)? Should we cache the stripped and unstripped headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I moved around but tried not to change, but now that you mention it, I don't see this method called anywhere, even to satisfy a component. I am inclined to delete it and see if Jenkins complains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method exists because in gen2 we have a butler component to return the header wcs. We haven't enabled that feature in the gen3 as a derived component (because initially we hadn't got derived components). I think the component is called header_wcs
in gen2. RFC-616 I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be beyond relevance, but we had a previous discussion about this on DM-24024, specifically on this and the following three comments: https://jira.lsstcorp.org/browse/DM-24024?focusedCommentId=261332&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-261332
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurenam, I was actually going to ask if you thought you'd notice if Gen3 never got around to providing raw.header_wcs
. I'm currently thinking that if nobody has noticed that we don't have it yet, maybe we never should, as the kind of problems it was helpful to debug are just not problems anymore, now that Gen3 is more consistent about the raw WCS being the right one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m inclined to agree with your current line of thinking...
info.setVisitInfo(self.makeVisitInfo()) | ||
info.setWcs(self.makeWcs(info.getVisitInfo(), info.getDetector())) | ||
# We don't need to call stripMetadata() here because it has already | ||
# been stripped during creation of the ObservationInfo, WCS, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObservationInfo shouldn't be stripping anything. WCS is the only thing that does and I think only if we call the read-from-header variant and not the read-from-visitinfo variant. Strip metadata wastes its time and calls makeVisitInfo but does strip the header WCS -- maybe I'm getting confused but can you check that this header has been stripped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll drop ObservationInfo
from the comment as per the other thread.
The default makeWcs
implementation definitely does strip, because it starts by making the WCS from metadata first - both as a fallback in case there's no VisitInfo
for the boresight+cameraGeom one, and to ensure that the stripping happens. But I should at least go add a note to the documentation that this is an expected side effect of any reimplementation of makeWcs
; it's unfortunate that we have to rely on side effects at all, but given that the real work is happening in an afw routine via side effects, I don't think we have much choice.
This seems to have been added to support a Gen3 equivalent of Gen2's raw_header_wcs dataset. But there is no such equivalent in Gen3, and no one seems to have noticed its absence. This is probably because Gen3 has always been consistent about using a boresight+cameraGeom WCS for raws, and debugging when that happened (or didn't) was what made the Gen2 dataset useful.
The FITS formatter methods and attributes are almost all conceptually "protected" rather than public or private; in the absence of a general convention for how to style those, and the local precedent that the vast majority of them look like public methods/attributes here, I'm just converting the last holdout (_readerClass) to look public as well, while also adopting our convention of using a leading capital for class attributes that represent types (ReaderClass).
No description provided.