Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-28698: clean up image-like Formatter inheritance relationships #377

Merged
merged 11 commits into from May 9, 2021

Conversation

TallJimbo
Copy link
Member

No description provided.

This moves the logic that does FITS formatter parameter validation to
a helper property, and removes the unused parameter argument from
several methods.
@TallJimbo TallJimbo marked this pull request as draft May 3, 2021 22:31
This splits the FitsImageFormatterBase hierarchy into two branches:

FitsRawFormatterBase and its per-instrument concrete subclasses:
  These do not need write support and don't involve afw's "reader"
  classes, but they do need to be able to construct other components
  from (patched) header metadata and strip that metadata in the process.

StandardFitsImageFormatterBase (new) and Fits*Formatter classes here:
  These do need write support (including compression) but can delegate
  essentially all reading (including metadata stripping) to the afw
  "reader" classes (we just need to  work out which reader method each
  component dispatches to).

This revealed a small problem in some test code that was trying to get
compression-related headers that we'd normally want to strip (the new
stripping is more aggressive than the old one), but the fix is easy.
Base class is already an ABC.
These override hooks are never going to be used, and they will
complicate amplifier subimage handling in the future.
Previously this was just silently ignored.
This will reduce code duplication with the obs_lsst subclass, and
probably avoid new code duplication on DM-29370.
parameters = super().checked_parameters
if "bbox" in parameters:
raise TypeError("Raw formatters do not support reading arbitrary subimages, as some "
"implementations may be assembled on-the-fly.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right way to disable a storage class's parameter in a particular formatter, because there's also that unsupportedParameters class attribute.

But IIRC, that attribute is actually used by assembly/disassembly to know whether a component StorageClass accepts parameters passed when loading the parent, and hence I shouldn't actually remove bbox (and origin) there. If that's correct, maybe we should remove that attribute from the base Formatter and have that logic look at component StorageClass components instead? Is there some subtle reason why we can't do that either?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boundary between StorageClassDelegate and Formatter is tricky because in theory the delegate has to be able to process parameters if the formatter says it doesn't understand them. If you say None for unsupported parameters it's saying that the formatter will handle everything. I think there is code in datastore that complains about parameters that are not understood at all at the storage class level. The delegate still has to understand all of them anyhow for a in-memory datastore to be able to handle parameters.


def read(self, component=None):
# Docstring inherited.
if self.fileDescriptor.readStorageClass != self.fileDescriptor.storageClass:
Copy link
Member Author

@TallJimbo TallJimbo May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is another thing I'd like to see move into the base Formatter or FileDatastore on DM-26658, letting concrete Formatters just dispatch on what component is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning up the read and write methods to know they really are files is definitely on that ticket, but this seems like it should instead be a new method like if self.expected_component(): or something. The only way to remove the if statement here completely would be to say that the new Formatter class makes a distinction between read and readComponent and makes them both part of the ABC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking along the lines of "a concrete formatter can assume that component is None if and only if readStorageClass == storageClass". Or is that not actually a guarantee we can make?

parameters = self.fileDescriptor.parameters
if parameters is None:
parameters = {}
self.fileDescriptor.storageClass.validateParameters(parameters)
Copy link
Member Author

@TallJimbo TallJimbo May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something the base Formatter or FileDatastore (and/or maybe assemblers) could guarantee, post-DM-26658, so concrete formatters don't have to do it themselves?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Datastores already do call validateParameters. Doesn't look like assemblers do but that's not relevant for a formatter anyhow.

@TallJimbo TallJimbo marked this pull request as ready for review May 4, 2021 03:36
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. I have some confusion over the header stripping since I can't square the comments with my version of reality.

parameters = super().checked_parameters
if "bbox" in parameters:
raise TypeError("Raw formatters do not support reading arbitrary subimages, as some "
"implementations may be assembled on-the-fly.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boundary between StorageClassDelegate and Formatter is tricky because in theory the delegate has to be able to process parameters if the formatter says it doesn't understand them. If you say None for unsupported parameters it's saying that the formatter will handle everything. I think there is code in datastore that complains about parameters that are not understood at all at the storage class level. The delegate still has to understand all of them anyhow for a in-memory datastore to be able to handle parameters.

def checked_parameters(self):
# Docstring inherited.
parameters = super().checked_parameters
if "bbox" in parameters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we said bbox is an "unsupported parameter" then the storage class delegate would apply the bbox after read. Would we not want that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting; that's a behavior possibility I hadn't considered. It's tempting because it does sort of give the user what they asked for, but in this case, loading the whole image and then taking a subimage is so contrary to the resource usage expectation of the user that it's probably safer to consider that an invalid implementation and just prevent code from trying it (I'm envisioning some huge processing job falling over because it used too much memory, and nobody being able to figure out why). So I think the code as-is yields the least bad behavior we can provide (an exception), but I'm not sure that's a general statement about parameters.

parameters = self.fileDescriptor.parameters
if parameters is None:
parameters = {}
self.fileDescriptor.storageClass.validateParameters(parameters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Datastores already do call validateParameters. Doesn't look like assemblers do but that's not relevant for a formatter anyhow.


def read(self, component=None):
# Docstring inherited.
if self.fileDescriptor.readStorageClass != self.fileDescriptor.storageClass:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning up the read and write methods to know they really are files is definitely on that ticket, but this seems like it should instead be a new method like if self.expected_component(): or something. The only way to remove the if statement here completely would be to say that the new Formatter class makes a distinction between read and readComponent and makes them both part of the ABC.

obj : component-dependent
In-memory component object.
@cached_getter
def reader(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this public?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because I don't know whether to put leading underscores on "protected" things. If we do want leading underscores on this, I think they belong on every method here other than read and write and the handful of class attributes defined by Formatter itself; every other method should only ever be called by other methods in the same inheritance tree. I lean slightly towards no underscores on protected things here, just because I'd prefer not to have add obs_subaru and obs_decam branches for this ticket, but if we go that way, I should probably at least rename _readerClass -> readerClass (or ReaderClass) for consistency. But I don't have a general preference or a strong one.

md = lsst.afw.fits.readMetadata(self.fileDescriptor.location.path)
fix_header(md)
return md

def stripMetadata(self):
"""Remove metadata entries that are parsed into components.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that comment below is wrong. Making the VisitInfo this way will not strip the headers at all. The only headers being stripped here are the WCS headers in the createSkyWcsFromMetadata line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually know, myself; the comment is prexisting and the logic in this case should be unchanged from before (just in a new spot). I'm happy to adjust the comment if you're sure making the ObservationInfo doesn't stripping anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObservationInfo never strips. It records which headers it used. If you make VisitInfo from a header you can ask it to strip or not strip. We make VisitInfo directly from the ObservationInfo and not the header so stripping the header is impossible.

return full

def readRawHeaderWcs(self, parameters=None):
def readRawHeaderWcs(self):
"""Read the SkyWcs stored in the un-modified raw FITS WCS header keys.
"""
return lsst.afw.geom.makeSkyWcs(lsst.afw.fits.readMetadata(self.fileDescriptor.location.path))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit annoying that we have to read the header twice just because we might have stripped the header by calling stripMetadata and dropping the resulting WCS object on the floor. Also note that if we are going to read the header again we have to apply fix_header here to ensure that we have applied corrections that might affect the WCS. Can we guarantee that this routine is only ever called if stripMetadata has not been called (because it's only called if a component is being read)? Should we cache the stripped and unstripped headers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I moved around but tried not to change, but now that you mention it, I don't see this method called anywhere, even to satisfy a component. I am inclined to delete it and see if Jenkins complains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method exists because in gen2 we have a butler component to return the header wcs. We haven't enabled that feature in the gen3 as a derived component (because initially we hadn't got derived components). I think the component is called header_wcs in gen2. RFC-616 I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be beyond relevance, but we had a previous discussion about this on DM-24024, specifically on this and the following three comments: https://jira.lsstcorp.org/browse/DM-24024?focusedCommentId=261332&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-261332

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurenam, I was actually going to ask if you thought you'd notice if Gen3 never got around to providing raw.header_wcs. I'm currently thinking that if nobody has noticed that we don't have it yet, maybe we never should, as the kind of problems it was helpful to debug are just not problems anymore, now that Gen3 is more consistent about the raw WCS being the right one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m inclined to agree with your current line of thinking...

info.setVisitInfo(self.makeVisitInfo())
info.setWcs(self.makeWcs(info.getVisitInfo(), info.getDetector()))
# We don't need to call stripMetadata() here because it has already
# been stripped during creation of the ObservationInfo, WCS, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObservationInfo shouldn't be stripping anything. WCS is the only thing that does and I think only if we call the read-from-header variant and not the read-from-visitinfo variant. Strip metadata wastes its time and calls makeVisitInfo but does strip the header WCS -- maybe I'm getting confused but can you check that this header has been stripped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll drop ObservationInfo from the comment as per the other thread.

The default makeWcs implementation definitely does strip, because it starts by making the WCS from metadata first - both as a fallback in case there's no VisitInfo for the boresight+cameraGeom one, and to ensure that the stripping happens. But I should at least go add a note to the documentation that this is an expected side effect of any reimplementation of makeWcs; it's unfortunate that we have to rely on side effects at all, but given that the real work is happening in an afw routine via side effects, I don't think we have much choice.

This seems to have been added to support a Gen3 equivalent of Gen2's
raw_header_wcs dataset.  But there is no such equivalent in Gen3, and
no one seems to have noticed its absence.  This is probably because
Gen3 has always been consistent about using a boresight+cameraGeom WCS
for raws, and debugging when that happened (or didn't) was what made
the Gen2 dataset useful.
The FITS formatter methods and attributes are almost all conceptually
"protected" rather than public or private; in the absence of a general
convention for how to style those, and the local precedent that the
vast majority of them look like public methods/attributes here, I'm
just converting the last holdout (_readerClass) to look public as well,
while also adopting our convention of using a leading capital for
class attributes that represent types (ReaderClass).
@TallJimbo TallJimbo merged commit 60776b0 into master May 9, 2021
@TallJimbo TallJimbo deleted the tickets/DM-28698 branch May 9, 2021 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants