DM-28698: clean up image-like Formatter inheritance relationships #377

TallJimbo · 2021-05-02T15:41:57Z


This splits the FitsImageFormatterBase hierarchy into two branches:

- FitsRawFormatterBase and its per-instrument concrete subclasses. These do not need write support and don't involve afw's "reader" classes, but they do need to be able to construct other components from (patched) header metadata and strip that metadata in the process.
- StandardFitsImageFormatterBase (new) and the Fits*Formatter classes here. These do need write support (including compression) but can delegate essentially all reading (including metadata stripping) to the afw "reader" classes; we just need to work out which reader method each component dispatches to.

This revealed a small problem in some test code that was trying to get compression-related headers that we'd normally want to strip (the new stripping is more aggressive than the old one), but the fix is easy.
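A minimal sketch of the two branches, assuming nothing about the real obs_base or afw APIs: FakeReader, FormatterSketch, StandardFitsImageFormatterSketch, FitsRawFormatterSketch and componentMethods are all hypothetical names, and headers are plain dicts. It shows the write-capable branch reducing component reads to a table of reader methods, while the read-only raw branch builds components from header metadata and strips the consumed keys as it goes.

```python
from abc import ABC, abstractmethod


class FakeReader:
    """Hypothetical stand-in for an afw "reader" class (not the real afw API)."""

    def __init__(self, path):
        self.path = path

    def read(self):
        return f"image from {self.path}"

    def readWcs(self):
        return f"wcs from {self.path}"

    def readMetadata(self):
        # A real reader would also strip keys already parsed into components.
        return {"EXPTIME": 30.0}


class FormatterSketch(ABC):
    def __init__(self, path):
        self.path = path

    @abstractmethod
    def read(self, component=None):
        """Read the full dataset, or one named component."""


class StandardFitsImageFormatterSketch(FormatterSketch):
    """Write-capable branch: all reading is delegated to a reader object."""

    ReaderClass = FakeReader
    # The only per-component work: which reader method serves each component.
    componentMethods = {"wcs": "readWcs", "metadata": "readMetadata"}

    def read(self, component=None):
        reader = self.ReaderClass(self.path)
        if component is None:
            return reader.read()
        return getattr(reader, self.componentMethods[component])()


class FitsRawFormatterSketch(FormatterSketch):
    """Read-only branch: components are built from patched header metadata."""

    def readMetadata(self):
        # Hypothetical header; a real formatter would read and fix it up.
        return {"EXPTIME": 30.0, "CTYPE1": "RA---TAN", "CRVAL1": 10.0}

    def read(self, component=None):
        md = self.readMetadata()
        # Building the WCS consumes (strips) its keys from the header.
        wcs = {key: md.pop(key) for key in ("CTYPE1", "CRVAL1")}
        components = {"wcs": wcs, "metadata": md}
        return components if component is None else components[component]
```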


TallJimbo · 2021-05-04T03:32:58Z

python/lsst/obs/base/_fitsRawFormatterBase.py

+        parameters = super().checked_parameters
+        if "bbox" in parameters:
+            raise TypeError("Raw formatters do not support reading arbitrary subimages, as some "
+                            "implementations may be assembled on-the-fly.")


I'm not sure this is the right way to disable a storage class's parameter in a particular formatter, because there's also that unsupportedParameters class attribute.

But IIRC, that attribute is actually used by assembly/disassembly to know whether a component StorageClass accepts parameters passed when loading the parent, and hence I shouldn't actually remove bbox (and origin) there. If that's correct, maybe we should remove that attribute from the base Formatter and have that logic look at component StorageClass components instead? Is there some subtle reason why we can't do that either?


TallJimbo · 2021-05-04T03:34:12Z

python/lsst/obs/base/formatters/fitsExposure.py

+
+    def read(self, component=None):
+        # Docstring inherited.
+        if self.fileDescriptor.readStorageClass != self.fileDescriptor.storageClass:


This check is another thing I'd like to see moved into the base Formatter or FileDatastore on DM-26658, letting concrete Formatters just dispatch on what the component is.

Cleaning up the read and write methods to know they really are files is definitely on that ticket, but this seems like it should instead be a new method like if self.expected_component(): or something. The only way to remove the if statement here completely would be to say that the new Formatter class makes a distinction between read and readComponent and makes them both part of the ABC.

I was thinking along the lines of "a concrete formatter can assume that component is None if and only if readStorageClass == storageClass". Or is that not actually a guarantee we can make?
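Under the assumed invariant, a concrete formatter's read could check it once and then branch only on component. This is a hypothetical sketch (FileDescriptorSketch and ComponentDispatchSketch are illustrative names, and storage classes are plain strings); whether the framework can actually guarantee the invariant is the open question here.

```python
from dataclasses import dataclass


@dataclass
class FileDescriptorSketch:
    """Hypothetical stand-in holding only the two storage class names."""

    storageClass: str
    readStorageClass: str


class ComponentDispatchSketch:
    def __init__(self, fileDescriptor):
        self.fileDescriptor = fileDescriptor

    def read(self, component=None):
        fd = self.fileDescriptor
        # Assumed invariant (not confirmed): component is None exactly when
        # the read and persisted storage classes match.
        if (component is None) != (fd.readStorageClass == fd.storageClass):
            raise RuntimeError("component/storage class mismatch")
        if component is None:
            return "full exposure"
        return f"{component} component"
```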

TallJimbo · 2021-05-04T03:34:41Z

python/lsst/obs/base/formatters/fitsExposure.py

+        parameters = self.fileDescriptor.parameters
+        if parameters is None:
+            parameters = {}
+        self.fileDescriptor.storageClass.validateParameters(parameters)


Is this something the base Formatter or FileDatastore (and/or maybe assemblers) could guarantee, post-DM-26658, so concrete formatters don't have to do it themselves?


timj

Looks okay. I have some confusion over the header stripping since I can't square the comments with my version of reality.

timj · 2021-05-04T23:33:54Z

python/lsst/obs/base/_fitsRawFormatterBase.py

+        parameters = super().checked_parameters
+        if "bbox" in parameters:
+            raise TypeError("Raw formatters do not support reading arbitrary subimages, as some "
+                            "implementations may be assembled on-the-fly.")


The boundary between StorageClassDelegate and Formatter is tricky because in theory the delegate has to be able to process parameters if the formatter says it doesn't understand them. If you say None for unsupported parameters it's saying that the formatter will handle everything. I think there is code in datastore that complains about parameters that are not understood at all at the storage class level. The delegate still has to understand all of them anyhow for a in-memory datastore to be able to handle parameters.
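To make the trade-off concrete, here is a hypothetical sketch of parameter segregation, with the "unsupported" names handed to a delegate that applies them to the in-memory object after a full read. The function names are illustrative, the "image" is a 1-D list, and only the explicit-set case is modeled; the exact meaning of None for unsupportedParameters should be checked against the daf_butler Formatter documentation rather than taken from this sketch.

```python
def segregate_parameters(parameters, unsupported):
    """Split read parameters into (formatter-handled, delegate-handled)."""
    supported = {k: v for k, v in parameters.items() if k not in unsupported}
    deferred = {k: v for k, v in parameters.items() if k in unsupported}
    return supported, deferred


def delegate_apply(obj, deferred):
    """Hypothetical delegate: apply a 1-D "bbox" (start, stop) after the read."""
    if "bbox" in deferred:
        start, stop = deferred["bbox"]
        obj = obj[start:stop]
    return obj


# A formatter that declares "bbox" unsupported still reads everything...
pixels = list(range(10))
supported, deferred = segregate_parameters({"bbox": (2, 5)}, unsupported={"bbox"})
# ...and the delegate trims the in-memory object afterward.
subimage = delegate_apply(pixels, deferred)
```

Note that the full read still happens; only the returned object is smaller.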

timj · 2021-05-04T23:37:02Z

python/lsst/obs/base/_fitsRawFormatterBase.py

+    def checked_parameters(self):
+        # Docstring inherited.
+        parameters = super().checked_parameters
+        if "bbox" in parameters:


If we said bbox is an "unsupported parameter" then the storage class delegate would apply the bbox after read. Would we not want that?

Oh, interesting; that's a behavior possibility I hadn't considered. It's tempting because it does sort of give the user what they asked for, but in this case, loading the whole image and then taking a subimage is so contrary to the resource usage expectation of the user that it's probably safer to consider that an invalid implementation and just prevent code from trying it (I'm envisioning some huge processing job falling over because it used too much memory, and nobody being able to figure out why). So I think the code as-is yields the least bad behavior we can provide (an exception), but I'm not sure that's a general statement about parameters.
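A minimal sketch of the resulting pattern, assuming a base-class helper property that normalizes missing parameters to an empty dict; FitsFormatterSketch and RawFormatterSketch are hypothetical stand-ins for the real formatter classes in the diff above. The raw subclass refuses bbox outright instead of letting anything fall back to a full read.

```python
class FitsFormatterSketch:
    """Hypothetical base formatter with a parameter-checking helper property."""

    def __init__(self, parameters=None):
        self.parameters = parameters

    @property
    def checked_parameters(self):
        # Normalize "no parameters" to an empty dict so subclasses can use `in`.
        return {} if self.parameters is None else dict(self.parameters)


class RawFormatterSketch(FitsFormatterSketch):
    @property
    def checked_parameters(self):
        parameters = super().checked_parameters
        if "bbox" in parameters:
            # Fail loudly rather than silently reading the whole image.
            raise TypeError("Raw formatters do not support reading arbitrary subimages.")
        return parameters
```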

timj · 2021-05-04T23:42:58Z

python/lsst/obs/base/formatters/fitsExposure.py

+        parameters = self.fileDescriptor.parameters
+        if parameters is None:
+            parameters = {}
+        self.fileDescriptor.storageClass.validateParameters(parameters)


Datastores already do call validateParameters. Doesn't look like assemblers do but that's not relevant for a formatter anyhow.
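A hypothetical sketch of that division of labor: the datastore validates once, up front, and the formatter simply trusts what it receives. StorageClassSketch, TrustingFormatterSketch and datastore_get are illustrative names, not daf_butler API.

```python
class StorageClassSketch:
    """Hypothetical storage class that knows which read parameters it accepts."""

    def __init__(self, name, parameters):
        self.name = name
        self.parameters = frozenset(parameters)

    def validateParameters(self, parameters=None):
        unknown = set(parameters or {}) - self.parameters
        if unknown:
            raise KeyError(f"{sorted(unknown)} not understood by {self.name}")


class TrustingFormatterSketch:
    """Assumes its parameters were already validated by the caller."""

    def __init__(self, parameters):
        self.parameters = parameters

    def read(self):
        return sorted(self.parameters)


def datastore_get(storageClass, parameters=None):
    # Datastore-side check, done once before any formatter is constructed.
    storageClass.validateParameters(parameters)
    return TrustingFormatterSketch(parameters or {}).read()
```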

timj · 2021-05-04T23:46:31Z

python/lsst/obs/base/formatters/fitsExposure.py

+
+    def read(self, component=None):
+        # Docstring inherited.
+        if self.fileDescriptor.readStorageClass != self.fileDescriptor.storageClass:


Cleaning up the read and write methods to know they really are files is definitely on that ticket, but this seems like it should instead be a new method like if self.expected_component(): or something. The only way to remove the if statement here completely would be to say that the new Formatter class makes a distinction between read and readComponent and makes them both part of the ABC.
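A sketch of the second option, assuming a hypothetical ABC in which the framework chooses between read and readComponent so concrete formatters never inspect the component argument. SplitFormatterSketch, ExposureSplitFormatterSketch and dispatch are illustrative names only.

```python
from abc import ABC, abstractmethod


class SplitFormatterSketch(ABC):
    """Hypothetical ABC with separate full-dataset and component entry points."""

    @abstractmethod
    def read(self):
        """Read the full dataset."""

    @abstractmethod
    def readComponent(self, component):
        """Read one named component."""

    def dispatch(self, component=None):
        # Framework-side choice; concrete formatters never test `component`.
        if component is None:
            return self.read()
        return self.readComponent(component)


class ExposureSplitFormatterSketch(SplitFormatterSketch):
    def read(self):
        return "full exposure"

    def readComponent(self, component):
        return f"{component} component"
```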

timj · 2021-05-04T23:52:11Z

python/lsst/obs/base/formatters/fitsExposure.py

-        obj : component-dependent
-            In-memory component object.
+    @cached_getter
+    def reader(self):


Why is this public?

Just because I don't know whether to put leading underscores on "protected" things. If we do want leading underscores on this, I think they belong on every method here other than read and write and the handful of class attributes defined by Formatter itself; every other method should only ever be called by other methods in the same inheritance tree. I lean slightly towards no underscores on protected things here, just because I'd prefer not to have to add obs_subaru and obs_decam branches for this ticket, but if we go that way, I should probably at least rename _readerClass -> readerClass (or ReaderClass) for consistency. But I don't have a general preference or a strong one.
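A minimal sketch of the no-underscore convention being discussed, assuming functools.cached_property as a stand-in for the cached_getter decorator in the diff: the reader is built once from a ReaderClass type attribute. FakeExposureReader and ReaderFormatterSketch are hypothetical.

```python
from functools import cached_property


class FakeExposureReader:
    """Hypothetical stand-in for an afw reader; counts constructions."""

    constructed = 0

    def __init__(self, path):
        FakeExposureReader.constructed += 1
        self.path = path


class ReaderFormatterSketch:
    # Leading capital: this class attribute holds a type, not an instance.
    ReaderClass = FakeExposureReader

    def __init__(self, path):
        self.path = path

    @cached_property
    def reader(self):
        # No underscore, but meant only for methods in this inheritance tree.
        return self.ReaderClass(self.path)
```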

timj · 2021-05-04T23:58:21Z

python/lsst/obs/base/_fitsRawFormatterBase.py

+        md = lsst.afw.fits.readMetadata(self.fileDescriptor.location.path)
+        fix_header(md)
+        return md
+
    def stripMetadata(self):
        """Remove metadata entries that are parsed into components.


I'm pretty sure that comment below is wrong. Making the VisitInfo this way will not strip the headers at all. The only headers being stripped here are the WCS headers in the createSkyWcsFromMetadata line.

I don't actually know, myself; the comment is preexisting and the logic in this case should be unchanged from before (just in a new spot). I'm happy to adjust the comment if you're sure making the ObservationInfo doesn't strip anything.

ObservationInfo never strips. It records which headers it used. If you make VisitInfo from a header you can ask it to strip or not strip. We make VisitInfo directly from the ObservationInfo and not the header so stripping the header is impossible.
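A hypothetical sketch of the distinction: the translator below only records which keys it read, and the header changes only when a separate stripping step runs. make_observation_info and strip_keys are illustrative; they are not the astro_metadata_translator API.

```python
def make_observation_info(header):
    """Hypothetical translator: reads keys, records them, never modifies header."""
    used = set()

    def get(key):
        used.add(key)
        return header[key]

    info = {"exposure_time": get("EXPTIME"), "observation_id": get("OBSID")}
    return info, frozenset(used)


def strip_keys(header, keys):
    """Explicit, separate stripping step driven by the recorded keys."""
    for key in keys:
        header.pop(key, None)
```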

timj · 2021-05-05T00:03:25Z

python/lsst/obs/base/_fitsRawFormatterBase.py

        return full

-    def readRawHeaderWcs(self, parameters=None):
+    def readRawHeaderWcs(self):
        """Read the SkyWcs stored in the un-modified raw FITS WCS header keys.
        """
        return lsst.afw.geom.makeSkyWcs(lsst.afw.fits.readMetadata(self.fileDescriptor.location.path))


It's a bit annoying that we have to read the header twice just because we might have stripped the header by calling stripMetadata and dropping the resulting WCS object on the floor. Also note that if we are going to read the header again we have to apply fix_header here to ensure that we have applied corrections that might affect the WCS. Can we guarantee that this routine is only ever called if stripMetadata has not been called (because it's only called if a component is being read)? Should we cache the stripped and unstripped headers?
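One possible shape for that cache, as a hypothetical sketch: read the header once, apply the fix-up before anything is derived from it, and hand out deep copies for stripping so the unstripped version survives. RawHeaderCacheSketch and its constructor arguments are illustrative; read_header and fix_header stand in for lsst.afw.fits.readMetadata and fix_header.

```python
import copy
from functools import cached_property


class RawHeaderCacheSketch:
    """Hypothetical cache: one read plus fix-up, then copies for stripping."""

    def __init__(self, path, read_header, fix_header):
        self.path = path
        self._read_header = read_header
        self._fix_header = fix_header
        self.reads = 0

    @cached_property
    def fixed_header(self):
        # Corrections are applied before anything (e.g. a WCS) is built from it.
        self.reads += 1
        md = self._read_header(self.path)
        self._fix_header(md)
        return md

    def strippable_header(self):
        # Callers may strip this copy without touching the cached original.
        return copy.deepcopy(self.fixed_header)
```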

This is something I moved around but tried not to change, but now that you mention it, I don't see this method called anywhere, even to satisfy a component. I am inclined to delete it and see if Jenkins complains.

I think this method exists because in Gen2 we have a butler component to return the header WCS. We haven't enabled that feature in Gen3 as a derived component (because initially we didn't have derived components). I think the component is called header_wcs in Gen2; RFC-616, I think.

This may be beyond relevance, but we had a previous discussion about this on DM-24024, specifically on this and the following three comments: https://jira.lsstcorp.org/browse/DM-24024?focusedCommentId=261332&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-261332

@laurenam, I was actually going to ask if you thought you'd notice if Gen3 never got around to providing raw.header_wcs. I'm currently thinking that if nobody has noticed that we don't have it yet, maybe we never should, as the kind of problems it was helpful to debug are just not problems anymore, now that Gen3 is more consistent about the raw WCS being the right one.

I’m inclined to agree with your current line of thinking...

timj · 2021-05-05T00:05:27Z

python/lsst/obs/base/_fitsRawFormatterBase.py

+        info.setVisitInfo(self.makeVisitInfo())
+        info.setWcs(self.makeWcs(info.getVisitInfo(), info.getDetector()))
+        # We don't need to call stripMetadata() here because it has already
+        # been stripped during creation of the ObservationInfo, WCS, etc.


ObservationInfo shouldn't be stripping anything. The WCS is the only thing that does, and I think only if we call the read-from-header variant and not the read-from-VisitInfo variant. stripMetadata wastes its time calling makeVisitInfo, but it does strip the header WCS. Maybe I'm getting confused, but can you check that this header has been stripped?

I'll drop ObservationInfo from the comment as per the other thread.

The default makeWcs implementation definitely does strip, because it starts by making the WCS from metadata first - both as a fallback in case there's no VisitInfo for the boresight+cameraGeom one, and to ensure that the stripping happens. But I should at least go add a note to the documentation that this is an expected side effect of any reimplementation of makeWcs; it's unfortunate that we have to rely on side effects at all, but given that the real work is happening in an afw routine via side effects, I don't think we have much choice.
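A hypothetical sketch of the side effect any makeWcs reimplementation would need to preserve: the header WCS is always built first, which strips its keys, and the boresight+cameraGeom WCS is preferred when a VisitInfo and detector are available. WCS keys are reduced to two, WCSs are plain dicts and tuples, and none of these names are the afw API.

```python
WCS_KEYS = ("CTYPE1", "CRVAL1")


def make_wcs_from_metadata(md):
    """Hypothetical: build a header WCS, stripping its keys from md."""
    if not all(key in md for key in WCS_KEYS):
        return None
    return {key: md.pop(key) for key in WCS_KEYS}


def make_wcs(md, visit_info=None, detector=None):
    # Always start from the header: it is the fallback, and building it
    # strips the WCS keys as a side effect that overrides must preserve.
    header_wcs = make_wcs_from_metadata(md)
    if visit_info is not None and detector is not None:
        return ("boresight+cameraGeom", visit_info["boresight"], detector)
    return header_wcs
```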

This seems to have been added to support a Gen3 equivalent of Gen2's raw_header_wcs dataset. But there is no such equivalent in Gen3, and no one seems to have noticed its absence. This is probably because Gen3 has always been consistent about using a boresight+cameraGeom WCS for raws, and debugging when that happened (or didn't) was what made the Gen2 dataset useful.

The FITS formatter methods and attributes are almost all conceptually "protected" rather than public or private; in the absence of a general convention for how to style those, and the local precedent that the vast majority of them look like public methods/attributes here, I'm just converting the last holdout (_readerClass) to look public as well, while also adopting our convention of using a leading capital for class attributes that represent types (ReaderClass).

TallJimbo added 2 commits May 3, 2021 17:36

Add test temporaries to .gitignore.

a62fae1

Clean up parameters handling in FITS image formatters.

cef6c73

This moves the logic that does FITS formatter parameter validation to a helper property, and removes the unused parameter argument from several methods.

TallJimbo marked this pull request as draft May 3, 2021 22:31

TallJimbo added 6 commits May 3, 2021 22:22

Remove redundant metaclass specification.

5165802

Base class is already an ABC.

Stop pretending raws might have mask or variance planes to read.

1c6b3d5

These override hooks are never going to be used, and they will complicate amplifier subimage handling in the future.

Raise when raw formatters are passed a bbox parameter.

4528685

Previously this was just silently ignored.

Move raw formatter component-attaching into a separate method.

3d58d39

This will reduce code duplication with the obs_lsst subclass, and probably avoid new code duplication on DM-29370.

Remove unnecessary local import.

9a3a195

TallJimbo force-pushed the tickets/DM-28698 branch from 8707879 to 9a3a195 May 4, 2021 03:30

TallJimbo commented May 4, 2021


TallJimbo marked this pull request as ready for review May 4, 2021 03:36

timj approved these changes May 5, 2021


TallJimbo added 3 commits May 7, 2021 15:39

ObservationInfo doesn't strip metadata, so don't use it for that.

6adb94e

TallJimbo merged commit 60776b0 into master May 9, 2021

TallJimbo deleted the tickets/DM-28698 branch May 9, 2021 18:10

