DM-54083: add VisitImage and butler integration by TallJimbo · Pull Request #7 · lsst/images

TallJimbo · 2026-02-20T21:15:04Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

timj

Some quick comments for a Friday afternoon. I'll look more on Monday

tests/test_transforms.py

timj · 2026-02-20T21:53:08Z

python/lsst/images/fits/formatters.py

+
+    def check_unhandled_parameters(self) -> None:
+        if self.file_descriptor.parameters:
+            raise RuntimeError("Parameters {")


Did you mean to write something else for the error message?

timj · 2026-02-20T21:55:46Z

python/lsst/images/tests/_roundtrip.py

+import astropy.io.fits
+
+try:
+    from lsst.daf.butler import Butler, DataCoordinate, DatasetRef, DatasetType


There's no harm in adding lsst-daf-butler to pyproject.toml as a tests optional dependency for the purposes of testing (I see it is in requirements.txt).

I did add it to pyproject.toml as its own feature:

# Add feature for Butler support. butler = ["lsst-daf-butler"]

because I figured some pip users might want butler support while others didn't. Is there still a reason to add it there as a tests feature dependency?

Ok. I only suggested for tests since I saw it in the tests package.

timj · 2026-02-20T21:57:29Z

python/lsst/images/tests/_roundtrip.py

+        root = self._exit_stack.enter_context(
+            tempfile.TemporaryDirectory(ignore_cleanup_errors=True, delete=True)
+        )
+        Butler.makeRepo(root)


Suggested change

Butler.makeRepo(root)

config = Butler.makeRepo(root)

then below Butler.from_config(config, ...) is in theory more efficient.

timj · 2026-02-20T22:01:09Z

python/lsst/images/tests/_roundtrip.py

+        empty_data_id = DataCoordinate.make_empty(self.butler.dimensions)
+        for name, storage_class in self._kwargs.items():
+            dataset_type = DatasetType(name, self.butler.dimensions.empty, storage_class)
+            self.butler.registry.registerDatasetType(dataset_type)


This line fails with KeyError Projection if you have the wrong butler set up. Maybe catch and add_note indicating that the butler is too old?

python/lsst/images/fits/_common.py

timj · 2026-02-20T23:41:23Z

python/lsst/images/_image.py

+        compression
+            Compression options.
+        compression_seed
+            A FITs tile compression seed to use whenever the configured


Suggested change

A FITs tile compression seed to use whenever the configured

A FITS tile compression seed to use whenever the configured

There are a few more "FITs" in the code.

python/lsst/images/tests/_checks.py

timj · 2026-02-20T23:42:52Z

python/lsst/images/_visit_image.py

+
+    # Inherited attributes are duplicated because that improves the docs
+    # (some limitation in the sphinx/pydantic integration), and these are
+    # import docs.


timj · 2026-02-20T23:43:46Z

python/lsst/images/_masked_image.py

    @classmethod
    def read_fits(cls, url: ResourcePathExpression, *, bbox: Box | None = None) -> MaskedImage:
-        """Read a masked image from a FITS file.
+        """Read an image from a FITS file.


It seems like the doc string was right before?

Yes, but I find myself copying these a lot, and since (informally) "image" applies to all of these types, I figured it's less likely I'll get one dramatically wrong by using that.

TallJimbo

I've addressed your comments so far, as fixup! comments that I'll squash later.

One issue I've already spotted in starting work downstream of this ticket is that I think the kwargs on the view and copy methods of the various image types are trying to do too much; changing the units or the projection (especially on a view) is going to be a very rare thing and it's better to just ask users to use the constructor for that. Changing the start is an even bigger problem, because it's something the CellCoadd subclass of MaskedImage can't really support, which makes for a Liskov substitution error.

The mask schema is sort of an exception: copying a Mask or something holding a Mask in order to add a new mask plane seems like it might be really common. But for that we probably want a dedicated method that only copies the Mask and even then only when needed.

Since this affects both of the downstream tickets I'm planning to work on next (color images an cell coadds), I'm going to do it as one more commit on this branch, coming soon.

TallJimbo · 2026-02-22T14:45:02Z

python/lsst/images/fits/_common.py

+
+    @classmethod
+    def from_index_row(cls, row: np.ndarray) -> ExtensionKey:
+        """Construct from a row of the index binary table appended to the


It's because I never turned it on. And while I don't like not having a lint for getting the Parameters wrong, I kind of do like not having a lint for cases like this where there's only one parameter and its docs would just be a repeat of the function docstring; I'm not convinced we need Parameters there, just like I don't think we need always Returns when the type is provided by the annotation.. But maybe that's an option I can set.

TallJimbo · 2026-02-22T14:50:11Z

python/lsst/images/fits/formatters.py

+                header.set(key, value)
+
+
+class ComponentSentinal(enum.Enum):


This word and "writable" are my nemeses. I feel I've got spelling under control except for those two 🤦‍♂️.

TallJimbo · 2026-02-22T15:10:12Z

python/lsst/images/tests/_roundtrip.py

+import astropy.io.fits
+
+try:
+    from lsst.daf.butler import Butler, DataCoordinate, DatasetRef, DatasetType


I did add it to pyproject.toml as its own feature:

# Add feature for Butler support. butler = ["lsst-daf-butler"]

because I figured some pip users might want butler support while others didn't. Is there still a reason to add it there as a tests feature dependency?

TallJimbo · 2026-02-22T15:19:20Z

python/lsst/images/_masked_image.py

    @classmethod
    def read_fits(cls, url: ResourcePathExpression, *, bbox: Box | None = None) -> MaskedImage:
-        """Read a masked image from a FITS file.
+        """Read an image from a FITS file.


Yes, but I find myself copying these a lot, and since (informally) "image" applies to all of these types, I figured it's less likely I'll get one dramatically wrong by using that.

timj

Looks great. I ran out of steam before the tests since I have a load of things piling up this afternoon, but I don't want to block you on me not getting around to the test suite.

timj · 2026-02-23T18:33:27Z

python/lsst/images/fits/_common.py

+        of a legacy (`lsst.afw.image`) FITS file.
+        """
+        primary_header = header.copy(strip=True)
+        # No idea what these spare TAN-SIP headers are doing in the afw


There are tests in the afw test code looking for them...

I see afw tests looking for A_ORDER and B_ORDER when they appear as part of actual TAN-SIP WCSs, presumably in headers for actual image HDUs. What's bizarre here is that:

this is an empty primary HDU, so it has no business having a WCS;

there are no other WCS keys, just A_ORDER and B_ORDER.

timj · 2026-02-23T18:35:31Z

python/lsst/images/fits/_common.py

+def strip_legacy_exposure_cards(header: astropy.io.fits.Header) -> None:
+    """Strip header keywords added by lsst.afw.image.Exposure."""
+    header.remove("AR_HDU", ignore_missing=True)
+    for name in (


For the record, these are the VisitInfo headers.

std::vector<std::string> keyList = {"EXPTIME", "DARKTIME", "DATE-AVG", "TIMESYS", "TIME-MID", "MJD-AVG-UT1", "AVG-ERA", "BORE-RA", "BORE-DEC", "BORE-AZ", "BORE-ALT", "BORE-AIRMASS", "BORE-ROTANG", "ROTTYPE", "OBS-LONG", "OBS-LAT", "OBS-ELEV", "AIRTEMP", "AIRPRESS", "HUMIDITY", "INSTRUMENT", "IDNUM", "FOCUSZ", "OBSTYPE", "PROGRAM", "REASON", "OBJECT", "HAS-SIMULATED-CONTENT"};

and annoyingly they clash with some of the LSSTCam raw headers.

If they clash with the LSSTCam raw headers, I'm guessing I can't just blindly strip them all? Or has the damage already been done by the time we get to visit_image (i.e. raw header overwritten by VisitInfo), so I should just strip them?

If necessary I can cook up some way to get a few more header keys from the raws when I rewrite the visit_image datasets, especially if this means we have to extract the ObservationInfo from the raw rather than the old visit_image.

The damage has been done in that the visit info overwrites. It's a nightmare and we long ago talked about fixing visit info to not do that but never got around to it.

timj · 2026-02-23T18:38:18Z

python/lsst/images/fits/_input_archive.py

+    page_size
+        Minimum number of bytes to read at at once.  Making this a multiple
+        of the FITS block size (2880) is recommended.
+    partial


When I first looked at this I assumed partial meant that the file would be left open so that additional reads can happen efficiently to minimize memory pressure. I see now that it means that you can read a subset to save on memory but the file is closed.

Right. I very much had implementing Formatter interfaces in mind here.

timj · 2026-02-23T18:53:11Z

python/lsst/images/fits/_input_archive.py

-        if name not in self._opaque_metadata.headers:
+        if key not in self._opaque_metadata.headers:
            opaque_header = reader.header.copy(strip=True)
+            opaque_header.remove("EXTNAME", ignore_missing=True)


Not overly happy with this stripping being repeated in many places.

I'm trying to keep header stripping of a particular keyword in the same class that adds that keyword on write (or converts the legacy type that added that keyword), which is part of the reason it's spread out. But I might be able to put EXTNAME and EXTVER stripping in particular somewhere it'd be more reusable.

timj · 2026-02-23T19:07:07Z

python/lsst/images/fits/formatters.py

+
+    INVALID_COMPONENT_MODEL = enum.auto()
+    """This formatter recognizes the given component, but the expected
+    attribute of the top-level `..serialization.ArchiveTree` did not exist


I'm pretty sure .. doesn't work for sphinx, or at least that's what I learned when I was trying to fix warnings in daf_butler. I think this has no impact because ComponentSentinel or any of the formatter classes are not in the docs.

I'm pretty sure .. does work, but it's relative to Sphinx's hierarchy, not Python's, which are not always the same, and of course it doesn't work if you put it in a base class docstring and that gets inherited in another module.

But yeah, we can't use this as a test case until we get the daf_butler docs in good enough shape to allow a downstream subclass like this to inherit those docstrings in another package.

Interesting. I will need to revisit. I only managed to get a single dot working last time I tried but that might have been because all the bonus packages in butler confuse things.

timj · 2026-02-23T19:24:08Z

python/lsst/images/serialization/_dtypes.py

-    return np.dtype(t.to_numpy()).kind == "u"
+def is_integer(t: NumberType) -> TypeGuard[IntegerType]:
+    """Test whether a `NumberType` corresponds to an integer type."""
+    match np.dtype(t.to_numpy()).kind:


Is match that much better than:

return np.dtype(t.to_numpy()).kind in ("i", "u")

?

timj · 2026-02-23T19:35:07Z

python/lsst/images/_mask.py

+                bit = self.bit(plane.name)
+                if bit.index != 0:
+                    raise TypeError("Only mask schemas with mask_size==1 can be described in FITS.")
+                header.set(f"MSKN{n + 1:04d}", plane.name, f"Name for mask plane {n + 1}.")


Is MSKN and lsst.images convention or a broader convention? We should be documenting these somewhere. In some sense it's more explicit if we use HIERARCH LSST IMAGES MASK NAME and then everyone knows what's happening and that it's not a standard.

This is a new lsst.images convention I just invented now. I figured if we were going to try to push for it to become a registered convention at some point it'd be better as a non-HIERARCH keyword, but I'm happy to defer to you and @gpdf on that point, and I agree that if we're not going to push to make it a registered convention then a longer name with HIERARCH is better.

timj · 2026-02-23T19:40:05Z

python/lsst/images/_visit_image.py

+                raise ValueError(
+                    "Visit ID could not be found in butler data ID metadata and must be provided."
+                ) from None
+        if unit is None:


I am concerned that this can mean you can specify a unit that doesn't match the BUNIT and confusion would reign.

I'll add a consistency check here.

timj · 2026-02-23T19:41:52Z

python/lsst/images/_visit_image.py

+        md = legacy.getMetadata()
+        if instrument is None:
+            try:
+                instrument = str(md["LSST BUTLER DATAID INSTRUMENT"])


It's also going to be in the INSTRUME header. Same worry below that the file could say LATISS but you pass in LSSTCam.

Do you know if that would have the exact same value? I was hoping to consistently use the butler short name, i.e. the one in data IDs.

And if they're different variants of the same name it gets a lot harder to do a consistency check.

LSSTCam instrume does not match the butler name so in retrospect it's not great to use it. You'd have to throw the header at a metadata translator and ask it for the standardized name but that's already in the provenance and the visit info serialization.

timj · 2026-02-23T19:44:00Z

python/lsst/images/_visit_image.py

+
+        Parameters
+        ----------
+        filename


In theory this can be a ResourcePathExpression and you can open the file with

fs, fspath = ResourcePath(filename).to_fsspec() with fs.open(fspath) as f, astropy.io.fits.open(f) as fits_obj: ...

we do that in dax_images_cutout.

Unfortunately that only works for the Astropy reads here; there's also some ExposureFitsReader usage that requires a local file.

I was hoping you had written the ability for an astropy user to read in DP1 visit images without needing afw...

Not for a full Exposure -> VisitImage. But I can go apply this suggestion to MaskedImage.read_legacy, Image.read_legacy, and Mask.read_legacy (since those don't need to use afw readers), and give them an option to attach the FITS header WCSs as their Projection, and that would provide a way for Astropy users to read the most user-relevant bits of a DP1 visit image.

I've pushed two more commits to enable this (reading URIs, attaching FITS WCS) for Image/Mask/MaskedImage. I didn't add anything to the legacy formatters in obs_base to use it, but in theory we've now got at a low level what we need for (e.g.) the DP1 cutout service to return one of those without completely dropping the WCS.

timj · 2026-02-24T17:57:53Z

I don't think we can strip visit info unless we are creating observation info.

Saving multiple HDUs with same now automatically increments EXTVER.

To be extra careful about overflow we drop one bit from each element in this case.

This is primarily motivated by Firefly's preference for multiple int32 HDUs vs. a data cube, since they also need to support naturally higher-dimension bitmasks. But it also ties in nicely with CFITSIO and Astropy using int32 as the intermediate type for all image compression.

Avoid writing types in reStructuredText when they'll be added to the docs via annotations. Avoid noting the serialize/deserialize signature in specific instances when it's a generic feature of the library.

This sets Pydantic config to round-trip inf/NaN and bytes properly through JSON. The Pydantic docs claim these operate on each model in the hierarchy independently (i.e. so a model could set the configs only if it saves values that they would apply to), but this is demonstrably not true for at least the inf/nan case, where the option needs to be set on the outermost model to have any effect. Safest to just set them everywhere.

Instead of passing this as a constructor argument, we just assign to objects after creation. This involves more boilerplate, but it's more consistent with treating that metadata as private. This change also fixes a few spots where archive metadata was not being propagated.

We can now serialize any type with both serialize and deserialize methods and the new _get_archive_tree_type method. The Image, Mask, and MaskedImage classes have read_fits/write_fits wrappers for convenience and type-checking (explicit keyword arguments instead of **kwargs).

This includes adding support for an external compression seed to FitsOutputArchive, which we'll populate with a hash of a data ID in the formatter.

This requires a tiny change in the formatters in obs_base, since we can't quite make everything work with an in-memory conversion.

The inherited docstrings have way too many now-invalid relative links to other butler types, and users don't gain much from seeing them in the docs.

These are not obviously useful in more than a handful of rare cases (where they're only a tiny bit more convenient than the constructor), and they're both full of boilerplate and potentially problematic for subclassing.

This reverts commit 56b8d2f.

We can't extend this to VisitImage because we need to use lsst.afw.image.ExposureFitsReader there.

TallJimbo force-pushed the tickets/DM-54083 branch 2 times, most recently from e2a290a to bb8b45a Compare February 20, 2026 21:59

timj reviewed Feb 20, 2026

View reviewed changes

TallJimbo force-pushed the tickets/DM-54083 branch 2 times, most recently from 8e0f2e2 to 42895ff Compare February 22, 2026 15:23

TallJimbo commented Feb 22, 2026

View reviewed changes

timj approved these changes Feb 23, 2026

View reviewed changes

TallJimbo force-pushed the tickets/DM-54083 branch from 78237d3 to bada147 Compare February 24, 2026 17:15

TallJimbo added 21 commits February 25, 2026 13:37

Add limited EXTVER support.

41e969a

Saving multiple HDUs with same now automatically increments EXTVER.

Add methods for converting between mask schemas.

1165f2a

Allow signed integer dtypes in Masks.

c162935

To be extra careful about overflow we drop one bit from each element in this case.

Reorder mask methods and add __eq__.

3549896

Make some docstrings more generic and inheritable.

c5b6bdb

Avoid writing types in reStructuredText when they'll be added to the docs via annotations. Avoid noting the serialize/deserialize signature in specific instances when it's a generic feature of the library.

Clean up Archive pointer generics in MaskedImage.read_fits.

1d45350

Raise ArchiveReadError when PSF packages can't be imported on read.

7baceea

Include inherited members in docs.

b8bac24

Return approximate FITS WCSs in Image and Mask, and cache them.

09da305

Move test check/comparison logic to convenience functions.

acfb190

Rework and expand image legacy converters.

494bdc3

Support converting butler recipes to FITS compression options.

ce8cc34

This includes adding support for an external compression seed to FitsOutputArchive, which we'll populate with a hash of a data ID in the formatter.

Add, use roundtrip test helper.

7bddf57

Add butler formatter implementation.

7a8a979

Add minimal VisitImage class.

81f384d

Add component-read tests.

a4811af

Make legacy readers usable via butler.

73077a1

This requires a tiny change in the formatters in obs_base, since we can't quite make everything work with an in-memory conversion.

TallJimbo added 2 commits February 25, 2026 13:39

Test serialization of Projection with no FITS approximation.

54b750b

Drop formatters from docs for now.

febd981

The inherited docstrings have way too many now-invalid relative links to other butler types, and users don't gain much from seeing them in the docs.

TallJimbo force-pushed the tickets/DM-54083 branch from 903079f to 2e59336 Compare February 25, 2026 18:39

TallJimbo added 8 commits February 25, 2026 17:55

Fix deserialization of MaskedImage.projection.

9d94630

Simplify copy on image types and remove MaskedImage.view.

ba152c2

These are not obviously useful in more than a handful of rare cases (where they're only a tiny bit more convenient than the constructor), and they're both full of boilerplate and potentially problematic for subclassing.

Strip FITS headers added by lsst.afw.image.VisitInfo.

001452f

Move EXTNAME/EXTVER stripping to one place.

235248c

Compare args and legacy headers instead of just overriding headers.

e36e882

Revert "Strip FITS headers added by lsst.afw.image.VisitInfo."

be91e9b

This reverts commit 56b8d2f.

Add option to attach FITS WCS when reading Image/Mask/MaskeDImage.

3afe06a

Support URIs in Image/Mask/MaskedImage read_legacy.

88df131

We can't extend this to VisitImage because we need to use lsst.afw.image.ExposureFitsReader there.

TallJimbo force-pushed the tickets/DM-54083 branch from 2e59336 to 88df131 Compare February 25, 2026 22:55

TallJimbo merged commit cea53f4 into main Feb 25, 2026
14 checks passed

TallJimbo deleted the tickets/DM-54083 branch February 25, 2026 22:58

	A FITs tile compression seed to use whenever the configured
	A FITS tile compression seed to use whenever the configured

Conversation

TallJimbo commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

timj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TallJimbo left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timj left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

TallJimbo commented Feb 20, 2026 •

edited

Loading