DM-54083: add VisitImage and butler integration#7
Conversation
e2a290a to
bb8b45a
Compare
timj
left a comment
There was a problem hiding this comment.
Some quick comments for a Friday afternoon. I'll look more on Monday
|
|
||
| def check_unhandled_parameters(self) -> None: | ||
| if self.file_descriptor.parameters: | ||
| raise RuntimeError("Parameters {") |
There was a problem hiding this comment.
Did you mean to write something else for the error message?
| import astropy.io.fits | ||
|
|
||
| try: | ||
| from lsst.daf.butler import Butler, DataCoordinate, DatasetRef, DatasetType |
There was a problem hiding this comment.
There's no harm in adding lsst-daf-butler to pyproject.toml as a tests optional dependency for the purposes of testing (I see it is in requirements.txt).
There was a problem hiding this comment.
I did add it to pyproject.toml as its own feature:
# Add feature for Butler support.
butler = ["lsst-daf-butler"]
because I figured some pip users might want butler support while others didn't. Is there still a reason to add it there as a tests feature dependency?
There was a problem hiding this comment.
Ok. I only suggested for tests since I saw it in the tests package.
| root = self._exit_stack.enter_context( | ||
| tempfile.TemporaryDirectory(ignore_cleanup_errors=True, delete=True) | ||
| ) | ||
| Butler.makeRepo(root) |
There was a problem hiding this comment.
| Butler.makeRepo(root) | |
| config = Butler.makeRepo(root) |
then below Butler.from_config(config, ...) is in theory more efficient.
| empty_data_id = DataCoordinate.make_empty(self.butler.dimensions) | ||
| for name, storage_class in self._kwargs.items(): | ||
| dataset_type = DatasetType(name, self.butler.dimensions.empty, storage_class) | ||
| self.butler.registry.registerDatasetType(dataset_type) |
There was a problem hiding this comment.
This line fails with KeyError Projection if you have the wrong butler set up. Maybe catch and add_note indicating that the butler is too old?
python/lsst/images/_image.py
Outdated
| compression | ||
| Compression options. | ||
| compression_seed | ||
| A FITs tile compression seed to use whenever the configured |
There was a problem hiding this comment.
| A FITs tile compression seed to use whenever the configured | |
| A FITS tile compression seed to use whenever the configured |
There are a few more "FITs" in the code.
python/lsst/images/_visit_image.py
Outdated
|
|
||
| # Inherited attributes are duplicated because that improves the docs | ||
| # (some limitation in the sphinx/pydantic integration), and these are | ||
| # import docs. |
| @classmethod | ||
| def read_fits(cls, url: ResourcePathExpression, *, bbox: Box | None = None) -> MaskedImage: | ||
| """Read a masked image from a FITS file. | ||
| """Read an image from a FITS file. |
There was a problem hiding this comment.
It seems like the doc string was right before?
There was a problem hiding this comment.
Yes, but I find myself copying these a lot, and since (informally) "image" applies to all of these types, I figured it's less likely I'll get one dramatically wrong by using that.
8e0f2e2 to
42895ff
Compare
TallJimbo
left a comment
There was a problem hiding this comment.
I've addressed your comments so far, as fixup! comments that I'll squash later.
One issue I've already spotted in starting work downstream of this ticket is that I think the kwargs on the view and copy methods of the various image types are trying to do too much; changing the units or the projection (especially on a view) is going to be a very rare thing and it's better to just ask users to use the constructor for that. Changing the start is an even bigger problem, because it's something the CellCoadd subclass of MaskedImage can't really support, which makes for a Liskov substitution error.
The mask schema is sort of an exception: copying a Mask or something holding a Mask in order to add a new mask plane seems like it might be really common. But for that we probably want a dedicated method that only copies the Mask and even then only when needed.
Since this affects both of the downstream tickets I'm planning to work on next (color images an cell coadds), I'm going to do it as one more commit on this branch, coming soon.
|
|
||
| @classmethod | ||
| def from_index_row(cls, row: np.ndarray) -> ExtensionKey: | ||
| """Construct from a row of the index binary table appended to the |
There was a problem hiding this comment.
It's because I never turned it on. And while I don't like not having a lint for getting the Parameters wrong, I kind of do like not having a lint for cases like this where there's only one parameter and its docs would just be a repeat of the function docstring; I'm not convinced we need Parameters there, just like I don't think we need always Returns when the type is provided by the annotation.. But maybe that's an option I can set.
| header.set(key, value) | ||
|
|
||
|
|
||
| class ComponentSentinal(enum.Enum): |
There was a problem hiding this comment.
This word and "writable" are my nemeses. I feel I've got spelling under control except for those two 🤦♂️.
| import astropy.io.fits | ||
|
|
||
| try: | ||
| from lsst.daf.butler import Butler, DataCoordinate, DatasetRef, DatasetType |
There was a problem hiding this comment.
I did add it to pyproject.toml as its own feature:
# Add feature for Butler support.
butler = ["lsst-daf-butler"]
because I figured some pip users might want butler support while others didn't. Is there still a reason to add it there as a tests feature dependency?
| @classmethod | ||
| def read_fits(cls, url: ResourcePathExpression, *, bbox: Box | None = None) -> MaskedImage: | ||
| """Read a masked image from a FITS file. | ||
| """Read an image from a FITS file. |
There was a problem hiding this comment.
Yes, but I find myself copying these a lot, and since (informally) "image" applies to all of these types, I figured it's less likely I'll get one dramatically wrong by using that.
timj
left a comment
There was a problem hiding this comment.
Looks great. I ran out of steam before the tests since I have a load of things piling up this afternoon, but I don't want to block you on me not getting around to the test suite.
| of a legacy (`lsst.afw.image`) FITS file. | ||
| """ | ||
| primary_header = header.copy(strip=True) | ||
| # No idea what these spare TAN-SIP headers are doing in the afw |
There was a problem hiding this comment.
There are tests in the afw test code looking for them...
There was a problem hiding this comment.
I see afw tests looking for A_ORDER and B_ORDER when they appear as part of actual TAN-SIP WCSs, presumably in headers for actual image HDUs. What's bizarre here is that:
- this is an empty primary HDU, so it has no business having a WCS;
- there are no other WCS keys, just
A_ORDERandB_ORDER.
| def strip_legacy_exposure_cards(header: astropy.io.fits.Header) -> None: | ||
| """Strip header keywords added by lsst.afw.image.Exposure.""" | ||
| header.remove("AR_HDU", ignore_missing=True) | ||
| for name in ( |
There was a problem hiding this comment.
For the record, these are the VisitInfo headers.
std::vector<std::string> keyList = {"EXPTIME", "DARKTIME", "DATE-AVG", "TIMESYS",
"TIME-MID", "MJD-AVG-UT1", "AVG-ERA", "BORE-RA",
"BORE-DEC", "BORE-AZ", "BORE-ALT", "BORE-AIRMASS",
"BORE-ROTANG", "ROTTYPE", "OBS-LONG", "OBS-LAT",
"OBS-ELEV", "AIRTEMP", "AIRPRESS", "HUMIDITY",
"INSTRUMENT", "IDNUM", "FOCUSZ", "OBSTYPE",
"PROGRAM", "REASON", "OBJECT", "HAS-SIMULATED-CONTENT"};and annoyingly they clash with some of the LSSTCam raw headers.
There was a problem hiding this comment.
If they clash with the LSSTCam raw headers, I'm guessing I can't just blindly strip them all? Or has the damage already been done by the time we get to visit_image (i.e. raw header overwritten by VisitInfo), so I should just strip them?
If necessary I can cook up some way to get a few more header keys from the raws when I rewrite the visit_image datasets, especially if this means we have to extract the ObservationInfo from the raw rather than the old visit_image.
There was a problem hiding this comment.
The damage has been done in that the visit info overwrites. It's a nightmare and we long ago talked about fixing visit info to not do that but never got around to it.
| page_size | ||
| Minimum number of bytes to read at at once. Making this a multiple | ||
| of the FITS block size (2880) is recommended. | ||
| partial |
There was a problem hiding this comment.
When I first looked at this I assumed partial meant that the file would be left open so that additional reads can happen efficiently to minimize memory pressure. I see now that it means that you can read a subset to save on memory but the file is closed.
There was a problem hiding this comment.
Right. I very much had implementing Formatter interfaces in mind here.
| if name not in self._opaque_metadata.headers: | ||
| if key not in self._opaque_metadata.headers: | ||
| opaque_header = reader.header.copy(strip=True) | ||
| opaque_header.remove("EXTNAME", ignore_missing=True) |
There was a problem hiding this comment.
Not overly happy with this stripping being repeated in many places.
There was a problem hiding this comment.
I'm trying to keep header stripping of a particular keyword in the same class that adds that keyword on write (or converts the legacy type that added that keyword), which is part of the reason it's spread out. But I might be able to put EXTNAME and EXTVER stripping in particular somewhere it'd be more reusable.
|
|
||
| INVALID_COMPONENT_MODEL = enum.auto() | ||
| """This formatter recognizes the given component, but the expected | ||
| attribute of the top-level `..serialization.ArchiveTree` did not exist |
There was a problem hiding this comment.
I'm pretty sure .. doesn't work for sphinx, or at least that's what I learned when I was trying to fix warnings in daf_butler. I think this has no impact because ComponentSentinel or any of the formatter classes are not in the docs.
There was a problem hiding this comment.
I'm pretty sure .. does work, but it's relative to Sphinx's hierarchy, not Python's, which are not always the same, and of course it doesn't work if you put it in a base class docstring and that gets inherited in another module.
But yeah, we can't use this as a test case until we get the daf_butler docs in good enough shape to allow a downstream subclass like this to inherit those docstrings in another package.
There was a problem hiding this comment.
Interesting. I will need to revisit. I only managed to get a single dot working last time I tried but that might have been because all the bonus packages in butler confuse things.
| return np.dtype(t.to_numpy()).kind == "u" | ||
| def is_integer(t: NumberType) -> TypeGuard[IntegerType]: | ||
| """Test whether a `NumberType` corresponds to an integer type.""" | ||
| match np.dtype(t.to_numpy()).kind: |
There was a problem hiding this comment.
Is match that much better than:
return np.dtype(t.to_numpy()).kind in ("i", "u")?
| bit = self.bit(plane.name) | ||
| if bit.index != 0: | ||
| raise TypeError("Only mask schemas with mask_size==1 can be described in FITS.") | ||
| header.set(f"MSKN{n + 1:04d}", plane.name, f"Name for mask plane {n + 1}.") |
There was a problem hiding this comment.
Is MSKN and lsst.images convention or a broader convention? We should be documenting these somewhere. In some sense it's more explicit if we use HIERARCH LSST IMAGES MASK NAME and then everyone knows what's happening and that it's not a standard.
There was a problem hiding this comment.
This is a new lsst.images convention I just invented now. I figured if we were going to try to push for it to become a registered convention at some point it'd be better as a non-HIERARCH keyword, but I'm happy to defer to you and @gpdf on that point, and I agree that if we're not going to push to make it a registered convention then a longer name with HIERARCH is better.
python/lsst/images/_visit_image.py
Outdated
| raise ValueError( | ||
| "Visit ID could not be found in butler data ID metadata and must be provided." | ||
| ) from None | ||
| if unit is None: |
There was a problem hiding this comment.
I am concerned that this can mean you can specify a unit that doesn't match the BUNIT and confusion would reign.
There was a problem hiding this comment.
I'll add a consistency check here.
python/lsst/images/_visit_image.py
Outdated
| md = legacy.getMetadata() | ||
| if instrument is None: | ||
| try: | ||
| instrument = str(md["LSST BUTLER DATAID INSTRUMENT"]) |
There was a problem hiding this comment.
It's also going to be in the INSTRUME header. Same worry below that the file could say LATISS but you pass in LSSTCam.
There was a problem hiding this comment.
Do you know if that would have the exact same value? I was hoping to consistently use the butler short name, i.e. the one in data IDs.
And if they're different variants of the same name it gets a lot harder to do a consistency check.
There was a problem hiding this comment.
LSSTCam instrume does not match the butler name so in retrospect it's not great to use it. You'd have to throw the header at a metadata translator and ask it for the standardized name but that's already in the provenance and the visit info serialization.
|
|
||
| Parameters | ||
| ---------- | ||
| filename |
There was a problem hiding this comment.
In theory this can be a ResourcePathExpression and you can open the file with
fs, fspath = ResourcePath(filename).to_fsspec()
with fs.open(fspath) as f, astropy.io.fits.open(f) as fits_obj:
...we do that in dax_images_cutout.
There was a problem hiding this comment.
Unfortunately that only works for the Astropy reads here; there's also some ExposureFitsReader usage that requires a local file.
There was a problem hiding this comment.
I was hoping you had written the ability for an astropy user to read in DP1 visit images without needing afw...
There was a problem hiding this comment.
Not for a full Exposure -> VisitImage. But I can go apply this suggestion to MaskedImage.read_legacy, Image.read_legacy, and Mask.read_legacy (since those don't need to use afw readers), and give them an option to attach the FITS header WCSs as their Projection, and that would provide a way for Astropy users to read the most user-relevant bits of a DP1 visit image.
There was a problem hiding this comment.
I've pushed two more commits to enable this (reading URIs, attaching FITS WCS) for Image/Mask/MaskedImage. I didn't add anything to the legacy formatters in obs_base to use it, but in theory we've now got at a low level what we need for (e.g.) the DP1 cutout service to return one of those without completely dropping the WCS.
78237d3 to
bada147
Compare
|
I don't think we can strip visit info unless we are creating observation info. |
Saving multiple HDUs with same now automatically increments EXTVER.
To be extra careful about overflow we drop one bit from each element in this case.
This is primarily motivated by Firefly's preference for multiple int32 HDUs vs. a data cube, since they also need to support naturally higher-dimension bitmasks. But it also ties in nicely with CFITSIO and Astropy using int32 as the intermediate type for all image compression.
Avoid writing types in reStructuredText when they'll be added to the docs via annotations. Avoid noting the serialize/deserialize signature in specific instances when it's a generic feature of the library.
This sets Pydantic config to round-trip inf/NaN and bytes properly through JSON. The Pydantic docs claim these operate on each model in the hierarchy independently (i.e. so a model could set the configs only if it saves values that they would apply to), but this is demonstrably not true for at least the inf/nan case, where the option needs to be set on the outermost model to have any effect. Safest to just set them everywhere.
Instead of passing this as a constructor argument, we just assign to objects after creation. This involves more boilerplate, but it's more consistent with treating that metadata as private. This change also fixes a few spots where archive metadata was not being propagated.
We can now serialize any type with both serialize and deserialize methods and the new _get_archive_tree_type method. The Image, Mask, and MaskedImage classes have read_fits/write_fits wrappers for convenience and type-checking (explicit keyword arguments instead of **kwargs).
This includes adding support for an external compression seed to FitsOutputArchive, which we'll populate with a hash of a data ID in the formatter.
This requires a tiny change in the formatters in obs_base, since we can't quite make everything work with an in-memory conversion.
The inherited docstrings have way too many now-invalid relative links to other butler types, and users don't gain much from seeing them in the docs.
903079f to
2e59336
Compare
These are not obviously useful in more than a handful of rare cases (where they're only a tiny bit more convenient than the constructor), and they're both full of boilerplate and potentially problematic for subclassing.
This reverts commit 56b8d2f.
We can't extend this to VisitImage because we need to use lsst.afw.image.ExposureFitsReader there.
2e59336 to
88df131
Compare
Checklist
doc/changes