DM-40563: Persist ObservationIdentifiers #33

arunkannawadi · 2024-05-01T16:27:06Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes
updated the FILE_FORMAT_VERSION number correctly (if python/lsst/cell_coadds/_fits.py was modified)

arunkannawadi · 2024-05-06T15:53:51Z

python/lsst/cell_coadds/_fits.py

+                        visit=visit,
+                        detector=link_row["detector"],
+                        day_obs=visit_dict[visit].day_obs,
+                        physical_filter=visit_dict[visit].physical_filter,


I may have gone a bit too far in creating and using the VisitRecord dataclass just to avoid positional arguments. visit_dict is scoped to this function only and is not persisted, so I'd be okay with dropping this.

arunkannawadi · 2024-05-06T15:55:21Z

python/lsst/cell_coadds/_fits.py

+    )
+    cell_recarray = np.rec.fromrecords(
+        recList=cell_records,
+        formats=None,  # formats has specified to please mypy. See numpy#26376.


Someday, when my upstream fix makes its way into our rubin-env, we can stop specifying formats=None. Today is not that day.
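A small sketch of the call shape in question, assuming made-up records and column names. Passing `formats=None` explicitly is the same as the default at runtime: NumPy infers the column types from the data.

```python
import numpy as np

# Hypothetical rows; the real cell_records are built elsewhere in _fits.py.
cell_records = [(1234, 10, 20240501), (1235, 11, 20240501)]

cell_recarray = np.rec.fromrecords(
    recList=cell_records,
    formats=None,  # same as the default; spelled out only for the type checker
    names=("visit", "detector", "day_obs"),  # assumed column names
)

# Columns are available by name on the resulting record array.
visits = cell_recarray.visit
```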

arunkannawadi · 2024-05-06T16:03:56Z

python/lsst/cell_coadds/_single_cell_coadd.py

@@ -85,8 +85,7 @@ def __init__(
        self._common = common
        # Remove any duplicate elements in the input, sorted them and make
        # them an immutable sequence.
-        # TODO: Remove the conditioning in DM-40563.
-        self._inputs = tuple(sorted(set(inputs))) if inputs else ()


I'm intentionally dropping support for files generated with previous versions. Since we are still at version 0.x, I feel it's easier to drop support for no inputs now than to maintain it until we hit 1.0.
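To make the consequence concrete, here is a sketch of the normalization step without the conditional. It assumes the guard is simply removed; the `Obs` class and `normalize_inputs` helper are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Obs:
    """Hypothetical, hashable and orderable stand-in for ObservationIdentifiers."""

    visit: int
    detector: int


def normalize_inputs(inputs):
    # Deduplicate, sort, and freeze into an immutable sequence.
    return tuple(sorted(set(inputs)))


inputs = [Obs(2, 1), Obs(1, 5), Obs(2, 1)]
normalized = normalize_inputs(inputs)
```

Under this assumption an empty iterable still normalizes to `()`, but passing `None` raises `TypeError`, because `set(None)` is not valid.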

TallJimbo · 2024-05-06T15:56:18Z

python/lsst/cell_coadds/_fits.py

@@ -241,12 +259,47 @@ def readAsMultipleCellCoadd(self) -> MultipleCellCoadd:
            outer_cell_size = Extent2I(header["OCELL1"], header["OCELL2"])
            psf_image_size = Extent2I(header["PSFSIZE1"], header["PSFSIZE2"])

+            # Attempt to get inputs for each cell.
+            inputs = GridContainer[list[ObservationIdentifiers]](shape=grid.shape)
+            if written_version >= "0.3":


This kind of lexicographic comparison is not going to work if you ever hit 0.10. It might be better to switch all the version variables to tuple[int, int].

timj · 2024-05-06

Or use packaging.version, which is designed for this.

arunkannawadi · 2024-05-06

Switched to packaging.version. I had wanted to do this back when a patch number was under consideration, but had since dropped the idea.
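The failure mode raised in this thread can be reproduced in a few lines. This is a sketch, not the code in the PR: `parse_version` is a hypothetical helper for the tuple[int, int] suggestion, and the `packaging` comparison is guarded because that package is not part of the standard library.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Hypothetical helper: turn "major.minor" into a tuple of ints."""
    major, minor = text.split(".")
    return int(major), int(minor)


# String comparison goes character by character, so "1" sorts before "3".
lexicographic = "0.10" >= "0.3"

# Comparing integer tuples gives the intended ordering.
as_tuples = parse_version("0.10") >= parse_version("0.3")

# packaging.version handles this as well, if the package is installed.
try:
    from packaging.version import Version
except ImportError:
    Version = None

as_versions = Version("0.10") >= Version("0.3") if Version is not None else None
```

Here `lexicographic` is `False` while `as_tuples` is `True`, which is why a string check like `written_version >= "0.3"` would start rejecting files once the format reaches 0.10.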

TallJimbo · 2024-05-06T15:58:34Z

python/lsst/cell_coadds/_fits.py

+                    visit = link_row["visit"]
+                    obs_id = ObservationIdentifiers(
+                        instrument=header["INSTRUME"],
+                        packed=link_row["packed"],


Probably out of scope for this ticket, but I would like to drop "packed" from both the in-memory and on-disk forms. I think it'll end up more of a maintenance headache than a convenience.

TallJimbo · 2024-05-06T16:52:48Z

python/lsst/cell_coadds/_fits.py

+    assert len(instrument_set) == 1, "All cells must have the same instrument."
+    instrument = instrument_set.pop()
+
+    visit_recarray = np.rec.fromrecords(


Several years ago, I remember learning that numpy record arrays had some surprising performance overheads compared to using a regular numpy.ndarray with a structured dtype, and I've avoided them ever since, since the difference is just whether you use attribute access vs. item access (i.e. row["field_name"]). I have no idea if that's still the case.

arunkannawadi · 2024-05-06

I don't care as much about attribute access vs. item access as about avoiding positional index access and having to create several fits.Column objects. I'm inclined to leave this as is, and change it in the future if there is a noticeable overhead. Fortunately, this is purely internal and can be changed without affecting FILE_FORMAT_VERSION at all.

TallJimbo · 2024-05-06

👍, it sounds like astropy does something special with numpy.recarray, which I hadn't realized.
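For readers unfamiliar with the two forms compared in this thread, a short sketch with made-up field names and values. It shows only the access styles and makes no claim about their relative performance.

```python
import numpy as np

# A regular ndarray with a structured dtype: fields are read by item access.
dtype = np.dtype([("visit", np.int64), ("detector", np.int32)])
structured = np.array([(1234, 10), (1235, 11)], dtype=dtype)
by_item = structured["visit"]

# The same memory viewed as a record array: fields are also attributes.
as_recarray = structured.view(np.recarray)
by_attribute = as_recarray.visit

# Neither form needs positional indexing such as row[0] to reach a field.
first_detector = structured[0]["detector"]
```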


arunkannawadi force-pushed the tickets/DM-40563 branch 9 times, most recently from 73753dc to e584e5b Compare May 6, 2024 15:46

arunkannawadi requested a review from TallJimbo May 6, 2024 15:47

arunkannawadi marked this pull request as ready for review May 6, 2024 15:49

arunkannawadi commented May 6, 2024


TallJimbo approved these changes May 6, 2024


arunkannawadi force-pushed the tickets/DM-40563 branch from f9a0244 to ea41368 Compare May 6, 2024 19:32

arunkannawadi added 8 commits May 6, 2024 23:02

Autoupdate pre-commit-config

3d6f5c2

Return hdu_list from write method

6076e1f

Persist metadata about inputs

9fb9b89

Switch to FILTER from BAND

aa1f0b6

Bump FILE_FORMAT_VERSION

874c68e

Check equality for inputs when comparing two MultipleCellCoadd instances

4ab448e

Use a dataclass for VisitRecord

144cbf9

Add a note on backward compatibility when inputs is None.

eb20205

arunkannawadi force-pushed the tickets/DM-40563 branch from ea41368 to eb20205 Compare May 7, 2024 03:02

arunkannawadi merged commit c90fb82 into main May 7, 2024
7 checks passed

arunkannawadi deleted the tickets/DM-40563 branch May 7, 2024 11:27


arunkannawadi May 6, 2024

arunkannawadi May 6, 2024

arunkannawadi May 6, 2024

TallJimbo May 6, 2024

timj May 6, 2024

arunkannawadi May 6, 2024

TallJimbo May 6, 2024

TallJimbo May 6, 2024

arunkannawadi May 6, 2024

TallJimbo May 6, 2024
