Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-40563: Persist ObservationIdentifiers #33

Merged
merged 8 commits into from May 7, 2024
Merged

Conversation

arunkannawadi
Copy link
Member

@arunkannawadi arunkannawadi commented May 1, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • updated the FILE_FORMAT_VERSION number correctly (if python/lsst/cell_coadds/_fits.py was modified)

@arunkannawadi arunkannawadi force-pushed the tickets/DM-40563 branch 9 times, most recently from 73753dc to e584e5b Compare May 6, 2024 15:46
@arunkannawadi arunkannawadi marked this pull request as ready for review May 6, 2024 15:49
visit=visit,
detector=link_row["detector"],
day_obs=visit_dict[visit].day_obs,
physical_filter=visit_dict[visit].physical_filter,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have gone a bit too far in creating and using the VisitRecord dataclass to avoid any positional argument. visit_dict its scope only in this function and is not persisted. So I'd be okay dropping this.

)
cell_recarray = np.rec.fromrecords(
recList=cell_records,
formats=None, # formats has specified to please mypy. See numpy#26376.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someday when my fix upstream makes its way to our rubin-env we can remove it specifying formats=None. Today is not that day.

@@ -85,8 +85,7 @@ def __init__(
self._common = common
# Remove any duplicate elements in the input, sorted them and make
# them an immutable sequence.
# TODO: Remove the conditioning in DM-40563.
self._inputs = tuple(sorted(set(inputs))) if inputs else ()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm intentionally dropping support for files generated with previous versions (since we are still in version 0.x) since I feel that it's easier to drop support for no inputs rather than maintaining this until we hit 1.0.

@@ -241,12 +259,47 @@ def readAsMultipleCellCoadd(self) -> MultipleCellCoadd:
outer_cell_size = Extent2I(header["OCELL1"], header["OCELL2"])
psf_image_size = Extent2I(header["PSFSIZE1"], header["PSFSIZE2"])

# Attempt to get inputs for each cell.
inputs = GridContainer[list[ObservationIdentifiers]](shape=grid.shape)
if written_version >= "0.3":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of lexicographic comparison is not going to work if you ever hit 0.10. It might be better to switch all the version variables to tuple[int, int].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or use packaging.version which is designed for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to packaging.version. I had wanted to do this when having a patch number was under consideration but had dropped the idea since then.

visit = link_row["visit"]
obs_id = ObservationIdentifiers(
instrument=header["INSTRUME"],
packed=link_row["packed"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably out of scope for this ticket, but I would like to drop "packed" from both the in-memory and on-disk forms. I think it'll end up more of a maintenance headache than a convenience.

assert len(instrument_set) == 1, "All cells must have the same instrument."
instrument = instrument_set.pop()

visit_recarray = np.rec.fromrecords(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several years ago, I remember learning that numpy record arrays had some surprising performance overheads compared to using a regular numpy.ndarray with a structured dtype, and I've avoided them ever since, since the difference is just whether you use attribute access vs. item access (i.e. row["field_name"]). I have no idea if that's still the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care as much about the attribute access vs. item access, but avoiding positional index access and also having to create several fits.Column objects. I'm inclined to leave this as is, and change it in the future if there is a noticeable overhead. Fortunately, this is purely internal and can be changed without affecting FILE_FORMAT_VERSION at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, it sounds like astropy does something special with numpy.recarray, which is not something I realized.

@arunkannawadi arunkannawadi merged commit c90fb82 into main May 7, 2024
7 checks passed
@arunkannawadi arunkannawadi deleted the tickets/DM-40563 branch May 7, 2024 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants