Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43586: Add versioning to the fits module #32

Merged
merged 11 commits into from
Apr 24, 2024
5 changes: 5 additions & 0 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
## Checklist

- [ ] ran Jenkins
- [ ] added a release note for user-visible changes to `doc/changes`
- [ ] updated the FILE_FORMAT_VERSION number correctly (if `python/lsst/cell_coadds/_fits.py` was modified)
1 change: 1 addition & 0 deletions doc/changes/DM-43516.api.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Allowed `inputs` argument when constructing a `SingleCellCoadd` instance to be any iterable, not just a frozenset.
1 change: 1 addition & 0 deletions doc/changes/DM-43516.feature.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Included information about `day_obs` and `physical_filter` in `ObservationIdentifiers` so those metadata are available when reading a file.
1 change: 1 addition & 0 deletions doc/changes/DM-43586.feature.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Started including a semantic version number for the file format.
22 changes: 22 additions & 0 deletions doc/changes/README.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
Recording Changes
=================

This directory contains "news fragments" which are small files containing text that will be integrated into release notes.
The files can be in restructured text format or plain text.

Each file should be named like ``<JIRA TICKET>.<TYPE>`` with a file extension defining the markup format.
The ``<TYPE>`` should be one of:

* ``feature``: New feature
* ``bugfix``: A bug fix.
* ``api``: An API change.
* ``perf``: A performance enhancement.
* ``doc``: A documentation improvement.
* ``removal``: An API removal or deprecation.
* ``other``: Other Changes and Additions of interest to general users.
* ``misc``: Changes that are of minor interest.

An example file name would therefore look like ``DM-30291.misc.rst``.

You can test how the content will be integrated into the release notes by running ``towncrier --draft --version=V.vv``.
``towncrier`` can be installed from PyPI or conda-forge.
138 changes: 136 additions & 2 deletions python/lsst/cell_coadds/_fits.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,14 +19,71 @@
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

"""Module to handle FITS serialization and de-serialization.

The routines to write and read the files are in the same module, as a change to
one is typically accompanied by a corresponding change to another. Code changes
relating to writing the file must bump to the version number denoted by the
module constant FILE_FORMAT_VERSION.

Although the typical use case is for newer versions of the code to read files
written by an older version, for the purposes of deciding the newer version
string, it is helpful to think about an older version of the reader attempting
to read a newer version of the file on disk. The policy for bumping the version
is as follows:

1. When the on-disk file format written by this module changes such that the
previous version of the reader can still read files written by the newer
version, then there should be a minor bump.

2. When the on-disk format written by this module changes in a way that will
prevent the previous version of the reader from reading a file produced by the
current version of the module, then there should be a major bump. This usually
means that the new version of the reader cannot read older file either,
save the temporary support with deprecation warnings, possibly until a new
release of the Science Pipelines is made.

Examples
--------
1. A file with VERSION=1.3 should still be readable by the reader in
this module when the module-level constant FILE_FORMAT_VERSION=1.4. A file
written with VERSION=1.4 will typically be readable by a reader when the
module-level FILE_FORMAT_VERSION=1.3, although such a use case is not expected.
A concrete example of change
that requires only a minor bump is adding another BinTable that keeps track of
the input visits.

2. An example of major change would be migrating from using
BinTableHDU to ImageHDU to save data. Even if the reader supports reading
either of this formats based on the value of VERSION from the header, it should
be a major change because the previous version of the reader cannot read data
from ImageHDUs.

Unit tests only check that a file written can be read by the concurrent version
of the module, but not by any of the previous ones. Hence, bumping
FILE_FORMAT_VERSION to the appropriate value is ultimately at the discretion of
the developers.

A major bump must also be recorded in the `isCompatibleWith` method.
It is plausible that different (non-consequent) major format versions can be
read by the same reader (due to reverting back to an earlier format, or to
something very similar). `isCompatibleWith` method offers the convenience of
checking if a particular format version can be read by the current reader.

Note that major version 0 is considered unstable and experimental and none of
the guarantee above applies.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's obvious how one maps semver to I/O versioning; I think we need to spell out what is and is not supported under major, minor, and patch version bumps (and we might find that those are not sufficiently granular for the guarantees we might want to make, if both forwards- and backwards-compatibility and read/write permutations are in play).

But I also think going to all that effort might be premature, and maybe a single-integer version would be better for now? The most important thing is getting some kind of number into the files so future code doesn't have to guess about what it's dealing with.

I also have to admit that I'm a little skeptical of the value of the GitHub Action and the policy of bumping the patch version for all changes. Is this something you've seen in other codebases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm envisioning the semantic versioning to work in exactly the same way as it does for software. I suppose one assumption here is that the version with which we are reading a file is never older than the version used to write a file, since a write operation must preceed a read operation. An example of a major bump would be if we decide to use ImageHDUs instead of BinTableHDUs and drop the support for the latter. That would be a breaking change rendering the old files unreadable. Examples of minor bump would include this PR, supporting versions with or without VERSION in the header and substituting a default value if not found for newer items like day_obs and physical_filter we just introduced. The changes in all of the three examples aren't fundamentally different - in one case we decide to drop support, and for the others we don't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still rely on the developer doing the due diligence in deciding the correct version after a change. This is also enforceable but much harder. We could also call 0.x.y versions unstable in that we don't give any semver kind of guarantee until 1.0.0. Bumping minor versions for every change would be equivalent to just an integer numbering that you suggested. It seems wasteful to me to discard old data for any missing metadata. So this is definitely something we should support in production, even if not now.


from __future__ import annotations

__all__ = (
"CellCoaddFitsFormatter",
"CellCoaddFitsReader",
"IncompatibleVersionError",
"writeMultipleCellCoaddAsFits",
)

import logging
import os
from collections.abc import Mapping
from typing import Any
Expand All @@ -47,6 +104,19 @@
from ._multiple_cell_coadd import MultipleCellCoadd, SingleCellCoadd
from ._uniform_grid import UniformGrid

FILE_FORMAT_VERSION = "0.2"
"""Version number for the file format as persisted, presented as a string of
the form M.m, where M is the major version, m is the minor version.
"""

logger = logging.getLogger(__name__)


class IncompatibleVersionError(RuntimeError):
"""Exception raised when the CellCoaddFitsReader version is not compatible
with the FITS file attempted to read.
"""


class CellCoaddFitsFormatter(FitsGenericFormatter):
"""Interface for writing and reading cell coadds to/from FITS files.
Expand All @@ -69,17 +139,75 @@ class CellCoaddFitsReader:
The name of the FITS file to read.
"""

# Minimum and maximum compatible file format versions are listed as
# iterables so as to allow for discontiguous intervals.
MINIMUM_FILE_FORMAT_VERSIONS = ("0.1",)
MAXIMUM_FILE_FORMAT_VERSIONS = ("1.0",)

def __init__(self, filename: str) -> None:
if not os.path.exists(filename):
raise FileNotFoundError(f"File {filename} not found")

self.filename = filename

@classmethod
def isCompatibleWith(cls, written_version: str, /) -> bool:
"""Check if the serialization version is compatible with the reader.

This is a convenience method to ask if the current version of this
class can read a file, based on the VERSION in its header.

Parameters
----------
written_version: `str`
The VERSION of the file to be read.

Returns
-------
compatible : `bool`
Whether the reader can read a file whose VERSION is
``written_version``.

Notes
-----
This accepts the other version as a positional argument only.
"""
for min_version, max_version in zip(
cls.MINIMUM_FILE_FORMAT_VERSIONS,
cls.MAXIMUM_FILE_FORMAT_VERSIONS,
strict=True,
):
if min_version <= written_version < max_version:
arunkannawadi marked this conversation as resolved.
Show resolved Hide resolved
return True

return False

def readAsMultipleCellCoadd(self) -> MultipleCellCoadd:
"""Read the FITS file as a MultipleCellCoadd object."""
"""Read the FITS file as a MultipleCellCoadd object.

Raises
------
IncompatibleError
Raised if the version of this module that wrote the file is
incompatible with this module that is reading it in.
"""
with fits.open(self.filename) as hdu_list:
data = hdu_list[1].data
header = hdu_list[1].header
written_version = header.get("VERSION", "0.1")
if not self.isCompatibleWith(written_version):
raise IncompatibleVersionError(
f"{self.filename} was written with version {written_version}"
f"but attempting to read it with a reader designed for {FILE_FORMAT_VERSION}"
)
if written_version != FILE_FORMAT_VERSION:
logger.info(
"Reading %s having version %s with reader designed for %s",
self.filename,
written_version,
FILE_FORMAT_VERSION,
)

data = hdu_list[1].data

# Read in WCS
ps = PropertySet()
Expand Down Expand Up @@ -245,6 +373,11 @@ def writeMultipleCellCoaddAsFits(
Whether to overwrite the file if it already exists?
metadata : `~lsst.daf.base.PropertySet`, optional
Additional metadata to write to the FITS file.

Notes
-----
Changes to this function that modify the way the file is written to disk
must be accompanied with a change to FILE_FORMAT_VERSION.
"""
cell_id = fits.Column(
name="cell_id",
Expand Down Expand Up @@ -327,6 +460,7 @@ def writeMultipleCellCoaddAsFits(
primary_hdu = fits.PrimaryHDU()
primary_hdu.header.extend(wcs_cards)

hdu.header["VERSION"] = FILE_FORMAT_VERSION
hdu.header["TUNIT1"] = multiple_cell_coadd.common.units.name
# This assumed to be the same as multiple_cell_coadd.common.identifers.band
# See DM-38843.
Expand Down
54 changes: 54 additions & 0 deletions tests/test_fits.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# This file is part of cell_coadds.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (https://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

import unittest

import lsst.utils.tests
from lsst.cell_coadds import CellCoaddFitsReader
from lsst.cell_coadds._fits import FILE_FORMAT_VERSION


class FitsTestCase(lsst.utils.tests.TestCase):
"""Class for testing FITS specific aspects.

Some FITS specific aspects are also implemented in test_coadds.py
"""

def test_read_compatibility(self):
"""Test that the isCompatibleWith method works."""
self.assertTrue(CellCoaddFitsReader.isCompatibleWith("0.1"))
self.assertFalse(CellCoaddFitsReader.isCompatibleWith("12345.67"))

# Check that the reader is compatible with itself.
self.assertTrue(CellCoaddFitsReader.isCompatibleWith(FILE_FORMAT_VERSION))


class TestMemory(lsst.utils.tests.MemoryTestCase):
"""Check for resource/memory leaks."""


def setup_module(module): # noqa: D103
lsst.utils.tests.init()


if __name__ == "__main__":
lsst.utils.tests.init()
unittest.main()