Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43586: Add versioning to the fits module #32

Merged
merged 11 commits into from Apr 24, 2024
Merged

Conversation

arunkannawadi
Copy link
Member

No description provided.

@arunkannawadi arunkannawadi force-pushed the tickets/DM-43586 branch 3 times, most recently from 5e8c163 to b79cef1 Compare April 1, 2024 00:22
@arunkannawadi arunkannawadi marked this pull request as ready for review April 1, 2024 00:22
@arunkannawadi arunkannawadi force-pushed the tickets/DM-43586 branch 6 times, most recently from 4657d83 to 1a28756 Compare April 1, 2024 12:56
python/lsst/cell_coadds/_fits.py Outdated Show resolved Hide resolved
readable or not. Any changes to this module, even as trivial as formatting
changes and documentation updates should bump the version number (in this case,
it would bump the patch). This is mandated in GitHub Actions.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's obvious how one maps semver to I/O versioning; I think we need to spell out what is and is not supported under major, minor, and patch version bumps (and we might find that those are not sufficiently granular for the guarantees we might want to make, if both forwards- and backwards-compatibility and read/write permutations are in play).

But I also think going to all that effort might be premature, and maybe a single-integer version would be better for now? The most important thing is getting some kind of number into the files so future code doesn't have to guess about what it's dealing with.

I also have to admit that I'm a little skeptical of the value of the GitHub Action and the policy of bumping the patch version for all changes. Is this something you've seen in other codebases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm envisioning the semantic versioning to work in exactly the same way as it does for software. I suppose one assumption here is that the version with which we are reading a file is never older than the version used to write a file, since a write operation must preceed a read operation. An example of a major bump would be if we decide to use ImageHDUs instead of BinTableHDUs and drop the support for the latter. That would be a breaking change rendering the old files unreadable. Examples of minor bump would include this PR, supporting versions with or without VERSION in the header and substituting a default value if not found for newer items like day_obs and physical_filter we just introduced. The changes in all of the three examples aren't fundamentally different - in one case we decide to drop support, and for the others we don't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still rely on the developer doing the due diligence in deciding the correct version after a change. This is also enforceable but much harder. We could also call 0.x.y versions unstable in that we don't give any semver kind of guarantee until 1.0.0. Bumping minor versions for every change would be equivalent to just an integer numbering that you suggested. It seems wasteful to me to discard old data for any missing metadata. So this is definitely something we should support in production, even if not now.

@arunkannawadi arunkannawadi force-pushed the tickets/DM-43586 branch 3 times, most recently from d4fe428 to 0e02461 Compare April 4, 2024 19:26
@arunkannawadi
Copy link
Member Author

This is ready for another round of review, Jim.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it seems we didn't land on the same page as much as we both thought after talking about it live. I'd still like to at least hope for a bit more forward-compatibility, and I'm still unconvinced about "semantic versioning" and especially "patch versions" being useful concepts here. (Major and minor versions are great, but I think semantic versioning implies a lot more than that.)


Changes to this module may require a bump to the version number denoted by the
module constant FILE_FORMAT_VERSION. A change to this module without bumping
the version would result in a failure in GitHub Actions to act as a reminder.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this policy explicit; how about:

  • when the on-disk format written by this code changes in a way that should not prevent any previous version of the code from reading it (though some new information may be dropped), there should be a minor version bump;
  • when the on-disk format written by this code changes in a way that will prevent a previous version of the code from being able to read it, there should be a major version bump.

Having written that I'm still not sure what a patch version bump is good for; maybe "changes that might change the on-disk format slightly but might not?" But that just sounds like an excuse to commit poorly-tested code.

And if we don't need a patch version, I definitely think the phrase "semantic versioning" raises more questions than it answers; thought I think that might be the case even if we had a reason for a patch version.

return False

if this_version.minor < other_version.minor:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels too pessimistic. It's shouldn't be that hard for us to ignore header keys, table columns, and HDUs we don't recognize.

python/lsst/cell_coadds/_fits.py Outdated Show resolved Hide resolved
@arunkannawadi
Copy link
Member Author

I'm afraid it seems we didn't land on the same page as much as we both thought after talking about it live.

My bad. My scant notes of our discussion partly to blame here. Let's revisit this again next week. I think versioning is important since this is a dataset that is likely to be bulk-downloaded and accessed outside of RSP so I'd like to get a good versioning scheme going on.

@arunkannawadi arunkannawadi force-pushed the tickets/DM-43586 branch 2 times, most recently from 6600c88 to eaf527b Compare April 23, 2024 15:58
@arunkannawadi
Copy link
Member Author

I've switched to using just major and minor versions, and removed any mention of semantic versioning. I've expanded on the policy on bumping versions and compatibility across versions is largely put in by hand. I'm hoping to wrap this up, barring any minor implementation changes after today.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not fully on board with the descriptions of what major and minor version bumps mean, though I do agree with your examples. So I'm hoping we're almost on the same page now.

doc/changes/README.rst Outdated Show resolved Hide resolved
module constant FILE_FORMAT_VERSION. The policy for bumping the version is:

1. When the on-disk format written by this module changes such that the
accompanying reader can still read files written by the previous version, then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think minor bumps are when older readers can still read files written by new writers. As you've described it here, the format is not changing; only the reader code is. If you don't want to try to rigorously support compatibility in that direction (as suggested in the examples), then we could say "old readers may be able to read new code", while major would mean they definitely cannot.

2. When the on-disk format written by this module changes in a way that will
prevent the reader from reading a previous version of the code, then there
should be a major bump. This holds even if the reader temporarily supports a
previous version with deprecation warnings.
Copy link
Member

@TallJimbo TallJimbo Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this qualifier renders the previous sentence moot. When we switch to writing a new file format but keep backwards-compatibility reader code around, I agree we want a major bump in the version embedded in the file. But it's not because we're declaring at that point that we'll someday remove the backwards-compatibility; it's because we're definitely breaking the ability of old readers to read the new file.

I think it's also true that major version bumps defined this way will tend to have read support deprecated and ultimately removed in future code releases, but there's nothing stopping us from dropping read support for particular minor versions in future code releases as well.

python/lsst/cell_coadds/_fits.py Show resolved Hide resolved
python/lsst/cell_coadds/_fits.py Outdated Show resolved Hide resolved
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this live, and we're pretty much on the same page; we'll leave major vs. minor to developer discretion, with intent to possibly deprecate being a strong indication that a major version bump is in order.

@arunkannawadi
Copy link
Member Author

Thinking about this more after the call, I am with your points stronger than I was during the call. I'll clean up the changes and merge by EOD tomorrow. Thank you for all the discussions, Jim.

@arunkannawadi arunkannawadi force-pushed the tickets/DM-43586 branch 2 times, most recently from 2d41d8e to e9504f7 Compare April 24, 2024 15:28
@arunkannawadi arunkannawadi merged commit 536658c into main Apr 24, 2024
7 checks passed
@arunkannawadi arunkannawadi deleted the tickets/DM-43586 branch April 24, 2024 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants