
DM-54780: add pure-JSON archive implementations#32

Merged
TallJimbo merged 11 commits into main from tickets/DM-54780
Apr 29, 2026
Conversation

@TallJimbo
Member

@TallJimbo commented Apr 28, 2026

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 58.18182% with 161 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.20%. Comparing base (cdb62ee) to head (945836d).
⚠️ Report is 12 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
python/lsst/images/json/_input_archive.py 55.76% 23 Missing ⚠️
python/lsst/images/json/formatters.py 0.00% 23 Missing ⚠️
tests/test_transforms.py 12.00% 22 Missing ⚠️
python/lsst/images/json/_output_archive.py 64.40% 21 Missing ⚠️
python/lsst/images/serialization/_tables.py 44.73% 21 Missing ⚠️
python/lsst/images/fits/_input_archive.py 51.61% 15 Missing ⚠️
python/lsst/images/fits/_output_archive.py 40.00% 12 Missing ⚠️
tests/test_psfs.py 9.09% 10 Missing ⚠️
python/lsst/images/serialization/_asdf_utils.py 61.90% 8 Missing ⚠️
python/lsst/images/tests/_roundtrip.py 92.10% 3 Missing ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
- Coverage   75.11%   74.20%   -0.92%     
==========================================
  Files          60       64       +4     
  Lines        6241     6478     +237     
==========================================
+ Hits         4688     4807     +119     
- Misses       1553     1671     +118     


@TallJimbo
Member Author

Local test coverage from diff-cover (with legacy packages and test data available) is:

-------------
python/lsst/images/_image.py (100%)
python/lsst/images/_mask.py (100%)
python/lsst/images/fits/_common.py (86.7%): Missing lines 75,83
python/lsst/images/fits/_input_archive.py (61.9%): Missing lines 210,212,244,263,275,307,309,314
python/lsst/images/fits/_output_archive.py (72.2%): Missing lines 279,281-284
python/lsst/images/fits/formatters.py (100%)
python/lsst/images/json/__init__.py (100%)
python/lsst/images/json/_input_archive.py (90.4%): Missing lines 95,105,111,122,133
python/lsst/images/json/_output_archive.py (89.8%): Missing lines 70,104,139-140,154,156
python/lsst/images/json/formatters.py (100%)
python/lsst/images/psfs/_legacy.py (100%)
python/lsst/images/psfs/_piff.py (100%)
python/lsst/images/serialization/_asdf_utils.py (61.9%): Missing lines 129,133,137-141,261
python/lsst/images/serialization/_common.py (100%)
python/lsst/images/serialization/_input_archive.py (75.0%): Missing lines 187
python/lsst/images/serialization/_output_archive.py (80.0%): Missing lines 340
python/lsst/images/serialization/_tables.py (71.1%): Missing lines 119,122,127-128,142,161,170,185-186,189-190
python/lsst/images/tests/_roundtrip.py (91.7%): Missing lines 283,295,298
tests/test_image.py (100%)
tests/test_psfs.py (100%)
tests/test_transforms.py (100%)
-------------
Total:   370 lines
Missing: 50 lines
Coverage: 86%
-------------

Most of what's missing is:

  • exception-raises for nearly-impossible conditions (mostly corrupted archives)
  • saving tables to archives via astropy.table rather than structured numpy arrays (usage and tests pending on DM-54225: Add cell coadd type and format #18).

@TallJimbo marked this pull request as ready for review April 28, 2026 16:52
Member

@timj left a comment


Looks good. I like how a second file format forces things to reorganize (and it makes me want to try an HDF5 variant using the Starlink NDF data model).

  • I think we need to call .inspect somewhere to show that it actually works for JSON tests (I don't think it does)
  • I have concerns that astropy.io.fits is turning up in the JSON API.

Comment thread python/lsst/images/serialization/_asdf_utils.py Outdated
Comment thread python/lsst/images/tests/_roundtrip.py Outdated
return self._exit_stack.enter_context(from_json(self.filename))

def _get_extension(self) -> str:
return ".fits"
Member


Shouldn't this be .json? Implies that we aren't testing this.

Comment thread python/lsst/images/tests/_roundtrip.py Outdated


class RoundtripJson[T](RoundtripBase):
def inspect(self) -> astropy.io.fits.HDUList:
Member


Copy and paste error? Seems wrong for a JSON round trip.

Comment thread python/lsst/images/tests/_roundtrip.py Outdated
class RoundtripJson[T](RoundtripBase):
def inspect(self) -> astropy.io.fits.HDUList:
"""Read the JSON file as a dictionary."""
return self._exit_stack.enter_context(from_json(self.filename))
Member


This looks wrong to me. from_json takes bytes and not a filename. It also returns a dict so I don't think a context manager is needed at all.

I think this means that there are no test calls to this method.
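A minimal sketch of what a corrected JSON `inspect` could look like, under the assumptions in the comment above (the file is parsed into a plain dict, so no context manager is needed); `inspect_json` is an illustrative name, not the package's actual API:

```python
import json
from pathlib import Path


def inspect_json(filename: str) -> dict:
    """Read a JSON archive file back as a plain dict for inspection."""
    # json.loads parses the raw bytes directly; the result is an
    # in-memory dict, so no context manager or exit stack is needed.
    return json.loads(Path(filename).read_bytes())
```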

Comment thread python/lsst/images/json/_input_archive.py
return TableReferenceModel(source=str(key), columns=columns)
for n, c in enumerate(columns, start=1):
assert isinstance(c.data, ArrayReferenceModel)
c.data.source = f"{key}[{n}]"
Member


Is there any way to push this lower down (into TableColumnModel?) so that we don't have to do the identical source fixup in two places? I see that TableReferenceModel did accept a source parameter.

Member Author


I held off on that because I think it bakes a FITS-specific assumption into a more generic model, even though that's just a hypothetical concern now. For ASDF column-major tables in particular, there would be a different source for every column, because they'd go in different blocks.

key, reader = self._get_source_reader(ref)
if not isinstance(model.columns[0].data, ArrayReferenceModel):
raise ArchiveReadError("Inline array found where a reference array was expected.")
key, reader = self._get_source_reader(model.columns[0].data.source, is_table=True)
Member


Maybe add a comment that in a table the first column can always be trusted to return the source.

Comment thread tests/test_image.py

def test_json_roundtrip(self) -> None:
"""Test saving a tiny image to pure JSON."""
image = Image(
Member


Maybe add units to the test image?

Member Author


I actually had them originally and then removed them as a (lazy) way to get a little more test coverage. Turns out most of our tests have images with units and relatively few don't. That's orthogonal enough to the archive type (except that it's important for FITS to make sure BUNIT gets set) that I don't think it's worth a near-duplicate of this test to try it.



def read[T: Any](cls: type[T], target: ResourcePathExpression | ArchiveTree) -> ReadResult[T]:
"""Read an object from a FITS file.
Member


Not a FITS file.


def write(
obj: Any,
filename: str | None = None,
Member


In theory for JSON this could be a URI to allow direct writes to S3 (through ResourcePath). I understand that since FITS couldn't do that (and neither can HDF5), it might be easier to stick to files in the interface, else you end up re-implementing the butler datastore "write to local file and then transfer to cloud" approach.

Member Author


Good idea. I'm trying to make the write and read functions compatible where they can be without forcing them into a least-common-denominator interface, so accepting ResourcePathExpression here sounds fine.

@TallJimbo force-pushed the tickets/DM-54780 branch 3 times, most recently from a06361f to 9152e8a, April 29, 2026 18:57
This also includes:

- moving TableCellReferenceModel to the 'fits' subpackage, where it has
  been renamed to PointerModel to reflect the fact that it's only used
  there, and only as a pointer;

- adding support (at archive implementation discretion) for tables with
  inline arrays for columns;

The ASDF table data model gives each column a 'source' field, which is
flexibility we don't need for the FITS archives, since we really only
need a pointer to the full HDU.  But since we've got flexibility to
cook up whatever source strings we want, we can just invent a way to
append a column number (1-indexed, because FITS; note that the column
name is already nearby), and then strip that off entirely when we read
it to get the HDU EXTNAME[,EXTVER].
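The source-string scheme described above could be sketched like this (illustrative helper names, not the actual model code):

```python
import re


def column_source(hdu_key: str, column_number: int) -> str:
    """Build a per-column source string by appending a column number
    (1-indexed, because FITS) to the HDU's EXTNAME[,EXTVER] key."""
    return f"{hdu_key}[{column_number}]"


def strip_column_suffix(source: str) -> str:
    """Strip the trailing [n] suffix on read to recover the
    EXTNAME[,EXTVER] pointer to the full HDU."""
    return re.sub(r"\[\d+\]$", "", source)
```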
@TallJimbo force-pushed the tickets/DM-54780 branch 2 times, most recently from 48b6867 to 7a1834a, April 29, 2026 19:36
And rename the FITS 'write' method argument from 'filename' to 'path'
for consistency, even though that can't do URIs.
@TallJimbo merged commit af1a2a9 into main Apr 29, 2026
17 of 19 checks passed
@TallJimbo deleted the tickets/DM-54780 branch April 29, 2026 23:48