DM-54780: add pure-JSON archive implementations #32
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #32      +/-  ##
==========================================
- Coverage   75.11%   74.20%   -0.92%
==========================================
  Files          60       64       +4
  Lines        6241     6478     +237
==========================================
+ Hits         4688     4807     +119
- Misses       1553     1671     +118
==========================================
```

View full report in Codecov by Sentry.
Force-pushed from 79a4417 to f6ecdca
Local test coverage from …
Most of what's missing is: …
timj left a comment:
Looks good. I like how a second file format forces things to reorganize (and it makes me want to try an HDF5 variant using the Starlink NDF data model).
- I think we need to call `.inspect` somewhere to show that it actually works for JSON tests (I don't think it does).
- I have concerns that `astropy.io.fits` is turning up in the JSON API.
```python
        return self._exit_stack.enter_context(from_json(self.filename))

    def _get_extension(self) -> str:
        return ".fits"
```
Shouldn't this be .json? Implies that we aren't testing this.
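For reference, a fix along the lines this comment suggests could look like the sketch below, with `RoundtripBase` reduced to a stub; only the returned string is the point, the rest is illustrative scaffolding:

```python
class RoundtripBase:
    """Stub standing in for the real test-helper base class."""


class RoundtripJson(RoundtripBase):
    def _get_extension(self) -> str:
        # ".json", not ".fits": this string decides which file the
        # JSON round-trip test actually writes and reads back.
        return ".json"
```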
```python
class RoundtripJson[T](RoundtripBase):
    def inspect(self) -> astropy.io.fits.HDUList:
```
Copy and paste error? Seems wrong for a JSON round trip.
```python
class RoundtripJson[T](RoundtripBase):
    def inspect(self) -> astropy.io.fits.HDUList:
        """Read the JSON file as a dictionary."""
        return self._exit_stack.enter_context(from_json(self.filename))
```
This looks wrong to me: `from_json` takes bytes, not a filename. It also returns a dict, so I don't think a context manager is needed at all.
I think this means that there are no test calls to this method.
```python
return TableReferenceModel(source=str(key), columns=columns)
for n, c in enumerate(columns, start=1):
    assert isinstance(c.data, ArrayReferenceModel)
    c.data.source = f"{key}[{n}]"
```
Is there any way to push this lower down (into TableColumnModel?) so that we don't have to do the identical source fixup in two places? I see that TableReferenceModel did accept a source parameter.
I held off on that because I think it bakes a FITS-specific assumption into a more generic model, even though that's just a hypothetical concern now. For ASDF column-major tables in particular, there would be a different source for every column, because they'd go in different blocks.
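To make the fixup under discussion concrete, here is a self-contained sketch of the write-side logic, with the pydantic models reduced to a plain dataclass (the model name comes from the diff; the helper function name is invented for illustration):

```python
from dataclasses import dataclass


@dataclass
class ArrayReferenceModel:
    """Simplified stand-in for the real column-data model."""
    source: str = ""


def fix_column_sources(key: str, columns: list[ArrayReferenceModel]) -> None:
    """Point every column at the same HDU, tagging each with its
    column number (1-indexed, because FITS counts columns from 1)."""
    for n, c in enumerate(columns, start=1):
        c.source = f"{key}[{n}]"
```

With a shared HDU key like `"IMAGE,1"`, the per-column sources come out as `"IMAGE,1[1]"`, `"IMAGE,1[2]"`, and so on.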
```python
key, reader = self._get_source_reader(ref)
if not isinstance(model.columns[0].data, ArrayReferenceModel):
    raise ArchiveReadError("Inline array found where a reference array was expected.")
key, reader = self._get_source_reader(model.columns[0].data.source, is_table=True)
```
Maybe add a comment that in a table the first column can always be trusted to return the source.
```python
def test_json_roundtrip(self) -> None:
    """Test saving a tiny image to pure JSON."""
    image = Image(
```
Maybe add units to the test image?
I actually had them originally and then removed them as a (lazy) way to get a little more test coverage. Turns out most of our tests have images with units and relatively few don't. That's orthogonal enough to the archive type (except that it's important for FITS to make sure BUNIT gets set) that I don't think it's worth a near-duplicate of this test to try it.
```python
def read[T: Any](cls: type[T], target: ResourcePathExpression | ArchiveTree) -> ReadResult[T]:
    """Read an object from a FITS file.
```
```python
def write(
    obj: Any,
    filename: str | None = None,
```
In theory, for JSON this could be a URI to allow direct writes to S3 (through ResourcePath). I understand that since FITS can't do that (and neither can HDF5), it might be easier to stick to files in the interface; otherwise you end up re-implementing the butler datastore's "write to local file and then transfer to cloud" approach.
Good idea. I'm trying to make the write and read functions compatible where they can be without forcing them into a least-common-denominator interface, so accepting ResourcePathExpression here sounds fine.
Force-pushed from a06361f to 9152e8a
This also includes:

- moving TableCellReferenceModel to the 'fits' subpackage, where it has been renamed to PointerModel to reflect the fact that it's only used there, and only as a pointer;
- adding support (at archive implementation discretion) for tables with inline arrays for columns.

The ASDF table data model gives each column a 'source' field, which is flexibility we don't need for the FITS archives, since we really only need a pointer to the full HDU. But since we've got the flexibility to cook up whatever source strings we want, we can just invent a way to append a column number (1-indexed, because FITS; note that the column name is already nearby), and then strip that off entirely when we read it to get the HDU EXTNAME[,EXTVER].
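The read-side strip described above can be sketched as a small parser; the function name is invented for illustration, but the source-string shape (`EXTNAME[,EXTVER]` plus a 1-indexed column suffix) follows the commit message:

```python
def split_table_source(source: str) -> tuple[str, int]:
    """Split a source like 'EXTNAME,EXTVER[n]' into the HDU key and the
    1-indexed column number that was appended on write.

    rpartition splits on the last '[', so commas or brackets earlier in
    the EXTNAME are left untouched.
    """
    key, _, column = source.rpartition("[")
    return key, int(column.rstrip("]"))
```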
Force-pushed from 48b6867 to 7a1834a
And rename the FITS 'write' method argument from 'filename' to 'path' for consistency, even though that can't do URIs.
Force-pushed from 7a1834a to 945836d
Checklist

- doc/changes