DM-28716: Add pybind wrapper to afw archive that handles adding set of HDUs #588
Conversation
Force-pushed from b138543 to 8d8e8a0.
This looks good but I do have some things that need changing:
- Do not set EXTVER -- let the caller do it, and add a note about that until we can sort out a real API. I worry that if people call persist multiple times into a single file we'll be opening ourselves up to confusion. Does the Stamps writing work fine on this PR if EXTVER is continually being forced to 1?
- Add some tests that we are getting a good EXTNAME. I think it might be worth doing a test with astropy.io.fits, since doing that assures you of wider compatibility: you can use the `hdu.name` property directly without reading the header explicitly for EXTNAME.
- Sort out the `PropertyList.set()` vs `PropertyList.add()` confusion.
using PyInputArchive = py::class_<InputArchive, std::shared_ptr<InputArchive>>;

void wrapInputArchive(utils::python::WrapperCollection &wrappers) {
    // TODO: uncomment once afw.fits uses WrapperCollection
Please refer to the Jira ticket that is going to be doing this work.
I checked with Krzysztof; this is actually out of date as of DM-20703. Removed.
src/image/MaskedImage.cc (Outdated)
@@ -454,6 +454,8 @@ void processPlaneMetadata(std::shared_ptr<daf::base::PropertySet const> metadata
    }
    hdr->set("INHERIT", true);
    hdr->set("EXTTYPE", exttype);
    hdr->set("EXTNAME", exttype);
    hdr->set("EXTVER", 1);
We are going to need a scheme that allows EXTVER to be set by the caller somehow if they are writing a new set of images into an existing FITS file. I think we should probably not set EXTVER in the FITS writer at all and leave it unset; that way the caller can set it manually, and we can think later about how to explicitly control EXTVER when persisting. I'm pretty sure this all means we should not be forcing EXTVER anywhere on this PR (except maybe leaving something in the writeFits docs to say that the caller needs to set EXTVER if they are doing multiple writes to a single file).
I removed places where EXTVER was set. As for a mention in the docs, I assume it would be in the writeFits of Exposure...? So there? Should I add it for every overload?
I don't really know the best place, since adding it to every `writeFits` is going to be painful. I'm not familiar with the documentation in this part of the code. Maybe @TallJimbo can advise.
There's no good way to let the user control EXTVER that I can see; anything we'd do would be a big API change. But I also don't see why they would need to; if they've got more than one archive in a single FITS file they're using the library incorrectly anyway.
src/table/io/OutputArchive.cc (Outdated)
@@ -99,6 +100,15 @@ class OutputArchive::Impl {
    } catch (pex::exceptions::NotFoundError &) {
        iter->getTable()->getMetadata()->add("AR_NAME", name, "Class name for objects stored here");
    }
    // Also add it as the EXTNAME.
    try {
Are we doing this sort of "overwrite existing else add new one" logic in other places? I'm a bit confused because I thought the PropertyList `set()` method did this already. The `add()` method is designed for "add a new keyword here in addition to what's already there", but `set()` is "replace anything there already with this new one". So why isn't this code using `set()`? (I understand this is copying the implementation above.)
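To make the distinction concrete, here is a hypothetical stdlib-only sketch (this is my own analogy, not the daf_base API): a FITS-like header modelled as an ordered list of keyword/value cards, where `add()` appends another card and `set()` replaces all existing cards for that keyword.

```python
# Hypothetical sketch, NOT the actual PropertyList implementation:
# a header as an ordered list of (keyword, value) cards, duplicates allowed.
class HeaderSketch:
    def __init__(self):
        self.cards = []

    def add(self, key, value):
        # add(): append another card, keeping any existing ones for this key
        self.cards.append((key, value))

    def set(self, key, value):
        # set(): drop every existing card for this key, then write one
        self.cards = [(k, v) for k, v in self.cards if k != key]
        self.cards.append((key, value))

hdr = HeaderSketch()
hdr.add("AR_NAME", "Footprint")
hdr.add("AR_NAME", "HeavyFootprintF")  # two AR_NAME cards now
hdr.set("EXTNAME", "ARCHIVE_DATA")
hdr.set("EXTNAME", "ARCHIVE_INDEX")    # still only one EXTNAME card
```

Under this model, the "overwrite existing else add new one" pattern in the diff above is exactly what `set()` already does in one call.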
Yeah, I was just stupidly copying the above. I changed it to `set` and it works fine; do you want me to do the same change for AR_NAME?
Yes, please change AR_NAME as well -- I think it will continue to confuse people otherwise. If it is like that for some reason we can't comprehend, I'd hope someone would put a more explicit comment in there. `set()` is used elsewhere in the file.
`git blame` indicates that @TallJimbo wrote the code, so maybe there's some subtlety here we are not understanding.
No idea, I'm afraid. I agree it looks like it should just be a `set`.
Okay. Maybe I just built the wrong version then:
$ rebuild -r tickets/DM-28716 afw
Fetching Products...
Fetching afw...
afw ok [tickets/DM-28716] (2.2 sec).
Fetching daf_base...
Fetching daf_persistence...
Fetching pex_config...
Fetching geom...
Fetching log...
Fetching sphgeom...
Fetching utils...
Fetching astshim...
pex_config ok [master] (1.2 sec).
Fetching afwdata...
log ok [master] (1.3 sec).
Fetching base...
sphgeom ok [master] (1.3 sec).
Fetching sconsUtils...
astshim ok [master] (1.3 sec).
daf_base ok [master] (1.5 sec).
Fetching pex_exceptions...
geom ok [master] (1.5 sec).
utils ok [master] (1.5 sec).
daf_persistence ok [master] (1.6 sec).
pex_exceptions ok [master] (0.9 sec).
base ok [master] (1.1 sec).
sconsUtils ok [master] (1.1 sec).
afwdata ok [master] (1.2 sec).
Performing git-lfs pulls...
Pulling afwdata LFS data...
afwdata ok (0.1 sec).
# BUILD ID: b5356
Saving environment information in /Volumes/MediaHD/Users/timj/work/lsstsw/build/builds/b5356.env
afwdata: 21.0.0-2-g294ccec (already installed).
sconsUtils: 21.0.0-11-ge0be345 (already installed).
astshim: 21.0.0-2-g45278ab+64c1bc5aa5 (already installed).
base: 21.0.0-7-gdf92d54+64c1bc5aa5 (already installed).
sphgeom: 21.0.0-1-g8760c09+64c1bc5aa5 (already installed).
pex_exceptions: 21.0.0-2-gde069b7+4f46bdaea8 (already installed).
utils: 21.0.0-6-g0bf7090+18535a8d22 (already installed).
daf_base: 21.0.0-3-gd7ab7aa+fdc5edd43f (already installed).
geom: 21.0.0-5-g8c1d971+7b9a448d34 (already installed).
log: 21.0.0-5-gcc89fd6+fdc5edd43f (already installed).
daf_persistence: 21.0.0-8-g5674e7b+8087abed30 (already installed).
pex_config: 21.0.0-1-ga51b5d4+5491f2a448 (already installed).
afw: tickets.DM-28716-g1a1c2439e9+fb0e51fe34 ... ok (1425.4 sec).
# BUILD b5356 completed.
I've finally read the error message and got it to fail. I thought it was a segv but it's actually a simple C++ test failure.
./test_tableArchives
Running 15 test cases...
--------------------------------------------------------------------------------------
The following warning is expected, and is an indication the test has passed.
--------------------------------------------------------------------------------------
afw.image.fits.ExposureFitsReader WARN: Could not read PSF; setting to null: PersistableFactory with name 'DummyPsf' not found, and import of module 'testTableArchivesLib' failed (possibly because Python calls were not available from C++). {0}; loading object with id=1, name='DummyPsf' {1}
tests/test_tableArchives.cc:714: error: in "ArchiveMetadata": check names.size() == 2u has failed [1 != 2]
tests/test_tableArchives.cc:715: error: in "ArchiveMetadata": check names[0] == "Footprint" has failed [HeavyFootprintF != Footprint]
tests/test_tableArchives.cc:716: error: in "ArchiveMetadata": check names[1] == "HeavyFootprintF" has failed [ != HeavyFootprintF]
*** 3 failures are detected in the test module "table - archives"
and these tests are reading the AR_NAME header, I guess? @TallJimbo, is the DummyPsf warning a red herring? I think I need to read the test, because naively it looks like it expected two headers for the name but is actually only getting one, and that one says HeavyFootprintF.
auto names = metadata.getArray<std::string>("AR_NAME");
std::sort(names.begin(), names.end());
BOOST_CHECK_EQUAL(names.size(), 2u);
BOOST_CHECK_EQUAL(names[0], "Footprint");
BOOST_CHECK_EQUAL(names[1], "HeavyFootprintF");
so I don't understand how this test is consistent with the code we removed that explicitly stated that it wanted to ensure only one AR_NAME.
I think we misunderstood the original motivation for the code we replaced for AR_NAME. We thought it was trying to ensure only one AR_NAME is used. What it's actually doing is letting you set multiple distinct AR_NAME values, while ensuring it doesn't set a value that is already there. I guess we need to put the AR_NAME code back, but leave the EXTNAME code using set, and state that the most recent AR_NAME given is the value we want to use.
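A minimal Python rendering of the behaviour just described (this is my own sketch of the intended semantics, not the actual C++): multiple distinct AR_NAME values may accumulate on one HDU, but a value that is already recorded is not appended a second time.

```python
# Sketch of the described AR_NAME semantics (assumed, not the afw code):
# accumulate distinct class names, skipping ones already recorded.
def add_ar_name(values, name):
    if name not in values:  # only append genuinely new class names
        values.append(name)
    return values

names = []
add_ar_name(names, "Footprint")
add_ar_name(names, "HeavyFootprintF")
add_ar_name(names, "HeavyFootprintF")  # already present: skipped
```

This would leave the two distinct values the failing test asserts on, rather than a single overwritten one.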
Ah, sorry, I should have remembered this. In retrospect it's probably too clever, but the system can save objects of different types to the same HDU if they have the same on-disk representation, and that does happen once in a while.
@@ -179,6 +179,10 @@ def checkExposureFitsReader(self, exposureIn, fileName, dtypesOut):
    reader = ExposureFitsReader(fileName)
    self.assertIn('EXPINFO_V', reader.readMetadata().toDict(), "metadata is automatically versioned")
    reader.readMetadata().remove('EXPINFO_V')
    self.assertIn('EXTNAME', reader.readMetadata().toDict(), "EXTNAME is added upon writing")
I would really like a test that checks that EXTNAME has the expected value.
I gave it a shot, but I'm pretty sure that's not what you had in mind. I essentially have a predetermined list of acceptable EXTNAMEs, and I doubt it is exhaustive (PiffPsf should probably be in there, though I'm not sure how we persist those, and now that I think of it I doubt it's as part of Archives...?). If this is really the kind of thing we want, how can I make sure I've got every kind of Persistable we have?
It works as is with all extensions from our regular processing calexps and coadds, FWIW.
I'm not sure how to be less stupid about this, since I can't know beforehand which HDU should correspond to which element.
You do know the set of EXTNAMEs you expect for your specific test dataset, don't you?
At the very least, reading the file with astropy.io.fits and iterating over each HDU to check that the `.name` property returns an actual name seems useful. It would show that this ticket is doing something, and would also break if somehow we stopped filling in a name.
Force-pushed from 8d8e8a0 to 95e6533.
Yep. Stamps only have the one Archive that contains all Transforms (or whatever else we want to store alongside the stamp images). Each set of IMAGE/MASK/VARIANCE then has its EXTVER incremented upon writing, so there's no conflict possible there. See my example on the Jira ticket:
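As a rough stdlib sketch of that numbering scheme (my own rendering for illustration, not the Jira example or the afw code): each stamp contributes an IMAGE/MASK/VARIANCE trio, and every trio gets the next EXTVER, so each (EXTNAME, EXTVER) pair in the file stays unique.

```python
# Sketch (assumed layout, not afw code) of per-stamp EXTVER numbering:
# stamp i writes IMAGE/MASK/VARIANCE HDUs all carrying EXTVER = i.
def plan_stamp_hdus(n_stamps):
    hdus = []
    for extver in range(1, n_stamps + 1):
        for extname in ("IMAGE", "MASK", "VARIANCE"):
            hdus.append((extname, extver))
    return hdus

hdus = plan_stamp_hdus(2)
# every (EXTNAME, EXTVER) pair is distinct, so no conflict is possible
assert len(set(hdus)) == len(hdus)
```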
Force-pushed from 95e6533 to 5030591.
tests/test_readers.py (Outdated)
@@ -179,6 +180,18 @@ def checkExposureFitsReader(self, exposureIn, fileName, dtypesOut):
    reader = ExposureFitsReader(fileName)
    self.assertIn('EXPINFO_V', reader.readMetadata().toDict(), "metadata is automatically versioned")
    reader.readMetadata().remove('EXPINFO_V')
    # ensure EXTNAMEs can be read and make sense
    astropyReadFile = astropy.io.fits.open(fileName)
Please use a context manager here so you don't need the explicit close (which would never run if the assert fails). Something like:

extnames = {...}
with astropy.io.fits.open(fileName) as astropyReadFile:
    for hdu in astropyReadFile[1:]:
        self.assertIn(hdu.name, extnames)
tests/test_readers.py (Outdated)
            'FilterLabel', 'SkyWcs', 'ApCorrMap', 'PhotoCalib',
            'ChebyshevBoundedField', 'CoaddInputs', 'GaussianPsf',
            'Polygon', 'VisitInfo')
    for hdu in astropyReadFile[1:]:
Maybe add a comment here that you are skipping the primary HDU?
tests/test_readers.py (Outdated)
@@ -179,6 +180,18 @@ def checkExposureFitsReader(self, exposureIn, fileName, dtypesOut):
    reader = ExposureFitsReader(fileName)
    self.assertIn('EXPINFO_V', reader.readMetadata().toDict(), "metadata is automatically versioned")
    reader.readMetadata().remove('EXPINFO_V')
    # ensure EXTNAMEs can be read and make sense
    astropyReadFile = astropy.io.fits.open(fileName)
    extnames = ('IMAGE', 'MASK', 'VARIANCE', 'ARCHIVE_INDEX',
Please use a `set` rather than a `tuple` -- it's slightly more efficient for the `in` operator.
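To illustrate the suggestion in isolation (a standalone sketch, using EXTNAME values from the diff above): membership tests against a set are average O(1) hash lookups, while a tuple is scanned linearly.

```python
# Membership testing: a tuple is scanned element by element (O(n)),
# while a set uses an average O(1) hash lookup.
extnames_tuple = ('IMAGE', 'MASK', 'VARIANCE', 'ARCHIVE_INDEX')
extnames_set = {'IMAGE', 'MASK', 'VARIANCE', 'ARCHIVE_INDEX'}

assert 'MASK' in extnames_tuple   # linear scan
assert 'MASK' in extnames_set     # hash lookup
assert 'PRIMARY' not in extnames_set
```

For a handful of names the difference is negligible, but the set also states the intent (an unordered collection tested for membership) more directly.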
Force-pushed from 5030591 to 1a1c243.
tests/test_readers.py (Outdated)
    with astropy.io.fits.open(fileName) as astropyReadFile:
        for hdu in astropyReadFile:
            self.assertIn(hdu.name, extnames)
    astropyReadFile.close()
You should remove this close since it's not needed with a context manager.
Force-pushed from 1a1c243 to 17610c8.
This is required so Transforms are recognized as subclasses of Persistable on the Python side. This commit also changes the warning against doing that.
Force-pushed from 17610c8 to 45895ae.
Jira ticket
Jenkins run #34190