
DM-28716: Add pybind wrapper to afw archive that handles adding set of HDUs #588

Merged
Merged 4 commits into master from tickets/DM-28716 on May 18, 2021

Conversation

MorganSchmitz (Contributor) opened this pull request.

@timj (Member) left a comment:


This looks good but I do have some things that need changing:

  • Do not set EXTVER -- let the caller do it and add a note about that until we can sort out a real API. I worry that if people call persist multiple times into a single file we'll be opening ourselves up to confusion. Does the Stamps writing work fine on this PR if EXTVER is continually being forced to 1?
  • Add some tests that we are getting a good EXTNAME. I think it might be worth doing a test with astropy.io.fits since if you do that you will be assured of wider compatibility because you can use the hdu.name property directly without reading the header explicitly for EXTNAME.
  • PropertyList.set() vs PropertyList.add() confusion.
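The astropy-based EXTNAME check suggested in the second bullet could look something like the sketch below. It builds a small in-memory HDU list rather than reading a real afw product, and the extension names mirror the planes afw writes; treat it as an illustration, not the actual test.

```python
# Sketch of the suggested astropy-based EXTNAME check. The HDU list
# here is a stand-in for a file written by afw.
from astropy.io import fits

hdus = fits.HDUList([
    fits.PrimaryHDU(),
    fits.ImageHDU(name="IMAGE"),
    fits.ImageHDU(name="MASK"),
    fits.ImageHDU(name="VARIANCE"),
])

# hdu.name reads EXTNAME without touching the header explicitly,
# which is the wider-compatibility point being made above.
names = [hdu.name for hdu in hdus[1:]]
```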

using PyInputArchive = py::class_<InputArchive, std::shared_ptr<InputArchive>>;

void wrapInputArchive(utils::python::WrapperCollection &wrappers) {
// TODO: uncomment once afw.fits uses WrapperCollection
Member:

Please refer to the Jira ticket that is going to be doing this work.

Contributor Author:

I checked with Krzysztof; this is actually out of date as of DM-20703. Removed.

@@ -454,6 +454,8 @@ void processPlaneMetadata(std::shared_ptr<daf::base::PropertySet const> metadata
}
hdr->set("INHERIT", true);
hdr->set("EXTTYPE", exttype);
hdr->set("EXTNAME", exttype);
hdr->set("EXTVER", 1);
Member:

We are going to have to have a scheme that allows the EXTVER to be set by the caller somehow if they are writing a new set of images into an existing FITS file. I think we should probably not set EXTVER in the fits writer at all and leave it unset. That way the caller can manually set it to something and we can think later about how to explicitly control EXTVER when persisting at some other time. I'm pretty sure this all means we should not be forcing EXTVER anywhere on this PR (except maybe leaving something in the writeFits docs to say that the caller needs to set EXTVER if they are doing multiple writes to a single file).

Contributor Author:

I removed the places where EXTVER was set. As for a mention in the docs, I assume it would go in Exposure's writeFits...? Should I add it to every overload?

Member:

I don't really know the best place since adding it to every writeFits is going to be painful. I'm not familiar with the documentation in this part of the code. Maybe @TallJimbo can advise.

Member:

There's no good way to let the user control EXTVER that I can see; anything we'd do would be a big API change. But I also don't see why they would need to; if they've got more than one archive in a single FITS file they're using the library incorrectly anyway.

@@ -99,6 +100,15 @@ class OutputArchive::Impl {
} catch (pex::exceptions::NotFoundError &) {
iter->getTable()->getMetadata()->add("AR_NAME", name, "Class name for objects stored here");
}
// Also add it as the EXTNAME.
try {
Member:

Are we doing this sort of "overwrite existing else add new one" logic in other places? I'm a bit confused because I thought the PropertyList set() method did this already. The add() method is designed for "add a new keyword here in addition to what's already there" but set() is "replace anything there already with this new one". So why isn't this code using set()? (I understand this is copying the implementation above).
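For reference, the semantic difference described here — add() appends another value under the same key, set() replaces whatever is there — can be sketched with a minimal Python stand-in. This is an illustration of the stated semantics only, not the real lsst.daf.base PropertyList implementation.

```python
# Minimal stand-in illustrating PropertyList.add() vs PropertyList.set()
# semantics as described in the comment above; not the daf_base code.
class MiniPropertyList:
    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # add(): keep existing values and append another one
        self._data.setdefault(key, []).append(value)

    def set(self, key, value):
        # set(): replace anything already stored under this key
        self._data[key] = [value]

    def get_array(self, key):
        return list(self._data.get(key, []))

pl = MiniPropertyList()
pl.add("AR_NAME", "Footprint")
pl.add("AR_NAME", "HeavyFootprintF")   # both values survive
pl.set("EXTNAME", "IMAGE")
pl.set("EXTNAME", "MASK")              # only the last value survives
```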

Contributor Author:

Yeah, I was just stupidly copying the above. I changed it to set and it works fine; do you want me to do the same change for AR_NAME?

Member:

Yes, please change AR_NAME as well -- it will continue to confuse people I think. If it is like that for some reason we can't comprehend I'd hope someone would put a more explicit comment in there. set() is used elsewhere in the file.

Member:

git blame indicates that @TallJimbo wrote the code so maybe there's some subtlety here we are not understanding.

Member:

No idea, I'm afraid. I agree it looks like it should just be a set.

Member:

Okay. Maybe I just built the wrong version, then:

$ rebuild -r tickets/DM-28716 afw
Fetching Products...
Fetching afw...
                                            afw ok [tickets/DM-28716] (2.2 sec).
Fetching daf_base...
Fetching daf_persistence...
Fetching pex_config...
Fetching geom...
Fetching log...
Fetching sphgeom...
Fetching utils...
Fetching astshim...
                                               pex_config ok [master] (1.2 sec).
Fetching afwdata...
                                                      log ok [master] (1.3 sec).
Fetching base...
                                                  sphgeom ok [master] (1.3 sec).
Fetching sconsUtils...
                                                  astshim ok [master] (1.3 sec).
                                                 daf_base ok [master] (1.5 sec).
Fetching pex_exceptions...
                                                     geom ok [master] (1.5 sec).
                                                    utils ok [master] (1.5 sec).
                                          daf_persistence ok [master] (1.6 sec).
                                           pex_exceptions ok [master] (0.9 sec).
                                                     base ok [master] (1.1 sec).
                                               sconsUtils ok [master] (1.1 sec).
                                                  afwdata ok [master] (1.2 sec).
Performing git-lfs pulls...
Pulling afwdata LFS data...
                                                           afwdata ok (0.1 sec).
# BUILD ID: b5356
Saving environment information in /Volumes/MediaHD/Users/timj/work/lsstsw/build/builds/b5356.env
             afwdata: 21.0.0-2-g294ccec (already installed).
          sconsUtils: 21.0.0-11-ge0be345 (already installed).
             astshim: 21.0.0-2-g45278ab+64c1bc5aa5 (already installed).
                base: 21.0.0-7-gdf92d54+64c1bc5aa5 (already installed).
             sphgeom: 21.0.0-1-g8760c09+64c1bc5aa5 (already installed).
      pex_exceptions: 21.0.0-2-gde069b7+4f46bdaea8 (already installed).
               utils: 21.0.0-6-g0bf7090+18535a8d22 (already installed).
            daf_base: 21.0.0-3-gd7ab7aa+fdc5edd43f (already installed).
                geom: 21.0.0-5-g8c1d971+7b9a448d34 (already installed).
                 log: 21.0.0-5-gcc89fd6+fdc5edd43f (already installed).
     daf_persistence: 21.0.0-8-g5674e7b+8087abed30 (already installed).
          pex_config: 21.0.0-1-ga51b5d4+5491f2a448 (already installed).
                 afw: tickets.DM-28716-g1a1c2439e9+fb0e51fe34 .................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ok (1425.4 sec).
# BUILD b5356 completed.

Member:

I've finally read the error message and got it to fail. I thought it was a segv but it's actually a simple C++ test failure.

./test_tableArchives 
Running 15 test cases...
--------------------------------------------------------------------------------------
The following warning is expected, and is an indication the test has passed.
--------------------------------------------------------------------------------------
afw.image.fits.ExposureFitsReader WARN: Could not read PSF; setting to null: PersistableFactory with name 'DummyPsf' not found, and import of module 'testTableArchivesLib' failed (possibly because Python calls were not available from C++). {0}; loading object with id=1, name='DummyPsf' {1}
tests/test_tableArchives.cc:714: error: in "ArchiveMetadata": check names.size() == 2u has failed [1 != 2]
tests/test_tableArchives.cc:715: error: in "ArchiveMetadata": check names[0] == "Footprint" has failed [HeavyFootprintF != Footprint]
tests/test_tableArchives.cc:716: error: in "ArchiveMetadata": check names[1] == "HeavyFootprintF" has failed [ != HeavyFootprintF]

*** 3 failures are detected in the test module "table - archives"

and these tests are reading the AR_NAME header, I guess? @TallJimbo, is the DummyPsf warning a red herring? I think I need to read the test, because naively it looks like it expected two headers for the name but is actually only getting one, and that one says HeavyFootprintF.

Member:

    auto names = metadata.getArray<std::string>("AR_NAME");
    std::sort(names.begin(), names.end());
    BOOST_CHECK_EQUAL(names.size(), 2u);
    BOOST_CHECK_EQUAL(names[0], "Footprint");
    BOOST_CHECK_EQUAL(names[1], "HeavyFootprintF");

so I don't understand how this test is consistent with the code we removed, which explicitly stated that it wanted to ensure only one AR_NAME.

Member:

I think we misunderstood the original motivation for the AR_NAME code we replaced. We thought it was trying to ensure that only one AR_NAME is used; what it actually does is allow multiple distinct AR_NAME values while ensuring that a value already present is not added again. I guess we need to put the AR_NAME code back, but leave the EXTNAME code using set() and state that the most recent AR_NAME given is the value we want to use.
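If that reading is right, the AR_NAME behaviour being described — multiple distinct values allowed, but no duplicate of a value already recorded — amounts to something like the following. This is illustrative Python, not the C++ in OutputArchive.

```python
# Illustration of the "append unless already present" AR_NAME logic
# described above (hypothetical helper, not the afw implementation).
def add_ar_name(existing, name):
    """Append name unless an identical value is already recorded."""
    if name not in existing:
        existing.append(name)
    return existing

names = []
for n in ("Footprint", "HeavyFootprintF", "Footprint"):
    add_ar_name(names, n)
# the duplicate "Footprint" is not added a second time
```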

Member:

Ah, sorry, I should have remembered this. In retrospect it's probably too clever, but the system can save objects of different types to the same HDU if they have the same on-disk representation, and that does happen once in a while.

@@ -179,6 +179,10 @@ def checkExposureFitsReader(self, exposureIn, fileName, dtypesOut):
reader = ExposureFitsReader(fileName)
self.assertIn('EXPINFO_V', reader.readMetadata().toDict(), "metadata is automatically versioned")
reader.readMetadata().remove('EXPINFO_V')
self.assertIn('EXTNAME', reader.readMetadata().toDict(), "EXTNAME is added upon writing")
Member:

I would really like a test that checks that EXTNAME has the expected value.

Contributor Author:

I gave it a shot, but I'm pretty sure that's not what you had in mind. I essentially have a predetermined list of acceptable EXTNAMEs, and I doubt it is exhaustive (PiffPsf should probably be in there, though I'm not sure how we persist those, and now that I think of it I doubt it's as part of Archives...?). If this is really roughly what we want, how can I make sure I've covered every kind of Persistable we have?

It works as is with all extensions from our regular processing calexps and coadds, FWIW.

I'm not sure how to do better here, since I can't know beforehand which HDU should correspond to which element.

Member:

You do know the set of EXTNAMEs you expect for your specific test dataset don't you?

At the very least reading the file with astropy.io.fits and iterating over each HDU to check that it is returning an actual name for .name property seems useful. It would show that this ticket is doing something and would also break if somehow we stopped filling in a name.

@MorganSchmitz (Contributor Author):

Does the Stamps writing work fine on this PR if EXTVER is continually being forced to 1?

Yep. Stamps only have the one Archive that contains all Transforms (or whatever else we want to store alongside the stamp images). Each set of IMAGE/MASK/VARIANCE then has its EXTVER incremented upon writing, so there's no conflict possible there. See my example on the Jira ticket:

0  PRIMARY                  1  PrimaryHDU   6   ()
1  ARCHIVE_INDEX            1  BinTableHDU  42  2R x 7C  [1J, 1J, 1J, 1J, 1J, 64A, 64A]
2  TransformPoint2ToPoint2  1  BinTableHDU  18  2R x 1C  [1QB(123)]
3  IMAGE                    1  ImageHDU     19  (11, 11)  float32
4  MASK                     1  ImageHDU     26  (11, 11)  int32
5  VARIANCE                 1  ImageHDU     19  (11, 11)  float32
6  IMAGE                    2  ImageHDU     19  (11, 11)  float32
7  MASK                     2  ImageHDU     26  (11, 11)  int32
8  VARIANCE                 2  ImageHDU     19  (11, 11)  float32
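The layout above — repeated EXTNAMEs told apart by EXTVER — can be reproduced and addressed with astropy, which looks HDUs up by an (EXTNAME, EXTVER) pair. A minimal sketch, using in-memory HDUs rather than a real Stamps file:

```python
# Illustration of EXTNAME/EXTVER disambiguation with astropy: two
# sets of IMAGE/MASK/VARIANCE planes share names but differ in EXTVER.
from astropy.io import fits

hdus = fits.HDUList([fits.PrimaryHDU()])
for ver in (1, 2):
    for name in ("IMAGE", "MASK", "VARIANCE"):
        hdus.append(fits.ImageHDU(name=name, ver=ver))

# astropy addresses an HDU by the (EXTNAME, EXTVER) pair, so the
# second IMAGE plane is reachable despite the duplicate name.
second_image = hdus["IMAGE", 2]
```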

@@ -179,6 +180,18 @@ def checkExposureFitsReader(self, exposureIn, fileName, dtypesOut):
reader = ExposureFitsReader(fileName)
self.assertIn('EXPINFO_V', reader.readMetadata().toDict(), "metadata is automatically versioned")
reader.readMetadata().remove('EXPINFO_V')
# ensure EXTNAMEs can be read and make sense
astropyReadFile = astropy.io.fits.open(fileName)
Member:

Please use a context manager here so you don't need to do the explicit close (which would never run if the assert fails).

extnames = {...}
with astropy.io.fits.open(fileName) as astropyReadFile:
    for hdu in astropyReadFile[1:]:
        assert
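Filled out, that sketch might read as follows. The file contents and the expected-name set here are stand-ins for whatever the real test writes; an in-memory buffer replaces the test's temporary file.

```python
# A filled-out version of the context-manager sketch above
# (illustrative: the HDUs and extnames set are hypothetical).
import io
from astropy.io import fits

buf = io.BytesIO()
fits.HDUList([
    fits.PrimaryHDU(),
    fits.ImageHDU(name="IMAGE"),
    fits.ImageHDU(name="MASK"),
]).writeto(buf)
buf.seek(0)

extnames = {"IMAGE", "MASK", "VARIANCE", "ARCHIVE_INDEX"}
checked = 0
with fits.open(buf) as astropyReadFile:
    for hdu in astropyReadFile[1:]:   # skip the primary HDU
        assert hdu.name in extnames
        checked += 1
```

With the context manager, the file is closed even if an assertion fails, which is the point being made above.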

'FilterLabel', 'SkyWcs', 'ApCorrMap', 'PhotoCalib',
'ChebyshevBoundedField', 'CoaddInputs', 'GaussianPsf',
'Polygon', 'VisitInfo')
for hdu in astropyReadFile[1:]:
Member:

Maybe add a comment here that you are skipping the primary HDU?

@@ -179,6 +180,18 @@ def checkExposureFitsReader(self, exposureIn, fileName, dtypesOut):
reader = ExposureFitsReader(fileName)
self.assertIn('EXPINFO_V', reader.readMetadata().toDict(), "metadata is automatically versioned")
reader.readMetadata().remove('EXPINFO_V')
# ensure EXTNAMEs can be read and make sense
astropyReadFile = astropy.io.fits.open(fileName)
extnames = ('IMAGE', 'MASK', 'VARIANCE', 'ARCHIVE_INDEX',
Member:

Please use a set rather than a tuple -- it's slightly more efficient for the in operator.
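The change being asked for is just the literal type; a set hashes its members, so the membership test is O(1) on average instead of a linear scan over the tuple:

```python
# tuple: `in` scans the elements one by one, O(n)
extnames_tuple = ('IMAGE', 'MASK', 'VARIANCE', 'ARCHIVE_INDEX')
# set: `in` uses a hash lookup, O(1) on average
extnames_set = {'IMAGE', 'MASK', 'VARIANCE', 'ARCHIVE_INDEX'}

found = 'MASK' in extnames_set
missing = 'PRIMARY' in extnames_set   # primary HDU is skipped above
```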

with astropy.io.fits.open(fileName) as astropyReadFile:
for hdu in astropyReadFile:
self.assertIn(hdu.name, extnames)
astropyReadFile.close()
Member:

You should remove this close since it's not needed with a context manager.

This is required so Transforms are recognized as subclasses
of Persistable on the Python side. This commit also changes
the warning against doing that.
@MorganSchmitz MorganSchmitz merged commit 68b03cc into master May 18, 2021
@MorganSchmitz MorganSchmitz deleted the tickets/DM-28716 branch May 18, 2021 09:45