Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-29266: Add translator methods to read all relevant HDUs #52

Merged
merged 15 commits into from Mar 23, 2021

Conversation

timj
Copy link
Member

@timj timj commented Mar 19, 2021

No description provided.

Sometimes we have read the header from a YAML file and in that
scenario DECam and CFHT code should not try to treat it as
a raw fits file.
The code now asks for the metadata translator class earlier.
HSC compressed files have two HDUs but uncompressed only
have one. The metadata is all in the second so we can't avoid
doing the merge. Only attempt it though if EXTEND=T.
@timj timj requested a review from kfindeisen March 22, 2021 19:24
If something more complicated is required the translator class
should implement a specialist reader.
Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about the behavior of the new read_all_headers method. It looks like it sometimes just returns the user-provided header, and sometimes not, sometimes considers alternatives, and sometimes not, and sometimes merges multiple headers, and sometimes not. This would be hard enough to reason about if it were documented, but since the docs just say it reads all headers and yields them one by one, any deviations from that behavior are going to be very surprising.

If this is a consequence of trying to unify too many different existing behaviors, then perhaps it would be less error-prone to have a custom ingest task for each instrument, where the user can at least know what to expect.

python/astro_metadata_translator/translators/decam.py Outdated Show resolved Hide resolved
python/astro_metadata_translator/translators/decam.py Outdated Show resolved Hide resolved
python/astro_metadata_translator/translators/megaprime.py Outdated Show resolved Hide resolved
python/astro_metadata_translator/translators/megaprime.py Outdated Show resolved Hide resolved
Comment on lines 334 to 339
# If this is not a FITS file return the primary if defined.
# It is likely that the metadata came from a test YAML file.
if not re.search(r"\.fit[s]?\b", filename) and primary is not None:
yield primary
return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned by how many things read_all_headers appears to be doing other than reading all headers. Can you mention this special case in the base class documentation? It wouldn't even occur to me that it would be valid to call this method on a non-FITS file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the documentation but now that I think about it it might be better to have astrometadata translate skip over this call completely if a test YAML file is encountered.

python/astro_metadata_translator/cli/astrometadata.py Outdated Show resolved Hide resolved
@timj
Copy link
Member Author

timj commented Mar 23, 2021

To reply to the global comment. read_all_headers is not meant to read every HDU. It's meant to read every header that is relevant for metadata translation. In the default case it yields back the primary header as given because it has no real knowledge of what should happen. If no primary header is given it attempts to read one -- merging the first two if it's a MEF. It can be reading a non-fits file because all the test code in astro_metadata_translator uses YAML files because they were small and easy to deal with. I could now add stripped files.

I'm not sure what a better name would be. "read_relevant_headers" is more accurate.

The idea is that the default is going to be mostly fine for most instruments that use one detector per file (and this is true for HSC, LATISS, LSSTCam etc). DECam, CFHT and PhoSim subclass it since in the first two cases they have per-detector headers in one file and for PhoSim the primary header and the first amplifier header have different information that needs to be merged. The subclass is allowed to do whatever is best for it but I tried to make the base class do the thing that works for most of our cameras. The interface always yields one of more headers that are passed to metadata translation so it is all consistent.

@timj
Copy link
Member Author

timj commented Mar 23, 2021

There are two pull requests in astropy that will improve things:

@kfindeisen
Copy link
Member

The subclass is allowed to do whatever is best for it but I tried to make the base class do the thing that works for most of our cameras. The interface always yields one of more headers that are passed to metadata translation so it is all consistent.

Strictly speaking, the subclass must do a subset of the base behavior, otherwise code that doesn't know the exact class it's dealing with (in particular, code in this package or in obs_base) may be working off false assumptions. It sounds from your other comments that you're relying on having a registry to always match up the right code with the right class, but that arrangement can be broken any time you add or modify a subclass.

In this case, I think the solution is to broaden the base class method's description to say that it returns one or more headers with enough information to process a file, without implying either that it returns all headers or that the headers it returns are necessarily those in the file (which covers merging and similar pre-processing). The formal return type is, as you say, not a problem.

@timj
Copy link
Member Author

timj commented Mar 23, 2021

I agree that the description is lacking and I will change it to make clear why returning the primary without reading anything else is reasonable.

I'm not sure I understand the comment about things breaking unexpectedly. The "registry" at issue here is the set of translator classes themselves and that's where the implementation is found. The system reads the primary (or first and second headers merged), asks which translator class to use and then asks that translator class if any more information should be read from the file to improve translation. I think that's all solid. If someone has an HSC file and asks for the DECam translator class to read more header information then they get what they get.

I believe that in a situation where we have a ccdNN HDU,
the EXTNAME will not have been stripped.
This removes the special casing in the DECam and MegaPrime
translators because now `astrometadata translate` decides
not to ask the translator class to read additional headers
if it knows it has a YAML file (which must be a test header).
@timj
Copy link
Member Author

timj commented Mar 23, 2021

@kfindeisen can you please do another review to see if you are happy with the current changes?

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much better. I still have some questions about consistency across classes (in particular, the handling of primary), but nothing that would block merging.

@@ -121,12 +122,13 @@ def read_basic_metadata_from_file(file, hdrnum, errstream=sys.stderr, can_raise=
top-level dict.
hdrnum : `int`
Header number to read. Only relevant for FITS. If greater than 1
it will be merged with the primary header.
it will be merged with the primary header. If negative the second
header will be merged with the primary header if a second is present.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is a bit confusing, especially in the context of the previous sentence. Maybe say "If any negative number, the second header need not exist but will be merged with the primary header if it does."?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to rewrite.

Comment on lines 955 to 963
two headers. In the base class implementation it is assumed that
this supplied header is the only useful header for metadata translation
and it will be returned unchanged if given. This can avoid
unnecesarily re-opening the file and re-reading the header when the
content is already known.

If no header is supplied, a header will be read from the supplied
file using `read_basic_metadata_from_file`, allowing it to merge
the primary and secondary header of a multi-extension FITS file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend moving the paragraph break to between "first two headers" and "In the base class implementation". Otherwise, it's not clear that the statement about read_basic_metadata_from_file applies only to this implementation and not to determine_translatable_headers. (I'm assuming, of course, that there is no requirement that subclasses call that particular function?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subclasses can do whatever they need to do to get a header from a file. I've added the line break.

the primary and secondary header of a multi-extension FITS file.

Subclasses can return multiple headers and ignore the externally
supplied header.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that subclasses are also allowed to merge headers as appropriate for that instrument?

without already knowing what data is in the file. For many
instruments where the primary header is the only relevant
header, the primary header will be returned with no further
action.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , this is much clearer than the original wording.

Yields
------
headers : iterator of `dict`-like
Each relevant header in turn. Can be the supplied primary header.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Can include"?

I wonder if this is a little too generic, since it makes it hard for the caller to know if the primary header is redundant with the return value. Does this also need to be special-cased by instrument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to clarify. The method is only required to return a usable header for translation. How the method implements that is up to the method. If it wants to use the supplied header it can or else it can ignore it completely. I was trying not to repeat the general text from above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't asking about whether it uses the supplied argument, but about whether headers includes the primary header at all (including a primary header it's read internally). From the DECam and Megacam examples it looks like you only want to return the primary header if there are no others, and if there are secondary headers then you never want to return the primary header separately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not understanding why the caller needs to know. Some instruments may return some information from the supplied header, other instruments may not. The only reason primary exists is to help out the general case by allowing the file open to be bypassed for instruments where the primary header is going to be enough for translation. MegaPrime will never look at the supplied header because by definition it's irrelevant. DECam will use it if given and merge the detector headers over the top. HSC can return the header immediately because it knows that there is only one header in an HSC file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If primary=None the method still has to return the correct answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps say "Includes the primary header, if this header is needed for translation."? That makes it clear that the caller can just trust the output, instead of making the presence/absence of primary sound like an arbitrary choice.

(Of course, that would invalidate my requests for explicit mention of the primary header in the subclasses.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire method assumes that each header returned by it should be directly usable as input to ObservationInfo constructor. It shouldn't be trying to check if what was given as the header is identical to what is returned. Presence or absence of primary in returned header is instrument-specific and not something that caller needs to care about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly! But "Can be the supplied primary header." implies otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah 😄 -- I'll remove that comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had already removed that comment.

Yields
------
headers : iterator of `dict`-like
Each detector header in turn.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly state that this does not include an unmodified primary?


Notes
-----
Each translator class can have code specifically tailored to its
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this block might be redundant for subclass methods...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified to say DECam.

MegaPrime files are multi-extension FITS with a primary header and
each detector stored in a subsequent extension. MegaPrime uses
``INHERIT=F`` therefore the primary header will always be ignored
if given.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "ignored" mean "not returned in headers"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that whatever you supply as primary will make no difference to what is returned. INHERIT=F declares that each HDU has a standalone header that depends on no other header.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, as with the DECam translator, I suggest explicitly saying that the returned headers don't include the primary header.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me "ignored" implies "will not be used". I have explicitly stated this now also in the yield line.

@timj timj merged commit ffa9b84 into master Mar 23, 2021
@timj timj deleted the tickets/DM-29266 branch March 23, 2021 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants