New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-29266: Add translator methods to read all relevant HDUs #52
Conversation
DECam and CFHT now report all HDUs.
Sometimes we have read the header from a YAML file and in that scenario DECam and CFHT code should not try to treat it as a raw fits file.
The code now asks for the metadata translator class earlier.
HSC compressed files have two HDUs but uncompressed only have one. The metadata is all in the second so we can't avoid doing the merge. Only attempt it though if EXTEND=T.
If something more complicated is required the translator class should implement a specialist reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about the behavior of the new read_all_headers
method. It looks like it sometimes just returns the user-provided header, and sometimes not, sometimes considers alternatives, and sometimes not, and sometimes merges multiple headers, and sometimes not. This would be hard enough to reason about if it were documented, but since the docs just say it reads all headers and yields them one by one, any deviations from that behavior are going to be very surprising.
If this is a consequence of trying to unify too many different existing behaviors, then perhaps it would be less error-prone to have a custom ingest task for each instrument, where the user can at least know what to expect.
# If this is not a FITS file return the primary if defined. | ||
# It is likely that the metadata came from a test YAML file. | ||
if not re.search(r"\.fit[s]?\b", filename) and primary is not None: | ||
yield primary | ||
return | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned by how many things read_all_headers
appears to be doing other than reading all headers. Can you mention this special case in the base class documentation? It wouldn't even occur to me that it would be valid to call this method on a non-FITS file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the documentation but now that I think about it it might be better to have astrometadata translate
skip over this call completely if a test YAML file is encountered.
To reply to the global comment. read_all_headers is not meant to read every HDU. It's meant to read every header that is relevant for metadata translation. In the default case it yields back the primary header as given because it has no real knowledge of what should happen. If no primary header is given it attempts to read one -- merging the first two if it's a MEF. It can be reading a non-fits file because all the test code in astro_metadata_translator uses YAML files because they were small and easy to deal with. I could now add stripped files. I'm not sure what a better name would be. "read_relevant_headers" is more accurate. The idea is that the default is going to be mostly fine for most instruments that use one detector per file (and this is true for HSC, LATISS, LSSTCam etc). DECam, CFHT and PhoSim subclass it since in the first two cases they have per-detector headers in one file and for PhoSim the primary header and the first amplifier header have different information that needs to be merged. The subclass is allowed to do whatever is best for it but I tried to make the base class do the thing that works for most of our cameras. The interface always yields one of more headers that are passed to metadata translation so it is all consistent. |
There are two pull requests in astropy that will improve things:
|
Strictly speaking, the subclass must do a subset of the base behavior, otherwise code that doesn't know the exact class it's dealing with (in particular, code in this package or in In this case, I think the solution is to broaden the base class method's description to say that it returns one or more headers with enough information to process a file, without implying either that it returns all headers or that the headers it returns are necessarily those in the file (which covers merging and similar pre-processing). The formal return type is, as you say, not a problem. |
I agree that the description is lacking and I will change it to make clear why returning the primary without reading anything else is reasonable. I'm not sure I understand the comment about things breaking unexpectedly. The "registry" at issue here is the set of translator classes themselves and that's where the implementation is found. The system reads the primary (or first and second headers merged), asks which translator class to use and then asks that translator class if any more information should be read from the file to improve translation. I think that's all solid. If someone has an HSC file and asks for the DECam translator class to read more header information then they get what they get. |
I believe that in a situation where we have a ccdNN HDU, the EXTNAME will not have been stripped.
This removes the special casing in the DECam and MegaPrime translators because now `astrometadata translate` decides not to ask the translator class to read additional headers if it knows it has a YAML file (which must be a test header).
@kfindeisen can you please do another review to see if you are happy with the current changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much better. I still have some questions about consistency across classes (in particular, the handling of primary
), but nothing that would block merging.
@@ -121,12 +122,13 @@ def read_basic_metadata_from_file(file, hdrnum, errstream=sys.stderr, can_raise= | |||
top-level dict. | |||
hdrnum : `int` | |||
Header number to read. Only relevant for FITS. If greater than 1 | |||
it will be merged with the primary header. | |||
it will be merged with the primary header. If negative the second | |||
header will be merged with the primary header if a second is present. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording is a bit confusing, especially in the context of the previous sentence. Maybe say "If any negative number, the second header need not exist but will be merged with the primary header if it does."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to rewrite.
two headers. In the base class implementation it is assumed that | ||
this supplied header is the only useful header for metadata translation | ||
and it will be returned unchanged if given. This can avoid | ||
unnecesarily re-opening the file and re-reading the header when the | ||
content is already known. | ||
|
||
If no header is supplied, a header will be read from the supplied | ||
file using `read_basic_metadata_from_file`, allowing it to merge | ||
the primary and secondary header of a multi-extension FITS file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend moving the paragraph break to between "first two headers" and "In the base class implementation". Otherwise, it's not clear that the statement about read_basic_metadata_from_file
applies only to this implementation and not to determine_translatable_headers
. (I'm assuming, of course, that there is no requirement that subclasses call that particular function?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subclasses can do whatever they need to do to get a header from a file. I've added the line break.
the primary and secondary header of a multi-extension FITS file. | ||
|
||
Subclasses can return multiple headers and ignore the externally | ||
supplied header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add that subclasses are also allowed to merge headers as appropriate for that instrument?
without already knowing what data is in the file. For many | ||
instruments where the primary header is the only relevant | ||
header, the primary header will be returned with no further | ||
action. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 , this is much clearer than the original wording.
Yields | ||
------ | ||
headers : iterator of `dict`-like | ||
Each relevant header in turn. Can be the supplied primary header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Can include"?
I wonder if this is a little too generic, since it makes it hard for the caller to know if the primary header is redundant with the return value. Does this also need to be special-cased by instrument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to clarify. The method is only required to return a usable header for translation. How the method implements that is up to the method. If it wants to use the supplied header it can or else it can ignore it completely. I was trying not to repeat the general text from above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't asking about whether it uses the supplied argument, but about whether headers
includes the primary header at all (including a primary header it's read internally). From the DECam and Megacam examples it looks like you only want to return the primary header if there are no others, and if there are secondary headers then you never want to return the primary header separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding why the caller needs to know. Some instruments may return some information from the supplied header, other instruments may not. The only reason primary exists is to help out the general case by allowing the file open to be bypassed for instruments where the primary header is going to be enough for translation. MegaPrime will never look at the supplied header because by definition it's irrelevant. DECam will use it if given and merge the detector headers over the top. HSC can return the header immediately because it knows that there is only one header in an HSC file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If primary=None the method still has to return the correct answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps say "Includes the primary header, if this header is needed for translation."? That makes it clear that the caller can just trust the output, instead of making the presence/absence of primary sound like an arbitrary choice.
(Of course, that would invalidate my requests for explicit mention of the primary header in the subclasses.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire method assumes that each header returned by it should be directly usable as input to ObservationInfo constructor. It shouldn't be trying to check if what was given as the header is identical to what is returned. Presence or absence of primary in returned header is instrument-specific and not something that caller needs to care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly! But "Can be the supplied primary header." implies otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah 😄 -- I'll remove that comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had already removed that comment.
Yields | ||
------ | ||
headers : iterator of `dict`-like | ||
Each detector header in turn. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly state that this does not include an unmodified primary
?
|
||
Notes | ||
----- | ||
Each translator class can have code specifically tailored to its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this block might be redundant for subclass methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified to say DECam.
MegaPrime files are multi-extension FITS with a primary header and | ||
each detector stored in a subsequent extension. MegaPrime uses | ||
``INHERIT=F`` therefore the primary header will always be ignored | ||
if given. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "ignored" mean "not returned in headers
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that whatever you supply as primary
will make no difference to what is returned. INHERIT=F declares that each HDU has a standalone header that depends on no other header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, as with the DECam translator, I suggest explicitly saying that the returned headers don't include the primary header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me "ignored" implies "will not be used". I have explicitly stated this now also in the yield line.
No description provided.