Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-26343: Use less expansive definition of extension #354

Merged
merged 4 commits into from Aug 18, 2020
Merged

Conversation

timj
Copy link
Member

@timj timj commented Aug 17, 2020

  • Change ButlerURI.getExtension to return a single extension unless that extension matches a specific set (e.g. .gz).
  • Change extension validation in Formatter to check the end of the file name directly rather than trying to use the source file extension (which is an unnecessary extra step).

These changes allow "flat_i_sim_1.4_blah.fits" to be ingested as a fits file.

@timj timj requested a review from erykoff August 17, 2020 21:04
@erykoff
Copy link
Contributor

erykoff commented Aug 17, 2020

Before I leave a review, is there a completely different path for .fz or is that an oversight here?

@timj
Copy link
Member Author

timj commented Aug 17, 2020

I wasn't sure what to do with .fits.fz since I could cover it completely by declaring that .fz is a supported extension in obs_base FitsExposureFormatter. There is no other usage of .fz I think. That does remind me that I need to make a one line patch to the formatter in obs_base to make that explicit. I can add .fz here as a special extension if you like.

"""
special = {".gz", ".bz2", ".xz"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see that .fz is not "special" because it is handled natively by the fits formatter. Which is actually good, because then we can support fits.fz or .fz which I think are both acceptable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at the moment even though .fits.fz is listed as a supported extension in the formatter it's irrelevant because the .fz on its own is going to be enough to match.

("file", ""),
("flat_i_sim_1.4_blah.fits.gz", ".fits.gz"),
("flat_i_sim_1.4_blah.txt", ".txt"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do support .fits and .fits.fz then I think that we should explicitly test that here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since .fz is not listed as a special extension at the moment getExtension() will return .fz for .fits.fz.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine! I would suggest putting those in the list so that it's clear what it will return (and that we know it's returning what we think it should return).

We generally want a single extension to be returned unless it's
one of the standard compression extensions such as .gz. Now
recognize those and return .fits.gz for "a.b.c.fits.gz"
and .fits for "a.b.c.fits".
Rather than parsing extensions out of filenames, change to the
safer approach of seeing if the file to be ingested ends with
one of the supported extensions.
This can happen if it's is implemented as an instance property.
In these cases we assume that the supported extensions class property
is complete.
@timj timj merged commit 1147fe7 into master Aug 18, 2020
@timj timj deleted the tickets/DM-26343 branch August 18, 2020 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants