DM-25327: Add file extension validation on ingest #309
Merged
TallJimbo approved these changes on Jun 12, 2020
Most comments are little style things or obvious accidents. I'm not a big fan of the test workaround, but if it's the pragmatic choice, fine.
timj force-pushed the tickets/DM-25327 branch 2 times, most recently from 247923b to 6ab6ec8 (June 12, 2020 21:38)
* Change makeUpdatedLocation to be a normal method so that we can have dynamic extensions based on formatter parameters.
* Remove associated predictPathFromLocation class method
* Add new classmethod validateExtension

The only place that was using predictPathFromLocation was ingest of external files and in those cases what we really want is to use the extension that was already being used in that external file.
Unify code for taking a URI to a target file and a dataset ref and converting that into a file name inside the datastore. Use the new extension validation formatter method. Also rewrites S3 ingest a little to simplify it to use this new method.
Now formatters check that the file extension is usable. This required a new Lenient formatter that didn't check because one of the constraints tests resulted in a Json storage class being associated with a Yaml formatter.
Use the specialist formatter and configure it for YAML.
os.path.splitext thinks that the extension for file.fits.gz is .gz -- from a butler perspective it has to be .fits.gz so change to use Path.suffixes in Location and ButlerURI.
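A minimal sketch of the difference described in that commit, using only the standard library. The helper name `full_extension` is hypothetical and is not taken from Location or ButlerURI; it only illustrates how joining `Path.suffixes` differs from `os.path.splitext`.

```python
import os.path
from pathlib import PurePosixPath


def full_extension(path: str) -> str:
    """Return every suffix of the final path component, joined together.

    Hypothetical helper for illustration; not the butler implementation.
    """
    return "".join(PurePosixPath(path).suffixes)


# os.path.splitext only splits off the last suffix.
print(os.path.splitext("file.fits.gz")[1])  # .gz

# Joining Path.suffixes keeps the compound extension.
print(full_extension("file.fits.gz"))  # .fits.gz
```

One caveat with this approach: `suffixes` treats every dot in the file name as the start of a suffix, so a name such as `my.data.fits.gz` yields `.data.fits.gz`. Whether that matters depends on how the datastore names its files.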
Ostensibly this PR exists to support the new PackagesFormatter, which I wanted to demonstrate could work by using the file format as a write parameter rather than embedding the file format in the formatter itself. This required that I change how Formatter.makeUpdatedLocation works and led to me realizing that predicting the path from the location was not the right thing to do on ingest. We were simply assuming that the thing being ingested (with whatever extension it had) could always be handled by the gen3 formatter, and we weren't checking. This means that in theory a .pickle file could be ingested and renamed to a .yaml file. Now we still do the rename, but we copy across the file extension from the ingested file and complain if the ingested file has an unsupported extension.
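The ingest behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class `YamlFormatterSketch`, its `supported_extensions` attribute, the snake_case `validate_extension` method, and the `ingest_name` function are all hypothetical names chosen for the example.

```python
from pathlib import PurePosixPath


class YamlFormatterSketch:
    """Hypothetical stand-in for a formatter; not the daf_butler class."""

    # Assumed attribute: the extensions this formatter can handle.
    supported_extensions = frozenset({".yaml", ".yml"})

    @classmethod
    def validate_extension(cls, path: str) -> None:
        """Raise ValueError if the file's extension is not supported."""
        ext = "".join(PurePosixPath(path).suffixes)
        if ext not in cls.supported_extensions:
            raise ValueError(
                f"Extension {ext!r} of {path!r} is not supported by "
                f"{cls.__name__} (expected one of {sorted(cls.supported_extensions)})"
            )


def ingest_name(external_path: str, datastore_stem: str, formatter) -> str:
    """Build the in-datastore name, keeping the external file's extension."""
    # Complain early if the formatter cannot handle this file.
    formatter.validate_extension(external_path)
    ext = "".join(PurePosixPath(external_path).suffixes)
    # Rename the file but carry the original extension across.
    return datastore_stem + ext


print(ingest_name("/data/in/config.yaml", "run1/config_42", YamlFormatterSketch))
# run1/config_42.yaml
```

With this sketch, ingesting `/data/in/config.pickle` raises `ValueError` instead of silently producing a `.yaml` name.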