Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25327: Add file extension validation on ingest #309

Merged
merged 6 commits into from Jun 13, 2020
Merged

Conversation

timj
Copy link
Member

@timj timj commented Jun 11, 2020

Ostensibly this PR exists to support the new PackagesFormatter which I wanted to demonstrate could work by using the file format as a write parameter rather than embedding the file format in the formatter itself. This required that I change how Formatter.makeUpdatedLocation works and led to me realizing that the predict path location was not the right thing to do on ingest. We were simply assuming that the thing being ingested (with whatever extension it had) could always be handled by the gen3 formatter and we weren't checking. This means that in theory a .pickle file could be ingested and renamed to .yaml file. Now we do the rename but copy across the file extension from the ingested file and complain if the ingested file has an unsupported extension.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most comments are little style things or obvious accidents. I'm not a big fan of the test workaround, but if it's the pragmatic choice, fine.

python/lsst/daf/butler/core/formatter.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/formatter.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/datastores/posixDatastore.py Outdated Show resolved Hide resolved
@timj timj force-pushed the tickets/DM-25327 branch 2 times, most recently from 247923b to 6ab6ec8 Compare June 12, 2020 21:38
timj added 6 commits June 12, 2020 16:51
* Change makeUpdatedLocation to be a normal method so that
  we can have dynamic extensions based on formatter parameters.
* Remove associated predictPathFromLocation class method
* Add new classmethod validateExtension

The only place that was using predictPathFromLocation was
ingest of external files and in those cases what we really
want is to use the extension that was already being
used in that external file.
Unify code for taking a URI to a target file and a dataset ref
and converting that into a file name inside the datastore.
Use the new extension validation formatter method.

Also rewrites S3 ingest a little to simplify it to use this
new method.
Now formatters check that the file extension is usable.
This required a new Lenient formatter that didn't check
because one of the constraints tests resulted in a
Json storage class being associated with a Yaml
formatter.
Use the specialist formatter and configure it for YAML.
os.path.splitext thinks that the extension for file.fits.gz
is .gz -- from a butler perspective it has to be .fits.gz
so change to use Path.suffixes in Location and ButlerURI.
@timj timj merged commit 7dd67ae into master Jun 13, 2020
@timj timj deleted the tickets/DM-25327 branch June 13, 2020 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants