Conversation
…essed

This avoids some useless work but mostly serves to head off confusion (e.g. curl without the --compressed option) and/or quirks of HTTP clients (e.g. Snakemake's HTTP remote file provider¹) when a compressed file is compressed again and served with a Content-Encoding: gzip header.²

This doesn't come up with Nextstrain dataset and narrative files but does with adjacent input files like metadata.tsv.gz and sequences.fasta.xz, which we also put in the S3 buckets (e.g. s3://nextstrain-data/files/zika/…).³

The remote family of commands is not intended for generic S3 management per se, but they're often useful in the Nextstrain ecosystem to manage these ancillary data files. Part of this is that the commands are handy and available; part of it is that CloudFront invalidation still remains a complication with using `aws s3` directly. Avoiding double compression doesn't go so far out of our way and helps support this slightly off-label use case.

¹ snakemake/snakemake#1508
² https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1647910842228169
³ nextstrain/fauna#114
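To make the idea concrete, here is a minimal sketch of how an upload step might skip gzip when the payload is already compressed. It is not the code from this PR: `detect_compression` and `prepare_upload` are hypothetical names, and sniffing leading magic bytes is an assumed detection strategy (the actual change may decide differently, for example by file extension).

```python
import gzip
from typing import Optional, Tuple

# Leading "magic" bytes of a few common compression formats.
# Illustrative only; not necessarily the set the real command checks.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"\xfd7zXZ\x00": "xz",
    b"BZh": "bzip2",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def detect_compression(data: bytes) -> Optional[str]:
    """Return the name of the compression format, or None if unrecognized."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def prepare_upload(data: bytes) -> Tuple[bytes, Optional[str]]:
    """Return (body, content_encoding) for a hypothetical upload.

    Already-compressed input is passed through untouched with no
    Content-Encoding, so HTTP clients receive the file exactly as stored.
    Everything else is gzipped and labeled with Content-Encoding: gzip.
    """
    if detect_compression(data) is not None:
        return data, None
    return gzip.compress(data), "gzip"
```

With this shape, a file like metadata.tsv.gz would be stored byte-for-byte and served without a Content-Encoding header, while an uncompressed JSON or TSV would still be gzipped on the way up.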
trvrb
approved these changes
Mar 22, 2022
Member
Thanks for including these guardrails, @tsibley. Running with my original commands prevents the confusing behavior. Much appreciated.
Contributor
Author
Released with 3.2.1.