Merged
…y lines

It's a binary stream where we don't care about lines, and iterating in tiny line-wise chunks (via the requests package) resulted in extremely slow `nextstrain remote upload` times when the destination was nextstrain.org.¹ When the destination was S3, the s3transfer package underlying boto3 took care of reading from the file handle in chunks instead of lines.

¹ https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1648748118413089
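To illustrate why line-wise iteration is so costly here, the following is a minimal sketch, not the code from this PR. The helper name `iter_chunks`, the 64 KiB chunk size, and the synthetic payload are all assumptions made for the example. It compares how many pieces a binary stream yields when iterated by line versus read in fixed-size chunks.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield fixed-size chunks from a binary stream (hypothetical helper)."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Synthetic payload with many short lines (200,000 bytes in total).
payload = b"a\n" * 100_000

# Iterating a file object directly yields one piece per line.
line_count = sum(1 for _ in io.BytesIO(payload))

# Reading in fixed-size chunks yields far fewer, larger pieces.
chunks = list(iter_chunks(io.BytesIO(payload)))
chunk_count = len(chunks)

print(f"{line_count} line-wise pieces vs. {chunk_count} chunks")
```

A generator like this can be handed to an HTTP client that accepts an iterable request body, so each write carries a full chunk rather than a single short line.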
On a ~200 MB example input (a real dataset JSON), the ~200 kB difference in compressed size (6 MB vs. 6.2 MB) does not seem worth the ~4 s difference in compression time (5.6 s vs. 1.5 s). I assume similar ratios for other inputs of different sizes but similar composition (dataset JSONs and narrative markdowns).
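An ad-hoc benchmark of this kind could look roughly like the sketch below. It is an assumption-laden illustration: the commit message does not name the compression library or the levels compared, so gzip from the standard library, the levels 1, 6, and 9, the function name `benchmark_levels`, and the small synthetic JSON payload are all hypothetical stand-ins. Timings will vary by machine and input.

```python
import gzip
import json
import time
from typing import Dict, Iterable, Tuple


def benchmark_levels(
    payload: bytes, levels: Iterable[int] = (1, 6, 9)
) -> Dict[int, Tuple[int, float]]:
    """Return {level: (compressed size in bytes, seconds taken)} (hypothetical helper)."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(payload, compresslevel=level)
        elapsed = time.perf_counter() - start
        results[level] = (len(compressed), elapsed)
    return results


# Synthetic stand-in for a dataset JSON; a real benchmark would read an actual file.
payload = json.dumps(
    [{"name": f"node_{i}", "div": i * 0.001, "num_date": 2020 + i / 1000} for i in range(20_000)]
).encode("utf-8")

results = benchmark_levels(payload)
for level, (size, seconds) in sorted(results.items()):
    print(f"level {level}: {size} bytes in {seconds:.3f}s")
```

Comparing the size and time columns across levels gives the same kind of trade-off described above: whether a slightly smaller output justifies the extra compression time.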
victorlin
approved these changes
Apr 1, 2022
See commit messages for details.
Tested manually. Dramatically improves upload speed to nextstrain.org. I performed a few ad-hoc benchmarks to ensure the compression level change was worth it.