Improve upload speed #165

Merged
tsibley merged 2 commits into master from trs/upload-speed on Apr 1, 2022

@tsibley tsibley commented Mar 31, 2022

See commit messages for details.

Tested manually. Dramatically improves upload speed to nextstrain.org. I performed a few ad-hoc benchmarks to ensure the compression level change was worth it.

tsibley added 2 commits March 31, 2022 14:46
…y lines

It's a binary stream where we don't care about lines, and iterating in
tiny line-wise chunks (via the requests package) resulted in extremely
slow `nextstrain remote upload` times when the destination was
nextstrain.org.¹  When the destination was S3, the s3transfer package
underlying boto3 took care of reading from the file handle in chunks
instead of lines.

¹ https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1648748118413089
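
To illustrate why line-wise iteration is costly on such a stream, here is a minimal sketch that compares the number of reads produced by iterating a binary file handle by lines with the number produced by fixed-size chunks. The helper name `iter_chunks`, the 64 KiB chunk size, and the synthetic payload are assumptions chosen for illustration; this is not the code from the commit.

```python
import io


def iter_chunks(fh, chunk_size=64 * 1024):
    """Yield fixed-size blocks from a binary file handle until EOF.

    Hypothetical helper for illustration; name and default size are assumptions.
    """
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Synthetic payload made of many short "lines", loosely like pretty-printed JSON.
payload = b'{"k": 1}\n' * 100_000

# Iterating a file object directly yields one item per newline-terminated line.
line_count = sum(1 for _ in io.BytesIO(payload))

# Reading in blocks yields far fewer, much larger items for the same bytes.
chunk_count = sum(1 for _ in iter_chunks(io.BytesIO(payload)))

print(f"line-wise reads: {line_count}")
print(f"chunked reads:   {chunk_count}")
```

Each item handed to the HTTP layer carries some per-item overhead, so sending fewer, larger items is presumably where the speedup comes from.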
On a ~200 MB example input (a real dataset JSON), the ~200 kB saved in
compressed size (6 MB vs. 6.2 MB) seems not worth the ~4 s of extra
compression time (5.6 s vs. 1.5 s).  I assume similar ratios for other
inputs of different sizes but similar composition (dataset JSONs and
narrative markdowns).
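
An ad-hoc benchmark of this kind can be sketched as follows. It assumes gzip compression through Python's standard library. The levels compared (1, 6, 9), the `measure` helper, and the synthetic JSON sample are all assumptions for illustration, since the commit message does not state which levels were tested. Results will vary by machine and input.

```python
import gzip
import json
import time


def measure(data: bytes, level: int):
    """Compress data at the given gzip level; return (size, seconds, bytes).

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed, compressed


# Synthetic stand-in for a dataset JSON; a real benchmark would use real files.
sample = json.dumps(
    [{"name": f"node_{i}", "div": i * 0.001, "children": []} for i in range(50_000)]
).encode("utf-8")

# Levels chosen for illustration; they are not taken from the commit.
results = {level: measure(sample, level) for level in (1, 6, 9)}

for level, (size, elapsed, _) in results.items():
    print(f"level {level}: {size} bytes in {elapsed:.3f} s")
```

Running this against representative dataset JSONs and narrative Markdown files, rather than the synthetic sample, would show whether the size-versus-time ratio holds for other inputs.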
@tsibley tsibley requested a review from a team March 31, 2022 22:02
@tsibley tsibley merged commit d7e5738 into master Apr 1, 2022
@tsibley tsibley deleted the trs/upload-speed branch April 1, 2022 17:25