Context
To create signed URLs for Zarr, a directory-based file format, Polaris implements a custom Zarr storage class called PolarisFS. A first version of this subsystem is implemented, but its performance is not good enough yet.
Description
This ticket covers profiling and optimizing the current implementation so that we can read and write larger datasets. Running Cloudflare's speed test on the internet connection in the Valence office in Montreal, I measured an upload and download speed of about 210 Mbps. At that speed, uploading or downloading a 10GB file takes about 6.5 minutes. The goal of this ticket is to read and write a Zarr archive at most three times slower than uploading or downloading the archive directly. (In other words, using Polaris in the Valence office in Montreal, the upload should take at most ~19.5 min.)
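The baseline estimate can be reproduced with a short calculation. This is a minimal sketch that assumes decimal units (1 GB = 8,000 megabits) and ignores protocol overhead; the function name `transfer_minutes` is hypothetical and not part of Polaris. Under these assumptions the result is slightly below the rounded figures quoted above.

```python
def transfer_minutes(size_gb: float, bandwidth_mbps: float) -> float:
    """Ideal transfer time in minutes, ignoring protocol overhead."""
    # Assumption: decimal units, so 1 GB = 1000 MB = 8000 megabits.
    size_megabits = size_gb * 1000 * 8
    return size_megabits / bandwidth_mbps / 60


# Direct upload/download of a 10 GB archive at the measured 210 Mbps.
baseline = transfer_minutes(10, 210)

# The ticket's goal: at most three times slower than the direct transfer.
budget = 3 * baseline

print(f"baseline: {baseline:.1f} min, 3x budget: {budget:.1f} min")
```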
Acceptance Criteria
- We can read a 10GB Zarr archive in 13 minutes.
- We can write a 10GB Zarr archive in 13 minutes.
- The log details which optimizations were tried and what their effect was.
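One possible way to collect the numbers for such a log is a small timing helper that records wall-clock time and effective throughput per attempt. This is only a sketch: `Measurement`, `measure`, and `format_log` are hypothetical names, and the callable passed in stands in for whatever read or write path is being profiled (for example, a PolarisFS-backed Zarr read).

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Measurement:
    label: str      # name of the optimization being tried
    seconds: float  # wall-clock duration of the run
    mbps: float     # effective throughput in megabits per second


def measure(label: str, fn: Callable[[], object], n_bytes: int) -> Measurement:
    """Time one run of `fn`, which is assumed to transfer `n_bytes` bytes."""
    start = time.perf_counter()
    fn()
    # Guard against a zero duration for extremely fast callables.
    seconds = max(time.perf_counter() - start, 1e-9)
    mbps = n_bytes * 8 / 1e6 / seconds
    return Measurement(label, seconds, mbps)


def format_log(results: List[Measurement]) -> str:
    """Render the measurements as one plain-text line per attempt."""
    return "\n".join(
        f"{m.label}: {m.seconds:.2f} s, {m.mbps:.1f} Mbps" for m in results
    )


# Usage with a stand-in workload; replace the lambda with the real operation.
results = [measure("baseline (stand-in)", lambda: time.sleep(0.01), 1_000_000)]
print(format_log(results))
```

Comparing the reported Mbps against the measured 210 Mbps link speed shows directly how far each attempt is from the direct-transfer baseline.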
Links
- First implementation tracked here: https://github.com/polaris-hub/polaris-hub/issues/266#issue-2109315242
- Related PRs:
- Document with all optimizations: https://docs.google.com/document/d/1eznBBa_e-WIjsfuXXB875lAouoFpoQtJizxG2yeqRqI/edit?usp=sharing