Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scaffold S3 remote persistence interface #3

Merged
merged 3 commits into from Apr 3, 2023
Merged

Scaffold S3 remote persistence interface #3

merged 3 commits into from Apr 3, 2023

Conversation

ayazhafiz
Copy link
Member

The new RemotePersistence trait provides an interface for loading and
storing files on in remote location. The present implementation includes
one implementation of this interface for AWS's S3 service. The
RemotePersistence trait will optionally back the file-based persisters
of results and manifests.

Support for S3 is added as a feature, so that folks self-compiling ABQ do
not have to compile-in S3. By default, RWX will build ABQ binaries with
support for S3.

Note that the implementation adds a dependency on AWS's Rust SDK,
which is presently a "developer preview" and not stable. However,
anecdotal evidence suggests (1, 2)
the dependency is usable. We also do not have much other choice with
regards to AWS S3 clients.

The new `RemotePersistence` trait provides an interface for loading and
storing files on in remote location. The present implementation includes
one implementation of this interface for AWS's S3 service. The
`RemotePersistence` trait will optionally back the file-based persisters
of results and manifests.
@ayazhafiz ayazhafiz requested a review from doxavore April 3, 2023 19:07
@github-actions
Copy link

github-actions bot commented Apr 3, 2023

Bigtest for f0392e3 (run)

Benchmarks:

  • RSpec: 15.24% overhead
    • RSpec time: 17.72 seconds
    • ABQ time: 20.42 seconds
  • RSpec parallel, 10 runs: max % overhead
    • min % overhead
    • standard deviation: %
  • Jest: 7.15% overhead
    • Jest time: 21.277 seconds
    • ABQ time: 22.799 seconds

Fuzz result sizes:

  • PASSED

It appears that with ubuntu-latest runners, we can end up with queues
much further away than we observed with rwx hosted runners. We should
investigate what we can do to reduce this latency other than increasing
batch sizes, but as a stopgap, bump the max overhead.
@ayazhafiz ayazhafiz enabled auto-merge (squash) April 3, 2023 19:33
@github-actions
Copy link

github-actions bot commented Apr 3, 2023

Bigtest for f3deb88 (run)

Benchmarks:

  • RSpec: 7.51% overhead
    • RSpec time: 17.71 seconds
    • ABQ time: 19.04 seconds
  • RSpec parallel, 10 runs: max 18.07% overhead
    • min 7.45% overhead
    • standard deviation: 3.48%
  • Jest: 5.49% overhead
    • Jest time: 20.809 seconds
    • ABQ time: 21.951 seconds

Fuzz result sizes:

  • PASSED

@ayazhafiz ayazhafiz merged commit bbf9bd8 into main Apr 3, 2023
17 checks passed
@ayazhafiz ayazhafiz deleted the s3-scaffold branch April 3, 2023 19:52
Comment on lines +94 to +95
PersistenceKind::Manifest => "manifest",
PersistenceKind::Results => "results",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayazhafiz Do we want these to include a file extension, eg. .jsonl or .jsonl.gz?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have too much of a preference - what would you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's easy enough to add in a follow-up, I like having the extension so we'll never need to guess the format.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll be sure to add that in a next PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants