Skip to content

Conversation

abr-egn
Copy link
Contributor

@abr-egn abr-egn commented Mar 20, 2024

RUST-1871

I opted to omit the versions of the upload/download methods that accept a stream - while the spec has both, IMO in the case of Rust it's both unidiomatic and unnecessary. For simple cases write_all / read_to_end does the same thing, and for the more general case it's covered by futures_util::io::copy.

Also, previously open_upload_stream was both non-async and infallible; I think that was an incident of implementation rather than an API guarantee we want to uphold, so I've made it consistent with open_download_stream.

@abr-egn abr-egn marked this pull request as ready for review March 20, 2024 20:13
@abr-egn abr-egn requested a review from isabelatkinson March 20, 2024 20:13
Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to omit the versions of the upload/download methods that accept a stream - while the spec has both, IMO in the case of Rust it's both unidiomatic and unnecessary.

I think this should be fine, but I'm going to post something in dbx-devs to double check we're not breaking any big rules by doing this -- the spec mentions a few times that both APIs are required. It also seems like we have a decent number of people using gridfs based on issues/bug reports, so I'm not sure if we'll be upsetting anyone by removing half of the gridfs API. But the prereleases will hopefully give us some indication of that.

@isabelatkinson
Copy link
Contributor

Thought about this more and I think we should keep both APIs. AsyncRead and AsyncWrite both only work with in-memory data, which means that if a user wants to download from or upload to a file they'd either need to load it into memory all at once or do their own chunking logic to avoid high memory usage. The download_to_writer and upload_from_reader methods only load one file chunk into memory at a time.

@abr-egn
Copy link
Contributor Author

abr-egn commented Mar 25, 2024

AsyncRead and AsyncWrite both only work with in-memory data, which means that if a user wants to download from or upload to a file they'd either need to load it into memory all at once or do their own chunking logic to avoid high memory usage.

I don't think that's accurate 🙂 While the gridfs AsyncRead/AsyncWrite impls only deal with reading to / writing from an in-memory buffer, that's a buffer provided by the caller and doesn't need to be the size of the whole file.

For example, futures_util::io::copy internally uses a BufReader, which will read blocks of 8k bytes* at a time from the source and write those to the destination stream. This mean that running copy with a GridFsDownloadStream as the source and a tokio File as the destination will be processed as if it was (psuedocode):

const BUF_SIZE: usize = 8192;
while let Some(gridfs_chunk) = source.next_chunk() {
  for write_buf in gridfs_chunk.chunks(BUF_SIZE) {
    dest.write(write_buf);
  }
}

* This can be tweaked by the user if needed by creating their own BufReader and calling copy_buf.

@abr-egn abr-egn requested a review from isabelatkinson March 26, 2024 15:19
@isabelatkinson
Copy link
Contributor

Ah I wasn't aware of copy -- I was thinking more about users working directly with the methods in AsyncRead(Ext) and AsyncWrite(Ext). Our documentation for working with the streams (here and here) currently points users towards those methods, so if we're no longer going to support the alternative API I suggest adding some instructions for working with large files to those docs -- linking to copy would probably be sufficient.

@abr-egn
Copy link
Contributor Author

abr-egn commented Mar 26, 2024

Good thought, added notes to those docs about copy.

@abr-egn abr-egn force-pushed the RUST-1871/fluent-gridfs branch from 16af04d to 1eb1025 Compare March 27, 2024 14:28
@abr-egn abr-egn merged commit 34ae89b into mongodb:main Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants