
Add OpenChunkWriter and ChunkWriter interfaces and refactor s3.uploadMultpart and multiThreadCopy to use them #7154

Merged
merged 4 commits into rclone:master on Aug 12, 2023

Conversation

@vitorog vitorog commented Jul 19, 2023

What is the purpose of this change?

Following the discussions in #7061 and #7056, this change is meant to pave the way for fast parallel transfers between remotes.
It defines the interfaces suggested by @jorjao81 and implements them for the S3 and local backends.
Also, the s3.uploadMultipart logic was refactored to use OpenChunkWriter and ChunkWriter (s3 multipart uploads should behave the same as before).
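Roughly, the proposed interfaces look like the sketch below (illustrative only - the exact signatures evolved during review; in particular the first version of WriteChunk took a []byte, which was later changed to an io.ReadSeeker as discussed further down):

type OpenChunkWriter interface {
	// OpenChunkWriter returns the chunk size to use and a ChunkWriter for src.
	OpenChunkWriter(ctx context.Context, src ObjectInfo, options ...OpenOption) (chunkSize int64, writer ChunkWriter, err error)
}

type ChunkWriter interface {
	// WriteChunk writes the data for chunk chunkNumber and returns the bytes written.
	WriteChunk(chunkNumber int, reader io.ReadSeeker) (int64, error)
	// Close finalises the transfer once all chunks have been written.
	Close() error
	// Abort cancels the transfer and cleans up any partially written data.
	Abort() error
}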

In https://github.com/rclone/rclone/pull/7061 - @ncw suggested the following roadmap:

  • Define new interfaces in fs
  • Implement OpenChunkWriter and ChunkWriter for local backend.
  • Add tests of OpenChunkWriter and ChunkWriter to the integration tests fstest/fstests/fstests.go
  • Refactor multipart upload in the s3 backend to create OpenChunkWriter and ChunkWriter, and leave the multipart upload in the s3 backend just using the new interface for the moment.

Therefore, the next steps would be:

  • Rework multi-thread copy to use new interface
  • Refactor multipart upload in s3 to use version in operations

Looking forward to some feedback. I'm not experienced with Go, so apologies for any (dumb) mistake. 😅

Was the change discussed in an issue or in the forum before?

#7061
#7056

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

@vitorog vitorog changed the title Add OpenChunkWriter and ChunkWriter intefaces and refactor s3.uploadMultpart to use them Add OpenChunkWriter and ChunkWriter interfaces and refactor s3.uploadMultpart to use them Jul 19, 2023
@jorjao81

@ncw FYI: @vitorog is my work colleague, we are working together to get this done. You might also see commits from @AffDNeto and @sysedwinistrator :)

Tomorrow we will try to do the same thing @vitorog did here for S3 on another provider, probably GCS, to further validate the interface.

@jorjao81

After looking at the code a bit, I saw that the way GCP does multipart is basically by implementing the S3 API, and this is already supported by configuring a GCS bucket to use S3, so I'll leave that out for now.


ncw commented Jul 22, 2023

If we can work out whether we need an io.Reader or a []byte that would be very useful. Rclone has a useful little adapter, readers.Repeatable, which can turn an io.Reader into an io.ReadSeeker, using buffer memory only when needed.

I've just done the 1.63 release so this is a great time in the dev cycle for experimental code!
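For reference, a minimal usage sketch of that adapter (hedged - the constructor shape is assumed from the description above and from readers.NewRepeatableReader mentioned later in this thread):

// Wrap a plain io.Reader so the same bytes can be read more than once
// (for example once to hash and once to upload), buffering only what is re-read.
rs := readers.NewRepeatableReader(in)
h := md5.New()
if _, err := io.Copy(h, rs); err != nil { // first pass: compute the chunk hash
	return err
}
if _, err := rs.Seek(0, io.SeekStart); err != nil { // rewind so the upload replays the same bytes
	return err
}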

@ncw ncw left a comment

This is looking nice! I put a few comments inline.

We should probably sort out exactly what goes in which commit at some point but that can wait until the end easily enough.

backend/chunker/chunker_test.go (resolved)
if err != nil {
return -1, errors.New("failed to write chunk")
}
fs.Debugf("", "wrote chunk %v with %v bytes", chunkNumber, n)
Member

It's probably worth caching the result of src.Remote() so you can use it here - this will make the logs much easier to read!

fs.Debugf(w.remote, ...

Author

Done!

}

func (w *localChunkWriter) Abort() error {
// TODO: Is just closing enough?
Member

We probably want to delete the file - that is what s3 etc will do.

Author

Updated the code to delete the file.

@@ -1374,7 +1374,7 @@ type localChunkWriter struct {
}

func (w *localChunkWriter) WriteChunk(chunkNumber int, reader []byte) (int, error) {
offset := int64(chunkNumber) * w.chunkSize
offset := int64(chunkNumber-1) * w.chunkSize
Member

We should probably start the chunks from 0 otherwise we'll be confusing all the 0-based Go programmers until the end of time ;-)

Author

That's probably a good idea 😅
Sorry, I was thinking of the S3 API while implementing this (where part numbers start at 1).
I updated the code to start chunks from 0.


ncw commented Jul 22, 2023

Therefore, the next steps would be:

  • Rework multi-thread copy to use new interface

Are we going to remove support for OpenWriterAt in the multi-thread copy code?

Just thinking about how this would work for the local backend: it would use more memory with more buffers, but it would probably leave the files less fragmented, as it would be writing small chunks that are all close to each other, which the OS is likely to buffer in RAM. So as far as the local backend is concerned this is probably an improvement.

So we should ditch the OpenWriterAt code and convert it completely to ChunkWriter.

I note that the conversion from OpenWriterAt to ChunkWriter is generic - I say this because there is a PR in the pipeline adding OpenWriterAt to the smb backend so it would be nice not to re-write it there! So you could make the code in local just instantiate an adaptor to implement the interface.

  • Refactor multipart upload in s3 to use version in operations

Multithread copies can't do unknown-sized copies at the moment so this needs a little care. I guess what we are factoring out is not the entire multithread copying routine but just the guts of moving chunks around.

To think about - memory management mmap, memory pools etc. We currently manage this for the s3 etc backends and it is quite important, so we need the same controls in the operations multithread routines.

Though we could turn []byte into io.Reader and punt that to the backends. Having a unified memory pool would be a good idea though. It needs to be a bit more sophisticated than it is at the moment to have chunks of different sizes but that is on the cards.

Note also we have two uses of multithread copy in the s3 backend, one for multipart uploads and the other for server side copies of large files. They use pretty much the same code structure but don't need to move actual bytes. I don't know if this can be factored out - probably not.

@jorjao81

If we can work out whether we need an io.Reader or a []byte that would be very useful. Rclone has a useful little adapter readers.Repeatable which can turn an io.Reader into an io.ReadSeeker but only using the buffer memory if needed.

I've just done the 1.63 release so this is a great time in the dev cycle for experimental code!

IMHO reader is a better choice:

  1. I think it's the "natural" type that we need here, i.e., semantically we need a source of data; []byte seems too low level/implementation specific.
  2. The S3 SDK takes a Reader (v2) or ReadSeeker (v1), and I think Azure Blob does too.
  3. It also keeps the door open to not buffering the whole chunk and using trailers for the checksum.


ncw commented Jul 25, 2023

If we can work out whether we need an io.Reader or a []byte that would be very useful. Rclone has a useful little adapter readers.Repeatable which can turn an io.Reader into an io.ReadSeeker but only using the buffer memory if needed.
I've just done the 1.63 release so this is a great time in the dev cycle for experimental code!

IMHO reader is a better choice:

  1. I think it's the "natural" type that we need here, i.e, semantically we need a source of data, []byte seems too low level/implementation specific.

Yes, noted.

  2. The S3 SDK takes a Reader (v2) or ReadSeeker (v1), I think Azure Blob also

There are two reasons the SDK needs to seek

  1. Calculation of the hash of the chunk which is needed at the start of the transaction before sending any data (unless s3 has started supporting HTTP trailers which is not impossible!)
  2. Retries

So I suspect io.ReadSeeker is a better option than io.Reader

If we were to implement this naively by seeking the incoming data stream (which is fairly easy - rclone has a way of opening objects as io.ReadSeekers), then this will read all the data twice over the network in order to calculate the hash.

So I think this is likely going to need to be a memory backed buffer anyway. However at some point I'd like to switch to more of a scatter-gather memory buffer with a lot of, say, 1MB pages, which would fit well within the io.ReadSeeker interface.

  3. It also keeps the door open to not buffering the whole chunk and using trailers for the checksum.

Yes. Though see above re retries!

I had a look to see what the SDKs are expecting

  • s3 v1: io.ReadSeeker - note we also need to calculate the MD5SUM of the blob before we upload
  • azureblob: io.ReadSeekCloser
  • b2: this is up to us (no SDK), but it can get away with an io.Reader as hashes are applied at the end. It needs to buffer for retries though.
  • box: this is up to us (no SDK). Needs to be able to retry.
  • google cloud storage: io.Reader - the SDK handles chunking for us
  • onedrive: we use io.ReadSeeker implemented by rclone's own readers.NewRepeatableReader
  • there are others but I stopped there!

So I think io.ReadSeeker is probably the minimum interface we can get away with. Would you agree?
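To make that concrete, here is a hedged sketch (the exampleChunkWriter type, maxRetries field and uploadPart helper are hypothetical) of why io.ReadSeeker is enough for a backend: it lets the backend hash the chunk before sending it and rewind for retries:

func (w *exampleChunkWriter) WriteChunk(chunkNumber int, reader io.ReadSeeker) (int64, error) {
	h := md5.New()
	size, err := io.Copy(h, reader) // pass 1: the hash must be known before any data is sent
	if err != nil {
		return -1, err
	}
	for try := 0; try < w.maxRetries; try++ {
		if _, err = reader.Seek(0, io.SeekStart); err != nil { // rewind for each (re)try
			return -1, err
		}
		err = w.uploadPart(chunkNumber, reader, size, h.Sum(nil)) // hypothetical helper
		if err == nil {
			return size, nil
		}
	}
	return -1, err
}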

@jorjao81

So I think io.ReadSeeker is probably the minimum interface we can get away with. Would you agree?

Damn, I keep forgetting about the retries. So I guess you are right, io.ReadSeeker is the minimum we can get away with.


vitorog commented Jul 25, 2023

Hi @ncw @jorjao81, based on the discussion, I refactored the ChunkWriter interface to use an io.ReadSeeker.

I also updated the multi-thread copy to use the new interfaces, based on @jorjao81 's PR in #7061 and fixed the build and lint errors (I think).

So we should ditch the OpenWriterAt code and convert it completely to ChunkWriter.
I note that the conversion from OpenWriterAt to ChunkWriter is generic - I say this because there is a PR in the pipeline adding OpenWriterAt to the smb backend so it would be nice not to re-write it there! So you could make the code in local just instantiate an adaptor to implement the interface.

@ncw I implemented an adapter (openChunkWriterFromOpenWriterAt) in multithread.go. From my understanding, with this adapter we don't even need to implement OpenChunkWriter in the local backend, right?
Is this approach what you had in mind for the generic conversion from OpenWriterAt to ChunkWriter?
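The core of that adapter is roughly the following (a hedged sketch, not the exact code in the PR):

// Each chunk is written at its own offset with WriteAt, so chunks can be
// written concurrently by the multi-thread copy workers.
func (w writerAtChunkWriter) WriteChunk(chunkNumber int, reader io.ReadSeeker) (int64, error) {
	offset := int64(chunkNumber) * w.chunkSize // chunks are 0-based
	buf := make([]byte, w.chunkSize)           // the real code would want a buffer pool here
	n, err := io.ReadFull(reader, buf)
	if err == io.ErrUnexpectedEOF || err == io.EOF {
		err = nil // the final chunk may be shorter than chunkSize
	}
	if err != nil {
		return -1, err
	}
	nw, err := w.writerAt.WriteAt(buf[:n], offset)
	return int64(nw), err
}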

@vitorog vitorog force-pushed the fast-s3-to-s3 branch 2 times, most recently from 69ce928 to 7a05ca0 on July 25, 2023 at 21:27
@ncw ncw left a comment

I had a look through the OpenWriterAt adapter - looks great :-)

}

func (w writerAtChunkWriter) Close() error {
return nil // NOP
Member

Doesn't this need to close the w.writerAt since we opened it in openChunkWriterFromOpenWriterAt?

Author

Nice catch, thanks. I updated the PR.

return obj.Remove(w.ctx)
}

func openChunkWriterFromOpenWriterAt(openWriterAt func(ctx context.Context, remote string, size int64) (fs.WriterAtCloser, error), writeBufferSize int64, streams int, f fs.Fs) func(ctx context.Context, src fs.ObjectInfo, options ...fs.OpenOption) (chunkSizeResult int64, writer fs.ChunkWriter, err error) {
Member

That function definition is a bit of a mouthful! Perhaps we should define some aliases in fs/features.go. Anyway it is fine so don't do that for the moment - that can be a job for another day!
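For illustration, the kind of alias that could tidy this up later (hypothetical, not part of this PR):

// Hypothetical named function type in the fs package, so the adapter above could
// return an fs.OpenChunkWriterFn rather than spelling out the whole signature.
type OpenChunkWriterFn func(ctx context.Context, src ObjectInfo, options ...OpenOption) (chunkSize int64, writer ChunkWriter, err error)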


ncw commented Jul 28, 2023

hi, @ncw @jorjao81 , based on the discussion, I refactored the ChunkWriter interface to use an io.ReadSeeker.

Great.

I also updated the multi-thread copy to use the new interfaces, based on @jorjao81 's PR in #7061 and fixed the build and lint errors (I think).

I've re-run the CI so you can check in a moment.

So we should ditch the OpenWriterAt code and convert it completely to ChunkWriter.
I note that the conversion from OpenWriterAt to ChunkWriter is generic - I say this because there is a PR in the pipeline adding OpenWriterAt to the smb backend so it would be nice not to re-write it there! So you could make the code in local just instantiate an adaptor to implement the interface.

@ncw I implemented an adapter (openChunkWriterFromOpenWriterAt) in multithread.go. From my understanding, with this adapter we don't even need to implement OpenChunkWriter in the local backend, right? Is this approach what you had in mind for the generic conversion from OpenWriterAt to ChunkWriter?

Your adapter looks great. I made a couple of comments above but it is the right approach definitely. And we don't need the local implementation of OpenChunkWriter at all. For some backends OpenWriterAt is easy to write. I suspect we could do it for sftp and hdfs quite easily.


vitorog commented Jul 31, 2023

Thanks for the review and comments @ncw .

After running some tests, I noticed an issue with the OpenChunkWriter interface:
Shouldn't it also receive the remote as a parameter?
For example:

type OpenChunkWriter interface {
	OpenChunkWriter(ctx context.Context, remote string, src ObjectInfo, options ...OpenOption) (chunkSize int64, writer ChunkWriter, err error)
}

I'm asking this because the multiThreadCopy function is defined as:

func multiThreadCopy(ctx context.Context, f fs.Fs, remote string, src fs.Object, streams int, tr *accounting.Transfer)

where the remote refers to the destination file.

If we use "src.Remote()" in OpenChunkWriter, it will only work if the destination file has the same name (remote) as the source (otherwise it fails with "multi-thread copy: failed to find object after copy").
What do you think?


ncw commented Aug 1, 2023

After running some tests, I noticed an issue with the OpenChunkWriter interface: Shouldn't it also receive the remote as a parameter? For example:

type OpenChunkWriter interface {
	OpenChunkWriter(ctx context.Context, remote string, src ObjectInfo, options ...OpenOption) (chunkSize int64, writer ChunkWriter, err error)
}

Traditionally you'd use fs.NewOverrideRemote to override the Remote in the src.
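For reference, that traditional approach reads roughly like this (a sketch - fs.NewOverrideRemote is taken from the comment above, and openChunkWriter stands in for however the backend exposes the feature):

// Wrap the source ObjectInfo so it reports the destination name, instead of
// adding a remote parameter to OpenChunkWriter itself.
overridden := fs.NewOverrideRemote(src, remote)
chunkSize, chunkWriter, err := openChunkWriter(ctx, overridden)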

However, making it line up with the multi-thread copy code makes it slightly easier to use, so adding a remote parameter probably seems like a good idea.


vitorog commented Aug 3, 2023

Executed some tests on an m6in.8xlarge EC2 instance (network bandwidth of 50 Gbps).
Transfer of a 100 GB file from S3 to S3. Each command was executed 3 times.

Command                      | Mean [s]        | Min [s] | Max [s] | Relative    | Avg Speed
100-threads-50M-chunk-size   | 41.131 ± 1.483  | 39.784  | 42.720  | 1.07 ± 0.04 | 2.4 GiB/s
250-threads-50M-chunk-size   | 38.593 ± 0.397  | 38.203  | 38.996  | 1.00        | 2.59 GiB/s
100-threads-100M-chunk-size  | 42.426 ± 2.405  | 40.975  | 45.202  | 1.10 ± 0.06 | 2.35 GiB/s
250-threads-100M-chunk-size  | 39.532 ± 0.942  | 38.860  | 40.608  | 1.02 ± 0.03 | 2.52 GiB/s
100-threads-250M-chunk-size  | 44.504 ± 0.565  | 43.918  | 45.046  | 1.15 ± 0.02 | 2.24 GiB/s
250-threads-250M-chunk-size  | -               | -       | -       | -           | -

*250-threads-250M-chunk-size: crashed with an "out of memory" error

Transfer of the same file (single threaded):

Transferred:   	      100 GiB / 100 GiB, 100%, 74.130 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:     21m48.5s

@vitorog
Copy link
Author

vitorog commented Aug 3, 2023

Hi @ncw, after running more tests, I noticed an issue related to this setModTime call:
https://github.com/rclone/rclone/blob/master/fs/operations/multithread.go#L235

For S3, the setModTime implementation (https://github.com/rclone/rclone/blob/master/backend/s3/s3.go#L5130) makes a copy of the object over itself. However, after doing a multi-threaded copy we don't need to do this, since the metadata should already be correct.
The workaround for this was this commit: c2d16a6
But I'm not sure if this could cause any problems or if there is a better approach.

Another issue is related to the Accounting. I couldn't find a better way of integrating it with the OpenChunkWriter.
Currently, acc.AccountRead is called when a chunk finishes - which causes a "delay" in the reporting.


msays2000 commented Aug 3, 2023

Keep up the good work guys, looking forward to these parallel chunked download/upload interfaces.

In the chunked upload interfaces, please also consider an interface to list and use meta information of already-uploaded parts. This can be used to resume a multipart upload after rclone is restarted for whatever reason, and it is smart to skip uploading parts that were already uploaded. Example: #7189


ncw commented Aug 4, 2023

Executed some tests in a m6in.8xlarge EC2 instance (network bandwidth of 50 Gbps). Transfer of a 100 GB file from S3 to S3. Each command was executed 3 times.

Amazing performance :-) Is 2.5 GiB/s network saturated, given we are doing both down and up, or would 5 GiB/s be network saturated?

hi, @ncw, after running more tests, I noticed an issue related to this setModTime call: https://github.com/rclone/rclone/blob/master/fs/operations/multithread.go#L235

For S3, the setModTime implementation (https://github.com/rclone/rclone/blob/master/backend/s3/s3.go#L5130) makes a copy of the object over itself. However, after doing a multi-threaded copy we don't need to do this, since the metadata should already be correct.

Yes, this SetModTime is left over from copying to the local file system where it is necessary.

The workaround for this was this commit: c2d16a6 But I'm not sure if this could cause any problems or if there is a better approach.

I think avoiding the SetModTime at the end, if the Fs we are copying to does not have the f.Features().Partial feature flag set, is probably the right thing to do. An Fs without Partial only has the complete file at the end, so it must get the correct modtime at that point too. Sound OK?

I think your patch is probably OK too though!
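In code, that suggestion amounts to something like this (a hedged sketch - the exact feature-flag name is taken from the comment above and may differ in the final code):

// Only fix up the modification time when the destination can expose a
// partially written file during the copy; otherwise the complete file only
// appears at the end and already gets the right modtime then.
if f.Features().Partial {
	err = obj.SetModTime(ctx, src.ModTime(ctx))
}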

Another issue is related to the Accounting. I couldn't find a better way of integrating it with the OpenChunkWriter. Currently, acc.AccountRead is called when a chunk finishes - which causes a "delay" in the reporting.

What we want is something like this

func (acc *Account) WrapStream(in io.Reader) io.Reader {

That wraps an io.ReadSeeker instead.

I'd be happy to leave a FIXME in the code and address this later if you want.
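A minimal sketch of the shape this wrapper might take (the accountedReadSeeker type is illustrative, and acc.AccountRead is the call mentioned above with its exact signature assumed):

// Wrap an io.ReadSeeker so bytes are accounted as they are read rather than
// only when a whole chunk completes.
type accountedReadSeeker struct {
	in  io.ReadSeeker
	acc *accounting.Account
}

func (r *accountedReadSeeker) Read(p []byte) (int, error) {
	n, err := r.in.Read(p)
	r.acc.AccountRead(n) // report progress as data flows
	return n, err
}

func (r *accountedReadSeeker) Seek(offset int64, whence int) (int64, error) {
	// A seek back to the start (hashing, SDK retries) means the same bytes get
	// read and accounted again - avoiding that double counting is the open
	// problem discussed below.
	return r.in.Seek(offset, whence)
}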


vitorog commented Aug 8, 2023

Amazing performance :-) Is 2.5 GiB/s network saturated? As we are doing down and up or would 5GiB/s be network saturated?

I think 50 Gbps is the aggregate bandwidth: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html
That would explain the speed, but honestly I'm not 100% sure... 😅

I think avoiding the SetModTime at the end if the Fs we are copying to does not have the f.Features().Partial feature flag set is probably the right thing to do. Fs without Partial only have the complete file at the end so must get the correct modtime at that point too. Sound OK?

I reverted my previous commit and implemented it as you said.

What we want is something like this

func (acc *Account) WrapStream(in io.Reader) io.Reader {

That wraps an io.ReadSeeker instead.

I'd be happy to leave a FIXME in the code and address this later if you want.

I tried that approach, but I couldn't make it work correctly, I think because of the seeks+reads.
For the S3 backend, we do at least one seek (+re-read) when calculating the chunk's md5, but it seems that the AWS SDK also makes extra seeks behind the scenes (noticed this during debugging).
For example: https://github.com/aws/aws-sdk-go/blob/main/aws/request/request.go#L307
So, for now I left the FIXME as you suggested. 😅

Finally, I had to add a new flag "multi-thread-chunk-size" to set the chunk size for the "openChunkWriter with writerAt" adapter.

Do you think we could merge this PR? If yes, should I rebase/squash some of the commits?


ncw commented Aug 9, 2023

Amazing performance :-) Is 2.5 GiB/s network saturated? As we are doing down and up or would 5GiB/s be network saturated?

I think 50 Gbps is the aggregate bandwidth: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html That would explain the speed, but honestly I'm not 100% sure... 😅

I think from reading the doc that yes we are saturating the network :-)

I think avoiding the SetModTime at the end if the Fs we are copying to does not have the f.Features().Partial feature flag set is probably the right thing to do. Fs without Partial only have the complete file at the end so must get the correct modtime at that point too. Sound OK?

I reverted my previous commit and implemented it as you said.

👍

What we want is something like this

func (acc *Account) WrapStream(in io.Reader) io.Reader {

That wraps an io.ReadSeeker instead.
I'd be happy to leave a FIXME in the code and address this later if you want.

I tried that approach, but I couldn't make it work correctly, I think because of the seeks+reads. For the S3 backend, we do at least one seek (+re-read) when calculating the chunk's md5, but it seems that the AWS SDK also makes extra seeks behind the scenes (noticed this during debugging). For example: https://github.com/aws/aws-sdk-go/blob/main/aws/request/request.go#L307 So, for now I left the FIXME as you suggested. 😅

That's fine. Can't fix all of the world's problems in one PR :-)

Finally, I had to add a new flag "multi-thread-chunk-size" to set the chunk size for the "openChunkWriter with writerAt" adapter.

Do you think we could merge this PR? If yes, should I rebase/squash some of the commits?

I think we are heading towards merge yes :-)

Can you rebase off master and then squash into logical changes like the plan we originally agreed on? That would be perfect :-)

I think what I'd like to do then is pull this locally for the last review. I'll fix up any little things I notice when doing the final review if that is OK with you? Then I'll merge.

We can then work on the next bits!

… available rclone#7056

If the feature OpenChunkWriter is not available, multithread tries to create an adapter from OpenWriterAt to OpenChunkWriter.
@vitorog vitorog changed the title Add OpenChunkWriter and ChunkWriter interfaces and refactor s3.uploadMultpart to use them Add OpenChunkWriter and ChunkWriter interfaces and refactor s3.uploadMultpart and multiThreadCopy to use them Aug 10, 2023

vitorog commented Aug 10, 2023

I think we are heading towards merge yes :-)

Can you rebase off master and then squash into logical changes like the plan we originally agreed. That would be perfect :-)

I think what I'd like to do then is pull this locally for the last review. I'll fix up any little things I notice when doing the final review if that is OK with you? Then I'll merge.

We can then work on the next bits!

Sounds awesome, thanks!
I rebased and squashed the commits.

@ncw ncw left a comment

Thank you - this code is looking excellent :-)

I will merge this now.

Do you want to work on the final part of this

Refactor multipart upload in s3 to use version in operations

In another PR?

Or I can have a go at that if you want.

1.64 is going to be a very good release :-)

@ncw ncw merged commit 181feca into rclone:master Aug 12, 2023
10 checks passed

vitorog commented Aug 14, 2023

Thank you - this code is looking excellent :-)

I will merge this now.

1.64 is going to be a very good release :-)

Thanks, @ncw !

Do you want to work on the final part of this

Refactor multipart upload in s3 to use version in operations

In another PR?

Or I can have a go at that if you want.

I can give it a try (though I could use some pointers).
I also want to try implementing @msays2000 's suggestion for resuming the multipart uploads.


ncw commented Aug 14, 2023

Do you want to work on the final part of this

Refactor multipart upload in s3 to use version in operations

In another PR?
Or I can have a go at that if you want.

I can give it a try (though I could use some pointers).

I had a look at what this would involve and I think this bit isn't properly thought through yet! I will have a go with it, but I think it is going to be more complicated than I first thought!

I also want to try implementing @msays2000 's suggestion for resuming the multipart uploads.

I'd be interested to see what that looks like.

Note that there are some unmerged patches for resuming uploads in the backlog...


Rootax commented Aug 14, 2023

Wait, does this work with every backend, or only S3? Good work either way!

EDIT: Only S3 it seems - an upload to Dropbox is still using only one stream/connection. Sorry, I misunderstood the commit.

@jorjao81

As I understand it, this PR creates an interface and implements it for S3 only (and for things that implement OpenWriterAt - probably only filesystems like the local and SMB backends). As other backends implement the interface, they will also get the capability. This has to be done one by one, and of course it requires the backend provider to even support it (I looked into NetStorage because that would be a use case for us, but apparently there is no way).

@ncw ncw mentioned this pull request Aug 15, 2023

ncw commented Aug 15, 2023

There is more to do on this - I'm going to discuss on #7056 with some things to test :-)
