Add OpenChunkWriter and ChunkWriter interfaces and refactor s3.uploadMultipart and multiThreadCopy to use them #7154
Conversation
@ncw FYI: @vitorog is my work colleague; we are working together to get this done. You might also see commits from @AffDNeto and @sysedwinistrator :) Tomorrow we will try to do the same thing @vitorog did here for S3 on another provider, probably GCS, to further validate the interface.
After looking at the code a bit, I saw that the way GCP does multipart is basically by implementing the S3 API, and this is already supported by configuring a GCS bucket to use S3, so I'll leave that out for now.
If we can work out whether we need an … . I've just done the 1.63 release, so this is a great time in the dev cycle for experimental code!
This is looking nice! I put a few comments inline.
We should probably sort out exactly what goes in which commit at some point but that can wait until the end easily enough.
backend/local/local.go (outdated)

    if err != nil {
        return -1, errors.New("failed to write chunk")
    }
    fs.Debugf("", "wrote chunk %v with %v bytes", chunkNumber, n)
It's probably worth caching the result of src.Remote() so you can use it here, e.g. fs.Debugf(w.remote, ...); this will make the logs much easier to read!
Done!
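A rough sketch of that caching suggestion. The localChunkWriter field and helper names are assumptions, and rclone's real fs.Debugf is replaced by plain string formatting so the sketch stays self-contained:

```go
package main

import "fmt"

// objectInfo stands in for rclone's fs.ObjectInfo.
type objectInfo struct{ remote string }

func (o objectInfo) Remote() string { return o.remote }

// localChunkWriter caches src.Remote() once at open time instead of
// calling it (or logging with an empty object name) on every chunk.
type localChunkWriter struct {
	remote string // cached from src.Remote()
}

func newLocalChunkWriter(src objectInfo) *localChunkWriter {
	return &localChunkWriter{remote: src.Remote()}
}

// logWrote formats the log line the way fs.Debugf(w.remote, ...) would,
// with the object name prefixed so logs are easy to read.
func (w *localChunkWriter) logWrote(chunkNumber, n int) string {
	return fmt.Sprintf("%s: wrote chunk %v with %v bytes", w.remote, chunkNumber, n)
}

func main() {
	w := newLocalChunkWriter(objectInfo{remote: "dir/file.bin"})
	fmt.Println(w.logWrote(0, 1024))
}
```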
backend/local/local.go (outdated)

    }

    func (w *localChunkWriter) Abort() error {
        // TODO: Is just closing enough?
We probably want to delete the file - that is what s3 etc will do.
Updated the code to delete the file.
backend/local/local.go (outdated)

    @@ -1374,7 +1374,7 @@ type localChunkWriter struct {
     }

     func (w *localChunkWriter) WriteChunk(chunkNumber int, reader []byte) (int, error) {
    -    offset := int64(chunkNumber) * w.chunkSize
    +    offset := int64(chunkNumber-1) * w.chunkSize
We should probably start the chunks from 0, otherwise we'll be confusing all the 0-based Go programmers until the end of time ;-)
That's probably a good idea 😅
Sorry, I was thinking of the S3 API while implementing this (where part numbers start at 1).
I updated the code to start chunks from 0.
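The convention settled on here can be sketched as: chunk numbers are 0-based in the generic code, and converted to the S3 API's 1-based part numbers only at the backend boundary. The chunk size and helper names below are illustrative, not rclone's actual code:

```go
package main

import "fmt"

// chunkSize is an arbitrary example value (5 MiB).
const chunkSize int64 = 5 * 1024 * 1024

// offsetFor maps a 0-based chunk number to its byte offset in the file.
func offsetFor(chunkNumber int) int64 {
	return int64(chunkNumber) * chunkSize
}

// s3PartNumber converts the 0-based chunk number to the 1-based part
// number that S3 multipart uploads expect.
func s3PartNumber(chunkNumber int) int {
	return chunkNumber + 1
}

func main() {
	for _, c := range []int{0, 1, 2} {
		fmt.Printf("chunk %d: offset %d, s3 part %d\n", c, offsetFor(c), s3PartNumber(c))
	}
}
```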
Are we going to remove support for … ? Just thinking about how this would work for the local backend: it would use more memory with more buffers, but it would probably leave the files less fragmented, as it would be writing small chunks but all close to each other, which the OS is likely to buffer in RAM. So as far as the local backend is concerned this is probably an improvement, and we should ditch the … .

I note that the conversion from OpenWriterAt to ChunkWriter is generic. I say this because there is a PR in the pipeline adding OpenWriterAt to the smb backend, so it would be nice not to re-write it there! So you could make the code in local just instantiate an adaptor to implement the interface.
Multithread copies can't do unknown sized copies at the moment, so this needs a little care. I guess what we are factoring out is not the entire multithread copying routine but just the guts of moving the chunks around.

To think about: memory management (mmap, memory pools etc). We currently manage this for the s3 etc backends and it is quite important, so we need the same controls in the operations multithread routines. Though we could turn … .

Note also that we have two uses of multithread copy in the s3 backend: one for multipart uploads and the other for server side copies of large files. They use pretty much the same code structure but don't need to move actual bytes. I don't know if this can be factored out; probably not.
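The generic adaptor idea mentioned above could look something like this: anything supporting io.WriterAt can be presented as a chunk writer by computing the offset from the chunk number. Interface and type names here are assumptions, not rclone's actual definitions:

```go
package main

import (
	"fmt"
	"io"
)

// ChunkWriter is a simplified version of the proposed interface.
type ChunkWriter interface {
	WriteChunk(chunkNumber int, data []byte) (int, error)
}

// writerAtChunkWriter adapts any io.WriterAt into a ChunkWriter by
// turning 0-based chunk numbers into byte offsets.
type writerAtChunkWriter struct {
	w         io.WriterAt
	chunkSize int64
}

func (a writerAtChunkWriter) WriteChunk(chunkNumber int, data []byte) (int, error) {
	return a.w.WriteAt(data, int64(chunkNumber)*a.chunkSize)
}

// byteSliceWriterAt is a toy WriterAt backed by a fixed buffer, just to
// exercise the adaptor; a real backend would open a file instead.
type byteSliceWriterAt struct{ buf []byte }

func (b *byteSliceWriterAt) WriteAt(p []byte, off int64) (int, error) {
	return copy(b.buf[off:], p), nil
}

func main() {
	dst := &byteSliceWriterAt{buf: make([]byte, 8)}
	var cw ChunkWriter = writerAtChunkWriter{w: dst, chunkSize: 4}
	cw.WriteChunk(1, []byte("WXYZ")) // chunks can arrive out of order
	cw.WriteChunk(0, []byte("ABCD"))
	fmt.Println(string(dst.buf))
}
```

Because the adaptor only needs WriteAt, the same code would serve the local backend and the upcoming smb OpenWriterAt support without duplication.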
IMHO reader is a better choice: …
Yes, noted.
There are two reasons the SDK needs to seek: …
So I suspect … . If we were to implement this naively by seeking an incoming data stream (which is fairly easy: rclone has a way of opening objects as …) then … . So I think this is likely going to need to be a memory backed buffer anyway. However, at some point I'd like to switch to more of a scatter-gather memory buffer with a lot of 1MB pages, say, which would fit well within the … .
Yes. Though see above re retries! I had a look to see what the SDKs are expecting: …
So I think … .
Damn, I keep forgetting about the retries. So I guess you are right, … .
hi @ncw @jorjao81, based on the discussion I refactored the ChunkWriter interface to use an io.ReadSeeker. I also updated the multi-thread copy to use the new interfaces, based on @jorjao81's PR in #7061, and fixed the build and lint errors (I think).
@ncw I implemented an adapter (openChunkWriterFromOpenWriterAt) in multithread.go. From my understanding, with this adapter we don't even need to implement OpenChunkWriter in the local backend, right?
69ce928 to 7a05ca0
I had a look through the OpenWriterAt adapter - looks great :-)
fs/operations/multithread.go (outdated)

    func (w writerAtChunkWriter) Close() error {
        return nil // NOP
Doesn't this need to close the w.writerAt, since we opened it in openChunkWriterFromOpenWriterAt?
Nice catch, thanks. I updated the PR.
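The fix discussed here, sketched with stand-in types (the real code holds an fs.WriterAtCloser): since the adaptor opened the underlying writer itself, its Close must forward to it rather than being a no-op:

```go
package main

import "fmt"

// closeCounter stands in for fs.WriterAtCloser so the sketch is runnable.
type closeCounter struct{ closed bool }

func (c *closeCounter) Close() error {
	c.closed = true
	return nil
}

// writerAtChunkWriter is a simplified version of the adaptor.
type writerAtChunkWriter struct {
	writerAt *closeCounter
}

// Close forwards to the underlying writer instead of returning nil;
// the adaptor opened it, so nobody else will close it.
func (w writerAtChunkWriter) Close() error {
	return w.writerAt.Close()
}

func main() {
	under := &closeCounter{}
	w := writerAtChunkWriter{writerAt: under}
	if err := w.Close(); err != nil {
		panic(err)
	}
	fmt.Println(under.closed)
}
```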
fs/operations/multithread.go (outdated)

        return obj.Remove(w.ctx)
    }

    func openChunkWriterFromOpenWriterAt(openWriterAt func(ctx context.Context, remote string, size int64) (fs.WriterAtCloser, error), writeBufferSize int64, streams int, f fs.Fs) func(ctx context.Context, src fs.ObjectInfo, options ...fs.OpenOption) (chunkSizeResult int64, writer fs.ChunkWriter, err error) {
That function definition is a bit of a mouthful! Perhaps we should define some aliases in fs/features.go. Anyway, it is fine, so don't do that for the moment; that can be a job for another day!
Great.
I've re-run the CI so you can check in a moment.
Your adapter looks great. I made a couple of comments above, but it is definitely the right approach. And we don't need the … .
Thanks for the review and comments @ncw. After running some tests, I noticed an issue with the OpenChunkWriter interface: …
I'm asking this because the multiThreadCopy function is defined as: …
where the remote refers to the destination file. If we use src.Remote() in OpenChunkWriter, it will only work if the destination file has the same name (remote) as the source (otherwise it fails with …).
Traditionally you'd use … . However, making it line up with the multi-thread copy code makes it slightly easier to use, so adding a … .
Executed some tests on an m6in.8xlarge EC2 instance (network bandwidth of 50 Gbps).
* 250-threads-250M-chunk-size: crashed with an "out of memory" error
Transfer of the same file (single threaded): …
hi @ncw, after running more tests I noticed an issue related to this setModTime call: for S3, the setModTime implementation (https://github.com/rclone/rclone/blob/master/backend/s3/s3.go#L5130) makes a copy of the object over itself. However, after doing a multi-threaded copy we don't need to do this, since the metadata should already be correct.
Another issue is related to the accounting. I couldn't find a better way of integrating it with the OpenChunkWriter.
Keep up the good work guys, looking forward to these parallel chunked download/upload interfaces. For the chunked upload interface, please also consider an interface to list and use meta information about already uploaded parts. This can be used to resume a multipart upload after rclone is restarted for whatever reason, and it is smart to skip uploading parts that were already uploaded. Example: #7189
Amazing performance :-) Is 2.5 GiB/s network saturated, given we are doing both a download and an upload, or would 5 GiB/s be network saturated?
Yes, this SetModTime is left over from copying to the local file system, where it is necessary.
I think avoiding the SetModTime at the end if the Fs we are copying to does not have the … . I think your patch is probably OK too though!
What we want is something like this: rclone/fs/accounting/accounting.go, line 564 (at commit 4444037).
That wraps an … . I'd be happy to leave a FIXME in the code and address this later if you want.
I think 50 Gbps is the aggregate bandwidth: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html
I reverted my previous commit and implemented it as you said.
I tried that approach, but I couldn't make it work correctly, I think because of the seeks + reads. Finally, I had to add a new flag, multi-thread-chunk-size, to set the chunk size for the "openChunkWriter with writerAt" adapter.
Do you think we could merge this PR? If yes, should I rebase/squash some of the commits?
I think from reading the doc that yes we are saturating the network :-)
👍
That's fine. Can't fix all of the world's problems in one PR :-)
I think we are heading towards merge, yes :-) Can you rebase off master and then squash into logical changes like the plan we originally agreed? That would be perfect :-) What I'd like to do then is pull this locally for the last review. I'll fix up any little things I notice when doing the final review, if that is OK with you? Then I'll merge, and we can then work on the next bits!
… available rclone#7056: if the feature OpenChunkWriter is not available, multithread tries to create an adapter from OpenWriterAt to OpenChunkWriter.
Sounds awesome, thanks!
Thank you - this code is looking excellent :-)
I will merge this now.
Do you want to work on the final part of this (refactor multipart upload in s3 to use the version in operations) in another PR? Or I can have a go at that if you want.
1.64 is going to be a very good release :-)
Thanks, @ncw!
I can give it a try (though I could use some pointers).
I had a look at what this would involve and I think this bit isn't properly thought through yet! I will have a go at it, but I think it is going to be more complicated than I first thought!
I'd be interested to see what that looks like. Note that there are some unmerged patches for resuming uploads in the backlog...
Wait, does this work with every backend, or only S3? Good work either way! EDIT: only S3 it seems; upload to Dropbox is still using only one stream/connection. Sorry, I misunderstood the commit.
As I understand it, this PR creates an interface and implements it for S3 only (and for things that implement OpenWriterAt, probably only filesystems like local and the SMB backend). As other backends implement the interface, they will also gain the capability. This has to be done one by one, and of course it requires the backend provider to even support it (I looked into NetStorage because that would be a use case for us, but apparently there is no way).
There is more to do on this - I'm going to discuss on #7056 with some things to test :-)
What is the purpose of this change?
Following the discussions in #7061 and #7056 this change is meant to pave the way for fast parallel transfers between remotes.
It defines the interfaces suggested by @jorjao81 and implements them for the S3 and Local backend.
Also, the s3.uploadMultipart logic was refactored to use OpenChunkWriter and ChunkWriter (S3 multipart uploads should behave the same as before).
In https://github.com/rclone/rclone/pull/7061, @ncw suggested the following roadmap: …
Therefore, the next steps would be: …
Looking forward to some feedback. I'm not experienced with Go, so apologies for any (dumb) mistake. 😅
Was the change discussed in an issue or in the forum before?
#7061
#7056
Checklist