Metadata (UID/GID) not set properly for multipart downloads from S3 to Unix file system #7424

Closed
ppapadopoulos opened this issue Nov 15, 2023 · 7 comments
Labels: bug, Remote: Local filesystem, Support Contract (Issues made for customers with support contracts)

@ppapadopoulos

The associated forum post URL from https://forum.rclone.org

https://forum.rclone.org/t/uid-gid-not-restored-for-objects-stored-in-glacier/42885/3

What is the problem you are having with rclone?

When downloading large files (larger than --multi-thread-cutoff), the date and permissions are set properly, but the UID/GID is not.
Setting --multi-thread-cutoff to be larger than the file size (forcing a single-threaded download) results in correct metadata.
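
Until this is fixed, the problem can be avoided by forcing a single-threaded download. A minimal sketch of the two workarounds used later in this thread, reusing the remote and paths from this report:

# force a single-threaded download by raising the cutoff above the file size
rclone --metadata --multi-thread-cutoff 1024Mi --include pero.scaff.fa.pac copy s3-backup:testing /tmp/ppapadop

# or disable multi-thread streams altogether
rclone --metadata --multi-thread-streams 0 --include pero.scaff.fa.pac copy s3-backup:testing /tmp/ppapadop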

What is your rclone version (output from rclone version)

rclone v1.64.2-DEV
- os/version: rocky 8.8 (64 bit)
- os/kernel: 4.18.0-477.15.1.el8_8.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.4
- go/linking: dynamic
- go/tags: none

Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

Rocky Linux 8.8

Which cloud storage system are you using? (e.g. Google Drive)

Amazon S3

The command you were trying to run (e.g. rclone copy /tmp remote:tmp)

rclone copy

# rclone --metadata --links --transfers 2 --checkers 32  --include pero.scaff.fa.pac  copy s3-backup:testing /tmp/ppapadop

A log from the command with the -vv flag (e.g. output from rclone -vv copy /tmp remote:tmp)

# $rclone --include pero.scaff.fa.pac -vv  copy s3-backup:testing /tmp/ppapadop
2023/11/15 13:27:59 DEBUG : rclone: Version "v1.64.2-DEV" starting with parameters ["rclone" "--config" "/root/rcs3/POC/config/rclone.conf" "--s3-shared-credentials-file" "/root/rcs3/POC/config/credentials" "--metadata" "--links" "--transfers" "2" "--checkers" "32" "--include" "pero.scaff.fa.pac" "-vv" "copy" "s3-backup:testing" "/tmp/ppapadop"]
2023/11/15 13:27:59 DEBUG : Creating backend with remote "s3-backup:testing"
2023/11/15 13:27:59 DEBUG : Using config file from "/root/rcs3/POC/config/rclone.conf"
2023/11/15 13:27:59 DEBUG : Creating backend with remote "s3-native:ppapadop-tmpstore1-uci-bkup-bucket/testing"
2023/11/15 13:27:59 DEBUG : s3-native: detected overridden config - adding "{DSqVk}" suffix to name
2023/11/15 13:27:59 DEBUG : fs cache: renaming cache item "s3-native:ppapadop-tmpstore1-uci-bkup-bucket/testing" to be canonical "s3-native{DSqVk}:ppapadop-tmpstore1-uci-bkup-bucket/testing"
2023/11/15 13:27:59 DEBUG : fs cache: renaming cache item "s3-backup:testing" to be canonical "s3-native{DSqVk}:ppapadop-tmpstore1-uci-bkup-bucket/testing"
2023/11/15 13:27:59 DEBUG : Creating backend with remote "/tmp/ppapadop"
2023/11/15 13:27:59 DEBUG : local: detected overridden config - adding "{b6816}" suffix to name
2023/11/15 13:27:59 DEBUG : fs cache: renaming cache item "/tmp/ppapadop" to be canonical "local{b6816}:/tmp/ppapadop"
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: Need to transfer - File not found at Destination
2023/11/15 13:27:59 DEBUG : Local file system at /tmp/ppapadop: Waiting for checks to finish
2023/11/15 13:27:59 DEBUG : Local file system at /tmp/ppapadop: Waiting for transfers to finish
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: multi-thread copy: write buffer set to 131072
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: multi-thread copy: using backend concurrency of 4 instead of --multi-thread-streams 4
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: Starting multi-thread copy with 10 chunks of size 64Mi with 4 parallel streams
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 4/10 (201326592-268435456) size 64Mi starting
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 1/10 (0-67108864) size 64Mi starting
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 2/10 (67108864-134217728) size 64Mi starting
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 3/10 (134217728-201326592) size 64Mi starting
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 3
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 0
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 2
2023/11/15 13:27:59 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 1
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 1/10 (0-67108864) size 64Mi finished
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 5/10 (268435456-335544320) size 64Mi starting
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 4/10 (201326592-268435456) size 64Mi finished
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 6/10 (335544320-402653184) size 64Mi starting
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 4
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 5
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 3/10 (134217728-201326592) size 64Mi finished
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 7/10 (402653184-469762048) size 64Mi starting
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 2/10 (67108864-134217728) size 64Mi finished
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 8/10 (469762048-536870912) size 64Mi starting
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 6
2023/11/15 13:28:03 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 7
2023/11/15 13:28:07 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 6/10 (335544320-402653184) size 64Mi finished
2023/11/15 13:28:07 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 9/10 (536870912-603979776) size 64Mi starting
2023/11/15 13:28:08 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 5/10 (268435456-335544320) size 64Mi finished
2023/11/15 13:28:08 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 10/10 (603979776-618791129) size 14.125Mi starting
2023/11/15 13:28:08 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 8
2023/11/15 13:28:08 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 8/10 (469762048-536870912) size 64Mi finished
2023/11/15 13:28:08 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 7/10 (402653184-469762048) size 64Mi finished
2023/11/15 13:28:08 DEBUG : pero.scaff.fa.pac.yinalon3.partial: writing chunk 9
2023/11/15 13:28:09 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 10/10 (603979776-618791129) size 14.125Mi finished
2023/11/15 13:28:12 DEBUG : pero.scaff.fa.pac: multi-thread copy: chunk 9/10 (536870912-603979776) size 64Mi finished
2023/11/15 13:28:12 DEBUG : pero.scaff.fa.pac: Finished multi-thread copy with 10 parts of size 64Mi
2023/11/15 13:28:14 DEBUG : pero.scaff.fa.pac: md5 = b436224037c28753190c0628a7269eb0 OK
2023/11/15 13:28:14 DEBUG : pero.scaff.fa.pac.yinalon3.partial: renamed to: pero.scaff.fa.pac
2023/11/15 13:28:14 INFO  : pero.scaff.fa.pac: Multi-thread Copied (new)
2023/11/15 13:28:14 INFO  : 
Transferred:   	  590.125 MiB / 590.125 MiB, 100%, 39.337 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:        15.8s

2023/11/15 13:28:14 DEBUG : 11 go routines active

Steps to Reproduce

  1. Upload a file to S3 that is larger than the --multi-thread-cutoff limit (make sure to include --metadata)
  2. Check metadata on S3 to verify upload and metadata
  3. Download the file back from S3 to a different location
  4. Verify that UID/GID is incorrect
  5. Erase local copy of downloaded data
  6. Re-download the file back from S3, but set --multi-thread-cutoff to a size larger than the file itself
  7. Verify that UID/GID is correct

Details:

  1. Upload a file greater than the 256Mi --multi-thread-cutoff limit
# rclone --metadata --links --include pero.scaff.fa.pac copy /pub/ppapadop/testdir s3-backup:testing
  2. Verify the metadata in S3
# rclone  --metadata --links  lsf --format "ptTM" s3-backup:testing
pero.scaff.fa.pac;2019-04-05 12:37:55;STANDARD;{"atime":"2023-06-09T11:34:03-07:00","btime":"2023-11-15T16:34:30Z","content-type":"application/octet-stream","gid":"1698224","mode":"100644","mtime":"2019-04-05T12:37:55-07:00","tier":"STANDARD","uid":"1698224"}
  3. Download the data to a temporary location
# rclone  --metadata --links  --include pero.scaff.fa.pac  copy s3-backup:testing /tmp/ppapadop
  4. Verify that the UID/GID is incorrect (owned by root, not by uid=1698224 as it should be)
# ls -l /tmp/ppapadop
total 604292
-rw-r--r-- 1 root root 618791129 Apr  5  2019 pero.scaff.fa.pac
  5. Erase the local file
# /bin/rm /tmp/ppapadop/pero.scaff.fa.pac
  6. Re-download the data with --multi-thread-cutoff > size of file
# rclone --metadata --multi-thread-cutoff 1024Mi --include pero.scaff.fa.pac copy s3-backup:testing /tmp/ppapadop
  7. Verify that the UID/GID is now set correctly (owned by user/group 1698224 (ppapadop))
# ls -l /tmp/ppapadop
total 604292
-rw-r--r-- 1 ppapadop ppapadop 618791129 Apr  5  2019 pero.scaff.fa.pac
# id -u ppapadop
1698224

ncw (Member) commented Nov 16, 2023

I managed to replicate this

Setup

rclone test makefile 300M /tmp/300M
setfattr -n user.lettuce -v crispy /tmp/300M
rclone copy -vv -M /tmp/300M s3:rclone/

Verify the metadata is present on S3 (gid and lettuce):

$ rclone lsjson --stat -M s3:rclone/300M
{
	"Path": "300M",
	"Name": "300M",
	"Size": 314572800,
	"MimeType": "application/octet-stream",
	"ModTime": "2023-10-27T16:30:56.891738670+01:00",
	"IsDir": false,
	"Tier": "STANDARD",
	"Metadata": {
		"atime": "2023-11-16T17:42:28.790799057Z",
		"btime": "2023-11-16T17:42:57Z",
		"content-type": "application/octet-stream",
		"gid": "1000",
		"lettuce": "crispy",
		"mode": "100664",
		"mtime": "2023-10-27T16:30:56.89173867+01:00",
		"tier": "STANDARD",
		"uid": "1000"
	}
}

Download with multipart streams: S3 metadata missing :-(

$ rclone copyto -Mvv s3:rclone/300M /tmp/300M.copy
$ rclone lsjson --stat -M /tmp/300M.copy
{
	"Path": "300M.copy",
	"Name": "300M.copy",
	"Size": 314572800,
	"MimeType": "application/octet-stream",
	"ModTime": "2023-10-27T16:30:56.891738670+01:00",
	"IsDir": false,
	"Metadata": {
		"atime": "2023-11-16T17:48:42.00153649Z",
		"btime": "2023-11-16T17:48:03.358003382Z",
		"gid": "1000",
		"mode": "100664",
		"mtime": "2023-10-27T16:30:56.89173867+01:00",
		"uid": "1000"
	}
}

Download without multithread streams - works!

$ rclone copyto --multi-thread-streams 0 -Mvv s3:rclone/300M /tmp/300M.copy.no-multithread
$ rclone lsjson --stat -M /tmp/300M.copy.no-multithread 
{
	"Path": "300M.copy.no-multithread",
	"Name": "300M.copy.no-multithread",
	"Size": 314572800,
	"MimeType": "application/octet-stream",
	"ModTime": "2023-10-27T16:30:56.891738670+01:00",
	"IsDir": false,
	"Metadata": {
		"atime": "2023-11-16T17:42:28.790799057Z",
		"btime": "2023-11-16T17:49:40.840851377Z",
		"content-type": "application/octet-stream",
		"gid": "1000",
		"lettuce": "crispy",
		"mode": "100664",
		"mtime": "2023-10-27T16:30:56.89173867+01:00",
		"tier": "STANDARD",
		"uid": "1000"
	}
}

I think this is a bug in the local filesystem.

In fact it will manifest with any backend which supports the WriterAt interface, which doesn't have a way to pass metadata :-(

So to fix this we will need a change to the WriterAt interface so that we can supply metadata.

The only backend which supports both WriterAt and metadata is the local backend.

It also needs to be in the integration tests!

Note to self: --metadata-set doesn't seem to be working either with multipart copies or maybe at all... Just finger trouble, forgetting the --metadata flag - maybe it should warn?

ncw added this to the v1.65 milestone Nov 16, 2023
ncw modified the milestones: v1.65 → v1.66 Jan 3, 2024
ncw modified the milestones: v1.66 → v1.67 Mar 10, 2024
ncw added a commit that referenced this issue May 9, 2024
Before this change multipart downloads to the local disk with
--metadata failed to have their metadata set properly.

This was because the OpenWriterAt interface doesn't receive metadata
when creating the object.

This patch fixes the problem by using the recently introduced
Object.SetMetadata method to set the metadata on the object after the
download has completed (when using --metadata). If the backend we are
copying to is using OpenWriterAt but the Object doesn't support
SetMetadata then it will write an ERROR level log but complete
successfully. This should not happen at the moment as only the local
backend supports metadata and OpenWriterAt but it may in the future.

It also adds a test to check metadata is preserved when doing
multipart transfers.

Fixes #7424
ncw (Member) commented May 9, 2024

@ppapadopoulos I've finally found some time to fix this!

I think this should work properly now - any testing much appreciated - thank you.

v1.67.0-beta.7927.c3eb9ca13.fix-7424-metadata on branch fix-7424-metadata
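
For anyone testing the beta, a quick check is to repeat the multi-thread download with --metadata and confirm the uid/gid on the local copy. A minimal sketch, assuming the same remote and test file from the original report:

# multi-thread download (file is larger than the default 256Mi cutoff)
rclone --metadata --include pero.scaff.fa.pac copy s3-backup:testing /tmp/ppapadop

# the local copy should now report the original uid/gid
rclone lsjson --stat -M /tmp/ppapadop/pero.scaff.fa.pac
ls -ln /tmp/ppapadop/pero.scaff.fa.pac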

ncw added the Support Contract label May 10, 2024

Heads up @rclone/support - the "Support Contract" label was applied to this issue.

ppapadopoulos (Author) commented three times on May 10, 2024 via email

ncw closed this as completed in 6a0a54a May 14, 2024
ncw (Member) commented May 14, 2024

Thank you for testing @ppapadopoulos

I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.67.

If you need anything else please drop us an email :-)
