Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bucket Based Remotes: empty directories #3453

Closed
2 of 6 tasks
yparitcher opened this issue Aug 18, 2019 · 15 comments
Closed
2 of 6 tasks

Bucket Based Remotes: empty directories #3453

yparitcher opened this issue Aug 18, 2019 · 15 comments

Comments

@yparitcher
Copy link
Contributor

yparitcher commented Aug 18, 2019

Progress

  1. error on empty directory
  • azureblob
  • b2
  • gcs
  • qingstor
  • s3
  • swift

Current Situation:

Rclone does not officially support empty directories on BBR, however as many parts of the rclone api use file based systems there are many edge cases where rclone acts as if empty directories exist on BBR.
this causes non existent paths to be validated as rclone cant tell the difference between them and empty directories. this can cause inconsistencies with mount etc.
this has been discussed before, see #3427 #3415

Proposed solutions:

1. enforce non support of empty directories on BBR:

  • make empty directories be an invalid path
  • must fix list commands
  • this will need work to make consistent with mount and similar commands which expect a filesystem (empty directories)

2. support empty directories on BBR via a hidden empty file:

  • this is already done by some clients on other remotes (B2 web client, some S3 clients)
  • need standardization to ensure that rclone uses the same files as others
  • require refactoring of many BBR to include this logic
  • cause some confusion about these files, but make filesystem commands (mount) work better
  • requires more api calls - more expensive

3. 1 & 2 via a flag:

  • this give the most options but will be slightly more confusing to use and require duplicate codepaths

if i missed anything please comment and i will add them in.

this will be a significant decision. rclone should have an policy which will help unified implementations for different backends.

i do not mind helping will this but it will definitely require help from many people.
thanks to all those that have made these remotes work until now
@ncw Please review

@ncw ncw added this to the v1.50 milestone Aug 21, 2019
@ncw
Copy link
Member

ncw commented Aug 21, 2019

I think we should probably do 1. first - no flag needed.

As for 2. rclone mostly just contents itself with ignoring the 0 length directory files other programs make. Making rclone actively put them in would be quite a lot of work so if we want to do it, should probably be in phase 2. I think it needs a backend flag so you can set it on a per backend basis.

Unifying code between BBR backends will be important!

There are 6 BBR b2, s3, gcs, swift, qingstor, azureblob. It is possible to get free accounts for all of those except for gcs and azureblob. (you can test s3 against minio or alibabacloud). gcs has a free tier and azureblob may too. Signing up for free services will probably mean putting your credit card in though!

I'd love to have your help with this :-)

@rclone rclone deleted a comment from Hermain Aug 23, 2019
@yparitcher
Copy link
Contributor Author

besides for the azureblob, b2, s3, gcs, swift, qingstor that are bucket based Features().BucketBased

googlephotos and hubic do not have the empty directory flag set: Features().CanHaveEmptyDirectories
what is their status regarding this?

@yparitcher
Copy link
Contributor Author

Right now on BBR Mkdir is a wrapper for createbucket and always (if the bucket is valid) returns without error.

in order to not support empty directories,i would like to make Mkdir return an error on BBR when called on a non existent directory. this will require rewriting some tests etc that expect Mkdir to always succeed.

the same goes for Rmdir

does this sound reasonable?

@yparitcher
Copy link
Contributor Author

operations.Rmdirs() has different behavior w & w/o --fast-list/listR:

func Rmdirs(ctx context.Context, f fs.Fs, dir string, leaveRoot bool) error {
dirEmpty := make(map[string]bool)
dirEmpty[dir] = !leaveRoot
err := walk.Walk(ctx, f, dir, true, fs.Config.MaxDepth, func(dirPath string, entries fs.DirEntries, err error) error {
if err != nil {
fs.CountError(err)
fs.Errorf(f, "Failed to list %q: %v", dirPath, err)
return nil
}

  • without listR errors in walk.Walk() are trapped by the func passed to Walk() and are not propagated. (if a wrong directory is listed causing an error the error is printed here and the Rmdirs succeeds without error.)
  • with listR errors in walk.Walk() are not trapped by the func and are propagated. (if a wrong directory is listed causing an error Rmdirserrors and fails.)

@ncw
Copy link
Member

ncw commented Aug 29, 2019

besides for the azureblob, b2, s3, gcs, swift, qingstor that are bucket based Features().BucketBased

googlephotos and hubic do not have the empty directory flag set: Features().CanHaveEmptyDirectories
what is their status regarding this?

hubic is a thin wrapper around swift so that is correct.

I think the google photos is a mistake - you can have empty directories.

@ncw
Copy link
Member

ncw commented Aug 29, 2019

Right now on BBR Mkdir is a wrapper for createbucket and always (if the bucket is valid) returns without error.

in order to not support empty directories,i would like to make Mkdir return an error on BBR when called on a non existent directory. this will require rewriting some tests etc that expect Mkdir to always succeed.

the same goes for Rmdir

does this sound reasonable?

If the directory doesn't exist when we do rclone mkdir bbr:bucket/dir that is probably the moment we should be creating the zero length directory marker

I think returning an error will break too much stuff - we want the directory made, we don't want an error returned saying that we can't make the directory.

@ncw
Copy link
Member

ncw commented Aug 29, 2019

operations.Rmdirs() has different behavior w & w/o --fast-list/listR:

func Rmdirs(ctx context.Context, f fs.Fs, dir string, leaveRoot bool) error {
dirEmpty := make(map[string]bool)
dirEmpty[dir] = !leaveRoot
err := walk.Walk(ctx, f, dir, true, fs.Config.MaxDepth, func(dirPath string, entries fs.DirEntries, err error) error {
if err != nil {
fs.CountError(err)
fs.Errorf(f, "Failed to list %q: %v", dirPath, err)
return nil
}

  • without listR errors in walk.Walk() are trapped by the func passed to Walk() and are not propagated. (if a wrong directory is listed causing an error the error is printed here and the Rmdirs succeeds without error.)

That is so we do as much as possible and report a failure at the end.

  • with listR errors in walk.Walk() are not trapped by the func and are propagated. (if a wrong directory is listed causing an error Rmdirserrors and fails.)

Hmm, have you got a demo I can try for this?

@yparitcher
Copy link
Contributor Author

If the directory doesn't exist when we do rclone mkdir bbr:bucket/dir that is probably the moment we should be creating the zero length directory marker

exactly.

I think returning an error will break too much stuff - we want the directory made, we don't want an error returned saying that we can't make the directory.

i agree, just this is limiting 1. to be just in the list command.
also B2 does not allow objects ending in / so the directory marker would have to be a hidden subfile. this would require redoing all that stuff anyway to make it detect the markers properly.

i am not sure which way to go here.

yparitcher added a commit to yparitcher/rclone that referenced this issue Aug 29, 2019
yparitcher added a commit to yparitcher/rclone that referenced this issue Aug 29, 2019
yparitcher added a commit to yparitcher/rclone that referenced this issue Aug 29, 2019
@yparitcher
Copy link
Contributor Author

yparitcher commented Aug 29, 2019

Hmm, have you got a demo I can try for this?

try running the S3 integration tests on https://github.com/yparitcher/rclone/tree/BBR-empty-dir
the operations test purge will succeed without listR and fail with listR

also BBR give different behavior when run from the root or from a bucket you caught most of them in #3421 however some tests i had to change to work when running from a bucket, as the tests were designed for root

@shynome
Copy link

shynome commented Dec 20, 2020

I backup my postgres db directory to s3 storage with rclone, today I have verify the backup, it is broken because lose some empty directory like pg_tblspc, pg_replslot, I think we should add some warning for someone who sync empty directory to s3 storage, because today we have not implemented this feature

I like support empty directories on BBR via a hidden empty file

the full log
2020-12-20 13:26:14.201 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-12-20 13:26:14.201 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2020-12-20 13:26:14.207 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-12-20 13:26:14.222 UTC [1] LOG:  could not open directory "pg_tblspc": No such file or directory
2020-12-20 13:26:14.226 UTC [19] LOG:  database system was interrupted; last known up at 2020-12-20 13:09:08 UTC
2020-12-20 13:26:14.226 UTC [19] LOG:  creating missing WAL directory "pg_wal/archive_status"
2020-12-20 13:26:14.467 UTC [19] LOG:  could not open directory "pg_tblspc": No such file or directory
2020-12-20 13:26:14.468 UTC [19] LOG:  could not open directory "pg_tblspc": No such file or directory
2020-12-20 13:26:14.468 UTC [19] FATAL:  could not open directory "pg_replslot": No such file or directory
2020-12-20 13:26:14.469 UTC [1] LOG:  startup process (PID 19) exited with exit code 1
2020-12-20 13:26:14.469 UTC [1] LOG:  aborting startup due to startup process failure
2020-12-20 13:26:14.472 UTC [1] LOG:  database system is shut down

Jancis added a commit to Jancis/rclone that referenced this issue May 13, 2021
Jancis added a commit to Jancis/rclone that referenced this issue May 13, 2021
ncw added a commit to Jancis/rclone that referenced this issue Apr 28, 2023
- Report correct feature flag
- Fix test failures due to that
- don't output the root directory marker
- Don't create the directory marker if it is the bucket or root
- Create directories when uploading files
ncw pushed a commit that referenced this issue Apr 28, 2023
ncw added a commit that referenced this issue Apr 28, 2023
- Report correct feature flag
- Fix test failures due to that
- don't output the root directory marker
- Don't create the directory marker if it is the bucket or root
- Create directories when uploading files
ncw pushed a commit that referenced this issue Apr 28, 2023
ncw added a commit that referenced this issue Apr 28, 2023
- Report correct feature flag
- Fix test failures due to that
- don't output the root directory marker
- Don't create the directory marker if it is the bucket or root
- Create directories when uploading files
ncw added a commit that referenced this issue May 5, 2023
Use Update to upload the directory markers
ncw added a commit that referenced this issue May 5, 2023
Use Update to upload the directory markers
ncw added a commit that referenced this issue May 5, 2023
ncw added a commit that referenced this issue May 7, 2023
Use Update to upload the directory markers
ncw added a commit that referenced this issue May 7, 2023
Use Update to upload the directory markers
ncw added a commit that referenced this issue May 7, 2023
joec4i added a commit to joec4i/rclone that referenced this issue May 22, 2023
* Support --swift-directory-markers for creation of Swift directory markers
* Update TestSwiftAIO to allow retries
joec4i added a commit to joec4i/rclone that referenced this issue May 22, 2023
* Support --swift-directory-markers for creation of Swift directory markers
* Update TestSwiftAIO to allow retries
joec4i added a commit to bigcommerce/rclone that referenced this issue May 28, 2023
* Support --swift-directory-markers for creation of Swift directory markers
* Update TestSwiftAIO to allow retries
miku added a commit to internetarchive/rclone that referenced this issue Jun 2, 2023
* master: (51 commits)
  build(deps): bump github.com/cloudflare/circl from 1.1.0 to 1.3.3
  sftp: don't check remote points to a file if it ends with /
  sftp: don't stat directories before listing them
  pikpak: set the NoMultiThreading feature flag to disable multi-thread copy
  operations: Don't use multi-thread copy if the backend doesn't support it rclone#6915
  fs: add new backend feature NoMultiThreading
  ftp: Fix "501 Not a valid pathname." errors when creating directories
  ftp: fix "unsupported LIST line" errors on startup
  Add Janne Hellsten to contributors
  operations: implement uploads to temp name with --inplace to disable
  fs: Implement PartialUploads feature flag
  sftp: fix move to allow overwriting existing files
  fs: when creating new fs.OverrideRemotes don't layer overrides if not needed
  fs: fix String() method on fs.OverrideRemote
  uptobox: add --uptobox-private flag to make all uploaded files private
  azureblob: empty directory markers rclone#3453
  gcs: fix directory marker code rclone#3453
  s3: fix directory marker code rclone#3453
  azureblob: fix azure blob uploads with multiple bits of metadata
  Add Andrei Smirnov to contributors
  ...
joec4i added a commit to joec4i/rclone that referenced this issue Jun 15, 2023
* Support --swift-directory-markers for creation of Swift directory markers
* Update TestSwiftAIO to support multiple test runs
multikatt pushed a commit to multikatt/rclone that referenced this issue Aug 27, 2023
multikatt pushed a commit to multikatt/rclone that referenced this issue Aug 27, 2023
- Report correct feature flag
- Fix test failures due to that
- don't output the root directory marker
- Don't create the directory marker if it is the bucket or root
- Create directories when uploading files
multikatt pushed a commit to multikatt/rclone that referenced this issue Aug 27, 2023
multikatt pushed a commit to multikatt/rclone that referenced this issue Aug 27, 2023
- Report correct feature flag
- Fix test failures due to that
- don't output the root directory marker
- Don't create the directory marker if it is the bucket or root
- Create directories when uploading files
multikatt pushed a commit to multikatt/rclone that referenced this issue Aug 27, 2023
Use Update to upload the directory markers
multikatt pushed a commit to multikatt/rclone that referenced this issue Aug 27, 2023
Use Update to upload the directory markers
multikatt pushed a commit to multikatt/rclone that referenced this issue Aug 27, 2023
@ncw
Copy link
Member

ncw commented Sep 3, 2023

We now have these to support empty directory markers

  --azureblob-directory-markers   Upload an empty object with a trailing slash when a new directory is created
  --gcs-directory-markers         Upload an empty object with a trailing slash when a new directory is created
  --s3-directory-markers          Upload an empty object with a trailing slash when a new directory is created

This covers the major bucket based backends.

I'm going to close this now. We might do swift and b2 in the future depending on demand.

@ncw ncw closed this as completed Sep 3, 2023
joec4i added a commit to joec4i/rclone that referenced this issue Mar 18, 2024
* Support --swift-directory-markers for creation of Swift directory markers
* Update TestSwiftAIO to support multiple test runs
joec4i added a commit to bigcommerce/rclone that referenced this issue Mar 19, 2024
* Support --swift-directory-markers for creation of Swift directory markers
* Update TestSwiftAIO to support multiple test runs
@lemeshovich
Copy link

We now have these to support empty directory markers

please add --b2-directory-markers 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment