
vfs: improve VFS cache to accept very long file/directory names #1907

Open
zenjabba opened this issue Dec 11, 2017 · 28 comments

@zenjabba

Unable to write cache chunk due to filename being too long

2017/12/11 07:05:37 ERROR : worker-1 <ur76lghmk0lgu8no3i3i9iphqn8d0ri3ad7rhs2mhdn74igln7od06pnglvrkj77313471lo18omdv6526tit9h46t6k6r9llocpj69lb15a838i41tlekoth970eqvju2ct6pnsjpsqrnj0bm4soahn34q68turb7udlirm9ql16j216d5684fa3udr1fcapervhrknhpc7dggt3qvo5t8ooihp656ulthofgd4l9l68psbi388elk7dq3unkhk>: failed caching chunk in storage 0: open /tmp/rclone-cache-streamer2/streamer2_cache/s5dri5l6lhg70tqicfopng89h4/pkk7v920jgq9jbo3v1lg6goo80/p2kfesuq32ru2q93qh1otf7j2k/pau1o64bl8uiertgbck2vcpmt0pte6vqvfs9t321l4maac6gm1rigf9s7v62pm17f5ds07ter3if5r64p7iubj4kqskb3l1oofac4que2lcn97ab5eor2uqpvsu235bl/ur76lghmk0lgu8no3i3i9iphqn8d0ri3ad7rhs2mhdn74igln7od06pnglvrkj77313471lo18omdv6526tit9h46t6k6r9llocpj69lb15a838i41tlekoth970eqvju2ct6pnsjpsqrnj0bm4soahn34q68turb7udlirm9ql16j216d5684fa3udr1fcapervhrknhpc7dggt3qvo5t8ooihp656ulthofgd4l9l68psbi388elk7dq3unkhk/0: file name too long

rclone v1.38-223-g7c972d37β

  • os/arch: linux/amd64
  • go version: go1.9.2
@ncw
Member

ncw commented Dec 11, 2017

The chunk name before the /0 is exactly 256 bytes long - most Linux file systems only allow 255-byte path segments.

I guess this is normally stored on drive, which allows very long path names, so this hasn't been a problem before.

What do you think @remusb ?

@zenjabba
Author

or maybe hash the file names to 128 bytes?

@remusb
Collaborator

remusb commented Dec 11, 2017

It's the first time this has been seen in cache because the files are now stored on disk too. This wouldn't work on crypt + local either.

Sadly, mapping filenames would add complexity to cache that I don't think would benefit it right now. I would rather fix this by allowing crypt to be wrapped by cache, which would address this along with the performance and cosmetic issues it causes.

@remusb remusb self-assigned this Dec 11, 2017
@cal2195

cal2195 commented Dec 13, 2017

By that last comment, do you mean allowing remote->crypt->cache to work well?

And if so, is there a way to encrypt the cached contents? (Is that what the password is for when setting up a cache remote?)

@remusb remusb added this to the Known Problem milestone Dec 19, 2017
@Covernel

Covernel commented Feb 4, 2018

will it be fixed?

@remusb
Collaborator

remusb commented Feb 4, 2018

The easiest way is to make crypt work behind cache which is something I intended to do anyway.

@mvia

mvia commented Feb 13, 2018

Is there any chance for a fix soon?

@remusb
Collaborator

remusb commented Feb 13, 2018

So this particular issue can't be fixed in a traditional way. It's a limitation of the OS, and of the fact that cache writes to the disk rather than directly to the cloud provider in order to provide persistence across rclone restarts.

There are multiple options to overcome it:

  1. don't wrap cache under crypt -> this is not recommended now as it can cause 403s. The work done here should help with the eventual fix to allow this order: Can't tell if Object.Open supports RangeOption or not #1825
  2. use shorter file names to accommodate crypt expanding the file name
  3. maybe others, like checking whether you can tweak the OS to allow longer file names (long shot)

Note that 1 isn't really a fix to this. It's just a workaround.
Out of curiosity, what's the real length of the filename that can't be written? There was recently a doc change that says that file names shouldn't be longer than 143 characters with crypt: #2040

@Covernel

Noted, with thanks.
I think 2 isn't really a fix either.

@remusb
Collaborator

remusb commented Feb 13, 2018

Yes, I do agree that one shouldn't need to rename their files, but there's not much we can do about an OS limitation either.
If cache trimmed the file names then the encrypted version wouldn't be decipherable anymore. If rclone did this from crypt then you'd have a lot of people asking who's renaming their files.

If you already have files with names longer than 143 characters then it won't work anyway.

@splitice

Experienced this too. Wow, long names!

/mnt/temp/rclone/cache/chunk/cache/k519e9m2f1rcm04pqi8m8hemro/smldujvugip4vr7q438bueijn8/jtq5nucb6neic0leoike5q41jrmr33oap3hubsfe70cbjemnj4deugpmmtfld9bj9lor8g2cessel7oi51hufhdkkifqarrvi3fm3l6j5vf1k4gr65gjlvefstnu1ushbsie3uv285dglk3m1f5cluaqggev0c50n54egivh1auj1elh8lcos4afkm1vbrn9tui33j5gt878bc0ns7uqc1uh5u24sm0oju28k24bn4l3hi6ef26lgqdi1sbqmav62hdkd5fsuricelfds2loed4k08/0: file name too long'

282 characters!

What about introducing a second level for crypt, i.e. ${name:0:254}/${name:254}? Since the name is known as the lookup key, it should be possible to know which format is required.

@vb0

vb0 commented Nov 2, 2019

Running into the same issue, and I was thinking we could side-step it by using a file system with more generous limits, but the 255-byte limit seems to be the norm even for file systems designed to store far more data than we'll ever get to have. Really?! BRB, checking the calendar... yep, it's the end of 2019, I wasn't imagining it.

Anyway, the only one that seems to allow more (4032 bytes) is Reiser4. I'll do some tests to see if this is really the case and that they don't mean the total path length or something.

@esticle

esticle commented Nov 28, 2019

Same here with cached crypt:

# grep -cE '^2019/11/27.*file name too long' rclone.log
8496

Presumably, if the ordering 'cloud remote -> crypt -> cache' could be used without the known issue, this would be less of a problem.

@ivandeex
Member

ivandeex commented Nov 15, 2020

@ncw @remusb
I think this might be mitigated by the recent vfs/cache improvements.

However, the current limits for a typical OS (max path segment length 255) in a typical setup (cache in a user directory like /home/username/.cache/rclone, with around 30 characters of overhead) could be described in the vfs docs, with a note about crypt.

@jarfil

jarfil commented Jan 24, 2021

@ivandeex
The current max path length for Windows with long paths enabled, is about 32,767 [1].
For Linux it's 4,096 [2].
OSX seems to have a limit around 1,024 [3].

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation#enable-long-paths-in-windows-10-version-1607-and-later
[2] https://serverfault.com/a/306726/70926
[3] https://stackoverflow.com/questions/7140575/mac-os-x-lion-what-is-the-max-path-length

@ncw
Member

ncw commented Jan 25, 2021

This is fixed with the VFS cache in 4f8ee73

@jarfil - those are the total lengths. I think this original issue was concerned with a path element being > 255 bytes, which I think is still an issue on Windows/macOS/Linux.

@zenjabba
Author

zenjabba commented Jan 25, 2021 via email

@ivandeex
Member

ivandeex commented Mar 13, 2021

@ncw

I propose the following enhancements to the VFS layer

  • add a new field realName in the vfscache.Item
  • Item.save() and Item.load() will check whether os.Open or os.Create returned an error with "path too long"
  • if that is the case, fill in the realName field in the json metadata and produce a compacted name, then use it to read/write chunks and metadata
  • the compacted name can be formed as the original relative path with segments longer than 256 chars replaced by their SHA-256 hexified (64 runes), and prepended with a prefix like __longNames so such chunks will go under a separate directory tree under vfs and vfsMeta
  • update the other places in vfscache affected by the code changes
  • add a unit test

Does this proposal sound reasonable?

@ivandeex ivandeex changed the title file name too long - cache vfs: improve VFS cache to accept very long file/directory names Mar 13, 2021
@ivandeex ivandeex assigned ivandeex and ncw and unassigned remusb Mar 13, 2021
@ncw
Member

ncw commented Mar 15, 2021

> @ncw
>
> I propose the following enhancements to the VFS layer
>
>   • add a new field realName in the vfscache.Item
>   • Item.save() and Item.load() will check whether os.Open or os.Create returned an error with "path too long"
>   • if that is the case, fill in the realName field in the json metadata and produce a compacted name, then use it to read/write chunks and metadata
>   • the compacted name can be formed as the original relative path with segments longer than 256 chars replaced by their SHA-256 hexified (64 runes), and prepended with a prefix like __longNames so such chunks will go under a separate directory tree under vfs and vfsMeta
>   • update the other places in vfscache affected by the code changes
>   • add a unit test
>
> Does this proposal sound reasonable?

Some thoughts

  • detection of the "path too long" error isn't straightforward!
  • that will need to work for Open as well as Create, so we need to check that
  • I'm not sure this scheme will work for directories, will it?
  • It might be easier to pick a max leaf size (say 250 bytes of UTF-8 encoded string) and say anything above that we will encode somehow
    • I'd suggest keeping the first 250-hash bytes constant, then adding a hash of the full file name on the end.
  • That will mean that we know without doing any fs operations whether it needs encoding or not
  • It will also work for directories.

@ivandeex
Member

>   • detection of the "path too long" error isn't straightforward!
>   • that will need to work for Open as well as Create, so we need to check that
>   • I'm not sure this scheme will work for directories, will it?
>   • It might be easier to pick a max leaf size (say 250 bytes of UTF-8 encoded string) and say anything above that we will encode somehow

I weighed the instability/system-dependency of such a check on one side against keeping existing cache entries unchanged on the other. Introducing a hard threshold on path segment length will make the algorithm stable. However, it will make a few existing cache entries with 250-256 char segments "jump" into a "long" cache subtree. We will have to rearrange the control flow in vfscache a little to read metadata first and forcibly move such entries to a new place, which will add an extra disk access. Nevertheless, I totally agree with your approach.

> I'd suggest keeping the first 250-hash bytes constant, then adding a hash of the full file name on the end.
> That will mean that we know without doing any fs operations whether it needs encoding or not
> It will also work for directories.

This will not work: 250 chars below the threshold + 64 chars of the hash makes 314, which will not be accepted by the local FS. We'd have to replace every long path segment entirely with its hash (whether it's a directory name in the middle or a file name at the end).

@ncw
Member

ncw commented Mar 15, 2021

> I weighed the instability/system-dependency of such a check on one side against keeping existing cache entries unchanged on the other. Introducing a hard threshold on path segment length will make the algorithm stable. However, it will make a few existing cache entries with 250-256 char segments "jump" into a "long" cache subtree.

Good point.

The process that runs through the cache initially could fix these.

I'd rather keep them in the same vfs tree than move them to a new long-cache subtree. I guess that introduces the possibility of collisions, but I would have thought it would be very low...

> 250-hash bytes

I meant 250-32 if using a 32 byte hash here. So files would become exactly 250 bytes long. This helps users when trying to recover lost files in the vfs cache.

"very long file ...... longer than 250 bytes.txt"

would then become exactly 250 bytes long with the hash on the end

"very long file ...XXXXXXXXXXXXXXXX"

@ivandeex ivandeex modified the milestones: Known Problem, Soon Mar 15, 2021
@ivandeex ivandeex modified the milestones: Soon, v1.57 Apr 4, 2021
@ivandeex ivandeex modified the milestones: v1.57, Soon May 16, 2021

18 participants