Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mount: issue with renames on mount with azureblob - excess kernel caching #4977

Closed
daniele-spl opened this issue Jan 30, 2021 · 7 comments
Closed

Comments

@daniele-spl
Copy link

What is the problem you are having with rclone?

Reference: https://forum.rclone.org/t/rclone-mount-vfs-azureblob/21503/22

The issue is, that rclone somehow does not handle renames properly. Internally it still uses the old names. As soon as this internal cache is gone, it cannot find the data as it searches at the old places instead of the new one.

What is your rclone version (output from rclone version)

Tested on:
1.53.2
1.53.3
1.53.4
and latest Betas

Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

Suse Linux Enterprise 12 SP5

Which cloud storage system are you using? (e.g. Google Drive)

Azure Blob

The command you were trying to run (e.g. rclone copy /tmp remote:tmp)

rename from util-linux 2.33.2

A log from the command with the -vv flag (e.g. output from rclone -vv copy /tmp remote:tmp)

Can be found in rclone forum post

@ncw ncw changed the title Issue with renames on mount with azureblob mount: issue with renames on mount with azureblob - excess kernel caching Jan 31, 2021
@ncw
Copy link
Member

ncw commented Jan 31, 2021

I have managed to replicate this in the beta

I ran

rclone mount --vfs-cache-mode full -vv azure:rclone /mnt/tmp/

Then

$ cd /mnt/tmp/
$ mkdir rclonetest
$ echo "this is a test" > rclonetest/test.txt
$ cat rclonetest/test.txt
this is a test
$ mv rclonetest rclone1test
$ cat rclone1test/test.txt
this is a test

Very clearly in the log we can see that the second cat is accessing the old object

2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: Open: flags=OpenReadOnly
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: Open: flags=O_RDONLY
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: newRWFileHandle: 
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: >newRWFileHandle: err=<nil>
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: >Open: fd=rclonetest/test.txt (rw), err=<nil>
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: >Open: fh=&{rclonetest/test.txt (rw)}, err=<nil>
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: Attr: 
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: >Attr: a=valid=1s ino=0 size=15 mode=-rw-rw-r--, err=<nil>
2021/01/30 11:30:06 DEBUG : &{rclonetest/test.txt (rw)}: Read: len=4096, offset=0
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): _readAt: size=4096, off=0
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): openPending: 
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: vfs cache: checking remote fingerprint "15,2021-01-30 11:29:37.83851914 +0000 UTC,e19c1283c925b3206685ff522acfe3e6" against cached fingerprint "15,2021-01-30 11:29:37.83851914 +0000 UTC,e19c1283c925b3206685ff522acfe3e6"
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: vfs cache: truncate to size=15
2021/01/30 11:30:06 DEBUG : rclone1test: Added virtual directory entry vAddFile: "test.txt"
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): >openPending: err=<nil>
2021/01/30 11:30:06 DEBUG : vfs cache: looking for range={Pos:0 Size:15} in [{Pos:0 Size:15}] - present true
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): >_readAt: n=15, err=EOF
2021/01/30 11:30:06 DEBUG : &{rclonetest/test.txt (rw)}: >Read: read=15, err=<nil>
2021/01/30 11:30:06 DEBUG : &{rclonetest/test.txt (rw)}: Flush: 
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): RWFileHandle.Flush
2021/01/30 11:30:06 DEBUG : &{rclonetest/test.txt (rw)}: >Flush: err=<nil>
2021/01/30 11:30:06 DEBUG : &{rclonetest/test.txt (rw)}: Release: 
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): RWFileHandle.Release
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): close: 
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt: vfs cache: setting modification time to 2021-01-30 11:29:37.83851914 +0000 UTC
2021/01/30 11:30:06 DEBUG : rclonetest/test.txt(0xc0005c7b80): >close: err=<nil>
2021/01/30 11:30:06 DEBUG : &{rclonetest/test.txt (rw)}: >Release: err=<nil>

This problem does not replicate with rclone cmount which means it is almost certainly due to caching in the kernel.

It replicates in a simpler way with --vfs-cache-mode off

rclone mount --vfs-cache-mode off -vv azure:rclone /mnt/tmp/
$ mkdir /mnt/tmp/rclonetest
$ echo "this is a test" > /mnt/tmp/rclonetest/test.txt
$ cat /mnt/tmp/rclonetest/test.txt
this is a test
$ mv /mnt/tmp/rclonetest /mnt/tmp/rclone1test
$ cat /mnt/tmp/rclone1test/test.txt
cat: /mnt/tmp/rclone1test/test.txt: Input/output error

With the same underlying cause of reading the wrong object.

I can prove this is due to kernel caching, as if I execute this as root after the second cat

sync; echo 2 > /proc/sys/vm/drop_caches

Then repeating the cat works fine.

$ mkdir /mnt/tmp/rclonetest
$ echo "this is a test" > /mnt/tmp/rclonetest/test.txt
$ cat /mnt/tmp/rclonetest/test.txt
this is a test
$ mv /mnt/tmp/rclonetest /mnt/tmp/rclone1test
$ cat /mnt/tmp/rclone1test/test.txt
cat: /mnt/tmp/rclone1test/test.txt: Input/output error

# sync; echo 2 > /proc/sys/vm/drop_caches executed from a root shell here

$ cat /mnt/tmp/rclone1test/test.txt
this is a test

Lots of digging around in the source and logs later I find that when the fuse library creates directories, it sets their valid time to 60 seconds. The mkdir produces this log which has a valid for 1m in it.

2021/01/31 11:27:21 DEBUG : fuse: <- Lookup [ID=0xa Node=0x1 Uid=1000 Gid=1000 Pid=883814] "rclonetest"
2021/01/31 11:27:21 DEBUG : /: Lookup: name="rclonetest"
2021/01/31 11:27:21 DEBUG : /: >Lookup: node=<nil>, err=no such file or directory
2021/01/31 11:27:21 DEBUG : fuse: -> [ID=0xa] Lookup error=ENOENT
2021/01/31 11:27:21 DEBUG : fuse: <- Mkdir [ID=0xc Node=0x1 Uid=1000 Gid=1000 Pid=883814] "rclonetest" mode=drwxrwxr-x umask=--------w-
2021/01/31 11:27:21 DEBUG : /: Mkdir: name="rclonetest"
2021/01/31 11:27:21 DEBUG : : Added virtual directory entry vAddDir: "rclonetest"
2021/01/31 11:27:21 DEBUG : /: >Mkdir: node=rclonetest/, err=<nil>
2021/01/31 11:27:21 DEBUG : rclonetest/: Attr: 
2021/01/31 11:27:21 DEBUG : rclonetest/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxrwxr-x, err=<nil>
2021/01/31 11:27:21 DEBUG : fuse: -> [ID=0xc] Mkdir 0x2 gen=0 valid=1m0s attr={valid=1s ino=2824735186830538099 size=0 mode=drwxrwxr-x}

I haven't found a way to decrease this yet.

I've also been trying to invalidate the entry in the kernel cache, but I haven't succeeded with that yet either!

@ncw ncw added this to the v1.54 milestone Jan 31, 2021
@daniele-spl
Copy link
Author

sync; echo 2 > /proc/sys/vm/drop_caches
does not work for me to get the cache cleared. It still says Input/Output Error.
I reduced the caching time to 24 hours and dir cache time to 1 hour. I can still see some "losses" of the links.

@ncw
Copy link
Member

ncw commented Feb 1, 2021

sync; echo 2 > /proc/sys/vm/drop_caches
does not work for me to get the cache cleared. It still says Input/Output Error.

Did you do this after the rename but before the cat?

I reduced the caching time to 24 hours and dir cache time to 1 hour. I can still see some "losses" of the links.

I don't think that will help. The caching is in the kernel and in the fuse library.

@daniele-spl
Copy link
Author

Did you do this after the rename but before the cat?

Yes. Its renamed and I get the error.
When I try to drop the caches it does not do anything. I still get the error.

@daniele-spl
Copy link
Author

I think it is because of the rclone cache. I can run your mentioned command but rclone still have the old directory.
As vfs/refresh vfs/forget or kill -SIGHUP $(pidof rclone) does not clear the cache at all I can only restart rclone to solve the issue.
The cache is only cleared when i reduce the the max age and dir cache time to minimum. Otherwise its not possible to clear the cache.

@ncw ncw modified the milestones: v1.54, v1.55 Feb 3, 2021
ncw added a commit that referenced this issue Mar 16, 2021
It was discovered `rclone mount` (but not `rclone cmount`) cached
directories after rename which it shouldn't have done.

This caused strange empty listings of directories after rename and IO
errors when trying to access files in renamed directories.

This turned out to be the kernel caching the directories as basil/fuse
sets their expiry time to 60s for some reason.

This fix invalidates the relevant kernel cache entries in the for the
directories which fixes the problem.

Fixes: #4977
See: https://forum.rclone.org/t/after-a-directory-renmane-using-mv-files-are-not-visible-any-longer/22797
@ncw
Copy link
Member

ncw commented Mar 16, 2021

I think I've managed to fix this.

Can you have a go with

v1.55.0-beta.5303.b48137c9b.fix-4977-mount-caching on branch fix-4977-mount-caching (uploaded in 15-30 mins)

ncw added a commit that referenced this issue Mar 17, 2021
It was discovered `rclone mount` (but not `rclone cmount`) cached
directories after rename which it shouldn't have done.

This caused IO errors when trying to access files in renamed
directories on bucket based file systems.

This turned out to be the kernel caching the directories as basil/fuse
sets their expiry time to 60s for some reason.

This fix invalidates the relevant kernel cache entries in the for the
directories which fixes the problem.

Fixes: #4977
See: https://forum.rclone.org/t/after-a-directory-renmane-using-mv-files-are-not-visible-any-longer/22797
ncw added a commit that referenced this issue Mar 17, 2021
It was discovered `rclone mount` (but not `rclone cmount`) cached
directories after rename which it shouldn't have done.

This caused IO errors when trying to access files in renamed
directories on bucket based file systems.

This turned out to be the kernel caching the directories as basil/fuse
sets their expiry time to 60s for some reason.

This fix invalidates the relevant kernel cache entries in the for the
directories which fixes the problem.

Fixes: #4977
See: https://forum.rclone.org/t/after-a-directory-renmane-using-mv-files-are-not-visible-any-longer/22797
@ncw ncw closed this as completed in 4cc2a7f Mar 22, 2021
@ncw
Copy link
Member

ncw commented Mar 22, 2021

I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.55

negative0 pushed a commit to negative0/rclone that referenced this issue Aug 13, 2021
It was discovered `rclone mount` (but not `rclone cmount`) cached
directories after rename which it shouldn't have done.

This caused IO errors when trying to access files in renamed
directories on bucket based file systems.

This turned out to be the kernel caching the directories as basil/fuse
sets their expiry time to 60s for some reason.

This fix invalidates the relevant kernel cache entries in the for the
directories which fixes the problem.

Fixes: rclone#4977
See: https://forum.rclone.org/t/after-a-directory-renmane-using-mv-files-are-not-visible-any-longer/22797
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants