Files appear truncated when writeback cache is active #88

Closed
brettowe opened this Issue Sep 1, 2017 · 15 comments

brettowe commented Sep 1, 2017

I'm using sshfs 3.2.0 on Arch between a Raspberry Pi and an x86 box.
sshfs is being run on the Pi in this case, but I have seen the same thing with the roles reversed.
In the first case, if I connect a directory to my x86 box, edit a file there with vim for example, and save it,
the file as accessed on the Pi is corrupt. For example, when building with Go I see errors like
"invalid NUL character", pointing to a line one higher than EOF,
and "syntax error: unexpected EOF, expecting }".
When I was doing the reverse and looking at PNG files made on the Pi, viewing them from the x86 box, I would see the lower third or so show corruption.
The PNGs were made with gnuplot.
Looking at md5s or whichever checksum shows a mismatch,
but ls -l for the files looks the same.
If I drop the mount and reconnect, the files are fine.
I've tried playing with the cache options; none seem to help.
I also tried resaving or regenerating the files, and once corrupt they never seem to recover until disconnection.
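
The "same size in ls -l, different md5" symptom described above can be checked mechanically. The following is only a sketch: the function names are made up for illustration, and it assumes both copies are reachable as local paths (for example the file as seen through the sshfs mount and a copy fetched some other way).

```python
import hashlib


def file_fingerprint(path, chunk_size=65536):
    """Return (size_in_bytes, md5_hex) for the file at path."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large PNGs do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def same_size_different_content(path_a, path_b):
    """True when both files have equal length but differing checksums."""
    size_a, md5_a = file_fingerprint(path_a)
    size_b, md5_b = file_fingerprint(path_b)
    return size_a == size_b and md5_a != md5_b
```

If `same_size_different_content()` returns True for the mounted view and the reference copy, that matches the report: the listing looks identical while the bytes differ.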

Contributor

Nikratio commented Sep 7, 2017

Thanks for the report! Does mounting with -o writeback_cache=no make a difference?
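
For anyone scripting their mounts, a minimal sketch of passing that option could look like the following. The helper name and the example remote and mountpoint are hypothetical; only the `-o writeback_cache=no` option itself comes from this thread.

```python
def sshfs_command(remote, mountpoint, disable_writeback=True):
    """Build the argv list for an sshfs mount.

    remote is a "host:dir" string and mountpoint a local directory;
    both are supplied by the caller. Nothing is executed here.
    """
    argv = ["sshfs", remote, mountpoint]
    if disable_writeback:
        # The workaround suggested in this issue.
        argv += ["-o", "writeback_cache=no"]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)`; keeping command construction separate from execution makes it easy to print the command first.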

@Nikratio Nikratio added the needs-info label Sep 7, 2017

brettowe commented Sep 8, 2017

Yes, adding that option looks to solve it.
I tested it by updating files on both sides, with sshfs mounted on the x86 side connecting to the Pi.
I saw no corrupt images during testing, and no evidence of corruption in text files via md5sum or errors while trying to use them.

I should mention, as it looks like I forgot to previously: I'd consider this a regression, as I hadn't seen the issue on previous versions.

tksuoran commented Sep 13, 2017

I am also experiencing this issue.

I am using Ubuntu 16.04. Due to #7 (which I also can reproduce consistently) I have removed stock libfuse and sshfs packages and instead manually built and installed latest libfuse and sshfs. sshfs --version output:

SSHFS version 3.2.0
FUSE library version 3.1.1
using FUSE kernel interface version 7.26
fusermount3 version: 3.1.1

When working with an sshfs mount, I can 100% reproduce a case in which a file appears truncated when seen through sshfs. Meanwhile, the file is complete on the remote end. Using -o writeback_cache=no avoids the issue. It seems as though the writeback cache is broken in sshfs 3.2.0.

Contributor

Nikratio commented Sep 19, 2017

@tksuoran thanks for reporting this! Hopefully I won't need a Raspberry Pi to reproduce the problem then :-).

Could you provide more detailed instructions for reproducing the issue? Ideal would be a script that connects to localhost and triggers the problem.

@Nikratio Nikratio changed the title from corruption issues to Files appear truncated when writeback cache is active Sep 19, 2017

@Nikratio Nikratio self-assigned this Sep 19, 2017

@Nikratio Nikratio added the bug label Sep 19, 2017

Contributor

Nikratio commented Sep 19, 2017

Apologies for taking so long to realize this, but I just read the original problem description again and this behavior is actually expected. Enabling writeback caching means you have no control over when written data will actually be transmitted to the client. Seeing incorrect data on the server while the filesystem is mounted is expected behavior - if you don't want that, do not use writeback caching :-).

It would be different if the files were still truncated after unmounting, or if they also appeared truncated on the client - but if I understand correctly, this isn't the case, right?

@Nikratio Nikratio removed the bug label Sep 19, 2017

The client, I assume, would be the one using sshfs to make the mount point locally.
The issue I'm seeing is that a file gets modified server-side and the client sees corruption.

Contributor

Nikratio commented Sep 19, 2017

That is expected behavior as well.

However, since people have apparently been relying on this, it may be an argument for not enabling writeback caching by default. Hmm..

I'll submit that NFS can do this without special options.

mk-fg commented Sep 19, 2017

> it may be an argument for not enabling writeback caching by default

Current default behavior - silent data corruption after any remote change (which is quite common when using sshfs for e.g. configuration) - is very surprising and terrible in a filesystem, while slower operation in some cases is not surprising at all for network fs and will not screw anyone over.

Documentation for such a cache should have a huge warning in all-caps imho, and maybe even print one when used, instead of being on by default.

mk-fg commented Sep 20, 2017

> Hopefully I won't need a Raspberry Pi to reproduce the problem

> Enabling writeback caching means you have no control over when written data will actually be transmitted to the client.

It's interesting that I've bumped into this issue with arm boards as well.
When re-reading a file (via the sshfs client) that was changed on the ssh server (locally, not via sshfs), e.g. a few lines of text added in the middle, the text in it appears truncated and the rest is padded with NUL bytes.

Either the old or the new version of it wouldn't be that surprising with caching, but a corrupted version is. Should the option really produce that behavior?

I mean, at the time of reading, the file contents on the remote host should be perfectly stable and valid, and there are no NUL bytes in it, nor ever were, so why would sshfs ever return these?
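
To make the NUL-padding symptom easy to spot, here is a small sketch that counts trailing NUL bytes and classifies a view against a known-good copy. The function names and the three labels are invented for illustration, and it assumes the correct file is a text file that never ends in NUL bytes.

```python
def trailing_nul_count(data: bytes) -> int:
    """Number of NUL bytes at the very end of data."""
    return len(data) - len(data.rstrip(b"\0"))


def describe_view(view: bytes, actual: bytes) -> str:
    """Classify what was read through the mount against the real contents."""
    if view == actual:
        return "ok"
    if trailing_nul_count(view) > trailing_nul_count(actual):
        # Extra NULs at the end that the real file does not have.
        return "nul-padded"
    return "differs"
```

Feeding it the bytes read through the mount and the bytes read directly on the server gives a quick label for a bug report.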

Contributor

Nikratio commented Sep 20, 2017

Lots of comments, let's see if I can address them all:

  • @brettowe I'm not sure what exactly you mean with "this", but NFS is in a somewhat better situation since it sits in the kernel. SSHFS can only write data when it receives it from the FUSE kernel module. I'm sure patches to improve the writeback behaviour will be looked upon favorably - are you volunteering to work on this?
  • @mk-fg As long as you are only reading data, you should not have any problems. If you experience this, could you provide more detailed information (maybe in a separate bug, this one is rather busy already)? Ideal would be a script that connects to localhost and triggers the problem.
  • There always has been (and always will be) a chance to get silent data corruption if you make changes on both client and server, even if writeback is turned off. However, without writeback the window in which simultaneous access can cause corruption is much shorter (and mostly determined by network latency).
  • Yes, I agree that the current behavior is not ideal. When making the decision to enable writeback by default I was assuming that the kernel would flush the buffers within a few seconds, so that the window for corruption to happen wasn't significantly bigger than what exists without writeback. However, the writeback cache should definitely be flushed when the file is closed, so I am surprised that so many people report problems. In other words, while the chance of corruption cannot be eliminated, there may still be some bug whose fix would at least improve the situation a lot. Could the people who experience problems please share what exactly they do? Ideally by providing a script that connects to localhost and triggers the problem?
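
A rough sketch of the kind of reproduction script being asked for could look like this. It assumes the mount already exists (for example something like `sshfs localhost:/some/dir /some/mountpoint` run beforehand) and that the server-side directory is also directly reachable as a local path; the function name, file name, and file contents are all hypothetical.

```python
import os


def remote_change_is_visible(server_dir, mount_dir, name="probe.txt"):
    """Change a file behind the mount's back, then re-read it via the mount.

    Returns True when the view through mount_dir matches the server copy.
    """
    server_path = os.path.join(server_dir, name)
    mount_path = os.path.join(mount_dir, name)

    # 1. Create the file on the server side, bypassing the mount.
    with open(server_path, "wb") as handle:
        handle.write(b"line 1\nline 2\nline 3\n")

    # 2. Read it through the mount so the client has something cached.
    with open(mount_path, "rb") as handle:
        handle.read()

    # 3. Change it on the server side: add text in the middle.
    with open(server_path, "wb") as handle:
        handle.write(b"line 1\nline 2\ninserted\nline 3\n")

    # 4. Re-read through the mount and compare with the server copy.
    with open(mount_path, "rb") as handle:
        seen = handle.read()
    with open(server_path, "rb") as handle:
        expected = handle.read()
    return seen == expected
```

Running it once with the default options and once with `-o writeback_cache=no` would show whether the option changes the outcome on a given setup.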

mk-fg commented Sep 20, 2017

> Could the people who experience problems please share what exactly they do? Ideally by providing a script that connects to localhost and triggers the problem?

Yeah, sorry, will definitely try to do that in the next few days. It seems to be 100% reproducible here, so worst case it has to be reproducible with a qemu-user-aarch64/arm chroot and/or netem, though hopefully with a simple localhost connection as well.

Thanks!

Contributor

Nikratio commented Sep 20, 2017

I think the reported problems may have the same root cause as #93 - if the attributes are not updated correctly, the file contents would appear truncated or zero-filled.
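
As a toy illustration of that hypothesis (not the actual FUSE or sshfs code path), assume the reader trusts a stale cached size while the bytes come from the current file. The function name and the model itself are simplifications made up for this sketch.

```python
def view_with_stale_size(current: bytes, cached_size: int) -> bytes:
    """Model a read that trusts cached_size instead of the real length.

    If the file grew, the view is cut off at the old size (truncated).
    If the file shrank, the missing tail is filled with NUL bytes.
    """
    visible = current[:cached_size]
    return visible + b"\0" * (cached_size - len(visible))
```

In this model a file that grew on the server looks truncated and a file that shrank looks zero-filled, which lines up with both symptoms reported here.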

My comment was mainly pointing out that another remote filesystem can be treated this way and no corruption happens, nor is it expected.
I might expect corruption if I'm writing on both sides at the same time, as that could be a case where clock sync is required and the clocks are not in sync.
I'm not trying for high-speed access, however.

My first use case: sshfs to the Pi from the desktop,
ssh into the Pi, and run a collection of gnuplot scripts that generate PNGs.
As soon as the scripts return to the prompt, I open the PNGs from the desktop side to view them,
and that's when I would see that some, if not all, are half corrupt.

The second way: on the Pi, sshfs to the desktop.
On the desktop, edit files with vim and save; then on the Pi, try to compile the file just saved and get errors about the file being corrupt.

@Nikratio Nikratio closed this in d193b19 Sep 20, 2017

mk-fg commented Sep 20, 2017

Oh well, sshfs is awesome for many purposes that don't really need cache anyway.
Thanks for looking into it, looking forward to not needing that extra -o on sshfs lines :)
