Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible glitch on ZFS on Linux #484

Closed
beorn7 opened this issue Jan 27, 2015 · 16 comments
Closed

Possible glitch on ZFS on Linux #484

beorn7 opened this issue Jan 27, 2015 · 16 comments

Comments

@beorn7
Copy link
Member

beorn7 commented Jan 27, 2015

The os.File.Seek method doesn't seem to return the right result when used to find out about the file offset.

That breaks TestPersistLoadDropChunks, but it might also break things in the storage layer quite badly...

@brian-brazil
Copy link
Contributor

Relevant bit of strace:

[pid 10855] open("/tmp/test_persistence169947877/d7/186cb399b4f2a1.db", O_WRONLY|O_CREAT|O_APPEND|O_CLOEXEC, 0640) = 20
[pid 10855] fstat(20, {st_mode=S_IFREG|0640, st_size=2082, ...}) = 0
[pid 10852] <... select resumed> )      = 0 (Timeout)
[pid 10855] write(2, "I0127 23:31:04.373673   10851 pe"..., 96I0127 23:31:04.373673   10851 persistence.go:674] DEBUG file size before persisting chunk: 2082
 <unfinished ...>
[pid 10852] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 10855] <... write resumed> )       = 96
[pid 10855] write(20, "\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\1\1\1\2\0\0\0\0\0\0\0\2366s\226"..., 1041) = 1041
[pid 10852] <... select resumed> )      = 0 (Timeout)
[pid 10855] lseek(20, 0, SEEK_CUR <unfinished ...>
[pid 10852] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 10855] <... lseek resumed> )       = 1041
[pid 10855] write(2, "I0127 23:31:04.373825   10851 pe"..., 91I0127 23:31:04.373825   10851 persistence.go:700] DEBUG offset after chunk persisted: 1041
) = 91
[pid 10855] fstat(20, {st_mode=S_IFREG|0640, st_size=3123, ...}) = 0
[pid 10852] <... select resumed> )      = 0 (Timeout)
[pid 10855] write(2, "I0127 23:31:04.373912   10851 pe"..., 90I0127 23:31:04.373912   10851 persistence.go:702] DEBUG size after persisting chunk: 3123
 <unfinished ...>
[pid 10852] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 10855] <... write resumed> )       = 90
[pid 10855] close(20)                   = 0

It opens the file at size 2082, appends 1041 bytes so you'd expect the lseek(20, 0, SEEK_CUR) to return 2082+1041=3123 (which is what fstat returns) but instead it returns 1041.

@brian-brazil
Copy link
Contributor

zfslib2 version is 0.6.3-2~precise, I'm on amd64.

@brian-brazil
Copy link
Contributor

Per http://pubs.opengroup.org/onlinepubs/009695399/functions/lseek.html
"bytes from the beginning of the file, shall be returned"

@ryao
Copy link

ryao commented Jan 28, 2015

Are other Linux filesystems behaving differently? The specification suggests that this behavior is correct:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

http://pubs.opengroup.org/onlinepubs/009695399/functions/lseek.html

If other Linux filesystems (e.g. ext4) behave differently here, we will need to talk to their maintainers to compare notes on this discrepancy.

@brian-brazil
Copy link
Contributor

@ryao
Copy link

ryao commented Jan 28, 2015

I did. Thanks for posting the correct link. That said, it seems that I misread. If my cursory reading of the spec is to be believed, we should be returning 2082, not 1041.

@brian-brazil
Copy link
Contributor

Minimal test case: https://gist.github.com/brian-brazil/0022440426ea12728651

On zfs:

lseek returns: 3 Expected: 5

On ext4:

lseek returns: 5 Expected: 5

On tmpfs:

lseek returns: 5 Expected: 5

Linux mari 3.5.0-54-generic #81~precise1-Ubuntu SMP Tue Jul 15 04:02:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

@ryao
Copy link

ryao commented Jan 28, 2015

sarnold in #zfsonlinux on freenode commented that the spec requires two updates. One is to update the file offset to the EOF before a FAPPEND write operation and another is the normal increment following the write operation. This is consistent with what you have observed on other filesystems. Something like this should fix this:

diff --git a/module/zfs/zpl_file.c b/module/zfs/zpl_file.c
index cabe9bf..ffdb54d 100644
--- a/module/zfs/zpl_file.c
+++ b/module/zfs/zpl_file.c
@@ -283,6 +283,14 @@ zpl_write_common_iovec(struct inode *ip, const struct iovec *iovp, size_t count,
    if (error < 0)
        return (error);

+   /*
+    * The write() documentation in the SUS requires us to update the file
+    * offset whenever an FAPPEND write() is done. zfs_write() will update
+    * the uio offset to achieve this on Illumos. Copy it back to get the
+    * same behavior on Linux.
+    */
+   *ppos = uio.uio_loffset;
+
    wrote = count - uio.uio_resid;
    *ppos += wrote;
    task_io_account_write(wrote);

I will open a pull request later today after I have done proper testing.

@beorn7
Copy link
Member Author

beorn7 commented Jan 28, 2015

So it's not our fault. Cool. ;-)

What's a good way to communicate this to Prometheus users on ZFS? Just leave this issue open? Or should we start a 'known issues' list somewhere?

@brian-brazil
Copy link
Contributor

This sounds like something we should communicate to Go and ZFS users generally.

@beorn7
Copy link
Member Author

beorn7 commented Jan 28, 2015

Posted it to the G+ golang community. I'm not on go-nuts@ or any ZFS mailing list, though...

@beorn7
Copy link
Member Author

beorn7 commented Jan 29, 2015

ZFS devs have already chimed in, apparently working on a PR. So I
leave it to them if they want to file an issue or not.

On Wed, Jan 28, 2015 at 12:23 PM, Jakub Turski notifications@github.com wrote:

https://github.com/zfsonlinux/zfs/issues/new ?


Reply to this email directly or view it on GitHub.

Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B

@ryao
Copy link

ryao commented Jan 29, 2015

@beorn7 Filing an issue in the with ZoL tracker would have been a nice reminder in case I had forgotten about this. Thankfully, I did not. A fix is in a pull request.

ryao added a commit to ryao/zfs that referenced this issue Jan 29, 2015
@beorn7
Copy link
Member Author

beorn7 commented Jan 29, 2015

Cool. Thanks @ryao .

Prometheans, do we want to add an anticipated FAQ or something?

@beorn7
Copy link
Member Author

beorn7 commented Feb 2, 2015

It's now in the FAQ. Closing.

@beorn7 beorn7 closed this as completed Feb 2, 2015
simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue Oct 12, 2017
@lock
Copy link

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants