Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: File.Sync() for Darwin should use the F_FULLFSYNC fcntl #26650

Closed
networkimprov opened this issue Jul 27, 2018 · 31 comments
Closed

os: File.Sync() for Darwin should use the F_FULLFSYNC fcntl #26650

networkimprov opened this issue Jul 27, 2018 · 31 comments
Assignees
Milestone

Comments

@networkimprov
Copy link

@networkimprov networkimprov commented Jul 27, 2018

What version of Go?

1.10 or tip

What operating system and processor architecture?

darwin, *

What is the matter?

According to this and other sites, fcntl(fd, F_FULLFSYNC) is necessary for proper fsync on MacOS (OS X): https://danluu.com/file-consistency/

Github search finds no uses of "F_FULLFSYNC" in the src tree.

If there is a rationale for the present implementation, it should be documented in File.Sync() with directions on how to invoke the fcntl variant.

@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Jul 27, 2018
@odeke-em odeke-em added the OS-Darwin label Jul 27, 2018
@odeke-em odeke-em changed the title os: File.Sync() for darwin not correctly implemented os: File.Sync() for Darwin should use the F_FULLFSYNC fcntl Jul 27, 2018
@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Jul 27, 2018

Since this issue threatens data loss/corruption, please consider fixing it in 1.11.x and 1.10.x

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jul 27, 2018

Thank you for reporting this issue @networkimprov! It is triaged for 1.12 as 1.11 is scheduled to be released in just a few days. It has also been like this since forever. However we'll definitely consider it as an early candidate for Go1.12 when the tree opens up in the next few weeks. If you are interested in submitting a CL, please feel free, it'd be great to have your contribution in the Go project and community, otherwise we'll get it in for Go1.12.

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Jul 27, 2018

I'm not proposing 1.11.0 as a target; rather 1.11.1 & 1.10.4 or the minor releases that appear after a fix lands on tip.

Do you have guidance on where fcntl should be called for Darwin File.Sync()?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 27, 2018

The question is whether we should do it in syscall.Fsync or os.File.Fsync. I guess I would lean slightly toward the former, but happy to hear comments.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jul 27, 2018

I'm not proposing 1.11.0 as a target; rather 1.11.1 & 1.10.4 or the minor releases that appear after a fix lands on tip.

A backport I believe warrants something that has plagued previous versions or results for memory corruption, but I'll defer to others on whether this qualifies too -- in this case potential file corruption might qualify :)

The question is whether we should do it in syscall.Fsync or os.File.Fsync. I guess I would lean slightly toward the former, but happy to hear comments.

@ianlancetaylor I too was leaning towards it going in to syscall.Fsync since perhaps we can minimally make it

// As per /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/sys/fcntl.h
const F_FULLFSYNC = 51
func Fsync(fd int) error {
     _, err := fcntl(fd, F_FULLFSYNC, 1)
    return err
}

so in full

+// As per /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/sys/fcntl.h
+const F_FULLFSYNC = 51
+
+func Fsync(fd int) error {
+       _, err := fcntl(fd, F_FULLFSYNC, 0)
+       return err
+}
+
 // SockaddrDatalink implements the Sockaddr interface for AF_LINK type sockets.
 type SockaddrDatalink struct {
        Len    uint8
@@ -332,7 +340,6 @@ func Uname(uname *Utsname) error {
 //sys  Fstat(fd int, stat *Stat_t) (err error) = SYS_FSTAT64
 //sys  Fstatat(fd int, path string, stat *Stat_t, flags int) (err error) = SYS_FSTATAT64
 //sys  Fstatfs(fd int, stat *Statfs_t) (err error) = SYS_FSTATFS64
-//sys  Fsync(fd int) (err error)
 //sys  Ftruncate(fd int, length int64) (err error)
 //sys  Getdirentries(fd int, buf []byte, basep *uintptr) (n int, err error) = SYS_GETDIRENTRIES64
 //sys  Getdtablesize() (size int)

Also happy to hear what comments others have.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 27, 2018

On the other hand, thinking about it further, most of the syscall functions simply implement that operation on the system, rather than trying to present a unified cross-system API. So that suggests that maybe syscall.Fsync on MacOS should just implement the fsync system call, and that we should make the fcntl call in os.File.Sync, or, really, internal/poll/Fd.Fsync. I'm not sure.

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Jul 28, 2018

I'm not proposing 1.11.0 as a target; rather 1.11.1 & 1.10.4 ...

A backport I believe warrants something that has plagued previous versions or results for memory corruption, but I'll defer to others on whether this qualifies too -- in this case potential file corruption might qualify :)

Note that this can hurt end-users without the Go developer ever realizing it: User does xyz in app, OS crashes (or hw fails), user sees odd behavior in app after restart and restores from backup (losing today's work), user assumes problem was due to hw or os.

+const FULLFSYNC = 51

It's already defined in tree:
https://github.com/golang/go/search?q=f_fullfsync&unscoped_q=f_fullfsync

Agreed you should leave MacOS fsync available in case anyone needs it.

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Jul 28, 2018

Perhaps surface MacOS fsync as syscall.FsyncDarwin and implement syscall.Fsync with fcntl?

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jul 28, 2018

Cool, thanks for the discussion @networkimprov and @ianlancetaylor!

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Aug 22, 2018

@odeke-em the 1.12 tree is open; can we get this CL posted?

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Aug 22, 2018

@networkimprov thank you for the ping and reminder! For sure, let's get the ball rolling. I am in the process of submitting a CL.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Aug 22, 2018

Change https://golang.org/cl/130676 mentions this issue: internal/poll: use F_FULLFSYNC fcntl for FD.Fsync on OS X

@odeke-em odeke-em self-assigned this Aug 22, 2018
@gopherbot gopherbot closed this in be10ad7 Aug 22, 2018
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Aug 22, 2018

In regards to backporting this fix to Go1.11 and earlier, what do y'all think @ianlancetaylor @andybons?
Also perhaps we need to add this to Go1.12 release notes?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 22, 2018

This is not a new problem. I don't see a need to backport.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Aug 22, 2018

Roger that! The rationale that @networkimprov proposed, which I thought was reasonable was that because it is an underlying problem that has existed since Go1.0, it'd be nice to have it in the latest Go release since it can lead to disk memory corruption and affects many databases, folks that need flushes to permanent storage. However, as we've all noticed, no one had frantically reported this bug during previous Go cycles hence doesn't meet the bar for a backport.

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Aug 22, 2018

This can hurt end-users without the Go developer ever realizing it:

  1. user does xyz in app; OS crashes or hw fails
  2. user sees odd app behavior after restart and restores from backup, losing this week's work
  3. user assumes problem was due to hw or OS

I plan to release two apps near-future which rely on os.File.Sync() for crash protection, and MacOS is widely used by my audience.

Not Go-related, but my Git repo was corrupted last week due to a hw failure.

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Sep 12, 2018

@ianlancetaylor could you consider cherry-picking this for 1.11.1, given the report from a bbolt db user on MacOS?

https://github.com/etcd-io/bbolt

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 12, 2018

I don't see a report at that link.

We've also seen a bug report on this due to the changed performance: #27415. I'm pretty reluctant to make this kind of change in a point release. As I said earlier, it's not a new problem.

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

@mark-rushakoff mark-rushakoff commented Sep 12, 2018

Couldn't bbolt handle this by making the syscall directly? From Go 1.12 onwards they can go back to a plain (*os.File).Sync().

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Sep 12, 2018

@mark-rushakoff, probably they could. Bleve is also affected and they don't use the bbolt fork yet; hopefully they'll switch if bbolt applies the workaround. Wish I knew who else depends on os.File.Sync :-(

@ianlancetaylor, yes that was the report I meant. I hope you won't fault me for asking.

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Dec 19, 2018

Regarding the 1.12 release notes:

"Note that this might have a negative performance impact."

I'd suggest adding that File.Sync() prior to 1.12 is broken on MacOS, and may cause data loss/corruption.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 19, 2018

That would imply that calling File.Sync could in itself cause data loss/corruption, which is not the case.

I think the current release notes text is clear enough.

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

@mark-rushakoff mark-rushakoff commented Dec 19, 2018

I was looking at the release notes for this as well, wondering to myself: "negative performance impact" relative to ... the previous implementation of Sync? Not calling Sync but just relying on Close?

I think it would be more clear if it mentioned that calling File.Sync is expected to block for a longer time in 1.12 than in 1.11 and earlier, because now Sync forces writes to disk rather than just OS buffers, and as a result Sync is now more resilient to crashes. "Negative performance impact" is a bit vague in my opinion.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Dec 19, 2018

Change https://golang.org/cl/155038 mentions this issue: doc: clarify change to File.Sync on macOS

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 19, 2018

What do you think of https://golang.org/cl/155038?

(I really think we are spending too much time on this. It's a bug fix.)

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

@mark-rushakoff mark-rushakoff commented Dec 19, 2018

LGTM @ianlancetaylor. Thanks for rewording it.

@networkimprov

This comment has been minimized.

Copy link
Author

@networkimprov networkimprov commented Dec 19, 2018

That would imply that calling File.Sync could in itself cause data loss/corruption

True. Maybe "File.Sync() prior to 1.12 is ineffective on MacOS, and cannot prevent data loss/corruption on system crash/power-off."

It's a bug fix with broad impact among database packages, including etcd bbolt, syndtr goleveldb, dgraph badger, and bleve.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 20, 2018

Thanks, I'm going with what I've got.

@campoy

This comment has been minimized.

Copy link
Contributor

@campoy campoy commented Jun 17, 2019

I'm definitely late to the conversation, but I wanted to document my frustration with the decision of not backporting this fix, and how it's affecting https://github.com/dgraph-io/badger.

We now need to replace all the calls to os.File.Sync to instead call a FileSync function for which we provide two implementations.

Once is very straight forward:

file_sync.go

// +build !darwin go1.12

package y

import "os"

// FileSync calls os.File.Sync with the right parameters.
// This function can be removed once we stop supporting Go 1.11
// on MacOS.
func FileSync(f *os.File) error {return f.Sync()}

The second one for now simply makes the corresponding syscall.

file_sync_darwin.go

// +build darwin,!go1.12

package y

import (
	"os"
	"syscall"
)

func FileSync(f *os.File) error {
	_, _, err := syscall.Syscall(syscall.SYS_FCNTL, f.Fd(), syscall.F_FULLFSYNC, 0)
	if err == 0 {
		return nil
	}
	return err
}

Making this function do the right thing seems to require interacting with syscall package in ways it seems it's not designed to (according to this change). Before calling the syscall.Syscall(syscall.SYS_FCNTL, ...) we need to first call incref which is not exported.

Could you, at least, provide a clear way of how to implement this function correctly?
Is my code correct?
I'm afraid I'll hack something that works but eventually will figure out something is missing.

Thanks

UPDATE: I already found a way my code was wrong, added the check to see whether errno is zero.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 19, 2019

@campoy I understand your frustration, but there was no good solution here. Making this change has led to multiple bug reports about significant performance changes, changes due to now using the correct behavior. It would have been inappropriate to make a change that led to multiple bug reports in a minor release.

The code you show above looks correct and fine for a file on local disk. I'm not sure there is a fully correct to call fsync for an NFS file in Go 1.11 on Darwin. The incref is not an issue because you are calling the Fd method.

@campoy

This comment has been minimized.

Copy link
Contributor

@campoy campoy commented Jun 21, 2019

Thanks for the confirmation, @ianlancetaylor!

I think that just keeping my code in this issue should be enough to help others wondering how to fix it on their own codebases.

The PR fixing this issue for badger is dgraph-io/badger#871.

mostynb added a commit to mostynb/bazel-remote that referenced this issue Dec 19, 2019
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
mostynb added a commit to mostynb/bazel-remote that referenced this issue Dec 30, 2019
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
mostynb added a commit to mostynb/bazel-remote that referenced this issue Jan 1, 2020
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.