Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: File implicitly closed when garbage-collected due to runtime finalizer but this behavior is not documented #41505

Closed
PureWhiteWu opened this issue Sep 20, 2020 · 19 comments

Comments

@PureWhiteWu
Copy link
Contributor

@PureWhiteWu PureWhiteWu commented Sep 20, 2020

What version of Go are you using (go version)?

$ go version
go 1.15.2

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Linux and macOS.

What did you do?

Code: https://play.golang.org/p/YmHIIc7sha4

Note: This does not reproduce in go playground, but can reproduce on my mac/linux.

What did you expect to see?

No panic.

What did you see instead?

panic: bad file descriptor

Reason

I found that this is because there's an implicitly close call when the file object is garbage-collected.

In os/file_unix.go:

func newFile(fd uintptr, name string, kind newFileKind) *File {
	...

	runtime.SetFinalizer(f.file, (*file).close)
	return f
}

What do I expect

I know there's thousands of method to avoid being garbage-collected such as runtime.KeepAlive, but I think this is unreasonable, either this behaviour should be removed or the file object should not be garbage-collected.

If a user forgets to call file.Close, the memory should be leaking.

Also, this behaviour isn't documented anywhere.

@odeke-em odeke-em changed the title os: File implicitly closed when garbage-collected os: File implicitly closed when garbage-collected due to runtime finalizer but this behavior is not documented Sep 20, 2020
@odeke-em
Copy link
Member

@odeke-em odeke-em commented Sep 20, 2020

Thank you for filing this issue @PureWhiteWu!

Unfortunately we can’t remove this behavior, but perhaps we can document it. I worry thought that documenting that files have finalizers might encourage callous behavior in which folks just rely on garbage collection to sort out for them file close problems which will cause undeterministic leaks and bugs. I understand that in between invoking syscall.Write(fd, payload) and the end, a garbage collection cycle might run, but unfortunately that program has a leak that then depends on undefined behavior. How often have you seen such code?

@PureWhiteWu
Copy link
Contributor Author

@PureWhiteWu PureWhiteWu commented Sep 20, 2020

@odeke-em Thanks for your reply!

This is a bug we encountered in our real-world program, in which we are doing something like this(simplified version):

// open the file
file, err := os.OpenFile("x", os.O_RDWR, 0666)
...
// get its fd
fd := int(file.Fd())
// do some work using syscall directly forever
for {
    data := getData()
    syscall.Write(fd, data)
    ....
}
syscall.Close(fd)

And we got -9 (bad fd) after some time.

This bug is hard to debug, so I think something should be changed to prevent this.

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Sep 20, 2020

Gotcha, I understand and thank you for the citation. For some remedies:
a) defer file.Close()
b) runtime.KeepAlive(file) although that would mask the problem with the missing file.Close()

Some docs updates to state that a file might be garbage collected perhaps could have helped you in this case then, if we stated the need to keep the file alive for the course of usage or to ensure it is closed, but the finalizer is an implementation/system specific detail. It is definitely inconvenient to try to go down many rabbit holes trying to figure out why a bad fd error occurred.

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Sep 20, 2020

@cespare
Copy link
Contributor

@cespare cespare commented Sep 20, 2020

Also, this behaviour isn't documented anywhere.

It is documented. The os.File.Fd documentation says, in part,

The file descriptor is valid only until f.Close is called or f is garbage collected.

It sounds like you already understand the problem and the workarounds from your original description.

This bug is hard to debug

Perhaps, though the crash seems fairly hard to miss.

In any case, if you're using direct syscalls to write files rather than the usual mechanisms, you should be prepared for some amount of tricky debugging.

@PureWhiteWu
Copy link
Contributor Author

@PureWhiteWu PureWhiteWu commented Sep 20, 2020

@cespare Thanks very much, I missed this part.

Perhaps, though the crash seems fairly hard to miss.

In some cases, the for loop may quit before gc, so this may occur occasionally. I think this is really hard to debug for users, because users are not expected to understand how gc works and when will gc execute.

@odeke-em Thanks for your reply, I think defer file.Close() makes sense in this situation.

But maybe we can make things more clearly, such as adding an extra line:

// File represents an open file descriptor.
// Note: File.Close is automatically called if it hasn't been when it is garbage collected. 
type File struct {
	*file // os specific
}
@beoran
Copy link

@beoran beoran commented Sep 20, 2020

Related issue: #34810

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 21, 2020

The problem here is the call of the Fd method, and the Fd method is the place to document any possible issues with the finalizer. As @cespare points out, the Fd docs already say "The file descriptor is valid only until f.Close is called or f is garbage collected." Can we make that clearer? More prominent?

I do not think we should add any documentation to os.File because there is no problem with ordinary use of os.File. The problem only arises for programs that call the Fd method, which is a rare case. Most programs that need to all Fd should instead be using syscall.Open or the SyscallConn method. Perhaps that we what we should say in the Fd docs.

@andrius4669
Copy link
Contributor

@andrius4669 andrius4669 commented Sep 21, 2020

Can we make that clearer?

Perhaps runtime.KeepAlive(f) could be suggested there, to prevent f from being garbage collected.
Currently "or f is garbage collected" doesn't look important enough because many don't know when GC can hit them (importantly, it can get GC'd even before f goes out of scope).
On the other hand if f.Close() or any other method gets called it'd be kept alive too, some maybe these should be documented as well to not cause needless use of runtime.KeepAlive(f).
Maybe one more sentence like "Method call like f.Close() or runtime.KeepAlive(f) can be used to prevent unexpected closure of file descriptor".
I agree with suggestion to use syscall.Open instead too.
Either way documentation could be expanded there, and probably should, but that's just my 2c.

@beoran
Copy link

@beoran beoran commented Sep 22, 2020

I think that the File.Fd method should be deprecated in favor of RawConn.Control and perhaps a new RawConn.MultiControl. We cannot remove File.Fd due to the Go 1 promise, but it is very hard to use correctly. It is what I would call a "foot gun".

@changkun
Copy link
Contributor

@changkun changkun commented Sep 23, 2020

@ianlancetaylor Why this is a NeedsFix? As you said, is there anything really need a fix? How is the FD doc not clear enough on its usage? After reading this thread I get the message that the "bug" was a misuse of FD, other than an issue of the document since it is the caller's responsibility to keep the file open, such as @odeke-em suggested use defer f.Close() other than use syscall.Close.

@andrius4669
Copy link
Contributor

@andrius4669 andrius4669 commented Sep 23, 2020

I think that the File.Fd method should be deprecated in favor of RawConn.Control and perhaps a new RawConn.MultiControl.

Perhaps. Should probably be separate issue though.

but it is very hard to use correctly. It is what I would call a "foot gun".

It's here to stay so might as well minimize damage by documenting it.

How is the FD doc not clear enough on its usage?

Apparently it wasn't if user still made mistake after reading it. And took their time to write up issue in bug tracker (how many more users have done this mistake? how many fixed it for themselves and didn't make an issue?).

After reading this thread I get the message that the "bug" was a misuse of FD, other than an issue of the document since it is the caller's responsibility to keep the file open, such as @odeke-em suggested use defer f.Close() other than use syscall.Close.

Yes. But documentation doesn't suggest using f.Close(). And runtime.KeepAlive() combined with syscall.Close wouldn't have been disastrous either. But none of them were suggested.
I know that it's not a place to explain whole GC workings inside one function documentation, but IMO documenting common footgun (GC closing fd) and common workarounds (proper usage of f.Close() or runtime.KeepAlive()) would be valuable and would save people some time debugging.
If this function were discouraged in the future, it would also be good explanation, why it is (namely, this footgun).

@changkun
Copy link
Contributor

@changkun changkun commented Sep 23, 2020

(how many more users have done this mistake? how many fixed it for themselves and didn't make an issue?).

Hmm. I think people who made such a mistake, solved, and stay silent thought it was their mistake other than an issue of the document, meaning the doc is clear on documenting its behavior.

A meta issue regarding the misuse of a (potential) problematic APIs (such as Timer.Stop) are neither removed from standard library nor redesigned because of the Go 1 promise, could add comments like // Deprecated, // Abandoned. But this should be discussed in a separate issue.

But documentation doesn't suggest using f.Close().

I don't see a reason the document needs explicitly suggest using f.Close()? If a file is opened using os.Open, is there any good reason to seek far and neglect what lies close at hand, i.e. using syscall.Close other than a better choice of f.Close? Especially this pattern has been primarily advertised over tutorials.

f, err := os.Open("foo")
if err != nil { ... }
defer f.Close()
@networkimprov
Copy link

@networkimprov networkimprov commented Sep 23, 2020

Why spend time arguing against documentation enhancement? Amending the docs is no more costly :-)

@andrius4669
Copy link
Contributor

@andrius4669 andrius4669 commented Sep 23, 2020

If a file is opened using os.Open, is there any good reason to seek far and neglect what lies close at hand, i.e. using syscall.Close other than a better choice of f.Close?

I believe that issue is that some people consider f.Fd() method as "disown", therefore they continue using only returned fd only, and in this case calling syscall.Close() sorta makes sense. Dealing with only single variable instead of 2 is probably easier.
Yeah, it isn't very good reason to use it like that because it's wrong (fd isn't disowned), but if they overlook that it's wrong, it's convenient shortcut to make and forget f.

They probably miss "or f is garbage collected" because frankly it's short and looks insignificant since var won't even go out of scope. If at least one more sentence were to be added it probably would bring a bit more attention to possible mistake (they would at least check if they do f.Close() properly, or runtime.KeepAlive() maybe), if they know that there may be unexpected dragons if they don't.

We could also add suggestion to prefer using syscall.Open() or f.SyscallConn() depending on usage scenario instead.

Is there reason to not do anything? I don't think adding a bit more info to known footgun method would hurt anyone.

Why spend time arguing against documentation enhancement? Amending the docs is no more costly :-)

Indeed. Maybe I should just try making PR at this point.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 23, 2020

@changkun I marked this as needs-fix because I think the documentation can be improved. I think the docs should explicitly point at f.SyscallConn as an alternative, and I think the docs should emphasize that when f is garbage collected the descriptor will be closed rather than leaving that at the end of the sentence, and I think docs should explicitly point to runtime.SetFinalizer which carries all sorts of warnings about finalizers.

@PureWhiteWu
Copy link
Contributor Author

@PureWhiteWu PureWhiteWu commented Sep 24, 2020

First, thank you all for involving in this issue!
I want to share what the really experience was when I made this mistake:

I usually use Goland for coding, so when I was writing this code, I just saw the Fd method through the code hint and used it directly, without reading the Fd comment carefully, and assumed that if I didn't close it, Fd would always be alive, based on my C experience.
When the problem occured, I didn't read Fd's comments, but instead I read the comments for OpenFile and the File struct, because I thought this behavior should have been explained at the whole struct level, I couldn't find it at that time (and ended up looking at the code to confirm).
Based on @cespare's reply, I found that indeed this behavior is stated in the Fd comment, and that it was indeed my misuse that caused it. But I also have to say that this comment is easy to miss (getting Fd is such a common operation that I'm sure there are very few people who would read the comment carefully), at least for a rookie like me missed it.
So I wonder if we could consider adding some annotations to explicitly point out the problem, since it's counter-intuitive and counter-empirical, and it's a bit harder to troubleshoot.
We are familiar with the GC and the source code of Go, so maybe it's not so difficult for us to find out the problem. But I don't think we should require users to be familiar with either GC or the source code before they can use Go to write code.

Thank you for reading this.

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 24, 2020

Change https://golang.org/cl/256899 mentions this issue: os: document and emphasize a potential misuse of File.Fd

@gopherbot gopherbot closed this in 7bb6fed Sep 27, 2020
@andrius4669
Copy link
Contributor

@andrius4669 andrius4669 commented Sep 27, 2020

actual comment has SyscallCon not SyscallConn (typo) (already mentioned by someone on gerrit)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
9 participants
You can’t perform that action at this time.