Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What to do with when finalizing channels? #6902

Closed
vicuna opened this issue Jun 12, 2015 · 28 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Jun 12, 2015

Original bug ID: 6902
Reporter: @alainfrisch
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-02-16T14:15:04Z)
Resolution: fixed
Priority: high
Severity: major
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: runtime system and C interface
Monitored by: @gasche @dbuenzli

Bug description

Commit 15817 changes the C finalizer of channel values, so that the C
structure is kept around in case the channel requires a flush (the channel will be flush by the default at_exit hook when the program exits). A comment explains that flushing is not possible during the finalizer, because it could raise an exception, which is not allowed in this context. Before the commit, the finalizer would just forget about the channel, meaning that it would never be flushed.

Clearly, the new strategy can make "bad programs" behave better, but mostly in the context of short-lived programs. Long-lived programs (such as servers) do not really benefit from the new behavior, and worst, it can cause memory leaks (in addition to existing fd leaks).

  • We could try to flush the channel in the finalizer, and ignore the error (if any). It is unlikely that trying to flush again later would work in that case.

  • Is it really the case that not freeing the C structure can only happen for unflushed out channels? It seems the same fields are used also for input buffers. (I did not check.)

  • The current version makes it harder to spot channels which are not properly closed (because they are flushed, so we don't notice any more that the code is wrong). But even if flushed at exit time, this still eats memory and file descriptors until then. Shouldn't we instead somehow report channels values which are finalized before being closed (e.g. through a globally registered hook)? And what about closing the channels in the finalizer (in addition to flushing them and to reporting the condition with the hook mentioned above)?

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @alainfrisch

It's dangereous to close automatically, since several channels can exist on the same underlying file descriptor, and we don't keep track of that; this comes from Unix.*_channel_of_descr. One could still close automatically channels created with standard functions in Pervasives (keeping a flag in the channel structure).

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @alainfrisch

Confirmed, the code below will eat all your memory:

let () =
  while true do
    let ic = open_in_bin Sys.argv.(0) in
    let _c = input_char ic in
    close_in ic
  done

I increase the severity of the bug because of that. (Initially, I thought that only "wrong code" that doesn't flush enough could produce memory leaks.)

One should at least restrict the change (which disables freeing the structure in the finalizer) to out channels (which requires to keep the in/out info in a flag).

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @alainfrisch

A related question: currently, the buffer itself (65kB) is part of the C structure, which is released when the heap value is finalized (well, modulo fixing the current ticket). This means that one should be careful to tweak the ints passed to caml_alloc_custom in caml_alloc_channel so that the GC runs often enough. But typically, a channel is closed explicitly, and this would seem to be a good time to release the buffer memory. This would allow to run the GC much less often without using too much memory. This could be done easily if the buffer was malloced instead of inlined directly in the C structure.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: bobot

The channel C structure can also be inlined (with or without the buffer) inside the custom OCaml value, for simplifying the memory counting of the GC since it is not freed before the finalization.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @alainfrisch

The channel C structure can also be inlined (with or without the buffer) inside the custom OCaml value, for simplifying the memory counting of the GC since it is not freed before the finalization.

I thought about it, but it is not clear to me how one would then keep the linked list of all existing channels (to support the flush_all at exit time).

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: bobot

Oh, that's the first time I see this feature. It have been added for Cash, the Caml Shell!

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @alainfrisch

Unless someone objects to it, I suggest to undo commit 15817 for now, since never releasing in_channel buffers is clearly worse than not flushing automatically out channels which are not manually closed.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @damiendoligez

Unless someone objects to it, I suggest to undo commit 15817 for now

I agree with this suggestion, and I propose that we report channels values which are finalized before being closed by either outputting a message on stderr or even by aborting the program. I don't think it's enough to hope that someone will notice some truncated files.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @alainfrisch

and I propose that we report channels values which are finalized before being
closed

It seems one would need to remember how channels have been created, right? (a single Unix fd can be turned into multiple channels, and only one of them can/must be closed).

by either outputting a message on stderr or even by aborting the program

Aborting the program is a bit extreme,I would say.

Alternatively, one could automatically close (non shared) channels upon finalization (and flush out channels before closing them, of course). I tend to prefer reporting non-closed channels, though.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 12, 2015

Comment author: @damiendoligez

It seems one would need to remember how channels have been created, right? (a single Unix fd can be turned into multiple channels, and only one of them can/must be closed).

Then there is probably no solution, since that fd might be shared with some C code, so you'll never know which channels should have been closed...

Alternatively, one could automatically close (non shared) channels upon finalization (and flush out channels before closing them, of course). I tend to prefer reporting non-closed channels, though.

Automatically closing file descriptors is IMO a very bad idea because it's a side-effect that's visible from other processes. I'd much rather let badly-written programs have invisible fd leaks.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 13, 2015

Comment author: @alainfrisch

Then there is probably no solution

Well, at least one could do something for the common case (channels created by open_* functions in Pervasives).

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @alainfrisch

Automatically closing file descriptors is IMO a very bad idea because it's a side-effect that's visible from other processes.

Ths same argument holds for automatically flushing. So, the best one could do is to report some errors (e.g. on stderr) when a channel created by functions in Pervasives is finalized before being closed or flushed (for out channels only).

But one should then also probably revisit the idea of automatically flushing at exit time (which makes the semantics depend on when the GC runs), and instead report the same error as above.

For now, I prefer to simply to revert commit 15817, and keep this ticket open for further discussion.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: bobot

Do you revert it for 4.02.2?

revisit the idea of automatically flushing at exit time
That seems a good idea, a junior_job?

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @gasche

If both Damien and Alain worked on the problem and couldn't find a satisfying solution, maybe it's a bit tricky as a junior job.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @alainfrisch

Do you revert it for 4.02.2?

As far as I can say, the commit only went to trunk, not the 4.02 branch.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @alainfrisch

revisit the idea of automatically flushing at exit time
That seems a good idea, a junior_job?

I don't think there is any technical challenge: one just need to decide or not to stop flushing automatically at exit time any more. But this is a breaking change, so this cannot be done lightly.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @damiendoligez

This same argument holds for automatically flushing.

Not really. Flushing is just deferred execution of a write that was done by the program. If you close the FD you're just inventing a system call that the program never called.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @damiendoligez

to stop flushing automatically at exit time

Old versions of Caml Light did that, and it was very unfriendly to programmers (beginners and seasoned alike). Not all programs are long-lived servers, don't forget the read-process-write-exit kind.

I think the best solution we have so far is, flush, ignore errors (or report them to stderr), and keep the fd open. The grail of buffering is to delay some writes, but not change the behavior in any other way.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @alainfrisch

I think the best solution we have so far is, flush, ignore errors (or report them to stderr), and keep the fd open.

What about reporting non-closed channels at finalization time (for channels created with open_* functions in Pervasives) on stderr as well? (If one wanted to report something more useful for both unflushable channels and potentially non-closed ones, one could collect some stack context when the channel is created.)

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @lpw25

I'm not sure about reporting errors on stderr. Is there any other place which reports things to stderr and continues execution? It seems likely to produce libraries which write to stderr, which always seems a bad idea to me.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 15, 2015

Comment author: @damiendoligez

It seems likely to produce libraries which write to stderr, which always seems a bad idea to me.

Indeed, I don't like it much, for the same reason I don't like auto-closing FDs. That's why I also suggested the possibility of aborting the program.

It's hard to find a balance between reporting the problem and keeping the program's behavior undisturbed. In practice, maybe the FD leak is enough to make the programmer aware of the problem.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 16, 2015

Comment author: @alainfrisch

That's why I also suggested the possibility of aborting the program.

As said, I find this a bit extreme: it will cause programs that currently works fine to fail. Typically, a short-lived program can today open files and never close them. This is not very nice, but it works. This is basically the argument you gave against not flushing automatically at exit time:

Old versions of Caml Light did that, and it was very unfriendly to programmers (beginners and seasoned alike). Not all programs are long-lived servers, don't forget the read-process-write-exit kind.

(Of course, a program which closes fds properly does not require automatic flushing at exit time.)

maybe the FD leak is enough to make the programmer aware of the problem.

Not clear. If this happens in a library, the authors might not use the code in a long-lived program (but users of the library could). And unit tests are unlikely that catch the problem.

I'm not a big fan either of writing to stderr, but this seems less aggressive than aborting the program, and better than doing nothing. Perhaps these warnings on stderr could be enabled only when the user explicitly ask for them (by calling a new C primitive).

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 24, 2015

Comment author: @damiendoligez

Perhaps these warnings on stderr could be enabled only when the user explicitly ask for them (by calling a new C primitive).

Then nobody will enable them... Maybe we should use a primitive (or even an env variable) to silence the warning.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 24, 2015

Comment author: @alainfrisch

Then nobody will enable them... Maybe we should use a primitive (or even an env variable) to silence the warning.

Indeed, the default should probably be on.

An environment variable is perhaps overkill, this could be rather a new flag in OCAMLRUNPARAM. I think we must expose also a primitive anyway; or only the primitive: the client code could itself parse some env variable if needed).

Do see see other instances of "important warnings from the runtime" which could be treated in the same way? (If so, one should decide the level of granularity for controlling them.)

And do you have an opinion on the text of the warning? We could collect the top of the stacktrace when any new channel is created in order to give an idea of which code leaked the file descriptor, but this might be over-engineering...

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 26, 2015

Comment author: @damiendoligez

The text of the warning should give (or point to):

  1. an explanation of what's going on
  2. a way to silence the warning

I wouldn't go as far as a stack trace, but if we could store the file name (when available) that would probably be quite useful.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 26, 2015

Comment author: @alainfrisch

I didn't think about keeping the filename, but clearly, it's much simpler than the stacktrace and probably enough in practice.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 30, 2015

Comment author: @alainfrisch

I've implemented the proposal

#210

I think it would be coherent to stop flushing channels automatically at exit time (because this happens only if they haven't been collected before, and this is not predictable), and instead report the same warning if unclosed channels remain live when the program terminates. Damien: do you agree?

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 24, 2015

Comment author: @alainfrisch

Committed to trunk, revision 16245.

@vicuna vicuna closed this Feb 16, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.