Don't hold on to channels after they've been closed #1577
In my use case, this is responsible for O(100MiB) of memory usage, because we hold on to multiple copies of time zone data, and each file has a 65KiB buffer that doesn't get released.
The file io.js provides the same runtime as the C runtime from [2]. There's a global array of all existing channels, and this never gets released even if a channel is closed. In the C runtime, this does eventually happen in a call to [caml_finalize_channel], which is called by the GC, but I don't think we have that ability in js_of_ocaml.
This feature makes [chanid] an object rather than an integer, so we can let the JavaScript garbage collector take care of it.
The object [caml_ml_channels] continues to exist, but its only remaining function is to provide "override" functionality in the way that ppx_expect needs it.
Some useful references:
[1]: https://v2.ocaml.org/releases/5.1/api/Stdlib.html#2_Outputfunctionsonstandardoutput
[2]: https://github.com/ocaml/ocaml/blob/trunk/runtime/io.c#L163
[3]: https://github.com/janestreet/ppx_expect/blob/master/runtime/runtime.js
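As a rough illustration of the description above, the following sketch contrasts integer channel ids stored in a global array with channel ids that are plain objects. It is a simplified model, not the actual io.js code: `channelsById`, `openWithIntegerId`, `openWithObjectId` and the close helpers are hypothetical names, and the buffer size is only meant to echo the 65KiB figure.

```javascript
// Hypothetical, simplified model of a channel table; not the real io.js runtime.
const BUFFER_SIZE = 65536; // echoes the per-channel buffer size mentioned above

// Integer ids: every channel is stored in a global array that is never pruned.
const channelsById = [];

function openWithIntegerId(fd) {
  const chan = { fd, opened: true, buffer: new Uint8Array(BUFFER_SIZE) };
  channelsById[fd] = chan;
  return fd; // the id is just a number, so the array entry must stay alive
}

function closeWithIntegerId(id) {
  // The entry (and its buffer) remains reachable from the global array.
  channelsById[id].opened = false;
}

// Object ids: the id is the channel object itself, with no global registration.
function openWithObjectId(fd) {
  return { fd, opened: true, buffer: new Uint8Array(BUFFER_SIZE) };
}

function closeWithObjectId(chan) {
  // Once the caller drops its reference, the garbage collector can reclaim
  // the object and its buffer.
  chan.opened = false;
}
```

In the first variant the buffer stays reachable after close; in the second, nothing but the caller's own reference keeps it alive.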
|
The tests are currently broken. Is this expected? Do we need a modified version of [...]? There are at least two other packages using similar tricks
I'm not convinced by the override logic. We should be able to solve the leak using a WeakMap instead of an array for |
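One possible reading of the WeakMap suggestion is sketched below. It assumes the channel id is already an object, since a WeakMap does not accept numbers as keys; `channelState`, `openChannel`, `closeChannel` and `isOpen` are hypothetical names and not part of the js_of_ocaml runtime.

```javascript
// Hypothetical sketch: channel state keyed by an object id through a WeakMap.
const channelState = new WeakMap();

function openChannel(fd) {
  const chanid = { fd }; // must be an object; a number key would throw a TypeError
  channelState.set(chanid, {
    fd,
    opened: true,
    buffer: new Uint8Array(65536),
  });
  return chanid;
}

function closeChannel(chanid) {
  const chan = channelState.get(chanid);
  if (chan) chan.opened = false;
}

function isOpen(chanid) {
  const chan = channelState.get(chanid);
  return chan !== undefined && chan.opened;
}
```

Because the map holds its keys weakly, the state entry becomes collectible once no other reference to `chanid` remains, with no explicit removal on close.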
|
I've worked on an alternative in #1578. Let me know what you think. |
|
The CI failure seems to be due to a piece of code in |
chan.opened = false;
caml_sys_close(chan.fd)
caml_sys_close(chan.fd);
chanid.close();
What happens if you close an "overridden" channel twice?
Don't you end up closing both the override and the original channel?
Good catch, and yes that's what would happen. For example, if you close stderr when you're in the middle of executing an expect test, then the first call will close the temporary output channel instead. Any second call would close stderr itself.
We're stuck with the (non-garbage-collectible) string keys here, so this is not trivial to fix.
This override logic is only for ppx_expect, and arguably it's pretty poorly defined what closing the channel is supposed to do. But let me see if we can improve the compatibility even more.
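The double-close behavior discussed in this thread can be reproduced with a small model. This is only a sketch under assumptions: the override table is keyed by the stringified file descriptor, and a closed override is skipped on lookup; `overrides`, `makeChannel`, `resolve` and `close` are hypothetical names rather than the runtime's own.

```javascript
// Hypothetical model of override lookup; illustrative names only.
const overrides = {}; // assumed to be keyed by String(fd)

function makeChannel(name, fd) {
  return { name, fd, opened: true };
}

// Return the override for this channel if one is installed and still open.
function resolve(chan) {
  const override = overrides[String(chan.fd)];
  return override && override.opened ? override : chan;
}

// Close whichever channel the lookup resolves to and report its name.
function close(chan) {
  const target = resolve(chan);
  target.opened = false;
  return target.name;
}

// Usage: stderr is temporarily redirected, as during an expect test.
const stderrChan = makeChannel("stderr", 2);
const tempChan = makeChannel("temp", 7);
overrides["2"] = tempChan;

const firstClosed = close(stderrChan);  // closes the temporary channel
const secondClosed = close(stderrChan); // falls through to stderr itself
```

Under these assumptions, two calls to `close` on the same id end up closing two different channels, which is the situation raised in the review comment.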
|
Thanks for engaging with this, @hhugo. It looks like, by working on #1578, you discovered the same constraints that I did. The only way to fix the memory leak is to replace the integer ids with something garbage collectible; objects or symbols both work. That still leaves some design freedom:
Now all of these things need changing. It's up to you to decide how you manage releases together with your reverse dependencies. But this pull request gives you the option of fixing the issue without breaking your reverse dependencies. I think it's the best we can do under that constraint. You could still follow up afterwards: create an actual API for overrides, migrate the reverse dependencies to it, and remove
|
I'll look into the test failures; it's embarrassing but I'm having a hard time running this in its own tree because "vanilla" dune is being shadowed by a company-specific build. I'm sure I'll be able to sort that out. |
|
The CI failure comes from
function log(x) {
  var s = caml_string_of_jsbytes(x + "\n");
  caml_ml_output(2, s, 0, caml_ml_string_length(s));
}
I like that approach because it would bring the behavior closer to native. Would you be interested in working in that direction? Another approach to mitigate the leak in the very short term, with a minimal diff, would be to destroy the buffer on close.
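A minimal sketch of the "destroy the buffer on close" mitigation might look as follows. The field names `opened`, `buffer` and `buffer_curr` follow the snippets quoted in this thread, but `openChannel` and `closeChannel` are hypothetical helpers, and the real change may differ.

```javascript
// Hypothetical helpers; only the field names come from the discussion.
function openChannel(fd) {
  return {
    fd,
    opened: true,
    buffer: new Uint8Array(65536),
    buffer_curr: 0,
  };
}

function closeChannel(chan) {
  if (!chan.opened) return; // closing twice is a no-op in this sketch
  chan.opened = false;
  // Replace the large buffer with an empty one so the memory can be
  // reclaimed even if the channel record itself stays registered.
  chan.buffer = new Uint8Array(0);
  chan.buffer_curr = 0;
}
```

This leaves the small channel record in place, so it reduces the size of the leak rather than removing it.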
see #1581 |
Can you elaborate a bit on why one needs to move the offset?
In the C runtime,
channel->offset = lseek(fd, 0, SEEK_CUR);
and from then on we keep it in sync using
written = caml_write_fd(channel->fd, channel->flags,
                        channel->buff, towrite);
channel->offset += written;
In the JS runtime,
chan.file.write(chan.offset, chan.buffer, 0, chan.buffer_curr);
chan.offset += chan.buffer_curr;
To be honest, the C runtime seems a bit brittle when it comes to swapping out file descriptors, but evidently it works in practice. In the JS runtime we'd have to override the
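To make the offset bookkeeping concrete, here is a small sketch of a flush that writes at an explicit offset and then advances it, mirroring the two JS lines quoted above. `MemFile` is a hypothetical in-memory stand-in for `chan.file`, and `output` and `flush` are simplified helpers, not the runtime's functions.

```javascript
// Hypothetical in-memory file with a positional write, standing in for chan.file.
class MemFile {
  constructor() {
    this.data = new Uint8Array(0);
  }
  write(offset, buf, pos, len) {
    const end = offset + len;
    if (end > this.data.length) {
      const grown = new Uint8Array(end); // zero-filled up to the new end
      grown.set(this.data);
      this.data = grown;
    }
    this.data.set(buf.subarray(pos, pos + len), offset);
  }
}

// Append bytes to the channel's buffer (no overflow handling in this sketch).
function output(chan, bytes) {
  chan.buffer.set(bytes, chan.buffer_curr);
  chan.buffer_curr += bytes.length;
}

// Write the buffered bytes at the channel's offset, then advance the offset.
function flush(chan) {
  chan.file.write(chan.offset, chan.buffer, 0, chan.buffer_curr);
  chan.offset += chan.buffer_curr;
  chan.buffer_curr = 0;
}

// Usage: two flushes keep the offset in sync with what was written.
const chan = { file: new MemFile(), offset: 0, buffer: new Uint8Array(16), buffer_curr: 0 };
output(chan, [1, 2, 3]);
flush(chan);
output(chan, [4, 5]);
flush(chan);

// Swapping the underlying file without touching the offset makes the next
// write land at position 5 of the new, empty file.
const original = chan.file;
chan.file = new MemFile();
output(chan, [9]);
flush(chan);
```

Since the offset lives in the channel rather than in the file, swapping the file underneath a channel leaves a stale offset behind, which appears to be the reason the offset has to be moved as well.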
|
replaced by #1578 |