
Conversation

@tkluck-janestreet

In my use case, this issue is responsible for O(100MiB) of memory usage, because I'm reading one file per named time zone, and each file has a 65KiB buffer that doesn't get released.

The file io.js provides the same runtime as the C runtime from [2]. There's a global array of all existing channels, and this never gets released even if a channel is closed. In the C runtime, this does eventually happen in a call to [caml_finalize_channel], which is called by the GC, but I don't think we have that ability in js_of_ocaml.

This feature makes [chanid] an object rather than an integer, so we can let the javascript garbage collector take care of it.

The object [caml_ml_channels] continues to exist, but its only remaining function is to provide "override" functionality in the way that ppx_expect needs it.

Status:

  • get feedback from js_of_ocaml developers
  • when everyone is happy with the direction, translate to an older/more compatible version of javascript (avoiding class)
  • merge

Some useful references:

[1]: https://v2.ocaml.org/releases/5.1/api/Stdlib.html#2_Outputfunctionsonstandardoutput
[2]: https://github.com/ocaml/ocaml/blob/trunk/runtime/io.c#L163
[3]: https://github.com/janestreet/ppx_expect/blob/master/runtime/runtime.js
@hhugo
Member

hhugo commented Mar 2, 2024

The tests are currently broken. Is this expected? Do we need a modified version of ppx_expect?

There are at least two other packages using similar tricks

I'm not convinced by the override logic. We should be able to solve the leak using a WeakMap instead of an array for caml_ml_channels. We just need to track the opened channels somehow.
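
A minimal sketch of the WeakMap idea, with invented names rather than the actual js_of_ocaml runtime API: channels are keyed by an opaque handle object, so once the program drops its last reference to the handle, the JS garbage collector can reclaim both the channel and its buffer.

```javascript
// Illustrative sketch only, not the real js_of_ocaml runtime.
// Channels live in a WeakMap keyed by an opaque handle object;
// dropping the handle makes the whole entry collectible.
const channels = new WeakMap();

function open_channel(fd) {
  const handle = {}; // opaque, garbage-collectible key
  channels.set(handle, {
    fd: fd,
    opened: true,
    buffer: new Uint8Array(65536), // the 64 KiB buffer at issue
  });
  return handle;
}

function close_channel(handle) {
  const chan = channels.get(handle);
  if (chan && chan.opened) {
    chan.opened = false;
    chan.buffer = null; // also release the buffer eagerly
  }
}
```

As the thread notes below, the remaining design question is how to "track the opened channels somehow", since a WeakMap cannot be enumerated.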

@hhugo
Member

hhugo commented Mar 3, 2024

I've worked on an alternative in #1578. Let me know what you think.

@hhugo
Member

hhugo commented Mar 3, 2024

The CI failure seems to be due to a piece of code in parsing.js assuming that a channel is an integer.

  chan.opened = false;
  caml_sys_close(chan.fd);
  chanid.close();
Member


What happens if you close an "overridden" channel twice?
Don't you end up closing both the override and the original channel?

Author


Good catch, and yes that's what would happen. For example, if you close stderr when you're in the middle of executing an expect test, then the first call will close the temporary output channel instead. Any second call would close stderr itself.

We're stuck with the (non-garbage-collectible) string keys here, so this is not trivial to fix.

This override logic is only for ppx_expect, and arguably it's pretty poorly defined what closing the channel is supposed to do. But let me see if we can improve the compatibility even more.
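
The double-close hazard described above can be sketched with hypothetical names (this is not the real runtime code, just a reproduction of the behavior): the first close removes the override, so a second close falls through to the original channel.

```javascript
// Hypothetical sketch of a string-keyed override lookup. It
// reproduces the bug: closing twice closes both the override
// and the original channel.
const overrides = {}; // string keys, so entries are never collected

function close_by_name(name, original) {
  const chan = overrides[name] !== undefined ? overrides[name] : original;
  chan.opened = false;
  delete overrides[name]; // the first close also drops the override
}

const stderr = { opened: true };
const tmp = { opened: true }; // expect-test redirection target
overrides["stderr"] = tmp;

close_by_name("stderr", stderr); // closes tmp, drops the override
close_by_name("stderr", stderr); // now closes the real stderr
```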

@tkluck-janestreet
Author

Thanks for engaging with this, @hhugo. It looks like, by working on #1578, you discovered the same constraints that I did.

The only way to fix the memory leak is replacing the integer ids by something garbage-collectible; objects or symbols both work. That still leaves some design freedom:

  • We could remove the concept of chanid entirely and just return the channel object as a handle to itself. Overrides can be represented by a new property on the channel object.

  • But I'd actually be in favour of leaving chanid and chan as separate objects so it's easier to understand where the override applies. Your solution with WeakMap and Symbol works, and is equivalent to returning the object that's stored as the value in the WeakMap except for opaqueness.

  • Alternatively, you could swap out the fd; that's what the C runtime does. But that would need moving the offset property to the file descriptor object as well.

Now, all of these options require changes to ppx_expect. (It took me embarrassingly long to realize that WeakMap doesn't use a[b] indexing but uses .get and .set instead, and that it's therefore backwards incompatible with how ppx_expect uses it.)
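
For reference, a minimal demonstration of that incompatibility: WeakMap entries are reachable only through .get/.set, while bracket indexing silently creates an ordinary property on the WeakMap object itself.

```javascript
// WeakMap has no bracket indexing: wm[key] coerces the key to the
// string "[object Object]" and sets an ordinary property on the
// WeakMap object, instead of storing a weak entry.
const wm = new WeakMap();
const key = {};

wm.set(key, "via set");  // a real (weak) entry
wm[key] = "via bracket"; // an ordinary property, NOT a weak entry

// The two stores don't see each other:
// wm.get(key)           is "via set"
// wm["[object Object]"] is "via bracket"
```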

It's up to you to decide how you manage releases together with your reverse dependencies. But this pull request gives you the option of fixing the issue without breaking your reverse dependencies. I think it's the best we can do under that constraint.

You could still follow up afterwards: create an actual API for overrides, migrating the reverse dependencies to it, and remove caml_ml_channels entirely. I expect that that's a slower process and the memory leak remains in the meantime. That might be a reason for merging this pull request.

@tkluck-janestreet
Author

I'll look into the test failures; it's embarrassing but I'm having a hard time running this in its own tree because "vanilla" dune is being shadowed by a company-specific build. I'm sure I'll be able to sort that out.

@hhugo
Member

hhugo commented Mar 4, 2024

The CI failure comes from parsing.js: caml_ml_output takes a "chanid" but is called with the integer 2 in the following code snippet.

  function log(x) {
    var s = caml_string_of_jsbytes(x + "\n");
    caml_ml_output(2, s, 0, caml_ml_string_length(s));
  }

@hhugo
Member

hhugo commented Mar 4, 2024

> Alternatively, you could swap out the fd; that's what the C runtime does. But that would need moving the offset property to the file descriptor object as well.

I like that approach because it would bring the behavior closer to native. Would you be interested in working in that direction?

Another approach to mitigate the leak in the very short term with minimal diff would be to destroy the buffer on close.

@hhugo
Member

hhugo commented Mar 4, 2024

> Another approach to mitigate the leak in the very short term with minimal diff would be to destroy the buffer on close.

see #1581

@hhugo
Member

hhugo commented Mar 4, 2024

> Alternatively, you could swap out the fd; that's what the C runtime does. But that would need moving the offset property to the file descriptor object as well.

Can you elaborate a bit on why one needs to move the offset?

@tkluck-janestreet
Author

> Can you elaborate a bit on why one needs to move the offset?

In the C runtime, channel->offset tracks the underlying file descriptor's offset: it's initialized using

  channel->offset = lseek(fd, 0, SEEK_CUR);

and from then on we keep it in sync using

  written = caml_write_fd(channel->fd, channel->flags,
                          channel->buff, towrite);
  channel->offset += written;

In the JS runtime, chan.offset is actually the source of truth, and e.g. the write function takes the offset as a parameter:

  chan.file.write(chan.offset, chan.buffer, 0, chan.buffer_curr);
  chan.offset += chan.buffer_curr;

To be honest, the C runtime seems a bit brittle when it comes to swapping out file descriptors, but evidently it works in practice.

In the JS runtime we'd have to override the offset together with the fd to get the correct behavior. Alternatively, we'd have to move offset to chan.file.
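
A hedged sketch of the second option, with invented field and function names: if the offset lives on the file object rather than on the channel, then swapping chan.file swaps the position too, mirroring what the C runtime gets by swapping the fd.

```javascript
// Hypothetical shapes, for illustration only: the position
// travels with the file, so the channel keeps no offset of its own.
function make_file() {
  return {
    bytes: [],
    offset: 0,
    write(buf, len) {
      // write buf[0..len) at this.offset, then advance
      for (let i = 0; i < len; i++) this.bytes.push(buf[i]);
      this.offset += len;
    },
  };
}

function swap_file(chan, new_file) {
  const old = chan.file;
  chan.file = new_file; // no channel-side offset to fix up
  return old;
}
```

With this shape, an override only needs to replace chan.file; restoring the old file also restores the old position.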

@hhugo
Member

hhugo commented Apr 11, 2024

replaced by #1578
