
Conversation

@tkluck-janestreet

In my use case, this issue is responsible for O(100MiB) of memory usage, because I'm reading one file per named time zone, and each file has a 65KiB buffer that doesn't get released.

The file io.js provides the same runtime as the C runtime from [2]. There's a global array of all existing channels, and this never gets released even if a channel is closed. In the C runtime, this does eventually happen in a call to [caml_finalize_channel], which is called by the GC, but I don't think we have that ability in js_of_ocaml.

This feature makes [chanid] an object rather than an integer, so we can let the javascript garbage collector take care of it.

The object [caml_ml_channels] continues to exist, but its only remaining function is to provide "override" functionality in the way that ppx_expect needs it.

Status:

  • get feedback from js_of_ocaml developers
  • when everyone is happy with the direction, translate to an older/more compatible version of javascript (avoiding class)
  • merge

Some useful references:

[1]: https://v2.ocaml.org/releases/5.1/api/Stdlib.html#2_Outputfunctionsonstandardoutput
[2]: https://github.com/ocaml/ocaml/blob/trunk/runtime/io.c#L163
[3]: https://github.com/janestreet/ppx_expect/blob/master/runtime/runtime.js
@hhugo
Member

hhugo commented Mar 2, 2024

The tests are currently broken. Is this expected? Do we need a modified version of ppx_expect?

There are at least two other packages using similar tricks

I'm not convinced by the override logic. We should be able to solve the leak using a WeakMap instead of an array for caml_ml_channels. We just need to track the opened channels somehow.
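
A minimal sketch of the WeakMap idea, with invented names rather than the actual js_of_ocaml runtime API: channels are keyed by an opaque handle object, so once the program drops its last reference to the handle, the JS garbage collector can reclaim both the channel and its buffer.

```javascript
// Illustrative sketch only, not the real js_of_ocaml runtime.
// Channels live in a WeakMap keyed by an opaque handle object;
// dropping the handle makes the whole entry collectible.
const channels = new WeakMap();

function open_channel(fd) {
  const handle = {}; // opaque, garbage-collectible key
  channels.set(handle, {
    fd: fd,
    opened: true,
    buffer: new Uint8Array(65536), // the 64 KiB buffer at issue
  });
  return handle;
}

function close_channel(handle) {
  const chan = channels.get(handle);
  if (chan && chan.opened) {
    chan.opened = false;
    chan.buffer = null; // also release the buffer eagerly
  }
}
```

As the thread notes below, the remaining design question is how to "track the opened channels somehow", since a WeakMap cannot be enumerated.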

@hhugo
Member

hhugo commented Mar 3, 2024

I've worked on an alternative in #1578. Let me know what you think.

@hhugo
Member

hhugo commented Mar 3, 2024

The CI failure seems to be due to a piece of code in parsing.js assuming that a channel is an integer.

  chan.opened = false;
  caml_sys_close(chan.fd);
  chanid.close();
Member


What happens if you close an "overridden" channel twice?
Don't you end up closing both the override and the original channel?

Author


Good catch, and yes that's what would happen. For example, if you close stderr when you're in the middle of executing an expect test, then the first call will close the temporary output channel instead. Any second call would close stderr itself.

We're stuck with the (non-garbage-collectible) string keys here, so this is not trivial to fix.

This override logic is only for ppx_expect, and arguably it's pretty poorly defined what closing the channel is supposed to do. But let me see if we can improve the compatibility even more.
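
The double-close hazard described above can be sketched with hypothetical names (this is not the real runtime code, just a reproduction of the behavior): the first close removes the override, so a second close falls through to the original channel.

```javascript
// Hypothetical sketch of a string-keyed override lookup. It
// reproduces the bug: closing twice closes both the override
// and the original channel.
const overrides = {}; // string keys, so entries are never collected

function close_by_name(name, original) {
  const chan = overrides[name] !== undefined ? overrides[name] : original;
  chan.opened = false;
  delete overrides[name]; // the first close also drops the override
}

const stderr = { opened: true };
const tmp = { opened: true }; // expect-test redirection target
overrides["stderr"] = tmp;

close_by_name("stderr", stderr); // closes tmp, drops the override
close_by_name("stderr", stderr); // now closes the real stderr
```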

@tkluck-janestreet
Author

Thanks for engaging with this, @hhugo. It looks like, by working on #1578, you discovered the same constraints that I did.

The only way to fix the memory leak is replacing the integer ids by something garbage-collectible; objects or symbols both work. That still leaves some design freedom:

  • We could remove the concept of chanid entirely and just return the channel object as a handle to itself. Overrides can be represented by a new property on the channel object.

  • But I'd actually be in favour of leaving chanid and chan as separate objects so it's easier to understand where the override applies. Your solution with WeakMap and Symbol works, and is equivalent to returning the object that's stored as the value in the WeakMap except for opaqueness.

  • Alternatively, you could swap out the fd; that's what the C runtime does. But that would need moving the offset property to the file descriptor object as well.

Now, all of these options require changes to ppx_expect. (It took me embarrassingly long to realize that WeakMap doesn't use a[b] indexing but uses .get and .set instead, and that it's therefore backwards incompatible with how ppx_expect uses it.)
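
For reference, a minimal demonstration of that incompatibility: WeakMap entries are reachable only through .get/.set, while bracket indexing silently creates an ordinary property on the WeakMap object itself.

```javascript
// WeakMap has no bracket indexing: wm[key] coerces the key to the
// string "[object Object]" and sets an ordinary property on the
// WeakMap object, instead of storing a weak entry.
const wm = new WeakMap();
const key = {};

wm.set(key, "via set");  // a real (weak) entry
wm[key] = "via bracket"; // an ordinary property, NOT a weak entry

// The two stores don't see each other:
// wm.get(key)           is "via set"
// wm["[object Object]"] is "via bracket"
```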

It's up to you to decide how you manage releases together with your reverse dependencies. But this pull request gives you the option of fixing the issue without breaking your reverse dependencies. I think it's the best we can do under that constraint.

You could still follow up afterwards: create an actual API for overrides, migrating the reverse dependencies to it, and remove caml_ml_channels entirely. I expect that that's a slower process and the memory leak remains in the meantime. That might be a reason for merging this pull request.

@tkluck-janestreet
Author

I'll look into the test failures; it's embarrassing but I'm having a hard time running this in its own tree because "vanilla" dune is being shadowed by a company-specific build. I'm sure I'll be able to sort that out.

@hhugo
Member

hhugo commented Mar 4, 2024

The CI failure comes from parsing.js: caml_ml_output takes a "chanid" but is called with the integer 2 in the following code snippet.

  function log(x) {
    var s = caml_string_of_jsbytes(x + "\n");
    caml_ml_output(2, s, 0, caml_ml_string_length(s));
  }

@hhugo
Member

hhugo commented Mar 4, 2024

> Alternatively, you could swap out the fd; that's what the C runtime does. But that would need moving the offset property to the file descriptor object as well.

I like that approach because it would bring the behavior closer to native. Would you be interested in working in that direction?

Another approach to mitigate the leak in the very short term with minimal diff would be to destroy the buffer on close.

@hhugo
Member

hhugo commented Mar 4, 2024

> Another approach to mitigate the leak in the very short term with minimal diff would be to destroy the buffer on close.

see #1581

@hhugo
Member

hhugo commented Mar 4, 2024

> Alternatively, you could swap out the fd; that's what the C runtime does. But that would need moving the offset property to the file descriptor object as well.

Can you elaborate a bit on why one needs to move the offset?

@tkluck-janestreet
Author

> Can you elaborate a bit on why one needs to move the offset?

In the C runtime, channel->offset tracks the underlying file descriptor's offset: it's initialized using

  channel->offset = lseek(fd, 0, SEEK_CUR);

and from then on we keep it in sync using

  written = caml_write_fd(channel->fd, channel->flags,
                          channel->buff, towrite);
  channel->offset += written;

In the JS runtime, chan.offset is actually the source of truth, and e.g. the write function takes the offset as a parameter:

  chan.file.write(chan.offset, chan.buffer, 0, chan.buffer_curr);
  chan.offset += chan.buffer_curr;

To be honest, the C runtime seems a bit brittle when it comes to swapping out file descriptors, but evidently it works in practice.

In the JS runtime we'd have to override the offset together with the fd to get the correct behavior. Alternatively, we'd have to move offset to chan.file.
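
A hedged sketch of the second option, with invented field and function names: if the offset lives on the file object rather than on the channel, then swapping chan.file swaps the position too, mirroring what the C runtime gets by swapping the fd.

```javascript
// Hypothetical shapes, for illustration only: the position
// travels with the file, so the channel keeps no offset of its own.
function make_file() {
  return {
    bytes: [],
    offset: 0,
    write(buf, len) {
      // write buf[0..len) at this.offset, then advance
      for (let i = 0; i < len; i++) this.bytes.push(buf[i]);
      this.offset += len;
    },
  };
}

function swap_file(chan, new_file) {
  const old = chan.file;
  chan.file = new_file; // no channel-side offset to fix up
  return old;
}
```

With this shape, an override only needs to replace chan.file; restoring the old file also restores the old position.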

@hhugo
Member

hhugo commented Apr 11, 2024

replaced by #1578
