Add BigArray.Genarray.free #389

talex5 · 2016-01-04T11:21:36Z

Free the memory used by a bigarray, setting all dimensions to zero to prevent further use.

This is useful to ensure that memory-mapped files are closed promptly, without waiting for the garbage collector. Otherwise, mapping a large number of files may hit operating system limits on the number of open files.

It's also needed in MirageOS, where some Xen protocols require shared memory to be unmapped at certain points.

If e.g. sub_left is used to create multiple bigarrays backed by the same memory block, free must be called on all of them before the memory is actually released.

Test program:

(* ocamlc unix.cma bigarray.cma test.ml -o test && ./test *)

open Bigarray

let maps = ref []

let open_and_close () =
  let fd = Unix.(openfile "/etc/passwd" [O_RDONLY; O_NONBLOCK] 0o644) in
  let mapped = Array1.map_file fd char c_layout false 100 in
  (* Make bug more obvious by ensuring no GC *)
  maps := mapped :: !maps;
  (* Array1.free mapped; *)
  Unix.close fd

let () =
  let i = ref 0 in
  while true do
    incr i;
    open_and_close ();
    Printf.printf "Mapped %d files\n%!" !i
  done

For me, on Linux, this prints:

...
Mapped 35589 files
Mapped 35590 files
Fatal error: exception Unix.Unix_error(Unix.ENFILE, "open", "/etc/passwd")

Free the memory used by a bigarray, setting all dimensions to zero to prevent further use. This is useful to ensure that memory-mapped files are closed promptly, without waiting for the garbage collector. Otherwise, mapping a large number of files may hit operating system limits on the number of open files. If e.g. [sub_left] is used to create multiple bigarrays backed by the same memory block, [free] must be called on all of them before the memory is actually released. Test program: (* ocamlc unix.cma bigarray.cma test.ml -o test && ./test *) open Bigarray let maps = ref [] let open_and_close () = let fd = Unix.(openfile "/etc/passwd" [O_RDONLY; O_NONBLOCK] 0o644) in let mapped = Array1.map_file fd char c_layout false 100 in (* Make bug more obvious by ensuring no GC *) maps := mapped :: !maps; (* Array1.free mapped; *) Unix.close fd let () = let i = ref 0 in while true do incr i; open_and_close (); Printf.printf "Mapped %d files\n%!" !i done ... Mapped 35589 files Mapped 35590 files Fatal error: exception Unix.Unix_error(Unix.ENFILE, "open", "/etc/passwd")

alainfrisch · 2016-01-04T11:57:34Z

@xavierleroy implemented such a proposal some time ago and then reverted it. One problem with changing dynamically the dimensions is that it prevents interesting optimizations such as lifting/eliminating array bound checks (e.g. in the body of a for-loop iterating over indices of a bigarray). Or one can still assume that dimension are non mutable, at the risk of getting segfaults/bus error instead of proper out-of-bounds exceptions if the bigarray is accessed after being freed.

gasche · 2016-01-04T12:24:14Z

See MPR#4180, MPR#6962.

talex5 · 2016-01-04T12:38:44Z

What's the current recommended solution then? I hit this twice recently. First in ocaml-git, which uses mmap to read files (and runs out, similar to the example posted above). Second, in a Mirage unikernel where I mapped a share memory region for a network device and started an Lwt thread watching it. When the device was disconnected, the memory was unmapped. Later, other data was mapped to the same address, and the old thread started trying to process it (which didn't end well).

chambart · 2016-01-04T13:36:45Z

Declare an new type that ensure that an access never occur after freeing.

module Unloadable : sig
  type t
  val map_file : string -> t
  val free : t -> unit
  val get : t -> int -> char
end = struct
  type t = { mutable freed : bool; data : (...) Bigarray.Array1.t }
  let map_file file = { freed = false; data = Bigarray.Array1.map_file ... }
  external unsafe_free_ba : (...) Bigarray.Array1.t -> unit = "..."
  let free t = t.freed <- true; unsafe_free_ba t.data
  let get t x =
    if t.freed then raise ...
    else Bigarray.Array1.get t.data x
end

If the indirection cost really matters you can imagine adding the 'freed' field to the C representation and add new primitives to check it when doing an access and implement this new module using it.

For the ocaml-git case, you also probably want to have a read-only type to allow sharing without doing the weak hash table dance, but that's another story.

talex5 · 2016-01-04T15:15:04Z

The problem is that we expose the fact that we're using a Bigarray. For example, there is code that does:

let write fd t =
  Lwt_bytes.write fd t.Cstruct.buffer t.Cstruct.off t.Cstruct.len

If I hide the fact that t.buffer is a bigarray, I can't use it with Lwt (and I can't ensure that Lwt will check whether the Cstruct is freed without changing Bigarray itself).

chambart · 2016-01-04T15:29:32Z

On 04/01/2016 16:15, Thomas Leonard wrote:

The problem is that we expose the fact that we're using a Bigarray.
For example, there is code that does:

|let write fd t = Lwt_bytes.write fd t.Cstruct.buffer t.Cstruct.off
t.Cstruct.len |

If I hide the fact that t.buffer is a bigarray, I can't use it with
Lwt (and I can't ensure that Lwt will check whether the Cstruct is
freed without changing Bigarray itself).

—
Reply to this email directly or view it on GitHub
#389 (comment).

You can either:

add an lwt_bytes_write function to your library and carefully check
that this does not use the bigarray after returning;

copy

gasche · 2016-01-05T23:00:57Z

For reference, I considered the following thought experiment: we could store in each bigarray an initialization-defined immutable bit that indicates whether or not its size may change in the future. The bigarray disposable/release/free function would fail dynamically on size-immutable arrays (ok, a dynamic check is not so nice), and bound-checking hoisting would only be allowed assuming this bit checked size-immutable (which is worth it as soon as at least two size-checks can be factorized; but admittedly such a there-is-potential-for-hoisting-logic may be harder to implement).

(You may think of phantom types for statical classification of may-be-resized, but the pain of transitioning end-user APIs could be a bit too large.)

The potential performance benefits of never-resizing bigarrays were discussed in MPR#5203. It may be the case that with the more aggressive inlining settings of flambda, a non-escaping analysis suffices to locally optimize bigarray accesses without imposing global non-resizability guarantees. It is, unfortunately, at least one release too soon to be able to know this.

dbuenzli · 2016-01-05T23:11:12Z

Just for reference, I think that it is since PR #115 that you can no longer play tricks with bigarray sizes (I used to do the same kind of thing for temporarily mapping GPU memory pointers).

alainfrisch · 2016-01-05T23:20:29Z

The potential performance benefits of never-resizing bigarrays were discussed in MPR#5203.

This was about lifting the access to the data pointer, which assumes that this pointer does not change. This property is not strictly related to resizability (which would affect the ability to lift bound checks).
Your proposal could apply to these two properties (and associated optimizations) in a similar way, or perhaps a single flag should control both.

and bound-checking hoisting would only be allowed assuming this bit checked size-immutable

I guess you are not proposing to compile the code twice and dispatch dynamically between the two versions according to the value of the flag; but rather to somehow let the programmer assert explicitly that the flag is set (causing a runtime failure if not, and allowing later optimizations). Is that right?

Something like:

let foo a =
  Bigarray.assert_immutable a;
  (* from this point, the compiler nows that 'a' has an immutable size and/or data-pointer *)
 ...

talex5 · 2016-02-28T13:25:01Z

For reference, Jane Street's Bigstring library provides:

val unsafe_destroy : t -> unit

unsafe_destroy bstr destroys the bigstring by deallocating its associated data or, if memory-mapped, unmapping the corresponding file, and setting all dimensions to zero. This effectively frees the associated memory or address-space resources instantaneously. This feature helps working around a bug in the current OCaml runtime, which does not correctly estimate how aggressively to reclaim such resources.

This operation is safe unless you have passed the bigstring to another thread that is performing operations on it at the same time. Access to the bigstring after this operation will yield array bounds exceptions.
Raises Failure if the bigstring has already been deallocated (or deemed "external", which is treated equivalently), or if it has proxies, i.e. other bigstrings referring to the same data.

https://ocaml.janestreet.com/ocaml-core/109.24.01/doc/core/Bigstring.html#6_Destruction

DemiMarie · 2016-02-29T10:15:35Z

I think that the best solution would be for the compiler to assume that there are no calls to Bigarray.Genarray.free that it cannot see – that is, it should hoist accesses as long as it can see that nothing could change the bigarray's length. This might require interprocedural optimizations for best effect.

gasche · 2016-04-18T14:15:24Z

For what it's worth, I also think that the current policy has caused too much troubles for too little actual gain. We'll see how effectively flambda's inlining can help here, but I would be in favor of allowing explicit deallocation of bigarrays.

DemiMarie · 2016-04-23T04:18:25Z

@gasche I agree, especially since CSE and inlining should keep the performance penalty to a minimum.

xavierleroy · 2016-12-04T18:01:28Z

@gasche writes: "I would be in favor of allowing explicit deallocation of bigarrays." And turn off CSE on bigarray bounds and data pointer, degrading performance? Or keeping CSE and risk segfaults instead of "out of bound" exceptions when accessing a bigarray that has been freed? This is the dilemma...

gasche · 2016-12-04T18:58:27Z

And turn off CSE on bigarray bounds and data pointer, degrading performance?

Yes -- I feel that the qualitative difference in some interesting system workflows being made impossible is worse than the quantitative performance regressions that are being discussed. Supposedly we have better inlining capacities than we had previously, and in any case a local (post-inlining) escape analysis should be enough to restore the optimization in most cases. I don't know who has bigarray-intensive computations in production ( @alainfrisch ? ) but they could probably test this assumption.

Another route to take would be to declare Bigarray unfit to represent raw memory, and develop an alternative library to do what people have been doing with Bigarrays instead. This alternative should probably be in stdlib for people to actually accept to move to it. But if we have no one willing to do the work of organizing this (which would have a fairly invasive impact on a lot of third-party libraries), I personally think that giving up on the never-freed guarantee is the best move.

ghost · 2016-12-09T16:01:44Z

We definitely have bigarray-intensive computations in production. I think allowing bigarray to be freed without implementing this escape analysis would be pretty bad for us

mshinwell · 2016-12-09T16:08:31Z

I'm not sure I understand what's wrong with the unsafe_destroy solution. If the intended use cases of freeing Bigarrays are supposed to temporally match up with some external event, for example the disconnection of a device using memory-mapped I/O, a reasonable amount of care is going to be needed anyway. To my mind, it is therefore reasonable to place the burden on the programmer that after they call such an unsafe_destroy function, they don't manipulate the Bigarray subsequently. The main question that remains would seem to be: we would need to ensure that the compiler doesn't, for example, spill a Bigarray data pointer across a call to unsafe_destroy and then reload it and use it. However I suspect it is the case that such a code pattern would only arise if CSE has been in effect---and I don't think CSE should have this behaviour since registers of type Addr (e.g. Bigarray data pointers) are dropped across a call.

@xavierleroy Can you tell me where this reasoning is flawed?

mshinwell · 2016-12-27T11:15:55Z

@xavierleroy Ping, please.

xavierleroy · 2019-10-14T18:39:21Z

Nearly three years later, I have no idea what to do with this PR. I implemented bigarray deallocation a long time ago, then was convinced by @alainfrisch to take it out. Now I don't care and will let you guys fight it out.

…locking_section Avoid holding domain_lock when using backup thread

The host doesn't treat the Wayland connection as closed until the ring is unmapped, which previously only happened when the garbage collector got to it. This resulted in windows not disappearing immediately when the application exited. This is a hack: see ocaml/ocaml#389 This also improves error handling around connection shutdown: - Check the device is still open before accessing the shared ring (since it might now be unmapped). - Silently ignore sends after a close. ocaml-wayland may need to flush its transmit buffer and this isn't an error. - Return end-of-file immediately when reading if the device has been closed. Cstruct.shift checks that the buffer is still valid, which can give a poor error message if we close during a read.

DemiMarie · 2022-10-17T20:33:20Z

@talex5: since the main uses for this seem to involve memory mapped buffers, another thought came to mind: what about replacing the underlying buffer with a PROT_NONE mapping? That would not break any compiler optimizations, and any access after freeing would (deterministically) cause a segmentation fault.

talex5 · 2022-10-20T19:19:32Z

what about replacing the underlying buffer with a PROT_NONE mapping?

That would certainly be more secure. However, it seems that people in this thread are not happy about segfaults at all, even deterministic ones (#389 (comment)):

Or keeping CSE and risk segfaults instead of "out of bound" exceptions when accessing a bigarray that has been freed?

DemiMarie · 2022-10-20T21:24:05Z

@talex5: what about catching the segfaults and turning them into an exception?

gadmm · 2022-10-21T10:06:31Z

Being able to convert a synchronous signal into an exception would be nice and open new opportunities in various scenarios, not just for freeing the backing memory of the bigarray. See the discussion at #5471 for (essentially) a previous attempt. It is unfortunate that the discussion is not fully available publicly though.

xavierleroy · 2022-10-23T17:30:06Z

Being able to convert a synchronous signal into an exception\

We've tried to do this for the last 25 years to convert "segmentation fault" signals to Stack_overflow exceptions, and it's a complete mess of machine and OS dependencies. One of the best things about OCaml 5 is that we got rid of that code. I wish it will never come back.

talex5 mentioned this pull request Jan 5, 2016

unmap is not safe mirage/ocaml-gnt#21

Open

damiendoligez added this to the 4.04-or-later milestone Jan 22, 2016

damiendoligez removed this from the 4.04 milestone Aug 2, 2016

damiendoligez added this to the 4.07-or-later milestone Sep 27, 2017

damiendoligez removed this from the consider-for-4.07 milestone Jun 1, 2018

dinosaure mentioned this pull request Aug 16, 2018

[RFC] Add Bigstring module #1961

Closed

talex5 mentioned this pull request Mar 20, 2019

First draft to add capabilities on cstruct mirage/ocaml-cstruct#237

Merged

anmolsahoo25 pushed a commit to anmolsahoo25/ocaml that referenced this pull request Aug 25, 2020

Merge pull request ocaml#389 from ctk21/remove_bt_sync_on_wake_from_b…

a81e7be

…locking_section Avoid holding domain_lock when using backup thread

poechsel added a commit to poechsel/ocaml that referenced this pull request Apr 5, 2021

Let cont does not allocate (ocaml#389)

f8bd46b

chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Jan 4, 2022

Install modules for comprehensions in compiler-libs/ (ocaml#389)

397ee87

talex5 mentioned this pull request Apr 20, 2022

Manually unmap the virtio-gpu ring when done talex5/wayland-proxy-virtwl#31

Merged

art-w mentioned this pull request Feb 20, 2023

irmin-pack 3.6: fix mmap keeps a file descriptor mirage/irmin#2196

Closed

gasche added the maintainer-decision-needed label Mar 10, 2023

gadmm mentioned this pull request May 15, 2023

Garbage collection behavior when intentionally accumulating out-of-heap memory #12228

Open

lthls mentioned this pull request Aug 11, 2023

Flambda2 alignment check primitive ocaml-flambda/flambda-backend#1730

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add BigArray.Genarray.free #389

Add BigArray.Genarray.free #389

talex5 commented Jan 4, 2016

alainfrisch commented Jan 4, 2016

gasche commented Jan 4, 2016

talex5 commented Jan 4, 2016

chambart commented Jan 4, 2016

talex5 commented Jan 4, 2016

chambart commented Jan 4, 2016

gasche commented Jan 5, 2016

dbuenzli commented Jan 5, 2016

alainfrisch commented Jan 5, 2016

talex5 commented Feb 28, 2016

DemiMarie commented Feb 29, 2016

gasche commented Apr 18, 2016

DemiMarie commented Apr 23, 2016

xavierleroy commented Dec 4, 2016

gasche commented Dec 4, 2016 •

edited

ghost commented Dec 9, 2016

mshinwell commented Dec 9, 2016

mshinwell commented Dec 27, 2016

xavierleroy commented Oct 14, 2019

DemiMarie commented Oct 17, 2022

talex5 commented Oct 20, 2022

DemiMarie commented Oct 20, 2022

gadmm commented Oct 21, 2022

xavierleroy commented Oct 23, 2022

Add BigArray.Genarray.free #389

Are you sure you want to change the base?

Add BigArray.Genarray.free #389

Conversation

talex5 commented Jan 4, 2016

alainfrisch commented Jan 4, 2016

gasche commented Jan 4, 2016

talex5 commented Jan 4, 2016

chambart commented Jan 4, 2016

talex5 commented Jan 4, 2016

chambart commented Jan 4, 2016

gasche commented Jan 5, 2016

dbuenzli commented Jan 5, 2016

alainfrisch commented Jan 5, 2016

talex5 commented Feb 28, 2016

DemiMarie commented Feb 29, 2016

gasche commented Apr 18, 2016

DemiMarie commented Apr 23, 2016

xavierleroy commented Dec 4, 2016

gasche commented Dec 4, 2016 • edited

ghost commented Dec 9, 2016

mshinwell commented Dec 9, 2016

mshinwell commented Dec 27, 2016

xavierleroy commented Oct 14, 2019

DemiMarie commented Oct 17, 2022

talex5 commented Oct 20, 2022

DemiMarie commented Oct 20, 2022

gadmm commented Oct 21, 2022

xavierleroy commented Oct 23, 2022

gasche commented Dec 4, 2016 •

edited