Skip to content

Dynarray.blit, which allows to extend the destination dynarray#13197

Merged
gasche merged 3 commits into
ocaml:trunkfrom
gasche:dynarray-blit
Aug 7, 2024
Merged

Dynarray.blit, which allows to extend the destination dynarray#13197
gasche merged 3 commits into
ocaml:trunkfrom
gasche:dynarray-blit

Conversation

@gasche

@gasche gasche commented May 26, 2024

Copy link
Copy Markdown
Member

We allow blit to extend the destination array with new elements at the end. In other words, Dynarray.blit src src_pos dst dst_pos blit_len is valid when 0 <= dst_pos <= length dst, in contrast to Array.blit which is only valid when 0 <= dst_pos <= length dst - blit_len.

Self-blitting is also allowed: whereas Dynarray.append a a is invalid, Dynarray.blit a 0 a (length a) (length a) extends a with a copy of itself.

This feature came up in the discussion of #13196, when @hyphenrf proposes to add a Dynarray.insert which cannot easily&efficiently be implemented outside the abstraction boundary. The present PR does not take a position on whether Dynarray.insert should go in the stdlib, but it proposes to add .blit as a building block that would be useful for this and other use-cases as well.

Comment thread stdlib/dynarray.ml Outdated
@gasche gasche force-pushed the dynarray-blit branch 2 times, most recently from dfb69f1 to 9ef6e34 Compare May 27, 2024 07:29

@OlivierNicole OlivierNicole left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again a very pleasant code to read thanks to the clarifying comments. Only minor comments here. I believe this is correct, but I prefer to hold off from approving until I have increased my certainty by torturing the module with randomized, parallel testing.

(Currently my extension of ocaml-multicore/multicoretests#463 to cover blit is failing and I’m not quite sure why. Probably a blooper in my test code. Branch at https://github.com/OlivierNicole/multicoretests/tree/dynarray-blit)

Comment thread stdlib/dynarray.ml Outdated
Comment on lines +282 to +294
if len < 0
|| src_pos < 0
|| src_pos + len < 0 (* overflow check *)
|| src_pos + len > Array.length src
|| dst_pos < 0
|| dst_pos + len < 0 (* overflow check *)
|| dst_pos + len > Array.length dst
then
(* We assume that the caller has already checked this and
will raise a proper error. The check here is only for
memory safety, it should not be reached and it is okay if
the error is uninformative. *)
invalid_arg "dummy";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? The arrays in argument having been checked in blit_assume_room already, I can’t come up with a scenario that could go wrong here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks make the function memory-safe without assuming anything about the caller. (Otherwise it would be called unsafe_blit.) In particular if we later start using Dummy.Array.blit from other user-facing functions, we don't have to worry about introducing segfaults due to misuse. Given that the performance cost is very small, I think the extra simplicity is worth it.

Comment thread stdlib/dynarray.ml
done;
end

let blit src src_dummy src_pos dst dst_dummy dst_pos ~len =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this kind of critical functions with several arguments of the same type, I like to use labels, not for convenience but for safety: they prevent getting the order wrong at the call site, e.g., swapping source and destination.

Comment thread stdlib/dynarray.ml
Comment on lines +695 to +694
let blit_assume_room
(Pack src) src_pos src_length
(Pack dst) dst_pos dst_length
blit_length

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark as above on labelled arguments (in particular, I’m so used to the C convention of putting the dst first that I may slip sometimes). Maybe label all 6 arguments is overkill, but labelling the first and 4th would already make it much harder to get the order wrong.

Comment thread stdlib/dynarray.mli Outdated
[1], [2], and then [3] at the end of [a].
[append_iter a Queue.iter q] adds elements from the queue [q]. *)

val blit : 'a t -> int -> 'a t -> int -> int -> unit

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a labelled version of this function, but I won’t fight it if you tell me that it belongs to a separate StdLabels.Dynarray PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer labels, and initially I wanted to have them, and then I realized that this introduces an inconsistency with Array.blit that I found hard to justify. I am happy to include labels by default if there is consensus in favor. I don't mind using labelling for the internal functions, I will update the PR with that.

@OlivierNicole OlivierNicole left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The randomized testing shows no crashes and no deviation from the specification, which makes me a bit more confident in my fallible audit.

Comment thread stdlib/dynarray.ml Outdated
Comment on lines +295 to +304
(* In case of self-blitting with overlap we must be careful about
the copying order -- we might overwrite an element we need later on.
Same logic as runtime/array.c *)
if src == dst && src_pos <= dst_pos then
(* copy in descending order *)
if src_pos = dst_pos then ()
else for i = len - 1 downto 0 do
Array.unsafe_set dst (dst_pos + i)
(Array.unsafe_get src (src_pos + i));
done

@hyphenrf hyphenrf Jun 8, 2024

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a case where arr1 == arr2 but dummy1 != dummy2? Noting that the dummies must agree on the stamp types and that's not the case if a fresh one is passed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch! If we already know that the dummies differ (we checked this above), then we know that we cannot be in the self-blit scenario, as the backing arrays of distinct dynarrays are always distinct. I wrote the same-dummy test above strictly as an optimization over a correct-on-its-own implementation, but in fact I could simplify the implementation here based on this precondition.

(I'm not sure what you meant about the stamp types, but in any case they are sort of a distraction here: outside the Dummy signature the function is typed to accept arrays with different stamp types, so we don't know much about the stamps.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you meant about the stamp types

I was trying to find a contradiction in my reasoning. What I meant was that inside the positive branch of this if condition, which I believe is unreachable in all cases, those two facts are true: dummy1 != dummy2, arr1 == arr2.
but if arr1 == arr2, it must be the case for me as a caller that I can't pass blit any other dummy for either of them, because the stamp types won't match with their respective array (and by extension the other dummy). e.g.

let (Pack a) = Dynarray.create() in
let (Pack b) = Dynarray.create() in
Dynarray.Dummy.Array.blit a.arr a.dummy 0 a.arr b.dummy 0 ~len:0

so it must be the case that the dummies are in fact equal.
this is built on my assumption that 'stamp ensures the following implication stamp1 = stamp2 -> (dummy1 == dummy 2) = true.. and I wasn't really sure of that.

I was actually trying to find a way to ever safely hit a case where dummies aren't equal (in stamp or value) but the arrays are (in value), and I found that this is only the case if the arrays are empty (and that's only because the empty array has a stable out of heap repr like the empty list etc, so it's a harmless -- and useless -- blit).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the question is to find a simple explanation for why a.dummy != b.dummy implies either a.arr != b.arr or a.arr == b.arr == [||]. I think that the reasoning is that a == b implies a.dummy == b.dummy (obviously) and that a != b implies a.arr != b.arr or a.arr == b.arr == [||]. The second part of this argument, that distinct dynarrays have distinct backing arrays, is a global invariant of the whole Dynarray module, which can be checked by looking at all functions to ensure that they respect this invariant: all initializations or writes to the arr field of a dynarray use an empty or fresh/separated backing array.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is as a discussion about a possible simplification/optimization and not an incorrectness, so my approval still stands and you can merge as-is, unless you want to update the PR with further improvements.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up implementing the small simplification, which makes the code shorter and easier to read. I also rebased the PR on the current trunk.

@hyphenrf hyphenrf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to confirm that blit works wonderfully for my insert use-case, the code is much simplified and with negligible-to-no differences in performance observed (when stress-tested with many insertions to the front).

@OlivierNicole

Copy link
Copy Markdown
Contributor

The addition looks correct to me. I keep thinking that at least Dynarray.blit would be much nicer (and safer) to use with labels.

@gasche

gasche commented Aug 2, 2024

Copy link
Copy Markdown
Member Author

I pushed a new commit that uses labels, the same labels as for ArrayLabels.blit : src, src_pos, dst, dst_pos, len.

@Octachron, @nojb: is it okay to add labels to a blit function that is not inside a FooLabels module?

(bonus question: is one of you interested in being a second reviewer for a stdlib PR?)

@Octachron

Copy link
Copy Markdown
Member

There is a similar prior case with Format.pp_print_substring which uses two labels.
I tend to agree that blit falls within the "highly asymmetric arguments with the same type" case that warrants the use of labels outside of the labeled sublibrary.
There is however the small infelicity that it creates an irregularity with the array version. I think this is still ok for blit.
And yes, I agree that this is a fine addition to the standard library.

@gasche

gasche commented Aug 2, 2024

Copy link
Copy Markdown
Member Author

And yes, I agree that this is a fine addition to the standard library.

(You are too kind, and/but we also need a second approval.)

@nojb

nojb commented Aug 3, 2024

Copy link
Copy Markdown
Contributor

@Octachron, @nojb: is it okay to add labels to a blit function that is not inside a FooLabels module?

Just for future reference: is there a consensus that the unlabelled version of blit is error-prone and that such functions should always be labelled, but that for backwards compatibility reasons we don't modify Array.blit, Bytes.blit, etc?

@nojb nojb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I only skimmed over the implementation. I left a couple of non-blocking questions below.

Comment thread stdlib/dynarray.ml Outdated
will raise a proper error. The check here is only for
memory safety, it should not be reached and it is okay if
the error is uninformative. *)
invalid_arg "dummy";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but couldn't/shouldn't we use assert false here?

Comment thread stdlib/dynarray.mli
Comment on lines +193 to +196
[dst_pos + len] is larger than [length dst]. The only requirement
is that [dst_pos] must be at most [length dst] (included), so that
there is no gap between the current elements and the blit
region.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the implementation in detail, but is there a good reason for this restriction? Lifting it would seem to give a bit more flexibility (or perhaps we want to keep this restriction to make the function a bit hard to misuse).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynarrays are contiguous, all elements from 0 to length - 1 must be defined. I don't see how to preserve this property if we allow blit to create a gap in between.

(The underlying implementation of Dynarray would support dynamic "optional" arrays where arbitrary elements may be missing, but the API would be quite different.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I got mixed up for a moment. Thanks.

Comment thread stdlib/dynarray.ml Outdated
Comment on lines +713 to +714
~len:blit_length;
()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
~len:blit_length;
()
~len:blit_length

@gasche

gasche commented Aug 3, 2024

Copy link
Copy Markdown
Member Author

Thanks @nojb for your review. I made the changes you suggested, this should be good to merge if the CI agrees.

@gasche gasche added the merge-me label Aug 3, 2024
@gasche gasche merged commit 08b4c49 into ocaml:trunk Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants