Dynarray.blit, which allows to extend the destination dynarray#13197
Conversation
dfb69f1 to
9ef6e34
Compare
OlivierNicole
left a comment
There was a problem hiding this comment.
Once again a very pleasant code to read thanks to the clarifying comments. Only minor comments here. I believe this is correct, but I prefer to hold off from approving until I have increased my certainty by torturing the module with randomized, parallel testing.
(Currently my extension of ocaml-multicore/multicoretests#463 to cover blit is failing and I’m not quite sure why. Probably a blooper in my test code. Branch at https://github.com/OlivierNicole/multicoretests/tree/dynarray-blit)
| if len < 0 | ||
| || src_pos < 0 | ||
| || src_pos + len < 0 (* overflow check *) | ||
| || src_pos + len > Array.length src | ||
| || dst_pos < 0 | ||
| || dst_pos + len < 0 (* overflow check *) | ||
| || dst_pos + len > Array.length dst | ||
| then | ||
| (* We assume that the caller has already checked this and | ||
| will raise a proper error. The check here is only for | ||
| memory safety, it should not be reached and it is okay if | ||
| the error is uninformative. *) | ||
| invalid_arg "dummy"; |
There was a problem hiding this comment.
Is this necessary? The arrays in argument having been checked in blit_assume_room already, I can’t come up with a scenario that could go wrong here.
There was a problem hiding this comment.
These checks make the function memory-safe without assuming anything about the caller. (Otherwise it would be called unsafe_blit.) In particular if we later start using Dummy.Array.blit from other user-facing functions, we don't have to worry about introducing segfaults due to misuse. Given that the performance cost is very small, I think the extra simplicity is worth it.
| done; | ||
| end | ||
|
|
||
| let blit src src_dummy src_pos dst dst_dummy dst_pos ~len = |
There was a problem hiding this comment.
For this kind of critical functions with several arguments of the same type, I like to use labels, not for convenience but for safety: they prevent getting the order wrong at the call site, e.g., swapping source and destination.
| let blit_assume_room | ||
| (Pack src) src_pos src_length | ||
| (Pack dst) dst_pos dst_length | ||
| blit_length |
There was a problem hiding this comment.
Same remark as above on labelled arguments (in particular, I’m so used to the C convention of putting the dst first that I may slip sometimes). Maybe label all 6 arguments is overkill, but labelling the first and 4th would already make it much harder to get the order wrong.
| [1], [2], and then [3] at the end of [a]. | ||
| [append_iter a Queue.iter q] adds elements from the queue [q]. *) | ||
|
|
||
| val blit : 'a t -> int -> 'a t -> int -> int -> unit |
There was a problem hiding this comment.
I would prefer a labelled version of this function, but I won’t fight it if you tell me that it belongs to a separate StdLabels.Dynarray PR.
There was a problem hiding this comment.
I also prefer labels, and initially I wanted to have them, and then I realized that this introduces an inconsistency with Array.blit that I found hard to justify. I am happy to include labels by default if there is consensus in favor. I don't mind using labelling for the internal functions, I will update the PR with that.
| (* In case of self-blitting with overlap we must be careful about | ||
| the copying order -- we might overwrite an element we need later on. | ||
| Same logic as runtime/array.c *) | ||
| if src == dst && src_pos <= dst_pos then | ||
| (* copy in descending order *) | ||
| if src_pos = dst_pos then () | ||
| else for i = len - 1 downto 0 do | ||
| Array.unsafe_set dst (dst_pos + i) | ||
| (Array.unsafe_get src (src_pos + i)); | ||
| done |
There was a problem hiding this comment.
Is there ever a case where arr1 == arr2 but dummy1 != dummy2? Noting that the dummies must agree on the stamp types and that's not the case if a fresh one is passed.
There was a problem hiding this comment.
Ah, good catch! If we already know that the dummies differ (we checked this above), then we know that we cannot be in the self-blit scenario, as the backing arrays of distinct dynarrays are always distinct. I wrote the same-dummy test above strictly as an optimization over a correct-on-its-own implementation, but in fact I could simplify the implementation here based on this precondition.
(I'm not sure what you meant about the stamp types, but in any case they are sort of a distraction here: outside the Dummy signature the function is typed to accept arrays with different stamp types, so we don't know much about the stamps.)
There was a problem hiding this comment.
I'm not sure what you meant about the stamp types
I was trying to find a contradiction in my reasoning. What I meant was that inside the positive branch of this if condition, which I believe is unreachable in all cases, those two facts are true: dummy1 != dummy2, arr1 == arr2.
but if arr1 == arr2, it must be the case for me as a caller that I can't pass blit any other dummy for either of them, because the stamp types won't match with their respective array (and by extension the other dummy). e.g.
let (Pack a) = Dynarray.create() in
let (Pack b) = Dynarray.create() in
Dynarray.Dummy.Array.blit a.arr a.dummy 0 a.arr b.dummy 0 ~len:0so it must be the case that the dummies are in fact equal.
this is built on my assumption that 'stamp ensures the following implication stamp1 = stamp2 -> (dummy1 == dummy 2) = true.. and I wasn't really sure of that.
I was actually trying to find a way to ever safely hit a case where dummies aren't equal (in stamp or value) but the arrays are (in value), and I found that this is only the case if the arrays are empty (and that's only because the empty array has a stable out of heap repr like the empty list etc, so it's a harmless -- and useless -- blit).
There was a problem hiding this comment.
Indeed, the question is to find a simple explanation for why a.dummy != b.dummy implies either a.arr != b.arr or a.arr == b.arr == [||]. I think that the reasoning is that a == b implies a.dummy == b.dummy (obviously) and that a != b implies a.arr != b.arr or a.arr == b.arr == [||]. The second part of this argument, that distinct dynarrays have distinct backing arrays, is a global invariant of the whole Dynarray module, which can be checked by looking at all functions to ensure that they respect this invariant: all initializations or writes to the arr field of a dynarray use an empty or fresh/separated backing array.
There was a problem hiding this comment.
IIUC this is as a discussion about a possible simplification/optimization and not an incorrectness, so my approval still stands and you can merge as-is, unless you want to update the PR with further improvements.
There was a problem hiding this comment.
I ended up implementing the small simplification, which makes the code shorter and easier to read. I also rebased the PR on the current trunk.
|
The addition looks correct to me. I keep thinking that at least |
|
I pushed a new commit that uses labels, the same labels as for ArrayLabels.blit : src, src_pos, dst, dst_pos, len. @Octachron, @nojb: is it okay to add labels to a (bonus question: is one of you interested in being a second reviewer for a |
|
There is a similar prior case with |
(You are too kind, and/but we also need a second approval.) |
Just for future reference: is there a consensus that the unlabelled version of |
nojb
left a comment
There was a problem hiding this comment.
LGTM, but I only skimmed over the implementation. I left a couple of non-blocking questions below.
| will raise a proper error. The check here is only for | ||
| memory safety, it should not be reached and it is okay if | ||
| the error is uninformative. *) | ||
| invalid_arg "dummy"; |
There was a problem hiding this comment.
Nit, but couldn't/shouldn't we use assert false here?
| [dst_pos + len] is larger than [length dst]. The only requirement | ||
| is that [dst_pos] must be at most [length dst] (included), so that | ||
| there is no gap between the current elements and the blit | ||
| region. |
There was a problem hiding this comment.
I didn't look at the implementation in detail, but is there a good reason for this restriction? Lifting it would seem to give a bit more flexibility (or perhaps we want to keep this restriction to make the function a bit hard to misuse).
There was a problem hiding this comment.
Dynarrays are contiguous, all elements from 0 to length - 1 must be defined. I don't see how to preserve this property if we allow blit to create a gap in between.
(The underlying implementation of Dynarray would support dynamic "optional" arrays where arbitrary elements may be missing, but the API would be quite different.)
There was a problem hiding this comment.
Of course, I got mixed up for a moment. Thanks.
| ~len:blit_length; | ||
| () |
There was a problem hiding this comment.
| ~len:blit_length; | |
| () | |
| ~len:blit_length |
Suggested-by: Hazem ElMasry <hazem-work@riseup.net>
|
Thanks @nojb for your review. I made the changes you suggested, this should be good to merge if the CI agrees. |
We allow
blitto extend the destination array with new elements at the end. In other words,Dynarray.blit src src_pos dst dst_pos blit_lenis valid when0 <= dst_pos <= length dst, in contrast toArray.blitwhich is only valid when0 <= dst_pos <= length dst - blit_len.Self-blitting is also allowed: whereas
Dynarray.append a ais invalid,Dynarray.blit a 0 a (length a) (length a)extendsawith a copy of itself.This feature came up in the discussion of #13196, when @hyphenrf proposes to add a
Dynarray.insertwhich cannot easily&efficiently be implemented outside the abstraction boundary. The present PR does not take a position on whetherDynarray.insertshould go in the stdlib, but it proposes to add.blitas a building block that would be useful for this and other use-cases as well.