Add I/O primitives for Bigarrays #12365
Thanks for getting the ball rolling on this idea. Some high-level comments following a quick look:
A compromise could be to have "read whole" and "write whole" operations for arbitrary bigarrays in the stdlib, and byte-oriented operations over 1D char bigarrays in the Unix module... but this is to be discussed more.
I think it would be nice to have functions for bigarray I/O in the stdlib (since bigarrays are in the stdlib now), but there is a reason that my original patch with @shindere was limited to … This is because channels read and write byte sequences, so some conversion needs to be done if you are reading and writing sequences of things other than bytes. I'm not sure that the stdlib I/O functions are the right place for such a conversion API: perhaps the user should do the conversion to bytes using functions from Bigarray, and the I/O should operate only on bigarrays of bytes?
This sounds reasonable to me.
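If users do the conversion to bytes themselves, so that the I/O functions only ever see byte bigarrays, the conversion for a `float64` vector might look like the following. This is a minimal sketch, not code from the PR: `floats_to_bytes` and `bytes_to_floats` are hypothetical helpers, the little-endian byte order is an arbitrary choice for illustration, and OCaml >= 4.08 is assumed for `Bytes.set_int64_le`/`Bytes.get_int64_le`.

```ocaml
open Bigarray

(* Hypothetical helper: lay out each float's IEEE-754 bits as 8 bytes,
   least significant byte first, in a 1-D char bigarray. *)
let floats_to_bytes (src : (float, float64_elt, c_layout) Array1.t) :
    (char, int8_unsigned_elt, c_layout) Array1.t =
  let n = Array1.dim src in
  let dst = Array1.create char c_layout (8 * n) in
  let tmp = Bytes.create 8 in
  for i = 0 to n - 1 do
    Bytes.set_int64_le tmp 0 (Int64.bits_of_float src.{i});
    for j = 0 to 7 do
      dst.{(8 * i) + j} <- Bytes.get tmp j
    done
  done;
  dst

(* Hypothetical inverse: rebuild floats from groups of 8 little-endian
   bytes; trailing bytes that do not form a whole float are ignored. *)
let bytes_to_floats (src : (char, int8_unsigned_elt, c_layout) Array1.t) :
    (float, float64_elt, c_layout) Array1.t =
  let n = Array1.dim src / 8 in
  let dst = Array1.create float64 c_layout n in
  let tmp = Bytes.create 8 in
  for i = 0 to n - 1 do
    for j = 0 to 7 do
      Bytes.set tmp j src.{(8 * i) + j}
    done;
    dst.{i} <- Int64.float_of_bits (Bytes.get_int64_le tmp 0)
  done;
  dst
```

Fixing the byte order in the conversion, rather than in the I/O primitive, keeps the primitives themselves purely byte-oriented.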
I am planning to rework the PR to restrict I/O primitives to 1-D char bigarrays, with the following signatures:
- `Unix.read_bigarray : Unix.file_descr -> (char, int8_unsigned_elt, _) Bigarray.Array1.t -> int -> int -> int`
- `Unix.write_bigarray : Unix.file_descr -> (char, int8_unsigned_elt, _) Bigarray.Array1.t -> int -> int -> int`
- `Unix.single_write_bigarray : Unix.file_descr -> (char, int8_unsigned_elt, _) Bigarray.Array1.t -> int -> int -> int`
- `In_channel.input_bigarray : t -> (char, int8_unsigned_elt, _) Bigarray.Array1.t -> int -> int -> int`
- `In_channel.really_input_bigarray : t -> (char, int8_unsigned_elt, _) Bigarray.Array1.t -> int -> int -> unit`
- `Out_channel.output_bigarray : t -> (char, int8_unsigned_elt, _) Bigarray.Array1.t -> int -> int -> unit`
Please speak up if you have any objections!
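A sketch of how the channel functions proposed above could be used together, assuming OCaml 5.2 or later, where functions with these names are available. The buffer type uses `c_layout`, which the proposed signatures accept. `input_upto` and `roundtrip` are hypothetical helpers; `input_upto` loops because, like `In_channel.input`, a single `input_bigarray` call may return fewer bytes than requested. Offsets and lengths count bytes.

```ocaml
open Bigarray

type buf = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical helper: call input_bigarray until [len] bytes have been
   read or end of file is reached; returns the number of bytes read. *)
let input_upto ic (buf : buf) ofs len =
  let rec loop got =
    if got = len then got
    else
      let n = In_channel.input_bigarray ic buf (ofs + got) (len - got) in
      if n = 0 then got (* end of file *)
      else loop (got + n)
  in
  loop 0

(* Hypothetical helper: write 8 bytes taken from offset 4 of a 16-byte
   buffer to [path], then read them back into a fresh 8-byte buffer. *)
let roundtrip path =
  let src : buf = Array1.create char c_layout 16 in
  for i = 0 to 15 do
    src.{i} <- Char.chr (Char.code 'a' + i)
  done;
  Out_channel.with_open_bin path (fun oc ->
      Out_channel.output_bigarray oc src 4 8);
  let dst : buf = Array1.create char c_layout 8 in
  let n = In_channel.with_open_bin path (fun ic -> input_upto ic dst 0 8) in
  (n, String.init n (fun i -> dst.{i}))
```

The bytes written are `src.{4}` through `src.{11}`, so reading the file back yields the string `"efghijkl"`.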
I think it'd be clearer to also restrict the layout to …
Good point, will do.
I agree that the layout had better be forced to be …
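Layout matters here because the offsets passed to these functions are 0-based byte positions, like those of the `bytes` functions, whereas a `fortran_layout` bigarray is indexed from 1. A caller holding a Fortran-layout buffer can still obtain a C-layout view of the same memory, without copying, via `Bigarray.Array1.change_layout` (available since OCaml 4.06). The sketch below is an illustration, not part of the PR; `as_c_layout` is a hypothetical helper.

```ocaml
open Bigarray

(* Hypothetical helper: reinterpret a Fortran-layout byte buffer as a
   C-layout one. No data is copied; both values share the same memory,
   and Fortran index i (1-based) becomes C index i - 1 (0-based). *)
let as_c_layout (b : (char, int8_unsigned_elt, fortran_layout) Array1.t) :
    (char, int8_unsigned_elt, c_layout) Array1.t =
  Array1.change_layout b c_layout
```

Index 0 is out of bounds for the Fortran-layout buffer, while the C-layout view exposes the same first byte at index 0.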
On the one hand, I agree that endianness differences are best handled by using marshaling (`input_value`/`output_value`). On the other hand, the memory-mapping API (… On the third hand, just like we have …
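On the marshaling side, bigarrays can already be marshaled, and the runtime's bigarray serializer writes multi-byte elements in a fixed byte order, so `output_value`/`input_value` sidestep endianness issues. A minimal in-memory sketch using `Marshal.to_string`/`Marshal.from_string`, which share the format of `output_value`/`input_value`; `roundtrip` is a hypothetical name.

```ocaml
open Bigarray

type vec = (float, float64_elt, c_layout) Array1.t

(* Hypothetical helper: serialize a float64 vector and read it back.
   The result is a fresh bigarray, not a view of the original. *)
let roundtrip (a : vec) : vec =
  let s = Marshal.to_string a [] in
  (* Marshal.from_string does not check types at runtime, so the
     annotation must match what was marshaled. *)
  (Marshal.from_string s 0 : vec)
```

The trade-off is that the marshaled form is not a raw byte image of the data, so it cannot be memory-mapped the way a plain byte dump can.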
It is. Amended!
The latest GC safety patch looks good to me, although appveyor points out that one remaining …
I agree with the third hand here - the …
We have …
Thanks, fixed!
otherlibs/unix/write_win32.c
  ofs += numwritten;
  len -= numwritten;
}
caml_leave_blocking_section();
I think this line shouldn't be here, and is causing the remaining appveyor failure.
Indeed, fixed. Thanks!
This PR needs two official approvals if it is to move forward. Any takers? Thanks!
Nicolás Ojeda Bär (2023/07/27 05:41 -0700):
> This PR needs two official approvals if it is to move forward. Any takers?
Sure. Would you please mind squashing all the commits?
Sure, done.
Thanks! What's the status of the change requested by @yallop? GitHub shows it to me as not yet addressed; is that correct?
I believe all mentioned issues have been addressed.
The `read()` and `write()` system calls take a length with type `size_t` ≈ `uintnat` and return a result of type `ssize_t` ≈ `intnat`. So, on a 64-bit platform, the number of bytes read or written may not fit in type `int` and must be given type `intnat`.
`ReadFile` and `WriteFile` take a length of type `DWORD` (unsigned 32 bits), so the number of bytes to read or write must be capped at 0xFFFFFFFF. `recv` and `send` take a length of type `int` (signed 32 bits), so the number of bytes to read or write must be capped at INT_MAX.
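The capping described above can be sketched as plain arithmetic: a transfer of `len` bytes is issued as successive calls whose length never exceeds the cap. This is an illustration under stated assumptions, not the PR's C code: `chunks` is a hypothetical helper, and a 64-bit OCaml is assumed, since a 32-bit OCaml `int` cannot represent `0xFFFFFFFF`.

```ocaml
(* Assumes 64-bit OCaml: on 32-bit, [int] cannot represent 0xFFFFFFFF. *)
let dword_max = 0xFFFFFFFF (* largest DWORD: ReadFile/WriteFile length *)
let c_int_max = 0x7FFFFFFF (* INT_MAX for a 32-bit C int: recv/send length *)

(* Hypothetical helper: lengths of the successive system calls needed to
   transfer [len] bytes when each call accepts at most [cap] bytes. *)
let chunks ~cap len =
  let rec loop acc remaining =
    if remaining <= 0 then List.rev acc
    else
      let n = min remaining cap in
      loop (n :: acc) (remaining - n)
  in
  loop [] len
```

For a 5 GiB transfer, this gives two calls under the `DWORD` cap and three under the `INT_MAX` cap.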
I reviewed the implementation. I think it's OK except for the cases where the number of bytes to read or write is not representable as an `int` or a `DWORD`. I took the liberty of pushing fixes directly to this PR: one commit is for Unix, the other for Win32. Let me know what you think.
I'm also tempted to factor out the C code between the "write" and "single_write" cases, but haven't done anything in this direction yet.
This is going to be another nail in the coffin for modular IO, isn't it? More C functions, now operating on something that's not byte buffers…
Here is a first try, on a personal branch: 8b954fb
Thanks for the fix and the careful comments. Both commits look good to me.
Thanks, looks good to me, so I cherry-picked it to this branch. Should we do the same for …
I thought about it, but was afraid to break third-party reimplementations of the Unix module (js_of_ocaml, maybe?) that might assume there are two different primitives. So, let's leave it as is.
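To make the "write" versus "single_write" distinction concrete: `Unix.single_write_bigarray` attempts a single write and may transfer fewer bytes than requested, so a write-everything operation amounts to retrying on the remainder. The OCaml-level sketch below only illustrates those semantics; it is not how the C primitives are implemented or factored. `write_all` is a hypothetical name, and OCaml 5.2 or later plus the `unix` library are assumed.

```ocaml
(* Hypothetical helper, assuming OCaml >= 5.2 and the unix library.
   Each call to Unix.single_write_bigarray performs at most one write
   and may return a short count, so retry on the remaining bytes. *)
let write_all fd buf ofs len =
  let rec loop ofs remaining =
    if remaining > 0 then begin
      let n = Unix.single_write_bigarray fd buf ofs remaining in
      (* Guard against spinning forever if no progress is made. *)
      if n = 0 then failwith "write_all: no progress";
      loop (ofs + n) (remaining - n)
    end
  in
  loop ofs len;
  len
```

Keeping two separate primitives, as decided above, leaves alternative implementations of the Unix module free to map each one directly.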
Looks good to me, approving! Formally, a second approval is needed, as this is a stdlib extension.
Xavier Leroy (2023/08/18 10:43 -0700):
> @xavierleroy approved this pull request.
> Looks good to me, approving! Formally, a second approval is needed, as this is a stdlib extension.
Isn't your approval the second one?
I did approve this PR a few weeks ago, but perhaps you'd prefer the second approval to be from a core dev more seasoned with stdlib changes?
Ah, sorry, I forgot about it (was one month ago) and didn't look at the full history.
I'm neutral. More eyeballs is always good, but as stdlib extensions go, this PR isn't controversial, I believe. At any rate, I'll look into this again when I'm back next week.
Xavier Leroy (2023/08/22 08:59 -0700):
> Ah, sorry, I forgot about it (was one month ago) and didn't look at the full history.
No problem. :)
> > but perhaps you'd prefer the second approval to be from a core dev more seasoned with stdlib changes?
> I'm neutral. More eyeballs is always good,
Yeah, especially given the poor quality of mine. Sorry, couldn't resist.
> but as stdlib extensions go, this PR isn't controversial, I believe.
I don't believe it is either. :)
> At any rate, I'll look into this again when I'm back next week.
Thanks. Its merge will then unblock #12360, which will be made simpler once rebased, I expect.
@shindere's review took place before the changes explained in #12365 (review) so it would be best to have another review of the current state of the patch.
Thanks!
LVGTM2! There is one actual typo to fix in the doc comment for `Out_channel.output_bigarray`.
A nit, noticed only because of that: the description of `Unix.read_bigarray` uses the verb "read", yet the descriptions of `In_channel.input_bigarray` and `In_channel.really_input_bigarray` use the verb "write", which reads oddly when reading them all at once in this unusual way. I'd alter the `In_channel` functions to use "read the data into a bigarray" instead.
Finally, in the doc strings, there's an inconsistency between "take data" and "take the data" in otherwise identical descriptions. FWIW, I'd go for adding the "the" (i.e. "read the data into a bigarray" and "take the data from a bigarray").
Cool! Thanks a lot for the review!
Perhaps it's worth taking the opportunity of doing the fixes to squash all the commits, or at least to make sure their number is minimal and the history coherent?
No need, I'll squash the PR when merging.
Co-authored-by: David Allsopp <david.allsopp@metastack.com>
Thanks @dra27 for your review! I accepted all your suggestions. I suggest we wait for @xavierleroy's second look before merging.
I had a (quick) second look and I think it's high time to merge this PR! Thanks to all who participated.
Thanks!
Following discussion in #12360, this PR proposes adding I/O primitives for bigarrays to the standard library and Unix. I took the liberty of copying the `In_channel` and `Out_channel` functions from #12360. On top of that, `Unix` variants are also added here:
- `In_channel.input_bigarray`, `In_channel.really_input_bigarray`
- `Out_channel.output_bigarray`
- `Unix.read_bigarray`, `Unix.write_bigarray`, `Unix.single_write_bigarray`
In each case the signatures of the functions are identical to the existing ones on `bytes`, except that they use `_ Bigarray.Genarray.t` in its place. Offset and length parameters are interpreted in terms of bytes.
An alternative would be to use `_ Bigarray.Array1.t` instead of `Bigarray.Genarray.t` and interpret offset and length in terms of elements. I was going to do this originally, but I sensed a small difficulty with the `Unix.single_write` operation, which could fail to write a "full" element of a bigarray (but perhaps this is not a problem). Opinions welcome.
There is some code duplication in the `Unix` bindings, but it is difficult to share code because we take advantage of the fact that the data part of a bigarray does not move in memory, both to avoid copying data to an intermediate buffer and to release the runtime lock more liberally.
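The bytes-versus-elements question can be made concrete with a little arithmetic: with byte-based offsets, a short single write on a `float64` buffer may stop in the middle of an element. A sketch using `Bigarray.kind_size_in_bytes`; `byte_offset` and `split_count` are hypothetical helpers, not part of the PR.

```ocaml
open Bigarray

(* Hypothetical helper: byte position at which element [i] starts when
   offsets are counted in bytes. *)
let byte_offset (k : ('a, 'b) kind) i = i * kind_size_in_bytes k

(* Hypothetical helper: split a byte count returned by a write into
   (whole elements written, leftover bytes of a partial element). *)
let split_count (k : ('a, 'b) kind) n =
  let size = kind_size_in_bytes k in
  (n / size, n mod size)
```

For instance, `split_count float64 13` is `(1, 5)`: one whole element plus 5 bytes of the next. Under an element-based interface, such a result would have no exact element count, which is the difficulty with `Unix.single_write` mentioned above; with `char` buffers the leftover is always 0.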