Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unix: add Cstruct_unix.{read,write,writev,send,recv} #302

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

djs55
Copy link
Member

@djs55 djs55 commented Oct 26, 2022

This is useful for direct-style code to avoid copying Cstructs into temporary byte buffers to use the Unix. API.

Signed-off-by: David Scott dave@recoil.org

This is useful for direct-style code to avoid copying Cstructs into temporary
byte buffers.

Signed-off-by: David Scott <dave@recoil.org>
@haesbaert
Copy link
Member

Me gusta ! I've been carrying this around for ages: https://github.com/haesbaert/rawlink/blob/520207190814ebfc0db9fd4f6d9c6e0f1297732d/lib/rawlink_stubs.c#L466

unix/recv_stubs.c Outdated Show resolved Hide resolved
| remaining ->
(* write at most iov_max at a time *)
let to_send = first iov_max [] remaining in
let n = stub_writev fd (List.map (fun x -> x.Cstruct.buffer, x.Cstruct.off, x.Cstruct.len) to_send) in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you expand to a tuple here because the Cstruct.t record type might change? Given this is in the same repository, it's probably safe enough to just pass the Cstruct record in directly to the FFI and save some allocations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was it -- I was using these stubs in another repo. I'll drop the tuple since we're here.

int fd = Int_val(val_fd);

caml_release_runtime_system();
n = recv(fd, buf, len, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure a recv with null flags is very useful, it becomes just read(2), (don't know about windows though).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I’ve copied the flags code from your stubs :)

val_ofs = Field(val_c, 1);
val_len = Field(val_c, 2);

void *buf = (void *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);
Copy link
Member

@haesbaert haesbaert Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor bikeshedding, would move buf to the beginning of the block, declarations after statement is not valid C89 (not sure if anyone cares these days)
Also void * arithmetic is technically undefined behaviour (also not sure if anyone cares).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think void * is used here to be in-sync with what recv expects:

ssize_t recv(int sockfd, void *buf, size_t len, int flags);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack but passing pointer of any type to something that expects a void pointer is ok, my only bit was that we're doing (void *)++, which is undefined behavior (but every compiler implicitly assumes char *) so it's ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/3922958/void-arithmetic says it's a GCC extension, and attempting this should produce warnings.


SOCKET s = Socket_val(val_fd);
caml_release_runtime_system();
n = send(s, buf, len, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as recv

val_len = Field(val_c, 2);
void *buf = (char *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);
size_t len = Long_val(val_len);
int n = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be ssize_t

#include <limits.h>
#endif

CAMLprim
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshedding, the other stubs don't break line here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still breaking line here in stub_cstruct_iov_max

unix/unix_cstruct.ml Outdated Show resolved Hide resolved
@haesbaert
Copy link
Member

I can provide Cstruct_unix.recvfrom and Cstruct_unix.sendto if it's desirable, I just can't do the windows bits.

djs55 and others added 3 commits November 1, 2022 14:22
Linux doesn't export IOV_MAX unless you define _XOPEN_SOURCE or _GNU_SOURCE (from @haesbaert)

Co-authored-by: Christiano Haesbaert <haesbaert@haesbaert.org>
makes it easier to grep for the source of the error if we just name it after the function. (from @avsm)

Co-authored-by: Anil Madhavapeddy <anil@recoil.org>
The recv/send tests looked a bit complex so I've changed mine, don't see why we
would need threads since it's just one message.
@@ -39,3 +39,9 @@ val send: Unix.file_descr -> Cstruct.t -> int
val recv: Unix.file_descr -> Cstruct.t -> int
(** [recv fd c] receives a message from a socket. This can be used to receive a datagram.
If only a partial receive is possible, the return argument is now many bytes were received. *)

val recvfrom: Unix.file_descr -> Cstruct.t -> Unix.msg_flag list -> int * Unix.sockaddr
(** [recvfrom fd c] Like Unix.recvfrom, but for Cstruct. *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(** [recvfrom fd c] Like Unix.recvfrom, but for Cstruct. *)
(** [recvfrom fd c] Like {! Unix.recvfrom}, but for Cstruct. *)

will link to it on the ocaml.org docs then.

caml_acquire_runtime_system();

if (n == -1)
caml_uerror("recvfrom", Nothing);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
caml_uerror("recvfrom", Nothing);
caml_uerror("cstruct_recvfrom", Nothing);

I do find these Unix_errors are marginally more useful if you can grep for the function they came from

Signed-off-by: David Scott <dave@recoil.org>
- `caml_uerror` is defined in ocaml/ocaml#67a4d75cfafa040099cdd322e23464362359af29
  and is in 5.0+
- for backwards compat ocaml 5.0 has #define uerror caml_uerror

Signed-off-by: David Scott <dave@recoil.org>
This allows compat with OCaml 4.13

Signed-off-by: David Scott <dave@recoil.org>
Windows socket handles and error codes are managed differently.

Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
There's no need to repack it as a tuple for the C bindings because
the definition is in this repository.

Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Back to vi...

Signed-off-by: David Scott <dave@recoil.org>
This became redundant when the re-packing of a Cstruct.t as a tuple
was removed.

Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
#endif

static int msg_flag_table[] = {
MSG_OOB, MSG_DONTROUTE, MSG_PEEK /* XXX */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the XXX for here? Incomplete flags?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cause we assume how stdlib ordered things and also because we redefine them, would be nice if stdlib exposed them as not static so we could just use caml_msg_flag_table or something

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stdlib does expose them -- as OCaml values. Just add a type definition in unix_cstruct.mli instead and get rid of the XXX:

type msg_flag = Unix.msg_flag = MSG_OOB | MSG_DONTROUTE | MSG_PEEK

This will create an alias to Unix.msg_flag that also checks that it's exactly the same as what is specified there.
Try for example, swapping two of the flags:

type t = Unix.msg_flag = MSG_OOB | MSG_PEEK | MSG_DONTROUTE;;
Error: This variant or record definition does not match that of type
         Unix.msg_flag
       Constructors MSG_DONTROUTE and MSG_PEEK have been swapped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's smart ! yes we should do that.

let iov_max = stub_iov_max ()

(* return the first n fragments, suitable for writev *)
let rec first n rev_acc = function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mildly dislike having to do these potentially O(n) list operations for a writev, and mandate that in the external interface. In your vpnkit usecase, would you be able to pass in a Cstruct.t array instead for writev, and then just call writev with simple O(1) time slices instead?

It affects the interface we expose in the mli, hence my asking now (as opposed to optimising later)

val_buf = Field(val_c, 0);
val_ofs = Field(val_c, 1);
val_len = Field(val_c, 2);
void *buf = (char *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void *buf = (char *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);
uint8_t *buf = (uint8_t *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this change, the FFI about bigarray says that it's an uint8_t array. We should keep this assumption on the C side too.

#include <limits.h>
#endif

CAMLprim
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still breaking line here in stub_cstruct_iov_max

length++;
/* Only copy up to the iovec array size */
if (length > IOV_MAX)
length = IOV_MAX;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently truncating it might be worse than just letting the syscall failing later on in uerror with EINVAL.
You exposed the maximum, if the user goes above he should be punished with an exception imho.
It's a bit harmless for a STREAM that has to handle shortcounts, but a DATAGRAM will send half a packet.
(even if we're all a bit insane because IOV_MAX is huge)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mostly mirror what writev(2) itself does, and that returns an EINVAL if > IOV_MAX. So I agree that a uerror should be raised if it's too large, and not a silent truncation.

val_buf = Field(head, 0);
val_ofs = Field(head, 1);
val_len = Field(head, 2);
iovec[i].iov_base = (char *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
iovec[i].iov_base = (char *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);
iovec[i].iov_base = (uint8_t *)Caml_ba_data_val(val_buf) + Long_val(val_ofs);

@dinosaure
Copy link
Member

A release should be done soon but this PR has some remaining issues. As far as I can say, it's mostly about style. @djs55 can you do a review of them?

@patricoferris
Copy link

Just thought I would mention that I will try to get a pread/pwrite/preadv/pwritev PR open soonish. I started hacking on that over here 4fd8e39 but don't currently have a windows machine to do that part and couldn't get macOS (11+) to be happy about the preadv/pwritev functions. I mention this because it would help with the potential IOCP/kqueue backends in Eio. Just thought I would mention it in case others were already hacking on this :))

@patricoferris
Copy link

I haven't worked out a nice story for supporting pread/pwrite on Windows because the calls to ReadFile/WriteFile with an Overlapped changes the current position (pread/pwrite do not change the position). Lwt_unix has decided to leave that as undefined behaviour (see ocsigen/lwt#768). Would that be okay here?

@dinosaure
Copy link
Member

Would that be okay here?

I think it's ok when the module explicitely mention unix in it's name 🙂 . A note (like lwt) will be cool to inform the user if he/she tries to use this module on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants