e-nal-async: Missing lifetimes in UdpStack #103

Open
ivmarkov opened this issue Feb 5, 2024 · 15 comments

Comments

ivmarkov commented Feb 5, 2024

Connected, UniquelyBound and MultiplyBound associated types are currently not lifetimed.

I need them to be, so that I can keep a non-mutable reference to the "Udp Stack" instance in their impls.
While I can Arc the "Udp Stack" and then push the arc into the UDP socket impl, this just doesn't feel right.

Perhaps it is no coincidence that embassy-net's UDP socket stack is not implementing the e-nal-async traits? I don't think it would be possible anyway, without the above lifetimes.

Background: I stumbled on this by accident, while trying to implement a toy websockets gateway that tries to implement embedded-nal-async traits by multiplexing all traffic (UDP and TCP) over a single websocket to an actual STD based embedded-nal-async impl.

Anyway, I'll also work on a PR that implements a lifetimed version of the UDP associated types on top of embassy-net.

ivmarkov (Author) commented Feb 6, 2024

Supporting the bind_multiple contract on top of smoltcp seems impossible at the moment, or at best would require terrible hacks, as smoltcp assigns a source (local) address very late, only when dispatching the UDP packet. The UDP packets in the ring buffer queue, however, have no notion of a "source address" before that.

So it seems sending with an arbitrary local address would require us to:
a) wait for the UDP socket ring buffer to become empty (so that we are sure all other pending packets are sent with the "previous" local addr)
b) unbind the UDP socket
c) rebind the socket with the local address of the packet we want to send (NOTE: this way we would miss packets that we want to receive, as the receive_from part will now work with the local address that we want to use for sending the packet)
d) unbind the socket once the packet was sent
e) rebind the socket again with the local address as specified in bind_multiple

What's also weird is that there is quite a bit of asymmetry with regard to UDP sockets between embedded-nal and embedded-nal-async. embedded-nal is pretty simplistic, while embedded-nal-async seems to enforce an API that is too close to the BSD sockets one and is not necessarily a good fit for smoltcp. (Not that implementing bind_multiple is particularly easy with STD either.)

I'm unsure how to proceed, honestly.

  • One option would be to just not support bind_multiple on top of smoltcp by having the impl always return an error.
  • Another would be to remove it and/or make it optional in embedded-nal-async.

In any case, not having an impl of the UDP socket traits for smoltcp is an odd situation to be in, given that smoltcp is the one stack used in embedded...

@chrysn @lulf @Dirbaio @MabezDev Sorry for pinging. Happy to listen to your feedback / comments / guidance.

Did I miss the elephant in the room?

chrysn (Contributor) commented Feb 6, 2024 via email

ivmarkov (Author) commented Feb 6, 2024

My own WIP implementation here, if that's helpful.

ivmarkov (Author) commented Feb 6, 2024

... and my fork of embedded-nal-async that makes the associated types in UdpStack generic over a lifetime, a precondition for implementing them on top of embassy-net. While at it, I've also implemented facilities for splitting a UDP socket into send and receive parts, which, while important for real-world use cases, is orthogonal to the issues discussed here.

EDIT: Don't pay attention to the comments in the fork yet, they are not updated.

chrysn (Contributor) commented Feb 6, 2024

Perhaps it is no coincidence that embassy-net's UDP socket stack is not implementing the e-nal-async traits? I don't think it would be possible anyway, without the above lifetimes.

Even if there were Stack<'a>, I don't think that the core embedded-nal-async trait can be implemented for embassy, because there is a second issue: Creating a socket needs an allocation lifetime'd to the socket (it takes a &'a stack and &'a mut buffers), and those can't be provided by a fn(&'long stack) -> Socket<'a>. (Or it may be possible, but it'll need some added trickery and will not be an embassy_net::udp::UdpSocket but rather something similar that owns buffers that are all part of the object).

I've had a similar issue with RIOT, not because it needs allocations but because it is self-referential and can only live in a pinned location (which would have needed a pinned constructor, which would also alleviate that second issue). In both cases, a sensible workaround is to implement a Stack on a Cell<Option<socket>> (for RIOT I did it with an array of generic size), so that you at least get a stack from which you can create one socket. The stack is nothing more than a socket factory FnMut, and with that we have a factory FnOnce (that errs on later calls). Not great, but not terrible either.

I haven't thought the lifetime'd trait through yet -- do you think it has a chance of resolving both issues? Or will we just run into the second issue anyway (unless we do something weird such as passing in something Default capable as an extra argument just to store the buffers)?

The bind_multiple concern is separate from this; I'm creating a separate issue that will full-quote this one, because as I understand it, its only overlap with this issue is that both are about how hard it is to implement embedded-nal-async on smoltcp.

ivmarkov (Author) commented Feb 6, 2024

Perhaps it is no coincidence that embassy-net's UDP socket stack is not implementing the e-nal-async traits? I don't think it would be possible anyway, without the above lifetimes.

Even if there were Stack<'a>, I don't think that the core embedded-nal-async trait can be implemented for embassy, because there is a second issue: Creating a socket needs an allocation lifetime'd to the socket (it takes a &'a stack and &'a mut buffers), and those can't be provided by a fn(&'long stack) -> Socket<'a>. (Or it may be possible, but it'll need some added trickery and will not be an embassy_net::udp::UdpSocket but rather something similar that owns buffers that are all part of the object).

Here's how this problem is solved for the TcpConnect trait.
I've just applied the same solution in my embassy-net UDP fork I mentioned previously.

I've - for now - hard-coded the PacketMetadata elements per UDP socket to 16, as I don't yet fully understand their meaning and why they are separate from the buffers. But that's a temporary situation and would not change the approach.

chrysn (Contributor) commented Feb 6, 2024

Ah thanks. That's the same "we're creating a stack with the capacity of creating N sockets" pattern. Comparing them also shows me why I didn't run into this: As the RIOT socket backends need to be pinned, they already demand that the stack-with-pool-of-sockets is 'static, and thus the types could be as well.

And in my PR for embedded-nal-async support in embassy-net I don't even try to implement Socket so far, but merely implement some ConnectedUdp / UnconnectedUdp types. For many applications I think that's even good enough: If a library is providing a CoAP server, it doesn't need to be able to open a UDP socket from a stack, it can just take the UDP socket as an impl. Doing this may be worthwhile as a pattern no matter the outcome of this PR.

The lifetime additions you propose in your PR sound reasonable to me -- I overlooked the need for this because the impls I did when prototyping the UDP traits were always either ZSTs or references to something that was static anyway. I'm not a member of this project, but if you file them as a PR I can promise to review them, and from what I see in your fork I'll expect that review to be very positive.

(I'm not sure yet what to think of the splitting proposal -- guess I'll have to see how it is used. Also I'm not sure that every socket can be split, given that the underlying implementation may have a single socket object, and even if the split turned it into something like an Arc<Mutex<S>>, that might lock when the send and receive sides try to make progress simultaneously.)

[edit: Formatting, because Markdown Is Bad]

ivmarkov (Author) commented Feb 6, 2024

Ah thanks. That's the same "we're creating a stack with the capacity of creating N sockets" pattern.

Indeed.

And in my PR for embedded-nal-async support in embassy-net I don't even try to implement Socket so far, but merely implement some ConnectedUdp / UnconnectedUdp types. For many applications I think that's even good enough: If a library is providing a CoAP server, it doesn't need to be able to open a UDP socket from a stack, it can just take the UDP socket as an impl. Doing this may be worthwhile as a pattern no matter the outcome of this PR.

I completely agree. Especially in the context of UDP that might even be enough. Yet - if we already have the notion of UdpStack, ideally it should be implementable for smoltcp or else perhaps we should remove it?

The lifetime additions you propose in your PR sound reasonable to me -- I overlooked the need for this because the impls I did when prototyping the UDP traits were always either ZSTs or references to something that was static anyway. I'm not a member of this project, but if you file them as a PR I can promise to review them, and from what I see in your fork I'll expect that review to be very positive.

Alright, will do!

(I'm not sure yet what to think of the splitting proposal -- guess I'll have to see how it is used. Also I'm not sure that every socket can be split, given that the underlying implementation may have a single socket object, and even if the split turned it into something like an Arc<Mutex<S>>, that might lock when the send and receive sides try to make progress simultaneously.)

No - splitting does not use Arc<AsyncMutex<S>>, of course, precisely because this cannot work. It just takes advantage of two facts, which hold for both async STD and embassy-net/smoltcp:

  • Sending / receiving via a TCP or UDP socket, in both STD and embassy-net, is implemented via socket &self, not socket &mut self.
  • The sockets in both STD (as in, say, async-io at least, though I'm relatively confident that is the case elsewhere too) as well as in smoltcp have two separate wakers for reading and writing. So, e.g., waiting for new data to arrive does not "fight" with waiting for space in the ring buffer to perform a "send".

So the TL;DR here is: lifetimed associated types in UdpSocket / TcpConnect help with splitting as well, because, as per my fork, splitting simply means returning two & references to the actual socket, one for reading and another for writing.

chrysn (Contributor) commented Feb 6, 2024

Sending / receiving via a TCP or UDP socket, in both STD and embassy-net, is implemented via socket &self, not socket &mut self.

I'd have to look it up to verify, but I'm relatively sure that RIOT's sockets are a counterexample. There the &mut self on send and receive is important (and really held across the await point), because both operations try to perform their action, and if that WOULDBLOCK they register the waker as a callback to the socket. If they were to be split, both a send and receive could be done concurrently, and if that happens in different tasks, one waker would overwrite the other, and they'd need another complex multiplexer on top.

ivmarkov (Author) commented Feb 6, 2024

... as for cases for socket splitting:

  • For TCP (perhaps more obvious) - anything that does not follow a "request" / "response" paradigm and needs full duplex. As in - say - websockets.
  • For UDP - in a way - the same: the moment you don't follow "request" / "response", you need socket splitting so as to implement full duplex send/receive via a single socket. Concrete cases I have:
    • mDNS responder, as it needs to periodically multicast but also respond to specific queries, over a single socket
    • The main matter-rs transport loop. It needs to send (and re-send) UDP packets which are not necessarily a response to some incoming "recv"d message, as in answering to event subscriptions. Or even for implementing the "reliable" packet retransmission these folks have implemented on top of UDP in their own home-grown way.

ivmarkov (Author) commented Feb 6, 2024

Sending / receiving via a TCP or UDP socket, in both STD and embassy-net, is implemented via socket &self, not socket &mut self.

I'd have to look it up to verify, but I'm relatively sure that RIOT's sockets are a counterexample. There the &mut self on send and receive is important (and really held across the await point), because both operations try to perform their action, and if that WOULDBLOCK they register the waker as a callback to the socket. If they were to be split, both a send and receive could be done concurrently, and if that happens in different tasks, one waker would overwrite the other, and they'd need another complex multiplexer on top.

As per above, in both STD as well as smoltcp the waker for reading is different from the waker for writing. So they cannot overwrite each other.

Multiple readers or multiple writers is a wholly different topic and there the wakers WILL overwrite each other, but
(a) the user should not do this
(b) more importantly, our traits actually do not allow this
(c) even if that was allowed by my splittable traits, it is not necessarily an incorrect behaviour, strictly speaking. It will result in a lot of CPU consumption, but would otherwise work. :) Yeah, I know the last one is debatable.

ivmarkov (Author) commented Feb 6, 2024

I'd have to look it up to verify, but I'm relatively sure that RIOT's sockets are a counterexample. There the &mut self on send and receive is important (and really held across the await point), because both operations try to perform their action, and if that WOULDBLOCK they register the waker as a callback to the socket. If they were to be split, both a send and receive could be done concurrently, and if that happens in different tasks, one waker would overwrite the other, and they'd need another complex multiplexer on top.

@chrysn Sorry, that's a complete digression now, but looking at the RIOT impl it seems sending is actually not really async (though non-blocking), so there should not be a problem with sending and receiving wakers overwriting each other. (I'm not sure what happens if RIOT returns ENOMEM (which I guess means the sending buffer is full) - we won't really retry?)

On the other hand - yes - reading does require &mut self. It can be turned into &self, but that would require a blocking mutex and more infrastructure. BTW: isn't this UB (a crash) in the presence of core::mem::forget? Sorry, I might be completely off here, but I thought I'd mention it just in case. If it is not, I have something to learn from this code, as I had a similar case in the past.

chrysn (Contributor) commented Feb 6, 2024

Indeed, now that I checked, RIOT currently does not have a blocking send (but may grow one; line 278:

                        // When setting all this up, we got a &mut self, so we can be sure that there
                        // is no other thread or task simultaneously trying to receive on this, or even
                        // doing a blocking send on it (which we currently don't do anyway).
                        //
                        // (Otherwise, we'd have to worry about overwriting a callback).

-- and then it'll be in a single callback, sock_udp_set_cb). The point is not about any single implementation, but that we can't be sure in general.

Your mDNS case mentions that you'd both send responses and do regular multicasts. That sounds like both the request/response part and the regular part would do sending, and unless they're in a single future (which is how I implement CoAP in embedded-nal-coap), that'd require the sender to be clonable. If we started demanding that, we could just as well start doing .receive() on shared references, and I don't think that's a good pattern. But I should look at the mDNS example, maybe I can learn something for embedded-nal-coap.

this UB

Not as it is used: self.socket is a &'static. The internal types have needless generics there though, and if they were used with anything but 'static, this would be UB. Tracked in RIOT-OS/rust-riot-wrappers#81 now, thanks for pointing it out.

ivmarkov (Author) commented Feb 6, 2024

Indeed, now that I checked, RIOT currently does not have a blocking send (but may grow one; line 278:

                        // When setting all this up, we got a &mut self, so we can be sure that there
                        // is no other thread or task simultaneously trying to receive on this, or even
                        // doing a blocking send on it (which we currently don't do anyway).
                        //
                        // (Otherwise, we'd have to worry about overwriting a callback).

-- and then it'll be in a single callback, sock_udp_set_cb). The point is not about any single implementation, but that we can't be sure in general.

Your mDNS case mentions that you'd both send responses and do regular multicasts. That sounds like both the request/response part and the regular part would do sending, and unless they're in a single future (which is how I implement coap in embedded-nal-coap),

They happen to be in a single future, yes, but I don't think that's exactly what you meant, as in a way - a lot of things end up being in a single future, including how some executors model their execution.

that'd require the sender to be clonable.

Not necessarily. You can wrap the sender in an async mutex as I do, and it would work.

What my splittable traits cannot guarantee, is that the returned read and write parts are 'static. That would only be true if the socket itself is borrowed by &'static mut, which would mean that the factory needs to be borrowed with &'static and so on. Possible of course - with static_cell - but I think not necessary in most of the cases.

ivmarkov (Author) commented Feb 7, 2024

@chrysn

(I've submitted #106 now.)

I just realized that even if you have a single waker registration for both sending and receiving (i.e. it is woken when either of the two happens: there is space to put the next datagram in the send queue, or a new datagram has arrived), this should actually work just fine as long as you use only intra-task concurrency (per boats's terminology). To put it simply: if both your sending and receiving code ultimately ends up in a single future, and hence a single task (rather than two futures spawned on the executor separately, and thus two separate tasks, each with its own waker), there will be no CPU-consuming "fight" between the two wakers constantly re-registering themselves in the single waker registration slot, because the task is a single one, and thus the waker is a single one.

Given that the *Split traits I propose return lifetimed GATs, I would expect almost always or always the receive and send halves to be used in a single task anyway.

@Dirbaio sorry for pinging - would appreciate your expertise - and correct me if ^^^ is wrong.
