
Why do we need Context/Address::stop_all #119

Closed
thomaseizinger opened this issue Jul 11, 2022 · 19 comments

@thomaseizinger
Collaborator

With #118, Context::stop_all is moved to Address::stop_all.

Whilst making this change and updating the docs, I started wondering why we need to provide this functionality within xtra at all. If the behaviour is exactly the same as sending a Stop message to each actor, writing your own equivalent of Address::stop_all is pretty trivial:

use async_trait::async_trait;
use xtra::prelude::*;

struct Actor;

#[derive(Clone)] // broadcast messages go to every actor, so Stop should be cloneable
struct Stop;

#[async_trait]
impl Handler<Stop> for Actor {
    type Return = ();

    async fn handle(&mut self, _: Stop, ctx: &mut Context<Self>) {
        ctx.stop();
    }
}

// To stop all actors:
address.broadcast(Stop).await;

If we still consider this too much boilerplate, we could provide an xtra-stop-handler crate that offers (rough usage sketched after the list):

  • The Stop message type
  • A custom derive #[derive(StopHandler)]
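A rough sketch of how that hypothetical crate might be used (nothing here exists yet; the names are illustrative):

use xtra_stop_handler::{Stop, StopHandler};

#[derive(StopHandler)] // would generate the Handler<Stop> impl shown above
struct MyActor;

// Anywhere we hold an address to the fleet:
address.broadcast(Stop).await;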

Is there a reason why the implementation of stop_all needs to live in xtra? There is a fair bit of code involved in providing it, like a custom envelope etc. We could save a good few lines of code by deleting it if it can really be expressed with existing building blocks.

@Restioson
Owner

The behaviour is not quite the same, since StopEnvelope has infinite priority, whereas a custom message would not. Users could approximately address this with broadcast(Stop).priority(u32::MAX), though, assuming they don't have any other max priority messages.

Fundamentally, it feels a bit strange not to have stop_all when there is also stop_self - it's somewhat asymmetrical.

@thomaseizinger
Collaborator Author

The behaviour is not quite the same, since StopEnvelope has infinite priority, whereas a custom message would not. Users could approximately address this with broadcast(Stop).priority(u32::MAX), though, assuming they don't have any other max priority messages.

Is this enough to justify it as a dedicated feature? As soon as people use a multi-threaded executor, it is pretty hard to make strict ordering guarantees, and unless I am missing something, we only get slightly better latency for stopping but no strict guarantees about what does and does not get processed.

Fundamentally, it feels a bit strange not to have stop_all when there is also stop_self - it's somewhat asymmetrical.

I see the symmetry aspect but at the same time, it feels redundant to have one feature if it can also be expressed through a different one.

Put differently, I think orthogonality of features is more important than providing all combinations out of the box.

@Restioson
Owner

I see the symmetry aspect but at the same time, it feels redundant to have one feature if it can also be expressed through a different one.

I wonder how small an implementation of this using the broadcast StopAll message approach would be - maybe this would be small enough to be justified in the interests of symmetry?

@thomaseizinger
Collaborator Author

I see the symmetry aspect but at the same time, it feels redundant to have one feature if it can also be expressed through a different one.

I wonder how small an implementation of this using the broadcast StopAll message approach would be - maybe this would be small enough to be justified in the interests of symmetry?

By symmetry you mean exposing a Context::stop_all function? I am not convinced that one should even exist, which is why I opened #118.

I find it odd that one actor out of a fleet of actors should shut down all others, but I am generally not a fan of choreography and would implement such a requirement using an orchestration actor that manages the lifecycle of a fleet of worker actors.

@Restioson
Owner

I find it odd that one actor out of a fleet of actors should shut down all others

One use case I'd imagine is if you are using the actors to abstract over a few handles to one resource - e.g. in tantivy, you need to block to retrieve a searcher, so I spawned num_searcher actors so I could make it async. If, hypothetically, one encounters the file being moved, it might make sense to stop them all so they could be restarted with the new file location.

This is just a contrived example, but generally it could be used any time that all the actors need to stop as the common state resource they are accessing encounters an error.

@thomaseizinger
Collaborator Author

Interesting. I think I'd model this with 1 actor actually owning the resource and spawning more worker actors. In other words: orchestration instead of choreography.

@Restioson
Owner

What would the worker actors do themselves?

@thomaseizinger
Collaborator Author

thomaseizinger commented Jul 16, 2022

What would the worker actors do themselves?

Whatever the actual unit of work is? I don't know much about the domain, but my thinking would have been that instead of owning the resource, you can instantiate the workers with their unit of work already or send it to them with a message.

To spawn multiple actors, you need the resource to be a shared reference, right? Some Arc<Mutex<_>> or something. I'd find that a bit odd if I could instead have one actor fully owning the resource, grabbing units of work and dispatching them to workers in a loop.

But I am not sure if I'd use actors for this generally. Threadpools are useful because spawning a new thread takes significant resources, so it makes sense to reuse them. Tasks are incredibly cheap to spawn, so the same thing could be achieved by spawning tasks in a loop and joining on the handles afterwards.
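For illustration, a minimal sketch of that alternative, assuming a tokio runtime and hypothetical units and process(unit):

let mut handles = Vec::new();

for unit in units {
    // Tasks are cheap, so spawn one per unit of work.
    handles.push(tokio::spawn(process(unit)));
}

// Join on the handles afterwards.
for handle in handles {
    handle.await.expect("task panicked");
}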

I am struggling to really see the use case for spawning multiple actors on the same address, but I can also understand that if we don't have this feature natively in xtra, it is almost impossible to build on top, or only possible with a significant performance penalty by double-dispatching through a proxy actor.

@Restioson
Owner

Restioson commented Jul 16, 2022

To spawn multiple actors, you need the resource to be a shared reference, right?

Yea. It's the case for tantivy that getting a searcher from the pool blocks, unfortunately, so that's why I used multiple actors on the same address there.

if I could instead have one actor fully owning the resource, grabbing units of work and dispatching them to workers in a loop.

At this point, why not spawn a task? I agree.

It's also nice to have this for those use cases and also get parity with libraries like actix which do support this, which might encourage more use. I don't think it should cause too much of a runtime performance penalty, especially at async scales - what I mean is that polling the channel is going to be a small amount of the time spent, as I'd imagine orders of magnitude more overhead is incurred from the runtime itself.

For reference, flume is natively MPMC, and iirc this change didn't make it any slower than when it was MPSC. The main extra cost is having to check the broadcast mailbox in poll, but I'd imagine this doesn't take long in the case of not using broadcasts, since a spin lock is very fast to unlock without contention (hundreds of nanoseconds or less).

@thomaseizinger
Collaborator Author

To spawn multiple actors, you need the resource to be a shared reference, right?

Yea. It's the case for tantivy that getting a searcher from the pool blocks, unfortunately, so that's why I used multiple actors on the same address there.

Would it make sense to retrieve the searcher in the orchestration actor and send it to a worker actor once it is retrieved to actually perform the search? (Not sure if this is how things work)

It's also nice to have this for those use cases and also get parity with libraries like actix which do support this, which might encourage more use. I don't think it should cause too much of a runtime performance penalty, especially at async scales - what I mean is that polling the channel is going to be a small amount of the time spent, as I'd imagine orders of magnitude more overhead is incurred from the runtime itself.

For reference, flume is natively MPMC, and iirc this change didn't make it any slower than when it was MPSC. The main extra cost is having to check the broadcast mailbox in poll, but I'd imagine this doesn't take long in the case of not using broadcasts, since a spin lock is very fast to unlock without contention (hundreds of nanoseconds or less).

I am not against the feature per se. Especially since we shipped #95, the feature is offered in an extremely lean way from an API perspective.

@Restioson
Owner

Would it make sense to retrieve the searcher in the orchestration actor and send it to a worker actor once it is retrieved to actually perform the search? (Not sure if this is how things work)

In the specific case of tantivy, I'm not 100% sure, to be honest! It might involve an explicit async semaphore to synchronize when there are not enough searchers left (as the method to acquire one is synchronous and will block) which is less elegant to me. It would be interesting to see if there are any other users of this feature - I wonder how much use it gets in actix?

@thomaseizinger
Collaborator Author

thomaseizinger commented Jul 16, 2022

To get back on topic, I am gonna try and summarise:

Apart from having infinite priority, the internal Shutdown message offers no functionality that couldn't be replicated with a dedicated Stop message.

My final position is that I don't think the improved latency of "better" priority is worth the complexity in the implementation and the additional API. If users care about this, they can choose a priority for a Stop broadcast message that is higher than any of the priorities they use in their application.
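A minimal sketch of that workaround, assuming the Stop handler from the first comment and that no application message uses the maximum priority:

// Send the Stop broadcast above every application-level priority so it is
// processed ahead of ordinary messages (though not ahead of one that is
// already being handled).
address.broadcast(Stop).priority(u32::MAX).await;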

For now, I'd suggest removing this feature; in the long run, we can provide an xtra-stop-handler crate as outlined above that makes this use case more convenient.

Thoughts?

@Restioson
Owner

Restioson commented Jul 17, 2022

the internal Shutdown message offers no functionality that couldn't be replicated with a dedicated Stop message.

I've realised that this is not exactly true. The one thing it does offer is that shutdown is guaranteed to be dispatched without waiting for a lagging actor to catch up.

My final position is that I don't think the improved latency of "better" priority is worth the complexity in the implementation and the additional API. If users care about this, they can choose a priority for a Stop broadcast message that is higher than any of the priorities they use in their application.

I agree, if it's okay that the shutdown would not be dispatched immediately.

@thomaseizinger
Collaborator Author

the internal Shutdown message offers no functionality that couldn't be replicated with a dedicated Stop message.

I've realised that this is not exactly true. The one thing it does offer is that shutdown is guaranteed to be dispatched without waiting for a lagging actor to catch up.

Ah, that is indeed interesting. I guess one is a graceful shutdown and the other isn't?

@Restioson
Owner

Restioson commented Jul 17, 2022

Not really. StopAll implemented in-crate with privilege can send immediately, bypassing the broadcast tail, whereas out of crate it must still wait. In both cases, the actors will have stopped called on them, so I'd say both are graceful.

This does raise the interesting question of how an actor would be able to call a non-privileged stop_all from a handler if it is the lagging actor... I think that it would wait forever.

Restioson added this to the 0.6.0 milestone Jul 17, 2022
@thomaseizinger
Collaborator Author

By "lagging actor", do you mean an actor that is currently busy processing a certain message, i.e. not returning from a Handler?

I think there should be a way of shutting such an actor down. This might also be useful for a supervisor implementation. The question then is, should we maybe have a privileged Stop message for a single actor that can also be broadcasted?

@Restioson
Owner

I mean the actor which is at the tail of the broadcast queue. I.e., if the bound is 10, it has 10 messages in its queue. When it tries to broadcast a stop-all message, it will wait forever and deadlock.

@thomaseizinger
Collaborator Author

That sounds like a strong enough reason to keep this functionality. Good thing that we fleshed it out.

I'd say we keep this issue open and repurpose it to update the docs to reflect these findings. In summary:

stop_all has infinite priority and will thus reliably stop all actors even if all mailboxes are full.

@thomaseizinger
Collaborator Author

Closing in favor of #157.
