ActorFailure Clarification #84
Comments
You are correct: the documentation is vague and needs some cleanup and clarification. With regard to "failure and retry", a failure is any uncaught Exception, and "ActorSystem will automatically restart" is not quite correct; it should have said "a failure in one or more Actors is handled elsewhere in the actor system". That wording was primarily intended to encourage a different perspective on writing Actors, but it ended up over-promising. I will correct the documentation in these areas, but let me provide some more details and describe some situations where the above is true:

Regarding your understanding, 1 and 2 are correct, but 3 does not send a

If the Actor failure is not just an exception but is fatal to the process then a

In general, it is usually not possible to guarantee that the same address can be set for the replacement Child for most transports, so there's no provision for retaining the same address. This is where the globalAddress can be useful, or an actor that manages address registrations.

One mechanism whereby Actors which fail by process failure can be restarted via Thespian itself is to use the

Please let me know if this helps answer your questions or if you would like additional information. Thank you for the clear report and accompanying code. I will leave this issue open until I have the opportunity to update the documentation.
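The point that a failure is an uncaught Exception handled outside the Actor, rather than a re-creation of the Actor, can be illustrated with a toy model. This is not Thespian code: `ToyActor`, `Poison`, and `deliver` are hypothetical names, and the retry-once-then-return-a-poison-wrapper policy is an assumption based on how the Thespian documentation describes message retry and PoisonMessage. The sketch only shows why `__init__` does not run again: the same instance receives the retried message.

```python
class Poison:
    """Hypothetical stand-in for a PoisonMessage-style wrapper."""

    def __init__(self, original, details):
        self.original = original
        self.details = details


class ToyActor:
    """Hypothetical actor that counts constructor runs and delivery attempts."""

    def __init__(self):
        self.init_calls = 1   # this instance was constructed exactly once
        self.attempts = 0     # number of times receive() was invoked

    def receive(self, msg):
        self.attempts += 1
        if msg == "fail":
            raise ValueError("forced failure")
        return f"handled {msg}"


def deliver(actor, msg, max_attempts=2):
    """Deliver msg to the SAME actor instance, retrying after an uncaught exception.

    Assumption: two attempts in total, after which the message is handed
    back to the sender wrapped in a Poison object.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return actor.receive(msg)
        except Exception as exc:  # any uncaught exception counts as a failure
            last_error = exc
    return Poison(msg, repr(last_error))


actor = ToyActor()
result = deliver(actor, "fail")   # fails twice, comes back wrapped
ok = deliver(actor, "hello")      # the same instance keeps working afterwards
```

In this model the actor's state survives the failure because nothing constructs a new instance, which matches the observation in the report that x1 and x2 keep their values.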
Thank you for the quick response! I think this answers all my questions. I'll take a deeper look into the Director suggestion as that seems like it might be a good fit for my application. P.S. Update: Also in the
I suppose I do have one more quick question, and let me know if you want me to create a new issue with this question instead of putting it here.

In the example code above, if I were to kill the child actor's process ID using

In the Thespian example code (multi-proc Act4), the ChildActorExited message is only sent to the parent on the next instance of a message being sent to the killed child actor. Is this the intended behavior, or is something set up incorrectly in my environment? The documentation doesn't mention this behavior anywhere, so from reading it I had assumed that this message would be sent immediately following a child process exiting, but that doesn't appear to be the case.

I need the ChildActorExited message as soon as a process fails, since one of the child actors in my project works based on messages from a subscribed MQTT topic. Thus, if this child actor dies, the parent wouldn't ever be notified, since the child isn't being sent any messages from other actors.

My naive approach to getting the ChildActorExited message when a process is killed would be to set up a "heartbeat" using a WakeupMessage for each child actor that simply sends a message to the child actor on an interval. However, if there is something built in to the Thespian system to handle this type of issue, I'd much rather use that instead.

Thank you again for your time and assistance.
Here is some code I mocked up to explain my naive approach to the ChildActorExited issue.

Parent Code:

from thespian.actors import *
import child
import random
from datetime import timedelta
import logging as log
import uuid


class Initialize:
    def __init__(self):
        pass


class Parent(ActorTypeDispatcher):
    def __init__(self, *args, **kwargs):
        super(Parent, self).__init__(*args, **kwargs)
        # UIDs of heartbeats sent to the child but not yet echoed back
        self.heartbeats = []

    def receiveMsg_Initialize(self, msg, sender):
        # Register as dead letter handler
        self.handleDeadLetters()
        self.child_actor = self.createActor(child.Child)
        x = random.randint(0, 100)
        self.send(self.child_actor, child.Initialize(x=x))
        self.wakeupAfter(timePeriod=timedelta(seconds=1))

    def receiveMsg_WakeupMessage(self, msg, sender):
        # Send a new heartbeat every second and remember its UID
        uid = str(uuid.uuid4())
        self.heartbeats.append(uid)
        self.send(self.child_actor, child.Heartbeat(uid=uid))
        log.debug(f'Sending heartbeat: {uid}')
        self.wakeupAfter(timePeriod=timedelta(seconds=1))

    def receiveMsg_Heartbeat(self, msg, sender):
        log.debug(f'Received heartbeat: {msg.uid}')
        if msg.uid in self.heartbeats:
            idx = self.heartbeats.index(msg.uid)
            if idx == len(self.heartbeats) - 1:
                # If most recent heartbeat received, clear list
                self.heartbeats = []
            else:
                # Otherwise, remove the received heartbeat only.
                self.heartbeats.pop(idx)

    def receiveMsg_DeadEnvelope(self, msg, sender):
        if isinstance(msg.deadMessage, child.Heartbeat) and msg.deadAddress == self.child_actor:
            if len(self.heartbeats) > 5:  # more than 5 heartbeats outstanding
                log.critical('Restarting child actor after 5 missed heartbeats.')
                old_address = self.child_actor
                self.send(old_address, ActorExitRequest())  # attempt clean shutdown
                self.child_actor = self.createActor(child.Child)
                self.heartbeats = []
                self.send(self.child_actor, child.Initialize(x=random.randint(0, 100)))

    def receiveMsg_str(self, msg, sender):
        self.send(self.child_actor, child.PrintValues())

    def receiveMsg_PoisonMessage(self, msg, sender):
        self.send(self.child_actor, child.PrintValues())

    def receiveMsg_ChildActorExited(self, msg, sender):
        log.critical(f'ChildActorExited: {msg.childAddress}')

    def receiveMsg_ActorExitRequest(self, msg, sender):
        # Unset as handler for dead letters
        self.handleDeadLetters(False)

Child Code:

from thespian.actors import *
import random
import os
import logging as log


class Heartbeat:
    def __init__(self, uid):
        self.uid = uid


class FailMessage:
    def __init__(self):
        pass


class PrintValues:
    def __init__(self):
        pass


class Initialize:
    def __init__(self, x=None) -> None:
        self.x = x


class Child(ActorTypeDispatcher):
    x1 = None
    x2 = None

    def __init__(self, *args, **kwargs):
        super(Child, self).__init__(*args, **kwargs)
        # x1 is set once at construction; x2 arrives later via Initialize
        x1 = random.randint(0, 100)
        self.x1 = x1
        log.debug(f'__init__: addr={hex(id(self))}, x1={self.x1}, x2={self.x2}')

    def receiveMsg_Initialize(self, msg, sender):
        self.x2 = msg.x
        log.debug(f'Initialize: addr={hex(id(self))}, x1={self.x1}, x2={self.x2}')

    def receiveMsg_Heartbeat(self, msg, sender):
        # Echo the heartbeat back so the parent can match it by UID
        self.send(sender, Heartbeat(uid=msg.uid))

    def receiveMsg_FailMessage(self, msg, sender):
        log.debug('Forcing exception in child!')
        raise ValueError

    def receiveMsg_PrintValues(self, msg, sender):
        log.debug(f'PrintValues: addr={hex(id(self))}, x1={self.x1}, x2={self.x2}')
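The heartbeat bookkeeping in the Parent above can also be kept in a small plain-Python helper, which makes the "too many missed heartbeats" rule easy to test without running any actors. The sketch below is one possible variant, not part of Thespian: `HeartbeatTracker`, `max_missed`, and the `now` parameter are hypothetical names. It also makes a different design choice from the Parent code: an echo for one heartbeat discards that heartbeat and every older one, on the assumption that an echo proves the child was alive at that point.

```python
import time


class HeartbeatTracker:
    """Hypothetical helper: tracks heartbeats sent but not yet echoed back."""

    def __init__(self, max_missed=5):
        self.max_missed = max_missed
        self.pending = {}  # uid -> time the heartbeat was sent

    def sent(self, uid, now=None):
        """Record that a heartbeat with this uid was sent."""
        self.pending[uid] = time.monotonic() if now is None else now

    def received(self, uid):
        """Record an echo; drop this heartbeat and all older pending ones."""
        if uid not in self.pending:
            return  # unknown or already-cleared heartbeat, ignore it
        cutoff = self.pending[uid]
        self.pending = {u: t for u, t in self.pending.items() if t > cutoff}

    def child_presumed_dead(self):
        """True once more than max_missed heartbeats are still unanswered."""
        return len(self.pending) > self.max_missed


# Usage: six unanswered heartbeats exceed a limit of five.
tracker = HeartbeatTracker(max_missed=5)
for i in range(6):
    tracker.sent(f"hb-{i}", now=float(i))
dead_before_echo = tracker.child_presumed_dead()

# An echo for hb-4 clears hb-0 through hb-4, leaving only hb-5 pending.
tracker.received("hb-4")
dead_after_echo = tracker.child_presumed_dead()
```

With a helper like this, the check could run from the WakeupMessage handler each time a heartbeat is sent, so the decision to replace the child would not depend on a DeadEnvelope arriving.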
I've updated and published the documentation, which should be a bit clearer in the areas you identified. Thanks for helping identify the ambiguities in the docs!

Thanks for the notes about the links in the PDF files. Unfortunately, it appears the org-mode processing doesn't do any adjustment of the file extension when exporting to LaTeX mode (which makes sense, since it cannot know what the ultimate format is, but unfortunately there's no way I can find to provide this information either). The hyperlinks should work well for the HTML form of the document, and I will look at filing an issue with org-mode-latex-export to handle this.

With regard to your updated code examples, I don't see anything particularly objectionable about them. I would suggest that perhaps you can also restart the child when you receive the

However, I'm also not entirely sure what your scenario is where you don't see the
From the Using.pdf (page 13 section 2.2.5) documentation's "Actor Failure" section:

According to this, the actor will be restarted following a failure. However, I am unsure of what is meant by "restarting the Actor" in this context.

I set up some test code with a parent and a child actor. The parent actor code is below:

...and the child code:

When running this code, the child actor will print out its class's address in memory and the two random values: x1, set in the class __init__, and x2, set after receiving the Initialize message.

In this example, all three of class address, x1, and x2 have the same values after the actor has failed and restarted. From the documentation, I had assumed that when an actor restarts, it would essentially run the createActor function while retaining the previous ActorAddress; however, that doesn't appear to be the case, since the __init__ function isn't being run again.

Here is my current understanding of the Actor Failure process:

So what I'm looking for is a more in-depth explanation of HOW actors restart upon an Actor Failure, since the documentation is pretty vague.