Clarify expectations when passing mutable data in messages for each ActorSystem #7

Closed
fprimex opened this Issue Mar 20, 2016 · 2 comments

Comments

Projects
None yet
2 participants
@fprimex

fprimex commented Mar 20, 2016

Two major selling points for me on Thespian have been (1) supporting distributed concurrency transparently and (2) keeping actors truly atomic/private (apologies as I'm probably not using the correct technical terminology).

When considering users who are reading docs and comparing Thespian to other actor system implementations, I think it is helpful to differentiate on these two points in the docs explicitly. The first is already pretty clear, I think. For the second, compare to this statement in the Pykka docs:

For performance reasons, Pykka does not clone the dict you send before delivering it to the receiver. You are yourself responsible for either using immutable data structures or to copy.deepcopy() the data you’re sending off to other actors.

With some reasoning, one can figure out when Thespian would probably behave like this (simpleSystemBase) and when it wouldn't (the others), but I think a note at a higher level of abstraction would be good. That is, this statement:

All Actors run synchronously in the context of the current process thread.

Implies that mutable data structures are probably being passed around instead of copied or pickled, but you could certainly still be doing the copying/pickling in this implementation.

A silly test:

from thespian.actors import ActorSystem, ActorSystemMessage, \
    ActorTypeDispatcher


class SendDictMsg(object):
    def __init__(self, d):
        self.d = d


class ModDictMsg(object):
    pass


class TestActor(ActorTypeDispatcher):
    def receiveMsg_SendDictMsg(self, msg, sender):
        self.d = msg.d
        print('received: ', end='')
        print(repr(self.d))

    def receiveMsg_ModDictMsg(self, msg, sender):
        self.d['modkey'] = 'modvalue'
        print('modified: ', end='')
        print(repr(self.d))
        self.send(sender, 'done')

def run_system(system):
    ActorSystem(system)

    addr = ActorSystem().createActor(TestActor)

    test_d = {'one':1}

    print('original: ', end='')
    print(repr(test_d))

    ActorSystem().tell(addr, SendDictMsg(test_d))
    ActorSystem().ask(addr, ModDictMsg())

    print('after mod: ', end='')
    print(repr(test_d))

    ActorSystem().shutdown()

if __name__ == '__main__':
    for system in ('simpleSystemBase', 'multiprocQueueBase', 'multiprocTCPBase', 'multiprocUDPBase'):
        print(system)
        run_system(system)
        print()

Output:

simpleSystemBase
original: {'one': 1}
received: {'one': 1}
modified: {'one': 1, 'modkey': 'modvalue'}
after mod: {'one': 1, 'modkey': 'modvalue'}

multiprocQueueBase
INFO:Thespian.Admin:ActorSystem Administrator startup @ ActorAddr-Q.ThespianQ
original: {'one': 1}
received: {'one': 1}
modified: {'one': 1, 'modkey': 'modvalue'}
after mod: {'one': 1}

multiprocTCPBase
INFO:Thespian.Admin:ActorSystem Administrator startup @ ActorAddr-(TCP|192.168.1.155:1900)
original: {'one': 1}
received: {'one': 1}
modified: {'one': 1, 'modkey': 'modvalue'}
after mod: {'one': 1}

multiprocUDPBase
INFO:Thespian.Admin:ActorSystem Administrator startup @ ActorAddr-(UDP|192.168.1.155:1029)
original: {'one': 1}
received: {'one': 1}
modified: {'one': 1, 'modkey': 'modvalue'}
after mod: {'one': 1}

This does raise the question as to whether this is a "bug" in simpleSystemBase, since, if it is used mostly for testing, wouldn't it be desirable to keep its behavior as close to the other systems as possible?

@kwquick

This comment has been minimized.

Show comment
Hide comment
@kwquick

kwquick Mar 21, 2016

Contributor

Thank you for identifying this lack of clarification.

I have updated the documentation (9bd0801) to reflect the current behavior.

Thespian actually takes the same stance as Pykka (no automatic copying for performance reasons) but as you noted, there are differences based on which system base is used. For the multiprocess bases in your example above, the original is not modified because the TestActor runs in a separate process, so it does get copies of the message by virtue of it having been serialized and transmitted to that process.

Your point regarding behavior differences is a good one. At the moment, I'm inclined to keep the behavior as is because the simpleSystemBase is the least safe and therefore most likely to expose issues, which is desirable for testing. Please feel free to provide arguments to the contrary if you think this is the wrong position on this situation.

Thanks,
Kevin

Contributor

kwquick commented Mar 21, 2016

Thank you for identifying this lack of clarification.

I have updated the documentation (9bd0801) to reflect the current behavior.

Thespian actually takes the same stance as Pykka (no automatic copying for performance reasons) but as you noted, there are differences based on which system base is used. For the multiprocess bases in your example above, the original is not modified because the TestActor runs in a separate process, so it does get copies of the message by virtue of it having been serialized and transmitted to that process.

Your point regarding behavior differences is a good one. At the moment, I'm inclined to keep the behavior as is because the simpleSystemBase is the least safe and therefore most likely to expose issues, which is desirable for testing. Please feel free to provide arguments to the contrary if you think this is the wrong position on this situation.

Thanks,
Kevin

@fprimex

This comment has been minimized.

Show comment
Hide comment
@fprimex

fprimex Mar 24, 2016

Looks good to me.

The only argument I have for changing the behavior of simpleSystemBase considers the novice/misguided user. They may try to abuse the ability to continue referencing sent data in the simpleSystemBase, and because of this I worry that someone might develop a working implementation that tests well early on, only to find out later that they need to change their approach significantly.

I think it's fine to leave as-is, and tackle the problem through education/documentation should it come up. If, however, every other posting to the google group becomes "why isn't my code working" and this is the problem, then reconsider the simpleSystemBase implementation.

fprimex commented Mar 24, 2016

Looks good to me.

The only argument I have for changing the behavior of simpleSystemBase considers the novice/misguided user. They may try to abuse the ability to continue referencing sent data in the simpleSystemBase, and because of this I worry that someone might develop a working implementation that tests well early on, only to find out later that they need to change their approach significantly.

I think it's fine to leave as-is, and tackle the problem through education/documentation should it come up. If, however, every other posting to the google group becomes "why isn't my code working" and this is the problem, then reconsider the simpleSystemBase implementation.

@fprimex fprimex closed this Mar 24, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment