
iostream fix for multiprocessing with no performance hit to single process #2734

Closed
wants to merge 1 commit

Conversation

@piti118 (Contributor) commented Jan 2, 2013

Better version of #2712

@piti118 (Contributor, Author) commented Jan 2, 2013

So, what it does here is keep two buffers. Whenever multiprocessing is detected, it switches to multiprocess mode and uses a Queue as the buffer instead of a StringIO.

This way there is no performance hit for the single-process use case.

%%timeit
for i in range(100):
    print i

Before any kind of fork it gives ~2ms/loop. After something like

import multiprocessing as mp
def f(x):
    print 'hello',x
pool = [mp.Process(target=f,args=(i,)) for i in range(10)]
for p in pool: p.start()
for p in pool: p.join()
%%timeit
for i in range(100):
    print i

will be around 200ms/loop.
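The two-buffer idea can be sketched like this (an editorial toy illustration, not the actual PR code; the class and all names are made up, and it is written in modern Python 3 rather than the Python 2 of the original):

```python
import io
import os
import multiprocessing as mp

class ForkAwareStream:
    """Toy sketch of the two-buffer idea (not IPython's real OutStream):
    the master process keeps the cheap StringIO path, while writes made
    from a forked child go through a fork-safe queue."""

    def __init__(self):
        self._master_pid = os.getpid()   # remember who the master is
        self._local = io.StringIO()      # fast single-process buffer
        self._queue = mp.SimpleQueue()   # created up front so children inherit it

    def write(self, text):
        if os.getpid() == self._master_pid:
            self._local.write(text)      # no slowdown for the common case
        else:
            self._queue.put(text)        # fork-safe path for children

    def getvalue(self):
        # merge anything the children pushed into the local buffer
        while not self._queue.empty():
            self._local.write(self._queue.get())
        return self._local.getvalue()
```

Routing on a pid comparison is what keeps the single-process path cheap: the master never touches the queue until something has actually been pushed into it.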

@piti118 (Contributor, Author) commented Jan 2, 2013

Still couldn't quite figure out how to write a test for this though....

@minrk (Member) commented Jan 3, 2013

Why switch modes for the master process? Why doesn't the master process only write to the regular buffer, and just use the Queue for subprocesses? I know it ensures ordering of print statements between the master process and the subprocesses, but that's not a guarantee that I feel like we need to make (it's not even a guarantee that the terminal makes, really).

@piti118 (Contributor, Author) commented Jan 3, 2013

import multiprocessing as mp
from time import sleep
manager = mp.Manager()
v = manager.Value('i',9)
def f(x, v):
    while(v.value!=x):
        pass
    print 'hello',x
    v.value -= 1
pool = [mp.Process(target=f,args=(i,v)) for i in range(10)]
for p in pool: p.start()
for p in pool: p.join()

It's not 100% guaranteed (since the buffers are merged only when the master process knows what's going on), but it definitely gives a more sensible ordering.

@minrk (Member) commented Jan 3, 2013

I think this approach makes sense. If you make any print statements from a forked process you will trigger (forever) the super slow MP-safe approach. It's unfortunate that it will stick around forever, even while no subprocesses are running, but I think it's acceptable for now.

Things to check:

  • fast-printing subprocesses (while True: print i)
  • non-printing subprocesses shouldn't trigger this
  • Windows (doesn't have a proper fork, so mp.Process is a totally different beast)

One alternative would be to only write to the Queue on flush, rather than on every write. This would change the sync point for interleaved output, but would result in many fewer Queue items.
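The flush-batching alternative could look roughly like this (a hypothetical sketch in modern Python 3, not code from either branch):

```python
from multiprocessing import SimpleQueue

class FlushBatchingWriter:
    """Sketch of the 'write to the Queue on flush' alternative: a forked
    child buffers writes locally and ships one joined string per flush(),
    so the queue sees one item per flush instead of one per write."""

    def __init__(self, queue):
        self._queue = queue
        self._pending = []

    def write(self, text):
        self._pending.append(text)   # cheap local append, no IPC

    def flush(self):
        if self._pending:
            self._queue.put("".join(self._pending))   # single queue item
            self._pending = []
```

The trade-off is exactly the one noted above: interleaving is only resolved at flush boundaries, but the number of queue operations drops dramatically.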

@piti118 (Contributor, Author) commented Jan 3, 2013

Could you write one simple test to do the testing? I can implement the others, but I'm not sure where to start writing tests.

@minrk (Member) commented Jan 3, 2013

> Could you write one simple test to do the testing? I can implement the others, but I'm not sure where to start writing tests.

Sure, no problem.

@piti118 (Contributor, Author) commented Jan 3, 2013

Found one bug with the current implementation though.

If I let the forked process run and kill the IPython notebook server before it joins... it segfaults. I'm guessing it's trying to do something funny with that socket.

minrk added a commit to minrk/ipython that referenced this pull request Jan 3, 2013
tests new mp.Process behavior for PR ipython#2734
@minrk (Member) commented Jan 3, 2013

If you pull from my '2734' branch: https://github.com/minrk/ipython/tree/2734, you will get a few basic tests.

@minrk (Member) commented Jan 4, 2013

Hm, there is an issue with the mp manager, though. mp.Manager creates subprocesses, and they don't get properly cleaned up if the master process is terminated:

import os, signal
from multiprocessing import Manager
m = Manager()
os.kill(os.getpid(), signal.SIGTERM)

will always leave the manager's process running. This will have to be addressed (it may be another reason that using a Manager will simply not be acceptable).

@piti118 (Contributor, Author) commented Jan 4, 2013

Thanks, will pull that soon. As for the issue you raise - I'm not sure about the inner workings of IPython's signal handling and the Manager, but wouldn't something like this work?

def __del__(self):
    self._manager.shutdown()

Not sure what happens if the old process is still trying to write to it though.

@piti118 (Contributor, Author) commented Jan 4, 2013

Or maybe we should implement a poor man's queue from mp.sharedctypes.

@minrk (Member) commented Jan 4, 2013

> Or maybe we should implement a poor man's queue from mp.sharedctypes.

It seems every mp datatype that doesn't use a Manager has corruption issues, and a Manager can't be used unless we can fix the zombie-manager issue.

@piti118 (Contributor, Author) commented Jan 4, 2013

mp.sharedctypes has locks though (those without "Raw" in the name). What do you mean by corruption issues?

http://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.sharedctypes

Although I'm not sure about dynamic allocation...

@minrk (Member) commented Jan 4, 2013

> mp.sharedctypes has locks though (those without "Raw" in the name). What do you mean by corruption issues?
>
> http://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.sharedctypes

The mp docs are littered with warnings about corruption issues.

But even without corruption, there are other issues. For instance, what happens when a subprocess is terminated while it holds a lock?

@piti118 (Contributor, Author) commented Jan 4, 2013

The docs say you are screwed:

> Avoid terminating processes
>
> Using the Process.terminate() method to stop a process is liable to cause any shared resources (such as locks, semaphores, pipes and queues) currently being used by the process to become broken or unavailable to other processes.
>
> Therefore it is probably best to only consider using Process.terminate() on processes which never use any shared resources.

I think that's what manager.shutdown() is for though; if it gets called in the right places we should be OK.

@minrk (Member) commented Jan 4, 2013

> I think that's what manager.shutdown() is for though; if it gets called in the right places we should be OK.

But that's exactly the point - this simply cannot be guaranteed. That means a Manager can never be an acceptable tool for this.

@piti118 (Contributor, Author) commented Jan 4, 2013

Why wouldn't something like this work?

def __del__(self):
    if self.is_masterprocess():
        self._manager.shutdown()

Wouldn't __del__ get called when terminating the IPython kernel? I'm not talking about kill -9, but SIGTERM or something proper.

@minrk (Member) commented Jan 4, 2013

> Why wouldn't something like this work?
>
>     def __del__(self):
>         if self.is_masterprocess():
>             self._manager.shutdown()
>
> Wouldn't __del__ get called when terminating the IPython kernel?

No. __del__ is only called on a clean exit (sys.exit()).

@piti118 (Contributor, Author) commented Jan 4, 2013

Hmmm, that cuts the Manager out of the picture completely....

I'm thinking about a memory-mapped file solution. That way I delegate the problem to the OS level instead.

@piti118 (Contributor, Author) commented Jan 4, 2013

Just thinking out loud here:

ZMQ with pull on flush... would screw up the order.
But with a poller or listener... we have the same zombie problem as the Manager.

@minrk (Member) commented Jan 4, 2013

> Just thinking out loud here:
>
> ZMQ with pull on flush... would screw up the order.
> But with a poller or listener... we have the same zombie problem as the Manager.

Not true - a Manager spawns a new process; with zmq that is not necessary. For instance, you could have PULL running in a thread in the master, and write() could PUSH.

@piti118 (Contributor, Author) commented Jan 4, 2013

That brings me to one of the previous questions: whether a forked process will have a copy of a thread from the master process attached to it or not. (Sorry, I learned this a long time ago and can't remember the finer details.)

Concretely:

Our master process is p0.
p0 spawns a thread t0.
Now I fork p1 from p0.
The question is: does p1 have a t1, a copy of t0, attached to it or not?

@minrk (Member) commented Jan 4, 2013

> That brings me to one of the previous questions: whether a forked process will have a copy of a thread from the master process attached to it or not.
>
> Our master process is p0.
> p0 spawns a thread t0.
> Now I fork p1 from p0.
> The question is: does p1 have a t1, a copy of t0, attached to it or not?

Yes, it will have that thread, but it will be defunct (you shouldn't have to worry about it).

I have a prototype working with zmq push/pull that I can show you, if you like.

@piti118 (Contributor, Author) commented Jan 4, 2013

I stopped writing the zmq-with-polling-thread version midway because of the following concern.

def poller():
    lock()
    write_to_buffer() #<<< suppose a fork happens while this thread is here
    unlock()

Is the lock then held forever in the forked process, or do we get a double unlock, or what happens?

@piti118 (Contributor, Author) commented Jan 4, 2013

> Not true - a Manager spawns a new process; with zmq that is not necessary. For instance, you could have PULL running in a thread in the master, and write() could PUSH.

Hmm, does the thread get automatically killed when the main process is killed? If not, then we have a zombie.

@minrk (Member) commented Jan 4, 2013

> Hmm, does the thread get automatically killed when the main process is killed?

Yes - threads are part of a process; when a process dies, all its threads die.

@piti118 (Contributor, Author) commented Jan 4, 2013

You may be right. From my vague memory I recall that a child thread has no idea when the parent gets killed, and needs a signal handler for that. I may be wrong though; let me consult the pthread manual.

@piti118 (Contributor, Author) commented Jan 4, 2013

Or maybe Python threads are something different... I give up. I trust you though 😉

@minrk (Member) commented Jan 4, 2013

You don't need any inter-process locks at all.
Here's how my test works:

  • main process:
    • a PULL socket in a background thread polls
      • it acquires a thread lock while writing to the buffer
    • the main thread writes directly to the buffer as before (taking the lock, if multiprocessing)
  • background process:
    • writes to a PUSH socket

So the background process never grabs any lock.
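The shape of that design can be sketched with a plain os.pipe() standing in for the zmq PUSH/PULL pair (an editorial Python 3 illustration - the real branch uses zmq sockets, not a pipe):

```python
import os
import threading

# Master-side reader thread is the only thing that ever takes the lock;
# forked children just write to their end of the pipe, lock-free.
read_fd, write_fd = os.pipe()
buffer_lock = threading.Lock()
buffer = []

def pull_loop():
    # master-only background thread: drain messages into the buffer
    reader = os.fdopen(read_fd)
    while True:
        line = reader.readline()
        if not line:               # writer side closed
            break
        with buffer_lock:          # thread lock, never seen by children
            buffer.append(line.rstrip("\n"))

def child_write(text):
    # this is all a (forked) child ever does: one lock-free write
    os.write(write_fd, (text + "\n").encode())

listener = threading.Thread(target=pull_loop, daemon=True)
listener.start()
```

The key property is the one stated above: the lock is a plain thread lock confined to the master process, so a child terminated at any moment can never leave it held.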

@piti118 (Contributor, Author) commented Jan 4, 2013

Good point. I will open another PR (hopefully fixing this for real this time). Thanks for being so patient with me.

@piti118 (Contributor, Author) commented Jan 4, 2013

Now that I think about it again, it wouldn't work though. Here is the problem. In polling thread t0 of the main process p0:

def poll(self):
    while True:
        if ismasterprocess(): # doesn't guard sufficiently though
            thread_lock()
            pull_socket.pull() #<<< suppose a fork of p0 happens while t0 is deep inside this call
            write_to_buffer()
            thread_unlock()
        else:
            break

Now, after the fork of p0, won't thread t1, a copy of t0, still be running at that same point? (You said defunct, but I guess it's still running at the same place, right?)

This would mean that it might reuse the zmq socket and cause the segfault again.

@piti118 (Contributor, Author) commented Jan 4, 2013

From the fork man page:

> A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. [THR] Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls.
>
> When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not async-signal-safe, the behavior is undefined.

Exactly one thread - the calling thread - is duplicated. I worried too much.

http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
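(Editorial aside: modern Python exposes the pthread_atfork() mechanism mentioned above directly as os.register_at_fork(), added in Python 3.7 - so not available when this discussion happened. A minimal sketch of resetting inherited state in the child:)

```python
import os

# State that must be re-initialized in a forked child (in the iostream
# case, a zmq socket would be re-created here instead of reused).
state = {"pid": os.getpid(), "fresh_child": False}

def _after_fork_in_child():
    # runs in the child immediately after fork(), like pthread_atfork's
    # child handler: drop/rebuild anything inherited from the parent
    state["pid"] = os.getpid()
    state["fresh_child"] = True

os.register_at_fork(after_in_child=_after_fork_in_child)
```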

@minrk (Member) commented Jan 4, 2013

If you want to pull from my forksafe_nomp branch, that should get us most of the way there.

@piti118 (Contributor, Author) commented Jan 4, 2013

I also have my zmq version. The implementation is a little different, and a bit shorter. The major difference from yours is that a child-process flush gets cascaded to the parent. Mine doesn't have a uuid though.

However, for me, if the fork and join are fast, messages drop like crazy.

https://github.com/piti118/ipython/tree/mp_fix_iostream_zmq <<< a bunch of debug messages are still there

@piti118 (Contributor, Author) commented Jan 4, 2013

Hmmm, actually I'm not sure if your solution will have this problem. Run this like 3 times and it will definitely hang.
The problem, I think, is that the socket has LINGER by default. If I try to close them in order but the message queue is in a different order, it will go into a deadlock. Without linger, messages might get dropped.

import zmq
import multiprocessing as mp
import threading as t

def listener(quit):
    context = zmq.Context()
    puller = context.socket(zmq.PULL)
    puller.bind('tcp://127.0.0.1:5555')
    i=0
    while not quit.is_set():
        i+=1
        print 'poll',i
        s = puller.poll(1000)
        if s:
            m = puller.recv_string()
            print m


def sender(i):
    context = zmq.Context()
    pusher = context.socket(zmq.PUSH)
    pusher.connect('tcp://127.0.0.1:5555')
    pusher.send_string(str(i))


quit = t.Event()
listen_thread = t.Thread(target=listener, args=(quit,))
listen_thread.start()
numsend=100
pool = [mp.Process(target=sender, args=(i,)) for i in range(numsend)]
for i, p in enumerate(pool):
    print 'start', i
    p.start()
for i, p in enumerate(pool):
    print 'join',i
    p.join()
print 'hey'
quit.set()
listen_thread.join()

@minrk (Member) commented Jan 4, 2013

> Hmmm, actually I'm not sure if your solution will have this problem. Run this like 3 times and it will definitely hang. The problem, I think, is that the socket has LINGER by default.

I thought it might be LINGER as well, but that doesn't appear to be the case. The default linger is -1 (infinite) on zeromq-2, and 1000 (1 second) on zeromq-3, but the subprocess is clearly exiting before LINGER is able to have any effect (mp.Process must be forcing it to halt, rather than allowing a clean exit). This is what my tracker/wait code in flush() for subprocesses works around.

@piti118 (Contributor, Author) commented Jan 7, 2013

Feel free to close this PR since your solution is clearly superior.

@minrk mentioned this pull request Jan 15, 2013
@minrk (Member) commented Jan 18, 2013

superseded by #2791

@minrk minrk closed this Jan 18, 2013
minrk added a commit to minrk/ipython that referenced this pull request Feb 4, 2013
tests new mp.Process behavior for PR ipython#2734
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
tests new mp.Process behavior for PR ipython#2734