In-process kernel support #2382

Closed
wants to merge 8 commits into
from

Projects

None yet

5 participants

@epatters
epatters commented Sep 4, 2012

I am opening this pull request to track my ongoing implementation of an in-process kernel manager.

I outlined my general approach for this in #1085. Feedback is welcome.

The following gist shows some example usage: https://gist.github.com/3659874.

@epatters
epatters commented Sep 4, 2012

Status: Nothing works yet.

@travisbot

This pull request passes (merged b50af5cb into c678a87).

@minrk
Member
minrk commented Sep 4, 2012

Does it not make more sense to use the existing Kernel (possibly subclass) and only change the way the KernelManager communicates with it? E.g. instead of queueing up a send and registering a callback, simply call the handler method?

That would result in significantly more code reuse, I think.

@travisbot

This pull request fails (merged 2afdbd91 into c678a87).

@epatters
epatters commented Sep 4, 2012

Well, the existing Kernel has the ZMQ code baked in pretty thoroughly. Disentangling it into a ZMQ-free subclass might be tricky. But even in this case, the kernel manager has to be re-written to remove the ZMQ dependency.

So far, the only non-trivial code duplication is in EmbeddedKernel.execute. Most of the code is just stubbing out a kernel manager look-alike so the frontends can be used without modification.

Have I understood your point correctly?

@minrk
Member
minrk commented Sep 4, 2012

Well, the existing Kernel has the ZMQ code baked in pretty thoroughly.

There shouldn't be anything zmq-specific in the Kernel class outside start(). I think any implementation should still use the Session object, and still build messages according to the message spec.

@epatters
epatters commented Sep 5, 2012

The Session object is largely for ZMQ messaging, and most of the methods in Kernel use the Session object to send and receive messages. So I would say that the Kernel class is heavily ZMQ-specific. I cannot see a way to use the Kernel class without substantially rewriting it.

Now, I do fully intend to implement the messaging spec. Indeed, the Qt console will break unless virtually everything is implemented. I don't intend to use a Session object, but I will generate a session UUID, message UUIDs, headers, and everything else required by the spec. This is very simple, and the only relevant method from Session is msg.

Is your preference to refactor both Kernel and Session into ZMQ-free base classes? I can see doing that for Session, but I would be wary of attempting it for Kernel.

@epatters
epatters commented Sep 5, 2012

@minrk, after some more reflection, I agree that it would be a good idea to split Session into ZMQ-free base class and ZMQ-specific subclass for sending and receiving messages.

I am trying to work out a good organizational scheme for this. Currently, I have an IPython.embedded package. However, this is not really a good place to put generic messaging tools (or my refactored kernelmagics module). I would prefer to create a package structure like:

IPython/
    kernel/
        embedded/
        zmq/
        kernelmagics.py
        serialize.py
        session.py

This would, however, introduce a serious backwards incompatibility. Of course, we could have IPython.zmq import from IPython.kernel.zmq for a few versions, if desired.

@bgranger, @fperez, is this or a comparable refactor something that you would support?

@epatters
epatters commented Sep 5, 2012

Dang it, I used @bgranger instead of @ellisonbg again. At least that appears to be a dead account.

@epatters
epatters commented Sep 5, 2012

Status: I refactored Session into a ZMQ-free base class. It remains to refactor the corresponding unit tests, though they all still pass.

Pending feedback, I have not implemented the package restructuring described above.

@epatters
epatters commented Sep 5, 2012

Status: Can now execute code from a Qt console using an embedded kernel manager. No IO--stdout, stderr, displayhook, or raw_input--is implemented yet.

@epatters
epatters commented Sep 6, 2012

Status: I think everything is basically working (except for raw_input in the terminal frontend).

Here are some examples: https://gist.github.com/3659874. Notice how changing from an out-of-process kernel to an in-process kernel is as simple as swapping out kernel managers.

@takluyver
Member

Thanks, I think this will really help projects like Spyder to use IPython. It looks like there's a merge conflict, though - have you got time to quickly rebase?

@epatters
epatters commented Sep 6, 2012

Sure, let me work on that.

@ccordoba12

Spyder dev here :-) Thanks @takluyver for pointing it out. We already have an almost complete integration with IPython 0.13, embedding an external kernel inside Spyder itself (in its own widget), but I've been following quite closely this PR because it'll probably let us simplify a lot our current implementation.

I have a couple of questions to @epatters (sorry if they are too simple):

  1. How could we interact with the embedded kernel? I mean, how could we call, for example, object_info? Should we maintain a reference to the kernel and then just call the corresponding method?
  2. How other RichIPythonWidget's can connect to a current kernel? Would it be just a matter of assigning an already created kernel manager to it?
  3. How do you recommend Json connection files should be handled? In the gist you referenced, they are created inside the dir where the script is started. Are there other possibilities to create them inside a specified dir?
  4. What about stderr? Will it be printed to the console? Our users have reported they can easily check it out in our "external kernel" widget.

The main problem for us before adopting this approach is our Variable Explorer. It reads and writes the current namespace objects in pickle format, something that, if I understand correctly, the IPython messaging spec doesn't support.

@minrk
Member
minrk commented Sep 6, 2012

This seems like an significant amount of code, and far more disruptive than I believe is actually necessary for this project.

The principal implementation of the kernel and message spec are two classes:

  • the Kernel, which has zero zmq-specific code outside start()
  • the Session, which has zero zmq-specific code other than a type-check, which can simply be removed.

I see no reason for there to be parallel implementations of message handlers, and probably no reason for there to be more than one Session class (not sure on that one, though).

I think the bulk of this code can be implemented with a DummySocket, which is really just a Queue, with a send_multipart method that simply pushes messages onto a list. With that in place, there should be zero Kernel or Session code that needs to be duplicated. The implementation becomes purely in the KernelManager, whereas this as it is has an entire duplicate implementation of the Kernel, which we already discovered to be an unacceptable maintenance burden with the pykernel and streamkernel implementations that we have since removed.

@epatters
epatters commented Sep 6, 2012

Regardless of the approach taken, a substantial refactoring is required to ensure that the embedded kernel neither depends on nor imports ZMQ. Session does need to be split into multiple classes.

Your complaint about the code duplication in the embedded kernel is reasonable. Indeed, I was concerned about this myself. I am willing to consider an approach where both Kernel and EmbeddedKernel derive from a common base class BaseKernel, which does essentially everything except instantiate the sockets. Kernel would provide ZMQ sockets and EmbeddedKernel would provide dummy sockets. There would be some other differences between the two, such as when stdout/stderr are redirected.

Ultimately, I'm unsure of whether the gain in code reuse offsets the general hackiness of using dummy sockets. As a matter of fact, I'm inclined to agree with you, since the dummy sockets are really just an implementation detail. I should warn you, though, that if you measure "disruptiveness" crudely in terms of diff size, then I expect refactoring Kernel will only make this PR more "disruptive".

@epatters
epatters commented Sep 6, 2012

@ccordoba12, I have to run now, but all of the things that you want to do will be very easy to do. I will respond in more detail when I get a chance.

@minrk
Member
minrk commented Sep 6, 2012

Disruptive is not a particularly useful term, and if refactor is needed, refactor is needed. My main point is that whether or not the Kernel is local or remote should really be a minor implementation detail, whereas this approach takes an entirely forked code path, which has been proven already by pykernel and streamkernel to be untenable in IPython.

I am willing to consider an approach where both Kernel and EmbeddedKernel derive from a common base class BaseKernel, which does essentially everything except instantiate the sockets.

The existing Kernel does not instantiate sockets, so it should already be extremely close to this base class, with minor changes, such as:

  1. Skip type-checking on the socket/stream traits, allowing them to be dummies
  2. leave out the IOLoop-specific start method, to be defined in a zmq-specific subclass (or in the Application object, I'm not sure).
  3. possibly replace _abort_queue with a dummy, because it is irrelevant when embedded.

That should be it.

Session does need to be split into multiple classes.

How so? What does Session do that is zmq-specific? I see literally nothing beside type checking on sockets, which is itself entirely unnecessary, as it artificially prevents the DummySocket approach.

One thing that I find unfortunate in this is that making an embedded local kernel, and making a kernel that does not require zmq to be importable are really two very different (if related) projects. The local/embedded kernel should require only fairly subtle changes, but the import-avoidance will certainly require moving files around, which mainly serves to obscure where there are actual changes to the code.

@epatters

Very well, @minrk, I'll adopt your minimally disruptive approach as I return to work on this. In particular, I'm not going to worry about ZMQ import avoidance. If people care about this, someone other than me can deal with it.

@epatters

This PR has been superseded by #2397.

@epatters epatters closed this Sep 10, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment