Add a metadata attribute to messages #2051

Merged
merged 10 commits into from Jul 21, 2012

Projects

None yet

2 participants

@jasongrout
Member

We'd like to easily add metadata to messages. Adding an optional top-level metadata attribute to messages seems like a natural solution.

With this pull request, we can add metadata to a session using a context handler like:

@contextmanager
def session_metadata(metadata):
    # flush any messages waiting in buffers
    sys.stdout.flush()
    sys.stderr.flush()

    session = sys.stdout.session
    old_metadata = session.metadata
    new_metadata = old_metadata.copy()
    new_metadata.update(metadata)
    session.metadata = new_metadata
    try:
        yield None
    finally:
        sys.stdout.flush()
        sys.stderr.flush()
        session.metadata = old_metadata

with session_metadata({'location': 4}):
     print 'hello world' # goes to location 4

(is there a better way to get the session object than to get it from sys.stdout?)

Old Description

We'd like to attach some metadata to stdout and stderr messages, for example, indicating where
the frontend should put the output. It would be really convenient if sys.stdout and sys.stderr (which I think is
IPython.zmq.iostream.OutStream, right?) had a 'metadata' attribute that could be set and would be added to each outgoing message. This would also change the messaging spec to add a 'metadata' attribute to the
content dict for stream messages (just like display_data messages already have).

@contextmanager
def stream_metadata(metadata):
    sys.stdout.flush()
    stdout_metadata = sys.stdout.metadata
    old_stdout_metadata = stdout_metadata.copy()
    stdout_metadata.update(metadata)
    sys.stdout.set_metadata(stdout_metadata)

    sys.stderr.flush()
    stderr_metadata = sys.stderr.metadata
    old_stderr_metadata = stderr_metadata.copy()
    stderr_metadata.update(metadata)
    sys.stderr.set_metadata(stderr_metadata)
    try:
        yield None
    finally:
        sys.stdout.set_metadata(old_stdout_metadata)
        sys.stdout.set_metadata(old_stderr_metadata)

with stream_metadata({'location': 4}):
     print 'hello world' # goes to location 4
@jasongrout
Member

Of course, docs need to change as well, and there may be other places in the codebase that would need to be changed. I'm willing to do those, but I want to make sure that you guys are good with the idea before putting in that time.

@minrk
Member
minrk commented Jun 28, 2012

I think this is sensible, let's hear what @ellisonbg has to say.

@jasongrout
Member

As we experiment with this, we realize that we also need to add this sort of information on display_data messages, pyout messages, etc. Maybe it would be cleaner to add a metadata attribute to the session object, and anytime the session sends a message (on the iopub channel?), it adds the metadata to the message. Should this sort of iopub-level metadata be added to the header instead of the content, though?

@jasongrout
Member

Looking into things even more, it seems like if we just add the capability to define a default subheader in the session object, that might do all that we want.

@jasongrout
Member

If we did put the metadata in the subheader, we'd also need to modify kernel.js to pass the header information (or maybe just a header metadata attribute) to callbacks, in the Kernel.execute function.

@jasongrout
Member

I just added some commits which implement a configurable subheader object and also pass the header to javascript callbacks.

@jasongrout
Member

Okay, here's a third idea, since having the metadata in the header just seems awkward. For one, the header lives on as parent_headers for possibly a long time. Secondly, an application (like callbacks in javascript) could care about metadata, but would probably rarely care about message ids or session uuids, so there seems to be a logical distinction between metadata/subheader and the header. In a sense, the header deals with routing, while metadata deals with the content.

Instead, how about adding a (optional?) metadata attribute to the basic message spec, so messages would look like {header: {...}, parent_header: {...}, metadata: {...}, content: {...}}? Things like the javascript callbacks would get the message content and the metadata.

@minrk
Member
minrk commented Jun 30, 2012

I like the notion of metadata being a fundamental message property. Since we already put it in the content for some messages, I would say we should extend that to metadata being an optional key in the content dict of all message types. I'm not 100% sure about the API for this, at the Session level or otherwise.

@minrk
Member
minrk commented Jun 30, 2012

Further thinking out loud:

There are three options:

  1. metadata in the header (possibly via subheader). I don't like this, because I don't think most library code should be using headers for anything. The content should be everything you need to know about the message itself, and the headers should only be used for low-level routing of handlers. But this is exactly what I did in the parallel code for message introspection without unpacking content.
  2. metadata as a fourth top-level component of all messages. On some level, this is the cleanest, but most handlers really only need the 'content' of the message after the Application has used the headers to figure out what handlers to call. So this would mean we are always passing the whole Message around, or we are always passing two dicts (content and metadata) around.
  3. metadata as an optional key in all content dicts. This has the benefit of extending what we already have, rather than changing it. But we don't actually use the metadata key from displaypub for anything, so there's not a high cost to change.

Personally, I'm thinking 2. is the best choice right now, as @jasongrout proposed.

@jasongrout
Member

I'm liking the idea of a metadata top-level element more. The distinction from the content is elegant, in my mind. I think we may be to the point where a post to ipython-dev would make sense to get a wider audience; I'll post right now.

@jasongrout
Member

I think the patches above add a top-level metadata attribute to messages in the right place. When the design decision has been made, I can squash the commits to make the history make more sense.

It'd be nice if github kept track of old versions of the description, so we could change it, but not lose context of the comments at the top.

@jasongrout
Member

I added a few more commits to start working towards having metadata a mandatory element of messages. There might still be parts where I'm missing things (for example, the javascript code that creates messages?)

I'm stopping working on this until a decision is made about a design direction, though, since the source changes now seem to be more and more involved.

@minrk
Member
minrk commented Jul 20, 2012

After SciPy discussion with enthought/enaml, we are definitely going with your metadata approach. I will check the JS to see if there's anything more that needs to be done.

One thing I would do is totally remove the subheader, and put all uses of that into metadata. Since I believe that 100% of subheader code is written by me, I am happy to do that.

@jasongrout
Member

Great! Who is enaml?

@minrk
Member
minrk commented Jul 20, 2012

https://github.com/enthought/enaml

Chris Colbert is the main developer here.

@minrk
Member
minrk commented Jul 20, 2012

Can you add the few commits from my branch?

They remove subheader entirely, using metadata instead, and make sure the notebook works as well.

@ellisonbg can you weigh in on this one?

@jasongrout
Member

Sure. Can you just submit a pull request to my branch to make it easy?

@minrk
Member
minrk commented Jul 21, 2012

I tried to, but it didn't let me. Perhaps you have pull requests disabled for your branch?

You can still do git pull minrk stream-metadata the old-fashioned way.

@jasongrout
Member

It didn't let you? That's very odd---what did it say? I don't even know if there is a way to disable pull requests on a branch.

I know I can pull, but I thought it would be good to learn the ins and outs of github better. If you don't want to diagnose this further, though, I can just pull. I just did it in a test branch and it merged just fine.

@jasongrout
Member

(so just let me know if you want to try to figure github out better, or if you want me to just pull and get on with coding.)

@minrk
Member
minrk commented Jul 21, 2012

I honestly have no idea why. But in the list of available pull targets in the New PR UI, which includes dozens of IPython forks, your fork is not listed. I think just give it a pull, and maybe it's a GitHub bug that will work itself out magically.

@jasongrout
Member

huh, interesting. Okay, I pulled and pushed.

@jasongrout
Member

Also, are there any other implementations of the spec that need to be changed? qtconsole or anything like that?

@minrk
Member
minrk commented Jul 21, 2012

The Session object is the only implementation, shared by all IPython applications. The subheader was only used at the application level in IPython.parallel, so there shouldn't be anything to change anywhere else.

@jasongrout
Member

Great!

@minrk
Member
minrk commented Jul 21, 2012

This actually needs a rebase, due to a cleanup PR earlier in the month. It's just on the blocks of assertEquals in test_session in one commit.

If you want to do that rebase, go ahead. I also made a rebased version of this branch at minrk/metadata-rebased, so you can just use my version in-place with:

git fetch minrk
git reset minrk/metadata-rebased --hard
git push -f
@jasongrout
Member

If you've already done the work, great! I just pushed the rebased branch.

It also probably wouldn't hurt to squash some of my commits down, since my early commits were just exploring ideas.

@minrk
Member
minrk commented Jul 21, 2012

Since this PR is owned by you, you would have to do the squashing. You are welcome to do a git rebase -i if you like. But otherwise, I think this can go in as-is (as soon as I confirm passing with test_pr)

@jasongrout
Member

Well, it's a question about the style of the project---do you want the thinking process, the scaffolding, as it were, or do you want a nice finished set of commits. If you think it can go in as-is, that's great. If you want me to clean it up a bit, let me know.

@minrk
Member
minrk commented Jul 21, 2012

We definitely don't require that each commit be final - cleaning up a PR is usually up to the style preferences of the committer (I tend to try to squash my own typos), but not enforced unless a big mess is made.

@minrk
Member
minrk commented Jul 21, 2012

Test results for commit 58134b4 merged into master
Platform: darwin

  • python2.6: OK (libraries not available: cython matplotlib oct2py pygments pymongo qt rpy2 tornado wx wx.aui)
  • python2.7: OK (libraries not available: oct2py wx wx.aui)
  • python3.2: OK (libraries not available: cython matplotlib oct2py pymongo qt rpy2 wx wx.aui)

Not available for testing: python3.1

@minrk minrk merged commit ea4f608 into ipython:master Jul 21, 2012
@jasongrout
Member

I guess we were working past each other. I just cleaned things up and force pushed again, but I think it was after you merged, so hopefully things didn't get messed up.

@jasongrout
Member

I'm glad it's in! Now we can run the Sage cell server off of ipython master rather than our own custom branch.

@minrk
Member
minrk commented Jul 21, 2012

Sorry about that - we are working through our outstanding PRs, and I wanted this one in before others of mine that might have caused this to need another rebase.

Thanks for the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment