Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored execute preprocessor to have a process_message function #905

Merged
merged 8 commits into from Apr 21, 2019

Conversation

@MSeal
Copy link
Collaborator

MSeal commented Oct 28, 2018

Built ontop of #982 which added additional requested testing. We can merge that PR first if it's desired.

Broke apart the large run_cell function to enable intercepting messages easier. This will enable deleting code from papermill where we have to reimplement the entire run_cell method.

@MSeal MSeal requested review from minrk and mpacer Oct 28, 2018
@MSeal MSeal mentioned this pull request Oct 28, 2018
@MSeal

This comment has been minimized.

Copy link
Collaborator Author

MSeal commented Oct 29, 2018

Hmmm seeings this test fail against master so I don't think it's related to the PR

@maartenbreddels

This comment has been minimized.

Copy link
Contributor

maartenbreddels commented Oct 29, 2018

I see the same in #900

@mpacer

This comment has been minimized.

Copy link
Member

mpacer commented Oct 30, 2018

@Carreau I think you mentioned something about changing the error formatting in IPython a while back — shall I assume that the errors that are cropping up are related to a recent release of IPython?


outs.append(out)
cell.outputs.append(result)

This comment has been minimized.

Copy link
@mpacer

mpacer Oct 30, 2018

Member

I don't really like the idea of having an API that works primarily via side-effects — that's a difficult API to document and test for.

Can you think of a way that we could do this more as a pure function? E.g., could we adhere to the pattern for we use with preprocessors? I think that would involve returning and reassigning the modified cell object rather than returning the result itself while treating the cell.outputs as a location where we can append changes?

This comment has been minimized.

Copy link
@MSeal

MSeal Oct 31, 2018

Author Collaborator

I can take a look. Mostly I tried to keep the logic the same and not change the design, just break it into smaller parts.

Copy link
Member

mpacer left a comment

First, this is a really exciting idea as it should make it much easier for people to handle messages in custom ways, which I know we're already doing in papermill in order to log messages as they come in.

However, given how this is a piece of core functionality to the app, I'd prefer to see a new test for handling the new API in the standard case explicitly. Because it's such core functionality it would be a testing failure if we weren't already covering this.

Ideally, I think we should also add a new test specifically to add a mocked replacement method, in part because that would surface some of the details of how the API would work when it is overridden. I think the side-effects would show up as an anti-pattern in that case.

Overall, I'd like if if we could find a way to make the new API work with fewer side-effects and if we could make the logic of how messages are being handled a little more explicit.

if cell_index in cell_map:
cell_map[cell_index] = []
# Check for remaining messages we don't process
elif not (msg_type in ['execute_input', 'update_display_data'] or msg_type.startswith('comm')):

This comment has been minimized.

Copy link
@mpacer

mpacer Oct 30, 2018

Member

This seems like really different logic from what we had before with msg_type.startsith('comm'); previously it was a continue.

This comment has been minimized.

Copy link
@MSeal

MSeal Oct 31, 2018

Author Collaborator

It's actually identical logically, by not returning None as a function it will effectively continue for the parent function loop

This comment has been minimized.

Copy link
@MSeal

MSeal Oct 31, 2018

Author Collaborator

Might have missed the not?

This comment has been minimized.

Copy link
@mpacer

mpacer Dec 13, 2018

Member

I don't think so. If the msg_type.startswith('comm') is true, then it does not fall into here and is not handled with another else, instead it will return the msg as the response (since that's what it defaults to and it is never reassigned).

This comment has been minimized.

Copy link
@mpacer

mpacer Dec 13, 2018

Member

Additionally now if something is an update_display_data message, it will return something (specifically the content of the msg).

I'm starting to wonder if this all might be fixed if we default to returning None

This comment has been minimized.

Copy link
@MSeal

MSeal Apr 15, 2019

Author Collaborator

Was working fine post-rebase, but I changed this to a raise control flow pattern which I think is a lot easier to read.

try:
out = output_from_msg(msg)
# Assign output as our processed "result"

This comment has been minimized.

Copy link
@mpacer

mpacer Oct 30, 2018

Member

Unless I'm reading this incorrectly, I think this may have in effect added nesting and only applied The cell_map logic in the case where it is not an execute_input or a comm? if that's what this is intended to accomplish we should talk about that, but I was reviewing this originally as a refactor not a change in functionality.

This comment has been minimized.

Copy link
@MSeal

MSeal Oct 31, 2018

Author Collaborator

! (A || B) == (!A and !B) So it should be equivalent to before I believe. I could change it to msg_type not in ['execute_input', 'update_display_data'] and not msg_type.startswith('comm') if you think it reads easier?

This comment has been minimized.

Copy link
@mpacer

mpacer Dec 13, 2018

Member

The problem is that display_id should have been dealt with in the cell_map logic regardless of the message type previously, except in the cases where a continue was followed. That try is not going to be hit by all message types and so the else also won't be hit by all message types.


display_id = content.get('transient', {}).get('display_id', None)
if display_id and msg_type in {'execute_result', 'display_data', 'update_display_data'}:
self._update_display_id(display_id, msg)

This comment has been minimized.

Copy link
@mpacer

mpacer Oct 30, 2018

Member

Previously there was a continue in the case where there was an update_display_data message with a statement that # update_display_data doesn't get recorded am I correct that that logic is now implicit if it's present at all?

This comment has been minimized.

Copy link
@mpacer

mpacer Dec 13, 2018

Member

@MSeal could you respond to this comment?

This comment has been minimized.

Copy link
@MSeal

MSeal Apr 15, 2019

Author Collaborator

Late response (post rebase), but yes it's implicit by not raising CellExecutionComplete here

@Carreau

This comment has been minimized.

Copy link
Contributor

Carreau commented Oct 30, 2018

@MSeal

This comment has been minimized.

Copy link
Collaborator Author

MSeal commented Oct 31, 2018

I can add a test for the inheritance case. It did seem when I tried moving things to the wrong places on purpose that tests would usually fail. I'll do a more thorough check.

Do you want me to do functional rewrite to avoid the side_effect patterns already there? Might be higher risk of changing execution logic but I can do the more extensive changes if you think it's worth it.

@MSeal MSeal force-pushed the MSeal:processMessage branch from 114d66e to b64442e Dec 8, 2018
@MSeal

This comment has been minimized.

Copy link
Collaborator Author

MSeal commented Dec 8, 2018

@mpacer I added a new test. From reading the code I think we'd need to do a semi-significant update to add in the message handler we talked about. Because of that I'd prefer to split the implementation into two parts and do that refactor post-merge of this PR.

@MSeal MSeal mentioned this pull request Dec 10, 2018

original = copy.deepcopy(input_nb)
wpp = WrappedPreProc()
executed = wpp.preprocess(input_nb, {})[0]

This comment was marked as outdated.

Copy link
@mpacer

mpacer Dec 13, 2018

Member

I would prefer it if this processed more kinds of messages since my main concerns have to do with maintaining compatibility with the current logic across all of the different message types (including comms, update_display_data, etc.).

original = copy.deepcopy(input_nb)
wpp = WrappedPreProc()
executed = wpp.preprocess(input_nb, {})[0]
self.assertEqual(outputs, [

This comment has been minimized.

Copy link
@mpacer

mpacer Dec 13, 2018

Member

Elsewhere in the library we tend to use direct asserts as we are using pytest to run the tests. In addition to stylistic consistency, I think it's a lot easier to read.

This comment has been minimized.

Copy link
@MSeal

MSeal Dec 14, 2018

Author Collaborator

Noted, will fix

def test_process_message_wrapper(self):
outputs = []

class WrappedPreProc(ExecutePreprocessor):

This comment has been minimized.

Copy link
@mpacer

mpacer Dec 13, 2018

Member

I would prefer if we directly created the context needed so that process_message can be tested functionally, iterating over the cell, msg and cell_index as directly as possible.

I'd prefer if we weren't concatenating all of the outputs into a single object for the whole notebook, I'm thinking it should follow the cell structure.

I'd also like to make sure that message types that are supposed to return None are returning None, which means keeping track of at least the msg type in relation to the output if not some of the more predictive fields (e.g., a display_id that we know is set to a particular value).

This comment has been minimized.

Copy link
@MSeal

MSeal Apr 15, 2019

Author Collaborator

Done -- this is rebased off #982 now

@MSeal

This comment has been minimized.

Copy link
Collaborator Author

MSeal commented Apr 15, 2019

@mpacer PR should now be ready for re-review

@MSeal MSeal force-pushed the MSeal:processMessage branch from f2a99cc to 6804e9e Apr 16, 2019
@MSeal MSeal force-pushed the MSeal:processMessage branch from 6804e9e to 9ed71d7 Apr 16, 2019
@@ -25,6 +25,8 @@
from ..utils.exceptions import ConversionException


class CellExecutionComplete(Exception): pass # Used as a control signal

This comment has been minimized.

Copy link
@mpacer

mpacer Apr 16, 2019

Member

can you make this an actual docstring? Specifically one that describes what this is going to be used for?

@@ -185,18 +185,18 @@ def normalize_output(output):
def assert_notebooks_equal(self, expected, actual):
expected_cells = expected['cells']
actual_cells = actual['cells']
self.assertEqual(len(expected_cells), len(actual_cells))
assert len(expected_cells) == len(actual_cells)

This comment has been minimized.

Copy link
@mpacer

mpacer Apr 16, 2019

Member

Huzzah! Many thanks for switching to using the pytest assert idiom.

def process_message(self, msg, cell, cell_index):
result = super(WrappedPreProc, self).process_message(msg, cell, cell_index)
if result:
outputs.append(result)

This comment has been minimized.

Copy link
@mpacer

mpacer Apr 16, 2019

Member

why does this still need special logic distinct from the standard process_message?

This comment has been minimized.

Copy link
@MSeal

MSeal Apr 16, 2019

Author Collaborator

It's purely to test that one could wrap an retrieve outputs to process on it's own. It has no real purpose for doing so beyond showing how to wrap the function in inherited classes. Namely to prove we can wrap the function safely in papermill cleanly.

@mpacer

This comment has been minimized.

Copy link
Member

mpacer commented Apr 21, 2019

This looks great! Thanks for working through all the issues!

@mpacer mpacer merged commit 7502276 into jupyter:master Apr 21, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@MSeal MSeal added this to the 5.5 milestone Apr 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.