New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enables input removal via preprocessor, template and invalid nb intermediate representation #643
Conversation
175b9c7
to
c54646c
Compare
A few design questions:
I may change my mind once I hear more of the reasoning, but my initial reaction is definitely -1 to a new 'pseudo-notebook' format: it doesn't seem to be clearly defined, and I don't see a compelling need. Exchanging data between components in the existing well-specified notebook format is much nicer for interoperability. |
Thinking about this some more, I'm not even sure if we need a preprocessor to do it - it could be configurable on
Implemented something like this: def input_visible(cell):
tags = cell.get('metadata', {}).get('tags', [])
return resources.global_content_filter.include_input \
and not self.remove_input_tags.intersection(tags) |
That's cool! I didn't know about tests. |
In the video meeting today, a couple of arguments were presented for not relying on cell-level metadata for this. Firstly, that we don't want that to become a public API, where people rely on setting the metadata directly (paraphrased - let me know if I've misunderstood). I see a few possible ways to deal with this without involving pseudo-notebooks:
The other argument was that we should consider what we can do if we're not constrained to valid notebooks going into exporters. I'm on board with that, but we should consider it before making that change, and look carefully at whether we can do what we want within valid notebooks (or by only adding new fields). Even if we can't, we need to know which bits of the format pseudo-notebooks are allowed to break. |
Having an intermediate representation (IR) makes sense to me, as does the fact that the in-memory representation in pretty much all the frontends doesn't match the on-disk format. I couldn't imagine standardizing the in-memory representation (much of the time its implementation specific), but I could imagine an intermediate representation of a notebook being similar to the way we think of a compiler's IR. |
Nbconvert exporters are a public interface - we encourage third parties to implement them and integrate them using entry points. So whatever data goes into them needs to be specified one way or another, it can't be an implementation detail. One possible spec would be: it's a notebook which may have extra fields, as if it was in format version 4.x+1. Another: it's a notebook where each cell is also required to have a |
The data that goes into the exporter when you call But, that doesn't constrain the kind of thing that enters the templating step. Currently, we have no schema checking for what goes into the template. Implicitly that suggests that what is to be passed in need not necessarily be a notebook. What this is doing is making that explicit and acknowledging that there maybe notebook-like objects that are able to be handled by the templating system. |
NB: I'm going to suggest the following idea, because I already did the work to write it out and I think it's worth considering. Still I don't think its a good solution. Is there a way to extend an existing json schema? If so, what if we were to require that someone how to extend the schema to accommodate their sieve? I can see that becoming a process that is complicated and unable to be composed uniquely. However, it would require people to be specific about how things were being extended… For example, the thought would be something along the lines of: {"code_cell": {
"properties": {
"source": {
"$ref": "#/definitions/misc/multiline_string_or_null"
}
}
}
{"definitions": {
"misc": {
"multiline_string_or_null":{
"oneOf" : [
{"type": "string"},
{
"type": "array",
"items": {"type": "string"}
},
{"type": "null"}
]
}
}
} |
Thanks for tackling this @mpacer ! I am not quite sure why the tag based input/output filtering can't be implemented as a notebook->notebook transform, with all the pseudo-notebook stuff being an internal implementation detail that doesn't introduce any new public APIs. Can you clarify this point? I apologize in advance that I haven't though much about this problem. |
Output and cell filtering can be implemented as a nb to nb transformation.
However, there is no way to have a cell that has output with no input, by
premise as part of the notebook format.
Accordingly, to remove input (but not the output) either requires making a
pseudo-nb or involves long-range signaling from where the tag is identified
to the point where it is decided how to handle the presence of that signal.
The current solution directly manipulates the notebook to set the source to
None, and the only necessary changes to the templates was to ensure that
they recognised that None meant there was no source and so no input area.
…On Tue, Aug 8, 2017 at 21:06 Brian E. Granger ***@***.***> wrote:
Thanks for tackling this @mpacer <https://github.com/mpacer> !
I am not quite sure why the tag based input/output filtering can't be
implemented as a notebook->notebook transform, with all the pseudo-notebook
stuff being an internal implementation detail that doesn't introduce any
new public APIs. Can you clarify this point? I apologize in advance that I
haven't though much about this problem.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#643 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACXg6JoJv9HG2zv4Ch_2VvXCiyf1DzBbks5sWTBegaJpZM4OwCfu>
.
|
It sees that only hiding input has the problem, yes? Because outputs and cells can be removed as plain transforms. And since we do already have the transient field for data that should not be saved in other frontends, we know that it won't come from files on disk. So I would propose:
Are there any cases this doesn't cover? I do think that the IR being a notebook with additional fields is AOK. The only case where we need to do anything about these extra fields is |
We still can't produce an invalid IR in the current Preprocessor steps, so is the idea that I'll just change the way this Sieve works… and now if we're only talking about adding things it probably makes sense to change the name to a metaphor that suggests adding things rather than removing them. |
|
@takluyver @minrk I |
This reverts commit 3f7b013.
da8ec19
to
8b68e7b
Compare
Thanks, this makes much more sense to me now. :-) I'd suggest we add a default tag for each kind of filtering, so you can use it without configuring nbconvert by adding one of the standard tags. E.g. |
@takluyver I'd prefer to leave any default value questions for a later date. What we choose to make those be will be likely an opportunity for disagreement. I would like to move forward with this & then #645 so that we can get a new release of nbconvert out with this functionality. Barring defaults does anyone else have issues with the PR as it stands? |
I'm fine with leaving default tags out of this PR, but I'd like to consider them before making a release. |
I don't support their existence as part of this release. Partially, I want to release before Jupytercon so I can talk about these features in my talk about the document structure and how to use it. I'm opposed to including them for this release partially because once they are added, we cannot remove them without breaking notebooks and notebook functionality. Whereas we can always add them at a later date without removing functionality. The only challenge then would be conflicting with user tag names, but that would be a concern even if we released it with defaults. For that reason (among others), I would like us to consider using a different signaling mechanism for unconditional element removal, with tag-based element removal being reserved for conditional element removal with no conditions included as default. |
@takluyver can I release once I take care of this and #645 and release the new nbformat? One of the reasons I want to not have a non-empty default is that it gets complicated regarding how to handle traitlet defaults when it's a If you want to cancel the removal of the cells then you need to know to overwrite reassigning the traitlet value to an empty I do agree with you that there should be a way for people to set things to "always" be filtered out, but I think that should not use a magic tag value. Maybe it could live in the |
Fine on the release. I agree that configuring collections through traitlets is a bit of a pain. But I would ultimately like to define some magic tag values that can remove cells with default nbconvert config. We added cell tags to give notebook authors an easier way to do this kind of thing than editing JSON cell metadata. |
So… can this be merged? |
@takluyver @minrk @rgbkrk @ellisonbg Do I have consensus to merge & release this & #645? |
Please hold off on merging this. I think we have a significant misunderstanding about all these things. |
I looked at part of this, but not all of it. As I understood what happened at the last in person team meeting, we decided to use official (as in Kyle's PR on nbformat here https://github.com/jupyter/nbformat/pull/94/files) cell metadata (not tags) as the official way to specify hidden input/output in the notebook format, that would be respected across our entire stack. The idea is that we want a single way in the notebook format for this to be represented. Views may have additional ways of representing it, for example, to allow different users to see different views in a real-time collaboration setting. But once the information hits the notebook document, there should be only one way of representing that. I should note that in this context, hidden means "not currently shown, but still present." In the live notebook, this will amount to the content having some sort of "collapse" based UI. I don't see why nbconvert should not respect this metadata. In this case, nbconvert would (for output formats that support it) should keep the content but show it collapsed by default and provide a way for the user to show it. Ideally, we could reuse whatever UI we settle on for JupyterLab. We also talked about tag based filtering, but the idea there is that nbconvert and the live notebook could hid/show content based on semantic tags that a user adds. For example, a user could add |
@ellisonbg This specifically does not implement any "magic tags"; I had the same understanding as you regarding their user-defined nature. @takluyver has suggested that that would be a desirable feature, but I am not sure where you are seeing that here. |
OK great. I saw the |
@ellisonbg No worries! And thanks for checking! |
Thanks, @mpacer! I think this PR is uncontroversial, and we can continue to explore more convenient ways to expose this kind of thing in a future iteration. |
This introduces a new step in between Preprocessors and passing the notebook to the template and a new class to manage that step. This also introduces testing infrastructure for this class.
The new class is called a
Sieve
. The key idea is that this class allows us to have a step afterPreprocessor
s are done that produces an object that is not guaranteed to adhere to the notebook format. This is needed in order to have output format independent ways of removing input.Accordingly along with this object this introduces the
TagRemoveInputSieve
. This is equipped with the traitletTagRemoveInputSieve.remove_input_tags
which is a set of strings that are checked to see if a cell's input is to be transformed to None.This also changes the skeleton to exclude any cells' whose source is
None
from being included in the output format.I anticipate discovering more clever ways of using the ability to break the notebook json schema, but this should get us started.
The API for the Sieve class is largely derived from the Preprocessor class, which I took to be a good model since Sieves are (essentially) Preprocessors that allow producing invalid in-memory notebook objects.
NB: Sieves are not to be invoked by the
NotebookExporter
or any in-place notebook transformation, meaning they should never appear as part of theExporter
class.Implementation influenced by extensive conversations with:
@Carreau @fperez @ian-r-rose @OliverEvans96
From those conversations additional things were agreed to be improvements, but possibly outside the scope of this PR…
Tasks worth considering:
add narrative documentation explaining what sieves doconsider the appropriateness of the name in the context of not just removing content but modifying contentcc'ing people who probably want to know about how this affects nbconvert & the larger Jupyter ecosystem:
@takluyver @minrk @rgbkrk @ellisonbg