Cell metadata #38

Closed
rossant opened this issue Mar 13, 2015 · 14 comments

@rossant
Owner

rossant commented Mar 13, 2015

  • Provide a simple syntax to specify arbitrary block-level metadata
  • Line-level metadata is also possible, but to be processed independently by each format
  • Every format is free to interpret the metadata as it wants
  • An ipymd Markdown cell has a new field metadata = {} which is just a Python dictionary

Examples of use-cases

  • Specifying a specific ODF style in a Markdown cell
  • Force rendering a code block as Markdown instead of a notebook cell
  • Skip particular code lines in a given format (better to deal with this directly in a format reader/writer)

How it works

  • Consider the body of a Markdown cell
  • If the first line contains ipymd-[a-zA-Z]+, it defines the cell's metadata
  • There can be several ipymd-foo or ipymd-foo=bar metadata fields on the line
  • ipymd-foo means metadata['foo'] = True
  • ipymd-foo=3 means metadata['foo'] = 3
  • If a format chooses to process some metadata, it must ensure that the information is not lost during the conversion, and it must implement two-way transformation of metadata. For example, an ODFReader must recreate the metadata from the cell's ODF style.
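
A minimal sketch of the first-line rule described above, assuming values never contain whitespace and that purely numeric values should become numbers. The names `split_metadata` and `_coerce` are hypothetical and not part of ipymd.

```python
import re

# Matches ipymd-foo or ipymd-foo=bar. The value pattern (\S+) is an assumption.
_FIELD = re.compile(r"ipymd-([a-zA-Z]+)(?:=(\S+))?")


def _coerce(value):
    """Turn '3' into 3 and '1.5' into 1.5; leave anything else as a string."""
    if re.fullmatch(r"-?\d+", value):
        return int(value)
    if re.fullmatch(r"-?\d+\.\d+", value):
        return float(value)
    return value


def split_metadata(body):
    """Return (metadata, remaining_body) for the body of a Markdown cell."""
    first, _, rest = body.partition("\n")
    fields = _FIELD.findall(first)
    if not fields:
        # No ipymd-... field on the first line: the cell has no metadata.
        return {}, body
    metadata = {
        name: (True if raw == "" else _coerce(raw)) for name, raw in fields
    }
    return metadata, rest


meta, rest = split_metadata("ipymd-skip ipymd-level=3\nSome *Markdown* text")
print(meta)  # {'skip': True, 'level': 3}
```

A writer would need the inverse function to satisfy the two-way requirement in the last bullet.
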
@bollwyvl
Contributor

How about adopting the jekyll front-matter approach, and use some embedded YAML? I'd love to see a text-forward notebook UI (editor) that maintained nbformat metadata... in fact, this would be far easier than the current metadata UI!

An object between === could denote the notebook metadata (which could appear anywhere) while an object between --- could denote cell metadata. Of course, other parsing would give a nice default value for the cell_type and, say, nbformat, so it would be more of a metadata.update(**parsed_yaml). This would keep the base document really lean and readable.

I haven't had a chance to dig into the code, but would love to take a whack at throwing this together.

---
slideshow:
  slide_type: slide
---

# A Slideshow

---
slideshow:
  slide_type: fragment
---
```python
print("Brought to you by ipymd")
```
===
name: a slideshow
===
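
As a rough illustration of how such a document could be cut into pieces before any YAML parsing, here is a sketch of a splitter. The function name `split_blocks` and the block labels are hypothetical, and the sketch deliberately ignores the fact that `---` can also be a Markdown horizontal rule or a heading underline.

```python
def split_blocks(text):
    """Split a document into (kind, text) pairs.

    kind is 'cell_meta' (between --- lines), 'notebook_meta' (between
    === lines) or 'content' (everything else). The metadata text is
    returned raw; parsing it as YAML is left to the caller.
    """
    kinds = {"---": "cell_meta", "===": "notebook_meta"}
    blocks, buffer, open_delim = [], [], None

    def flush(kind):
        chunk = "\n".join(buffer).strip("\n")
        if chunk:
            blocks.append((kind, chunk))
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if open_delim is None and stripped in kinds:
            flush("content")          # a metadata block starts here
            open_delim = stripped
        elif open_delim is not None and stripped == open_delim:
            flush(kinds[open_delim])  # the matching delimiter closes it
            open_delim = None
        else:
            buffer.append(line)
    flush("content" if open_delim is None else kinds[open_delim])
    return blocks


doc = "---\nslideshow:\n  slide_type: slide\n---\n\n# A Slideshow\n"
for kind, chunk in split_blocks(doc):
    print(kind, repr(chunk))
```
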

@rossant
Owner Author

rossant commented May 29, 2015

it would be great to have metadata support with a relatively standard syntax

see also http://rmarkdown.rstudio.com/ and https://github.com/chronitis/ipyrmd

let me know if you're going to work on the code -- i might have to merge my pending PR first...

@bollwyvl
Contributor

No Intent To Implement yet! Just was reminded of this in another
discussion. Will check out the other links!

@rossant
Owner Author

rossant commented Jun 15, 2015

Here are a few ideas:

  • Full support for notebook and cell metadata with YAML. There is a one-to-one correspondence between metadata JSON in the notebook and metadata YAML in Markdown
  • ipymd-specific metadata belong to the ipymd namespace (ipymd. prefix) -- dotted names are automatically replaced by nested dicts (?)
  • Notebook metadata:
===
ipymd.key1: value1
ipymd.key2:
    foo: bar
===
  • Cell metadata: idem but with ---
  • A cell metadata applies to the next cell
  • Metadata aliases: --- MY_ALIAS is replaced by a full metadata tree during a preprocessing stage
  • There are predefined aliases like IMPORT (see below)
  • Further aliases can be defined in the notebook metadata. Here is how the IMPORT alias would be defined (but in fact, this one would be a default alias, so no need to redefine it):
===
ipymd.aliases:
    - name: IMPORT
      replace:
        ipymd.import:
            name: $1
===

This means that --- IMPORT myfile.md will be replaced by:

  ---
  ipymd.import:
    name: myfile.md   
  ---
  • The ipymd.import metadata inserts the specified file in a preprocessing stage
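
The dotted-name and alias ideas above could be sketched as follows, working on already-parsed dictionaries so that no YAML library is needed. All names here (`nest_dotted`, `substitute`, `expand_alias`, `ALIASES`) are hypothetical, and the sketch assumes space-separated positional arguments referenced as `$1`, `$2`, and so on.

```python
def nest_dotted(flat):
    """Expand {'ipymd.import': {...}} into {'ipymd': {'import': {...}}}."""
    nested = {}
    for key, value in flat.items():
        if isinstance(value, dict):
            value = nest_dotted(value)
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def substitute(template, args):
    """Replace '$1', '$2', ... in a nested template with positional args."""
    if isinstance(template, dict):
        return {k: substitute(v, args) for k, v in template.items()}
    if isinstance(template, list):
        return [substitute(v, args) for v in template]
    if (isinstance(template, str) and template.startswith("$")
            and template[1:].isdecimal()):
        return args[int(template[1:]) - 1]
    return template


# Mirrors the IMPORT definition shown in the notebook metadata above.
ALIASES = {"IMPORT": {"ipymd.import": {"name": "$1"}}}


def expand_alias(line, aliases=ALIASES):
    """Turn '--- IMPORT myfile.md' into the metadata tree it stands for."""
    name, *args = line.lstrip("-").split()
    return nest_dotted(substitute(aliases[name], args))


print(expand_alias("--- IMPORT myfile.md"))
# {'ipymd': {'import': {'name': 'myfile.md'}}}
```
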

@bollwyvl
Contributor

Ahhhhh. Looking very promising!

aml will do . nested keys, but YAML won't... But i'd still stick with YAML. Whitespace nesting is probably good enough for most things, and one can fall back to JSON {} if you want it on one line.

I like the alias as a generalization/realization of the --- notation. JSON Patch might be up to the task, though it looks like the python implementation doesn't support inversion... yet! Indeed. Using the Patch might be better than re-inventing something new, even if it adds a dependency.

For roundtrip to work, you'd want to also remember that you used an alias, and its arguments...

--- SLIDE

secretly generates this metadata:

{"metadata": {
    "ipymd": {"aliased": {"SLIDE": []}},
    "slideshow": {"slide_type": "slide"}
}}

the stack of changes for md -> ipnb sounds like

  • start with {}
  • patch with any explicit, long-form meta
  • patch with any aliases (and the implicit patch of remembering the alias)

reversed for ipynb -> md:

  • strip the "cruft": collapsed, trusted, prompt
  • invert the alias changes
  • collapse empty trees
  • if the result isn't {}, store that as explicit metadata
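
The two stacks of changes could look roughly like this. It is only a sketch: aliases are modelled as plain dictionaries merged into the metadata rather than as JSON Patch documents, the alias table is hypothetical, and `prune` would also drop an empty dict that a user wrote on purpose.

```python
import copy

CRUFT = ("collapsed", "trusted", "prompt")  # names from the list above
ALIASES = {"SLIDE": {"slideshow": {"slide_type": "slide"}}}  # hypothetical table


def merge(target, extra):
    """Recursively merge `extra` into `target` in place."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)
    return target


def unmerge(target, extra):
    """Remove from `target` each leaf that `extra` sets to the same value."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            unmerge(target[key], value)
        elif key in target and target[key] == value:
            del target[key]


def prune(tree):
    """Drop empty dicts bottom-up ("collapse empty trees")."""
    for key in list(tree):
        if isinstance(tree[key], dict):
            prune(tree[key])
            if not tree[key]:
                del tree[key]
    return tree


def md_to_ipynb(explicit, aliases_used):
    meta = merge({}, explicit)           # start with {}, add long-form meta
    for name in aliases_used:
        merge(meta, ALIASES[name])       # apply the alias
        merge(meta, {"ipymd": {"aliased": {name: []}}})  # remember it
    return meta


def ipynb_to_md(meta):
    meta = copy.deepcopy(meta)
    for key in CRUFT:
        meta.pop(key, None)              # strip the cruft
    used = list(meta.get("ipymd", {}).get("aliased", {}))
    for name in used:
        unmerge(meta, ALIASES[name])     # invert the alias changes
        del meta["ipymd"]["aliased"][name]
    return prune(meta), used             # what is left is explicit metadata
```
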

Shortcut/args being space delimited suggests you just get one, which is probably fine... the case where i would want more shortcuts is for slides: could one import n cells, and update their structure, i.e. load some slides as subslides? What happens when your import hits another import?

--- SUBSLIDE IMPORT other.md
---

Could just accept that an alias is really a function, and use named params...

--- SUBSLIDE IMPORT(other.md)
---

...which would then leave space for some kind of query language to pull out cells (in this example, the filter would be wrapped in $.cells[<expr>]:

--- SLIDE
# Title

--- SLIDE
# Recap...

--- SUBSLIDE IMPORT(part1.md, `this`.length-1)

--- SLIDE
# New stuff...
...

--- SLIDE IMPORT(common.md, ?(`this`.metadata.id=contact))

Could the alias definition be more namespace-y? Can't have multiple aliases to the same thing, and if someone wants to overload it, it shouldn't require looking through names.

Are document-level aliases possible? I don't know what they would be... but the metadata regime is totally different, and i wouldn't want to mix them.

Combining all of these ideas, it's reasonably compact and standards-compliant, and should provide enough room to grow:

===
ipymd:
  alias:
    cell: 
      SLIDE: [{op: add, path: /slideshow/slide_type, value: slide}]
      SKIP: [{op: add, path: /slideshow/slide_type, value: skip}]
      NOTES: [{op: add, path: /slideshow/slide_type, value: notes}]
      SUBSLIDE: [{op: add, path: /slideshow/slide_type, value: subslide}]
      FRAGMENT: [{op: add, path: /slideshow/slide_type, value: fragment}]
      IMPORT($path): [{op: add, path: /ipymd/import, value: $path}]
===
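
To make the patch-based alias table concrete, here is a small sketch that applies only the `add` operation to dictionaries, without depending on a JSON Patch library. The names `pointer_parts`, `apply_add_ops` and `CELL_ALIASES` are hypothetical. One deliberate departure from RFC 6902 is labelled in the code: missing parent objects are created, whereas a strict implementation would reject the operation.

```python
import copy


def pointer_parts(path):
    """Split a JSON Pointer such as '/slideshow/slide_type' into keys."""
    if not path.startswith("/"):
        raise ValueError("only pointers starting with '/' are handled")
    # RFC 6901 escapes: '~1' stands for '/', '~0' stands for '~'.
    return [p.replace("~1", "/").replace("~0", "~")
            for p in path[1:].split("/")]


def apply_add_ops(metadata, ops):
    """Apply a list of JSON Patch 'add' operations to a copy of `metadata`."""
    result = copy.deepcopy(metadata)
    for op in ops:
        if op["op"] != "add":
            raise NotImplementedError(op["op"])
        *parents, leaf = pointer_parts(op["path"])
        node = result
        for key in parents:
            # Assumption: create missing parents. Strict RFC 6902 fails here.
            node = node.setdefault(key, {})
        node[leaf] = copy.deepcopy(op["value"])
    return result


CELL_ALIASES = {
    "SLIDE": [{"op": "add", "path": "/slideshow/slide_type", "value": "slide"}],
    "FRAGMENT": [{"op": "add", "path": "/slideshow/slide_type",
                  "value": "fragment"}],
}

print(apply_add_ops({}, CELL_ALIASES["SLIDE"]))
# {'slideshow': {'slide_type': 'slide'}}
```
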

ping @tonyfast

@rossant
Owner Author

rossant commented Jun 15, 2015

But i'd still stick with YAML

agreed

Are document-level aliases possible? I don't know what they would be... but the metadata regime is totally different, and i wouldn't want to mix them.

agreed

What happens when your import hits another import?

the import() alias function could implement recursive import, so there would be no recursivity at the level of the generic alias system

Questions:

  • Is it not too weird to mix JSON and YAML? (btw it looks like github renders YAML metadata in Markdown documents)
  • The query language idea is interesting but we might want to leave it for later. At the very least we should have a modular architecture for cell metadata that would allow users to write their own custom behavior.
  • Maybe lowercase is better for aliases...
  • I love the JSON patch idea for aliases! It's very neat. So, to summarize:
    • Simple static YAML for cell metadata (applied to the next cell)
    • Aliases let you set/update cell metadata dynamically (applied to the next cell). An alias may be defined internally, in the notebook metadata, or in a configuration file. It can be a function alias_name(cell, *args, **kwargs) (where args and kwargs are those appearing in --- alias(*args, **kwargs)), or a JSONPatch applied to the cell metadata.
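
A sketch of the function flavour of aliases, assuming one alias per `---` line, comma-separated arguments, and no commas or `=` signs inside positional values. The registry, the decorator and the `slide` alias are hypothetical; argument values are kept as strings.

```python
import re

_CALL = re.compile(r"^---\s*(\w+)(?:\((.*)\))?\s*$")


def parse_alias_line(line):
    """Parse '--- name(a, b, key=value)' into (name, args, kwargs)."""
    match = _CALL.match(line.strip())
    if match is None:
        raise ValueError(f"not an alias line: {line!r}")
    name, raw = match.group(1), match.group(2) or ""
    args, kwargs = [], {}
    for token in (t.strip() for t in raw.split(",")):
        if not token:
            continue
        key, sep, value = token.partition("=")
        if sep:
            kwargs[key.strip()] = value.strip().strip("\"'")
        else:
            args.append(token.strip("\"'"))
    return name, args, kwargs


ALIASES = {}  # hypothetical registry: lower-case alias name -> function


def alias(func):
    """Register `func` under its own name."""
    ALIASES[func.__name__] = func
    return func


@alias
def slide(cell, slide_type="slide"):
    cell["metadata"].setdefault("slideshow", {})["slide_type"] = slide_type


def apply_alias_line(cell, line):
    """Look up the alias named on `line` and call it on `cell`."""
    name, args, kwargs = parse_alias_line(line)
    ALIASES[name.lower()](cell, *args, **kwargs)
    return cell


cell = {"cell_type": "markdown", "metadata": {}}
print(apply_alias_line(cell, "--- SLIDE")["metadata"])
# {'slideshow': {'slide_type': 'slide'}}
```
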

@bollwyvl
Contributor

Hopefully we're not creating too much of a monster :)

Are document-level aliases possible?

Had a duh moment: the obvious use case is a kernel name:

=== KERNEL(python)

this would go out and grab the whole kernelspec, which nobody wants to type by hand.

  • Is it not too weird to mix JSON and YAML? (btw it looks like github renders YAML metadata in Markdown documents)

JSON is a strict subset of YAML :) YAML even inherits the duplicate keys problem that ijson intends to fix. Also, the gh rendering only works for the first chunk of meta, a la jekyll, and everything else thinks you're making a heading. gist thinks everything is a heading.

If you were to indent everything as code, the rendering is better on gist, but wouldn't work with the gh custom rendering. no big loss, i say.

If i was doing a lot of tricky meta editing, I might even choose to explicitly use ticks and declare yml for syntax highlighting and linting... either way, we are proposing some stuff fairly incompatible with editors: --- SOMEALIAS looks pretty ugly, and would certainly be ignored by gh.

We don't want to get into the syntax-highlighting-package-business if we don't have to.

  • The query language idea is interesting but we might want to leave it for later. At the very least we should have a modular architecture for cell metadata that would allow users to write their own custom behavior.

I suppose for reuse, if you want to reuse a little bit, you pull it out of the original file, and import it from both places. much cleaner. but someday...

  • Maybe lowercase is better for aliases...

sure, was just working off the showoff notation. I kinda like it, because it would be easier to search and not get false positives.

I love the JSON patch idea for aliases

hooray! i suppose it wouldn't be insane to actually set the scope of the thing to be the whole cell instead of just the meta... i can think of horrible, dirty things like template execution resulting in markdown, or, horror of horror, code (thanks, @tonyfast, for putting these thoughts in my head).

--- JINJA(data=http://foo.bar/data.csv, body_part="brain")
My {{ body_part }} just exploded.
{% for line in data %}
- {{ line.text }}
{% endfor %}

Secretly, this would stash the whole template in metadata... and this cell, much like imported cells, would not be editable... or rather, the generated text would be discarded on roundtrip.

alias_name(cell, *args, **kwargs)

Yeah, that's obvious now that you mention it:

--- SUBSLIDE IMPORT(slides.md, moreslides.md)

I might go a bit more verbose, if you're thinking magic names:

def alias_cell_<name>(cell, *args, **kwargs):
def alias_nb_<name>(nb, *args, **kwargs):
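
With that naming convention, lookup could be a plain name search. The sketch below assumes the alias functions live in one module namespace; `find_alias` and the two example aliases are hypothetical.

```python
def alias_cell_slide(cell, *args, **kwargs):
    """Cell-level alias: mark the cell as a slide."""
    meta = cell.setdefault("metadata", {})
    meta.setdefault("slideshow", {})["slide_type"] = "slide"
    return cell


def alias_nb_kernel(nb, name, **kwargs):
    """Notebook-level alias: record only the kernel name."""
    nb.setdefault("metadata", {}).setdefault("kernelspec", {})["name"] = name
    return nb


def find_alias(scope, name, namespace=None):
    """Return alias_<scope>_<name>, where scope is 'cell' or 'nb'."""
    namespace = globals() if namespace is None else namespace
    func = namespace.get(f"alias_{scope}_{name.lower()}")
    if not callable(func):
        raise KeyError(f"no {scope}-level alias named {name!r}")
    return func


print(find_alias("cell", "SLIDE")({}))
# {'metadata': {'slideshow': {'slide_type': 'slide'}}}
```

Keeping the scope in the function name means a `---` line can only ever reach cell aliases and a `===` line only notebook aliases.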

i wonder, if you IMPORT, do you get the aliases, defined/imported too? this would make the config file... just another file ./aliases.md, that one could maintain next to a big stack of notebooks. perhaps that is the meaning of import if done at the notebook, as opposed to cell, level.

would then all notebook meta come along? this would make it hard to overload, for example, a title or a theme (once slides support that). Perhaps the aliases can be used on either the opening or closing ---/===, which would control the resolution order.

Great stuff brewing here that answers the mail on a lot of long-standing issues with the usability of the notebook format itself. My colleagues are certainly excited by notebooks with sane PRs, and even more so for reuse. Wearing my nbviewer hat, I think one would still have to publish .ipynbs to get these rendered... but who knows!

@tonyfast

I am having a little difficulty parsing all of this. I am going to offer my two cents as far as the choice of markdown and yaml go.

Use YAML always if a user will be entering their own keys and values. There are fewer mistakes than JSON. Also, the widespread adoption of things like Jekyll clearly indicates the ability for anyone to write JSON as YAML.

As for Markdown, I do not think that language or tool specific markdown flavors scale for the future.

RMarkdown is great for R users and the proposed syntax above may be great for notebook users.

The Ipython notebook's future applications are rapidly growing. Many syntaxes, languages, and kernels can be used in the Ipython notebook. If the notebook is treated as syntax/language agnostic then a conversion to/from markdown should be too.

Github Flavored Markdown is a proven language agnostic text document, see all the readmes. Github uses Github Flavored Markdown for syntax highlighting, but as a text document GFM indicates that a block of text that is following has a specific syntax.

In this issue y'all mentioned slides, templates, kernels, and metadata. Each feature has a very different application in practice:

  • Slides use some Javascript to change HTML and CSS to create slides in the presentation view.
  • Templates help create views from variables in the kernel.
  • Kernels are important when the notebook is being used locally.
  • Metadata only matters if the readme is used as a notebook.

Very few users need all of these features.

I believe that GFM's success as a language agnostic document should guide any extension of the Ipython notebook. I have been tinkering with converting GFM to Ipython Notebooks. From a readme.md, each block of markdown and code is transformed into an appropriate notebook cell using some Java(Coffee)script to create a readme.ipynb. The fenced code block languages are passed as cell magics.

At this point, I believe anything in this issue can be described by GFM markdown.

  • Slides are triggered by a Javascript code fence
  • Templates are already rendered as HTML and Javascript.
  • Kernel information and Metadata can be included in a code as YAML or JSON.

This bl.ock shows some extensibility of the readme file where a Javascript Template tool passes YAML variables to the Markdown. One could imagine Reveal being used or the YAML being passed to the notebook metadata.

@bollwyvl
Contributor

@tonyfast Sorry to bring you in without more preamble. Thanks for the feedback: we needed some of those words based on the stuff you linked and our lengthy discussions on previous topics.

You should definitely give ipymd a spin with pip: it's at heart a ContentsManager which replaces the stock FileContentsManager. It uses markdown as its storage mechanism, so you can round trip from md <-> json, editing wherever is appropriate at the time.

I have been advocating for notebooks-as-presentations for some time, but the approach has some shortcomings. Slides are where users are knowingly manipulating at least cell-level metadata. Thus, one of the key drivers here is making slide editing and management really, really approachable, in the style of showoff, ioslides, etc.

I think we're in violent agreement that (GF)Markdown is teachable, mostly because it is readable but partially because it is diffable and mergeable to an extent beyond HTML and JSON. Basically, I (and others) have found that a directory of ipynb is not a long-term format for maintaining a family of docs, say a course, or a recurring meeting, or even the documentation for a decent-sized project. A "field reconstitutable" text representation that fully supports all features of the notebook format is the primary goal, with the added goal of being more human-centric.

It is in this vein of user-centricity that this whole alias discussion started. I would rather maintain and train:

--- SLIDE

over

```yaml
slideshow:
  slide_type: "slide"
```

though maybe there is some other GFM-compatible way to do the former...

  • HTML comments:

    <!-- SLIDE -->
  • jinja

    {{ SLIDE() }}

but each of these has its drawbacks.

Anyhow, to the management point: I really, really want to be able to reuse sanely-versionable content and not rely on hacking javascript. The IMPORT alias has tremendous potential, if we can figure it out properly :)

Kernel information and Metadata can be included in a code as YAML or JSON.

It becomes more out of control at the notebook level, even assuming we used the just-one-frontmatter approach. I want this:

--- KERNEL(python3)

vs

---
kernelspec: 
  display_name: "Python 3"
  language: "python"
  name: "python3"
language_info: 
  codemirror_mode: 
    name: "ipython"
    version: 3
  file_extension: ".py"
  mimetype: "text/x-python"
  name: "python"
  nbconvert_exporter: "python"
  pygments_lexer: "ipython3"
  version: "3.4.2"
---
  • Slides are triggered by a Javascript code fence

I don't think that's the best long-term solution: the source needs to be as far removed from the details of the presentation engine as possible. I suspect at some point another, more modular slide framework will show up that supports most of the same concepts as reveal (heck, throw prezi in, too) but has better modularity, and definitely more stable and flexible archival generation.

Further, if using IJavascript or ITorch, it would be difficult to determine the cell fenced blocks from the meta fenced blocks without introducing some other mime type... which would again defeat the editor's attempt to assist us.

  • Templates are already rendered as HTML and Javascript.

That jinja example was of what a plugin to ipymd could do, given access to the whole cell and not just the metadata. Basically, what could you do with a cell/notebook-level preprocessor that could modify what you see in the Notebook?

@tonyfast

Oh wow, I see where y'all are coming from now. Thanks for that description. I love the idea of the human-centric part. Largely, I have been focused on the ease of adoption and teaching, but you guys are at a much grander scale of interaction than I have been thinking. It'll take a bit to get on this level.

Regardless, the example syntaxes like

=== KERNEL(python)
--- JINJA(data=http://foo.bar/data.csv, body_part="brain")

look mighty similar to explicit and global tags in YAML (http://www.yaml.org/spec/1.2/spec.html#id2761694). I have never applied them before, but they look analogous to some of the suggestions above.

@bollwyvl
Contributor

No worries! I didn't even think of all the extra stuff yaml just does. The
argument for using something native to the spec is strong.

It looks like the tags take URIs, which in Python resolve to callables. Ignoring
!, such that a user can still create their own, we could take !!. So a
!!slide could call alias_cell_slide. If they wrap, or list, i.e.


!!import somefile.md
!!slide

other: data

And if it can do round trip, this would just about solve the issue, and
convince me that we don't want single line --- aliases.

Also, i like the ---/... Start and end of a meta block... Much better for
parsing. I wonder why the jekyll folk decided to use ---/---.

@tonyfast

Do the YAML tags and Python types for Names, Classes, and Objects move you in the right direction?

There are two examples that look similar

!!python/object/new:module.Class [argument, ...]
!!python/object/apply:module.function [argument, ...]

@bollwyvl
Contributor

Heh, had a look at some of that stuff: ended up with this:
http://nbviewer.ipython.org/gist/bollwyvl/1c5d5f1040515fb108e1

We DON'T want to use the python namespace, because it can Do Horrible Things. Registering to the application-specific ! namespace is fine for me, as long as we document it :)

Otherwise, aside from not fully understanding how to handle missing stuff in JSON Patch, it's certainly looking pretty good!

@rossant
Owner Author

rossant commented Jul 15, 2015

closed by #62

@rossant rossant closed this as completed Jul 15, 2015