Getting the title from jupyter notebooks (ipynb) #5905

Open
grst opened this issue Nov 14, 2019 · 7 comments

grst commented Nov 14, 2019

When I convert an ipynb file to HTML, I can't get pandoc to pick up the title.

I tried

1) specifying the title in yaml format in a raw cell (jupytext style, see #5398).

{
  "cell_type": "raw",
  "metadata": {},
  "source": [
    "---\n",
    "title: \"This is a title\"\n",
    "---"
  ]
}

yaml_title.ipynb
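
For reference, a jupytext-style header like this can be read out of the notebook JSON before it reaches pandoc. The following is only a minimal Python sketch, assuming the header sits in the first cell, that this cell is a raw cell, and that the header holds nothing but flat `key: value` pairs (anything richer would need a real YAML parser). `front_matter_from_notebook` is a hypothetical helper name, not part of pandoc or jupytext.

```python
import json


def front_matter_from_notebook(nb: dict) -> dict:
    """Return flat key/value pairs from a YAML-style header in the first raw cell."""
    cells = nb.get("cells", [])
    if not cells or cells[0].get("cell_type") != "raw":
        return {}
    source = cells[0].get("source", "")
    # The notebook format allows "source" to be a list of lines or a single string.
    text = "".join(source) if isinstance(source, list) else source
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0].strip() != "---" or lines[-1].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:-1]:
        key, sep, value = line.partition(":")
        if sep:
            # Assumption: values are plain scalars, optionally double-quoted.
            fields[key.strip()] = value.strip().strip('"')
    return fields


# A stripped-down notebook containing only the raw cell shown above.
notebook = json.loads(r'''
{"cells": [{"cell_type": "raw", "metadata": {},
            "source": ["---\n", "title: \"This is a title\"\n", "---"]}],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
''')
print(front_matter_from_notebook(notebook))
```

The function returns an empty dict whenever the first cell is not a raw cell delimited by `---` lines, so notebooks without such a header pass through untouched.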

2) specifying the title in the notebook-metadata:

 "metadata": {
  # [...] 
  "title": "Title from notebook metadata. "
 }

metadata_title.ipynb
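
As long as pandoc does not read this field, one possible workaround is to read the title yourself and pass it on the command line with pandoc's `--metadata` option. A minimal Python sketch, assuming the title is stored as a string at `metadata.title` as in the snippet above; `pandoc_command` is a hypothetical helper that only builds the argument list and does not run pandoc.

```python
def pandoc_command(nb: dict, nb_path: str, out_path: str) -> list:
    """Build a pandoc argument list that passes the notebook title explicitly."""
    title = nb.get("metadata", {}).get("title")
    cmd = ["pandoc", "--self-contained"]
    if isinstance(title, str):
        # --metadata KEY=VALUE sets a metadata field from the command line.
        cmd += ["--metadata", "title=" + title]
    return cmd + [nb_path, "-o", out_path]


# In practice `nb` would come from json.load() on the .ipynb file.
nb = {"metadata": {"title": "Title from notebook metadata. "}, "cells": []}
print(pandoc_command(nb, "metadata_title.ipynb", "html/metadata_title.html"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Passing it as a list rather than a shell string avoids quoting problems when the title contains spaces.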

Running pandoc on a plain markdown file works fine:

---
title: "This is a title"
---

# Heading 1
Lorem ipsum dolor sit amet

yaml_title_md.md

Pandoc was run with --self-contained:

pandoc --self-contained metadata_title.ipynb -o html/metadata_title.html
pandoc --self-contained yaml_title.ipynb -o html/yaml_title.html
pandoc --self-contained yaml_title_md.md -o html/yaml_title_md.html

No HTML title is found for the Jupyter notebooks (the filename is used instead):

> grep -r "<title>" html
html/metadata_title.html:  <title>metadata_title</title>
html/yaml_title.html:  <title>yaml_title</title>
html/yaml_title_md.html:  <title>This is a title</title>

Availability:

Full example repo: https://github.com/grst/test_pandoc_title

Pandoc version

Tested on the latest release and on a build from source as of 2019-11-13.

grst added a commit to grst/reportsrender that referenced this issue Nov 14, 2019
Getting the title from metadata cannot work for now because of jgm/pandoc#5905.

Owner

jgm commented Nov 14, 2019

Is there a standard way of representing the title in a notebook?

Author

grst commented Nov 15, 2019

tbh, I don't think so. But reading out the notebook metadata would make sense (it's the equivalent of a YAML header in a markdown file).

The yaml header in a raw cell at the top of the document is, afaik, a convention used by jupytext. It is arguably more convenient to edit than the notebook metadata, but probably not widely used outside jupytext.

Maybe @mwouts can tell us more about this?

mwouts commented Nov 15, 2019

The yaml header is also used by pandoc itself, cf. this example.

I'd tend to add title at the root of the metadata like R Markdown does. But maybe pandoc could also look for a title at jupyter.title in case there is no title at the root?
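
The lookup order suggested here (root first, then `jupyter.title`) could look roughly like this. It is only a Python sketch over a plain dict standing in for the document metadata, not pandoc's actual implementation, and `resolve_title` is a hypothetical name.

```python
def resolve_title(meta: dict):
    """Return the root-level title, else jupyter.title, else None."""
    if "title" in meta:
        return meta["title"]
    jupyter = meta.get("jupyter")
    if isinstance(jupyter, dict):
        return jupyter.get("title")
    return None


print(resolve_title({"title": "Root title", "jupyter": {"title": "Nested title"}}))
print(resolve_title({"jupyter": {"title": "Nested title"}}))
```

With this order, a title set at the root always wins, and the nested one is only a fallback.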

claudioperez commented Sep 15, 2020

I'm currently working around this with the following Lua filter:

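-- Promote the fields stored under "meta" in the notebook metadata
-- to top-level pandoc metadata.
function Meta(m)
    -- m.jupyter may be missing (e.g. for non-notebook input), so check it first
    if m.jupyter and m.jupyter.meta then
        for k, v in pairs(m.jupyter.meta) do
            m[k] = v
        end
        -- drop the now-redundant nested copy
        m.jupyter.meta = nil
    end
    return m
end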

This will extract fields from under a meta key in the notebook's metadata object.

So, for example, if you save this to a file called pandoc_ipynb_meta.lua and run

pandoc --lua-filter=pandoc_ipynb_meta.lua ...

on a notebook with the following metadata:

{
    "meta": {
        "title": "This is a title"
    },
    "kernelspec": {
        "language": "python"
    }
}

you will get a markdown file with the following header:

title: This is a title
jupyter:
    kernelspec:
        language: python

mwouts commented Sep 15, 2020

Hi @claudioperez, to follow up on @grst's earlier comment, in Jupytext we either

  1. display the root level metadata in a raw cell at the top of the Jupyter notebook (default)
  2. or, store the root level metadata in jupytext.root_level_metadata when the option root_level_metadata_as_raw_cell is false.

So 2. is similar to your approach, except that we store the title, etc. at jupytext.root_level_metadata rather than meta.

The corresponding test is here (for an R Markdown file, but it works the same for a md file):
https://github.com/mwouts/jupytext/blob/1f355fec4fa3a29b6d6a78f00dee27b05017d613/tests/test_read_simple_rmd.py#L167-L218
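
To make the comparison concrete, here is a minimal Python sketch that lifts such a nested block to the root of the notebook metadata. It works on the notebook's JSON `metadata` object, not on pandoc's internal metadata, and `promote` is a hypothetical helper; the default path is the `jupytext.root_level_metadata` location described above, and `("meta",)` covers the layout from the Lua filter earlier in the thread.

```python
import copy


def promote(metadata: dict, path=("jupytext", "root_level_metadata")) -> dict:
    """Return a copy of `metadata` with the mapping found at `path` merged into the root."""
    result = copy.deepcopy(metadata)
    parent = result
    for key in path[:-1]:
        parent = parent.get(key)
        if not isinstance(parent, dict):
            return result  # path not present: nothing to promote
    nested = parent.get(path[-1])
    if isinstance(nested, dict):
        del parent[path[-1]]
        for key, value in nested.items():
            # Design choice of this sketch: existing root-level keys win.
            result.setdefault(key, value)
    return result


jupytext_style = {
    "jupytext": {"root_level_metadata": {"title": "This is a title"}},
    "kernelspec": {"language": "python"},
}
meta_style = {
    "meta": {"title": "This is a title"},
    "kernelspec": {"language": "python"},
}
print(promote(jupytext_style))
print(promote(meta_style, path=("meta",)))
```

Unlike the Lua filter, which overwrites top-level fields with the nested ones, this sketch keeps an existing root-level value; swapping `setdefault` for plain assignment gives the other behaviour.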

Contributor

ickc commented Sep 2, 2021

I suggest changing the title of this issue to something like "ipynb metadata support", as this is general to any metadata, not only the title.

I think the ipynb way is to just dump it as JSON in the metadata field. I'm not certain the jupytext style should be supported, as it makes a special case that changes the semantics of the (first?) raw cell.

Contributor

ickc commented Sep 2, 2021

FYI, I just wrote a filter where one of the things it does is convert the jupytext metadata block to a native pandoc metadata block: https://ickc.github.io/pannb/api/pannb/#pannb.walk_and_convert_jupytext_metadata

Although I think using the JSON metadata field is more natural in the ipynb format, the workflow I need the filter for uses jupytext, so that's what I implemented.
