Extension to treat first heading level as title? #5615

Closed
DanielSWolf opened this issue Jun 23, 2019 · 33 comments

@DanielSWolf

DanielSWolf commented Jun 23, 2019

I'm using Pandoc 2.7.3 to convert Markdown to AsciiDoc. In the output file, the level of every heading is 1 deeper than expected.

README.md:

# Title

## Level 2

Invocation:

pandoc -o README.adoc README.md

Output (README.adoc):

== Title

=== Level 2

Expected output:

= Title

== Level 2
@jgm
Owner

jgm commented Jun 23, 2019

The = is reserved for the document title, provided in metadata.
This is standard in asciidoc; see e.g. https://raw.githubusercontent.com/asciidoctor/asciidoctor.org/master/docs/asciidoc-syntax-quick-reference.adoc as an example.

@jgm jgm closed this as completed Jun 23, 2019
@jgm
Owner

jgm commented Jun 23, 2019

Use this form:

---
title: Title
...

# First section
 

@DanielSWolf
Author

I'm not sure that I got my problem across. I realize that in AsciiDoc, = is the document title. The thing is that Markdown has no comparable syntax for defining the document title. As far as I know, the syntax you mention in your previous comment isn't standard Markdown. So in practice, # is often used in Markdown documents for the document title, then ##, ###, etc. are used for headings -- just like in AsciiDoc.

I have a large number of Markdown files that use this approach and that I'd like to convert to AsciiDoc. So I need to convert # to =, ## to ==, and so on. Is there a way to achieve this?

@mb21
Collaborator

mb21 commented Jun 24, 2019

Well, pandoc uses pandoc flavoured markdown...

I was thinking you could write a Lua filter to decrement all header levels by one, similar to this example: https://pandoc.org/lua-filters.html#modifying-pandocs-manual.txt-for-man-pages, but this causes the current pandoc version to crash with `Prelude.init: empty list` somewhere in the asciidoc writer.

You could change this line and compile pandoc yourself...

@mb21
Collaborator

mb21 commented Jun 24, 2019

Specifically, the hierarchicalizeWithIds function in Pandoc.Shared breaks...

@mb21
Collaborator

mb21 commented Jun 24, 2019

Or write a filter that decrements all heading levels by one, except for the level-1 heading, which should be set as the document's metadata title.
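
For anyone who wants to try the idea without touching pandoc, here is a minimal Python sketch of that transformation applied directly to Markdown text. It is not a pandoc filter and says nothing about pandoc's internals. The function name `promote_first_heading` is made up for illustration, only ATX (`#`) headings are recognised, and leaving any later level-1 headings unchanged is an assumption.

```python
import json
import re

# ATX heading: one to six '#', whitespace, then the heading text.
ATX_HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Opening or closing line of a fenced code block.
FENCE = re.compile(r"^\s*(```|~~~)")


def promote_first_heading(markdown: str) -> str:
    """Move the first level-1 heading into a YAML title block and
    decrement every deeper heading by one level (hypothetical helper)."""
    title = None
    body = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence  # '#' lines inside code are not headings
        match = None if in_fence else ATX_HEADING.match(line)
        if match is None:
            body.append(line)
            continue
        level, text = len(match.group(1)), match.group(2)
        if level == 1 and title is None:
            title = text  # first '# ...' becomes the document title
        elif level == 1:
            body.append(line)  # assumption: later level-1 headings stay as-is
        else:
            body.append("#" * (level - 1) + " " + text)
    if title is None:
        return markdown  # nothing to promote
    # json.dumps yields a double-quoted string that is also valid YAML.
    quoted = json.dumps(title, ensure_ascii=False)
    return "---\ntitle: " + quoted + "\n---\n\n" + "\n".join(body).strip("\n") + "\n"


if __name__ == "__main__":
    print(promote_first_heading("# Title\n\n## Level 2\n"))
```

Applied to the README from the first comment, this produces a YAML title block followed by `# Level 2`, which is the input form suggested earlier in the thread.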

@tarleb
Collaborator

tarleb commented Jun 24, 2019

Relevant StackOverflow question and answer: Pandoc: set document title to first title

@agusmba
Contributor

agusmba commented Jun 24, 2019

@tarleb you're a lua wizard! I just grabbed another one of yours for numbered-chapter-reference (not enough karma for my votes to show yet)

@jgm
Owner

jgm commented Jun 25, 2019

Maybe we need a markdown extension that treats a unique level-one header as the metadata title, and promotes all other headers. Something like +top_heading_as_title.

@jgm jgm reopened this Jun 25, 2019
@mb21
Collaborator

mb21 commented Jun 26, 2019

That would solve it for markdown.. but what about reading other formats like HTML?

@jgm
Owner

jgm commented Jun 26, 2019

We could make the extension affect HTML as well. I don't know if there are other formats that use this convention for indicating titles.

@jgm jgm changed the title from "Heading levels are changed" to "Extension to treat first heading level as title?" Jul 20, 2019
@mb21
Collaborator

mb21 commented Aug 30, 2019

This issue is more vexing to me than expected...

It seems to me now that what we need is an extension that treats a unique level-one header as the metadata title, and does nothing to the other headers. Let me explain...

This markdown:

---
title: Title
---

## Level 2

converted with pandoc -s, results in this HTML structure:

<head>
  <title>Title</title>
</head>
<body>
  <header>
    <h1 class="title">Title</h1>
  </header>
  <h2>Level 2</h2>
</body>

which is usually what you want: one h1, one h2.

Now to the asciidoc and html readers:

If you're converting from pandoc markdown (same md as above):

```
% pandoc -t asciidoc -s
---
title: foo
---

## bar
^D
= foo

== bar
```

If you're converting from markdown with a different header convention (OP's md):

```
% pandoc -f markdown+top_heading_as_title -t asciidoc -s
# foo

## bar
^D
= foo

== bar
```


Similarly from HTML; currently we have:

```
% pandoc -f html -t markdown --atx-headers -s
<html>
<head>
  <title>foo</title>
</head>
<body>
  <h1>foo2</h1>
  <h2>bar</h2>
</body>
</html>
^D
---
title: foo
---

# foo2

## bar
```

But what you usually want when converting a website to pandoc markdown is:

```
% pandoc -f html+top_heading_as_title -t markdown --atx-headers -s
<html>
<head>
  <title>foo</title>
</head>
<body>
  <h1>foo2</h1>
  <h2>bar</h2>
</body>
</html>
^D
---
title: foo2
---

## bar
```

Does that make sense?

@jgm
Owner

jgm commented Aug 30, 2019

it makes sense, but that doesn't mean we should parse h2 -> Header 2 in this case.

Converting HTML -> LaTeX, with this style of HTML, you'd generally want the h2's to convert to \section, not \subsection. So they'd need to be Header 1 in the AST. The same is true for many other formats pandoc supports.

One could try to address this at the HTML writer level. When the special extension or option is set, Header 1 renders as h2.

So, to summarize:

  • with -f html+top_heading_as_title, h1 goes to the metadata title and h2 goes to a Header 1.
  • with -t html+top_heading_as_title, metadata title goes to h1 and Header 1 goes to h2.
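
To make the two directions in this summary concrete, here is a toy Python model. It is only a sketch of the proposal: `top_heading_as_title` is the extension name floated in this thread, `Doc`, `read_html_headings` and `write_html_headings` are hypothetical names, and the handling of a second h1 and of levels beyond h6 is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Heading = Tuple[int, str]  # (level, text)


@dataclass
class Doc:
    """Toy stand-in for a pandoc document: metadata title plus headers."""
    title: Optional[str] = None
    headers: List[Heading] = field(default_factory=list)


def read_html_headings(html_headings: List[Heading]) -> Doc:
    """Reader side: the first h1 becomes the title, hN becomes Header N-1."""
    doc = Doc()
    for level, text in html_headings:
        if level == 1 and doc.title is None:
            doc.title = text
        else:
            # Assumption: a further h1 is kept as a Header 1.
            doc.headers.append((max(level - 1, 1), text))
    return doc


def write_html_headings(doc: Doc) -> List[Heading]:
    """Writer side: the title becomes h1, Header N becomes hN+1."""
    out: List[Heading] = []
    if doc.title is not None:
        out.append((1, doc.title))
    # Assumption: HTML has no h7, so deeper levels are clamped to h6.
    out.extend((min(level + 1, 6), text) for level, text in doc.headers)
    return out


if __name__ == "__main__":
    doc = read_html_headings([(1, "foo2"), (2, "bar")])
    print(doc)
    print(write_html_headings(doc))
```

With both directions in place, the HTML example above round-trips: h1/h2 becomes title plus Header 1 in the AST, and is written back as h1/h2.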

@mb21
Collaborator

mb21 commented Aug 31, 2019

Ah, considering LaTeX output is indeed interesting.

I had the impression now that the recommended md for html export is the following (otherwise you get two <h1> with -s, which is usually not recommended):

---
title: foo
---

## bar

But you're saying that for LaTeX export you'd usually want the following?

---
title: foo
---

# bar

@jgm
Owner

jgm commented Aug 31, 2019

> But you're saying that for LaTeX export you'd usually want the following?

Yes, for LaTeX and most other formats. With the ## you'd get subsections numbered 0.1, 0.2, etc. I also produce HTML this way (it's the default for pandoc), but I understand this has drawbacks.

Relevant old issue: #686.
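
The 0.1, 0.2 numbering follows from how hierarchical section counters work. The Python sketch below is a simplified model of such counters, not LaTeX's or pandoc's actual code, and `number_headings` is a hypothetical name.

```python
from typing import List


def number_headings(levels: List[int], depth: int = 3) -> List[str]:
    """Return the section number each heading would get, in order.

    `levels` holds heading levels (1 = section, 2 = subsection, ...).
    """
    counters = [0] * depth
    numbers = []
    for level in levels:
        counters[level - 1] += 1          # bump the counter for this level
        for deeper in range(level, depth):
            counters[deeper] = 0          # reset all deeper counters
        numbers.append(".".join(str(c) for c in counters[:level]))
    return numbers


if __name__ == "__main__":
    # Only level-2 headings: the section counter never leaves 0.
    print(number_headings([2, 2]))
    # Level-1 headings used for the top-level sections.
    print(number_headings([1, 2, 2, 1]))
```

A document that starts at level 2 never increments the top-level counter, so every number begins with 0.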

@mb21
Collaborator

mb21 commented Sep 8, 2019

I've been thinking about this some more and made peace with the fact that, depending on what you're doing, you'll always want to adjust your heading levels. That might depend on the output format you're converting to, or more importantly on where your piece of text fits into an existing hierarchy: say an existing website that uses <h1> for its logo text, or maybe you're writing a book and converting individual md files one by one to HTML pages (each should have an h1), but you concatenate all your md files when going to LaTeX.

So there will always be cases where you should simply make use of the --base-header-level option. Two things:

  • I'm not sure --base-header-level is the most intuitive name. It kind of leaves open what happens to all headings that are not at the 'base level'. What about something like --shift-heading-levels-by with the default value being 0?
  • Wouldn't it be nice if the option could also take negative values? If you would do --shift-heading-levels-by=-1 the first heading with level 1 would be set as the metadata title, and we wouldn't need yet another option/extension. I suppose other headings could be dropped, as well as headings that end up having a level <= 0.

@jgm
Owner

jgm commented Sep 9, 2019

Interesting idea. But isn't it a bit odd if

  • shift by -1 makes a level-1 heading the metadata title

unless also

  • shift by +1 makes metadata title a level-1 heading ?

This latter would definitely be a change to how --base-header-level=2 currently works.

EDIT: Even so I'm pretty positively disposed to this idea. I believe people have requested negative heading level shifts before. (See #4342)

@mb21
Collaborator

mb21 commented Sep 9, 2019

> shift by +1 makes metadata title a level-1 heading

yes, I think that would be useful in some rare cases as well.

I suppose --base-header-level should be deprecated then...

@jgm jgm added this to the 2.8 milestone Sep 9, 2019
@jgm jgm closed this as completed in 88dc6fa Sep 11, 2019
@mb21
Collaborator

mb21 commented Sep 11, 2019

🎉

@brainchild0

brainchild0 commented Sep 20, 2019

I think it makes sense that shifting between metadata and level-1 heading occurs in both directions. I don't find a compelling counter-example. But if the effect applies to all inputs after concatenation, then can the user not provide a document title that doesn't get demoted to a mere header?

Sometimes users split a book into files. Or a book may constitute a compilation of articles from sources written originally for standalone publication.

For example, consider inputs:

---
title:  Book Title
---
---
title: Beginning
---

In the beginning...
---
title: Ending
---

At the end...

The intention might be to represent:

---
title:  Book Title
---

# Beginning

In the beginning...

# Ending

At the end...

Could the global metadata input be protected from demotion? Could shifting be selected at the granularity of individual inputs, and applied before concatenation?

And would a title be handled differently if coming from the metadata given on the command line versus a YAML source? I wouldn't suggest a solution in which giving a title on the command line is the only way to protect it from demotion.

@jgm
Owner

jgm commented Sep 20, 2019

@brainchild0 see --file-scope.

@brainchild0

Would this switch guard any single input file from transformations that would apply to others? I am not seeing any indication of such in the manual.

@jgm
Owner

jgm commented Sep 20, 2019

I don't understand the question. (But I recommend you experiment to find out.)

@brainchild0

Following are the best experiments I can do currently:

Looking at my first post in the issue, notice a sequence of three examples of file contents following the line beginning with "For example". I place them in a.md, b.md, and c.md. The example following "The intention" I place in x.md.

Now I try:

$ pandoc x.md 
<h1 id="beginning">Beginning</h1>
<p>In the beginning…</p>
<h1 id="ending">Ending</h1>
<p>At the end…</p>

The files a.md, b.md, and c.md represent the idea that file x.md is being decomposed into parts. Since the latter two files represent chapters in the form of standalone documents, the level-1 headings are represented as the document titles.

The idea expressed in that post was that it might be useful if I could run a command using these three files that is equivalent to pandoc x.md. This involves shifting b.md and c.md to the right one level, but "guarding" a.md.

With the current options, I think it is impossible. The closest approximation would be

$ pandoc a.md b.md c.md --shift-heading=1
<h1>Book Title</h1>
<p>In the beginning…</p>
<p>At the end…</p>

Actually, the result currently is that the chapter titles are dropped, because only one title may be recognized for the document.

With --file-scope there is no difference:

$ pandoc a.md b.md c.md  --shift-heading=1  --file-scope
<h1>Book Title</h1>
<p>In the beginning…</p>
<p>At the end…</p>

What would be needed is a way to shift the contents of b.md and c.md to right one level, such that the titles are demoted to level-1 headings, while a.md is "guarded", providing the actual title of the document.

At first it may seem like an unusual case, but I think probably not so. It seems that currently any positive shift value prevents the input stream from giving any data that is used for the document title of the output.

@jgm
Owner

jgm commented Sep 20, 2019

> What would be needed is a way to shift the contents of b.md and c.md to right one level, such that the titles are demoted to level-1 headings, while a.md is "guarded", providing the actual title of the document.

Correct, that can't currently be done.
It would be desirable to make this sort of thing possible using a combination of --file-scope and --metadata-file. But that combination doesn't seem to work the way I'd expect.

% cat m.yaml
title: my real title
% cat a.md
---
title: hi
...

# ok
% pandoc --file-scope --metadata-file m.yaml a.md -t native -s
Pandoc (Meta {unMeta = fromList [("title",MetaInlines [Str "hi"])]})
[Header 1 ("ok",[],[]) [Str "ok"]]
% pandoc --file-scope --metadata-file m.yaml a.md -t native -s --shift-heading=1
Pandoc (Meta {unMeta = fromList []})
[Header 1 ("",[],[]) [Str "hi"]
,Header 2 ("ok",[],[]) [Str "ok"]]

I'd find it more intuitive (and useful) if the heading-shift transformation was done before the metadata was integrated, so the metadata from m.yaml isn't clobbered. However, this should be a new issue.

@jgm
Owner

jgm commented Sep 20, 2019

Actually on reflection, I'm not so sure about this.

@brainchild0

brainchild0 commented Sep 20, 2019

> I'd find it more intuitive (and useful) if the heading-shift transformation was done before the metadata was integrated, so the metadata from m.yaml isn't clobbered. However, this should be a new issue.

It's not obvious how to create a new issue that captures the immediate concerns without including the history.

The sequence of processing would seem to be close to the following:

  1. Collect each file not including the one (or ones) containing the global metadata.
  2. For each file collected as such:
    1. Interpret it, including the metadata within it.
    2. Apply any appropriate shift, including, in the case of a positive shift, changing the title from the metadata into a heading.
    3. (Discard other metadata... I assume... or not?)
  3. Concatenate the results.
  4. Apply the global metadata.

Which parts of this discussion, if any, would you want moved to a new issue, and which would you be less open to seriously considering at the current moment?
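
The sequence proposed in the numbered list above could be modelled roughly as follows. This is a Python sketch of the proposal only, not of anything pandoc does. `Parsed`, `shift_file` and `build` are hypothetical names, paragraphs are marked with level 0, only non-negative shifts are modelled, and placing a demoted title at level `shift` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Block = Tuple[int, str]  # (heading level, text); level 0 marks a paragraph


@dataclass
class Parsed:
    """One interpreted input file: its own metadata plus its blocks."""
    meta: Dict[str, str] = field(default_factory=dict)
    blocks: List[Block] = field(default_factory=list)


def shift_file(doc: Parsed, shift: int) -> List[Block]:
    """Step 2: shift a single file before concatenation."""
    if shift < 0:
        raise ValueError("this sketch only models non-negative shifts")
    blocks: List[Block] = []
    if shift > 0 and "title" in doc.meta:
        # Step 2.2: the file's own title is demoted to a heading.
        blocks.append((shift, doc.meta["title"]))
    for level, text in doc.blocks:
        blocks.append((level + shift if level > 0 else 0, text))
    return blocks  # step 2.3: the file's remaining metadata is dropped


def build(global_meta: Dict[str, str], files: List[Parsed], shift: int) -> Parsed:
    body: List[Block] = []
    for doc in files:                      # steps 1 to 3
        body.extend(shift_file(doc, shift))
    return Parsed(meta=dict(global_meta), blocks=body)  # step 4


if __name__ == "__main__":
    b = Parsed({"title": "Beginning"}, [(0, "In the beginning...")])
    c = Parsed({"title": "Ending"}, [(0, "At the end...")])
    print(build({"title": "Book Title"}, [b, c], shift=1))
```

Because the global metadata is applied last, the book title is never touched by the shift, which is the "guarding" asked for earlier.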

@brainchild0

Also, not sure about a compelling use case, but if a left shift squashes several header levels into one, then the original level of each affected header can in principle be preserved in ancillary data, like HTML data- attributes or classes (e.g. <h1 data-original-level="-1">).

@jgm
Owner

jgm commented Dec 3, 2019

See #5957 for an unintended consequence of this change.

@jgm
Owner

jgm commented Dec 5, 2019

I'm going to roll back:

> shift by +1 makes metadata title a level-1 heading

This breaks some workflows that used to be supported with --base-header-level (see #5957).

Also: suppose you want to render a document with both latex and html. It would be natural to use level-1 headings. But in the HTML version you might want level-2 headings, since the title will be rendered as level-1. So you'd want to shift heading levels, without depopulating the title in metadata.

jgm added a commit that referenced this issue Dec 5, 2019
With positive heading shifts, starting in 2.8 this option caused
metadata titles to be removed and changed to regular headings.
This behavior is incompatible with the old behavior of
`--base-header-level` and breaks old workflows, so with this
commit we are rolling back this change.

Now, there is an asymmetry in positive and negative heading
level shifts:

+ With positive shifts, the metadata title stays the same and
  does not get changed to a heading in the body.
+ With negative shifts, a heading can be converted into the
  metadata title.

I think this is a desirable combination of features, despite
the asymmetry.  One might, e.g., want to have a document
with level-1 section headings, but render it to HTML with
level-2 headings, retaining the metadata title (which pandoc
will render as a level-1 heading with the default template).

Closes #5957.
Revises #5615.
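
The asymmetry described in this commit message can be modelled in a few lines of Python. This is a sketch, not pandoc's implementation, and `shift_headings` is a hypothetical name. Two rules in it are assumptions based on the pandoc manual's description of `--shift-heading-level-by` rather than on anything stated in this thread: only a heading at the very start of the document is promoted to the title, and a heading that would fall below level 1 becomes a plain paragraph.

```python
from typing import List, Optional, Tuple

Block = Tuple[str, int, str]  # (kind, level, text); kind is "header" or "para"


def shift_headings(
    title: Optional[str], blocks: List[Block], n: int
) -> Tuple[Optional[str], List[Block]]:
    """Shift heading levels by n and return the (title, blocks) pair."""
    blocks = list(blocks)
    # Negative shift: a leading heading that would land on level 0
    # replaces the metadata title (assumption, see above).
    if n < 0 and blocks and blocks[0][0] == "header" and blocks[0][1] + n == 0:
        title = blocks[0][2]
        blocks = blocks[1:]
    shifted: List[Block] = []
    for kind, level, text in blocks:
        if kind != "header":
            shifted.append((kind, level, text))
        elif level + n >= 1:
            shifted.append(("header", level + n, text))
        else:
            # Assumption: headings cannot go below level 1.
            shifted.append(("para", 0, text))
    # Positive shift: the title is returned unchanged.
    return title, shifted


if __name__ == "__main__":
    doc = [("header", 1, "Title"), ("header", 2, "Level 2")]
    print(shift_headings(None, doc, -1))
    print(shift_headings("Kept", [("header", 1, "Section")], 1))
```

The first call models the original request in this issue; the second models the HTML use case from the commit message, where the title must survive a positive shift.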
renjianxiongqi pushed a commit to renjianxiongqi/pandoc that referenced this issue Dec 23, 2019
@rauschma

rauschma commented Aug 19, 2021

I use --shift-heading-level-by=-1 here:

# Title of document
## First section
## Second section

I love that the title of the document is now specified via # – it’s more “WYSIWYG” in that the output looks closer to common Markdown previews (Visual Studio Code etc.).

The title is displayed via <h1>. Alas, the sections are also <h1>, which seems wrong. If I wrote the output HTML by hand, section titles would be <h2>.

@jgm
Owner

jgm commented Aug 19, 2021

Given the recommendation only to use one h1 element per page, it would make a certain amount of sense to map a pandoc Header 1 to HTML <h2>, Header 2 to <h3>, etc., reserving <h1> for the title if any.
You could raise this for discussion on pandoc-discuss. It would be a pretty big change in behavior.

@the-solipsist
Contributor

the-solipsist commented Sep 7, 2023

@rauschma's use case (which happens to be mine) can be solved by using this Lua filter (adapted with a single changed line from one that @tarleb posted in 2019). This filter turns the first # heading into the document title (rendered as <title> and as the <h1>), discards any further # headings with a warning, and keeps all lower-level headings (##, etc.) as they are.

local title

-- Set title from level 1 header, or
-- discard level 1 header if title is already set.
function make_header1_title (header)

  if header.level >= 2 then
    return header
  end

  if not title then
    title = header.content
    return {}
  end

  local msg = '[WARNING] title already set; discarding header "%s"\n'
  io.stderr:write(msg:format(pandoc.utils.stringify(header)))
  return {}
end

return {
  {Meta = function (meta) title = meta.title end}, -- init title
  {Header = make_header1_title},
  {Meta = function (meta) meta.title = title; return meta end}, -- set title
}
