feat!: metadata blocks. Fixes #580 #641

Martin1887 · 2023-04-30T10:22:24Z

YAML-style and pluses-delimited metadata blocks are supported, each one in a different Options flag. This is a breaking change because a new Tag enum variant is added.

Fixes: #580

YAML-style and pluses-delimited metadata blocks are supported, each one in a different `Options` flag. This is a breaking change because a new `Tag` enum variant is added. Fixes: pulldown-cmark#580

raphlinus · 2023-05-03T03:33:10Z

I am inclined against this pull request, as it feels like an expansion of scope of the crate. The biggest question I have is why the functionality shouldn't be provided in another crate; there's already the existing cmark-frontmatter and gray_matter projects. A secondary concern is that front matter is not standardized to anywhere the same degree as commonmark itself. In particular, I'm not aware of any standardized approach to metadata blocks inline to the document, nor do I yet understand the benefits of doing so.

Another approach to interleaving metadata and commonmark is to consider the outer document a "container" with metadata, commonmark text, and perhaps other content. That requires the ability to determine structure boundaries without necessarily doing detailed parsing of the text, but in my opinion that might be a good thing, avoiding what would otherwise be something of a layering violation.

I could perhaps be persuaded otherwise if there were other users arguing in favor of this approach.

Martin1887 · 2023-05-03T07:48:08Z

The main obstacle to use other crates is the support for multiple metadata blocks, as specified in the comment #580 (comment).

Existing crates only support front matter at the start of the document, but although they support multiple metadata blocks, it would be hard integrating the metadata blocks into the events gotten by pulldown-cmark. Note that the problem here is not getting the information contained in the metadata blocks, but having metadata blocks produces different parsing like h2 Setext headings, horizontal rules and h1 ATX headings (the last one in commented lines of metadata blocks).

Multiple metadata blocks are supported by Pandoc (https://pandoc.org/MANUAL.html#metadata-blocks) and not supporting them is the only error I have found to parse a valid Pandoc document. That is, other Pandoc features are obviously not tracked by this crate, but they are simply ignored meanwhile metadata blocks produce incorrect output.

Also note that, as I commented in the issue, pulldown-cmark may not be always used for final output, but as a Markdown parser it can be used for some preprocessing before other engines like Pandoc, mdBook, etc. I think that this pull request does not hurt pulldown-cmark increasing complexity nor breaking anything meanwhile it would add support for correctly parsing documents with metadata blocks, and always only if flags are enabled.

@wooorm, @jaythomas, and @joelparkerhenderson put a thumb up in my issue comment, so I guess they may be interested in this feature.

joelparkerhenderson · 2023-05-03T12:14:30Z

Yes I'm highly in favor of a solution because I experience the same Pandoc incompatibility decribed by @Martin1887.

At the same time, I've seen front matter done in such different ways, many of which are incompatible, so I definitely like the points of @raphlinus about layering, boundaries, architecture, and crate scope.

In my own similar kind of parser and processing, I do the boundary detection approach, and toss each chunk into a list of ad-hoc recognizers, such as for front matter that uses YAML, or TOML, or HTML, or JSON. It turns out that JSON is surprisingly useful for controlling other tooling, much as @Martin1887 discusses.

My opinion (for whatever it's worth) is to set a timebox for us all to think about layers/boundaries/helpers/etc., such as a timebox of one week. If there's a good practical implementable architecture idea by the end of the timebox, then do it, otherwise ship this PR because it's good and helpful for real-world work.

Martin1887 · 2023-05-07T08:10:36Z

I cannot found a fast and clean manner of implementing a layering approach and at the same time providing a clean interface to retrieve boundaries. Have you any ideas, @joelparkerhenderson?

In the future it would interesting to provide an interface to plug new rules with different priorities, multiple parsing stages and/or pre-processors and post-processors interfaces.

On the other hand, do you need a new release soon? My idea was creating a big release with all the breaking changes to reduce the number of breaking releases, but probably there are too many urgent issues to do that, and breaking releases in 0.x stage are not so rare.

Thanks.

joelparkerhenderson · 2023-05-08T06:00:33Z

Here's how I did it in my own code... it's brute force done via a list of syntax-matcher-parser structs, that try to recognize front-matter versus normal markdown.

E.g. try a syntax-matcher-parser to see if there's a match of front-matter, and if so, parse it. Otherwise, try the next syntax-matcher-parser. Etc.

The purpose of this is to be able to handle a variety of front-matter syntaxes (e.g. HTML-like, JSON-like, YAML-like), then output a btree that is the configuration state of the document.

https://github.com/SixArm/sita-rust-crate/tree/main/src/matter

https://github.com/SixArm/sita-rust-crate/tree/main/src/state

Files:

* `matter/` - Markdown front matter and back matter files.
    * `matter_parser_enum.rs` - Matter parser enum (among BTMS, JSON, TOML, YAML).
    * `matter_parser_mutex.rs` - Matter parser mutex (among BTMS, JSON, TOML, YAML).
    * `matter_parser_trait.rs` - Matter parser trait (among `matter_parser_with_*`).
    * `matter_parser_with_btms.rs` - Matter parser implementation with BTMS (BTreeMap Struct).
    * `matter_parser_with_json.rs` - Matter parser implementation with JSON (JavaScript Object Notation).
    * `matter_parser_with_toml.rs` - Matter parser implementation with TOML (Tom's Obvious Minimal Language).
    * `matter_parser_with_yaml.rs` - Matter parser implementation with YAML (Yet Another Markup Language).
* `state/` - State that holds variables, such as front matter.
    * `state_enum.rs` - State enum (among BTMS, JSON, TOML, YAML).
    * `state_trait.rs` - State trait (among `state_with_*.rs`).
    * `state_with_btms.rs` - State implementation with BTMS (BTreeMap Struct).
    * `state_with_json.rs` - State implementation with JSON (JavaScript Object Notation).
    * `state_with_toml.rs` - State implementation with TOML (Tom's Obvious Minimal Language).
    * `state_with_yaml.rs` - State implementation with YAML (Yet Another Markup Language).

My guess is there's a better strategy.

I don't need a release soon.

Martin1887 · 2023-05-14T08:32:49Z

OK, I see you are using regexes to parse blocks, not the best nor the most correct strategy. A high level and user-friendly approach will need too much time, so I merge this pull request now and in a future version a best approach to handle layers will be designed.

feat!: metadata blocks

4cef04c

YAML-style and pluses-delimited metadata blocks are supported, each one in a different `Options` flag. This is a breaking change because a new `Tag` enum variant is added. Fixes: pulldown-cmark#580

Martin1887 added 2 commits May 14, 2023 10:35

Merge branch 'master' into metadata_blocks

a1714f2

fix: regenerate heading_attrs test file

73a82b4

Martin1887 merged commit c2a65da into pulldown-cmark:master May 14, 2023
1 check passed

vedant-pandey mentioned this pull request Jun 11, 2023

fmt is not compatible with some front matters in markdown files denoland/deno#19458

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: metadata blocks. Fixes #580 #641

feat!: metadata blocks. Fixes #580 #641

Martin1887 commented Apr 30, 2023

raphlinus commented May 3, 2023 •

edited

Martin1887 commented May 3, 2023

joelparkerhenderson commented May 3, 2023 •

edited

Martin1887 commented May 7, 2023

joelparkerhenderson commented May 8, 2023

Martin1887 commented May 14, 2023

feat!: metadata blocks. Fixes #580 #641

feat!: metadata blocks. Fixes #580 #641

Conversation

Martin1887 commented Apr 30, 2023

raphlinus commented May 3, 2023 • edited

Martin1887 commented May 3, 2023

joelparkerhenderson commented May 3, 2023 • edited

Martin1887 commented May 7, 2023

joelparkerhenderson commented May 8, 2023

Martin1887 commented May 14, 2023

raphlinus commented May 3, 2023 •

edited

joelparkerhenderson commented May 3, 2023 •

edited