Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the use of the capture shortcode (or: how to prepare for Goldmark) #20780

Closed
bep opened this issue May 5, 2020 · 19 comments
Closed

Remove the use of the capture shortcode (or: how to prepare for Goldmark) #20780

bep opened this issue May 5, 2020 · 19 comments

Comments

@bep
Copy link
Contributor

bep commented May 5, 2020

@zacharysarah asked me to do some investigative work figuring out if it would be possible/how hard to get this site up and running with Goldmark/latest Hugo (i.e. CommonMark compliant).

A lot of work on this has already been done in #19907.

I diffed a sub-set (English content, no blog) using stripped down templates to make it easier to see some patterns. A very common pattern was HTML blocks treated as Markdown, so I created this yuin/goldmark#125 -- which is closed (I agree about that).

So, to use HTML blocks in Markdown and still be CommonMark compliant, we would either have to:

  • Remove all blank newlines in these blocks.
  • Or wrap it in a {{< wrap >}} shortcode or something, so Goldmark never sees it.

Both of the above could probably be easily scripted (esp. the last part), had it now been for the indirection of the {{% capture %}} shortcode, which produces Markdown with template includes etc.

Of the 365 different files (out of 667 samples), 226 of them use the {{% capture %}} shortcode.

A simple example:

{{% capture prerequisites %}}
## Hello
{{< version-check >}}
{{% /capture %}}

The inner {{< version-check >}} uses {{< which marks it as (usually) raw HTML. But since it's included in a {{% block, its content will be passed to Goldmark.

A long story short, to solve this issue I think we either need to

  1. Write a parser extension to Goldmark that treats HTML blocks differently. I'm not sure how easy this is; "easy parsing" was probably the motivation for the current very simple rule.
  2. Remove the use of the capture shortcode so we can get better control on this from the Markdown file itself.

The {{% capture %}} templating was ported by me from the old Jekyll site. I suspect its biggest value is that it sets a common structure for disposition of guides and avoids some repetition. But:

  • It makes the Markdown very unportable, even when browsing on GitHub.
  • It slows down the build significantly
  • A common structure could probably be achieved using Hugo Archetypes
  • This construction is no longer needed to make shortcodes be part of the ToC (as long as you use {{% brackets, they will be part of the ToC).

I think it should be possible to temporarily patch Hugo to make it do a very shallow render that expands this capture one level and writes the files to disk. It would be a big change set, but it would be a correct one.

@sftim
Copy link
Contributor

sftim commented May 5, 2020

A side effect of the {{% capture ... %}} shortcodes is that it adds high level structure and helps reviewers see what is present / missing in a PR.

I'm OK with losing that. Instead, that kind of check is more of a job for a linter or similar automated process. Archetypes also sound like a helpful approach.

We could still use a shortcode for the section titles? Eg

## {{< heading-whatsnext >}}

Example future lints:

  • if you use the version-check shortcode, warn when min-kubernetes-server-version isn't set
  • if you create a task, warn if it doesn't have a prerequisites section
  • if you create a page that ends with an unordered list where each list item contains a hyperlink, warn if that part of the page isn't headed with {{< heading-whatsnext >}}

@bep
Copy link
Contributor Author

bep commented May 5, 2020

@sftim you would still be able to use all the shortcodes you want, it's just that very special capture magic.

There could be a possibility to change these capture blocks with "something else", e.g. separate them into different Markdown-files, e.g. prerequisites.md and whats-next.md which server the same purpose, but without the troubles.

@sftim
Copy link
Contributor

sftim commented May 5, 2020

temporarily patch Hugo to make it do a very shallow render that expands this capture one level and writes the files to disk

Sounds good to me.

@kbhawkey
Copy link
Contributor

kbhawkey commented May 5, 2020

@bep, Are you referring to:
blocks.html and block.html? If so, seems like a good thing.
This construction is no longer needed to make shortcodes be part of the ToC (as long as you use {{% brackets, they will be part of the ToC).

I don't fully understand what very shallow render that expands this capture one level?
Can you be more explicit?

I think it should be possible to temporarily patch Hugo to make it do a very shallow render that expands this capture one level and writes the files to disk. It would be a big change set, but it would be a correct one.

@bep
Copy link
Contributor Author

bep commented May 5, 2020

@kbhawkey not sure what you mean by block.html -- but my re. my ToC reference. Back when I ported this site from Jekyll to Hugo, the shortcodes vs Markdown behaved a little differently (mostly because how it was implemented).

So, if you had Markdown that looked linke this:


## A title

{{% some-shortcode %}}
### Another title
{{% /some-shortcode %}}

In "earlier days" the "Another title" above would not be part of the TableOfContents produces unless you did some trickery.

I think it should be possible to temporarily patch Hugo

That would be something that I would help out with and would be a one time hack that would make this task practical (it would be a gigantic job if done manually) and, if done correctly, guaranteed to produce the same output as you have today. In short it will inline all the Markdown, but keep the shortcodes as-is.

@sftim
Copy link
Contributor

sftim commented May 5, 2020

So, use code from Hugo to generate what we think the files should have looked like in the first place (in an ideal world). Does that summarize that part of your proposal @bep?

@bep
Copy link
Contributor Author

bep commented May 5, 2020

@sftim I guess so, but it depends a little on how much you like the current capture setup.

@kbhawkey
Copy link
Contributor

kbhawkey commented May 5, 2020

I admit that I don't fully understand the details of the proposal (need to think on this a bit more).
The capture statements define the structure (required) for a given class of page (concept, task, tutorial).
For instance, concept template,

{{ partial "templates/block" (dict "page" .page "block" "overview" "purpose" "states, in one or two sentences, the purpose of this document") }}

{{ .ctx.Scratch.Set "blocks" slice }}
{{ .ctx.Scratch.Add "blocks" (dict "page" .page "block" "body" "purpose" "supplies the body of the page content.")  }}
{{ .ctx.Scratch.Add "blocks" (dict "content" .page.Content) }}
{{ .ctx.Scratch.Add "blocks" (dict "page" .page "block" "whatsnext" "heading" (i18n "whatsnext_heading") "optional" true )  }}
{{ partial "templates/blocks" . }}

Without knowing more details, I typically favor non-custom solutions.
Do we need the capture statements or what is the alternative to achieve similar templating?

@kbhawkey
Copy link
Contributor

kbhawkey commented May 5, 2020

As mentioned above, we could remove the requirement of capture statements and instead provide templates for some of the page-specific H2 level headings.
The inline, page TOC (H2 level headings) goes away with the docsy theme (becomes right hand column TOC).
So as not to lose some form of page structure, add linting of the required page headings (capture statements do this already). Would a render hook template work here or can the error be caught earlier in the process?
Linting: Check per page template type. If a given H2 level heading is not
found on the page, fail the build and show how to fix the problem (more than the current capture statement error handling).
For example, possible required H2 level headings:

  • Task pages: prerequisites
  • Concept pages: overview, what's next
  • Tutorial pages: objective and prerequisites
  • Tool/reference pages: synopsis, options

@sftim
Copy link
Contributor

sftim commented May 6, 2020

Combining a few ideas, the linter could check for:

## {{< heading-whatsnext >}}

rather than

## What's next

and this helps with localization.

@bep
Copy link
Contributor Author

bep commented May 6, 2020

I would recommend you do:

## {{% heading-whatsnext %}}

The % makes sure that Goldmark sees the expanded content (for TableOfContents etc.)

@sftim
Copy link
Contributor

sftim commented May 6, 2020

I also like the idea of using render hooks to catch problems. I don't know if it's a good idea / there's a better way.

@kbhawkey
Copy link
Contributor

kbhawkey commented May 7, 2020

Archetypes could help with new content creation.
The current page templates provide consistency/structure/maintainability.
Here are some more ideas:

  • Checking for specific H2 level headings does not cover the cases of enforcing structure where previously there was no heading generated (post parse check).
  • Option: Hugo/parser validates the set of ordered headings/strings (add shortcodes to page) and wordCount() between page sections. Cleans up/removes headings that should not be displayed.
  • Option: Move the validation and cleanup of "structure" headings to a post processing step.
    Validate set of ordered page headings (+ wordCount()) and remove (or replace as comments) headings that should not be displayed. Depends on page template configuration (heading1: structure, heading2: display, ...).

@bep, Any thoughts?

@sftim
Copy link
Contributor

sftim commented May 14, 2020

I've seen a few issues (see #20942) that look hard to solve without either a significant rewrite or removal of the problematic capture shortcode.

@bep
Copy link
Contributor Author

bep commented May 14, 2020

@kbhawkey as to validating the structure without capture I suggest you enforce this using header IDs, and do that in a language-agnostic way, e.g.:

### What's next {#whats-next}

There are some ongoing work creating a render hook for headings in Hugo, see gohugoio/hugo#7133 (comment) -- I have had a little problem seeing "if it's worth it", but I can certainly see a value here. It would be an easy way to validate content structure while building.

I have been a little busy with other work lately, but I'll cc @zacharysarah on this issue; I still think my initial post sums it up fairly good.

@kbhawkey
Copy link
Contributor

I'm finding that the parser will not add an H2 heading from a shortcode
into the TOC.

Here is an example shortcode that gets called as:
{{% myshortcode_heading %}}

{{- $id := T "myshortcode_heading" -}}

<h2 id="{{ $id | anchorize }}">{{- $id  -}}</h2>

<!-- This works standalone -->
<!-- {{- T "myshortcode_heading" -}} -->

Instead, this works (when the shortcode just returns a string):
## {{% myshortcode_heading %}}

I'd rather have the entire statement wrapped in the shortcode so that authors can omit the
opening ##.

@sftim
Copy link
Contributor

sftim commented Jun 6, 2020

PR #21359 is working towards fixing this

@sftim
Copy link
Contributor

sftim commented Jun 10, 2020

Fixed in #21359
/close

@k8s-ci-robot
Copy link
Contributor

@sftim: Closing this issue.

In response to this:

Fixed in #21359
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants