Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indented highlight shortcode brings in the indentation as part of the code block #4717

Closed
kaushalmodi opened this issue May 12, 2018 · 28 comments · Fixed by #9955
Closed

Indented highlight shortcode brings in the indentation as part of the code block #4717

kaushalmodi opened this issue May 12, 2018 · 28 comments · Fixed by #9955

Comments

@kaushalmodi
Copy link
Contributor

kaushalmodi commented May 12, 2018

  • When using the highlight shortcode for code blocks in lists, the indentation needed to have the code blocks indented gets added to the code block itself.
  • This does not happen when using code fences.

So the bug seems to be localized to Hugo and not Blackfriday.

Here is a test page demonstrating the bug:

In the same test page above, the first section has the same code blocks, but done using the code fences, where that extra indentation issue is not seen.

Another effect of not removing that indentation is that the indentation before the closing {{< /highlight >}} creates an extra blank line in the final HTML.

kaushalmodi added a commit to kaushalmodi/ox-hugo that referenced this issue May 13, 2018
@kaushalmodi kaushalmodi changed the title Code blocks using highlight shortcode in lists have extra indentation Code blocks using highlight shortcode in lists have extra indentation, and extra blank line May 13, 2018
@kaushalmodi
Copy link
Contributor Author

Updated the tests showing the bug with smaller/targeted test files.

@kaushalmodi
Copy link
Contributor Author

@bep can you please tag this as a bug (and hopefully see if an easy fix is possible for this?

@kaushalmodi
Copy link
Contributor Author

Can this be tagged as a Bug please? This issue prevents me from using highlight shortcode (and thus the line number prefixing using Chroma) in lists.

@kaushalmodi
Copy link
Contributor Author

@bep can you please add this issue to a near term milestone. This particular issue prevents me from using highlight shortcode in lists (which I seem to end up using a lot more than general, I think).

Also, I don't have any outside-hugo workaround for this.

Thanks.

@bep bep added this to the v0.53 milestone Dec 19, 2018
@bep bep added the Bug label Dec 19, 2018
@bep
Copy link
Member

bep commented Dec 20, 2018

Your shortcode variant shows the hightlighting as a table with line numbers, you can adjust the indentation with CSS, I guess, but there will be indentation ...

 {{< highlight emacs-lisp "linenos=table, linenostart=1" >}}

If you remove the "linenos=table", that should fix it.

@bep bep closed this as completed Dec 20, 2018
@kaushalmodi
Copy link
Contributor Author

you can adjust the indentation with CSS

The indentation is due to extra spaces in the pre block. And also the closing tag of the highlight shortcode introduces a blank line.

The indentation also increases with each nested list item. So it gets very tricky to fix with CSS (also not sure if it would be possible because then I need to make the CSS rules apply only on the badly indented blocks created by highlight shortcode and not by the fenced blocks).

If you remove the "linenos=table", that should fix it.

But I need to use the line numbers and so I am using the highlight shortcode. I use the fenced code blocks when I don't need line numbers.


Can you please reopen this issue? The root cause is that the highlight shortcode absorbs all the indentation whitespace which it should not. It should be behaving like indented fenced code blocks.

@bep
Copy link
Member

bep commented Dec 20, 2018

The indentation is due to extra spaces in the pre block. And also the closing tag of the highlight shortcode introduces a blank line.

OK, but in any case you need to take that with Chroma. Note that the code fence example and shortcode example are two totally different cases.

@kaushalmodi
Copy link
Contributor Author

kaushalmodi commented Dec 20, 2018

@bep

When using fenced code blocks

From https://ox-hugo.scripter.co/test/posts/source-block-indented/#code-blocks-in-list-using-code-fences:

image

When using highlight shortcode

From https://ox-hugo.scripter.co/test/posts/source-block-indented/#code-blocks-in-list-using-highlight-shortcode:

image

So the solution would be to figure out the indentation level of the starting {{< highlight >}} shortcode tag, and remove that indentation from the whole text block within it, including removal of that indentation from the closing {{< /highlight >}} tag.

@kaushalmodi
Copy link
Contributor Author

kaushalmodi commented Dec 20, 2018

OK, but in any case you need to take that with Chroma.

From the above screenshots, see that Chroma is used in both cases, and Chroma knows nothing about the highlight shortcode.

I think that Chroma simply wraps the code blocks that Hugo passes to it in pre tags. Isn't that right?

Chroma also cannot blindly remove all the leading spaces, because what if one actually wants indentation in the code blocks (Python comes to mind).

@kaushalmodi
Copy link
Contributor Author

@bep

One more take at explaining the issue:

image

@bep bep added the Upstream label Dec 20, 2018
@bep bep changed the title Code blocks using highlight shortcode in lists have extra indentation, and extra blank line Highlight in table/lineno mode adds spaces in pre Dec 20, 2018
@bep bep reopened this Dec 20, 2018
@bep
Copy link
Member

bep commented Dec 20, 2018

I have reopened this to track this upstream issue. I'm not spending more time investigating this. Others are welcome, but I suggest you start by looking at what you get from Chroma.

Also, for the future: When doing a "it works fine for code fences, but fails for shortcodes", it would be good if you do the same thing in the two cases.

@bep bep modified the milestones: v0.53, v0.54 Dec 20, 2018
@kaushalmodi
Copy link
Contributor Author

kaushalmodi commented Dec 20, 2018

@bep

I don't know Go, but trying some poorman's debug here using printing of debug info to stdout ..

I am still not convinced that this is a Chroma issue.. see below.

I added those two fmt.Printf to do a real dumb analysis of what's going to Chroma and what's coming back.

func (h highlighters) chromaHighlight(code, lang, optsStr string) (string, error) {
	opts, err := h.cs.parsePygmentsOpts(optsStr)
	if err != nil {
		jww.ERROR.Print(err.Error())
		return code, err
	}

	style, found := opts["style"]
	if !found || style == "" {
		style = "friendly"
	}

	f, err := h.cs.chromaFormatterFromOptions(opts)
	if err != nil {
		jww.ERROR.Print(err.Error())
		return code, err
	}

	b := bp.GetBuffer()
	defer bp.PutBuffer(b)

	// Added debug
	fmt.Printf("[%s] input to Chroma = `%s'\n", lang, code)
	
	err = chromaHighlight(b, code, lang, style, f)
	if err != nil {
		jww.ERROR.Print(err.Error())
		return code, err
	}

	// Added debug
	fmt.Printf("output from Chroma = `%s'\n\n", b.String())

	return h.injectCodeTag(`<div class="highlight">`+b.String()+"</div>", lang), nil
}

Fenced code block

For the code block in:

    -   List item 1.1

        ```emacs-lisp
        (message "I am in list at level-2 indentation")
        ```

I get:

[emacs-lisp] input to Chroma = `(message "I am in list at level-2 indentation")'
output from Chroma = `<pre class="chroma"><span class="p">(</span><span class="nf">message</span> <span class="s">&#34;I am in list at level-2 indentation&#34;</span><span class="p">)</span></pre>'

Notice that the code string input has the whitespace indentation already stripped off in Hugo land.

highlight shortcode with table

For the code block in:

    -   List item 1.1

        {{< highlight emacs-lisp "linenos=table, linenostart=1" >}}
        (message "I am in list at level-2 indentation")
        {{< /highlight >}}

I get:

[emacs-lisp] input to Chroma = `        (message "I am in list at level-2 indentation")
        '
output from Chroma = `<div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre class="chroma"><span class="lnt">1
</span><span class="lnt">2
</span></pre></td>
<td class="lntd">
<pre class="chroma">        <span class="p">(</span><span class="nf">message</span> <span class="s">&#34;I am in list at level-2 indentation&#34;</span><span class="p">)</span>
        </pre></td></tr></table>
</div>
'

Notice that the code string input here has the unstripped extra whitespace indentation.


Now I need your help in understanding where that code variable is created, that gets passed to chromaHighlight here:

err = chromaHighlight(b, code, lang, style, f)

@kaushalmodi
Copy link
Contributor Author

@bep OK, I have another proof that this has little to do with Chroma.

This time, I removed the linenos and linenostart args.

With this in the content file:

    -   List item 1.1

        {{< highlight emacs-lisp >}}
        (message "I am in list at level-2 indentation")
        {{< /highlight >}}

I get this in debug:

[emacs-lisp] input to Chroma = `        (message "I am in list at level-2 indentation")
        '
output from Chroma = `<pre class="chroma">        <span class="p">(</span><span class="nf">message</span> <span class="s">&#34;I am in list at level-2 indentation&#34;</span><span class="p">)</span>
        </pre>'

image

@kaushalmodi kaushalmodi changed the title Highlight in table/lineno mode adds spaces in pre Indented highlight shortcode brings in the indentation as part of the code block Dec 21, 2018
@kaushalmodi
Copy link
Contributor Author

kaushalmodi commented Dec 21, 2018

@bep What do you think of the above analysis? Can you remove the "upstream" tag?

I believe the fix is simple but I don't know how to code that in Go:

  1. First analyze if that code variable (that's passed to chromaHighlight in
    err = chromaHighlight(b, code, lang, style, f)
    ) is ending in ".*\n\s+$" (crude regex, but trying to check here that the code is ending in a newline char followed by one or more spaces).
  2. Do the following if above condition matches.
  3. Split the code into an array split by the \n char.
  4. Count the number of spaces in the last element (these will be the exact indentation count, the number of spaces before the closing {{ /highlight }} tag). Let's say the this count is N.
  5. Loop through the array.
  6. If any element of this array (including that last element) begins with N spaces, delete those spaces.
  7. Join this array back into a string joined by the same \n character.
  8. Pass that to Chroma.

@bep
Copy link
Member

bep commented Dec 21, 2018

I claim that

  1. Chroma adds this superflous whitespace inside the pre
  2. Chroma can fix this
  3. Chroma should be the place for this fix

That Hugo can work around this by implementing a 8 step algo isn't very convincing ...

Note that the Christmas train has left for this particular issue. I started to look at it assuming we (Hugo) did something stupid (adding some space), but that isn't the case.

@kaushalmodi
Copy link
Contributor Author

I claim that Chroma adds this superflous whitespace inside the pre

Did you read through the debug messages I pasted? The whitespace exists in the code string that Hugo is passing to Chroma! Chroma is not adding any space. It's just wrapping your passed whitespace with HTML pre blocks.

Pinging @alecthomas for some help here ..

@bep
Copy link
Member

bep commented Dec 21, 2018

Will come back to this ... after Christmas.

@moorereason
Copy link
Contributor

As I see it, this is local to Hugo, not upstream.

The shortcode parser doesn't care where the shortcode is in the document and doesn't understand the context of the surrounding markdown. It doesn't know that all of the leading whitespace in your code block is for the surrounding markdown list. It just sees whitespace and assumes you put it there for a reason.

Making the shortcode parser "markdown-aware" would take a lot of work, I think, but I'm wondering if this scenario could be solved easier if the shortcode parser was a BFv2 renderer. Either way, I don't think this is a trivial matter.

@kaushalmodi
Copy link
Contributor Author

@jmooring I had been this this shortcode:

shortcodes/highlight.html
{{- $code := (trim .Inner "\n\r") -}}
{{- $lang := .Get 0 -}}
{{- $options := .Get 1 | default "" -}}
{{- $code_ending_in_line_with_spaces := (findRE "[\\r\\n][[:blank:]]+$" $code) -}}
{{/* Ref: https://github.com/gohugoio/hugo/issues/4717#issuecomment-449375012 */}}
{{- with $code_ending_in_line_with_spaces -}}
    {{- $code = (replace $code "\r" "") -}} {{/* Windows -> Unix style line endings for ease in further parsing */}}
    {{- $offset := (trim (index . 0) "\n") -}}
    {{- $lines := (split $code "\n") -}}
    {{- $num_lines := (len $lines) -}}
    {{- $scratch := newScratch -}}
    {{- $scratch.Add "lines_minus_offset" (slice) -}}
    {{- range $i, $line := $lines -}}
        {{- $line_minus_offset := (strings.TrimPrefix $offset $line) -}}
        {{- if (lt $i (sub $num_lines 1)) -}} {{/* Do not add the last blank line */}}
            {{- $scratch.Add "lines_minus_offset" (slice $line_minus_offset) -}}
        {{- end -}}
    {{- end -}}
    {{- $code = (delimit ($scratch.Get "lines_minus_offset") "\n") -}}
    {{- $scratch.Delete "lines_minus_offset" -}}
{{- end -}}
{{- highlight $code $lang $options -}}

That shortcode has been working great for me (it was also submitted in a PR with a test in #5553).

I tested out your shortcode and it's working well as well 👍 .

@bep bep modified the milestones: v0.93.0, v0.94.0 Mar 1, 2022
@bep bep modified the milestones: v0.94.0, v0.95.0, v0.96.0 Mar 9, 2022
@bep bep modified the milestones: v0.96.0, v0.97.0 Mar 24, 2022
@bep bep modified the milestones: v0.97.0, v0.98.0 Apr 13, 2022
@bep bep modified the milestones: v0.98.0, v0.99.0 Apr 28, 2022
@bep bep modified the milestones: v0.99.0, v0.100.0 May 24, 2022
@bep bep closed this as completed May 30, 2022
@bep bep reopened this May 30, 2022
@bep
Copy link
Member

bep commented May 30, 2022

I was asked if my recent "indentation fix" fixes this issue, but no, this is a different setup. Hugo currently includes every bytes (include whitespace) between open/close shortcode as the .Inner (including indentation), which is then passed on to the highlight func. To me this is about the highlight shortcode (which we can certainly adapt) and not a general thing.

@bep bep self-assigned this May 30, 2022
@bep bep removed the Upstream label May 30, 2022
bep added a commit to bep/hugo that referenced this issue May 30, 2022
bep added a commit to bep/hugo that referenced this issue May 31, 2022
bep added a commit to bep/hugo that referenced this issue May 31, 2022
This commit adds a new `.InnerDeindent` method to the shortcode context, which is `.Inner` with any
indendation removed. This is then used in the built-in `highlight` shortcode to prevent the extra
whitespace getting hightlighted.

Fixes gohugoio#4717
@bep bep closed this as completed in #9955 May 31, 2022
bep added a commit that referenced this issue May 31, 2022
This commit adds a new `.InnerDeindent` method to the shortcode context, which is `.Inner` with any
indendation removed. This is then used in the built-in `highlight` shortcode to prevent the extra
whitespace getting hightlighted.

Fixes #4717
@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.