HTML minification sometimes adds duplicate characters to output #1292

Closed
flother opened this issue Jan 7, 2021 · 4 comments

@flother
Contributor

flother commented Jan 7, 2021

Bug report

When minify_html = true in the site config, Zola minifies HTML pages but sometimes appends unwanted characters to the output. The extra characters are a copy of some of the characters at the end of the HTML document. This seems to be a problem with Zola, not the underlying minify-html crate.

Environment

Zola version 0.13.0, compiled from commit aa03a7fe.

Expected behaviour

When minify_html = true in the config, I would expect Zola's output to match the output of minify-html, but it doesn't.

Current behaviour

On some pages (possibly only the site homepage index.html template) extra junk is appended to the complete HTML page. The junk is a copy of some of the characters at the end of the HTML document.

Steps to reproduce

I've created a minimal example in the flother/zola-html-minification-test repo. There's one section, "Blog", with a single post, and there's an index.html template to render the site homepage.

If you clone that repo, set minify_html = false in config.toml, and use the site as input for zola build, Zola v0.13.0 will use templates/index.html to create public/index.html, which will look like this:

<!doctype html>
<html>
<head>
  <meta charset="utf-8">
</head>
<body>
  
  
    <p>Example blog post</p>
  
  FOO BAR
</body>
</html>

If you run the unminified version of public/index.html through minify-html (minify-html --src public/index.html) you get:

<!doctype html><html><head><meta charset=utf-8><body><p>Example blog post</p> FOO BAR

If you then change the minify_html setting in config.toml to be minify_html = true and run zola build, Zola will create public/index.html to look like this:

<!doctype html><html><head><meta charset=utf-8><body><p>Example blog post</p> FOO BARample blog post</p>
  
  FOO BAR
</body>
</html>

Everything after the first FOO BAR is duplicated from the page and shouldn't be in the output.
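The junk lines up with what would be left over if the minified bytes were written over the start of the original page and the rest of the buffer was never cut off. Below is a minimal sketch of that reading, not Zola's actual code: `overwrite_without_truncating` is a hypothetical helper, and the exact whitespace in `ORIGINAL` (two-space blank lines, trailing newline) is an assumption based on the page as quoted above.

```rust
// The unminified page from the report, one line per literal.
// Whitespace-only lines and the trailing newline are assumptions.
const ORIGINAL: &str = concat!(
    "<!doctype html>\n",
    "<html>\n",
    "<head>\n",
    "  <meta charset=\"utf-8\">\n",
    "</head>\n",
    "<body>\n",
    "  \n",
    "  \n",
    "    <p>Example blog post</p>\n",
    "  \n",
    "  FOO BAR\n",
    "</body>\n",
    "</html>\n",
);

// The minify-html CLI output quoted in the report.
const MINIFIED: &str =
    "<!doctype html><html><head><meta charset=utf-8><body><p>Example blog post</p> FOO BAR";

/// Hypothetical helper (not Zola code): copy `minified` over the start of
/// `original` and return the whole buffer, stale tail included.
/// Panics if `minified` is longer than `original`.
fn overwrite_without_truncating(original: &str, minified: &str) -> String {
    let mut buffer = original.as_bytes().to_vec();
    buffer[..minified.len()].copy_from_slice(minified.as_bytes());
    String::from_utf8_lossy(&buffer).into_owned()
}
```

Under these assumptions the minified text is 85 bytes long, and byte 85 of the original page is the "a" in "Example", so `overwrite_without_truncating(ORIGINAL, MINIFIED)` yields the minified page followed by `ample blog post</p>` and the rest of the original document, as in the broken output.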

@Keats
Collaborator

Keats commented Jan 7, 2021

Hmm, our code basically does:

        let mut input_bytes = html.as_bytes().to_vec();
        match with_friendly_error(&mut input_bytes, cfg) {

where with_friendly_error is from the minify_html crate. I don't think we do any processing other than that so this is weird.

@Keats
Collaborator

Keats commented Jan 7, 2021

Ah, I see what's happening. We probably shouldn't use that method, or at least we should keep only the bytes up to the length we get back.
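Going by this comment, the minifier call rewrites the buffer in place and reports how many bytes of it are valid afterwards, so the caller has to drop everything past that length. The sketch below illustrates the pattern without depending on the minify_html crate: `compact_in_place` is a hypothetical stand-in that only strips newlines, and `minify_buggy` / `minify_fixed` are illustrative names, not Zola functions.

```rust
/// Hypothetical stand-in for an in-place minifier: removes ASCII newlines by
/// shifting the kept bytes to the front of `buf`, then returns how many bytes
/// are valid. Bytes past that length are left untouched (stale).
fn compact_in_place(buf: &mut [u8]) -> usize {
    let mut write = 0;
    for read in 0..buf.len() {
        let byte = buf[read];
        if byte != b'\n' {
            buf[write] = byte;
            write += 1;
        }
    }
    write
}

/// Mirrors the bug: the returned length is ignored, so the stale tail
/// of the buffer ends up in the output.
fn minify_buggy(html: &str) -> String {
    let mut bytes = html.as_bytes().to_vec();
    let _len = compact_in_place(&mut bytes);
    String::from_utf8_lossy(&bytes).into_owned()
}

/// One possible fix: truncate the buffer to the reported length
/// before turning it back into a string.
fn minify_fixed(html: &str) -> String {
    let mut bytes = html.as_bytes().to_vec();
    let len = compact_in_place(&mut bytes);
    bytes.truncate(len);
    String::from_utf8_lossy(&bytes).into_owned()
}
```

With the input `"<p>\nhi\n</p>\n"`, `minify_buggy` returns `"<p>hi</p>p>\n"` (the last three bytes are leftovers from the original buffer), while `minify_fixed` returns `"<p>hi</p>"`.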

Keats added a commit that referenced this issue Jan 7, 2021
@Keats
Collaborator

Keats commented Jan 7, 2021

Fixed on the next branch, thanks for reporting! Super useful to have a beta tester.

@Keats Keats closed this as completed Jan 7, 2021
@flother
Contributor Author

flother commented Jan 7, 2021

Glad to help, and thanks for fixing this so quickly.
