Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--minify breaks on JSON since 59.0 #6472

Open
regisphilibert opened this issue Nov 1, 2019 · 11 comments
Open

--minify breaks on JSON since 59.0 #6472

regisphilibert opened this issue Nov 1, 2019 · 11 comments

Comments

@regisphilibert
Copy link

@regisphilibert regisphilibert commented Nov 1, 2019

Since 59.0 hugo throws errors when trying to minify JSON files.

Building sites … ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-02-11T12:46:02Z","draf...
       ^
ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-03-08T16:25:25Z","draf...
       ^

It appears the minify lib has been updated to v2.5.2 with Hugo 59.0 so wondered if that could trigger the issue... b401858

I'll try and share a repo later.

Thanks!

@bep bep added this to the v0.59.2 milestone Nov 2, 2019
@bep

This comment has been minimized.

Copy link
Member

@bep bep commented Nov 2, 2019

@bep bep added Bug Upstream labels Nov 2, 2019
@anthonyfok

This comment has been minimized.

Copy link
Contributor

@anthonyfok anthonyfok commented Nov 2, 2019

Hi Regis and Bjørn Erik,

Thank you for bringing this to my attention. Unfortunately, partly due to my unfamiliarity with the JSON output feature, or perhaps the JSON test cases that I have on hand are too simplistic, that I am unable to reproduce the error that you are seeing, so please do share a test repo when you have time. Many thanks!

bep added a commit that referenced this issue Nov 3, 2019
See #6472
@bep

This comment has been minimized.

Copy link
Member

@bep bep commented Nov 3, 2019

Yea, I added a quick test for this myself, and it doesn't look like "JSON is broken" is true in the general sense (which would have surprised me), so there must be something "special" about the JSON file used.

@regisphilibert

This comment has been minimized.

Copy link
Author

@regisphilibert regisphilibert commented Nov 3, 2019

Will try and look into it next week. Thanks a lot.

@regisphilibert

This comment has been minimized.

Copy link
Author

@regisphilibert regisphilibert commented Nov 5, 2019

here you go @anthonyfok : https://github.com/theNewDynamic/gohugo-6472

Thanks a lot for your patience.

@anthonyfok

This comment has been minimized.

Copy link
Contributor

@anthonyfok anthonyfok commented Nov 8, 2019

  • Before: parse v2.3.5 and minify v2.3.7 (Hugo ≤ v0.58.3), no error.
  • After: parse v2.3.9 and minify 2.5.2 (Hugo ≥ v0.59.0), error: unexpected character.

The error is reported from parse in commit tdewolff/parse@7763141 (between v2.3.5 and v2.3.6): "Improve error messages for NULL and unknown values"

So, according to @regisphilibert's MWE (minimal working example) https://github.com/theNewDynamic/gohugo-6472:

Error: Error building site: failed to render pages: parse error:1:1: unexpected character
    1: <!DOCTYPE html><html><head><title>http://example.org/arti...

or fuller report after I tweaked tdewolff/parse code:

Error: Error building site: failed to render pages: parse error:1:1: unexpected character '<'
    1: <!DOCTYPE html><html><head><title>http://example.org/article/</title><link rel="canonical" href="http://example.org/article/"/><meta name="robots" content="noindex"><meta charset="utf-8" /><meta http-equiv="refresh" content="0; url=http://example.org/article/" /></head></html>

It would appear that somehow Hugo is trying to ask Minify to process HTML as if it were JSON, and before commit tdewolff/parse@7763141, such error was silently ignored by tdewolff/parse. So...

But then, it does not seem to explain the error message that @regisphilibert's originally encountered:

Building sites … ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-02-11T12:46:02Z","draf...
       ^
ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-03-08T16:25:25Z","draf...
       ^

As { the "open curly bracket" is certainly a valid starting character for a JSON file, so, calling it an "unexpected character" hints at a bug in tdewolff/parse... ?

So:

  • Hypothesis 1: Newer version of parse uncovers a previously hidden bug in Hugo
  • Hypothesis 2: A bug in the newer versions of parse and/or minify
  • Hypothesis 3: Both of the above.

@regisphilibert, I know you are very busy, but, if possible, could you please also try to produce a MWE in order to get the error you see initially, i.e. "unexpected character" on seemingly valid JSON code?

@tdewolff, please help take a look and see if you can shed some light on this. Thank you!

(Sorry, I am really slow at trying to understand the code, and I have exceeded my quota for today, hence this "progress report" and plea for help.)

@regisphilibert

This comment has been minimized.

Copy link
Author

@regisphilibert regisphilibert commented Nov 8, 2019

@anthonyfok I just updated the example repo so it compiles with what you requested. Not sure what did it, but it seems to be assigning JSON on RegularPage as well as lists.

@anthonyfok

This comment has been minimized.

Copy link
Contributor

@anthonyfok anthonyfok commented Nov 8, 2019

Note to self: In the example that @regisphilibert provided, Hugo apparently calls Minify() from func Transformer() in minifier/minifier.go:

    return min.Minify(m.m, ft.To(), ft.From(), params)

I was going to try to decipher what m.m (mediatype) reads, but ran out of time.

@anthonyfok

This comment has been minimized.

Copy link
Contributor

@anthonyfok anthonyfok commented Nov 8, 2019

@anthonyfok I just updated the example repo so it compiles with what you requested. Not sure what did it, but it seems to be assigning JSON on RegularPage as well as lists.

Wow, that was really quick! Thank you so much @regisphilibert!

Error: Error building site: failed to render pages: parse error:1:1: unexpected character '<'
    1: {"content":"Be With by Forrest Gander wins  2019 Pulitzer Prize for PoetryNew Directions  overjoyed by  news that our poet, translator,  dear friend Forrest Gander—whom we first published  1998—has won  year\u0026rsquo;s Pulitzer Prize!Read \u0026ldquo;Epitaph\u0026rdquo; 
       ^

Hmm... strange, the ^ points at {, but the unexpected character seems to be < still. Intriguing!

That '<' I got from modifying tdewolff/parse/json/parse.go from:

    p.err = parse.NewErrorLexer("unexpected character", p.r)

to

    p.err = parse.NewErrorLexer("unexpected character"+string(c), p.r)
@tdewolff

This comment has been minimized.

Copy link

@tdewolff tdewolff commented Nov 10, 2019

What I think is happening is that passing HTML is causing the error (ie. the < character), but that the error display message is displaying the "first character is wrong" on the wrong file. This hypothesis means a bug in Hugo for putting HTML through the JSON parser, and a bug in the error reporting of parse.

When the parser finds an error, it registers the i th position it occured in the file. The error recovery process then parses the file again to find the line number and column number, as well as the context (surrounding text). When the file buffer is changed in between, the error recovery process reads the i th position from the wrong file. This is probably a subtle bug in parse, but might be caused by how Hugo uses it...let me investigate better next week when I have some time.

@bep bep modified the milestones: v0.59.2, v0.60, v0.61 Nov 25, 2019
@tdewolff

This comment has been minimized.

Copy link

@tdewolff tdewolff commented Nov 27, 2019

I have fixed this problem partially in tdewolff/parse@2.3.14 (with tdewolff/minify@2.6.1). What remains is that Hugo parses a generated HTML document by a JSON parser.

This can be confirmed by adding:

start := ft.From().Bytes()[:0]
if len(ft.From().Bytes()) > 30 {
    start = ft.From().Bytes()[:30]
}
fmt.Printf("Transform: %s %s\n", mediatype, string(start))

before line 54 in hugo/minifiers/minifiers.go. It outputs:

Transform: application/json <!DOCTYPE html><html><head><ti

Stack trace:

goroutine 121 [running]:
runtime/debug.Stack(0x3b, 0x0, 0x0)
	/usr/lib/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
	/usr/lib/go/src/runtime/debug/stack.go:16 +0x22
github.com/gohugoio/hugo/minifiers.Client.Transformer.func1(0x1e5e580, 0xc00149be80, 0xc001691260, 0x115)
	/home/taco/go/src/github.com/gohugoio/hugo/minifiers/minifiers.go:60 +0x249
github.com/gohugoio/hugo/transform.(*Chain).Apply(0xc00163f298, 0x1e4d940, 0xc001ea3110, 0x1e4d920, 0xc001accfc0, 0x0, 0x0)
	/home/taco/go/src/github.com/gohugoio/hugo/transform/chain.go:105 +0x25e
github.com/gohugoio/hugo/publisher.DestinationPublisher.Publish(0x1e94f20, 0xc00057d4a0, 0x1, 0xc00057a2c0, 0x1e4d920, 0xc001accfc0, 0x1aa051e, 0x4, 0x1ac7621, 0xb, ...)
	/home/taco/go/src/github.com/gohugoio/hugo/publisher/publisher.go:100 +0x373
github.com/gohugoio/hugo/hugolib.(*Site).publishDestAlias(0xc00023b500, 0xc001471e00, 0xc000cf6580, 0x1a, 0xc001471ee0, 0x1b, 0x1aa051e, 0x4, 0x1ac7621, 0xb, ...)
	/home/taco/go/src/github.com/gohugoio/hugo/hugolib/alias.go:124 +0x3d4
github.com/gohugoio/hugo/hugolib.(*Site).writeDestAlias(...)
	/home/taco/go/src/github.com/gohugoio/hugo/hugolib/alias.go:97
github.com/gohugoio/hugo/hugolib.(*Site).renderPaginator(0xc00023b500, 0xc000ff0090, 0xc0004e4900, 0x24, 0x30, 0x13, 0xc000ff0090)
	/home/taco/go/src/github.com/gohugoio/hugo/hugolib/site_render.go:181 +0x35e
github.com/gohugoio/hugo/hugolib.pageRenderer(0xc00147cc20, 0xc00023b500, 0xc00179e120, 0xc00100b020, 0xc0015be9f0)
	/home/taco/go/src/github.com/gohugoio/hugo/hugolib/site_render.go:157 +0x675
created by github.com/gohugoio/hugo/hugolib.(*Site).renderPages
	/home/taco/go/src/github.com/gohugoio/hugo/hugolib/site_render.go:73 +0x160

Edit:
Looks like Page(/article/_index.md) with output format JSON (and thus mediatype application/json) is send through the pages channel in hugolib/site_render.go:127 while the content is clearly HTML. I'm not very familiar with the code base of Hugo, so I don't know where this channel item is coming from. @anthonyfok any ideas?

@bep bep modified the milestones: v0.61, v0.62 Nov 28, 2019
@bep bep modified the milestones: v0.62, v0.63 Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.