"Liquid Exception: invalid byte sequence in UTF-8..." with binaries #5181

Closed
jjarmoc opened this Issue Aug 2, 2016 · 19 comments

jjarmoc commented Aug 2, 2016

  • I believe this to be a bug, not a question about using Jekyll.
  • I updated to the latest Jekyll (or), if on GitHub Pages, to the latest github-pages
  • I am on (or have tested on) _macOS_ 10+
  • I was trying to build.

My Reproduction Steps

  1. Create a subdirectory under the _posts folder called 2016-08-02-example.
  2. Add a file 2016-08-02-example.md to this folder. (With proper frontmatter, etc.)
  3. Run bundle exec jekyll serve; all is well.
  4. Add an image file (e.g. 2016-08-02-image.png) to this folder.
  5. Run bundle exec jekyll serve again; it now yields an error.

With both JEKYLL_LOG_LEVEL=debug and the -t switch, the output is:

...
 Rendering Markup: _posts/2016-08-02-example/2016-08-02-example.md
         Rendering: _posts/2016-08-02-example/2016-08-02-image.png
  Pre-Render Hooks: _posts/2016-08-02-example/2016-08-02-image.png
$USER_DIR$/blog/_posts/2016-08-02-example/2016-08-02-image.png render_with_liquid?  false 
  Rendering Liquid: _posts/2016-08-02-example/2016-08-02-image.png
  Liquid Exception: invalid byte sequence in UTF-8 in $USER_DIR$/blog/_posts/2016-08-02-example/2016-08-02-image.png
$USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `split': invalid byte sequence in UTF-8 (ArgumentError)
  from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `tokenize'
  from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:122:in `parse'
  from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:108:in `parse'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:11:in `block in parse'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:47:in `measure_time'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:10:in `parse'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/renderer.rb:109:in `render_liquid'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/renderer.rb:62:in `run'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/site.rb:447:in `block (2 levels) in render_docs'
...

The Output I Wanted

I would expect binary files (images, etc.) to simply be ignored and copied through to the output directory unmodified.

Previous issues

This seems to have come up before in issues that may be related, including: #2592, #4276, #2262, and #2228.

Root cause

This appears to be due to the logic of Jekyll::Document#render_with_liquid?:

    def render_with_liquid?
      !(coffeescript_file? || yaml_file?)
    end

As you can see, it returns true for any source file that is not CoffeeScript or YAML, which includes a .png. A .png is not suitable for Liquid either, hence the problem.

As a quick and dirty test, I added a case for a few binary extensions (.png, .jpg, and .bin):

    def render_with_liquid?
      !(coffeescript_file? || yaml_file? || %w(.png .jpg .bin).include?(extname))
    end

This has the desired effect, and seems to confirm I'm on the right track. Obviously it's not a good solution though (hence no PR). Assuming that storing non-liquid files in the same directory is supported (I believe it is?), this probably needs to ensure the content is suitable (i.e. text, likely UTF-8) before passing it to Liquid. Maybe something along the lines of checking against the existing markdown_ext setting in _config.yml?

I'll see if I can get a test together to send as a PR. While I'm not quite sure of the proper solution, I think I've at least got a solid feel for the issue behind this behavior.
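A minimal sketch of the "ensure the content is suitable" idea, assuming a standalone helper rather than anything that exists in Jekyll. The name liquid_safe_content? is hypothetical. It only checks that the bytes form valid UTF-8 and contain no NUL byte, which is a heuristic and not a full binary detector.

```ruby
# Hypothetical helper, not part of Jekyll: decide whether raw file content
# looks like UTF-8 text that can be handed to Liquid's tokenizer.
def liquid_safe_content?(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  # valid_encoding? is false for byte sequences that are not legal UTF-8;
  # a NUL byte is a cheap extra hint that the data is binary.
  text.valid_encoding? && !text.include?("\0")
end

png_header = "\x89PNG\r\n\x1A\n".b # the eight-byte PNG signature
markdown   = "---\ntitle: Example\n---\nHello {{ page.title }}\n"

puts liquid_safe_content?(png_header) # false
puts liquid_safe_content?(markdown)   # true
```

A check like this inspects content rather than file names, so it would also cover binary formats nobody thought to list.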

parkr Aug 3, 2016

Member

Hey @jjarmoc! Thanks for the detailed error report.

> This has the desired effect, and seems to confirm I'm on the right track. Obviously it's not a good solution though (hence no PR).

I'd maybe put that into a method like image_file? and exclude .png, .jpg, .jpeg, .svg, and maybe 1 or 2 others, but otherwise it is an acceptable PR. In theory, we should be reading those as Jekyll::StaticFiles and copying them like normal. Curious why it is seen as a document.

> Assuming that storing non-liquid files in the same directory is supported (I believe it is?)

This is kind of so-so support. We don't really support it outright (no tests AFAIK), but I don't see a reason why we shouldn't support it. If we support it for posts, then we would have automatic support for all other collections, too.
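The image_file? suggestion could look roughly like the sketch below. SketchDocument is a hypothetical stand-in for Jekyll::Document so that the example runs on its own, the extension list (including .gif) is an assumption, and the coffeescript_file? and yaml_file? bodies are simplified guesses rather than Jekyll's actual code.

```ruby
# Hypothetical stand-in for Jekyll::Document; only the pieces needed to
# illustrate the predicate are modelled here.
class SketchDocument
  # Assumed list; the thread names .png, .jpg, .jpeg and .svg, .gif is a guess.
  IMAGE_EXTENSIONS = %w(.png .jpg .jpeg .gif .svg).freeze

  attr_reader :path

  def initialize(path)
    @path = path
  end

  def extname
    File.extname(path)
  end

  # True for files whose extension marks them as images.
  def image_file?
    IMAGE_EXTENSIONS.include?(extname.downcase)
  end

  def coffeescript_file?
    extname == ".coffee"
  end

  def yaml_file?
    %w(.yaml .yml).include?(extname)
  end

  # Images join the existing exclusions, so they never reach Liquid.
  def render_with_liquid?
    !(coffeescript_file? || yaml_file? || image_file?)
  end
end

image = SketchDocument.new("_posts/2016-08-02-example/2016-08-02-image.png")
post  = SketchDocument.new("_posts/2016-08-02-example/2016-08-02-example.md")
puts image.render_with_liquid? # false
puts post.render_with_liquid?  # true
```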

parkr added the bug label Aug 3, 2016

parkr added this to the 3.2.2 milestone Aug 3, 2016

jake-low Aug 3, 2016

This might be another argument for not processing posts that don't have frontmatter. Other documents already require frontmatter, but posts are special. Presumably it's unlikely for a .png to begin with "---\n---" or similar.

Then again, this would also be solved by doing file-extension-based conversion (i.e. specify in a collection's config section what file extensions should be converted; the rest would just be copied directly). I seem to recall that was discussed at some point.
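To make the extension-based alternative concrete, here is a rough sketch. The render_extensions key is invented for illustration and is not an existing Jekyll option; convert? is likewise a hypothetical helper.

```ruby
# Hypothetical per-collection setting. "render_extensions" is an invented
# key used only for this sketch; it is not a real Jekyll option.
collection_config = {
  "posts" => { "render_extensions" => %w(.md .markdown .html) }
}

# Returns true when the file's extension is listed for its collection;
# anything else would simply be copied to the output directory.
def convert?(path, collection, config)
  allowed = config.fetch(collection, {}).fetch("render_extensions", [])
  allowed.include?(File.extname(path).downcase)
end

puts convert?("_posts/2016-08-02-example/2016-08-02-example.md", "posts", collection_config) # true
puts convert?("_posts/2016-08-02-example/2016-08-02-image.png", "posts", collection_config)  # false
```

The trade-off against a front matter check is that this one is an allow-list: unknown file types default to being copied, but every renderable extension has to be configured.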

jjarmoc Aug 3, 2016

@parkr No problem! It looked like other people had encountered this in the past, and not knowing what was behind it was bugging me. Happy to help when I can!

> I'd maybe put that into a method like image_file? and exclude .png, .jpg, .jpeg, .svg, and maybe 1 or 2 others, but otherwise it is an acceptable PR. In theory, we should be reading those as Jekyll::StaticFiles and copying them like normal. Curious why it is seen as a document.

That'd work for my immediate need, but I worry that it's not very flexible. Right now, I really just want image folders in post directories (which I realize is a hotly debated topic), but down the road this might expand to other binary file formats: .pdf, .zip, .tar, .tgz, etc.

For that reason, I'd really prefer an approach that confirms a Document should be rendered via Liquid rather than trying to enumerate those which shouldn't be, or which handles a failure of Liquid to render more gracefully.

I'm not sure why it's handled as a Document object and not a StaticFile. I was playing around with the jekyll_post_files plugin, which explicitly adds files to site.static_files, but was still encountering this. The plugin seems to leave the original Document object intact, though why it's there in the first place I'm not sure. I have this issue both with and without jekyll_post_files enabled, so I don't think the plugin is the cause; it just runs into the same issue.

> This might be another argument for not processing posts that don't have frontmatter. Other documents already require frontmatter, but posts are special. Presumably it's unlikely for a .png to begin with "---\n---" or similar.

Good idea, @jake-low! I like this approach better, for reasons noted above; it's more flexible when encountering file types we didn't anticipate and explicitly add logic for. Nearly any binary format (really, any I'm aware of) should have some sort of magic number that isn't "---\n", as will other text files that don't need rendering. I don't think we should check deeper than that though; the frontmatter itself will vary, and I'm not sure that verifying the frontmatter tag is closed is important enough to concern ourselves with, when it would make us sensitive to various text encodings. This should be a relatively simple change, and one that I think is much more future-proof and flexible.

So long as there aren't cases where we need to render liquid files that don't contain frontmatter at all, this seems like a really good approach. I can't think of any such cases in the context of rendering collection directories. Liquid includes are handled later (by Liquid) and so wouldn't be affected by this logic. Anyone foresee problems?

I'll work on putting together a pull request that implements a has_frontmatter? method which performs this check and is called as part of the render_with_liquid? check. From there, the maintainers can determine if that's something they'd like to incorporate.
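A rough sketch of what such a has_frontmatter? check might look like, written as a standalone function rather than the method that eventually went into any PR. It assumes front matter always starts with a line of exactly three dashes and, as argued above, does not look for the closing tag.

```ruby
require "tmpdir"

# Hypothetical helper: true when the file starts with a "---" line.
# Only the first five bytes are read, in binary mode, so binary files
# never pass through an encoding-sensitive string operation.
def has_frontmatter?(path)
  head = File.open(path, "rb") { |f| f.read(5) } || ""
  head.start_with?("---\n", "---\r\n")
end

Dir.mktmpdir do |dir|
  post  = File.join(dir, "2016-08-02-example.md")
  image = File.join(dir, "2016-08-02-image.png")
  File.binwrite(post, "---\ntitle: Example\n---\nHello\n")
  File.binwrite(image, "\x89PNG\r\n\x1A\n".b) # PNG signature only

  puts has_frontmatter?(post)  # true
  puts has_frontmatter?(image) # false
end
```

Under that assumption, render_with_liquid? could call the helper alongside its existing checks, and anything without front matter would be copied through untouched.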

nhoizey Aug 4, 2016

Contributor

@jjarmoc I developed jekyll-postfiles (new name/version of jekyll_post_files), and use it daily on my own blog without any issue, with a lot of images (jpeg, png and even gif) in my _posts/ subfolders.

nhoizey Aug 4, 2016

Contributor

In case it helps: I'm still using Jekyll 3.1.6 because 3.2.x broke the Jekyll Tagging plugin…

borisschapira Aug 4, 2016

  • I am on (or have tested on) _macOS_ 10+
  • I am using _Ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]_

Trace with Jekyll 3.2.0:

ArgumentError: invalid byte sequence in UTF-8
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `split'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `tokenize'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:122:in `parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:108:in `parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/liquid_renderer/file.rb:11:in `block in parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/liquid_renderer/file.rb:47:in `measure_time'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/liquid_renderer/file.rb:10:in `parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/renderer.rb:109:in `render_liquid'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/renderer.rb:62:in `run'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/site.rb:447:in `block (2 levels) in render_docs'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/site.rb:445:in `each'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/site.rb:445:in `block in render_docs'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/site.rb:444:in `each'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/site.rb:444:in `render_docs'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/site.rb:190:in `render'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/site.rb:69:in `process'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/command.rb:26:in `process_site'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/commands/build.rb:63:in `build'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/commands/build.rb:34:in `process'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/lib/jekyll/commands/serve.rb:36:in `block (2 levels) in init_with_program'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `block in execute'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `each'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `execute'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/program.rb:42:in `go'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary.rb:19:in `program'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.2.0/exe/jekyll:13:in `<top (required)>'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/bin/jekyll:23:in `load'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/bin/jekyll:23:in `<top (required)>'

Reproduced with the same steps on Jekyll 3.1.6 (line numbers differ, but the trace is essentially the same):

ArgumentError: invalid byte sequence in UTF-8
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `split'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `tokenize'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:122:in `parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/liquid-3.0.6/lib/liquid/template.rb:108:in `parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/liquid_renderer/file.rb:11:in `block in parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/liquid_renderer/file.rb:43:in `measure_time'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/liquid_renderer/file.rb:10:in `parse'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/renderer.rb:106:in `render_liquid'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/renderer.rb:61:in `run'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/site.rb:171:in `block (2 levels) in render'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/site.rb:169:in `each'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/site.rb:169:in `block in render'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/site.rb:168:in `each'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/site.rb:168:in `render'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/site.rb:59:in `process'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/command.rb:26:in `process_site'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/commands/build.rb:60:in `build'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/commands/build.rb:33:in `process'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/lib/jekyll/commands/serve.rb:34:in `block (2 levels) in init_with_program'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `block in execute'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `each'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `execute'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary/program.rb:42:in `go'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/mercenary-0.3.6/lib/mercenary.rb:19:in `program'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/gems/jekyll-3.1.6/bin/jekyll:13:in `<top (required)>'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/bin/jekyll:23:in `load'
  /Users/boris/Projects/perso/blog/blog/vendor/ruby/2.3.0/bin/jekyll:23:in `<top (required)>'

nhoizey Aug 5, 2016

Contributor

I just tried on my Mac, and I have no issue:
nhoizey/jekyll-postfiles#1 (comment)

nhoizey Aug 5, 2016

Contributor

Isn't this a duplicate of #2592?

jjarmoc pushed a commit to jjarmoc/jekyll-example that referenced this issue Aug 5, 2016

Jeff Jarmoc
Add files to repro issue in jekyll/jekyll#5181
- `mkdir _posts/2016-08-05-example`
- `cd _posts/2016-08-05-example`
- `echo "---\n---Test!" > _posts/2016-08-05-example/2016-08-05.example.md`
- `wget https://jekyllrb.com/img/logo-2x.png`

jjarmoc Aug 5, 2016

@nhoizey Yes, this does appear to be the same issue as #2592. I opened a new issue since that one is closed, but did reference it in my original report. It's interesting that the error doesn't seem to occur for everyone. I created a repo with a demo Jekyll project to show my config.

At jjarmoc/jekyll-example@45ae890 we have a base jekyll new example skeleton, which works fine.
After jjarmoc/jekyll-example@b7fc535 I've added the repro files, and bundle exec jekyll server throws the error on my machine just like those above from myself and others.

OSX 10.11 with ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-darwin14] and this Gemfile.lock should reproduce.

I can also switch to ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-darwin15] which exhibits the same behavior.

Following up on @parkr's mention that these should be read as Jekyll::StaticFiles led me to the Jekyll::PostReader class, which creates Jekyll::Documents for all files that aren't future-dated (including binaries) at https://github.com/jekyll/jekyll/blob/master/lib/jekyll/readers/post_reader.rb#L25. As a result, binaries are parsed as Documents and rendered as Liquid (per my comments above regarding render_with_liquid?), which then throws the invalid UTF-8 error.

I've gotten occupied with other things and haven't yet put together a PR, but still plan to do so. Should it also modify the PostReader class to add binary files as Jekyll::StaticFiles? If it did that, I think @nhoizey's plugin (which I appreciate you sharing by the way, thanks!) would be rendered moot.

I don't really understand how, given this code, anyone could be seeing different behavior, though. There's something at play here that we don't fully understand, and that bothers me.
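One way to picture the split asked about above is to sort the entries found under _posts into two groups before any Document is created. The sketch below is not how Jekyll::PostReader is written; SketchDoc, SketchStatic, front_matter? and read_post_entries are all hypothetical names, and the front matter test is the simple "starts with ---" heuristic discussed earlier in the thread.

```ruby
require "tmpdir"

# Hypothetical stand-ins for Jekyll::Document and Jekyll::StaticFile.
SketchDoc    = Struct.new(:path)
SketchStatic = Struct.new(:path)

# Same heuristic as discussed earlier: the file starts with a "---" line.
def front_matter?(path)
  head = File.open(path, "rb") { |f| f.read(5) } || ""
  head.start_with?("---\n", "---\r\n")
end

# Split paths into documents (to be rendered) and static files (to be
# copied through unmodified), returned as a two-element array.
def read_post_entries(paths)
  docs, statics = paths.partition { |p| front_matter?(p) }
  [docs.map { |p| SketchDoc.new(p) }, statics.map { |p| SketchStatic.new(p) }]
end

Dir.mktmpdir do |dir|
  post  = File.join(dir, "2016-08-05-example.md")
  image = File.join(dir, "logo-2x.png")
  File.binwrite(post, "---\n---\nTest!\n")
  File.binwrite(image, "\x89PNG\r\n\x1A\n".b)

  docs, statics = read_post_entries([post, image])
  puts docs.map { |d| File.basename(d.path) }.inspect    # ["2016-08-05-example.md"]
  puts statics.map { |s| File.basename(s.path) }.inspect # ["logo-2x.png"]
end
```

With a split like this, only front-matter files would end up in site.posts, which is the property the post-count tests depend on.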

jjarmoc pushed a commit to jjarmoc/jekyll that referenced this issue Aug 8, 2016

Jeff Jarmoc
Work in progress for jekyll#5181
[x] Does not invoke Liquid renderer on files without YAML frontmatter.
[x] Copies files without frontmatter to the corresponding output path unmodified.
[x] Tests included

TODO: Failing tests from TestGeneratedSite

[] TestGeneratedSite#test_: generated sites should ensure post count is as expected.  [/Users/jjarmoc/Projects/jekyll/test/test_generated_site.rb:14]
Minitest::Assertion: Expected: 51
  Actual: 52

[] TestGeneratedSite#test_: generated sites should render latest post's content.  [/Users/jjarmoc/Projects/jekyll/test/test_generated_site.rb:22]
Minitest::Assertion: Expected false to be truthy.

jjarmoc Aug 8, 2016

I've made some progress on this, but I have a few problems.

My branch now only renders liquid in Posts which have YAML frontmatter (@jake-low's idea above). Other files are copied over unmodified.

However, there are a few failing tests. This is due to these files still being added to site.posts. I'm having trouble figuring out a good way to handle it; the PostReader simply creates Jekyll::Document instances for each file in _posts, which in turn populates site.posts. It seems that flagging these as Jekyll::StaticFile would be better, but due to some differences between StaticFile and Document constructors and logic, this is turning out to be somewhat difficult.

Does anyone who's more familiar with the ...Reader classes have an idea how to address this?
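
The behavior described above (render only files that have YAML front matter, copy everything else unmodified) could be sketched roughly as follows. This is plain Ruby rather than Jekyll's Reader code: `front_matter?`, `partition_entries`, and `copy_static` are hypothetical helpers, and the three-dash check is a simplification, not Jekyll's actual front matter detection.

```ruby
require "fileutils"
require "pathname"
require "tmpdir"

FRONT_MATTER_MARKER = "---".b

# Hypothetical predicate: true when the file's first bytes are "---".
# Reading in binary mode means no UTF-8 decoding is ever attempted.
def front_matter?(path)
  File.binread(path, FRONT_MATTER_MARKER.bytesize) == FRONT_MATTER_MARKER
end

# Split paths into [renderable, static] so that only the first group
# would ever be handed to a template renderer.
def partition_entries(paths)
  paths.partition { |path| front_matter?(path) }
end

# Copy static files byte for byte, preserving their path relative to source_root.
def copy_static(paths, source_root, dest_root)
  paths.map do |path|
    relative = Pathname.new(path).relative_path_from(Pathname.new(source_root))
    target = File.join(dest_root, relative.to_s)
    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.cp(path, target)
    target
  end
end

# Usage with throwaway files mirroring the layout from the original report.
Dir.mktmpdir do |root|
  posts = File.join(root, "_posts", "2016-08-02-example")
  FileUtils.mkdir_p(posts)
  post = File.join(posts, "2016-08-02-example.md")
  image = File.join(posts, "2016-08-02-image.png")
  File.write(post, "---\ntitle: Example\n---\nHello\n")
  File.binwrite(image, "\x89PNG\r\n\x1A\n".b)

  renderable, static = partition_entries([post, image])
  copied = copy_static(static, root, File.join(root, "_site"))
  puts "render: #{renderable.map { |f| File.basename(f) }}"
  puts "copied: #{copied.map { |f| File.basename(f) }}"
end
```

In a sketch like this, only the `renderable` group would feed a posts list, which is the separation the failing post-count test seems to call for.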

nhoizey Sep 5, 2016

Contributor

As I mentioned in the jekyll-postfiles issue related to this one, I never had any issue like this with my existing 750 images inside _posts/, and I just got the error with one of the images I added today.

It seems there could be something weird in the new image file, more than in Jekyll.

Still investigating…

@parkr parkr modified the milestones: 3.2.2, 3.3 Sep 22, 2016

@parkr parkr modified the milestones: 3.3, 3.3.1 Oct 5, 2016

@parkr parkr modified the milestones: 3.3.2, 3.3.1 Nov 10, 2016

@jekyllbot jekyllbot added the stale label Jan 10, 2017

paulditterline Jan 11, 2017

I made a GitHub Pages site with Jekyll, using RStudio to push Markdown files, and I'm getting page render errors that reference UTF-8. I reached out to GitHub and they pointed me to this thread, telling me to move my images out of the posts folder. When I do that manually and fix the link, the page renders. Otherwise, the posts do not render.

@jekyllbot jekyllbot removed the stale label Jan 11, 2017

nhoizey Jan 16, 2017

Contributor

@paulditterline I don't know if @jjarmoc made progress, or if anyone could help him find the solution to his latest issues.

@parkr parkr modified the milestones: 3.3.2, 3.4, 3.5 Jan 16, 2017

jekyllbot Mar 16, 2017

Contributor

This issue has been automatically marked as stale because it has not been commented on for at least two months.

The resources of the Jekyll team are limited, and so we are asking for your help.

If this is a bug and you can still reproduce this error on the 3.3-stable or master branch, please reply with all of the information you have about it in order to keep the issue open.

If this is a feature request, please consider building it first as a plugin. Jekyll 3 introduced hooks which provide convenient access points throughout the Jekyll build pipeline whereby most needs can be fulfilled. If this is something that cannot be built as a plugin, then please provide more information about why in order to keep this issue open.

This issue will automatically be closed in two months if no further activity occurs. Thank you for all your contributions.

@jekyllbot jekyllbot added the stale label Mar 16, 2017

saagarjha Mar 17, 2017

I'm still seeing this with Jekyll 3.4.2.

@jekyllbot jekyllbot removed the stale label Mar 17, 2017

@oe oe added the pinned label Mar 17, 2017

nhoizey May 4, 2017

Contributor

I finally understood why I had an issue that looked like this one: nhoizey/jekyll-postfiles#1 (comment)

So, I no longer have this issue, but other people seem to still have it…

DirtyF May 4, 2017

Member

@nhoizey good catch! I just ran a test: there is no error message when the image file name does not start with a date, but there is one when the image file is named like a post.

@DirtyF DirtyF changed the title from "Liquid Exception: invalid byte sequence in UTF-8..." with binaries in _posts dir. to "Liquid Exception: invalid byte sequence in UTF-8..." with binaries beginning with a date in _posts dir. May 4, 2017

saagarjha May 4, 2017

I'm still having this issue, and it appears to occur without regard to whether there is a date in the image's filename.

Japanuspus Jun 16, 2017

I am also seeing this issue, and I have an idea that the error seems intermittent because it depends on the binary content of the image file.

In my case, I get the error for a .png file no matter how I name it. My guess is that this is because the initial bytes of the .png header, which read 0x8950 4e47, are not valid UTF-8: the first byte of every valid UTF-8 character has either no leading ones or two or more leading ones in its binary form, whereas 0x89 (10001001) has exactly one.

This means that as soon as you try converting even the first few bytes of a png-file to a string, you will get an encoding error.

A solution might be to have the front-matter detection code look for the ascii-encoded byte-string corresponding to ---\n rather than converting to string and then comparing.
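
To make the byte-level reasoning concrete, here is a small sketch in plain Ruby. `leading_ones` and `starts_with_front_matter?` are hypothetical names, and the four-byte marker `---\n` is the one suggested above, not necessarily what Jekyll itself checks.

```ruby
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b   # the eight-byte PNG signature, as raw bytes
FRONT_MATTER_START = "---\n".b          # marker compared byte for byte

# Count the leading 1 bits of a byte (0 = ASCII, 1 = continuation byte,
# 2 or more = first byte of a multi-byte UTF-8 character).
def leading_ones(byte)
  format("%08b", byte)[/\A1*/].length
end

# Byte-level check: the input is treated as binary, so no decoding happens
# and no encoding error can be raised.
def starts_with_front_matter?(bytes)
  bytes.b.start_with?(FRONT_MATTER_START)
end

first = PNG_SIGNATURE.getbyte(0)
p format("%08b", first)   # => "10001001"
p leading_ones(first)     # => 1, a continuation byte with nothing to continue
p PNG_SIGNATURE.dup.force_encoding(Encoding::UTF_8).valid_encoding?  # => false
p starts_with_front_matter?(PNG_SIGNATURE)                           # => false
p starts_with_front_matter?("---\ntitle: Example\n---\n")            # => true
```

Because the comparison never interprets the bytes as text, it gives an answer for any file content, valid UTF-8 or not.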

@DirtyF DirtyF changed the title from "Liquid Exception: invalid byte sequence in UTF-8..." with binaries beginning with a date in _posts dir. to "Liquid Exception: invalid byte sequence in UTF-8..." with binaries Jun 16, 2017

@DirtyF DirtyF modified the milestones: 3.5, 3.6 Jun 18, 2017

@parkr parkr referenced this issue Aug 17, 2017

Closed

Release Jekyll v3.6.0 #6314

3 of 3 tasks complete

Crunch09 added a commit to Crunch09/jekyll that referenced this issue Sep 3, 2017

@jekyllbot jekyllbot closed this in #6344 Sep 21, 2017
