"Liquid Exception: invalid byte sequence in UTF-8..." with binaries #5181
My Reproduction Steps
The Output I Wanted
I would expect binary files (images, etc.) to simply be ignored and copied through to the output directory unmodified.
This appears to be due to the logic of
def render_with_liquid?
  !(coffeescript_file? || yaml_file?)
end
As you can see, it returns true whenever the source file is neither CoffeeScript nor YAML, which a
As a quick and dirty test, I added a case for
def render_with_liquid?
  !(coffeescript_file? || yaml_file? || %w(.png .jpg .bin).include?(extname))
end
This has the desired effect and seems to confirm I'm on the right track. Obviously it's not a good solution, though (hence no PR). Assuming that storing non-Liquid files in the same directory is supported (I believe it is?), this probably needs to ensure the content is suitable (i.e., text, likely UTF-8) before passing it to Liquid. Maybe something along the lines of checking against config_yml's existing
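One way to express "ensure the content is suitable before passing it to Liquid" is to check whether the raw bytes form valid UTF-8. The sketch below uses a hypothetical helper `utf8_text?` (not part of Jekyll) built on Ruby's `File.binread`, `force_encoding` and `String#valid_encoding?`. It is only a heuristic under that assumption: text saved in another encoding (e.g. Latin-1) would also be rejected.

```ruby
require "tmpdir"

# Hypothetical helper: a file counts as renderable text only if its raw bytes
# are valid UTF-8. File.binread returns an ASCII-8BIT string; force_encoding
# re-tags it as UTF-8 without transcoding, so valid_encoding? checks the bytes.
def utf8_text?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

Dir.mktmpdir do |dir|
  post  = File.join(dir, "2016-08-05-hello.md")
  image = File.join(dir, "photo.png")
  File.binwrite(post, "---\ntitle: Hello\n---\nCafé\n")
  # The standard 8-byte PNG file signature; 0x89 cannot start a UTF-8 sequence.
  File.binwrite(image, "\x89PNG\r\n\x1A\n".b)
  puts utf8_text?(post)   # => true
  puts utf8_text?(image)  # => false
end
```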
I'll see if I can put together a test to submit as a PR. While I'm not quite sure of the proper solution, I think I at least have a solid feel for the issue behind this behavior.
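To make the limitation of the quick-and-dirty patch concrete, here is a sketch that models both predicates on a hypothetical `FakeDocument` struct. The real Jekyll `coffeescript_file?` and `yaml_file?` helpers are stubbed here with plain extension checks, which is an assumption for illustration only.

```ruby
# Hypothetical stand-in for the Jekyll document class; only extname matters here.
FakeDocument = Struct.new(:extname) do
  # Assumed simplification: the real helpers may check more than the extension.
  def coffeescript_file?
    extname == ".coffee"
  end

  def yaml_file?
    %w(.yaml .yml).include?(extname)
  end

  # Original logic: anything that is not CoffeeScript or YAML goes through Liquid.
  def render_with_liquid?
    !(coffeescript_file? || yaml_file?)
  end

  # Quick-and-dirty variant: additionally skip a fixed list of binary extensions.
  def render_with_liquid_patched?
    !(coffeescript_file? || yaml_file? || %w(.png .jpg .bin).include?(extname))
  end
end

%w(.md .png .gif .pdf).each do |ext|
  doc = FakeDocument.new(ext)
  puts "#{ext}: original=#{doc.render_with_liquid?} patched=#{doc.render_with_liquid_patched?}"
end
# .md: original=true patched=true
# .png: original=true patched=false
# .gif: original=true patched=true
# .pdf: original=true patched=true
```

The `.gif` and `.pdf` rows show why a hard-coded extension list keeps needing updates: every binary type not on the list still reaches Liquid.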
Hey @jjarmoc! Thanks for the detailed error report.
I'd maybe put that into a method like
Support for this is kind of so-so. We don't really support it outright (no tests AFAIK), but I don't see a reason why we shouldn't support it. If we support it for posts, then we would have automatic support for all other collections, too.
This might be another argument for not processing posts that don't have frontmatter. Other documents already require frontmatter, but posts are special. Presumably it's unlikely for a
Then again, this would also be solved by doing file-extension-based conversion (i.e., specify in a collection's config section which file extensions should be converted; the rest would just be copied directly). I seem to recall that was discussed at some point.
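The extension-based idea could look roughly like the sketch below. The `render_extensions` key and the `action_for` helper are hypothetical names invented for illustration; they are not existing Jekyll options or methods. The hash stands in for a collection's section of `_config.yml` after parsing.

```ruby
# Hypothetical per-collection config, as it might look after parsing _config.yml.
# "render_extensions" is an assumed key, not a real Jekyll option.
COLLECTIONS = {
  "posts" => { "render_extensions" => %w(.md .markdown .html) }
}.freeze

# Decide whether a file should be converted or copied through unmodified.
def action_for(collection, filename)
  exts = COLLECTIONS.fetch(collection, {}).fetch("render_extensions", [])
  exts.include?(File.extname(filename).downcase) ? :render : :copy
end

%w(2016-08-05-hello.md photo.PNG report.pdf).each do |name|
  puts "#{name}: #{action_for("posts", name)}"
end
# 2016-08-05-hello.md: render
# photo.PNG: copy
# report.pdf: copy
```

The design choice here is an allow-list: unknown extensions default to being copied, so new binary formats need no code changes.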
@parkr No problem! It looked like other people had encountered this in the past, and not knowing what was behind it was bugging me. Happy to help when I can!
That'd work for my immediate need, but I worry that it isn't very flexible. Right now, I really just want image folders in post directories (which I realize is a hotly debated topic), but down the road this might expand to other binary file formats: .pdf, .zip, .tar, .tgz, etc.
For that reason, I'd really prefer an approach that confirms a
I'm not sure why it's handled as a
Good idea, @jake-low! I like this approach better, for the reasons noted above: it's more flexible when we encounter file types we didn't anticipate and explicitly add logic for. Nearly any binary format (really, any I'm aware of) should start with some sort of magic number that isn't "---\n", as will other text files that don't need rendering. I don't think we should check deeper than that, though; the frontmatter itself will vary, and I'm not sure that verifying the frontmatter block is closed is important enough to concern ourselves with here, especially when doing so would make us sensitive to various text encodings. This should be a relatively simple change, and one that I think is much more future-proof and flexible.
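A minimal sketch of the "magic number" check, assuming a file should be rendered only if it begins with "---" followed by a line ending (accepting "\r\n" as well is an added assumption). `front_matter?` and `route` are hypothetical helpers, not Jekyll's own code. Reading in binary mode means the bytes are never validated as UTF-8, so a PNG cannot raise an encoding error at this step.

```ruby
require "tmpdir"

# Assumed markers: "---" plus either Unix or Windows line ending, as raw bytes.
FRONT_MATTER_MARKERS = ["---\n".b, "---\r\n".b].freeze

# Hypothetical helper: look only at the first few bytes, nothing deeper.
def front_matter?(path)
  # IO#read(5) returns nil for an empty file, hence the fallback.
  head = File.open(path, "rb") { |f| f.read(5) } || "".b
  FRONT_MATTER_MARKERS.any? { |marker| head.start_with?(marker) }
end

# Files without frontmatter are copied through unmodified.
def route(path)
  front_matter?(path) ? :render : :copy
end

Dir.mktmpdir do |dir|
  post  = File.join(dir, "2016-08-05-hello.md")
  image = File.join(dir, "photo.png")
  File.binwrite(post, "---\ntitle: Hello\n---\nBody\n")
  File.binwrite(image, "\x89PNG\r\n\x1A\n".b)
  puts route(post)   # => render
  puts route(image)  # => copy
end
```

As the next comment notes, this trade-off only holds if no file that needs Liquid rendering lacks frontmatter entirely; such a file would be copied as-is.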
So long as there aren't cases where we need to render Liquid files that contain no frontmatter at all, this seems like a really good approach. I can't think of any such cases in the context of rendering collection directories. Liquid includes are handled later (by Liquid) and so wouldn't be affected by this logic. Does anyone foresee problems?
I'll work on putting together a pull request that implements a
Trace with Jekyll 3.2.0:
Reproduced with same steps on Jekyll 3.1.6 (line numbers are different but the trace is quite the same):
pushed a commit
Aug 5, 2016
@nhoizey Yes, this does appear to be the same issue as #2592. I opened a new issue since that one is closed, but did reference it in my original report. It's interesting that the error doesn't seem to occur for everyone. I created a repo with a demo Jekyll project to show my config.
At jjarmoc/jekyll-example@45ae890 we have a base
OSX 10.11 with
I can also switch to
Following up on @parkr's mention that these should be read as
I've gotten occupied with other things and haven't yet put together a PR, but still plan to do so. Should it also modify the
I don't really understand how, given this code, anyone could be seeing different behavior, though, so something we don't fully understand is at play here, and that bothers me.
pushed a commit
Aug 8, 2016
I've made some progress on this, but I have a few problems.
However, there are a few failing tests. This is due to these files still being added to
Does anyone who's more familiar with the
As I mentioned in the
It seems the problem could lie in the new image file rather than in Jekyll.
I made a GitHub Pages site with Jekyll, using RStudio to push Markdown files, and I'm getting page render errors that reference UTF-8. I reached out to GitHub and they pointed me to this thread, telling me to move my images out of the posts folder. When I do that manually and fix the links, the pages render; otherwise, none of the posts work.
This issue has been automatically marked as stale because it has not been commented on for at least two months.
The resources of the Jekyll team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the
If this is a feature request, please consider building it first as a plugin. Jekyll 3 introduced hooks which provide convenient access points throughout the Jekyll build pipeline whereby most needs can be fulfilled. If this is something that cannot be built as a plugin, then please provide more information about why in order to keep this issue open.
This issue will automatically be closed in two months if no further activity occurs. Thank you for all your contributions.
I am also seeing this issue, and I have an idea: the error seems intermittent because it depends on the binary content of the image file.
In my case, I get the error for a .png file no matter how I name it. My guess is that this is because the initial bytes in the PNG header, which read
This means that as soon as you try to interpret even the first few bytes of a PNG file as a UTF-8 string, you get an encoding error.
A solution might be to have the front-matter detection code look for the ASCII-encoded byte string corresponding to
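The sketch below illustrates one way this exact message can arise in Ruby, and how a byte-level comparison sidesteps it. It assumes the marker being searched for is the usual "---" frontmatter delimiter; `regex_check_on_utf8` and `starts_with_marker?` are hypothetical names, not Jekyll methods. The PNG signature bytes (0x89 "PNG" CR LF 0x1A LF) are the standard ones from the PNG specification.

```ruby
# The standard 8-byte PNG file signature, held as raw bytes (ASCII-8BIT).
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b
# Assumed marker: the "---" frontmatter delimiter, also as raw bytes.
FRONT_MATTER_MARKER = "---".b

# Reproduce the failure mode: tag the bytes as UTF-8 (as reading a file as
# text would) and run a regex over them. 0x89 is not a valid first byte in
# UTF-8, so Ruby refuses to match and raises ArgumentError.
def regex_check_on_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8) =~ /\A---/
  :no_error
rescue ArgumentError => e
  e.message
end

# Byte-level alternative: both sides are ASCII-8BIT, so nothing is decoded
# and binary content simply fails to match.
def starts_with_marker?(bytes)
  bytes.b.start_with?(FRONT_MATTER_MARKER)
end

puts regex_check_on_utf8(PNG_SIGNATURE)          # => invalid byte sequence in UTF-8
puts starts_with_marker?(PNG_SIGNATURE)          # => false
puts starts_with_marker?("---\ntitle: Hello\n")  # => true
```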