Ruby 1.9 character encoding changes #188

blakesmith opened this Issue Jul 1, 2010 · 15 comments



With Ruby 1.8, incorrectly encoded UTF-8 characters are silently ignored. If you have a post with incorrectly encoded UTF-8 characters in the content body, they will show up in your rendered page as question marks (unknown characters).

A user upgrading from Ruby 1.8 to Ruby 1.9 whose site seemed to be working fine would get a confusing error when trying to render their site (assuming it had incorrectly encoded UTF-8 characters):

/Users/blake/projects/jekyll/lib/jekyll/convertible.rb:26:in `read_yaml': invalid byte sequence in UTF-8
        from /Users/blake/projects/jekyll/lib/jekyll/post.rb:39:in `initialize'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:110:in `new'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:110:in `block in read_posts'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:108:in `each'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:108:in `read_posts'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:169:in `read_directories'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:79:in `read'
        from /Users/blake/projects/jekyll/lib/jekyll/site.rb:71:in `process'
        from ../jekyll/bin/jekyll:150:in `'

This backtrace doesn't tell the user which post is causing the problem. This commit will at least display the problem post so that the user knows what needs to be fixed for the site to render successfully.
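
As a rough illustration of the idea (not the actual commit), the read step could be wrapped so that an encoding failure names the file. The helper `read_post` and its error message are hypothetical; `File.binread`, `String#force_encoding`, and `String#valid_encoding?` are standard calls in Ruby 1.9 and later.

```ruby
# Hypothetical helper, not Jekyll's code: read a post as UTF-8 and name the
# file in the error when its bytes are not valid UTF-8.
def read_post(path)
  # binread returns the raw bytes without applying the locale's encoding.
  content = File.binread(path).force_encoding("UTF-8")
  unless content.valid_encoding?
    raise ArgumentError, "invalid byte sequence in UTF-8 in #{path}"
  end
  content
end
```

With something like this, the error message would point at the offending file rather than only at `convertible.rb`.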

This is mainly an issue of how Ruby decides to handle String encodings by default. You can read more about it here:

lmmendes commented Sep 9, 2010

In my case I was getting the following error:

/usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/convertible.rb:26:in `read_yaml': invalid byte sequence in US-ASCII (ArgumentError)
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/page.rb:24:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/site.rb:185:in `new'
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/site.rb:185:in `block in read_directories'
    from /usr/local/rvm/gems/ruby-1.9.1-p378/gems/jekyll-0.7.0/lib/jekyll/site.rb:175:in `each'

I solved the problem by declaring the following locale in my shell:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
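
For context on why the locale matters: Ruby 1.9 takes its default external encoding from the locale environment, and without a UTF-8 locale it typically falls back to US-ASCII, which matches the error above. A small diagnostic sketch follows; `encoding_report` is a hypothetical name, while `Encoding.default_external` is a standard call.

```ruby
# Hypothetical diagnostic: show the locale variables next to the encoding
# Ruby assumes for external data, such as files read without an explicit
# encoding.
def encoding_report(env = ENV)
  {
    "LC_ALL" => env["LC_ALL"],
    "LANG" => env["LANG"],
    "default_external" => Encoding.default_external.name
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Running this in the shell where jekyll fails and in one where it works should show whether the locale differs between the two.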

tatey commented Nov 9, 2010

Just got bitten by this after recently switching to 1.9 as my default Ruby. Thanks for the patch.

lloydh commented May 3, 2011

I think I'm running into this problem, but only when running the jekyll command via SSH, not if I run jekyll directly on the host machine. Jekyll also runs without errors on the client machine — it's only over SSH that I encounter this problem:

/usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/convertible.rb:26:in `read_yaml': invalid byte sequence in US-ASCII (ArgumentError)
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/post.rb:39:in `initialize'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:119:in `new'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:119:in `block in read_posts'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:117:in `each'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:117:in `read_posts'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:211:in `read_directories'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:88:in `read'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/lib/jekyll/site.rb:79:in `process'
    from /usr/local/lib/ruby/gems/1.9.1/gems/jekyll-0.10.0/bin/jekyll:164:in `<top (required)>'
    from /usr/local/bin/jekyll:19:in `load'
    from /usr/local/bin/jekyll:19:in `<main>'

I haven't tried lmmendes' fix yet (sorry, how/where do I declare those locales, and just on the host machine, or both?) but does anybody have any ideas why SSH is creating these problems?


Kwpolska commented May 3, 2011

Put these two lines in your .bashrc:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

lloydh commented May 3, 2011

Thanks Kwpolska.

I ended up having to put those lines in my .profile, but they did the trick.

dengwh commented Aug 20, 2011

I just got a similar error.
My environment is Windows XP with Ruby 1.9.2.
Any recommendations for Windows?



@dengwh, for Windows, set the same environment variables. In cmd.exe, type:

set LC_ALL=en_US.UTF-8
set LANG=en_US.UTF-8

@dengwh, for Windows you can use:

chcp 65001  

seems connected to #117


I'm trying to get a post-receive hook to work on Arch Linux with Ruby 1.9 and I'm getting this ASCII error. I've tried adding the UTF-8 settings to my .profile, but I'm still getting the error. I assume the git hook doesn't use my .profile, though. Any further suggestions?

EDIT: I just applied the patch to this file and it works fine now. Duh... and thank you!
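
Where shell startup files are not read, as may be the case in a hook, one option that is separate from the patch mentioned above is to set the default encoding in Ruby itself before any files are read. This is only a sketch, and where the line would go (for example, in a wrapper script) is an assumption; `Encoding.default_external=` is a standard call.

```ruby
# Make Ruby treat external data as UTF-8 regardless of LANG / LC_ALL.
# This has to run before the files are read.
Encoding.default_external = Encoding::UTF_8

# Files read afterwards without an explicit encoding are tagged as UTF-8.
puts Encoding.default_external.name
```

This only changes how bytes are interpreted. A file whose bytes are not valid UTF-8 would still fail and would need to be re-saved or converted.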


connected to #226, #201

ehtb commented Jul 12, 2012

This fix worked for me, whereas the others didn't:

svnpenn commented Jul 16, 2012

I had a text file with a ü, but accidentally had it saved with ANSI encoding. Changing the encoding to UTF-8 fixed it for me. @stereobooster's patch would be very helpful though.
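
A minimal sketch of that kind of conversion in Ruby, assuming "ANSI" here means Windows-1252 (the name often refers to that code page, but it depends on the system). `ansi_to_utf8` is a hypothetical helper; `force_encoding` and `encode` are standard `String` methods.

```ruby
# Assumption: "ANSI" means Windows-1252. Reinterpret the raw bytes in that
# encoding, then transcode them to UTF-8.
def ansi_to_utf8(bytes)
  bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# 0xFC is "ü" in Windows-1252 but is not valid UTF-8 on its own.
converted = ansi_to_utf8("M\xFCnchen".b)
puts converted.valid_encoding?
```

If the file was saved in a different legacy encoding, the name passed to `force_encoding` would need to change accordingly.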


Still getting errors but it just started out of nowhere:

/Users/kevinsuttle/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/jekyll-0.11.2/lib/jekyll/convertible.rb:29:in `read_yaml': invalid byte sequence in UTF-8 (ArgumentError)

This isn't new by the way. See issues 117, 188, 493, 135.

@crazymaster crazymaster referenced this issue in moto-net/ Nov 12, 2012

Cannot generate pages with jekyll on Windows #5

parkr commented Jan 2, 2013

Merged in #718.

@parkr parkr closed this Jan 2, 2013

Liquid Exception: invalid byte sequence in UTF-8 in index.html
