Special character in chained ERB view cause Encoding::Compatibility error in 1.9.2 #126

Closed
Quintus opened this Issue Nov 20, 2010 · 13 comments

Quintus commented Nov 20, 2010

Sinatra crashes with an Encoding::CompatibilityError if you try to render an ERB view that in turn renders another ERB view, where the second view contains a special character like ä.

Steps to reproduce:

Create this directory structure:

/
- my_app.rb
views/
    - main.rhtml
    - _part.rhtml

Grab the file's contents from this gist.

Run the script by ruby my_app.rb

Browse to http://localhost:3000

Look at the error presented there or at your console which should provide the same text as the error.txt file in the previously mentioned gist.

See also this thread at the Google group.

Marvin

Quintus commented Nov 20, 2010

Forgot this, sorry.

ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
gem -v: 1.3.7
sinatra version: 1.1.0
OS: Ubuntu 10.10 Maverick Meerkat

Marvin

Owner
rkh commented Nov 20, 2010

Thanks, I will look into this as soon as I'm back home.

Owner
rkh commented Nov 22, 2010

OK, I'll need to install Ubuntu to figure this out. Works on OSX. What's your $LC_CTYPE?
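The locale question matters because Ruby 1.9 and later derive the default external encoding from the environment. A minimal sketch for collecting the relevant values on an affected machine follows; `encoding_report` is a hypothetical helper name, not part of Sinatra or Tilt.

```ruby
# Collects the settings that influence how Ruby tags strings read from files.
# (Hypothetical helper for illustration only.)
def encoding_report
  {
    "LANG"                  => ENV["LANG"],
    "LC_CTYPE"              => ENV["LC_CTYPE"],
    "locale charmap"        => Encoding.locale_charmap,
    "default_external"      => Encoding.default_external,
    "default_internal"      => Encoding.default_internal,
    "source (__ENCODING__)" => __ENCODING__
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Comparing this output between a working and a failing machine should show whether the two differ in default encodings.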

Contributor
kyanagi commented Nov 23, 2010

My environment:
Debian lenny, ruby 1.9.2p42 (2010-11-15 revision 29797) [i686-linux].
$LANG is ja_JP.UTF-8, and I checked both cases: $LC_CTYPE empty and $LC_CTYPE set to ja_JP.UTF-8.

I can reproduce the problem by writing non-ascii strings in both of controller and view.
https://gist.github.com/712180

In Tilt::Template#compile_template_method, source.encoding is .
When non-US-ASCII characters like "\xC3\xA4" (ä) appear in an ERB template, the source string includes them and
its encoding becomes .

Encoding of embedded variables is because they are written in Ruby code, and concatenating them raises an exception.

I think that the argument string of CompileSite.class_eval in Tilt::Template#compile_template_method should be magic-commented.
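The concatenation failure described here can be reproduced without Sinatra or Tilt. The sketch below is only an illustration: the strings stand in for template text and a controller variable, and `concat_error` is a hypothetical helper name.

```ruby
# Returns the exception class raised by a + b, or nil if concatenation works.
# (Hypothetical helper for illustration only.)
def concat_error(a, b)
  a + b
  nil
rescue EncodingError => e
  e.class
end

# Template text: the UTF-8 bytes of "ä", tagged as binary (ASCII-8BIT).
template_text = "\xC3\xA4".b
# A value coming from Ruby code, tagged as UTF-8.
variable = "\u00F6"

p template_text.encoding
p variable.encoding
p concat_error(template_text, variable)    # Encoding::CompatibilityError
p concat_error("plain ascii".b, variable)  # nil: ASCII-only strings mix freely
```

The last line suggests why the bug only shows up once a special character is present: strings containing only ASCII bytes are compatible with each other regardless of their encoding tag.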

Quintus commented Nov 24, 2010

Sorry for the delay. I don't have my Ubuntu Maverick at hand anymore, but the issue is present on Arch Linux as well. Today (24 Nov 2010) I updated the whole system, installed sinatra and tried again. I get the same results as I already described. $LANG is de_DE.utf8, $LC_CTYPE is empty.

ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
uname -a: Linux ikarus 2.6.35-ARCH \#1 SMP PREEMPT Sat Oct 30 21:22:26 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T5800 @ 2.00GHz GenuineIntel GNU/Linux
gem -v: 1.3.7
sinatra version: 1.1.0

Sorry for the backslash in uname -a (#1), but GitHub Flavored Markdown otherwise expands it to an <a> that doesn't get linked because of the code block and therefore shows up as <a href=...>1</a>...

Marvin

Quintus commented Nov 24, 2010

Seems as if the issue is not restricted to *nix systems. On Windows Vista Home Premium, SP2, I get the same result.

ruby -v: ruby 1.9.2p0 (2010-08-18) [i386-mingw32]
uname -a: MINGW32_NT-6.0 IKARUS 1.0.15(0.47/3/2) 2010-07-06 22:04 i686 Msys
gem -v: 1.3.7
sinatra version: 1.1.0
$LANG: de
$LC_CTYPE: (empty)

Marvin

Owner
rkh commented Nov 24, 2010

Have a VM now, can reproduce it.

Contributor
michelc commented Nov 25, 2010

I have a similar problem on a French Windows 7 with ruby 1.9.1p429 (2010-07-02 revision 28523) [i386-mingw32], although the error is "Encoding::InvalidByteSequenceError - "\xC3" on US-ASCII:"

See error.txt on https://gist.github.com/715060

On my PC, I solved the problem by using a modified version of tilt.rb. I just added the line "# encoding: UTF-8" at the beginning of it.

With this, the result is displayed correctly.

But I did tests for another project and this solution doesn't work on Heroku with Ruby 1.9.

And FYI, with Windows I can also get good result by running the application with "ruby -Ku main.rb". And in this case, I don't need to use my own tilt.rb file.

Owner
rkh commented Nov 25, 2010

Current state: I'm still on it. Will probably have to set up a Windows VM, too.
btw, there is a discussion on ruby-core about always defaulting to unicode instead of depending on the OS.

Here's how I think the problem is introduced...

When the template is compiled, the eval'd string picks up the encoding from the tilt.rb file (BINARY aka ASCII-8BIT). Then, the strings generated in the compiled method use the same encoding. So, when @_out_buf = '' is executed, that empty string gets a BINARY encoding.

That explains why michelc's fix works. It switches the encoding of tilt.rb over to UTF-8 so that @_out_buf starts out in UTF-8 too. Clearly, that doesn't work for non-UTF-8 projects, but it's explanatory.

Another solution is to convert the eval'd string to the correct encoding:

CompileSite.module_eval method_string.force_encoding(encoding)

You would have to figure out the correct encoding from the underlying templating engine. This is similar to the fix mentioned in the tilt pull request.
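The effect of tagging the source string before evaluating it can be sketched in isolation. This is an illustration under assumptions, not Tilt's code: `method_string` is a stand-in for the generated method source, and `eval_with_encoding` is a hypothetical helper.

```ruby
# Evaluates Ruby source after tagging it with the given encoding, in the
# spirit of the force_encoding call above. (Hypothetical helper, not Tilt's API.)
def eval_with_encoding(source, encoding)
  eval(source.dup.force_encoding(encoding))
end

# Stand-in for generated method source: a string literal containing the
# UTF-8 bytes of "ä", with the source itself tagged as binary.
method_string = "'\u00E4 from the template'".b

binary_result = eval(method_string)
utf8_result   = eval_with_encoding(method_string, Encoding::UTF_8)

p binary_result.encoding
p utf8_result.encoding

# The UTF-8 literal can be joined with a UTF-8 value from Ruby code.
p utf8_result + " \u00F6 from the controller"
```

String literals take on the encoding of the source they are evaluated from, so retagging the source is enough to change the encoding of every literal in the compiled method.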

Owner
rkh commented Dec 1, 2010

Also, not sure this plays a role, but I tried to reconstruct the error without Tilt. The code ERB generates includes magic comments specifying the encoding. However, eval only honors those if in the first line. The method wrapping Tilt does might disable this.

CompileSite has been removed in Tilt master.
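The first-line rule for magic comments can be checked with plain `eval`. The sketch below does not use ERB or Tilt; the two source strings are made-up stand-ins for unwrapped and wrapped generated code, and `literal_encoding` is a hypothetical helper.

```ruby
# Encoding of the string literal produced by evaluating binary-tagged source.
# (Hypothetical helper for illustration only.)
def literal_encoding(source)
  eval(source.b).encoding
end

# Magic comment on the first line: it is honored.
first_line  = "# coding: UTF-8\n''"
# Same comment pushed down to the second line by preceding code: it is ignored.
second_line = "_unused = nil\n# coding: UTF-8\n''"

p literal_encoding(first_line)
p literal_encoding(second_line)
```

In the second case the literal falls back to the encoding of the evaluated string itself, which is how wrapping generated code in a method definition can silently disable the comment.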

gimite commented Dec 5, 2010

> Also, not sure this plays a role, but I tried to reconstruct the error without Tilt. The code ERB generates includes magic comments specifying the encoding. However, eval only honors those if in the first line. The method wrapping Tilt does might disable this.

Looks like this is the case. I sent a pull request to Tilt that fixes the issue by moving the magic comment to the first line.
rtomayko/tilt#48

Note that you can write a magic comment in an ERB template, e.g.
<%# coding: UTF-8 %>
and you should do so; otherwise ERB uses the template string's encoding, which in this case is probably derived from the environment variables.

FYI: a short-term workaround for this issue is to use Erubis instead of ERB and set the LANG environment variable to e.g. en_US.UTF-8. It worked fine for me.
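The effect of the in-template magic comment can be tried with ERB from the standard library alone. This is a minimal sketch: the templates are invented, the `.b` call simulates a template read with a non-UTF-8 encoding tag, and `render_binary_template` is a hypothetical helper.

```ruby
require "erb"

# Renders a template whose bytes are tagged as binary, as if it had been read
# under a non-UTF-8 locale. (Hypothetical helper for illustration only.)
def render_binary_template(template, name)
  ERB.new(template.b).result_with_hash(name: name)
end

with_comment    = "<%# coding: UTF-8 %>\u00E4 <%= name %>"
without_comment = "\u00E4 <%= name %>"

# With the magic comment, ERB compiles the template as UTF-8.
p render_binary_template(with_comment, "\u00F6").encoding

# Without it, ERB falls back to the template string's own encoding.
begin
  render_binary_template(without_comment, "\u00F6")
rescue Encoding::CompatibilityError => e
  puts "without the comment: #{e.class}"
end
```

Here the value passed as `name` is UTF-8 in both calls, so only the template's declared encoding differs between the two renders.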

Owner
rkh commented Dec 6, 2010

Confirmed it's fixed in current tilt master.

This issue was closed.