Encoding::CompatibilityError caused by non-latin characters in ERB template #481

Open
kinkou opened this issue Mar 8, 2015 · 2 comments

kinkou commented Mar 8, 2015

Hi, I'm using Thor as a generator in Rails. Some of my templates contain non-Latin (Cyrillic) characters, which causes the template method to fail. Here's the backtrace (I omitted some lines and shortened the paths):

(erb):14:in `concat': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
    from (erb):14:in `template'
    from .../ruby-2.1.4/lib/ruby/2.1.0/erb.rb:850:in `eval'
    from .../ruby-2.1.4/lib/ruby/2.1.0/erb.rb:850:in `result'
    ...
    from .../gems/thor-0.19.1/lib/thor/actions/file_manipulation.rb:115:in `template'
    ...
    from .../gems/railties-4.1.8/lib/rails/generators/named_base.rb:25:in `template'
    from .../lib/generators/my_generator/my_generator.rb:23:in `generate_my_stuff'

The reason must be that the template is read as ASCII-8BIT:

# lib/thor/actions/file_manipulation.rb:116
content = ERB.new(::File.binread(source), nil, "-", "@output_buffer").result(context)

Removing the non-Latin characters from the template fixes the issue, as does changing binread to read (I wonder why binread was needed here).
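
For anyone who wants to reproduce the encoding mismatch outside of Thor, here is a minimal sketch. The file name and the strings are made up for illustration, and plain string concatenation stands in for what the ERB output buffer does. It shows that binread tags the content as ASCII-8BIT while read with an explicit encoding tags it as UTF-8, and that joining two strings fails only when both contain non-ASCII bytes in different encodings.

```ruby
require "tmpdir"

# Hypothetical template file containing Cyrillic text, stored as UTF-8 bytes.
path = File.join(Dir.mktmpdir, "example.erb")
File.binwrite(path, "\u041F\u0440\u0438\u0432\u0435\u0442\n") # "Privet" in Cyrillic

binary = File.binread(path)                 # tagged ASCII-8BIT
text   = File.read(path, encoding: "UTF-8") # tagged UTF-8

# "mir" in Cyrillic, standing in for a value the template interpolates.
value = "\u043C\u0438\u0440"

# Binary non-ASCII bytes plus UTF-8 non-ASCII characters cannot be joined.
error = begin
  binary + value
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Two UTF-8 strings join without trouble.
combined = text + value

puts binary.encoding
puts text.encoding
puts error.class
puts combined.valid_encoding?
```

If either side is pure ASCII the concatenation succeeds, which would be consistent with the error disappearing once the non-Latin characters are removed.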

What's at fault here, Thor or ERB?

Contributor

jpgeek commented Apr 7, 2016

Same issue here. Using a customized template with Rails default scaffold view generator:

(erb):4:in `concat': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
    from (erb):4:in `template'
    ruby/2.2.0/erb.rb:863:in `eval'
    ruby/2.2.0/erb.rb:863:in `result'
    thor-0.19.1/lib/thor/actions/file_manipulation.rb:116:in `block in template'
    thor-0.19.1/lib/thor/actions/create_file.rb:53:in `call'
    thor-0.19.1/lib/thor/actions/create_file.rb:53:in `render'
    thor-0.19.1/lib/thor/actions/create_file.rb:46:in `identical?'
    thor-0.19.1/lib/thor/actions/create_file.rb:72:in `on_conflict_behavior'
    thor-0.19.1/lib/thor/actions/empty_directory.rb:113:in `invoke_with_conflict_check'
    thor-0.19.1/lib/thor/actions/create_file.rb:60:in `invoke!'
    thor-0.19.1/lib/thor/actions.rb:94:in `action'
    thor-0.19.1/lib/thor/actions/create_file.rb:25:in `create_file'
    thor-0.19.1/lib/thor/actions/file_manipulation.rb:115:in `template'
    railties-4.2.5.2/lib/rails/generators/named_base.rb:26:in `block in template'

jpgeek commented Apr 8, 2016

Digging deeper, this looks like a regression. It appears to have broken starting with Ruby 1.9.1, when IO.binread was introduced.

# lib/thor/core_ext/io_binary_read.rb

class IO #:nodoc:
  class << self
    def binread(file, *args)
      fail ArgumentError, "wrong number of arguments (#{1 + args.size} for 1..3)" unless args.size < 3
      File.open(file, "rb") do |f|
        f.read(*args)
      end
    end unless method_defined? :binread
  end
end

This fallback binread() returns a UTF-8 string (if args are not specified). However, the native IO.binread in Ruby 1.9.2 and later opens the file as ASCII-8BIT:

static VALUE
rb_io_s_binread(int argc, VALUE *argv, VALUE io)
{
    VALUE offset;
    struct foreach_arg arg;

    rb_scan_args(argc, argv, "12", NULL, NULL, &offset);
    FilePathValue(argv[0]);
    arg.io = rb_io_open(argv[0], rb_str_new_cstr("rb:ASCII-8BIT"), Qnil, Qnil);
    if (NIL_P(arg.io)) return Qnil;
    arg.argv = argv+1;
    arg.argc = (argc > 1) ? 1 : 0;
    if (!NIL_P(offset)) {
        rb_io_seek(arg.io, offset, SEEK_SET);
    }
    return rb_ensure(io_s_read, (VALUE)&arg, rb_io_close, arg.io);
}
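
One possible workaround, sketched here under assumptions rather than proposed as the actual patch: keep the binary read (which presumably exists to avoid newline translation on Windows) and then tag the bytes with the encoding the template is assumed to be written in. The read_template helper, the file name, and the UTF-8 default are all hypothetical and not part of Thor.

```ruby
require "erb"
require "tmpdir"

# Hypothetical helper (not part of Thor): read the raw bytes, then tag them
# with the encoding the template is assumed to use. force_encoding only
# relabels the string; it does not transcode any bytes.
def read_template(path, encoding = Encoding::UTF_8)
  File.binread(path).force_encoding(encoding)
end

# Hypothetical template: Cyrillic "Privet" followed by an interpolated name.
path = File.join(Dir.mktmpdir, "greeting.erb")
File.binwrite(path, "\u041F\u0440\u0438\u0432\u0435\u0442, <%= name %>!\n")

source   = read_template(path)
# "mir" in Cyrillic, passed in as a UTF-8 value.
rendered = ERB.new(source).result_with_hash(name: "\u043C\u0438\u0440")

puts source.encoding
puts rendered.encoding
puts rendered
```

Because the template source and the interpolated value are now both UTF-8, the ERB output buffer never has to mix encodings. A template saved in some other encoding would need that encoding passed in instead.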

@sferik I am happy to write a test and patch for it, but I want to make sure I am not missing something and that there is a chance it will get pulled.
