Encoding Support #107

Closed
wants to merge 11 commits into from

12 participants

Ryan Tomayko, Joshua Hull, Wojciech Wnętrzak, Gabriel Benmergui, Tim Haines, Matias Korhonen, Eric Mill, Tim Millwood, Konstantin Haase, Magnus Holm, Tom Wardrop, Joshua Peek
Ryan Tomayko
Owner

Taking a crack at the encoding problem (#75). The code I have written so far is pretty close to @judofyr's original write-up on that issue. The Template class has been modified somewhat and I'm using the ERBTemplate, BuilderTemplate, and CoffeeScriptTemplate implementations to exercise the requirements.

I have a section in the README on encodings to spec out the expected behavior and general strategy. There's an overviewy part and then it describes how templates should think about template source data:

Template Source Encoding

The template source data may come from a file or from a string. In either case, the real template source encoding should be determined as follows in order of preference:

  • Template specific encoding rules (e.g., utf-8 only formats).
  • A (template specific) magic encoding comment embedded in the source string.
  • The source string's existing encoding (string only).
  • The :default_encoding option to Template.new (file only).
  • Encoding.default_external - the default system encoding (file only)

Some template file formats have strict encoding requirements. CoffeeScript is a utf-8 only format for instance. Template implementations are encouraged to use this type of information to constrain the detection logic defined above.
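
To make the order of preference concrete, here is a rough sketch. `detect_source_encoding` and its keyword arguments are made up for illustration and are not part of Tilt or this branch, and the regexp only recognizes Ruby-style `# coding:` comments, whereas a real template class would look for its own magic syntax:

```ruby
# Hypothetical helper, not Tilt API: pick a source encoding using the
# preference order listed above.
MAGIC_COMMENT = /\A[ \t]*\#.*coding\s*[=:]\s*([[:alnum:]\-_]+)/n

def detect_source_encoding(source, from_file:, forced: nil, default_encoding: nil)
  # 1. Template specific rules (e.g. a utf-8 only format) win outright.
  return Encoding.find(forced) if forced

  # 2. A magic comment on the first line, matched against the raw bytes.
  #    Encoding.find raises ArgumentError for an unknown encoding name.
  magic = source.dup.force_encoding(Encoding::BINARY)[MAGIC_COMMENT, 1]
  return Encoding.find(magic) if magic

  # 3. Strings handed in directly keep whatever encoding they carry.
  return source.encoding unless from_file

  # 4. and 5. Files fall back to :default_encoding, then default_external.
  default_encoding ? Encoding.find(default_encoding) : Encoding.default_external
end
```

A utf-8 only format like CoffeeScript would simply pass `forced: 'UTF-8'` and skip the rest of the detection.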

This part feels pretty solid to me. The ERB/Erubis, Builder, and CS templates all adhere to these rules and document their behavior when the source encoding detection process is somehow constrained.

For this next part I have only prose and no code. It's a lot shakier:

Render Context Encoding

When the system internal encoding (Encoding.default_internal) is not set (MRI default), templates should be evaluated and produce a result string encoded the same as the template source data. For example, a Big5 encoded template on disk will generate a Big5 result string and expect interpolated values to be Big5 compatible.

When Encoding.default_internal is set, templates should be converted from the template source encoding to the internal encoding before being compiled / evaluated, and the result string should be encoded in the default internal encoding. For instance, when default_internal is set to UTF-8, a Big5 encoded template on disk will be converted to UTF-8 and generate a UTF-8 result string, and interpolated values must be utf-8 compatible.
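
As a small illustration of what "compatible" means for interpolated values, the sketch below uses plain Ruby strings as stand-ins for template output and locals. The constant names and `mixable?` are invented for the example; only `Encoding.compatible?`, `String#encode`, and `Encoding::CompatibilityError` are real Ruby:

```ruby
# Stand-ins for template output and interpolated values (all hypothetical).
BIG5_CHUNK  = "\u4E2D\u6587: ".encode('Big5')  # text produced by a Big5 template
UTF8_VALUE  = "\u65E5\u672C"                    # non-ASCII value tagged UTF-8
ASCII_VALUE = "plain"                           # ASCII-only value

# Encoding.compatible? returns the resulting encoding, or nil on a mismatch.
def mixable?(a, b)
  !Encoding.compatible?(a, b).nil?
end

puts mixable?(BIG5_CHUNK, ASCII_VALUE)  # ASCII-only values mix with ASCII-compatible encodings
puts mixable?(BIG5_CHUNK, UTF8_VALUE)   # two different non-ASCII encodings do not mix

begin
  BIG5_CHUNK + UTF8_VALUE
rescue Encoding::CompatibilityError => e
  puts "render would fail: #{e.class}"
end

# Transcoding the template text first (the default_internal = UTF-8 case)
# lets the UTF-8 value be appended without error.
result = BIG5_CHUNK.encode('UTF-8') + UTF8_VALUE
puts result.encoding
```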

I do like this behavior in theory. It's compatible with ActionView and seems consistent with the spirit of Encoding.default_internal. However, the caller needs the ability to override these behaviors for cases where default_internal cannot be changed but the caller knows render context transcoding is needed or not needed. For that I think we should add another option:

Templates that perform render context transcoding must allow these default behaviors to be controlled via the :transcode option:

  • :transcode => true - Convert from template source encoding to the system default internal encoding before evaluating the template. The result string is guaranteed to be in the default internal encoding. Do nothing when Encoding.default_internal is nil.

    This is the default behavior when no :transcode option is given.

  • :transcode => false - Perform no encoding conversion. The result string will have the same encoding as the detected template source string.

    This is the default behavior when default_internal is nil.

  • :transcode => 'utf-8' - Ignore default_internal. Instead, convert from template source encoding to utf-8 before evaluating the template. The result string is guaranteed to be utf-8 encoded. The encoding value ('utf-8') may be any valid encoding name or Encoding constant.
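
Since none of this exists in code yet, here is one way the three option values could be resolved. `transcode_target` and `apply_transcode` are hypothetical names, and `default_internal` is passed in explicitly only to keep the sketch easy to exercise:

```ruby
# Hypothetical helper: map a :transcode option value to a target Encoding,
# or nil when no conversion should happen.
def transcode_target(transcode, default_internal = Encoding.default_internal)
  case transcode
  when nil, true then default_internal       # default: follow default_internal (may be nil)
  when false     then nil                    # never convert
  when Encoding  then transcode              # Encoding constant given directly
  else Encoding.find(transcode.to_s)         # encoding name such as 'utf-8'
  end
end

# Hypothetical helper: convert the source before evaluation when a target
# encoding applies, otherwise return it untouched.
def apply_transcode(source, transcode, default_internal = Encoding.default_internal)
  target = transcode_target(transcode, default_internal)
  target ? source.encode(target) : source
end
```

With this shape, `:transcode => true` and an omitted option behave identically, and both degrade to "do nothing" when default_internal is nil.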

My plan is to start working on the default transcoding behavior + :transcode option for all of the templates I've touched so far. What do you guys think?

/cc @judofyr, @josh, @rkh, @josevalim, @apotonick, @nesquena, @brianmario, @DAddYE, everyone ...

added some commits
Ryan Tomayko revise and define more strictly the default_encoding option
The default_encoding option takes effect only when tilt reads
template data from the filesystem. Templates provided via custom
reader block are assumed to be tagged with a best guess encoding
already.

It's also worth noting that, unlike File.read, the default file
reader does not perform Encoding.default_internal transcoding. The
string is marked with the default_encoding or the system encoding
(Encoding.default_external) but no transcoding is performed. This is
because magic comments or template specific encoding settings are
not yet available.
d602069
Ryan Tomayko generate ruby source in same encoding as template source data e18b99c
Ryan Tomayko specify template source encoding behavior in README, tests ee65ca9
Ryan Tomayko ERB templates adhere to source template encoding behavior 35e5a48
Ryan Tomayko refine ruby source comment extraction utility methods 7ef2de5
Ryan Tomayko Builder template adheres to source template encoding behavior 9c0ecf8
Ryan Tomayko start to spec out :transcode behavior a little
None of this is happening in the code yet. I'm just trying to figure
out what it might look like.
fcb9a1e
Ryan Tomayko CoffeeScript template requires utf-8 input, generates utf-8 output only
All of the various input and output overrides are ignored
essentially.
b73c7da
Ryan Tomayko reorg encoding spec in README f36cc18
Ryan Tomayko typos abound in binary method comments ae8900b
Ryan Tomayko fix markdown bullet indent in README 9a474c5
Joshua Peek josh commented on the diff
lib/tilt/builder.rb
@@ -14,7 +35,10 @@ module Tilt
require_template_library 'builder'
end
- def prepare; end
+ def prepare
+ return if !data.respond_to?(:to_str)
+ @source = assign_source_encoding(data.to_str)
Joshua Peek
josh added a note

I like the opt-in for assign_source_encoding.

Ryan Tomayko Owner

Yeah. I'm going with an approach where the base class provides some convenience APIs but the behavior is more or less in the hands of the template subclass. I don't think there's any other way since templates vary so much.

Joshua Peek josh commented on the diff
README.md
((65 lines not shown))
+ - `:transcode => true` - Convert from template source encoding to the system
+ default internal encoding (`Encoding.default_internal`) before evaluating the
+ template. The result string is guaranteed to be in the default internal
+ encoding. Do nothing when `Encoding.default_internal` is nil.
+
+ This is the default behavior when no `:transcode` option is given.
+
+ - `:transcode => false` - Perform no encoding conversion. The result string
+ will have the same encoding as the detected template source string.
+
+ This is the default behavior when `Encoding.default_internal` is nil.
+
+ - `:transcode => 'utf-8'` - Ignore `Encoding.default_internal`. Instead,
+ convert from template source encoding to utf-8 before evaluating the
+ template. The result string is guaranteed to be utf-8 encoded. The encoding
+ value (`'utf-8'`) may be any valid encoding name or Encoding constant.
Joshua Peek
josh added a note

I don't think this :transcode option is implemented yet. Is this going to be set as an option on initialize or supposed to be passed each time to the render method?

We might be able to provide some of these things for the handler by adding them to the default render implementation. The before evaluating step doesn't always exist for every handler. For example, coffee, sass and markdown are contextless and ignore the locals option, so these handlers only need to call encode on the final string they produce. We might be able to move that final encode to our render, but I'm not sure that's entirely a good idea.

Ryan Tomayko Owner

Nope I haven't started in on any of the transcoding yet.

I hadn't even considered making this an option to render. I figured we'd want it on initialize and also maybe as a Template class attribute that can be overridden by subclasses. Having it on render could be interesting. I'm not sure how much it'd be used though. I think the 99% case for transcoding is going to be people setting Encoding.default_internal = 'utf-8' or wanting the same effect when running templates.

Joshua Peek
josh added a note

Don't really want to put it on render because we have no options arg. Passing to initialize makes the most sense.

We could provide some sort of Template#encode() helper that encodes the string to options[:transcode] || default_internal. So handlers can call that before they return the final result.

Ryan Tomayko Owner

Exactly what I'm thinking. It needs to happen before evaluation for context / interpolating templates though.
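
A rough sketch of the helper discussed in this thread, under the assumption that it lives on a base class and reads `options[:transcode]`. `SketchTemplate` and both subclasses are invented for illustration and do not mirror Tilt's real Template API:

```ruby
# Hypothetical base class; Tilt has no such helper in this branch.
class SketchTemplate
  def initialize(data, options = {})
    @data, @options = data, options
  end

  # Encode to options[:transcode] || default_internal, or pass the string
  # through unchanged when neither is set.
  def encode(string)
    target = @options[:transcode] || Encoding.default_internal
    target ? string.encode(target) : string
  end
end

# Contextless handler (in the spirit of coffee/sass/markdown): only the
# final result needs encoding.
class UpcaseTemplate < SketchTemplate
  def render
    encode(@data.upcase)
  end
end

# Interpolating handler: the source is encoded *before* evaluation so that
# interpolated values meet it in the target encoding.
class InterpolatingTemplate < SketchTemplate
  def render(locals)
    encode(@data) % locals
  end
end
```

The second class is the case raised in the last reply: encoding only the final string would be too late, because the interpolation would already have mixed encodings.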

Ryan Tomayko
Owner

Related: #75

Joshua Hull

@rtomayko Any update on this?

Ryan Tomayko
Owner

@joshbuddy: I haven't had a second to look at this for a long while now, unfortunately. Hopefully things will settle down soon. I'd definitely like to get this closed out.

Joshua Hull

@rtomayko No worries. Just wanted to make sure it didn't get lost in the shuffle.

Wojciech Wnętrzak

I have the same problems with template encodings.

I tested the encoding branch and it works fine for me.

Any update on this issue?

Gabriel Benmergui

This is a pretty serious issue for my application. I cannot support internationalization and users can easily break pages if they put the wrong characters.

Is there a quick fix or a monkey-patch I can do to fix this? Adding the magic comment to the erb did not work. I've been waiting 2 months for this merge :S.

Tim Haines

Rails' asset pipeline fails on templates with unicode in them due to the lack of encoding support, right?

Tim Haines

FWIW The encodings branch of the gem worked for me.

Gabriel Benmergui

Yes that was my solution also.

Matias Korhonen

What's the status on this? Just wondering why the encodings branch hasn't gotten merged into master?

Eric Mill

I too would love these changes, having some related breakages on my end - any chance these'll be merged into master soon?

Marios Antonoudiou mariosant referenced this pull request in middleman/middleman
Closed

Bug with encoding #738

Konstantin Haase
Collaborator

What's the status of this?

Rails, Sinatra and Ruby 2.0 default everything to UTF-8.

Magnus Holm
Collaborator

I'm going to revisit this for the 1.4 release. I'm planning on implementing (or, copying from this branch) the minimal code we need for supporting encodings. No transcoding or funky stuff, but something that makes UTF-8 templates work out-of-the-box for 99% of all users.

Tom Wardrop

Please do @judofyr, I've just hit this in a SCSS template I have which contains a single UTF-8 character. I've worked around it in my framework by simply doing a manual File#read in the block I send to #new. It seems File#binread otherwise used by Tilt always defaults to ASCII-8BIT, even on Ruby 2.0.0.
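
For anyone trying the same workaround, the difference between the two reads can be seen with plain Ruby. `read_both_ways` is a made-up helper for the comparison, and the `Tilt.new` line in the comment is only an assumption about how the reader block described above might be wired up:

```ruby
require 'tempfile'

# Hypothetical helper for comparison; reads the same file both ways.
def read_both_ways(path)
  binary = File.binread(path)                  # tagged ASCII-8BIT regardless of content
  tagged = File.read(path, encoding: 'UTF-8')  # tagged UTF-8, bytes untouched
  [binary, tagged]
end

Tempfile.create(['style', '.scss']) do |f|
  f.write("a::after { content: '\u2713'; }")   # a single non-ASCII character
  f.flush

  binary, tagged = read_both_ways(f.path)
  puts binary.encoding == Encoding::BINARY
  puts tagged.encoding == Encoding::UTF_8
  puts binary.bytes == tagged.bytes            # same bytes, different tag

  # The workaround supplies the tagged read as the reader block, e.g.:
  #   Tilt.new(path) { |t| File.read(path, encoding: 'UTF-8') }
end
```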

Magnus Holm
Collaborator

@Wardrop: Can you try the branch in #175 and see if it fixes your problem?

Magnus Holm
Collaborator

#175 has now been merged.

Magnus Holm judofyr closed this
Commits on Sep 19, 2011
78 README.md
@@ -191,6 +191,84 @@ template, but if you depend on a specific implementation, you should use #prefer
When a file extension has a preferred template class, Tilt will *always* use
that class, even if it raises an exception.
+Encodings
+---------
+
+All Tilt template implementations must follow a few guidelines regarding string
+encodings under MRI >= Ruby 1.9 and other encoding aware environments. This
+section defines "good behavior" for template implementations that support
+multiple encodings.
+
+There are two places where encodings come into play:
+
+ - __Template source data encoding.__ When a template is read from the
+ filesystem, how do we know what encoding to set on the string? This is
+ complicated by the fact that many template formats support embedded magic
+ encoding declarations, while others mandate that template source data be in a
+ specific encoding (utf-8 only formats).
+
+ - __Render context and result encoding.__ In what encoding is the output being
+ generated? It's often useful to guarantee that templates are evaluated in
+ utf-8 context and will generate utf-8 output regardless of the template's
+ source encoding. What effect does `Encoding.default_internal` have on
+ template execution and output?
+
+Tilt's encoding support aims only to provide a framework for answering these
+questions for each template engine. It does not attempt to define a single
+behavior that all templates must conform to because templates vary widely in
+encoding support.
+
+### Template Source Encoding
+
+The template source data may come from a file or from a string. In either case,
+the real template source encoding should be determined as follows in order of
+preference:
+
+ - Template specific encoding rules (e.g., utf-8 only formats).
+ - A (template specific) magic encoding comment embedded in the source string.
+ - The source string's existing encoding (string only).
+ - The `:default_encoding` option to `Template.new` (file only).
+ - `Encoding.default_external` - the default system encoding (file only)
+
+Some template file formats have strict encoding requirements. CoffeeScript is a
+utf-8 only format for instance. Template implementations are encouraged to use
+this type of information to constrain the detection logic defined above.
+
+### Render Context Encoding
+
+When the system internal encoding (`Encoding.default_internal`) *is not* set
+(MRI default), templates should be evaluated and produce a result string encoded
+the same as the template source data. e.g., A Big5 encoded template on disk will
+generate a Big5 result string and expect interpolated values to be Big5
+compatible.
+
+When `Encoding.default_internal` *is* set, templates should be converted from
+the template source encoding to the internal encoding *before* being compiled /
+evaluated and the result string should be encoded in the default internal
+encoding. For instance, when `default_internal` is set to UTF-8, a Big5 encoded
+template on disk will generate a UTF-8 result string and interpolated values
+must be utf-8 compatible.
+
+Templates that perform render context transcoding must allow these default
+behaviors to be controlled via the `:transcode` option:
+
+ - `:transcode => true` - Convert from template source encoding to the system
+ default internal encoding (`Encoding.default_internal`) before evaluating the
+ template. The result string is guaranteed to be in the default internal
+ encoding. Do nothing when `Encoding.default_internal` is nil.
+
+ This is the default behavior when no `:transcode` option is given.
+
+ - `:transcode => false` - Perform no encoding conversion. The result string
+ will have the same encoding as the detected template source string.
+
+ This is the default behavior when `Encoding.default_internal` is nil.
+
+ - `:transcode => 'utf-8'` - Ignore `Encoding.default_internal`. Instead,
+ convert from template source encoding to utf-8 before evaluating the
+ template. The result string is guaranteed to be utf-8 encoded. The encoding
+ value (`'utf-8'`) may be any valid encoding name or Encoding constant.
+
Template Compilation
--------------------
38 lib/tilt/builder.rb
@@ -1,8 +1,29 @@
require 'tilt/template'
module Tilt
- # Builder template implementation. See:
- # http://builder.rubyforge.org/
+ # XML Builder Template implementation
+ #
+ # - http://builder.rubyforge.org/
+ #
+ # Builder templates support three types of template input: string, file,
+ # and block. When the initialize block returns a non-string object that
+ # responds to call (Proc), template execution consists of calling the block
+ # with a Builder::XmlMarkup instance:
+ #
+ # BuilderTemplate.new do
+ # lambda do |xml|
+ # xml.h1 'howdy dudy'
+ # xml.p 'blaahhh'
+ # end
+ # end
+ #
+ # Builder templates can also be instantiated from a string or file. In that
+ # case, the source encoding is determined according to the rules documented
+ # in the Tilt README under Encodings. The ruby magic comment line is supported
+ # for specifying an alternative encoding.
+ #
+ # Builder templates always produce utf-8 encoded result strings regardless of
+ # the source string / file encoding.
class BuilderTemplate < Template
self.default_mime_type = 'text/xml'
@@ -14,7 +35,10 @@ def initialize_engine
require_template_library 'builder'
end
- def prepare; end
+ def prepare
+ return if !data.respond_to?(:to_str)
+ @source = assign_source_encoding(data.to_str)
+ end
def evaluate(scope, locals, &block)
return super(scope, locals, &block) if data.respond_to?(:to_str)
@@ -23,6 +47,10 @@ def evaluate(scope, locals, &block)
xml.target!
end
+ def precompiled_template(locals)
+ @source
+ end
+
def precompiled_preamble(locals)
return super if locals.include? :xml
"xml = ::Builder::XmlMarkup.new(:indent => 2)\n#{super}"
@@ -31,10 +59,6 @@ def precompiled_preamble(locals)
def precompiled_postamble(locals)
"xml.target!"
end
-
- def precompiled_template(locals)
- data.to_str
- end
end
end
20 lib/tilt/coffee.rb
@@ -1,10 +1,16 @@
require 'tilt/template'
module Tilt
- # CoffeeScript template implementation. See:
- # http://coffeescript.org/
+ # CoffeeScript template implementation.
+ #
+ # - http://coffeescript.org/
#
# CoffeeScript templates do not support object scopes, locals, or yield.
+ #
+ # All CoffeeScript files must be utf-8 encoded. The :default_encoding
+ # option and system default encoding are ignored. When a non-utf-8 string
+ # is provided via custom reader block, it is converted to utf-8 before
+ # being passed to the Coffee compiler.
class CoffeeScriptTemplate < Template
self.default_mime_type = 'application/javascript'
@@ -40,11 +46,21 @@ def prepare
if !options.key?(:bare) and !options.key?(:no_wrap)
options[:bare] = self.class.default_bare
end
+
+ # if string was given and it's not utf-8, transcode it now
+ data.encode! 'UTF-8' if data.respond_to?(:encode!)
end
def evaluate(scope, locals, &block)
@output ||= CoffeeScript.compile(data, options)
end
+
+ # Override to set the @default_encoding to always be utf-8, ignoring the
+ # :default_encoding option value.
+ def read_template_file
+ @default_encoding = 'UTF-8'
+ super
+ end
end
end
49 lib/tilt/erb.rb
@@ -3,6 +3,11 @@
module Tilt
# ERB template implementation. See:
# http://www.ruby-doc.org/stdlib/libdoc/erb/rdoc/classes/ERB.html
+ #
+ # The template supports encoding detection via first line magic comment:
+ # <%# coding: utf-8 %>
+ #
+ # When present, the string's encoding is adjusted to the specified value.
class ERBTemplate < Template
@@default_output_variable = '_erbout'
@@ -22,17 +27,30 @@ def initialize_engine
require_template_library 'erb'
end
+ # Create an ERB object and generate the Ruby source code for the template.
+ # The resulting source string has the same encoding as the input data
+ # *unless* the template includes a magic comment, in which case the source
+ # string AND the template data will be marked with the declared encoding.
+ #
+ # The resulting source string does not include any magic comment line
+ # generated by ERB. The string.encoding should be used to determine the
+ # source and output encoding.
def prepare
@outvar = options[:outvar] || self.class.default_output_variable
options[:trim] = '<>' if options[:trim].nil? || options[:trim] == true
@engine = ::ERB.new(data, options[:safe], options[:trim], @outvar)
+ encoding = data.respond_to?(:encoding) ? data.encoding : nil
+ @source = assign_source_encoding(@engine.src, encoding, remove=true)
+ @data.force_encoding @source.encoding if @data.respond_to?(:force_encoding)
end
+ # Override to always return the generated source string.
def precompiled_template(locals)
- source = @engine.src
- source
+ @source
end
+ # Override to store the original state of the output variable before
+ # this template is executed.
def precompiled_preamble(locals)
<<-RUBY
begin
@@ -41,6 +59,8 @@ def precompiled_preamble(locals)
RUBY
end
+ # Override to reset the output variable to its state before the template
+ # was executed.
def precompiled_postamble(locals)
<<-RUBY
#{super}
@@ -49,15 +69,6 @@ def precompiled_postamble(locals)
end
RUBY
end
-
- # ERB generates a line to specify the character coding of the generated
- # source in 1.9. Account for this in the line offset.
- if RUBY_VERSION >= '1.9.0'
- def precompiled(locals)
- source, offset = super
- [source, offset + 1]
- end
- end
end
# Erubis template implementation. See:
@@ -72,6 +83,10 @@ def precompiled(locals)
# :escape_html when true, ::Erubis::EscapedEruby will be used as
# the engine class instead of the default. All content
# within <%= %> blocks will be automatically html escaped.
+ #
+ # Unlike ERB, the Erubis template engine does not support encoding detection
+ # via magic comment. Encoding declarations are ignored. The :default_encoding
+ # option or system default external encoding are used by default.
class ErubisTemplate < ERBTemplate
def self.engine_initialized?
defined? ::Erubis
@@ -87,6 +102,9 @@ def prepare
engine_class = options.delete(:engine_class)
engine_class = ::Erubis::EscapedEruby if options.delete(:escape_html)
@engine = (engine_class || ::Erubis::Eruby).new(data, options)
+ encoding = data.respond_to?(:encoding) ? data.encoding : nil
+ @source = assign_source_encoding(@engine.src, encoding, remove=false)
+ @data.force_encoding @source.encoding if @data.respond_to?(:force_encoding)
end
def precompiled_preamble(locals)
@@ -96,15 +114,6 @@ def precompiled_preamble(locals)
def precompiled_postamble(locals)
[@outvar, super].join("\n")
end
-
- # Erubis doesn't have ERB's line-off-by-one under 1.9 problem.
- # Override and adjust back.
- if RUBY_VERSION >= '1.9.0'
- def precompiled(locals)
- source, offset = super
- [source, offset - 1]
- end
- end
end
end
147 lib/tilt/template.rb
@@ -30,11 +30,17 @@ class << self
end
# Create a new template with the file, line, and options specified. By
- # default, template data is read from the file. When a block is given,
- # it should read template data and return as a String. When file is nil,
- # a block is required.
+ # default, template data is read from file and assumed to be in the
+ # system default external encoding (Encoding.default_external). When a
+ # block is given, it should read template data and return a String with
+ # a best guess encoding.
#
- # All arguments are optional.
+ # The :default_encoding option is supported by most template engines. When
+ # set, data read from disk will be assumed to be in this encoding instead
+ # of Encoding.default_external. The option has no effect when a custom
+ # reader block is given.
+ #
+ # All arguments are optional but a file or block must be specified.
def initialize(file=nil, line=1, options={}, &block)
@file, @line, @options = nil, 1, {}
@@ -59,12 +65,11 @@ def initialize(file=nil, line=1, options={}, &block)
# used to hold compiled template methods
@compiled_method = {}
- # used on 1.9 to set the encoding if it is not set elsewhere (like a magic comment)
- # currently only used if template compiles to ruby
+ # Overrides Encoding.default_external when reading from filesystem
@default_encoding = @options.delete :default_encoding
# load template data and prepare (uses binread to avoid encoding issues)
- @reader = block || lambda { |t| File.respond_to?(:binread) ? File.binread(@file) : File.read(@file) }
+ @reader = block || lambda { |t| read_template_file }
@data = @reader.call(self)
prepare
end
@@ -98,6 +103,29 @@ def eval_file
def initialize_engine
end
+ # Read template data from file, possibly overriding the encoding based on
+ # the default_encoding option. This is used when the object is created with
+ # a file and no reader block.
+ #
+ # Unlike File.read, this method does not transcode into the system
+ # Encoding.default_internal encoding. The best guess encoding is set and
+ # available from data.encoding.
+ #
+ # Subclasses may override this method if they have specific knowledge about
+ # the file's encoding and can provide better default encoding support.
+ #
+ # Raise exception when file doesn't exist.
+ # Does not raise an exception when the file's data is invalid in the best
+ # guess encoding.
+ def read_template_file
+ data = File.open(file, 'rb') { |io| io.read }
+ if data.respond_to?(:force_encoding)
+ encoding = @default_encoding || Encoding.default_external
+ data.force_encoding(encoding)
+ end
+ data
+ end
+
# Like Kernel#require but issues a warning urging a manual require when
# running under a threaded environment.
def require_template_library(name)
@@ -113,6 +141,14 @@ def require_template_library(name)
# variables set in this method are available when #evaluate is called.
#
# Subclasses must provide an implementation of this method.
+ #
+ # The data attribute holds the template source string marked with the best
+ # guess encoding. When the template was read from the filesystem this will
+ # be either the :default_encoding provided when the template was created or
+ # the system default Encoding.default_external encoding. When the template
+ # data was provided via reader block, it will be in whatever encoding was
+ # set on the string originally. Subclasses are responsible for detecting
+ # template specific magic syntax encodings embedded in the template data.
def prepare
if respond_to?(:compile!)
# backward compat with tilt < 0.6; just in case
@@ -156,18 +192,19 @@ def self.cached_evaluate(scope, locals, &block)
def precompiled(locals)
preamble = precompiled_preamble(locals)
template = precompiled_template(locals)
- magic_comment = extract_magic_comment(template)
- if magic_comment
- # Magic comment e.g. "# coding: utf-8" has to be in the first line.
- # So we copy the magic comment to the first line.
- preamble = magic_comment + "\n" + preamble
+
+ source = ''
+ if source.respond_to?(:force_encoding)
+ source.force_encoding template.encoding
end
- parts = [
- preamble,
- template,
- precompiled_postamble(locals)
- ]
- [parts.join("\n"), preamble.count("\n") + 1]
+
+ source << preamble
+ source << "\n"
+ source << template
+ source << "\n"
+ source << precompiled_postamble(locals)
+
+ [source, preamble.count("\n") + 1]
end
# A string containing the (Ruby) source code for the template. The
@@ -230,20 +267,29 @@ def compile_template_method(locals)
source, offset = precompiled(locals)
offset += 5
method_name = "__tilt_#{Thread.current.object_id.abs}"
- Object.class_eval <<-RUBY, eval_file, line - offset
- #{extract_magic_comment source}
+ method_source = ""
+
+ if method_source.respond_to?(:force_encoding)
+ method_source.force_encoding source.encoding
+ end
+
+ method_source << <<-RUBY
TOPOBJECT.class_eval do
def #{method_name}(locals)
Thread.current[:tilt_vars] = [self, locals]
class << self
this, locals = Thread.current[:tilt_vars]
this.instance_eval do
- #{source}
+ RUBY
+ method_source << source
+ method_source << <<-RUBY
end
end
end
end
RUBY
+
+ Object.class_eval method_source, eval_file, line - offset
unbind_compiled_method(method_name)
end
@@ -253,12 +299,59 @@ def unbind_compiled_method(method_name)
method
end
- def extract_magic_comment(script)
- comment = script.slice(/\A[ \t]*\#.*coding\s*[=:]\s*([[:alnum:]\-_]+).*$/)
- if comment && !%w[ascii-8bit binary].include?($1.downcase)
- comment
- elsif @default_encoding
- "# coding: #{@default_encoding}"
+ # Regexp used to find and remove magic comment lines from Ruby source.
+ MAGIC = /\A[ \t]*\#.*coding\s*[=:]\s*([[:alnum:]\-_]+).*?\n/mn
+
+ # Checks for a Ruby 1.9 encoding comment on the first line of source.
+ #
+ # source - string to check for magic comment line
+ # remove - set true to remove the line from the string in place
+ #
+ # Returns the encoding name string or nil when no comment was present.
+ def extract_source_encoding(source, remove=false)
+ binary source do
+ slice = remove ? :slice! : :slice
+ $1 if source.__send__(slice, MAGIC)
+ end
+ end
+
+ # Extract encoding comment from source and mark the string's encoding. The
+ # string is modified in place. When no encoding is found, the encoding
+ # passed in the default argument is used. The remove argument can be set
+ # true to remove the magic comment line from the source string in place.
+ #
+ # This method is a no-op under Ruby < 1.9
+ if ''.respond_to?(:force_encoding)
+ def assign_source_encoding(source, default=nil, remove=false)
+ if encoding = extract_source_encoding(source, remove)
+ source.force_encoding(encoding)
+ elsif default
+ source.force_encoding(default)
+ else
+ source
+ end
+ end
+ else
+ def assign_source_encoding(source, *args)
+ source
+ end
+ end
+
+ # Mark the string as BINARY/ASCII-8BIT for the duration of the block. The
+ # string is reset to its original encoding before this method returns.
+ # Combined with //n flagged regular expressions, this is one way to avoid
+ # encoding compatibility errors while a string's encoding is still a best guess.
+ if ''.respond_to?(:force_encoding)
+ def binary(string)
+ original_encoding = string.encoding
+ string.force_encoding 'BINARY'
+ yield
+ ensure
+ string.force_encoding original_encoding
+ end
+ else
+ def binary(string)
+ yield string
end
end
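As a rough illustration of the magic comment handling in the diff above, here is a minimal standalone sketch. The names `MAGIC_COMMENT`, `with_binary`, and `detect_encoding` are hypothetical and are not part of Tilt; the pattern deliberately omits the `/m` flag so that `.` cannot run past the first line, which is an assumption of this sketch rather than the patch's behavior.

```ruby
# Hypothetical helpers loosely modeled on the diff; not Tilt's actual code.
# The //n flag makes the regexp operate on raw bytes.
MAGIC_COMMENT = /\A[ \t]*\#.*coding\s*[=:]\s*([[:alnum:]\-_]+).*\n/n

# Tag the string as BINARY while the block runs, then restore the
# original encoding tag even if the block raises.
def with_binary(string)
  original = string.encoding
  string.force_encoding(Encoding::BINARY)
  yield string
ensure
  string.force_encoding(original)
end

# Tag source with the encoding named in a first-line magic comment,
# falling back to default when no comment is present.
def detect_encoding(source, default = nil)
  name = with_binary(source) { |s| s[MAGIC_COMMENT, 1] }
  chosen = name || default
  source.force_encoding(chosen) if chosen
  source
end

src = "# coding: Shift_JIS\nxml.em 'hi'\n".dup
puts detect_encoding(src).encoding
```

Only the encoding tag changes here; the bytes of the string are never transcoded.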
63 test/tilt_buildertemplate_test.rb
@@ -1,3 +1,4 @@
+# coding: utf-8
require 'contest'
require 'tilt'
@@ -53,6 +54,68 @@ class BuilderTemplateTest < Test::Unit::TestCase
template.render(options) { subtemplate.render(options) }
end
end
+
+ ##
+ # Encodings
+
+ if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+ original_encoding = Encoding.default_external
+ setup do
+ Encoding.default_external = 'utf-8'
+ Encoding.default_internal = nil
+ end
+ teardown do
+ Encoding.default_external = original_encoding
+ Encoding.default_internal = nil
+ end
+
+ def tempfile(name='template')
+ f = Tempfile.open(name)
+ f.sync = true
+ yield f
+ ensure
+ f.close rescue nil
+ f.delete
+ end
+
+ test "reading templates using default external encoding" do
+ Encoding.default_external = 'Shift_JIS'
+ tempfile do |f|
+ f.puts("xml.em 'ふが' + @hoge".encode('Shift_JIS'))
+ template = Tilt::BuilderTemplate.new(f.path)
+ assert_equal 'Shift_JIS', template.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ assert_equal 'UTF-8', template.render(self).encoding.to_s
+ end
+ end
+
+ test "reading templates using :default_encoding option override" do
+ Encoding.default_external = 'Big5'
+ tempfile do |f|
+ f.puts("xml.em 'ふが' + @hoge".encode('Shift_JIS'))
+ template = Tilt::BuilderTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+ assert_equal 'Shift_JIS', template.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ assert_equal 'UTF-8', template.render(self).encoding.to_s
+ end
+ end
+
+ test "reading template with magic encoding comment" do
+ Encoding.default_external = 'Big5'
+ tempfile do |f|
+ f.puts("# coding: Shift_JIS".encode('Shift_JIS'))
+ f.puts("xml.em 'ふが' + @hoge".encode('Shift_JIS'))
+ # require 'ruby-debug'
+ # debugger
+ template = Tilt::BuilderTemplate.new(f.path)
+ assert_equal 'Shift_JIS', template.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ output = template.render(self)
+ assert_equal 'UTF-8', output.encoding.to_s
+ assert_equal "<em>ふがほげ</em>\n", output
+ end
+ end
+ end
end
rescue LoadError
warn "Tilt::BuilderTemplate (disabled)"
46 test/tilt_coffeescripttemplate_test.rb
@@ -1,3 +1,4 @@
+# coding: utf-8
require 'contest'
require 'tilt'
@@ -54,8 +55,51 @@ class CoffeeScriptTemplateTest < Test::Unit::TestCase
assert_not_equal "puts('Hello, World!');", template.render
end
end
- end
+ ##
+ # Encodings
+
+ if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+ original_encoding = Encoding.default_external
+ setup { Encoding.default_external = 'utf-8' }
+ teardown { Encoding.default_external = original_encoding }
+
+ def tempfile(name='template')
+ f = Tempfile.open(name)
+ f.sync = true
+ yield f
+ ensure
+ f.close rescue nil
+ f.delete
+ end
+
+ test "ignores default external encoding" do
+ tempfile do |f|
+ f.puts("console.log 'ふがほげ'")
+ Encoding.default_external = 'Shift_JIS'
+ template = Tilt::CoffeeScriptTemplate.new(f.path)
+ assert_equal 'UTF-8', template.data.encoding.to_s
+ assert_equal 'UTF-8', template.render(self).encoding.to_s
+ end
+ end
+
+ test "ignores :default_encoding option" do
+ tempfile do |f|
+ f.puts("console.log 'ふがほげ'")
+ template = Tilt::CoffeeScriptTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+ assert_equal 'UTF-8', template.data.encoding.to_s
+ assert_equal 'UTF-8', template.render(self).encoding.to_s
+ end
+ end
+
+ test "transcodes input string to utf-8" do
+ string = "console.log 'ふがほげ'".encode("Shift_JIS")
+ template = Tilt::CoffeeScriptTemplate.new { string }
+ assert_equal 'UTF-8', template.data.encoding.to_s
+ assert_equal 'UTF-8', template.render(self).encoding.to_s
+ end
+ end
+ end
rescue LoadError => boom
warn "Tilt::CoffeeScriptTemplate (disabled)"
end
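The CoffeeScript tests above suggest two different treatments for a UTF-8-only format: file bytes are simply labeled UTF-8, while a caller-supplied string is transcoded. A minimal sketch of that distinction follows, assuming a hypothetical helper `utf8_only_source` that is not part of Tilt:

```ruby
# Hypothetical helper, not Tilt's API: normalize template source for a
# format that only permits UTF-8.
def utf8_only_source(data, from_file)
  if from_file
    # Bytes read from disk are assumed to be UTF-8 already, so only the
    # encoding tag is changed; default_external is ignored.
    data.dup.force_encoding(Encoding::UTF_8)
  else
    # A string handed in by the caller carries its own encoding, so the
    # bytes are transcoded to UTF-8.
    data.encode(Encoding::UTF_8)
  end
end

sjis = "console.log 'ふがほげ'".encode("Shift_JIS")
utf8 = utf8_only_source(sjis, false)
puts utf8.encoding
```

The design choice is that file contents have no trustworthy encoding tag, whereas an in-memory string does, so only the latter can be transcoded safely.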
75 test/tilt_erbtemplate_test.rb
@@ -201,25 +201,62 @@ class Scope
assert_equal "\nhello\n", template.render(Scope.new)
end
- test "encoding with magic comment" do
- f = Tempfile.open("template")
- f.puts('<%# coding: UTF-8 %>')
- f.puts('ふが <%= @hoge %>')
- f.close()
- @hoge = "ほげ"
- erb = Tilt::ERBTemplate.new(f.path)
- 3.times { erb.render(self) }
- f.delete
- end
-
- test "encoding with :default_encoding" do
- f = Tempfile.open("template")
- f.puts('ふが <%= @hoge %>')
- f.close()
- @hoge = "ほげ"
- erb = Tilt::ERBTemplate.new(f.path, :default_encoding => 'UTF-8')
- 3.times { erb.render(self) }
- f.delete
+ ##
+ # Encodings
+
+ if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+ original_encoding = Encoding.default_external
+ setup do
+ Encoding.default_external = 'utf-8'
+ Encoding.default_internal = nil
+ end
+ teardown do
+ Encoding.default_external = original_encoding
+ Encoding.default_internal = nil
+ end
+
+ def tempfile(name='template')
+ f = Tempfile.open(name)
+ f.sync = true
+ yield f
+ ensure
+ f.close rescue nil
+ f.delete
+ end
+
+ test "producing default external encoded result string" do
+ Encoding.default_external = 'Shift_JIS'
+ tempfile do |f|
+ f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+ erb = Tilt::ERBTemplate.new(f.path)
+ assert_equal 'Shift_JIS', erb.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+ end
+ end
+
+ test "producing default_encoding encoded result string" do
+ Encoding.default_external = 'Big5'
+ tempfile do |f|
+ f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+ erb = Tilt::ERBTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+ assert_equal 'Shift_JIS', erb.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+ end
+ end
+
+ test "producing magic comment encoded result string" do
+ Encoding.default_external = 'Big5'
+ tempfile do |f|
+ f.puts('<%# coding: Shift_JIS %>'.encode('Shift_JIS'))
+ f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+ erb = Tilt::ERBTemplate.new(f.path)
+ assert_equal 'Shift_JIS', erb.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+ end
+ end
end
end
63 test/tilt_erubistemplate_test.rb
@@ -1,3 +1,4 @@
+# coding: utf-8
require 'contest'
require 'tilt'
@@ -135,6 +136,68 @@ class MockOutputVariableScope
template = Tilt::ErubisTemplate.new(nil, options_hash) { |t| "Hello World!" }
assert_equal({:escape_html => true}, options_hash)
end
+
+ ##
+ # Encodings
+
+ if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+ original_encoding = Encoding.default_external
+ setup do
+ Encoding.default_external = 'utf-8'
+ Encoding.default_internal = nil
+ end
+ teardown do
+ Encoding.default_external = original_encoding
+ Encoding.default_internal = nil
+ end
+
+ def tempfile(name='template')
+ f = Tempfile.open(name)
+ f.sync = true
+ yield f
+ ensure
+ f.close rescue nil
+ f.delete
+ end
+
+ test "producing default external encoded result string" do
+ Encoding.default_external = 'Shift_JIS'
+ tempfile do |f|
+ f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+ erb = Tilt::ErubisTemplate.new(f.path)
+ assert_equal 'Shift_JIS', erb.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+ end
+ end
+
+ test "producing default_encoding encoded result string" do
+ Encoding.default_external = 'Big5'
+ tempfile do |f|
+ f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+ erb = Tilt::ErubisTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+ assert_equal 'Shift_JIS', erb.data.encoding.to_s
+ @hoge = "ほげ".encode('Shift_JIS')
+ assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+ end
+ end
+
+ # NOTE Erubis does not support ERB's magic comments.
+ # <%# coding: blah %> does not affect the template's encoding
+
+ # test "producing magic comment encoded result string" do
+ # Encoding.default_external = 'Big5'
+ # tempfile do |f|
+ # f.puts('<%# coding: Shift_JIS %>'.encode('Shift_JIS'))
+ # f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+ # erb = Tilt::ErubisTemplate.new(f.path)
+ # assert_equal 'Shift_JIS', erb.data.encoding.to_s
+ # @hoge = "ほげ".encode('Shift_JIS')
+ # assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+ # end
+ # end
+ end
+
end
rescue LoadError => boom
warn "Tilt::ErubisTemplate (disabled)"
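The order of preference for the source encoding described in this pull request (magic comment, then the string's own encoding, then `:default_encoding`, then `Encoding.default_external`) can be sketched as a single function. This is an illustration only: `pick_source_encoding` and its parameters are hypothetical, and template specific rules such as UTF-8-only formats are left out.

```ruby
# Hypothetical helper illustrating the order of preference; not Tilt's API.
#
# magic_name       - encoding name from a magic comment, or nil
# data             - the template source string
# from_file        - true when data was read from disk
# default_encoding - value of the :default_encoding option, or nil
def pick_source_encoding(magic_name, data, from_file, default_encoding = nil)
  # A magic comment embedded in the source wins over everything else here.
  return Encoding.find(magic_name) if magic_name
  # Strings supplied by the caller keep the encoding they arrived with.
  return data.encoding unless from_file
  # For files, an explicit :default_encoding overrides the system default.
  return Encoding.find(default_encoding) if default_encoding
  Encoding.default_external
end

puts pick_source_encoding(nil, "stuff", true, "GBK")
```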
48 test/tilt_template_test.rb
@@ -48,6 +48,54 @@ def prepare
MockTemplate.new { |template| "Hello World!" }
end
+ ##
+ # Encodings
+
+ if ''.respond_to?(:encoding)
+ original_encoding = Encoding.default_external
+
+ setup do
+ @file = Tempfile.open('template')
+ @file.puts "stuff"
+ @file.close
+ @template = @file.path
+ end
+
+ teardown do
+ Encoding.default_external = original_encoding
+ Encoding.default_internal = nil
+ @file.delete
+ end
+
+ test "reading from file assumes default external encoding" do
+ Encoding.default_external = 'Big5'
+ inst = MockTemplate.new(@template)
+ assert_equal 'Big5', inst.data.encoding.to_s
+ end
+
+ test "reading from file with a :default_encoding overrides default external" do
+ Encoding.default_external = 'Big5'
+ inst = MockTemplate.new(@template, :default_encoding => 'GBK')
+ assert_equal 'GBK', inst.data.encoding.to_s
+ end
+
+ test "reading from file with default_internal set does no transcoding" do
+ Encoding.default_internal = 'utf-8'
+ Encoding.default_external = 'Big5'
+ inst = MockTemplate.new(@template)
+ assert_equal 'Big5', inst.data.encoding.to_s
+ end
+
+ test "using provided template data verbatim when given as string" do
+ Encoding.default_internal = 'Big5'
+ inst = MockTemplate.new(@template) { "blah".force_encoding('GBK') }
+ assert_equal 'GBK', inst.data.encoding.to_s
+ end
+ end
+
+ ##
+ # Engine Initialization
+
class InitializingMockTemplate < Tilt::Template
@@initialized_count = 0
def self.initialized_count