Encoding Support #107

Closed
wants to merge 11 commits into from
78 changes: 78 additions & 0 deletions README.md
@@ -191,6 +191,84 @@ template, but if you depend on a specific implementation, you should use #prefer
When a file extension has a preferred template class, Tilt will *always* use
that class, even if it raises an exception.

Encodings
---------

All Tilt template implementations must follow a few guidelines regarding string
encodings under Ruby >= 1.9 (MRI) and other encoding-aware environments. This
section defines "good behavior" for template implementations that support
multiple encodings.

There are two places where encodings come into play:

- __Template source data encoding.__ When a template is read from the
filesystem, how do we know what encoding to set on the string? This is
complicated by the fact that many template formats support embedded magic
encoding declarations, while others mandate that template source data be in a
specific encoding (utf-8 only formats).

- __Render context and result encoding.__ In what encoding is the output
generated? It's often useful to guarantee that templates are evaluated in a
utf-8 context and will generate utf-8 output regardless of the template's
source encoding. What effect does `Encoding.default_internal` have on
template execution and output?

Tilt's encoding support aims only to provide a framework for answering these
questions for each template engine. It does not attempt to define a single
behavior that all templates must conform to because templates vary widely in
encoding support.

### Template Source Encoding

The template source data may come from a file or from a string. In either case,
the real template source encoding should be determined as follows in order of
preference:

- Template specific encoding rules (e.g., utf-8 only formats).
- A (template specific) magic encoding comment embedded in the source string.
- The source string's existing encoding (string only).
- The `:default_encoding` option to `Template.new` (file only).
- `Encoding.default_external` - the default system encoding (file only).
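The order of preference above can be sketched as a single function. This is a minimal illustration, not Tilt's API: `detect_source_encoding` and its keyword arguments are hypothetical, and it assumes a Ruby-style `# coding: NAME` comment on the first line stands in for whatever magic comment syntax a given template format uses.

```ruby
# Hypothetical sketch of the source-encoding precedence rules; not Tilt's API.
# Assumes a Ruby-style "# coding: NAME" magic comment on the first line.
MAGIC_COMMENT = /\A#.*coding[:=]\s*([[:alnum:]\-_]+)/

def detect_source_encoding(data, from_file:, format_encoding: nil, default_encoding: nil)
  # 1. Template-specific rules, e.g. a utf-8 only format.
  return Encoding.find(format_encoding) if format_encoding

  # 2. A magic encoding comment embedded in the source (scanned as raw bytes).
  declared = data.b[MAGIC_COMMENT, 1]
  return Encoding.find(declared) if declared

  # 3. Strings keep the encoding they already carry.
  return data.encoding unless from_file

  # 4. Files: the :default_encoding option, then 5. Encoding.default_external.
  default_encoding ? Encoding.find(default_encoding) : Encoding.default_external
end
```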

Some template file formats have strict encoding requirements. For instance,
CoffeeScript is a utf-8 only format. Template implementations are encouraged to
use this type of information to constrain the detection logic described above.

### Render Context Encoding

When the system internal encoding (`Encoding.default_internal`) *is not* set
(MRI default), templates should be evaluated and produce a result string encoded
the same as the template source data. For example, a Big5-encoded template on
disk will generate a Big5 result string and expect interpolated values to be
Big5-compatible.

When `Encoding.default_internal` *is* set, templates should be converted from
the template source encoding to the internal encoding *before* being compiled /
evaluated and the result string should be encoded in the default internal
encoding. For instance, when `default_internal` is set to UTF-8, a Big5 encoded
template on disk will generate a UTF-8 result string and interpolated values
must be utf-8 compatible.
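Both cases come down to whether the source is transcoded with `String#encode` before it is compiled. A minimal sketch, assuming a hypothetical `render_source` helper (not part of Tilt) and using ISO-8859-1 in place of Big5 so the bytes stay readable:

```ruby
# Hypothetical helper (not part of Tilt): returns the source string a template
# would compile and evaluate, given the current internal encoding.
def render_source(source, internal = Encoding.default_internal)
  # default_internal unset: evaluate in the source encoding, unchanged.
  return source if internal.nil?

  # default_internal set: transcode *before* compiling/evaluating, so the
  # result string (and any interpolated values) live in the internal encoding.
  source.encode(internal)
end

latin1 = String.new("caf\xE9", encoding: Encoding::ISO_8859_1)
render_source(latin1, nil).encoding              # ISO-8859-1
render_source(latin1, Encoding::UTF_8).encoding  # UTF-8
```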

Templates that perform render context transcoding must allow these default
behaviors to be controlled via the `:transcode` option:

- `:transcode => true` - Convert from template source encoding to the system
default internal encoding (`Encoding.default_internal`) before evaluating the
template. The result string is guaranteed to be in the default internal
encoding. Do nothing when `Encoding.default_internal` is nil.

This is the default behavior when no `:transcode` option is given.

- `:transcode => false` - Perform no encoding conversion. The result string
will have the same encoding as the detected template source string.

This is the default behavior when `Encoding.default_internal` is nil.

- `:transcode => 'utf-8'` - Ignore `Encoding.default_internal`. Instead,
convert from template source encoding to utf-8 before evaluating the
template. The result string is guaranteed to be utf-8 encoded. The encoding
value (`'utf-8'`) may be any valid encoding name or Encoding constant.
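One way a template could map the `:transcode` values above to a target encoding is sketched below. `transcode_target` and `transcode` are hypothetical names, the sketch assumes the option arrives in the options hash given to `Template.new`, and a `nil` target means "perform no conversion".

```ruby
# Hypothetical sketch: resolve a :transcode option to a target Encoding.
# Returns nil when no conversion should happen.
def transcode_target(options)
  value = options.fetch(:transcode, true)  # absent option behaves like true
  case value
  when true       then Encoding.default_internal  # nil when default_internal unset
  when false, nil then nil                        # never convert
  else Encoding.find(value)                       # 'utf-8', Encoding::UTF_8, ...
  end
end

# Apply the resolved target to a string, leaving it alone when there is none.
def transcode(str, options)
  target = transcode_target(options)
  target ? str.encode(target) : str
end
```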

Contributor

I don't think this `:transcode` option is implemented yet. Is it going to be set as an option on initialize, or is it supposed to be passed to the render method each time?

We might be able to provide some of this for the handler by adding it to the default render implementation. The "before evaluating" step doesn't exist for every handler. For example, coffee, sass and markdown are contextless and ignore the locals option, so these handlers only need to call encode on the final string they produce. We might be able to move that final encode into our render, but I'm not sure if that's entirely a good idea.

Owner Author

Nope, I haven't started on any of the transcoding yet.

I hadn't even considered making this an option to render. I figured we'd want it on initialize, and maybe also as a Template class attribute that subclasses can override. Having it on render could be interesting, though I'm not sure how much it'd be used. I think the 99% case for transcoding is going to be people setting `Encoding.default_internal = 'utf-8'`, or wanting the same effect when running templates.

Contributor

I don't really want to put it on render because render has no options arg. Passing it to initialize makes the most sense.

We could provide some sort of `Template#encode()` helper that encodes the string to `options[:transcode] || default_internal`, so handlers can call it before they return the final result.

Owner Author

Exactly what I'm thinking. It needs to happen before evaluation for context / interpolating templates though.
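The helper discussed in this thread could look roughly like the sketch below. Every name here (`SketchTemplate`, `ContextlessTemplate`, `InterpolatingTemplate`, and the `encode` method) is hypothetical. It also deviates from the quoted `options[:transcode] || default_internal` expression: with plain `||`, `:transcode => false` would fall through to `default_internal` instead of disabling conversion, and `true` would be handed straight to `String#encode`.

```ruby
# Hypothetical sketch of the proposed helper; names are illustrative only.
class SketchTemplate
  attr_reader :data, :options

  def initialize(data, options = {})
    @data = data
    @options = options
  end

  # Encode to options[:transcode], falling back to Encoding.default_internal
  # when the option is absent or true; false disables conversion.
  def encode(str)
    t = options[:transcode]
    target = (t.nil? || t == true) ? Encoding.default_internal : t
    target ? str.encode(target) : str
  end
end

# Contextless (coffee/sass/markdown style): only the final string is encoded.
class ContextlessTemplate < SketchTemplate
  def render
    encode(data.strip)
  end
end

# Interpolating: encode the source *before* evaluation, so interpolated
# values are combined with a string already in the target encoding.
class InterpolatingTemplate < SketchTemplate
  def render(name)
    format(encode(data), name: name)
  end
end
```

The contextless class only touches its output, while the interpolating one has to encode its source first, which matches the point above about context / interpolating templates.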


Template Compilation
--------------------

38 changes: 31 additions & 7 deletions lib/tilt/builder.rb
@@ -1,8 +1,29 @@
require 'tilt/template'

module Tilt
# Builder template implementation. See:
# http://builder.rubyforge.org/
# XML Builder Template implementation
#
# - http://builder.rubyforge.org/
#
# Builder templates support three types of template input: string, file,
# and block. When the initialize block returns a non-string object that
# responds to call (Proc), template execution consists of calling the block
# with a Builder::XmlMarkup instance:
#
# BuilderTemplate.new do
# lambda do |xml|
# xml.h1 'howdy dudy'
# xml.p 'blaahhh'
# end
# end
#
# Builder templates can also be instantiated from a string or file. In that
# case, the source encoding is determined according to the rules documented
# in the Tilt README under Encodings. The ruby magic comment line is supported
# for specifying an alternative encoding.
#
# Builder templates always produce utf-8 encoded result strings regardless of
# the source string / file encoding.
class BuilderTemplate < Template
self.default_mime_type = 'text/xml'

@@ -14,7 +35,10 @@ def initialize_engine
require_template_library 'builder'
end

def prepare; end
def prepare
return if !data.respond_to?(:to_str)
@source = assign_source_encoding(data.to_str)

Contributor

I like the opt-in for assign_source_encoding.

Owner Author

Yeah. I'm going with an approach where the base class provides some convenience APIs but the behavior is more or less in the hands of the template subclass. I don't think there's any other way since templates vary so much.

end

def evaluate(scope, locals, &block)
return super(scope, locals, &block) if data.respond_to?(:to_str)
@@ -23,6 +47,10 @@ def evaluate(scope, locals, &block)
xml.target!
end

def precompiled_template(locals)
@source
end

def precompiled_preamble(locals)
return super if locals.include? :xml
"xml = ::Builder::XmlMarkup.new(:indent => 2)\n#{super}"
@@ -31,10 +59,6 @@ def precompiled_preamble(locals)
def precompiled_postamble(locals)
"xml.target!"
end

def precompiled_template(locals)
data.to_str
end
end
end

20 changes: 18 additions & 2 deletions lib/tilt/coffee.rb
@@ -1,10 +1,16 @@
require 'tilt/template'

module Tilt
# CoffeeScript template implementation. See:
# http://coffeescript.org/
# CoffeeScript template implementation.
#
# - http://coffeescript.org/
#
# CoffeeScript templates do not support object scopes, locals, or yield.
#
# All CoffeeScript files must be utf-8 encoded. The :default_encoding
# option and system default encoding are ignored. When a non-utf-8 string
# is provided via custom reader block, it is converted to utf-8 before
# being passed to the Coffee compiler.
class CoffeeScriptTemplate < Template
self.default_mime_type = 'application/javascript'

@@ -40,11 +46,21 @@ def prepare
if !options.key?(:bare) and !options.key?(:no_wrap)
options[:bare] = self.class.default_bare
end

# if a string was given and it's not utf-8, transcode it now
data.encode! 'UTF-8' if data.respond_to?(:encode!)
end

def evaluate(scope, locals, &block)
@output ||= CoffeeScript.compile(data, options)
end

# Override to set the @default_encoding to always be utf-8, ignoring the
# :default_encoding option value.
def read_template_file
@default_encoding = 'UTF-8'
super
end
end
end
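The utf-8 only rule for CoffeeScript reduces to two cases: files are always read as UTF-8, and strings handed over by a custom reader block are transcoded. A minimal sketch using plain Ruby I/O; `coffee_source` is a hypothetical helper, not the template's actual code path (which overrides `read_template_file` and calls `encode!` in `prepare`).

```ruby
require 'tempfile'

# Hypothetical helper: normalize CoffeeScript input to UTF-8.
def coffee_source(path: nil, data: nil)
  # Files: always read as UTF-8, ignoring :default_encoding and
  # Encoding.default_external.
  return File.read(path, encoding: 'UTF-8') if path

  # Strings from a custom reader: transcode anything that isn't UTF-8.
  data.encoding == Encoding::UTF_8 ? data : data.encode(Encoding::UTF_8)
end
```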

49 changes: 29 additions & 20 deletions lib/tilt/erb.rb
@@ -3,6 +3,11 @@
module Tilt
# ERB template implementation. See:
# http://www.ruby-doc.org/stdlib/libdoc/erb/rdoc/classes/ERB.html
#
# The template supports encoding detection via first line magic comment:
# <%# coding: utf-8 %>
#
# When present, the string's encoding is adjusted to the specified value.
class ERBTemplate < Template
@@default_output_variable = '_erbout'

@@ -22,17 +27,30 @@ def initialize_engine
require_template_library 'erb'
end

# Create an ERB object and generate the Ruby source code for the template.
# The resulting source string has the same encoding as the input data
# *unless* the template includes a magic comment, in which case the source
# string AND the template data will be marked with the declared encoding.
#
# The resulting source string does not include any magic comment line
# generated by ERB. The string.encoding should be used to determine the
# source and output encoding.
def prepare
@outvar = options[:outvar] || self.class.default_output_variable
options[:trim] = '<>' if options[:trim].nil? || options[:trim] == true
@engine = ::ERB.new(data, options[:safe], options[:trim], @outvar)
encoding = data.respond_to?(:encoding) ? data.encoding : nil
@source = assign_source_encoding(@engine.src, encoding, remove=true)
@data.force_encoding @source.encoding if @data.respond_to?(:force_encoding)
end

# Override to always return the generated source string.
def precompiled_template(locals)
source = @engine.src
source
@source
end

# Override to store the original state of the output variable before
# this template is executed.
def precompiled_preamble(locals)
<<-RUBY
begin
@@ -41,6 +59,8 @@ def precompiled_preamble(locals)
RUBY
end

# Override to reset the output variable to its state before the template
# was executed.
def precompiled_postamble(locals)
<<-RUBY
#{super}
@@ -49,15 +69,6 @@ def precompiled_postamble(locals)
end
RUBY
end

# ERB generates a line to specify the character coding of the generated
# source in 1.9. Account for this in the line offset.
if RUBY_VERSION >= '1.9.0'
def precompiled(locals)
source, offset = super
[source, offset + 1]
end
end
end

# Erubis template implementation. See:
@@ -72,6 +83,10 @@ def precompiled(locals)
# :escape_html when true, ::Erubis::EscapedEruby will be used as
# the engine class instead of the default. All content
# within <%= %> blocks will be automatically html escaped.
#
# Unlike ERB, the Erubis template engine does not support encoding detection
# via magic comment. Encoding declarations are ignored. The :default_encoding
# option or the system default external encoding is used instead.
class ErubisTemplate < ERBTemplate
def self.engine_initialized?
defined? ::Erubis
@@ -87,6 +102,9 @@ def prepare
engine_class = options.delete(:engine_class)
engine_class = ::Erubis::EscapedEruby if options.delete(:escape_html)
@engine = (engine_class || ::Erubis::Eruby).new(data, options)
encoding = data.respond_to?(:encoding) ? data.encoding : nil
@source = assign_source_encoding(@engine.src, encoding, remove=false)
@data.force_encoding @source.encoding if @data.respond_to?(:force_encoding)
end

def precompiled_preamble(locals)
@@ -96,15 +114,6 @@ def precompiled_preamble(locals)
def precompiled_postamble(locals)
[@outvar, super].join("\n")
end

# Erubis doesn't have ERB's line-off-by-one under 1.9 problem.
# Override and adjust back.
if RUBY_VERSION >= '1.9.0'
def precompiled(locals)
source, offset = super
[source, offset - 1]
end
end
end
end
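The ERB changes build on the standard library's own magic-comment handling. The small sketch below uses stdlib `erb` to show the two behaviors the diff accounts for. `ERB#encoding` reports the encoding declared by a leading `<%# coding: ... %>` comment. `ERB#src` starts with a generated `#coding:` line, which is why the old code bumped the line offset by one under 1.9. The regex that strips that line is a hypothetical stand-in: the diff delegates this to `assign_source_encoding` (with `remove=true`), whose implementation isn't shown here. Erubis is a separate gem and isn't exercised.

```ruby
require 'erb'

# A template whose first tag is an ERB magic comment.
template = "<%# coding: iso-8859-1 %>\nHello <%= name %>"
erb = ERB.new(template)

# ERB detects the declared encoding from the magic comment.
puts erb.encoding  # ISO-8859-1

# The generated Ruby source begins with a "#coding:..." line added by ERB.
puts erb.src.lines.first

# Hypothetical stand-in for stripping that generated line, so template line
# numbers no longer need a +1 offset.
stripped = erb.src.sub(/\A#coding:[^\n]*\n/, '')
```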
