
Encoding Support #107

Closed
Ryan Tomayko wants to merge 11 commits

12 participants

Ryan Tomayko Joshua Hull Wojciech Wnętrzak Gabriel Benmergui Tim Haines Matias Korhonen Eric Mill Tim Millwood Konstantin Haase Magnus Holm Tom Wardrop Joshua Peek
Ryan Tomayko
Owner

Taking a crack at the encoding problem (#75). The code I have written so far is pretty close to @judofyr's original write-up on that issue. The Template class has been modified somewhat, and I'm using the ERBTemplate, BuilderTemplate, and CoffeeScriptTemplate implementations to exercise the requirements.

I have a section in the README on encodings that specs out the expected behavior and general strategy. There's an overview part, and then it describes how templates should treat template source data:

Template Source Encoding

The template source data may come from a file or from a string. In either case, the real template source encoding should be determined as follows in order of preference:

  • Template specific encoding rules (e.g., utf-8 only formats).
  • A (template specific) magic encoding comment embedded in the source string.
  • The source string's existing encoding (string only).
  • The :default_encoding option to Template.new (file only).
  • Encoding.default_external - the default system encoding (file only).

Some template file formats have strict encoding requirements. CoffeeScript, for instance, is a utf-8-only format. Template implementations are encouraged to use this type of information to constrain the detection logic defined above.
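The preference order above can be sketched as a single lookup. This is a minimal illustration under stated assumptions, not the code in this PR: `detect_source_encoding`, its keyword arguments, and `MAGIC_COMMENT_RE` are hypothetical names, and only a Ruby-style `# coding: ...` comment on the first line is recognized (real engines such as ERB use their own comment syntax).

```ruby
# Hypothetical helper illustrating the source-encoding preference order.
# Not Tilt's API. Recognizes only a Ruby-style magic comment on line one.
MAGIC_COMMENT_RE = /\A[ \t]*#.*coding\s*[=:]\s*([[:alnum:]\-_]+)/

def detect_source_encoding(data, from_file:, required: nil, default_encoding: nil)
  # 1. Template-specific rules (e.g. a utf-8-only format) win outright.
  return Encoding.find(required.to_s) if required

  # 2. A magic encoding comment embedded in the source. Scan a binary copy
  #    so bytes that are invalid under the current tag cannot raise.
  first_line = data.b.each_line.first.to_s
  if (match = MAGIC_COMMENT_RE.match(first_line))
    return Encoding.find(match[1])
  end

  # 3. A string's existing encoding (string input only).
  return data.encoding unless from_file

  # 4. The :default_encoding option (file input only).
  return Encoding.find(default_encoding.to_s) if default_encoding

  # 5. The system default external encoding (file input only).
  Encoding.default_external
end
```

A caller would then relabel the data with `data.force_encoding(...)`, which changes only the tag and never the bytes, consistent with the rule that no transcoding happens at read time.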

This part feels pretty solid to me. The ERB/Erubis, Builder, and CS templates all adhere to these rules and document their behavior when the source encoding detection process is somehow constrained.

For this next part I have only prose, no code. It's a lot shakier:

Render Context Encoding

When the system internal encoding (Encoding.default_internal) is not set (MRI default), templates should be evaluated and produce a result string in the same encoding as the template source data. For example, a Big5-encoded template on disk will generate a Big5 result string, and interpolated values are expected to be Big5 compatible.

When Encoding.default_internal is set, templates should be converted from the template source encoding to the internal encoding before being compiled / evaluated, and the result string should be encoded in the default internal encoding. For instance, when default_internal is set to UTF-8, a Big5-encoded template on disk will be converted to UTF-8 and will generate a UTF-8 result string; interpolated values must be utf-8 compatible.
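A small Ruby illustration of the two render-context cases. `prepare_for_render` is a hypothetical name, not part of Tilt; the internal encoding is passed explicitly rather than by mutating the global `Encoding.default_internal`, so the example does not depend on how Ruby was started.

```ruby
# Hypothetical helper: transcode template source to the internal encoding
# (when one is set) before compiling; otherwise leave it untouched.
def prepare_for_render(source, internal = Encoding.default_internal)
  internal ? source.encode(internal) : source
end

big5_source = "中文 <%= title %>".encode(Encoding::Big5)

# default_internal not set (MRI default): source stays Big5.
untouched = prepare_for_render(big5_source, nil)

# default_internal set to UTF-8: converted before compilation.
converted = prepare_for_render(big5_source, Encoding::UTF_8)

# Interpolated values must be compatible with the render encoding. Mixing
# non-ASCII Big5 text with non-ASCII UTF-8 text raises.
begin
  untouched + "é"
rescue Encoding::CompatibilityError => e
  puts "incompatible: #{e.class}"
end
```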

I do like this behavior in theory. It's compatible with ActionView and seems consistent with the spirit of Encoding.default_internal. However, the caller needs the ability to override these behaviors for cases where default_internal cannot be changed but the caller knows render context transcoding is needed or not needed. For that I think we should add another option:

Templates that perform render context transcoding must allow these default behaviors to be controlled via the :transcode option:

  • :transcode => true - Convert from template source encoding to the system default internal encoding before evaluating the template. The result string is guaranteed to be in the default internal encoding. Do nothing when Encoding.default_internal is nil.

    This is the default behavior when no :transcode option is given.

  • :transcode => false - Perform no encoding conversion. The result string will have the same encoding as the detected template source string.

    This is the default behavior when default_internal is nil.

  • :transcode => 'utf-8' - Ignore default_internal. Instead, convert from template source encoding to utf-8 before evaluating the template. The result string is guaranteed to be utf-8 encoded. The encoding value ('utf-8') may be any valid encoding name or Encoding constant.
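One possible way to resolve the proposed :transcode option into a target encoding, sketched under assumptions: `transcode_target` and `apply_transcode` are hypothetical names, and `nil` (option absent) is treated the same as `true`, since `true` is described as the default.

```ruby
# Hypothetical resolution of the proposed :transcode option.
# Returns the encoding to convert to, or nil for "do not transcode".
def transcode_target(option, internal = Encoding.default_internal)
  case option
  when nil, true then internal               # default: follow default_internal (may be nil)
  when false     then nil                    # never convert
  else Encoding.find(option.to_s)            # encoding name or Encoding constant
  end
end

# Convert the template source before evaluation, if a target was resolved.
def apply_transcode(source, option, internal = Encoding.default_internal)
  target = transcode_target(option, internal)
  target ? source.encode(target) : source
end
```

With `default_internal` unset, `true` and `false` both end up doing nothing, which matches the note that `false` is the effective default in that case.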

My plan is to start working on the default transcoding behavior and the :transcode option for all of the templates I've touched so far. What do you guys think?

/cc @judofyr, @josh, @rkh, @josevalim, @apotonick, @nesquena, @brianmario, @DAddYE, everyone ...

added some commits
Ryan Tomayko revise and define more strictly the default_encoding option
The default_encoding option takes effect only when tilt reads
template data from the filesystem. Templates provided via custom
reader block are assumed to be tagged with a best guess encoding
already.

It's also worth noting that, unlike File.read, the default file
reader does not perform Encoding.default_internal transcoding. The
string is marked with the default_encoding or the system encoding
(Encoding.default_external) but no transcoding is performed. This is
because magic comments or template specific encoding settings are
not yet available.
d602069
Ryan Tomayko generate ruby source in same encoding as template source data e18b99c
Ryan Tomayko specify template source encoding behavior in README, tests ee65ca9
Ryan Tomayko ERB templates adhere to source template encoding behavior 35e5a48
Ryan Tomayko refine ruby source comment extraction utility methods 7ef2de5
Ryan Tomayko Builder template adheres to source template encoding behavior 9c0ecf8
Ryan Tomayko start to spec out :transcode behavior a little
None of this is happening in the code yet. I'm just trying to figure
out what it might look like.
fcb9a1e
Ryan Tomayko CoffeeScript template requires utf-8 input, generates utf-8 output only
All of the various input and output overrides are ignored
essentially.
b73c7da
Ryan Tomayko reorg encoding spec in README f36cc18
Ryan Tomayko typos abound in binary method comments ae8900b
Ryan Tomayko fix markdown bullet indent in README 9a474c5
Joshua Peek josh commented on the diff
lib/tilt/builder.rb
@@ -14,7 +35,10 @@ module Tilt
       require_template_library 'builder'
     end
 
-    def prepare; end
+    def prepare
+      return if !data.respond_to?(:to_str)
+      @source = assign_source_encoding(data.to_str)
Joshua Peek
josh added a note

I like the opt-in for assign_source_encoding.

Ryan Tomayko Owner

Yeah. I'm going with an approach where the base class provides some convenience APIs, but the behavior is more or less in the hands of the template subclass. I don't think there's any other way, since templates vary so much.

Joshua Peek josh commented on the diff
README.md
((65 lines not shown))
+  - `:transcode => true` - Convert from template source encoding to the system
+    default internal encoding (`Encoding.default_internal`) before evaluating the
+    template. The result string is guaranteed to be in the default internal
+    encoding. Do nothing when `Encoding.default_internal` is nil.
+
+    This is the default behavior when no `:transcode` option is given.
+
+  - `:transcode => false` - Perform no encoding conversion. The result string
+    will have the same encoding as the detected template source string.
+
+    This is the default behavior when `Encoding.default_internal` is nil.
+
+  - `:transcode => 'utf-8'` - Ignore `Encoding.default_internal`. Instead,
+    convert from template source encoding to utf-8 before evaluating the
+    template. The result string is guaranteed to be utf-8 encoded. The encoding
+    value (`'utf-8'`) may be any valid encoding name or Encoding constant.
Joshua Peek
josh added a note

I don't think this :transcode option is implemented yet. Is this going to be set as an option on initialize or supposed to be passed each time to the render method?

We might be able to provide some of these things for the handler by adding them to the default render implementation. The "before evaluating" step doesn't exist for every handler. For example, coffee, sass and markdown are contextless and ignore the locals option, so these handlers only need to call encode on the final string they produce. We might be able to move that final encode into our render, but I'm not sure that's entirely a good idea.

Ryan Tomayko Owner

Nope, I haven't started in on any of the transcoding yet.

I hadn't even considered making this an option to render. I figured we'd want it on initialize and also maybe as a Template class attribute that can be overridden by subclasses. Having it on render could be interesting. I'm not sure how much it'd be used though. I think the 99% case for transcoding is going to be people setting Encoding.default_internal = 'utf-8' or wanting the same effect when running templates.

Joshua Peek
josh added a note

I don't really want to put it on render, since render has no options arg. Passing it to initialize makes the most sense.

We could provide some sort of Template#encode() helper that encodes the string to options[:transcode] || default_internal. So handlers can call that before they return the final result.

Ryan Tomayko Owner

Exactly what I'm thinking. It needs to happen before evaluation for context / interpolating templates though.
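A rough sketch of the helper discussed in this thread, applied to a contextless handler that only needs to encode its final output. `ContextlessTemplate` and its `encode` method are hypothetical stand-ins, not Tilt classes. Taken literally, `options[:transcode] || default_internal` would treat `false` like "not given", so the sketch checks for `false` explicitly.

```ruby
# Hypothetical stand-in for a contextless template (CoffeeScript-like):
# it never interpolates Ruby values, so it only encodes its final output.
class ContextlessTemplate
  def initialize(data, options = {})
    @data = data
    @options = options
  end

  # Sketch of the proposed Template#encode helper.
  def encode(str)
    target = @options.fetch(:transcode, true)
    # nil or true means "follow Encoding.default_internal" (which may be nil).
    target = Encoding.default_internal if target.nil? || target == true
    # false (or an unset default_internal) leaves the string untouched.
    target ? str.encode(target) : str
  end

  def render
    output = @data.dup # stand-in for the engine's compile step
    encode(output)
  end
end
```

For interpolating templates such as ERB, the same conversion would have to run on the template source before compilation rather than on the result.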

Ryan Tomayko
Owner

Related: #75

Joshua Hull

@rtomayko Any update on this?

Ryan Tomayko
Owner

@joshbuddy: I haven't had a second to look at this for a long while now, unfortunately. Hopefully things will settle down soon. I'd definitely like to get this closed out.

Joshua Hull

@rtomayko No worries. Just wanted to make sure it didn't get lost in the shuffle.

Wojciech Wnętrzak

I have the same problems with template encodings.

I tested the encoding branch and it works fine for me.

Any update on this issue?

Gabriel Benmergui

This is a pretty serious issue for my application. I cannot support internationalization and users can easily break pages if they put the wrong characters.

Is there a quick fix or a monkey-patch I can apply to fix this? Adding the magic comment to the ERB template did not work. I've been waiting 2 months for this merge :S.

Tim Haines

Rails' asset pipeline fails on templates with unicode in them due to the lack of encoding support, right?

Tim Haines

FWIW The encodings branch of the gem worked for me.

Gabriel Benmergui

Yes that was my solution also.

Matias Korhonen

What's the status on this? Just wondering why the encodings branch hasn't gotten merged into master?

Eric Mill

I too would love these changes, having some related breakages on my end - any chance these'll be merged into master soon?

Marios Antonoudiou (qboss) referenced this pull request in middleman/middleman: Bug with encoding #738 (Closed)

Konstantin Haase
Collaborator

What's the status of this?

Rails, Sinatra and Ruby 2.0 default everything to UTF-8.

Magnus Holm
Collaborator

I'm going to revisit this for the 1.4 release. I'm planning on implementing (or, copying from this branch) the minimal code we need for supporting encodings. No transcoding or funky stuff, but something that makes UTF-8 templates work out-of-the-box for 99% of all users.

Tom Wardrop

Please do, @judofyr. I've just hit this in a SCSS template that contains a single UTF-8 character. I've worked around it in my framework by doing a manual File.read in the block I pass to #new. It seems File.binread, which Tilt otherwise uses, always returns ASCII-8BIT strings, even on Ruby 2.0.0.
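The behavior described above is easy to reproduce with the standard library alone. `read_tagged` is a hypothetical helper that mirrors the idea of the `read_template_file` method in this PR: read raw bytes, then tag them with a best-guess encoding without transcoding.

```ruby
require 'tempfile'

# Hypothetical helper: read raw bytes, then relabel them with a best-guess
# encoding. force_encoding changes only the tag, never the bytes.
def read_tagged(path, encoding = Encoding.default_external)
  File.binread(path).force_encoding(encoding)
end

Tempfile.create(['style', '.scss']) do |f|
  f.write("a::after { content: \"\u2713\"; }\n")
  f.flush

  raw = File.binread(f.path)
  puts raw.encoding                                  # binread tags ASCII-8BIT
  puts read_tagged(f.path, 'UTF-8').valid_encoding?
end
```

With Tilt, the workaround in the comment corresponds to passing a reader block, e.g. `Tilt.new(path) { File.read(path, encoding: 'UTF-8') }`; that call shape is assumed from the reader-block convention described in this PR rather than confirmed here.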

Magnus Holm
Collaborator

@Wardrop: Can you try the branch in #175 and see if it fixes your problem?

Magnus Holm judofyr closed this
Magnus Holm
Collaborator

#175 has now been merged.


Showing 11 unique commits by 1 author (Sep 19, 2011).
README.md (78 lines changed)
@@ -191,6 +191,84 @@ template, but if you depend on a specific implementation, you should use #prefer
 When a file extension has a preferred template class, Tilt will *always* use
 that class, even if it raises an exception.
 
+Encodings
+---------
+
+All Tilt template implementations must follow a few guidelines regarding string
+encodings under MRI >= Ruby 1.9 and other encoding aware environments. This
+section defines "good behavior" for template implementations that support
+multiple encodings.
+
+There are two places where encodings come into play:
+
+ - __Template source data encoding.__ When a template is read from the
+   filesystem, how do we know what encoding to set on the string? This is
+   complicated by the fact that many template formats support embedded magic
+   encoding declarations, while others mandate that template source data be in a
+   specific encoding (utf-8 only formats).
+
+ - __Render context and result encoding.__ In what encoding is the output being
+   generated in? It's often useful to guarantee that templates are evaluated in
+   utf-8 context and will generate utf-8 output regardless of the template's
+   source encoding. What effect does `Encoding.default_internal` have on
+   template execution and output?
+
+Tilt's encoding support aims only to provide a framework for answering these
+questions for each template engine. It does not attempt to define a single
+behavior that all templates must conform to because templates vary widely in
+encoding support.
+
+### Template Source Encoding
+
+The template source data may come from a file or from a string. In either case,
+the real template source encoding should be determined as follows in order of
+preference:
+
+ - Template specific encoding rules (e.g., utf-8 only formats).
+ - A (template specific) magic encoding comment embedded in the source string.
+ - The source string's existing encoding (string only).
+ - The `:default_encoding` option to `Template.new` (file only).
+ - `Encoding.default_external` - the default system encoding (file only)
+
+Some template file formats have strict encoding requirements. CoffeeScript is a
+utf-8 only format for instance. Template implementations are encouraged to use
+this type of information to constrain the detection logic defined above.
+
+### Render Context Encoding
+
+When the system internal encoding (`Encoding.default_internal`) *is not* set
+(MRI default), templates should be evaluated and produce a result string encoded
+the same as the template source data. e.g., A Big5 encoded template on disk will
+generate a Big5 result string and expect interpolated values to be Big5
+compatible.
+
+When `Encoding.default_internal` *is* set, templates should be converted from
+the template source encoding to the internal encoding *before* being compiled /
+evaluated and the result string should be encoded in the default internal
+encoding. For instance, when `default_internal` is set to UTF-8, a Big5 encoded
+template on disk will generate a UTF-8 result string and interpolated values
+must be utf-8 compatible.
+
+Templates that perform render context transcoding must allow these default
+behaviors to be controlled via the `:transcode` option:
+
+  - `:transcode => true` - Convert from template source encoding to the system
+    default internal encoding (`Encoding.default_internal`) before evaluating the
+    template. The result string is guaranteed to be in the default internal
+    encoding. Do nothing when `Encoding.default_internal` is nil.
+
+    This is the default behavior when no `:transcode` option is given.
+
+  - `:transcode => false` - Perform no encoding conversion. The result string
+    will have the same encoding as the detected template source string.
+
+    This is the default behavior when `Encoding.default_internal` is nil.
+
+  - `:transcode => 'utf-8'` - Ignore `Encoding.default_internal`. Instead,
+    convert from template source encoding to utf-8 before evaluating the
+    template. The result string is guaranteed to be utf-8 encoded. The encoding
+    value (`'utf-8'`) may be any valid encoding name or Encoding constant.
+
 Template Compilation
 --------------------
 
lib/tilt/builder.rb (38 lines changed)
@@ -1,8 +1,29 @@
 require 'tilt/template'
 
 module Tilt
-   # Builder template implementation. See:
-  # http://builder.rubyforge.org/
+  # XML Builder Template implementation
+  #
+  # - http://builder.rubyforge.org/
+  #
+  # Builder templates support three types of template input: string, file,
+  # and block. When the initialize block returns a non-string object that
+  # responds to call (Proc), template execution consists of calling the block
+  # with a Builder::XmlMarkup instance:
+  #
+  #     BuilderTemplate.new do
+  #       lambda do |xml|
+  #         xml.h1 'howdy dudy'
+  #         xml.p  'blaahhh'
+  #       end
+  #     end
+  #
+  # Builder templates can also be instantiated from a string or file. In that
+  # case, the source encoding is determined according to the rules documented
+  # in the Tilt README under Encodings. The ruby magic comment line is supported
+  # for specifying an alternative encoding.
+  #
+  # Builder templates always produce utf-8 encoded result strings regardless of
+  # the source string / file encoding.
   class BuilderTemplate < Template
     self.default_mime_type = 'text/xml'
 
@@ -14,7 +35,10 @@ def initialize_engine
       require_template_library 'builder'
     end
 
-    def prepare; end
+    def prepare
+      return if !data.respond_to?(:to_str)
+      @source = assign_source_encoding(data.to_str)
+    end
 
     def evaluate(scope, locals, &block)
       return super(scope, locals, &block) if data.respond_to?(:to_str)
@@ -23,6 +47,10 @@ def evaluate(scope, locals, &block)
       xml.target!
     end
 
+    def precompiled_template(locals)
+      @source
+    end
+
     def precompiled_preamble(locals)
       return super if locals.include? :xml
       "xml = ::Builder::XmlMarkup.new(:indent => 2)\n#{super}"
@@ -31,10 +59,6 @@ def precompiled_preamble(locals)
     def precompiled_postamble(locals)
       "xml.target!"
     end
-
-    def precompiled_template(locals)
-      data.to_str
-    end
   end
 end
 
lib/tilt/coffee.rb (20 lines changed)
@@ -1,10 +1,16 @@
 require 'tilt/template'
 
 module Tilt
-  # CoffeeScript template implementation. See:
-  # http://coffeescript.org/
+  # CoffeeScript template implementation.
+  #
+  # - http://coffeescript.org/
   #
   # CoffeeScript templates do not support object scopes, locals, or yield.
+  #
+  # All CoffeeScript files must be utf-8 encoded. The :default_encoding
+  # option and system default encoding are ignored. When a non-utf-8 string
+  # is provided via custom reader block, it is converted to utf-8 before
+  # being passed to the Coffee compiler.
   class CoffeeScriptTemplate < Template
     self.default_mime_type = 'application/javascript'
 
@@ -40,11 +46,21 @@ def prepare
       if !options.key?(:bare) and !options.key?(:no_wrap)
         options[:bare] = self.class.default_bare
       end
+
+      # if string was given and its not utf-8, transcode it now
+      data.encode! 'UTF-8' if data.respond_to?(:encode!)
     end
 
     def evaluate(scope, locals, &block)
       @output ||= CoffeeScript.compile(data, options)
     end
+
+    # Override to set the @default_encoding to always be utf-8, ignoring the
+    # :default_encoding option value.
+    def read_template_file
+      @default_encoding = 'UTF-8'
+      super
+    end
   end
 end
 
lib/tilt/erb.rb (49 lines changed)
@@ -3,6 +3,11 @@
 module Tilt
   # ERB template implementation. See:
   # http://www.ruby-doc.org/stdlib/libdoc/erb/rdoc/classes/ERB.html
+  #
+  # The template supports encoding detection via first line magic comment:
+  # <%# coding: utf-8 %>
+  #
+  # When present, the string's encoding is adjusted to the specified value.
   class ERBTemplate < Template
     @@default_output_variable = '_erbout'
 
@@ -22,17 +27,30 @@ def initialize_engine
       require_template_library 'erb'
     end
 
+    # Create an ERB object and generate the Ruby source code for the template.
+    # The resulting source string has the same encoding as the input data
+    # *unless* the template includes a magic comment, in which case the source
+    # string AND the template data will be marked with the declared encoding.
+    #
+    # The resulting source string does not include any magic comment line
+    # generated by ERB. The string.encoding should be used to determine the
+    # source and output encoding.
     def prepare
       @outvar = options[:outvar] || self.class.default_output_variable
       options[:trim] = '<>' if options[:trim].nil? || options[:trim] == true
       @engine = ::ERB.new(data, options[:safe], options[:trim], @outvar)
+      encoding = data.respond_to?(:encoding) ? data.encoding : nil
+      @source = assign_source_encoding(@engine.src, encoding, remove=true)
+      @data.force_encoding @source.encoding if @data.respond_to?(:force_encoding)
     end
 
+    # Override to always return the generated source string.
     def precompiled_template(locals)
-      source = @engine.src
-      source
+      @source
     end
 
+    # Override to store the original state of the output variable before
+    # this template is executed.
     def precompiled_preamble(locals)
       <<-RUBY
         begin
@@ -41,6 +59,8 @@ def precompiled_preamble(locals)
       RUBY
     end
 
+    # Override to reset the output variable to its state before the template
+    # was executed.
     def precompiled_postamble(locals)
       <<-RUBY
           #{super}
@@ -49,15 +69,6 @@ def precompiled_postamble(locals)
         end
       RUBY
     end
-
-    # ERB generates a line to specify the character coding of the generated
-    # source in 1.9. Account for this in the line offset.
-    if RUBY_VERSION >= '1.9.0'
-      def precompiled(locals)
-        source, offset = super
-        [source, offset + 1]
-      end
-    end
   end
 
   # Erubis template implementation. See:
@@ -72,6 +83,10 @@ def precompiled(locals)
   #   :escape_html    when true, ::Erubis::EscapedEruby will be used as
   #                   the engine class instead of the default. All content
   #                   within <%= %> blocks will be automatically html escaped.
+  #
+  # Unlike ERB, the Erubis template engine does not support encoding detection
+  # via magic comment. Encoding declarations are ignored. The :default_encoding
+  # option or system default external encoding are used by default.
   class ErubisTemplate < ERBTemplate
     def self.engine_initialized?
       defined? ::Erubis
@@ -87,6 +102,9 @@ def prepare
       engine_class = options.delete(:engine_class)
       engine_class = ::Erubis::EscapedEruby if options.delete(:escape_html)
       @engine = (engine_class || ::Erubis::Eruby).new(data, options)
+      encoding = data.respond_to?(:encoding) ? data.encoding : nil
+      @source = assign_source_encoding(@engine.src, encoding, remove=false)
+      @data.force_encoding @source.encoding if @data.respond_to?(:force_encoding)
     end
 
     def precompiled_preamble(locals)
@@ -96,15 +114,6 @@ def precompiled_preamble(locals)
     def precompiled_postamble(locals)
       [@outvar, super].join("\n")
     end
-
-    # Erubis doesn't have ERB's line-off-by-one under 1.9 problem.
-    # Override and adjust back.
-    if RUBY_VERSION >= '1.9.0'
-      def precompiled(locals)
-        source, offset = super
-        [source, offset - 1]
-      end
-    end
   end
 end
 
147  lib/tilt/template.rb
@@ -30,11 +30,17 @@ class << self
30 30
     end
31 31
 
32 32
     # Create a new template with the file, line, and options specified. By
33  
-    # default, template data is read from the file. When a block is given,
34  
-    # it should read template data and return as a String. When file is nil,
35  
-    # a block is required.
  33
+    # default, template data is read from file and assumed to be in the
  34
+    # system default external encoding (Encoding.default_external). When a
  35
+    # block is given, it should read template data and return a String with
  36
+    # a best guess encoding.
36 37
     #
37  
-    # All arguments are optional.
  38
+    # The :default_encoding option is supported by most template engines. When
  39
+    # set, data read from disk will be assumed to be in this encoding instead
  40
+    # of Encoding.default_external. The option has no effect when a custom
  41
+    # reader block is given.
  42
+    #
  43
+    # All arguments are optional but a file or block must be specified.
38 44
     def initialize(file=nil, line=1, options={}, &block)
39 45
       @file, @line, @options = nil, 1, {}
40 46
 
@@ -59,12 +65,11 @@ def initialize(file=nil, line=1, options={}, &block)
59 65
       # used to hold compiled template methods
60 66
       @compiled_method = {}
61 67
 
62  
-      # used on 1.9 to set the encoding if it is not set elsewhere (like a magic comment)
63  
-      # currently only used if template compiles to ruby
  68
+      # Overrides Encoding.default_external when reading from filesystem
64 69
       @default_encoding = @options.delete :default_encoding
65 70
 
66 71
       # load template data and prepare (uses binread to avoid encoding issues)
67  
-      @reader = block || lambda { |t| File.respond_to?(:binread) ? File.binread(@file) : File.read(@file) }
  72
+      @reader = block || lambda { |t| read_template_file }
68 73
       @data = @reader.call(self)
69 74
       prepare
70 75
     end
@@ -98,6 +103,29 @@ def eval_file
98 103
     def initialize_engine
99 104
     end
100 105
 
  106
+    # Read template data from file, possibly overriding the encoding based on
  107
+    # the default_encoding option. This is used when the object is created with
  108
+    # a file and no reader block.
  109
+    #
  110
+    # Unlike File.read, this method does not transcode into the system
  111
+    # Encoding.default_internal encoding. The best guess encoding is set and
  112
+    # available from data.encoding.
  113
+    #
  114
+    # Subclasses may override this method if they have specific knowledge about
  115
+    # the file's encoding and can provide better default encoding support.
  116
+    #
  117
+    # Raise exception when file doesn't exist.
  118
+    # Does not raise an exception when the file's data is invalid in the best
  119
+    # guess encoding.
  120
+    def read_template_file
  121
+      data = File.open(file, 'rb') { |io| io.read }
  122
+      if data.respond_to?(:force_encoding)
  123
+        encoding = @default_encoding || Encoding.default_external
  124
+        data.force_encoding(encoding)
  125
+      end
  126
+      data
  127
+    end
  128
+
101 129
     # Like Kernel#require but issues a warning urging a manual require when
102 130
     # running under a threaded environment.
103 131
     def require_template_library(name)
@@ -113,6 +141,14 @@ def require_template_library(name)
113 141
     # variables set in this method are available when #evaluate is called.
114 142
     #
115 143
     # Subclasses must provide an implementation of this method.
  144
+    #
  145
+    # The data attribute holds the template source string marked with the best
  146
+    # guess encoding. When the template was read from the filesystem this will
  147
+    # be either the :default_encoding provided when the template was created or
  148
+    # the system default Encoding.default_external encoding. When the template
  149
+    # data was provided via reader block, it will be in whatever encoding was
  150
+    # set on the string originally. Subclasses are responsible for detecting
  151
+    # template specific magic syntax encodings embedded in the template data.
116 152
     def prepare
117 153
       if respond_to?(:compile!)
118 154
         # backward compat with tilt < 0.6; just in case
@@ -156,18 +192,19 @@ def self.cached_evaluate(scope, locals, &block)
     def precompiled(locals)
       preamble = precompiled_preamble(locals)
       template = precompiled_template(locals)
-      magic_comment = extract_magic_comment(template)
-      if magic_comment
-        # Magic comment e.g. "# coding: utf-8" has to be in the first line.
-        # So we copy the magic comment to the first line.
-        preamble = magic_comment + "\n" + preamble
+
+      source = ''
+      if source.respond_to?(:force_encoding)
+        source.force_encoding template.encoding
       end
-      parts = [
-        preamble,
-        template,
-        precompiled_postamble(locals)
-      ]
-      [parts.join("\n"), preamble.count("\n") + 1]
+
+      source << preamble
+      source << "\n"
+      source << template
+      source << "\n"
+      source << precompiled_postamble(locals)
+
+      [source, preamble.count("\n") + 1]
     end
 
     # A string containing the (Ruby) source code for the template. The
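The rewritten `#precompiled` no longer joins an array. Instead, it appends every piece onto a buffer that already carries the template's encoding. Here is a minimal sketch of that idea, assuming Ruby 2.3+ for `String.new(encoding:)`. The name `assemble_source` is hypothetical. When both sides of `<<` are ASCII-only, Ruby keeps the receiver's encoding, so the label survives even for an all-ASCII template.

```ruby
# Hypothetical stand-in for the new #precompiled body: every piece is appended
# to a buffer that already carries the template's encoding.
def assemble_source(preamble, template, postamble)
  source = String.new(encoding: template.encoding)
  source << preamble << "\n" << template << "\n" << postamble
  [source, preamble.count("\n") + 1]
end

# An ASCII-only template that is nonetheless labelled Shift_JIS.
template = "_out << 'hello'".encode('Shift_JIS')
source, offset = assemble_source("_out = ''", template, '_out')
source.encoding  # => #<Encoding:Shift_JIS>
offset           # => 1
```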
@@ -230,20 +267,29 @@ def compile_template_method(locals)
       source, offset = precompiled(locals)
       offset += 5
       method_name = "__tilt_#{Thread.current.object_id.abs}"
-      Object.class_eval <<-RUBY, eval_file, line - offset
-        #{extract_magic_comment source}
+      method_source = ""
+
+      if method_source.respond_to?(:force_encoding)
+        method_source.force_encoding source.encoding
+      end
+
+      method_source << <<-RUBY
         TOPOBJECT.class_eval do
           def #{method_name}(locals)
             Thread.current[:tilt_vars] = [self, locals]
             class << self
               this, locals = Thread.current[:tilt_vars]
               this.instance_eval do
-               #{source}
+      RUBY
+      method_source << source
+      method_source << <<-RUBY
               end
             end
           end
         end
       RUBY
+
+      Object.class_eval method_source, eval_file, line - offset
       unbind_compiled_method(method_name)
     end
 
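The compiled method source is now built in the template source's encoding before it reaches `Object.class_eval`. This replaces the old approach of copying a magic comment. It relies on Ruby parsing an eval'd string in that string's own encoding when no magic comment is present, so string literals in the generated method inherit it. The check below is a small standalone one. `EncodingDemo` and `greeting` are hypothetical names.

```ruby
# Hypothetical scratch class used only to host the generated method.
class EncodingDemo; end

body = "'ふが'".encode('Shift_JIS')

# Build the method source in the body's encoding, as compile_template_method does.
method_source = String.new(encoding: body.encoding)
method_source << "def greeting\n" << body << "\nend\n"

# No magic comment: the parser uses method_source's own encoding.
EncodingDemo.class_eval(method_source, __FILE__, __LINE__)

EncodingDemo.new.greeting.encoding  # => #<Encoding:Shift_JIS>
```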
@@ -253,12 +299,59 @@ def unbind_compiled_method(method_name)
       method
     end
 
-    def extract_magic_comment(script)
-      comment = script.slice(/\A[ \t]*\#.*coding\s*[=:]\s*([[:alnum:]\-_]+).*$/)
-      if comment && !%w[ascii-8bit binary].include?($1.downcase)
-        comment
-      elsif @default_encoding
-        "# coding: #{@default_encoding}"
+    # Regexp used to find and remove magic comment lines from Ruby source.
+    MAGIC = /\A[ \t]*\#.*coding\s*[=:]\s*([[:alnum:]\-_]+).*?\n/mn
+
+    # Checks for a Ruby 1.9 encoding comment on the first line of source.
+    #
+    # source - string to check for magic comment line
+    # remove - set true to remove the line from the string in place
+    #
+    # Returns the encoding name string or nil when no comment was present.
+    def extract_source_encoding(source, remove=false)
+      binary source do
+        slice = remove ? :slice! : :slice
+        $1 if source.__send__(slice, MAGIC)
+      end
+    end
+
+    # Extract encoding comment from source and mark the string's encoding. The
+    # string is modified in place. When no encoding is found, the encoding
+    # passed in the default argument is used. The remove argument can be set
+    # true to remove the magic comment line from the source string in place.
+    #
+    # This method is a no-op under Ruby < 1.9
+    if ''.respond_to?(:force_encoding)
+      def assign_source_encoding(source, default=nil, remove=false)
+        if encoding = extract_source_encoding(source, remove)
+          source.force_encoding(encoding)
+        elsif default
+          source.force_encoding(default)
+        else
+          source
+        end
+      end
+    else
+      def assign_source_encoding(source, *args)
+        source
+      end
+    end
+
+    # Mark the string as BINARY/ASCII-8BIT for the duration of the block. The
+    # string is reset to its original encoding before this method returns. This
+    # combined with //n flagged regular expressions is one way to avoid encoding
+    # compatibility errors while a string's encoding is still in best guess mode.
+    if ''.respond_to?(:force_encoding)
+      def binary(string)
+        original_encoding = string.encoding
+        string.force_encoding 'BINARY'
+        yield
+      ensure
+        string.force_encoding original_encoding
+      end
+    else
+      def binary(string)
+        yield string
       end
     end
 
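The `binary` plus `//n` technique used by `extract_source_encoding` can be tried in isolation. The sketch below is a standalone approximation, not the patch itself. `detect_magic_encoding`, `label_source` and `MAGIC_LINE` are hypothetical names. Unlike `MAGIC` in the diff, this regexp omits the `m` flag, so `.` cannot run past the first line while looking for the comment.

```ruby
# First-line magic comment, matched against raw bytes (//n, no m flag).
MAGIC_LINE = /\A[ \t]*\#.*coding\s*[=:]\s*([[:alnum:]\-_]+)/n

# Return the encoding name from a magic comment, or nil. The string is
# relabelled as BINARY only while matching, then restored.
def detect_magic_encoding(source)
  original = source.encoding
  source.force_encoding(Encoding::BINARY)
  source[MAGIC_LINE, 1]
ensure
  source.force_encoding(original)
end

# Label source with its magic comment encoding, else the given default.
def label_source(source, default = nil)
  name = detect_magic_encoding(source) || default
  name ? source.force_encoding(name) : source
end

# Shift_JIS bytes that were read while Encoding.default_external was Big5.
raw = "# coding: Shift_JIS\nxml.em 'ふが'\n".encode('Shift_JIS').force_encoding('Big5')
label_source(raw).encoding  # => #<Encoding:Shift_JIS>
```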
test/tilt_buildertemplate_test.rb
@@ -1,3 +1,4 @@
+# coding: utf-8
 require 'contest'
 require 'tilt'
 
@@ -53,6 +54,68 @@ class BuilderTemplateTest < Test::Unit::TestCase
           template.render(options) { subtemplate.render(options) }
         end
       end
+
+    ##
+    # Encodings
+
+    if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+      original_encoding = Encoding.default_external
+      setup do
+        Encoding.default_external = 'utf-8'
+        Encoding.default_internal = nil
+      end
+      teardown do
+        Encoding.default_external = original_encoding
+        Encoding.default_internal = nil
+      end
+
+      def tempfile(name='template')
+        f = Tempfile.open(name)
+        f.sync = true
+        yield f
+      ensure
+        f.close rescue nil
+        f.delete
+      end
+
+      test "reading templates using default external encoding" do
+        Encoding.default_external = 'Shift_JIS'
+        tempfile do |f|
+          f.puts("xml.em 'ふが' + @hoge".encode('Shift_JIS'))
+          template = Tilt::BuilderTemplate.new(f.path)
+          assert_equal 'Shift_JIS', template.data.encoding.to_s
+          @hoge = "ほげ".encode('Shift_JIS')
+          assert_equal 'UTF-8', template.render(self).encoding.to_s
+        end
+      end
+
+      test "reading templates using :default_encoding option override" do
+        Encoding.default_external = 'Big5'
+        tempfile do |f|
+          f.puts("xml.em 'ふが' + @hoge".encode('Shift_JIS'))
+          template = Tilt::BuilderTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+          assert_equal 'Shift_JIS', template.data.encoding.to_s
+          @hoge = "ほげ".encode('Shift_JIS')
+          assert_equal 'UTF-8', template.render(self).encoding.to_s
+        end
+      end
+
+      test "reading template with magic encoding comment" do
+        Encoding.default_external = 'Big5'
+        tempfile do |f|
+          f.puts("# coding: Shift_JIS".encode('Shift_JIS'))
+          f.puts("xml.em 'ふが' + @hoge".encode('Shift_JIS'))
+          # require 'ruby-debug'
+          # debugger
+          template = Tilt::BuilderTemplate.new(f.path)
+          assert_equal 'Shift_JIS', template.data.encoding.to_s
+          @hoge = "ほげ".encode('Shift_JIS')
+          output = template.render(self)
+          assert_equal 'UTF-8', output.encoding.to_s
+          assert_equal "<em>ふがほげ</em>\n", output
+        end
+      end
+    end
   end
 rescue LoadError
   warn "Tilt::BuilderTemplate (disabled)"
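These tests expect Builder output to come back as UTF-8 even when the template and the interpolated values are Shift_JIS. One way an output buffer can guarantee this is to transcode every value it appends. The sketch below shows only that technique. `Utf8Buffer` is a hypothetical class, and the sketch makes no claim about how Builder itself is implemented.

```ruby
# Hypothetical output buffer that always produces UTF-8 by transcoding
# each appended value from whatever encoding it carries.
class Utf8Buffer
  def initialize
    @out = String.new(encoding: Encoding::UTF_8)
  end

  def <<(value)
    @out << value.to_s.encode(Encoding::UTF_8)
    self
  end

  def to_s
    @out.dup
  end
end

buf = Utf8Buffer.new
buf << '<em>' << 'ふが'.encode('Shift_JIS') << 'ほげ'.encode('Shift_JIS') << "</em>\n"
buf.to_s           # => "<em>ふがほげ</em>\n"
buf.to_s.encoding  # => #<Encoding:UTF-8>
```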
test/tilt_coffeescripttemplate_test.rb
@@ -1,3 +1,4 @@
+# coding: utf-8
 require 'contest'
 require 'tilt'
 
@@ -54,8 +55,51 @@ class CoffeeScriptTemplateTest < Test::Unit::TestCase
         assert_not_equal "puts('Hello, World!');", template.render
       end
     end
-  end
 
+    ##
+    # Encodings
+
+    if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+      original_encoding = Encoding.default_external
+      setup    { Encoding.default_external = 'utf-8' }
+      teardown { Encoding.default_external = original_encoding }
+
+      def tempfile(name='template')
+        f = Tempfile.open(name)
+        f.sync = true
+        yield f
+      ensure
+        f.close rescue nil
+        f.delete
+      end
+
+      test "ignores default external encoding" do
+        tempfile do |f|
+          f.puts("console.log 'ふがほげ'")
+          Encoding.default_external = 'Shift_JIS'
+          template = Tilt::CoffeeScriptTemplate.new(f.path)
+          assert_equal 'UTF-8', template.data.encoding.to_s
+          assert_equal 'UTF-8', template.render(self).encoding.to_s
+        end
+      end
+
+      test "ignores :default_encoding option" do
+        tempfile do |f|
+          f.puts("console.log 'ふがほげ'")
+          template = Tilt::CoffeeScriptTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+          assert_equal 'UTF-8', template.data.encoding.to_s
+          assert_equal 'UTF-8', template.render(self).encoding.to_s
+        end
+      end
+
+      test "transcodes input string to utf-8" do
+        string = "console.log 'ふがほげ'".encode("Shift_JIS")
+        template = Tilt::CoffeeScriptTemplate.new { string }
+        assert_equal 'UTF-8', template.data.encoding.to_s
+        assert_equal 'UTF-8', template.render(self).encoding.to_s
+      end
+    end
+  end
 rescue LoadError => boom
   warn "Tilt::CoffeeScriptTemplate (disabled)"
 end
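CoffeeScript is a UTF-8 only format. These tests therefore expect `template.data` to be UTF-8 regardless of `Encoding.default_external` or `:default_encoding`. They also expect string input to be transcoded rather than relabelled. The sketch below covers those two rules and assumes a hypothetical `utf8_template_data` helper that is not part of Tilt.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Tilt) for a UTF-8 only format.
def utf8_template_data(path: nil, string: nil)
  if string
    # String input: trust its current label and transcode the characters.
    string.encode(Encoding::UTF_8)
  else
    # File input: the format is UTF-8 by definition, so ignore
    # Encoding.default_external and any :default_encoding option.
    File.binread(path).force_encoding(Encoding::UTF_8)
  end
end

Tempfile.create('template') do |f|
  f.write("console.log 'ふがほげ'\n")
  f.flush
  utf8_template_data(path: f.path).encoding  # => #<Encoding:UTF-8>
end
utf8_template_data(string: "console.log 'ふがほげ'".encode('Shift_JIS')).encoding # => #<Encoding:UTF-8>
```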
test/tilt_erbtemplate_test.rb
@@ -201,25 +201,62 @@ class Scope
     assert_equal "\nhello\n", template.render(Scope.new)
   end
 
-  test "encoding with magic comment" do
-    f = Tempfile.open("template")
-    f.puts('<%# coding: UTF-8 %>')
-    f.puts('ふが <%= @hoge %>')
-    f.close()
-    @hoge = "ほげ"
-    erb = Tilt::ERBTemplate.new(f.path)
-    3.times { erb.render(self) }
-    f.delete
-  end
-
-  test "encoding with :default_encoding" do
-    f = Tempfile.open("template")
-    f.puts('ふが <%= @hoge %>')
-    f.close()
-    @hoge = "ほげ"
-    erb = Tilt::ERBTemplate.new(f.path, :default_encoding => 'UTF-8')
-    3.times { erb.render(self) }
-    f.delete
+  ##
+  # Encodings
+
+  if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+    original_encoding = Encoding.default_external
+    setup do
+      Encoding.default_external = 'utf-8'
+      Encoding.default_internal = nil
+    end
+    teardown do
+      Encoding.default_external = original_encoding
+      Encoding.default_internal = nil
+    end
+
+    def tempfile(name='template')
+      f = Tempfile.open(name)
+      f.sync = true
+      yield f
+    ensure
+      f.close rescue nil
+      f.delete
+    end
+
+    test "producing default external encoded result string" do
+      Encoding.default_external = 'Shift_JIS'
+      tempfile do |f|
+        f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+        erb = Tilt::ERBTemplate.new(f.path)
+        assert_equal 'Shift_JIS', erb.data.encoding.to_s
+        @hoge = "ほげ".encode('Shift_JIS')
+        assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+      end
+    end
+
+    test "producing default_encoding encoded result string" do
+      Encoding.default_external = 'Big5'
+      tempfile do |f|
+        f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+        erb = Tilt::ERBTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+        assert_equal 'Shift_JIS', erb.data.encoding.to_s
+        @hoge = "ほげ".encode('Shift_JIS')
+        assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+      end
+    end
+
+    test "producing magic comment encoded result string" do
+      Encoding.default_external = 'Big5'
+      tempfile do |f|
+        f.puts('<%# coding: Shift_JIS %>'.encode('Shift_JIS'))
+        f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+        erb = Tilt::ERBTemplate.new(f.path)
+        assert_equal 'Shift_JIS', erb.data.encoding.to_s
+        @hoge = "ほげ".encode('Shift_JIS')
+        assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+      end
+    end
   end
 end
 
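The ERB tests expect the result string to carry the template source's encoding. Ruby's standard `erb` library documents the same rule: output is in the input string's encoding unless a leading magic comment says otherwise. The following is a quick standalone check using `ERB#result_with_hash`, which assumes Ruby 2.5+.

```ruby
require 'erb'

# Shift_JIS template and Shift_JIS value: the result stays Shift_JIS.
src = 'ふが <%= hoge %>'.encode('Shift_JIS')
out = ERB.new(src).result_with_hash(hoge: 'ほげ'.encode('Shift_JIS'))
out.encoding         # => #<Encoding:Shift_JIS>
out.encode('UTF-8')  # => "ふが ほげ"

# A leading magic comment overrides the string's own encoding.
ERB.new('<%# coding: EUC-JP %>hello').result.encoding # => #<Encoding:EUC-JP>
```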
test/tilt_erubistemplate_test.rb
@@ -1,3 +1,4 @@
+# coding: utf-8
 require 'contest'
 require 'tilt'
 
@@ -135,6 +136,68 @@ class MockOutputVariableScope
       template = Tilt::ErubisTemplate.new(nil, options_hash) { |t| "Hello World!" }
       assert_equal({:escape_html => true}, options_hash)
     end
+
+    ##
+    # Encodings
+
+    if defined?(Encoding) && Encoding.respond_to?(:default_internal)
+      original_encoding = Encoding.default_external
+      setup do
+        Encoding.default_external = 'utf-8'
+        Encoding.default_internal = nil
+      end
+      teardown do
+        Encoding.default_external = original_encoding
+        Encoding.default_internal = nil
+      end
+
+      def tempfile(name='template')
+        f = Tempfile.open(name)
+        f.sync = true
+        yield f
+      ensure
+        f.close rescue nil
+        f.delete
+      end
+
+      test "producing default external encoded result string" do
+        Encoding.default_external = 'Shift_JIS'
+        tempfile do |f|
+          f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+          erb = Tilt::ErubisTemplate.new(f.path)
+          assert_equal 'Shift_JIS', erb.data.encoding.to_s
+          @hoge = "ほげ".encode('Shift_JIS')
+          assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+        end
+      end
+
+      test "producing default_encoding encoded result string" do
+        Encoding.default_external = 'Big5'
+        tempfile do |f|
+          f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+          erb = Tilt::ErubisTemplate.new(f.path, :default_encoding => 'Shift_JIS')
+          assert_equal 'Shift_JIS', erb.data.encoding.to_s
+          @hoge = "ほげ".encode('Shift_JIS')
+          assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+        end
+      end
+
+      # NOTE Erubis does not support ERB's magic comments.
+      # <%# coding: blah %> does not affect the template's encoding
+
+      # test "producing magic comment encoded result string" do
+      #   Encoding.default_external = 'Big5'
+      #   tempfile do |f|
+      #     f.puts('<%# coding: Shift_JIS %>'.encode('Shift_JIS'))
+      #     f.puts('ふが <%= @hoge %>'.encode('Shift_JIS'))
+      #     erb = Tilt::ErubisTemplate.new(f.path)
+      #     assert_equal 'Shift_JIS', erb.data.encoding.to_s
+      #     @hoge = "ほげ".encode('Shift_JIS')
+      #     assert_equal 'Shift_JIS', erb.render(self).encoding.to_s
+      #   end
+      # end
+    end
+
   end
 rescue LoadError => boom
   warn "Tilt::ErubisTemplate (disabled)"
test/tilt_template_test.rb
@@ -48,6 +48,54 @@ def prepare
     MockTemplate.new { |template| "Hello World!" }
   end
 
+  ##
+  # Encodings
+
+  if ''.respond_to?(:encoding)
+    original_encoding = Encoding.default_external
+
+    setup do
+      @file = Tempfile.open('template')
+      @file.puts "stuff"
+      @file.close
+      @template = @file.path
+    end
+
+    teardown do
+      Encoding.default_external = original_encoding
+      Encoding.default_internal = nil
+      @file.delete
+    end
+
+    test "reading from file assumes default external encoding" do
+      Encoding.default_external = 'Big5'
+      inst = MockTemplate.new(@template)
+      assert_equal 'Big5', inst.data.encoding.to_s
+    end
+
+    test "reading from file with a :default_encoding overrides default external" do
+      Encoding.default_external = 'Big5'
+      inst = MockTemplate.new(@template, :default_encoding => 'GBK')
+      assert_equal 'GBK', inst.data.encoding.to_s
+    end
+
+    test "reading from file with default_internal set does no transcoding" do
+      Encoding.default_internal = 'utf-8'
+      Encoding.default_external = 'Big5'
+      inst = MockTemplate.new(@template)
+      assert_equal 'Big5', inst.data.encoding.to_s
+    end
+
+    test "using provided template data verbatim when given as string" do
+      Encoding.default_internal = 'Big5'
+      inst = MockTemplate.new(@template) { "blah".force_encoding('GBK') }
+      assert_equal 'GBK', inst.data.encoding.to_s
+    end
+  end
+
+  ##
+  # Engine Initialization
+
   class InitializingMockTemplate < Tilt::Template
     @@initialized_count = 0
     def self.initialized_count
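The "does no transcoding" test matters because Ruby's text-mode file reads transcode from the external to the internal encoding whenever an internal encoding is set. Setting `Encoding.default_internal` is exactly what triggers this. The sketch below shows the difference. It uses an explicit `'Big5:UTF-8'` external:internal pair instead of changing the global default.

```ruby
require 'tempfile'

Tempfile.create('template') do |f|
  f.write('中文'.encode('Big5'))  # Big5 bytes on disk
  f.flush

  # Text-mode read with an internal encoding: bytes are transcoded to UTF-8.
  transcoded = File.read(f.path, encoding: 'Big5:UTF-8')

  # Raw read plus a label: bytes are left alone, which is what the test expects.
  labelled = File.binread(f.path).force_encoding('Big5')

  transcoded.encoding  # => #<Encoding:UTF-8>
  labelled.encoding    # => #<Encoding:Big5>
end
```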