Significantly improved internal encoding heuristics and support.

* Default Encoding.default_internal to UTF-8
* Eliminated the use of file-wide magic comments to coerce code evaluated inside the file
* Read templates as BINARY, use default_external or template-wide magic comments
  inside the Template to set the initial encoding
  * This means that template handlers in Ruby 1.9 will receive Strings encoded
    in default_internal (UTF-8 by default)
* Create a better Exception for encoding issues, and use it when the template
  source has bytes that are not compatible with the specified encoding
* Allow template handlers to opt into handling BINARY. If they do so, they
  need to do some of their own manual encoding work
* Added a "Configuration Gotchas" section to the intro Rails Guide instructing
  users to use UTF-8 for everything
* Use config.encoding= in Ruby 1.8, and raise if the value given is not a
  valid $KCODE value

Also:
* Fixed a few tests that were assert() rather than assert_equal() and
  were caught by Minitest requiring a String for the message
* Fixed a test where an assert_select was malformed, also caught by
  Minitest being more restrictive
* Fixed a test where a Rack response was returning a String rather
  than an Enumerable
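
A minimal sketch of the pipeline the message describes (read as BINARY, tag with the magic-comment or fallback encoding, validate, transcode), assuming Ruby 1.9 or later. The helper name `normalize_template_source` and the use of `ArgumentError` are assumptions made for illustration; the commit itself does this work inside `ActionView::Template#compile` and raises `WrongEncodingError`.

```ruby
# Hypothetical helper, not part of Rails: tag raw template bytes with an
# encoding, validate them, then transcode to the internal encoding.
def normalize_template_source(bytes, external = Encoding::UTF_8, internal = Encoding::UTF_8)
  source = bytes.dup.force_encoding(Encoding::BINARY)

  # Same pattern as ENCODING_FLAG in this commit. Only the comment is
  # removed; the newline stays, so line numbers do not shift.
  flag = /\A#.*coding[:=]\s*(\S+)[ \t]*/
  encoding = source.sub!(flag, '') ? $1 : external

  # Tag the bytes, then refuse to continue if they do not fit the tag.
  source.force_encoding(encoding)
  raise ArgumentError, "template is not valid #{source.encoding}" unless source.valid_encoding?

  source.encode(internal)
end

latin1 = "# encoding: ISO-8859-1\nhello \xFCmlat".b
result = normalize_template_source(latin1)
```

Here `result` is a UTF-8 String that begins with the blank line left behind by the stripped comment, which matches what `test_encoding_can_be_specified_with_magic_comment` in this commit expects.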
commit 64d109e3539ad600f58536d3ecabd2f87b67fd1c 1 parent af0d1a8
Yehuda Katz authored May 16, 2010
6  actionpack/lib/action_view.rb
@@ -51,7 +51,9 @@ module ActionView
 
     autoload :MissingTemplate,    'action_view/template/error'
     autoload :ActionViewError,    'action_view/template/error'
-    autoload :TemplateError,     'action_view/template/error'
+    autoload :EncodingError,      'action_view/template/error'
+    autoload :TemplateError,      'action_view/template/error'
+    autoload :WrongEncodingError, 'action_view/template/error'
 
     autoload :TemplateHandler,   'action_view/template'
     autoload :TemplateHandlers,  'action_view/template'
@@ -59,7 +61,7 @@ module ActionView
 
   autoload :TestCase, 'action_view/test_case'
 
-  ENCODING_FLAG = "#.*coding[:=]\s*(\S+)[ \t]*"
+  ENCODING_FLAG = '#.*coding[:=]\s*(\S+)[ \t]*'
 end
 
 require 'active_support/i18n'
201  actionpack/lib/action_view/template.rb
@@ -1,12 +1,89 @@
-# encoding: utf-8
-# This is so that templates compiled in this file are UTF-8
 require 'active_support/core_ext/array/wrap'
 require 'active_support/core_ext/object/blank'
+require 'active_support/core_ext/kernel/singleton_class'
 
 module ActionView
   class Template
     extend ActiveSupport::Autoload
 
+    # === Encodings in ActionView::Template
+    #
+    # ActionView::Template is one of a few sources of potential
+    # encoding issues in Rails. This is because the source for
+    # templates is usually read from disk, and Ruby (like most
+    # encoding-aware programming languages) assumes that the
+    # String retrieved through File IO is encoded in the
+    # <tt>default_external</tt> encoding. In Rails, the default
+    # <tt>default_external</tt> encoding is UTF-8.
+    #
+    # As a result, if a user saves their template as ISO-8859-1
+    # (for instance, using a non-Unicode-aware text editor),
+    # and uses characters outside of the ASCII range, their
+    # users will see diamonds with question marks in them in
+    # the browser.
+    #
+    # To mitigate this problem, we use a few strategies:
+    # 1. If the source is not valid UTF-8, we raise an exception
+    #    when the template is compiled to alert the user
+    #    to the problem.
+    # 2. The user can specify the encoding using Ruby-style
+    #    encoding comments in any template engine. If such
+    #    a comment is supplied, Rails will apply that encoding
+    #    to the resulting compiled source returned by the
+    #    template handler.
+    # 3. In all cases, we transcode the resulting String to
+    #    the <tt>default_internal</tt> encoding (which defaults
+    #    to UTF-8).
+    #
+    # This means that other parts of Rails can always assume
+    # that templates are encoded in UTF-8, even if the original
+    # source of the template was not UTF-8.
+    #
+    # From a user's perspective, the easiest thing to do is
+    # to save your templates as UTF-8. If you do this, you
+    # do not need to do anything else for things to "just work".
+    #
+    # === Instructions for template handlers
+    #
+    # The easiest thing for you to do is to simply ignore
+    # encodings. Rails will hand you the template source
+    # as the default_internal (generally UTF-8), raising
+    # an exception for the user before sending the template
+    # to you if it could not determine the original encoding.
+    #
+    # For the greatest simplicity, you can support only
+    # UTF-8 as the <tt>default_internal</tt>. This means
+    # that from the perspective of your handler, the
+    # entire pipeline is just UTF-8.
+    #
+    # === Advanced: Handlers with alternate metadata sources
+    #
+    # If you want to provide an alternate mechanism for
+    # specifying encodings (like ERB does via <%# encoding: ... %>),
+    # you may indicate that you are willing to accept
+    # BINARY data by implementing <tt>self.accepts_binary?</tt>
+    # on your handler.
+    #
+    # If you do, Rails will not raise an exception if
+    # the template's encoding could not be determined,
+    # assuming that you have another mechanism for
+    # making the determination.
+    #
+    # In this case, make sure you return a String from
+    # your handler encoded in the default_internal. Since
+    # you are handling out-of-band metadata, you are
+    # also responsible for alerting the user to any
+    # problems with converting the user's data to
+    # the default_internal.
+    #
+    # To do so, simply raise WrongEncodingError
+    # as follows:
+    #
+    #     raise WrongEncodingError.new(
+    #       problematic_string,
+    #       expected_encoding
+    #     )
+
     eager_autoload do
       autoload :Error
       autoload :Handler
@@ -16,26 +93,22 @@ class Template
 
     extend Template::Handlers
 
-    attr_reader :source, :identifier, :handler, :virtual_path, :formats
+    attr_reader :source, :identifier, :handler, :virtual_path, :formats,
+                :original_encoding
 
-    Finalizer = proc do |method_name|
+    Finalizer = proc do |method_name, mod|
       proc do
-        ActionView::CompiledTemplates.module_eval do
+        mod.module_eval do
           remove_possible_method method_name
         end
       end
     end
 
     def initialize(source, identifier, handler, details)
-      if source.encoding_aware? && source =~ %r{\A#{ENCODING_FLAG}}
-        # don't snip off the \n to preserve line numbers
-        source.sub!(/\A[^\n]*/, '')
-        source.force_encoding($1).encode
-      end
-
-      @source     = source
-      @identifier = identifier
-      @handler    = handler
+      @source             = source
+      @identifier         = identifier
+      @handler            = handler
+      @original_encoding  = nil
 
       @virtual_path = details[:virtual_path]
       @method_names = {}
@@ -48,7 +121,13 @@ def render(view, locals, &block)
       # Notice that we use a bang in this instrumentation because you don't want to
       # consume this in production. This is only slow if it's being listened to.
       ActiveSupport::Notifications.instrument("!render_template.action_view", :virtual_path => @virtual_path) do
-        method_name = compile(locals, view)
+        if view.is_a?(ActionView::CompiledTemplates)
+          mod = ActionView::CompiledTemplates
+        else
+          mod = view.singleton_class
+        end
+
+        method_name = compile(locals, view, mod)
         view.send(method_name, locals, &block)
       end
     rescue Exception => e
@@ -56,7 +135,7 @@ def render(view, locals, &block)
         e.sub_template_of(self)
         raise e
       else
-        raise Template::Error.new(self, view.assigns, e)
+        raise Template::Error.new(self, view.respond_to?(:assigns) ? view.assigns : {}, e)
       end
     end
 
@@ -81,37 +160,97 @@ def inspect
     end
 
     private
-      def compile(locals, view)
+      # Among other things, this method is responsible for properly setting
+      # the encoding of the source. Until this point, we assume that the
+      # source is BINARY data. If no additional information is supplied,
+      # we assume the encoding is the same as Encoding.default_external.
+      #
+      # The user can also specify the encoding via a comment on the first
+      # line of the template (# encoding: NAME-OF-ENCODING). This will work
+      # with any template engine, as we process out the encoding comment
+      # before passing the source on to the template engine, leaving a
+      # blank line in its stead.
+      #
+      # Note that after we figure out the correct encoding, we then
+      # encode the source into Encoding.default_internal. In general,
+      # this means that templates will be UTF-8 inside of Rails,
+      # regardless of the original source encoding.
+      def compile(locals, view, mod)
         method_name = build_method_name(locals)
         return method_name if view.respond_to?(method_name)
 
         locals_code = locals.keys.map! { |key| "#{key} = local_assigns[:#{key}];" }.join
 
-        code = @handler.call(self)
-        if code.sub!(/\A(#.*coding.*)\n/, '')
-          encoding_comment = $1
-        elsif defined?(Encoding) && Encoding.respond_to?(:default_external)
-          encoding_comment = "#coding:#{Encoding.default_external}"
+        if source.encoding_aware?
+          if source.sub!(/\A#{ENCODING_FLAG}/, '')
+            encoding = $1
+          else
+            encoding = Encoding.default_external
+          end
+
+          # Tag the source with the default external encoding
+          # or the encoding specified in the file
+          source.force_encoding(encoding)
+
+          # If the original encoding is BINARY, the actual
+          # encoding is either stored out-of-band (such as
+          # in ERB <%# %> style magic comments) or missing.
+          # This is also true if the original encoding is
+          # something other than BINARY, but it's invalid.
+          if source.encoding != Encoding::BINARY && source.valid_encoding?
+            source.encode!
+          # If the assumed encoding is incorrect, check to
+          # see whether the handler accepts BINARY. If it
+          # does, it has another mechanism for determining
+          # the true encoding of the String.
+          elsif @handler.respond_to?(:accepts_binary?) && @handler.accepts_binary?
+            source.force_encoding(Encoding::BINARY)
+          # If the handler does not accept BINARY, the
+          # assumed encoding (either the default_external,
+          # or the explicit encoding specified by the user)
+          # is incorrect. We raise an exception here.
+          else
+            raise WrongEncodingError.new(source, encoding)
+          end
+
+          # Don't validate the encoding yet -- the handler
+          # may treat the String as raw bytes and extract
+          # the encoding some other way
         end
 
+        code = @handler.call(self)
+
         source = <<-end_src
           def #{method_name}(local_assigns)
-            _old_virtual_path, @_virtual_path = @_virtual_path, #{@virtual_path.inspect};_old_output_buffer = output_buffer;#{locals_code};#{code}
+            _old_virtual_path, @_virtual_path = @_virtual_path, #{@virtual_path.inspect};_old_output_buffer = @output_buffer;#{locals_code};#{code}
           ensure
-            @_virtual_path, self.output_buffer = _old_virtual_path, _old_output_buffer
+            @_virtual_path, @output_buffer = _old_virtual_path, _old_output_buffer
           end
         end_src
 
-        if encoding_comment
-          source = "#{encoding_comment}\n#{source}"
-          line = -1
-        else
-          line = 0
+        if source.encoding_aware?
+          # Handlers should return their source Strings in either the
+          # default_internal or BINARY. If the handler returns a BINARY
+          # String, we assume its encoding is the one we determined
+          # earlier, and encode the resulting source in the default_internal.
+          if source.encoding == Encoding::BINARY
+            source.force_encoding(Encoding.default_internal)
+          end
+
+          # In case we get back a String from a handler that is not in
+          # BINARY or the default_internal, encode it to the default_internal
+          source.encode!
+
+          # Now, validate that the source we got back from the template
+          # handler is valid in the default_internal
+          unless source.valid_encoding?
+            raise WrongEncodingError.new(@source, Encoding.default_internal)
+          end
         end
 
         begin
-          ActionView::CompiledTemplates.module_eval(source, identifier, line)
-          ObjectSpace.define_finalizer(self, Finalizer[method_name])
+          mod.module_eval(source, identifier, 0)
+          ObjectSpace.define_finalizer(self, Finalizer[method_name, mod])
 
           method_name
         rescue Exception => e # errors from template code
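
The three-way branch at the top of the new `compile` can be read in isolation. The following is a rough standalone sketch, not the Rails implementation: `prepare_source`, `PlainHandler`, `BinaryAwareHandler`, and `SketchWrongEncodingError` are hypothetical names, and UTF-8 stands in for `Encoding.default_internal`.

```ruby
# Stand-in for ActionView::WrongEncodingError so the sketch runs without Rails.
class SketchWrongEncodingError < StandardError; end

# Hypothetical handlers: one ignores encodings, one opts into raw bytes.
class PlainHandler; end

class BinaryAwareHandler
  def self.accepts_binary?
    true
  end
end

def prepare_source(bytes, encoding, handler)
  source = bytes.dup.force_encoding(encoding)

  if source.encoding != Encoding::BINARY && source.valid_encoding?
    # The assumed encoding fits: transcode to the internal encoding.
    source.encode(Encoding::UTF_8)
  elsif handler.respond_to?(:accepts_binary?) && handler.accepts_binary?
    # The handler has its own way to find the encoding: hand over raw bytes.
    source.force_encoding(Encoding::BINARY)
  else
    raise SketchWrongEncodingError, "template is not valid #{encoding}"
  end
end

bytes = "hello \xFCmlat".b   # ISO-8859-1 bytes, not valid UTF-8
```

With these bytes, an ISO-8859-1 tag yields a UTF-8 String, a UTF-8 tag plus `BinaryAwareHandler` yields the untouched BINARY String, and a UTF-8 tag plus `PlainHandler` raises.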
18  actionpack/lib/action_view/template/error.rb
@@ -4,6 +4,24 @@ module ActionView
   class ActionViewError < StandardError #:nodoc:
   end
 
+  class EncodingError < StandardError #:nodoc:
+  end
+
+  class WrongEncodingError < EncodingError #:nodoc:
+    def initialize(string, encoding)
+      @string, @encoding = string, encoding
+    end
+
+    def message
+      "Your template was not saved as valid #{@encoding}. Please " \
+      "either specify #{@encoding} as the encoding for your template " \
+      "in your text editor, or mark the template with its " \
+      "encoding by inserting the following as the first line " \
+      "of the template:\n\n# encoding: <name of correct encoding>.\n\n" \
+      "The source of your template was:\n\n#{@string}"
+    end
+  end
+
   class MissingTemplate < ActionViewError #:nodoc:
     attr_reader :path
 
45  actionpack/lib/action_view/template/handlers/erb.rb
@@ -5,6 +5,11 @@
 
 module ActionView
   class OutputBuffer < ActiveSupport::SafeBuffer
+    def initialize(*)
+      super
+      encode!
+    end
+
     def <<(value)
       super(value.to_s)
     end
@@ -72,16 +77,50 @@ class ERB < Handler
         cattr_accessor :erb_implementation
         self.erb_implementation = Erubis
 
-        ENCODING_TAG = Regexp.new("\A(<%#{ENCODING_FLAG}-?%>)[ \t]*")
+        ENCODING_TAG = Regexp.new("\\A(<%#{ENCODING_FLAG}-?%>)[ \\t]*")
+
+        def self.accepts_binary?
+          true
+        end
 
         def compile(template)
-          erb = template.source.gsub(ENCODING_TAG, '')
+          if template.source.encoding_aware?
+            # Even though Rails has given us a String tagged with the
+            # default_internal encoding (likely UTF-8), it is possible
+            # that the String is actually encoded using a different
+            # encoding, specified via an ERB magic comment. If the
+            # String is not actually UTF-8, the regular expression
+            # engine will (correctly) raise an exception. For now,
+            # we'll reset the String to BINARY so we can run regular
+            # expressions against it
+            template_source = template.source.dup.force_encoding("BINARY")
+
+            # Erubis does not have direct support for encodings.
+            # As a result, we will extract the ERB-style magic
+            # comment, give the String to Erubis as BINARY data,
+            # and then tag the resulting String with the extracted
+            # encoding later
+            erb = template_source.gsub(ENCODING_TAG, '')
+            encoding = $2
+
+            if !encoding && (template.source.encoding == Encoding::BINARY)
+              raise WrongEncodingError.new(template_source, Encoding.default_external)
+            end
+          end
+
           result = self.class.erb_implementation.new(
             erb,
             :trim => (self.class.erb_trim_mode == "-")
           ).src
 
-          result = "#{$2}\n#{result}" if $2
+          # If an encoding tag was found, tag the String
+          # we're returning with that encoding. Otherwise,
+          # return a BINARY String, which is what ERB
+          # returns. Note that if a magic comment was
+          # not specified, we will return the data to
+          # Rails as BINARY, which will then use its
+          # own encoding logic to create a UTF-8 String.
+          result = "\n#{result}".force_encoding(encoding).encode if encoding
           result
         end
       end
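
To illustrate the ERB-side extraction, here is a small sketch that pulls an `<%# encoding: ... %>` tag out of a BINARY String. `extract_erb_encoding` is a hypothetical helper written for this note; the pattern is the expanded form of `ENCODING_TAG` above, and Erubis is left out entirely.

```ruby
# Hypothetical helper: strip a leading ERB magic comment and report the
# encoding it names. Returns [source_without_tag, encoding_name_or_nil].
def extract_erb_encoding(template_source)
  # Work on raw bytes so the regexp engine never sees an invalid sequence.
  binary = template_source.dup.force_encoding(Encoding::BINARY)

  tag = /\A(<%#.*coding[:=]\s*(\S+)[ \t]*-?%>)[ \t]*/
  encoding = nil
  erb = binary.sub(tag) { encoding = $2; '' }

  [erb, encoding]
end

erb, encoding = extract_erb_encoding("<%# encoding: ISO-8859-1 %>hello \xFCmlat")
utf8 = erb.dup.force_encoding(encoding).encode(Encoding::UTF_8)
```

When no tag is present, `encoding` stays `nil` and the source comes back unchanged as BINARY, which is the case where the handler in this commit raises `WrongEncodingError`.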
5  actionpack/lib/action_view/template/resolver.rb
@@ -70,7 +70,10 @@ def query(path, exts, formats)
 
       Dir[query].reject { |p| File.directory?(p) }.map do |p|
         handler, format = extract_handler_and_format(p, formats)
-        Template.new(File.read(p), File.expand_path(p), handler,
+
+        contents = File.open(p, "rb") {|io| io.read }
+
+        Template.new(contents, File.expand_path(p), handler,
           :virtual_path => path, :format => format)
       end
     end
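
A short sketch of why the resolver now opens files with `"rb"`: the bytes come back tagged as BINARY (ASCII-8BIT), so nothing is assumed about their encoding until the template is compiled. The temporary file and the helper name `read_template_bytes` are assumptions for this example only.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the resolver change: read raw bytes and
# make no assumption about their encoding.
def read_template_bytes(path)
  File.open(path, 'rb') { |io| io.read }
end

file = Tempfile.new('template')
file.binmode
file.write("hello \xFCmlat".b)   # ISO-8859-1 bytes for "hello ümlat"
file.close

contents = read_template_bytes(file.path)

# Tagging and transcoding happen later, once the real encoding is known.
tagged = contents.dup.force_encoding('ISO-8859-1').encode('UTF-8')
file.unlink
```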
4  actionpack/test/abstract_unit.rb
@@ -12,6 +12,10 @@
 
 ENV['TMPDIR'] = File.join(File.dirname(__FILE__), 'tmp')
 
+if defined?(Encoding.default_internal)
+  Encoding.default_internal = "UTF-8"
+end
+
 require 'test/unit'
 require 'abstract_controller'
 require 'action_controller'
4  actionpack/test/controller/assert_select_test.rb
@@ -212,12 +212,12 @@ def test_assert_select_text_match
       assert_nothing_raised    { assert_select "div", "bar" }
       assert_nothing_raised    { assert_select "div", /\w*/ }
       assert_nothing_raised    { assert_select "div", :text => /\w*/, :count=>2 }
-      assert_raise(Assertion) { assert_select "div", :text=>"foo", :count=>2 }
+      assert_raise(Assertion)  { assert_select "div", :text=>"foo", :count=>2 }
       assert_nothing_raised    { assert_select "div", :html=>"<span>bar</span>" }
       assert_nothing_raised    { assert_select "div", :html=>"<span>bar</span>" }
       assert_nothing_raised    { assert_select "div", :html=>/\w*/ }
       assert_nothing_raised    { assert_select "div", :html=>/\w*/, :count=>2 }
-      assert_raise(Assertion) { assert_select "div", :html=>"<span>foo</span>", :count=>2 }
+      assert_raise(Assertion)  { assert_select "div", :html=>"<span>foo</span>", :count=>2 }
     end
   end
 
2  actionpack/test/controller/capture_test.rb
@@ -68,6 +68,6 @@ def test_proper_block_detection
 
   private
     def expected_content_for_output
-      "<title>Putting stuff in the title!</title>\n\nGreat stuff!"
+      "<title>Putting stuff in the title!</title>\nGreat stuff!"
     end
 end
4  actionpack/test/controller/render_test.rb
@@ -1079,7 +1079,7 @@ def test_rendering_with_conflicting_local_vars
 
   def test_action_talk_to_layout
     get :action_talk_to_layout
-    assert_equal "<title>Talking to the layout</title>\n\nAction was here!", @response.body
+    assert_equal "<title>Talking to the layout</title>\nAction was here!", @response.body
   end
 
   # :addressed:
@@ -1096,7 +1096,7 @@ def test_template_with_locals
 
   def test_yield_content_for
     assert_not_deprecated { get :yield_content_for }
-    assert_equal "<title>Putting stuff in the title!</title>\n\nGreat stuff!\n", @response.body
+    assert_equal "<title>Putting stuff in the title!</title>\nGreat stuff!\n", @response.body
   end
 
   def test_overwritting_rendering_relative_file_with_extension
3  actionpack/test/fixtures/test/content_for.erb
@@ -1,2 +1 @@
-<% content_for :title do %>Putting stuff in the title!<% end %>
-Great stuff!
+<% content_for :title do -%>Putting stuff in the title!<% end -%>Great stuff!
2  actionpack/test/fixtures/test/content_for_concatenated.erb
@@ -1,3 +1,3 @@
 <% content_for :title, "Putting stuff "
-   content_for :title, "in the title!" %>
+   content_for :title, "in the title!" -%>
 Great stuff!
2  actionpack/test/fixtures/test/content_for_with_parameter.erb
@@ -1,2 +1,2 @@
-<% content_for :title, "Putting stuff in the title!" %>
+<% content_for :title, "Putting stuff in the title!" -%>
 Great stuff!
2  actionpack/test/fixtures/test/non_erb_block_content_for.builder
@@ -1,4 +1,4 @@
 content_for :title do
   'Putting stuff in the title!'
 end
-xml << "\nGreat stuff!"
+xml << "Great stuff!"
10  actionpack/test/template/render_test.rb
@@ -232,13 +232,13 @@ def test_render_with_layout
   # TODO: Move to deprecated_tests.rb
   def test_render_with_nested_layout_deprecated
     assert_deprecated do
-      assert_equal %(<title>title</title>\n\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
+      assert_equal %(<title>title</title>\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
         @view.render(:file => "test/deprecated_nested_layout.erb", :layout => "layouts/yield")
     end
   end
 
   def test_render_with_nested_layout
-    assert_equal %(<title>title</title>\n\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
+    assert_equal %(<title>title</title>\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
       @view.render(:file => "test/nested_layout.erb", :layout => "layouts/yield")
   end
 
@@ -284,7 +284,7 @@ def test_render_utf8_template_with_magic_comment
       with_external_encoding Encoding::ASCII_8BIT do
         result = @view.render(:file => "test/utf8_magic.html.erb", :layouts => "layouts/yield")
         assert_equal Encoding::UTF_8, result.encoding
-        assert_equal "Русский текст\n\nUTF-8\nUTF-8\nUTF-8\n", result
+        assert_equal "\nРусский \nтекст\n\nUTF-8\nUTF-8\nUTF-8\n", result
       end
     end
 
@@ -302,7 +302,7 @@ def test_render_utf8_template_with_incompatible_external_encoding
           result = @view.render(:file => "test/utf8.html.erb", :layouts => "layouts/yield")
           flunk 'Should have raised incompatible encoding error'
         rescue ActionView::Template::Error => error
-          assert_match 'invalid byte sequence in Shift_JIS', error.original_exception.message
+          assert_match 'Your template was not saved as valid Shift_JIS', error.original_exception.message
         end
       end
     end
@@ -313,7 +313,7 @@ def test_render_utf8_template_with_partial_with_incompatible_encoding
           result = @view.render(:file => "test/utf8_magic_with_bare_partial.html.erb", :layouts => "layouts/yield")
           flunk 'Should have raised incompatible encoding error'
         rescue ActionView::Template::Error => error
-          assert_match 'invalid byte sequence in Shift_JIS', error.original_exception.message
+          assert_match 'Your template was not saved as valid Shift_JIS', error.original_exception.message
         end
       end
     end
128  actionpack/test/template/template_test.rb
@@ -0,0 +1,128 @@
+require "abstract_unit"
+
+# These are the normal settings that will be set up by Railties
+# TODO: Have these tests support other combinations of these values
+Encoding.default_internal = "UTF-8"
+Encoding.default_external = "UTF-8"
+
+class TestERBTemplate < ActiveSupport::TestCase
+  ERBHandler = ActionView::Template::Handlers::ERB
+
+  class Context
+    def initialize
+      @output_buffer = "original"
+    end
+
+    def hello
+      "Hello"
+    end
+
+    def partial
+      ActionView::Template.new(
+        "<%= @_virtual_path %>",
+        "partial",
+        ERBHandler,
+        :virtual_path => "partial"
+      )
+    end
+
+    def logger
+      require "logger"
+      Logger.new(STDERR)
+    end
+
+    def my_buffer
+      @output_buffer
+    end
+  end
+
+  def new_template(body = "<%= hello %>", handler = ERBHandler, details = {})
+    ActionView::Template.new(body, "hello template", ERBHandler, {:virtual_path => "hello"})
+  end
+
+  def render(locals = {})
+    @template.render(@obj, locals)
+  end
+
+  def setup
+    @obj = Context.new
+  end
+
+  def test_basic_template
+    @template = new_template
+    assert_equal "Hello", render
+  end
+
+  def test_locals
+    @template = new_template("<%= my_local %>")
+    assert_equal "I'm a local", render(:my_local => "I'm a local")
+  end
+
+  def test_restores_buffer
+    @template = new_template
+    assert_equal "Hello", render
+    assert_equal "original", @obj.my_buffer
+  end
+
+  def test_virtual_path
+    @template = new_template("<%= @_virtual_path %>" \
+                             "<%= partial.render(self, {}) %>" \
+                             "<%= @_virtual_path %>")
+    assert_equal "hellopartialhello", render
+  end
+
+  if "ruby".encoding_aware?
+    def test_resulting_string_is_utf8
+      @template = new_template
+      assert_equal Encoding::UTF_8, render.encoding
+    end
+
+    def test_no_magic_comment_word_with_utf_8
+      @template = new_template("hello \u{fc}mlat")
+      assert_equal Encoding::UTF_8, render.encoding
+      assert_equal "hello \u{fc}mlat", render
+    end
+
+    # This test ensures that if the default_external
+    # is set to something other than UTF-8, we don't
+    # get any errors and get back a UTF-8 String.
+    def test_default_external_works
+      Encoding.default_external = "ISO-8859-1"
+      @template = new_template("hello \xFCmlat")
+      assert_equal Encoding::UTF_8, render.encoding
+      assert_equal "hello \u{fc}mlat", render
+    ensure
+      Encoding.default_external = "UTF-8"
+    end
+
+    def test_encoding_can_be_specified_with_magic_comment
+      @template = new_template("# encoding: ISO-8859-1\nhello \xFCmlat")
+      assert_equal Encoding::UTF_8, render.encoding
+      assert_equal "\nhello \u{fc}mlat", render
+    end
+
+    # TODO: This is currently handled inside ERB. The case of explicitly
+    # lying about encodings via the normal Rails API should be handled
+    # inside Rails.
+    def test_lying_with_magic_comment
+      assert_raises(ActionView::Template::Error) do
+        @template = new_template("# encoding: UTF-8\nhello \xFCmlat")
+        render
+      end
+    end
+
+    def test_encoding_can_be_specified_with_magic_comment_in_erb
+      @template = new_template("<%# encoding: ISO-8859-1 %>hello \xFCmlat")
+      result = render
+      assert_equal Encoding::UTF_8, render.encoding
+      assert_equal "hello \u{fc}mlat", render
+    end
+
+    def test_error_when_template_isnt_valid_utf8
+      assert_raises(ActionView::Template::Error, /\xFC/) do
+        @template = new_template("hello \xFCmlat")
+        render
+      end
+    end
+  end
+end
21  railties/guides/source/getting_started.textile
@@ -1462,11 +1462,32 @@ Rails also comes with built-in help that you can generate using the rake command
 * Running +rake doc:guides+ will put a full copy of the Rails Guides in the +doc/guides+ folder of your application. Open +doc/guides/index.html+ in your web browser to explore the Guides.
 * Running +rake doc:rails+ will put a full copy of the API documentation for Rails in the +doc/api+ folder of your application. Open +doc/api/index.html+ in your web browser to explore the API documentation.
 
+h3. Configuration Gotchas
+
+The easiest way to work with Rails is to store all external data as UTF-8. If you don't, Ruby libraries and Rails will often be able to convert your native data into UTF-8, but this doesn't always work reliably, so you're better off ensuring that all external data is UTF-8.
+
+If you have made a mistake in this area, the most common symptom is a black diamond with a question mark inside appearing in the browser. Another common symptom is characters like "Ã¼" appearing instead of "ü". Rails takes a number of internal steps to mitigate common causes of these problems that can be automatically detected and corrected. However, if you have external data that is not stored as UTF-8, it can occasionally result in these kinds of issues that cannot be automatically detected by Rails and corrected.
+
+Two very common sources of data that are not UTF-8:
+* Your text editor: Most text editors (such as TextMate) default to saving files as
+  UTF-8. If your text editor does not, this can result in special characters that you
+  enter in your templates (such as é) to appear as a diamond with a question mark inside
+  in the browser. This also applies to your I18N translation files.
+  Most editors that do not already default to UTF-8 (such as some versions of
+  Dreamweaver) offer a way to change the default to UTF-8. Do so.
+* Your database. Rails defaults to converting data from your database into UTF-8 at
+  the boundary. However, if your database is not using UTF-8 internally, it may not
+  be able to store all characters that your users enter. For instance, if your database
+  is using Latin-1 internally, and your user enters a Russian, Hebrew, or Japanese
+  character, the data will be lost forever once it enters the database. If possible,
+  use UTF-8 as the internal storage of your database.
 
 h3. Changelog
 
 "Lighthouse ticket":http://rails.lighthouseapp.com/projects/16213-rails-guides/tickets/2
 
+* May 16, 2010: Added a section on configuration gotchas to address common encoding
+  problems that people might have
 * April 30, 2010: Fixes, editing and updating of code samples by "Rohit Arondekar":http://rohitarondekar.com
 * April 25, 2010: Couple of more minor fixups "Mikel Lindsaar":credits:html#raasdnil
 * April 1, 2010: Fixed document to validate XHTML 1.0 Strict. "Jaime Iniesta":http://jaimeiniesta.com
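
The two symptoms the new guide section describes can be reproduced in a few lines of plain Ruby. This is an illustrative sketch only, assuming Ruby 2.1 or later for `String#scrub`; the variable names are made up for the example.

```ruby
# Symptom 1: UTF-8 bytes misread as ISO-8859-1. "ü" is the two bytes
# C3 BC in UTF-8; read as Latin-1 they become two separate characters.
utf8_bytes = "\u00FC".b
misread    = utf8_bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')

# Symptom 2: ISO-8859-1 bytes misread as UTF-8. The single byte FC is
# not valid UTF-8, so it ends up as the replacement character U+FFFD,
# which browsers draw as a diamond with a question mark.
latin1_bytes = "\xFC".b
replaced     = latin1_bytes.dup.force_encoding('UTF-8').scrub
```

Here `misread` holds the two characters U+00C3 and U+00BC, and `replaced` holds a single U+FFFD.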
1  railties/lib/rails.rb
@@ -23,6 +23,7 @@
   $KCODE='u'
 else
   Encoding.default_external = Encoding::UTF_8
+  Encoding.default_internal = Encoding::UTF_8
 end
 
 module Rails
10  railties/lib/rails/application/configuration.rb
@@ -1,4 +1,5 @@
 require 'active_support/deprecation'
+require 'active_support/core_ext/string/encoding'
 require 'rails/engine/configuration'
 
 module Rails
@@ -27,8 +28,15 @@ def initialize(*)
 
       def encoding=(value)
         @encoding = value
-        if defined?(Encoding) && Encoding.respond_to?(:default_external=)
+        if "ruby".encoding_aware?
           Encoding.default_external = value
+          Encoding.default_internal = value
+        else
+          $KCODE = value
+          if $KCODE == "NONE"
+            raise "The value you specified for config.encoding is " \
+                  "invalid. The possible values are UTF8, SJIS, or EUC"
+          end
         end
       end
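
The explicit check above only covers Ruby 1.8, where an unrecognised `$KCODE` value silently becomes `"NONE"`. On encoding-aware Rubies an unknown name is rejected by Ruby itself. A tiny sketch of that behaviour, using `Encoding.find` so that no global default is changed; `valid_encoding_name?` is a hypothetical helper, not Rails code.

```ruby
# Hypothetical check, not part of Rails: Encoding.find raises
# ArgumentError for names Ruby does not know.
def valid_encoding_name?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

known   = valid_encoding_name?('utf-8')
unknown = valid_encoding_name?('not-a-real-encoding')
```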
 
3  railties/test/application/configuration_test.rb
@@ -180,7 +180,8 @@ def teardown
       require "#{app_path}/config/application"
 
       unless RUBY_VERSION < '1.9'
-        assert_equal Encoding.find("utf-8"), Encoding.default_external
+        assert_equal Encoding::UTF_8, Encoding.default_external
+        assert_equal Encoding::UTF_8, Encoding.default_internal
       end
     end
 

5 notes on commit 64d109e

Jeremy Kemper

It begins!

Xavier Noria

Keeping guides up to date in master commits. That's great we need more commits like this.

Yaroslav Markin

N-n-nice! Thanks

James Healy
yob commented on 64d109e May 16, 2010

very nice!

Jose Angel Cortinas

Wow Yehuda, you are the man. Amazing commit!

Prajna Zhang
Prajna commented on 64d109e May 18, 2010

very nice!

Micke Lisinge

Nice!
