Significantly improved internal encoding heuristics and support.

* Default Encoding.default_internal to UTF-8
* Eliminated the use of file-wide magic comments to coerce code evaluated inside the file
* Read templates as BINARY, use default_external or template-wide magic comments
  inside the Template to set the initial encoding
  * This means that template handlers in Ruby 1.9 will receive Strings encoded
    in default_internal (UTF-8 by default)
* Create a better Exception for encoding issues, and use it when the template
  source has bytes that are not compatible with the specified encoding
* Allow template handlers to opt into handling BINARY. If they do so, they
  need to do some of their own manual encoding work
* Added a "Configuration Gotchas" section to the intro Rails Guide instructing
  users to use UTF-8 for everything
* Use config.encoding= in Ruby 1.8, and raise if a value that is an invalid
  $KCODE value is used

Also:
* Fixed a few tests that were assert() rather than assert_equal() and
  were caught by Minitest requiring a String for the message
* Fixed a test where an assert_select was malformed, also caught by
  Minitest being more restrictive
* Fixed a test where a Rack response was returning a String rather
  than an Enumerable
1 parent af0d1a8 commit 64d109e3539ad600f58536d3ecabd2f87b67fd1c @wycats wycats committed May 16, 2010
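The template-encoding steps listed in the commit message (read as BINARY, honor a first-line magic comment or fall back to the default external encoding, validate, then transcode) can be sketched in plain Ruby. This is a minimal illustration, not the code in this commit: `normalize_template_source` is a hypothetical helper, UTF-8 is assumed as the internal encoding, and a plain `ArgumentError` stands in for the `WrongEncodingError` that Rails raises.

```ruby
# Same pattern as ActionView::ENCODING_FLAG in the diff below.
ENCODING_FLAG = '#.*coding[:=]\s*(\S+)[ \t]*'

# Hypothetical helper (not part of Rails): takes raw template bytes and
# returns a UTF-8 String, or raises if the bytes do not match the
# declared or assumed encoding.
def normalize_template_source(bytes, default_external = Encoding::UTF_8)
  source = bytes.dup.force_encoding(Encoding::BINARY)

  # A leading "# encoding: NAME" comment wins over the default. Only the
  # comment text is removed; the newline stays so line numbers are kept.
  encoding = default_external
  encoding = $1 if source.sub!(/\A#{ENCODING_FLAG}/, '')

  # Tag the bytes with the chosen encoding without changing them.
  source.force_encoding(encoding)

  unless source.valid_encoding?
    raise ArgumentError, "template was not saved as valid #{encoding}"
  end

  # Transcode to the internal encoding (assumed to be UTF-8 here).
  source.encode(Encoding::UTF_8)
end

latin1 = "# encoding: ISO-8859-1\nhello \xFCmlat".b
puts normalize_template_source(latin1).inspect
```

The sketch leaves out the handler opt-in (`accepts_binary?`) branch. In the commit, that branch hands the bytes to the handler as BINARY instead of raising.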
6 actionpack/lib/action_view.rb
@@ -51,15 +51,17 @@ module ActionView
autoload :MissingTemplate, 'action_view/template/error'
autoload :ActionViewError, 'action_view/template/error'
- autoload :TemplateError, 'action_view/template/error'
+ autoload :EncodingError, 'action_view/template/error'
+ autoload :TemplateError, 'action_view/template/error'
+ autoload :WrongEncodingError, 'action_view/template/error'
autoload :TemplateHandler, 'action_view/template'
autoload :TemplateHandlers, 'action_view/template'
end
autoload :TestCase, 'action_view/test_case'
- ENCODING_FLAG = "#.*coding[:=]\s*(\S+)[ \t]*"
+ ENCODING_FLAG = '#.*coding[:=]\s*(\S+)[ \t]*'
end
require 'active_support/i18n'
201 actionpack/lib/action_view/template.rb
@@ -1,12 +1,89 @@
-# encoding: utf-8
-# This is so that templates compiled in this file are UTF-8
require 'active_support/core_ext/array/wrap'
require 'active_support/core_ext/object/blank'
+require 'active_support/core_ext/kernel/singleton_class'
module ActionView
class Template
extend ActiveSupport::Autoload
+ # === Encodings in ActionView::Template
+ #
+ # ActionView::Template is one of a few sources of potential
+ # encoding issues in Rails. This is because the source for
+ # templates is usually read from disk, and Ruby (like most
+ # encoding-aware programming languages) assumes that the
+ # String retrieved through File IO is encoded in the
+ # <tt>default_external</tt> encoding. In Rails, the default
+ # <tt>default_external</tt> encoding is UTF-8.
+ #
+ # As a result, if a user saves their template as ISO-8859-1
+ # (for instance, using a non-Unicode-aware text editor),
+ # and uses characters outside of the ASCII range, their
+ # users will see diamonds with question marks in them in
+ # the browser.
+ #
+ # To mitigate this problem, we use a few strategies:
+ # 1. If the source is not valid UTF-8, we raise an exception
+ # when the template is compiled to alert the user
+ # to the problem.
+ # 2. The user can specify the encoding using Ruby-style
+ # encoding comments in any template engine. If such
+ # a comment is supplied, Rails will apply that encoding
+ # to the resulting compiled source returned by the
+ # template handler.
+ # 3. In all cases, we transcode the resulting String to
+ # the <tt>default_internal</tt> encoding (which defaults
+ # to UTF-8).
+ #
+ # This means that other parts of Rails can always assume
+ # that templates are encoded in UTF-8, even if the original
+ # source of the template was not UTF-8.
+ #
+ # From a user's perspective, the easiest thing to do is
+ # to save your templates as UTF-8. If you do this, you
+ # do not need to do anything else for things to "just work".
+ #
+ # === Instructions for template handlers
+ #
+ # The easiest thing for you to do is to simply ignore
+ # encodings. Rails will hand you the template source
+ # as the default_internal (generally UTF-8), raising
+ # an exception for the user before sending the template
+ # to you if it could not determine the original encoding.
+ #
+ # For the greatest simplicity, you can support only
+ # UTF-8 as the <tt>default_internal</tt>. This means
+ # that from the perspective of your handler, the
+ # entire pipeline is just UTF-8.
+ #
+ # === Advanced: Handlers with alternate metadata sources
+ #
+ # If you want to provide an alternate mechanism for
+ # specifying encodings (like ERB does via <%# encoding: ... %>),
+ # you may indicate that you are willing to accept
+ # BINARY data by implementing <tt>self.accepts_binary?</tt>
+ # on your handler.
+ #
+ # If you do, Rails will not raise an exception if
+ # the template's encoding could not be determined,
+ # assuming that you have another mechanism for
+ # making the determination.
+ #
+ # In this case, make sure you return a String from
+ # your handler encoded in the default_internal. Since
+ # you are handling out-of-band metadata, you are
+ # also responsible for alerting the user to any
+ # problems with converting the user's data to
+ # the default_internal.
+ #
+ # To do so, simply raise WrongEncodingError
+ # as follows:
+ #
+ # raise WrongEncodingError.new(
+ # problematic_string,
+ # expected_encoding
+ # )
+
eager_autoload do
autoload :Error
autoload :Handler
@@ -16,26 +93,22 @@ class Template
extend Template::Handlers
- attr_reader :source, :identifier, :handler, :virtual_path, :formats
+ attr_reader :source, :identifier, :handler, :virtual_path, :formats,
+ :original_encoding
- Finalizer = proc do |method_name|
+ Finalizer = proc do |method_name, mod|
proc do
- ActionView::CompiledTemplates.module_eval do
+ mod.module_eval do
remove_possible_method method_name
end
end
end
def initialize(source, identifier, handler, details)
- if source.encoding_aware? && source =~ %r{\A#{ENCODING_FLAG}}
- # don't snip off the \n to preserve line numbers
- source.sub!(/\A[^\n]*/, '')
- source.force_encoding($1).encode
- end
-
- @source = source
- @identifier = identifier
- @handler = handler
+ @source = source
+ @identifier = identifier
+ @handler = handler
+ @original_encoding = nil
@virtual_path = details[:virtual_path]
@method_names = {}
@@ -48,15 +121,21 @@ def render(view, locals, &block)
# Notice that we use a bang in this instrumentation because you don't want to
# consume this in production. This is only slow if it's being listened to.
ActiveSupport::Notifications.instrument("!render_template.action_view", :virtual_path => @virtual_path) do
- method_name = compile(locals, view)
+ if view.is_a?(ActionView::CompiledTemplates)
+ mod = ActionView::CompiledTemplates
+ else
+ mod = view.singleton_class
+ end
+
+ method_name = compile(locals, view, mod)
view.send(method_name, locals, &block)
end
rescue Exception => e
if e.is_a?(Template::Error)
e.sub_template_of(self)
raise e
else
- raise Template::Error.new(self, view.assigns, e)
+ raise Template::Error.new(self, view.respond_to?(:assigns) ? view.assigns : {}, e)
end
end
@@ -81,37 +160,97 @@ def inspect
end
private
- def compile(locals, view)
+ # Among other things, this method is responsible for properly setting
+ # the encoding of the source. Until this point, we assume that the
+ # source is BINARY data. If no additional information is supplied,
+ # we assume the encoding is the same as Encoding.default_external.
+ #
+ # The user can also specify the encoding via a comment on the first
+ # line of the template (# encoding: NAME-OF-ENCODING). This will work
+ # with any template engine, as we process out the encoding comment
+ # before passing the source on to the template engine, leaving a
+ # blank line in its stead.
+ #
+ # Note that after we figure out the correct encoding, we then
+ # encode the source into Encoding.default_internal. In general,
+ # this means that templates will be UTF-8 inside of Rails,
+ # regardless of the original source encoding.
+ def compile(locals, view, mod)
method_name = build_method_name(locals)
return method_name if view.respond_to?(method_name)
locals_code = locals.keys.map! { |key| "#{key} = local_assigns[:#{key}];" }.join
- code = @handler.call(self)
- if code.sub!(/\A(#.*coding.*)\n/, '')
- encoding_comment = $1
- elsif defined?(Encoding) && Encoding.respond_to?(:default_external)
- encoding_comment = "#coding:#{Encoding.default_external}"
+ if source.encoding_aware?
+ if source.sub!(/\A#{ENCODING_FLAG}/, '')
+ encoding = $1
+ else
+ encoding = Encoding.default_external
+ end
+
+ # Tag the source with the default external encoding
+ # or the encoding specified in the file
+ source.force_encoding(encoding)
+
+ # If the original encoding is BINARY, the actual
+ # encoding is either stored out-of-band (such as
+ # in ERB <%# %> style magic comments) or missing.
+ # This is also true if the original encoding is
+ # something other than BINARY, but it's invalid.
+ if source.encoding != Encoding::BINARY && source.valid_encoding?
+ source.encode!
+ # If the assumed encoding is incorrect, check to
+ # see whether the handler accepts BINARY. If it
+ # does, it has another mechanism for determining
+ # the true encoding of the String.
+ elsif @handler.respond_to?(:accepts_binary?) && @handler.accepts_binary?
+ source.force_encoding(Encoding::BINARY)
+ # If the handler does not accept BINARY, the
+ # assumed encoding (either the default_external,
+ # or the explicit encoding specified by the user)
+ # is incorrect. We raise an exception here.
+ else
+ raise WrongEncodingError.new(source, encoding)
+ end
+
+ # Don't validate the encoding yet -- the handler
+ # may treat the String as raw bytes and extract
+ # the encoding some other way
end
+ code = @handler.call(self)
+
source = <<-end_src
def #{method_name}(local_assigns)
- _old_virtual_path, @_virtual_path = @_virtual_path, #{@virtual_path.inspect};_old_output_buffer = output_buffer;#{locals_code};#{code}
+ _old_virtual_path, @_virtual_path = @_virtual_path, #{@virtual_path.inspect};_old_output_buffer = @output_buffer;#{locals_code};#{code}
ensure
- @_virtual_path, self.output_buffer = _old_virtual_path, _old_output_buffer
+ @_virtual_path, @output_buffer = _old_virtual_path, _old_output_buffer
end
end_src
- if encoding_comment
- source = "#{encoding_comment}\n#{source}"
- line = -1
- else
- line = 0
+ if source.encoding_aware?
+ # Handlers should return their source Strings in either the
+ # default_internal or BINARY. If the handler returns a BINARY
+ # String, we assume its encoding is the one we determined
+ # earlier, and encode the resulting source in the default_internal.
+ if source.encoding == Encoding::BINARY
+ source.force_encoding(Encoding.default_internal)
+ end
+
+ # In case we get back a String from a handler that is not in
+ # BINARY or the default_internal, encode it to the default_internal
+ source.encode!
+
+ # Now, validate that the source we got back from the template
+ # handler is valid in the default_internal
+ unless source.valid_encoding?
+ raise WrongEncodingError.new(@source, Encoding.default_internal)
+ end
end
begin
- ActionView::CompiledTemplates.module_eval(source, identifier, line)
- ObjectSpace.define_finalizer(self, Finalizer[method_name])
+ mod.module_eval(source, identifier, 0)
+ ObjectSpace.define_finalizer(self, Finalizer[method_name, mod])
method_name
rescue Exception => e # errors from template code
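The second half of `compile` above deals with whatever the handler returns: a BINARY result is tagged as the internal encoding, anything else is transcoded, and the result is then validated. A minimal stand-alone sketch of that step follows. `finalize_compiled_source` is a hypothetical name, UTF-8 is assumed as the internal encoding, and `ArgumentError` stands in for `WrongEncodingError`.

```ruby
# Hypothetical helper (not part of Rails) mirroring the post-handler
# encoding step of Template#compile.
def finalize_compiled_source(code, internal = Encoding::UTF_8)
  code = code.dup

  # A BINARY result carries no encoding information, so it is assumed
  # to already hold bytes in the internal encoding.
  code.force_encoding(internal) if code.encoding == Encoding::BINARY

  # Transcode anything else. Encoding to the same encoding is a no-op
  # and does not check the bytes, hence the explicit validation below.
  code = code.encode(internal)

  unless code.valid_encoding?
    raise ArgumentError, "compiled template is not valid #{internal}"
  end

  code
end

puts finalize_compiled_source("caf\u00E9".encode(Encoding::ISO_8859_1)).encoding
```

The explicit `valid_encoding?` check matters because tagging bytes with `force_encoding` never inspects them, so invalid input would otherwise pass through silently.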
18 actionpack/lib/action_view/template/error.rb
@@ -4,6 +4,24 @@ module ActionView
class ActionViewError < StandardError #:nodoc:
end
+ class EncodingError < StandardError #:nodoc:
+ end
+
+ class WrongEncodingError < EncodingError #:nodoc:
+ def initialize(string, encoding)
+ @string, @encoding = string, encoding
+ end
+
+ def message
+ "Your template was not saved as valid #{@encoding}. Please " \
+ "either specify #{@encoding} as the encoding for your template " \
+ "in your text editor, or mark the template with its " \
+ "encoding by inserting the following as the first line " \
+ "of the template:\n\n# encoding: <name of correct encoding>.\n\n" \
+ "The source of your template was:\n\n#{@string}"
+ end
+ end
+
class MissingTemplate < ActionViewError #:nodoc:
attr_reader :path
45 actionpack/lib/action_view/template/handlers/erb.rb
@@ -5,6 +5,11 @@
module ActionView
class OutputBuffer < ActiveSupport::SafeBuffer
+ def initialize(*)
+ super
+ encode!
+ end
+
def <<(value)
super(value.to_s)
end
@@ -72,16 +77,50 @@ class ERB < Handler
cattr_accessor :erb_implementation
self.erb_implementation = Erubis
- ENCODING_TAG = Regexp.new("\A(<%#{ENCODING_FLAG}-?%>)[ \t]*")
+ ENCODING_TAG = Regexp.new("\\A(<%#{ENCODING_FLAG}-?%>)[ \\t]*")
+
+ def self.accepts_binary?
+ true
+ end
def compile(template)
- erb = template.source.gsub(ENCODING_TAG, '')
+ if template.source.encoding_aware?
+ # Even though Rails has given us a String tagged with the
+ # default_internal encoding (likely UTF-8), it is possible
+ # that the String is actually encoded using a different
+ # encoding, specified via an ERB magic comment. If the
+ # String is not actually UTF-8, the regular expression
+ # engine will (correctly) raise an exception. For now,
+ # we'll reset the String to BINARY so we can run regular
+ # expressions against it
+ template_source = template.source.dup.force_encoding("BINARY")
+
+ # Erubis does not have direct support for encodings.
+ # As a result, we will extract the ERB-style magic
+ # comment, give the String to Erubis as BINARY data,
+ # and then tag the resulting String with the extracted
+ # encoding later
+ erb = template_source.gsub(ENCODING_TAG, '')
+ encoding = $2
+
+ if !encoding && (template.source.encoding == Encoding::BINARY)
+ raise WrongEncodingError.new(template_source, Encoding.default_external)
+ end
+ end
+
result = self.class.erb_implementation.new(
erb,
:trim => (self.class.erb_trim_mode == "-")
).src
- result = "#{$2}\n#{result}" if $2
+ # If an encoding tag was found, tag the String
+ # we're returning with that encoding. Otherwise,
+ # return a BINARY String, which is what ERB
+ # returns. Note that if a magic comment was
+ # not specified, we will return the data to
+ # Rails as BINARY, which will then use its
+ # own encoding logic to create a UTF-8 String.
+ result = "\n#{result}".force_encoding(encoding).encode if encoding
result
end
end
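The ERB handler above opts into BINARY input and reads the encoding from an ERB-style magic comment. The core of that idea can be sketched without Erubis or Rails. `MagicCommentHandler` and its `decode` method are hypothetical names used only for illustration, and `ArgumentError` stands in for `WrongEncodingError`.

```ruby
# Hypothetical stand-in for a handler that accepts BINARY source and
# finds the encoding out-of-band, in an ERB-style comment.
module MagicCommentHandler
  # Same shape as ERB::ENCODING_TAG in the diff; group 2 is the name.
  ENCODING_TAG = /\A(<%#.*coding[:=]\s*(\S+)[ \t]*-?%>)[ \t]*/

  def self.accepts_binary?
    true
  end

  # Strips the magic comment and returns the remaining body as UTF-8.
  def self.decode(source)
    # Work on BINARY so the regexp never trips over invalid bytes.
    binary = source.dup.force_encoding(Encoding::BINARY)

    match = ENCODING_TAG.match(binary)
    raise ArgumentError, "no encoding comment found" unless match

    body = match.post_match.force_encoding(match[2])
    unless body.valid_encoding?
      raise ArgumentError, "template is not valid #{match[2]}"
    end

    body.encode(Encoding::UTF_8)
  end
end

puts MagicCommentHandler.decode("<%# encoding: ISO-8859-1 %>hello \xFCmlat".b).inspect
```

Unlike this sketch, the handler in the commit returns compiled Ruby source rather than the decoded template text, and it falls back to Rails' own encoding logic when no comment is present.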
5 actionpack/lib/action_view/template/resolver.rb
@@ -70,7 +70,10 @@ def query(path, exts, formats)
Dir[query].reject { |p| File.directory?(p) }.map do |p|
handler, format = extract_handler_and_format(p, formats)
- Template.new(File.read(p), File.expand_path(p), handler,
+
+ contents = File.open(p, "rb") {|io| io.read }
+
+ Template.new(contents, File.expand_path(p), handler,
:virtual_path => path, :format => format)
end
end
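The resolver change above swaps `File.read(p)` for a binary-mode read, so the template bytes arrive untagged instead of being labelled with the default external encoding. A small sketch of the difference in tagging, using a temporary file (`read_template_bytes` is a hypothetical helper name):

```ruby
require "tempfile"

# Hypothetical helper mirroring the resolver change: return the raw
# bytes and leave every encoding decision to a later step.
def read_template_bytes(path)
  File.open(path, "rb") { |io| io.read }
end

Tempfile.create("template") do |file|
  file.binmode
  file.write("hello \xFCmlat".b) # ISO-8859-1 bytes for "hello ümlat"
  file.flush

  raw = read_template_bytes(file.path)
  puts raw.encoding        # the BINARY (ASCII-8BIT) encoding
  puts raw.valid_encoding? # any byte sequence is valid as BINARY
end
```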
4 actionpack/test/abstract_unit.rb
@@ -12,6 +12,10 @@
ENV['TMPDIR'] = File.join(File.dirname(__FILE__), 'tmp')
+if defined?(Encoding.default_internal)
+ Encoding.default_internal = "UTF-8"
+end
+
require 'test/unit'
require 'abstract_controller'
require 'action_controller'
4 actionpack/test/controller/assert_select_test.rb
@@ -212,12 +212,12 @@ def test_assert_select_text_match
assert_nothing_raised { assert_select "div", "bar" }
assert_nothing_raised { assert_select "div", /\w*/ }
assert_nothing_raised { assert_select "div", :text => /\w*/, :count=>2 }
- assert_raise(Assertion) { assert_select "div", :text=>"foo", :count=>2 }
+ assert_raise(Assertion) { assert_select "div", :text=>"foo", :count=>2 }
assert_nothing_raised { assert_select "div", :html=>"<span>bar</span>" }
assert_nothing_raised { assert_select "div", :html=>"<span>bar</span>" }
assert_nothing_raised { assert_select "div", :html=>/\w*/ }
assert_nothing_raised { assert_select "div", :html=>/\w*/, :count=>2 }
- assert_raise(Assertion) { assert_select "div", :html=>"<span>foo</span>", :count=>2 }
+ assert_raise(Assertion) { assert_select "div", :html=>"<span>foo</span>", :count=>2 }
end
end
2 actionpack/test/controller/capture_test.rb
@@ -68,6 +68,6 @@ def test_proper_block_detection
private
def expected_content_for_output
- "<title>Putting stuff in the title!</title>\n\nGreat stuff!"
+ "<title>Putting stuff in the title!</title>\nGreat stuff!"
end
end
4 actionpack/test/controller/render_test.rb
@@ -1079,7 +1079,7 @@ def test_rendering_with_conflicting_local_vars
def test_action_talk_to_layout
get :action_talk_to_layout
- assert_equal "<title>Talking to the layout</title>\n\nAction was here!", @response.body
+ assert_equal "<title>Talking to the layout</title>\nAction was here!", @response.body
end
# :addressed:
@@ -1096,7 +1096,7 @@ def test_template_with_locals
def test_yield_content_for
assert_not_deprecated { get :yield_content_for }
- assert_equal "<title>Putting stuff in the title!</title>\n\nGreat stuff!\n", @response.body
+ assert_equal "<title>Putting stuff in the title!</title>\nGreat stuff!\n", @response.body
end
def test_overwritting_rendering_relative_file_with_extension
3 actionpack/test/fixtures/test/content_for.erb
@@ -1,2 +1 @@
-<% content_for :title do %>Putting stuff in the title!<% end %>
-Great stuff!
+<% content_for :title do -%>Putting stuff in the title!<% end -%>Great stuff!
2 actionpack/test/fixtures/test/content_for_concatenated.erb
@@ -1,3 +1,3 @@
<% content_for :title, "Putting stuff "
- content_for :title, "in the title!" %>
+ content_for :title, "in the title!" -%>
Great stuff!
2 actionpack/test/fixtures/test/content_for_with_parameter.erb
@@ -1,2 +1,2 @@
-<% content_for :title, "Putting stuff in the title!" %>
+<% content_for :title, "Putting stuff in the title!" -%>
Great stuff!
2 actionpack/test/fixtures/test/non_erb_block_content_for.builder
@@ -1,4 +1,4 @@
content_for :title do
'Putting stuff in the title!'
end
-xml << "\nGreat stuff!"
+xml << "Great stuff!"
10 actionpack/test/template/render_test.rb
@@ -232,13 +232,13 @@ def test_render_with_layout
# TODO: Move to deprecated_tests.rb
def test_render_with_nested_layout_deprecated
assert_deprecated do
- assert_equal %(<title>title</title>\n\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
+ assert_equal %(<title>title</title>\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
@view.render(:file => "test/deprecated_nested_layout.erb", :layout => "layouts/yield")
end
end
def test_render_with_nested_layout
- assert_equal %(<title>title</title>\n\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
+ assert_equal %(<title>title</title>\n\n<div id="column">column</div>\n<div id="content">content</div>\n),
@view.render(:file => "test/nested_layout.erb", :layout => "layouts/yield")
end
@@ -284,7 +284,7 @@ def test_render_utf8_template_with_magic_comment
with_external_encoding Encoding::ASCII_8BIT do
result = @view.render(:file => "test/utf8_magic.html.erb", :layouts => "layouts/yield")
assert_equal Encoding::UTF_8, result.encoding
- assert_equal "Русский текст\n\nUTF-8\nUTF-8\nUTF-8\n", result
+ assert_equal "\nРусский \nтекст\n\nUTF-8\nUTF-8\nUTF-8\n", result
end
end
@@ -302,7 +302,7 @@ def test_render_utf8_template_with_incompatible_external_encoding
result = @view.render(:file => "test/utf8.html.erb", :layouts => "layouts/yield")
flunk 'Should have raised incompatible encoding error'
rescue ActionView::Template::Error => error
- assert_match 'invalid byte sequence in Shift_JIS', error.original_exception.message
+ assert_match 'Your template was not saved as valid Shift_JIS', error.original_exception.message
end
end
end
@@ -313,7 +313,7 @@ def test_render_utf8_template_with_partial_with_incompatible_encoding
result = @view.render(:file => "test/utf8_magic_with_bare_partial.html.erb", :layouts => "layouts/yield")
flunk 'Should have raised incompatible encoding error'
rescue ActionView::Template::Error => error
- assert_match 'invalid byte sequence in Shift_JIS', error.original_exception.message
+ assert_match 'Your template was not saved as valid Shift_JIS', error.original_exception.message
end
end
end
128 actionpack/test/template/template_test.rb
@@ -0,0 +1,128 @@
+require "abstract_unit"
+
+# These are the normal settings that will be set up by Railties
+# TODO: Have these tests support other combinations of these values
+Encoding.default_internal = "UTF-8"
+Encoding.default_external = "UTF-8"
+
+class TestERBTemplate < ActiveSupport::TestCase
+ ERBHandler = ActionView::Template::Handlers::ERB
+
+ class Context
+ def initialize
+ @output_buffer = "original"
+ end
+
+ def hello
+ "Hello"
+ end
+
+ def partial
+ ActionView::Template.new(
+ "<%= @_virtual_path %>",
+ "partial",
+ ERBHandler,
+ :virtual_path => "partial"
+ )
+ end
+
+ def logger
+ require "logger"
+ Logger.new(STDERR)
+ end
+
+ def my_buffer
+ @output_buffer
+ end
+ end
+
+ def new_template(body = "<%= hello %>", handler = ERBHandler, details = {})
+ ActionView::Template.new(body, "hello template", handler, {:virtual_path => "hello"}.merge(details))
+ end
+
+ def render(locals = {})
+ @template.render(@obj, locals)
+ end
+
+ def setup
+ @obj = Context.new
+ end
+
+ def test_basic_template
+ @template = new_template
+ assert_equal "Hello", render
+ end
+
+ def test_locals
+ @template = new_template("<%= my_local %>")
+ assert_equal "I'm a local", render(:my_local => "I'm a local")
+ end
+
+ def test_restores_buffer
+ @template = new_template
+ assert_equal "Hello", render
+ assert_equal "original", @obj.my_buffer
+ end
+
+ def test_virtual_path
+ @template = new_template("<%= @_virtual_path %>" \
+ "<%= partial.render(self, {}) %>" \
+ "<%= @_virtual_path %>")
+ assert_equal "hellopartialhello", render
+ end
+
+ if "ruby".encoding_aware?
+ def test_resulting_string_is_utf8
+ @template = new_template
+ assert_equal Encoding::UTF_8, render.encoding
+ end
+
+ def test_no_magic_comment_word_with_utf_8
+ @template = new_template("hello \u{fc}mlat")
+ assert_equal Encoding::UTF_8, render.encoding
+ assert_equal "hello \u{fc}mlat", render
+ end
+
+ # This test ensures that if the default_external
+ # is set to something other than UTF-8, we don't
+ # get any errors and get back a UTF-8 String.
+ def test_default_external_works
+ Encoding.default_external = "ISO-8859-1"
+ @template = new_template("hello \xFCmlat")
+ assert_equal Encoding::UTF_8, render.encoding
+ assert_equal "hello \u{fc}mlat", render
+ ensure
+ Encoding.default_external = "UTF-8"
+ end
+
+ def test_encoding_can_be_specified_with_magic_comment
+ @template = new_template("# encoding: ISO-8859-1\nhello \xFCmlat")
+ assert_equal Encoding::UTF_8, render.encoding
+ assert_equal "\nhello \u{fc}mlat", render
+ end
+
+ # TODO: This is currently handled inside ERB. The case of explicitly
+ # lying about encodings via the normal Rails API should be handled
+ # inside Rails.
+ def test_lying_with_magic_comment
+ assert_raises(ActionView::Template::Error) do
+ @template = new_template("# encoding: UTF-8\nhello \xFCmlat")
+ render
+ end
+ end
+
+ def test_encoding_can_be_specified_with_magic_comment_in_erb
+ @template = new_template("<%# encoding: ISO-8859-1 %>hello \xFCmlat")
+ result = render
+ assert_equal Encoding::UTF_8, result.encoding
+ assert_equal "hello \u{fc}mlat", result
+ end
+
+ def test_error_when_template_isnt_valid_utf8
+ assert_raises(ActionView::Template::Error, /\xFC/) do
+ @template = new_template("hello \xFCmlat")
+ render
+ end
+ end
+ end
+end
21 railties/guides/source/getting_started.textile
@@ -1462,11 +1462,32 @@ Rails also comes with built-in help that you can generate using the rake command
* Running +rake doc:guides+ will put a full copy of the Rails Guides in the +doc/guides+ folder of your application. Open +doc/guides/index.html+ in your web browser to explore the Guides.
* Running +rake doc:rails+ will put a full copy of the API documentation for Rails in the +doc/api+ folder of your application. Open +doc/api/index.html+ in your web browser to explore the API documentation.
+h3. Configuration Gotchas
+
+The easiest way to work with Rails is to store all external data as UTF-8. If you don't, Ruby libraries and Rails will often be able to convert your native data into UTF-8, but this doesn't always work reliably, so you're better off ensuring that all external data is UTF-8.
+
+If you have made a mistake in this area, the most common symptom is a black diamond with a question mark inside appearing in the browser. Another common symptom is characters like "Ã¼" appearing instead of "ü". Rails takes a number of internal steps to mitigate common causes of these problems that can be automatically detected and corrected. However, if you have external data that is not stored as UTF-8, it can occasionally result in these kinds of issues that cannot be automatically detected by Rails and corrected.
+
+Two very common sources of data that are not UTF-8:
+* Your text editor: Most text editors (such as TextMate) default to saving files as
+ UTF-8. If your text editor does not, special characters that you enter in your
+ templates (such as é) can appear as a diamond with a question mark inside
+ in the browser. This also applies to your I18N translation files.
+ Most editors that do not already default to UTF-8 (such as some versions of
+ Dreamweaver) offer a way to change the default to UTF-8. Do so.
+* Your database: Rails defaults to converting data from your database into UTF-8 at
+ the boundary. However, if your database is not using UTF-8 internally, it may not
+ be able to store all characters that your users enter. For instance, if your database
+ is using Latin-1 internally, and your user enters a Russian, Hebrew, or Japanese
+ character, the data will be lost forever once it enters the database. If possible,
+ use UTF-8 as the internal storage of your database.
@fxn
fxn May 16, 2010

Keeping guides up to date in master commits. That's great we need more commits like this.
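The symptoms described in the new guide section can be reproduced with core String methods. The sketch below is only an illustration: `storable_in_latin1?` is a hypothetical helper, and ISO-8859-1 plays the role of the non-UTF-8 editor or database.

```ruby
# Symptom 1: UTF-8 bytes misread as ISO-8859-1 turn one character into
# two. "ü" is stored as the bytes C3 BC, which Latin-1 reads as "Ã¼".
original = "\u00FC"
mojibake = original.dup.force_encoding(Encoding::ISO_8859_1)
                   .encode(Encoding::UTF_8)
puts mojibake

# Symptom 2: ISO-8859-1 bytes misread as UTF-8 are invalid. Browsers
# show U+FFFD (the diamond with a question mark); String#scrub does the
# same substitution.
latin1 = "hello \u00FCmlat".encode(Encoding::ISO_8859_1)
misread = latin1.dup.force_encoding(Encoding::UTF_8)
puts misread.valid_encoding?
puts misread.scrub

# Database gotcha: Latin-1 has no slot for most non-Western characters.
def storable_in_latin1?(text)
  text.encode(Encoding::ISO_8859_1)
  true
rescue Encoding::UndefinedConversionError
  false
end

puts storable_in_latin1?("\u042F") # Cyrillic capital letter Ya
```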

h3. Changelog
"Lighthouse ticket":http://rails.lighthouseapp.com/projects/16213-rails-guides/tickets/2
+* May 16, 2010: Added a section on configuration gotchas to address common encoding
+ problems that people might have
* April 30, 2010: Fixes, editing and updating of code samples by "Rohit Arondekar":http://rohitarondekar.com
* April 25, 2010: Couple of more minor fixups "Mikel Lindsaar":credits:html#raasdnil
* April 1, 2010: Fixed document to validate XHTML 1.0 Strict. "Jaime Iniesta":http://jaimeiniesta.com
1 railties/lib/rails.rb
@@ -23,6 +23,7 @@
$KCODE='u'
else
Encoding.default_external = Encoding::UTF_8
+ Encoding.default_internal = Encoding::UTF_8
end
module Rails
10 railties/lib/rails/application/configuration.rb
@@ -1,4 +1,5 @@
require 'active_support/deprecation'
+require 'active_support/core_ext/string/encoding'
require 'rails/engine/configuration'
module Rails
@@ -27,8 +28,15 @@ def initialize(*)
def encoding=(value)
@encoding = value
- if defined?(Encoding) && Encoding.respond_to?(:default_external=)
+ if "ruby".encoding_aware?
Encoding.default_external = value
+ Encoding.default_internal = value
+ else
+ $KCODE = value
+ if $KCODE == "NONE"
+ raise "The value you specified for config.encoding is " \
+ "invalid. The possible values are UTF8, SJIS, or EUC"
+ end
end
end
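On encoding-aware Rubies, the `encoding=` setter above passes the value straight to `Encoding.default_external=` and `Encoding.default_internal=`. A sketch of validating the name up front, without touching the process-wide defaults, might look like the following. `resolve_encoding` is a hypothetical helper and is not part of this commit.

```ruby
# Hypothetical helper: turn a configured name into an Encoding object,
# failing early with a message that points at the setting.
def resolve_encoding(value)
  Encoding.find(value)
rescue ArgumentError
  raise ArgumentError,
        "The value you specified for config.encoding (#{value.inspect}) is invalid"
end

puts resolve_encoding("utf-8") # Encoding.find is case-insensitive
```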
3 railties/test/application/configuration_test.rb
@@ -180,7 +180,8 @@ def teardown
require "#{app_path}/config/application"
unless RUBY_VERSION < '1.9'
- assert_equal Encoding.find("utf-8"), Encoding.default_external
+ assert_equal Encoding::UTF_8, Encoding.default_external
+ assert_equal Encoding::UTF_8, Encoding.default_internal
end
end

5 comments on commit 64d109e

@yaroslav

N-n-nice! Thanks

@yob

very nice!

@jacortinas

Wow Yehuda, you are the man. Amazing commit!

@Prajna

very nice!

@lisinge

Nice!
