Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Implement minimal encoding support #175

Merged
merged 6 commits into from Apr 28, 2013

Conversation

Projects
None yet
7 participants
Collaborator

judofyr commented Mar 2, 2013

Oh boy, it's this time again. I want to fix Tilt's encoding support so that it's at least usable. This PR tries to handle encodings correctly within Tilt. I've excluded transcoding because it's fairy simple to implement this outside of Tilt.

The encoding of data

The encoding of the data is done in two "steps". First we try to guess the encoding, then we check if the Template-implementation wants to override it.

Step 1: If you're using a custom reader (Template.new { foobar }) we already know the encoding. If you're reading a file, then we read it as binary and force_encoding it to default_external (to avoid raising exceptions).

Step 2: There's a method called #default_encoding. If this returns a truthy value, we force_encode the data to this encoding. By default it simply looks up the :default_encoding-option, but template engines can override this with custom behavior (e.g. always return UTF-8 for CoffeeScript)

This means that the encoding of the data has/is:

  • Sensible defaults (Encoding.default_external)
  • Overridable by the user (through the :default_encoding-option)
  • Overridable by the template engine (through `#default_encoding)

The encoding of Tilt's generated source code

Tilt combines the output from the template engine with its own "add local variables and define it as an unbound method for performance". It's the encoding of this source code that determines the encoding that the code runs under (e.g. the value of "".encoding).

In this PR we use the same encoding for Tilt's source code as the template engines source code (result from precompiled_template). For legacy reasons I've added support for extracting magic comments, so Tilt behaves the same wether a template engine returns "foo".force_encoding('…') or "# coding: …\nfoo (although the former is preferred)

The encoding of the final string will (most likely) be the same as the source code encoding (although technically the source code can return whatever string with whatever encoding it likes).

Thoughts?

Is this good enough for now? Will this solve your problems?

/cc @josh, @rtomayko, @mislav, @djanowski, @rue, @nesquena, @DAddYE, @apotonick, @brianmario, @rkh, @apohllo, @argent-smith, @fibric

I like what's here so far. I think it offers enough flexibility to cover the most common cases where someone might have a set of templates in mixed encodings.

Something that's always going to be hard is actually knowing the encoding of the file being read off disk. I've seen plenty of cases where people either forget or don't realize they have to put a magic comment in their files. Ideally that wouldn't be required anyway.

I know it would potentially add some headache (especially on Heroku) but would you all be opposed to adding charlock_holmes as a dependency and using that to make a much more accurate encoding detection of the file on disk? That way at least the string would be tagged with the correct encoding and Ruby's encoding implementation could help out with the rest of the hard work that comes along with concatenation.

Although I think using charlock_holmes would work best, requiring libicu as a dependency might be a bit heavy for most users.

Contributor

josh commented Mar 2, 2013

I don't think charlock_holmes should be a hard dependency, but it'd be nice if we had a story to extend it neatly into the detection process.

Collaborator

judofyr commented Mar 2, 2013

The easiest way to do it now would be to just use a different reader:

reader = proc do |t|
  File.read(t.file).detect_encoding!
end

However, this has to be provided by the library/framework that uses Tilt.

Collaborator

judofyr commented Mar 2, 2013

I see the value of having more "magic" in Tilt to alleviate some of the encodings problems, but I don't feel we have safe way to implement it now. Tilt is currently very global; all mappings (and prefers) are registered in a global namespace. We can't really change these semantics in Tilt because it would make changes for every user of Tilt. For Tilt 2.0 I want to add a more granular API that makes it possible to opt-in to features without changing Tilt's behavior globally.

For Tilt 1.4 I'm mostly looking into making encodings suck less. Then I want to structure Tilt 2.0 so that we can make encodings awesome in later releases without breaking anything.

Collaborator

judofyr commented Mar 2, 2013

Actually, considering that this turned out to be a much smaller patch, we might want to consider releasing this as 1.3.5 and jump right to 2.0 later.

Contributor

josh commented Mar 3, 2013

For Tilt 2.0 I want to add a more granular API that makes it possible to opt-in to features without changing Tilt's behavior globally.

Definite 👍 in that direction. Thats basically my use case with sprockets. I have forks of most of the template engines I care about just to get that level of control.

Collaborator

rkh commented Mar 3, 2013

-1 as charlock_holmes as hard dependency, due to it depending on icu and being a C extension.

@judofyr judofyr referenced this pull request Mar 4, 2013

Closed

Encoding Support #107

Owner

rtomayko commented Mar 4, 2013

This looks like a great step to me as well. 👍

Contributor

nesquena commented Mar 4, 2013

I looked through the proposed patch and tested this version of tilt against my original issues and it seems to have solved our issues for the most part. 👍 Thanks @judofyr for putting this together.

Collaborator

judofyr commented Mar 6, 2013

I tweaked it a bit (it now raises an error if you initialize it with an incorrect encoding) and added some documentation.

@judofyr Great to see this issue revived. The PR description seems to make sense. It's everything I've ever wanted.

@djanowski djanowski commented on the diff Mar 7, 2013

lib/tilt/template.rb
@data = @reader.call(self)
+
+ if @data.respond_to?(:force_encoding)
@djanowski

djanowski Mar 7, 2013

Just wondering. Can't we just use File.read options here? Like: File.read(@file, encoding: default_encoding)

@judofyr

judofyr Mar 7, 2013

Collaborator

Because Ruby will try to encode it to Encoding.default_internal.

@djanowski

djanowski Mar 7, 2013

Right, right. Thanks.

@judofyr judofyr merged commit a6789fa into master Apr 28, 2013

1 check failed

default The Travis build failed
Details

@judofyr judofyr deleted the minimal-encodings branch Jul 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment