Encoding Logic and Fallback... #213

Closed
Wardrop opened this issue Sep 30, 2013 · 7 comments

@Wardrop

Wardrop commented Sep 30, 2013

Yep, it's that encoding subject again. I have a problem with Tilt forcing Encoding::default_external and exploding when it doesn't work out. The problem is that a lot of web servers, and thus Ruby (as in the case of Phusion Passenger), are run in a cleared environment, equivalent to env -i. In that context, the locale always reverts to "POSIX" on many *nix distros, and under that locale Encoding::default_external defaults to US-ASCII.

What happens in this scenario is that Tilt tries forcing the encoding of what's typically a UTF-8 file to ASCII. It then does a valid_encoding? check, finds it's false, and raises an error.
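To make the failure concrete, here is a minimal sketch that reproduces the check outside Tilt. The helper name `valid_as?` and the sample string are made up for illustration, and US-ASCII is passed in explicitly to stand in for a POSIX-locale Encoding.default_external.

```ruby
# Hypothetical helper: would these raw bytes be valid if tagged as `encoding`?
def valid_as?(bytes, encoding)
  bytes.dup.force_encoding(encoding).valid_encoding?
end

# Raw bytes of a UTF-8 template, as a binary read would return them.
TEMPLATE_BYTES = "caf\u00E9".b

valid_as?(TEMPLATE_BYTES, Encoding::US_ASCII) # false: the bytes for "é" are not 7-bit ASCII
valid_as?(TEMPLATE_BYTES, Encoding::UTF_8)    # true
```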

I think a better behaviour would be to load the file without the binary switch (so ruby reads in the string using the default external encoding), check if the encoding is valid, if not, force encoding to UTF-8. If still invalid, THEN by all means raise the error. UTF-8 is a very common and compatible encoding. Most other encodings will still work when forced to UTF-8, so in my opinion if the default encoding fails, UTF-8 should at least be attempted as a sane fallback behaviour.
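A rough sketch of that fallback, not Tilt's actual code: `tag_with_fallback` is a hypothetical helper, and it works on raw bytes and retags them rather than reading without the binary switch. The `primary` parameter is only there so a POSIX locale can be simulated.

```ruby
# Hypothetical helper, not part of Tilt: try the default external encoding
# first, fall back to UTF-8, and raise only if both fail.
def tag_with_fallback(bytes, primary = Encoding.default_external)
  [primary, Encoding::UTF_8].each do |encoding|
    candidate = bytes.dup.force_encoding(encoding)
    return candidate if candidate.valid_encoding?
  end
  raise ArgumentError, "data is valid in neither #{primary} nor UTF-8"
end

# Simulating a POSIX locale by passing US-ASCII as the primary encoding:
tag_with_fallback("caf\u00E9".b, Encoding::US_ASCII).encoding # UTF-8
tag_with_fallback("plain".b, Encoding::US_ASCII).encoding     # US-ASCII
```

Note that the two calls return strings with different encodings depending on the file's content.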

Otherwise, Tilt is always at the mercy of the environment it's being run in, which makes applications fragile and prone to breaking. Robustness should be the goal here.

Thoughts?

@judofyr
Collaborator

judofyr commented Oct 1, 2013

Any reason you can't set Encoding.default_external yourself?

Encoding.default_external = Encoding::UTF_8

Falling back to UTF-8 seems like a good idea, but it might also lead to inconsistencies. It would mean that files with only ASCII characters would have ASCII encoding, while files with UTF-8 characters would have UTF-8 encoding. This could cause odd errors if you're concatenating, or in any other way manipulating them together.

@Wardrop
Author

Wardrop commented Oct 3, 2013

My rationale...

The fact is, my setup is not unusual. I run a standard operating system (openSUSE), standard web server (Nginx), standard RVM installation with Ruby 2.0.p-xxx, and my templates are as standard as you can get. The fact that under such a typical operating environment, Tilt fails to load a template encoded in the most common (and most technically superior) encoding, UTF-8, to me highlights a problem that probably needs to be addressed.

Sure, it's a combination of shortfalls all the way down to the OS level, but Tilt's got to deal with it like all higher-level libraries should. No one should have to discover like I did that pretty much every *nix distro falls back to POSIX as its default locale when the environment is cleared, with no means to change it to anything else; in other words, locale is completely dependent on environment variables. Most init.d scripts, such as the Nginx one for my openSUSE, clear the environment, which is why this problem arises.

On top of that, the locale of the server is barely relevant to what the templates of my application/website are encoded in. I develop on my desktop computer, not on my server, so the encoding of my files is going to reflect the locale on my local computer, and whatever software I used to generate the template. The templates I create could be deployed to any server; imagine a popular open-source application using Tilt that's deployed all around the world.

So there are a couple of reasons not to use the external default encoding. With that said, I can see your point in regards to my suggestion, so perhaps instead of falling back to UTF-8, UTF-8 can be Tilt's default encoding, overridable through a setting in Tilt, and Encoding.default_external could be the fallback. I think most people would like to see their non-UTF-8 templates fail, as really everyone should be using UTF-8, and if they're not it's probably only by accident.

Thoughts?

@ujifgc

ujifgc commented Oct 3, 2013

Default behavior now:

  1. binread # => US_ASCII
  2. force to Encoding.default_external # => UTF-8 or US_ASCII
  3. force to Tilt.default_encoding (none by default) # => no change
  4. check if valid
    • fails if file is UTF-8 encoded and system is in C/POSIX locale
  5. we always get Encoding.default_external data, which is system/app dependent by default

Suggested default behavior:

  1. set Tilt.default_encoding to UTF-8 by default
  2. read # => UTF-8 or US_ASCII according to Encoding.default_external
  3. if it's already Tilt.default_encoding, leave happily # => UTF-8
  4. force to Tilt.default_encoding # => UTF-8
  5. check if valid
    • fail if file is encoded wrongly and we expect it to be UTF-8
  6. we always get data in Tilt.default_encoding, which is UTF-8 by default
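The suggested steps could be sketched roughly as follows. `TemplateEncoding` is a hypothetical stand-in for the proposed global setting, not an existing Tilt API, and the sketch adds a validity check to step 3 so that an invalid string already tagged with the target encoding still fails at step 5.

```ruby
# Hypothetical stand-in for the proposed global setting (not a Tilt API).
module TemplateEncoding
  class << self
    attr_writer :default_encoding

    # Step 1: UTF-8 unless the application overrides it.
    def default_encoding
      @default_encoding || Encoding::UTF_8
    end

    # Steps 3-6, applied to a string that has already been read (step 2).
    def normalize(data)
      target = default_encoding
      # Step 3: already in the target encoding (and valid), leave it alone.
      return data if data.encoding == target && data.valid_encoding?

      tagged = data.dup.force_encoding(target) # step 4
      unless tagged.valid_encoding?            # step 5
        raise ArgumentError, "template is not valid #{target}"
      end
      tagged                                   # step 6
    end
  end
end

# A UTF-8 file that was read under a POSIX locale arrives tagged as US-ASCII:
misread = "caf\u00E9".b.force_encoding(Encoding::US_ASCII)
TemplateEncoding.normalize(misread).encoding # UTF-8
```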

So, basically it's suggested that Tilt should ignore Encoding.default_external (set by system/app) in favor of standard UTF-8, and force some users to set Tilt.default_encoding to something exotic if they need to.

@Wardrop am I seeing your point right?

@judofyr
Collaborator

judofyr commented Oct 3, 2013

This needs some clarifications:

  • There is no Tilt.default_encoding. There's Encoding.default_external and Tilt#default_encoding. The difference between a class-level method and an instance-level method is huge.
  • We can't use read if we intend to force encoding later. If Encoding.default_internal is set then read will raise when the file is not valid in Encoding.default_external.

@Wardrop

You can set Encoding.default_external in Ruby; you don't need to touch any init files. Just start your script with:

Encoding.default_external = 'UTF-8'

Tilt already has a global default encoding: Encoding.default_external. I'm finding it hard to find cases where you want Tilt's (global) default encoding to be different from Encoding.default_external. Is it likely that your templates are UTF-8, but every other file you read is US-ASCII? Also, what if you use Tilt to load templates from multiple locations with different encodings? Then the global default encoding is useless anyway, and you'll have to pass :default_encoding => ... to every Tilt template.

The way I see it:

  1. You have a simple environment where everything is UTF-8: Set Encoding.default_external = 'UTF-8'.
  2. You have a complex environment with various encodings: You pass :default_encoding to every Tilt instance.

@Wardrop
Author

Wardrop commented Oct 6, 2013

To clarify, this discussion is all about default behaviours. It's not that Tilt doesn't provide plenty of means to override the default behaviour. Ruby's Encoding.default_external can be set at many levels, and Tilt provides its own means of overriding this, by either setting :default_encoding, or loading the file yourself by passing a block containing your own logic.

The point of this discussion is, in what likely makes up a significant portion of production environments, the current default usually resolves to something non-UTF-8, whilst I'd hazard a guess that the [vast] majority of templates would be encoded in UTF-8. Point being, as it stands, the average user is likely to run into an encoding issue at some point between development and deployment, if not sometime after if that deployment environment was to change in some way.

I suppose there's an expectation these days that UTF-8 should always "just work" without having to make any special considerations. Am I being too idealistic? I feel the current default behaviour would make a better fallback strategy than a primary strategy. I believe UTF-8 should be assumed unless :default_encoding has been explicitly provided. Perhaps :default_encoding should take an extra option:

  • nil [default] - Load as UTF-8 first, then fall back to Encoding.default_external.
  • :system - Apply current behaviour of applying Encoding.default_external.
  • <string> - Should be applied as the default encoding, like :default_encoding is currently used for.
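One way the three option values might be resolved, as a sketch only. `candidate_encodings` and `tag_template` are hypothetical names, and the sketch assumes the template has been read as raw bytes.

```ruby
# Hypothetical resolver for the proposed :default_encoding values.
def candidate_encodings(option)
  case option
  when nil     then [Encoding::UTF_8, Encoding.default_external]
  when :system then [Encoding.default_external]
  else              [Encoding.find(option.to_s)]
  end
end

# Tag the raw bytes with the first candidate encoding in which they are valid.
def tag_template(bytes, option = nil)
  candidate_encodings(option).each do |encoding|
    tagged = bytes.dup.force_encoding(encoding)
    return tagged if tagged.valid_encoding?
  end
  raise ArgumentError, "template is not valid in any candidate encoding"
end

tag_template("caf\u00E9".b).encoding               # UTF-8, whatever the locale
tag_template("caf\u00E9".b, "ISO-8859-1").encoding # ISO-8859-1
```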

That's what I'd like to see.

@judofyr
Collaborator

judofyr commented Nov 30, 2013

My main issue is adding another global setting. If people use Tilt in an environment that doesn't use UTF-8 they now need to set two settings (Encoding.default_external and Tilt.default_encoding). And if you're using UTF-8 and not setting Encoding.default_external you're going to have a hard time anyway.

Tilt is also a library, not a framework. I expect people to build tools on top of Tilt and these tools can be more opinionated about encodings. For now I'd rather keep it as close as possible to Ruby's encoding system.

@Wardrop
Author

Wardrop commented Dec 3, 2013

It's an awkward situation, and really, Encoding.default_external should ideally always be UTF-8 on a modern western operating system, but unfortunately there are scenarios where that is not the case (as we've discussed). I believe setting the encoding on a per-application basis is perhaps the best option, and like you said, this is something I can default to UTF-8 in my framework.


3 participants