Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding "encoding" configuration (4th version) #1449

Merged
merged 8 commits into from Sep 24, 2013

Conversation

Projects
None yet
7 participants
@shigeya
Copy link
Contributor

shigeya commented Aug 24, 2013

Rebased towards current master, to support elimination of monkey patch to File.read.

Need tests. (require some discussion)

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 24, 2013

I have considered how to implement the test.

For generation part of the program, I prefer to have both encoding=utf-8 case and without encoding (current default).
I have reviewed test on https://github.com/mojombo/jekyll/blob/master/test/test_site.rb .

First option is test whole site generation with or without utf-8, but it will add tons of tests (when run in Ruby >=1.9).
Other option is add a fraction of site generation in utf-8 only with Ruby >=1.9
Which maintainers prefer?

Also, do you need to have test for configuration? It looks like there are no tests among default configuration parameters. I can add, but I have no idea whether it is meaningful or not.

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 24, 2013

btw, while I think monkey-patching String is bad idea, monkey-patching on File.read on options seems reasonable. It clearer than if RUBY_VERSION in various places. If maintainers agree on this, I can modify that way, too.

@mattr-

View changes

lib/jekyll/convertible.rb Outdated

self.content = if RUBY_VERSION < "1.9"

This comment has been minimized.

@mattr-

mattr- Aug 24, 2013

Member

Please wrap this conditional into a method call so that we don't have to parse what's going on. A method name like read_content would work better here instead.

@mattr-

View changes

lib/jekyll/tags/include.rb Outdated
@@ -62,7 +62,11 @@ def render(context)
Dir.chdir(includes_dir) do
choices = Dir['**/*'].reject { |x| File.symlink?(x) }
if choices.include?(@file)
source = File.read(@file)
source = if RUBY_VERSION < "1.9"

This comment has been minimized.

@mattr-

mattr- Aug 24, 2013

Member

Similar to the comment above, I think this should go into its own method as well.

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 24, 2013

@mattr- I added a commit. Is this the style what you like to see?

@mattr-

This comment has been minimized.

Copy link
Member

mattr- commented Aug 25, 2013

Yes, that's a good start. There's a couple of other places that could use some method extraction as well. 😃

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 25, 2013

I don't understand how far you want to make these descriptive compare to other codes. I see line in initializer of site.rb can be described in other way, but I think it's enough minimal and descriptive.

Of course It's possible to do that way one by one. Can you provide guidelines or conditions on judging such? Then, you don't necessary to comment every commit this way and easier.

@mattr-

This comment has been minimized.

Copy link
Member

mattr- commented Aug 25, 2013

The conditionals around the RUBY_VERSION checks need their own methods,
IMO. Once that's done, then I think it's good to merge.

On Sat, Aug 24, 2013 at 10:54 PM, Shigeya Suzuki
notifications@github.comwrote:

I don't understand how far you want to make these descriptive compare to
other codes. I see line in initialized of site.rb can be described in
other way, but I think it's enough minimal and descriptive.

Of course It's possible to do that way one by one. Can you provide
guidelines or conditions on judging such? Then, you don't necessary to
comment every commit this way and easier.


Reply to this email directly or view it on GitHubhttps://github.com//pull/1449#issuecomment-23221459
.

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 25, 2013

Introduced File.read_with_options instead of patching File.read. How about this?

@mattr-

This comment has been minimized.

Copy link
Member

mattr- commented Aug 27, 2013

❤️ it!

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 28, 2013

Good. then, how about this performance fix? It's marginally faster.

https://gist.github.com/shigeya/6362099

I haven't committed/push this yet.

@mattn

This comment has been minimized.

Copy link

mattn commented Aug 28, 2013

Good Job

@mattr-

This comment has been minimized.

Copy link
Member

mattr- commented Aug 28, 2013

@shigeya That also works. 😃

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 28, 2013

@mattr- Ok, thanks. merged that change, and rebased on top of current master.

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 28, 2013

Test passed on this branch. (I have tested on master and it's still failing on the encoding). Travis should say OK too.

@parkr

View changes

lib/jekyll/tags/include.rb Outdated
@@ -48,6 +48,11 @@ def validate_syntax
end
end

# Grab file read opts in the context
def file_read_opts_from_context(context)

This comment has been minimized.

@parkr

parkr Aug 28, 2013

Member

Does this need to specify that they come from the context? I think it'd be ok to just have file_read_opts here :)

This comment has been minimized.

@shigeya

shigeya Aug 28, 2013

Author Contributor

Okay, then, just mentioning file_read_opts is enough, then, placing context.registers[:site].file_read_opts at the option of read_with_options is sufficient, isn't it (without refactoring that as method)? On contrary, if we name the method file_read_opts, passing context to it seems wrong to me. The other option is put the read_file_opts into context or wherever place. which seems odd.

This comment has been minimized.

@parkr

parkr Aug 28, 2013

Member

When you pass the context, it's inferred that the context has something to do with it. I think def file_read_opts(context) says "based on this context, fetch me the file read options.

This comment has been minimized.

@shigeya

shigeya Aug 28, 2013

Author Contributor

If you say so, ok. I'll change that way.

@parkr

View changes

site/docs/configuration.md Outdated
<p class='name'><strong>Encoding</strong></p>
<p class="description">
Set the encoding of files. The default is the system encoding,
which determined by LANG environment variable.

This comment has been minimized.

@parkr

parkr Aug 28, 2013

Member

It'd be awesome to have a link to a list of possible encodings. Maybe a Wikipedia article for reference?

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Aug 28, 2013

LGTM after my other couple of comments.

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Aug 28, 2013

Fixed.

@mattn

This comment has been minimized.

Copy link

mattn commented Sep 9, 2013

There's no issue to merge? :)

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 12, 2013

Any reason for delay for merge? if you want rebase, I can at any time...

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 12, 2013

We need a rebase! Thanks for being willing to do so :) I will take a look at it after the rebase :)

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 12, 2013

Will work on it tomorrow. I tried, but it's little complex to do clean merge for a drunk guy:D

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 13, 2013

It was painful:< @parkr, please review and merge.

@mattn

This comment has been minimized.

Copy link

mattn commented Sep 13, 2013

good job

@mattn

This comment has been minimized.

Copy link

mattn commented Sep 18, 2013

Why not merge this? As I said this or another issue, jekyll doesn't work to read multi-byte encoding texts on windows ruby2.0 . I think you are taking too many time to fix this problem. :-(

def self.read_with_options(path, opts = {})
self.read(path, opts)
end
end

This comment has been minimized.

@parkr

parkr Sep 18, 2013

Member

is it possible to do this here:

class File
  def self.read_with_options(path, opts = {})
    if RUBY_VERSION < '1.9'
      self.read(path)
    else
      self.read(path, opts)
    end
  end
end

It's a bit DRYer. :)

This comment has been minimized.

@shigeya

shigeya Sep 19, 2013

Author Contributor

That was previous version's code. Current version is just a little faster, since it is does not require eval of RUBY_VERSION on every call. I asked whether you guys like this or not and imported. Please decide which is better.

This comment has been minimized.

@parkr

parkr Sep 19, 2013

Member

The current version is fine given that reason.

<td>
<p class='name'><strong>Encoding</strong></p>
<p class="description">
Set the encoding of files by name. Only avaialble for Ruby

This comment has been minimized.

@parkr

parkr Sep 18, 2013

Member

There's a slight spelling issue here: available :)

This comment has been minimized.

@shigeya

shigeya Sep 19, 2013

Author Contributor

Will fix this. thanks.

Set the encoding of files by name. Only avaialble for Ruby
1.9 or later).
The default value is nil, which assumes default encoding,
<code>ASCII-8BIT</code>.

This comment has been minimized.

@parkr

parkr Sep 18, 2013

Member

I think the default value is $LANG on Unix then maybe defaults back to ASCII-8BIT

This comment has been minimized.

@mattn

mattn Sep 18, 2013

I prefer utf-8 as default. $LANG is locale name not encoding name.

On 9/18/13, Parker Moore notifications@github.com wrote:

@@ -99,6 +99,22 @@ class="flag">flags (specified on the
command-line) that control them.

timezone: TIMEZONE



  •  <td>
    
  •    <p class='name'><strong>Encoding</strong></p>
    
  •    <p class="description">
    
  •        Set the encoding of files by name. Only avaialble for Ruby
    
  •        1.9 or later).
    
  •        The default value is nil, which assumes default encoding,
    
  •        <code>ASCII-8BIT</code>.
    

I think the default value is $LANG on Unix then maybe defaults back to
ASCII-8BIT


Reply to this email directly or view it on GitHub:
https://github.com/mojombo/jekyll/pull/1449/files#r6435126

  • Yasuhiro Matsumoto

This comment has been minimized.

@parkr

parkr Sep 18, 2013

Member

Encoding is in there :)

~/code/jekyll$ echo $LANG
en_US.UTF-8

This comment has been minimized.

@parkr

parkr Sep 18, 2013

Member

Also the default of UTF-8 is smart but isn't compatible with some systems This note here about ASCII-8BIT is about the usual-default system encoding being ASCII – we don't impose a default.

This comment has been minimized.

@mattn

mattn Sep 18, 2013

You need to remove en_US. if it use $LANG.

we don't impose a default.

Then, it need to set Encoding.default_external?

This comment has been minimized.

@shigeya

shigeya Sep 19, 2013

Author Contributor

Since Ruby's default behavior without locale configuration such as LANG is default to ASCII-8BIT, default should be ASCII-8BIT, since it may break other user's page build.

We can discuss whether defaults to be utf-8 or not later. This discussion can be endless. Thus, I recommend to stick with current behavior, unless clearly configured with encoding config. This allows us to configure to use utf-8 while keeping other page builds happy.

This comment has been minimized.

@parkr

parkr Sep 19, 2013

Member

👍

shigeya added some commits Aug 20, 2013

Invoke File.read with or without options depends on Ruby version
- Extract option fetch method as a separate method
- Added File.read_with_options method to use
- With performance fix
@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 19, 2013

Rebased (again), fixed, tested. For convenience, I have reordered and squashed some of commits.

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 19, 2013

Thank you thank you thank you!

@mattr- Let's get this merged and out in a v1.2.2 with any other fixes we deem necessary. This is a whopper!

@mattn

This comment has been minimized.

Copy link

mattn commented Sep 19, 2013

it will be coming! so cool! :)

# Returns merged optin hash for File.read of self.site (if exists)
# and a given param
def merged_file_read_opts(opts)
(self.site ? self.site.file_read_opts : {}).merge(opts)

This comment has been minimized.

@mattr-

mattr- Sep 20, 2013

Member

I don't see a reason to check for site here. Convertible classes will always have a site object.

This comment has been minimized.

@shigeya

shigeya Sep 20, 2013

Author Contributor

Hmm. I have merged previous code without thought on that. Will change quickly.

This comment has been minimized.

@shigeya

shigeya Sep 20, 2013

Author Contributor

Nope. you're wrong.

After I changed to:

self.site.file_read_opts.merge(opts)

Then, on test:

Error reading file /Users/shigeya/git/jekyll/test/fixtures/broken_front_matter1.erb: undefined method `file_read_opts' for nil:NilClass

please merge as is.

@mattr-

This comment has been minimized.

Copy link
Member

mattr- commented Sep 20, 2013

One small tiny change and then I think we're good.

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 20, 2013

@mattr- Looks like @shigeya has run into a problem with your change, but the current code works. I'm 👍 to ship and release v1.2.2 this weekend or early next week.

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 24, 2013

Please?

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 24, 2013

Looks good to me. @mattr-, just need your final 👍 then we can merge it in.

@mattr-

This comment has been minimized.

Copy link
Member

mattr- commented Sep 24, 2013

👍 :shipit:

On Mon, Sep 23, 2013 at 9:10 PM, Parker Moore notifications@github.comwrote:

Looks good to me. @mattr- https://github.com/mattr-, just need your
final [image: 👍] then we can merge it in.


Reply to this email directly or view it on GitHubhttps://github.com//pull/1449#issuecomment-24970283
.

parkr added a commit that referenced this pull request Sep 24, 2013

Merge pull request #1449 from shigeya/config-encoding-and-yaml-opts
Adding "encoding" configuration (4th version)

@parkr parkr merged commit 2734759 into jekyll:master Sep 24, 2013

1 check passed

default The Travis CI build passed
Details

parkr added a commit that referenced this pull request Sep 24, 2013

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 24, 2013

Thank you!

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 24, 2013

Our pleasure – thank YOU!

@mattn

This comment has been minimized.

Copy link

mattn commented Sep 24, 2013

Coooooooooooooooool!

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 24, 2013

This will be released with v1.3.0 :) Thanks for your work on this! 🎉

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 24, 2013

You're welcome. I (and some of my friends) plan to use jekyll, so, hope to contribute some other area, too. I believe basic UTF-8 support accelerate use of jekyll among Japanese communities.

@doktorbro

This comment has been minimized.

Copy link
Member

doktorbro commented Sep 24, 2013

@shigeya I’m curious which setup causes your encoding issues. I don’t know much about the Japanese alphabet, but I create Russian and German content in Jekyll without any problems. These alphabets have many characters outside of ASCII. My OS is Ubuntu, all content files are UTF-8 encoded.

What is the common setup among Japanese communities?

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 24, 2013

@penibelst That's a very, very long story. Following are my understandings.

First, Ruby does not configure default_external to UTF-8 according to LC_CTYPE locale configuration environment variable, since Ruby startup code intentionally reset LC_CTYPE to null string on startup (see main.c). Don't ask me why:)

Second, whether conversion succeed or not depends on pair of encodings (source and destination) and the code point of interest. Ruby raise exception, when string translated from one encoding to another, if some of the code point which does not have translation rule (internally).

What's the output of ruby -e 'puts Encoding.default_external' in your environment? If the output is 'ASCII-8BIT' or 'US-ASCII' (I bet on this), translation rule between the encoding you're using (in files) and default externals (ASCII) on the code points you're using are translatable, cause no problem.

Now, for many of Japanese character code points, ASCII-8BIT and UTF-8 are not mappable.

Thus, we need Ruby to set default external encoding to UTF-8 manually, either by setting Encoding.default_external or -E option. The patch provided here allows us to configure the external encoding to be used on file read on various locations.

Hope this helps.

@doktorbro

This comment has been minimized.

Copy link
Member

doktorbro commented Sep 25, 2013

@shigeya This sounds very bad. I thought Ruby was born in Japan.

The output of your command in my environment is -e:1: uninitialized constant Encoding (NameError).

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 25, 2013

@penibelst You may need to require 'encoding' before you call Encoding.default_external

@shigeya

This comment has been minimized.

Copy link
Contributor Author

shigeya commented Sep 25, 2013

@penibelst Ah, I have overlooked that reason. You're using 1.8.7. Ruby 1.8.7 does not have encoding library. Its story is very different since it doesn't have encoding handling like as described above. The feature we discussed here is for 1.9.3 and later. On 1.8.7, as far as you're using same encoding, it should not cause encoding error. On contrary, if you'd like to have safer encoding translation, you need to use Ruby 1.9.3 or later.

On the Encoding implementation, while it is little complex and cause some difficulties in certain situation, I believe this is one of most forward looking and complete implementation of encoding among currently available computer languages. As far as the encoding is specified correctly, it work perfectly, and will not cause issues on translation and also detect issues on translations.

Additional note on above discussion, I have reviewed source code and found that, Ruby 1.9.3+ control encoding default according to LANG environment variable. Default encoding (set as default_external, etc) is platform dependent. Unfortunately, documentation on this topic is not good enough. Check Ruby 1.9.3's Encoding doc and search for LANG.

@doktorbro

This comment has been minimized.

Copy link
Member

doktorbro commented Sep 26, 2013

@parkr no such file to load -- encoding (LoadError)

@shigeya You are right, my Ruby is older than 1.9.3

puts RUBY_VERSION, ENV['LANG']
1.8.7
de_DE.UTF-8

@shigeya shigeya deleted the shigeya:config-encoding-and-yaml-opts branch Sep 26, 2013

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.