utf-8 not working in code blocks #232

Closed
benben opened this Issue Oct 20, 2011 · 13 comments

@benben
benben commented Oct 20, 2011

given the following post:


---

test post with äüß

``` ruby

def ruby
# ä
end

```

creates a post looking like this:

test post with äüß

1 def ruby
2 # 채
3 end

So I think something is wrong with the encoding. The strange thing is that five ä ("äääää") work, but fewer than five don't.

/edit/
If the code block contains more than four special characters, the encoding works, no matter where they are placed.

@fhemberger
Contributor

Should work again: I added a parameter to the Pygments highlighter so that it treats code input as UTF-8.

@benben
benben commented Oct 21, 2011

@fhemberger: I'm sorry to say that the problem still exists :(

@fhemberger fhemberger reopened this Oct 21, 2011
@fhemberger
Contributor

Hmm, I tested it with your example and it works for me with one to four ä's.
System: OS X 10.6.8 with Ruby 1.9.2-p290

@mattn
Contributor
mattn commented Oct 21, 2011

I tested it on Windows XP and didn't get any invalid text. This change seems to work for me.

@benben
benben commented Oct 21, 2011

I think it's a bug in Jekyll.

I moved to another machine (Arch Linux, Ruby 1.9.2-p290), and there I can't run 'rake generate' when I have umlauts in a post.

/home/ben/.rvm/gems/ruby-1.9.2-p290/gems/jekyll-0.11.0/lib/jekyll/convertible.rb:32:in `read_yaml': invalid byte sequence in US-ASCII (ArgumentError)

I fixed that by changing line 29 in convertible.rb from

self.content = File.read(File.join(base, name))

to

self.content = File.read(File.join(base, name), :encoding => "utf-8")

Now it works on both machines with umlauts in code blocks and normal text (the other machine runs Ubuntu 11.04).
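
For later readers, here is a minimal sketch of what the explicit `:encoding` option changes. It assumes the failure comes from the file's bytes being tagged with a non-UTF-8 encoding (US-ASCII, as in the error above). The temporary file and the variable names are made up for illustration; this is not Jekyll's code.

```ruby
require "tempfile"

# Hypothetical stand-in for a post: a temp file holding UTF-8 bytes.
post = Tempfile.new(["post", ".markdown"])
post.close
File.binwrite(post.path, "test post with \u00E4\u00FC\u00DF\n")

# Simulate a machine whose default encoding is US-ASCII by asking for it
# explicitly: the umlaut bytes are not valid US-ASCII.
ascii_content = File.read(post.path, :encoding => "US-ASCII")

# Same bytes, but tagged as UTF-8, which is what the patched line requests.
utf8_content = File.read(post.path, :encoding => "UTF-8")

puts ascii_content.valid_encoding?
puts utf8_content.valid_encoding?
```

The bytes on disk are identical in both reads; only the encoding label on the resulting string differs, and string operations that need valid characters can fail on the US-ASCII one.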

@mattn
Contributor
mattn commented Oct 21, 2011

Or set these environment variables:

LANG=en_US.UTF-8
LC_ALL=en_US.UTF-8

Or set the default external encoding in Ruby:

Encoding.default_external = Encoding.find('UTF-8')
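
A small sketch of how that setting affects `File.read` when no encoding is passed, assuming no default internal encoding is configured. The temporary file and variable names are hypothetical, and US-ASCII is set by hand here only to imitate a machine without a UTF-8 locale.

```ruby
require "tempfile"

# Hypothetical post file containing a single UTF-8 umlaut in a comment.
post = Tempfile.new(["post", ".markdown"])
post.close
File.binwrite(post.path, "# \u00E4\n")

previous = Encoding.default_external

# Imitate a non-UTF-8 locale: files are read as US-ASCII by default.
Encoding.default_external = Encoding::US_ASCII
ascii_read = File.read(post.path)

# The suggested setting: files are read as UTF-8 by default.
Encoding.default_external = Encoding.find("UTF-8")
utf8_read = File.read(post.path)

# Restore whatever the process started with.
Encoding.default_external = previous

puts ascii_read.encoding
puts utf8_read.encoding
puts utf8_read.valid_encoding?
```

Unlike the one-line patch to `File.read`, this changes the default for every file the process reads, which is also what the `LANG` and `LC_ALL` variables influence from outside Ruby.
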
@benben
benben commented Oct 21, 2011

Ok, there is something strange going on. My hack fixes the error but not all encoding problems:

    doesn't work:
    ``` ruby

    def ruby
    # ä
    end

    ```
    works (just added a space after the ä):
    ``` ruby

    def ruby
    # ä 
    end

    ```

    works (no space after but indented one space):
    ``` ruby

    def ruby
     # ä
    end

    ```

    works (space after but another comment before):
    ``` ruby

    def ruby
    # another comment
    # ä
    end

    ```

    works:
    ``` ruby

    def ruby
    # ä ä
    end

    ```

You can download the test file here: https://gist.github.com/1303922

@fhemberger
Contributor

@benben Your testcase works for me on OS X without your patch. I've set LANG=de_DE.UTF-8 in my environment.

Update: Also works on Windows with env variables set:

LANG=de_DE.UTF-8
LC_ALL=de_DE.UTF-8
@fhemberger
Contributor

Closed: #worksforme (if there are further questions, I'll reopen the ticket)

@fhemberger fhemberger closed this Nov 3, 2011
@chendeshen

Saving all the files with UTF-8 encoding works every time!

@briansimmons briansimmons pushed a commit to briansimmons/octopress that referenced this issue Aug 20, 2013
@fhemberger fhemberger Add utf-8 encoding option to Pygments highlighter, fixes #232 6aecfde
@imvman
imvman commented Dec 15, 2013

Thanks, that's OK now.

@kacha-zhou

You should put the following code on the first line of the SCSS file:
@charset "utf-8";

@woterwang

Thanks, writing @charset "utf-8" at the top of the SCSS file solved the problem.
