excerpt.rb error when the head section contains non-ASCII (UTF-8) characters #1624

Closed
agat366 opened this issue Oct 9, 2013 · 6 comments
Labels
frozen-due-to-age support This is a question about Jekyll's usage.

Comments

@agat366

agat366 commented Oct 9, 2013

I get the error message below when the head section contains characters outside plain English. Only the head section is affected: the body, which also contains such characters, is processed fine by redcarpet (though not by maruku). For example:


---
layout: post
title: українськи заголовок or any German, Polish etc special characters

---

Error message:

  Generating... Error reading file ... test21.markdown: invalid byte sequence in UTF-8
C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/excerpt.rb:110:in `scan': invalid byte sequence in UTF-8 (ArgumentError)
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/excerpt.rb:110:in `extract_excerpt'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/excerpt.rb:17:in `initialize'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/post.rb:302:in `new'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/post.rb:302:in `extract_excerpt'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/post.rb:100:in `read_yaml'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/post.rb:56:in `initialize'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/site.rb:162:in `new'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/site.rb:162:in `block in read_posts'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/site.rb:160:in `each'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/site.rb:160:in `read_posts'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/site.rb:132:in `read_directories'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/site.rb:102:in `read'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/site.rb:34:in `process'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/command.rb:18:in `process_site'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/commands/build.rb:23:in `build'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/lib/jekyll/commands/build.rb:7:in `process'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/jekyll-1.2.1/bin/jekyll:73:in `block (2 levels) in <top (required)>'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/commander-4.1.5/lib/commander/command.rb:180:in `call'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/commander-4.1.5/lib/commander/command.rb:180:in `call'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/commander-4.1.5/lib/commander/command.rb:155:in `run'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/commander-4.1.5/lib/commander/runner.rb:402:in `run_active_command'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/commander-4.1.5/lib/commander/runner.rb:78:in `run!'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/commander-4.1.5/lib/commander/delegates.rb:11:in `run!'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/commander-4.1.5/lib/commander/import.rb:10:in `block in <top (required)>'
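A minimal Ruby sketch of how this kind of error can arise, assuming (this is not confirmed in the thread) that the post's bytes are not valid UTF-8, for example because the file was saved in the Windows-1251 code page while Ruby treats it as UTF-8. `simulate_mis_encoded_title` is a hypothetical helper written for illustration; `String#scan`, `force_encoding`, `encode` and `valid_encoding?` are standard Ruby.

```ruby
# encoding: utf-8

# Hypothetical helper: simulates a title whose bytes were saved as Windows-1251
# while Ruby labels them UTF-8 (an assumed cause, not a confirmed one).
# Returns [valid_utf8?, scan_error_message, repaired_title].
def simulate_mis_encoded_title(title)
  # Same characters, stored as Windows-1251 bytes but labelled UTF-8.
  raw = title.encode("Windows-1251").force_encoding("UTF-8")

  # Regex methods such as String#scan refuse strings with invalid bytes,
  # raising the same ArgumentError reported from excerpt.rb.
  message =
    begin
      raw.scan(/\S+/)
      nil
    rescue ArgumentError => e
      e.message
    end

  # Relabel the bytes with their real encoding, then transcode to UTF-8.
  repaired = raw.dup.force_encoding("Windows-1251").encode("UTF-8")
  [raw.valid_encoding?, message, repaired]
end

valid, message, repaired = simulate_mis_encoded_title("українськи заголовок")
puts valid    # false
puts message  # invalid byte sequence in UTF-8
```

If this matches what happens on your machine, re-saving the post as UTF-8 (or transcoding it as in the last step) should avoid the exception.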
@parkr
Member

parkr commented Oct 9, 2013

We're seeing problems with UTF-8 on Windows systems. #1449 should fix it. In the meantime, use HTML entities to encode those non-ASCII characters.
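A sketch of the HTML-entity workaround in Ruby. `to_numeric_entities` is a hypothetical helper, not part of Jekyll: it replaces each non-ASCII character with its decimal numeric character reference and leaves ASCII untouched.

```ruby
# encoding: utf-8

# Hypothetical helper: replaces every non-ASCII character with a decimal
# numeric character reference (&#NNNN;), leaving ASCII characters as they are.
def to_numeric_entities(text)
  text.each_char.map { |ch| ch.ord > 127 ? format("&#%d;", ch.ord) : ch }.join
end

puts to_numeric_entities("український текст")
# &#1091;&#1082;&#1088;&#1072;&#1111;&#1085;&#1089;&#1100;&#1082;&#1080;&#1081; &#1090;&#1077;&#1082;&#1089;&#1090;
```

Because the result starts with `&`, it has to be quoted when used as a front-matter value.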

@parkr parkr closed this as completed Oct 9, 2013
@agat366
Author

agat366 commented Oct 9, 2013

Hi! Thanks for the fast response.

Do you mean numeric character references like &#1091;?

I guess so. But if I put them into the head section (the problem only appears there; the body is fine when redcarpet is used), like this:

---
title: &#1091;&#1082;&#1088;&#1072;&#1111;&#1085;&#1089;&#1100;&#1082;&#1080;&#1081; &#1090;&#1077;&#1082;&#1089;&#1090;
---

I get the following error message:

  Generating... YAML Exception reading ... test.markdown: (<unknown>): did not find expected alphabetic or numeric character while scanning an anchor at line 3 column 8

@parkr
Member

parkr commented Oct 9, 2013

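Try wrapping them in quotes. & is a reserved character in YAML: at the start of a value it introduces an anchor, which is why the parser complains while "scanning an anchor".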
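A short Ruby check of this with Psych, Ruby's standard YAML library. In YAML a value beginning with `&` declares an anchor, and libyaml expects an alphanumeric anchor name, which fits the error above. `front_matter_title` is a hypothetical helper for illustration.

```ruby
require "yaml"

# Hypothetical helper: parses a front-matter body and returns its title,
# or the parser's error message if the YAML is rejected.
def front_matter_title(yaml)
  data = YAML.safe_load(yaml)
  data["title"]
rescue Psych::SyntaxError => e
  e.message
end

# Unquoted: the leading "&" starts an anchor and "#" is not accepted in its name.
puts front_matter_title("title: &#1091;\n")
# Quoted: the whole value is an ordinary string.
puts front_matter_title("title: \"&#1091;\"\n") # &#1091;
```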

@agat366
Author

agat366 commented Oct 9, 2013

Ok. The following works. Thanks!

title: "&#1091;"

@parkr
Member

parkr commented Oct 10, 2013

No problem! Once v1.3 comes out, try setting encoding: "utf-8" in your _config.yml and removing the HTML entities. That should fix it (hopefully). :)
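This thread doesn't show how the `encoding` option is implemented. Assuming it amounts to reading post files with an explicit external encoding instead of Ruby's `Encoding.default_external`, the difference can be sketched in plain Ruby. `read_as_utf8` and `round_trip` are hypothetical helpers; `File.read` with an `encoding:` option and `Tempfile.create` are standard Ruby (the latter needs Ruby 2.1 or newer).

```ruby
# encoding: utf-8
require "tempfile"

# Hypothetical helper: reads a file as UTF-8 explicitly instead of relying on
# Encoding.default_external, which can differ between systems (on Windows it
# has historically followed the locale code page).
def read_as_utf8(path)
  File.read(path, encoding: "utf-8")
end

# Hypothetical helper: writes UTF-8 text to a temporary post file and reads it back.
def round_trip(text)
  Tempfile.create(["post", ".markdown"]) do |file|
    file.binmode # write the raw UTF-8 bytes unchanged
    file.write(text.encode("UTF-8"))
    file.flush
    read_as_utf8(file.path)
  end
end

content = round_trip("---\ntitle: український текст\n---\n")
puts content.encoding        # UTF-8
puts content.valid_encoding? # true
```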

@agat366
Author

agat366 commented Oct 10, 2013

Hi, Parker!
HTML entities work well. However, I've run into another multilingual problem. I'm not sure this thread is the right place for it, so please point me elsewhere if it isn't. :)
It seems related to the Jekyll paginator (though it could be some other internal part).

The characters mentioned above are processed well by redcarpet when they appear in a post's rendered HTML. However, if I use something like this:

{% for post in paginator.posts %}
      :::{{ post.content }}:::
{% endfor %}

post.content renders as something like:

:::
����� �����������

The trailing ":::" also disappears.
Unlike the title in the head section, encoding the entire post body as HTML entities isn't really practical, so I'm not sure what to do here.
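The thread doesn't establish where the bytes get lost. The `�` characters are what a browser typically shows for bytes that aren't valid in the page's declared encoding, so one hedged way to narrow it down is to check whether the string (for example a post's content, wherever you can reach it from Ruby) is still valid UTF-8. `invalid_utf8_offsets` is a hypothetical helper for illustration.

```ruby
# encoding: utf-8

# Hypothetical diagnostic: returns the byte offsets of every byte that is not
# part of a valid UTF-8 character, to show where the content breaks.
def invalid_utf8_offsets(str)
  s = str.dup.force_encoding("UTF-8")
  offsets = []
  position = 0
  s.each_char do |ch|
    # In a broken string, each_char yields each invalid byte as its own "character".
    offsets << position unless ch.valid_encoding?
    position += ch.bytesize
  end
  offsets
end

p invalid_utf8_offsets("тест")     # []
p invalid_utf8_offsets("ab\xFFcd") # [2]
```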

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017