How to support Chinese character? #116

flyisland · 2014-07-09T14:36:32Z

I have just downloaded JBake 2.3.0, created a sample site with "-i", and added Chinese characters to "fourth-post.adoc". I then got the following error while baking it.

r:\Test>d:\DevTools\jbake-2.3.0\jbake
JBake v2.3.0 (2014-05-11 15:50:07PM) [http://jbake.org]

22:25:55.414 INFO  org.jbake.app.Oven - Baking has started...
22:25:55.452 INFO  org.jbake.app.Crawler - Processing [.\content\about.html]...  : new
22:25:55.460 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\first-post.html]...  : new
22:25:55.466 INFO  org.jbake.parser.AsciidoctorEngine - Initializing Asciidoctor engine...
22:25:58.507 INFO  org.jbake.parser.AsciidoctorEngine - Asciidoctor engine initialized.
org.jruby.exceptions.RaiseException: (ArgumentError) invalid byte sequence in UTF-8
        at org.jruby.RubyRegexp.match(org/jruby/RubyRegexp.java:1539)
        at org.jruby.RubyString.match(org/jruby/RubyString.java:1758)
        at RUBY.parse_block_metadata_line(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:1866)
        at RUBY.parse_block_metadata_lines(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:1830)
        at RUBY.next_section(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:254)
        at RUBY.parse(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:52)
        at RUBY.initialize(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/document.rb:329)
        at RUBY.load(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor.rb:805)
        at RUBY.render(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor.rb:879)
        at RUBY.render(<script>:55)
        at org.jruby.gen.InterfaceImpl1214299472.render(org/jruby/gen/InterfaceImpl1214299472.gen:13)


jonbullock · 2014-07-09T23:14:18Z

Could you provide me the modified fourth-post.adoc file?

melix · 2014-07-10T07:30:58Z

From the trace, I would assume that the modified file hasn't been saved using UTF-8 encoding.
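One way to test this assumption outside of JBake is to decode the file's bytes strictly as UTF-8 and see whether that fails. The following is a minimal sketch using only the JDK; the class name `Utf8Check` and the command-line usage are made up for illustration and are not part of JBake or Asciidoctor.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of JBake: reports whether a file is valid UTF-8.
final class Utf8Check {

    /** Returns true if the bytes decode cleanly as UTF-8, false otherwise. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Example: java Utf8Check content/blog/2013/fourth-post.adoc
        for (String arg : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(arg));
            System.out.println(arg + ": "
                    + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

If the file passes this check and baking still fails, the encoding of the saved file is probably not the whole story.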

flyisland · 2014-07-10T15:05:33Z

I'm using Windows 7 Professional Edition SP1 with "Java(TM) SE Runtime Environment (build 1.7.0_51-b13)" now.

The fourth-post.adoc generated by "jbake -i" is encoded as "UTF-8 without BOM" by default. I just added two Chinese characters to it, and baking then fails with the "invalid byte sequence in UTF-8" exception.

After I converted fourth-post.adoc to "UTF-8" (which in Notepad++ v6.5.5 means UTF-8 with a BOM), I got the following errors:

r:\Test>d:\DevTools\jbake-2.3.0\jbake
JBake v2.3.0 (2014-05-11 15:50:07PM) [http://jbake.org]

22:40:35.591 INFO  org.jbake.app.Oven - Baking has started...
22:40:35.625 INFO  org.jbake.app.Crawler - Processing [.\content\about.html]...  : new
22:40:35.634 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\first-post.html]...  : new
22:40:35.639 INFO  org.jbake.parser.AsciidoctorEngine - Initializing Asciidoctor engine...
22:40:38.404 INFO  org.jbake.parser.AsciidoctorEngine - Asciidoctor engine initialized.
22:40:38.453 ERROR org.jbake.app.Parser - Error parsing meta data from header!
22:40:38.454 WARN  org.jbake.app.Crawler - .\content\blog\2013\fourth-post.adoc has an invalid header, it has been ignored!
22:40:38.455 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\fourth-post.adoc]...  : new
22:40:39.056 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\second-post.md]...  : new 
......

Then I converted it to "ANSI". It passed the bake, but the Chinese characters were turned into question marks:

<div class="sectionbody">
<div class="paragraph">
<p>????</p>
</div>

You can get fourth-post.adoc in the different encodings from my Dropbox public folder.

update: sorry for the typo, all the "Unicode" should be "UTF-8", just corrected it.

opoo · 2014-07-10T15:46:56Z

Encode all files in UTF-8

jonbullock · 2014-07-10T22:32:56Z

I've just tried the files you supplied and I get the same error when the file starts off with a BOM. However without a BOM the file displays correctly. Can you try making sure the file being baked does not have a BOM at the start of the file?
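For anyone who wants to check for a BOM without relying on an editor, a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. Below is a minimal sketch of such a check; `BomCheck` and its methods are hypothetical names for illustration and do not exist in JBake.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of JBake: detects and strips a UTF-8 BOM.
final class BomCheck {

    /** Returns true if the bytes start with the UTF-8 BOM (EF BB BF). */
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Returns the bytes without a leading UTF-8 BOM; input without a BOM is returned unchanged. */
    static byte[] stripUtf8Bom(byte[] bytes) {
        return hasUtf8Bom(bytes) ? Arrays.copyOfRange(bytes, 3, bytes.length) : bytes;
    }

    public static void main(String[] args) throws Exception {
        // Example: java BomCheck content/blog/2013/fourth-post.adoc
        for (String arg : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(arg));
            System.out.println(arg + ": " + (hasUtf8Bom(bytes) ? "has a UTF-8 BOM" : "no UTF-8 BOM"));
        }
    }
}
```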

flyisland · 2014-07-11T01:36:47Z

Hi Jonathan, I'm quite sure about that. Actually, I get different errors with and without the BOM, as I showed in the previous comment. I just tried it again, using Notepad++ to make sure the file is encoded as "UTF-8 without BOM", and I still get the same "invalid byte sequence in UTF-8" exception.

I also just did another test: I added Chinese characters to the "second-post.md" file (encoded as "UTF-8 without BOM" by default). It passes the baking and displays correctly in the HTML.

flyisland · 2014-07-11T01:49:05Z

Just tried it under Linux (centos-6.5-x86_64); it works there with "UTF-8 without BOM" and Chinese characters together.

It seems that the AsciiDoc rendering engine isn't compatible with Chinese characters under Windows 7.
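One setting that commonly differs between Windows and Linux or OS X is the JVM's default charset. Nothing in this thread confirms that it is the cause here, so the following is only a diagnostic sketch: it prints the default charset and shows how the same UTF-8 bytes decode with an explicit charset versus the platform default. `DefaultCharsetProbe` is a made-up name, not JBake code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic, not part of JBake: compares explicit and default decoding.
final class DefaultCharsetProbe {

    /** Decodes bytes with an explicitly chosen charset instead of the platform default. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Two Chinese characters, written as escapes so the source file stays ASCII.
        String original = "\u4e2d\u6587";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("round-trips with explicit UTF-8:  "
                + original.equals(decode(utf8, StandardCharsets.UTF_8)));
        System.out.println("round-trips with default charset: "
                + original.equals(decode(utf8, Charset.defaultCharset())));
    }
}
```

Comparing the output on the Windows and Linux machines would show whether the two JVMs disagree about the default charset.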

jonbullock · 2014-07-14T11:48:21Z

Ahh that would explain things, I tried on OS X.

I'll raise this upstream with the Asciidoctor team. I know they are busy with their 1.5.0 release, and it may already have been fixed.

zzdjk6 · 2014-08-31T06:24:16Z

I tried adding Chinese to a Markdown file, and it works well.
It might be an AsciiDoc-specific issue.

zeroleaf · 2014-09-01T09:45:47Z

It also doesn't work under Windows 8.

But when I use Gradle with the Asciidoctor plugin to render an AsciiDoc file, it succeeds. So this might not be an Asciidoctor issue.

jonbullock · 2014-09-01T12:32:45Z

Thanks for the additional feedback.

@zeroleaf what version of the Asciidoctor Gradle plugin did you try?

zeroleaf · 2014-09-01T23:39:55Z

@jonbullock The plugin I use is "org.asciidoctor:asciidoctor-gradle-plugin:1.5.0"

TGITS · 2014-09-09T14:21:12Z

I have the same error as @flyisland with accented Latin characters such as "é" or "è" on Windows 7. There is no error with the same files on Mac OS X.

zeroleaf · 2014-09-20T12:42:45Z

It works under Linux but fails under Windows, so maybe the bug is caused by the line separator.
At line 115 of AsciidoctorEngine.java I found the code below:

body.append(line).append("\n");

It hard-codes "\n" as the line separator. Maybe change "\n" to

org.apache.commons.lang.SystemUtils.LINE_SEPARATOR
or
System.getProperty("line.separator");

I don't know how to build a release package, so I cannot test this.
You may want to give it a try. Hope this is helpful.
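The suggested change can be tried in isolation. The sketch below assumes `body` is a `StringBuilder` filled line by line, as the quoted statement implies; it is not the actual AsciidoctorEngine code, and `BodyJoiner` and `joinBody` are hypothetical names. It only compares the hard-coded "\n" with the platform separator and does not show that the separator causes the encoding error.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the loop around line 115; not the real JBake source.
final class BodyJoiner {

    /** Appends every line followed by the given separator, mirroring body.append(line).append(sep). */
    static String joinBody(List<String> lines, String separator) {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append(separator);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("= Title", "", "Some text");

        String hardCoded = joinBody(lines, "\n");
        String platform = joinBody(lines, System.lineSeparator());

        // The two results are identical on Linux and OS X and differ on Windows ("\r\n").
        System.out.println("identical on this platform: " + hardCoded.equals(platform));
    }
}
```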

jonbullock · 2014-09-21T01:30:38Z

I've just released v2.3.2 of JBake which includes Asciidoctor v1.5.0, could you see if your issues still exist in this release?

JBake is now using the same version that was used by the asciidoctor-gradle-plugin.

zeroleaf · 2014-09-21T06:01:54Z

Yeah, with this release, Chinese characters are rendered correctly under Windows.

flyisland · 2014-10-08T05:27:17Z

Yes, version v2.3.2 now works correctly with Chinese characters under Windows.

jonbullock · 2014-10-15T12:22:20Z

Glad to hear it.

jonbullock self-assigned this Jul 14, 2014

jonbullock added the bug label Jul 14, 2014

jonbullock added this to the v2.3.2 milestone Sep 21, 2014

jonbullock closed this as completed Sep 30, 2014
