How to support Chinese character? #116

Closed
flyisland opened this issue Jul 9, 2014 · 18 comments

@flyisland

I have just downloaded JBake 2.3.0, created a sample project with "-i", and added Chinese characters to "fourth-post.adoc". Baking it then fails with the following error:

r:\Test>d:\DevTools\jbake-2.3.0\jbake
JBake v2.3.0 (2014-05-11 15:50:07PM) [http://jbake.org]

22:25:55.414 INFO  org.jbake.app.Oven - Baking has started...
22:25:55.452 INFO  org.jbake.app.Crawler - Processing [.\content\about.html]...  : new
22:25:55.460 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\first-post.html]...  : new
22:25:55.466 INFO  org.jbake.parser.AsciidoctorEngine - Initializing Asciidoctor engine...
22:25:58.507 INFO  org.jbake.parser.AsciidoctorEngine - Asciidoctor engine initialized.
org.jruby.exceptions.RaiseException: (ArgumentError) invalid byte sequence in UTF-8
        at org.jruby.RubyRegexp.match(org/jruby/RubyRegexp.java:1539)
        at org.jruby.RubyString.match(org/jruby/RubyString.java:1758)
        at RUBY.parse_block_metadata_line(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:1866)
        at RUBY.parse_block_metadata_lines(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:1830)
        at RUBY.next_section(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:254)
        at RUBY.parse(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/lexer.rb:52)
        at RUBY.initialize(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor/document.rb:329)
        at RUBY.load(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor.rb:805)
        at RUBY.render(jar:file:D:/DevTools/jbake-2.3.0/lib/asciidoctor-java-integration-0.1.4.jar!/gems/asciidoctor-0.1.4/lib/asciidoctor.rb:879)
        at RUBY.render(<script>:55)
        at org.jruby.gen.InterfaceImpl1214299472.render(org/jruby/gen/InterfaceImpl1214299472.gen:13)
@jonbullock
Member

Could you provide me with the modified fourth-post.adoc file?

@melix
Contributor

melix commented Jul 10, 2014

From the trace, I would assume that the modified file hasn't been saved using UTF-8 encoding.
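
A quick way to test that assumption is to decode the file strictly as UTF-8 and see whether decoding fails. The following is a minimal sketch, not part of JBake: the class name `Utf8Check` and its method are made up for illustration, and the file path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of JBake or Asciidoctor.
final class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the file to check, e.g. the modified fourth-post.adoc
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + " is valid UTF-8: " + isValidUtf8(bytes));
    }
}
```

If this reports `true` for the file and baking still fails, the file itself is probably not the problem.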

@flyisland
Author

I'm using Windows 7 Professional Edition SP1 with "Java(TM) SE Runtime Environment (build 1.7.0_51-b13)".

The fourth-post.adoc generated by "jbake -i" is encoded as "UTF-8 without BOM" by default. I only added two Chinese characters to it, and baking then throws the "invalid byte sequence in UTF-8" exception.

[Screenshot of fourth-post.adoc in the editor]

After I converted fourth-post.adoc to "UTF-8" (that is, with a BOM) using Notepad++ v6.5.5, I got the following errors:

r:\Test>d:\DevTools\jbake-2.3.0\jbake
JBake v2.3.0 (2014-05-11 15:50:07PM) [http://jbake.org]

22:40:35.591 INFO  org.jbake.app.Oven - Baking has started...
22:40:35.625 INFO  org.jbake.app.Crawler - Processing [.\content\about.html]...  : new
22:40:35.634 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\first-post.html]...  : new
22:40:35.639 INFO  org.jbake.parser.AsciidoctorEngine - Initializing Asciidoctor engine...
22:40:38.404 INFO  org.jbake.parser.AsciidoctorEngine - Asciidoctor engine initialized.
22:40:38.453 ERROR org.jbake.app.Parser - Error parsing meta data from header!
22:40:38.454 WARN  org.jbake.app.Crawler - .\content\blog\2013\fourth-post.adoc has an invalid header, it has been ignored!
22:40:38.455 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\fourth-post.adoc]...  : new
22:40:39.056 INFO  org.jbake.app.Crawler - Processing [.\content\blog\2013\second-post.md]...  : new 
......

Then I converted it to "ANSI". It passed the bake, but the Chinese characters were turned into question marks:

<div class="sectionbody">
<div class="paragraph">
<p>????</p>
</div>

[Screenshot of the rendered fourth post]

The fourth-post.adoc file in each of these encodings is available in my Dropbox public folder.

Update: sorry for the typo. Every "Unicode" above should have been "UTF-8"; I have just corrected it.

@opoo

opoo commented Jul 10, 2014

Encode all files in UTF-8.

@jonbullock
Member

I've just tried the files you supplied, and I get the same error when the file starts with a BOM. Without a BOM, however, the file displays correctly. Can you make sure the file being baked does not have a BOM at the start?
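
For anyone who wants to check this without relying on an editor: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. Below is a minimal sketch for detecting and dropping it. `BomStripper` is a hypothetical helper written for this comment, not JBake code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of JBake.
final class BomStripper {

    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && bytes[0] == (byte) 0xEF
                && bytes[1] == (byte) 0xBB
                && bytes[2] == (byte) 0xBF;
    }

    /** Returns the data without a leading UTF-8 BOM; unchanged if there is none. */
    static byte[] stripUtf8Bom(byte[] bytes) {
        return hasUtf8Bom(bytes) ? Arrays.copyOfRange(bytes, 3, bytes.length) : bytes;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the file to inspect, e.g. fourth-post.adoc
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + " starts with a UTF-8 BOM: " + hasUtf8Bom(bytes));
    }
}
```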

@flyisland
Author

Hi Jonathan, I'm quite sure about that. Actually, I get different errors with and without the BOM, as I showed in the previous comment. I just tried again, using Notepad++ to make sure the file is encoded as "UTF-8 without BOM", and I still get the same "invalid byte sequence in UTF-8" exception.

I also ran another test: I added Chinese characters to the "second-post.md" file (encoded as "UTF-8 without BOM" by default). It passes baking and displays correctly in the HTML.

@flyisland
Author

I just tried it under Linux (centos-6.5-x86_64), and it works there with "UTF-8 without BOM" and Chinese characters together.

It seems that the AsciiDoc rendering engine isn't compatible with Chinese characters under Windows 7.
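
Since the same file behaves differently per operating system, one thing that may be worth comparing between the two machines is the JVM's default charset, which on Java 7 typically depends on the operating system and locale. Whether that is actually the cause here is not established; the sketch below only prints the values so they can be compared. The class name `DefaultCharsetReport` is made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic for illustration only; not part of JBake.
final class DefaultCharsetReport {

    /** Collects the JVM settings that usually differ between platforms. */
    static String describe() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", os.name=" + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        // Run this on both the Windows and the Linux machine and compare the output.
        System.out.println(describe());
    }
}
```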

@jonbullock
Member

Ahh, that would explain things. I tried on OS X.

I'll raise this upstream with the Asciidoctor team. I know they are busy with their 1.5.0 release, and it may already have been fixed.

jonbullock self-assigned this Jul 14, 2014
jonbullock added the bug label Jul 14, 2014

@zzdjk6

zzdjk6 commented Aug 31, 2014

I tried adding Chinese to a Markdown file, and it works well.
It might be an AsciiDoc-specific issue.

@zeroleaf

zeroleaf commented Sep 1, 2014

It also doesn't work under Windows 8.

But when I use Gradle with the Asciidoctor plugin to render an AsciiDoc file, it succeeds. So this might not be an Asciidoctor issue.

@jonbullock
Member

Thanks for the additional feedback.

@zeroleaf what version of the Asciidoctor Gradle plugin did you try?

@zeroleaf

zeroleaf commented Sep 1, 2014

@jonbullock The plugin I use is "org.asciidoctor:asciidoctor-gradle-plugin:1.5.0"

@TGITS

TGITS commented Sep 9, 2014

I have the same error as @flyisland with Latin-specific characters such as "é" or "è" on Windows 7. There is no error with the same files on Mac OS X.

@zeroleaf

It works under Linux but fails under Windows, so the bug may be caused by the line separator.
In AsciidoctorEngine.java, line 115, I found the code below:

body.append(line).append("\n");

It hard-codes "\n" as the line separator. Maybe change "\n" to

org.apache.commons.lang.SystemUtils.LINE_SEPARATOR
or
System.getProperty("line.separator");

I don't know how to build a release package, so I can't test this.
You may want to give it a try. Hope this is helpful.
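
To make the suggestion concrete, here is a minimal standalone sketch of the same append loop with the separator passed in as a parameter. It is not the actual AsciidoctorEngine code: the class `LineJoinDemo` and the method `joinLines` are hypothetical, and `System.lineSeparator()` (available since Java 7) is used as the platform separator to avoid an extra dependency. Note that `readLine()` strips the original line endings, so the chosen separator only affects the line breaks in the result, not the characters on each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo for illustration only; not the JBake implementation.
final class LineJoinDemo {

    /** Re-joins the lines of the text, ending each line with the given separator. */
    static String joinLines(String text, String separator) {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Same shape as the quoted line, with the separator made configurable.
                body.append(line).append(separator);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String sample = "= Title\r\n\r\n\u4e2d\u6587\r\n";
        String hardCoded = joinLines(sample, "\n");
        String platform = joinLines(sample, System.lineSeparator());
        // Compare the two results with the line breaks removed.
        boolean sameCharacters = hardCoded.replace("\n", "")
                .equals(platform.replace(System.lineSeparator(), ""));
        System.out.println("Same characters apart from line breaks: " + sameCharacters);
    }
}
```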

@jonbullock
Member

I've just released v2.3.2 of JBake, which includes Asciidoctor v1.5.0. Could you check whether your issues still exist in this release?

JBake is now using the same version that the asciidoctor-gradle-plugin used.

@zeroleaf

Yeah, with this release, Chinese characters are rendered correctly under Windows.

jonbullock added this to the v2.3.2 milestone Sep 21, 2014

@flyisland
Author

Yes, v2.3.2 now works correctly with Chinese characters under Windows.

@jonbullock
Member

Glad to hear it.
