How to support Chinese character? #116
Comments
Could you provide me the modified fourth-post.adoc file? |
From the trace, I would assume that the modified file hasn't been saved using UTF-8 encoding. |
I'm using Windows 7 Professional Edition SP1 with "Java(TM) SE Runtime Environment (build 1.7.0_51-b13)". The fourth-post.adoc generated by "jbake -i" is encoded as "UTF-8 without BOM" by default. I added just two Chinese characters to it, and baking then fails with an "invalid byte sequence in UTF-8" exception. After I converted fourth-post.adoc to "UTF-8" (with BOM) using Notepad++ v6.5.5, I got the following errors
Then I converted it to "ANSI" and the bake passed, but the Chinese characters were turned into question marks: <div class="sectionbody">
<div class="paragraph">
<p>????</p>
</div> The fourth-post.adoc file in each of these encodings is in my Dropbox public folder. Update: sorry for the typo; every "Unicode" above should have read "UTF-8", and I have just corrected it. |
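For readers wondering where the question marks come from: one plausible explanation, assuming "ANSI" here means a single-byte system code page that has no Chinese characters, is that the editor replaced every unmappable character with "?" when saving. The sketch below reproduces that effect in Java. ISO-8859-1 stands in for the Windows code page (an assumption, chosen because every JVM ships it), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode to a single-byte charset and decode again.
    // Characters the charset cannot represent are replaced with '?'.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // The same round trip through UTF-8 is lossless.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // itself does not depend on any particular encoding.
        String chinese = "\u4e2d\u6587";
        System.out.println(roundTripLatin1(chinese));               // prints "??"
        System.out.println(roundTripUtf8(chinese).equals(chinese)); // prints "true"
    }
}
```

Once the characters have been saved as "?", the information is gone, so no downstream tool can recover it; the file has to stay in UTF-8.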
Encode all files in UTF-8 |
I've just tried the files you supplied and I get the same error when the file starts off with a BOM. However without a BOM the file displays correctly. Can you try making sure the file being baked does not have a BOM at the start of the file? |
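To check for a BOM without relying on an editor's status bar, the first three bytes of the file can be compared against the UTF-8 BOM (EF BB BF). This is a minimal sketch, not part of JBake; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomCheck {
    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Returns the data without a leading BOM; unchanged if none is present.
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomCheck <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasUtf8Bom(data) ? "BOM present" : "no BOM");
    }
}
```

Running it against the file being baked, for example `java BomCheck fourth-post.adoc`, reports whether the BOM is there.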
Hi Jonathan, I'm quite sure about that. As I showed in my previous comment, I actually get different errors with and without the BOM. I just tried again, using Notepad++ to make sure the file is encoded as "UTF-8 without BOM", and I still get the same "invalid byte sequence in UTF-8" exception. I also ran another test: I added Chinese characters to the "second-post.md" file (encoded as "UTF-8 without BOM" by default). It bakes without errors and displays correctly in the HTML. |
I just tried it under Linux (centos-6.5-x86_64), and it works with "UTF-8 without BOM" and Chinese characters together. It seems that the Asciidoc rendering engine isn't compatible with Chinese characters under Windows 7. |
Ahh that would explain things, I tried on OS X. I'll raise this upstream with the Asciidoctor team, I know they are busy with their 1.5.0 release and it may already have been fixed. |
I tried adding Chinese to a Markdown file, and it works well. |
It doesn't work under Windows 8 either. But when I use Gradle with the Asciidoctor plugin to render the same Asciidoctor file, it succeeds. So this might not be an Asciidoctor issue. |
Thanks for the additional feedback. @zeroleaf what version of the Asciidoctor Gradle plugin did you try? |
@jonbullock The plugin I use is "org.asciidoctor:asciidoctor-gradle-plugin:1.5.0" |
I have the same error as @flyisland with Latin-specific characters such as "é" or "è" on Windows 7. There is no error with the same files on Mac OS X. |
It works under Linux but fails under Windows, so the bug may be caused by the line separator. The code body.append(line).append("\n"); hard-codes "\n" as the line separator. Maybe "\n" should be changed to org.apache.commons.lang.SystemUtils.LINE_SEPARATOR. I don't know how to build the release package, so I can't test this. |
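The suggestion above can be sketched as follows. This is only an illustration of the idea, not JBake's actual code: the class and method names are hypothetical, and it uses the JDK's own System.lineSeparator() (available since Java 7) rather than Commons Lang. Nothing in this thread confirms that the line separator was the cause of the error.

```java
import java.util.Arrays;
import java.util.List;

public class LineJoin {
    // Appends each line followed by the given separator, mirroring the
    // body.append(line).append("\n") pattern quoted in the comment.
    static String join(List<String> lines, String separator) {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append(separator);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("= Title", "Some text");

        // Hard-coded separator: same result on every platform.
        String fixed = join(lines, "\n");

        // Platform separator: "\r\n" on Windows, "\n" on Linux and OS X.
        String platform = join(lines, System.lineSeparator());

        System.out.println(fixed.length());
        System.out.println(platform.length());
    }
}
```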
I've just released v2.3.2 of JBake which includes Asciidoctor v1.5.0, could you see if your issues still exist in this release? JBake is now using the same version that was used by the asciidoctor-gradle-plugin. |
Yeah, with this release, Chinese characters are rendered correctly under Windows. |
Yes, v2.3.2 now works correctly with Chinese characters under Windows. |
Glad to hear it. |
I have just downloaded 2.3.0, created a sample site with "-i", and added Chinese characters to "fourth-post.adoc". Then I got the following errors while baking it.