Pandoc (Markdown to HTML) converts some character entities to UTF-8 #844

Closed
hukbert opened this Issue May 3, 2013 · 2 comments

hukbert commented May 3, 2013

I have a Markdown document containing the HTML character entity &rarr;. When I convert this to HTML using pandoc -o myfile.html myfile.md, the entity is converted to a UTF-8 encoded right arrow character, which my browser displays as an ugly jumble: â†’. Other character entities like &amp;, on the other hand, are preserved correctly as inline HTML.

A workaround to this is to include a tag
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
at the beginning of my Markdown document, but that seems a little inelegant, as I can't assume that every Markdown converter will produce UTF-8 encoded output. IMHO, pandoc should either consistently preserve HTML character entities or properly announce the UTF-8 encoding in the HTML output.

I'm using pandoc on Windows:

$ pandoc -v
pandoc 1.11.1
Compiled with citeproc-hs 0.3.8, texmath 0.6.1.3, highlighting-kate 0.5.3.8
...
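
The jumble described above is what you get when UTF-8 bytes are decoded with a single-byte default encoding. Here is a minimal Python sketch of that effect, assuming the browser fell back to Windows-1252 (the report does not say which encoding was actually used):

```python
# Illustration only: reproduces the mis-decoding, not anything pandoc does.
arrow = "\u2192"  # RIGHTWARDS ARROW, the character named by &rarr;

# UTF-8 encodes this character as three bytes: E2 86 92.
utf8_bytes = arrow.encode("utf-8")

# Assumption: with no charset declared, the browser decodes as Windows-1252.
# Each of the three bytes then becomes its own, unrelated character.
mojibake = utf8_bytes.decode("cp1252")

# Decoding the same bytes as UTF-8 recovers the single arrow character.
correct = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(" "))
print(mojibake)
print(correct)
```

Declaring the charset, as the meta tag workaround does, makes the browser take the second decoding path.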
Owner

jgm commented May 3, 2013

Pandoc converts all entities to Unicode characters when it reads the document, because it needs to handle output formats other than HTML.

If you use the -s flag to create a standalone document, pandoc will apply its default template, which includes the meta tag specifying UTF-8.

Another option is to use the --ascii flag, which will cause &rarr; to be output as &#8594; (the equivalent numeric character reference).
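
The relationship between the named entity, the Unicode character, and the numeric reference can be checked with a short Python sketch. This only illustrates the equivalence and is not pandoc's implementation; the helper name to_ascii_html is hypothetical.

```python
import html


def to_ascii_html(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference.

    Hypothetical helper for illustration; it mimics the effect described for
    --ascii on a plain string and does not reflect how pandoc is written.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Step 1: the named entity is resolved to a single Unicode character.
decoded = html.unescape("&rarr;")

# Step 2: that character has code point U+2192, which is 8594 in decimal.
code_point = ord(decoded)

# Step 3: writing it back as pure ASCII yields the numeric reference.
ascii_output = to_ascii_html(decoded)

print(code_point, ascii_output)
```

Because the numeric reference is plain ASCII, it displays correctly whatever encoding the browser assumes for the page.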

hukbert commented May 3, 2013

Thanks a lot!

hukbert closed this May 3, 2013

troglobit added a commit to troglobit/resume that referenced this issue Aug 30, 2016

Workaround for pandoc conversion from md --> HTML
The HTML output generated by pandoc contained Unicode characters,
which did not display correctly in Firefox.  This patch adds a small
workaround as suggested in jgm/pandoc#844

Signed-off-by: Joachim Nilsson <troglobit@gmail.com>