
HTML Symbols not correctly parsed in 1.16? TexMath / HTML issue!? #2734

Closed
Tinkerer- opened this issue Feb 22, 2016 · 5 comments

@Tinkerer-
Since I upgraded to pandoc 1.16 I've had an issue with HTML character entities. For example, &euml; should become ë, but instead pandoc escapes it and outputs &amp;euml;.

Example:

C:\Users\test>cat test.txt
This is a &euml; test
C:\Users\test>pandoc test.txt
<p>This is a &amp;euml; test</p>

From previous pandoc versions I would have expected the output to have been:

<p>This is a &euml; test</p>
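Why a failed entity lookup surfaces as &amp;euml; rather than as an error can be sketched in a few lines of Python. This is a hypothetical model, not pandoc's code (pandoc is written in Haskell and parses HTML with tagsoup). It assumes the reader turns named entities into characters and the HTML writer escapes any literal & still left in the text; the helper name render_paragraph is illustrative only.

```python
import html


def render_paragraph(source: str, entity_decoding_works: bool) -> str:
    """Hypothetical reader/writer round trip (an illustration, not pandoc's code)."""
    # Reader step: decode entities into characters, or leave the entity
    # as literal text if the lookup fails.
    text = html.unescape(source) if entity_decoding_works else source
    # Writer step: escape HTML-special characters (&, <, >) in the text node.
    return "<p>" + html.escape(text, quote=False) + "</p>"


src = "This is a &euml; test"
print(render_paragraph(src, entity_decoding_works=True))
print(render_paragraph(src, entity_decoding_works=False))
```

When the entity is decoded, the writer emits ë. When the lookup fails, the text still contains the literal &euml;, and escaping its ampersand yields exactly the &amp;euml; output reported above.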

I'm using pandoc 1.16.0.2, built by cabal.

Edit: when I use the Windows-installed pandoc (also 1.16.0.2), pandoc behaves as expected!

Edit 2: A diff of the version output shows that the difference between my Windows-installed pandoc ("good") and the cabal-built one ("bad") is the texmath version: the Windows build uses texmath 0.8.4.1, while the cabal build uses texmath 0.8.4.2.

@Tinkerer- changed the title "HTML Symbols not correctly parsed in 1.16?" to "HTML Symbols not correctly parsed in 1.16? TexMath / HTML issue!" Feb 22, 2016
@Tinkerer- changed the title "HTML Symbols not correctly parsed in 1.16? TexMath / HTML issue!" to "HTML Symbols not correctly parsed in 1.16? TexMath / HTML issue!?" Feb 22, 2016
@Jmuccigr
Contributor

On a Mac here, pandoc 1.16.0.2. I get an ë for this:

This is a &euml; test.

EDIT: this was on the command line.

@jgm
Owner

jgm commented Feb 22, 2016

Strange! I'm seeing the same thing on
http://johnmacfarlane.net/babelmark2/?normalize=1&text=Hdin+in+a+%26euml%3B+test%0A
and
http://pandoc.org/try/?text=This+is+a+%26%23333%3B+test%0A&from=markdown&to=html

however, trying from the command line (with both pandoc 1.16
and the current dev version), I get proper output.

I don't see any changes between 1.16.0.2 and current dev
that would have affected this, but it may have to do with
changes in a dependent library.

This needs further investigation.


@jgm closed this as completed in 0180807 Feb 22, 2016
@jgm
Owner

jgm commented Feb 22, 2016

The linux build that gets incorrect output is compiled
against tagsoup-0.13.3. I'm guessing that's the issue.

See the tagsoup changelog:

0.13.6 #28, some named entities require a trailing semicolon
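The changelog entry refers to the HTML5 rule that only a fixed list of legacy named entities (largely the Latin-1 ones such as euml and amp) may omit the trailing semicolon, while all others require it. Python's standard library ships the same table as html.entities.html5, so it can illustrate the rule. This is not tagsoup, and it says nothing about how tagsoup 0.13.3 implemented its lookup.

```python
import html
from html.entities import html5

# html5 maps entity names (without the leading '&') to their characters.
# Legacy names such as euml appear both with and without the trailing ';'.
print("euml;" in html5, "euml" in html5)
# Newer names such as hellip appear only with the ';'.
print("hellip;" in html5, "hellip" in html5)

# html.unescape follows the same table when decoding text: the
# semicolon-less &hellip is not a valid reference and is left as-is.
print(html.unescape("&euml; &euml"))
print(html.unescape("&hellip; &hellip"))
```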

I will raise the tagsoup lower bound to prevent linking
against buggy versions of tagsoup.
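Raising a lower bound means rejecting every release older than the first fixed one. Below is a minimal Python sketch of that check. It assumes 0.13.6 as the threshold, since that is the release whose changelog entry is quoted above; the bound actually set in pandoc's cabal file may differ. The helpers satisfies_lower_bound and parse_package_id are hypothetical and not part of Cabal.

```python
# Hypothetical threshold: the first tagsoup release listing fix #28,
# per the changelog entry quoted in this thread. Not pandoc's real bound.
MIN_TAGSOUP = (0, 13, 6)


def parse_package_id(pkg_id: str) -> tuple[str, tuple[int, ...]]:
    """Split a Cabal-style package id such as 'tagsoup-0.13.3'."""
    name, _, version = pkg_id.rpartition("-")
    return name, tuple(int(part) for part in version.split("."))


def satisfies_lower_bound(pkg_id: str) -> bool:
    """Return True if pkg_id is tagsoup at or above MIN_TAGSOUP."""
    name, version = parse_package_id(pkg_id)
    if name != "tagsoup":
        raise ValueError(f"expected a tagsoup package id, got {pkg_id!r}")
    # Tuples compare component by component as integers.
    return version >= MIN_TAGSOUP


for pkg in ["tagsoup-0.13.3", "tagsoup-0.13.8", "tagsoup-0.13.10"]:
    print(pkg, satisfies_lower_bound(pkg))
```

Comparing the components as integers matters: a plain string comparison would rank "0.13.10" below "0.13.6".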


@jgm
Owner

jgm commented Feb 22, 2016

PS. If you use cabal to install pandoc, the fix should be
as simple as:

cabal install tagsoup-0.13.8 pandoc --force --reinstall


@Tinkerer-
Author

Great, that fixed it! Thanks for the quick reply and for all the work on Pandoc in general!

c-forster pushed a commit to c-forster/pandoc that referenced this issue Mar 4, 2016
This fixes entity-related problems.

Closes jgm#2734.