image path with escaped quote cannot be found when run --standalone #1489

Closed
jrowen opened this issue Aug 5, 2014 · 16 comments
Comments

@jrowen

jrowen commented Aug 5, 2014

When run with the --standalone arg, pandoc cannot find images if the path is wrapped in escaped quotes.

This example,

var x = "<img src=\"./image/pic.png\" alt=\"pic.png\" width=400 />"

will generate the error (97),

pandoc: Could not find data file \"./image/pic.png\"

Removing the escaped quote for the src attribute eliminates the error.

A related item: if no space follows the width attribute (e.g. width=400/>), the output looks like this:

var x = "<img src="data:image/png;base64,..." alt="\&quot;pic.png\&quot;" width="400/">

instead of the expected

var x = "<img src="data:image/png;base64,..." alt="\&quot;pic.png\&quot;" width="400"></img>
@jgm
Owner

jgm commented Aug 5, 2014

Can you post the precise command you're using and the pandoc version? Also, why are you escaping the quotes?


@jrowen
Author

jrowen commented Aug 5, 2014

I'm using a series of R libraries that call pandoc 1.12.3 (via the rmarkdown package). The WebGL code generated by one library, rgl, generates strings in the format noted above. Here's a related link. I've also submitted a request for the rgl library to update some javascript to work better with pandoc.

Below is the pandoc command.

"C:/Program Files/RStudio/bin/pandoc/pandoc" test.utf8.md --to html --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-implicit_figures --output test.html --smart --email-obfuscation none --standalone --section-divs --template C:\Users\jrowen\Documents\R\win-library\3.1\rmarkdown\rmd\h\default.html --variable theme:bootstrap --include-in-header C:\Users\jrowen\AppData\Local\Temp\RtmpYnhpzL\rmarkdown-str2a6c64c1683.html --mathjax --variable mathjax-url:https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML --no-highlight --variable highlightjs=test_files/highlight 

@jrowen
Author

jrowen commented Aug 5, 2014

I ran into one other javascript parsing item that appears to give pandoc trouble. If spacing isn't added around conditional statements in the md file, additional quotes are added to the resulting html.

if (val<check) { ... } causes problems, while if (val < check) { ... } does not.

I can file as a new issue if desired.

@jgm
Owner

jgm commented Aug 5, 2014

Does this js snippet occur inside html script tags in the md file? Can you post the whole context so I can try it?


@jrowen
Author

jrowen commented Aug 5, 2014

Here are a few md files that should help debug the problem (sorry, they are a little large). The referenced image and js files are here.

  • This file will generate a js error, but change pow<value to pow < value and the error is gone.
  • This file shows some unexpected formatting in the resulting html around the width attribute in lines 497 and 508, but add a space at the end of the img tag in lines 444 and 455 of the md file (e.g. width=673/> to width=673 />) and the html file looks as expected.
  • This file will generate the error pandoc: Could not find data file \"./test_files/figure-html/chunck1.png\", but remove the escaped quotes from lines 444 and 455 and it runs without error.

@jgm
Owner

jgm commented Aug 5, 2014

I suspect this has to do with rawVerbatimBlock in Text.Pandoc.Readers.Markdown (note to self).

@jgm
Owner

jgm commented Aug 8, 2014

@jrowen, I was unable to reproduce what you're seeing with the command line you gave. However, with --self-contained I can. Here is a minimal example:

% pandoc --self-contained -s
<style type="text/javascript">
var x = "<img src=\"./image/pic.png\" alt=\"pic.png\" width=400 />"
</style>
^D
pandoc: Could not find data file \"./image/pic.png\"

I got this both with 1.12.4.2 and with the dev version.

Now, it seems to me that the problem is that the --self-contained transformation should NOT do anything to tags inside literal javascript strings. Your post threw me off, because you seem to presuppose that it should, and only object to the execution. But in general, the --self-contained transformation doesn't try to interpret javascript; the behavior above is a bug, but perhaps not the bug you thought it was.
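
To make the intended behavior concrete, here is a minimal sketch of skipping raw-text elements before looking for image paths. It is an illustration only, not pandoc's implementation (which is written in Haskell on top of tagsoup). The helper name collectImageSources and the file ./image/real.png are hypothetical, and the regular expressions are a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper, not pandoc's code: list the <img> src values that a
// self-contained pass could inline, ignoring anything inside <script>/<style>.
function collectImageSources(html) {
  // Drop <script> and <style> elements entirely; their content is raw text,
  // so markup-looking strings inside them are not real tags.
  const withoutRawText = html.replace(
    /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi,
    ""
  );
  const sources = [];
  // Match quoted src attributes on the remaining <img> tags.
  const imgTag = /<img\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let match;
  while ((match = imgTag.exec(withoutRawText)) !== null) {
    sources.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return sources;
}

// The script body contains the escaped-quote string from this report;
// the trailing <img> is a made-up real image for contrast.
const page = [
  '<script type="text/javascript">',
  'var x = "<img src=\\"./image/pic.png\\" alt=\\"pic.png\\" width=400 />"',
  "</script>",
  '<img src="./image/real.png" alt="real">',
].join("\n");

console.log(collectImageSources(page));
```

Under these assumptions only the image outside the script element is reported, so the escaped path inside the string literal is never looked up.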

@jgm
Owner

jgm commented Aug 8, 2014

PS. the dev version gives the better message Could not fetch \"./image/pic.png\".

@jgm
Owner

jgm commented Aug 8, 2014

This confirms my hunch:

Prelude Text.Pandoc.SelfContained Text.Pandoc> let x = "<style type=\"text/javascript\">\nvar x = \"<img src=\\\"./image/pic.png\\\" alt=\\\"pic.png\\\" width=400 />\"\n</style>\n"
Prelude Text.Pandoc.SelfContained Text.Pandoc> makeSelfContained def x
<interactive>: Could not fetch \"./image/pic.png\"
\"./image/pic.png\": openBinaryFile: does not exist (No such file or directory)
*** Exception: ExitFailure 67

@jgm
Owner

jgm commented Aug 8, 2014

More clues:

Prelude Text.Pandoc.SelfContained Text.Pandoc Text.HTML.TagSoup> parseTags x
[TagOpen "style" [("type","text/javascript")],TagText "\nvar x = \"",TagOpen "img" [("src","\\\"./image/pic.png\\\""),("alt","\\\"pic.png\\\""),("width","400")],TagClose "img",TagText "\"\n",TagClose "style",TagText "\n"]

So the problem traces to the tagsoup library's parsing of this input into tags. I believe this is a bug in tagsoup, but I may not fully understand the HTML 5 parsing algorithm, which tagsoup purports to implement. I will check with the tagsoup maintainer.

@ndmitchell

@jgm @jrowen, I note the example is using <style> tags and treating them like <script> tags. Is that an essential/relevant detail of the bug report, or a mistake?

@jrowen
Author

jrowen commented Aug 8, 2014

@jgm, using your simple example, the changes below work as expected, creating a self-contained html doc.

<script type="text/javascript">
var x = '<img src="./image/pic.png" alt=\"pic.png\" width=400 />'
</script>
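
For reference, switching the string delimiters should not change what the script does at runtime: the spellings below denote the same JavaScript string, and only the raw source text that pandoc scans differs. A minimal check (the variable names are illustrative):

```javascript
// Original form: double-quoted literal with escaped inner quotes.
const escaped = "<img src=\"./image/pic.png\" alt=\"pic.png\" width=400 />";

// Workaround form: single-quoted literal, inner quotes left unescaped.
const singleQuoted = '<img src="./image/pic.png" alt="pic.png" width=400 />';

// Mixed form from the comment above: \" inside single quotes is still just ".
const mixed = '<img src="./image/pic.png" alt=\"pic.png\" width=400 />';

console.log(escaped === singleQuoted && singleQuoted === mixed); // true
```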

Here are a couple examples of the other script parsing issues I mentioned.

If a space isn't included at the end of the img tag, the resulting html doc includes width="400/">' instead of width="400"></img>'.

<script type="text/javascript">
var x = '<img src="./image/pic.png" alt=\"pic.png\" width=400/>'
</script>
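
One possible explanation for the width="400/" output, assuming the tag inside the string is being tokenized as HTML at all: under HTML5 rules an unquoted attribute value runs until whitespace or ">", so a "/" that directly follows the value becomes part of it. The sketch below mimics only that one rule; unquotedValue is a hypothetical helper, not a pandoc or tagsoup function.

```javascript
// Hypothetical helper: read an unquoted attribute value up to the next
// whitespace, quote, or ">", which is where such a value ends in HTML5.
function unquotedValue(tag, name) {
  const match = new RegExp("\\b" + name + "=([^\\s>\"']+)").exec(tag);
  return match ? match[1] : null;
}

console.log(unquotedValue("<img width=400/>", "width"));  // "400/"
console.log(unquotedValue("<img width=400 />", "width")); // "400"
```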

Finally, if space is not included around the < operator, the resulting html doc includes a closing tag of < script> instead of </script>.

<script type="text/javascript">
var x = 2; var y = 1
if (x<y) {
    var z = "text"
}
</script>

@jgm
Owner

jgm commented Aug 8, 2014

@jrowen, I don't see any bugs here. In your second example, I see (using 1.12.4.2):

% pandoc --self-contained
<script type="text/javascript">
var x = '<img src="./image/pic.png" alt=\"pic.png\" width=400/>'
</script>
^D
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<script type="text/javascript">
var x = '<img src="./image/pic.png" alt=\"pic.png\" width=400/>'
</script>
</body>
</html>

Here the contents of the javascript string are just as they were in the source, which is the expected behavior. (--self-contained is not meant to modify anything inside a javascript string literal).

In the third example:

% pandoc --self-contained
<script type="text/javascript">
var x = 2; var y = 1
if (x<y) {
    var z = "text"
}
</script>
^D
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<script type="text/javascript">
var x = 2; var y = 1
if (x<y) {
    var z = "text"
}
</script>
</body>
</html>

the javascript section is just as you included it, with a correct closing tag.

Do you get different results when you run the same command with the same input as above?

@jrowen
Author

jrowen commented Aug 8, 2014

Below is the output (version 1.12.3) I see for the second example (some src text replaced with ...),

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<script type="text/javascript">
var x = '<img src="data:image/png;base64,..." alt="\&quot;pic.png\&quot;" width="400/">'
</script>
</body>
</html>

and here is the output if I add a space, using width=400 /> instead,

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<script type="text/javascript">
var x = '<img src="data:image/png;base64,..." alt="\&quot;pic.png\&quot;" width="400"></img>'
</script>
</body>
</html>

@jrowen
Author

jrowen commented Aug 8, 2014

Here is the same for the third example, without space (x<y),

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<script type="text/javascript">
var x = 2; var y = 1
if (x<y) { var z="text" } < script>
</body>
</html>

and with space (x < y)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<script type="text/javascript">
var x = 2; var y = 1
if (x < y) {
    var z = "text"
}
</script>
</body>
</html>

It looks like I'm using a slightly older version, so maybe this has now been fixed.

@jgm
Owner

jgm commented Aug 8, 2014

@jrowen, if you are using 1.12.3, then I suggest you simply upgrade.
It may be that your pandoc was compiled against an older version
of the tagsoup library that did not parse script tags correctly.
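
As a rough illustration of why spacing around < could matter to a parser that does not treat script content as raw text, here is a deliberately naive tag scanner. It is not tagsoup's algorithm, and naiveTags is a hypothetical name. It only applies the common rule that "<" followed by a letter opens a tag, while "<" followed by a space is plain text.

```javascript
// Hypothetical, deliberately naive scanner: "<" (optionally "</") followed by
// a letter starts a tag that runs to the next ">".
function naiveTags(html) {
  return html.match(/<\/?[A-Za-z][^>]*>/g) || [];
}

const noSpace =
  '<script type="text/javascript">\nif (x<y) { var z = "text" }\n</script>';
const withSpace =
  '<script type="text/javascript">\nif (x < y) { var z = "text" }\n</script>';

// Without spaces, "<y) { ... }\n</script>" is swallowed as one bogus tag,
// so no standalone closing tag is found.
console.log(naiveTags(noSpace).includes("</script>"));   // false

// With spaces, "< y" is not a tag start and the closing tag is seen.
console.log(naiveTags(withSpace).includes("</script>")); // true
```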

jgm closed this as completed Aug 29, 2014