This repository has been archived by the owner on Jul 17, 2023. It is now read-only.

Wrong escaping of CDATA #109

Closed
daroczig opened this issue May 30, 2020 · 13 comments

Comments

@daroczig
Contributor

Example document:

---
title: "test page"
output:
  conflr::confluence_document:
    space_key: "***"
---

<ac:structured-macro ac:name="expand">
  <ac:parameter ac:name="title">hidden stuff below</ac:parameter>
  <ac:rich-text-body><![CDATA[<p>foo</p>]]></ac:rich-text-body>
</ac:structured-macro>

Note the CDATA section for passing in HTML.

This fails with:

Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html,  : 
  Opening and ending tag mismatch: confl-ac-rich-text-body line 1 and p [76]
Calls: render ... read_xml.character -> read_xml.raw -> doc_parse_raw

And renders a markdown file as:

<ac:structured-macro ac:name="expand"> <ac:parameter ac:name="title">hidden stuff below</ac:parameter> <ac:rich-text-body><![CDATA[<p>foo

</p>

\]\]\></ac:rich-text-body> </ac:structured-macro>

Note that the closing tag of the CDATA section somehow got escaped.

Not sure what's causing the problem: conflr:::restore_cdata seems to match it and run correctly (after patching it to handle rich-text-body in addition to plain-text-body), so maybe some other mechanism messes up the closing CDATA tag later in the process?

@daroczig
Contributor Author

Jumping into translate_to_confl_macro shows the below for html_text:

[1] "<p><confl-ac-structured-macro confl-ac-name=\"expand\"> <confl-ac-parameter confl-ac-name=\"title\">hidden stuff below</confl-ac-parameter> <confl-ac-rich-text-body>&lt;![CDATA[<p>foo</p>\n</p>\n<p>]]&gt;</confl-ac-rich-text-body> </confl-ac-structured-macro></p>\n"
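The stack trace earlier in the thread ends in xml2's `read_xml`, so the mismatch can probably be reproduced in isolation. Below is a minimal sketch, assuming the xml2 package is installed; the `html_text` value is a shortened stand-in for the string above, not the exact value conflr produces.

```r
library(xml2)

# Shortened stand-in for the html_text shown above: the CDATA opener has been
# escaped to &lt;![CDATA[, so <p>foo</p> is parsed as real markup and the
# stray </p> that follows has no matching opening tag inside the body element.
html_text <- "<p><confl-ac-rich-text-body>&lt;![CDATA[<p>foo</p>\n</p>\n<p>]]&gt;</confl-ac-rich-text-body></p>"

# Capture the parser error instead of stopping the script.
result <- tryCatch(
  read_xml(html_text),
  error = function(e) e
)

is_parse_error <- inherits(result, "error")
if (is_parse_error) message("Parse failed: ", conditionMessage(result))
```

If the sketch behaves as expected, the message names the same pair of elements as the original error, which points at the injected p tags rather than at the macro markup itself.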

@yutannihilation
Contributor

Hmm, thanks. I guess this is basically because CDATA syntax is not supported by commonmark (yes, GitHub flavored markdown does...). Maybe I need to bypass the CDATA section...?

This Rmd (conflr uses commonmark variant)

---
output:
  md_document:
    variant: commonmark
---

<ac:rich-text-body><![CDATA[<p>foo</p>]]></ac:rich-text-body>

will be knitted to this:

<ac:rich-text-body><![CDATA[<p>foo

</p>

\]\]\></ac:rich-text-body>

@daroczig
Contributor Author

Oh, I see -- thank you!

Interesting, as cdata seems to be part of the commonmark specs at https://spec.commonmark.org/0.29/#cdata-section

Anyway, do you know of any workaround for passing arbitrary text in a macro using rich-text-body? Including e.g. an exclamation mark or a similar character triggers that opening/closing tag mismatch error.

@yutannihilation
Contributor

> Interesting, as cdata seems to be part of the commonmark specs at https://spec.commonmark.org/0.29/#cdata-section

Wow! Sorry, then I simply misunderstood the spec...

Hmm, I can't come up with any workaround at the moment.

@yutannihilation
Contributor

One problem is that the ac: namespace is not recognized; this can be avoided by replacing ac: with confl-ac- (which is done in the post processor anyway):

commonmark::markdown_commonmark("<ac:rich-text-body><![CDATA[<p>foo</p>]]></ac:rich-text-body>")
#> [1] "<ac:rich-text-body><![CDATA[<p>foo</p>]]>\\</ac:rich-text-body\\>\n"
commonmark::markdown_commonmark("<ac-rich-text-body><![CDATA[<p>foo</p>]]></ac-rich-text-body>")
#> [1] "<ac-rich-text-body><![CDATA[<p>foo</p>]]></ac-rich-text-body>\n"

Created on 2020-05-30 by the reprex package (v0.3.0)

But, this doesn't actually solve the problem. I have no idea what's happening here.
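For readers who want to try the prefix replacement by hand, here is a minimal sketch in base R. The helper name `to_confl_prefix` is hypothetical, and this is a plain textual substitution for illustration, not conflr's actual post processor.

```r
# Hypothetical helper (not part of conflr): rewrite the "ac:" namespace
# prefix to "confl-ac-" so the tag names contain no colon.
to_confl_prefix <- function(x) {
  # fixed = TRUE treats "ac:" as a literal string, not a regular expression.
  gsub("ac:", "confl-ac-", x, fixed = TRUE)
}

macro <- '<ac:structured-macro ac:name="expand"><ac:rich-text-body><b>foo!</b></ac:rich-text-body></ac:structured-macro>'
converted <- to_confl_prefix(macro)
cat(converted, "\n")
```

Because the replacement is purely textual, it would also rewrite any literal "ac:" that appears in the body text, so a real implementation would likely need to restrict it to tag and attribute names.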

@daroczig
Contributor Author

Thanks for looking into this further.

I also spent some time on this, and found that the cause of the tag mismatch is that something in the middle of the flow adds p tags, e.g.

<confl-ac-structured-macro confl-ac-name="expand">
  <confl-ac-parameter confl-ac-name="title">hidden stuff below</confl-ac-parameter>
  <confl-ac-rich-text-body><p>foo!</p></confl-ac-rich-text-body>
</confl-ac-structured-macro>

Will be rendered in md as:

<confl-ac-structured-macro confl-ac-name="expand"> <confl-ac-parameter confl-ac-name="title">hidden stuff below</confl-ac-parameter> <confl-ac-rich-text-body>

<p>

foo\!

</p>

</confl-ac-rich-text-body> </confl-ac-structured-macro>

Which is OK, but if I peek into html_text in eg translate_to_confl_macro, I see this:

<p><confl-ac-structured-macro confl-ac-name="expand"> <confl-ac-parameter confl-ac-name="title">hidden stuff below</confl-ac-parameter> <confl-ac-rich-text-body></p>
<p>
<p>foo!</p>
</p>
<p></confl-ac-rich-text-body> </confl-ac-structured-macro></p>

So something is adding the p tags, which of course messes up the original opening/closing tags.

Any thoughts on what's causing that?

Interestingly, using eg b tags in the original text instead of p works:

<confl-ac-structured-macro confl-ac-name="expand">
  <confl-ac-parameter confl-ac-name="title">hidden stuff below</confl-ac-parameter>
  <confl-ac-rich-text-body><b>foo!</b></confl-ac-rich-text-body>
</confl-ac-structured-macro>

And this actually gets pushed to Confluence and renders as foo! there without any issue.

@daroczig
Contributor Author

The problem, I think, is that commonmark::markdown_html adds those extra p tags because of the blank lines in the markdown version ... and I have no idea yet what inserts those blank lines.
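The effect of the blank lines can be checked directly with a small sketch, assuming the commonmark package is installed. The markdown below is a trimmed copy of the intermediate output quoted earlier in this comment chain; the exact HTML may differ between commonmark versions, so no expected output is shown.

```r
library(commonmark)

# Trimmed copy of the intermediate markdown: blank lines separate the opening
# tags, <p>, the text, </p>, and the closing tags into separate blocks.
md <- paste(
  '<confl-ac-structured-macro confl-ac-name="expand"> <confl-ac-rich-text-body>',
  "",
  "<p>",
  "",
  "foo\\!",
  "",
  "</p>",
  "",
  "</confl-ac-rich-text-body> </confl-ac-structured-macro>",
  sep = "\n"
)

html <- markdown_html(md)
cat(html)

# Count the opening <p> tags in the result; the input contains only one.
n_p <- lengths(regmatches(html, gregexpr("<p>", html, fixed = TRUE)))
message("Number of <p> tags in the output: ", n_p)
```

A line that starts with a tag but continues with other content is treated as an ordinary paragraph, so the first and last lines are each wrapped in their own p element.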

@daroczig
Contributor Author

Well, now I think I know what the root problem is:

$ echo '<confl-ac-structured-macro confl-ac-name="expand"><confl-ac-parameter confl-ac-name="title">hidden stuff below</confl-ac-parameter><confl-ac-rich-text-body><p>foo!</p></confl-ac-rich-text-body></confl-ac-structured-macro>' | pandoc -t commonmark

<confl-ac-structured-macro confl-ac-name="expand"><confl-ac-parameter confl-ac-name="title">hidden
stuff below</confl-ac-parameter><confl-ac-rich-text-body>

<p>

foo\!

</p>

</confl-ac-rich-text-body></confl-ac-structured-macro>

Sorry for the many messages here 🤦

I will let you know if I find a solution to this problem -- not really related to conflr

@yutannihilation
Contributor

A pandoc expert told me that we can use the raw attribute here. Could you try this?

---
title: "test page"
output:
  conflr::confluence_document:
    space_key: "***"
---

<ac:structured-macro ac:name="expand">
  <ac:parameter ac:name="title">hidden stuff below</ac:parameter>
  <ac:rich-text-body>`<![CDATA[<p>foo</p>]]>`{=html}</ac:rich-text-body>
</ac:structured-macro>

@atusy

atusy commented May 30, 2020

You can also treat the whole block as raw HTML.
This would be easier to read and debug.

```{=html}
<ac:structured-macro ac:name="expand">
  <ac:parameter ac:name="title">hidden stuff below</ac:parameter>
  <ac:rich-text-body><![CDATA[<p>foo</p>]]></ac:rich-text-body>
</ac:structured-macro>
```
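If several such macros are needed, the raw block could be generated from R. The following is a minimal sketch in base R; `expand_macro` is a hypothetical helper, not a conflr function, and it has not been tested against conflr. It assumes the string would be emitted from a chunk with `results='asis'`.

```r
# Hypothetical helper (not part of conflr): build an "expand" macro and wrap
# it in a pandoc raw HTML block so pandoc passes it through untouched.
expand_macro <- function(title, body_html) {
  fence <- strrep("`", 3)  # three backticks, built here to keep this listing readable
  paste(
    paste0(fence, "{=html}"),
    '<ac:structured-macro ac:name="expand">',
    sprintf('  <ac:parameter ac:name="title">%s</ac:parameter>', title),
    sprintf("  <ac:rich-text-body>%s</ac:rich-text-body>", body_html),
    "</ac:structured-macro>",
    fence,
    sep = "\n"
  )
}

block <- expand_macro("hidden stuff below", "<p>foo</p>")
cat(block, "\n")
```

The helper does no escaping, so a title or body containing characters such as `<` or `&` would need to be escaped by the caller.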

@yutannihilation
Contributor

Thanks @atusy!

@daroczig
Contributor Author

Awesome, thank you very much to both @yutannihilation and @atusy 🙇

FTR this is a somewhat more complex example of what I was trying to achieve (someone might find this useful):

---
title: "gergely test page"
output:
  conflr::confluence_document:
    space_key: "***"
---

```{=html}
<confl-ac-structured-macro confl-ac-name="expand">
  <confl-ac-parameter confl-ac-name="title">hidden stuff below</confl-ac-parameter>
  <confl-ac-rich-text-body>
    <p>foo! `r 4+8`</p>
    ```{r echo=FALSE, results='asis'}
    library(pander)
    panderOptions('knitr.auto.asis', FALSE)
    library(commonmark)
    cat(markdown_html(pander_return(head(iris), style = 'rmarkdown'), extensions = TRUE))
    ```
  </confl-ac-rich-text-body>
</confl-ac-structured-macro>
```

OK!

Resulting in:

image


Also, closing the ticket as no need for CDATA 😄
Thanks again!

@yutannihilation
Contributor

Phew, pretty complex! Glad that you found a way, thanks for sharing.

conflr might eventually support CDATA (or introduce some special syntax for expand macro?), but I'm not sure if it's worth implementing at the moment, considering the complexity we found here. I filed a new issue #110 for this. If you find some case where we need better support for CDATA, please comment there :)
