protect_math() fails for "\\", "\," and "\;" #61

Open
hriebl opened this issue Jan 26, 2022 · 5 comments · May be fixed by #62

hriebl commented Jan 26, 2022

Thank you for this great package! I'm using it to format Python code chunks in RMarkdown with black, isort, etc.

I think I've found a small issue with protect_math(), which seems to fail for \\, \, and \;. Here's my reprex: Create a file example.md with the following content:

Example 1: $a = \begin{pmatrix} b \\ c \end{pmatrix}$

Example 2: $f(x, \, y)$

Example 3: $f(x, \; y)$

Then run:

x <- tinkr::yarn$new("example.md")
x$protect_math()
x$show()
## Example 1: $a = \begin{pmatrix} b \ c \end{pmatrix}$
##
## Example 2: $f(x, , y)$
##
## Example 3: $f(x, ; y)$

Would be cool to see a fix for this!

Session Info
Session info ────────────────────────────────────────
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       Fedora Linux 35 (Workstation Edition)
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2022-01-26
 rstudio  2021.09.2+382 Ghost Orchid (desktop)
 pandoc   2.14.0.3 @ /usr/bin/pandoc

Packages ────────────────────────────────────────────
 package     * version date (UTC) lib source
 brio          1.1.3   2021-11-30 [1] CRAN (R 4.1.2)
 cachem        1.0.6   2021-08-19 [1] CRAN (R 4.1.1)
 callr         3.7.0   2021-04-20 [1] CRAN (R 4.1.1)
 cli           3.1.1   2022-01-20 [1] CRAN (R 4.1.2)
 crayon        1.4.2   2021-10-29 [1] CRAN (R 4.1.1)
 desc          1.4.0   2021-09-28 [1] CRAN (R 4.1.1)
 devtools      2.4.3   2021-11-30 [1] CRAN (R 4.1.2)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.1)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.1)
 fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.2)
 glue          1.6.1   2022-01-22 [1] CRAN (R 4.1.2)
 lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.1)
 magrittr      2.0.1   2020-11-17 [2] CRAN (R 4.1.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.1.2)
 pkgbuild      1.3.1   2021-12-20 [1] CRAN (R 4.1.2)
 pkgload       1.2.4   2021-11-30 [1] CRAN (R 4.1.2)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.1.1)
 processx      3.5.2   2021-04-30 [1] CRAN (R 4.1.1)
 ps            1.6.0   2021-02-28 [1] CRAN (R 4.1.1)
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.1)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.1)
 remotes       2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
 rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.1)
 rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.1.1)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
 testthat      3.1.2   2022-01-20 [1] CRAN (R 4.1.2)
 usethis       2.1.5   2021-12-09 [1] CRAN (R 4.1.2)
 withr         2.4.3   2021-11-30 [1] CRAN (R 4.1.2)

 [1] /home/hannes/R/x86_64-redhat-linux-gnu-library/4.1
 [2] /usr/lib64/R/library
 [3] /usr/share/R/library

───────────────────────────────────────────────────────

hriebl commented Jan 27, 2022

Also, display math seems to require newlines after the first $$ and before the second $$.

example.md:

$$
\exp(x)
$$

$$\exp(x)$$

R code:

x <- tinkr::yarn$new("examples/example.md")
x$protect_math()
x$show()
## $$
## \exp(x)
## $$
## 
## $$\\exp(x)$$
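
Until one-line display math is recognised, one possible workaround is to rewrite `$$...$$` one-liners into the three-line form before handing the text to tinkr. The sketch below uses base R only and is an illustrative assumption, not something tinkr provides: `split_display_math()` is a hypothetical helper name, and it only handles lines that consist entirely of a single `$$...$$` span.

```r
# Hypothetical helper (not part of tinkr): put the $$ delimiters of one-line
# display math on their own lines, which is the form that was protected
# correctly in the example above.
split_display_math <- function(lines) {
  # Match lines that are exactly "$$<something>$$" and insert line breaks
  expanded <- sub("^\\$\\$(.+)\\$\\$$", "$$\n\\1\n$$", lines)
  # Re-split so that every element is a single line again
  # (note: strsplit() drops a trailing empty line, if there is one)
  unlist(strsplit(paste(expanded, collapse = "\n"), "\n", fixed = TRUE))
}

md <- c("$$", "\\exp(x)", "$$", "", "$$\\exp(x)$$")
writeLines(split_display_math(md))
```

The result could then be passed to `tinkr::yarn$new()` through `textConnection()`, as is done with other inputs in this thread.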

zkamvar self-assigned this Jan 31, 2022
zkamvar added the bug (Something isn't working) label Jan 31, 2022
zkamvar added a commit that referenced this issue Feb 2, 2022
zkamvar linked a pull request Feb 2, 2022 that will close this issue

zkamvar commented Feb 2, 2022

Thank you for the report! Sorry I haven't gotten to it until now (I've been swamped with releasing a large project). The inline block math behaviour was known, but I did not know about the backslashes disappearing. I think I may have some time this week to address both of them.

hriebl commented Feb 2, 2022

No worries! Thank you for looking into it!

zkamvar added a commit that referenced this issue Mar 11, 2022

zkamvar commented Mar 15, 2022

Just a follow-up: I have been looking into this, and the root cause is that commonmark treats everything as markdown and does not understand math mode (which is why we needed to create protect_math() in the first place [see #38]). Everything in math mode is therefore printed "as is", without escaping the markdown characters that normally need to be escaped (most notably \).

The problem is that when commonmark reads in markdown, it helpfully removes any escapes so we don't know which characters in the original document were escaped 😩. Take for example:

f <- textConnection("a\\; b; c \\\\ d")
y <- tinkr::yarn$new(f)
writeLines(as.character(y$body))
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE document SYSTEM "CommonMark.dtd">
#> <document xmlns="http://commonmark.org/xml/1.0">
#>   <paragraph>
#>     <text xml:space="preserve">a; b; c \ d</text>
#>   </paragraph>
#> </document>

Created on 2022-03-14 by the reprex package (v2.0.1)

This input has a semicolon and a backslash that are both escaped in the original document, but in the XML representation those escape characters are simply gone. Unfortunately, things like $\begin{pmatrix}b\\c\end{pmatrix}$ are perfectly valid LaTeX, so the lost backslashes matter. I think the best I can do is a post-search through the 'asis' nodes to protect single backslashes and punctuation that have spaces around them. The other solution is to re-read the original document and replace the escaped markdown with the original markdown (which is [the process we use to resolve anchor links]).

That being said, until I get this fixed, if your escaped math symbols have consistent formatting (e.g. escaped symbols are always surrounded by spaces), you can post-process after protecting the math by re-adding the escapes with a regex:

ex <- r"{ 
Example 1: $a = \begin{pmatrix} b \\ c \end{pmatrix}$ 
 
Example 2: $f(x, \, y)$ 
 
Example 3: $f(x, \; y)$ 
}"  
y <- tinkr::yarn$new(textConnection(ex))$protect_math()
asis <- xml2::xml_find_all(y$body, ".//md:text[@asis]", y$ns)
txt <- xml2::xml_text(asis)
xml2::xml_set_text(asis, gsub("\\s([,;\\])\\s", "\\\\\\1", txt))
#> {xml_nodeset (3)}
#> [1] <text asis="true">$a = \\begin{pmatrix} b\\\\c \\end{pmatrix}$</text>
#> [2] <text asis="true">$f(x,\\,y)$</text>
#> [3] <text asis="true">$f(x,\\;y)$</text>
y$show()
#> Example 1: $a = \begin{pmatrix} b\\c \end{pmatrix}$
#> 
#> Example 2: $f(x,\,y)$
#> 
#> Example 3: $f(x,\;y)$

Created on 2022-03-14 by the reprex package (v2.0.1)

zkamvar commented Mar 15, 2022

And I forgot to add in the spaces around the output in the gsub command, but you get the gist 😅
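
A minimal base-R sketch of that correction: the same pattern as in the workaround, with a space added on each side of the replacement so that the original spacing survives. The `txt` vector is typed in by hand to mimic what the 'asis' nodes contain once commonmark has dropped the escapes (it is taken from the `show()` output in the original report); it is an illustration, not output captured from tinkr.

```r
# Text as it looks after commonmark has removed the escapes (a hand-written
# stand-in for xml2::xml_text(asis) in the workaround).
txt <- c(
  "$a = \\begin{pmatrix} b \\ c \\end{pmatrix}$",
  "$f(x, , y)$",
  "$f(x, ; y)$"
)

# Same pattern as before; the replacement now keeps a space on both sides.
fixed <- gsub("\\s([,;\\])\\s", " \\\\\\1 ", txt)
writeLines(fixed)
#> $a = \begin{pmatrix} b \\ c \end{pmatrix}$
#> $f(x, \, y)$
#> $f(x, \; y)$
```

In the workaround, this replacement string would take the place of the one in the `gsub()` call passed to `xml2::xml_set_text()`. It still relies on every escaped symbol being surrounded by whitespace.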
