Characters transformed into mathematical expressions #38

Closed
pokyah opened this issue Apr 28, 2021 · 12 comments
Labels
bug Something isn't working

Comments


pokyah commented Apr 28, 2021

Here is a reprex :

equation.md is a file containing the following string :

$$
Q_{N(norm)}=\frac{C_N +C_{N-1}}2\times 
\frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}}
$$

when I run :

file = "equation.md"
ex1 <- tinkr::yarn$new(file)
ex1$write(file)

I end up with a modified mathematical expression:

$$
Q\_{N(norm)}=\\frac{C\_N +C\_{N-1}}2\\times
\\frac{\\sum *{i=N-n}^{N}Q\_i} {\\sum*{j=N-n}^{N}{(\\frac{C\_j+C\_{j-1}}2)}}
$$

I've tried to play around with the encoding using:

ex1 <- tinkr::yarn$new(file, encoding = "utf-8")
ex1 <- tinkr::yarn$new(file, encoding = "unknown")
ex1 <- tinkr::yarn$new(file, encoding = "latin_1")

But nothing has solved my problem.

How can I make sure that tinkr won't mess up these mathematical expressions?

Thanks for your help

zkamvar added the bug label on Apr 28, 2021
Member

zkamvar commented Apr 28, 2021

Oh that's fun! This is likely because {tinkr} uses the commonmark C library (via {commonmark}) to produce the internal XML representation, and commonmark doesn't know anything about LaTeX or math.
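A quick way to see this for yourself is to pass one line of the equation straight to {commonmark}. This is only a sketch: it assumes the {commonmark} package is installed and uses its `markdown_xml()` function, and the variable names are made up for illustration.

```r
# One line of the equation from the report, as a raw string (R >= 4.0)
line <- r"(\frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}})"

# commonmark knows nothing about math, so it applies ordinary Markdown rules
xml <- commonmark::markdown_xml(line)
cat(xml)

# TRUE if the text between two of the underscores was parsed as emphasis
has_emph <- grepl("<emph>", xml, fixed = TRUE)
has_emph
```

That emphasis node is what comes back out as `*...*` when the document is written, while the other underscores and backslashes are treated as literal text and escaped.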

One solution is to add a method that parses the document for math expressions and adds an attribute that forces them to render as-is in the xsl stylesheet. I have a working version of this in another package I maintain that uses {tinkr}: https://github.com/carpentries/pegboard/blob/7be430383dc5436b66ac0733ff8685bcba4e4556/inst/stylesheets/xml2md_gfm_kramdown.xsl#L22-L26

Here is a reprex from your example confirming your result.

library("tinkr")
math <- r'{
$$
Q_{N(norm)}=\frac{C_N +C_{N-1}}2\times 
\frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}}
$$
}'
tmp1 <- tempfile()
tmp2 <- tempfile()
writeLines(math, tmp1)
x <- yarn$new(tmp1)
x$write(tmp2)
cat(readLines(tmp1), sep = "\n")
#> 
#> $$
#> Q_{N(norm)}=\frac{C_N +C_{N-1}}2\times 
#> \frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}}
#> $$
cat(readLines(tmp2), sep = "\n")
#> $$
#> Q\_{N(norm)}=\\frac{C\_N +C\_{N-1}}2\\times
#> \\frac{\\sum *{i=N-n}^{N}Q\_i} {\\sum*{j=N-n}^{N}{(\\frac{C\_j+C\_{j-1}}2)}}
#> $$

Created on 2021-04-28 by the reprex package (v2.0.0)

Member

maelle commented Apr 29, 2021

@zkamvar oh this is cool, should we add it to tinkr's default stylesheet?

Member

zkamvar commented Apr 29, 2021

> @zkamvar oh this is cool, should we add it to tinkr's default stylesheet?

I think it's definitely worthwhile since this kind of rule would allow both math and #20 to go through. The tricky part is actually parsing the content to add an attribute.

Member

maelle commented Apr 30, 2021

Ok so what's needed is both more parsing + adding lines to the stylesheet? Would you have time to make a PR or should I make a note to look into this? Thank you in any case. 🙏

Member

zkamvar commented Apr 30, 2021

Since I've been meaning to work on #20 with a similar solution, I will try to get a PR through next week.

Member

zkamvar commented May 1, 2021

I'm drafting what I have right now, which is #39, but it still needs some work, tests, and documentation. What needs to be done:

  • a rudimentary block math protector
  • a fully-featured block math protector (I need to strip off all other text attributes before I add the 'asis' attribute)
  • splitter for inline text that needs specific sections protected
  • inline math protector
  • list protector

Author

pokyah commented May 3, 2021

Excellent!

Thanks for your responsiveness!

Member

zkamvar commented May 3, 2021

To write down my thoughts before I set this down until later this week when things slow down a bit:

The block math protector works by labeling all nodes between $$ with @asis='true' and adding a new rule so that emph nodes marked asis are written with _ instead of *. You can see that it works in https://github.com/ropensci/tinkr/pull/39/files#diff-606c8d74c9d3b98e57b3eefcfd8d852529a6240b1a0ef2152621b4dd58e73b24
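To make the "everything between $$" idea concrete, here is a line-based sketch in base R. It is not the code from the PR, which works on XML nodes and sets an attribute on them. It only shows how the span between a pair of `$$` delimiters can be picked out, and `mark_block_math()` is a hypothetical helper name.

```r
# Hypothetical helper: flag every line that belongs to a $$ ... $$ block,
# delimiters included. Assumes each $$ sits on its own line.
mark_block_math <- function(lines) {
  is_delim <- grepl("^[[:space:]]*\\$\\$[[:space:]]*$", lines)
  # The running count of delimiters is odd from an opening $$ up to
  # (but not including) its closing $$
  inside <- cumsum(is_delim) %% 2 == 1
  # Add the closing delimiters back in; an unclosed $$ would flag
  # everything to the end of the document
  inside | is_delim
}

lines <- c("Some text", "$$", "x = {-b \\pm \\sqrt{b^2-4ac} \\over 2a}", "$$", "More text")
protected <- mark_block_math(lines)
data.frame(protected, lines)
```

Anything flagged here is what would be written back untouched. Everything else still goes through the usual Markdown rules.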

At the moment, it only protects block math via $$. Math delimited by \[ is still unprotected, partially because commonmark treats this the same as [ during conversion (which is why we get the issue with the checkboxes), so it's difficult to detect which ones were true math and which ones were just brackets that someone really meant to insert (e.g. citations). I think the solution to this (and checkboxes) is to pre-process bare square brackets before they go through commonmark with TINKR_RB and TINKR_LB and then replace them in XML.
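A rough sketch of that placeholder idea, in base R and limited to checkboxes. The `TINKR_LB` and `TINKR_RB` tokens come from the paragraph above; reading them as "left bracket" and "right bracket" is my assumption, the two function names are hypothetical, and the restore step here works on plain text rather than on the XML.

```r
# Hypothetical helper: swap the brackets of "- [ ]" / "- [x]" checkboxes
# for placeholder tokens so the Markdown parser never sees them
protect_checkboxes <- function(lines) {
  sub("^([[:space:]]*[-*+] )\\[([ xX])\\]", "\\1TINKR_LB\\2TINKR_RB", lines)
}

# Hypothetical helper: put the original brackets back after conversion
restore_brackets <- function(lines) {
  lines <- gsub("TINKR_LB", "[", lines, fixed = TRUE)
  gsub("TINKR_RB", "]", lines, fixed = TRUE)
}

md <- c(
  "- [ ] This is an empty checkbox",
  "- [x] This is a checked checkbox",
  "- [This is a link](https://ropensci.org)"
)
masked <- protect_checkboxes(md)
masked
restore_brackets(masked)
```

The link line is left alone because its bracket is not followed by a single space or `x` and a closing bracket. Telling citations and `\[` math apart would need more than this pattern.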

Inline math is tricky because $26 and $24 could be interpreted as math, but it's not unreasonable to require the delimiters to be actually touching the math.
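As a sketch of what "touching" could mean in practice, the pattern below only accepts a `$...$` pair when the character right after the opening `$` and the one right before the closing `$` are not whitespace. This is an illustration in base R on a single line of text, not the rule from the PR, and `inline_math()` is a hypothetical helper.

```r
# Hypothetical helper: extract $...$ spans whose delimiters touch the math.
# Works on one line of text at a time.
inline_math <- function(text) {
  # opening $, a non-space character, optionally more text that ends in a
  # non-space character, closing $
  pattern <- "\\$[^\\s$]([^$]*[^\\s$])?\\$"
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

prices <- inline_math("It costs $26 and $24 today")
maths  <- inline_math("When $a \\ne 0$, solve $ax^2 + bx + c = 0$")
prices  # nothing found: the second $ is preceded by a space
maths   # both math spans are found
```

Prices such as `$26 and $24` fall through because the would-be closing `$` has a space in front of it, whereas real inline math normally hugs its delimiters.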

Member

zkamvar commented May 5, 2021

Okay, right now in 14655c5, we have the ability to parse math with the following stipulations:

  1. Math must be represented in dollar notation
  2. Inline math must start and end on the same line
  3. There must be no blank lines before suffix marks for inline math
  4. Block math must start and end with $$ on their own lines.

remotes::install_github("ropensci/tinkr@14655c558026b")
library("tinkr")
m <- yarn$new(system.file("extdata", "math-example.md", package = "tinkr"))
m$show()
output of `m$show()`
#> ---
#> title: An example with math elements
#> ---
#> 
#> This example has $\\LaTeX$ elements embedded in the
#> text. It is intended to demonstrate that m $\\alpha\_\\tau$ h
#> mode can work with tinkr. $y =
#> mx + b$
#> 
#> - \[ \] This is an empty checkbox
#> - \[x\] This is a checked checkbox
#> - [This is a link](https://ropensci.org)
#> - \[this is an example\]
#> 
#> Here is an example from the mathjax website:
#> 
#> When $a \\ne 0$, there are two solutions to (ax^2 + bx + c = 0) and they are
#> \[x = {-b \\pm \\sqrt{b^2-4ac} \\over 2a}.\]
#> 
#> ```latex
#> $$
#> \begin{align} % This mode aligns the equations to the '&=' signs
#> \begin{split} % This mode groups the equations into one.
#> \bar{r}_d &= \frac{\sum\sum{cov_{j,k}}}{
#>                    \sum\sum{\sqrt{var_{j} \cdot var_{k}}}} \\
#>           &= \frac{V_O - V_E}{2\sum\sum{\sqrt{var_{j} \cdot var_{k}}}}
#> \end{split}
#> \end{align}
#> $$
#> ```
#> 
#> $$
#> \\begin{align} % This mode aligns the equations to the '\&=' signs
#> \\begin{split} % This mode groups the equations into one.
#> \\bar{r}*d \&= \\frac{\\sum\\sum{cov*{j,k}}}{
#> \\sum\\sum{\\sqrt{var\_{j} \\cdot var\_{k}}}} \\
#> \&= \\frac{V\_O - V\_E}{2\\sum\\sum{\\sqrt{var\_{j} \\cdot var\_{k}}}}
#> \\end{split}
#> \\end{align}
#> $$
#> 
#> When $a \\ne 0$, there are two solutions to $ax^2 + bx + c = 0$ and they are
#> 
#> ```latex
#> $$
#> x = {-b \pm \sqrt{b^2-4ac} \over 2a}
#> $$
#> ```
#> 
#> $$
#> x = {-b \\pm \\sqrt{b^2-4ac} \\over 2a}
#> $$
#> 
#> Below is an example from [https://github.com/ropensci/tinkr/issues/38](https://github.com/ropensci/tinkr/issues/38)
#> 
#> ```latex
#> $$
#> Q_{N(norm)}=\frac{C_N +C_{N-1}}2\times 
#> \frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}}
#> $$
#> ```
#> 
#> $$
#> Q\_{N(norm)}=\\frac{C\_N +C\_{N-1}}2\\times
#> \\frac{\\sum *{i=N-n}^{N}Q\_i} {\\sum*{j=N-n}^{N}{(\\frac{C\_j+C\_{j-1}}2)}}
#> $$

m$protect_math()$show()
output of `m$protect_math()$show()`
#> ---
#> title: An example with math elements
#> ---
#> 
#> This example has $\LaTeX$ elements embedded in the
#> text. It is intended to demonstrate that m $\alpha_\tau$ h
#> mode can work with tinkr. $y =
#> mx + b$
#> 
#> - \[ \] This is an empty checkbox
#> - \[x\] This is a checked checkbox
#> - [This is a link](https://ropensci.org)
#> - \[this is an example\]
#> 
#> Here is an example from the mathjax website:
#> 
#> When $a \ne 0$, there are two solutions to (ax^2 + bx + c = 0) and they are
#> \[x = {-b \\pm \\sqrt{b^2-4ac} \\over 2a}.\]
#> 
#> ```latex
#> $$
#> \begin{align} % This mode aligns the equations to the '&=' signs
#> \begin{split} % This mode groups the equations into one.
#> \bar{r}_d &= \frac{\sum\sum{cov_{j,k}}}{
#>                    \sum\sum{\sqrt{var_{j} \cdot var_{k}}}} \\
#>           &= \frac{V_O - V_E}{2\sum\sum{\sqrt{var_{j} \cdot var_{k}}}}
#> \end{split}
#> \end{align}
#> $$
#> ```
#> 
#> $$
#> \begin{align} % This mode aligns the equations to the '&=' signs
#> \begin{split} % This mode groups the equations into one.
#> \bar{r}_d &= \frac{\sum\sum{cov_{j,k}}}{
#> \sum\sum{\sqrt{var_{j} \cdot var_{k}}}} \
#> &= \frac{V_O - V_E}{2\sum\sum{\sqrt{var_{j} \cdot var_{k}}}}
#> \end{split}
#> \end{align}
#> $$
#> 
#> When $a \ne 0$, there are two solutions to $ax^2 + bx + c = 0$ and they are
#> 
#> ```latex
#> $$
#> x = {-b \pm \sqrt{b^2-4ac} \over 2a}
#> $$
#> ```
#> 
#> $$
#> x = {-b \pm \sqrt{b^2-4ac} \over 2a}
#> $$
#> 
#> Below is an example from [https://github.com/ropensci/tinkr/issues/38](https://github.com/ropensci/tinkr/issues/38)
#> 
#> ```latex
#> $$
#> Q_{N(norm)}=\frac{C_N +C_{N-1}}2\times 
#> \frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}}
#> $$
#> ```
#> 
#> $$
#> Q_{N(norm)}=\frac{C_N +C_{N-1}}2\times
#> \frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}}
#> $$

Created on 2021-05-05 by the reprex package (v2.0.0)

Member

zkamvar commented May 6, 2021

@pokyah, assuming you use dollar notation throughout, I think I have fixed the issue. You can try it out by installing from github:

remotes::install_github("ropensci/tinkr#39")

I've added a small demo in #39

Member

zkamvar commented May 11, 2021

@pokyah, This feature has now been added to the default branch of the repository. Thank you for opening this issue. It has definitely helped us improve {tinkr} and got us to feed two birds with one scone by including a solution for github checkboxes!

If the new version does not address your issue, feel free to reopen this.
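For reference, here is how the original reprex might look with the new method. It is a sketch that assumes a version of {tinkr} with `protect_math()` is installed, and it only uses calls already shown in this thread (`yarn$new()`, `$protect_math()`, `$write()`).

```r
library("tinkr")

# The equation from the original report, as a raw string (R >= 4.0)
math <- r"(
$$
Q_{N(norm)}=\frac{C_N +C_{N-1}}2\times
\frac{\sum _{i=N-n}^{N}Q_i} {\sum_{j=N-n}^{N}{(\frac{C_j+C_{j-1}}2)}}
$$
)"
infile  <- tempfile(fileext = ".md")
outfile <- tempfile(fileext = ".md")
writeLines(math, infile)

ex1 <- yarn$new(infile)
ex1$protect_math()  # label the math nodes so they are written back as-is
ex1$write(outfile)

out <- readLines(outfile)
cat(out, sep = "\n")
```

The key point is to call `protect_math()` before `write()`. Without it the document is written the same way as before.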

zkamvar closed this as completed on May 11, 2021
Author

pokyah commented Jun 30, 2021

@zkamvar, many thanks for your efficient support! I will get back to it soon and let you know if I run into any other difficulties!
