MathJax support and bug fix #15

Open
wants to merge 2 commits into
base: master

Conversation

clark800

This prevents processing of text inside $ and $$ so that MathJax will work. Without this, for example, a math formula with underscores would get broken by <em> insertions.

For single dollar signs, since documents often contain single dollar signs for other reasons, we only change the behavior if the span does not start or end with whitespace and does not contain a newline. In the unlikely event that a document unintentionally matches this pattern, it is even more unlikely that it will cause a real problem because it just prevents processing but doesn't make any changes. Dollar signs inside code blocks will not be affected.
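The single-dollar rule above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the function name `inline_math_end` and its pointer-pair interface are assumptions, and it assumes that a first closing candidate preceded by whitespace rejects the whole span rather than triggering a search for a later `$`.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the PR's diff.
 * `begin` points at a candidate opening '$'; `end` is one past the last
 * input byte. Returns a pointer to the closing '$' if the span qualifies
 * as inline math under the rule described above, or NULL otherwise.
 */
const char *
inline_math_end(const char *begin, const char *end)
{
	const char *p;

	if (begin >= end || *begin != '$')
		return NULL;
	/* the span must be non-empty and must not start with whitespace;
	 * "$$" is left to the display-math rule */
	if (begin + 1 >= end || begin[1] == '$' ||
	    isspace((unsigned char)begin[1]))
		return NULL;
	for (p = begin + 1; p < end; p++) {
		if (*p == '\n')
			return NULL; /* spans never cross a newline */
		if (*p == '$')
			break;
	}
	if (p == end)
		return NULL; /* no closing '$' found */
	/* the span must not end with whitespace */
	if (isspace((unsigned char)p[-1]))
		return NULL;
	return p;
}
```

With this sketch, `$x_1$` is treated as math, while `$5 and $6` is not, because the character before the second dollar sign is a space.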

This also fixes a bug that caused "surround" characters to generate tags even when the start and stop tokens were at the same location.

@karlb
Owner

karlb commented Nov 4, 2023

Thanks for the PR! The code seems to work fine.

Things that make me hesitate instead of just pushing the merge button:

  • The syntax is not part of CommonMark and the ecosystem did not settle on one obvious syntax (looking at https://github.com/cben/mathdown/wiki/math-in-markdown)
  • Is the heuristic good enough not to break existing documents? This is hard to judge, unfortunately.
  • Do we need both syntaxes? Using only $$ could further reduce false positives.

@N-R-K
Collaborator

N-R-K commented Nov 4, 2023

Given that smu is supposed to be simple and minimal, missing even some CommonMark features, a nonstandard feature like this seems out of place to me. It is probably better kept in personal builds.

Or at the very least, it shouldn't be enabled by default. md4c, for example, has nonstandard extensions, but they are disabled by default and need to be enabled via a CLI flag.

@clark800
Author

clark800 commented Nov 4, 2023

Yeah, I noticed that there are a variety of conventions within markdown, which was a bit surprising, especially in a case like md4c, which outputs custom HTML tags that don't seem to be compatible with MathJax.

I think the choice of syntax is fairly easy though because it's most natural to not introduce a new syntax for writing LaTeX when you have the option of using the LaTeX syntax. LaTeX inline math delimiters are $ ... $ and \( ... \) and display math delimiters are $$ ... $$ and \[ ... \] (https://www.overleaf.com/learn/latex/Mathematical_expressions). Of these, the ones with backslashes interfere with escaping in markdown so the dollar signs are the only choice that doesn't introduce a new syntax and doesn't interfere with escaping in markdown.

MathJax looks for these LaTeX delimiters, so it would only make sense to translate from another set of delimiters if there was a serious concern about breaking documents, but I think the way it is set up here makes this issue fairly negligible, as explained in my last comment.

GitHub Flavored Markdown uses $/$$ syntax so it's fairly standard (https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions). It's also used in popular markdown renderers like Pandoc (https://pandoc.org/chunkedhtml-demo/8.13-math.html).

It is definitely possible that this change could affect the output for existing documents, but only for rare cases like:

I paid $100 and *they* paid -$100.

The only difference in the output would be whether the asterisks get rendered as italics. In general, the change only affects whether inline markdown is processed between two dollar signs on the same line when there is no whitespace on the inner side of either one. A non-whitespace character immediately to the left of a dollar sign is unusual unless it's a negative value.

Though I would technically consider this a breaking change, I think this edge case is too rare to worry about much. I did consider whether there should be a flag for this and I felt that it would be unnecessary complexity.

Pandoc uses the same rule except it additionally checks that the second dollar sign is not followed by a digit, which would also exclude this case. I slightly prefer the simpler and more symmetric rule in this PR, but if you want to follow the Pandoc convention that would be an easy change.
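For comparison, the extra Pandoc-style condition mentioned above could look something like this. It is a sketch under assumptions: the name `closing_dollar_allowed` is hypothetical, and it covers only the additional digit check, not the whitespace and newline rules.

```c
#include <ctype.h>

/*
 * Hypothetical predicate, not part of the PR.
 * `close` points at a candidate closing '$'; `end` is one past the last
 * input byte. Returns 1 if the character right after the '$' is not a
 * digit (or there is no such character), 0 otherwise.
 */
int
closing_dollar_allowed(const char *close, const char *end)
{
	return close + 1 >= end || !isdigit((unsigned char)close[1]);
}
```

Applied to the example `I paid $100 and *they* paid -$100.`, the second dollar sign is followed by `1`, so the span would be rejected and the asterisks would still be processed as emphasis.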

I think both $ and $$ are important because $ means inline math and $$ means display math. Using additional conventions to combine them would increase complexity or reduce flexibility.

@clark800
Author

clark800 commented Nov 8, 2023

I updated the PR so that it will have no impact on the default build and math support is only enabled when built with make math.

This update also makes it possible to configure the math delimiters at build time so it's easy to customize by setting the CPPFLAGS environment variable when running make.

I also decided to use the \[ \] and \( \) delimiters in the output since we don't have to be so careful about causing changes anymore, and this will make it so inline math works without having to change the default MathJax configuration.
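One common way to get build-time configurable delimiters in C is to give each one a default with `#ifndef`, so that a `-D` option passed through CPPFLAGS can override it. The sketch below illustrates that pattern only; the macro names `MATH_INLINE_OPEN` and `MATH_INLINE_CLOSE` and the function `format_inline_math` are assumptions and may not match what the PR actually defines.

```c
#include <stdio.h>

/* Hypothetical macro names; defaults follow the \( ... \) output
 * delimiters mentioned above and can be overridden with -D at build time. */
#ifndef MATH_INLINE_OPEN
#define MATH_INLINE_OPEN "\\("
#endif
#ifndef MATH_INLINE_CLOSE
#define MATH_INLINE_CLOSE "\\)"
#endif

/*
 * Write `tex` wrapped in the configured inline delimiters into `buf`.
 * Returns the value of snprintf, i.e. the length the full output needs.
 */
int
format_inline_math(char *buf, size_t size, const char *tex)
{
	return snprintf(buf, size, "%s%s%s",
	                MATH_INLINE_OPEN, tex, MATH_INLINE_CLOSE);
}
```

With the defaults, the input `x_1` comes out wrapped in `\(` and `\)`, which MathJax recognizes as inline math in its default configuration.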
