Skip to content

Conversation

@ysulyma
Copy link

@ysulyma ysulyma commented Mar 28, 2022

This PR adds the \data command to the HTML extension to allow adding data-* attributes to MathJax content.

I have opened a separate PR to document this command: mathjax/MathJax-docs#299

Copy link
Member

@zorkow zorkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, this should be useful for a lot of people. But before we can merge can you please implement the following changes:

  • Instead of the splitTokens method use the existing keyvalOptions method in input/tex/ParseUtil.ts.
  • Postprocess keys and values:
    1. Strip double quotes from values
    2. Remove keys that contain characters not allowed in attribute names as specified in https://html.spec.whatwg.org/multipage/syntax.html#attributes-2. Your current regexp is too restrictive. E.g., data-{foo}-123456789-+^#$()&@% is a perfectly legal, albeit very ugly, attribute name. Also the current matching could have unintended effects, like foo{}bar="baz" leading to the attribute data-bar="baz".

HtmlMethods.Data = (parser: TexParser, name: string) => {
const dataset = parser.GetArgument(name);
const arg = GetArgumentMML(parser, name);
for (const [prop, val] of splitTokens(dataset)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use keyvalOptions method from ParseUtil.ts.

* @param {string} str String to split.
*/
function splitTokens(str: string) {
const matches = Array.from(str.matchAll(/\b([A-Za-z0-9_-]+)=(['"])(.+?)\2/g));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too restrictive. See https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 for HTML attribute names.

@zorkow zorkow added this to the 3.2.1 milestone Apr 8, 2022
@dpvc
Copy link
Member

dpvc commented Apr 8, 2022

Strip double quotes from values

I'm not sure that is necessary. Attribute values can include double-quotes, and when they are serialized, the double quote will be represented using ". So double quotes don't need to be handled specially in the values. It is the keys that need to be sanitized.

@zorkow
Copy link
Member

zorkow commented Apr 10, 2022

Strip double quotes from values

I'm not sure that is necessary. Attribute values can include double-quotes, and when they are serialized, the double quote will be represented using ". So double quotes don't need to be handled specially in the values. It is the keys that need to be sanitized.

We only need stripping of quotes, if we want to have the exact behaviour as suggested by the current splitTokens implementation. Currently splitTokens('foo="a",bar=b'); will return [ [ 'foo', 'a' ] ], while keyvalOptions('foo="a",bar=b'); yields { foo: '"a"', bar: 'b' }.
I don't really see a need for requiring arguments to be quoted either and believe we should allow (single/double) quotes.

@ysulyma is there a compelling reason for requiring quotes?

@ysulyma
Copy link
Author

ysulyma commented Apr 10, 2022

@zorkow KaTeX does not require quotes. I found that objectionable since HTML and JSON both require quotes.

In any case, the keyvalOptions method does not behave as expected for data strings:

Input Expected output keyvalOptions output
foo=a, bar=b {foo: "a", bar: "b"} {foo: "a", bar: "b"}
foo="a", bar="b" {foo: "a", bar: "b"} {foo: "\"a\"", bar: "\"b\""}
foo="a, bar=b" {foo: "a, bar=b"} {foo: "\"a", bar: "b\""}

Neither does my splitTokens method. So I think I should write a smarter method for parsing the data string.

@dpvc
Copy link
Member

dpvc commented Apr 10, 2022

I found that objectionable since HTML and JSON both require quotes.

I understand your view, but you are not writing either HTML or JSON, but rather LaTeX, and the LaTeX standard is to handle key/value pairs as in keyvalOptions. That is, to a LaTeX user, to get {foo: "a, bar=b"} you should use foo={a, bar=b} as you would in all other modern LaTeX packages. I understand wanting this to look like the HTML attributes, but that is not the LaTeX way.

@ysulyma
Copy link
Author

ysulyma commented Apr 16, 2022

@zorkow ok, I've made the requested changes.

@dpvc
Copy link
Member

dpvc commented Apr 18, 2022

This looks better, but you should also disallow noncharacters as required by the spec. You might need to add the u flag to the regex in order to handle the characters in the higher planes, though I don't know if all browsers support that.

Also, I wonder if throwing a TexError wouldn't be better than just ignoring the bad ones.

Finally, we follow the git-flow model for PRs, so this should be branched from the develop branch and targeted at develop as well, not at master. Only releases go to master. You may have to rebase your branch to develop and force push the result.

@dpvc dpvc removed this from the 3.2.1 milestone Apr 20, 2022
@dpvc
Copy link
Member

dpvc commented Jun 6, 2022

I'm closing this PR in favor of your other one.

We need to make a v3.2.2 release to fix a couple of issues with the v3.2.1 version, but will include your PR in the next feature release.

@dpvc dpvc closed this Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants