
Question: Custom extension to support new syntax #2125

Closed
haayhappen opened this issue Jul 2, 2021 · 11 comments
@haayhappen

haayhappen commented Jul 2, 2021

Marked version:

Describe the bug

I'm trying to use the custom extension of a tokenizer in order to add support for a custom syntax: {{variable}}
In the end this should render as some custom html.

The tokenizer seems to run over the text far too often, so the result is multiple parsed HTML blocks with the same content.

To Reproduce
Here is my code for the extension:
https://gist.github.com/haayhappen/57b7c77151165584e810f036d536c3aa


Input:

Hey {{1}}

Marked Output:

<p>Hey</p><span class="variable">1</span><span class="variable">1</span><span class="variable">1</span><span class="variable">1</span><span class="variable">1</span><span class="variable">1</span>

Expected Output:

<p>Hey</p><span class="variable">1</span>

Expected behavior
I expect the custom syntax to be evaluated only once.

Did I misuse the extension? Any help highly appreciated!
Thanks

@calculuschild
Contributor

calculuschild commented Jul 2, 2021

On my initial skim through, your extension looks fine, but I'd have to dig deeper to see what's going on.

What's the output from the Lexer if you do this?

const tokens = marked.lexer(md);
console.log(tokens);

@calculuschild
Contributor

calculuschild commented Jul 2, 2021

The issue may be in your start function. Remove the /g flag: it can return invalid values when there is more than one instance of {{something}} in your document, since it tries to match all of them instead of just the next one in the text.

Second, I would recommend making your rule begin with a ^, as in ^{{(.*?)}}, to make sure it only matches at the current start of the string. A tokenizer needs to check whether the immediately following text starts a token; without ^ you may get strange results, since text can be consumed in the wrong order.
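Putting those two fixes together, a minimal sketch of the extension (the class name and renderer markup are assumed from the expected output above; I haven't run this against your gist):

```javascript
// Sketch of the {{variable}} extension with both fixes applied:
// no /g flag in start(), and a ^-anchored rule in tokenizer().
const variableExtension = {
  name: 'variable',
  level: 'inline',
  start(src) {
    // Without /g, match() returns the first occurrence along with its index.
    return src.match(/{{/)?.index;
  },
  tokenizer(src) {
    // ^ ensures we only match a token beginning at the lexer's current position.
    const rule = /^{{(.*?)}}/;
    const match = rule.exec(src);
    if (match) {
      return {
        type: 'variable',
        raw: match[0],   // the consumed characters, so the lexer advances correctly
        text: match[1]
      };
    }
  },
  renderer(token) {
    return `<span class="variable">${token.text}</span>`;
  }
};
```

With this, marked.use({ extensions: [variableExtension] }) should emit the token exactly once for Hey {{1}} instead of repeating it.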

@calculuschild
Contributor

@haayhappen Were you able to resolve this issue?

@haayhappen
Author

Sorry about my absence @calculuschild

I wasn't able to resolve it and had to move on to a different approach. However, I highly appreciate your help and think this could also be helpful to others.

@qubyte

qubyte commented Aug 22, 2021

I'm implementing a custom extension to do highlighting with the mark element and I'm seeing the same issue. The markdown will look like: this ==doesn't== work and should render to this <mark>doesn't</mark> work. Unfortunately the match seems to apply multiple times.

const extension = {
  name: 'mark',
  level: 'inline',
  start(src) {
    return src.match(/==(?!\s)/)?.index;
  },
  tokenizer(src) {
    const rule = /==(?!\s)([^\n]+)(?!\s)==/;
    const match = rule.exec(src);

    if (match) {
      console.log(src, match) // This shows me multiple matches.
      return {
        type: 'mark',
        raw: match[0],
        inner: this.lexer.inlineTokens(match[1].trim())
      };
    }
  },
  renderer(token) {
    return `<mark>${this.parser.parseInline(token.inner)}</mark>`;
  }
};

Example logs:

I'm (John) Smith ==of== ABC (corp). [
  '==of==',
  'of',
  index: 17,
  input: "I'm (John) Smith ==of== ABC (corp).",
  groups: undefined
]
ohn) Smith ==of== ABC (corp). [
  '==of==',
  'of',
  index: 11,
  input: 'ohn) Smith ==of== ABC (corp).',
  groups: undefined
]
mith ==of== ABC (corp). [
  '==of==',
  'of',
  index: 5,
  input: 'mith ==of== ABC (corp).',
  groups: undefined
]

So it looks like the start index of the source string is shifting, but feeding some of the same characters to the extension multiple times.

@calculuschild
Contributor

@qubyte I would suggest changing const rule by inserting the "start of string" marker ^ so that your token only matches when the lexer is actually at that position.

const rule = /^==(?!\s)([^\n]+)(?!\s)==/;

@qubyte

qubyte commented Aug 22, 2021

Nice. That works! I'd have never figured that out on my own. Thanks! 😅

@qubyte

qubyte commented Aug 22, 2021

It does make me a little confused about what the start function is used for. I was assuming that it was feeding the start index forward.

@calculuschild
Contributor

calculuschild commented Aug 22, 2021

The start function does feed that index forward: it acts as a hint telling the lexer to pause lexing of standard Markdown at that point and check whether a valid extension token begins there, so the text doesn't get consumed by another token. But it is only a hint that a token may be there; it is not the only way your extension might be triggered.

There are many cases where the string will not be aligned with the start of your token. By default, all user extensions are given priority over standard Markdown syntax, so the lexer will first run your tokenizer against I'm (John) Smith ==of== ABC (corp). before attempting to parse it as a paragraph. Without ^, your tokenizer sees that there is indeed a match somewhere in the string and treats it as a valid token, even though it isn't at the right location. Then, because you returned a token to the lexer, it consumes characters as normal (from the wrong location) and repeats the search for the next token, giving your extension priority again.
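To see the difference concretely with your example string:

```javascript
// The lexer calls the tokenizer with the remaining source, starting at its
// current position. Only the anchored rule answers the question it is asking:
// "does a token begin right here?"
const src = "I'm (John) Smith ==of== ABC (corp).";

const unanchored = /==(?!\s)([^\n]+)(?!\s)==/;
const anchored = /^==(?!\s)([^\n]+)(?!\s)==/;

unanchored.exec(src); // matches '==of==' at index 17, even though the lexer is at 0
anchored.exec(src);   // null — no token starts at the current position
anchored.exec('==of== ABC (corp).'); // matches once the lexer actually reaches the ==
```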

I don't know if I've explained it well but feel free to ask if you have questions.

@qubyte

qubyte commented Aug 22, 2021

Thank you for the detailed explanation! You've also answered a couple of other questions I had before I'd formed them (mainly about priority). My little extension has other issues (the regex needs a bit of tweaking), but you've given me everything I need to finish building it. Thanks again!

@qubyte

qubyte commented Aug 22, 2021

In case it's helpful to anyone else, the solution I ended up with eschews a regular expression in the tokenizer, because the regex was getting too complex for me to reliably maintain. I still use one in the start function, though:

const extension = {
  name: 'mark',
  level: 'inline',
  start(src) {
    return src.match(/==(?!\s)/)?.index;
  },
  tokenizer(src) {
    if (!src.startsWith('==')) {
      return;
    }

    const nextIndex = src.indexOf('==', 2);

    if (nextIndex !== -1) {
      return {
        type: 'mark',
        raw: src.slice(0, nextIndex + 2),
        inner: this.lexer.inlineTokens(src.slice(2, nextIndex))
      };
    }
  },
  renderer(token) {
    return `<mark>${this.parser.parseInline(token.inner)}</mark>`;
  }
};
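A quick way to sanity-check the tokenizer logic in isolation, with marked's inline lexer replaced by a hypothetical identity stub (this isn't the real lexer API, just enough to exercise the function):

```javascript
// Identity stub standing in for this.lexer.inlineTokens (an assumption for
// testing only; the real lexer returns an array of inline tokens).
const stubLexer = { inlineTokens: (s) => s };

// The same tokenizer body as in the extension above, restated as a plain function.
function markTokenizer(src) {
  if (!src.startsWith('==')) {
    return; // not at a token boundary
  }
  const nextIndex = src.indexOf('==', 2);
  if (nextIndex !== -1) {
    return {
      type: 'mark',
      raw: src.slice(0, nextIndex + 2),
      inner: stubLexer.inlineTokens(src.slice(2, nextIndex))
    };
  }
}

markTokenizer('==of== ABC');     // → { type: 'mark', raw: '==of==', inner: 'of' }
markTokenizer('no marks');       // → undefined (not at a token boundary)
markTokenizer('==unterminated'); // → undefined (no closing ==)
```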
