Hello, I'm hoping for a few tips on how to tokenize multiline constructs. I spent long hours trying to guess this, but it escapes me. :) Let's say I want to capture multiline content between custom tags.
I'm writing a simple tokenizer like this: https://gist.github.com/vadistic/d9d3bd683bd6227e36ffc1a4b7203574 and using it with micromark.

Q1: What kind of hooks should I use for such a case? The text one works, but it gets null/EOF after a single line. And to be honest, I cannot experiment with the behavior of the other kinds, because I cannot come up with a working example. :/

Q2: Why do all non-text hooks produce these errors for me? What am I doing wrong?

// with document
https://github.com/micromark/micromark/blob/main/lib/initialize/document.js#L184
// with flow
https://github.com/micromark/micromark/blob/main/lib/util/subtokenize.js#L167
Q3: How do I subtokenize content between tags?

Q3.1: How do I subtokenize content between tags using all available constructs (including flow constructs)? The goal would be to render any available syntax inside those custom tags.

Q3.2: And how would I prevent this if I wanted the opposite, that is, to skip any nested tokenization? The goal would be to avoid parsing nested content if it contains invalid (non-markdown) code.

Q3.3: How do I subtokenize content as "text"? The goal would be, e.g., something like rendering MDX string properties. I know I can hack this at the mdast level, but I want to try to have the exact AST from the start. This is just an example of a broader scope. :)

Q3.4: What's the default behavior if a tokenizer just consumes code?

Q4: What construct options should I be using? Could you explain, in about three words each, how they work?

Thank you!
Replies: 1 comment 6 replies
“Containers” come from the margin, such as a block quote or a list item. So, “indent based”.
That error indicates that the lines aren’t linked properly. Tip: use the debugging features: https://github.com/micromark/micromark#size--debug.
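To illustrate what "linked" means here: chunk tokens that belong together are chained through `previous` and `next` fields. The following is only a rough sketch of that pattern. `createChunkLinker` is a made-up helper name, and it assumes that `effects.enter(type, fields)` returns the token it opened with the given fields set on it. Whether missing links are the cause of your particular error is not something this confirms.

```javascript
// Sketch only: `createChunkLinker` is a hypothetical helper, not a micromark API.
// Assumption: `effects.enter(type, fields)` returns the token it opened,
// with the given fields copied onto it.
function createChunkLinker(effects, contentType) {
  let previous

  // Call this at the start of each line's content, and close the chunk
  // again with `effects.exit('chunkText')` at the end of that line.
  return function enterChunk() {
    const token = effects.enter('chunkText', {contentType, previous})
    // Link the earlier chunk forward so all chunks form one chain.
    if (previous) previous.next = token
    previous = token
    return token
  }
}
```

Inside a tokenizer this would be created once, for example with `createChunkLinker(effects, 'text')`, and called once per line instead of calling `effects.enter` directly.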
Can you elaborate more on why you want content in
That seems more likely. See fenced code for an example.
Use text chunk tokens (see micromark/lib/tokenize/heading-atx.js, lines 63 to 69 in c8c644a).

Q4: What construct options should I be using?

Yep. Lazy means that lines don’t have to be indented nicely in containers. Only content has that:

    > asdasd
    this is still part of the content, and doesn’t close the blockquote

Certain things can or cannot interrupt another construct. For example, indented code cannot interrupt a paragraph.
Partial means that it’s not a “whole” construct, just a small nameless subroutine used to check if something matches.
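As a rough sketch of what a partial can look like: the construct below only answers "are the next characters exactly `:::`?". The `:::` marker, the `customFence` token name, and the names `closingFence` and `tokenizeClosingFence` are all made up for illustration. In a real tokenizer such an object would typically be handed to `effects.check` or `effects.attempt` rather than registered in an extension.

```javascript
const colon = 58 // Character code of `:`.

// Hypothetical partial: a small subroutine, not a construct in its own right.
const closingFence = {tokenize: tokenizeClosingFence, partial: true}

function tokenizeClosingFence(effects, ok, nok) {
  let size = 0

  return start

  function start(code) {
    if (code !== colon) return nok(code)
    effects.enter('customFence')
    return sequence(code)
  }

  function sequence(code) {
    if (code === colon) {
      effects.consume(code)
      size++
      return sequence
    }

    effects.exit('customFence')
    // Match only when exactly three colons were seen.
    return size === 3 ? ok(code) : nok(code)
  }
}

// Inside another tokenizer, usage might look like this (not run here):
//   return effects.check(closingFence, onClosingFence, onContent)(code)
```

With `effects.check` the parser state is reset afterwards whether or not the partial matched, so it works as a lookahead. `effects.attempt` keeps what was consumed on success.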
If you have a start and end line, it’s flow. Code (fenced) and HTML (flow) are much more similar to what you’re doing.
Or, in fact, the directive “container”, which is actually flow: https://github.com/micromark/micromark-extension-directive/blob/29eb2a025d177ee02682c0a7e0cc142a4aaa6633/lib/syntax.js#L12.
Set `concrete: true` on your construct: https://github.com/micromark/micromark-extension-directive/blob/29eb2a025d177ee02682c0a7e0cc142a4aaa6633/lib/tokenize-directive-container.js#L4.
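A minimal sketch of how that could fit together, assuming a made-up syntax where a block opens with `{` and closes with a line that starts with `}`. The token names (`customBlock`, `customBlockMarker`, `customBlockLine`) and the names `customBlockFlow` and `customBlockExtension` are hypothetical. The sketch also leaves out lazy-line handling and any subtokenizing of the content, so treat it as a starting shape, not a finished construct.

```javascript
const leftCurlyBrace = 123 // `{`
const rightCurlyBrace = 125 // `}`

// In micromark, line endings are the negative codes below -2; `null` is EOF.
function isLineEnding(code) {
  return code !== null && code < -2
}

// Hypothetical flow construct, marked as concrete.
const customBlockFlow = {
  name: 'customBlockFlow',
  tokenize: tokenizeCustomBlock,
  concrete: true
}

// Syntax extension shape: constructs keyed by the character code they start at.
const customBlockExtension = {flow: {[leftCurlyBrace]: customBlockFlow}}

function tokenizeCustomBlock(effects, ok, nok) {
  return start

  function start(code) {
    if (code !== leftCurlyBrace) return nok(code)
    effects.enter('customBlock')
    effects.enter('customBlockMarker')
    effects.consume(code)
    effects.exit('customBlockMarker')
    return atBreak
  }

  // At EOF, at a line ending, or at the start of some line content.
  function atBreak(code) {
    if (code === null) {
      // Unclosed block: in this sketch it simply runs to the end of the input.
      effects.exit('customBlock')
      return ok(code)
    }

    if (isLineEnding(code)) {
      effects.enter('lineEnding')
      effects.consume(code)
      effects.exit('lineEnding')
      return lineStart
    }

    effects.enter('customBlockLine')
    return inLine(code)
  }

  // First code of a new line: either the closing marker or more content.
  function lineStart(code) {
    if (code === rightCurlyBrace) {
      effects.enter('customBlockMarker')
      effects.consume(code)
      effects.exit('customBlockMarker')
      effects.exit('customBlock')
      return ok
    }

    return atBreak(code)
  }

  function inLine(code) {
    if (code === null || isLineEnding(code)) {
      effects.exit('customBlockLine')
      return atBreak(code)
    }

    effects.consume(code)
    return inLine
  }
}
```

The extension object would then be passed to micromark through its `extensions` option. Each line's content ends up in its own `customBlockLine` token, which is roughly how fenced code keeps its lines separate.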