Why is rehype-raw not parsing textnodes? #8
Comments
I think rehype-raw does the opposite of what you expect it to. It deals with HTML in markdown, not markdown in HTML in markdown! You’d need a different project for this I think!
Yeah, it definitely sounds like you’re expecting sort-of the “inverse” of what this project does! Someone will have to create that; this isn’t it!
I find this behaviour counter-intuitive as well. I am afraid I have to agree with @Sewdn on this, with the following explanation: if you claim that we are dealing with HTML in markdown only, then IMHO the line breaks are unnecessary. See pandoc's behaviour. If you want to interpret a single HTML line break as whitespace (strictly what HTML and markdown do), then only the second line break (in the blank line after the opening tag) should count as markdown which is interpreted as another whitespace and not another I am afraid that you might be double counting here!
After a lot more research, it turns out that in CommonMark and Pandoc strict mode, only markdown within inline tags is converted; block tags are treated as pure HTML. Pandoc in its regular operation provides the convenience of processing markdown in block tags. While the behaviour is IMHO counter-intuitive, it is in line with the current rules of markdown-to-HTML conversion. In this regard, all I have to add is that the above example where it seems the
@CxRes Correct, that’s what my first comment was in regard to as well. The second example is markdown (a), with one block of HTML (b), inside which is some further markdown syntax (c). This project focuses on HTML inside markdown (b), not on markdown in HTML (c). That’s why this issue is unrelated to this project. Your first comment is a different question: it’s about a “non-standard” syntax, which would be an issue for a different project (remark-parse), and which I don’t think we should support because we stick with CM / GFM. remark is pluggable, which means that it can of course be supported, but through a plugin that changes the HTML tokeniser!
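To make the (a)/(b)/(c) distinction concrete, here is a rough sketch of how a CommonMark-style block pass divides the input. It is a simplified model written for illustration, not the tokeniser that remark-parse actually uses: the function name `splitBlocks`, the short list of tag names, and the sample input are all assumptions, and only the "block-level tag, ended by a blank line" kind of HTML block is modelled.

```javascript
// Simplified model of CommonMark HTML blocks (block-level tags only).
// Assumption: illustrative sketch, NOT the remark-parse tokeniser.
// Only a handful of block-level tag names are listed here.
const BLOCK_TAG = /^ {0,3}<\/?(?:div|section|article|blockquote|p)(?:[ \t]|\/?>|$)/i;

function splitBlocks(markdown) {
  const blocks = [];
  let current = null;
  for (const line of markdown.split('\n')) {
    if (line.trim() === '') {
      // A blank line closes whichever block is open, HTML or paragraph.
      current = null;
      continue;
    }
    const startsHtml = BLOCK_TAG.test(line);
    // Open a new block at the start, or when a block-level tag interrupts a paragraph.
    if (current === null || (current.type === 'paragraph' && startsHtml)) {
      current = { type: startsHtml ? 'html' : 'paragraph', lines: [] };
      blocks.push(current);
    }
    // Inside an open HTML block every non-blank line is kept verbatim.
    current.lines.push(line);
  }
  return blocks.map((block) => ({ type: block.type, value: block.lines.join('\n') }));
}

// Hypothetical sample input, with and without blank lines around the text.
const loose = splitBlocks('<div class="note">\n\nSome *markdown* text.\n\n</div>');
const tight = splitBlocks('<div class="note">\nSome *markdown* text.\n</div>');
console.log(loose.map((block) => block.type));
console.log(tight.map((block) => block.type));
```

With blank lines around the text, the sketch yields three blocks (HTML, paragraph, HTML), so the middle one is ordinary markdown. Without them, everything stays inside a single HTML block and the `*markdown*` is never looked at as markdown.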
@wooorm A plugin would do nicely 😄 (I am guessing one that reruns remark-parse on the HTML nodes; however, that is beyond my capabilities/knowledge!). Especially since there is a requirement (such as in my use case) to render documents and/or use styles and conventions meant for extended flavors such as Pandoc (which is sort of a de facto standard for non-standard/extended md syntax). I would still suggest the examples here could be made clearer, because a novice like yours truly is going to trip up from time to time. One is an anomaly, two's a trend!
That would be welcome! I’d suggest first checking whether it’s possible to change the remark HTML block tokeniser to exit on one newline instead of two. Another route to take would be to support an attribute on elements (
Feel free to open a PR!
@wooorm I have tried this every which way to process example 2 with no dice:
My hope was to place an option processMarkdownInHtml in the parser, which would then allow the different processing in a single pass. The other alternative is to look at each block HTML value and run the parser over it a second time, which is ugly. I simply do not have the knowledge needed for this and would have to defer to your expertise...
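For what it is worth, the second alternative (visiting each block HTML value and parsing its contents again) could look roughly like the sketch below. Everything in it is an assumption made for illustration: `reparseHtmlBlocks` and `toyInline` are hypothetical names, the regular expression only handles an HTML node that consists of one element with text inside, and `toyInline` merely stands in for a real inline markdown parser. The only thing taken from the ecosystem is the shape of an mdast `html` node, which carries its source in `value`.

```javascript
// Hypothetical second pass over an mdast-like tree.
// Assumption: a sketch of the "parse the HTML value again" idea, not an existing plugin.
function reparseHtmlBlocks(tree, parseInline) {
  const children = [];
  for (const node of tree.children) {
    // Naive match: one opening tag, arbitrary content, the matching closing tag.
    const match = node.type === 'html'
      ? /^(<([a-z][a-z0-9]*)\b[^>]*>)([\s\S]*)(<\/\2>)$/i.exec(node.value.trim())
      : null;
    if (!match) {
      children.push(node); // Leave everything else untouched.
      continue;
    }
    const [, open, , inner, close] = match;
    children.push(
      { type: 'html', value: open },
      ...parseInline(inner.trim()), // Inner text goes through the inline parser.
      { type: 'html', value: close }
    );
  }
  return { ...tree, children };
}

// Toy stand-in for a real inline parser: it only understands *emphasis*.
function toyInline(text) {
  return text
    .split(/(\*[^*]+\*)/)
    .filter(Boolean)
    .map((part) =>
      /^\*[^*]+\*$/.test(part)
        ? { type: 'emphasis', children: [{ type: 'text', value: part.slice(1, -1) }] }
        : { type: 'text', value: part }
    );
}

// Hypothetical input: a single html node, as produced when there are no blank lines.
const tree = {
  type: 'root',
  children: [{ type: 'html', value: '<div class="note">\nSome *markdown* text.\n</div>' }]
};
console.log(JSON.stringify(reparseHtmlBlocks(tree, toyInline), null, 2));
```

The sketch shows why this route feels ugly: it needs its own idea of where an element starts and ends, which duplicates work the HTML parser does later, and a regular expression will get nested or malformed markup wrong.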
Updated the example in Readme to clarify the processing of markdown embedded within html. In response to discussion in rehypejs#8
When you look at the documentation, rehype-raw is parsing text-nodes that might be nested within html tags (and can contain html syntax itself):
yields:
The textnode of the div.note element was interpreted as a markdown paragraph and thus parsed and wrapped in a p-element, with its inner text also being parsed, resulting in an em-node with markdown as text value.
However, when I don't use the paragraph whitespace:
or
this yields:
The textnode of the div.note element was not wrapped in a paragraph (because it didn't have the required line breaks). This is correct behaviour. But I would still expect the text value to be processed to end up with an emphasis element like this:
Is it intended behaviour not to parse the entire textnode because it is not a paragraph? If it is a bug, I will look into fixing it.
Thanks for looking into this!