-
Notifications
You must be signed in to change notification settings - Fork 855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle newlines inside strong/emphasis tags #123
base: master
Are you sure you want to change the base?
Conversation
Thanks for this! It’s an interesting case because I don’t think it’s possible to produce Your proposed outcome is: _Hello_
_world_ which converts back to HTML as: <p><em>Hello</em>
<em>world</em></p> which renders as: Hello world To make this a bit more robust, perhaps toMarkdown should place any number of newlines with a space in any inline element before converting it? |
yea actually my proposed outcome for
which, translated back to html, is:
which I think is right, but that's not what the old version of my code actually did. The new version I think handles this case properly. Thoughts? |
Thanks! I think the result of your most recent version is more inline with the outcome of the rendered HTML, however I am not entirely convinced that a converter is the correct place to handle this case. Converters should be as simple as possible, and should contain as little logic as possible. This is unavoidable in some cases (e.g. ul/ol), but it feels like the em/strong converter is doing too much. In this case I’m favouring stripping multiple newlines from inline elements before they are converted (i.e. somewhere in the Hope that makes sense?! |
So you're saying that:
|
Perhaps
should convert to
which would then result in the following when converted back to HTML:
This could be achieved by doing something like the following in the // Remove line-breaks caused by multiple <br>
if(isInline(node)) content = content.replace(/( \n){2,}/g, ' \n') Obviously it’s not perfect, but considering that it’s not possible to create the original HTML from markdown, and that it fixes the current issue (of the content not being wrapped correctly), as well as maintaining the simplicity of the converters, I feel it’s a reasonably good solution. Has this HTML been generated by |
@domchristie as backstory, I came across this issue while trying to convert HTML generated by trix editor to markdown. In trix, if you turn on italics and hit return twice, it'll generate this markup. It's perfectly valid html, which I agree cannot reasonably be generated by markdown. Is the goal of this library to only convert HTML that could be generated by markdown into markdown? If so then I think I agree your solution is the best. If the goal is to do a best-effort two-way conversion between the two then I think collapsing multiple newlines into one newline isn't great. Either way I agree that the processor should probably handle this instead of the converter, good call! What if I change the processor to call inline converters once per line? Another option would be to convert |
Yeh, it’s an interesting point. The goal of this library isn’t to only convert HTML that has been generated from Markdown, but to take a best-effort approach. However that’s not to say that it’s responsibility is to handle all cases including invalid or poorly authored HTML. About
So to have two successive In this case of The options: Original
Option 1.More than one line-break is squashed to just one. Results in:
Converted back to HTML:
Option 2.Two paragraphs are created:
To me they’re both incorrect! Option 1 maintains more of the original markup (since the So in cases like this, I’d opt for the simplest, most maintainable solution. As I mentioned above, I don’t think it’s the responsibility of the library to fix/alter the HTML structure, but the idea of converting line-by-line sounds interesting! Aside: Interestingly The Guardian’s Scribe creates a new paragraph on |
I have contributed some thoughts to the issue on trix here: basecamp/trix#75 (comment) |
Any news on if this will be implemented? I have a similar issue where I'm converting html from emails to markdown. I don't have control over the markup of the emails coming through. This markup
gets converted to
|
Previously the following markup:
would get converted into the markdown:
which is incorrect. Both lines should be surrounded with underscores. This pull request splits
em
,i,
strong, and
b` tags into lines and surrounds each line with the appropriate markdown