Skip to content

Commit

Permalink
Fix to filter out initial, final <br>s
Browse files Browse the repository at this point in the history
Closes GH-66.
  • Loading branch information
wooorm committed Oct 13, 2021
1 parent ff69c3f commit a02be0e
Show file tree
Hide file tree
Showing 3 changed files with 54 additions and 7 deletions.
36 changes: 29 additions & 7 deletions lib/all.js
Expand Up @@ -18,17 +18,39 @@ export function all(h, parent) {
// @ts-expect-error Assume `parent` is a parent.
const nodes = parent.children || []
/** @type {Array.<MdastNode>} */
let values = []
const values = []
let index = -1
/** @type {MdastNode|Array.<MdastNode>} */
let result
let length = nodes.length
let child = nodes[index + 1]

while (++index < nodes.length) {
// Trim initial and final `<br>`s.
// They’re not semantic per HTML, and they can’t be made in markdown things
// like paragraphs or headings.
while (child && child.type === 'element' && child.tagName === 'br') {
index++
child = nodes[index + 1]
}

child = nodes[length - 1]

while (
length - 1 > index &&
child &&
child.type === 'element' &&
child.tagName === 'br'
) {
length--
child = nodes[length - 1]
}

while (++index < length) {
// @ts-expect-error assume `parent` is a parent.
result = one(h, nodes[index], parent)
const result = one(h, nodes[index], parent)

if (result) {
values = values.concat(result)
if (Array.isArray(result)) {
values.push(...result)
} else if (result) {
values.push(result)
}
}

Expand Down
9 changes: 9 additions & 0 deletions test/fixtures/br/index.html
@@ -1,2 +1,11 @@
<p>alpha<br>bravo</p>
<pre>charlie<br>delta</pre>
<p><br></p>
<p>echo<br></p>
<p>foxtrot<br><br></p>
<p><br>golf</p>
<p><br><br>hotel</p>
<p><br>india<br></p>
<p><br><br>juliett<br><br></p>
<h1><br>kilo</h1>
<h1>lima<br></h1>
16 changes: 16 additions & 0 deletions test/fixtures/br/index.md
Expand Up @@ -3,3 +3,19 @@ bravo

charlie
delta

echo

foxtrot

golf

hotel

india

juliett

# kilo

# lima

9 comments on commit a02be0e

@JounQin
Copy link
Member

@JounQin JounQin commented on a02be0e Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this commit, the following will stop working in rehype-remark:

{
  handlers: {
    br(h, node) {
      return h(node, 'html', '<br>')
    },
  },
}

@wooorm Is this expected? I expect this will preserve <br> in markdown.

@wooorm
Copy link
Member Author

@wooorm wooorm commented on a02be0e Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. There isn’t a good reason to use initial and final brs:

<p>some text<br></p>

^-- this may look a bit different on a screen with/without the br, but it doesn’t have a different meaning.

<p>some<br>text</p>

Or:

<td>some<br>text</td>

^-- These two are kept by this patch (and your code handler still runs)

@JounQin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn’t a good reason to use initial and final brs

This makes it possible to add extra new lines in markdown which is supported before.

<p><br></p>
<p><br></p>
<br>

<br>

@wooorm
Copy link
Member Author

@wooorm wooorm commented on a02be0e Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this library more as: “I have some crappy HTML from somewhere on the internet, make it readable in plain markdown”.
Closer to how the Reader functionality works on mac/ios, or alternatively mozilla/readability.

@JounQin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is not. The users was able to preserve all <br>s if they want, they can even preserve all parts of the output ast as a whole html node, but it is now very difficult to do this after this commit. At least this is a BREAKING CHANGE.

@wooorm
Copy link
Member Author

@wooorm wooorm commented on a02be0e Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is not.

Yes, it is the goal of this project. To provide clean markdown.

The users was able to preserve all <br>s if they want,

Your issue was that invalid markdown was generated. That is now fixed. Correct markdown is now generated.

At least this is a BREAKING CHANGE.

It fixed the problem you raised. You used a workaround before. That workaround does still work for semantic HTML. The workaround does not work for unsemantic HTML. Keeping unsemantic HTML in markdown is not a goal of this project.


Why are you using unsemantic HTML?

@JounQin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is said HTML in Markdown is supported: https://github.com/syntax-tree/hast-util-to-mdast#html-in-markdown

Is the mentioned svg also unsemantic?

Another awesome feature of Markdown is that you can author HTML inside it. As we focus on readability we don’t do that, but you can by passing a handler.


Why are you using unsemantic HTML?

The current changes are fine to my own personally, but I don't think we should remove the ability to keep simple <br>.

@wooorm
Copy link
Member Author

@wooorm wooorm commented on a02be0e Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the mentioned svg also unsemantic?

No, SVG has semantics in HTML. That example is fine.
br also has semantics in HTML. If they are used in poems or addresses. Not if they just add some margin
See MDN for more info: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/br

@JounQin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this library more as: “I have some crappy HTML from somewhere on the internet, make it readable in plain markdown”.

I still disagree with this clarification, because it's not documented anywhere AFAIK.

Removing a feature in a bug fix also seems wrong to me.

Please sign in to comment.