Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tasklists don't work if checkbox is a direct child of a list item #80

Closed
4 tasks done
Mr0grog opened this issue Aug 16, 2023 · 12 comments · Fixed by #81
Closed
4 tasks done

Tasklists don't work if checkbox is a direct child of a list item #80

Mr0grog opened this issue Aug 16, 2023 · 12 comments · Fixed by #81
Labels
💪 phase/solved Post is done

Comments

@Mr0grog
Copy link
Contributor

Mr0grog commented Aug 16, 2023

Initial checklist

Affected packages and versions

has-util-to-mdast@10.0.0

Link to runnable example

No response

Steps to reproduce

Tasklists (lists where each item starts with a checkbox) are parsed correctly if the leading checkbox <input> element in an <li> node is nested in a <p> node, but not if the <input> is a direct child of the <li>. I’m not sure why this is a requirement (it’s certainly not in the HTML spec: https://html.spec.whatwg.org/multipage/grouping-content.html#the-li-element), so I assume it’s a bug. (But I appreciate also handling this common case where things are in a paragraph in the list item, which is best practice HTML.)

You can see where this explicit check for a <p> is implemented in lib/handlers/li.js:

export function li(state, node) {
const head = node.children[0]
/** @type {boolean | null} */
let checked = null
/** @type {Element | undefined} */
let clone
// Check if this node starts with a checkbox.
if (head && head.type === 'element' && head.tagName === 'p') {
const checkbox = head.children[0]
if (
checkbox &&
checkbox.type === 'element' &&
checkbox.tagName === 'input' &&
checkbox.properties &&
(checkbox.properties.type === 'checkbox' ||
checkbox.properties.type === 'radio')
) {
checked = Boolean(checkbox.properties.checked)
clone = {
...node,
children: [
{...head, children: head.children.slice(1)},
...node.children.slice(1)
]
}
}

I used this script to test:

import {inspect} from 'node:util';
import {toMdast} from 'hast-util-to-mdast';

const mdast = toMdast(hastTree);
console.log(inspect(mdast, false, 100, true));

Expected behavior

I expected that this script:

import {inspect} from 'node:util';
import {toMdast} from 'hast-util-to-mdast';

const mdast = toMdast({
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'ul',
      children: [
        {
          type: 'element',
          tagName: 'li',
          children: [
            {
              type: 'element',
              tagName: 'input',
              properties: { type: 'checkbox', checked: true }
            },
            {
              type: 'text',
              value: 'Checked'
            }
          ]
        },
        {
          type: 'element',
          tagName: 'li',
          children: [
            {
              type: 'element',
              tagName: 'input',
              properties: { type: 'checkbox', checked: false }
            },
            {
              type: 'text',
              value: 'Unhecked'
            }
          ]
        }
      ]
    }
  ]
});

console.log(inspect(mdast, false, 100, true));

…to output:

{
  type: 'root',
  children: [
    {
      type: 'list',
      ordered: false,
      start: null,
      spread: false,
      children: [
        {
          type: 'listItem',
          spread: false,
          checked: true,
          children: [
            {
              type: 'paragraph',
              children: [ { type: 'text', value: 'Checked' } ]
            }
          ]
        },
        {
          type: 'listItem',
          spread: false,
          checked: false,
          children: [
            {
              type: 'paragraph',
              children: [ { type: 'text', value: 'Unhecked' } ]
            }
          ]
        }
      ]
    }
  ]
}

…which corresponds to this Markdown:

- [x] Checked
- [ ] Unchecked

Actual behavior

Instead, the above script outputs:

{
  type: 'root',
  children: [
    {
      type: 'list',
      ordered: false,
      start: null,
      spread: false,
      children: [
        {
          type: 'listItem',
          spread: false,
          checked: null,
          children: [
            {
              type: 'paragraph',
              children: [ { type: 'text', value: '[x]Checked' } ]
            }
          ]
        },
        {
          type: 'listItem',
          spread: false,
          checked: null,
          children: [
            {
              type: 'paragraph',
              children: [ { type: 'text', value: '[ ]Unhecked' } ]
            }
          ]
        }
      ]
    }
  ]
}

…which corresponds to this Markdown:

- \[x]Checked
- \[ ]Unchecked

Instead, to get the expected output, you. need to do:

import {inspect} from 'node:util';
import {toMdast} from 'hast-util-to-mdast';

const mdast = toMdast({
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'ul',
      children: [
        {
          type: 'element',
          tagName: 'li',
          children: [
            {
              type: 'element',
              tagName: 'p',
              children: [
                {
                  type: 'element',
                  tagName: 'input',
                  properties: { type: 'checkbox', checked: true }
                },
                {
                  type: 'text',
                  value: 'Checked'
                }
              ]
            }
          ]
        },
        {
          type: 'element',
          tagName: 'li',
          children: [
            {
              type: 'element',
              tagName: 'p',
              children: [
                {
                  type: 'element',
                  tagName: 'input',
                  properties: { type: 'checkbox', checked: false }
                },
                {
                  type: 'text',
                  value: 'Unhecked'
                }
              ]
            }
          ]
        }
      ]
    }
  ]
});

console.log(inspect(mdast, false, 100, true));

Affected runtime and version

node>=16

Affected package manager and version

npm@9.5.1

Affected OS and version

macOS 13.5

Build and bundle tools

No response

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Aug 16, 2023
@Mr0grog
Copy link
Contributor Author

Mr0grog commented Aug 16, 2023

Quick update: you can also test this with actual HTML using hast-util-from-html in case there's any concern I’ve created a malformed tree in the original post:

import {inspect} from 'node:util';
import {fromHtml} from 'hast-util-from-html';
import {toMdast} from 'hast-util-to-mdast';

const hast1 = fromHtml(`<ul>
  <li>
    <input type="checkbox" checked> Checked
  </li>
  <li>
    <input type="checkbox"> Unchecked
  </li>
</ul>`, {fragment: true, verbose: false});
const mdast1 = toMdast(hast1);
console.log('Bare inputs:' + inspect(mdast1, false, 100, true));

console.log('\n-----------------\n');

const hast2 = fromHtml(`<ul>
  <li>
    <p><input type="checkbox" checked> Checked</p>
  </li>
  <li>
    <p><input type="checkbox"> Unchecked</p>
  </li>
</ul>`, {fragment: true, verbose: false});
const mdast2 = toMdast(hast2);
console.log('Paragraph inputs:' + inspect(mdast2, false, 100, true));

@wooorm
Copy link
Member

wooorm commented Aug 17, 2023

Sure, good to have, looks like a bugfix indeed.

Something like:

  let head = node.children[0]

  if (head && head.type === 'element' && head.tagName === 'p') {
    head = head.children[0]
  }

  if (head && head.type === 'element' && head.tagName === 'input' )

I think?

Are you interested in doing the work?

@Mr0grog
Copy link
Contributor Author

Mr0grog commented Aug 17, 2023

Sure, happy to take this. 👍

Related question: should this try and handle cases where the first node is a comment? (I was also going to ask about whitespace, but it looks like that's already taken care of by the time we get to this step.) For example:

<ul>
  <li>
    <!-- Still mark the list item as a tasklist item even though this comment is here? -->
    <input type="checkbox" checked> Checked
  </li>
</ul>

@Mr0grog
Copy link
Contributor Author

Mr0grog commented Aug 17, 2023

Also, I guess, any other containing element, too, not just a <p>?

<ul>
  <li>
    <span><input type="checkbox" checked> Checked</span>
  </li>
  <li>
    <label><input type="checkbox" checked> Checked</label>
  </li>
  <li>
    <div><input type="checkbox" checked> Checked</div>
  </li>
</ul>

There are probably a handful it doesn't make sense to look inside (<picture>? Void elements that should have no children like <img>?), but not sure if actually checking the tagName is worthwhile.

And nesting:

<ul>
  <li>
    <p><span><input type="checkbox" checked> Checked</span></p>
  </li>
</ul>

Useful or too complex?

Just trying to think of what's worth covering if I’m making this part more flexible.

@wooorm
Copy link
Member

wooorm commented Aug 18, 2023

Compatibility with markdown is important but I don‘t think we should look into arbitrary elements, perhaps it’s a form, or something else, that could lead to bugs.

The comment… Maybe. If you feel like it feel free to do it.
A whitespace text node might be good to have, though?

@Mr0grog
Copy link
Contributor Author

Mr0grog commented Aug 18, 2023

Makes sense, I’ll try and do this Sunday if I've got time. Otherwise sometime next week.

A whitespace text node might be good to have, though?

It seems like whitespace nodes at the start of <li> and <p> elements are removed by rehype-minify-whitespace before walking the tree and handling each element, so I don’t think the li handler actually needs to worry about that (unless I’m misunderstanding what you’re getting at):

export function toMdast(tree, options) {
// We have to clone, cause we’ll use `rehype-minify-whitespace` on the tree,
// which modifies
const cleanTree = structuredClone(tree)
const settings = options || emptyOptions
const state = createState(settings)
/** @type {MdastNodes} */
let mdast
// To do: use `satisfies` in `rehype-minify-whitespace`
// @ts-expect-error: does return a transformer, that does accept any node.
rehypeMinifyWhitespace({newlines: settings.newlines === true})(cleanTree)
visit(cleanTree, function (node) {

@Mr0grog
Copy link
Contributor Author

Mr0grog commented Aug 20, 2023

The comment… Maybe. If you feel like it feel free to do it.

Well, I wrote a fairly complex version of this that skipped over leading comments, but it didn’t work because…

  1. state.wrap() won’t wrap the comment in an Mdast paragraph node because it can’t tell if it’s phrasing content (reasonable, since the comment is just a type="html" node, rather than its own special type).
  2. mdast-util-task-list-gfm-item then requires a task list item to start with a paragraph, otherwise it treats it as a regular old list item. https://github.com/syntax-tree/mdast-util-gfm-task-list-item/blob/ef2d59f69f6d0da6147b24af37f671a1b5a6803f/lib/index.js#L112-L115

Possible changes to address this seem fairly complex and out of scope here, so I’m going to unwind this code and drop the idea of supporting comments. (This also suggests handling other kinds of container elements for the checkbox besides <p> will be similarly complicated if they aren’t phrasing elements, e.g. div. I know we already decided not to do that here, but figured it’s worth noting. OTOH, maybe that suggests a useful middle ground — the checkbox could be wrapped in a <p> or any phrasing element?)

Unfortunately trying to figure out what was going on there used all my remaining time for this today, so I’ll probably get back to it sometime later.

@wooorm
Copy link
Member

wooorm commented Aug 25, 2023

It seems like whitespace nodes at the start of

  • and

    elements are removed by rehype-minify-whitespace before walking the tree and handling each element, so I don’t think the li handler actually needs to worry about that (unless I’m misunderstanding what you’re getting at):

  • Ah perfect, I didn’t think about that!

    1. mdast-util-task-list-gfm-item then requires a task list item to start with a paragraph, otherwise it treats it as a regular old list item

    In mdast, there must be a paragraph node in listItem. It doesn’t always translate to an HTML <p> element.
    The reason is that in HTML, the content of li could be whatever, in the markdown AST it’s strictly always flow content (paragraphs and such).

    1. state.wrap() won’t wrap the comment in an Mdast paragraph node because it can’t tell if it’s phrasing content (reasonable, since the comment is just a type="html" node, rather than its own special type).

    we could try to detect comments (starts with <!--). But that seems more complexity/hassle than it’s worth.

    Unfortunately trying to figure out what was going on there used all my remaining time for this today, so I’ll probably get back to it sometime later.

    Sure, thanks. No rush from me! Or, if you can drop the code here, I or someone else can see if it can be made into a PR!

    Mr0grog added a commit to Mr0grog/hast-util-to-mdast that referenced this issue Sep 4, 2023
    Previously, list items with leading checkboxes would only be treated as task list items if the checkbox was wrapped in a `<p>` element, like:
    
        <li><p><input type="checkbox"> Other content</p></li>
    
    This adds support for detecting task lists even when the checkbox is not nested in a `<p>`:
    
        <li><input type="checkbox"> Other content</li>
    
    This does *not* add support for some other complex scenarios, such as the checkbox being inside other phrasing content, being more deeply nested, or having comments in front of it:
    
        <li><strong><input type="checkbox"> Other content</strong></li>
        <li><p><strong><input type="checkbox"> Other content</strong></p></li>
        <li><!-- Comment --><input type="checkbox"> Other content</li>
    
    Fixes syntax-tree#80.
    @Mr0grog
    Copy link
    Contributor Author

    Mr0grog commented Sep 4, 2023

    Just got back to this today and will post a PR momentarily. Sorry for the delay and the lack of replies; I had a last minute family issue come up.

    In mdast, there must be a paragraph node in listItem. It doesn’t always translate to an HTML <p> element.

    Ah, that makes sense. I wasn't quite familiar enough with the ins and outs of the Mdast model to catch that the paragraph type is conceptually different from a <p> element and is required there.

    we could try to detect comments (starts with <!--). But that seems more complexity/hassle than it’s worth.

    Yeah, that makes sense. It's way out of scope for this change, but I wonder if adding a comment type to Mdast would be valuable (Seems like it might be useful for things like MDX, where the comment syntax is different; not sure what they do to handle this case now). In any case, adding and supporting that across all the tools seems like a huge task.

    @wooorm
    Copy link
    Member

    wooorm commented Sep 5, 2023

    Comments in MDX are expressions, just like {Math.PI}, but then they contain just a comment. So it’s an embedded AST in an AST

    @Mr0grog
    Copy link
    Contributor Author

    Mr0grog commented Sep 6, 2023

    Right, I guess I was thinking more about converting HTML or MD (or something else) → MDX, so you’d need to know that <!-- whatever --> (or a comment in whatever syntax you’re converting from) translates to {/* whatever */} instead of being left as-is in Markdown. But I bet few people are really doing that conversion. Huge amounts of work for almost no payoff at all. 🙃

    (I think I had this on my mind because someone recently filed an issue on https://github.com/Mr0grog/google-docs-to-markdown that turned out to be that they were trying to use the output in MDX, which didn’t always work because the output is actually Markdown, not MDX.)

    @wooorm
    Copy link
    Member

    wooorm commented Sep 24, 2023

    md -> mdx might be interesting for several folks! But that’s a different discussion!

    wooorm pushed a commit that referenced this issue Sep 30, 2023
    Closes GH-80.
    Closes GH-81.
    
    Reviewed-by: Titus Wormer <tituswormer@gmail.com>
    @wooorm wooorm added the 💪 phase/solved Post is done label Sep 30, 2023
    @github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Sep 30, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    💪 phase/solved Post is done
    Development

    Successfully merging a pull request may close this issue.

    2 participants