x/net/html: ParseFragment fails to parse sub-table elements in the root position #7232

@gopherbot

by algorithmicimperative:

1. Use `html.ParseFragment` to parse a fragment of HTML where the root elements are
`<tbody>`, `<tr>` or `<td>` (and probably other table sub-elements)

For example:

    s := `<td>first</td>
        <td>second</td>
        <td>third</td>
    `
    doc, err := html.ParseFragment(strings.NewReader(s), &html.Node{
        Type:     html.ElementNode,
        Data:     "body",
        DataAtom: atom.Body,
    })


2. Inspect the result with `fmt.Printf("%#v\n", doc)`


What is the expected output?

A `[]*html.Node` containing three `td` element nodes.


What do you see instead?

A `[]*html.Node` containing a single text node with the text `first second third`.


Which operating system are you using? Linux


Which version are you using? 1.2



`ParseFragment` works fine with other semantically incorrect structures, such as
`<option>` elements placed directly in the root position. It only seems to have
trouble with table sub-elements.

If this isn't a bug and is failing by design, perhaps we need something like
`atom.DocumentFragment`: a context that accepts arbitrary HTML.

Labels: NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)