encoding/xml: XML CDATA section could be joined together with regular characters #12611

Open
@pgundlach

go version go1.5 darwin/amd64

One thing I stumbled across yesterday (not a real bug, but perhaps a minor nuisance from a user's perspective):

package main

import (
    "encoding/xml"
    "fmt"
    "strings"
)

func main() {
    src := `<root>a<![CDATA[b]]>c</root>`
    r := strings.NewReader(src)

    dec := xml.NewDecoder(r)
    for {
        tok, err := dec.Token()
        if err != nil {
            fmt.Println(err)
            break
        }
        fmt.Printf("%#v\n", tok)
    }
}

gives

xml.StartElement{Name:xml.Name{Space:"", Local:"root"}, Attr:[]xml.Attr{}}
xml.CharData{0x61}
xml.CharData{0x62}
xml.CharData{0x63}
xml.EndElement{Name:xml.Name{Space:"", Local:"root"}}
EOF

I would expect one xml.CharData{} token instead:

xml.StartElement{Name:xml.Name{Space:"", Local:"root"}, Attr:[]xml.Attr{}}
xml.CharData{0x61, 0x62, 0x63}
xml.EndElement{Name:xml.Name{Space:"", Local:"root"}}
EOF

I understand where the three tokens come from, but I would expect a single one: as the user (= me), I cannot tell from the token stream whether character data came from a CDATA section or from regular text, so the split carries no information.

Labels: NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)