encoding/xml: XML CDATA section could be joined together with regular characters #12611

Open
pgundlach opened this Issue Sep 14, 2015 · 0 comments

Comments

Projects
None yet
2 participants
@pgundlach
Copy link

pgundlach commented Sep 14, 2015

go version go1.5 darwin/amd64

One thing I stumbled across yesterday (not a real bug, but a minor nuisance from a user's perspective perhaps):

package main

import (
    "encoding/xml"
    "fmt"
    "strings"
)

func main() {
    src := `<root>a<![CDATA[b]]>c</root>`
    r := strings.NewReader(src)

    dec := xml.NewDecoder(r)
    for {
        tok, err := dec.Token()
        if err != nil {
            fmt.Println(err)
            break
        }
        fmt.Printf("%#v\n", tok)
    }
}

gives

xml.StartElement{Name:xml.Name{Space:"", Local:"root"}, Attr:[]xml.Attr{}}
xml.CharData{0x61}
xml.CharData{0x62}
xml.CharData{0x63}
xml.EndElement{Name:xml.Name{Space:"", Local:"root"}}
EOF

I would expect one xml.CharData{} token instead:

xml.StartElement{Name:xml.Name{Space:"", Local:"root"}, Attr:[]xml.Attr{}}
xml.CharData{0x61, 0x62, 0x63}
xml.EndElement{Name:xml.Name{Space:"", Local:"root"}}
EOF

While I understand why there are three tokens, I would expect one, because the user (= me) cannot tell a CDATA section apart from regular text in the resulting token stream anyway.

@ianlancetaylor ianlancetaylor changed the title XML CDATA section could be joined together with regular characters encoding/xml: XML CDATA section could be joined together with regular characters Sep 14, 2015

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Sep 14, 2015
