encoding/xml: Parsing of CDATA yields incorrect error. #1112

Closed
gopherbot opened this Issue Sep 16, 2010 · 3 comments

gopherbot commented Sep 16, 2010

by jimteeuwen:

What steps will reproduce the problem?
Build and run the following code:

package main

import "xml"
import "strings"
import "fmt"
import "os"

func main() {
    fragment := `<test><![CDATA[ &val=foo ]]></test>`
    parser := xml.NewParser(strings.NewReader(fragment))
    
    var err os.Error

    for {
        if _, err = parser.Token(); err != nil {
            if err == os.EOF {
                return
            }
            fmt.Fprintf(os.Stderr, "Xml Error: %s\n", err)
            return
        }
    }
}

What is the expected output?
Program should run without error.

What do you see instead?
Xml Error: XML syntax error on line 1: invalid character entity &val;

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
64-bit ArchLinux

Which revision are you using?  (hg identify)
e1752be5d932 tip

Please provide any additional information below.
The XML parser in the Go xml package should treat characters such as &, < and > as literal text when they appear inside a <![CDATA[ ... ]]> section.
The current implementation still tries to read & as the start of an entity reference there, and consequently yields parse errors where it should not.

gopherbot commented Sep 16, 2010

Comment 1 by jimteeuwen:

For completeness, here is a link and excerpt of the XML specification:
http://www.w3.org/TR/REC-xml/#sec-cdata-sect
[Definition: CDATA sections may occur anywhere character data may occur; they are used
to escape blocks of text containing characters which would otherwise be recognized as
markup. CDATA sections begin with the string "<![CDATA[" and end with the string
"]]>":]
Within a CDATA section, only the CDEnd string ("]]>") is recognized as markup, so
that left angle brackets and ampersands may occur in their literal form; they need not
(and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

adg commented Sep 17, 2010

Comment 2:

Labels changed: added packagebug.

Status changed to HelpWanted.


rsc commented Sep 24, 2010

Comment 3:

This issue was closed by revision 8d87cca.

Status changed to Fixed.

mikioh changed the title from "package 'xml'. Parsing of CDATA yields incorrect error." to "encoding/xml: Parsing of CDATA yields incorrect error." on Jan 9, 2015

golang locked and limited conversation to collaborators on Jun 24, 2016

This issue was closed.