encoding/xml: parser does not accumulate nested directives properly #1549

Closed
gopherbot opened this Issue Feb 24, 2011 · 4 comments

gopherbot commented Feb 24, 2011

by ehog.hedge:

1. Compile the attached tryx.go
2. Run with input redirected to the attached some.xml

What is the expected output?

A single `Directive` token containing the whole !DOCTYPE, nested !ENTITYs included

What do you see instead?

An incomplete DOCTYPE directive, four ENTITY directives,
and trailing text with the final ]>.

Which compiler are you using (5g, 6g, 8g, gccgo)?
Which operating system are you using?
Which revision are you using?  (hg identify)

8g, Fedora Core 9, ef61c195edc3+ tip

code from xml.go:

    // Probably a directive: <!DOCTYPE ...>, <!ENTITY ...>, etc.
    // We don't care, but accumulate for caller.
    p.buf.Reset()
    p.buf.WriteByte(b)
    for {
        if b, ok = p.mustgetc(); !ok {
            return nil, p.err
        }
        if b == '>' {
            break
        }
        p.buf.WriteByte(b)
    }
    return Directive(p.buf.Bytes()), nil

This cuts the !DOCTYPE off at the closing > of the first nested
!ENTITY (and discards that >). So the caller gets neither the
entire !DOCTYPE nor all of the text needed to reconstruct it.
Unless, I suppose, the caller is meant to keep pulling Directives
and joining them with > until it reaches the ]>? If so, that's
pretty horrid.

Looking at the XML spec

  http://www.w3.org/TR/REC-xml/

it looks like a directive can contain other, nested directives,
and as far as I can see, < and > may also appear inside
quoted values.

So a revised version of that code could count < and > and
finish accumulating the text only when it hits the properly
balanced closing >, ignoring any < or > that appears inside
'...' or "..." strings.

That would mean that the entire outermost directive, and all
its nested directives, would be available to the program
using the parser. As is currently the case, what it chooses
to do with the contents is up to it.

Attachments:

  1. some.xml (430 bytes)

robpike commented Feb 24, 2011

Comment 1:

You didn't attach tryx.go. Although it's probably trivial, could you please provide it?

Owner changed to r...@golang.org.

Status changed to Accepted.

robpike commented Feb 24, 2011

Comment 2:

Status changed to WaitingForReply.

rsc commented Feb 24, 2011

Comment 4:

Waiting for your CL...

Status changed to Accepted.

rsc commented Feb 28, 2011

Comment 5:

This issue was closed by revision b00f731.

Status changed to Fixed.

@mikioh mikioh changed the title xml parser does not accumulate nested directives properly encoding/xml: parser does not accumulate nested directives properly Jan 9, 2015

@golang golang locked and limited conversation to collaborators Jun 24, 2016
