proposal: encoding/xml: Allow alternative Decoder.AutoClose behavior #19506

Closed
aclindsa opened this issue Mar 11, 2017 · 3 comments

aclindsa commented Mar 11, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.7.5 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/aclindsa/go"
GORACE=""
GOROOT="/usr/lib/go"
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build155808714=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

What did you do?

This is a feature request, not a bug report. I attempted to unmarshal/decode invalid XML with Strict=false and a list of tags in Decoder.AutoClose. Here's an example:
https://play.golang.org/p/ITJmTGZt96

What did you expect to see?

I initially expected autoclosed tags to contain the CharData immediately following the opening tag.

What did you see instead?

Instead, autoclosed tags were closed immediately, before the character data that follows them.
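
For readers without access to the playground link, the behavior described above can be reproduced with a minimal sketch along these lines. The input string, the STMT and AMT tag names, and the tokenTrace helper are illustrative assumptions, not taken from the original report; only Decoder.Strict, Decoder.AutoClose, and Decoder.Token come from encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenTrace is a hypothetical helper: it decodes input in non-strict mode
// with the given AutoClose list and describes each token in order.
func tokenTrace(input string, autoClose []string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(input))
	d.Strict = false
	d.AutoClose = autoClose

	var trace []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return trace, nil
		}
		if err != nil {
			return trace, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			trace = append(trace, "start "+t.Name.Local)
		case xml.EndElement:
			trace = append(trace, "end "+t.Name.Local)
		case xml.CharData:
			trace = append(trace, fmt.Sprintf("chardata %q", string(t)))
		}
	}
}

func main() {
	// <AMT> has no closing tag, so the decoder has to synthesize one.
	trace, err := tokenTrace("<STMT><AMT>12.50</STMT>", []string{"AMT"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range trace {
		fmt.Println(line)
	}
}
```

In this sketch the synthesized "end AMT" token arrives before the "12.50" character data, so the text ends up as a child of STMT rather than AMT.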

I understand that malformed XML is ambiguous, so there is no general way to know exactly how it should be fixed up. In some cases, however, you know that certain elements always contain character data, and it is safe to assume they should be autoclosed only after that data has been consumed. I would like to request an analogue to Decoder.AutoClose (I'll call it AutoCloseAfterChardata for lack of a better name) that behaves identically to Decoder.AutoClose, except that it never closes the element on top of the tag stack when the token under consideration is CharData.
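
To make the requested token order concrete, here is a rough sketch that approximates it from outside the package by post-processing the token stream. This is not the proposed AutoCloseAfterChardata field and not how encoding/xml works internally. The charDataFirstTrace helper, the tag names, and the input are hypothetical, and the sketch assumes that an end tag arriving directly after its own start tag was synthesized by AutoClose, which the token stream alone cannot confirm.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// charDataFirstTrace is a hypothetical helper. It decodes in non-strict mode
// and, when an AutoClose element is closed right after being opened, holds
// the end tag back so that character data following it is emitted first.
func charDataFirstTrace(input string, autoClose []string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(input))
	d.Strict = false
	d.AutoClose = autoClose

	isAuto := make(map[string]bool)
	for _, name := range autoClose {
		isAuto[strings.ToLower(name)] = true
	}

	var out []string
	pendingEnd := "" // end tag held back, or "" if none
	justOpened := "" // element opened by the previous token, or ""
	flush := func() {
		if pendingEnd != "" {
			out = append(out, "end "+pendingEnd)
			pendingEnd = ""
		}
	}
	for {
		tok, err := d.Token()
		if err == io.EOF {
			flush()
			return out, nil
		}
		if err != nil {
			return out, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			flush()
			out = append(out, "start "+t.Name.Local)
			justOpened = t.Name.Local
		case xml.EndElement:
			flush()
			if justOpened == t.Name.Local && isAuto[strings.ToLower(t.Name.Local)] {
				pendingEnd = t.Name.Local // wait to see whether CharData follows
			} else {
				out = append(out, "end "+t.Name.Local)
			}
			justOpened = ""
		case xml.CharData:
			out = append(out, fmt.Sprintf("chardata %q", string(t)))
			flush() // close the held-back element after its text
			justOpened = ""
		default:
			flush()
			justOpened = ""
		}
	}
}

func main() {
	trace, err := charDataFirstTrace("<STMT><AMT>12.50<MEMO>lunch</STMT>", []string{"AMT", "MEMO"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range trace {
		fmt.Println(line)
	}
}
```

With this reordering, "12.50" is reported between "start AMT" and "end AMT". A caller that needs Unmarshal rather than a raw token stream would still need a change inside the decoder itself.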

@bradfitz bradfitz changed the title encoding/xml: Allow alternative Decoder.AutoClose behavior proposal: encoding/xml: Allow alternative Decoder.AutoClose behavior Mar 21, 2017

@gopherbot gopherbot added this to the Proposal milestone Mar 21, 2017

@gopherbot gopherbot added the Proposal label Mar 21, 2017

rsc commented Mar 27, 2017

I think 'Strict=false' is basically frozen. This was added to parse HTML as a kludge before we had an HTML parser. Now we do have an HTML parser (in x/net/html), so for HTML you should use that. Otherwise real XML should have its closing tags included properly.

@rsc rsc closed this Mar 27, 2017

aclindsa commented Mar 27, 2017

@rsc My use case is that I have some SGML (I know, I know) I need to parse which is neither valid XML nor HTML. Unfortunately 'use valid XML' isn't an option for me since I'm implementing the OFX specification (http://ofx.net), and many banks still use the older SGML version of the spec.

Do you have a suggestion for how best to parse this SGML, or is my best bet to implement this functionality myself and maintain a fork of the encoding/xml package?

rsc commented Mar 29, 2017

The right way forward is for you to make a copy of encoding/xml and adjust the non-strict mode to your liking. It shouldn't be hard; it just doesn't need to go back into the standard library.

@golang golang locked and limited conversation to collaborators Mar 29, 2018
