This is a feature request, not a bug report. I attempted to unmarshal/decode invalid XML with Strict=false and a list of tags in Decoder.AutoClose. Here's an example: https://play.golang.org/p/ITJmTGZt96
What did you expect to see?
I expected auto-closed tags to contain the CharData immediately following the opening tag.
What did you see instead?
Instead, auto-closed tags were closed immediately, before the character data that directly follows them.
I understand that with malformed XML it is ambiguous how the input can or should be repaired, so there is no general way to know exactly what to do. In some cases, however, certain elements are known to always contain character data, and it is safe to assume they should be auto-closed only after that data has been consumed. I would like to request an analogue to Decoder.AutoClose (I'll call it AutoCloseAfterChardata for lack of a better name) that behaves identically to Decoder.AutoClose, except that it never closes the element on top of the tag stack when the token being considered is CharData.
bradfitz changed the title from "encoding/xml: Allow alternative Decoder.AutoClose behavior" to "proposal: encoding/xml: Allow alternative Decoder.AutoClose behavior" on Mar 21, 2017.
I think 'Strict=false' is basically frozen. This was added to parse HTML as a kludge before we had an HTML parser. Now we do have an HTML parser (in x/net/html), so for HTML you should use that. Otherwise real XML should have its closing tags included properly.
@rsc My use case is that I have some SGML (I know, I know) I need to parse which is neither valid XML nor HTML. Unfortunately 'use valid XML' isn't an option for me since I'm implementing the OFX specification (http://ofx.net), and many banks still use the older SGML version of the spec.
Do you have a suggestion for how best to parse this SGML, or is my best bet to implement this functionality myself and maintain a fork of the encoding/xml package?
The right way forward is for you to make a copy of encoding/xml and adjust the non-strict mode to your liking. It shouldn't be hard, it just doesn't need to go back into the standard library.
What version of Go are you using (go version)?
go version go1.7.5 linux/amd64
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/aclindsa/go"
GORACE=""
GOROOT="/usr/lib/go"
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build155808714=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"