proposal: add natural XML proposal

See golang/go#13504 Change-Id: Ie9877b10ae3eed8ad5e5763d35e48d94c6f8f584 Reviewed-on: https://go-review.googlesource.com/30364 Reviewed-by: Russ Cox <rsc@golang.org>
golang · Mar 7, 2017 · f2f2bb9 · f2f2bb9
1 parent d7b1d9e
commit f2f2bb9
Showing 1 changed file with 134 additions and 0 deletions.
diff --git a/design/13504-natural-xml.md b/design/13504-natural-xml.md
@@ -0,0 +1,134 @@
+# Proposal: Natural XML
+
+Author(s): Sam Whited <sam@samwhited.com>
+
+Last updated: 2016-09-27
+
+Discussion at https://golang.org/issue/13504.
+
+
+## Abstract
+
+The `encoding/xml` API is arguably difficult to work with.
+In order to fix these issues, a more natural API is needed that acts on nodes in
+a tree like structure instead of directly on the token stream.
+
+
+## Background
+
+XML parsers generally operate in one of two modes of operation, a "DOM style"
+mode in which entire documents are parsed into a tree-like data structure, the
+"Document Object Model" (DOM), and an event-driven "SAX style" mode  (Simple API
+for XML) in which tokens are streamed one at a time and only handled if they
+would trigger a callback or event.
+The benefit of a DOM style node is that all information contained in the XML is
+rapidly accessible and can be accessed at will, whereas in a SAX style mode
+only information at the current parse location is readily available and other
+arrangements have to be made to store previously visible information.
+However, the SAX style mode generally provides a relatively small and stable
+memory footprint, while the DOM style mode requires parsers to load an entire
+document into memory.
+
+Go currently supports a hybrid approach to this situation: entire documents or
+elements may be read into native data structures, or individual tokens may be
+read off the wire and handled directly by the application.
+This works well for simple elements where the entire structure is known, but for
+XML with an arbitrary format it forces use of the low-level token stream APIs
+directly which is error prone and cumbersome.
+
+
+## Proposal
+
+Having a higher level tree-like API will allow users to manipulate arbitrary XML
+in a more natural way that is compatible with Go's hybrid SAX and DOM style
+approach to parsing XML.
+
+
+### Implementation
+
+An interface originally [suggested][167632824] by RSC is proposed:
+
+[167632824]: https://github.com/golang/go/issues/13504#issuecomment-167632824
+
+
+```go
+// An Element represents the complete parse of a single XML element.
+type Element struct {
+	StartElement
+	Child []Child
+}
+
+// A Child is an interface holding one of the element child types:
+// *Element, CharData, or Comment.
+type Child interface{}
+```
+
+The `*Element` type will implement `xml.Marshaler` and `xml.Unmarshaler` to make
+it compatible with the existing  `(*xml.Encoder) Encode` and `(*xml.Decoder)
+Decode` methods for situations where entire XML elements should be consumed.
+This makes it compatible with both styles of XML parsing in Go.
+For example, an entire element could be unmarshaled simply:
+
+```go
+el := xml.Element{}
+err := d.Decode(&el)
+```
+
+Or specific children could be unmarshaled:
+
+```go
+tok, _ := d.Token()
+el := xml.Element{StartElement: tok.(StartElement)}
+
+// Only unmarshal the child named "body"
+for ; err == nil; tok, err = d.Token() {
+	if start, ok := tok.(StartElement); ok && start.Name.Local == "body" {
+		child := xml.Child{}
+		_ = xml.DecodeElement(&child, start)
+		el.Child = append(el.Child, child)
+	}
+}
+```
+
+The author volunteers to complete this work in the next release cycle with
+enough time left after this proposal is accepted and conservatively estimates
+that a week of work would be required to complete the changes, including tests.
+The changes themselves are relatively easy and this lengthy estimate is mostly
+because the authors time is limited to evenings and weekends.
+If someone who's job permitted them to work on Go were to accept the task, the
+work could almost certainly be completed much quicker.
+
+
+## Rationale
+
+For large XML documents or streams that cannot be parsed all at once, the given
+approach does make parsing less complicated since we still have to iterate over
+the token stream.
+It may be possible to fix this by adding new methods to the `*xml.Encode` and
+`*xml.Decode` types specifically for dealing with elements, but the author
+deems that the benefit is not worth the added complexity to the XML package.
+The current solution is simple and does not preclude adding a more robust
+Element based API at a later date.
+
+
+## Compatibility
+
+This proposal does not introduce any changes that would break compatibility
+with existing code.
+It adds two types which would need to be covered under the compatibility
+promise in the future.
+
+
+## Open issues (if applicable)
+
+* For elements with large numbers of children, accessing a specific child via
+  a slice may be slow.
+  Using a map would be a simple fix, but this makes accessing arrays with few
+  elements slower (the crossover is somewhere around 10 elements in a very
+  informal benchmark).
+  Using a trie or some other appropriate tree-like structure can give us the
+  best of both worlds, but adds a great deal of complexity that is almost
+  certainly not worth it.
+  It may, however, be worth not making the children slice public (and using
+  accessor methods instead) so that the implementation could easily be switched
+  out at a later date.