encoding/xml: support for collecting all attributes #3633

Closed
rsc opened this Issue May 17, 2012 · 29 comments

@rsc
Contributor

rsc commented May 17, 2012

Received via private mail.  Think about for Go 1.1.

---

I'm currently using "encoding/xml" to read some XML into some structs. All is
going well until I hit an XML type that can have any number of attributes and any
number of child nodes (and each child node can follow the same rules). I think I have
the child nodes solved, but what about collecting all of the attributes?

This is what I have at this point:

    type Extensions struct {
        XMLName xml.Name
        Attrs   []string     `xml:",attr"`     // Does not work. Need a suggestion here.
        Data    string       `xml:",chardata"`
        Nodes   []Extensions `xml:",any"`
    }

Thanks in advance for any help.
@niemeyer

Contributor

niemeyer commented May 17, 2012

Comment 1:

We already have xml.Attr. This should be supported:
type Extensions struct {
        ...
        Attrs []xml.Attr
        ...
}
@rsc

Contributor Author

rsc commented May 17, 2012

Comment 2:

Sounds good.  What's the trigger?  Any []xml.Attr?  Does there need to
be a tag like ,any?
@niemeyer

Contributor

niemeyer commented May 17, 2012

Comment 3:

The type seems enough of a hint in the described case, but we should probably enforce
the use of ",attr" with it, to sanitize the interaction with attributes in nested
elements.
We have this today, which is quite useful:
    Value  string    `xml:"sub>node"`
Several people asked to complement with this:
    Attr  string     `xml:"sub>node>attrname,attr"`
So this would be the counterpart:
    Attrs []xml.Attr `xml:"sub>node,attr"`
And in the simple case:
    Attrs []xml.Attr `xml:",attr"`
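
For reference, a minimal sketch of the existing `sub>node` element path mentioned above. The `Doc` type, the `parseValue` helper, and the sample XML are made up for illustration; only the `a>b` tag syntax itself comes from encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Doc is a hypothetical type used only for this illustration.
type Doc struct {
	// "sub>node" reaches into <sub><node>...</node></sub> without
	// declaring an intermediate struct for <sub>.
	Value string `xml:"sub>node"`
}

// parseValue is a hypothetical helper that returns the text of sub>node.
func parseValue(data string) (string, error) {
	var d Doc
	if err := xml.Unmarshal([]byte(data), &d); err != nil {
		return "", err
	}
	return d.Value, nil
}

func main() {
	v, err := parseValue(`<doc><sub><node>hello</node></sub></doc>`)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(v) // prints "hello"
}
```

The attribute forms proposed in this comment would extend the same path idea from element text to attributes.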
@rsc

Contributor Author

rsc commented May 17, 2012

Comment 4:

sgtm
@anacrolix

Contributor

anacrolix commented May 31, 2012

Comment 5:

Is there any way to make use of xml.Attr this way in the xml package for Go 1.0? Do I
have to use the []string `xml:",attr"` for now?
@rsc

Contributor Author

rsc commented Sep 12, 2012

Comment 6:

Should probably start on this if it's for Go 1.1.
@rsc

Contributor Author

rsc commented Dec 10, 2012

Comment 7:

Labels changed: added size-m.

@rsc

Contributor Author

rsc commented Mar 12, 2013

Comment 8:

I am sad to say it, but I think we will have to postpone XML work until
after Go 1.1.
I regret that we didn't have more time to make encoding/xml better, but
given the tradeoff I think focusing on core performance and
implementation pieces for this final release push is probably the right
choice. Unlike most of the performance and other stuff we're trying to
shake out right now, functionality such as XML parsing can be provided
by go get-able libraries as a stopgap until Go 1.2.

Labels changed: added go1.2, removed go1.1.

Owner changed to ---.

@dominikh

Member

dominikh commented Jul 13, 2013

Comment 9:

Is this still being considered for Go 1.2?
@rsc

Contributor Author

rsc commented Jul 15, 2013

Comment 10:

I said sgtm in comment 4, but now I am not so sure. All the > confuse me.
Go 1.2 will likely have support for custom marshalers and unmarshalers. Perhaps that
will be good enough and we can postpone this specific thing until we have experience
using those.
@rsc

Contributor Author

rsc commented Jul 30, 2013

Comment 11:

Labels changed: added feature.

@mattetti

Contributor

mattetti commented Aug 1, 2013

Comment 12:

Let's say you have some XML like this:
<FileRef>
  <Name Value="my-doc.pdf" />
</FileRef>
To extract the value info I have to create 2 structure types:
type FooFile struct {
    Filename             AttrValue    `xml:"FileRef>Name"`
}
// Wrapper structure used to extract XML node value attributes (string).
type AttrValue struct {
    Value string `xml:",attr"`
}
And once I unmarshal my XML and get an object of type FooFile, I need to read
file.Filename.Value
Being able to use `xml:"FileRef>Name,Value"` would be nice for sure. I personally
don't find the > confusing. Not sure how the new custom unmarshalers will
work, though.
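
A runnable sketch of the two-struct workaround described in this comment, in case it helps anyone compare. The enclosing `<File>` root element, the sample document, and the `parseFilename` helper are assumptions added for illustration; the struct definitions follow the comment.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// AttrValue is the wrapper used to pull the Value attribute off an element.
type AttrValue struct {
	Value string `xml:",attr"`
}

// FooFile reaches the <Name> element through the FileRef>Name path.
type FooFile struct {
	Filename AttrValue `xml:"FileRef>Name"`
}

// parseFilename is a hypothetical helper; it assumes <FileRef> sits
// inside some enclosing root element.
func parseFilename(data string) (string, error) {
	var f FooFile
	if err := xml.Unmarshal([]byte(data), &f); err != nil {
		return "", err
	}
	// Value is a plain field on the wrapper, not a method.
	return f.Filename.Value, nil
}

func main() {
	const doc = `<File><FileRef><Name Value="my-doc.pdf" /></FileRef></File>`
	name, err := parseFilename(doc)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(name) // prints "my-doc.pdf"
}
```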
@robpike

Contributor

robpike commented Aug 29, 2013

Comment 13:

Letting this soak until after 1.2 and the new marshaling code gets a chance.

Labels changed: removed go1.2.

@rsc

Contributor Author

rsc commented Nov 27, 2013

Comment 14:

Labels changed: added go1.3maybe.

@rsc

Contributor Author

rsc commented Nov 27, 2013

Comment 15:

Labels changed: removed feature.

@rsc

Contributor Author

rsc commented Dec 4, 2013

Comment 16:

Labels changed: added release-none, removed go1.3maybe.

@rsc

Contributor Author

rsc commented Dec 4, 2013

Comment 17:

Labels changed: added repo-main.

@gopherbot

gopherbot commented Feb 4, 2014

Comment 18 by andrewjohnmigliore:

Another vote for supporting something like:
    Attr  string     `xml:"sub>node>attrname,attr"`
    Attrs []xml.Attr `xml:"sub>node,attr"`
    Attrs []xml.Attr `xml:",attr"`
Without this support, parsing XML that is attribute-laden and CDATA-light using
xml.Unmarshal() is downright ugly and goes against the concise nature of Go!
cheers
@andredasilvapinto

andredasilvapinto commented Dec 10, 2014

+1 for having support for unmarshalling attributes of a specific node without having to replicate the entire struct hierarchy.

http://stackoverflow.com/q/27404456/43046

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@grmartin

grmartin commented Jun 1, 2015

I still vote for this. Too bad it's not a priority for you folks; looks like I'm gonna roll my own.

@raitucarp

raitucarp commented Jun 6, 2015

up.. I need this feature

@Zilog8

Zilog8 commented Sep 25, 2015

A feature like this would make some xml encodings much more concise. For example, it would simplify item handling in MRSS from this:

type Thumbnail struct {
     Url string `xml:"url,attr"`
}

type Content struct {
     Url      string `xml:"url,attr"`
     Bitrate  string `xml:"bitrate,attr"`
     Duration string `xml:"duration,attr"`
     Height   string `xml:"height,attr"`
}

type Item struct {
     Title  string    `xml:"title"`
     Thumb  Thumbnail `xml:"media:thumbnail"`
     Media  Content   `xml:"media:content"`
}

Into this:

type Item struct {
     Title    string `xml:"title"`
     ThumbUrl string `xml:"media:thumbnail>url,attr"`
     MediaUrl string `xml:"media:content>url,attr"`
     Bitrate  string `xml:"media:content>bitrate,attr"`
     Duration string `xml:"media:content>duration,attr"`
     Height   string `xml:"media:content>height,attr"`
}
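
For comparison, a sketch of the nested-struct form that can be run against a small MRSS item. One assumption worth flagging: as far as I can tell, encoding/xml matches elements by namespace URL and local name rather than by the `media:` prefix, so the tags below use the `"namespace-URL name"` form instead of `media:thumbnail`. The sample item, the `parseItem` helper, and the field names are illustrative only.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Thumbnail and Content hold the attributes of the media elements.
type Thumbnail struct {
	URL string `xml:"url,attr"`
}

type Content struct {
	URL      string `xml:"url,attr"`
	Bitrate  string `xml:"bitrate,attr"`
	Duration string `xml:"duration,attr"`
	Height   string `xml:"height,attr"`
}

// Item uses "namespace-URL name" tags for the media elements
// (assumption: the media prefix is bound to the Media RSS namespace).
type Item struct {
	Title string    `xml:"title"`
	Thumb Thumbnail `xml:"http://search.yahoo.com/mrss/ thumbnail"`
	Media Content   `xml:"http://search.yahoo.com/mrss/ content"`
}

// sampleItem is a made-up MRSS item for illustration.
const sampleItem = `<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Clip</title>
  <media:thumbnail url="http://example.com/t.jpg"/>
  <media:content url="http://example.com/v.mp4" bitrate="128" duration="60" height="480"/>
</item>`

// parseItem is a hypothetical helper that decodes one item.
func parseItem(data string) (Item, error) {
	var it Item
	err := xml.Unmarshal([]byte(data), &it)
	return it, err
}

func main() {
	it, err := parseItem(sampleItem)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(it.Title, it.Thumb.URL, it.Media.Bitrate)
}
```

The flattened form proposed in this comment would remove the two helper types, which is the conciseness argument being made.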
@mrcook

mrcook commented Oct 15, 2015

I've started to port my Ruby EPUB tool to Go and I'm having trouble with gathering up the book [OPF] Metadata due to there being an arbitrary number/type of nodes, with an arbitrary number/type of attributes. So being able to use Attrs []xml.Attr to collect them up would be a great feature. Is there any possibility of this feature being added?

Here's an example of the kind of data that needs parsing:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:identifier id="pub-identifier">_simple_book</dc:identifier>
  <meta refines="#pub-identifier" property="dcterms:identifier">_simple_book</meta>
  <dc:title id="pub-title">A Book</dc:title>
  <meta refines="#pub-title" property="dcterms:title">A Book: Subtitle</meta>
  <dc:date opf:event="original-publication">2015-10-10</dc:date>
  <dc:date opf:event="publication">2015-10-10</dc:date>
  <dc:language>en</dc:language>
  <dc:creator opf:role="aut" opf:file-as="Doe, Jon">Jon Doe</dc:creator>
  <dc:subject>Fiction</dc:subject>
  <dc:description>Some description</dc:description>
  <dc:publisher>A Publisher</dc:publisher>
  <dc:rights>Copyright</dc:rights>
  <meta content="cover-image" name="cover"/>
</metadata>
@ghost

ghost commented Oct 22, 2015

If anyone needs a generic solution to collecting an array of attributes, this is what I use currently:

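import (
    "encoding/xml"
    "strings"
)

type Node struct {
    XMLName    xml.Name
    Attributes []xml.Attr
    Data       string
    Nodes      []*Node
}

func (e *Node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    var nodes []*Node
    var done bool
    for !done {
        t, err := d.Token()
        if err != nil {
            return err
        }
        switch t := t.(type) {
        case xml.CharData:
            // Keep the last non-blank text run; whitespace between child
            // elements would otherwise overwrite it with "".
            if s := strings.TrimSpace(string(t)); s != "" {
                e.Data = s
            }
        case xml.StartElement:
            // Recurse into the child and propagate any decoding error.
            child := &Node{}
            if err := child.UnmarshalXML(d, t); err != nil {
                return err
            }
            nodes = append(nodes, child)
        case xml.EndElement:
            done = true
        }
    }
    e.XMLName = start.Name
    e.Attributes = start.Attr
    e.Nodes = nodes
    return nil
}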

func (e *Node) MarshalXML(enc *xml.Encoder, start xml.StartElement) error {
    start.Name = e.XMLName
    start.Attr = e.Attributes
    return enc.EncodeElement(struct {
        Data  string `xml:",chardata"`
        Nodes []*Node
    }{
        Data:  e.Data,
        Nodes: e.Nodes,
    }, start)
}

It would be nice to have this support built in.

https://play.golang.org/p/o60LVVmpgq

@ghost

ghost commented Oct 22, 2015

Are you accepting contributions for:

type Node struct {
        ...
         Attrs []xml.Attr `xml:",attr"`
        ...
}

If so, I would happily make the change.

@gopherbot

gopherbot commented Oct 23, 2015

CL https://golang.org/cl/16292 mentions this issue.

@ivankravchenko

ivankravchenko commented Apr 24, 2016

I came up with some kind of working code for @Zilog8's example in
#3688 (comment)

Code diff gist: https://gist.github.com/ivankravchenko/036f68e671e33179b636bd58f6ebc9d0

@gopherbot

gopherbot commented Oct 13, 2016

CL https://golang.org/cl/30946 mentions this issue.

@asafschers

asafschers commented Sep 11, 2017

https://golang.org/doc/go1.8 -
Unmarshal now has wildcard support for collecting all attributes using the new ",any,attr" struct tag.
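
A minimal sketch of how the struct from the original question might look with that tag, assuming Go 1.8 or later. The sample document and the `parseExtensions` helper are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Extensions mirrors the struct from the original question, with the
// Attrs field switched to []xml.Attr and the ",any,attr" tag.
type Extensions struct {
	XMLName xml.Name
	Attrs   []xml.Attr   `xml:",any,attr"` // collects every attribute not matched elsewhere
	Data    string       `xml:",chardata"`
	Nodes   []Extensions `xml:",any"` // child elements follow the same rules
}

// parseExtensions is a hypothetical helper that decodes one element tree.
func parseExtensions(data string) (Extensions, error) {
	var ext Extensions
	err := xml.Unmarshal([]byte(data), &ext)
	return ext, err
}

func main() {
	ext, err := parseExtensions(`<ext a="1" b="2"><child c="3">text</child></ext>`)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	for _, a := range ext.Attrs {
		fmt.Printf("%s=%s\n", a.Name.Local, a.Value)
	}
	for _, n := range ext.Nodes {
		fmt.Printf("%s: %d attribute(s), data %q\n", n.XMLName.Local, len(n.Attrs), n.Data)
	}
}
```

Because the type is recursive, attributes are collected at every level of nesting without a custom UnmarshalXML method.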
