lucsky/go-exml


exml

The exml package provides an intuitive event-based XML parsing API that sits on top of a standard Go encoding/xml Decoder. It greatly simplifies parsing code while retaining the raw speed and low memory overhead of the underlying stream engine, regardless of the size of the input. The package takes care of the complex task of maintaining context between event handlers, allowing you to concentrate on the actual structure of the XML document.

Installation

HEAD:

go get github.com/lucsky/go-exml

v3.1.1:

go get gopkg.in/lucsky/go-exml.v3

The third version of exml provides compile-time callback safety at the cost of an API change. Ad hoc $text events have been replaced by the dedicated OnText and OnTextOf event registration methods. Also new in v3.1: custom xml.Decoder support, typed attribute readers, typed assignment and appending shortcuts (AssignT and AppendT), and full API documentation. v3.1.1 fixes a major bug that caused the handler stack to become inconsistent when ignoring tags.

v2:

go get gopkg.in/lucsky/go-exml.v2

The second version of exml has a better implementation based on a dynamic handler tree. It allows global events (see the example below), uses less memory, and is faster.

v1:

go get gopkg.in/lucsky/go-exml.v1

Initial (and naive) implementation based on a flat list of absolute event paths.
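To make the difference between a flat list of absolute event paths and a handler tree more concrete, here is one way such a tree could be organized. This is only a sketch under assumptions: the node type and its on and child methods are hypothetical names and are not taken from exml's source. Each path segment gets its own node, so following an opening tag is a single map lookup from the current node rather than a comparison against every registered absolute path.

```go
package main

import (
    "fmt"
    "strings"
)

// node is a hypothetical handler-tree node: one per path segment.
type node struct {
    children map[string]*node
    handler  func()
}

func newNode() *node { return &node{children: map[string]*node{}} }

// on registers handler under a slash-separated path relative to n,
// creating intermediate nodes as needed.
func (n *node) on(path string, handler func()) {
    cur := n
    for _, seg := range strings.Split(path, "/") {
        next, ok := cur.children[seg]
        if !ok {
            next = newNode()
            cur.children[seg] = next
        }
        cur = next
    }
    cur.handler = handler
}

// child returns the node registered for tag, or nil if there is none.
// It is safe to call on a nil node, so lookups can be chained.
func (n *node) child(tag string) *node {
    if n == nil {
        return nil
    }
    return n.children[tag]
}

func main() {
    root := newNode()
    root.on("address-book/contact", func() { fmt.Println("contact seen") })

    // Simulate the parser descending through two opening tags.
    cur := root
    for _, tag := range []string{"address-book", "contact"} {
        cur = cur.child(tag)
    }
    if cur != nil && cur.handler != nil {
        cur.handler()
    }
}
```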

Usage

The best way to illustrate how exml makes parsing very easy is to look at actual examples. Consider the following contrived sample document:

<?xml version="1.0"?>
<address-book name="homies">
    <contact>
        <first-name>Tim</first-name>
        <last-name>Cook</last-name>
        <address>Cupertino</address>
    </contact>
    <contact>
        <first-name>Steve</first-name>
        <last-name>Ballmer</last-name>
        <address>Redmond</address>
    </contact>
    <contact>
        <first-name>Mark</first-name>
        <last-name>Zuckerberg</last-name>
        <address>Menlo Park</address>
    </contact>
</address-book>

Here is a way to parse it into an address book holding a slice of contacts using exml:

package main

import (
    "fmt"
    "os"

    "gopkg.in/lucsky/go-exml.v3"
)

type AddressBook struct {
    Name     string
    Contacts []*Contact
}

type Contact struct {
    FirstName string
    LastName  string
    Address   string
}

func main() {
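    // Stop early if the sample file cannot be opened.
    reader, err := os.Open("example.xml")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer reader.Close()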

    addressBook := AddressBook{}
    decoder := exml.NewDecoder(reader)

    decoder.On("address-book", func(attrs exml.Attrs) {
        addressBook.Name, _ = attrs.Get("name")

        decoder.On("contact", func(attrs exml.Attrs) {
            contact := &Contact{}
            addressBook.Contacts = append(addressBook.Contacts, contact)

            decoder.On("first-name", func(attrs exml.Attrs) {
                decoder.OnText(func(text exml.CharData) {
                    contact.FirstName = string(text)
                })
            })

            decoder.On("last-name", func(attrs exml.Attrs) {
                decoder.OnText(func(text exml.CharData) {
                    contact.LastName = string(text)
                })
            })

            decoder.On("address", func(attrs exml.Attrs) {
                decoder.OnText(func(text exml.CharData) {
                    contact.Address = string(text)
                })
            })
        })
    })

    decoder.Run()

    fmt.Printf("Address book: %s\n", addressBook.Name)
    for _, c := range addressBook.Contacts {
        fmt.Printf("- %s %s @ %s\n", c.FirstName, c.LastName, c.Address)
    }
}

To reduce the number and depth of event callbacks that you have to write, exml allows you to register handlers on event paths:

decoder.OnTextOf("address-book/contact/first-name", func(text exml.CharData) {
    fmt.Println("First name: ", string(text))
})

// This works too:
decoder.On("address-book/contact", func(attrs exml.Attrs) {
    decoder.OnTextOf("last-name", func(text exml.CharData) {
        fmt.Println("Last name: ", string(text))
    })
})

// And this as well:
decoder.On("address-book/contact/address", func(attrs exml.Attrs) {
    decoder.OnText(func(text exml.CharData) {
        fmt.Println("Address: ", string(text))
    })
})

Finally, since using a node's text content to initialize struct fields is a pretty frequent task, exml provides shortcuts that make it shorter to write. Let's revisit our address book example and use such a shortcut:

contact := &Contact{}
decoder.OnTextOf("first-name", exml.Assign(&contact.FirstName))
decoder.OnTextOf("last-name", exml.Assign(&contact.LastName))
decoder.OnTextOf("address", exml.Assign(&contact.Address))

Other assignment shortcuts (AssignBool, AssignFloat, AssignInt and AssignUInt) parse the text content into typed values. Another kind of shortcut accumulates text content from several nodes into a single slice:

info := []string{}
decoder.OnTextOf("first-name", exml.Append(&info))
decoder.OnTextOf("last-name", exml.Append(&info))
decoder.OnTextOf("address", exml.Append(&info))

Similarly, typed versions of the appending shortcuts (AppendBool, AppendFloat, AppendInt and AppendUInt) parse each value before appending it.
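The idea behind these shortcuts can be sketched with plain closures. The code below is not exml's implementation: textCallback, assign and appendInt are hypothetical stand-ins written against the standard library only, and skipping unparsable text is an assumption made for this sketch, not documented library behavior.

```go
package main

import (
    "fmt"
    "strconv"
)

// textCallback is a hypothetical stand-in for a text handler type.
type textCallback func(text []byte)

// assign returns a callback that stores the text into *dst.
func assign(dst *string) textCallback {
    return func(text []byte) { *dst = string(text) }
}

// appendInt returns a callback that parses the text as an int and appends
// it to *dst. Text that does not parse is skipped (assumption of this sketch).
func appendInt(dst *[]int) textCallback {
    return func(text []byte) {
        if n, err := strconv.Atoi(string(text)); err == nil {
            *dst = append(*dst, n)
        }
    }
}

func main() {
    var name string
    var ages []int

    assign(&name)([]byte("Tim"))

    cb := appendInt(&ages)
    cb([]byte("42"))
    cb([]byte("n/a")) // not a number, skipped

    fmt.Println(name, ages)
}
```

Because each shortcut just returns a handler, it can be passed anywhere a hand-written callback would go, which is what keeps the registration calls to one line each.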

The second version (v2) of exml introduced global events: a handler registered at the top level is invoked at any depth whenever a matching XML node is encountered. For example, this snippet prints all text nodes regardless of their depth and parent tag:

decoder := exml.NewDecoder(reader)
decoder.OnText(func(text exml.CharData) {
    fmt.Println(string(text))
})

API

The full API documentation is available on the exml gopkg.in page.

Benchmarks

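In the included benchmarks, exml decoding takes 388 to 487 ns/op against 22,423 to 57,156 ns/op for standard unmarshaling (roughly 45 to 150 times faster), with 3 allocations per operation instead of 61 to 128. The difference would most likely be even greater for bigger inputs.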

% go test -bench . -benchmem
OK: 23 passed
PASS
Benchmark_UnmarshalSimple      50000         57156 ns/op        6138 B/op        128 allocs/op
Benchmark_UnmarshalText       100000         22423 ns/op        3452 B/op         61 allocs/op
Benchmark_UnmarshalCDATA      100000         23460 ns/op        3483 B/op         61 allocs/op
Benchmark_UnmarshalMixed      100000         28807 ns/op        4034 B/op         67 allocs/op
Benchmark_DecodeSimple       5000000           388 ns/op          99 B/op          3 allocs/op
Benchmark_DecodeText         5000000           485 ns/op         114 B/op          3 allocs/op
Benchmark_DecodeCDATA        5000000           485 ns/op         114 B/op          3 allocs/op
Benchmark_DecodeMixed        5000000           487 ns/op         114 B/op          3 allocs/op
ok      github.com/lucsky/go-exml   11.194s
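The speedup implied by each pair of rows can be worked out directly from the ns/op column. The short program below does that arithmetic; the timings are copied from the output above, while the pair type and speedup function are names invented for this sketch.

```go
package main

import "fmt"

// pair holds the ns/op figures of one Unmarshal/Decode benchmark pair.
type pair struct {
    name        string
    unmarshalNs float64
    decodeNs    float64
}

// speedup returns how many times faster the Decode run was.
func speedup(p pair) float64 { return p.unmarshalNs / p.decodeNs }

// results returns the timings reported in the benchmark output above.
func results() []pair {
    return []pair{
        {"Simple", 57156, 388},
        {"Text", 22423, 485},
        {"CDATA", 23460, 485},
        {"Mixed", 28807, 487},
    }
}

func main() {
    for _, p := range results() {
        fmt.Printf("%-7s %6.1fx\n", p.name, speedup(p))
    }
}
```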

Contributors

  • Luc Heinrich (author)
  • Hubert Figuière

License

Code is under the BSD 2-Clause (NetBSD) license.