A small, quick, lightweight, and fairly-well featured XML parser and query language for Common Lisp. It has the following dependencies:
Once loaded you should be able to immediately begin parsing XML from both a string and a source file using
(xml-parse string &optional source) (xml-load pathname)
Let's give them a try...
CL-USER > (xml-parse "<test>Hello, world!</test>") #<XML::XML-DOC "test">
In the test folder are a couple files you can try parsing as well.
CL-USER > (xml-load #p"test/rss.xml") #<XML::XML-DOC "rss">
What Does It Parse?
The XML package can parse all valid XML files.
Since it is non-validating, it will parse - but ignore - ELEMENT, ATTLIST, and NOTATION declarations in the DTD. It also ignores processing instructions. It does process ENTITY declarations, but will skip over parameter entity references in the DTD.
Most importantly, it will track all external references in the DOCTYPE and ENTITY declarations (i.e. SYSTEM and PUBLIC references), and you have access to them, but it will not automatically resolve them.
It also correctly parses XML namespaces. However, a namespace must be declared for it to be considered valid. If an undeclared namespace is parsed, it will be skipped and the element will use the entire namespace+name for its name.
<root> <items xmlns='default' xmlns:dc='dc'> <dc:item foo='bar' dc:baz='hello, world!'/> <undeclared:item/> </items> </root>
In the above example, the
xml-element-namespace for the root tag will be NIL. For the items tag it will be the 'default' namespace. The first item tag it will be 'dc'. The foo attribute will also use the default namespace, and the baz attribute will use the 'dc' namespace. However, there is no 'undeclared' namespace, so the second item will actually be named undeclared:item and use the 'default' namespace.
It should be easy enough (using the inspector) to see how the document and tags, attributes, etc. are laid out. But, for more complicated document and node traversal, it is recommended to use the
(xml-query [tag|doc] [query|string]) ;=> results
A query is extremely similar to the short-hand of an XPath, except it also allows for arbitrary Lisp code to be executed. So, while it doesn't support the full RFC for XML queries, you should still be able to do a whole lot with it very easily.
Query strings can also be compiled and re-used with
xml-query-compile instead of being re-compiled every use.
For example, let's load the RSS file in the test folder:
CL-USER > (setf rss (xml-load #p"test/rss.xml")) #<XML::XML-DOC "rss">
Now, let's get the title of all RSS items:
CL-USER > (xml-query rss "//item/title/%text") ("Star City" "The Engine That Does More" "Astronauts' Dirty Laundry")
The only difference between the above and an XPath was the use of
%text instead of the XML query function
xml-query system has a few dynamic variables that are always bound and can be referenced within the query:
%node The current node value (usually an
xml-node, but not necessarily).
%parent When %node is an
xml-node, this is the node's parent.
%name When %node is an
xml-node, this is the node's name.
%text When %node is an
xml-node, this is the node's value.
%position This is the index (1-based) of %node in the results.
In addition to the special variables, it's also possible to execute arbitrary Lisp code or call functions within the query. For example, let's parse all the links in the feed into URL objects.
CL-USER > (xml-query rss "//link/(url:url-parse %text)") (http://liftoff.msfc.nasa.gov http://liftoff.msfc.nasa.gov/news/2003/news-starcity.asp http://liftoff.msfc.nasa.gov/news/2003/news-VASIMR.asp http://liftoff.msfc.nasa.gov/news/2003/news-laundry.asp)
Similarly, we can privide just a function that's used for a map and accomplish the same thing:
CL-USER > (xml-query rss "//link/%text/'url:url-parse") (http://liftoff.msfc.nasa.gov http://liftoff.msfc.nasa.gov/news/2003/news-starcity.asp http://liftoff.msfc.nasa.gov/news/2003/news-VASIMR.asp http://liftoff.msfc.nasa.gov/news/2003/news-laundry.asp)
The queries also support sub-queries as filters. Let's find all items published on friday...
CL-USER > (xml-query rss "//item/pubDate[(search \"Fri\" %text)]/..") (#<XML::XML-TAG "item">)
Or, we can just index directly to the item we want. Maybe the first?
CL-USER > (xml-query rss "//item") (#<XML::XML-TAG "item">)
Note: Remember, in XPath queries all indices are 1-based!
Indexing directly, however, is really just shorthand for filtering on the %position...
CL-USER > (xml-query rss "/rss/*/item[(= %position 1)]") (#<XML::XML-TAG "item">)
Let's continue by first loading up the sample RDF feed...
CL-USER > (setf rdf (xml-load "test/rdf.xml")) #<XML::XML-DOC "RDF">
Next, let's find all items that have an "about" tag.
CL-USER > (xml-query rdf "//item[@about]") (#<XML::XML-TAG "item"> #<XML::XML-TAG "item"> #<XML::XML-TAG "item">)
Or, finally, how about all resource tags that are children of a "Seq" tag?
CL-USER > (xml-query rdf "//@resource[../../../Seq]") (#<XML::XML-NODE "resource"> #<XML::XML-NODE "resource"> #<XML::XML-NODE "resource">)
As you can see, there's an aweful lot that's possible with the query language. Just to recap the basics:
// - Find descendants.
/<*|tag> - Find immediate child tags.
/@attribute - Find all child attributes.
/(..) - Map results through Lisp form.
/'symbol - Map results through Lisp symbol-function.
[..] - Filter results with a sub-query.
[n] - Filter by position (1-based).
[(..)] - Filter by Lisp form.
Querying with Namespaces
Querying for tags and attributes can also use namespaces. You can filter by namespace as well as wildcarding ("*"). Using the rdf.xml example in the test folder:
CL-USER > (setf rdf (xml-load #p"test/rdf.xml")) #<XML-DOC "RDF"> CL-USER > (xml-query rdf "//channel/*/Seq/rdf:li/@rdf:resource") (#<XML-ATTRIBUTE "resource"> #<XML-ATTRIBUTE "resource"> #<XML-ATTRIBUTE "resource">)
To quickly disect the example, it will...
- ...start at the document root (since it's an absolute path),
- ...recursively search for a 'channel' tag,
- ...match all child tags,
- ...search for any 'Seq' tags,
- ...search for any 'li' tags in the 'rdf' namespace,
- ...return 'resource' attributes in the 'rdf' namespace.
XML package is pretty sparse by design.
There are a couple functions for parsing from a source file or string, and searching a document for tags using a path. All other methods exposed are accessors in the
xml-node, or a few other classes.
Parsing and Loading XML Files
(xml-parse string &optional source) ;=> xml-doc (xml-load pathname) ;=> xml-doc (xml-read node) ;=> value
(xml-query [tag|doc] [query|string]) ;=> list (xml-query-compile string) ;=> xml-query
(xml-doc-source xml-doc) ;=> pathname (xml-doc-version xml-doc) ;=> string (xml-doc-encoding xml-doc) ;=> keyword (xml-doc-standalone xml-doc) ;=> boolean (xml-doc-root xml-doc) ;=> xml-tag (xml-doc-doctype xml-doc) ;=> xml-doctype
(xml-ref-public-id xml-external-ref) ;=> string (xml-ref-system-uri xml-external-ref) ;=> string
xml-doctype methods (subclass of
(xml-doctype-root xml-doctype) ;=> string (xml-doctype-entities xml-doctyoe) ;=> xml-entities
(xml-node-doc xml-node) ;=> xml-doc (xml-node-parent xml-node) ;=> xml-node (xml-node-namespace xml-node) ;=> xml-node (xml-node-name xml-node) ;=> string (xml-node-value xml-node) ;=> string
xml-entity methods (subclass of
(xml-entity-ndata xml-entity) ;=> string
xml-tag methods (subclass of
(xml-tag-namespaces xml-tag) ;=> xml-nodes (xml-tag-elements xml-tag) ;=> xml-tags (xml-tag-attributes xml-tag) ;=> xml-nodes
Something not working right? Have a question or comment? Feel free to email me or create an issue on GitHub.