Library and CLI for learning about XML formats.




xmlearn is a Python module and command-line utility for gleaning information about the structure of an XML document.

The CLI commands map more or less directly to Python functions.

Some lower-level Python functions are not exposed as CLI commands.

A major goal when developing this module was to learn about XML and Python's support for it. I suspect that many of its functions are already better implemented elsewhere. Suggestions are welcome.


The code is maintained as a repository at


This code is copyright 2010 Ted Tibbetts and is licensed under the GNU General Public License. See the file COPYING for details.


Requires lxml for all functionality.

Requires python-graph, python-graph-dot, and pydot for graphing.

The CLI uses the argparse module, which is part of the standard library from Python 2.7 onward; on Python 2.6 it must be installed separately.

Only tested with Python 2.6.


There are three CLI commands: graph, dump, and tags.

The global -i option is used to pass an input file; this defaults to stdin.

All three operate within the scope of an XPath provided by the global -p option; this path defaults to the root.

usage: xmlearn [-h] [-i INFILE] [-p PATH] {graph,dump,tags} ...

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        The XML file to learn about. Defaults to stdin.
  -p PATH, --path PATH  An XPath to be applied to various actions. Defaults to the root node.
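To illustrate what "operating within the scope of an XPath" means, here is a minimal sketch. It is not xmlearn's code: it uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) instead of lxml so that it runs without extra dependencies, and the sample document and the `select` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<book>
  <chapter><title>One</title><para>a</para></chapter>
  <chapter><title>Two</title><para>b</para><para>c</para></chapter>
</book>"""


def select(root, path="."):
    """Return the elements matched by `path`, relative to `root`.

    The default "." stands in for "the root node", mirroring the
    documented default of the -p option.
    """
    return root.findall(path)


root = ET.fromstring(SAMPLE)
default_scope = select(root)           # just the root element
chapters = select(root, "./chapter")   # one element per <chapter>

print([e.tag for e in default_scope])
print([e.tag for e in chapters])
```

A path that matches several elements yields several scopes, which is the situation the `-e` and `-C` options of the tags command are concerned with.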


The dump functionality pretty-prints the structure of the document under a given XPath.

usage: xmlearn dump [-h] [-f XML_FORMAT] [-r RULESET]
                         [-d MAXDEPTH] [-w WIDTH]
                         [-l [RULESET]] [-v]

Dump xml data according to a set of rules.

optional arguments:
  -h, --help            show this help message and exit
  -f XML_FORMAT, --format XML_FORMAT
                        The XML format or schema. Defaults to generic XML.
                        xmlearn currently supports just generic and Docbook.
                        Specifying `-f docbook` will enable the `book` ruleset
                          to be specified via `-r book`.
  -r RULESET, --ruleset RULESET
                        Which set of rules to apply.
  -d MAXDEPTH, --maxdepth MAXDEPTH
                        How many levels to dump.
  -w WIDTH, --width WIDTH
                        The output width of the dump.
  -l [RULESET], --list-rulesets [RULESET]
                        Get a list of rulesets or information about a particular ruleset
  -v, --verbose         Enable verbose ruleset list. Only useful with `-l`.
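The core idea of a depth-limited structure dump can be sketched as follows. This is an assumption-laden illustration, not xmlearn's implementation: it ignores rulesets and output width entirely, uses xml.etree.ElementTree in place of lxml, and the name `dump_structure` is hypothetical.

```python
import xml.etree.ElementTree as ET


def dump_structure(elem, maxdepth=None, indent=2, _depth=0):
    """Return one indented line per element at or below `elem`.

    `maxdepth` limits how many levels are shown (None means no limit),
    loosely analogous to the -d option described above.
    """
    if maxdepth is not None and _depth >= maxdepth:
        return []
    lines = [" " * (indent * _depth) + elem.tag]
    for child in elem:
        lines.extend(dump_structure(child, maxdepth, indent, _depth + 1))
    return lines


# Hypothetical sample document.
SAMPLE = "<book><chapter><title>One</title><para>a</para></chapter></book>"
root = ET.fromstring(SAMPLE)

print("\n".join(dump_structure(root, maxdepth=2)))
```

With `maxdepth=2` only the root and its direct children are listed; omitting `maxdepth` walks the whole tree.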


The tags command reports on the tags used in the document. Currently it can list either all tags or only those that appear as children of a given tag.

usage: xmlearn tags [-h] [-e] [-C] [-c [PARENT]]

Show information about tags.

optional arguments:
  -h, --help            show this help message and exit
  -e, --show-element    Enables display of the element path.
                        Without this option, data from multiple matching elements
                          will be listed in unbroken series.
                        This is mostly useful when the path selects multiple elements.
  -C, --no-combine      Do not combine results from various path elements.
                        This option is only meaningful when the --path leads to multiple elements.
  -c [PARENT], --child [PARENT]
                        List all tags which appear as children of PARENT.
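The two listings described above, all tags and the tags that occur as children of a given parent, can be approximated in a few lines. This is a sketch under stated assumptions rather than xmlearn's own functions: `all_tags` and `child_tags` are hypothetical names, and xml.etree.ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET


def all_tags(scope):
    """Return the sorted, de-duplicated tags at or below `scope`."""
    return sorted({elem.tag for elem in scope.iter()})


def child_tags(scope, parent):
    """Return the sorted tags that appear as direct children of `parent`.

    Every element named `parent` at or below `scope` is considered,
    and the results are combined into a single list.
    """
    found = set()
    for elem in scope.iter(parent):
        found.update(child.tag for child in elem)
    return sorted(found)


# Hypothetical sample document.
SAMPLE = (
    "<book>"
    "<chapter><title>One</title><para>a</para></chapter>"
    "<appendix><title>A</title></appendix>"
    "</book>"
)
root = ET.fromstring(SAMPLE)

print(all_tags(root))
print(child_tags(root, "chapter"))
```

Combining results across all matching elements corresponds to the default behaviour described for the command; keeping a separate list per matched element would be the analogue of `-C`.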


The graph command builds an image file containing a graph of the tag relationships within the XML document.

Any format supported by pydot can be used for the resulting image file.

usage: xmlearn graph [-h] [--format FORMAT] [-F FORCE_EXTENSION] outfile

Build a graph from the XML tags relationships.

positional arguments:
  outfile               The filename for the graph image. If no --format is given,
                          it will be based on this name.

optional arguments:
  -h, --help            show this help message and exit
  --format FORMAT       The format for the graph image.
                        It will be appended to the filename unless they already concur or -F is passed.
  -F FORCE_EXTENSION    Allow the filename extension to differ from the file format.
                        Without this option, the format extension will be appended to the filename.
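A tag-relationship graph boils down to the set of (parent tag, child tag) pairs found in the document. The sketch below collects those pairs and writes them as Graphviz DOT text. It is an illustration only: it does not use python-graph or pydot, it is not how xmlearn builds its graph, and `tag_edges` and `to_dot` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def tag_edges(scope):
    """Return the sorted set of (parent tag, child tag) pairs under `scope`."""
    edges = set()
    for parent in scope.iter():
        for child in parent:
            edges.add((parent.tag, child.tag))
    return sorted(edges)


def to_dot(edges, name="tags"):
    """Render the edges as the text of a DOT directed graph."""
    lines = ["digraph %s {" % name]
    for parent, child in edges:
        lines.append('    "%s" -> "%s";' % (parent, child))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample document.
SAMPLE = (
    "<book>"
    "<chapter><title>One</title><para>a</para></chapter>"
    "<chapter><title>Two</title></chapter>"
    "</book>"
)
root = ET.fromstring(SAMPLE)
edges = tag_edges(root)

print(to_dot(edges))
```

Because the edges are stored in a set, a relationship that occurs many times in the document (here, chapter to title) appears only once in the graph. The resulting DOT text could then be handed to a Graphviz renderer to produce an image.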