Path expressions

Dominic Cleal edited this page Nov 10, 2013 · 2 revisions

Path Expressions

The Augeas tree consists of nodes with a string label and a string value, where several siblings can have the same label; for example, there are multiple nodes /files/etc/ssh/sshd_config/AcceptEnv. Normal paths, like paths in a file system, are therefore not enough to specify which node exactly is meant when a node is to be deleted or have its value changed. Additionally, it is often necessary to find nodes based on complex criteria, for example to find the IP address of the host with alias myhost.example.com in /etc/hosts.

Augeas addresses these needs by providing path expressions, a notation for entries on the tree modeled closely on XPath; if you're familiar with XPath, Augeas' path expressions will be a snap to learn. If not, you can rely on the many XPath tutorials on the web in addition to this page.

Path Expressions by Example

It is strongly recommended that you follow these examples along in augtool. For each of them, type match "EXPR" at the prompt — note that you have to enclose the expression in double quotes if it contains spaces.

The simplest path expression looks like a path in a file system, for example /files/etc/hosts/1/alias. Such a path may still match multiple nodes: if the first entry in /etc/hosts has three aliases, the expression matches all three alias nodes.

Positions

To pick the second of those aliases, we can add a predicate enclosed in brackets: /files/etc/hosts/1/alias[2]. To evaluate this expression, first all nodes matching the expression without the predicate are collected into a set (a node set). Then the predicate is applied to each of them in turn, and a new node set is constructed from the nodes for which the predicate is true.

Each node in a node set has a position, starting at 1, and the predicate [N] is short for [position() = N]. The position of the last node in a node set is last(), so the predicate [position() = last()], or [last()] for short, will match the last node in a node set.

The last-but-one node in a node set can be matched with [last() - 1]; predicates can also contain comparisons with < etc. so that [position() < last()] will match all but the last node in a node set.

Putting this together, we can get various aliases for a host with

  /files/etc/hosts/1/alias[last()]
  /files/etc/hosts/1/alias[last() - 1]
  /files/etc/hosts/1/alias[position() < last()]
  /files/etc/hosts/1/alias[position() > 1]
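
To make the position arithmetic concrete, here is a minimal sketch in plain Python that models a node set as a list and mimics the four expressions above. It does not call Augeas; the helper name `select` and the sample alias values are made up for illustration.

```python
# Plain-Python model of positional predicates; this is not the Augeas API.
# A node set is modeled as a list; positions start at 1, as in path expressions.

def select(nodes, predicate):
    """Keep the nodes for which predicate(position, last) is true."""
    last = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if predicate(pos, last)]

# Hypothetical aliases of the first entry in /etc/hosts.
aliases = ["alpha", "beta", "gamma"]

last_alias = select(aliases, lambda pos, last: pos == last)        # alias[last()]
penultimate = select(aliases, lambda pos, last: pos == last - 1)   # alias[last() - 1]
all_but_last = select(aliases, lambda pos, last: pos < last)       # alias[position() < last()]
all_but_first = select(aliases, lambda pos, last: pos > 1)         # alias[position() > 1]

print(last_alias, penultimate, all_but_last, all_but_first)
```

Note that positions are always relative to the node set being filtered, which is why `last` is computed from the list passed in rather than from the whole tree.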

Node Tests

A predicate with just a node set is true if there is at least one node in the node set, so that

  /files/etc/hosts/*[alias]

will match all host entries with at least one alias. To check for entries with more than five aliases, the count() function can be used:

  /files/etc/hosts/*[count(alias) > 5]

Predicates can not only be used on the last step in a path expression, but also on intermediate steps. For example, to get the penultimate alias of the host with IP address 127.0.0.1:

  /files/etc/hosts/*[ipaddr = "127.0.0.1"]/alias[ last() - 1 ]
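
The three expressions in this section can be approximated with ordinary list filtering. The sketch below is a plain-Python model, not the Augeas API; the dictionary layout and the sample host data are assumptions chosen only to illustrate how a predicate on an intermediate step narrows the node set before the next step runs.

```python
# Plain-Python model of node-set predicates; this is not the Augeas API.
# Each host entry is a dict; the sample data is hypothetical.
hosts = [
    {"ipaddr": "127.0.0.1", "canonical": "localhost",
     "alias": ["localhost.localdomain", "lo4", "loopback"]},
    {"ipaddr": "192.168.0.10", "canonical": "db.example.com", "alias": []},
]

# /files/etc/hosts/*[alias]
# A node set used as a predicate is true when it is non-empty.
with_alias = [h for h in hosts if h["alias"]]

# /files/etc/hosts/*[count(alias) > 5]
many_aliases = [h for h in hosts if len(h["alias"]) > 5]

# /files/etc/hosts/*[ipaddr = "127.0.0.1"]/alias[last() - 1]
# The first predicate filters entries; the second is applied to the
# alias children of each entry that survived.
penultimate = []
for h in hosts:
    if h["ipaddr"] == "127.0.0.1":
        aliases = h["alias"]
        if len(aliases) >= 2:
            # 1-based position last() - 1 is 0-based index len - 2.
            penultimate.append(aliases[len(aliases) - 2])

print([h["canonical"] for h in with_alias], many_aliases, penultimate)
```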

Related Nodes

What if we are not interested in the alias of a host itself, but in the IP address of the host with alias myhost.example.com? We can get that with

  /files/etc/hosts/*/ipaddr[../alias = 'myhost.example.com']

The expression inside the predicate ../alias = 'myhost.example.com' is evaluated by first constructing the node set ../alias which contains all the siblings of the current node with label alias. The predicate is true if there is at least one node in that node set with value myhost.example.com.

Multiple comparisons can be combined with and and or, so that the IP address of the host that has an alias or a canonical name of myhost.example.com is

  /files/etc/hosts/*/ipaddr[../alias = 'myhost.example.com' or ../canonical = 'myhost.example.com']
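
Comparing a node set with a string is existential: one matching node is enough. The following plain-Python sketch models that rule; it is not the Augeas API, and the helper `nodeset_equals` and the host data are hypothetical.

```python
# Plain-Python model of node-set comparison; this is not the Augeas API.
# The sample data is hypothetical.
hosts = [
    {"ipaddr": "10.0.0.5", "canonical": "web.example.com",
     "alias": ["www.example.com", "myhost.example.com"]},
    {"ipaddr": "10.0.0.6", "canonical": "myhost.example.com", "alias": []},
    {"ipaddr": "10.0.0.7", "canonical": "other.example.com", "alias": ["other"]},
]

def nodeset_equals(values, wanted):
    """A node set equals a string if at least one node has that value."""
    return any(v == wanted for v in values)

name = "myhost.example.com"

# /files/etc/hosts/*/ipaddr[../alias = 'myhost.example.com']
by_alias = [h["ipaddr"] for h in hosts if nodeset_equals(h["alias"], name)]

# /files/etc/hosts/*/ipaddr[../alias = '...' or ../canonical = '...']
by_any_name = [h["ipaddr"] for h in hosts
               if nodeset_equals(h["alias"], name)
               or nodeset_equals([h["canonical"]], name)]

print(by_alias, by_any_name)
```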

Axes

There are different directions in which we can continue a search in the tree from any given node. In XPath speak, these directions are called axes. The notation .. is short for the parent axis. Similarly, the notation alias looks for a child of the current node with label alias, i.e. searches along the child axis.

You can think of an axis as a set of nodes, which gets filtered by the labels of the nodes in the set — to find the child alias, first construct a node set with all children of the current node, and then keep only the ones with label alias. The full syntax for searching along a specific axis is AXIS::NAME where AXIS is one of the following:

self
only the current node
child
the children of the current node
descendant
any descendant of the current node, i.e. all the nodes in the subtree rooted at the current node
descendant-or-self
like descendant, but also includes the current node
parent
the parent of the current node
ancestor
the parent, the parent's parent etc. of the current node, all the way up to the root of the tree
root
the root of the tree
following-sibling
the siblings that come after the context node (since version 0.5.3)
preceding-sibling
the siblings that come before the context node, in reverse order (since version 0.5.3)
The NAME can be either a specific node label, like alias, or the wildcard *, which matches any node on the axis regardless of its label.

Since writing the full AXIS::NAME notation for each step in a path is very unwieldy, there are some useful abbreviations for commonly used combinations of axis and name. First off, a name by itself, like alias is short for child::alias. The single dot . and double dots .. are short for self::* and parent::*.

Searching over a whole subtree is very useful, and can be done with // which is short for descendant-or-self::*, so that the expression

  /augeas//error

will match any node with label error anywhere in the subtree starting at /augeas.
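
The idea of an axis as a node set that is then filtered by label can be sketched in a few lines of Python. This is only a model of the description above, not how Augeas is implemented; the `Node` class, the helper names, and the sample tree are all hypothetical. The `//` step is modeled as descendant-or-self followed by a child step.

```python
# Plain-Python model of axes; this is not the Augeas API.
class Node:
    def __init__(self, label, value=None, children=()):
        self.label, self.value, self.parent = label, value, None
        self.children = list(children)
        for c in self.children:
            c.parent = self

def child(n):
    return list(n.children)

def descendant(n):
    out = []
    for c in n.children:
        out.append(c)
        out.extend(descendant(c))
    return out

def descendant_or_self(n):
    return [n] + descendant(n)

def ancestor(n):
    out = []
    while n.parent is not None:
        n = n.parent
        out.append(n)
    return out

def following_sibling(n):
    if n.parent is None:
        return []
    sibs = n.parent.children
    i = next(k for k, s in enumerate(sibs) if s is n)
    return sibs[i + 1:]

def preceding_sibling(n):
    if n.parent is None:
        return []
    sibs = n.parent.children
    i = next(k for k, s in enumerate(sibs) if s is n)
    return sibs[:i][::-1]  # reverse order, nearest sibling first

def name_test(nodes, name):
    """Filter an axis by label; '*' keeps every node."""
    return [n for n in nodes if name == "*" or n.label == name]

# Hypothetical tree fragment under /augeas with two error nodes.
err1 = Node("error", "parse_failed")
err2 = Node("error", "read_failed")
hosts = Node("hosts", children=[Node("path", "/files/etc/hosts"), err1])
fstab = Node("fstab", children=[err2])
augeas = Node("augeas", children=[
    Node("files", children=[Node("etc", children=[hosts, fstab])])])

# /augeas//error: every node in the subtree, then its children labeled error.
errors = [e for n in descendant_or_self(augeas)
          for e in name_test(child(n), "error")]
print([e.value for e in errors])
```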

Multiple Predicates

Multiple predicates are evaluated by filtering node sets successively:

 /files/etc/pam.d/*/*[module = 'system-auth'][type = 'account']

This becomes clearer when we select the name of the last service in /etc/services on port 22:

  /files/etc/services/service-name[port = '22'][last()]

This is evaluated by first finding all service-name nodes with a child port whose value is 22 and then taking the last such service-name node. This is very different from

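  /files/etc/services/service-name[last()][port = '22']

which first takes the last service-name node in the file and then keeps it only if its port is 22; it matches nothing when the last service uses a different port.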
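
The effect of predicate order can be checked with a small model in which each predicate is a function from a list to a list. This is a plain-Python sketch, not the Augeas API; the helper names and the sample service entries are hypothetical.

```python
# Plain-Python model of successive predicates; this is not the Augeas API.
# Hypothetical service-name entries, in file order.
services = [
    {"name": "ssh", "port": "22", "protocol": "tcp"},
    {"name": "ssh", "port": "22", "protocol": "udp"},
    {"name": "telnet", "port": "23", "protocol": "tcp"},
]

def on_port_22(nodes):
    return [s for s in nodes if s["port"] == "22"]

def last(nodes):
    return nodes[-1:]  # an empty node set stays empty

# service-name[port = '22'][last()]: filter by port, then take the last match.
port_then_last = last(on_port_22(services))

# service-name[last()][port = '22']: take the last entry, then test its port.
last_then_port = on_port_22(last(services))

print(port_then_last, last_then_port)
```

With this sample data the first form yields the udp entry for ssh, while the second yields an empty node set because the last entry in the file is telnet on port 23.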

You can also match on the value of the current node. The next example picks all nodes /files/etc/ntp.conf/server whose value is 192.168.0.1:

  /files/etc/ntp.conf/server[. = '192.168.0.1']

Tips & Tricks

Some useful expressions, in no particular order:

  /augeas//error

any error node in the /augeas subtree. Useful for checking quickly if anything went wrong.

  /files/etc/hosts/*/ipaddr[count(../alias) = 0]

matches the ipaddr node of every entry that has no alias sibling

  /files/etc/hosts/*[label() != '#comment']

match nodes that are not comments

  /files/etc/sysctl.conf/#comment[following-sibling::*[1][self::kernel.sysrq]]

match the comment just before the kernel.sysrq entry in /etc/sysctl.conf

  /files/etc/fstab/*[count(opt[. = "noexec"]) = 0]

finds all entries in /etc/fstab that do not have a noexec option. The interesting bit is the argument to count: a set of all the opt children of the current entry whose value is noexec. As written above, this will also list comment nodes; they can be filtered with an additional [label() != '#comment'] predicate.

  /files/etc/hosts/*[ipaddr =~ regexp("192\..*")]

finds all entries in /etc/hosts whose IP address starts with 192.
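
Two of the tricks above, the count(...) = 0 idiom and the regular expression match, can be modeled in plain Python as follows. This is a sketch rather than the Augeas API, and the sample data is hypothetical. It also assumes that =~ has to match the whole value, which is consistent with the trailing .* in the example but is modeled here with `re.fullmatch` as an assumption.

```python
import re

# Plain-Python model; this is not the Augeas API. Sample data is hypothetical.
fstab = [
    {"label": "1", "file": "/", "opt": ["defaults"]},
    {"label": "2", "file": "/tmp", "opt": ["nosuid", "noexec"]},
    {"label": "#comment", "opt": []},
]

# /files/etc/fstab/*[count(opt[. = "noexec"]) = 0]
without_noexec = [e for e in fstab
                  if len([o for o in e["opt"] if o == "noexec"]) == 0]

# The same with the additional [label() != '#comment'] predicate.
real_entries = [e for e in without_noexec if e["label"] != "#comment"]

# /files/etc/hosts/*[ipaddr =~ regexp("192\..*")]
# Assumption: =~ must match the entire value, hence fullmatch.
ipaddrs = ["192.168.0.1", "10.0.0.1", "127.0.0.1", "192.0.2.7"]
pattern = re.compile(r"192\..*")
starts_with_192 = [ip for ip in ipaddrs if pattern.fullmatch(ip)]

print([e["label"] for e in without_noexec],
      [e["label"] for e in real_entries],
      starts_with_192)
```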

Detailed Specification

Types

The language for path expressions is statically typed. The type system is very simple, and consists of only a few basic types:

boolean
a boolean value; the results of comparisons like '=' and '!=' are boolean values
number
an integer
string
a string; string literals can be enclosed in either single or double quotes
regexp
a regular expression, returned by the regexp or glob functions; can be used with the =~ operator for matching
nodeset
a set of nodes in the tree

Types are checked when a path expression is compiled, and type errors immediately stop whatever operation was using the path expression.

Formal Grammar

The formal grammar for path expressions (in EBNF notation) is:

 PathExpr ::= LocationPath | PrimaryExpr
 PrimaryExpr ::= Literal
               | Number
               | FunctionCall
               | '(' Expr ')'
 FunctionCall ::= Name '(' ( Expr ( ',' Expr )* )? ')'
 Literal ::= '"' /[^"]* / '"' | "'" /[^']* / "'"
 Number       ::= /[0-9]+/
 
 LocationPath ::= RelativeLocationPath | AbsoluteLocationPath
 AbsoluteLocationPath ::= '/' RelativeLocationPath?
                        | AbbreviatedAbsoluteLocationPath
 AbbreviatedAbsoluteLocationPath ::= '//' RelativeLocationPath
 RelativeLocationPath ::= Step
                        | RelativeLocationPath '/' Step
                        | AbbreviatedRelativeLocationPath
 AbbreviatedRelativeLocationPath ::= RelativeLocationPath '//' Step
 
 Step ::= AxisSpecifier NameTest Predicate* | '.' | '..'
 AxisSpecifier ::= AxisName '::' | <epsilon>
 AxisName ::= 'ancestor'
            | 'ancestor-or-self'
            | 'child'
            | 'descendant'
            | 'descendant-or-self'
            | 'parent'
            | 'self'
            | 'root'
 NameTest     ::= '*' | Name
 Predicate    ::= "[" Expr "]" *
 Name ::= /([^][/=) \t\n]|\\.)+/

Builtin Functions

The following functions can be used:

last()
the position of the last node in the current context
position()
the position of the context node, starting with 1
label()
the label of the context node as a string
count(NODESET)
the number of nodes in NODESET
regexp(STRING|NODESET)
construct a regular expression from STRING or NODESET. The regular expression syntax is the same as that used in lenses, essentially extended POSIX regular expression syntax. If a nodeset is passed in, the resulting regular expression is the union (|) of interpreting each node's value as a regular expression.
glob(STRING|NODESET)
construct a regular expression from a STRING or NODESET. Interpret the string, or the values of each node in the nodeset, as a glob and produce an equivalent regular expression. For a nodeset, the resulting regular expression is the union (|) of interpreting each node's value as a glob. The only wildcard characters in globs are * to match any number of characters, and ? to match a single character.
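
The glob-to-regular-expression translation described for glob() can be sketched in plain Python. This is only a model of the description, not the Augeas implementation: the function names are hypothetical, Python's `re` syntax stands in for the POSIX syntax Augeas uses, and the sample patterns are invented.

```python
import re

# Plain-Python model of glob(); this is not the Augeas implementation.
def glob_to_regex(glob):
    """Translate a glob where only * and ? are wildcards into a regex string."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)

def glob_union(globs):
    """Model a nodeset argument: the union (|) of each value read as a glob."""
    return "|".join("(?:" + glob_to_regex(g) + ")" for g in globs)

# Hypothetical patterns and candidate paths.
single = re.compile(glob_to_regex("/etc/*.conf"))
union = re.compile(glob_union(["/etc/host?", "/etc/*.d"]))

print(bool(single.fullmatch("/etc/ntp.conf")), bool(union.fullmatch("/etc/hosts")))
```

Each alternative is wrapped in a non-capturing group so that the union operator cannot bind to only part of a translated glob.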