# Accessing XML with XPATH

In the prior lessons on ElementTree and lxml.objectify, we encountered a few different ways—using methods and attributes—of navigating XML documents that are read into Python objects.  The XML world also provides a "native" language for navigating objects that is supported by many XML libraries.  These XPATH descriptions are language neutral while nonetheless feeling natural in a Python context.  Both `lxml.etree` and the Python standard library `xml.etree` support XPATH, with the standard library limiting itself to a useful subset.

We can simply repeat the same selections that we made using other XML styles in previous lessons.  Note that an `ElementTree` object provides `.find()` and `.findall()` itself, and XPATH can search all elements recursively, so we do not need to perform a separate `.getroot()` call to utilize the tree.

```python
import xml.etree.ElementTree as ET
tree = ET.parse('data/quran.xml')
```

```python
# ElementTree style
suras = tree.getroot().find('suracoll').findall('sura')
[elem.text for elem in suras[100] if elem.tag == 'v']

# objectify style
tree.getroot().tstmt.suracoll.sura[100].v[:]
```

For a first task, let us find the verses of Sura 101, as we did by other means in earlier lessons.  XPATH uses one-based indexing rather than Python's zero-based indexing, so `sura[101]` here selects the same element as `suras[100]` above.  Recursive search is indicated by a double slash `//`.

```python
# We can start from the tree, not only its root element
[v.text for v in tree.findall('.//sura[101]/v')]
```

['The terrible calamity!',
 'What is the terrible calamity!',
 'And what will make you comprehend what the terrible calamity is?',
 'The day on which men shall be as scattered moths,',
 'And the mountains shall be as loosened wool.',
 'Then as for him whose measure of good deeds is heavy,',
 'He shall live a pleasant life.',
 'And as for him whose measure of good deeds is light,',
 'His abode shall be the abyss.',
 'And what will make you know what it is?',
 'A burning fire.']
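The one-based indexing is easy to check on a small document.  The following is a minimal sketch that uses a made-up miniature XML string (an assumption for illustration, not the lesson's data file) whose element names mirror the ones above; it compares an XPATH position predicate with the equivalent zero-based Python index.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; only the element names mirror the lesson's file
xml_text = """
<tstmt>
  <suracoll>
    <sura><v>a1</v><v>a2</v></sura>
    <sura><v>b1</v><v>b2</v><v>b3</v></sura>
  </suracoll>
</tstmt>
""".strip()

# Wrap the root element in an ElementTree, as ET.parse() would return
mini = ET.ElementTree(ET.fromstring(xml_text))

# XPATH position predicates count from 1, so sura[2] is the second sura
second = [v.text for v in mini.findall('.//sura[2]/v')]

# The same selection with Python indexing, which counts from 0
suras = mini.getroot().find('suracoll').findall('sura')
same = [v.text for v in suras[1].findall('v')]

print(second)          # ['b1', 'b2', 'b3']
print(second == same)  # True
```

Note that the position is counted among sibling elements with the same tag, so `sura[2]` means the second `sura` child of its parent.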

If we like, we can search starting with some nested element.  The `.findall()` method returns a list of matching elements, even when only one is present; `.find()` returns just the first match.

```python
quran = tree.getroot()
sura101 = quran.findall('.//sura[101]')[0]
sura101.findall('v[5]')[0].text
```

'And the mountains shall be as loosened wool.'

```python
# Non-recursive, must nest path directly
title = tree.findall('./coverpg/title')[0]
title.text
```

'The Quran'
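To make the difference between a direct path and a recursive search concrete, here is a small sketch.  The inline document is hypothetical (it places a `title` at two different depths purely for illustration), and the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with <title> elements at two different depths
doc = ET.fromstring(
    "<tstmt>"
    "<coverpg><title>Cover title</title></coverpg>"
    "<suracoll><sura><title>Sura title</title></sura></suracoll>"
    "</tstmt>"
)

# A direct path must spell out every step from the starting element
direct = [t.text for t in doc.findall('./coverpg/title')]

# './title' looks only at immediate children, so nothing matches here
child_only = doc.findall('./title')

# './/title' searches all descendants, in document order
recursive = [t.text for t in doc.findall('.//title')]

# .find() returns the first match rather than a list
first = doc.find('.//title').text

print(direct)      # ['Cover title']
print(child_only)  # []
print(recursive)   # ['Cover title', 'Sura title']
print(first)       # Cover title
```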

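Predicates in square brackets can also test for an attribute.  The predicate `[@attr1]` matches elements that carry an attribute named `attr1`, and `.find()` returns the first such match, which here is the root element.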

```python
tree.find('[@attr1]')
```

<Element 'tstmt' at 0x7fbf601f7400>
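Attribute predicates are usually combined with a path.  The sketch below uses a hypothetical `library` document (not the lesson's data) to show three predicate forms that the standard library supports: attribute presence, attribute value, and child text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the tag and attribute names are made up for illustration
root = ET.fromstring(
    '<library>'
    '<book id="b1" lang="en"><title>First</title></book>'
    '<book id="b2"><title>Second</title></book>'
    '<book id="b3" lang="ar"><title>Third</title></book>'
    '</library>'
)

# [@lang] keeps only the books that have a lang attribute at all
with_lang = [b.get('id') for b in root.findall('.//book[@lang]')]

# [@lang='ar'] additionally requires a specific attribute value
arabic = root.find(".//book[@lang='ar']")
arabic_title = arabic.find('title').text

# [title='Second'] selects books whose <title> child has that text
by_title = root.find(".//book[title='Second']").get('id')

print(with_lang)     # ['b1', 'b3']
print(arabic_title)  # Third
print(by_title)      # b2
```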

If an XPATH is not matched in the current object, no exception is raised.  The `.findall()` method simply returns an empty list to indicate that nothing matches, and `.find()` returns `None`.

```python
tree.findall('.//no/such/path')
```

[]
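Because a failed match is silent, code that uses `.find()` generally needs to check for `None` before touching `.text`.  A brief sketch of the three behaviors, using a throwaway document invented for this example:

```python
import xml.etree.ElementTree as ET

# Throwaway document for illustration only
root = ET.fromstring("<doc><item>one</item></doc>")

# .findall() yields an empty list when nothing matches
no_list = root.findall('.//missing')

# .find() yields None, so guard before reading .text
maybe = root.find('.//missing')
text = maybe.text if maybe is not None else None

# .findtext() accepts a default to return when there is no match
fallback = root.findtext('.//missing', default='n/a')
present = root.findtext('.//item', default='n/a')

print(no_list)   # []
print(maybe)     # None
print(fallback)  # n/a
print(present)   # one
```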

# Extras

If you decide to use `lxml` instead of the standard library, some enhanced XPATH capabilities are present through its `.xpath()` method.  These include a selector `text()` to pull the text of an element directly using XPATH rather than as a Python attribute, and the ability to use regular expressions inside path predicates by way of the EXSLT extension functions.
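As a rough sketch of these two extras, the following assumes the third-party `lxml` package is installed (the code skips the queries if it is not).  The three-verse document is a made-up excerpt for illustration; the `re` prefix is bound to the EXSLT regular-expressions namespace.

```python
try:
    from lxml import etree  # third-party package, not in the standard library
except ImportError:
    etree = None

texts = and_verses = None
if etree is not None:
    # Made-up excerpt for illustration
    root = etree.fromstring(
        "<sura>"
        "<v>The terrible calamity!</v>"
        "<v>And the mountains shall be as loosened wool.</v>"
        "<v>A burning fire.</v>"
        "</sura>"
    )

    # text() selects the text nodes themselves rather than the elements
    texts = [str(t) for t in root.xpath('//v/text()')]

    # EXSLT regular expressions are reached through a namespace prefix
    ns = {'re': 'http://exslt.org/regular-expressions'}
    matches = root.xpath("//v[re:test(., '^And ')]", namespaces=ns)
    and_verses = [v.text for v in matches]

print(texts)
print(and_verses)
```

With `lxml` available, `texts` should hold the three verse strings and `and_verses` only the verse beginning with "And".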

While these capabilities are powerful and useful for heavy users of XML, the capabilities in the standard library are more than adequate for simplifying access to elements and searching trees.