# XML Processing Basics
Reading, accessing, traversing and transforming XML documents

In [None]:
%cat example.xml

## Import the module `ElementTree` from the package `xml.etree` under the name `ET`

In [None]:
import xml.etree.ElementTree as ET

## Parse the example XML file into a hierarchical XML element tree object

In [None]:
tree = ET.parse('example.xml')

In [None]:
type(tree)

## Get the root element

In [None]:
root = tree.getroot()

In [None]:
type(root)

Get the tag of the root element

In [None]:
root.tag
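
The relationship between the tree object and its root element can be checked on a small inline document. This is a minimal sketch: the XML string and the names `xml_data`, `demo_tree`, `demo_root` and `same_root` are assumptions for illustration and do not reflect the actual content of `example.xml`.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for example.xml; the real file's content may differ.
xml_data = '<text><s id="s1"><w>Hello</w><w>world</w></s></text>'

# ET.parse accepts a filename or a file-like object and returns an ElementTree.
demo_tree = ET.parse(io.StringIO(xml_data))
demo_root = demo_tree.getroot()

# ET.fromstring parses a string and returns the root Element directly.
same_root = ET.fromstring(xml_data)

print(type(demo_tree).__name__)
print(type(demo_root).__name__)
print(demo_root.tag, same_root.tag)
```

`ET.fromstring` is convenient when the XML is already in memory, since it skips the intermediate tree object.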

## Loop over all immediate subelements of the root

In [None]:
for element in root:
    print(element.tag)

An element behaves like a list of its subelements: it supports numerical indexing and slicing.

In [None]:
sent1 = root[0]

In [None]:
sent1[0:3]
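
The list-like behaviour of an element can be tried out on a small inline sentence. This is a sketch under assumptions: the tag names `s` and `w`, the `id` attribute and the variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence element; tag and attribute names are assumptions.
sent = ET.fromstring('<s id="s1"><w>Is</w><w>the</w><w>West</w><w>still</w></s>')

print(len(sent))            # number of direct children
print(sent[0].text)         # first child
print(sent[-1].text)        # last child, negative indexing works as for lists

first_three = sent[0:3]     # slicing returns a plain list of Element objects
print([w.text for w in first_three])
```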

## Loop over all nested subelements (traverse a tree)
The recursive function below traverses the tree and prints every tag, indented according to its nesting depth.
A recursive function is a function that calls itself.

In [None]:
def print_indented_tags(element, indent=""):
    print(indent + element.tag)
    for child in element:
        print_indented_tags(child, indent + "    ")
print_indented_tags(root)

## Loop over specific nested elements in a tree

In [None]:
for element in root.iter('w'):
    print(element.text)

## Count all immediate subelements with a given tag

In [None]:
len(sent1.findall('w'))
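
The difference between `findall` (direct children only) and `iter` (all descendants) becomes visible once matching elements are nested more deeply. The following sketch uses a hypothetical document; the `phrase` element and all variable names are assumptions and need not occur in `example.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two sentences, the second with a nested phrase element.
doc = ET.fromstring(
    '<text>'
    '<s id="s1"><w>Is</w><w>it</w></s>'
    '<s id="s2"><w>Yes</w><phrase><w>of</w><w>course</w></phrase></s>'
    '</text>'
)
s2 = doc[1]

direct = s2.findall('w')          # only direct children of s2
nested = list(s2.iter('w'))       # all descendants of s2 with tag w
everywhere = doc.findall('.//w')  # path expression: any depth below doc

print(len(direct), len(nested), len(everywhere))
```

With a path such as `.//w`, `findall` can also reach nested elements, so the choice between the two is mostly a matter of readability.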

## Get a specific attribute of an element

In [None]:
sent1 = root[0]
sent1.get('id')

## Get all attributes of an element
The Python attribute `attrib` is a dictionary that holds all XML attributes of an element as key/value pairs.

In [None]:
sent1.attrib

Get a specific value in the tree by indexing

In [None]:
sent1[0].attrib
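
Since `attrib` is a dictionary, the usual dictionary operations apply, and `get` accepts an optional default for missing attributes. A small sketch, assuming a hypothetical element whose attribute names (`id`, `lang`, `pos`) are chosen for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical element; the attribute names are assumptions for illustration.
sent = ET.fromstring('<s id="s1" lang="en"><w pos="VBZ">Is</w></s>')

print(sent.get('id'))                 # value of an existing attribute
print(sent.get('missing'))            # None when the attribute is absent
print(sent.get('missing', 'n/a'))     # optional default value

# attrib behaves like a dict, so it can be iterated and tested for membership.
for name, value in sent.attrib.items():
    print(name, '=', value)

print('lang' in sent.attrib)
```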

## Get the text data of an element
The Python attribute `.text` contains the (first) text data of an element as a string.

In [None]:
sent1[0].text

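In mixed content, where an element contains both text and child elements, `.text` holds only the text before the first child; text that follows a child element is stored in that child's `.tail` attribute. The default parser of the built-in `xml` package also drops XML comments from the tree. [`lxml`](https://lxml.de/) is a more complete XML processing library for Python.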

In [None]:
root.text

In [None]:
for text in root.itertext():
    print("'", text, "'", sep="")
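
How ElementTree distributes the text of a mixed-content element over `.text` and `.tail` can be seen on a small example. This is a sketch with a hypothetical element; the tags `p`, `b` and `i` are assumptions and are unrelated to `example.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content element: text interleaved with child elements.
p = ET.fromstring('<p>Hello <b>big</b> wide <i>open</i> world</p>')

print(repr(p.text))        # text before the first child
for child in p:
    # .text is the text inside the child, .tail the text right after it
    print(child.tag, repr(child.text), repr(child.tail))

# itertext() yields all text pieces in document order.
print(''.join(p.itertext()))
```

Printing with `repr` makes leading and trailing whitespace visible, which is often the source of surprises with mixed content.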

## Get the list of tokens with list comprehension

In [None]:
[w.text for w in sent1]

## Your task
Recreate the verticalized format from the grepping exercise for the example text above, with one token per line and tab-separated columns:
```
<s lang="en" n="a2-s117">
Is	VBZ	be
the	DT	the
West	NN	west
still	RB	still
a	DT	a
dominating	JJ	dominating
power	NN	power
?	SENT	?
<s lang="en" n="a2-s118">
Is	VBZ	be
the	DT	the
East	NN	east

```