2019-Aug-07, Wednesday

From:
https://docs.python.org/3.6/library/xml.etree.elementtree.html

In [1]:
import xml.etree.ElementTree as ET

CountryData.xml

```xml
<?xml version="1.0"?>
<data>
	<country name="Liechtenstein">
		<rank>1</rank>
		<year>2008</year>
		<gdppc>141100</gdppc>
		<neighbor name="Austria" direction="E"/>
		<neighbor name="Switzerland" direction="W"/>
	</country>
	<country name="Singapore">
		<rank>4</rank>
		<year>2011</year>
		<gdppc>59900</gdppc>
		<neighbor name="Malaysia" direction="N"/>
	</country>
	<country name="Panama">
		<rank>68</rank>
		<year>2011</year>
		<gdppc>13600</gdppc>
		<neighbor name="Costa Rica" direction="W"/>
		<neighbor name="Colombia" direction="E"/>
	</country>
</data>
```

In [2]:
tree1 = ET.parse("CountryData.xml")

In [3]:
root1 = tree1.getroot()

In [4]:
root1

<Element 'data' at 0x10f090db8>

In [5]:
# Tag of root:
root1.tag

'data'

In [6]:
# Dictionary of attributes of root:
root1.attrib

{}

In [7]:
# Children of the root element:
for child in root1:
    print(child.tag, child.attrib)

country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}


In [8]:
# Children of root1[0] (the first country element):
for elem in root1[0]:
    print(elem)

<Element 'rank' at 0x10f0a35e8>
<Element 'year' at 0x10f0a3598>
<Element 'gdppc' at 0x10f0a3688>
<Element 'neighbor' at 0x10f0a36d8>
<Element 'neighbor' at 0x10f0a3728>


In [9]:
print(root1[0][0])
print(root1[0][1])
print(root1[0][2])

<Element 'rank' at 0x10f0a35e8>
<Element 'year' at 0x10f0a3598>
<Element 'gdppc' at 0x10f0a3688>


In [10]:
len(root1[0])  # rank, year, gdppc, neighbor, neighbor: 5 children

5

In [11]:
# Iterate over the children directly; .text is None for empty elements such as <neighbor/>.
for elem in root1[0]:
    print(elem.tag, elem.attrib, elem.text)

rank {} 1
year {} 2008
gdppc {} 141100
neighbor {'name': 'Austria', 'direction': 'E'} None
neighbor {'name': 'Switzerland', 'direction': 'W'} None


In [12]:
# iter() walks the whole tree, so this lists every neighbor element in every country element.
for neigh in root1.iter("neighbor"):
    print(neigh.attrib)

{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
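
A small sketch that combines the calls above with `findall()`, `find()` and `get()` to collect one row per country. It parses an inline, shortened copy of CountryData.xml so it runs without the file; the names `xml_text` and `summary` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of CountryData.xml (assumption: two countries are enough here).
xml_text = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(xml_text)

summary = []
for country in root.findall("country"):      # direct children named "country"
    name = country.get("name")               # attribute lookup
    rank = int(country.find("rank").text)    # first child named "rank"
    neighbors = [n.get("name") for n in country.iter("neighbor")]
    summary.append((name, rank, neighbors))

for row in summary:
    print(row)
```

Unlike `iter()`, `findall("country")` only looks at direct children of the element it is called on.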


---

XML with namespaces

- If the XML input has namespaces, tags and attributes written with a prefix, as in `prefix:some_tag`, are expanded to `{uri}some_tag`, where the full URI replaces the prefix.

- If there is a default namespace, its full URI is prepended to every non-prefixed tag.

- The following example has two namespaces: one with the prefix "fictional", and one with no prefix, which is therefore the default namespace.

ActorList.xml
```xml
<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
	<actor>
		<name>John Cleese</name>
		<fictional:character>Lancelot</fictional:character>
		<fictional:character>Archie Leach</fictional:character>
	</actor>
	<actor>
		<name>Eric Idle</name>
		<fictional:character>Sir Robin</fictional:character>
		<fictional:character>Gunther</fictional:character>
		<fictional:character>Commander Clement</fictional:character>
	</actor>
</actors>
```

In [13]:
tree2 = ET.parse("ActorList.xml")

In [14]:
root2 = tree2.getroot()

In [15]:
root2

<Element '{http://people.example.com}actors' at 0x10f0b7e08>

In [16]:
for child in root2:
    print(child.tag)

{http://people.example.com}actor
{http://people.example.com}actor
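
The same expansion can be checked on a tiny inline document, including attributes. This is only a sketch: the URIs, the `p` prefix and the element names are made up for the example. Note that the default namespace applies to non-prefixed tags but not to non-prefixed attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one default namespace and one prefixed namespace.
doc = ET.fromstring(
    '<root xmlns="http://default.example.com" '
    'xmlns:p="http://prefixed.example.com">'
    '<plain/>'
    '<p:tagged p:flag="yes" bare="no"/>'
    '</root>'
)

# Tags: non-prefixed ones get the default URI, prefixed ones get their own URI.
tags = [elem.tag for elem in doc.iter()]
print(tags)

# Attributes: only the prefixed attribute is expanded; "bare" keeps its plain name.
attrs = doc[1].attrib
print(attrs)
```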


In [17]:
root2.findall('{http://people.example.com}actor')

[<Element '{http://people.example.com}actor' at 0x10f0b7ea8>,
 <Element '{http://people.example.com}actor' at 0x10f0ca098>]

In [18]:
for elem in root2.findall('{http://people.example.com}actor'):
    name = elem.find('{http://people.example.com}name')
    print(name.text)

    for char in elem.findall('{http://characters.example.com}character'):
        # Note: character elements are in a different namespace than actor and name.
        print(" |-->", char.text)

John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement


In [19]:
# Alternatively, map prefixes of your own choosing to the namespace URIs
# and pass that dictionary to find() / findall():
MyNameSpace = {"RealPerson": "http://people.example.com",
               "Role": "http://characters.example.com"}

for elem in root2.findall("RealPerson:actor", MyNameSpace):
    name = elem.find("RealPerson:name", MyNameSpace)
    print(name.text)

    for char in elem.findall("Role:character", MyNameSpace):
        print(" |-->", char.text)

John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement
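
When printing, the expanded `{uri}tag` form can get noisy. One possible way to separate the URI from the local name is a small helper like the one below; `split_tag` is a hypothetical function written for this note, not part of ElementTree, and the inline XML is a shortened copy of one actor from ActorList.xml.

```python
import xml.etree.ElementTree as ET

def split_tag(tag):
    """Hypothetical helper: split '{uri}local' into (uri, local).

    Tags without a namespace come back as (None, tag).
    """
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag

# Shortened inline copy of one <actor> element, so the sketch runs without the file.
actor = ET.fromstring(
    '<actor xmlns="http://people.example.com" '
    'xmlns:fictional="http://characters.example.com">'
    '<name>John Cleese</name>'
    '<fictional:character>Lancelot</fictional:character>'
    '</actor>'
)

pairs = [split_tag(child.tag) for child in actor]
for uri, local in pairs:
    print(local, "in", uri)
```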
