The tree structure of XML makes it relatively simple to navigate, modify, and remove elements programmatically. Python has a built-in library, ElementTree, with functions for reading and manipulating XML documents (and other similarly structured files).

In [1]:
import os
os.chdir(r"C:\RILPY")  # raw string so the backslash is not read as an escape

import xml.etree.ElementTree as ET

### Parsing XML data

In [2]:
tree = ET.parse('bcs.xml')
root = tree.getroot()
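
If you do not have bcs.xml on disk, the same kind of tree can be built from a string. The following is a minimal sketch, assuming a shortened, made-up sample modeled on the business-card file; the names SAMPLE_XML and sample_root are illustrative only. ET.fromstring() returns the root element directly, so there is no getroot() step.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample modeled on the business-card file used here.
SAMPLE_XML = """<BusinessCards>
   <BusinessCard>
      <Name>Joe Marini</Name>
      <phone type="mobile">(415) 555-4567</phone>
      <email>joe@joe.com</email>
   </BusinessCard>
</BusinessCards>"""

# ET.fromstring parses text and returns the root Element itself.
sample_root = ET.fromstring(SAMPLE_XML)

print(sample_root.tag)    # tag of the root element
print(len(sample_root))   # number of direct children of the root
```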

In [3]:
root.tag

'BusinessCards'

In [4]:
root.attrib

{}

You can easily iterate over subelements (commonly called "children") in the root by using a simple "for" loop.

In [5]:
for child in root:
    print(child.tag, child.attrib)

BusinessCard {}
BusinessCard {}


It is often helpful to see every element in the entire tree. The root.iter() method does this: used in a "for" loop or a list comprehension, it walks the whole tree, starting with the root itself and then descending through every subelement.

In [6]:
[elem.tag for elem in root.iter()]

['BusinessCards',
 'BusinessCard',
 'Name',
 'phone',
 'phone',
 'phone',
 'email',
 'BusinessCard',
 'Name',
 'phone',
 'phone',
 'phone',
 'email']

This gives a general idea of how many elements you have, but it does not show the attributes or the levels in the tree.
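
One way to recover the levels and attributes is a small recursive helper. This is only a sketch: describe() and demo_root are hypothetical names, and the inline XML is a cut-down stand-in for bcs.xml.

```python
import xml.etree.ElementTree as ET

def describe(elem, depth=0):
    """Return one line per element: tag and attributes, indented by depth."""
    lines = ["  " * depth + f"{elem.tag} {elem.attrib}"]
    for child in elem:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines

# Cut-down stand-in for the business-card document (an assumption).
demo_root = ET.fromstring(
    "<BusinessCards><BusinessCard><Name>Joe Marini</Name>"
    '<phone type="work">(800) 555-9876</phone>'
    "</BusinessCard></BusinessCards>"
)

print("\n".join(describe(demo_root)))
```

Each level of nesting adds two spaces of indentation, so siblings line up and children sit beneath their parent.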

There is a helpful way to see the whole document, attributes and nesting included. ElementTree (remember, aliased as ET) provides a module-level function, ET.tostring(), that serializes an element together with everything beneath it. It is a function of the module rather than a method of the element, so you pass the root in as an argument to get the whole document back.

The call takes a slightly strange form. When given a byte encoding such as 'utf8', ET.tostring() returns a bytes object, so you decode the result back into a string before printing it. UTF-8 is the most common encoding for XML documents.



In [7]:
print(ET.tostring(root, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<BusinessCards>
   <BusinessCard>
      <Name>Joe Marini</Name>
      <phone type="mobile">(415) 555-4567</phone>
      <phone type="work">(800) 555-9876</phone>
      <phone type="fax">(510) 555-1234</phone>
      <email>joe@joe.com</email>
   </BusinessCard>

   <BusinessCard>
      <Name>Someone Else</Name>
      <phone type="mobile">(415) 555-0000</phone>
      <phone primary="primary" type="work">(800) 555-1111</phone>
      <phone type="fax">(510) 555-2222</phone>
      <email>someone@else.com</email>
   </BusinessCard>
</BusinessCards>
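
The encode-then-decode step can be skipped when a string is all you need. As a hedged sketch on a single made-up element (card, as_bytes, and as_text are illustrative names), passing encoding="unicode" makes ET.tostring() return a str directly:

```python
import xml.etree.ElementTree as ET

# A single made-up element, standing in for the root of a larger tree.
card = ET.fromstring('<phone type="work">(800) 555-9876</phone>')

as_bytes = ET.tostring(card, encoding="utf8")    # bytes: needs .decode() to print cleanly
as_text = ET.tostring(card, encoding="unicode")  # str: ready to print as-is

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)
```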


You can extend the use of iter() to find particular elements of interest. Given a tag name, root.iter() yields every element in the tree whose tag matches it. Here, you will list the attributes of every phone element in the tree:

In [8]:
for phone in root.iter('phone'):
    print(phone.attrib)

{'type': 'mobile'}
{'type': 'work'}
{'type': 'fax'}
{'type': 'mobile'}
{'type': 'work', 'primary': 'primary'}
{'type': 'fax'}
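
Once the matching elements are in hand, their attributes and text can be used to filter them. The sketch below assumes a shortened, made-up version of the document (cards, work_numbers, and xpath_numbers are illustrative names) and shows two equivalent ways to collect the work numbers: element.get() inside an iter() loop, and ElementTree's limited XPath support through findall().

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for bcs.xml (an assumption for illustration).
cards = ET.fromstring(
    "<BusinessCards><BusinessCard>"
    '<phone type="mobile">(415) 555-4567</phone>'
    '<phone type="work">(800) 555-9876</phone>'
    "</BusinessCard></BusinessCards>"
)

# Keep only the phones whose "type" attribute equals "work".
# get() returns None when the attribute is missing, instead of raising.
work_numbers = [p.text for p in cards.iter("phone") if p.get("type") == "work"]
print(work_numbers)

# The same filter written as an XPath-style predicate.
xpath_numbers = [p.text for p in cards.findall(".//phone[@type='work']")]
print(xpath_numbers)
```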
