The xml.etree.ElementTree module provides a simple and effective way to parse and create XML data.

Once a document is parsed, we can read the tag and the attributes of the root element and of each of its children, which makes the XML tree easier to navigate.

The parse() function takes the path to an XML file (or an open file object) and returns an ElementTree object. Calling getroot() on that object returns the root element of the document, here movie.xml.
We can also filter the elements of the tree with the findall() method.

In [None]:
# Import Python's built-in XML module under the conventional alias ET
import xml.etree.ElementTree as ET

# Parse the XML file into an ElementTree object
xml_tree = ET.parse(r"C:\Users\malik.alston\Desktop\Data\movie.xml")

# Get the root element: the top-level tag that contains every other element.
# Parsing produces a tree-like structure, and navigation starts at the root.
xml_root = xml_tree.getroot()

# Print the top-level tag, which is 'collection' for this file
print(xml_root.tag)

# Print the tag and the attribute dictionary of each direct child of the root
print('print name of the Child Elements:')
for child in xml_root:
    print(child.tag, child.attrib)

collection
[]
print name of the Child Elements:
genre {'category': 'Action'}
genre {'category': 'Thriller'}


xml_root.tag returns the tag name of the XML element held in the xml_root variable; in the output above it is collection.

xml_root.attrib returns that element's attributes as a dictionary, just as child.attrib does for each genre element in the loop above.
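The two properties can be tried without the file on disk. The sketch below is only an illustration: sample_xml is a hypothetical, heavily shortened stand-in for movie.xml, and ET.fromstring() is used instead of ET.parse() so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; it only mimics the shape of movie.xml
sample_xml = """
<collection>
    <genre category="Action">
        <decade years="1980s"/>
    </genre>
</collection>
"""

# fromstring() parses XML from a string and returns the root element directly
root = ET.fromstring(sample_xml)

root_tag = root.tag        # tag name of the root element
root_attrib = root.attrib  # attributes of the root; it has none in this sample
first_child = root[0]      # an element can be indexed to reach its children

print(root_tag)
print(root_attrib)
print(first_child.tag, first_child.attrib)
```

Because attrib is an ordinary dictionary, a single value can be read with first_child.attrib["category"] or first_child.get("category").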

In [10]:
[elem.tag for elem in xml_root.iter()]

['collection',
 'genre',
 'decade',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'decade',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'genre',
 'decade',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'decade',
 'movie',
 'format',
 'year',
 'rating',
 'description',
 'movie',
 'format',
 'year',
 'rating',
 'description']

Now you know that the children of the root element collection are all genre elements. Each one uses the category attribute to name its genre. According to the genre elements, there are Action, Thriller, and Comedy movies.

It is often helpful to see every element in the tree. The iter() method does this: called on the root as xml_root.iter(), it yields the root and all of its descendants in document order, so you can use it in a "for" loop or in a list comprehension, as in the cell above.

This gives a general notion of how many elements you have, but it does not show the attributes or the levels in the tree.
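One possible way to see the levels and attributes as well is to walk the tree recursively and indent each tag by its depth. This is a sketch, not part of ElementTree: describe() is a hypothetical helper written for this example, and sample_xml is a made-up fragment shaped like movie.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document shaped like movie.xml
sample_xml = """
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="THE KARATE KID"/>
        </decade>
    </genre>
</collection>
"""


def describe(element, depth=0):
    """Return one indented line per element, showing its tag and attributes."""
    lines = ["    " * depth + f"{element.tag} {element.attrib}"]
    # Iterating over an element yields its direct children only
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


outline = describe(ET.fromstring(sample_xml))
print("\n".join(outline))
```

Each extra level of indentation in the printed outline corresponds to one level of nesting in the document.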

There is a helpful way to see the whole document. The ElementTree module (remember, aliased as ET) provides a tostring() function. It is a module-level function rather than a method of the element, so you call ET.tostring() and pass the root element to it to get the whole document serialized as XML.

When you pass a byte encoding such as 'utf8', ET.tostring() returns a bytes object rather than a string, so you call .decode() with the same encoding to turn the result into text that prints cleanly. UTF-8 is the most common encoding for XML documents.
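The difference between the bytes and text results can be checked on a tiny document. In this sketch the one-line XML string is a hypothetical example; encoding="unicode" is the documented way to ask ET.tostring() for a str directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line document used only for this illustration
root = ET.fromstring('<collection><genre category="Action"/></collection>')

# With a byte encoding, tostring() returns bytes that must be decoded
as_bytes = ET.tostring(root, encoding="utf8")
as_text = as_bytes.decode("utf8")

# With encoding="unicode", tostring() returns a str and no decode is needed
as_unicode = ET.tostring(root, encoding="unicode")

print(type(as_bytes).__name__)
print(type(as_unicode).__name__)
print(as_unicode)
```

The 'utf8' form also prepends an XML declaration line, while the 'unicode' form returns only the element markup.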

In [11]:
print(ET.tostring(xml_root, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                </description>
            </movie>
            <movie favorite="True" title="THE KARATE KID">
                <format multiple="Yes">DVD,Online</format>
                <year>1984</year>
                <rating>PG</rating>
                <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
                <format multiple="False">Blu-ray</format>
                <year>1985</year>

You can also pass a tag name to iter() to find particular elements of interest. xml_root.iter('movie') yields only the elements under the root whose tag is movie. Here, you print the attributes of every movie element in the tree:

In [12]:
for movie in xml_root.iter('movie'):
    print(movie.attrib)


{'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
{'favorite': 'True', 'title': 'THE KARATE KID'}
{'favorite': 'False', 'title': 'Back 2 the Future'}
{'favorite': 'False', 'title': 'X-Men'}
{'favorite': 'True', 'title': 'Batman Returns'}
{'favorite': 'False', 'title': 'Reservoir Dogs'}
{'favorite': 'False', 'title': 'ALIEN'}
{'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
{'favorite': 'FALSE', 'title': 'American Psycho'}
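The findall() method mentioned at the start can narrow results like these further. The sketch below assumes a hypothetical, shortened sample_xml rather than the real movie.xml, and uses the limited XPath syntax that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; titles are borrowed from the output above,
# but the structure is shortened and is not the real movie.xml
sample_xml = """
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="THE KARATE KID"/>
            <movie favorite="False" title="Back 2 the Future"/>
        </decade>
    </genre>
    <genre category="Thriller">
        <decade years="1970s">
            <movie favorite="False" title="ALIEN"/>
        </decade>
    </genre>
</collection>
"""
root = ET.fromstring(sample_xml)

# ".//movie" matches movie elements at any depth below the root;
# the predicate keeps only those whose favorite attribute equals "True"
favorites = root.findall(".//movie[@favorite='True']")
favorite_titles = [movie.attrib["title"] for movie in favorites]

# findall() with a plain tag name searches direct children only
genres = [genre.attrib["category"] for genre in root.findall("genre")]

print(favorite_titles)
print(genres)
```

The attribute comparison is an exact string match, so values such as 'False' and 'FALSE' in the output above would be treated as different.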
