My learning material, pieced together from the following sources: <br>
- [**Datacamp for parsing XML using ElementTree**](https://www.datacamp.com/community/tutorials/python-xml-elementtree)
- [**Tutorialspoint**](https://www.tutorialspoint.com/xml/)
- [**W3schools**](https://www.w3schools.com/Xml/)
- [**How to parse XML using Minidom**](https://www.guru99.com/manipulating-xml-with-python.html)

**What is XML?** <br>
- XML stands for "Extensible **Markup** Language". 
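    - It is widely used on the web and elsewhere to store and transport data (for example in web services, configuration files and document formats), because the data has a well-defined structure that programs can parse.
- XML organises data into *a tree-like structure* that is easy to interpret and supports a hierarchy. 
    - A file that follows XML's rules is called an *XML document*.
- An XML document consists of *elements*. Each element is written as a start tag and a matching end tag, e.g. `<year>1984</year>`, or as a single self-closing tag such as `<decade years="2000s" />`. Tags themselves are delimited by '<' and '>'.
- Elements can contain sub-elements, a.k.a. 'child elements'.
- The top-level element that contains all other elements is called the *root*.
- Attributes are name-value pairs written inside an element's start tag, e.g. `<decade years="1980s">`.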
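A minimal sketch of these terms with Python's built-in `xml.etree.ElementTree`, using a tiny made-up document (the `library`, `book` and `lang` names are illustrative only): the root contains every other element, its children sit one level down, and attributes come back as a dict of name-value pairs.

```python
import xml.etree.ElementTree as ET

# A tiny made-up document: <library> is the root, the <book> elements are its children
doc = """<library>
    <book lang="en"><title>Emma</title></book>
    <book lang="fr"><title>Candide</title></book>
</library>"""

lib_root = ET.fromstring(doc)        # parse the string and return the root element
print(lib_root.tag)                  # the root's tag name
for book in lib_root:                # iterate over the root's direct children
    # .attrib is a dict of the element's name-value pairs;
    # findtext() returns the text of the first matching child element
    print(book.tag, book.attrib, book.findtext("title"))
```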

###### Gender as an attribute

###### In Markdown format
<person gender="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

###### Gender as an element

###### In Markdown format
<person>
  <gender>female</gender>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

###### Conclusion: both snippets hold the same data, but when rendered in a Markdown cell only the content of elements is displayed; attributes (like `gender="female"`) are not
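Whichever form is chosen, a program can still read the value; only the call changes. A sketch with `xml.etree.ElementTree` (introduced below), using the two `person` snippets above: the attribute form is read with `.get()`, the element form with `.findtext()`.

```python
import xml.etree.ElementTree as ET

as_attribute = ('<person gender="female"><firstname>Anna</firstname>'
                '<lastname>Smith</lastname></person>')
as_element = ('<person><gender>female</gender><firstname>Anna</firstname>'
              '<lastname>Smith</lastname></person>')

p1 = ET.fromstring(as_attribute)
p2 = ET.fromstring(as_element)

# Attribute form: the value lives in the start tag, so read it with .get()
gender_from_attr = p1.get("gender")
# Element form: the value is the text of a child element
gender_from_elem = p2.findtext("gender")

print(gender_from_attr, gender_from_elem)
```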

###### An element can have multiple attributes

###### Example of an element with multiple attributes and child elements
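For example, a `movie` element shaped like the ones in the movies XML parsed below has two attributes (`favorite` and `title`) and several child elements, one of which has its own attribute. A sketch that parses one such element from a string (trimmed: the `description` child is left out) and lists both:

```python
import xml.etree.ElementTree as ET

# One <movie> element, copied in shape from the movies XML used below
movie_xml = """<movie favorite="True" title="THE KARATE KID">
    <format multiple="Yes">DVD,Online</format>
    <year>1984</year>
    <rating>PG</rating>
</movie>"""

karate_kid = ET.fromstring(movie_xml)
print(karate_kid.attrib)                          # both attributes as a dict
print([child.tag for child in karate_kid])        # the child elements, in document order
print(karate_kid.find("format").attrib)           # a child element with its own attribute
```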

###### Python's built-in libraries for XML parsing <br> 1. `ElementTree`

In [52]:
import xml.etree.ElementTree as ET

In [53]:
import re

###### Parsing XML Data

In [54]:
tree = ET.parse("movies_incorrect.xml")
root = tree.getroot()
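`ET.parse()` accepts a file object as well as a filename, and `ET.fromstring()` parses a string and returns the root element directly (not an `ElementTree`). A sketch using a small inline stand-in for `movies_incorrect.xml`; the content is trimmed and illustrative, and the `demo_` names are chosen so they do not overwrite `tree` and `root`:

```python
import io
import xml.etree.ElementTree as ET

# A trimmed, illustrative stand-in for the start of movies_incorrect.xml
xml_text = """<collection>
    <genre category="Action"><decade years="1980s"/></genre>
</collection>"""

demo_tree = ET.parse(io.StringIO(xml_text))   # ET.parse() takes a filename or a file object
demo_root = demo_tree.getroot()

demo_root2 = ET.fromstring(xml_text)          # the root Element directly, no ElementTree wrapper

print(type(demo_tree).__name__, demo_root.tag, demo_root2.tag)
```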

In [55]:
root

<Element 'collection' at 0x2b5dcfe29408>

In [56]:
print("Name of root:",root.tag)
print("Attributes of root:",root.attrib)

Name of root: collection
Attributes of root: {}


In [57]:
for child in root:
    print(child.tag, child.attrib)

genre {'category': 'Action'}
genre {'category': 'Thriller'}
genre {'category': 'Comedy'}


All children of the root `collection` are `genre` elements. <br>
Each `genre` is identified by its `category` attribute.

In [58]:
root.iter()

<_elementtree._element_iterator at 0x2b5dcfd9ae60>

In [59]:
# every element in the document: the root itself plus all of its descendants, in document order
print([elem.tag for elem in root.iter()])

['collection', 'genre', 'decade', 'movie', 'format', 'year', 'rating', 'description', 'movie', 'format', 'year', 'rating', 'description', 'movie', 'format', 'year', 'rating', 'description', 'decade', 'movie', 'format', 'year', 'rating', 'description', 'movie', 'format', 'year', 'rating', 'description', 'movie', 'format', 'year', 'rating', 'description', 'genre', 'decade', 'movie', 'format', 'year', 'rating', 'description', 'decade', 'movie', 'format', 'year', 'rating', 'description', 'movie', 'format', 'year', 'rating', 'description', 'genre', 'decade', 'movie', 'format', 'year', 'rating', 'description', 'decade', 'movie', 'format', 'year', 'rating', 'description', 'movie', 'format', 'year', 'rating', 'description', 'decade', 'movie', 'format', 'year', 'rating', 'description', 'decade', 'movie', 'format', 'year', 'rating', 'description']


In [60]:
[element for element in root.iter('movie')]

[<Element 'movie' at 0x2b5dcfeca0e8>,
 <Element 'movie' at 0x2b5dcfeca048>,
 <Element 'movie' at 0x2b5dcfeca548>,
 <Element 'movie' at 0x2b5dcfeca728>,
 <Element 'movie' at 0x2b5dcfeca8b8>,
 <Element 'movie' at 0x2b5dcfecaa48>,
 <Element 'movie' at 0x2b5dcfecac78>,
 <Element 'movie' at 0x2b5dcfecaea8>,
 <Element 'movie' at 0x2b5dcfebd0e8>,
 <Element 'movie' at 0x2b5dcfebd318>,
 <Element 'movie' at 0x2b5dcfebd4f8>,
 <Element 'movie' at 0x2b5dcfebd6d8>,
 <Element 'movie' at 0x2b5dcfebd8b8>,
 <Element 'movie' at 0x2b5dcfebdae8>]

In [61]:
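# getiterator() was removed in Python 3.9; iter() replaces it (list() turns the iterator into a list)
list(tree.iter('movie'))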

[<Element 'movie' at 0x2b5dcfeca0e8>,
 <Element 'movie' at 0x2b5dcfeca048>,
 <Element 'movie' at 0x2b5dcfeca548>,
 <Element 'movie' at 0x2b5dcfeca728>,
 <Element 'movie' at 0x2b5dcfeca8b8>,
 <Element 'movie' at 0x2b5dcfecaa48>,
 <Element 'movie' at 0x2b5dcfecac78>,
 <Element 'movie' at 0x2b5dcfecaea8>,
 <Element 'movie' at 0x2b5dcfebd0e8>,
 <Element 'movie' at 0x2b5dcfebd318>,
 <Element 'movie' at 0x2b5dcfebd4f8>,
 <Element 'movie' at 0x2b5dcfebd6d8>,
 <Element 'movie' at 0x2b5dcfebd8b8>,
 <Element 'movie' at 0x2b5dcfebdae8>]

In [62]:
list(tree.iter('genre'))

[<Element 'genre' at 0x2b5dcfe292c8>,
 <Element 'genre' at 0x2b5dcfecabd8>,
 <Element 'genre' at 0x2b5dcfebd278>]
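`iter(tag)` searches every depth below the element, whereas `findall(tag)` with a bare tag name only looks at direct children, which is why the `findall` calls in this notebook spell out paths such as `./genre/decade/movie`. A sketch on a small stand-in tree (illustrative data, not the real file):

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring("""<collection>
  <genre category="Action">
    <decade years="1980s"><movie title="THE KARATE KID"/></decade>
  </genre>
</collection>""")

n_iter = len(list(demo_root.iter("movie")))                 # any depth below the root
n_direct = len(demo_root.findall("movie"))                  # direct children of <collection> only
n_path = len(demo_root.findall("./genre/decade/movie"))     # explicit path
n_desc = len(demo_root.findall(".//movie"))                 # './/' searches all descendants

print(n_iter, n_direct, n_path, n_desc)
```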

In [63]:
print(ET.tostring(next(tree.iter('genre')), encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                </description>
            </movie>
               <movie favorite="True" title="THE KARATE KID">
               <format multiple="Yes">DVD,Online</format>
               <year>1984</year>
               <rating>PG</rating>
               <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
               <format multiple="False">Blu-ray</format>
               <year>1985</year>
               <rat

###### Listing the attributes of each element

In [64]:
for movie in root.iter('movie'):
    print(movie.attrib)

{'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
{'favorite': 'True', 'title': 'THE KARATE KID'}
{'favorite': 'False', 'title': 'Back 2 the Future'}
{'favorite': 'False', 'title': 'X-Men'}
{'favorite': 'True', 'title': 'Batman Returns'}
{'favorite': 'False', 'title': 'Reservoir Dogs'}
{'favorite': 'False', 'title': 'ALIEN'}
{'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
{'favorite': 'FALSE', 'title': 'American Psycho'}
{'favorite': 'False', 'title': 'Batman: The Movie'}
{'favorite': 'True', 'title': 'Easy A'}
{'favorite': 'True', 'title': 'Dinner for SCHMUCKS'}
{'favorite': 'False', 'title': 'Ghostbusters'}
{'favorite': 'True', 'title': 'Robin Hood: Prince of Thieves'}


###### When the data is stored as element content rather than in attributes, use the `text` attribute to fetch it

In [65]:
for description in root.iter('description'):
    print(description.text)


                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                
None provided.
Marty McFly
Two mutants come to a private academy for their kind whose resident superhero team must 
               oppose a terrorist organization with similar powers.
NA.
WhAtEvER I Want!!!?!
"""""""""
Funny movie about a funny guy
psychopathic Bateman
What a joke!
Emma Stone = Hester Prynne
Tim (Rudd) is a rising executive
                 who “succeeds” in finding the perfect guest, 
                 IRS employee Barry (Carell), for his boss’ monthly event, 
                 a so-called “dinner for idiots,” which offers certain 
                 advantages to the exec who shows up with the biggest buffoon.
                 
Who ya gonna call?
Robin Hood slaying
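The blank lines and indentation in the output above are part of `.text`, because ElementTree keeps the whitespace from the file. A sketch of two common clean-ups on a shortened version of the Indiana Jones description: `str.strip()` trims the ends, and `' '.join(text.split())` also collapses internal runs of whitespace. `.text` is `None` for an element without text, hence the `or ''`.

```python
import xml.etree.ElementTree as ET

desc = ET.fromstring("""<description>
                'Archaeologist and adventurer Indiana Jones
                is hired by the U.S. government.'
                </description>""")

raw = desc.text or ""              # .text is None when an element has no text
stripped = raw.strip()             # removes leading/trailing whitespace only
collapsed = " ".join(raw.split())  # also squeezes the newlines and indentation inside

print(repr(collapsed))
```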


###### Using XPath (ElementTree supports a limited subset of XPath)

In [66]:
for movie in root.findall("./genre/decade/movie"):
    print(movie.attrib)

{'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
{'favorite': 'True', 'title': 'THE KARATE KID'}
{'favorite': 'False', 'title': 'Back 2 the Future'}
{'favorite': 'False', 'title': 'X-Men'}
{'favorite': 'True', 'title': 'Batman Returns'}
{'favorite': 'False', 'title': 'Reservoir Dogs'}
{'favorite': 'False', 'title': 'ALIEN'}
{'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
{'favorite': 'FALSE', 'title': 'American Psycho'}
{'favorite': 'False', 'title': 'Batman: The Movie'}
{'favorite': 'True', 'title': 'Easy A'}
{'favorite': 'True', 'title': 'Dinner for SCHMUCKS'}
{'favorite': 'False', 'title': 'Ghostbusters'}
{'favorite': 'True', 'title': 'Robin Hood: Prince of Thieves'}


In [67]:
for movie in root.findall("./genre/decade/movie/[year='1992']"):
    print(movie.attrib)

{'favorite': 'True', 'title': 'Batman Returns'}
{'favorite': 'False', 'title': 'Reservoir Dogs'}


In [68]:
for element in root.findall("./genre/decade[@years='1990s']/movie/."):
    print(element.attrib)

{'favorite': 'False', 'title': 'X-Men'}
{'favorite': 'True', 'title': 'Batman Returns'}
{'favorite': 'False', 'title': 'Reservoir Dogs'}
{'favorite': 'True', 'title': 'Robin Hood: Prince of Thieves'}
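A sketch of the predicate forms used in these cells, plus the attribute-presence test `[@attr]` and the 1-based position test `[n]`, all within ElementTree's XPath subset. The tree is a small stand-in, and `titles()` is a hypothetical helper for printing:

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring("""<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie title="Batman Returns"><year>1992</year></movie>
      <movie title="X-Men"><year>2000</year></movie>
    </decade>
  </genre>
</collection>""")

def titles(path):
    """Hypothetical helper: title attribute of every element matched by `path`."""
    return [m.get("title") for m in demo_root.findall(path)]

by_child_text = titles("./genre/decade/movie[year='1992']")     # child <year> has this text
by_attr_value = titles("./genre/decade[@years='1990s']/movie")  # attribute equals a value
by_attr_exists = titles("./genre/decade/movie[@title]")         # attribute is present
by_position = titles("./genre/decade/movie[2]")                 # second <movie> in its parent

print(by_child_text, by_attr_value, by_attr_exists, by_position)
```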


In [69]:
# '..' selects the parent: 1 step back (movie -> decade)
for element in root.findall("./genre/decade[@years='1990s']/movie/.."):
    print(element.attrib)
    print(element.text)

{'years': '1990s'}

            
{'years': '1990s'}

            


In [70]:
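# '...' is NOT 2 steps back: ElementTree reads it as '..' followed by '.', so this is still the parent (same output as above)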
for element in root.findall("./genre/decade[@years='1990s']/movie/..."):
    print(element.attrib)
    print(element.text)

{'years': '1990s'}

            
{'years': '1990s'}

            


In [71]:
# '....' is read as '..' twice, i.e. 2 steps back (movie -> decade -> genre); '../..' writes this explicitly
for element in root.findall("./genre/decade[@years='1990s']/movie/...."):
    print(element.attrib)
    print(element.text)

{'category': 'Action'}

        
{'category': 'Comedy'}

        

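ElementTree's XPath has no `...` or `....` operator: its tokenizer splits `...` into `..` followed by `.`, and `....` into `..` twice, which explains the outputs of the last two cells. A sketch confirming this on a stand-in tree, alongside the explicit `../..` spelling (`tags()` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring("""<collection>
  <genre category="Action">
    <decade years="1990s"><movie title="Batman Returns"/></decade>
  </genre>
</collection>""")

base = "./genre/decade[@years='1990s']/movie/"

def tags(suffix):
    """Hypothetical helper: tags of the elements reached by appending `suffix` to the movie path."""
    return [e.tag for e in demo_root.findall(base + suffix)]

print(tags(".."))     # parent
print(tags("..."))    # '..' then '.', still the parent
print(tags("...."))   # '..' then '..', the grandparent
print(tags("../.."))  # the explicit way to go up two levels
```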

In [72]:
for element in root.findall("./genre/decade[@years='1990s']/movie/format/."):
    print(element.attrib)
    print(element.text)

{'multiple': 'Yes'}
dvd, digital
{'multiple': 'No'}
VHS
{'multiple': 'No'}
Online
{'multiple': 'No'}
Blu_Ray


###### Correcting inaccuracies in the XML

In [73]:
for element in root.findall("./genre/decade/movie/format[@multiple='No']/.."):
    print()
    print(element.attrib['title'])
    print("******")
    for sub_element in element:
        if sub_element.attrib!={}:
            print(sub_element.tag,": ",sub_element.attrib)
        print(sub_element.tag,": ",sub_element.text)


Indiana Jones: The raiders of the lost Ark
******
format :  {'multiple': 'No'}
format :  DVD
year :  1981
rating :  PG
description :  
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                

Batman Returns
******
format :  {'multiple': 'No'}
format :  VHS
year :  1992
rating :  PG13
description :  NA.

Reservoir Dogs
******
format :  {'multiple': 'No'}
format :  Online
year :  1992
rating :  R
description :  WhAtEvER I Want!!!?!

Ferris Bueller's Day Off
******
format :  {'multiple': 'No'}
format :  DVD
year :  1986
rating :  PG13
description :  Funny movie about a funny guy

American Psycho
******
format :  {'multiple': 'No'}
format :  blue-ray
year :  2000
rating :  Unrated
description :  psychopathic Bateman

Easy A
******
format :  {'multiple': 'No'}
format :  DVD
year :  2010
rating :  PG--13
description :  Emma Stone = Hester Prynne

Ghostbus

In [74]:
# find the mistake: a format that lists several comma-separated values but is marked multiple='No'
for element in root.findall("./genre/decade/movie"):
    for sub_element in element:
        if sub_element.tag=='format':
            if re.search(",",sub_element.text) and sub_element.attrib['multiple']=='No':
                print(element.attrib['title'])
                print("****")
                print(sub_element.attrib, sub_element.text)

Ghostbusters
****
{'multiple': 'No'} Online,VHS


In [75]:
# find the mistake and print every child of the affected movie
for element in root.findall("./genre/decade/movie"):
    for sub_element in element:
        if sub_element.tag=='format':
            if re.search(",",sub_element.text) and sub_element.attrib['multiple']=='No':
                print(element.attrib['title'])
                print("****")
                for child in element:   # separate name so the outer loop variable is not shadowed
                    print(child.tag,":")
                    if child.attrib!={}:
                        print(child.attrib)
                    if child.text:      # skips both None and ''
                        print(child.text)
                    print()
                    print()

Ghostbusters
****
format :
{'multiple': 'No'}
Online,VHS

year :
1984

rating :
PG

description :
Who ya gonna call?



In [76]:
# fix the mistake: recompute 'multiple' from the text (this also turns non-standard values such as 'False' into 'No')
for form in root.findall("./genre/decade/movie/format"):
    # Search for the commas in the format text
    match = re.search(',',form.text)
    if match:
        form.set('multiple','Yes')
    else:
        form.set('multiple','No')
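The same check-and-fix can be done without `re`, since a plain `','` membership test is enough. A sketch on a stand-in tree; `normalise_multiple` is a hypothetical name, and it also reports which titles it changed, which makes the fix easy to verify:

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring("""<collection><genre category="Comedy"><decade years="1980s">
  <movie title="Ghostbusters"><format multiple="No">Online,VHS</format></movie>
  <movie title="Back 2 the Future"><format multiple="False">Blu-ray</format></movie>
</decade></genre></collection>""")

def normalise_multiple(root):
    """Hypothetical helper: set multiple='Yes' if <format> lists comma-separated values, else 'No'.

    Returns the titles whose attribute actually changed."""
    changed = []
    for movie in root.findall("./genre/decade/movie"):
        fmt = movie.find("format")
        expected = "Yes" if "," in (fmt.text or "") else "No"
        if fmt.get("multiple") != expected:
            fmt.set("multiple", expected)
            changed.append(movie.get("title"))
    return changed

print(normalise_multiple(demo_root))
```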

In [77]:
# check the corrected Ghostbusters entry
for element in root.findall("./genre/decade/movie"):
    if element.attrib['title']=='Ghostbusters':
        print(ET.tostring(element,encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<movie favorite="False" title="Ghostbusters">
                <format multiple="Yes">Online,VHS</format>
                <year>1984</year>
                <rating>PG</rating>
                <description>Who ya gonna call?</description>
            </movie>
        


In [78]:
# write the corrected tree to a new file (movies_incorrect.xml is left unchanged)
tree.write("movies_correct.xml")
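With no `encoding` argument, as in the cell above, `write()` uses US-ASCII, writes no `<?xml ...?>` declaration, and stores non-ASCII characters (such as the curly quotes in the *Dinner for SCHMUCKS* description) as numeric character references. A sketch writing to in-memory buffers instead of files, showing both the default and `encoding="utf-8"` with `xml_declaration=True`:

```python
import io
import xml.etree.ElementTree as ET

demo_tree = ET.ElementTree(ET.fromstring('<collection><genre category="Action"/></collection>'))

buffer = io.BytesIO()
# encoding='utf-8' makes write() emit UTF-8 bytes; xml_declaration=True adds the <?xml ...?> line
demo_tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)

ascii_buf = io.BytesIO()
# default encoding (us-ascii): non-ASCII characters become &#NNNN; references
ET.ElementTree(ET.fromstring("<d>\u201cidiots\u201d</d>")).write(ascii_buf)
print(ascii_buf.getvalue())
```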

###### Moving elements 

In [79]:
for decade in root.findall("./genre/decade"):
    print(decade.attrib)
    for year in decade.findall("./movie/year"):
        print(year.text, '\n')

{'years': '1980s'}
1981 

1984 

1985 

{'years': '1990s'}
2000 

1992 

1992 

{'years': '1970s'}
1979 

{'years': '1980s'}
1986 

2000 

{'years': '1960s'}
1966 

{'years': '2010s'}
2010 

2011 

{'years': '1980s'}
1984 

{'years': '1990s'}
1991 



In [80]:
for genre in root.findall('./genre'):
    print(genre.attrib)

{'category': 'Action'}
{'category': 'Thriller'}
{'category': 'Comedy'}


In [81]:
for genre in root.findall("./genre"):
    for decade in genre.findall("./decade"):
        for movie in decade.findall("./movie/[year='2000']"):
            print(movie.attrib['title'])
            print(genre.attrib)
            print(decade.attrib)
            for sub_element in movie:
                if sub_element.tag=='year':
                    print(sub_element.text)

X-Men
{'category': 'Action'}
{'years': '1990s'}
2000
American Psycho
{'category': 'Thriller'}
{'years': '1980s'}
2000


*X-Men (Action) and American Psycho (Thriller), both released in 2000, are filed under the wrong decades (1990s and 1980s respectively)*
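Rather than spotting misfiled movies by eye, the decade implied by each `<year>` can be compared with the `years` attribute of the enclosing `decade`. A sketch on a stand-in tree; it assumes every `years` value has the `'1990s'` form used in this file:

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring("""<collection>
  <genre category="Action"><decade years="1990s">
    <movie title="Batman Returns"><year>1992</year></movie>
    <movie title="X-Men"><year>2000</year></movie>
  </decade></genre>
  <genre category="Thriller"><decade years="1980s">
    <movie title="American Psycho"><year>2000</year></movie>
  </decade></genre>
</collection>""")

misfiled = []
for genre_el in demo_root.findall("./genre"):
    for decade_el in genre_el.findall("./decade"):
        for movie_el in decade_el.findall("./movie"):
            year = int(movie_el.findtext("year"))
            expected = f"{year // 10 * 10}s"          # e.g. 2000 -> '2000s' (assumed format)
            if decade_el.get("years") != expected:
                misfiled.append((movie_el.get("title"), genre_el.get("category"),
                                 decade_el.get("years"), expected))

for row in misfiled:
    print(row)
```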

###### Adding a SubElement to a tree

###### Creating new `decade` elements for the 2000s (`new_dec_action` and `new_dec_thriller`) in the Action and Thriller genres respectively

In [82]:
action = root.find("./genre[@category='Action']")
thriller = root.find("./genre[@category='Thriller']")
new_dec_action = ET.SubElement(action, 'decade')
new_dec_thriller = ET.SubElement(thriller, 'decade')
new_dec_action.attrib["years"] = '2000s'
new_dec_thriller.attrib["years"] = '2000s'
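`ET.SubElement` can also take the attributes directly, either as keyword arguments or as an `attrib` dict in the third position, so creating the element and setting `years` can be a single call; `.set()` is the method form of assigning into `.attrib`. A sketch on a throwaway parent (the decade values are illustrative):

```python
import xml.etree.ElementTree as ET

demo_genre = ET.Element("genre", category="Action")

# Equivalent ways to create a <decade years="..."> as the last child of <genre>
d1 = ET.SubElement(demo_genre, "decade", years="2000s")        # keyword argument
d2 = ET.SubElement(demo_genre, "decade", {"years": "2010s"})   # attrib dict
d3 = ET.SubElement(demo_genre, "decade")
d3.set("years", "2020s")                                       # set the attribute afterwards

xml_out = ET.tostring(demo_genre, encoding="unicode")
print(xml_out)
```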

In [83]:
print(ET.tostring(action, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                </description>
            </movie>
               <movie favorite="True" title="THE KARATE KID">
               <format multiple="Yes">DVD,Online</format>
               <year>1984</year>
               <rating>PG</rating>
               <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
               <format multiple="No">Blu-ray</format>
               <year>1985</year>
               <rating

In [84]:
print(ET.tostring(thriller, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<genre category="Thriller">
        <decade years="1970s">
            <movie favorite="False" title="ALIEN">
                <format multiple="No">DVD</format>
                <year>1979</year>
                <rating>R</rating>
                <description>"""""""""</description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie about a funny guy</description>
            </movie>
            <movie favorite="FALSE" title="American Psycho">
                <format multiple="No">blue-ray</format>
                <year>2000</year>
                <rating>Unrated</rating>
                <description>psychopathic Bateman</description>
            </movie>
        </decade>
    <decade years="2000s" />

- Use `.append()` to add the 'X-Men' movie to the new 2000s decade under Action

In [85]:
xmen = root.find("./genre/decade/movie[@title='X-Men']")
dec2000s = root.find("./genre[@category='Action']/decade[@years='2000s']")
dec2000s.append(xmen)

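- Use `.remove()` to remove the same film from the 1990s decade: `.append()` does not detach an element from its old parent, so until this step X-Men is listed in both decades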

In [86]:
dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")
dec1990s.remove(xmen)
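Wrapping the append-then-remove pair in one function keeps the two steps together. A sketch on a stand-in tree; `move()` is a hypothetical helper, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring("""<genre category="Action">
  <decade years="1990s"><movie title="X-Men"/></decade>
  <decade years="2000s"/>
</genre>""")

def move(elem, old_parent, new_parent):
    """Hypothetical helper: re-parent `elem` by appending it to the new parent, then removing it from the old one."""
    new_parent.append(elem)      # elem is now a child of both parents
    old_parent.remove(elem)      # remove() raises ValueError if elem is not a child

old_dec = demo_root.find("./decade[@years='1990s']")
new_dec = demo_root.find("./decade[@years='2000s']")
demo_movie = old_dec.find("movie[@title='X-Men']")

move(demo_movie, old_dec, new_dec)
print(len(old_dec), len(new_dec))   # number of children in each decade
```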

In [87]:
print(ET.tostring(action, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                </description>
            </movie>
               <movie favorite="True" title="THE KARATE KID">
               <format multiple="Yes">DVD,Online</format>
               <year>1984</year>
               <rating>PG</rating>
               <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
               <format multiple="No">Blu-ray</format>
               <year>1985</year>
               <rating

In [88]:
# repeat the same steps for 'American Psycho'
american_psycho = root.find("./genre/decade/movie[@title='American Psycho']")
dec2000s = root.find("./genre[@category='Thriller']/decade[@years='2000s']")
dec2000s.append(american_psycho)

In [89]:
dec1980s = root.find("./genre[@category='Thriller']/decade[@years='1980s']")

In [90]:
print(ET.tostring(dec1980s, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie about a funny guy</description>
            </movie>
            <movie favorite="FALSE" title="American Psycho">
                <format multiple="No">blue-ray</format>
                <year>2000</year>
                <rating>Unrated</rating>
                <description>psychopathic Bateman</description>
            </movie>
        </decade>
    


In [91]:
dec1980s.remove(american_psycho)

In [92]:
print(ET.tostring(dec1980s, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie about a funny guy</description>
            </movie>
            </decade>
    


In [93]:
#writing back to the xml
tree.write("movies_correct.xml")

###### Python's built-in XML library no. 2: `xml.dom.minidom`

- The Document Object Model (DOM) is another API for accessing and processing XML; `xml.dom.minidom` is the standard library's minimal implementation of it
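A minimal `minidom` sketch on an inline string (`parseString` instead of `parse`, so no file is needed): the root is `documentElement`, `getElementsByTagName` searches all descendants, and `getAttribute` returns an empty string rather than `None` when the attribute is missing. The data is an illustrative stand-in.

```python
from xml.dom import minidom

demo_dom = minidom.parseString(
    '<collection><genre category="Action"><decade years="1980s"/></genre>'
    '<genre category="Comedy"/></collection>'
)

root_el = demo_dom.documentElement                  # the <collection> element
genres = demo_dom.getElementsByTagName("genre")     # all <genre> elements, at any depth
categories = [g.getAttribute("category") for g in genres]
missing = genres[0].getAttribute("no_such_attr")    # '' rather than None

print(root_el.nodeName, genres.length, categories, repr(missing))
```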

In [94]:
from xml.dom import minidom

In [95]:
dom = minidom.parse("movies_correct.xml")

In [96]:
print("Print the entire dom contents removing the newline")
print(dom.toprettyxml(newl='')) #newline is empty

Print the entire dom contents removing the newline
<?xml version="1.0" ?><collection>	
    	<genre category="Action">		
        		<decade years="1980s">			
            			<movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">				
                				<format multiple="No">DVD</format>				
                				<year>1981</year>				
                				<rating>PG</rating>				
                				<description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                </description>				
            			</movie>			
               			<movie favorite="True" title="THE KARATE KID">				
               				<format multiple="Yes">DVD,Online</format>				
               				<year>1984</year>				
               				<rating>PG</rating>				
               				<description>None provided.</description>				
            			</movie>			
            			<movie

In [397]:
print("Get the root node using `documentElement`")
print(dom.documentElement)
print(dom.documentElement.nodeName)

Get the root node using `documentElement`
<DOM Element: collection at 0x1c80aff7a60>
collection


In [97]:
print("Getting the child nodes of the root node")
dom.documentElement.childNodes

Getting the child nodes of the root node


[<DOM Text node "'\n    '">,
 <DOM Element: genre at 0x2b5dcfeacaf8>,
 <DOM Text node "'\n\n    '">,
 <DOM Element: genre at 0x2b5dcfef2048>,
 <DOM Text node "'\n    '">,
 <DOM Element: genre at 0x2b5dcfef2b90>,
 <DOM Text node "'\n'">]

In [103]:
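# one crude way to drop the whitespace 'Text' nodes: keep every second node. minidom stores the file's indentation as separate Text nodes (ElementTree keeps it in .text/.tail instead)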
dom.documentElement.childNodes = dom.documentElement.childNodes[1:6:2]

In [104]:
dom.documentElement.childNodes

[<DOM Element: genre at 0x2b5dcfeacaf8>,
 <DOM Element: genre at 0x2b5dcfef2048>,
 <DOM Element: genre at 0x2b5dcfef2b90>]

In [110]:
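# better way to remove those 'Text nodes': filter on nodeType instead of slicing
dom = minidom.parse("movies_correct.xml") # re-initializing dom variable
print(dom.childNodes)  # the Document node's only child is the root element
corrected_root_childnodes = [child_node for child_node in dom.documentElement.childNodes
                             if child_node.nodeType == child_node.ELEMENT_NODE]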

[<DOM Element: collection at 0x2b5dcfef7e88>]


In [111]:
#genre elements
print(dom.getElementsByTagName('genre'))
print()
print(dom.getElementsByTagName('genre').length)
print()
print([each_node.getAttribute("category") for each_node in dom.getElementsByTagName('genre')])
print()
print([(each_node.getAttribute('category'),child_node.nodeName,child_node.getAttribute('years')) for each_node in dom.getElementsByTagName('genre') for child_node in each_node.childNodes if child_node.nodeName!='#text'])

[<DOM Element: genre at 0x2b5dcfef7df0>, <DOM Element: genre at 0x2b5de449ddf0>, <DOM Element: genre at 0x2b5de44a09c8>]

3

['Action', 'Thriller', 'Comedy']

[('Action', 'decade', '1980s'), ('Action', 'decade', '1990s'), ('Action', 'decade', '2000s'), ('Thriller', 'decade', '1970s'), ('Thriller', 'decade', '1980s'), ('Thriller', 'decade', '2000s'), ('Comedy', 'decade', '1960s'), ('Comedy', 'decade', '2010s'), ('Comedy', 'decade', '1980s'), ('Comedy', 'decade', '1990s')]


In [118]:
# print every movie together with its genre and decade
genre_element = dom.getElementsByTagName('genre')
for genre in genre_element:
    for decade in genre.childNodes:
        if decade.nodeName!='#text':
            for movie in decade.childNodes:
                if movie.nodeName!='#text':
                    print(movie.tagName,":",movie.getAttribute('title'))
                    print(genre.tagName,":",genre.getAttribute('category'))
                    print(decade.tagName,":",decade.getAttribute('years'))

movie : Indiana Jones: The raiders of the lost Ark
genre : Action
decade : 1980s
movie : THE KARATE KID
genre : Action
decade : 1980s
movie : Back 2 the Future
genre : Action
decade : 1980s
movie : Batman Returns
genre : Action
decade : 1990s
movie : Reservoir Dogs
genre : Action
decade : 1990s
movie : X-Men
genre : Action
decade : 2000s
movie : ALIEN
genre : Thriller
decade : 1970s
movie : Ferris Bueller's Day Off
genre : Thriller
decade : 1980s
movie : American Psycho
genre : Thriller
decade : 2000s
movie : Batman: The Movie
genre : Comedy
decade : 1960s
movie : Easy A
genre : Comedy
decade : 2010s
movie : Dinner for SCHMUCKS
genre : Comedy
decade : 2010s
movie : Ghostbusters
genre : Comedy
decade : 1980s
movie : Robin Hood: Prince of Thieves
genre : Comedy
decade : 1990s
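The repeated `nodeName != '#text'` checks above can be factored into a small helper. A sketch on a stand-in document; `element_children` is a hypothetical name, and filtering on `nodeType == Node.ELEMENT_NODE` also skips comment nodes, which a `'#text'` check would let through:

```python
from xml.dom import minidom, Node

demo_dom = minidom.parseString("""<collection>
    <genre category="Action">
        <!-- a comment node -->
        <decade years="1980s">
            <movie title="THE KARATE KID"/>
        </decade>
    </genre>
</collection>""")

def element_children(node):
    """Hypothetical helper: the child nodes that are elements (no text or comment nodes)."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]

rows = []
for genre_el in element_children(demo_dom.documentElement):
    for decade_el in element_children(genre_el):
        for movie_el in element_children(decade_el):
            rows.append((movie_el.getAttribute("title"),
                         genre_el.getAttribute("category"),
                         decade_el.getAttribute("years")))
print(rows)
```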


###### Summary: commands learnt

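`ET.parse(filename)` -- to parse an XML file into an `ElementTree` <br>
`tree.getroot()` -- to get the root element <br>
`root.iter()` -- returns an iterator over the root and every element below it, in document order; wrap it in `list()` to get a list <br>
`root.iter('sub_element')` -- the same, restricted to elements with that tag at any depth below the root <br>
`root.findall(xpath)` -- returns a list of the elements matching an XPath expression (ElementTree supports a limited subset of XPath) <br>
`root.find(xpath)` -- returns the first element matching the XPath expression, or `None` <br>
`element.attrib` and `element.text` -- attributes, not methods: the element's attributes as a dict and its text content <br>
`element.set(name, value)` -- to add or change an attribute <br>
`ET.SubElement(root.find(xpath), subelement_name)` -- to create a new element as the last child of `root.find(xpath)` <br>
`.append()` and `.remove()` -- to add or remove a child element; used together to move an element from one parent to another <br>
`ET.tostring(element)` and `tree.write(filename)` -- to serialise an element, or write the whole tree to a file <br>
`minidom.parse()`, `dom.documentElement`, `getElementsByTagName()` and `getAttribute()` -- the `xml.dom.minidom` equivalents used above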