# XML

    XML - eXtensible Markup Language
        - designed to store and transport data.
        - often used for distributing data over the
           Internet (especially in web development).

    
XML vs HTML
    XML : is used to store or transport data, so XML complements HTML.
    HTML: is used to format and display the same data.

- XML Tags are Case Sensitive
- All XML Elements Must Have a Closing Tag
    <p>This is a paragraph.</p>
    <br />  <!-- This is a self-closing tag -->
- XML Attribute Values Must Always be Quoted
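
The three rules above can be checked with the standard-library parser, which raises `ParseError` on input that is not well-formed. This is a minimal sketch; the sample snippets and the `is_well_formed` helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets, one per rule listed above.
samples = {
    "well-formed": '<p class="intro">This is a paragraph.<br /></p>',
    "case mismatch": "<p>This is a paragraph.</P>",
    "missing closing tag": "<p>This is a paragraph.<br></p>",
    "unquoted attribute": "<p class=intro>This is a paragraph.</p>",
}


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```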

In [1]:
import xml

help(xml)

Help on package xml:

NAME
    xml - Core XML support for Python.

MODULE REFERENCE
    https://docs.python.org/3.10/library/xml.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This package contains four sub-packages:
    
    dom -- The W3C Document Object Model.  This supports DOM Level 1 +
           Namespaces.
    
    parsers -- Python wrappers for XML parsers (currently only supports Expat).
    
    sax -- The Simple API for XML, developed by XML-Dev, led by David
           Megginson and ported to Python by Lars Marius Garshol.  This
           supports the SAX 2 API.
    
    etree -- The ElementTree XML library.  This is a subset of the full
           ElementTree XML release.

PAC

In [2]:
dir(xml)

['__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'dom',
 'parsers']
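
The listing above contains only some of the four sub-packages because a sub-package becomes an attribute of `xml` only once something has imported it. Which names appear before an explicit import depends on the session, so the sketch below checks only the state after the import.

```python
import xml
import xml.etree.ElementTree  # importing a submodule binds `etree` on the package

has_etree = "etree" in dir(xml)
print(has_etree)
```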

In [1]:
import xml.etree.ElementTree as ET

In [2]:
root = ET.Element("root")

child = ET.SubElement(root, "child")
child2 = ET.SubElement(root, "child2")

In [3]:
root

<Element 'root' at 0x7f2a38527ce0>

In [4]:
# xml string

ET.tostring(root)

b'<root><child /><child2 /></root>'

In [5]:
result_str = ET.tostring(root)

print(result_str)

b'<root><child /><child2 /></root>'


In [6]:
result_str = ET.tostring(root).decode("utf-8")
print(result_str)

<root><child /><child2 /></root>


In [7]:
# To write an xml file
with open("a_first.xml", "w") as fh:
    fh.write(result_str)
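
As an alternative to decoding the bytes and writing them by hand, `ElementTree.write` can serialize the tree directly and add an XML declaration. This is a sketch only; the temporary directory and the file name `a_first_decl.xml` are assumptions chosen for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "child")
ET.SubElement(root, "child2")

# Hypothetical output location; a temporary directory keeps the example tidy.
out_path = os.path.join(tempfile.mkdtemp(), "a_first_decl.xml")

# Wrap the root element in an ElementTree and let it write the file,
# including the <?xml ...?> declaration.
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as fh:
    written = fh.read()
print(written)
```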

### Another Example

In [8]:
rt = ET.Element("root")

ch1 = ET.SubElement(rt, "child1")
ch2 = ET.SubElement(rt, "child2")

ET.tostring(rt)

b'<root><child1 /><child2 /></root>'

In [10]:
rt = ET.Element("root")

ch1 = ET.SubElement(rt, "child1")
ch1.text = "This is child1"
ch2 = ET.SubElement(rt, "child2")

ET.tostring(rt).decode("utf-8")

'<root><child1>This is child1</child1><child2 /></root>'

In [12]:
rt = ET.Element("root")

ch1 = ET.SubElement(rt, "child1")
ch1.text = "This is child1"
ch2 = ET.SubElement(rt, "child2")
ch2.text = "This is child2"

print(ET.tostring(rt).decode("utf-8"))

<root><child1>This is child1</child1><child2>This is child2</child2></root>


In [14]:
with open("b_second.xml", "wb") as fh:
    fh.write(ET.tostring(rt))
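
The elements built so far carry only text. The sketch below adds attributes, both as a keyword argument to `SubElement` and with `set()`, and shows that special characters such as `&` are escaped on output. The `book`, `isbn`, and `lang` names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

rt = ET.Element("root")
book = ET.SubElement(rt, "book", isbn="0-596-00128-2")  # attribute via keyword
book.set("lang", "en")                                   # attribute via set()
book.text = "Python & XML"                               # '&' is escaped on output

# encoding="unicode" returns a str instead of bytes.
xml_text = ET.tostring(rt, encoding="unicode")
print(xml_text)

# Parsing the string back restores the original text and attributes.
round_trip = ET.fromstring(xml_text).find("book")
print(round_trip.text, round_trip.attrib)
```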

### Pretty-printing

In [15]:
from xml.dom import minidom


xmlstr = minidom.parseString(ET.tostring(rt))

print(xmlstr)

<xml.dom.minidom.Document object at 0x7f2a385a0280>


In [17]:
res = xmlstr.toprettyxml()

print(res)

<?xml version="1.0" ?>
<root>
	<child1>This is child1</child1>
	<child2>This is child2</child2>
</root>



In [18]:
with open("b_third.xml", "w") as fh:
    fh.write(res)
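
On Python 3.9 and later, `ET.indent` offers another way to pretty-print without going through `minidom`. A minimal sketch, assuming the same two-child tree as above and a four-space indent:

```python
import xml.etree.ElementTree as ET

rt = ET.Element("root")
ET.SubElement(rt, "child1").text = "This is child1"
ET.SubElement(rt, "child2").text = "This is child2"

# ET.indent modifies the tree in place by inserting whitespace
# between elements.
ET.indent(rt, space="    ")

pretty = ET.tostring(rt, encoding="unicode")
print(pretty)
```

Unlike `toprettyxml()`, this does not add an XML declaration.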

##  Parsing XML

In [19]:
import xml.etree.ElementTree as ET

tree = ET.parse("books.xml")

In [20]:
tree

<xml.etree.ElementTree.ElementTree at 0x7f2a385635e0>

In [21]:
# Find all <book> elements that are direct children of the root
tree.findall("book")

[<Element 'book' at 0x7f2a38578950>,
 <Element 'book' at 0x7f2a3859fe20>,
 <Element 'book' at 0x7f2a3859ff10>,
 <Element 'book' at 0x7f2a3859d1c0>,
 <Element 'book' at 0x7f2a3859e750>,
 <Element 'book' at 0x7f2a3859f510>]

In [22]:
tree.findall("title")

[]
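
The empty list is expected: a bare tag name in `findall` matches only direct children of the root, and `title` sits one level deeper, inside `book`. The sketch below shows three ways to reach nested elements. The inline two-book catalog is a hypothetical stand-in for `books.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book catalog shaped like books.xml.
catalog = ET.fromstring("""
<catalog>
    <book isbn="0-596-00128-2"><title>Python &amp; XML</title></book>
    <book isbn="0-596-15810-6"><title>Programming Python, 4th Edition</title></book>
</catalog>""")

direct = catalog.findall("title")        # direct children of <catalog> only
nested = catalog.findall("book/title")   # explicit path through <book>
anywhere = catalog.findall(".//title")   # any depth below the root
titles = [t.text for t in catalog.iter("title")]  # iter() also walks the whole tree

print(len(direct), len(nested), len(anywhere))
print(titles)
```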

In [31]:
books = {}
for each in tree.findall("book"):
    # print(type(each), each)
    # print(each.tag, each.attrib["isbn"])
    isbn = each.attrib["isbn"]
    for each_sub in each.findall("title"):
        book_title = each_sub.text
        books[isbn] = book_title

books

{'0-596-00128-2': 'Python & XML',
 '0-596-15810-6': 'Programming Python, 4th Edition',
 '0-596-15806-8': 'Learning Python, 4th Edition',
 '0-596-15808-4': 'Python Pocket Reference, 4th Edition',
 '0-596-00797-3': 'Python Cookbook, 2nd Edition',
 '0-596-10046-9': 'Python in a Nutshell, 2nd Edition'}

In [32]:
# Assignment: Enhance it by changing the values of books dict
# as a tuple containing book_title, author and date

{"0-596-00128-2": ("Python & XML", "Jones, Drake", "December 2001")}

{'0-596-00128-2': ('Python & XML', 'Jones, Drake', 'December 2001')}
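
One possible approach to the assignment is sketched below. It assumes each `book` element has `title`, `author`, and `date` children, and it uses a hypothetical one-book inline catalog in place of `books.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-book catalog; the child element names are an assumption
# about the layout of books.xml.
catalog = ET.fromstring("""<catalog>
    <book isbn="0-596-00128-2">
        <title>Python &amp; XML</title>
        <date>December 2001</date>
        <author>Jones, Drake</author>
    </book>
</catalog>""")

books = {}
for book in catalog.findall("book"):
    isbn = book.get("isbn")
    # findtext returns the text of the first matching child element.
    books[isbn] = (
        book.findtext("title"),
        book.findtext("author"),
        book.findtext("date"),
    )

print(books)
```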

### Parse XML String

In [33]:
input_string = """
<stuff>
    <users>
        <user x="2">
            <id>001</id>
            <name>Udhay</name>
        </user>
        <user x="7">
            <id>009</id>
            <name>Prakash</name>
        </user>
    </users>
</stuff>"""

In [36]:
stuff_tree = ET.fromstring(input_string)

stuff_tree

<Element 'stuff' at 0x7f2a385dc7c0>

In [38]:
nodes = stuff_tree.findall("users/user")

nodes

[<Element 'user' at 0x7f2a385dc450>, <Element 'user' at 0x7f2a385dc9f0>]

In [39]:
print("User count:", len(nodes))

for item in nodes:
    print("\nName", item.find("name").text)
    print("Id", item.find("id").text)
    print("Attribute", item.get("x"))

User count: 2

Name Udhay
Id 001
Attribute 2

Name Prakash
Id 009
Attribute 7
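
Instead of looping over every node, the limited XPath support in `ElementTree` can select an element by attribute value or by the text of a child. A short sketch using the same `stuff` structure as above:

```python
import xml.etree.ElementTree as ET

stuff = ET.fromstring("""<stuff>
    <users>
        <user x="2"><id>001</id><name>Udhay</name></user>
        <user x="7"><id>009</id><name>Prakash</name></user>
    </users>
</stuff>""")

# [@x='7'] keeps only <user> elements whose x attribute equals "7".
match = stuff.find("users/user[@x='7']")
name = match.findtext("name")

# [name='Udhay'] keeps only <user> elements with a <name> child of that text.
by_name = stuff.find("users/user[name='Udhay']")

print(name, by_name.get("x"))
```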


In [40]:
data = """
<person>
  <name>Guido van Rossum</name>
  <phone type="intl">
     +1 734 808 5456
   </phone>
   <email hide="yes"/>
</person>"""

tree = ET.fromstring(data)

print("Name:", tree.find("name").text)
print("Attr:", tree.find("email").get("hide"))

Name: Guido van Rossum
Attr: yes
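
Element text keeps the whitespace found in the source, and `find` returns `None` when nothing matches, so calling `.text` on the result can fail. The sketch below uses `findtext` with a default and `strip()` to handle both cases. The `fax` tag and the sample name are hypothetical.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring("""<person>
  <name>Sample Person</name>
  <phone type="intl">
     +1 734 808 5456
   </phone>
   <email hide="yes"/>
</person>""")

# strip() removes the newlines and indentation around the number.
phone = person.findtext("phone", default="").strip()

# There is no <fax> element, so the default is returned instead of raising.
fax = person.findtext("fax", default="n/a")

missing = person.find("fax")             # None: no such element
email_text = person.find("email").text   # None: the element exists but is empty

print(repr(phone), fax, missing, email_text)
```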


### Using lxml module

In [41]:
! pip install -U lxml --user

Collecting lxml
  Downloading lxml-4.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.1 MB)
Installing collected packages: lxml
Successfully installed lxml-4.9.2


In [43]:
# import lxml

from lxml import etree as ET

In [44]:
ET

<module 'lxml.etree' from '/home/codespace/.local/lib/python3.10/site-packages/lxml/etree.cpython-310-x86_64-linux-gnu.so'>

In [45]:
# creating the XML

rt = ET.Element("root")

ch1 = ET.Element("child1")
rt.append(ch1)

rt

<Element root at 0x7f2a385365c0>

In [47]:
xml_str = ET.tostring(rt, pretty_print=True).decode("utf-8")

print(xml_str)

<root>
  <child1/>
</root>


In [50]:
# creating the XML

rt = ET.Element("root")

ch1 = ET.Element("child1")
rt.append(ch1)

# another child with text
child2 = ET.Element("child2")
child2.text = "some text"


rt.append(child2)


# pretty string
s = ET.tostring(rt, pretty_print=True)
# print(s)
print(s.decode("utf-8"))

<root>
  <child1/>
  <child2>some text</child2>
</root>

In [51]:
# Assignment: use the xmltodict module to parse XML

In [52]:
! pip install -U xmltodict --user

Collecting xmltodict
  Downloading xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB)
Installing collected packages: xmltodict
Successfully installed xmltodict-0.13.0


In [53]:
import xmltodict

with open("books.xml", "r") as fh:
    file_content = fh.read()
    doc = xmltodict.parse(file_content)

print(doc)

{'catalog': {'book': [{'@isbn': '0-596-00128-2', 'title': 'Python & XML', 'date': 'December 2001', 'author': 'Jones, Drake'}, {'@isbn': '0-596-15810-6', 'title': 'Programming Python, 4th Edition', 'date': 'October 2010', 'author': 'Lutz'}, {'@isbn': '0-596-15806-8', 'title': 'Learning Python, 4th Edition', 'date': 'September 2009', 'author': 'Lutz'}, {'@isbn': '0-596-15808-4', 'title': 'Python Pocket Reference, 4th Edition', 'date': 'October 2009', 'author': 'Lutz'}, {'@isbn': '0-596-00797-3', 'title': 'Python Cookbook, 2nd Edition', 'date': 'March 2005', 'author': 'Martelli, Ravenscroft, Ascher'}, {'@isbn': '0-596-10046-9', 'title': 'Python in a Nutshell, 2nd Edition', 'date': 'July 2006', 'author': 'Martelli'}]}}


In [54]:
doc

{'catalog': {'book': [{'@isbn': '0-596-00128-2',
    'title': 'Python & XML',
    'date': 'December 2001',
    'author': 'Jones, Drake'},
   {'@isbn': '0-596-15810-6',
    'title': 'Programming Python, 4th Edition',
    'date': 'October 2010',
    'author': 'Lutz'},
   {'@isbn': '0-596-15806-8',
    'title': 'Learning Python, 4th Edition',
    'date': 'September 2009',
    'author': 'Lutz'},
   {'@isbn': '0-596-15808-4',
    'title': 'Python Pocket Reference, 4th Edition',
    'date': 'October 2009',
    'author': 'Lutz'},
   {'@isbn': '0-596-00797-3',
    'title': 'Python Cookbook, 2nd Edition',
    'date': 'March 2005',
    'author': 'Martelli, Ravenscroft, Ascher'},
   {'@isbn': '0-596-10046-9',
    'title': 'Python in a Nutshell, 2nd Edition',
    'date': 'July 2006',
    'author': 'Martelli'}]}}

In [55]:
# Assignment: explore how to convert the dict
# back to XML using the xmltodict module. Hint: unparse()